Ideal model for grouping comments

platanas20 Member Posts: 22 Maven
edited November 2018 in Help
Hi all,

I want to create a project that reads comments from an Excel file. In the next step, I want to automatically group those comments into three categories (positive, negative, neutral). Is there an ideal model for this? Please help me.

Thank you very much!!!!!

Answers

  • platanas20 Member Posts: 22 Maven
    Here is the XML code, which reads the comments from an Excel file and uses X-Validation. Are we on the right track?
    There is also a problem with the Naive Bayes operator, so we can't execute our project. Any help? Sorry for my English! ;D

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
        <process expanded="true" height="377" width="547">
          <operator activated="true" class="read_excel" compatibility="5.1.001" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="C:\Users\platanas\Desktop\ypepth_comments_41.xls"/>
            <parameter key="imported_cell_range" value="G1:G178"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="??????.true.text.label"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.1.001" expanded="true" height="76" name="Select Attributes" width="90" x="246" y="75">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="G"/>
            <parameter key="use_except_expression" value="true"/>
          </operator>
          <operator activated="true" class="split_validation" compatibility="5.1.001" expanded="true" height="112" name="Validation" width="90" x="447" y="165">
            <process expanded="true" height="528" width="379">
              <operator activated="true" class="naive_bayes" compatibility="5.1.001" expanded="true" height="76" name="Naive Bayes" width="90" x="112" y="30"/>
              <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="528" width="379">
              <operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" compatibility="5.1.001" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    If I understand you correctly, and without knowing your data, I would guess that the Text Processing extension could help you. Additionally, if the positive/negative/neutral classification is stored as a column in your Excel sheet, you have to define that column as the label. Furthermore, your Select Attributes operator selects only a single attribute named "G" (if one exists) and removes all other attributes from the example set. This is surely not what you want. Before you do anything else, set a breakpoint at the Select Attributes operator and check the example set to understand what this operator is doing. Naive Bayes by design can't do anything with an example set containing only one attribute, so the problem is not with Naive Bayes but with the process.

    Cheers,
    Marius
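
    The advice above (define the sentiment column as the label, then turn the comment text into word features with the Text Processing extension before training) could be sketched roughly as follows. This is only an illustration under several assumptions: the file name comments.xls and the column names "comment" and "sentiment" are hypothetical, the Text Processing extension is assumed to be installed, and parameter keys may differ between RapidMiner versions.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.001">
      <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
        <process expanded="true">
          <!-- Hypothetical sheet: header row with a "comment" column (text) and a
               "sentiment" column (positive / negative / neutral, assigned by hand). -->
          <operator activated="true" class="read_excel" compatibility="5.1.001" expanded="true" name="Read Excel">
            <parameter key="excel_file" value="comments.xls"/>
            <parameter key="first_row_as_names" value="true"/>
          </operator>
          <!-- Mark the sentiment column as the label so that a learner has a target. -->
          <operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" name="Set Role">
            <parameter key="name" value="sentiment"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <!-- Excel strings usually arrive as nominal values; the text operators expect type text. -->
          <operator activated="true" class="nominal_to_text" compatibility="5.1.001" expanded="true" name="Nominal to Text">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="comment"/>
          </operator>
          <!-- Build a word vector: one numeric attribute per token. -->
          <operator activated="true" class="text:process_document_from_data" compatibility="5.1.001" expanded="true" name="Process Documents from Data">
            <parameter key="vector_creation" value="TF-IDF"/>
            <list key="specify_weights"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="5.1.001" expanded="true" name="Tokenize"/>
              <operator activated="true" class="text:transform_cases" compatibility="5.1.001" expanded="true" name="Transform Cases"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
              <connect from_op="Transform Cases" from_port="document" to_port="document 1"/>
            </process>
          </operator>
          <!-- Cross-validation around Naive Bayes; the inner wiring mirrors the Validation operator in the question. -->
          <operator activated="true" class="x_validation" compatibility="5.1.001" expanded="true" name="Validation">
            <process expanded="true">
              <operator activated="true" class="naive_bayes" compatibility="5.1.001" expanded="true" name="Naive Bayes"/>
              <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" name="Apply Model">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" compatibility="5.1.001" expanded="true" name="Performance">
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
            </process>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
          <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_port="result 1"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
        </process>
      </operator>
    </process>
    ```

    Compared with the process in the question, there is no Select Attributes step, so the label and the generated word attributes all reach Naive Bayes. Setting a breakpoint after Process Documents from Data is a quick way to confirm that the example set really contains one label and many word attributes before training.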