How to classify using my own guidelines

mkq Member Posts: 9 Contributor II
edited November 2018 in Help
I want to classify forum posts using my own guidelines instead of the methods RapidMiner offers. For example, I want to classify based on keywords: if a post includes a given keyword, I assign it to the corresponding class.
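A minimal sketch of the keyword rule described above, written in plain Python rather than RapidMiner so the logic is easy to see. The class names, keywords, and the `classify` helper are hypothetical. Matching is a case-insensitive substring check, so "install" would also match "reinstall".

```python
# Hypothetical rule table: class label -> keywords that trigger it.
# Rules are checked in insertion order; the first match wins.
RULES: dict[str, list[str]] = {
    "Installation": ["install", "setup"],
    "Licensing": ["license"],
}


def classify(post: str, rules: dict[str, list[str]], default: str = "Other") -> str:
    """Assign the first class whose keyword appears in the post (case-insensitive substring match)."""
    text = post.lower()
    for label, keywords in rules.items():
        if any(keyword.lower() in text for keyword in keywords):
            return label
    # No keyword matched: fall back to a catch-all class.
    return default


if __name__ == "__main__":
    posts = [
        "How do I install the Text Processing extension?",
        "My license expired yesterday.",
        "Which learner should I use?",
    ]
    for post in posts:
        print(classify(post, RULES), "|", post)
```

Because rules are checked in order, put more specific keywords first if a post can match several classes.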

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Very quickly.

    If you are classifying with just a short list of keywords, you can use the Filter Documents (by Content) operator to do this.
    The example process below matches one keyword per class, but you can also use regular expressions in the operator to extend this to small keyword lists or simple patterns.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.4.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="6.4.000" expanded="true" height="76" name="DemoData" width="90" x="45" y="30">
            <process expanded="true">
              <operator activated="true" class="text:create_document" compatibility="6.4.001" expanded="true" height="60" name="Create Document" width="90" x="45" y="30">
                <parameter key="text" value="I am Class1."/>
              </operator>
              <operator activated="true" class="text:create_document" compatibility="6.4.001" expanded="true" height="60" name="Create Document (2)" width="90" x="45" y="120">
                <parameter key="text" value="I am Class2."/>
              </operator>
              <operator activated="true" class="collect" compatibility="6.4.000" expanded="true" height="94" name="Collect" width="90" x="246" y="75"/>
              <connect from_op="Create Document" from_port="output" to_op="Collect" to_port="input 1"/>
              <connect from_op="Create Document (2)" from_port="output" to_op="Collect" to_port="input 2"/>
              <connect from_op="Collect" from_port="collection" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="multiply" compatibility="6.4.000" expanded="true" height="94" name="Multiply" width="90" x="112" y="120"/>
          <operator activated="true" class="text:filter_documents_by_content" compatibility="6.4.001" expanded="true" height="76" name="Filter Class1" width="90" x="246" y="30">
            <parameter key="string" value="Class1"/>
            <description align="center" color="transparent" colored="false" width="126">Please enter your text string here.</description>
          </operator>
          <operator activated="true" class="text:filter_documents_by_content" compatibility="6.4.001" expanded="true" height="76" name="Filter Class2" width="90" x="246" y="165">
            <parameter key="string" value="Class2"/>
            <description align="center" color="transparent" colored="false" width="126">Please enter your text string here.</description>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="6.4.001" expanded="true" height="76" name="Documents to Data" width="90" x="380" y="30">
            <parameter key="text_attribute" value="ForumPost"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="76" name="Generate Attributes" width="90" x="514" y="30">
            <list key="function_descriptions">
              <parameter key="Label" value="&quot;Class1&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="6.4.001" expanded="true" height="76" name="Documents to Data (2)" width="90" x="390" y="165">
            <parameter key="text_attribute" value="ForumPost"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="514" y="165">
            <list key="function_descriptions">
              <parameter key="Label" value="&quot;Class2&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="append" compatibility="6.4.000" expanded="true" height="94" name="Append" width="90" x="648" y="120"/>
          <connect from_op="DemoData" from_port="out 1" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Filter Class1" to_port="documents 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Filter Class2" to_port="documents 1"/>
          <connect from_op="Filter Class1" from_port="documents" to_op="Documents to Data" to_port="documents 1"/>
          <connect from_op="Filter Class2" from_port="documents" to_op="Documents to Data (2)" to_port="documents 1"/>
          <connect from_op="Documents to Data" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Documents to Data (2)" from_port="example set" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Append" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <description align="center" color="red" colored="true" height="66" resized="true" width="233" x="10" y="241">&lt;br/&gt;http://www.rapidminerchina.com</description>
        </process>
      </operator>
    </process>
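The process above copies the documents with Multiply, filters each copy for one class, labels the survivors with Generate Attributes, and appends the results. Below is a rough Python sketch of that same data flow. It is not RapidMiner's implementation: `re.search` stands in for the operator's matching, and the `filter_and_label` name and the `Class2|Category2` pattern are made up to show a small keyword list.

```python
import re


def filter_and_label(docs: list[str], patterns: dict[str, str]) -> list[tuple[str, str]]:
    """Mimic Multiply -> one filter per class -> add Label -> Append.

    Each class filters its own copy of the documents independently, so a
    document matching several patterns appears once per matching class,
    and a document matching none is dropped.
    """
    rows: list[tuple[str, str]] = []
    for label, pattern in patterns.items():  # one branch per class
        regex = re.compile(pattern)
        for doc in docs:
            if regex.search(doc):  # match anywhere in the text
                rows.append((doc, label))
    return rows


if __name__ == "__main__":
    docs = ["I am Class1.", "I am Class2.", "Class1 and Class2", "Neither"]
    patterns = {"Class1": r"Class1", "Class2": r"Class2|Category2"}
    for doc, label in filter_and_label(docs, patterns):
        print(label, "|", doc)
```

Because each class filters its own copy, a post containing both keywords shows up once per class, and a post with no keyword is dropped rather than labeled. If you need exactly one label per post, a first-match rule is one alternative.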
  • mkq Member Posts: 9 Contributor II
    Thank you very much! You've helped me a lot! ;D
  • mkq Member Posts: 9 Contributor II
    Hi JEdward! I have read your answer, but I have a follow-up question. I have a CSV file with many rows and want to classify the content of each row: if a row contains a keyword, I assign it to the corresponding class. How can I do that?
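A minimal sketch of the per-row logic asked about here, in plain Python outside RapidMiner. It assumes the post text sits in a column named `ForumPost` (borrowed from the process above) and uses a hypothetical keyword table; the CSV is read from a string only to keep the example self-contained.

```python
import csv
import io

# Hypothetical (keyword, class) table; the first matching rule wins.
KEYWORD_RULES = [("install", "Installation"), ("license", "Licensing")]


def classify_rows(csv_text: str, text_column: str, default: str = "Other") -> list[dict[str, str]]:
    """Add a 'Label' column to each CSV row based on keywords found in text_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    labeled = []
    for row in reader:
        content = row[text_column].lower()  # case-insensitive matching
        row["Label"] = next(
            (label for keyword, label in KEYWORD_RULES if keyword in content),
            default,
        )
        labeled.append(row)
    return labeled


if __name__ == "__main__":
    sample = "id,ForumPost\n1,How do I install it?\n2,License question\n3,Thanks!\n"
    for row in classify_rows(sample, "ForumPost"):
        print(row["id"], row["Label"])
```

For a real file, replace the `io.StringIO` source with `open(path, newline="", encoding="utf-8")`.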