"Text Mining in RM5"

ratheesan Member Posts: 68 Maven
edited June 2019 in Help
Hi,
In RM 4.6 I used the Excel Example Set Reader, Nominal to String, and String Text Input operators to load data from an Excel sheet into RM. Which operators can I use to load Excel data into RM5 for text mining?

Thanks
Ratheesan

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    This is really simple: the Read Excel operator does ... reading the Excel file :) Then you can use the Nominal to Text operator to change the value type to text, and the Process Documents from Data operator to do the text mining preprocessing that was formerly done by StringTextInput.

    Greetings,
      Sebastian
  • ratheesan Member Posts: 68 Maven
    Thanks Sebastian,
    I worked with the operators mentioned above, but I always get the error message "com.rapidminer.example.set.NonSpecialAttributesExampleSet cannot be cast to com.rapidminer.operator.text.Document".

    I am attaching my process here. Could you please point out my mistake?

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input>
         <location/>
       </input>
       <output>
         <location/>
         <location/>
       </output>
       <macros/>
     </context>
     <operator activated="true" class="process" expanded="true" name="Process">
       <process expanded="true" height="314" width="762">
         <operator activated="true" class="read_excel" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
           <parameter key="excel_file" value="C:\Documents and Settings\GRACE\Desktop\1000.xls"/>
         </operator>
         <operator activated="true" class="nominal_to_text" expanded="true" height="76" name="Nominal to Text" width="90" x="45" y="120"/>
         <operator activated="true" class="text:process_documents" expanded="true" height="94" name="Process Documents" width="90" x="179" y="120">
           <parameter key="vector_creation" value="Term Occurrences"/>
           <process expanded="true">
             <operator activated="true" class="text:tokenize" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
             <portSpacing port="source_document" spacing="0"/>
             <portSpacing port="sink_document 1" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="text:tokenize" expanded="true" height="60" name="Tokenize (2)" width="90" x="313" y="120"/>
         <operator activated="true" class="text:filter_stopwords_english" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="447" y="120"/>
         <operator activated="true" class="text:filter_by_length" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="581" y="120"/>
         <connect from_op="Read Excel" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
         <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents" to_port="documents 1"/>
         <connect from_op="Process Documents" from_port="example set" to_op="Tokenize (2)" to_port="document"/>
         <connect from_op="Tokenize (2)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
         <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
         <connect from_op="Filter Tokens (by Length)" from_port="document" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>

    Thanks
    Ratheesan
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    You have to put the text processing operators INTO the Process Documents operator. It is a super operator that contains a subprocess. You can open the subprocess by double-clicking on the operator.

    Greetings,
      Sebastian
  • ratheesan Member Posts: 68 Maven
    Hi,

    I arranged the operators as you said, but I still couldn't solve my problem. I am getting the same error.

    This is my process.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="284" width="762">
          <operator activated="true" class="read_excel" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="C:\Documents and Settings\GRACE\Desktop\1000.xls"/>
            <list key="annotations"/>
          </operator>
          <operator activated="true" class="nominal_to_text" expanded="true" height="76" name="Nominal to Text" width="90" x="180" y="30"/>
          <operator activated="true" class="text:process_documents" expanded="true" height="94" name="Process Documents" width="90" x="315" y="30">
            <process expanded="true" height="284" width="762">
              <operator activated="true" class="text:tokenize" expanded="true" height="60" name="Tokenize" width="90" x="39" y="96"/>
              <operator activated="true" class="text:filter_stopwords_english" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="247" y="119"/>
              <operator activated="true" class="text:filter_by_length" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="380" y="120"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
              <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
              <connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
          <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents" to_port="documents 1"/>
          <connect from_op="Process Documents" from_port="example set" to_port="result 2"/>
          <connect from_op="Process Documents" from_port="word list" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

    Thanks
    Ratheesan
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello Ratheesan,

    Change the "Process Documents" operator to "Process Documents from Data".

    regards

    Andrew
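
    For reference, here is a sketch of the second process above with that change applied. The error message suggests why it helps: "Process Documents" expects Document objects on its input, while Nominal to Text delivers an example set. Two details in this sketch are assumptions that do not appear earlier in the thread and should be checked against your installed Text Processing extension: the operator key text:process_document_from_data and the input port name "example set". The simplest way to get the exact values is to drag the operator into the process in the GUI and look at the XML view.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="284" width="762">
          <operator activated="true" class="read_excel" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="C:\Documents and Settings\GRACE\Desktop\1000.xls"/>
            <list key="annotations"/>
          </operator>
          <operator activated="true" class="nominal_to_text" expanded="true" height="76" name="Nominal to Text" width="90" x="180" y="30"/>
          <!-- ASSUMPTION: operator key for "Process Documents from Data"; verify in your extension version -->
          <operator activated="true" class="text:process_document_from_data" expanded="true" height="76" name="Process Documents from Data" width="90" x="315" y="30">
            <!-- Subprocess unchanged from the process posted above -->
            <process expanded="true" height="284" width="762">
              <operator activated="true" class="text:tokenize" expanded="true" height="60" name="Tokenize" width="90" x="39" y="96"/>
              <operator activated="true" class="text:filter_stopwords_english" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="247" y="119"/>
              <operator activated="true" class="text:filter_by_length" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="380" y="120"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
              <connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
              <connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
            </process>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
          <!-- ASSUMPTION: the input port of this operator is named "example set" -->
          <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_port="result 2"/>
          <connect from_op="Process Documents from Data" from_port="word list" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```

    The layout-only portSpacing entries are left out here for brevity. Everything else, including the Excel path, is copied from the process posted above.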