The RapidMiner community is on read-only mode until further notice. Technical support via cases will continue to work as is. For any urgent licensing related requests from Students/Faculty members, please use the Altair academic forum here.

keyword-based text mining

seba77seba77 Member Posts: 2 Learner II
edited December 2018 in Help

Hello there,

I have a list of 50 keywords and want to analyze their occurence frequency in my dataset.
The general text mining process is not the problem. But I only want to analyze these 50 keywords. 
How can I apply this?

 

Thank you very much in advance!

 

Best Answer

  • Telcontar120Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    You just create a wordlist with those 50 words and then apply that specific wordlist (using the wordlist input port) for any subsequent document you are going to process.

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts

Answers

  • lionelderkrikorlionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @seba77,

     

    if I good understand, you can use use the Create ExampleSet operator to write yout list of 50 keywords and the Process Documents

    from Data and Process Documents operators to filter all the others words from your document.

    Here an example of process to adapt to your keywords and document : 

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="operator_toolbox:create_exampleset_from_doc" compatibility="0.7.000" expanded="true" height="68" name="Create Exampleset" width="90" x="45" y="34">
    <parameter key="Input Csv" value="att1&#10;apples&#10;oranges&#10;bananas"/>
    <parameter key="Parse all as Nominal" value="true"/>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="179" y="34">
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="447" y="34"/>
    <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
    <connect from_op="Tokenize (2)" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="447" y="187">
    <parameter key="text" value="apples are sweeter than oranges but bananas are the sweetest of them all"/>
    </operator>
    <operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" height="103" name="Process Documents" width="90" x="581" y="85">
    <process expanded="true">
    <connect from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Create Exampleset" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents" to_port="word list"/>
    <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
    <connect from_op="Process Documents" from_port="example set" to_port="result 2"/>
    <connect from_op="Process Documents" from_port="word list" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

    Does this example of process answer to your need ?

     

    Regards,

     

    Lionel

Sign In or Register to comment.