keyword-based text mining

seba77 Member Posts: 2 Contributor I
edited December 2018 in Help

Hello there,

I have a list of 50 keywords and want to analyze how frequently they occur in my dataset.
The general text mining process is not the problem, but I only want to analyze these 50 keywords.
How can I do this?


Thank you very much in advance!


Best Answer

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    You just create a wordlist with those 50 words and then apply that specific wordlist (using the wordlist input port) for any subsequent document you are going to process.
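
For readers who want to see the underlying idea outside RapidMiner, counting against a fixed word list can be sketched in plain Python. This is only an illustration under stated assumptions, not part of the RapidMiner process: the function name `count_keywords`, the sample keywords and text, and the simple `\w+` tokenizer are all hypothetical choices.

```python
import re


def count_keywords(documents, keywords):
    """Count occurrences of each keyword across all documents.

    Only words in `keywords` are counted; every other token is ignored.
    Matching is case-insensitive and on whole tokens.
    """
    # Start every keyword at zero so absent keywords still appear in the result.
    counts = {k.lower(): 0 for k in keywords}
    for text in documents:
        # Assumed tokenizer: runs of word characters (letters, digits, underscore).
        for token in re.findall(r"\w+", text.lower()):
            if token in counts:
                counts[token] += 1
    return counts


# Hypothetical sample data, for illustration only.
keywords = ["apples", "oranges", "bananas"]
documents = ["apples are sweeter than oranges but bananas are the sweetest of them all"]
print(count_keywords(documents, keywords))
```

Keywords that never occur stay in the result with a count of zero, which roughly mirrors how a fixed word list keeps its columns even when a term is missing from a document.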


    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts


  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @seba77,


    If I understand correctly, you can use the Create ExampleSet operator to enter your list of 50 keywords, then use the Process Documents from Data and Process Documents operators to filter all other words out of your document.

    Here is an example process to adapt to your keywords and document:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
      <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="operator_toolbox:create_exampleset_from_doc" compatibility="0.7.000" expanded="true" height="68" name="Create Exampleset" width="90" x="45" y="34">
            <parameter key="Input Csv" value="att1&#10;apples&#10;oranges&#10;bananas"/>
            <parameter key="Parse all as Nominal" value="true"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="179" y="34">
            <list key="specify_weights"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="447" y="34"/>
              <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
              <connect from_op="Tokenize (2)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="447" y="187">
            <parameter key="text" value="apples are sweeter than oranges but bananas are the sweetest of them all"/>
          </operator>
          <operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" height="103" name="Process Documents" width="90" x="581" y="85">
            <process expanded="true">
              <connect from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Create Exampleset" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents" to_port="word list"/>
          <connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
          <connect from_op="Process Documents" from_port="example set" to_port="result 2"/>
          <connect from_op="Process Documents" from_port="word list" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

    Does this example process meet your needs?




