"Extract Words from Text based on predefined set of keywords"

evelyn_barani Member Posts: 1 Learner II
edited May 2019 in Help

Hi all,

I am very new to RapidMiner.

I have a set of news articles and I want to find out whether the articles contain certain words (listed in an Excel file). I also want to find out how often one particular word occurs.

I've been reading a lot in the forum, but haven't found a solution yet.

Can anybody help out?

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @evelyn_barani, welcome to the community. It would be very helpful if you could post your data set so we can see exactly what you're working on.

    There are many ways to do what you want with these articles. Most likely you'll want to install the Text Processing extension from the Marketplace and use its operators, like this:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="9.0.003" expanded="true" height="68" name="Retrieve REDUCED job post data set (5862 examples)" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Community Samples/Community Data Science/Text Mining Tutorials by Neil McGuigan/data/REDUCED job post data set (5862 examples)"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="9.0.003" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="JobDescription.contains.manager"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">job description contains &amp;quot;manager&amp;quot;</description>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="34">
    <parameter key="create_word_vector" value="false"/>
    <parameter key="keep_text" value="true"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="45" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve REDUCED job post data set (5862 examples)" from_port="output" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
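The check described in the question (does each article contain any of the keywords, and how often does a given word occur) can be sketched outside RapidMiner in plain Python. This is a minimal illustration under stated assumptions, not the RapidMiner implementation: the keyword list `KEYWORDS`, the sample articles, and the helper names `tokenize` and `keyword_report` are hypothetical, and the keywords are hard-coded where in practice they would be read from the Excel file. The result is roughly what a per-document term-occurrence count of those words gives you.

```python
import re
from collections import Counter

# Hypothetical keyword list; in practice this would come from the Excel file
# (for example exported to CSV first).
KEYWORDS = ["manager", "budget", "election"]


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_report(articles: list[str], keywords: list[str]) -> list[dict[str, int]]:
    """For each article, count how often each keyword occurs (0 if it is absent)."""
    wanted = [k.lower() for k in keywords]
    report = []
    for text in articles:
        counts = Counter(tokenize(text))
        # Counter returns 0 for missing keys, so absent keywords report 0.
        report.append({k: counts[k] for k in wanted})
    return report


if __name__ == "__main__":
    articles = [
        "The manager presented the budget. The budget was approved.",
        "Voters head to the polls in today's election.",
    ]
    for i, row in enumerate(keyword_report(articles, KEYWORDS)):
        present = [k for k, n in row.items() if n > 0]
        print(f"article {i}: counts={row} contains={present}")
```

A keyword "occurs" in an article when its count is greater than zero, so one pass answers both questions. This sketch only matches single-word keywords exactly after lower-casing. Multi-word phrases or stemmed variants ("managers" vs "manager") would need extra handling.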

    Scott
