How to count "exampleword" in a tweet

Mustafa_AVDAN Member Posts: 34 Contributor I
edited December 2018 in Help

[Attachment: Ekran Görüntüsü (1).png]

Hello again. I'm sorry to ask the same question again, but I need help with my project and I can't continue. Can anybody help me?

How can I count "HASHTAGWORD" in each tweet in RapidMiner? Which operator can help me? I didn't use Excel; the picture is just an example.

I will get the tweets with the Search Twitter operator, and then I will count some words in the resulting data set. Finally, I will generate a new column and add the counter value to it for each row (each tweet).

Please help me; I must do this this week! :[

Answers

  • Edin_Klapic Employee, RMResearcher, Member Posts: 299 RM Data Scientist

    Hi @Mustafa_AVDAN,

     

    One starting point might be to use the Split Operator and split the examples by HASHTAGWORD...

    Alternatively, you could take a look at the Text Processing extension. Its operators could also help.
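
The split idea can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the principle, not the Split operator itself: splitting a text on a word yields one more piece than there are occurrences of that word. The function name `count_by_split` and the sample tweets are made up for this sketch.

```python
def count_by_split(text, word):
    """Count non-overlapping occurrences of `word` by splitting `text` on it."""
    if not word:
        raise ValueError("word must not be empty")
    # Splitting on the word yields one more piece than there are occurrences.
    return len(text.split(word)) - 1


# Hypothetical sample tweets, only for illustration.
tweets = [
    "#video of the day: a video about cats",
    "no match here",
]
counts = [count_by_split(tweet, "video") for tweet in tweets]
print(counts)  # [2, 0]
```

Note that this counts plain substrings, so the word is also found inside longer words.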

     

    Best regards,

    Edin

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi Mustafa,

     

    If you have Python installed on your computer, you can use the "Execute Python" operator (download and install the extension from the Marketplace).

    Only 5 lines of code are needed to perform the task.

    After the "Search Twitter" operator, I added a "Select Attributes" operator to retain only the "Text" attribute, which contains the tweets.

    To change your hashtag word:

     - Click the "Execute Python" operator -> Parameters -> Edit Text

     - In the code, set hashtagword = "xxxxx", where xxxxx is the hashtag word you want

     

    Here is the process:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
    <parameter key="connection" value="dkk"/>
    <parameter key="query" value="video"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="447" y="34">
    <parameter key="script" value="import pandas as pd&#10;import numpy as np&#10;import re&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(data):&#10;&#10;    hashtagword = &quot;of&quot;&#10;    occurence = np.arange(len(data))&#10;&#10;    for i in range(len(data)):&#10;        occurence[i] = len(re.findall(hashtagword, data.iloc[i, 0]))&#10;&#10;    data['Occurence'] = occurence&#10;&#10;    # connect 2 output ports to see the results&#10;    return data"/>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Execute Python" to_port="input 1"/>
    <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
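
For readers who want to try the counting logic outside RapidMiner, here is a minimal standalone sketch of the same idea. It assumes, as in the process above, that the tweets are in the first column of a pandas DataFrame. The helper `count_occurrences`, the use of `str.count` instead of the explicit loop, and the sample tweets are choices made for this sketch, not part of the original process.

```python
import re

import pandas as pd


def count_occurrences(texts, word):
    """Count how often `word` appears in each entry of a pandas Series."""
    # re.escape makes characters such as '#' or '.' match literally.
    return texts.astype(str).str.count(re.escape(word))


def rm_main(data):
    # Same contract as in the Execute Python operator: one DataFrame in, one out.
    hashtagword = "of"
    data["Occurence"] = count_occurrences(data.iloc[:, 0], hashtagword)
    return data


# Hypothetical sample tweets, only for illustration.
tweets = pd.DataFrame({"Text": ["Proof of concept of the app", "nothing to see", "#of course"]})
result = rm_main(tweets)
print(result["Occurence"].tolist())  # [3, 0, 1]
```

The first tweet gets 3 rather than 2 because the plain pattern also matches the "of" inside "Proof". Wrapping the escaped word in `\b...\b` would restrict the count to whole words.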

    I hope this helps.

     

    Regards,

     

    Lionel

     

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi again @Mustafa_AVDAN

     

    After further investigation, it turns out your task is possible without Python.

    1. First, download and install the Text Processing extension from the Marketplace.

    2. Here is the process:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
    <parameter key="connection" value="dkk"/>
    <parameter key="query" value="video"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Text" width="90" x="313" y="34"/>
    <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="648" y="34">
    <parameter key="vector_creation" value="Term Occurrences"/>
    <parameter key="keep_text" value="true"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="782" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="attribute" value="Text"/>
    <parameter key="regular_expression" value="of\b"/>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

    3. Set your hashtag word in the parameters of the "Select Attributes (2)" operator. In this process I ran some tests with the word "of", so you have to replace "of" with your own hashtag word in the regular expression parameter:

    process_tweets.png

     

    4. The results view looks like this : 

    results_tweets.png 
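
What the steps above compute can be sketched in plain Python: each tweet is tokenized, the occurrences of every token are counted per tweet, and the column for the chosen word is kept. This is a rough imitation under stated assumptions, not the Text Processing extension itself: the tokenizer here splits on anything that is not an ASCII letter, and the function names and sample tweets are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Split on runs of non-letter characters (ASCII only, an assumption)."""
    return [token for token in re.split(r"[^A-Za-z]+", text) if token]


def term_occurrences(docs):
    """Return one token counter per document, like a term-occurrence vector."""
    return [Counter(tokenize(doc)) for doc in docs]


# Hypothetical sample tweets, only for illustration.
tweets = ["Proof of concept of the app", "nothing to see", "#of course"]
occurrences = term_occurrences(tweets)

# Keep only the column for the chosen word; a missing token counts as 0.
of_counts = [counter["of"] for counter in occurrences]
print(of_counts)  # [2, 0, 1]
```

Because the count is per token, "Proof" does not contribute to the count for "of", and a hashtag such as "#of" is counted as the token "of" once the "#" has been split off.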

     

    I think you now have the elements of an answer.

     

    Regards,

     

    Lionel

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi again (and again) @Mustafa_AVDAN,

     

    Here is a second version of the last process, more in the "RM spirit" (easier to use, and with a note).

    Here is the process:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
    <parameter key="connection" value="dkk"/>
    <parameter key="query" value="video"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Text" width="90" x="313" y="34"/>
    <operator activated="true" class="set_macro" compatibility="8.0.001" expanded="true" height="82" name="Set hashtagword" width="90" x="447" y="34">
    <parameter key="macro" value="hashTagword"/>
    <parameter key="value" value="of"/>
    <description align="center" color="red" colored="true" width="126">Set your hashtagword by modifying the parameter &amp;quot;value&amp;quot; (don't modify the &amp;quot;macro&amp;quot; name)</description>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="648" y="34">
    <parameter key="vector_creation" value="Term Occurrences"/>
    <parameter key="keep_text" value="true"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="782" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="attribute" value="Text"/>
    <parameter key="regular_expression" value="%{hashTagword}\b"/>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Set hashtagword" to_port="through 1"/>
    <connect from_op="Set hashtagword" from_port="through 1" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
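
The macro in this process works by text substitution: `%{hashTagword}` in the regular expression parameter is replaced by the macro's value before the expression is used. A rough Python sketch of that substitution follows. The helper `expand_macros` and the list of column names are made up for illustration, and whether the attribute filter requires the whole name to match is an assumption, so both variants are shown.

```python
import re


def expand_macros(template, macros):
    """Replace each %{name} placeholder with its value from `macros`.

    An unknown macro name raises KeyError in this sketch.
    """
    return re.sub(r"%\{(\w+)\}", lambda match: macros[match.group(1)], template)


pattern = expand_macros(r"%{hashTagword}\b", {"hashTagword": "of"})
print(pattern)  # of\b

# Hypothetical attribute names, as a tokenized word vector might produce them.
columns = ["Text", "of", "proof", "often"]

# Variant 1: the whole attribute name must match the expression.
full = [name for name in columns if re.fullmatch(pattern, name)]
# Variant 2: the expression may match anywhere in the name.
loose = [name for name in columns if re.search(pattern, name)]

print(full)   # ['of']
print(loose)  # ['of', 'proof']
```

If the looser matching applied, anchoring the expression at the start as well (for example `\bof\b`) would keep "proof" out of the selection.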