
how to count "exampleword" in tweet

Mustafa_AVDAN Member Posts: 34 Contributor I
edited December 2018 in Help

Hello again. I'm sorry to ask the same question again, but I need help with my project and I can't continue. Can anybody help me? How can I count the occurrences of a "HASHTAGWORD" in each tweet in RapidMiner? Which operator can help me? I didn't use Excel; the attached picture (Ekran Görüntüsü (1).png) is just an example. I will get the tweets with the Search Twitter operator and then count certain words in the resulting dataset. Finally, I will generate a new column and add the counter value to it for each row (each tweet). Please help me; I must do this this week! :[

Answers

  • Edin_Klapic Employee, RMResearcher, Member Posts: 299 RM Data Scientist

    Hi @Mustafa_AVDAN,

     

    One starting point might be to use the Split Operator and split the examples by HASHTAGWORD...

    On the other side, you may also take a look at the Text Processing extension. The Operators in there could also help.
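To illustrate the idea behind the split suggestion (this is plain Python, not the RapidMiner Split operator itself): splitting a text on the hashtagword yields one more piece than there are occurrences, so the count is the number of pieces minus one. The function name `count_by_split` and the sample tweets are made up for this sketch.

```python
def count_by_split(text: str, word: str) -> int:
    """Count non-overlapping occurrences of `word` by splitting on it."""
    if not word:
        return 0  # splitting on an empty string is not allowed
    # n occurrences cut the text into n + 1 pieces
    return len(text.split(word)) - 1


# Hypothetical sample data, only for illustration
tweets = ["#video is the best #video", "no tag here"]
counts = [count_by_split(t, "#video") for t in tweets]
print(counts)
```

In RapidMiner the same reasoning would apply to the number of columns produced by the split, minus one.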

     

    Best regards,

    Edin

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi Mustafa,

     

    If you have Python installed on your computer, you can use the "Execute Python" operator (it comes with the Python Scripting extension, which you can download and install from the Marketplace).

    Only about five lines of code are needed to perform the task.

    After the "Search Twitter" operator, I added a "Select Attributes" operator to keep only the "Text" attribute, which contains the tweets.

    To change the hashtagword, you just have to:

     - click on the "Execute Python" operator -> Parameters -> Edit Text

     - in the code, set hashtagword = "xxxxx", where xxxxx is the hashtagword you want to count

     

    Here is the process:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
    <parameter key="connection" value="dkk"/>
    <parameter key="query" value="video"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="447" y="34">
    <parameter key="script" value="import pandas as pd&#10;import numpy as np&#10;import re&#10;&#10;# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(data):&#10;&#10;    hashtagword = &quot;of&quot;&#10;    # one counter per row (tweet)&#10;    occurence = np.zeros(len(data), dtype=int)&#10;&#10;    for i in range(len(data)):&#10;        # count the matches of hashtagword in the text of row i&#10;        occurence[i] = len(re.findall(hashtagword, str(data.iloc[i, 0])))&#10;&#10;    data['Occurence'] = occurence&#10;&#10;    # connect the output port to see the results&#10;    return data"/>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Execute Python" to_port="input 1"/>
    <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
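For readability, here is roughly what the script inside the "Execute Python" operator does, written as plain Python outside the XML escaping. It is a sketch, not the exact embedded script: the helper `count_occurrences` and its `ignore_case` option are additions for illustration, and `re.escape` is added so that characters such as "+" or "." in the hashtagword are matched literally rather than as regular-expression syntax.

```python
import re


def count_occurrences(text, word, ignore_case=False):
    """Return how many times `word` occurs in `text` (non-overlapping)."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes special characters in `word` match literally
    return len(re.findall(re.escape(word), str(text), flags))


def rm_main(data):
    # `data` is the pandas DataFrame handed over by Execute Python;
    # the first column is assumed to hold the tweet text
    hashtagword = "of"
    data["Occurence"] = [count_occurrences(t, hashtagword) for t in data.iloc[:, 0]]
    return data


print(count_occurrences("One of the best videos of 2018", "of"))  # prints 2
```

Note that this counts substrings, so "often" also contributes one match for "of". If only whole words should count, the pattern would need word boundaries (`\b`) around it.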

    I hope this will be helpful,

     

    Regards,

     

    Lionel

     

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi again @Mustafa_AVDAN

     

    After further investigation, it turns out that your task is possible without Python.

    1. First, you have to download and install the Text Processing extension from the Marketplace.

    2. Here is the process:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
    <parameter key="connection" value="dkk"/>
    <parameter key="query" value="video"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Text" width="90" x="313" y="34"/>
    <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="648" y="34">
    <parameter key="vector_creation" value="Term Occurrences"/>
    <parameter key="keep_text" value="true"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="782" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="attribute" value="Text"/>
    <parameter key="regular_expression" value="of\b"/>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

    3. Set your "hashtagword" in the parameters of the "Select Attributes (2)" operator:

    In this process I ran some tests with the word "of", so you have to replace "of" with your own hashtagword in the "regular expression" parameter:

    process_tweets.png

     

    4. The results view looks like this:

    results_tweets.png

     

    I think this gives you the elements you need for a solution.
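As a rough picture of what the "Tokenize" and "Term Occurrences" settings compute per tweet, here is a small Python sketch. It assumes that Tokenize splits on non-letter characters (which I believe is its default mode) and that no case transformation is applied; the function name `term_occurrences` and the sample tweets are made up for illustration.

```python
import re
from collections import Counter


def term_occurrences(text):
    """Split `text` into runs of letters and count each token."""
    # [^\W\d_]+ matches one or more Unicode letters, so digits,
    # punctuation and "#" act as separators
    tokens = re.findall(r"[^\W\d_]+", text)
    return Counter(tokens)


# Hypothetical sample data, only for illustration
tweets = ["Video of the year, best of #of", "Of course"]
counts = [term_occurrences(t)["of"] for t in tweets]
print(counts)
```

Two consequences of these assumptions: the "#" is dropped, so "#of" and "of" are counted as the same token, and the count is case-sensitive, so "Of" is not counted as "of".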

     

    Regards,

     

    Lionel

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @Mustafa_AVDAN again (and again)

     

    Here is a second version of the last process, more in the "RM spirit" (easier to use, and with a note).

    Here is the process:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
    <parameter key="connection" value="dkk"/>
    <parameter key="query" value="video"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.0.001" expanded="true" height="82" name="Nominal to Text" width="90" x="313" y="34"/>
    <operator activated="true" class="set_macro" compatibility="8.0.001" expanded="true" height="82" name="Set hashtagword" width="90" x="447" y="34">
    <parameter key="macro" value="hashTagword"/>
    <parameter key="value" value="of"/>
    <description align="center" color="red" colored="true" width="126">Set your hashtagword by modifying the parameter &amp;quot;value&amp;quot; (don't modify the &amp;quot;macro&amp;quot; name)</description>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="648" y="34">
    <parameter key="vector_creation" value="Term Occurrences"/>
    <parameter key="keep_text" value="true"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="782" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="attribute" value="Text"/>
    <parameter key="regular_expression" value="%{hashTagword}\b"/>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Set hashtagword" to_port="through 1"/>
    <connect from_op="Set hashtagword" from_port="through 1" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
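To show how the macro ends up in the regular expression, here is a small Python sketch of the substitution and of the attribute selection that follows. It is only an illustration: `expand_macros` is a made-up helper, the column names are invented, and it assumes that the regular expression has to match the whole attribute name.

```python
import re


def expand_macros(template, macros):
    """Replace every %{name} in `template` with its value from `macros`."""
    # unknown macro names are left untouched
    return re.sub(r"%\{(\w+)\}",
                  lambda m: macros.get(m.group(1), m.group(0)),
                  template)


macros = {"hashTagword": "of"}
pattern = expand_macros(r"%{hashTagword}\b", macros)

# Hypothetical attribute names, as they might come out of the word vector
columns = ["Text", "of", "off", "proof"]
selected = [c for c in columns if re.fullmatch(pattern, c)]
print(pattern, selected)
```

Under these assumptions only the column named exactly like the hashtagword is kept, which is why changing the macro value is enough to count a different word.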