A newbie question here...

arlenecaballero Member, University Professor Posts: 3
edited November 2018 in Help

Hello experts!

I'm a newbie as well. Can anyone help me collect data from the Twitter feed of a specific Twitter account? I would like to get the counts of the dominant words in the feed and eventually analyze whether there's a pattern.

 

I hope you can help me out. Please experts! Godspeed.

 

Warm regards,

Ms. Arlene

Best Answers

  • mschmitz Posts: 2,113  RM Data Scientist
    Solution Accepted

    Dear arlenecaballero,

     

    I think there is no one-click solution for this, because RapidMiner's Twitter operators can "only" search for keywords and do not get the history of a specific Twitter user. There should be an easy way to do this with a very small script. I would guess my colleague @Thomas_Ott has something at hand?

     

    ~Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • JEdward Posts: 564   Unicorn
    Solution Accepted

    I think the operator you're looking for is called 'Get Twitter User Statuses'. Configure it, point it at the user you want, and download their latest posts.

     

    Be careful about Twitter's rate limit, though. If you are analysing accounts that post a lot of statuses, you will need to take care with how you loop and store the since_id and max_id parameters. (You only want fresh content, not repeats.)
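
For anyone scripting this outside RapidMiner, here is a minimal Python sketch of the since_id / max_id bookkeeping described above. It assumes the usual Twitter timeline semantics (since_id returns only tweets with a larger ID, max_id returns only tweets with an ID less than or equal to it). `fetch_page` is a hypothetical stand-in for whatever client call you use, and `make_fake_timeline` is an in-memory fake so the example runs without credentials; neither is part of RapidMiner or of any Twitter library.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Tweet = Dict[str, Any]
# Hypothetical signature: fetch_page(since_id, max_id, count) -> tweets, newest first.
FetchPage = Callable[[Optional[int], Optional[int], int], List[Tweet]]


def collect_new_tweets(fetch_page: FetchPage,
                       since_id: Optional[int] = None,
                       page_size: int = 200) -> List[Tweet]:
    """Page backwards through a timeline, returning only tweets newer than since_id."""
    collected: List[Tweet] = []
    max_id: Optional[int] = None  # no upper bound on the first request
    while True:
        page = fetch_page(since_id, max_id, page_size)
        if not page:
            break  # nothing older is left above since_id
        collected.extend(page)
        # The next page must end just below the oldest tweet seen so far,
        # otherwise that tweet would be returned again.
        max_id = min(t["id"] for t in page) - 1
    return collected


def make_fake_timeline(ids: Iterable[int]) -> FetchPage:
    """In-memory fake of a timeline endpoint (an assumption, for demonstration only)."""
    ordered = sorted(ids, reverse=True)

    def fetch_page(since_id: Optional[int], max_id: Optional[int], count: int) -> List[Tweet]:
        matching = [i for i in ordered
                    if (since_id is None or i > since_id)
                    and (max_id is None or i <= max_id)]
        return [{"id": i, "text": f"tweet {i}"} for i in matching[:count]]

    return fetch_page
```

After each run, store the largest ID you collected and pass it as since_id next time, so a later run only downloads what was posted in between. A real client would also need to pause between pages to stay within the rate limit, which this sketch leaves out.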

Answers

  • arlenecaballero Member, University Professor Posts: 3

    Hi Sir Martin!

    Thank you for your prompt response. As a newbie, I am delighted with your suggestions. I hope your colleague @Tbone can shed light on how to use the tools to finish my case study. Thank you.

     

    By the way, may I ask how many tweets I can get using the tool, or rather, how can I get older posts, going back at least one year? Thanks again. =)

  • arlenecaballero Member, University Professor Posts: 3

    Thank you Sir Edward! Really appreciate your response. 

    I'd just like to ask if there's a link or online help that could assist me in configuring 'Get Twitter User Statuses'. As a newbie, I really need a guide to work out the solution I'm after. Thank you.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761   Unicorn

    Twitter does indeed have a rate limit, and it also restricts how far non-paying customers can go back into the Tweet stream. With the Twitter Search API you can only get about 5,000 tweets per search term. The paid Twitter API (firehose) removes all those limits.

     

    The best course of action is to save the results of your initial Twitter search and then append to that data set over time. That is typically how I do it. It requires some planning, but it can be done.
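
As a rough illustration of the "save, then append over time" approach, here is a small Python sketch that keeps a CSV archive and only adds tweets whose ID is not already stored. The file layout, the field names (`id`, `created_at`, `text`) and the function name `append_new_tweets` are assumptions made for this example, not anything RapidMiner or Twitter prescribes.

```python
import csv
from pathlib import Path
from typing import Any, Dict, Iterable

# Assumed column layout for the archive file.
FIELDS = ["id", "created_at", "text"]


def append_new_tweets(archive_path: str, tweets: Iterable[Dict[str, Any]]) -> int:
    """Append tweets to a CSV archive, skipping IDs that are already stored.

    Returns the number of rows actually added.
    """
    path = Path(archive_path)
    seen = set()
    if path.exists():
        with path.open(newline="", encoding="utf-8") as f:
            seen = {row["id"] for row in csv.DictReader(f)}

    # Only write the header when starting a fresh (or empty) archive.
    write_header = not path.exists() or path.stat().st_size == 0
    added = 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for tweet in tweets:
            tweet_id = str(tweet["id"])
            if tweet_id in seen:
                continue  # repeat from an earlier search, not fresh content
            writer.writerow({
                "id": tweet_id,
                "created_at": tweet.get("created_at", ""),
                "text": tweet.get("text", ""),
            })
            seen.add(tweet_id)
            added += 1
    return added
```

Run on a schedule (daily, say), this lets the archive grow past what a single search returns. Inside RapidMiner the same idea would be built from a stored data set plus an append and a duplicate-removal step.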

     


  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761   Unicorn

    This is a follow-up showing a sample process that uses the Search Twitter operator. I arbitrarily assigned positive and negative labels so you can see the word frequencies and how they fall across the class labels.

     

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.0.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
    <parameter key="connection" value="Twitter Connection"/>
    <parameter key="query" value="iPhone6"/>
    <parameter key="limit" value="1000"/>
    <parameter key="language" value="en"/>
    </operator>
    <operator activated="true" class="split_data" compatibility="7.1.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="85">
    <enumeration key="partitions">
    <parameter key="ratio" value="0.5"/>
    <parameter key="ratio" value="0.5"/>
    </enumeration>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.1.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
    <list key="function_descriptions">
    <parameter key="fake label" value="&quot;Positive&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.1.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="313" y="136">
    <list key="function_descriptions">
    <parameter key="fake label" value="&quot;Negative&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="append" compatibility="7.1.001" expanded="true" height="103" name="Append" width="90" x="447" y="85"/>
    <operator activated="true" class="set_role" compatibility="7.1.001" expanded="true" height="82" name="Set Role" width="90" x="581" y="85">
    <parameter key="attribute_name" value="fake label"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles">
    <parameter key="Id" value="id"/>
    </list>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="7.1.001" expanded="true" height="82" name="Nominal to Text" width="90" x="715" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="849" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="7.1.001" expanded="true" height="82" name="Process Documents from Data" width="90" x="983" y="85">
    <parameter key="prune_method" value="percentual"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.1.001" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Split Data" to_port="example set"/>
    <connect from_op="Split Data" from_port="partition 1" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Split Data" from_port="partition 2" to_op="Generate Attributes (2)" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Append" from_port="merged set" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
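
For readers who want to see what the tokenize-and-count part of the process boils down to, here is a rough Python equivalent. It is only a sketch: splitting on non-letter characters mimics a simple tokenizer, the lower-casing step is an extra assumption that is not in the process above, and the function names are made up for this example.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Split on runs of non-letter characters (ASCII letters only in this sketch)."""
    return [token for token in re.split(r"[^A-Za-z]+", text) if token]


def word_counts_by_label(rows: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count word occurrences separately for each class label.

    rows is an iterable of (text, label) pairs.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for text, label in rows:
        # Lower-casing is an assumption added here so "Love" and "love" are merged.
        counts[label].update(tokenize(text.lower()))
    return counts


def top_words(rows: Iterable[Tuple[str, str]], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent words across all labels."""
    total: Counter = Counter()
    for label_counts in word_counts_by_label(rows).values():
        total.update(label_counts)
    return total.most_common(n)
```

Given a list of (tweet text, label) pairs, `top_words` gives the dominant words overall, and `word_counts_by_label` shows how each word is split across the labels, which is roughly what the word list result of the process reports.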

     
