A newby question here...

arlenecaballero · June 2016

Hello experts!

I'm a newbie as well. Can anyone help me collect data from the Twitter feed of a specific Twitter account? I would like to get the counts of the dominant words in the feed and eventually analyze whether there's a pattern.

I hope you can help me out. Please experts! Godspeed.

Warm regards,

Ms. Arlene

MartinLiebig · June 2016

Dear arlenecaballero,

I think there is no one-click solution for this, because RapidMiner's Twitter operators can "only" search for keywords and do not get the history of a specific Twitter user. There should be an easy way to do this with a very small script. I would guess my colleague @Thomas_Ott has something at hand?

~Martin

JEdward · June 2016

I think the operator you're looking for is called 'Get Twitter User Statuses'. Configure it, point it at the user you want, and download their latest posts.

Be careful about Twitter's rate limit, though. If you are analysing accounts that post a lot of statuses, you need to take care in how you loop and how you store the since_id and max_id parameters, so that you only fetch fresh content, not repeats.
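
The since_id / max_id bookkeeping can be sketched in Python, outside RapidMiner. This is only an illustration under stated assumptions: `fetch_page` is a hypothetical callable standing in for whatever actually requests one page of statuses, and `make_fake_timeline` is an in-memory stand-in rather than a real API client. The only semantics relied on are that since_id is exclusive (IDs greater than it are returned) and max_id is inclusive (IDs less than or equal to it are returned).

```python
from typing import Any, Callable, Dict, List, Optional

Tweet = Dict[str, Any]  # assumed shape: an integer "id" plus a "text" field


def fetch_new_tweets(
    fetch_page: Callable[[Optional[int], Optional[int]], List[Tweet]],
    since_id: Optional[int] = None,
) -> List[Tweet]:
    """Collect every tweet newer than since_id, paging backwards with max_id."""
    collected: List[Tweet] = []
    max_id: Optional[int] = None
    while True:
        page = fetch_page(since_id, max_id)
        if not page:
            break
        collected.extend(page)
        # max_id is inclusive, so step one below the oldest ID seen
        # to avoid fetching that tweet again on the next page.
        max_id = min(t["id"] for t in page) - 1
    return collected


def make_fake_timeline(ids: List[int], page_size: int = 3):
    """Hypothetical in-memory timeline, for demonstration only."""
    def fetch_page(since_id: Optional[int], max_id: Optional[int]) -> List[Tweet]:
        matches = [
            {"id": i, "text": f"tweet {i}"}
            for i in sorted(ids, reverse=True)  # newest first
            if (since_id is None or i > since_id)
            and (max_id is None or i <= max_id)
        ]
        return matches[:page_size]
    return fetch_page


timeline = make_fake_timeline(list(range(1, 11)))
first_run = fetch_new_tweets(timeline)
newest_seen = max(t["id"] for t in first_run)  # store this between runs
second_run = fetch_new_tweets(timeline, since_id=newest_seen)  # nothing new yet
```

Storing `newest_seen` after each run and passing it back as since_id on the next run is what keeps repeats out. A real loop would also need to pause between requests to stay under the rate limit, which this sketch leaves out.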

arlenecaballero · June 2016

Hi Sir Martin!

Thank you for your prompt response. As a newbie, I am delighted with your suggestions. I hope your colleague @Tbone can shed light on how to use the tools to finish my case study. Thank you.

By the way, may I ask how many tweets I can get using the tool, or rather, how I can get older posts going back at least one year? Thanks again.

arlenecaballero · June 2016

Thank you Sir Edward! Really appreciate your response.

I'd just like to ask if there's a link or online help that could assist me in configuring the 'Get Twitter User Statuses' operator. As a newbie, I really need a guide to work out the solution. Thank you.

Thomas_Ott · June 2016

Twitter does indeed have a rate limit, and it also limits how far back non-paying customers can go in the Tweet stream. For the Twitter Search API you can only get about 5,000 tweets per search term. The paid Twitter API (firehose) removes all those limits.

The best course of action is to save your initial Twitter Search and then append to that data set over time. That is typically how I do it. It does require some planning, but it can be done.
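
The save-then-append routine can be sketched roughly as follows, in plain Python rather than as a RapidMiner process. Everything here is an assumption made for illustration: the JSON file layout, the function names `load_archive` and `append_to_archive`, and the idea that each row carries a unique "id" field to deduplicate on.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

Row = Dict[str, Any]  # assumed shape: a unique "id" plus a "text" field


def load_archive(path: Path) -> List[Row]:
    """Return the rows saved so far, or an empty list on the first run."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def append_to_archive(path: Path, fresh: List[Row]) -> int:
    """Append rows whose id is not stored yet; return how many were added."""
    archive = load_archive(path)
    seen = {row["id"] for row in archive}
    added = 0
    for row in fresh:
        if row["id"] not in seen:
            archive.append(row)
            seen.add(row["id"])
            added += 1
    path.write_text(json.dumps(archive, ensure_ascii=False, indent=2), encoding="utf-8")
    return added


# Demonstration with made-up rows in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "tweets.json"
    first = append_to_archive(archive_path, [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}])
    second = append_to_archive(archive_path, [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}])
    total = len(load_archive(archive_path))
```

Running the same search on a schedule and passing each result through `append_to_archive` grows the data set without duplicating tweets that show up in more than one run.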

Thomas_Ott · June 2016

This is a follow-up showing a sample process that uses the Search Twitter operator. I arbitrarily assigned a positive and a negative label so you can see the word frequencies and how they are distributed across the class labels.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="7.0.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
        <parameter key="connection" value="Twitter Connection"/>
        <parameter key="query" value="iPhone6"/>
        <parameter key="limit" value="1000"/>
        <parameter key="language" value="en"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="7.1.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="85">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.5"/>
          <parameter key="ratio" value="0.5"/>
        </enumeration>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.1.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
        <list key="function_descriptions">
          <parameter key="fake label" value="&quot;Positive&quot;"/>
        </list>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.1.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="313" y="136">
        <list key="function_descriptions">
          <parameter key="fake label" value="&quot;Negative&quot;"/>
        </list>
      </operator>
      <operator activated="true" class="append" compatibility="7.1.001" expanded="true" height="103" name="Append" width="90" x="447" y="85"/>
      <operator activated="true" class="set_role" compatibility="7.1.001" expanded="true" height="82" name="Set Role" width="90" x="581" y="85">
        <parameter key="attribute_name" value="fake label"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles">
          <parameter key="Id" value="id"/>
        </list>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="7.1.001" expanded="true" height="82" name="Nominal to Text" width="90" x="715" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="849" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.1.001" expanded="true" height="82" name="Process Documents from Data" width="90" x="983" y="85">
        <parameter key="prune_method" value="percentual"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.1.001" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Append" to_port="example set 1"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 2"/>
      <connect from_op="Append" from_port="merged set" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
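
For readers without the Text Processing extension at hand, the word list that the process above produces can be approximated in a few lines of Python. This is a rough sketch, not the operator's implementation: it assumes tokens are split on non-letter characters (which should correspond to Tokenize's non-letters mode), it keeps case as-is because the process has no case transformation, and the two sample rows are made up for illustration.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Split on runs of non-letter characters and drop empty tokens."""
    return [token for token in re.split(r"[^A-Za-z]+", text) if token]


def word_frequencies(
    rows: Iterable[Tuple[str, str]],
) -> Tuple[Counter, Dict[str, Counter]]:
    """Count each word overall and per class label.

    rows is an iterable of (text, label) pairs.
    """
    total: Counter = Counter()
    per_label: Dict[str, Counter] = defaultdict(Counter)
    for text, label in rows:
        tokens = tokenize(text)
        total.update(tokens)
        per_label[label].update(tokens)
    return total, per_label


# Made-up sample rows, labeled arbitrarily like the "fake label" in the process.
sample = [
    ("Love my new iPhone6", "Positive"),
    ("My iPhone6 battery died", "Negative"),
]
total, per_label = word_frequencies(sample)
dominant = total.most_common(3)  # the most frequent words across all rows
```

Because digits count as non-letters here, "iPhone6" is counted as "iPhone", and "my" and "My" are counted separately since case is left untouched.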
