RapidMiner

Learner I elibartholf

HELP obtaining Twitter user details (I'm new to RapidMiner)

Hello helpful people! I am trying to use the "Get Twitter user details" operator in order to get the following for each ID in my Twitter search:

- location

- number of followers

- number of friends

- number of favorites

- number of tweets, etc.

 

I see that the "Get Twitter user details" operator returns results for one Twitter ID at a time. However, I have 5,000 IDs that I need the above information for. Is there a way to obtain this for all of them at once, perhaps using another operator? THANK YOU! :)

8 REPLIES
RM Staff

Re: HELP obtaining Twitter user details (I'm new to RapidMiner)

Hi Elibart,

 

Your question got me interested, and I think you need to use the Loop Values operator in combination with the Get Twitter User Details operator. Here is a simple process showing what I mean:

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
  <operator activated="true" class="read_csv" compatibility="7.5.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
    <parameter key="csv_file" value="C:\Users\SebastianGolbert\Documents\Twitter\names.txt"/>
    <parameter key="column_separators" value=";"/>
    <parameter key="trim_lines" value="false"/>
    <parameter key="use_quotes" value="true"/>
    <parameter key="quotes_character" value="&quot;"/>
    <parameter key="escape_character" value="\"/>
    <parameter key="skip_comments" value="false"/>
    <parameter key="comment_characters" value="#"/>
    <parameter key="parse_numbers" value="true"/>
    <parameter key="decimal_character" value="."/>
    <parameter key="grouped_digits" value="false"/>
    <parameter key="grouping_character" value=","/>
    <parameter key="date_format" value=""/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations"/>
    <parameter key="time_zone" value="SYSTEM"/>
    <parameter key="locale" value="English (United States)"/>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information"/>
    <parameter key="read_not_matching_values_as_missings" value="true"/>
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="data_management" value="auto"/>
  </operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
  <operator activated="true" class="concurrency:loop_values" compatibility="7.5.001" expanded="true" height="82" name="Loop Values" width="90" x="380" y="34">
    <parameter key="attribute" value="att1"/>
    <parameter key="iteration_macro" value="loop_value"/>
    <parameter key="reuse_results" value="false"/>
    <parameter key="enable_parallel_execution" value="true"/>
    <process expanded="true">
      <operator activated="true" class="social_media:get_twitter_user_details" compatibility="7.3.000" expanded="true" height="68" name="Get Twitter User Details" width="90" x="380" y="34">
        <parameter key="connection" value="Twitter"/>
        <parameter key="query_type" value="name"/>
        <parameter key="user" value="%{loop_value}"/>
      </operator>
      <connect from_op="Get Twitter User Details" from_port="output" to_port="output 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="source_input 2" spacing="0"/>
      <portSpacing port="sink_output 1" spacing="0"/>
      <portSpacing port="sink_output 2" spacing="0"/>
    </process>
  </operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
  <operator activated="true" class="append" compatibility="7.5.001" expanded="true" height="82" name="Append" width="90" x="648" y="34">
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="data_management" value="auto"/>
    <parameter key="merge_type" value="all"/>
  </operator>
</process>

Please try it out and give us feedback about the running time; the part that appends all the collections could be quite inefficient.

 

Best regards,

SebaG

Contributor I m_oke

Re: HELP obtaining Twitter user details (I'm new to RapidMiner)

@SGolbert For some strange reason, the XML you posted in your reply is not running in my Studio.

 

Could you please re-confirm that it is running in your studio?

RM Staff

Re: HELP obtaining Twitter user details (I'm new to RapidMiner)

Hi @m_oke,

 

Attached is a working process in RapidMiner Studio v7.6.001. Of course, you need to replace the Twitter connection with your own.

By the way, I recommend searching only for a de-duplicated list of user IDs (easiest way: the Aggregate operator, grouping by "From-User-Id"). The number of free Twitter API requests is limited per month.
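
For the de-duplication step, a rough sketch of what the Aggregate operator could look like in the process XML follows. The parameter keys `aggregation_attributes` and `group_by_attributes` are written from memory of Studio 7.x and are an assumption; check them against the XML that Studio generates when you drag the operator into a process. The count aggregation is only there so the operator has something to compute.

```xml
<!-- Sketch only: produces one row per distinct From-User-Id. -->
<!-- Parameter keys are assumed, not taken from this thread.  -->
<operator activated="true" class="aggregate" compatibility="7.6.001" expanded="true" name="Aggregate">
  <list key="aggregation_attributes">
    <parameter key="From-User-Id" value="count"/>
  </list>
  <parameter key="group_by_attributes" value="From-User-Id"/>
</operator>
```

Placed between Search Twitter and Numerical to Polynominal, this should leave each user ID only once, so that Loop Values issues one API request per user.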

 

Best regards,

Edin

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
        <parameter key="connection" value="Twitter Klapic"/>
        <parameter key="query" value="rapidminer"/>
        <parameter key="limit" value="3"/>
      </operator>
      <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="246" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="From-User-Id"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="concurrency:loop_values" compatibility="7.6.001" expanded="true" height="82" name="Loop Values" width="90" x="380" y="34">
        <parameter key="attribute" value="From-User-Id"/>
        <process expanded="true">
          <operator activated="true" class="social_media:get_twitter_user_details" compatibility="7.3.000" expanded="true" height="68" name="Get Twitter User Details (2)" width="90" x="648" y="34">
            <parameter key="connection" value="Twitter Klapic"/>
            <parameter key="query_type" value="id"/>
            <parameter key="id" value="%{loop_value}"/>
            <parameter key="user" value="%{loop_value}"/>
          </operator>
          <connect from_op="Get Twitter User Details (2)" from_port="output" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
          <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="364" y="83">Type your comment</description>
        </process>
      </operator>
      <operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append" width="90" x="514" y="34"/>
      <connect from_op="Search Twitter" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
      <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Loop Values" to_port="input 1"/>
      <connect from_op="Loop Values" from_port="output 1" to_op="Append" to_port="example set 1"/>
      <connect from_op="Append" from_port="merged set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Contributor I m_oke

Re: HELP obtaining Twitter user details (I'm new to RapidMiner)

@Edin_Klapic Thanks Edin,

 

It worked (though you posted the reply in a different thread :) ).

 

Could you please tell me what you did differently to make it work?

RM Staff

Re: HELP obtaining Twitter user details (I'm new to RapidMiner)

Hi,

 

@elibartholf I can confirm that the XML from @SGolbert is broken. Please find a working process XML in my other post above.

 

Sorry @m_oke, I answered your question in the original thread and linked to this thread because the XML here is a working process.

 

The problem with "Get Twitter User Details" is that the query type 'name' searches for the screen name of a user, i.e. the handle shown with @. Screen names do not contain blanks. If you can obtain those names, you can use them.

Otherwise, you can use the query type 'id' within "Get Twitter User Details". The id is a number and is also available from the Search Twitter operator (attribute: From-User-Id).
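
To make the difference concrete, here is a sketch of the two configurations, using only parameters that appear in the processes posted in this thread. The connection name "Twitter" is a placeholder for your own connection, and %{loop_value} is the macro set by the surrounding Loop Values operator.

```xml
<!-- Variant 1: look up by screen name. The loop attribute must hold screen names without blanks. -->
<operator activated="true" class="social_media:get_twitter_user_details" compatibility="7.3.000" expanded="true" name="Get Twitter User Details">
  <parameter key="connection" value="Twitter"/>
  <parameter key="query_type" value="name"/>
  <parameter key="user" value="%{loop_value}"/>
</operator>

<!-- Variant 2: look up by numeric id, e.g. the From-User-Id attribute from Search Twitter. -->
<operator activated="true" class="social_media:get_twitter_user_details" compatibility="7.3.000" expanded="true" name="Get Twitter User Details">
  <parameter key="connection" value="Twitter"/>
  <parameter key="query_type" value="id"/>
  <parameter key="id" value="%{loop_value}"/>
</operator>
```

These are alternatives, so only one of the two belongs inside the loop, depending on what the looped attribute contains.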

 

Best regards,

Edin

RM Staff

Re: HELP obtaining Twitter user details (I'm new to RapidMiner)

My process was a simpler version of the one from Edin, so no need to fix the XML.

RM Certified Expert

Re: HELP obtaining Twitter user details (I'm new to RapidMiner)

The XML works on my side, but I've found the Get Twitter User Details operator to be prone to Twitter API issues. In @Edin_Klapic's example, the search for "rapidminer" is limited to 3 results. Any limit greater than 6 causes an API problem, which is rather strange.
RM Staff

Re: HELP obtaining Twitter user details (I'm new to RapidMiner)

Hi all,

 

Further investigation shows that this problem only occurs with long user IDs.

We are investigating this in the code base.

 

I am afraid that, in the meantime, the only working solution seems to be to filter out those user IDs.

 

Best regards,

Edin
