HELP obtaining Twitter user details (I'm new to RapidMiner)

elibartholf Member Posts: 1 Contributor I
edited December 2018 in Product Feedback - Resolved

Hello helpful people! I am trying to use the "Get Twitter user details" operator in order to get the following for each ID in my Twitter search:

- location
- number of followers
- number of friends
- number of favorites
- number of tweets, etc.

I see that the "Get Twitter user details" operator will get me results for one Twitter ID at a time. However, I have 5,000 IDs that I need the above information for. Is there a way to obtain this for all of them at once? Or perhaps by using another operator? THANK YOU! :)


Released · Last Updated

9.0.0

Comments

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi Elibart,

    Your question got me interested, and I think you need to use the Loop Values operator in combination with the Get Twitter User Details operator. Here is a simple process showing what I mean:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <operator activated="true" class="read_csv" compatibility="7.5.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
    <parameter key="csv_file" value="C:\Users\SebastianGolbert\Documents\Twitter\names.txt"/>
    <parameter key="column_separators" value=";"/>
    <parameter key="trim_lines" value="false"/>
    <parameter key="use_quotes" value="true"/>
    <parameter key="quotes_character" value="&quot;"/>
    <parameter key="escape_character" value="\"/>
    <parameter key="skip_comments" value="false"/>
    <parameter key="comment_characters" value="#"/>
    <parameter key="parse_numbers" value="true"/>
    <parameter key="decimal_character" value="."/>
    <parameter key="grouped_digits" value="false"/>
    <parameter key="grouping_character" value=","/>
    <parameter key="date_format" value=""/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations"/>
    <parameter key="time_zone" value="SYSTEM"/>
    <parameter key="locale" value="English (United States)"/>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information"/>
    <parameter key="read_not_matching_values_as_missings" value="true"/>
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="data_management" value="auto"/>
    </operator>
    </process>
    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <operator activated="true" class="concurrency:loop_values" compatibility="7.5.001" expanded="true" height="82" name="Loop Values" width="90" x="380" y="34">
    <parameter key="attribute" value="att1"/>
    <parameter key="iteration_macro" value="loop_value"/>
    <parameter key="reuse_results" value="false"/>
    <parameter key="enable_parallel_execution" value="true"/>
    <process expanded="true">
    <operator activated="true" class="social_media:get_twitter_user_details" compatibility="7.3.000" expanded="true" height="68" name="Get Twitter User Details" width="90" x="380" y="34">
    <parameter key="connection" value="Twitter"/>
    <parameter key="query_type" value="name"/>
    <parameter key="user" value="%{loop_value}"/>
    </operator>
    <connect from_op="Get Twitter User Details" from_port="output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    </process>
    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <operator activated="true" class="append" compatibility="7.5.001" expanded="true" height="82" name="Append" width="90" x="648" y="34">
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="data_management" value="auto"/>
    <parameter key="merge_type" value="all"/>
    </operator>
    </process>

    Please try it out and give us some feedback about the running time; the part that appends all the collections could be quite inefficient.

     

    Best regards,

    SebaG

  • m_oke Member Posts: 11 Contributor I

    @SGolbert For some strange reason, the XML you posted in your reply does not run in my Studio.

     

    Could you please re-confirm that it is running in your studio?

  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist

    Hi @m_oke,

    Attached is a working process in RapidMiner Studio v7.6.001. Of course, you need to replace the Twitter connection with your own.

    By the way, I recommend searching only for a de-duplicated list of user IDs (easiest way: the Aggregate operator, grouping by "From-User-Id"). The number of free Twitter API requests is limited per month.
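
    If the ID list is prepared outside RapidMiner before it is looped over, the de-duplication advice above can be sketched in a few lines of Python. This is only an illustration: the function name `unique_user_ids` is hypothetical and not part of RapidMiner or the Twitter extension.

```python
def unique_user_ids(ids):
    """Return the IDs with duplicates removed, keeping first-seen order.

    IDs are handled as strings so that long IDs are never rounded.
    """
    # dict.fromkeys keeps insertion order, so the first occurrence wins.
    return list(dict.fromkeys(str(i).strip() for i in ids))


if __name__ == "__main__":
    # Made-up IDs purely for demonstration.
    print(unique_user_ids(["12", "7", "12", " 7", "99"]))
```

    Each ID that is dropped here is one API request saved inside the loop.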

     

    Best regards,

    Edin

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
    <parameter key="connection" value="Twitter Klapic"/>
    <parameter key="query" value="rapidminer"/>
    <parameter key="limit" value="3"/>
    </operator>
    <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="246" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="From-User-Id"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="concurrency:loop_values" compatibility="7.6.001" expanded="true" height="82" name="Loop Values" width="90" x="380" y="34">
    <parameter key="attribute" value="From-User-Id"/>
    <process expanded="true">
    <operator activated="true" class="social_media:get_twitter_user_details" compatibility="7.3.000" expanded="true" height="68" name="Get Twitter User Details (2)" width="90" x="648" y="34">
    <parameter key="connection" value="Twitter Klapic"/>
    <parameter key="query_type" value="id"/>
    <parameter key="id" value="%{loop_value}"/>
    <parameter key="user" value="%{loop_value}"/>
    </operator>
    <connect from_op="Get Twitter User Details (2)" from_port="output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="364" y="83">Type your comment</description>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append" width="90" x="514" y="34"/>
    <connect from_op="Search Twitter" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
    <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Loop Values" to_port="input 1"/>
    <connect from_op="Loop Values" from_port="output 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
  • m_oke Member Posts: 11 Contributor I

    @Edin_Klapic Thanks Edin,

     

    It worked (though you posted the reply in a different thread :) ).

     

    Could you please tell me what you did differently to make it work?

  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist

    Hi,

     

    @elibartholf I can confirm that the XML from @SGolbert is broken. Please find a working process XML in my other post above.

     

    Sorry @m_oke, I answered your question in the original thread and linked to this thread because the XML here is a working process.

    The problem with "Get Twitter User Details" is that the query type 'name' searches for the screen name of a user.

    That is the one starting with @. Screen names do not contain blanks. If you can obtain those names, you can use them.

    Otherwise you can use the query type 'id' within "Get Twitter User Details". The ID is a number and is also available from the Search Twitter operator (attribute: From-User-Id).
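
    To make the name/ID distinction above concrete, here is a small Python sketch that decides which query type a given value would fit. The helper `choose_query` is hypothetical. The screen-name pattern (letters, digits and underscores, at most 15 characters) is an assumption about Twitter's rules and is not stated in this thread. Purely numeric values are treated as IDs, which is also an assumption.

```python
import re

# Assumed screen-name rule: letters, digits, underscore, 1 to 15 characters.
SCREEN_NAME = re.compile(r"[A-Za-z0-9_]{1,15}")
NUMERIC_ID = re.compile(r"[0-9]+")


def choose_query(value):
    """Return (query_type, cleaned_value) with query_type 'name' or 'id'."""
    text = value.strip()
    if text.startswith("@"):
        handle = text[1:]
        if SCREEN_NAME.fullmatch(handle):
            return ("name", handle)
        raise ValueError(f"not a valid screen name: {value!r}")
    if NUMERIC_ID.fullmatch(text):
        return ("id", text)
    if SCREEN_NAME.fullmatch(text):
        return ("name", text)
    # Display names such as "Sebastian Golbert" contain blanks and end up here.
    raise ValueError(f"neither a screen name nor a numeric ID: {value!r}")


if __name__ == "__main__":
    for candidate in ["@RapidMiner", "12345", "Sebastian Golbert"]:
        try:
            print(candidate, "->", choose_query(candidate))
        except ValueError as err:
            print(candidate, "->", err)
```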

     

    Best regards,

    Edin

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    My process was a simpler version of the one from Edin, so no need to fix the XML.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    The XML works on my side, but I've found the Get Twitter User Details operator to be prone to API issues on Twitter. In @Edin_Klapic's example, he uses 3 as the search limit for the query "rapidminer". Anything greater than 6 causes an API problem, which is rather strange.
  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist

    Hi all,

    Further investigation shows that this problem only occurs with long user IDs.

    We are investigating this in the code base.

    I am afraid that, in the meantime, the only working solution seems to be to filter out those user IDs.

     

    Best regards,

    Edin

  • KPL RapidMiner Certified Analyst, Member Posts: 9 Contributor II

    Yes, experiencing the same problem with long Twitter IDs. Works OK for shorter IDs.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi @KPL - yes, the error still exists. I have a hunch it has to do with the move from 32-bit to 64-bit user ID numbers (you'll notice that the "long" user IDs are 18 digits instead of 9). So right now I would recommend skipping those IDs (assuming they are not that critical to you):

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
    <parameter key="connection" value="Twitter1"/>
    <parameter key="query" value="rapidminer"/>
    <parameter key="limit" value="10"/>
    </operator>
    <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="246" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="From-User-Id"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="concurrency:loop_values" compatibility="7.6.001" expanded="true" height="82" name="Loop Values" width="90" x="380" y="34">
    <parameter key="attribute" value="From-User-Id"/>
    <parameter key="enable_parallel_execution" value="false"/>
    <process expanded="true">
    <operator activated="true" class="handle_exception" compatibility="7.6.001" expanded="true" height="82" name="Handle Exception" width="90" x="648" y="34">
    <process expanded="true">
    <operator activated="true" class="social_media:get_twitter_user_details" compatibility="7.3.000" expanded="true" height="68" name="Get Twitter User Details (2)" width="90" x="45" y="34">
    <parameter key="connection" value="Twitter1"/>
    <parameter key="query_type" value="id"/>
    <parameter key="id" value="%{loop_value}"/>
    <parameter key="user" value="%{loop_value}"/>
    </operator>
    <connect from_op="Get Twitter User Details (2)" from_port="output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <process expanded="true">
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Handle Exception" from_port="out 1" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    <description align="center" color="yellow" colored="false" height="105" resized="false" width="180" x="364" y="83">Type your comment</description>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append" width="90" x="514" y="34"/>
    <connect from_op="Search Twitter" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
    <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Loop Values" to_port="input 1"/>
    <connect from_op="Loop Values" from_port="output 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    If those users are important to you, you will need to solve it via a cURL shell script (Execute Program), Enrich Data via Webservice, or otherwise.
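
    To estimate how many users would need such a workaround, the 32-bit hunch above can be checked outside RapidMiner with a short Python sketch. This assumes the hunch is right and that the limit is the signed 32-bit maximum (2,147,483,647); neither is confirmed in this thread, and `split_by_int32` is a hypothetical helper name.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit integer


def split_by_int32(ids):
    """Split IDs into (fits_in_32_bit, needs_64_bit), both as lists of strings."""
    short_ids, long_ids = [], []
    for raw in ids:
        # Python integers are arbitrary precision, so int() never overflows here.
        target = short_ids if int(raw) <= INT32_MAX else long_ids
        target.append(str(raw))
    return short_ids, long_ids


if __name__ == "__main__":
    # Made-up IDs purely for demonstration.
    short_ids, long_ids = split_by_int32(
        ["123456789", "2147483647", "2147483648", "912345678901234567"]
    )
    print("would work with the operator:", short_ids)
    print("would need a workaround:", long_ids)
```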

     

    Scott

     

  • KPL RapidMiner Certified Analyst, Member Posts: 9 Contributor II

    @sgenzer, thanks for the bug confirmation.

    Scott, could you elaborate further on your proposed solutions? I'm not sure how that would get "under the hood" of the Get Twitter User Details operator with an ID query type.

    Thanks!

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Yeah, I was afraid you were going to ask that. :) To build a workaround you need to skip the Twitter operator entirely and communicate with the Twitter API directly using other RapidMiner operators. I've written several KB articles here in the community showing different API use cases, but never bothered with Twitter, as it's one of the few where we actually have a custom operator. So I would say: 1) read my various KB articles about APIs; and 2) if you're still game, go to developer.twitter.com and give it a go.
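
    As a rough idea of what talking to the API directly could involve, the Python sketch below only builds request URLs for batches of user IDs; it sends nothing and leaves out authentication entirely. The endpoint URL, the `user_id` parameter and the batch size of 100 are assumptions based on the v1.1 `users/lookup` endpoint as it was documented around the time of this thread, so check them against developer.twitter.com before relying on them. `lookup_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumptions, not confirmed in this thread: endpoint, parameter name, batch size.
LOOKUP_URL = "https://api.twitter.com/1.1/users/lookup.json"
BATCH_SIZE = 100


def lookup_urls(user_ids, batch_size=BATCH_SIZE):
    """Build one lookup URL per batch of user IDs (no request is sent)."""
    ids = [str(i) for i in user_ids]  # keep IDs as strings to avoid overflow
    urls = []
    for start in range(0, len(ids), batch_size):
        chunk = ids[start:start + batch_size]
        query = urlencode({"user_id": ",".join(chunk)})
        urls.append(f"{LOOKUP_URL}?{query}")
    return urls


if __name__ == "__main__":
    # 5,000 made-up IDs, as in the original question.
    urls = lookup_urls(range(1, 5001))
    print(len(urls), "requests instead of 5000")
```

    Under these assumptions, batching would cut the 5,000 single lookups from the original question down to 50 requests, which matters given the request limits mentioned earlier in the thread.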

     

    Scott

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    This was fixed in RapidMiner 9.0 already.