
Passing examples to an operator (API) incrementally

batstache611 Member Posts: 45 Guru
edited December 2018 in Help

Hi everybody,

I'm sorry, I tried to find a solution to this but wasn't successful. I'm trying to pass examples to one of the Twitter operators that GETs tweets via one of Twitter's APIs. However, it has a rate limit of 450 tweets per 15 minutes. I have a list of Twitter handles whose tweets I want to collect, and I'm using Loop Values to iterate through them. In the Twitter operator, I can configure how many tweets per handle I want to GET. Right now my process is configured to get 30 tweets per handle with a delay of 1 minute between handles, which works out to approximately 450 tweets every 15 minutes. If I wanted more tweets per handle, I'd have to increase the delay between handles so that the process never goes over the rate limit. This isn't specific to Twitter's API; most APIs have limits of this kind.

But instead of recalculating the delay every time I want to increase the number of tweets per handle, I'd like a way to grab 4 handles from the ExampleSet at a time, with 150 tweets for each, wait 15 minutes, and then move on to the next 4 handles. What would be the simplest way to do this? Attached is my process. Thank you.

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="7.5.001" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
<parameter key="csv_file" value="C:\Users\Pari\Downloads\new_twitters.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Website.true.polynominal.attribute"/>
<parameter key="1" value="TwitterHandle.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="concurrency:loop_values" compatibility="7.5.001" expanded="true" height="82" name="Loop Values" width="90" x="380" y="34">
<parameter key="attribute" value="TwitterHandle"/>
<parameter key="iteration_macro" value="loop_twitter"/>
<process expanded="true">
<operator activated="true" class="handle_exception" compatibility="7.5.001" expanded="true" height="82" name="Handle Exception" width="90" x="179" y="340">
<parameter key="exception_macro" value="error_message"/>
<process expanded="true">
<operator activated="false" class="social_media:get_twitter_user_statuses" compatibility="7.3.000" expanded="true" height="68" name="Get Twitter User Statuses" width="90" x="179" y="85">
<parameter key="connection" value="Twitter"/>
<parameter key="user" value="%{loop_twitter}"/>
</operator>
<operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="179" y="187">
<parameter key="connection" value="Twitter"/>
<parameter key="query" value="%{loop_twitter}"/>
<parameter key="limit" value="30"/>
</operator>
<connect from_op="Search Twitter" from_port="output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<process expanded="true">
<connect from_port="in 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="store" compatibility="7.5.001" expanded="true" height="68" name="Store" width="90" x="380" y="340">
<parameter key="repository_entry" value="//Local Repository/data/Equifax/Twitter Crawl/%{loop_twitter}"/>
</operator>
<operator activated="true" class="delay" compatibility="7.5.001" expanded="true" height="82" name="Delay" width="90" x="581" y="340">
<parameter key="delay_amount" value="60000"/>
</operator>
<connect from_port="input 1" to_op="Handle Exception" to_port="in 1"/>
<connect from_op="Handle Exception" from_port="out 1" to_op="Store" to_port="input"/>
<connect from_op="Store" from_port="through" to_op="Delay" to_port="through 1"/>
<connect from_op="Delay" from_port="through 1" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="subprocess" compatibility="7.5.001" expanded="true" height="82" name="Union Append" width="90" x="648" y="34">
<process expanded="true">
<operator activated="true" class="loop_collection" compatibility="7.5.001" expanded="true" height="82" name="Output (4)" width="90" x="45" y="34">
<parameter key="set_iteration_macro" value="true"/>
<process expanded="true">
<operator activated="false" breakpoints="after" class="select" compatibility="7.5.001" expanded="true" height="68" name="Select (5)" width="90" x="112" y="34">
<parameter key="index" value="%{iteration}"/>
</operator>
<operator activated="true" class="branch" compatibility="7.5.001" expanded="true" height="82" name="Branch (2)" width="90" x="313" y="34">
<parameter key="condition_type" value="expression"/>
<parameter key="expression" value="%{iteration}==1"/>
<process expanded="true">
<connect from_port="condition" to_port="input 1"/>
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="recall" compatibility="7.5.001" expanded="true" height="68" name="Recall (5)" width="90" x="45" y="187">
<parameter key="name" value="LoopData"/>
</operator>
<operator activated="true" class="union" compatibility="7.5.001" expanded="true" height="82" name="Union (2)" width="90" x="179" y="34"/>
<connect from_port="condition" to_op="Union (2)" to_port="example set 1"/>
<connect from_op="Recall (5)" from_port="result" to_op="Union (2)" to_port="example set 2"/>
<connect from_op="Union (2)" from_port="union" to_port="input 1"/>
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="remember" compatibility="7.5.001" expanded="true" height="68" name="Remember (5)" width="90" x="581" y="34">
<parameter key="name" value="LoopData"/>
</operator>
<connect from_port="single" to_op="Branch (2)" to_port="condition"/>
<connect from_op="Branch (2)" from_port="input 1" to_op="Remember (5)" to_port="store"/>
<connect from_op="Remember (5)" from_port="stored" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="select" compatibility="7.5.001" expanded="true" height="68" name="Select (6)" width="90" x="179" y="34">
<parameter key="index" value="%{iteration}"/>
</operator>
<connect from_port="in 1" to_op="Output (4)" to_port="collection"/>
<connect from_op="Output (4)" from_port="output 1" to_op="Select (6)" to_port="collection"/>
<connect from_op="Select (6)" from_port="selected" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="input 1"/>
<connect from_op="Loop Values" from_port="output 1" to_op="Union Append" to_port="in 1"/>
<connect from_op="Union Append" from_port="out 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
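
The delay arithmetic described in the question can be sketched in a few lines of Python. This is only an illustration, not part of the RapidMiner process: the function names are hypothetical, and the limit of 450 tweets per 15 minutes is taken from the post at face value rather than from Twitter's documentation.

```python
def delay_per_handle_ms(tweets_per_handle, limit=450, window_minutes=15):
    """Delay (ms) after each handle so the average rate stays within the limit.

    `limit` and `window_minutes` default to the figures quoted in the post.
    """
    if not 0 < tweets_per_handle <= limit:
        raise ValueError("tweets_per_handle must be between 1 and the limit")
    window_ms = window_minutes * 60 * 1000
    # Each handle consumes tweets_per_handle / limit of the window.
    # Integer ceiling division avoids rounding the delay down.
    return -(-window_ms * tweets_per_handle // limit)


def handles_per_window(tweets_per_handle, limit=450):
    """How many handles fit into one rate-limit window."""
    return limit // tweets_per_handle


print(delay_per_handle_ms(30))   # 60000, the Delay value used in the process
print(delay_per_handle_ms(150))  # 300000, i.e. 5 minutes per handle
print(handles_per_window(150))   # 3
```

Under the quoted figure, 150 tweets per handle leaves room for 3 handles per window, so a batch of 4 handles at 150 tweets each (600 tweets) would need either a smaller per-handle limit or a longer wait.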

Answers

  • FBT Member Posts: 106 Unicorn

    I think one way of solving this would be to use the "Multiply" and "Filter Examples" operators. If you need to run this process only once, and the number of handles is manageable, the solution is fairly simple:

    1. Generate a rank attribute (it may be quicker to do this directly in your source file). This assigns a unique number from 1 to X (where X is the total number of handles) to each handle, which is then used for filtering.

    2. Multiply your dataset as many times as required. From the information you gave, you would need X/4 copies of your dataset (rounded up).

    3. Filter examples by rank: 1-4, 5-8, 9-12, etc., each range in a different thread from your Multiply operator.

    4. Run the process as you have it now (just adapt the delay accordingly).

    General note: make sure that the execution order (within a Multiply thread) is correct.
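
The rank-and-filter idea in the steps above can be sketched outside RapidMiner as follows. This is a minimal Python illustration of the grouping logic only; the function names are hypothetical, and the batch size of 4 comes from the question.

```python
def assign_ranks(handles):
    """Step 1: give every handle a unique rank from 1 to X."""
    return list(enumerate(handles, start=1))


def batch_of(rank, batch_size=4):
    """Steps 2-3: ranks 1-4 fall into batch 0, ranks 5-8 into batch 1, etc."""
    return (rank - 1) // batch_size


def split_into_batches(handles, batch_size=4):
    """Group handles by batch index, preserving their original order."""
    batches = {}
    for rank, handle in assign_ranks(handles):
        batches.setdefault(batch_of(rank, batch_size), []).append(handle)
    return [batches[index] for index in sorted(batches)]


handles = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
print(split_into_batches(handles))
# [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j']]
```

Computing the batch index from the rank avoids hard-coding one filter per range, which is the part that stops scaling once the list of handles gets long.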

     

    If you have a huge list of handles, the filtering can be solved more elegantly in a separate loop, but it requires slightly more elaborate logic to make sure the correct handles are selected.

    If the process is meant to run constantly, you would need to put everything in yet another loop, making sure to configure the delay so that it doesn't exceed the API limit.
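
As a rough sketch of the outer loop with a delay between batches, assuming the handles have already been grouped into batches: `fetch` is a hypothetical stand-in for whatever call retrieves tweets for one handle, and the 15-minute default is the window quoted in the question. Errors are recorded per handle and the loop carries on, similar in spirit to the Handle Exception operator in the posted process.

```python
import time


def run_batches(batches, fetch, delay_seconds=15 * 60, sleep=time.sleep):
    """Process one batch of handles, wait, then move on to the next batch.

    `fetch` is a caller-supplied (hypothetical) function: handle -> tweets.
    `sleep` is injectable so the loop can be tried out without real waiting.
    """
    results = {}
    for index, batch in enumerate(batches):
        if index > 0:
            # Wait a full rate-limit window before starting the next batch.
            sleep(delay_seconds)
        for handle in batch:
            try:
                results[handle] = fetch(handle)
            except Exception as error:
                # Record the failure and continue with the remaining handles.
                results[handle] = error
    return results


# Dry run with a fake fetch and a sleep that only records the requested waits.
waits = []
output = run_batches([["a", "b"], ["c"]], fetch=str.upper, sleep=waits.append)
print(output)  # {'a': 'A', 'b': 'B', 'c': 'C'}
print(waits)   # [900]
```

For a process that runs constantly, the same function could be called repeatedly over the full list, with one more wait between passes.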

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi, another option is to use the Twitter Streaming API instead of the out-of-the-box Search operator. I have not used it myself, but my understanding is that, for use cases such as yours, it may be a better option: https://dev.twitter.com/streaming/public

     

    Scott
