Bug with Loop Values?

pari1234 Member Posts: 26  Maven
edited December 2018 in Help

Hi RM team, I'm trying to call the Facebook Graph API using the Enrich Data by Webservice operator, which I have placed inside a Loop Values operator that outputs a collection of documents. The input data is a CSV with a handful of Facebook business page usernames. As far as I understand, Loop Values should take each username in turn and return some Facebook content for that handle, but:

 

  • it only does that partially
  • each document in the collection from Loop Values should contain data for only one username; instead, each document contains all the usernames, with only one row of data per username.

Attached:

  1. RM process
  2. Input excel
  3. JSON output from facebook API from an API testing platform.

Any help will be greatly appreciated as I'm kind of on a deadline for this. Thank you.

 

PROCESS

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="7.5.001" expanded="true" height="68" name="Read Excel" width="90" x="112" y="34">
<parameter key="excel_file" value="C:\Users\Pari\Documents\BDC\Socials\Facebook Scrapper\Test\TestHandles.xlsx"/>
<parameter key="imported_cell_range" value="A1:A5"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Username.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="concurrency:loop_values" compatibility="7.5.001" expanded="true" height="82" name="Loop Values" width="90" x="313" y="34">
<parameter key="attribute" value="Username"/>
<process expanded="true">
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (2)" width="90" x="313" y="34">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries">
<parameter key="message" value="$..message"/>
<parameter key="post id" value="$..id"/>
</list>
<parameter key="url" value="https://graph.facebook.com/v2.10/&lt;%Username%&gt;/posts?access_token=1745625495738593|w_a8sajfHCYsCHNZOTDr5H1r-wY"/>
<list key="request_properties"/>
<parameter key="encoding" value="UTF-8"/>
</operator>
<connect from_port="input 1" to_op="Enrich Data by Webservice (2)" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice (2)" from_port="ExampleSet" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Loop Values" to_port="input 1"/>
<connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

JSON Output

 

 

{
"data": [
{
"created_time": "2017-10-31T12:01:32+0000",
"message": "Click to read news on #Tableau latest conference.\n#BigData #Tech",
"id": "1563861787269208_1910035019318548"
},
{
"created_time": "2017-10-30T22:02:02+0000",
"message": "\"South Australia is about to get “Big Doctor”, cloud-based artificial intelligence that analyses our health and intervenes when it spots something amiss.\"-Brad Crouch",
"id": "1563861787269208_1909800592675324"
},
{
"created_time": "2017-10-30T21:21:00+0000",
"message": "Why you should welcome Artificial Intelligence with open arms",
"id": "1563861787269208_1909790786009638"
},
{
"created_time": "2017-10-30T12:00:59+0000",
"message": "\"AI will put bankers out of work? Some people think these advances will boost productivity, enabling industries to actually increase the number of jobs\"",
"id": "1563861787269208_1909600706028646"
},
{
"created_time": "2017-10-27T12:01:38+0000",
"message": "What's Elon Musks stance on Artificial Intelligence?",
"id": "1563861787269208_1908177749504275"
}
],
"paging": {
"cursors": {
"before": "Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5TXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9qazFPVEE1TURrME56STJOelkzTlRZAeU9ROE1ZAWEJwWDNOMGIzSjVYMmxrRHlFeE5UWXpPRFl4TnpnM01qWTVNakE0WHpFNU1UQXdNelV3TVRrek1UZAzFORGdQQkhScGJXVUdXZAmhtSEFFPQZDZD",
"after": "Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5UXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9pMHlORGs0TURBMk9UZA3pOVEEyTkRJMU9EUVBER0ZA3YVY5emRHOXllVjlwWkE4aE1UVTJNemcyTVRjNE56STJPVEl3T0Y4eE9UQTRNVGMzTnpRNU5UQTBNamMxRHdSMGFXMWxCbG56SUNJQgZDZD"
},
"next": "https://graph.facebook.com/v2.10/1563861787269208/posts?pretty=1&limit=5&after=Q2c4U1pXNTBYM0YxWlhKNVgzTjBiM0o1WDJsa0R5UXhOVFl6T0RZAeE56ZAzNNalk1TWpBNE9pMHlORGs0TURBMk9UZA3pOVEEyTkRJMU9EUVBER0ZA3YVY5emRHOXllVjlwWkE4aE1UVTJNemcyTVRjNE56STJPVEl3T0Y4eE9UQTRNVGMzTnpRNU5UQTBNamMxRHdSMGFXMWxCbG56SUNJQgZDZD"
}
}
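
For reference, the shape being asked for (one row per post, tagged with the username) can be sketched outside RapidMiner. This is a minimal Python illustration, not part of the attached process: the function name `posts_to_rows`, the username `example_page`, and the abridged two-post sample are assumptions made for the example.

```python
import json

# Abridged copy of the sample response above (two of the five posts,
# messages shortened). "example_page" below is a made-up username.
SAMPLE = """
{
  "data": [
    {"created_time": "2017-10-31T12:01:32+0000",
     "message": "Click to read news on #Tableau latest conference.",
     "id": "1563861787269208_1910035019318548"},
    {"created_time": "2017-10-30T21:21:00+0000",
     "message": "Why you should welcome Artificial Intelligence with open arms",
     "id": "1563861787269208_1909790786009638"}
  ]
}
"""


def posts_to_rows(username, response_text):
    """Return one row (dict) per post found in the response's "data" array."""
    payload = json.loads(response_text)
    rows = []
    for post in payload.get("data", []):
        rows.append({
            "username": username,
            "post_id": post.get("id"),
            "created_time": post.get("created_time"),
            # A post may lack a "message" field; None is kept in that case.
            "message": post.get("message"),
        })
    return rows


rows = posts_to_rows("example_page", SAMPLE)
for row in rows:
    print(row["username"], row["post_id"], row["message"])
```

A query such as `$..message` matches every post in the array, so the target is a table with as many rows as there are posts, rather than one row per username.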

 

Best Answer

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959  Community Manager
    Solution Accepted

    Ah, I see.  Sorry about that.  :)  This is a common challenge that we are currently working on: parsing JSON arrays returned by a webservice.  There are a couple of workarounds you can use in the meantime; converting to XML is probably the easiest.  In its current version, RapidMiner handles XML much, much better than JSON.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="7.6.001" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
    <parameter key="csv_file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/TestHandles.csv"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="UTF-8"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Username.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (2)" width="90" x="246" y="34">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="jsonResponse" value=".*"/>
    </list>
    <list key="regular_region_queries"/>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries">
    <parameter key="message" value="$..message"/>
    <parameter key="post id" value="$..id"/>
    </list>
    <parameter key="url" value="https://graph.facebook.com/v2.10/&lt;%Username%&gt;/posts?access_token=1745625495738593|w_a8sajfHCYsCHNZOTDr5H1r-wY"/>
    <list key="request_properties"/>
    <parameter key="encoding" value="UTF-8"/>
    </operator>
    <operator activated="true" class="loop_examples" compatibility="7.6.001" expanded="true" height="103" name="Loop Examples" width="90" x="380" y="34">
    <process expanded="true">
    <operator activated="true" class="filter_example_range" compatibility="7.6.001" expanded="true" height="82" name="Filter Example Range" width="90" x="45" y="34">
    <parameter key="first_example" value="%{example}"/>
    <parameter key="last_example" value="%{example}"/>
    </operator>
    <operator activated="true" class="text:data_to_documents" compatibility="7.5.000" expanded="true" height="68" name="Data to Documents" width="90" x="179" y="34">
    <parameter key="select_attributes_and_weights" value="true"/>
    <list key="specify_weights">
    <parameter key="jsonResponse" value="1.0"/>
    </list>
    </operator>
    <operator activated="true" class="text:combine_documents" compatibility="7.5.000" expanded="true" height="82" name="Combine Documents" width="90" x="313" y="34"/>
    <operator activated="true" class="web:json_to_xml" compatibility="7.3.000" expanded="true" height="68" name="JSON to XML" width="90" x="447" y="34"/>
    <operator activated="true" class="text:write_document" compatibility="7.5.000" expanded="true" height="82" name="Write Document" width="90" x="581" y="34">
    <parameter key="file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/jsonExport.xml"/>
    </operator>
    <operator activated="true" class="advanced_file_connectors:read_xml" compatibility="7.6.001" expanded="true" height="68" name="Read XML" width="90" x="715" y="34">
    <parameter key="file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/jsonExport.xml"/>
    <parameter key="xpath_for_examples" value="//json/data"/>
    <enumeration key="xpaths_for_attributes">
    <parameter key="xpath_for_attribute" value="created_time[1]/text()"/>
    <parameter key="xpath_for_attribute" value="id[1]/text()"/>
    <parameter key="xpath_for_attribute" value="message[1]/text()"/>
    </enumeration>
    <list key="namespaces"/>
    <parameter key="use_default_namespace" value="false"/>
    <list key="annotations"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="created_time[1]/text().true.attribute_value.attribute"/>
    <parameter key="1" value="id[1]/text().true.attribute_value.attribute"/>
    <parameter key="2" value="message[1]/text().true.attribute_value.attribute"/>
    </list>
    </operator>
    <connect from_port="example set" to_op="Filter Example Range" to_port="example set input"/>
    <connect from_op="Filter Example Range" from_port="example set output" to_op="Data to Documents" to_port="example set"/>
    <connect from_op="Data to Documents" from_port="documents" to_op="Combine Documents" to_port="documents 1"/>
    <connect from_op="Combine Documents" from_port="document" to_op="JSON to XML" to_port="document"/>
    <connect from_op="JSON to XML" from_port="document" to_op="Write Document" to_port="document"/>
    <connect from_op="Write Document" from_port="file" to_op="Read XML" to_port="file"/>
    <connect from_op="Read XML" from_port="output" to_port="output 1"/>
    <portSpacing port="source_example set" spacing="0"/>
    <portSpacing port="sink_example set" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Union Append" width="90" x="514" y="34">
    <process expanded="true">
    <operator activated="true" class="loop_collection" compatibility="7.6.001" expanded="true" height="82" name="Output (4)" width="90" x="45" y="34">
    <parameter key="set_iteration_macro" value="true"/>
    <process expanded="true">
    <operator activated="false" breakpoints="after" class="select" compatibility="7.6.001" expanded="true" height="68" name="Select (5)" width="90" x="112" y="34">
    <parameter key="index" value="%{iteration}"/>
    </operator>
    <operator activated="true" class="branch" compatibility="7.6.001" expanded="true" height="82" name="Branch (2)" width="90" x="313" y="34">
    <parameter key="condition_type" value="expression"/>
    <parameter key="expression" value="%{iteration}==1"/>
    <process expanded="true">
    <connect from_port="condition" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="recall" compatibility="7.6.001" expanded="true" height="68" name="Recall (5)" width="90" x="45" y="187">
    <parameter key="name" value="LoopData"/>
    </operator>
    <operator activated="true" class="union" compatibility="7.6.001" expanded="true" height="82" name="Union (2)" width="90" x="179" y="34"/>
    <connect from_port="condition" to_op="Union (2)" to_port="example set 1"/>
    <connect from_op="Recall (5)" from_port="result" to_op="Union (2)" to_port="example set 2"/>
    <connect from_op="Union (2)" from_port="union" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="remember" compatibility="7.6.001" expanded="true" height="68" name="Remember (5)" width="90" x="581" y="34">
    <parameter key="name" value="LoopData"/>
    </operator>
    <connect from_port="single" to_op="Branch (2)" to_port="condition"/>
    <connect from_op="Branch (2)" from_port="input 1" to_op="Remember (5)" to_port="store"/>
    <connect from_op="Remember (5)" from_port="stored" to_port="output 1"/>
    <portSpacing port="source_single" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="select" compatibility="7.6.001" expanded="true" height="68" name="Select (6)" width="90" x="179" y="34">
    <parameter key="index" value="%{iteration}"/>
    </operator>
    <connect from_port="in 1" to_op="Output (4)" to_port="collection"/>
    <connect from_op="Output (4)" from_port="output 1" to_op="Select (6)" to_port="collection"/>
    <connect from_op="Select (6)" from_port="selected" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Enrich Data by Webservice (2)" to_port="Example Set"/>
    <connect from_op="Enrich Data by Webservice (2)" from_port="ExampleSet" to_op="Loop Examples" to_port="example set"/>
    <connect from_op="Loop Examples" from_port="output 1" to_op="Union Append" to_port="in 1"/>
    <connect from_op="Union Append" from_port="out 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
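
The idea behind this process (convert each JSON response to XML, read one example per `data` element, then append the per-username tables) can be sketched in Python as a rough analogue. It is not how the RapidMiner operators are implemented; `json_to_xml`, `xml_to_rows`, `append_all` and the sample responses are hypothetical names and data, and the XML layout `<json><data>…</data></json>` is assumed from the `//json/data` XPath used above.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(response_text):
    """Convert {"data": [{...}, ...]} into <json><data>...</data>...</json>.

    Only the flat post objects seen in this thread are handled; other
    top-level keys such as "paging" are ignored.
    """
    payload = json.loads(response_text)
    root = ET.Element("json")
    for post in payload.get("data", []):
        node = ET.SubElement(root, "data")
        for key, value in post.items():
            ET.SubElement(node, key).text = str(value)
    return root


def xml_to_rows(root):
    """One row per <data> element, similar in spirit to //json/data."""
    return [
        {
            "created_time": node.findtext("created_time"),
            "id": node.findtext("id"),
            "message": node.findtext("message"),
        }
        for node in root.findall("data")
    ]


def append_all(responses):
    """Append the rows of every response into a single table."""
    table = []
    for response_text in responses:
        table.extend(xml_to_rows(json_to_xml(response_text)))
    return table


# Made-up responses for two usernames.
responses = [
    '{"data": [{"created_time": "t1", "id": "a_1", "message": "first"},'
    ' {"created_time": "t2", "id": "a_2", "message": "second"}]}',
    '{"data": [{"created_time": "t3", "id": "b_1", "message": "third"}]}',
]
table = append_all(responses)
print(len(table), "rows")
```

Each array element becomes its own row, which is the part the JsonPath query type was not delivering.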

    Is this better?


    Scott


Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,079  RM Data Scientist

    Hi,

     

    Are you sure this is not caused by a limit on the API? Have you tried deactivating the parallel execution of Loop Values and adding a delay (with a Delay operator)?

     

    Edit: Never mind, that was out of scope.

     

    Cheers,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959  Community Manager

    Hi @pari1234 - yes, I understand what you're trying to do.  You're working too hard :)   "Enrich Data by Webservice" already goes through your Username values one by one, feeding each one to your API and getting a response, so you don't need Loop Values.  That is also why Enrich Data has a "delay" parameter: it is good practice to put a delay of 200 ms or more between queries (to avoid overloading the server).

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="7.6.001" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
    <parameter key="csv_file" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/random community stuff/TestHandles.csv"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="UTF-8"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Username.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice (2)" width="90" x="313" y="34">
    <parameter key="query_type" value="JsonPath"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries"/>
    <list key="regular_region_queries"/>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries">
    <parameter key="message" value="$..message"/>
    <parameter key="post id" value="$..id"/>
    </list>
    <parameter key="url" value="https://graph.facebook.com/v2.10/&lt;%Username%&gt;/posts?access_token=1745625495738593|w_a8sajfHCYsCHNZOTDr5H1r-wY"/>
    <list key="request_properties"/>
    <parameter key="encoding" value="UTF-8"/>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Enrich Data by Webservice (2)" to_port="Example Set"/>
    <connect from_op="Enrich Data by Webservice (2)" from_port="ExampleSet" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
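
As a rough picture of what "one request per row, with a delay in between" means, here is a small Python sketch. It only illustrates the pattern and does not reproduce the operator's internals; `enrich` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real HTTP call so the example runs offline.

```python
import time


def enrich(usernames, fetch, delay_seconds=0.2):
    """Call fetch(username) once per username, pausing between calls."""
    results = []
    for index, username in enumerate(usernames):
        if index > 0:
            # Pause before every request except the first one.
            time.sleep(delay_seconds)
        results.append((username, fetch(username)))
    return results


def fake_fetch(username):
    # Stand-in for the web request; always returns an empty result set.
    return '{"data": []}'


results = enrich(["page_one", "page_two"], fake_fetch, delay_seconds=0.2)
print(results)
```

Because the iteration over usernames already happens here, wrapping it in a second loop over the same values adds nothing.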

    (FYI it's probably not a good idea to post your token in an open forum like this  :)   )


    Scott

  • pari1234 Member Posts: 26  Maven

    Thanks @sgenzer, I appreciate your response. The token I'm using is for an unpublished app on FB, and I will change it once I have exhausted RM community resources :) . In the XML process you replied with, the key problem remains: I get only one row of data per username, i.e. one post and one post id. However, I want all the posts (up to whatever pagination limit Facebook has) and post ids for each username. The sample JSON output from the Graph API has 5 posts, each with an id and a timestamp; in other words, 5 rows of data for the given username. That is why I thought a loop might solve this for me. I hope that helps you understand the problem better. Thank you.

     

    Pari
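
On the pagination point: the sample response carries a `paging.next` URL, so collecting more than one page means requesting that URL until it is no longer present. A minimal Python sketch of that loop follows, assuming responses shaped like the sample JSON. `fetch_all_posts`, `fake_fetch`, the `max_pages` cap and the two fake pages are all made up for illustration; no real request is sent.

```python
import json


def fetch_all_posts(first_url, fetch, max_pages=10):
    """Follow paging.next until it is absent or max_pages pages were read."""
    posts = []
    url = first_url
    pages = 0
    while url and pages < max_pages:
        payload = json.loads(fetch(url))
        posts.extend(payload.get("data", []))
        # A missing "paging" or "next" key ends the loop.
        url = payload.get("paging", {}).get("next")
        pages += 1
    return posts


# Two fake pages keyed by URL, standing in for real API responses.
PAGES = {
    "page-1": json.dumps({"data": [{"id": "p1"}, {"id": "p2"}],
                          "paging": {"next": "page-2"}}),
    "page-2": json.dumps({"data": [{"id": "p3"}], "paging": {}}),
}


def fake_fetch(url):
    return PAGES[url]


posts = fetch_all_posts("page-1", fake_fetch)
print([post["id"] for post in posts])
```

The `max_pages` cap is a safety choice for the sketch, so a long feed cannot turn into an unbounded number of requests.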

  • pari1234 Member Posts: 26  Maven

    Thank you VERY much, @sgenzer. This one helps. There was a minor hiccup with the file encoding during "Read XML", but I changed the encoding for "Write Document" from SYSTEM to UTF-16 and it seems to be working perfectly!  Thank you!
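
The thread does not say what caused the encoding hiccup. One plausible reading is that the file writer and the file reader disagreed about the encoding. The Python sketch below only illustrates that general failure mode, using a temporary file; it says nothing certain about what RapidMiner's SYSTEM setting resolves to on a given machine.

```python
import tempfile
from pathlib import Path

# Text with non-ASCII quotes, like one of the messages in the sample JSON.
text = "<json><message>“Big Doctor”</message></json>"

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "jsonExport.xml"

    # Writer and reader agree on the encoding: the text round-trips.
    path.write_text(text, encoding="utf-16")
    same = path.read_text(encoding="utf-16")

    # Reader assumes a different encoding than the writer used.
    try:
        path.read_text(encoding="utf-8")
        mismatch_failed = False
    except UnicodeDecodeError:
        mismatch_failed = True

print(same == text, mismatch_failed)
```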
