Cannot retrieve data with "Enrich Data by Webservice"
rachel_lomasky
Hi,
I've downloaded the Web Mining extension and would like to use it to connect to a Google-provided webservice. I've constructed a GET url, and it works fine when I just paste it into a browser (bunch of JSON returned). However, when I run it with "Enrich Data by Webservice", I get:
Dec 3, 2016 10:31:57 AM SEVERE: Process failed: Cannot retrieve data from the specified URL 'https://www.googleapis.com/analytics/v3/data/ga'.
Dec 3, 2016 10:31:57 AM SEVERE: Here:
Dec 3, 2016 10:31:57 AM SEVERE: Process[1] (Process)
Dec 3, 2016 10:31:57 AM SEVERE: subprocess 'Main Process'
Dec 3, 2016 10:31:57 AM SEVERE: +- Retrieve questions[1] (Retrieve)
Dec 3, 2016 10:31:57 AM SEVERE: ==> +- Enrich Data by Webservice[1] (Enrich Data by Webservice)
Two questions:
1. Why doesn't it work?
2. Is there a way that I can see the query string to do debugging?
Thank you,
Rachel
Best Answer
sgenzer
here's a sample process (it's using RM 7.3):
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="7.3.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
<list key="attribute_values">
<parameter key="foo" value="0"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="179" y="34">
<parameter key="query_type" value="Regular Expression"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="foo2" value=".*"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<parameter key="url" value="https://www.googleapis.com/analytics/v3/data/ga?ids=ga:XXXXX&amp;start-date=30daysAgo&amp;end-date=yesterday&amp;metrics=ga:sessions&amp;access_token=XXXXXX"/>
<list key="request_properties"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
I just tested this with my own Google API account and it works.
Scott
Answers
hi...I use Google API all the time with this operator and it is quite tricky to get all the settings right. First guess - did you encode your URL? Can you share your parameter settings (without your key of course)?
The answer to your second question is no, RM does not give you the same verbose output as you would get with the terminal. Sometimes when I can't get it right, I do a cURL at the command line, get that to work, and then go back to RM.
Scott
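Since the operator does not show the final request, one way to inspect the query string is to rebuild it outside RapidMiner. The following is a minimal Python sketch, not something from the original posts: `build_url` and `fetch_verbose` are hypothetical helper names, and the ID and token are placeholders. It assumes the request is a plain GET with the parameters in the query string.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://www.googleapis.com/analytics/v3/data/ga"


def build_url(base, params):
    """Return the full GET URL with percent-encoded query parameters."""
    return base + "?" + urlencode(params)


def fetch_verbose(url, timeout=30):
    """Send the GET request and return (status, body), keeping error bodies."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # APIs usually explain the failure in the body of the error response.
        return err.code, err.read().decode("utf-8", "replace")


# Placeholder values: substitute your own view ID and access token.
params = {
    "ids": "ga:XXXXX",
    "start-date": "30daysAgo",
    "end-date": "yesterday",
    "metrics": "ga:sessions",
    "access_token": "XXXXXX",
}
url = build_url(BASE_URL, params)
print(url)  # inspect the exact query string before sending it
# status, body = fetch_verbose(url)  # uncomment to send the request
```

Once the printed URL works here or in a browser, it can be pasted into the operator's URL parameter.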
<?xml version="1.0" encoding="UTF-8"?><process version="7.2.003">
<operator activated="true" class="retrieve" compatibility="7.2.003" expanded="true" height="68" name="Retrieve questions" width="90" x="45" y="85">
<parameter key="repository_entry" value="../../data/import/questions"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.2.003">
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.2.001" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="246" y="85">
<parameter key="query_type" value="Regular Expression"/>
<list key="string_machting_queries"/>
<parameter key="attribute_type" value="Nominal"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<parameter key="ignore_CDATA" value="true"/>
<parameter key="assume_html" value="true"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<parameter key="request_method" value="GET"/>
<parameter key="service_method" value="reportRequests"/>
<parameter key="url" value="https://www.googleapis.com/analytics/v3/data/ga"/>
<parameter key="delay" value="0"/>
<list key="request_properties">
<parameter key="ids" value="ga:myids"/>
<parameter key="start-date" value="30daysAgo"/>
<parameter key="end-date" value="yesterday"/>
<parameter key="metrics" value="ga:sessions"/>
<parameter key="access_token" value="my access token"/>
</list>
<parameter key="encoding" value="SYSTEM"/>
</operator>
</process>
hi ok thanks. It was hard to figure out that XML (it's from ver 7.2 and there's some strange cut and paste there) but I think I know what you're doing. I have not used Google Analytics API before but for a GET request, I would first try putting all the parameters in the URL, rather than in "request properties". Don't ask me why this makes a difference, but in my experience, it does. Try something like this in the URL:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3A<your number here>&start-date=30daysAgo&end-date=yesterday&metrics=ga%3Asessions&access_token=<your access token>
I also don't see anything in your String Matching (called "Machting" in the XML!) query so you'll need to tell RapidMiner what you want to do with the response. I would recommend just doing Regular Expression and using .* for now - just to ensure you're getting a response.
Scott
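To double-check a hand-written URL like the one above, it can help to decode it back into its parameters. This is a small Python sketch using only the standard library; the ID `12345` and the token `TOKEN` are placeholders, not real values.

```python
from urllib.parse import parse_qsl, urlsplit

# Placeholder ID and token; the structure follows the URL suggested above.
url = (
    "https://www.googleapis.com/analytics/v3/data/ga"
    "?ids=ga%3A12345&start-date=30daysAgo&end-date=yesterday"
    "&metrics=ga%3Asessions&access_token=TOKEN"
)

parts = urlsplit(url)
params = dict(parse_qsl(parts.query))  # percent-decoding happens here

for name, value in params.items():
    print(f"{name} = {value}")
```

If a decoded value does not look like what you intended (for example `ga%3A` not turning back into `ga:`), the URL was probably encoded twice or not at all.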
Thank you, this works. Now to figure out how to parse the response...
<grin> should not be too bad. There are a variety of tools to use. Post if you need more help.
Scott
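For experimenting with the response outside RapidMiner, here is a minimal Python sketch of pulling one number out of the JSON. The sample payload is made up and heavily abbreviated; the field names `totalsForAllResults` and `rows` are assumed from the Core Reporting API v3 response format, and `extract_total` is a hypothetical helper.

```python
import json

# Made-up, abbreviated sample; a real response carries many more fields.
sample = """{
  "columnHeaders": [
    {"name": "ga:sessions", "columnType": "METRIC", "dataType": "INTEGER"}
  ],
  "totalsForAllResults": {"ga:sessions": "1234"},
  "rows": [["1234"]]
}"""


def extract_total(response_text, metric="ga:sessions"):
    """Return the total for one metric as an integer."""
    data = json.loads(response_text)
    # Values arrive as strings in this sample, so convert explicitly.
    return int(data["totalsForAllResults"][metric])


sessions = extract_total(sample)
print(sessions)
```

Inside RapidMiner the same idea would go through the operator's query settings rather than a script.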
It ain't pretty, but I got it working.
Hi, I have the same problem with "Enrich Data by Webservice". I already tried the parameters using curl, and it works. Here is my process:
<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="45" y="136">
<parameter key="text" value="I love hotdogs. Hotdogs are the greatest. They are hot and delicious."/>
<parameter key="add label" value="false"/>
<parameter key="label_type" value="nominal"/>
</operator>
<operator activated="true" class="text:documents_to_data" compatibility="7.4.001" expanded="true" height="82" name="Documents to Data" width="90" x="179" y="136">
<parameter key="text_attribute" value="text"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="datamanagement" value="double_sparse_array"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.3.000" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="313" y="136">
<parameter key="query_type" value="Regular Expression"/>
<list key="string_machting_queries"/>
<parameter key="attribute_type" value="Nominal"/>
<list key="regular_expression_queries">
<parameter key="all" value=".*"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<parameter key="ignore_CDATA" value="true"/>
<parameter key="assume_html" value="true"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<parameter key="request_method" value="POST"/>
<parameter key="body" value="text=&lt;%text%&gt;"/>
<parameter key="url" value="https://twinword-sentiment-analysis.p.mashape.com/analyze/"/>
<parameter key="delay" value="0"/>
<list key="request_properties">
<parameter key="X-Mashape-Key" value="QhBpo6d9YgmsherFsSBVfycN0czjp1rf0HIjsnooes2EdNYmao"/>
<parameter key="Content-Type" value="application/x-www-form-urlencoded"/>
<parameter key="Accept" value="application/json"/>
</list>
<parameter key="encoding" value="SYSTEM"/>
</operator>
<connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
<connect from_op="Documents to Data" from_port="example set" to_op="Enrich Data by Webservice" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
I think there's a problem with your API key. I tried your XML code and got a JSON response that says "
My problem was that I was quoting parameters. Everything should be non-quoted.
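The thread does not say exactly where the quotes were, so the following Python sketch only illustrates the general effect, assuming the quotes were typed around a parameter value. `form_body` is a hypothetical helper that builds an `application/x-www-form-urlencoded` body like the one in the process above.

```python
from urllib.parse import parse_qs, urlencode


def form_body(text):
    """Build a form-encoded body with a single 'text' field."""
    return urlencode({"text": text})


unquoted = form_body("I love hotdogs.")
quoted = form_body('"I love hotdogs."')

print(unquoted)
print(quoted)

# What the server would decode from the quoted variant:
received = parse_qs(quoted)["text"][0]
print(received)
```

In the quoted variant the quote characters are encoded as `%22` and become part of the value the server receives.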
See this BUG-Report!