Manipulate string in URL in a loop
I want to generate a URL for The Guardian API. When querying a search term (here "Brexit"), the API returns only the first 10 hits as JSON. To see all the results, I need to change the 'page' parameter here: https://content.guardianapis.com/search?page=1&q=Brexit&api-key=a2d0...
Here's an example process. What I would love is to loop through all the pages, i.e. increase the page number by 1 on each iteration. Any ideas would be appreciated!
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.1.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="182" y="78">
<parameter key="generator_type" value="comma_separated_text"/>
<parameter key="number_of_examples" value="100"/>
<parameter key="use_stepsize" value="false"/>
<list key="function_descriptions">
<parameter key="url" value="https://content.guardianapis.com/search?page=3&amp;q=Brexit&amp;api-key=..."/>
</list>
<parameter key="add_id_attribute" value="false"/>
<list key="numeric_series_configuration"/>
<list key="date_series_configuration"/>
<list key="date_series_configuration (interval)"/>
<parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
<parameter key="input_csv_text" value="url&#10;https://content.guardianapis.com/search?page=3&amp;q=Brexit&amp;api-key=a2d052f9-9052-4297-ac5f-5341b104e479"/>
<parameter key="column_separator" value=","/>
<parameter key="parse_all_as_nominal" value="false"/>
<parameter key="decimal_point_character" value="."/>
<parameter key="trim_attribute_names" value="true"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<operator activated="true" class="web:retrieve_webpages" compatibility="7.3.000" expanded="true" height="68" name="Get Pages" width="90" x="411" y="78">
<parameter key="link_attribute" value="url"/>
<parameter key="random_user_agent" value="false"/>
<parameter key="connection_timeout" value="10000"/>
<parameter key="read_timeout" value="10000"/>
<parameter key="follow_redirects" value="true"/>
<parameter key="accept_cookies" value="none"/>
<parameter key="cookie_scope" value="global"/>
<parameter key="request_method" value="GET"/>
<parameter key="delay" value="none"/>
<parameter key="delay_amount" value="1000"/>
<parameter key="min_delay_amount" value="0"/>
<parameter key="max_delay_amount" value="1000"/>
</operator>
</process>
Best Answer
MartinLiebig, RM Data Scientist
Hi @greg_lorincz79,
Your XML does not work for me; there seems to be some issue with it.
In any case, the solution is the Loop operator. Loop provides a macro called iteration, which you can use directly in the page parameter like this:
https://content.guardianapis.com/search?page=%{iteration}&q=Brexit&api-key=XXXXX
%{iteration} is always replaced with the current iteration count.
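To make the substitution concrete outside RapidMiner, here is a minimal Python sketch of the same idea: a loop counter is dropped into the page parameter on each pass. The function name build_page_urls, the page count of 3 and the placeholder key XXXXX are assumptions for illustration only; they are not part of RapidMiner or of the Guardian API.

```python
from urllib.parse import urlencode

# Endpoint taken from the question; everything else here is illustrative.
BASE_URL = "https://content.guardianapis.com/search"


def build_page_urls(query, api_key, num_pages):
    """Return one search URL per page, counting pages from 1."""
    urls = []
    for iteration in range(1, num_pages + 1):
        # The loop counter plays the role of the %{iteration} macro.
        params = {"page": iteration, "q": query, "api-key": api_key}
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    # "XXXXX" is a placeholder, not a real key.
    for url in build_page_urls("Brexit", "XXXXX", 3):
        print(url)
```

In the RapidMiner process, the Loop operator's number of iterations takes the place of num_pages here.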
I would recommend that you delete your API key from your initial post. API keys are like passwords: you don't share them.
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
Answers
Thank you, I managed to sort out the looping with a macro.