Encoding a full URL path
I am trying to call a URL to get a JSON response. I am using Enrich Data by Webservice, which is convenient because it has settings for extracting specific values with JsonPath.
The problem is that the URL I am calling comes from upstream and is different every time. I also need to add authentication parameters to the URL. My idea was to encode the URL and then encode my authentication parameters. However, when I encode a URL, every symbol, such as the colon and the slashes, is converted to a percent-encoded hex value. For example, "http://abc.com" becomes "http%3A%2F%2Fabc.com", and Enrich Data by Webservice then throws an error. Is there a way to encode a full URL path?
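A minimal Python sketch of the difference, using only the standard library `urllib.parse` (RapidMiner's Encode URLs operator may not use exactly this routine). Encoding the whole string with no safe characters reproduces the output quoted above, while leaving the received URL alone and percent-encoding only the appended parameter names and values keeps it fetchable. The helper `add_auth_params` and the parameter names `user` and `token` are hypothetical placeholders for whatever your service expects.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit

# Encoding the entire URL also escapes ':' and '/', so it is no longer a usable URL.
print(quote("http://abc.com", safe=""))  # http%3A%2F%2Fabc.com

def add_auth_params(url: str, params: dict[str, str]) -> str:
    """Append params to url, percent-encoding only the new keys and values.

    Hypothetical helper: the received URL is assumed to be valid already.
    """
    parts = urlsplit(url)
    extra = urlencode(params)  # encodes each key and value separately
    # Join with '&' if the URL already has a query string, otherwise start one.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))

print(add_auth_params("http://abc.com/data", {"user": "me", "token": "a b&c"}))
# http://abc.com/data?user=me&token=a+b%26c
```

The design choice is to treat the incoming URL as already valid and escape only what you add to it.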
Another way to do this is to use Get Pages, but I run into a different issue with that.
Below is a simplified process that reproduces the issue.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="generate_data_user_specification" compatibility="7.1.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="34">
<list key="attribute_values">
<parameter key="xUrl" value="&quot;https://content.guardianapis.com/sport/2016/aug/02/lizzie-armitstead-missed-tests-cycling-rochelle-gilmore&quot;"/>
</list>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" breakpoints="after" class="web:encode_urls" compatibility="7.1.001" expanded="true" height="82" name="Encode URLs (2)" width="90" x="313" y="34">
<parameter key="url_attribute" value="xUrl"/>
<parameter key="encoding" value="SYSTEM"/>
</operator>
<operator activated="true" class="web:enrich_data_by_webservice" compatibility="7.1.001" expanded="true" height="68" name="Enrich Data by Webservice" width="90" x="514" y="34">
<parameter key="query_type" value="JsonPath"/>
<list key="string_machting_queries"/>
<parameter key="attribute_type" value="Nominal"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
<parameter key="ignore_CDATA" value="true"/>
<parameter key="assume_html" value="true"/>
<list key="index_queries"/>
<list key="jsonpath_queries"/>
<parameter key="request_method" value="GET"/>
<parameter key="url" value="&lt;%xUrl%&gt;"/>
<parameter key="delay" value="0"/>
<list key="request_properties"/>
<parameter key="encoding" value="SYSTEM"/>
</operator>
<connect from_op="Generate Data by User Specification" from_port="output" to_op="Encode URLs (2)" to_port="example set input"/>
<connect from_op="Encode URLs (2)" from_port="example set output" to_op="Enrich Data by Webservice" to_port="Example Set"/>
<connect from_op="Enrich Data by Webservice" from_port="ExampleSet" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
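To see why the encoded value fails, here is a rough Python imitation of the `<%xUrl%>` placeholder in the `url` parameter, assuming the attribute value is substituted into the request URL verbatim. This is an illustration under that assumption, not RapidMiner's code, and `fill_template` is a hypothetical name. After full encoding, the substituted string has no recognizable scheme or host, which is consistent with the error.

```python
import re
from urllib.parse import quote, urlsplit

# Matches placeholders such as <%xUrl%> and captures the attribute name.
PLACEHOLDER = re.compile(r"<%(.+?)%>")

def fill_template(template: str, example: dict[str, str]) -> str:
    # Replace each <%name%> with the example's value for that attribute, unchanged.
    return PLACEHOLDER.sub(lambda m: example[m.group(1)], template)

url = "https://content.guardianapis.com/sport/2016/aug/02/lizzie-armitstead-missed-tests-cycling-rochelle-gilmore"
plain = fill_template("<%xUrl%>", {"xUrl": url})
encoded = fill_template("<%xUrl%>", {"xUrl": quote(url, safe="")})

print(urlsplit(plain).scheme)          # https
print(repr(urlsplit(encoded).scheme))  # '' (no scheme, so nothing to request)
```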
Answers
If you only need to validate the URLs, you can use Filter Examples (such as the attached) to check whether each URL is good. If you need to correct errors that might have entered the URL upstream, what might those errors be? If there's no chance that the domain part of the URL is invalid, you can remove that part, correct the suffix with Encode URLs, and then put it all back together with Generate Attributes. The key is knowing where bad URLs could enter the data and what sort of errors they might contain; once that is known, the corrections are straightforward.
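A minimal Python sketch of the split-encode-reassemble idea, assuming the scheme and domain are trustworthy and only the path may contain characters that need escaping. `is_valid_url` stands in for a Filter Examples condition and `encode_suffix` for the Encode URLs plus Generate Attributes round trip; both names are hypothetical, and the validity check is deliberately rough.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def is_valid_url(url: str) -> bool:
    # Rough check: an http(s) scheme and a non-empty host.
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def encode_suffix(url: str) -> str:
    # Leave scheme and domain untouched; percent-encode only the path.
    # '/' stays literal so the path segments survive. Assumes the path is
    # not already encoded (an existing '%' would be encoded again).
    parts = urlsplit(url)
    path = quote(parts.path, safe="/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(is_valid_url("http://abc.com"))        # True
print(is_valid_url("http%3A%2F%2Fabc.com"))  # False
print(encode_suffix("https://abc.com/sport/a b/ü"))
# https://abc.com/sport/a%20b/%C3%BC
```

For the Guardian URL in the process above, the path contains only letters, digits, hyphens and slashes, so `encode_suffix` returns it unchanged.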