SOLVED: Open File with basic authentication in RapidAnalytics

julio Member Posts: 17 Contributor II
edited November 2018 in Help
Dear all,

Has anyone run into this?

I want to open a file that is on a site that uses basic authentication. With RapidMiner that works well, but I do not see a way for RapidAnalytics to remember passwords.

Get Page should work in theory, as you can pass special authentication parameters, but Get Page seems to transform the data it reads (I am reading an xls file).

Any suggestions? (I understand we could write an extension to the Open File operator...)




  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Hi Julio,

    I think I'm answering the other part of your query in another topic.
    I've used Basic authentication with Get Page before as well. Is that your main problem, or is it simply reading the xls?
  • julio Member Posts: 17 Contributor II
    Hi J.

    Thank you for this. We tried with RapidMiner 5.2.8 and the process you sent gave an error. But we then tried it with 5.3 and it worked fine.
    Great! So I assume that some work was done in 5.3 on these operators.

    We should be able to take it from here! Thank you for this!

    Please note that we would still be very interested in knowing how to POST an xls file to a RapidAnalytics service. We will follow your encoding suggestions and play around. Actually, we will try this directly on 5.3 to make sure it is not the same problem!

    Thank you!

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Hi Julio,

    Okay, let's solve your Basic Authentication troubles. I'm not sure how your authentication is set up, but generally it will need Base64 encoding (not encryption), for which there isn't a RapidMiner operator.

    See the process below, which I normally use as a template when using Basic Auth.

    Good luck. 
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.012">
      <operator activated="true" class="process" compatibility="5.3.012" expanded="true" name="Process">
        <description>This process allows Applications to connect to Twitter using the Application Only authorisation level.

    For documentation for Twitter REST 1.1 API see here:</description>
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="5.3.012" expanded="true" height="76" name="Credentials" width="90" x="45" y="30">
            <process expanded="true">
              <operator activated="true" class="generate_data_user_specification" compatibility="5.3.012" expanded="true" height="60" name="Fuel Data" width="90" x="45" y="75">
                <list key="attribute_values">
                  <parameter key="Username" value="&quot;username_here&quot;"/>
                  <parameter key="Password" value="&quot;password_here&quot;"/>
                </list>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="loop_attributes" compatibility="5.3.012" expanded="true" height="76" name="Loop Attributes" width="90" x="179" y="75">
                <description>It's good practice to encode the URL before converting to Base64 (same as when calling, but this analyst was lazy) :)</description>
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attributes" value="Password|Username"/>
                <process expanded="true">
                  <operator activated="true" class="web:encode_urls" compatibility="5.3.001" expanded="true" height="76" name="Encode URLs" width="90" x="179" y="120">
                    <parameter key="url_attribute" value="%{loop_attribute}"/>
                    <parameter key="encoding" value="US-ASCII"/>
                  </operator>
                  <connect from_port="example set" to_op="Encode URLs" to_port="example set input"/>
                  <connect from_op="Encode URLs" from_port="example set output" to_port="example set"/>
                  <portSpacing port="source_example set" spacing="0"/>
                  <portSpacing port="sink_example set" spacing="0"/>
                  <portSpacing port="sink_result 1" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="generate_attributes" compatibility="5.3.012" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="75">
                <description>Concatenate Username and Password
    (This might change depending on how your authorization is based)
    If, for example, it's based on the RFC6749 specification, then you need an extra step to get your authorization token.</description>
                <list key="function_descriptions">
                  <parameter key="token" value="concat(Username,&quot;:&quot;,Password)"/>
                </list>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="5.3.012" expanded="true" height="76" name="Select Attributes" width="90" x="112" y="210">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attributes" value="|token"/>
              </operator>
              <operator activated="true" class="execute_script" compatibility="5.3.012" expanded="true" height="76" name="Execute Script" width="90" x="246" y="210">
                <description>Important bit!!

    Convert credentials to Base64.

    As there is no operator for this, a small piece of Groovy script is executed on the attribute.</description>
                <parameter key="script" value="ExampleSet eSet = operator.getInput(ExampleSet.class);&#10;for (Example example : eSet) { &#10;def s = example[&quot;token&quot;] &#10;String encoded = s.bytes.encodeBase64().toString()&#10;example[&quot;token&quot;] = encoded&#10;&#9;}&#10;&#10;return eSet;"/>
              </operator>
              <connect from_op="Fuel Data" from_port="output" to_op="Loop Attributes" to_port="example set"/>
              <connect from_op="Loop Attributes" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Execute Script" to_port="input 1"/>
              <connect from_op="Execute Script" from_port="output 1" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="subprocess" compatibility="5.3.012" expanded="true" height="76" name="GetRequestToken" width="90" x="179" y="30">
            <process expanded="true">
              <operator activated="true" class="extract_macro" compatibility="5.3.012" expanded="true" height="60" name="Extract Macro" width="90" x="45" y="30">
                <description>Get that Base64-encoded goodness.</description>
                <parameter key="macro" value="token64bit"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="token"/>
                <parameter key="example_index" value="1"/>
                <list key="additional_macros"/>
              </operator>
              <operator activated="true" class="web:get_webpage" compatibility="5.3.001" expanded="true" height="60" name="Get Page" width="90" x="112" y="120">
                <description>See request properties for use of macro.
    If based on RFC6749 then this call would be POST with client credentials (see the spec).</description>
                <parameter key="url" value=""/>
                <list key="query_parameters"/>
                <list key="request_properties">
                  <parameter key="Authorization" value="Basic %{token64bit}"/>
                </list>
              </operator>
              <connect from_port="in 1" to_op="Extract Macro" to_port="example set"/>
              <connect from_op="Get Page" from_port="output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Credentials" from_port="out 1" to_op="GetRequestToken" to_port="in 1"/>
          <connect from_op="GetRequestToken" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
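
For readers who want to check what the process is doing outside RapidMiner, here is a minimal Python sketch of the same idea: join username and password with a colon, Base64-encode the result, and send it in an Authorization header. The URL and credentials are placeholders, and the function names are made up for this illustration; nothing here is part of RapidMiner.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(token).decode("ascii")


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the Basic auth header."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Hypothetical URL and credentials, for illustration only.
request = build_request("https://example.com/report.xls", "username_here", "password_here")
print(request.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen(request).read()` would return the body as raw bytes, which is the form a binary file such as an xls needs to be saved in.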
  • julio Member Posts: 17 Contributor II

    Part of the solution was answered in the following thread:,7143.0.html

    And the rest is here.

    Why would that specific ISO encoding do the job...? In any case, great support, and thank you for your inventive approach to solving this!

    Many thanks!
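
On the ISO encoding question, one plausible explanation is sketched below. It assumes that Get Page decodes the response body as text before handing it on, which is an assumption about the operator and is not confirmed anywhere in this thread. Under that assumption, ISO-8859-1 is special because it maps every byte value to exactly one character, so binary content survives a decode and re-encode unchanged, while UTF-8 does not.

```python
# Every possible byte value, standing in for the contents of a binary file such as an xls.
raw = bytes(range(256))

# ISO-8859-1 maps each byte to exactly one character, so the round trip is lossless.
as_latin1 = raw.decode("iso-8859-1")
latin1_round_trip = as_latin1.encode("iso-8859-1")

# UTF-8 cannot decode arbitrary bytes; invalid sequences are replaced, so data is lost.
as_utf8 = raw.decode("utf-8", errors="replace")
utf8_round_trip = as_utf8.encode("utf-8")

print(latin1_round_trip == raw)  # prints "True"
print(utf8_round_trip == raw)    # prints "False"
```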

  • AlbertCSparks Member Posts: 4 Contributor I
    This is a great solution for basic auth. Do you have any suggestions for how it could be modified to make OAuth requests? I am stuck on finding a starting point for doing OAuth in RapidMiner, and this thread is the closest thing I could find!

    I am trying to download data from the Twitter and Facebook APIs directly into RapidMiner.


  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Yes. OAuth works in exactly the same way. The process above is a small snippet from one I built that authenticated with Twitter to download data. Facebook is also the same.

    For OAuth the basic process is this.
    > First, register your application with the API to get your ClientKey, ClientSecret and, if necessary, any additional tokens or instructions.
    > Use Get Page to send these tokens in the request to the OAuth API.
    > Next, read the resulting Bearer token using Read XML and store it as a macro.
    > Then use the 'token' macro in any Get Page requests you make to prove authentication and execute your API request.

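The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: it follows the client-credentials grant of RFC 6749, where the token response is JSON with an `access_token` field, and the URLs, key, secret and function names are all hypothetical. The requests are built but never sent, so the sketch runs without network access.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_token_request(token_url, client_key, client_secret):
    """Build (but do not send) a client-credentials token request."""
    # URL-encode key and secret, join with a colon, then Base64-encode.
    quoted_key = urllib.parse.quote(client_key, safe="")
    quoted_secret = urllib.parse.quote(client_secret, safe="")
    credentials = f"{quoted_key}:{quoted_secret}".encode("ascii")
    encoded = base64.b64encode(credentials).decode("ascii")

    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    request = urllib.request.Request(token_url, data=body, method="POST")
    request.add_header("Authorization", "Basic " + encoded)
    request.add_header("Content-Type", "application/x-www-form-urlencoded;charset=UTF-8")
    return request


def parse_bearer_token(response_body):
    """Extract the bearer token from a JSON token response (assumed format)."""
    return json.loads(response_body)["access_token"]


def build_api_request(api_url, bearer_token):
    """Build (but do not send) an API request authenticated with the bearer token."""
    request = urllib.request.Request(api_url)
    request.add_header("Authorization", "Bearer " + bearer_token)
    return request


# Hypothetical endpoint, credentials and canned response, for illustration only.
token_request = build_token_request("https://api.example.com/oauth2/token", "my_key", "my_secret")
sample_response = '{"token_type": "bearer", "access_token": "abc123"}'
token = parse_bearer_token(sample_response)
api_request = build_api_request("https://api.example.com/search?q=rapidminer", token)
print(api_request.get_header("Authorization"))  # prints "Bearer abc123"
```

In RapidMiner terms, `build_token_request` corresponds to the Credentials subprocess plus the first Get Page call, `parse_bearer_token` to reading the response and storing the token as a macro, and `build_api_request` to the later Get Page calls that use that macro.
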
    Considerations: You will want to split this into separate processes, because most APIs limit the number of calls you can make within a certain period of time. Store your 'Active' login token and make your API requests with it, rather than wasting calls reauthenticating each time.
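
A rough sketch of that idea in Python: keep the token in a small cache and only fetch a new one once the old one has expired. The `TokenCache` class, the 900-second lifetime and the `fake_fetch` function are all made up for this illustration; a real API states the actual token lifetime in its documentation or in the token response.

```python
import time


class TokenCache:
    """Keep a bearer token and only fetch a new one when the old one has expired."""

    def __init__(self, fetch_token, lifetime_seconds, clock=time.monotonic):
        self._fetch_token = fetch_token    # function that performs the real login call
        self._lifetime = lifetime_seconds  # how long a token is treated as valid
        self._clock = clock                # injectable clock, handy for testing
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            # No token yet, or the stored one has expired: authenticate again.
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token


calls = []


def fake_fetch():
    """Stand-in for the real authentication request; counts how often it is called."""
    calls.append(1)
    return f"token-{len(calls)}"


cache = TokenCache(fake_fetch, lifetime_seconds=900)
first = cache.get()
second = cache.get()
print(first == second, len(calls))  # prints "True 1"
```

The second call to `get()` reuses the stored token, so only one authentication call counts against the API's rate limit.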

    Send me a note and we can have a proper chat about it. We work in similar areas, so we can probably help each other learn more about using RM along the way.
