RapidMiner


Data table generated by Execute Python is displayed incorrectly

Status: Open For Voting

Hi all,

 

Once again, I'm reporting some odd behaviour in RapidMiner:

I'm following a tutorial on time series that uses RapidMiner.

For that, I'm using the Python library Quandl (via the Execute Python operator) to retrieve from the web the stock prices that serve as the input dataset.

However, the first column (the date-time column) contains only missing values ("?"):

Time_series_quandl_1.png

 

Here is the process:

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="246" y="85">
        <parameter key="script" value="import pandas as pd&#10;import quandl&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;   data = quandl.get(&quot;WIKI/GE&quot;, start_date=&quot;2016-01-04&quot;, end_date=&quot;2016-03-26&quot;,collapse = &quot;daily&quot;,column_index =11,returns=&quot;numpy&quot;)&#10;   data = pd.DataFrame(data)&#10;    # connect 2 output ports to see the results&#10;   return data"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="380" y="85">
        <parameter key="attribute_name" value="Date"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <connect from_op="Execute Python" from_port="output 1" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

 

I ran the same Python code in a notebook, and there it works perfectly (the data table is displayed correctly):

Time_series_quandl_2.png

 

Can you help me work out what's going on?

 

Thank you for your answers,

 

Best regards,

 

 

Lionel

 

 

 

Comments
Unicorn

I think this question has come up before in the Community. I seem to remember it was related to formatting the date-time.

Hi all,

After searching the community, it seems there is no solution to this problem.

Concretely, what is being done to solve it?

Thank you for your answers,

 

Best regards,

 

Lionel

 

 

Unicorn

I have found a temporary solution:

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="112" y="85">
        <parameter key="script" value="import pandas as pd&#10;import quandl&#10;&#10;# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;   data = quandl.get(&quot;WIKI/GE&quot;, start_date=&quot;2016-01-04&quot;, end_date=&quot;2016-03-26&quot;,collapse = &quot;daily&quot;,column_index =11,returns=&quot;numpy&quot;)&#10;   data = pd.DataFrame(data)&#10;    # connect 2 output ports to see the results&#10;   data['Date'] = data['Date'].astype(str)&#10;   return data"/>
      </operator>
      <operator activated="true" class="nominal_to_date" compatibility="8.1.000" expanded="true" height="82" name="Nominal to Date" width="90" x="313" y="85">
        <parameter key="attribute_name" value="Date"/>
        <parameter key="date_format" value="yyyy-MM-dd"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="514" y="85">
        <parameter key="attribute_name" value="Date"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <connect from_op="Execute Python" from_port="output 1" to_op="Nominal to Date" to_port="example set input"/>
      <connect from_op="Nominal to Date" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Basically, it consists of converting the dates to strings in Python and then parsing them in RapidMiner Studio.
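
For anyone who wants to try the Python half of this workaround outside RapidMiner, here is a minimal sketch. It assumes only pandas; the small frame built in `rm_main` is a hypothetical stand-in for the Quandl result (the `value` column and its numbers are placeholders, not real prices), and `dates_to_strings` is a helper name invented for this illustration. It uses `dt.strftime` instead of `astype(str)` so that the text format is stated explicitly and can be matched by the pattern given to Nominal to Date.

```python
import pandas as pd


def dates_to_strings(data: pd.DataFrame, column: str = "Date") -> pd.DataFrame:
    """Return a copy of `data` with `column` rendered as 'YYYY-MM-DD' strings."""
    result = data.copy()
    # Make sure the column is a datetime column, then format it explicitly
    # so the pattern used on the RapidMiner side can match it exactly.
    result[column] = pd.to_datetime(result[column]).dt.strftime("%Y-%m-%d")
    return result


def rm_main():
    # Hypothetical stand-in for the frame built from quandl.get(...) in the
    # process above; the values are placeholders, not real stock prices.
    data = pd.DataFrame(
        {
            "Date": pd.to_datetime(["2016-01-04", "2016-01-05", "2016-01-06"]),
            "value": [1.0, 2.0, 3.0],
        }
    )
    return dates_to_strings(data)
```

In the real process, the body of `rm_main` would keep the `quandl.get(...)` call and only pass the resulting frame through the conversion before returning it; Nominal to Date then parses the `Date` column with a matching pattern.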

 

Should RM be able to convert from pandas' dates automatically?

Unicorn

@SGolbert I think it should do that automatically, because the Python Script operator already translates the RM ExampleSet to a pandas DataFrame. I think this needs to be investigated by the RM Dev team.

Hi,

 

Thank you, @SGolbert. This solution works well and helps me a lot.

 

Best regards, 

 

Lionel

Community Manager

Moving to Product Feedback.

 

Scott

 
