Data table generated by Execute Python is displayed incorrectly

lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
edited December 2018 in Product Feedback - Resolved

Hi all,

 

Once again, I'm reporting some weird behaviour in RapidMiner:

I'm following a tutorial on time series that uses RapidMiner.

For that, I'm using the Python library Quandl (via the Execute Python operator) to retrieve from the web the stock prices that serve as the input dataset.

However, the first column (the date-time column) contains only missing values ("?"):

Time_series_quandl_1.png

 

Here is the process:

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="246" y="85">
<parameter key="script" value="import pandas as pd&#10;import quandl&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10; data = quandl.get(&quot;WIKI/GE&quot;, start_date=&quot;2016-01-04&quot;, end_date=&quot;2016-03-26&quot;,collapse = &quot;daily&quot;,column_index =11,returns=&quot;numpy&quot;)&#10; data = pd.DataFrame(data)&#10; # connect 2 output ports to see the results&#10; return data"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="380" y="85">
<parameter key="attribute_name" value="Date"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles"/>
</operator>
<connect from_op="Execute Python" from_port="output 1" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

 

I decided to execute the Python code in a notebook, and there it works perfectly fine (the data table is displayed correctly):

Time_series_quandl_2.png

 

Can you help me determine what's going on?

 

Thank you for your answers,

 

Best regards,

 

 

Lionel

 

 

 


Fixed and Released

SCRIPT-58

Comments

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I think this question came up before in the Community. I seem to remember it was related to formatting the date-time.

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi all,

    After searching the community, it seems there is no solution to this problem.

    Concretely, what is being done to solve it?

    Thank you for your answers,

     

    Best regards,

     

    Lionel

     

     

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn
    Solution Accepted

    I have found a temporary solution:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="112" y="85">
    <parameter key="script" value="import pandas as pd&#10;import quandl&#10;&#10;# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10; data = quandl.get(&quot;WIKI/GE&quot;, start_date=&quot;2016-01-04&quot;, end_date=&quot;2016-03-26&quot;,collapse = &quot;daily&quot;,column_index =11,returns=&quot;numpy&quot;)&#10; data = pd.DataFrame(data)&#10; # connect 2 output ports to see the results&#10; data['Date'] = data['Date'].astype(str)&#10; return data"/>
    </operator>
    <operator activated="true" class="nominal_to_date" compatibility="8.1.000" expanded="true" height="82" name="Nominal to Date" width="90" x="313" y="85">
    <parameter key="attribute_name" value="Date"/>
    <parameter key="date_format" value="yyyy-MM-dd"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="514" y="85">
    <parameter key="attribute_name" value="Date"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <connect from_op="Execute Python" from_port="output 1" to_op="Nominal to Date" to_port="example set input"/>
    <connect from_op="Nominal to Date" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Basically, it consists of converting the dates to strings in Python and then parsing them in RapidMiner Studio.
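The Python half of this workaround can be tried outside RapidMiner. The following is only a minimal sketch: the helper name `dates_to_strings`, the `fmt` parameter, and the sample values are illustrative assumptions, not part of the original process. It formats every datetime column as text, so that a downstream parser (for example Nominal to Date, with a matching pattern such as `yyyy-MM-dd` if Java-style patterns are assumed) can turn the text back into dates.

```python
import pandas as pd


def dates_to_strings(df, fmt="%Y-%m-%d"):
    """Return a copy of df with every datetime column formatted as text.

    Hypothetical helper for illustration; not part of RapidMiner or quandl.
    """
    out = df.copy()
    for col in out.columns:
        # Only touch columns that pandas recognises as datetimes.
        if pd.api.types.is_datetime64_any_dtype(out[col]):
            out[col] = out[col].dt.strftime(fmt)
    return out


# Stand-in for the frame built from quandl.get(...); the values are made up.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-04", "2016-01-05"]),
    "Adj. Close": [1.0, 2.0],
})

converted = dates_to_strings(sample)
print(converted["Date"].tolist())
```

Inside `rm_main`, the same call could be applied to the frame just before `return data`; the original frame is left unchanged because the helper works on a copy.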

     

    Should RM be able to convert from pandas' dates automatically?

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @SGolbert I think it should do that automatically, because the Execute Python operator already translates the RM ExampleSet to a pandas DataFrame. I think this needs to be investigated by the RM Dev team.

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi,

     

    Thank you, @SGolbert. This solution works well and helps me a lot.

     

    Best regards, 

     

    Lionel

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Moving to Product Feedback.

     

    Scott

     

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Dear all,

     

    I agree with Thomas. Some tasks that cannot be performed by RM can be performed with Python scripts.

    It's frustrating not to be able to display the associated results in RM.

     

    Regards,

     

    Lionel

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Tagging @bhupendra_patil :)

  • phellinger Employee, Member Posts: 103 RM Engineering
    Hi All,

    Starting with Python Scripting Extension version 9.3.1, date values no longer become missing values when using Execute Python.

    Thanks for reporting this problem and for coming up with workarounds.

    Best,
    Peter