Bug in conversion of data from Python extension

jiri_kulik Member Posts: 1 Contributor I
edited November 2018 in Help

Hi,

 

I'd like to report a bug in the conversion of the date data type from Python back to a RapidMiner dataset:

 

1. The dataset contains a column 'Date' with type date and role 'id'. The column behaves as a date should (sorting, plotting, extracting the max date with a macro).

2. The column is converted properly to pandas in Python: the dtype of the column is datetime64[ns], and all the common pandas datetime operations work as expected. The metadata for that column also seems correct: 'Date': ('date', 'id')

3. After being passed back to RapidMiner, the column contains only missing values (the number of examples is the same, but every value in this column is missing). This happens even if the code is only:

import pandas

def rm_main(data):
    # Return the incoming DataFrame unchanged
    return data

I went through the documentation for the extension to find out whether there is anything specific about dates, but there does not seem to be. I also created a completely new, empty project with a trivial dataset (just two columns, one for the date and one for dummy data), but the values in the date column are always missing when received by RM.
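A quick way to confirm what the script actually receives is to summarise the date column before returning it. The following is only a minimal sketch that runs outside RapidMiner: the helper name describe_date_column and the two-column sample frame are hypothetical stand-ins for the trivial dataset described above.

```python
import pandas as pd

def describe_date_column(df, column="Date"):
    """Summarise dtype, timezone awareness and missing count of a datetime column.

    Hypothetical helper for illustration; 'Date' is the column name from the post.
    """
    col = df[column]
    return {
        "dtype": str(col.dtype),
        # .dt.tz is None for timezone-naive datetime columns
        "tz_aware": col.dt.tz is not None,
        "missing": int(col.isna().sum()),
    }

# Stand-in for the trivial two-column dataset (one date column, one dummy column)
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2017-01-01", "2017-01-02"]),
    "Value": [1, 2],
})
print(describe_date_column(sample))
```

If the summary shows a naive datetime column with no missing values on the Python side, the values are being lost on the way back rather than inside the script.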

 

I'm just exploring RapidMiner, so I'm sorry if I'm missing something obvious. But if the conversion from RM to Python works, it seems to me that the opposite direction should work as well.

 

I'm using Python 3.5.3, pandas 0.19.2 and NumPy 1.12.0 from Anaconda on a Mac.

RapidMiner Studio 7.4.000

Python Scripting 7.4.0

 

Best regards

Jiri

 

Update

The conversion from pandas to dataset works when the date column is made timezone-aware by tz_localize in the python script.
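For reference, the workaround looks roughly like this. It is a sketch rather than an official fix: the column name 'Date' comes from my dataset, and the choice of "UTC" is an assumption, so use whichever zone matches your data.

```python
import pandas as pd

def rm_main(data):
    # Work on a copy so the incoming frame is left untouched
    out = data.copy()
    # Localize only if the column is still timezone-naive;
    # "UTC" is an assumed zone, not something RapidMiner prescribes
    if out["Date"].dt.tz is None:
        out["Date"] = out["Date"].dt.tz_localize("UTC")
    return out
```

The check on .dt.tz avoids the error that tz_localize raises when the column is already timezone-aware.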

 

It seems from my experiments that all date and date_time types in RapidMiner are timezone-aware no matter what, so would it be possible to localize them right away during the conversion from dataset to pandas?

 

One final suggestion: why not create a naive, timezone-unaware date type in RapidMiner?

 

Best regards

Jiri

Best Answer

  • gmeier Employee, Member Posts: 25 RM Engineering
    Solution Accepted

    I could not reproduce your problem going RM > Python > RM. It works fine for me on Windows and Mac, even if I change the timezone on my PC or in Studio. Could you post your test process, please?

     

    In RapidMiner Studio, the dates in date columns are not stored as timezone-aware values; they are only displayed that way. You can adjust the display timezone via Preferences > General > Timezone.

     

    Best,

    Gisa

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    That is interesting, I was unaware that RM > Pandas > RM causes this issue because of the time zones. One thing to try is to change the RM date-time values to a nominal data type via the Date to Nominal operator and then pass them to Python. If the column needs to be date-time, convert it inside pandas. Let's see if @gmeier can shed some light on this; I believe she drafted up the Execute Python operator.
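    A rough sketch of the pandas side of that idea, assuming the nominal column arrives as strings that pandas can parse (the exact text format depends on how Date to Nominal is configured, so this is not guaranteed to match your output):

```python
import pandas as pd

def rm_main(data):
    # Work on a copy so the incoming frame is left untouched
    out = data.copy()
    # 'Date' is the column name from the original post.
    # errors="coerce" turns unparseable strings into NaT instead of raising
    out["Date"] = pd.to_datetime(out["Date"], errors="coerce")
    return out
```

    If you know the exact pattern the operator writes, passing it through the format argument of pd.to_datetime is safer than relying on inference.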

     

     
