RapidMiner

Newbie jiri_kulik

Bug in conversion of data from Python extension

Hi,

 

I'd like to report a bug in the conversion of the date data type from Python back to a RapidMiner dataset:

 

1. The dataset contains a column 'Date' with type date and role 'id'. In RapidMiner the column behaves as a date should (sorting, plotting, extracting the max date with a macro).

2. The column is converted properly to pandas: its dtype is datetime64[ns], and all the common pandas datetime operations work as expected. The metadata for the column also looks correct: 'Date': ('date', 'id')

3. After the data is passed back to RapidMiner, the column contains only missing values (the number of examples is unchanged, but every value in this column is missing). This happens even if the script is only:

import pandas
def rm_main(data):
    return data

I went through the extension's documentation to see whether it says anything specific about dates, but it does not seem to. I also created a completely new, empty project with a trivial dataset (just two columns, one with dates and one with dummy data), but the values in the date column are always missing when RapidMiner receives them.
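To narrow down where the values get lost, it can help to inspect the date columns inside the script before they are handed back. The following is only a minimal sketch: the helper name describe_datetime_columns is hypothetical, and where the printed output ends up depends on your setup.

```python
import pandas as pd

def describe_datetime_columns(data):
    """Return {column: (dtype string, timezone or None, missing count)} for datetime columns."""
    report = {}
    for col in data.columns:
        series = data[col]
        # Matches both timezone-naive and timezone-aware datetime columns.
        if pd.api.types.is_datetime64_any_dtype(series):
            missing = int(series.isnull().sum())
            report[col] = (str(series.dtype), series.dt.tz, missing)
    return report

def rm_main(data):
    # Diagnostic only: the data is returned unchanged.
    print(describe_datetime_columns(data))
    return data
```

A timezone of None in the report means the column is timezone-naive on the Python side.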

 

I'm just exploring RapidMiner, so I'm sorry if I'm missing something obvious. But if the conversion from RM to Python works, it seems to me that the opposite direction should work as well.

 

I'm using Python 3.5.3, pandas 0.19.2 and numpy 1.12.0 from Anaconda on a Mac.

RapidMiner Studio 7.4.000

Python Scripting 7.4.0

 

Best regards

Jiri

 

Update

The conversion from pandas back to a dataset works when the date column is made timezone-aware with tz_localize in the Python script.
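A minimal sketch of that workaround, for anyone hitting the same problem. The choice of 'UTC' is an assumption made for illustration (the right timezone depends on your data), and the loop simply localizes every timezone-naive datetime column it finds.

```python
import pandas as pd

def rm_main(data):
    for col in data.columns:
        # True only for timezone-naive datetime64 columns;
        # columns that are already timezone-aware are left alone.
        if pd.api.types.is_datetime64_dtype(data[col]):
            # 'UTC' is an assumed timezone, not something the extension prescribes.
            data[col] = data[col].dt.tz_localize('UTC')
    return data
```

Note that tz_localize only attaches a timezone to the existing wall-clock values; it does not shift them.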

 

It seems from my experiments that all date and date_time types in RapidMiner are timezone-aware no matter what, so would it be possible to localize them right away when converting the dataset into pandas?

 

One final suggestion: why not create a naive, timezone-unaware date type in RapidMiner?

 

Best regards

Jiri

2 REPLIES
RM Certified Expert

Re: Bug in conversion of data from Python extension

That is interesting; I was unaware that RM > pandas > RM causes this issue because of the time zones. One thing to try is to change the RM date-time values to a nominal data type via the Date to Nominal operator and then pass them to Python. If the column needs to be date-time, convert it inside pandas. Let's see if @gschaefer can shed some light on this; I believe she drafted the Execute Python operator.
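Converting inside pandas could look roughly like the sketch below. It assumes the column is still called 'Date' and that Date to Nominal was configured to produce strings such as 2017-01-31; adjust the format string to whatever your process actually emits. The result is a timezone-naive column again, so whether it survives the trip back to RM is not something this sketch settles.

```python
import pandas as pd

def rm_main(data):
    # 'Date' and the '%Y-%m-%d' format are assumptions about the incoming data.
    data['Date'] = pd.to_datetime(data['Date'], format='%Y-%m-%d')
    return data
```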

 

 

RM Staff
Solution

Re: Bug in conversion of data from Python extension

I could not reproduce your problem going RM > Python > RM. It works fine for me on Windows and Mac, even if I change the timezone on my PC or in Studio. Could you post your test process, please?

 

In RapidMiner Studio, the dates in date columns are not stored timezone-aware; they are only displayed that way. You can adjust the display timezone via Preferences > General > Timezone.

 

Best,

Gisa
