02-27-2017 01:17 PM - edited 02-27-2017 01:20 PM
I wish to clean up a file obtained by web extraction. It has a date attribute, and occasionally non-date values appear (specifically, they are text) so RapidMiner interprets this as a polynominal attribute. I wish to carry out cleansing steps to change non-date entries to missing so that I can then use the Nominal to Date operator - but I've failed... any help would be appreciated!
02-27-2017 01:24 PM
I guess Generate Attributes with something like
if(matches(attribute,"regex for date"),attribute,str(0/0))
should do it. Instead of str(0/0) you can also use the placeholder I always forget.
@Thomas_Ott : You are too fast
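For readers who want to see the same idea outside RapidMiner, here is a minimal Python sketch of the "keep it if it matches a date regex, otherwise make it missing" step. The MM-DD-YYYY pattern, the `keep_dates` helper and the sample values are all assumptions made up for illustration; swap in whatever pattern fits the extracted data.

```python
import re

# Assumed date layout: MM-DD-YYYY. This pattern is hypothetical;
# replace it with one that matches the real extracted data.
DATE_PATTERN = re.compile(r"\d{2}-\d{2}-\d{4}")


def keep_dates(values):
    """Return a new list where anything that is not a full pattern match becomes None."""
    return [
        v if isinstance(v, str) and DATE_PATTERN.fullmatch(v) else None
        for v in values
    ]


# Made-up sample column mixing date strings and stray text.
raw = ["02-27-2017", "n/a", "03-02-2017", "see website"]
cleaned = keep_dates(raw)
print(cleaned)
```

`fullmatch` is used so the whole value has to look like a date; a value that merely contains a date somewhere inside other text is treated as missing.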
02-27-2017 01:31 PM
Hey! His question was blank when I responded!
Yes, RM will automatically treat an attribute with mixed value types as polynominal because it doesn't know whether the mix is intentional or the data has mistakes. The best bet is to keep only the string values that look like dates and set the rest to missing, as Martin proposes. Then you can use the Nominal to Date operator to convert the date strings into an actual date-time attribute.
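As a rough illustration of the second step (turning the surviving date strings into real dates while leaving missing values alone), here is a small Python sketch. It is not what Nominal to Date does internally; the `%m-%d-%Y` format, the `to_date` helper and the sample values are assumptions for the example.

```python
from datetime import datetime

# Assumed format for the cleaned strings (MM-DD-YYYY); adjust to the real data.
DATE_FORMAT = "%m-%d-%Y"


def to_date(value):
    """Parse a date string; pass missing values through and treat unparseable ones as missing."""
    if value is None:
        return None
    try:
        return datetime.strptime(value, DATE_FORMAT).date()
    except ValueError:
        # Looks like a date to a simple regex but is not a valid calendar date.
        return None


# Made-up cleaned column: one valid date, one missing value, one impossible date.
cleaned = ["02-27-2017", None, "13-45-2017"]
dates = [to_date(v) for v in cleaned]
```

The last sample value shows why the parse step is worth guarding: a simple digit-based regex lets "13-45-2017" through, but it is not a real date, so it ends up missing as well.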
03-02-2017 05:16 AM
Many thanks Martin
I was trying to convert items within the attribute but your approach of creating a new attribute containing just the items that fit a regex expression for a date solves it. And there are lots of expressions for that on the web - thankfully... I wouldn't want to do that myself!