RapidMiner

Cleaning up invalid items from a date attribute

SOLVED
Contributor II dhampton

I want to clean up a file obtained by web extraction. It has a date attribute, but non-date text values occasionally appear in it, so RapidMiner interprets the attribute as polynominal. I want to change the non-date entries to missing so that I can then use the Nominal to Date operator, but I've failed so far. Any help would be appreciated!

4 REPLIES
RM Certified Expert

Re: Cleaning up invalid items from a date attribute

RM Staff
Solution

Re: Cleaning up invalid items from a date attribute

Hi,

I guess Generate Attributes with something like

if(matches(attribute,"regex for date"),attribute,str(0/0))

should do it. Instead of str(0/0) you can also use the placeholder I always forget :)
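
For readers who want to see the logic outside RapidMiner, here is a minimal Python sketch of the same idea: keep a value if the whole string matches a date pattern, otherwise replace it with a missing value. The pattern (ISO-style dates like 2016-03-25), the helper name keep_dates and the sample values are all assumptions for illustration; the thread never states the actual date format. As far as I know, matches() in RapidMiner tests the whole string, which is why the sketch uses fullmatch.

```python
import re

# Assumed format: ISO-style dates such as 2016-03-25 (hypothetical; adjust to your data).
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def keep_dates(values):
    """Return the values with anything that is not a full date match replaced by None."""
    return [
        v if isinstance(v, str) and DATE_PATTERN.fullmatch(v) else None
        for v in values
    ]


# Made-up sample column mixing dates and stray text from a web extraction.
raw = ["2016-03-25", "n/a", "2016-04-01", "see website", ""]
cleaned = keep_dates(raw)
print(cleaned)
```

Note that a pattern like this only checks the shape of the string, so something like 2016-99-99 would still pass; the conversion step is where impossible dates get caught.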

 

Best,

Martin

@Thomas_Ott: You are too fast :D

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
RM Certified Expert

Re: Cleaning up invalid items from a date attribute

Hey! His question was blank when I responded! :)

Yes, RM will automatically type an attribute with mixed values as polynominal because it can't tell whether the mix is intentional or a data error. Your best bet is to blank out the non-date strings the way Martin proposes. Then you can use the Nominal to Date operator to convert the remaining date strings into an actual date-time attribute.
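
As a rough illustration of that second step outside RapidMiner, the Python sketch below parses the cleaned strings into real dates and leaves missing entries missing. The format string "%Y-%m-%d", the helper name to_date and the sample values are assumptions made for the example, not something taken from the original data.

```python
from datetime import datetime


def to_date(value, fmt="%Y-%m-%d"):
    """Parse a date string; return None for missing or unparseable values."""
    if value is None:
        return None
    try:
        return datetime.strptime(value, fmt).date()
    except ValueError:
        # Right shape but not a real calendar date, e.g. 30 February.
        return None


# Made-up cleaned column: a valid date, a missing value, and an impossible date.
cleaned = ["2016-03-25", None, "2016-02-30"]
dates = [to_date(v) for v in cleaned]
print(dates)
```

Returning None instead of raising keeps one bad row from stopping the whole conversion.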

 

Good luck!

Contributor II dhampton

Re: Cleaning up invalid items from a date attribute

Many thanks, Martin.

I was trying to convert items within the attribute itself, but your approach of creating a new attribute that contains only the items matching a date regex solves it. And there are lots of regexes for that on the web, thankfully... I wouldn't want to write one myself!

 

DH