Cleaning up invalid items from a date attribute

dhampton Member Posts: 14 Contributor II
edited November 2018 in Help

I wish to clean up a file obtained by web extraction.  It has a date attribute, and occasionally non-date values appear (specifically, they are text) so RapidMiner interprets this as a polynominal attribute.  I wish to carry out cleansing steps to change non-date entries to missing so that I can then use the Nominal to Date operator - but I've failed... any help would be appreciated!

Best Answer

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,421 RM Data Scientist
    Solution Accepted

    Hi,

     

    I guess Generate Attributes with something like

    if(matches(attribute,"regex for date"),attribute,str(0/0))

    should do it. Instead of str(0/0) you can also use the placeholder I always forget :).

     

    Best,

    Martin

     

    @Thomas_Ott : You are too fast :D

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
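
For readers who want to see the logic of the Generate Attributes expression above spelled out, here is a minimal sketch in Python rather than RapidMiner. It keeps a value only when the whole string matches a date pattern and replaces everything else with a missing value. The pattern assumes dates of the form yyyy-MM-dd, which the thread does not confirm, and `blank_non_dates` is a hypothetical helper name, not a RapidMiner function.

```python
import re

# Assumption: dates look like 2018-11-05 (yyyy-MM-dd). The thread does not
# state the real format, so adjust the pattern to fit your data.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def blank_non_dates(values):
    """Return a new list where anything that is not a date string becomes None."""
    cleaned = []
    for value in values:
        # fullmatch only succeeds when the entire string fits the pattern
        if isinstance(value, str) and DATE_PATTERN.fullmatch(value.strip()):
            cleaned.append(value.strip())
        else:
            cleaned.append(None)  # plays the role of the missing value
    return cleaned


raw = ["2018-11-05", "n/a", "2018-11-06", "see website", ""]
cleaned = blank_non_dates(raw)
print(cleaned)
```

The result is written to a new list instead of overwriting the input, which mirrors the idea of generating a new attribute and leaving the original untouched.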

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Yes?

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Hey! His question was blank when I responded! :)

     

    Yes, RM will automatically type an attribute with mixed values as polynominal because it can't tell whether the mix is intentional or the data has mistakes. The best bet is to filter out those string values as Martin proposes. Then you can use a Nominal to Date operator to convert the remaining date strings into an actual date-time type.

     

    Good luck!

  • dhampton Member Posts: 14 Contributor II

    Many thanks Martin

     

    I was trying to convert items within the attribute, but your approach of creating a new attribute containing just the items that match a regular expression for a date solves it. And there are lots of expressions for that on the web - thankfully... I wouldn't want to write one myself!

     

    DH
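
The second half of the clean-up discussed in this thread, turning the surviving strings into real dates, can be sketched the same way. This is an illustration in Python, not what the Nominal to Date operator does internally. It assumes the non-date entries are already missing (`None`) and that the format is yyyy-MM-dd; `to_dates` is a hypothetical helper name.

```python
from datetime import datetime

# Assumption: same yyyy-MM-dd format as the regex step; change it to match your data.
DATE_FORMAT = "%Y-%m-%d"


def to_dates(values):
    """Parse date strings into date objects; missing or unparseable values stay None."""
    dates = []
    for value in values:
        if value is None:
            dates.append(None)
            continue
        try:
            dates.append(datetime.strptime(value, DATE_FORMAT).date())
        except ValueError:
            # The string had the right shape but is not a real calendar date
            dates.append(None)
    return dates


cleaned = ["2018-11-05", None, "2018-02-30"]
dates = to_dates(cleaned)
print(dates)
```

The `except` branch is there because a simple pattern such as four digits, two digits, two digits also accepts impossible dates like 2018-02-30, so a value can pass the regex check and still fail to parse.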
