Identifying Data Type for Ages listed in a range

steven Member Posts: 1 Contributor I
edited November 2018 in Help
Hello all,

I'm new. I've read all the FAQs and searched for the topic before posting, so I'm sorry if this has already been answered.

I have a set of data in CSV form. Two of the columns are for age and weight. However, they are not integers; they are ranges, or bins, for example 50-60 or 150-160. The data shows up like this: [80-90), with a square bracket on the left and a parenthesis on the right.

My Questions:

1. What data type do I classify this as? It automatically selects polynominal. Is this correct?
2. Should I clean the data and remove the brackets and parentheses?
3. Should I set the roles to age and weight for these columns?


Any help is greatly appreciated. Thanks!

-SL

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,408 RM Data Scientist
    Hi,

    I would personally convert each range to a single numerical value first, most likely its mean, i.e. the midpoint of the range. So read [80-90) in as polynominal and build a process that replaces it with 85 (as real), since (80 + 90) / 2 = 85. There may be use cases where this is not the preferred approach, but I think it is the way to go in most of them.
    If you want to use age and weight for analysis, I would recommend leaving them in the data set as regular attributes rather than giving them a special role.

    ~Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
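The midpoint conversion suggested above can also be sketched as a preprocessing step on the CSV before it is loaded into RapidMiner. This is a minimal Python illustration, not a RapidMiner process: it assumes every bin is written exactly as "[low-high)" with non-negative bounds, and the function names (bin_midpoint, convert_columns) and the sample rows are hypothetical, made up for this example.

```python
import csv
import io
import re

# Assumed bin format: "[" low "-" high ")", e.g. "[80-90)" or "[150-160)".
# Bounds are non-negative (ages and weights), so "-" is only the separator.
BIN_PATTERN = re.compile(r"\[\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*\)")


def bin_midpoint(value: str) -> float:
    """Convert a range label such as "[80-90)" to its midpoint (85.0)."""
    match = BIN_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a range of the form [low-high): {value!r}")
    low, high = float(match.group(1)), float(match.group(2))
    return (low + high) / 2  # mean of the two bounds


def convert_columns(csv_text: str, columns: list[str]) -> list[dict]:
    """Read CSV text and replace each listed column's bin label with its midpoint."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in columns:
            row[col] = bin_midpoint(row[col])  # polynominal label -> real value
        rows.append(row)
    return rows


# Hypothetical sample data using the column names from the question.
sample = "id,age,weight\n1,[80-90),[150-160)\n2,[50-60),[120-130)\n"
rows = convert_columns(sample, ["age", "weight"])
print(rows[0]["age"], rows[0]["weight"])  # 85.0 155.0
```

Raising an error on anything that does not match the assumed pattern is a deliberate choice: a silently skipped or misparsed label would be harder to spot than a failed conversion.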