Identifying Data Type for Ages listed in a range

steven Member Posts: 1 Contributor I
edited November 2018 in Help
Hello all,

I'm new. I've read all the FAQs and searched for this topic before posting, so I'm sorry if it has already been answered.

I have a set of data in CSV form. Two of the columns are for age and weight. However, the values are not integers; they are ranges or bins, for example 50-60 or 150-160. The data shows up like this: [80-90), with a bracket on the left and a parenthesis on the right.

My Questions:

1. What data type do I classify this as? It automatically selects polynominal. Is this correct?
2. Should I clean the data and remove the brackets and parentheses?
3. Should I set the roles to age and weight for these columns?


Any help is greatly appreciated. Thanks!

-SL

Answers

    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    Hi,

    I personally would first convert each range to a single numerical value, most likely the mean of its two endpoints. So read the value in as [80-90) (as polynominal) and build a process that replaces it with 85 (as real). There may be use cases where this is not the preferred approach, but I think in most use cases it is the way to go.
    If you want to use age and weight for analysis, I would recommend leaving them in the data set as regular attributes, that is, without assigning them a special role.
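For readers who preprocess the CSV outside RapidMiner, the range-to-midpoint idea can be sketched in plain Python. This is only an illustration under assumptions: the names `RANGE_PATTERN` and `range_midpoint` are hypothetical, and the pattern assumes every value looks like the single example in the question, `[80-90)`. Anything that does not match is returned as `None` rather than guessed.

```python
import re

# Matches half-open interval labels such as "[80-90)".
# Assumption: all bins follow the format of the one example in the question.
RANGE_PATTERN = re.compile(r"^\[(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)\)$")


def range_midpoint(label):
    """Return the midpoint of a "[low-high)" label as a float, or None if it does not match."""
    match = RANGE_PATTERN.match(label.strip())
    if match is None:
        return None  # leave unparseable values as missing
    low = float(match.group(1))
    high = float(match.group(2))
    return (low + high) / 2.0


ages = ["[80-90)", "[50-60)", "?"]
print([range_midpoint(a) for a in ages])  # [85.0, 55.0, None]
```

Inside RapidMiner the same replacement would be built from operators instead; the sketch only shows the arithmetic and the handling of values that do not fit the expected format.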

    ~Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany