Identifying Data Type for Ages listed in a range
Hello all,
I'm new. I've read all the FAQs and searched for the topic before posting this, so I'm sorry if this has been answered already.
I have a data set in CSV form. Two of the columns are for age and weight. However, the values are not integers; they are ranges or bins, for example 50-60 or 150-160. The data shows up like this: [80-90), with a bracket on the left and a parenthesis on the right.
My Questions:
1. What data type should I classify this as? It automatically selects polynominal; is this correct?
2. Should I clean the data and remove the brackets and parentheses?
3. Should I set the roles to age and weight for these columns?
Any help is greatly appreciated. Thanks!
-SL
Answers
I personally would first convert each range to a single numerical value, most likely the mean of the two bounds. So read it in as [80-90) (as polynominal) and build a process to replace this with 85 (as real). There might be use cases where this is not the preferred way, but I think in most use cases it is the way to go.
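To illustrate the idea outside of any particular tool, here is a minimal Python sketch of that replacement. The function name `bin_midpoint`, the regular expression, and the sample values are assumptions made for this example, not part of the original data or of any product's API; it simply takes a label such as [80-90), strips the optional bracket and parenthesis, and returns the mean of the two bounds.

```python
import re

# Matches range labels such as "[80-90)" or "50-60".
# The leading bracket and trailing parenthesis are optional.
_BIN_PATTERN = re.compile(
    r"^\s*[\[\(]?\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*[\)\]]?\s*$"
)


def bin_midpoint(label):
    """Return the mean of the two bounds, or None if the label does not parse."""
    match = _BIN_PATTERN.match(label)
    if match is None:
        return None
    low = float(match.group(1))
    high = float(match.group(2))
    return (low + high) / 2


# Hypothetical sample values, including one that is not a range.
ages = ["[80-90)", "[50-60)", "unknown"]
midpoints = [bin_midpoint(a) for a in ages]
print(midpoints)  # [85.0, 55.0, None]
```

Returning None for anything that does not look like a range keeps missing or malformed entries visible, so they can be handled separately instead of being silently turned into a number.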
If you want to use age and weight for analysis, I would recommend leaving them in the data set as regular attributes rather than assigning them special roles.
~Martin
Dortmund, Germany