Parse Numbers output not numerical
Hi all,
I am working on data that contains lots of missing values. When reading the csv (or xlsx, for that matter) file, RapidMiner therefore misclassifies many of the attributes as polynominal instead of numeric. I applied Parse Numbers to the values, and in the Statistics tab of the output all attributes do show up as numeric.
However, when I want to use those values in a newly generated attribute, it says the attributes are not numeric. The first Generate Attributes step gives an error, but the calculation still works. I cannot add a second generated attribute, though, because the first one "cannot generate an example data set". I could type in the attribute names by hand, but I need to do around 100 calculations, so that doesn't really work.
I also found this post from 2013 with basically the same problem: http://community.rapidminer.com/t5/RapidMiner-Studio/Output-from-quot-parse-att1-quot-not-numerical/m-p/22455. The answer given there was: we are working on it... I was hoping that by now there would be a way around this?
Thanks!
Answers
Hi,
Have you considered hardcoding the types in Read Excel? Otherwise, storing the data set and using it in another process is an option. The other one is to use Process -> Synchronize Meta Data with Real Data. Once you have executed the process, the meta data will be correctly available.
~Martin
Dortmund, Germany
I have checked 'synchronize meta data' but that does not solve the problem.
And yes, I have considered hardcoding it, but that means I would have to do it for 100 variables... and this workflow will be used on new data, so every time we read in new data we would have to do it again. I would prefer it all to be automatic.
Hmm, that's odd. Are you sure they're "missing" values and not "blank" values? There's a difference. Missing values should have a "?" placeholder; blank values will look blank. If they're blank values, just use a "Declare Missing Values" operator to convert them to missing values.
Scott
Hi Scott,
Thanks for answering; they are not blank, but missing (with a ?). I have tried imputing the missing values with zero just to test whether that works, but it doesn't...
So for now, problem not solved unfortunately.
I would suggest checking the Statistics view to see whether the missing values are in an attribute column that is set to "Nominal." Sometimes, if there is a nominal (string) value in a column that otherwise contains only numbers, the attribute column is automatically set to "nominal," which can cause errors when trying to parse the numbers.
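To hunt down a stray string value in an otherwise numeric column outside of RapidMiner, a small script along these lines may help. It is only a sketch in plain Python, not a RapidMiner feature: the helper names `is_number` and `find_non_numeric`, the inline sample data, and the assumption that "?" or an empty cell marks a missing value are all made up for illustration.

```python
import csv
import io

# Assumption: these tokens mark a missing value in the exported file.
MISSING = {"?", ""}

def is_number(text):
    """Return True if the text can be read as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def find_non_numeric(rows, missing=MISSING):
    """Map column name -> list of (row index, value) for cells that are
    neither numeric nor one of the missing-value tokens."""
    offenders = {}
    for i, row in enumerate(rows):
        for col, value in row.items():
            if col is None:
                continue  # extra cells beyond the header are ignored
            cell = (value or "").strip()
            if cell in missing or is_number(cell):
                continue
            offenders.setdefault(col, []).append((i, cell))
    return offenders

# Hypothetical sample; replace with open("your_file.csv", newline="").
sample = io.StringIO("a,b,c\n1,2.5,x\n?,3,4\n5,n/a,6\n")
rows = list(csv.DictReader(sample))
offenders = find_non_numeric(rows)
print(offenders)
```

Any column that shows up in the result contains at least one value that would force it to a nominal type, so those are the cells to clean up before parsing.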
Normally you don't need to impute zero for the missing values; you can just use the Replace Missing Values operator and set the replacement value to zero. The Impute Missing Values operator can be used on both numerical and nominal missing values, so you might want to try that too. Just use a k-NN learner inside the subprocess to test it.
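For comparison, the effect of replacing missing values with a constant can be sketched in a few lines of plain Python. This only mimics the idea and is not how the RapidMiner operator is implemented; the function name `replace_missing` and the choice of "?" and the empty string as missing tokens are assumptions for the example.

```python
def replace_missing(values, replacement=0.0, missing_tokens=("?", "")):
    """Convert string cells to floats, substituting `replacement`
    for any cell that matches one of the missing-value tokens."""
    result = []
    for v in values:
        cell = v.strip()
        if cell in missing_tokens:
            result.append(replacement)
        else:
            result.append(float(cell))  # raises ValueError on stray text
    return result

# Hypothetical column with two missing cells.
column = ["1.5", "?", "3", ""]
filled = replace_missing(column)
print(filled)
```

Because `float` raises an error on anything that is neither numeric nor a missing token, a failure here also points to a leftover string value in the column.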