
[SOLVED] How to convert nominal to numeric value? Urgent..Please have a look

Anuj_Gupta Member Posts: 3 Contributor I
edited November 2018 in Help
Hey Folks,

I have been struggling with one problem for a long time. I have nominal data that I want to convert into numerical data. From what I have read about RapidMiner, there are two ways to do this:

1. Using nominal to numeric operator.
2. Creating dummy variables (0 or 1), using nominal to binominal operator.

As for the first method, my boss says the nominal to numeric operator is not a correct method, and he is not at all happy with it. Can anybody tell me whether it is correct or not, so that I can convince my mentor?

I tried the second operator, but with it the number of variables becomes too high (because there are many categories), which leads to a memory error.

So can anybody suggest some other alternatives for converting nominal values to numeric ones?

Thanks for your time; I look forward to your valuable suggestions.

Answers

  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    1. Using nominal to numeric operator.
    2. Creating dummy variables (0 or 1), using nominal to binominal operator.
    Well, there is also a third one: you could first map the values to the numbers you want to use instead (operator: "Map") and parse them afterwards (operator: "Parse Numbers").

    Whether using the nominal to numeric operator is a problem depends a bit on what you are doing and on which data. The operator simply produces numbers based on the internal mapping used by RapidMiner. If you produce those numbers for two data sets with different mappings, the numbers will differ as well. You can deal with this by ensuring that the same internal mapping is used for all data sets.
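
To illustrate why index-based codes can disagree between data sets, here is a minimal Python sketch. It assumes, purely for illustration, that each nominal value receives the index of its first appearance; this is not RapidMiner's actual implementation, and the helper names `build_mapping` and `encode` as well as the sample values are hypothetical.

```python
def build_mapping(values):
    """Assign each distinct value the index of its first appearance (assumed scheme)."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def encode(values, mapping):
    """Replace each nominal value by its code from the given mapping."""
    return [mapping[value] for value in values]


train = ["red", "green", "blue", "green"]
test = ["blue", "red", "green"]

# Separate mappings: the same value can get a different code in each data set.
train_mapping = build_mapping(train)
test_mapping = build_mapping(test)

# Shared mapping: build it once and reuse it for every data set.
train_codes = encode(train, train_mapping)
test_codes = encode(test, train_mapping)
```

With separate mappings, "blue" is coded differently in the two data sets; reusing `train_mapping` for both keeps the codes consistent.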

    But even then the internal mappings don't have any real meaning. For example, if you have the three nominal values "low", "medium", and "high", you would probably not like to end up with the numbers "2", "1", and "3" but would prefer at least something like "1", "2", and "3" instead. But even this might become problematic: is "high" really exactly 1 more than "medium", just as "medium" is 1 more than "low"? Who knows?
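
The third option mentioned earlier (map the values, then parse them) can be sketched in Python as two steps. This is only an analogy to the "Map" and "Parse Numbers" operators, not their implementation; the replacement table and the function names `map_values` and `parse_numbers` are assumptions made for the example.

```python
# Hypothetical replacement table: nominal value -> numeral string of your choice.
REPLACEMENTS = {"low": "1", "medium": "2", "high": "3"}


def map_values(values, replacements):
    """Step 1 (analogous to "Map"): swap each nominal value for its numeral string."""
    return [replacements[value] for value in values]


def parse_numbers(strings):
    """Step 2 (analogous to "Parse Numbers"): turn numeral strings into integers."""
    return [int(text) for text in strings]


ratings = ["medium", "low", "high", "low"]
mapped = map_values(ratings, REPLACEMENTS)  # still strings at this point
numeric = parse_numbers(mapped)             # now real numbers
```

The order of the codes is now explicit and under your control, but the equal spacing between them is still a modelling assumption.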

    For both reasons (especially the second one, since the first one can be dealt with if you are cautious) I would agree with your boss that method 2 should usually be preferred. If memory is getting low, you could try to create a view, which calculates the values on the fly instead of calculating and storing them up front. If this still does not work, you could use method 3, which I introduced above, so that at least both problems discussed above become smaller.
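
As a rough illustration of dummy coding and of computing values on the fly, the Python sketch below uses a generator so that only one 0/1 row exists at a time. It is a loose analogy to a view, not how RapidMiner implements one; `dummy_rows` and the sample data are hypothetical.

```python
def dummy_rows(values, categories):
    """Yield one 0/1 row per value without materialising the full table."""
    for value in values:
        yield [1 if value == category else 0 for category in categories]


colors = ["red", "green", "blue", "green"]
categories = sorted(set(colors))  # fixed column order: blue, green, red

# Materialised: all rows are built and held in memory at once.
full_table = list(dummy_rows(colors, categories))

# On the fly: rows are produced one at a time as they are consumed.
green_column = categories.index("green")
green_count = sum(row[green_column] for row in dummy_rows(colors, categories))
```

The materialised table grows with rows times categories, whereas the generator version only ever holds a single row, which is the trade-off to weigh when there are many categories.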

    Cheers,
    Ingo
  • Prably Member Posts: 3 Contributor I
    I had an issue similar to Anuj's. Parse Numbers worked perfectly. Thanks Ingo