
# Reverse map a nominal to numerical transform

Member Posts: 5 Contributor II
edited June 2020 in Help

I am using K-means to cluster my data. To do so, I transformed my nominal values into numerical ones using the Nominal to Numerical operator, with the coding type parameter set to "unique integers." How do I reverse this transformation so that the output shows what the values in each cluster were before they were transformed? For example, if "sandwich" gets mapped to 0, I would like to map 0 back to "sandwich".


• Options
Member Posts: 106 Unicorn
Solution Accepted

It may not be the most elegant solution, but what you could do is the following:

Multiply your example set prior to the type conversion. Connect the first output of the Multiply operator to your current process, and at the end of that process add a Join operator, connecting the resulting example set to its left port. Connect the second output of Multiply to the right port of the Join.

You will need an id attribute on which to make the join, and you may want to do some pre-processing (renaming attributes, etc.).
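
The same idea can be sketched outside RapidMiner. The snippet below is only an illustration in plain Python, not a RapidMiner process: the rows, the `food` and `price` attribute names, and the price-based cluster assignment are all made up, and the cluster assignment is just a placeholder for a real k-means result. It keeps an untouched copy of the nominal column keyed by id, encodes the working copy as unique integers, and joins the original labels back on the id afterwards.

```python
# Hypothetical example rows; "id" plays the role of the join key.
rows = [
    {"id": 1, "food": "sandwich", "price": 4.5},
    {"id": 2, "food": "bread", "price": 2.0},
    {"id": 3, "food": "butter", "price": 1.5},
    {"id": 4, "food": "sandwich", "price": 5.0},
]

# "Multiply": keep an untouched copy of the nominal column, keyed by id.
original_by_id = {row["id"]: row["food"] for row in rows}

# "Nominal to Numerical" with unique integers: codes follow first-seen order.
codes = {}
encoded = [
    {"id": r["id"], "food": codes.setdefault(r["food"], len(codes)), "price": r["price"]}
    for r in rows
]

# Placeholder for the clustering step (NOT k-means): split on price.
clustered = [{**r, "cluster": 0 if r["price"] >= 3.0 else 1} for r in encoded]

# "Join" on id: attach the original nominal value to each clustered row.
joined = [{**r, "food_label": original_by_id[r["id"]]} for r in clustered]

# Alternative without a join: invert the code table and decode directly.
decode = {code: label for label, code in codes.items()}

for r in joined:
    print(r["id"], r["cluster"], r["food"], r["food_label"], decode[r["food"]])
```

If the integer codes are produced by a mapping you control, inverting that mapping is the simpler route; the join on id is useful when the mapping itself is not available after the conversion.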

• Options
RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
That's how I usually handled it.
• Options
Member Posts: 5 Contributor II

Thanks, that works. I would never have thought of it.

• Options
Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

Be very careful with "unique integers" mapping if your nominal categories are not inherently ordinal.  For example, if you have sandwich, bread, and butter mapped as 1, 2, and 3, then k-means thinks that the distance between 1 and 3 is larger than the distance between 1 and 2 or 2 and 3.  But for non-ordered categories, this doesn't make any sense and can lead to strange and distorted results when clustering.  If your nominal categories are not ordered, you are better off with numerical dummy coding or simply using mixed Euclidean distance (which assumes a distance of 1 between all nominal values that are not the same, precisely to avoid this problem).
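
To make the distortion concrete, here is a small sketch in plain Python. It is a simplified reading of the three options mentioned above, not RapidMiner's implementation: `mixed_euclidean` is a hypothetical helper that treats any two different nominal values as being at distance 1 and uses the squared difference for numerical values.

```python
import math

def euclidean(a, b):
    """Ordinary Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mixed_euclidean(a, b):
    """Simplified mixed measure: nominal (string) values contribute 0 if
    equal and 1 otherwise; numeric values contribute their squared difference."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0
        else:
            total += (x - y) ** 2
    return math.sqrt(total)

categories = ["sandwich", "bread", "butter"]

# Unique integers: sandwich=1, bread=2, butter=3.
int_code = {c: [float(i + 1)] for i, c in enumerate(categories)}

# Dummy coding: one 0/1 column per category.
dummy_code = {
    c: [1.0 if i == j else 0.0 for j in range(len(categories))]
    for i, c in enumerate(categories)
}

pairs = [("sandwich", "bread"), ("bread", "butter"), ("sandwich", "butter")]
for a, b in pairs:
    print(
        a, b,
        euclidean(int_code[a], int_code[b]),      # depends on arbitrary code order
        euclidean(dummy_code[a], dummy_code[b]),  # same for every distinct pair
        mixed_euclidean([a], [b]),                # 1 for every distinct pair
    )
```

With unique integers, sandwich and butter end up twice as far apart as sandwich and bread purely because of the order in which codes were assigned. Dummy coding and the mixed measure both keep every pair of distinct categories equally far apart.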

Brian T.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
• Options
Member Posts: 5 Contributor II

Thanks. I originally used dummy coding, but it blows up the record, as I have lots of unordered nominal values. I will try using mixed Euclidean distance. How does one use this?

• Options
RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

You could use effect coding too, assuming you don't have too many nominal values per attribute.
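
For readers unfamiliar with effect coding, a minimal sketch of the general technique follows. It is written in plain Python and is not RapidMiner's operator: `effect_code` and its `reference` argument are hypothetical names. An attribute with k values becomes k-1 columns; each non-reference value gets a 1 in its own column, and the reference value gets -1 in every column.

```python
def effect_code(values, reference):
    """Effect-code a list of nominal values against a reference category.

    Returns the column order (non-reference levels in first-seen order)
    and one coded row per input value.
    """
    levels = [v for v in dict.fromkeys(values) if v != reference]
    coded = []
    for v in values:
        if v == reference:
            # The reference category is -1 in every column.
            coded.append([-1] * len(levels))
        else:
            # Other categories are 1 in their own column and 0 elsewhere.
            coded.append([1 if v == lvl else 0 for lvl in levels])
    return levels, coded

levels, coded = effect_code(
    ["sandwich", "bread", "butter", "bread"], reference="butter"
)
print(levels)
print(coded)
```

Because each attribute still expands to k-1 columns, this widens the data almost as much as dummy coding, which is why it only helps when the number of nominal values per attribute is small.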

• Options
Member Posts: 5 Contributor II

Never mind, I figured out how to use mixed Euclidean distance.

• Options
Member Posts: 4 Contributor I
I have this problem too. I tried the proposed solution with the Multiply operator, but the final result is just the example set with unique integer values (I can't interpret the data very well with these values in it). I even generated an id attribute prior to the Multiply operator and used the Join operator at the end of the process, but I couldn't get the nominal values back. Does anyone have an idea what I am doing wrong?
Thanks!
• Options
Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
Hi @laavila, sorry, this is an old thread. Can you please post your process XML so we can see what you're doing? Scott
• Options
Member Posts: 1 Learner I
Hello all,

Is there a currently accepted solution for the latest version of the program?
How can this be done in 2020?
Does the methodology mentioned above still work?

If possible please provide the diagram!