
Reverse map a nominal to numerical transform

labbronx Member Posts: 5 Contributor I
edited June 2020 in Help

I am using k-means to cluster my data. To do so, I transformed my nominal values into numerical ones using the Nominal to Numerical operator, with the coding type parameter set to "unique integers". How do I reverse this transformation so that in the output I can see what these values were before they were transformed? For example, if "sandwich" gets mapped to 0, I would like to map 0 back to "sandwich".


Best Answer

  • FBT Member Posts: 106   Unicorn
    Solution Accepted

    It may not be the most elegant solution, but what you could do is the following:

     

    Multiply your example set prior to the type conversion. Connect the first output of the Multiply operator to your current process, then add a Join operator after that process and connect the resulting example set to its left port. Connect the second output of Multiply to the right port of the Join.

     

    You will need an id attribute on which to make the join, and you may want to do some pre-processing (renaming attributes, etc.).
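
As a rough illustration of the Multiply-and-Join idea outside RapidMiner, here is a minimal Python sketch. The attribute names (`id`, `food`), the helper `encode_unique_integers`, and the stand-in clustering step are all hypothetical, and the order in which integers are assigned is an assumption, not RapidMiner's documented behavior.

```python
# Original example set. Each row carries an id so the two copies can be joined later.
rows = [
    {"id": 1, "food": "sandwich"},
    {"id": 2, "food": "bread"},
    {"id": 3, "food": "sandwich"},
    {"id": 4, "food": "butter"},
]


def encode_unique_integers(rows, attribute):
    """Return encoded copies of the rows plus the value-to-integer mapping.

    Assumption: integers are assigned in order of first appearance.
    """
    mapping = {}
    encoded = []
    for row in rows:
        code = mapping.setdefault(row[attribute], len(mapping))
        encoded.append({**row, attribute: code})
    return encoded, mapping


# "Multiply": keep `rows` untouched and work on an encoded copy.
encoded, mapping = encode_unique_integers(rows, "food")

# Stand-in for the clustering step (hypothetical, not k-means).
clustered = [{**row, "cluster": row["food"] % 2} for row in encoded]

# "Join": look up the untouched copy by id to restore the nominal value.
originals_by_id = {row["id"]: row for row in rows}
joined = [
    {
        "id": row["id"],
        "cluster": row["cluster"],
        "food_code": row["food"],
        "food": originals_by_id[row["id"]]["food"],
    }
    for row in clustered
]

for row in joined:
    print(row)
```

If the mapping itself is available, inverting it (`{code: value for value, code in mapping.items()}`) would give the same result without a join; the join is only needed when the mapping is not exposed.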


Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,760   Unicorn
    That's how I usually handled it.
  • labbronx Member Posts: 5 Contributor I

    Thanks, that works. I would never have thought of it.

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,625   Unicorn

    Be very careful with "unique integers" mapping if your nominal categories are not inherently ordinal.  For example, if you have sandwich, bread, and butter mapped as 1, 2, and 3, then k-means thinks that the distance between 1 and 3 is larger than the distance between 1 and 2 or 2 and 3.  But for non-ordered categories, this doesn't make any sense and can lead to strange and distorted results when clustering.  If your nominal categories are not ordered, you are better off with numerical dummy coding or simply using mixed Euclidean distance (which assumes a distance of 1 between all nominal values that are not the same, precisely to avoid this problem).

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
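
The distance argument in the post above can be checked with a few lines of Python. This is only a sketch of the idea: the function names are hypothetical, the integer codes 1, 2, 3 are taken from the example in the post, and `mixed_euclidean` follows the description given there (a distance of 1 between nominal values that differ), not RapidMiner's actual implementation.

```python
import math

categories = ["sandwich", "bread", "butter"]
integer_code = {"sandwich": 1, "bread": 2, "butter": 3}


def integer_distance(a, b):
    """Distance between two categories under unique-integer coding."""
    return abs(integer_code[a] - integer_code[b])


def dummy_vector(value):
    """One 0/1 column per category (dummy coding)."""
    return [1.0 if value == c else 0.0 for c in categories]


def dummy_distance(a, b):
    """Euclidean distance between the dummy-coded vectors."""
    return math.dist(dummy_vector(a), dummy_vector(b))


def mixed_euclidean(x, y):
    """Euclidean distance where nominal (str) attributes contribute 0 or 1."""
    total = 0.0
    for xv, yv in zip(x, y):
        if isinstance(xv, str):
            total += 0.0 if xv == yv else 1.0
        else:
            total += (xv - yv) ** 2
    return math.sqrt(total)


# Integer coding makes sandwich/butter look twice as far apart as sandwich/bread.
print(integer_distance("sandwich", "bread"), integer_distance("sandwich", "butter"))

# Dummy coding and the mixed measure treat every pair of different values alike.
print(dummy_distance("sandwich", "bread"), dummy_distance("sandwich", "butter"))
print(mixed_euclidean(("sandwich", 3.0), ("butter", 0.0)))
```

Under dummy coding every pair of different categories ends up the same distance apart, which is the property the unique-integer coding lacks.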
  • labbronx Member Posts: 5 Contributor I

    Thanks. I originally used dummy coding, but it blows up the size of each record, as I have lots of unordered nominal values. I will try using mixed Euclidean distance. How does one use this?

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,760   Unicorn

    You could use effect coding too, assuming you don't have too many nominal values per attribute.
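
For readers unfamiliar with effect coding, here is a minimal Python sketch of the general technique. The function name `effect_code` and the choice of "butter" as the reference category are assumptions for illustration; how RapidMiner picks the comparison group is not covered in this thread.

```python
def effect_code(value, categories, reference):
    """Effect-code one nominal value.

    Each non-reference category gets its own column. The reference
    category is coded as -1 in every column.
    """
    columns = [c for c in categories if c != reference]
    if value == reference:
        return [-1] * len(columns)
    return [1 if value == c else 0 for c in columns]


categories = ["sandwich", "bread", "butter"]
for value in categories:
    print(value, effect_code(value, categories, reference="butter"))
```

An attribute with k values produces k - 1 columns, so this saves one column per attribute compared with full dummy coding but still grows with the number of nominal values.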

  • labbronx Member Posts: 5 Contributor I

    Never mind, I figured out how to use mixed Euclidean distance.

  • laavila Member Posts: 4 Contributor I
    I have this problem too. I tried the proposed solution with the Multiply operator, but the final result is just the example set with the unique integer values, which makes the data hard to interpret. I even generated an id attribute before the Multiply operator and used the Join operator after the rest of the process, but I couldn't get the nominal values back. Does anyone have an idea what I am doing wrong?
    Thanks! 
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959  Community Manager
    Hi @laavila, sorry, this is an old thread. Can you please post your process XML so we can see what you're doing? Scott
  • jm_echeverria40 Member Posts: 1 Learner I
    Hello all,

    Is there a currently accepted solution in the latest version of the program?
    How can this be done in 2020?
    Does the methodology mentioned above still work?

    If possible please provide the diagram!