
How to reconvert from numerical to nominal

jctorresp Member Posts: 3 Learner I
edited February 2020 in Help
Hi,

I am writing my thesis on data mining, so I had to convert some data from nominal to numerical. After that, I exported the data to CSV and processed it in Python. Now the data is in a new order and I need to convert it back to nominal values. I was looking for a way to save a map, or something similar, that holds the original conversion, for example:
column genre:
male->1
female->2
other->3

If I had that mapping, I could convert back from numerical to nominal, but I couldn't find a way to do that.

I should mention that I had to convert several columns, so I need something like one map per column.

Thanks for your help
Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hi @jctorresp

    Have you looked at the Map operator in RapidMiner? It can be applied to both numerical and nominal values.


    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • jctorresp Member Posts: 3 Learner I
    The problem is that I have 10 columns and each column can have different values; some columns have around 7 possible values. I also need to run the same process on another data set, so setting up a dictionary manually for each column is a lot of work. In the end, I think I will export the result of the Nominal to Numerical operator and do this step in Python.
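The per-column map asked about in this thread could be kept on the Python side along the following lines. This is only a minimal sketch using the standard library: the function names, the `city` column, and the sample rows are made up for illustration, and it assumes the integer codes are assigned in Python (or typed in by hand from the exported data), not read from RapidMiner.

```python
import json

def build_mappings(rows, columns):
    """Give each distinct value in each column an integer code (1, 2, 3, ...)
    in order of first appearance. Returns one dict per column."""
    mappings = {col: {} for col in columns}
    for row in rows:
        for col in columns:
            value = row[col]
            if value not in mappings[col]:
                mappings[col][value] = len(mappings[col]) + 1
    return mappings

def encode(rows, mappings):
    """Replace nominal values with their integer codes."""
    return [{col: mappings[col][row[col]] for col in mappings} for row in rows]

def decode(rows, mappings):
    """Replace integer codes with the original nominal values."""
    inverse = {
        col: {code: value for value, code in m.items()}
        for col, m in mappings.items()
    }
    return [{col: inverse[col][row[col]] for col in inverse} for row in rows]

# Hypothetical sample data with two nominal columns.
rows = [
    {"genre": "male", "city": "Lima"},
    {"genre": "female", "city": "Quito"},
    {"genre": "other", "city": "Lima"},
]

mappings = build_mappings(rows, ["genre", "city"])
encoded = encode(rows, mappings)

# The mappings survive a round trip through JSON, so they can be saved
# next to the exported CSV and reloaded later.
restored = json.loads(json.dumps(mappings))

# Even if the rows come back in a different order, decoding still works
# because each code is looked up in its own column's map.
reordered = list(reversed(encoded))
decoded = decode(reordered, restored)
print(decoded[0])
```

Storing the map as value-to-code keeps the JSON keys as strings, so the inverse (code-to-value) map is rebuilt after loading rather than saved directly.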
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Not to throw a monkey wrench in here, but why did you need to convert nominal data to integer coding in the first place? Integer coding as you have described it is usually not recommended for truly nominal data (like gender), as opposed to ordinal data, because with coefficient-based algorithms it implies numerical relationships that don't actually exist in the underlying categories. So you should probably be using dummy coding or effect coding instead of integer coding.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
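To illustrate the dummy coding suggested above, here is a small sketch in plain Python. The function name `dummy_code` and its `drop_first` parameter are made up for this example; with `drop_first=True` the first category acts as the reference level and gets all zeros, which is the usual form for coefficient-based models.

```python
def dummy_code(values, drop_first=False):
    """Turn a list of nominal values into 0/1 indicator columns.

    Returns the list of categories that got a column, plus one row of
    indicators per input value.
    """
    categories = sorted(set(values))
    if drop_first:
        # The dropped category becomes the reference level (all zeros).
        categories = categories[1:]
    rows = [[1 if value == cat else 0 for cat in categories] for value in values]
    return categories, rows

colors = ["red", "green", "yellow", "blue", "green"]

categories, encoded = dummy_code(colors)
print(categories)  # column order
print(encoded[0])  # indicators for "red"

ref_categories, ref_encoded = dummy_code(colors, drop_first=True)
print(ref_encoded[3])  # "blue" is the reference level here
```

No category ends up numerically "between" two others, which is the property integer coding lacks.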
  • jctorresp Member Posts: 3 Learner I
    I am working on clustering. I need to separate the data into different clusters, but most of the columns are categorical, so I had to use k-modes, which is a variation of the k-means algorithm. The first step there is to convert the data to numerical to improve the process.
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    If you convert to integer coding, you are not representing the data in a way that is consistent with nominal categories. For example, if you have 4 nominal categories where the underlying data is not ordinal in any way (like the colors red, green, yellow, and blue), recode them as {1,2,3,4}, and then use that numerical value in any distance calculation, you are basically saying that the 1st and 4th values are much farther apart than the 2nd and 3rd values, when that isn't the case.
    In RapidMiner, the k-medoids (I assume that is what you are referring to, since there is no k-modes operator) and k-means operators both handle nominal data just fine. Just set the distance measure types parameter to Mixed Measures, and also make sure you normalize your other numerical data (which you should do anyway whenever you are doing distance calculations).
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
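The color example above can be checked numerically. The sketch below compares the distance implied by integer coding with a simple match/mismatch distance, and then combines a nominal attribute with a min-max normalized numerical one. All function names and the age values are made up for illustration, and the mixed distance is only one plausible way to combine the two kinds of attributes, not a description of how RapidMiner's Mixed Measures option is implemented.

```python
import math

codes = {"red": 1, "green": 2, "yellow": 3, "blue": 4}

def integer_distance(a, b):
    """Distance implied by treating the integer codes as real numbers."""
    return abs(codes[a] - codes[b])

def matching_distance(a, b):
    """Nominal distance: 0 if the categories match, 1 otherwise."""
    return 0 if a == b else 1

# Integer coding makes red/blue look three times as far apart as green/yellow.
print(integer_distance("red", "blue"), integer_distance("green", "yellow"))
# With a matching distance, every pair of different colors is equally far apart.
print(matching_distance("red", "blue"), matching_distance("green", "yellow"))

def min_max_normalize(values):
    """Rescale numerical values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def mixed_distance(x, y):
    """Euclidean-style distance over mixed attributes: numerical attributes
    contribute a squared difference, nominal ones contribute 0 or 1."""
    total = 0.0
    for a, b in zip(x, y):
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            total += (a - b) ** 2
        else:
            total += matching_distance(a, b)
    return math.sqrt(total)

# Hypothetical records: (color, age). Ages are normalized first so that the
# numerical attribute cannot dominate the 0/1 nominal contribution.
ages = min_max_normalize([20, 30, 40])
records = list(zip(["red", "blue", "red"], ages))
print(mixed_distance(records[0], records[1]))
print(mixed_distance(records[0], records[2]))
```

Without the normalization step, a raw age difference of 20 would swamp the nominal attribute entirely, which is why normalizing numerical columns matters for any distance-based method.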