Low medium high Dataset to predict other dataset values

njhelloworld Member Posts: 16 Contributor I
edited November 2018 in Help

I have a Nitrogen attribute with the nominal values Low, Medium, and High:

Nitrogen

Low

Medium

Low

Low

Medium

High

 

In the other dataset I have the equivalent numeric ranges: Low: 0-15, Medium: 15-30, High: 30+. I also have another attribute, SoilPh, with numeric values; it is the basis on which Nitrogen becomes Low, Medium, or High, that is, these values depend on SoilPh. Now I want to predict the numerical values of my Nitrogen attribute based on the ranges in the other dataset (Low: 0-15, Medium: 15-30, High: 30+). Is this possible? I am a newbie to RapidMiner and data mining, so I hope you will all give me a chance.

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @njhelloworld,

     

    To better understand, could you please share your two datasets?

     

    Regards,

     

    Lionel

  • njhelloworld Member Posts: 16 Contributor I

    @lionelderkrikor

    These are my Data:

    Here is my first data file, named nitrogen-Cleaned, which has Nitrogen values of Low, Medium, and High. I want to determine the specific numerical equivalent of each value using the other Excel file, named nitrogen, which contains the equivalent ranges. Is this possible? Can the approach in this video be applied: https://www.youtube.com/watch?v=EKK8X-1oaH8 ? Thanks for any help.

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @njhelloworld,

     

    A few things:

    1. As mentioned in the YouTube video, the Nominal to Numerical operator converts nominal attributes into numerical attributes, for cases where you use algorithms that do not work with nominal attributes.

    For example, in your case, this operator (by choosing Unique Integers) will transform Low, Medium, High into 0, 1, 2.

    But if I understand correctly, that is not what you want to do.

    2. In your nitrogen-Cleaned file, it is impossible to retrieve, for a specific observation, the original numerical value of the nitrogen. You only know that it was between 0 and 15 if Nitrogen = Low, between 15 and 30 if Nitrogen = Medium, or between 31 and 46 if Nitrogen = High.

    However, it is possible to associate each value (Low, Medium, High) with a relevant numeric value, for example the average (midpoint) of the range, that is:

    Low -> 7.5 units (or any value in the range [0, 15])

    Medium -> 22.5 units (or any value in the range [15, 30])

    High -> 38.5 units (or any value in the range [31, 46])

    To perform this association, you can use the Map operator.

     

    I hope it helps,

     

    Regards,

     

    Lionel
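
Outside RapidMiner, the association described in point 2 can be sketched in a few lines of Python. This is only an illustration of the idea, not the Map operator itself: the names `RANGE_MIDPOINTS` and `to_midpoint` are made up for this example, and the ranges are the ones quoted in the reply above (0-15, 15-30, 31-46).

```python
# Hypothetical lookup table: each nominal level maps to the midpoint of its range.
# Ranges are the ones quoted in the reply: Low 0-15, Medium 15-30, High 31-46.
RANGE_MIDPOINTS = {
    "Low": (0 + 15) / 2,      # 7.5
    "Medium": (15 + 30) / 2,  # 22.5
    "High": (31 + 46) / 2,    # 38.5
}


def to_midpoint(level):
    """Return the representative numeric value for a nominal Nitrogen level."""
    return RANGE_MIDPOINTS[level]


# The example column from the original question.
nitrogen = ["Low", "Medium", "Low", "Low", "Medium", "High"]
nitrogen_numeric = [to_midpoint(level) for level in nitrogen]
print(nitrogen_numeric)
```

Every Low row receives the same number, so this gives a representative value per category rather than recovering the original measurements.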

     

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @lionelderkrikor @njhelloworld I want to give a warning about using the Nominal to Numerical operator. Use the default "dummy coding." If you use unique integers, you are implying an order. For example, if your training set had Nitrogen, Carbon, Oxygen, it would convert them to 1, 2, 3. Likewise, if your scoring data had Oxygen, Nitrogen, Carbon, it would convert them, in order, to 1, 2, 3 as well. This could cause bad predictions on the scoring set, because the model only sees 1, 2, 3, which now stand for different categories.
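
The mismatch described in this warning can be reproduced with a small Python sketch. It assumes, as the reply describes, that integers are handed out in order of first appearance; it is not RapidMiner's actual implementation, and both function names are hypothetical.

```python
def unique_integer_coding(values):
    """Assign 1, 2, 3, ... to categories in order of first appearance (assumed rule)."""
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1
    return codes


def dummy_coding(value, categories):
    """Return one 0/1 indicator per category, keyed by the category name."""
    return {category: int(value == category) for category in categories}


training = ["Nitrogen", "Carbon", "Oxygen"]
scoring = ["Oxygen", "Nitrogen", "Carbon"]

train_codes = unique_integer_coding(training)
score_codes = unique_integer_coding(scoring)
print(train_codes)  # Oxygen is coded 3 here
print(score_codes)  # Oxygen is coded 1 here

# Dummy coding is keyed by name, so row order does not change the meaning.
categories = sorted(set(training))
print(dummy_coding("Oxygen", categories))
```

With integer codes the same number refers to different categories in the two sets, while the dummy-coded indicators keep their meaning regardless of the order in which the values appear.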

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Ok, thanks @Thomas_Ott. I understand.

    I will apply your advice.

    That explains why 80% of RapidMiner users choose "dummy coding" (according to RapidMiner's statistics) when they use "Nominal to Numerical".

     

    Regards,

     

    Lionel
