🦉 🎤   RapidMiner Wisdom 2020 - CALL FOR SPEAKERS   🦉 🎤

We are inviting all community members to submit proposals to speak at Wisdom 2020 in Boston.


Whether it's a cool RapidMiner trick or a use case implementation, we want to see what you have.
Form link is below and deadline for submissions is November 15. See you in Boston!

CLICK HERE TO GO TO ENTRY FORM

Classification with ordinal data

MaltePetersen Member Posts: 2 Newbie
edited October 7 in Help
I am new to data science and RapidMiner. I made a prediction with Auto Model on a dataset that consists of nominal and ordinal data. I read online that classification is normally done only with nominal data. This raises the question: can my classification be accurate? And which method would be the right one for my use case?

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,525  Community Manager
    @MaltePetersen just to be clear, you are asking about ordinal data (e.g. 1st, 2nd, 3rd, etc.) rather than numerical data (1, 2, 3, etc.)?
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,666  RM Founder
    edited October 7
    First of all: there are only a few ML methods out there that can deal with ordinal values out of the box.  All of them focus more on ordinal labels / target columns though, not really on ordinal attributes.  In my opinion and experience, treating ordinal attributes simply as nominal is the way to go, and much better than treating them as numerical, for exactly the reason Varun has mentioned.
    If you treat them as nominal, then ML algorithms will either handle each category on its own (like decision trees) or transform the attribute, typically with one-hot encoding, which essentially leads to treating each category on its own again.  While you are not using the specific relationships between the values this way, ML algorithms are powerful enough to assign importance to the values accordingly, and I have yet to see a case where this did not work.
    Having ordinal labels (aka target columns) is a different story though!  Here, there would be some benefit if the algorithm made use of the relationships between the ordinal values.  However, in 20 years I have only had a handful of requests for this, so most people seem to do just fine treating this as a regular classification problem B)
    Hope this perspective helps,
    Ingo
    RapidMiner Wisdom 2020
    February 11th and 12th 2020 in Boston, MA, USA
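
The difference between the nominal and numerical treatments discussed in this answer can be sketched in plain Python. This is only an illustration, not how RapidMiner implements its encoding: the attribute levels, the sample values, and the helper name `one_hot` are all assumptions made up for the example.

```python
def one_hot(values, categories):
    """Return one 0/1 indicator row per value, with one column per category."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        rows.append(row)
    return rows


# Hypothetical ordinal attribute with three ordered levels.
levels = ["low", "medium", "high"]
ratings = ["medium", "high", "low", "medium"]

# Nominal treatment: every category gets its own column, the order is ignored.
nominal_encoding = one_hot(ratings, levels)

# Numerical treatment: a single column that assumes equal spacing between levels.
numeric_encoding = [levels.index(r) for r in ratings]

print(nominal_encoding)
print(numeric_encoding)
```

With the nominal encoding a learner can weight each level independently; with the numerical encoding it has to assume that the step from "low" to "medium" is the same size as the step from "medium" to "high".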

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,256   Unicorn
    With ordinal data, I would add my vote to those who recommend generally treating these as nominal rather than as numerical data.  At the very least, you are not likely to do any damage this way, although you may lose some potentially useful relationships.
    One other caution though is that you should probably look at the number of distinct categories that you have.  If you have very many categories, and the relationship is fairly linear, then that might be an argument for treating the data as numerical.  Otherwise, you may need to consider binning or other combinations of values to get the most stability out of the model.  Having an attribute with too many nominal values (whether as a predictor, or even worse, as the label) can definitely cause complications, instability, or deterioration in performance.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
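
The binning idea mentioned in this answer can be sketched as follows. The 0 to 10 scale, the cut points 3 and 7, and the function name `bin_rating` are assumptions chosen for illustration; sensible cut points depend on the data.

```python
def bin_rating(score, edges=(3, 7)):
    """Map a numeric score to a coarse label; the cut points are illustrative only."""
    low_max, medium_max = edges
    if score <= low_max:
        return "low"
    if score <= medium_max:
        return "medium"
    return "high"


# Hypothetical scores on a 0-10 scale.
scores = [0, 2, 5, 7, 8, 10]
binned = [bin_rating(s) for s in scores]

# Compare the number of distinct categories before and after binning.
print(len(set(scores)), "->", len(set(binned)))
print(binned)
```

Fewer categories means more examples per category, which is the stability benefit described above, at the cost of discarding the finer distinctions within each bin.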
  • MaltePetersen Member Posts: 2 Newbie
    First of all, I am really sorry that I took so long to reply. I did not expect any answers at all, let alone such good ones!
    To specify my request further: the dataset I am analysing is about speed dating. My ordinal data mostly describes how the participants rate their partner, for example a rating of the partner's appearance or humour between 0 and 10. With this data we are trying to find out which attributes carry the most weight and to make predictions for new data.
  • varunm1 Moderator, Member Posts: 840   Unicorn
    Hello @MaltePetersen

    My preference is to treat them as nominal, as mentioned earlier, since 10 is not a huge number of categories. Please feel free to ask anything you need; we are happy to help.
    Regards,
    Varun
    https://www.varunmandalapu.com/