Classification with ordinal data

MaltePetersen Member Posts: 7 Contributor I
edited October 7 in Help
I am new to data science and RapidMiner. I made a prediction with Auto Model on a dataset that consists of nominal and ordinal data. I read online that classification is normally done only with nominal data. This raises the question: can my classification be accurate? And which method would be the right one for my use case?

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,636  Community Manager
    @MaltePetersen just to be clear, you are asking about ordinal data (e.g. 1st, 2nd, 3rd, etc.) rather than numerical data (1, 2, 3, etc.)?
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,702  RM Founder
    edited October 7
    First of all: there are only a few ML methods out there that can deal with ordinal values out of the box. All of them focus on ordinal labels / target columns though, not really on ordinal attributes. In my opinion and experience, treating ordinal attributes simply as nominal is the way to go and much better than treating them as numerical, exactly for the reason Varun has mentioned.
    If you treat them as nominal, then some ML algos will handle each category on its own (like decision trees), while others will transform them, typically with one-hot encoding, which essentially leads to treating each category on its own again. While you are not using the specific relationships between the values this way, all ML algos are powerful enough to assign importance to the values accordingly, and I have yet to see a case where this did not work.
    Having ordinal labels (aka target columns) is a different story though! Here, there would be some benefit if the algorithm made use of the relationships between the ordinal values. However, in 20 years I have only had a handful of requests for this, so most people seem to do just fine treating it as a regular classification problem B)
    Hope this perspective helps,
    Ingo
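
To make the one-hot-encoding idea mentioned above concrete, here is a minimal Python sketch. It is an illustration only, not what RapidMiner does internally; the `one_hot` helper and the 0 to 10 rating scale are assumptions made for the example.

```python
def one_hot(value, categories):
    """Return a 0/1 indicator list with a single 1 at the position of value."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Assumed example scale: ratings "0" .. "10", stored as nominal strings.
ratings = [str(r) for r in range(11)]

# Each rating becomes its own indicator column, so a learner can weight
# "7" independently of "6" and "8"; the ordering itself is not used.
encoded = one_hot("7", ratings)
print(encoded)  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

Because every category gets its own column, the order between the values is lost, which is the trade-off described in the post.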

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,277   Unicorn
    With ordinal data, I would add my vote to those who recommend generally treating these as nominal rather than as numerical data.  At the very least, you are not likely to do any damage this way, although you may lose some potentially useful relationships.
    One other caution though is that you should probably look at the number of distinct categories that you have.  If you have very many categories, and the relationship is fairly linear, then that might be an argument for treating the data as numerical.  Otherwise, you may need to consider binning or other combinations of values to get the most stability out of the model.  Having an attribute with too many nominal values (whether as a predictor, or even worse, as the label) can definitely cause complications, instability, or deterioration in performance.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
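
The binning suggestion above could look roughly like the following Python sketch. The cut points (3 and 7), the bin names, and the `bin_rating` helper are hypothetical choices for illustration; sensible bins depend on the actual data.

```python
def bin_rating(rating, edges=(3, 7), names=("low", "medium", "high")):
    """Map a numeric rating to a coarser category.

    Each edge is an inclusive upper bound for the bin with the same index;
    anything above the last edge falls into the final bin.
    """
    for edge, name in zip(edges, names):
        if rating <= edge:
            return name
    return names[-1]


# With the assumed edges: 0-3 -> low, 4-7 -> medium, 8 and above -> high.
binned = [bin_rating(r) for r in (0, 3, 4, 7, 8, 10)]
print(binned)  # ['low', 'low', 'medium', 'medium', 'high', 'high']
```

Fewer categories means more examples per category, which is where the extra stability would come from.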
  • MaltePetersen Member Posts: 7 Contributor I
    First of all, I am really sorry that I took so long to answer. I did not expect any answers, let alone such good ones!
    So, to specify my request further: the dataset I am analysing is about speed dating. My ordinal data mostly describes how the participants rate their partner, for example a rating of the partner's appearance or humour between 0 and 10. With this data we try to find out which attributes carry the most weight and try to predict new data.
  • varunm1 Moderator, Member Posts: 964   Unicorn
    Hello @MaltePetersen

    My preference is to treat them as nominal, as mentioned earlier, since 10 is not a huge number of categories. Please feel free to ask anything you need; we are happy to help.
  • MaltePetersen Member Posts: 7 Contributor I
    Hey, one more follow-up question: how can I transform my ordinal data to nominal data? I tried to do it in Turbo Prep, but if I click on Change Type it does not give me any options to change the type.
  • varunm1 Moderator, Member Posts: 964   Unicorn
    Hello @MaltePetersen

    How did RapidMiner read your data? Is it in numerical form or nominal form?

    How to check this: in Turbo Prep, you can see the data type under the attribute name.


  • MaltePetersen Member Posts: 7 Contributor I
    Partially numbers and partially categories, but all numbers should actually be categories/ordinal data to begin with. If they are number attributes I cannot change the type at all, and if they are categories I can only change them to numbers or dates.
  • varunm1 Moderator, Member Posts: 964   Unicorn
    Hello @MaltePetersen

    You can select the attributes with the "number" type, then choose "Transform" and "Change Type" and set the type to category. Here the category data type means nominal.
  • MaltePetersen Member Posts: 7 Contributor I
    @varunm1
     I tried that, but Turbo Prep is not giving me that option.
  • MaltePetersen Member Posts: 7 Contributor I
    @varunm1 Yes, now I see it, sorry. I am trying that now. So categories do not have a natural order and are therefore nominal data, right?
  • varunm1 Moderator, Member Posts: 964   Unicorn
    Yep, that is correct.
  • MaltePetersen Member Posts: 7 Contributor I
    @varunm1 So right now I am only using nominal attributes. I should only use two models for my paper. Is there a model from Auto Model that is especially fitting for my use case?
    Or would the better approach be to look at which model has the lowest classification error and then decide on that model?
  • varunm1 Moderator, Member Posts: 964   Unicorn
    You can look at model performances after running, but generally, for completely nominal data, I first focus on Logistic Regression and Naive Bayes (it is easy for Naive Bayes to deal with nominal data). A Decision Tree is useful for better understanding as well.
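
As a rough illustration of why Naive Bayes copes easily with nominal attributes, and of how classification error can be compared, here is a small pure-Python sketch. It is not Auto Model's implementation; the function names, the Laplace smoothing value, and the tiny speed-dating-style dataset are all made up for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count how often each nominal value occurs per class (alpha = Laplace smoothing)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value -> count
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen, alpha


def predict(model, row):
    """Return the label with the highest smoothed log-probability."""
    class_counts, value_counts, values_seen, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def classification_error(model, rows, labels):
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(predict(model, r) != y for r, y in zip(rows, labels))
    return wrong / len(labels)


def majority_baseline_error(train_labels, test_labels):
    """Error of always predicting the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(y != majority for y in test_labels) / len(test_labels)


# Hypothetical toy data: (appearance rating, humour rating) as nominal strings.
train_rows = [("8", "7"), ("9", "6"), ("7", "8"), ("3", "4"), ("2", "5"), ("4", "3")]
train_labels = ["yes", "yes", "yes", "no", "no", "no"]
test_rows = [("8", "6"), ("3", "3")]
test_labels = ["yes", "no"]

model = train_naive_bayes(train_rows, train_labels)
nb_error = classification_error(model, test_rows, test_labels)
baseline_error = majority_baseline_error(train_labels, test_labels)
print(nb_error, baseline_error)  # 0.0 0.5
```

On real data the errors should come from a proper hold-out set or cross-validation, and the model with the lowest error is only a better choice if the difference is larger than the variation between validation runs.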