Continuous vs Categorical

k_vishnu772 Member Posts: 34 Contributor I
edited December 2018 in Help

Hi All,

I have a small question about variable types. I have a continuous variable called temperature that takes only two values, {90, 220}, across my entire data set.

I am a little confused about whether to treat this feature as categorical, since it only ever has two values in my data set, or as continuous.

Does this choice have any influence on model performance?

Thanks in advance.

Regards,

Vishnu

Best Answer

  • sgenzer Posts: 2,446  Community Manager
    Solution Accepted

    Hmm, OK. Basically, it depends on whether you care that 90 is less than 220. If you treat the attribute as binary (nominal), RapidMiner will just see the two values as labels, like "apple" and "orange". If you want to use the fact that 220 is greater than 90 for some reason, you should keep it numerical.
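
    To make the distinction concrete, here is a minimal sketch in plain Python (not a RapidMiner process). The readings and the `temp_90` / `temp_220` label names are made up for illustration.

```python
# Hypothetical temperature readings; only two distinct values occur.
temps = [90, 220, 220, 90, 220]

# Nominal view: each value is just a label, with no order or distance.
as_nominal = [f"temp_{t}" for t in temps]
one_hot = [{"temp_90": int(t == 90), "temp_220": int(t == 220)} for t in temps]

# Numerical view: order and distance between the values are preserved.
as_numeric = [float(t) for t in temps]
gap = max(as_numeric) - min(as_numeric)

print(as_nominal[0], one_hot[0])
print(as_numeric[0] < as_numeric[1], gap)
```

    In the nominal view nothing says that one label is "more" than the other; in the numerical view the model can see both the order and the size of the gap.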

     

    That's all I can really say off the cuff unless I know more about your use case.


    Scott

     

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,446  Community Manager

    hello @k_vishnu772 - so that really, really depends on your use case. I could make arguments one way or the other, depending on what you want to do.

     

    Some quick requests so we can help you:


    • Post your XML process here in this thread (see this post for instructions on How to Post on the Community)
    • Attach your dataset if possible (use a fictionalized version if there are privacy concerns)
    • Make sure you have all necessary extensions installed (see https://youtu.be/pjBqG3xtXx4)

    Scott

  • k_vishnu772 Member Posts: 34 Contributor I

    @sgenzer

     

    Hi Sir, thanks for your reply. I cannot disclose the data, as I do not have the right to do that. I just want to understand why you say it depends on the use case. Could you please describe a use case you have in mind, so that I can relate it to my problem?

     

    Thanks in advance.

     

    Regards,

    Vishnu

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,230   Unicorn

    You also need to think about other potential values in the data if you are going to apply the model to future samples. If you treat the attribute as nominal in your development data, then any future value that is not exactly 90 or 220 may be something your model cannot handle. So I would recommend either keeping the temperature numerical, or at least binning it using one of the Discretize operators (e.g., you could make temperature <100 vs >=100), because that way you will be able to handle future numerical values that were not present in your development sample.
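
    As a rough illustration of why binning is more robust than a nominal encoding, here is a small plain-Python sketch. It is not a RapidMiner Discretize operator; `encode_nominal`, `discretize`, the bin names, and the future values are hypothetical, and the threshold of 100 is the one suggested above.

```python
def encode_nominal(value, known=(90, 220)):
    """Nominal encoding: only values seen in the development data are valid."""
    if value not in known:
        raise ValueError(f"unseen category: {value}")
    return f"temp_{value}"


def discretize(value, threshold=100):
    """Binned encoding: any numeric value falls into one of two bins."""
    return "low" if value < threshold else "high"


# Hypothetical future readings, including values never seen before.
future_values = [90, 220, 95, 180]

binned = [discretize(v) for v in future_values]
print(binned)

for v in future_values:
    try:
        print(v, encode_nominal(v))
    except ValueError as err:
        print(v, "failed:", err)
```

    The binned version assigns 95 and 180 to a bin without complaint, while the nominal lookup has no label for them.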

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • SGolbert RapidMiner Certified Analyst, Member Posts: 340   Unicorn

    Hi all,

     

    Good observation about binning!

     

    If you keep the variable numerical, then 220 is not only bigger than 90, it is more than double! This could mess with some (linear) model types.
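
    How much the scale matters depends on the model. As a hedged sketch with made-up target values: for a plain least-squares line with an intercept and a feature that has only two distinct values, recoding 90/220 as 0/1 changes the coefficient but not the predictions. Models that are sensitive to feature scale (for example regularized or distance-based ones) can behave differently. The helper `fit_line` below is illustrative, not a RapidMiner operator.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


temps = [90, 90, 220, 220]
target = [1.0, 3.0, 7.0, 9.0]  # made-up values for illustration

# Fit on the raw numeric temperature.
slope_raw, icpt_raw = fit_line(temps, target)

# Fit on a 0/1 flag (the same information as a two-valued nominal attribute).
flags = [0 if t == 90 else 1 for t in temps]
slope_flag, icpt_flag = fit_line(flags, target)

pred_raw = [slope_raw * t + icpt_raw for t in temps]
pred_flag = [slope_flag * f + icpt_flag for f in flags]

print(slope_raw, slope_flag)
print(pred_raw, pred_flag)
```

    The two slopes differ by a factor of 130 (the gap between 90 and 220), but both fits predict the same value for each row.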

     

    Regards,

    Sebastian
