Interpretation of ROC Analysis

Muhammed_Fatih_ Member Posts: 93 Maven
edited February 2020 in Help
Hello Community, 

I have derived the following ROC curves for four classification models (DT, NB, SVM and k-NN):



As you can see, SVM and k-NN each generate a curve with a shaded band around it.

Would it be correct to conclude from the graph that only k-NN and SVM were able to learn from the given dataset and the remaining two (DT and NB) were not?

What exactly does the shaded band mean? I would interpret it as the deviation across the individual learning runs, with the curve inside the band being their mean.
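A minimal sketch of one common way such a band is drawn, assuming (this is not confirmed in the thread) that the line is the mean of the per-fold ROC curves from cross-validation and the shade is ±1 standard deviation of the true positive rate (TPR) at each false positive rate (FPR). The helper names, the step-wise lookup and the toy folds are illustrative and are not RapidMiner's implementation.

```python
import statistics


def roc_points(labels, scores):
    """Empirical ROC curve as FPR and TPR lists running from (0, 0) to (1, 1).

    Assumes 0/1 labels and that both classes occur in the fold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def tpr_at(fpr, tpr, x):
    """TPR of one curve at FPR x: the last ROC point whose FPR does not exceed x."""
    value = 0.0
    for f, t in zip(fpr, tpr):
        if f > x:
            break
        value = t
    return value


def mean_roc_band(folds, grid):
    """Mean TPR and its standard deviation across folds at each FPR in grid."""
    curves = [roc_points(labels, scores) for labels, scores in folds]
    means, stds = [], []
    for x in grid:
        ys = [tpr_at(fpr, tpr, x) for fpr, tpr in curves]
        means.append(statistics.mean(ys))
        stds.append(statistics.pstdev(ys))
    return means, stds


# Two hypothetical folds: one ranks perfectly, one misorders a single pair.
fold_a = ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
fold_b = ([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
means, stds = mean_roc_band([fold_a, fold_b], [0.0, 0.25, 0.5, 1.0])
print(means)  # [0.75, 0.75, 1.0, 1.0]
print(stds)   # [0.25, 0.25, 0.0, 0.0]
```

Under this assumption, a wide band means the ROC curve changed a lot from fold to fold, and a curve with no visible band means all folds produced (nearly) the same curve.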

Thank you in advance for your help!

Best regards, 

Fatih 


Answers

  • [Deleted User] Posts: 0 Learner III
    Hello

    You can watch this video; I hope it helps:
    https://academy.rapidminer.com/learn/video/finding-the-right-model

    All the best
    mbs
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited February 2020
    Hello @Muhammed_Fatih_

    Are you sure the decision tree and NB are not learning? Based on the ROC curves, their AUC values appear to be 1 or close to 1. If that is correct, then DT and NB discriminate between the classes much better than SVM and k-NN.
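    For reference, AUC can be computed without drawing the curve: it equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half (the Mann-Whitney form). This small Python sketch uses made-up labels and scores to show that an AUC of 1.0 means every positive is ranked above every negative. AUC measures ranking quality, not accuracy at a particular threshold.

```python
def auc(labels, scores):
    """AUC via pairwise comparison (Mann-Whitney form); labels are 0/1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0: perfect separation
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75: one misordered pair out of four
print(auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # 0.5: no discrimination
```

    The pairwise loop is quadratic, which is fine for illustration; for large data a rank-based formula gives the same value faster.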
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • Muhammed_Fatih_ Member Posts: 93 Maven
    Hello @mbs,

    Thank you for the link!

    Hello @varunm1,

    I am not sure whether they learn or not. But to me, reaching such high values compared to SVM and k-NN looks like an indicator of overfitting. How do you see it? Would you also consider DT and NB appropriate solutions here? If so, why?
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @Muhammed_Fatih_

    I can only comment on that based on your data and the type of analysis you are doing. If it's a split validation, a single lucky split could produce high performance like this purely by chance. There are also other factors, such as temporal characteristics in the data, and many other checks you need to do when you get results this good.
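    To illustrate how much a single train/test split can move the numbers, here is a sketch with a hypothetical dataset and a fixed scoring rule. No model is trained, so any spread comes only from which rows land in the 30% test set. The data generator, the noise level and the 0.5 threshold are assumptions for illustration.

```python
import random


def make_data(n, seed):
    """Hypothetical data: (label, score) pairs where score = label + Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        rows.append((y, y + rng.gauss(0.0, 0.6)))
    return rows


def holdout_accuracy(rows, test_fraction, seed):
    """Shuffle, keep the first test_fraction of rows as the test set, score a fixed 0.5 threshold."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    test = shuffled[: int(len(shuffled) * test_fraction)]
    correct = sum(1 for y, s in test if (s >= 0.5) == (y == 1))
    return correct / len(test)


rows = make_data(100, seed=0)
accuracies = [holdout_accuracy(rows, 0.3, seed) for seed in range(20)]
print(f"accuracy over 20 random 70:30 splits: min={min(accuracies):.2f}, max={max(accuracies):.2f}")
```

    The same "model" gets a different score on every split, which is why one favorable split is weak evidence on its own.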

    Regards,
    Varun

  • Muhammed_Fatih_ Member Posts: 93 Maven
    Hello @varunm1,

    Thank you for your answer! I used cross-validation because studies have shown that it gives more reliable performance estimates than split validation.
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @Muhammed_Fatih_

    Cross-validation is a good validation method, but if your data has temporal (time-dependent) characteristics or confounding relationships, it can sometimes overestimate performance. If you think there are none, then the models might genuinely be doing well. Different models work well for different types of data.
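    If the rows do have a time order, one common alternative to shuffled cross-validation is forward chaining: every test block lies strictly after the rows used for training, so the model never trains on the future. A minimal sketch of the index bookkeeping, assuming the rows are already sorted by time; the function name is hypothetical, and scikit-learn's TimeSeriesSplit follows the same idea.

```python
def forward_chaining_splits(n_rows, n_splits):
    """Yield (train, test) index lists; each test block follows all of its training rows.

    Rows are assumed to be sorted by time. Any remainder rows at the end
    (when n_rows is not divisible by n_splits + 1) are left out here.
    """
    block = n_rows // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = list(range(0, k * block))
        test = list(range(k * block, (k + 1) * block))
        yield train, test


for train, test in forward_chaining_splits(10, 4):
    print(train, "->", test)
# [0, 1] -> [2, 3]
# [0, 1, 2, 3] -> [4, 5]
# [0, 1, 2, 3, 4, 5] -> [6, 7]
# [0, 1, 2, 3, 4, 5, 6, 7] -> [8, 9]
```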

    You can also split your original data 70:30 or 80:20, depending on its size, then cross-validate on the larger portion and test on the smaller portion to see how the model does on data it has never seen.
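    The 80:20 variant of that suggestion, sketched as index bookkeeping only (model training is left out). The function name, k = 5 and the seed are illustrative choices; the point is that the 20% hold-out rows never appear in any cross-validation fold, so the final test score is an independent check on the cross-validated estimate.

```python
import random


def holdout_then_kfold(n_rows, test_fraction=0.2, k=5, seed=42):
    """Return (holdout, folds); folds are (train, validation) index lists over the non-holdout rows."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_test = round(n_rows * test_fraction)
    holdout, rest = idx[:n_test], idx[n_test:]
    folds = []
    for f in range(k):
        validation = rest[f::k]  # every k-th remaining row, starting at offset f
        val_set = set(validation)
        train = [i for i in rest if i not in val_set]
        folds.append((train, validation))
    return holdout, folds


holdout, folds = holdout_then_kfold(100)
print(len(holdout), [len(v) for _, v in folds])  # 20 [16, 16, 16, 16, 16]
```

    A typical workflow would cross-validate on `folds`, retrain the chosen model on all 80 remaining rows, and evaluate it once on `holdout`. If that score is far below the cross-validated one, the cross-validation estimate was probably optimistic.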
    Regards,
    Varun

  • Muhammed_Fatih_ Member Posts: 93 Maven
    edited February 2020
    Hello @varunm1 and @mbs,

    Thank you for your answers!

    To come back to and refine my initial question: do you think the marked ROC curve shape is common when the curve coincides with the optimum (running straight up the left edge and then along the top)? Is this possible in general?

