
Standard of reliability: accuracy and gains

kkyc Member Posts: 1 Learner I
edited August 2020 in Help
Hi everyone, 

I am new to RapidMiner. This is a quick question, and I would welcome any comments.

I imported 203 rows of data into RapidMiner and ran Auto Model. The best-performing model is the decision tree, with accuracy = 55.6%, standard deviation ±12, and gains = 20. Can I judge this model to be reliable, and why? Is there a standard for reliability in RapidMiner?

Also, what do the weights of the important factors mean? At what weight can the two be considered strongly related?

Thank you very much!

Answers

  • varunm1 Member Posts: 1,207 Unicorn
    Hello @kkyc

    There are several things to look at to decide whether your model is reliable.
    One starting point is the model performance. Your accuracy is 55.6%, which is just above chance accuracy. To interpret it, look at the class balance and at the model performance within each class. For example, suppose your data has two classes, A and B, with 113 samples in class A and 90 in class B. A model that predicts class A for every sample reaches an accuracy of about 55.7% (113/203), yet it is a bad model because it never predicts class B. So you need to look at the confusion matrix, or at per-class precision and recall.
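
    To make the example above concrete, here is a minimal Python sketch (outside RapidMiner) that builds the confusion counts for a model that always predicts class A. The 113/90 split comes from the example; the helper names `confusion_counts`, `recall`, and `precision` are made up for illustration and are not RapidMiner operators.

```python
from collections import Counter

# Hypothetical data mirroring the example: 113 samples of class A, 90 of class B.
actual = ["A"] * 113 + ["B"] * 90
# A degenerate model that predicts the majority class for every sample.
predicted = ["A"] * len(actual)

LABELS = ["A", "B"]


def confusion_counts(actual, predicted):
    """Count (actual, predicted) pairs; missing pairs read as 0."""
    return Counter(zip(actual, predicted))


def recall(counts, label):
    """Share of samples truly in `label` that were predicted as `label`."""
    true_total = sum(counts[(label, p)] for p in LABELS)
    return counts[(label, label)] / true_total if true_total else 0.0


def precision(counts, label):
    """Share of samples predicted as `label` that truly are `label`."""
    pred_total = sum(counts[(a, label)] for a in LABELS)
    return counts[(label, label)] / pred_total if pred_total else 0.0


counts = confusion_counts(actual, predicted)
accuracy = sum(counts[(l, l)] for l in LABELS) / len(actual)

print(f"accuracy = {accuracy:.3f}")
for label in LABELS:
    print(f"class {label}: precision = {precision(counts, label):.3f}, "
          f"recall = {recall(counts, label):.3f}")
```

    The accuracy looks acceptable on its own, but the recall for class B is zero, which is exactly what the per-class view is meant to reveal.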

    If you want a single metric, there is the "Kappa" performance metric. It corrects for the agreement expected by chance, so it is much less affected by class imbalance than accuracy, and you can use it in place of accuracy. The standard deviation is also a major factor to consider. It is hard to say what the best value is, as it depends on the domain and the problem you are trying to solve.
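
    As a rough illustration of why Kappa is more informative here, the sketch below computes Cohen's kappa, (p_o - p_e) / (1 - p_e), by hand for the same hypothetical 113/90 data. Here p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the class frequencies. The function name `cohen_kappa` is an illustrative choice, and I am assuming RapidMiner's "Kappa" is this standard Cohen's kappa.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(actual)
    # Observed agreement: fraction of samples where prediction matches truth.
    p_o = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement: for each label, the product of its actual and
    # predicted frequencies, summed over all labels.
    actual_freq = Counter(actual)
    predicted_freq = Counter(predicted)
    labels = set(actual_freq) | set(predicted_freq)
    p_e = sum((actual_freq[l] / n) * (predicted_freq[l] / n) for l in labels)
    if p_e == 1.0:
        # Only one label occurs in both lists; kappa is undefined, report 0.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical data from the example: 113 samples of class A, 90 of class B.
actual = ["A"] * 113 + ["B"] * 90

always_a = ["A"] * len(actual)   # model that never predicts class B
perfect = list(actual)           # model that predicts every sample correctly

kappa_always_a = cohen_kappa(actual, always_a)
kappa_perfect = cohen_kappa(actual, perfect)

print(f"kappa (always A) = {kappa_always_a:.3f}")
print(f"kappa (perfect)  = {kappa_perfect:.3f}")
```

    The always-A model scores about 55.7% accuracy but a kappa of 0, meaning it does no better than chance, while a perfect model scores a kappa of 1.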

    Domain knowledge is another major consideration: are models with 55.6% accuracy accepted in the domain your problem belongs to? I think 55.6% is low in any domain, as it is just above chance accuracy, but I cannot say anything definite because I do not know the data.

    Other things to consider are feature importance, model validation type, and cost sensitivity matrix.
    Regards,
    Varun
    https://www.varunmandalapu.com/

