Low Recall High Accuracy

ozcanozcan Member Posts: 7 Contributor II
edited February 2020 in Help
Below are example results for the same dataset. The dataset has no missing values.

For Naive Bayes:
RapidMiner Recall: 26.35% +/- 5.17% (micro average: 26.37%) :/
Weka Recall: 0.768
RapidMiner Precision: 43.41% :/
Weka Precision: 0.735
RapidMiner Accuracy: 77.14%
Weka Accuracy: 76.7639%

For Random Forest:
RapidMiner Recall: 16.60% +/- 6.01% (micro average: 16.59%) :/
Weka Recall: 0.843
RapidMiner Accuracy: 81.75%
Weka Accuracy: 84.2897%

For KNN:
RapidMiner Recall: 12.89% +/- 3.82% (micro average: 12.89%) :/
Weka Recall: 0.824
RapidMiner Precision: 55.82% +/- 12.05% (micro average: 55.77%) :/
Weka Precision: 0.810
RapidMiner Accuracy: 79.40%
Weka Accuracy: 82.4396%

For Decision Tree:
RapidMiner Recall: 30.67%
Weka Recall: 0.815
RapidMiner Accuracy: 83.07%
Weka Accuracy: 81.4989%

Why are RapidMiner's recall and precision values so low when the accuracy is high? The recall value is especially low.
My process is attached. I use the same process for the other algorithms.

I also tried other settings of the respective algorithms to improve recall in RapidMiner. For example:
KNN: changing k, measure types, mixed measure, weighted vote.
Decision Tree: changing criterion, maximal depth, pruning, confidence, prepruning, minimal gain, minimal leaf size, minimal size for split, number of prepruning alternatives.
Random Forest: changing number of trees, criterion, pruning, confidence, prepruning, random splits, guess subset ratio, voting strategy, etc.
But the recall value is still low.
01.JPG 33.5K
02.JPG 40.3K

Answers

  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @ozcan,

    These results can be explained by a highly imbalanced dataset.
    In this case, the algorithm has difficulty "capturing" the relationships between your regular attribute(s) and the minority class of your label, and thus has difficulty predicting the minority class correctly. That is why the recall is low although the accuracy is relatively good.
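
The imbalance effect described above can be seen with a little arithmetic. This is a minimal sketch in Python using hypothetical confusion-matrix counts (they are not taken from the dataset in this thread), with the minority class treated as the positive class:

```python
# Hypothetical confusion-matrix counts for an imbalanced two-class label.
# Illustrative numbers only; they do not come from the dataset in this thread.
tp = 30    # minority ("bug") examples correctly predicted as bug
fn = 170   # minority examples the model missed
fp = 40    # majority examples wrongly flagged as bug
tn = 760   # majority examples correctly predicted

total = tp + fn + fp + tn

# Accuracy counts correct predictions of BOTH classes, so the large
# majority class dominates it.
accuracy = (tp + tn) / total

# Recall and precision look only at the minority (positive) class.
recall_minority = tp / (tp + fn)
precision_minority = tp / (tp + fp)

print(f"accuracy:  {accuracy:.2%}")
print(f"recall:    {recall_minority:.2%}")
print(f"precision: {precision_minority:.2%}")
```

With counts like these, most of the accuracy comes from the majority class, so accuracy can stay high even when most minority examples are missed.
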
    However, I don't know why there is such a significant difference between Weka and RapidMiner.
    Could you share your dataset?

    Regards,

    Lionel

  • varunm1varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited February 2020
    Hello @ozcan

    This is a tricky question. How are you getting these results? Are you cross-validating or split-validating your data? If so, are the test data sets the same in both RapidMiner and Weka?

    How about the hyperparameters of these algorithms? Are they exactly the same?


    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • [Deleted User][Deleted User] Posts: 0 Learner III
    @ozcan

    Hello

    It depends on your data and on the algorithm. When different software performs classification or clustering on your data, you may see some differences, and this is not a problem. Data science is based on statistics and probability, so differing accuracy is normal.

    I hope this helps
    mbs
  • ozcanozcan Member Posts: 7 Contributor II
    edited February 2020
    Hi, my dataset is attached.
    In RapidMiner, I changed the bug label to nominal; the other attributes are real.
    In Weka, I changed the bug label from numeric to nominal; the other attributes are numeric.
    For all algorithms, I used 10-fold cross-validation in both Weka and RapidMiner.
    I set the role of the bug attribute to label and selected all attributes for all algorithms.
    Cross-validation uses folds: 10; the other options are default.
    I did not change any algorithm options. All of them are at their default settings.
    There can be minor differences between Weka and RapidMiner, such as the confidence interval, but this should not affect recall like this.
    This is not a tricky question. I need these results and this comparison for my thesis. @lionelderkrikor @varunm1 @mbs


  • varunm1varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited February 2020
    Hello @ozcan

    We understand that, but what we are trying to say is that the performance varies based on the way the 10 folds of the cross-validation are divided, and also on the settings of each algorithm. The defaults in RapidMiner and Weka might not be the same, and the base algorithm might not work the same way if the default parameters are not similar.
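
How the folds are divided matters most for the minority class. As a rough sketch in Python (the label column and the helper functions `plain_folds`, `stratified_folds` and `minority_counts` are hypothetical, and this thread does not state how either tool splits or seeds its folds by default), compare a plain shuffled split with a stratified one:

```python
import random

def plain_folds(labels, k, seed):
    """Shuffle all row indices and deal them into k folds, ignoring the label."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def stratified_folds(labels, k, seed):
    """Shuffle row indices within each class, then deal each class
    round-robin into k folds so every fold keeps the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds

def minority_counts(folds, labels, minority=1):
    """Number of minority-class rows that land in each fold."""
    return [sum(1 for i in fold if labels[i] == minority) for fold in folds]

# Hypothetical label column: 200 minority (1) and 800 majority (0) rows.
labels = [1] * 200 + [0] * 800

plain = minority_counts(plain_folds(labels, 10, seed=1), labels)
strat = minority_counts(stratified_folds(labels, 10, seed=1), labels)
print("minority rows per fold, plain shuffle:", plain)
print("minority rows per fold, stratified:   ", strat)
```

In the stratified split every fold holds the same number of minority rows, while in the plain split the count varies from fold to fold. Per-fold recall, and therefore the reported average and its +/- spread, can differ between two tools for this reason alone.
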

    I am not sure if it's a good idea to compare two software packages based on performance. I guess @IngoRM might help you with the pitfalls of doing this.
    Regards,
    Varun
    https://www.varunmandalapu.com/


  • [Deleted User][Deleted User] Posts: 0 Learner III
    edited February 2020
    @ozcan

    I have had experience with differing accuracy, but this is not a problem. You can accept the results from both tools, because they do not implement clustering and classification in the same way, and in terms of statistics and probability both are correct. Maybe others can help you more. :)
    One more thing:
    Please take a look at your data. It contains many different numeric values, which is important and can affect your process.

    All the best
    mbs
  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    hi @ozcan yes can we please see your actual processes and data to replicate your results?
  • ozcanozcan Member Posts: 7 Contributor II
    edited February 2020
    Hi @sgenzer, my dataset and process are attached. Thanks.
    Moreover, for Decision Tree:
    Weka Accuracy: 81.4989%
    RapidMiner Accuracy: 83.07%
    Weka Recall: 0.815
    RapidMiner Recall: 30.67%
  • ozcanozcan Member Posts: 7 Contributor II
    Hi @varunm1
    Yes, this setting solved my problem. Thank you very much. One more thing: my bug label is nominal, and because of this I get a "potential problem" warning. Does it affect my results? Do I have to change the bug label to binominal? I have added a screenshot as an attachment.
  • varunm1varunm1 Moderator, Member Posts: 1,207 Unicorn
    It's a warning, and there is no need to worry. You can also convert the label using the Numerical to Binominal operator. This will change your 0 and 1 to False and True (binominal categories). You should always be careful while analyzing your results and understand how they can change based on classes, data and models.
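
To illustrate how a result can change based on classes, here is a small Python sketch. The label values, the predictions and the `recall` helper are all hypothetical. It maps a 0/1 label to binominal categories, then computes recall once per class and as a class-weighted average:

```python
# Hypothetical 0/1 label column and predictions (illustrative only).
actual    = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predicted = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Map the numeric codes to binominal categories.
to_binominal = {0: "false", 1: "true"}
actual_b = [to_binominal[v] for v in actual]
predicted_b = [to_binominal[v] for v in predicted]

def recall(actual, predicted, positive):
    """Share of rows of class `positive` that were predicted as `positive`."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    support = sum(1 for a in actual if a == positive)
    return hits / support

# Recall depends on which class is declared "positive".
recall_true = recall(actual_b, predicted_b, "true")
recall_false = recall(actual_b, predicted_b, "false")

accuracy = sum(a == p for a, p in zip(actual_b, predicted_b)) / len(actual_b)

# Per-class recalls averaged with class frequencies as weights.
weighted_recall = (
    recall_true * actual_b.count("true") + recall_false * actual_b.count("false")
) / len(actual_b)

print("recall (positive = true): ", recall_true)
print("recall (positive = false):", recall_false)
print("weighted average recall:  ", weighted_recall)
print("accuracy:                 ", accuracy)
```

The class-weighted average of the per-class recalls always equals the accuracy. The Weka recall figures in the first post (0.768, 0.843, 0.824, 0.815) sit very close to the Weka accuracies (76.76%, 84.29%, 82.44%, 81.50%), which would be consistent with a weighted average being compared against a single-class recall. That reading is an inference and is not confirmed anywhere in this thread.
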
    Regards,
    Varun
    https://www.varunmandalapu.com/

