Poor recall and precision classification results

AlexM · July 2016

Hello RapidMiner community!

As a newbie to the machine learning and data mining world, I'd first like to extend my thanks to the RapidMiner team for working so hard on the tutorials to make the topic as accessible as possible. Your software is a joy to use. Now onto my problem.

I'm doing tool testing as part of a student assignment in which I have to compare RapidMiner and Weka, both on experimental results and in general. I'm currently having problems with the experimental part of the assignment. My task is to compare three of RapidMiner's classification algorithm implementations with three of Weka's: in my case DecisionTree vs. J48, k-NN vs. IBk, and the respective NaiveBayes implementations. All parameters are left at their defaults, except that I have disabled Laplace smoothing for NaiveBayes. I've used 10-fold cross-validation with the Performance (Polynominal) operator.

RapidMiner's accuracy is fine and compares well with Weka's implementations; in fact, DecisionTree does better in most cases. Recall and precision are more troublesome, though. Consider the following tables:

Precision: https://gyazo.com/ced749cebc185b4b70a0a077188cf17f

Recall: https://gyazo.com/4bbacf1ff196671a36d4c38220e25c22

As you can see, Weka has better results in the majority of cases. I was hoping you could enlighten me as to why. Am I doing something very wrong, or is something else afoot?

Kind regards,

Alex

MartinLiebig · August 2016

The Generate Weight (Stratification) operator does the trick.

~Martin

Thomas_Ott · July 2016

What type of Validation are you using? Split? X-val?

AlexM · July 2016

Yes, I'm using the 10-fold X-Validation operator. What I've found is that Weka weights classes differently from RapidMiner when averaging. RapidMiner's default class weights are 1 for every class, whereas Weka weights each class by how often it occurs in the data set (as far as I could understand, anyway): the more often a class occurs, the bigger its weight. This means the weighted averages of precision and recall reported by Weka are skewed towards the frequent classes compared with RapidMiner's approach.

Since I couldn't find a way to adjust the weights appropriately (whether because I'm too green or otherwise), I've since done some manual spreadsheet work to normalize Weka's weights, and the results are much more comparable now.
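The effect Alex describes can be reproduced in a few lines of Python. This is a minimal sketch, assuming a made-up three-class confusion matrix (the counts are invented and are not taken from the tables above), and assuming that Weka's "Weighted Avg." row weights each class by its number of actual examples, while RapidMiner's weighted mean uses per-class weights that default to 1. The helper names `per_class_precision_recall` and `weighted_mean` are hypothetical.

```python
# A minimal sketch: per-class precision/recall from a confusion matrix, then two
# ways of averaging them. The confusion matrix below is invented for illustration.

# Rows = actual class, columns = predicted class (hypothetical counts).
CONFUSION = [
    [80, 15, 5],  # actual class 0: 100 examples (majority class)
    [4, 14, 2],   # actual class 1: 20 examples
    [1, 3, 6],    # actual class 2: 10 examples
]


def per_class_precision_recall(cm):
    """Return lists of per-class precision and recall."""
    n = len(cm)
    precision, recall = [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision.append(tp / predicted_k if predicted_k else 0.0)
        recall.append(tp / actual_k if actual_k else 0.0)
    return precision, recall


def weighted_mean(values, weights):
    """Average per-class values using one weight per class."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


precision, recall = per_class_precision_recall(CONFUSION)
equal_weights = [1] * len(CONFUSION)               # assumed RapidMiner default: every class weighs 1
support_weights = [sum(row) for row in CONFUSION]  # assumed Weka style: weight = class frequency
accuracy = sum(CONFUSION[k][k] for k in range(len(CONFUSION))) / sum(support_weights)

print(f"accuracy:                     {accuracy:.4f}")
print(f"precision, equal weights:     {weighted_mean(precision, equal_weights):.4f}")
print(f"precision, frequency weights: {weighted_mean(precision, support_weights):.4f}")
print(f"recall, equal weights:        {weighted_mean(recall, equal_weights):.4f}")
print(f"recall, frequency weights:    {weighted_mean(recall, support_weights):.4f}")
```

With frequency weights, the averaged recall always equals the overall accuracy (here 100/130), because each class's recall is multiplied by its own example count. That is one reason a frequency-weighted recall tracks accuracy closely, while the equally weighted mean is pulled down by poorly recalled minority classes.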

Thomas_Ott · August 2016

That is interesting; I didn't know that Weka does that. There is a way to weight your classes based on class occurrences, and I know that @mschmitz_ shared such a process. Perhaps he can drop it in here.

Altair RapidMiner Community