"Final prediction in bagging algorithm"
adrian_crouch
Member Posts: 8 Contributor II
Hello RM community,
I'm not sure whether I'm wrong, but I always thought that the bagging meta-algorithm should select the final prediction by majority vote (in classification). Since the numeric confidences generated by the individual models for each label value are averaged, this would mean that the final confidence does not necessarily correspond to the final prediction.
Let's say we aggregate three models that predict confidences of 0.4, 0.4 and 0.9 for class 'A' and 0.6, 0.6 and 0.1, respectively, for class 'B' for a given example in a binominal classification. Averaging these confidences gives class 'A' a confidence of 0.567 and class 'B' a confidence of 0.433. With majority voting, however, I would expect 'B' as the final predicted class, since two of the three models predicted it while only one predicted 'A'.
This does not match the implementation in the BaggingModel (version 5.3.008). There, the label value with the highest averaged confidence is chosen as the final prediction, which for the example above is 'A' because of its higher averaged confidence of 0.567.
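To make the difference concrete, here is a minimal Python sketch of the two aggregation rules applied to the three-model example above. It is not RapidMiner's BaggingModel code; the function names `average_confidence_vote` and `majority_vote` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Per-model confidences for one example, taken from the numbers in this post.
confidences = [
    {"A": 0.4, "B": 0.6},
    {"A": 0.4, "B": 0.6},
    {"A": 0.9, "B": 0.1},
]


def average_confidence_vote(confs):
    """Average the confidences per label and pick the label with the highest mean."""
    labels = confs[0].keys()
    averaged = {
        label: sum(c[label] for c in confs) / len(confs) for label in labels
    }
    return max(averaged, key=averaged.get), averaged


def majority_vote(confs):
    """Let each model vote for its own most confident label and pick the most common vote."""
    votes = Counter(max(c, key=c.get) for c in confs)
    return votes.most_common(1)[0][0], dict(votes)


soft_label, soft_scores = average_confidence_vote(confidences)
hard_label, hard_counts = majority_vote(confidences)

print("averaged confidences:", soft_scores, "->", soft_label)
print("majority vote counts:", hard_counts, "->", hard_label)
```

Under these assumptions, averaging selects 'A' while majority voting selects 'B', because the single model that is very confident in 'A' outweighs the two models that only slightly prefer 'B'.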
Could someone tell me if I made a mistake with my thinking here?
Many thanks,
Adrian
Answers
It simply comes down to a weighted or an unweighted average. I think both are useful. Breiman's original RF implementation used the unweighted one.
~Martin
Dortmund, Germany
So I don't exactly get the point. Am I misinterpreting something or is it indeed a bug in the bagging implementation?