Standard deviation on cross-validation

yzanyzan Member Posts: 66 Unicorn
edited December 2018 in Product Feedback - Resolved

Whenever we look at performance results obtained from cross-validation, we see the mean value of the selected measure and its standard deviation (marked with +/-). However, sometimes the selected measure cannot be calculated on some of the folds. For example, when all samples are classified as negative and we attempt to calculate precision or F-measure, we get a division by zero and the measure is treated as missing for that fold. That is perfectly reasonable behaviour. The inconsistency is in what happens next: the mean value of the measure is still reported in the presence of missing measurements, but the standard deviation is no longer reported.
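To make the missing-measure case concrete, here is a minimal Python sketch of how precision becomes undefined on a fold. The function name and the fold counts are hypothetical and only illustrate the arithmetic; this is not RapidMiner's actual implementation.

```python
import math

def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP); NaN when nothing was predicted positive."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return math.nan  # 0/0: the measure is undefined for this fold
    return true_positives / predicted_positive

# Hypothetical (TP, FP) counts for three folds; the last fold predicts no positives.
fold_counts = [(8, 2), (6, 2), (0, 0)]
fold_precision = [precision(tp, fp) for tp, fp in fold_counts]
print(fold_precision)  # [0.8, 0.75, nan]
```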

 

Proposal: Make the behaviour consistent and report nanmean and nanstd (i.e. ignore the missing values and compute both statistics over the folds where the measure is defined).
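As a rough illustration of the proposed behaviour, the sketch below uses NumPy's `nanmean` and `nanstd`. The per-fold values are made up, and the choice of `ddof=1` (sample standard deviation) is an assumption; I do not know which variant RapidMiner uses.

```python
import numpy as np

# Hypothetical per-fold precision values; NaN marks a fold where the
# measure is undefined (e.g. no positive predictions, so 0/0).
fold_precision = np.array([0.80, 0.75, np.nan, 0.90, 0.85])

# Plain statistics propagate the missing value.
plain_mean = np.mean(fold_precision)  # nan
plain_std = np.std(fold_precision)    # nan

# NaN-aware statistics skip the missing fold and use the other four.
mean_valid = np.nanmean(fold_precision)
std_valid = np.nanstd(fold_precision, ddof=1)  # assumption: sample std
n_valid = int(np.count_nonzero(~np.isnan(fold_precision)))

print(f"precision: {mean_valid:.3f} +/- {std_valid:.3f} "
      f"(from {n_valid} of {fold_precision.size} folds)")
# precision: 0.825 +/- 0.065 (from 4 of 5 folds)
```

Reporting the number of folds that actually contributed, as in the last line, would also tell the user that some folds were skipped.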

 

Reasoning: It is puzzling to look at the performance results and find the standard deviations suddenly missing, without any explanation, even though you are performing cross-validation and you are certain that RM used to report the standard deviation.

0 votes

Declined · Last Updated

No comments or votes in over a year, so closing this idea for now. Please comment if it is still relevant.

Comments

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager