Standard deviation on cross-validation
Whenever we look at a performance result obtained from cross-validation, we see the mean value of the selected measure and its standard deviation (marked with +/-). However, the selected measure sometimes cannot be calculated on some of the folds. For example, when all samples in a fold are classified as negative and we attempt to calculate precision or f-measure, we get a division by zero, and consequently the measure is treated as missing for that fold. That is perfectly reasonable behaviour. The mean value of the measure is still reported in the presence of missing measurements, but the standard deviation is not reported anymore.
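To make the division-by-zero case concrete, here is a minimal Python sketch of per-fold precision for binary labels (1 = positive, 0 = negative). The `precision` helper is hypothetical and is not RapidMiner's implementation. It simply returns NaN when a fold has no positive predictions, which is one way to represent a "missing" measurement.

```python
import math


def precision(y_true, y_pred):
    """Precision = TP / (TP + FP); NaN when there are no positive predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        # Every sample was predicted negative: 0 / 0 is undefined, so mark it missing.
        return math.nan
    return tp / (tp + fp)


# A fold where the model predicts "negative" for every sample.
print(precision([1, 0, 1], [0, 0, 0]))  # prints nan
```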
Proposal: Make the behaviour consistent and report nanmean and nanstd, i.e. ignore missing values when computing both statistics and report both of them.
Reasoning: It can be puzzling when standard deviations suddenly disappear from the performance results without any explanation, even though you are performing cross-validation and you are certain that RM used to report them.
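As an illustration of the proposed nanmean/nanstd behaviour, here is a minimal Python sketch using only the standard library. The `nan_mean_std` helper is hypothetical. It mirrors NumPy's `nanmean` and `nanstd` with the default `ddof=0` (population standard deviation). Whether RapidMiner uses the population or the sample standard deviation is an assumption not settled here.

```python
import math
from statistics import fmean, pstdev


def nan_mean_std(values):
    """Mean and population std of `values`, skipping NaN (missing) entries."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        # No fold produced a value, so both statistics are missing.
        return math.nan, math.nan
    return fmean(present), pstdev(present)


# Precision per fold; the third fold had no positive predictions.
fold_precisions = [0.8, 0.6, math.nan, 0.7]
mean, std = nan_mean_std(fold_precisions)
print(f"{mean:.3f} +/- {std:.3f}")  # prints 0.700 +/- 0.082
```

With this approach the mean and the standard deviation are computed over the same set of folds, so either both are reported or neither is.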