
Standard deviation on cross-validation

yzan (Member)
edited December 2018 in Product Feedback - Resolved

Whenever we look at performance results obtained from cross-validation, we see the mean value of the selected measure and its standard deviation (marked with +/-). However, the selected measure sometimes cannot be calculated on some of the folds. For example, when all samples in a fold are classified as negative and we attempt to calculate precision or f-measure, we get a division by zero, and consequently the measure is treated as missing for that fold. That is perfectly reasonable behaviour. The problem is that when some measurements are missing, the mean value of the measure is still reported, but the standard deviation is not.
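To make the division-by-zero case concrete, here is a minimal Python sketch (not RapidMiner code) of per-fold precision. The fold counts are hypothetical confusion-matrix values for a 5-fold cross-validation. A fold with no positive predictions yields NaN, i.e. a missing measurement.

```python
import math

def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); undefined (NaN) when nothing is predicted positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # Division by zero: treat the measure as missing for this fold.
        return math.nan
    return tp / predicted_positive

# Hypothetical (tp, fp) counts per fold; in fold 3 every sample was classified as negative.
folds = [(8, 2), (6, 4), (0, 0), (9, 1), (7, 3)]
per_fold = [precision(tp, fp) for tp, fp in folds]
print(per_fold)  # [0.8, 0.6, nan, 0.9, 0.7]
```

Any plain sum over this list is itself NaN, so an aggregation has to skip the missing folds explicitly to report either statistic.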


Proposal: Make the behaviour consistent by reporting nanmean and nanstd, i.e. ignore the missing values and report both statistics over the remaining folds.
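The proposed aggregation can be sketched in plain Python as follows. The helpers `nanmean` and `nanstd` are hypothetical stand-ins that mimic NumPy's `numpy.nanmean` and `numpy.nanstd`: missing (NaN) folds are dropped before both statistics are computed. Whether RapidMiner reports the population or the sample standard deviation is an assumption here, so the sketch exposes `ddof` (0 = population, matching NumPy's default).

```python
import math

def nanmean(values):
    """Mean over the non-NaN values (hypothetical stand-in for numpy.nanmean)."""
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        return math.nan  # every fold was missing
    return sum(valid) / len(valid)

def nanstd(values, ddof=0):
    """Std over the non-NaN values; ddof=0 is the population std (numpy.nanstd's default)."""
    valid = [v for v in values if not math.isnan(v)]
    denom = len(valid) - ddof
    if denom <= 0:
        return math.nan  # not enough valid folds
    mean = sum(valid) / len(valid)
    return math.sqrt(sum((v - mean) ** 2 for v in valid) / denom)

# Hypothetical per-fold precision values, one of them missing.
per_fold = [0.8, 0.6, math.nan, 0.9, 0.7]
print(f"{nanmean(per_fold):.3f} +/- {nanstd(per_fold):.3f}")  # 0.750 +/- 0.112
```

With `ddof=1` the sample standard deviation would be reported instead (about 0.129 for the same folds).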


Reasoning: It can be puzzling when the standard deviations suddenly disappear from the performance results without any explanation, even though you are performing cross-validation and you are certain that RM used to report them.


Declined

No comments or votes in over a year; closing this idea for now. Please comment if it is still relevant.

Comments

  • sgenzer (Administrator, Community Manager)