
Averaging performances

ri_jica Member Posts: 1 Learner II
edited November 2018 in Help

Hi all,

 

I have 2 different pipelines:

 

1) Model X run on data Y. I run this model 10 times (on the same data) and then use the Average operator on the performance vectors. I run it 10 times because the data is small and I wish to obtain a better performance measurement (regression) of the model. My measurement of choice is RMSE, which I expect the Average operator to average, and that is exactly what happens. But what does the +/- represent? Testing in Python, it does not seem to be the min/max, the standard deviation, or a 95/97/99% confidence interval. Furthermore, and peculiarly, converting the vectors to data (before averaging) shows a negative variance, which does not seem right.
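For reference, the checks described above can be sketched in Python like this. The RMSE values are made up, and none of these statistics is claimed to be what RapidMiner actually computes; this just shows the usual candidates for the +/- value being compared:

```python
import math
import statistics

# Ten hypothetical RMSE values from repeated runs (illustrative numbers only)
rmse = [0.562, 0.571, 0.558, 0.580, 0.565, 0.574, 0.569, 0.561, 0.577, 0.566]

mean = statistics.mean(rmse)
std_pop = statistics.pstdev(rmse)    # population std (divides by n)
std_samp = statistics.stdev(rmse)    # sample std (divides by n-1)
half_ci95 = 1.96 * std_samp / math.sqrt(len(rmse))  # half-width of a 95% CI

print(f"mean = {mean:.4f}")
print(f"+/- candidates: pstdev={std_pop:.4f}, stdev={std_samp:.4f}, "
      f"95% CI half-width={half_ci95:.4f}, range={max(rmse) - min(rmse):.4f}")
```

Comparing the reported +/- against each of these values usually identifies which one an implementation is using.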

 

2) Model U run on data V1-V22 to get 22 models (each dataset corresponds to a person, and I want to measure the 'average predicting power' when the model is trained only on the given person). Again, I use the Average operator on the performance vectors, and the same observations from 1) apply (namely, the negative variance and the mysterious +/-). However, following the same reasoning as in 1), I want to run this 10 times for a more 'real' average performance (i.e. RMSE). This time, converting the performance vectors to data before averaging shows both a positive variance and std; but the 'average' is no longer just the mean of the performance values! Values in the min/max range 0.550-0.589 produce an 'average' of 0.609.

 

tl;dr: what exactly does averaging performance vectors do under the hood (in particular, what does the +/- represent)? Why do I get these odd results?


Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I'm assuming you are using the Cross Validation operator in RapidMiner. The resulting output (e.g. 90.00% Accuracy, +/- 2.5%) means that the performance was averaged over the "k" fold models, with the +/- X% representing the standard deviation of those fold performances.
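As a sketch of that interpretation (the fold accuracies below are made up, and whether RapidMiner uses the population or the sample standard deviation is an assumption here, not confirmed):

```python
import statistics

# Hypothetical accuracies from a 10-fold cross validation (illustrative)
fold_accuracy = [0.88, 0.91, 0.92, 0.89, 0.90, 0.93, 0.87, 0.91, 0.90, 0.89]

mean_acc = statistics.mean(fold_accuracy)        # average over the k folds
std_acc = statistics.pstdev(fold_accuracy)       # spread across the k folds

# A report in the "90.00% +/- X%" style would then be:
print(f"{mean_acc:.2%} +/- {std_acc:.2%}")
```

Note that a mean computed this way always lies within the min/max of the individual fold values, which is why an 'average' outside that range (as in case 2 above) suggests something other than a plain arithmetic mean is being computed.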
