Averaging performances

ri_jica Member Posts: 1 Contributor I
edited November 2018 in Help

Hi all,

 

I have 2 different pipelines:

 

1) Model X run on data Y. I run this model 10 times (on the same data) and then use the Average operator on the performance vectors. I run it 10 times because the data is small and I want a more reliable performance measurement (regression) of the model. My measure of choice is RMSE, which I expect the Average operator to average, and that is exactly what happens. But what does the +- represent? Testing in Python, it does not seem to be the min/max, the standard deviation, or a 95/97/99% confidence interval. Furthermore, and peculiarly, converting the vectors to data (before averaging) shows a negative variance, which does not seem right.
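
For reference, this is roughly the kind of cross-check I ran in Python (the RMSE values below are made-up placeholders, not my actual results), comparing candidate spread statistics against the +- printed by the Average operator:

    # Cross-check sketch: which spread statistic matches the Average operator's +-?
    import numpy as np

    # Placeholder RMSEs from 10 repeated runs (not real results).
    rmse_runs = np.array([0.61, 0.58, 0.60, 0.63, 0.59,
                          0.62, 0.57, 0.60, 0.61, 0.58])

    print(f"mean RMSE: {rmse_runs.mean():.3f}")

    candidates = {
        "population std (ddof=0)":  rmse_runs.std(ddof=0),
        "sample std (ddof=1)":      rmse_runs.std(ddof=1),
        "standard error":           rmse_runs.std(ddof=1) / np.sqrt(len(rmse_runs)),
        "95% normal CI half-width": 1.96 * rmse_runs.std(ddof=1) / np.sqrt(len(rmse_runs)),
        "half the min-max range":   (rmse_runs.max() - rmse_runs.min()) / 2,
    }
    for name, value in candidates.items():
        print(f"+- candidate, {name}: {value:.3f}")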

 

2) Model U run on data V1-V22 to get 22 models (each dataset corresponds to a person, and I want to measure the 'average predictive power' when the model is trained only on the given person). Again, I use the Average operator on the performance vectors, and the same observations from 1) apply (namely, the negative variance and the mysterious +-). On the same reasoning as in 1), I also want to run this 10 times for a more 'real' average performance (i.e. RMSE). This time, converting the performance vectors to data before averaging shows both a positive variance and a positive standard deviation, but the 'average' is no longer just the mean of the performance values! Values in the min/max range 0.550-0.589 produce an 'average' of 0.609.
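
To make my expectation concrete, here is a sketch of the two aggregation conventions I am aware of, with made-up per-person RMSEs and example counts (I am not claiming either one is what the Average operator actually does internally):

    # Two common ways to aggregate per-dataset RMSEs (illustrative values only).
    import numpy as np

    rmse_per_person = np.array([0.550, 0.589, 0.571, 0.562])  # one RMSE per person
    n_examples      = np.array([40, 55, 48, 60])               # examples per person

    # Macro average: the plain mean of the individual RMSE values.
    macro_rmse = rmse_per_person.mean()

    # Pooled ("micro") RMSE: average the squared errors weighted by dataset size,
    # then take the root.
    pooled_rmse = np.sqrt(np.average(rmse_per_person**2, weights=n_examples))

    print(f"macro-averaged RMSE: {macro_rmse:.3f}")
    print(f"pooled RMSE:         {pooled_rmse:.3f}")

Either way, the aggregate stays inside the min/max range of the individual RMSEs, which is why an 'average' of 0.609 from values between 0.550 and 0.589 puzzles me.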

 

tl;dr: what exactly does averaging performance vectors do under the hood (in particular, what is the +-)? Why do I get these weird results?


Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I'm assuming you are using the Cross Validation operator in RapidMiner. The resulting output (e.g. 90.00% Accuracy, +/- 2.5%) means that the performance is averaged over the "k" models, and the +/- X% is one standard deviation of that performance across the folds.
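
    Roughly what that corresponds to in Python, with hypothetical fold accuracies (whether the operator uses the population or the sample standard deviation I can't say offhand, so treat the ddof choice as an assumption):

        # Mean accuracy over k folds, +/- one standard deviation.
        import numpy as np

        fold_accuracy = np.array([0.92, 0.88, 0.91, 0.875, 0.915])  # hypothetical k=5 folds
        mean_acc = fold_accuracy.mean()
        std_acc  = fold_accuracy.std(ddof=0)  # assumption: population std

        print(f"{mean_acc:.2%} +/- {std_acc:.2%}")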
