I have 2 different pipelines:
1) Model X is run on data Y. I run this model 10 times (on the same data) and then apply the Average operator to the resulting performance vectors. I run it 10 times because the data set is small and I want a more reliable estimate of the model's (regression) performance. My measure of choice is RMSE. I expect the Average operator to average it, and that is exactly what happens. But what does the +- represent? I tested in Python, and it does not seem to be the min/max, the standard deviation, or a 95/97/99% confidence interval. Furthermore, and oddly, when I convert the performance vectors to data (before averaging), they show a negative variance. This does not seem right.
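For reference, these are the kinds of candidate statistics I compared the +- against, computed from the ten per-run RMSE values. The `rmse_runs` list is a hypothetical placeholder, not my actual results. The confidence intervals use a normal approximation, which is an assumption; with n = 10 a t-based interval would be a bit wider. I also included the naive one-pass variance E[x^2] - E[x]^2. That formula is algebraically non-negative but can come out slightly below zero in floating point when the spread is tiny. This is only one possible, unconfirmed explanation for the negative variance.

```python
import math
import statistics

# Hypothetical per-run RMSE values (placeholders, not real results).
rmse_runs = [0.571, 0.566, 0.580, 0.559, 0.574, 0.562, 0.577, 0.569, 0.583, 0.565]


def candidate_stats(values, levels=(0.95, 0.97, 0.99)):
    """Return the statistics the +- might plausibly correspond to."""
    n = len(values)
    mean = statistics.fmean(values)
    pop_std = statistics.pstdev(values)    # divides by n
    sample_std = statistics.stdev(values)  # divides by n - 1
    sem = sample_std / math.sqrt(n)        # standard error of the mean
    # Normal-approximation half-widths of two-sided confidence intervals.
    ci_half = {
        level: statistics.NormalDist().inv_cdf(0.5 + level / 2) * sem
        for level in levels
    }
    # Naive one-pass variance E[x^2] - E[x]^2: exact arithmetic gives >= 0,
    # floating point can give a tiny negative number through cancellation.
    naive_var = statistics.fmean(v * v for v in values) - mean * mean
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "pop_std": pop_std,
        "sample_std": sample_std,
        "sem": sem,
        "ci_half": ci_half,
        "naive_var": naive_var,
    }


stats = candidate_stats(rmse_runs)
for key, value in stats.items():
    print(key, value)
```

If the +- matches none of these, that is what makes it puzzling.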
2) Model U is run on data sets V1-V22 to get 22 models. Each data set corresponds to one person, and I want to measure the 'average predicting power' when the model is trained only on that person's data. Again, I apply the Average operator to the performance vectors, and the same observations as in 1) apply (namely the negative variance and the mysterious +-). For the same reason as in 1), I also want to run this 10 times to get a more 'real' average performance (i.e. RMSE). This time, converting the performance vectors to data before averaging shows both a positive variance and a positive std, but the 'average' is no longer just the mean of the performance values! Values in the min/max range 0.550-0.589 produce an 'average' of 0.609.
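One hypothesis I checked (I have not confirmed that this is what the Average operator does) is whether the 'average' is a plain mean of the per-person RMSEs (macro average) or an RMSE pooled over all examples (micro average). The sketch below computes both from hypothetical per-person RMSEs and example counts. Because the pooled value is a count-weighted quadratic mean, it cannot leave the [min, max] range of its inputs. So neither definition on its own explains an 'average' of 0.609 from values in 0.550-0.589. The last part checks that averaging per-run averages, each over the same number of persons, equals the flat mean, so the nesting in the 10-run setup should not shift the result either.

```python
import math

# Hypothetical per-person results: (RMSE on that person's data, number of test examples).
per_person = [(0.55, 40), (0.57, 25), (0.589, 60), (0.56, 35)]


def macro_average(results):
    """Unweighted mean of the per-person RMSE values."""
    return sum(rmse for rmse, _ in results) / len(results)


def micro_average(results):
    """RMSE pooled over all examples: sqrt of the count-weighted mean of squared RMSEs."""
    total = sum(n for _, n in results)
    pooled_mse = sum(n * rmse ** 2 for rmse, n in results) / total
    return math.sqrt(pooled_mse)


macro = macro_average(per_person)
micro = micro_average(per_person)
lo = min(rmse for rmse, _ in per_person)
hi = max(rmse for rmse, _ in per_person)
print(f"macro={macro:.4f} micro={micro:.4f} range=[{lo}, {hi}]")

# Hypothetical RMSEs for 2 runs x 4 persons: a mean of equal-sized run means
# equals the flat mean over all values (up to floating-point rounding).
runs = [[0.55, 0.57, 0.589, 0.56], [0.56, 0.58, 0.575, 0.552]]
mean_of_means = sum(sum(r) / len(r) for r in runs) / len(runs)
flat_mean = sum(v for r in runs for v in r) / sum(len(r) for r in runs)
print(f"mean_of_means={mean_of_means:.4f} flat_mean={flat_mean:.4f}")
```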
tl;dr: What exactly does averaging performance vectors do under the hood, specifically regarding the +-? Why do I get these weird results?