I am trying to do repeated cross validation (e.g. 10x10, i.e. 10 CV runs with 10 folds each) and use the average over all runs as my performance measure. The problem is that the averages don't seem to be computed correctly.
I've attached the XML for a toy example using 2x2 cross validation. If you run it and look at the log, you'll see that the average of the four individual values is 0.291 (as correctly reported in the mikro average), but the makro average has a different (incorrect) value, and it is this value that is reported as the overall performance.
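The expected arithmetic can be checked by hand. Below is a minimal Python sketch, assuming placeholder per-fold values (these are not the numbers from the attached log) and two hypothetical helper names, `flat_average` and `average_of_run_averages`. It only illustrates that, when every run has the same number of folds, averaging all fold values at once and averaging the per-run averages should give the same result.

```python
from statistics import mean

# Hypothetical per-fold values for a 2x2 repeated CV (2 runs, 2 folds each).
# Placeholders only: these are NOT the values from the attached log.
runs = [
    [0.25, 0.35],  # run 1: fold 1, fold 2
    [0.30, 0.20],  # run 2: fold 1, fold 2
]


def flat_average(runs):
    """Mean over every fold value, ignoring which run it came from."""
    return mean(value for run in runs for value in run)


def average_of_run_averages(runs):
    """Mean of the per-run means."""
    return mean(mean(run) for run in runs)


flat = flat_average(runs)
nested = average_of_run_averages(runs)

print(f"flat average:            {flat:.3f}")
print(f"average of run averages: {nested:.3f}")
```

With equal-sized runs the two printed numbers should match, so a reported overall value that differs from both would point at how the tool aggregates rather than at the individual fold results.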
Any help on how to work around this issue would be highly appreciated.