Relative Overfitting Rate
I have a question regarding the calculation of the "relative overfitting rate". The background is that I want to compare different parameter settings with respect to their overfitting behavior.
The relative overfitting rate was proposed in:
Efron, B.; Tibshirani, R.: Improvements on Cross-Validation: The .632+ Bootstrap Method. Journal of the American Statistical Association 92 (1997), pp. 548–560.
In this paper the .632 bootstrap method is enhanced by a weighting mechanism, which is not relevant for this post. My question concerns the relative overfitting rate, which is defined in formula 28 of the paper. There, R is the relative overfitting rate, Êrr1 is the leave-one-out bootstrap error, and err is the "empirical error" (formula 7), i.e. the error of the model on its own training data. Formula 27 shows the calculation of gamma, the no-information error rate, for a binary classifier.
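For reference, this is how I read formulas 27 and 28 in the paper. The notation is my transcription and may differ slightly from the original: p̂1 is the observed proportion of labels equal to 1, and q̂1 is the observed proportion of predictions equal to 1.

```latex
% Formula 27: no-information error rate for a binary classifier
\hat{\gamma} = \hat{p}_1 (1 - \hat{q}_1) + (1 - \hat{p}_1)\,\hat{q}_1

% Formula 28: relative overfitting rate
\hat{R} = \frac{\widehat{\mathrm{Err}}^{(1)} - \overline{\mathrm{err}}}{\hat{\gamma} - \overline{\mathrm{err}}}
```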
Now here is the question:
Can anyone please explain to me how I can adapt this concept to a regression problem? I have a dataset of 30 attributes and about 300 examples, for which I predict a numerical label (range 0.01 to 0.1). I have trouble understanding the mathematics behind it, and the notation. I can retrieve Êrr1 from the bootstrap operator of RM, but how do I calculate the rest?
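To make the question concrete, here is a minimal sketch of what I think the calculation could look like for regression. It rests on assumptions I am not sure about: that squared error is used as the loss, that gamma is the general form from the paper (the average loss over all n² combinations of a true label y_i with a prediction for example j), and that Êrr1 is measured with the same loss. The function name and arguments are hypothetical and not part of RapidMiner.

```python
def relative_overfitting_rate(y_true, y_fit, err1):
    """Sketch of the relative overfitting rate for squared-error loss.

    y_true: true labels of the training examples
    y_fit:  predictions of the model on those same training examples
    err1:   leave-one-out bootstrap error (assumed to be a mean squared error)
    Returns (err, gamma, r).
    """
    n = len(y_true)

    # Empirical (apparent) error: mean loss on the training data itself.
    err = sum((y - f) ** 2 for y, f in zip(y_true, y_fit)) / n

    # No-information error: mean loss over all n^2 pairings of a label
    # with a prediction, as if labels and predictions were independent.
    gamma = sum((y - f) ** 2 for y in y_true for f in y_fit) / (n * n)

    # R is set to 0 when there is no measurable overfitting, and err1 is
    # capped at gamma so that R stays within [0, 1].
    if gamma <= err or err1 <= err:
        return err, gamma, 0.0
    r = (min(err1, gamma) - err) / (gamma - err)
    return err, gamma, r


if __name__ == "__main__":
    # Toy numbers only, not from my dataset.
    print(relative_overfitting_rate([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], 0.5))
```

If this reading is right, the only extra input besides Êrr1 would be the predictions of the model on its own training data.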
Any help is greatly appreciated.