# "Calculate confidence interval of RMSE"

Dear All,

I have two forecasting algorithms that each output a forecast for the temperature 24 hours ahead.

Algorithm A uses 1-nearest neighbours.

Algorithm B is a baseline algorithm, and simply outputs the last known temperature value as a prediction.

Let's say I calculate the Mean Squared Error and the variance of the squared error for A and B on a separate test set with N data points.

Then what is the confidence interval of MSE_A?

And what is the confidence interval of MSE_B?

Best regards,

Wessel


## Answers

diffErrMean = baseErrMean - predErrMean;

diffVarMean = baseVarMean + predVarMean; // variances add, assuming independent errors

stdErr = Math.sqrt(diffVarMean / N); // standard error of the difference of the means

z = Math.abs(diffErrMean / stdErr);

upper = diffErrMean + 1.96 * stdErr; // 1.96 = normal critical value for a 95% interval

lower = diffErrMean - 1.96 * stdErr;

(Where B = baseline = baseErrMean, and A = algorithm = predErrMean)
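A self-contained sketch of this difference-of-means interval may make the steps easier to check. It assumes the two `VarMean` values are sample variances of the squared errors, that the two error samples are independent (questionable here, since both algorithms run on the same test points), and that a normal critical value of 1.96 is acceptable. The class and method names are made up for illustration.

```java
/** Sketch: normal-approximation interval for the difference of two mean errors. */
final class DiffOfMeansInterval {

    /** Returns {lower, upper} for (baseErrMean - predErrMean). */
    static double[] interval(double baseErrMean, double baseVar,
                             double predErrMean, double predVar,
                             int n, double zCrit) {
        double diff = baseErrMean - predErrMean;
        // Variances add for independent samples; dividing by n gives the
        // variance of the difference of the two means.
        double stdErr = Math.sqrt((baseVar + predVar) / n);
        return new double[] { diff - zCrit * stdErr, diff + zCrit * stdErr };
    }

    public static void main(String[] args) {
        // Means and variances as reported in the post's printout (N = 13).
        double[] ci = interval(0.246, 0.141, 0.134, 0.067, 13, 1.96);
        System.out.printf("diff CI: [%.3f, %.3f]%n", ci[0], ci[1]);
    }
}
```

With the standard error taken as sqrt(variance / N), the numbers from the printout give roughly [-0.136, 0.360], which is wider than the interval shown in the printout.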

I can then print something like:

N: 13 // number of test points

Target: "temp"

Run time: 0.105 ms

predErrMean: 0.134 predVarMean: 0.067

baseErrMean: 0.246 baseVarMean: 0.141

diffErrMean: 0.113 +- 0.058 = [-0.003, 0.228] // kinda weird that this is already nearly significant with only 13 test points

Ratio: 1.843

You need the t distribution to judge z: with so few test points, the critical value at the 2.5% point should come from the t distribution rather than the normal.

But this is hard in plain Java, since the standard library has no CDF of the t distribution.

So for now I think I will assume the normal distribution and use confidence interval = MEAN +- 2 * S.D., where S.D. is the standard deviation of the mean.
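For a single algorithm's MSE, the normal-approximation rule could be sketched as below. This reads "S.D." as the standard deviation of the mean, i.e. the sample S.D. of the squared errors divided by sqrt(N); that reading, the class name `MseInterval`, and the example values are assumptions for illustration, not something confirmed in the thread.

```java
/** Sketch: normal-approximation confidence interval for a mean squared error. */
final class MseInterval {

    /** Returns {lower, mean, upper}; needs at least two squared errors. */
    static double[] interval(double[] squaredErrors, double zCrit) {
        int n = squaredErrors.length;
        double sum = 0.0;
        for (double e : squaredErrors) sum += e;
        double mean = sum / n;                    // this is the MSE

        double ss = 0.0;
        for (double e : squaredErrors) ss += (e - mean) * (e - mean);
        double variance = ss / (n - 1);           // unbiased sample variance
        double stdErr = Math.sqrt(variance / n);  // standard deviation of the mean

        return new double[] { mean - zCrit * stdErr, mean, mean + zCrit * stdErr };
    }

    public static void main(String[] args) {
        double[] sq = { 0.01, 0.04, 0.09, 0.16, 0.25 }; // made-up squared errors
        double[] ci = interval(sq, 2.0);
        System.out.printf("MSE = %.3f, interval [%.3f, %.3f]%n", ci[1], ci[0], ci[2]);
    }
}
```

Calling `interval` once with the squared errors of A and once with those of B gives one interval per algorithm. Because squared errors cannot be negative and are usually skewed, the lower bound can fall below zero for small N, which is a sign the normal approximation is rough.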

But then the problem is:

The differences are not normally distributed.

Take the most extreme difference possible: algorithm A has zero error and the baseline has some big error. Then the S.D. would be about as big as that error, and nothing would ever be significant.
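One way around the normality assumption, offered as a different technique from the one used above, is a percentile bootstrap on the per-point differences (baseline squared error minus algorithm squared error at each test point). The sketch below assumes the test points can be treated as independent, which may not hold for consecutive temperature readings. The class name `PairedBootstrap` and the example values are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch: percentile bootstrap interval for the mean paired difference. */
final class PairedBootstrap {

    /** Returns {lower, upper} of the central (1 - alpha) percentile interval. */
    static double[] interval(double[] diffs, int resamples, double alpha, long seed) {
        Random rng = new Random(seed);   // fixed seed keeps the result repeatable
        int n = diffs.length;
        double[] means = new double[resamples];
        for (int r = 0; r < resamples; r++) {
            double sum = 0.0;
            // Draw n differences with replacement and record their mean.
            for (int i = 0; i < n; i++) sum += diffs[rng.nextInt(n)];
            means[r] = sum / n;
        }
        Arrays.sort(means);
        int lo = (int) Math.floor((alpha / 2) * (resamples - 1));
        int hi = (int) Math.ceil((1 - alpha / 2) * (resamples - 1));
        return new double[] { means[lo], means[hi] };
    }

    public static void main(String[] args) {
        // Made-up per-point differences: baseline error minus algorithm error.
        double[] diffs = { 0.30, -0.05, 0.12, 0.40, 0.02, 0.18, -0.10, 0.25 };
        double[] ci = interval(diffs, 2000, 0.05, 42L);
        System.out.printf("95%% bootstrap interval: [%.3f, %.3f]%n", ci[0], ci[1]);
    }
}
```

If the resulting interval excludes zero, the difference would count as significant at the chosen level without assuming a normal or t distribution. With only 13 points the bootstrap is itself fairly rough.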