
"Calculate confidence interval of RMSE"

wessel Member Posts: 537 Maven
edited June 2019 in Help
Dear All,

I have two forecasting algorithms that output a forecast of the temperature 24 hours ahead.
Algorithm A uses 1-nearest neighbours.
Algorithm B is a baseline algorithm, and simply outputs the last known temperature value as a prediction.

Let's say I calculate the Mean Squared Error and the Variance of the Squared Error for A and B on a separate test set with N data points.
Then what is the confidence interval of MSE_A?
And what is the confidence interval of MSE_B?

Best regards,

Wessel
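A minimal Java sketch of one common answer, assuming the per-point actual and predicted temperatures are available and N is large enough for a normal approximation: treat the N squared errors as a sample, so the MSE is their mean and its standard error is sqrt(Variance of the Squared Error / N). Because the square root is monotonic, square-rooting the MSE endpoints gives an interval for the RMSE in the title. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Sketch (hypothetical names): normal-approximation CI for the MSE and RMSE of one forecaster. */
public class MseConfidenceInterval {

    /**
     * Returns {mse, lower, upper}, where the interval is mse +/- z * s / sqrt(N)
     * and s is the sample standard deviation of the per-point squared errors.
     * Requires at least 2 test points.
     */
    static double[] mseInterval(double[] actual, double[] predicted, double z) {
        int n = actual.length;
        double[] sq = new double[n];
        for (int i = 0; i < n; i++) {
            double e = actual[i] - predicted[i];
            sq[i] = e * e; // squared error on test point i
        }
        double mse = Arrays.stream(sq).average().orElse(Double.NaN);
        double ss = 0.0;
        for (double v : sq) {
            ss += (v - mse) * (v - mse);
        }
        double se = Math.sqrt(ss / (n - 1)) / Math.sqrt(n); // standard error of the mean
        // An MSE cannot be negative, so the lower bound is clamped at 0.
        return new double[] {mse, Math.max(0.0, mse - z * se), mse + z * se};
    }

    public static void main(String[] args) {
        double[] actual = {1.0, 2.0, 3.0, 4.0};     // made-up example values
        double[] predicted = {1.5, 1.5, 3.0, 5.0};
        double[] ci = mseInterval(actual, predicted, 1.96); // 1.96: 95% normal quantile
        System.out.printf("MSE  %.4f, 95%% CI [%.4f, %.4f]%n", ci[0], ci[1], ci[2]);
        // sqrt is monotonic, so square-rooting the endpoints gives an RMSE interval
        System.out.printf("RMSE %.4f, 95%% CI [%.4f, %.4f]%n",
                Math.sqrt(ci[0]), Math.sqrt(ci[1]), Math.sqrt(ci[2]));
    }
}
```

Running the same function on A's and on B's predictions gives separate intervals for MSE_A and MSE_B.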

Answers

  • wessel Member Posts: 537 Maven
    I have solved this problem as follows, although I'm not sure it is correct:

    // B = baseline (baseErrMean), A = algorithm (predErrMean)
    double diffErrMean = baseErrMean - predErrMean;  // baseline mean error minus algorithm mean error
    double diffVarMean = baseVarMean + predVarMean;  // sum of the two error variances
    double varOverSqrtN = diffVarMean / Math.sqrt(N);
    double z = Math.abs(diffErrMean / varOverSqrtN);
    double upper = diffErrMean + z * diffVarMean;
    double lower = diffErrMean - z * diffVarMean;
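For comparison, a sketch of a paired version, assuming the per-point predictions of A and B are available (all names below are hypothetical). Because A and B are scored on the same test points, their errors are usually correlated, so this works with the per-point differences d_i = e_B,i^2 - e_A,i^2 instead of adding the two variances, and it divides the standard deviation (not the variance) by sqrt(N). The multiplier crit is an input: about 2 for a normal approximation, or a t critical value.

```java
/** Sketch (hypothetical names): paired CI for MSE_B - MSE_A on the same test points. */
public class PairedMseDifference {

    /**
     * Returns {meanDiff, lower, upper} for mean(d) +/- crit * s_d / sqrt(N),
     * where d_i = (baseline error)^2 - (algorithm error)^2 on test point i.
     * Requires at least 2 test points.
     */
    static double[] pairedInterval(double[] actual, double[] predA, double[] predB, double crit) {
        int n = actual.length;
        double[] d = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double ea = actual[i] - predA[i];
            double eb = actual[i] - predB[i];
            d[i] = eb * eb - ea * ea; // positive when A beats the baseline on point i
            sum += d[i];
        }
        double mean = sum / n;
        double ss = 0.0;
        for (double v : d) {
            ss += (v - mean) * (v - mean);
        }
        double se = Math.sqrt(ss / (n - 1)) / Math.sqrt(n); // standard error of mean(d)
        return new double[] {mean, mean - crit * se, mean + crit * se};
    }

    public static void main(String[] args) {
        double[] actual = {1.0, 2.0, 3.0, 4.0};   // made-up example values
        double[] predA = {1.0, 2.0, 3.0, 5.0};    // "algorithm A"
        double[] predB = {2.0, 2.0, 5.0, 5.0};    // "baseline B"
        double[] r = pairedInterval(actual, predA, predB, 2.0);
        System.out.printf("MSE_B - MSE_A = %.4f, CI [%.4f, %.4f]%n", r[0], r[1], r[2]);
    }
}
```

The interval excluding 0 is then the usual reading of "A is significantly better than the baseline" at that confidence level.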

    I can then print something like:
    N: 13 // number of test points
    Target: "temp"
    Run time: 0.105 ms
    predErrMean: 0.134  predVarMean: 0.067
    baseErrMean: 0.246  baseVarMean: 0.141
    diffErrMean: 0.113 +- 0.058 = [-0.003, 0.228] // kinda weird that this is already nearly significant with only 13 test points
    Ratio:  1.843
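As a check on the printed numbers: the +- 0.058 term matches varOverSqrtN, and the printed interval appears to be diffErrMean +- 2 * varOverSqrtN rather than the z-based upper/lower lines of the snippet:

```latex
\begin{aligned}
\text{diffVarMean} &= 0.067 + 0.141 = 0.208,\\
\text{varOverSqrtN} &= \frac{0.208}{\sqrt{13}} \approx \frac{0.208}{3.606} \approx 0.0577,\\
0.113 \pm 2 \times 0.0577 &\approx [-0.002,\ 0.228],
\end{aligned}
```

which agrees with the printed [-0.003, 0.228] up to rounding of the displayed means.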
  • wessel Member Posts: 537 Maven
    Okay, this does not make any sense.

    You need to use the inverse CDF of the t distribution to get the critical value at the 2.5% point.

    But this is hard in Java, since the standard library has no t-distribution CDF.
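For integer degrees of freedom, the two-sided probability P(|T| < t) has a classical closed-form series in theta = atan(t / sqrt(nu)), so a critical value can be found by bisection using only java.lang.Math. A sketch under that assumption (hypothetical class name; nu would be N - 1 for a paired test on N points):

```java
/** Sketch (hypothetical names): two-sided Student t critical values for integer degrees of freedom. */
public class TCritical {

    /** P(|T| < t) for T ~ Student t with nu >= 1 (integer) degrees of freedom. */
    static double centralProb(double t, int nu) {
        double theta = Math.atan(t / Math.sqrt(nu));
        double s = Math.sin(theta), c = Math.cos(theta), c2 = c * c;
        if (nu % 2 == 0) {
            // even nu: sin(theta) * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...] up to cos^(nu-2)
            double term = 1.0, sum = 1.0;
            for (int k = 2; k <= nu - 2; k += 2) {
                term *= (k - 1.0) / k * c2;
                sum += term;
            }
            return s * sum;
        }
        // odd nu: (2/pi) * [theta + sin(theta) * (cos + (2/3)cos^3 + ...)] up to cos^(nu-2)
        double sum = 0.0;
        if (nu >= 3) {
            double term = c;
            sum = c;
            for (int k = 3; k <= nu - 2; k += 2) {
                term *= (k - 1.0) / k * c2;
                sum += term;
            }
        }
        return 2.0 / Math.PI * (theta + s * sum);
    }

    /** The t with P(|T| < t) = confidence, e.g. confidence = 0.95 for the 2.5% point. */
    static double tCritical(int nu, double confidence) {
        double lo = 0.0, hi = 1.0;
        while (centralProb(hi, nu) < confidence) {
            hi *= 2.0; // grow the bracket until it contains the root
        }
        for (int i = 0; i < 200; i++) { // bisection: centralProb is increasing in t
            double mid = 0.5 * (lo + hi);
            if (centralProb(mid, nu) < confidence) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        int n = 13; // test points, as in the output above
        System.out.printf("t critical (df=%d, 95%%): %.4f%n", n - 1, tCritical(n - 1, 0.95));
    }
}
```

For N = 13 paired points this gives roughly 2.18 at 95%, somewhat wider than the normal 1.96. If adding a dependency is acceptable, Apache Commons Math's `TDistribution` class (`inverseCumulativeProbability`) provides the same quantity directly.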

    So for now I think I will assume the normal distribution and use confidence interval = MEAN +- 2 * S.D.

    But then the problem is:
    The differences are not normally distributed.
    In the most extreme case, algorithm A has zero error and the baseline has some big error; then that big error would be equal to the S.D.
    And nothing would ever be significant.
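One way to avoid assuming normally distributed differences, swapped in here and not something proposed in the thread, is a paired percentile bootstrap: resample the N per-point differences d_i with replacement, recompute their mean many times, and take the 2.5% and 97.5% percentiles of those means. A minimal sketch with hypothetical names and a fixed seed for reproducibility:

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch (hypothetical names): percentile bootstrap CI for the mean of paired differences. */
public class BootstrapDiff {

    /** Returns {lower, upper} of a percentile bootstrap interval for mean(d). */
    static double[] percentileInterval(double[] d, int resamples, double confidence, long seed) {
        Random rng = new Random(seed);
        int n = d.length;
        double[] means = new double[resamples];
        for (int b = 0; b < resamples; b++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += d[rng.nextInt(n)]; // draw test points with replacement
            }
            means[b] = sum / n;
        }
        Arrays.sort(means);
        double alpha = 1.0 - confidence;
        int loIdx = (int) Math.floor(alpha / 2.0 * (resamples - 1));
        int hiIdx = (int) Math.ceil((1.0 - alpha / 2.0) * (resamples - 1));
        return new double[] {means[loIdx], means[hiIdx]};
    }

    public static void main(String[] args) {
        double[] d = {1.0, 0.0, 4.0, 0.0}; // made-up per-point differences e_B^2 - e_A^2
        double[] ci = percentileInterval(d, 10000, 0.95, 42L);
        System.out.printf("bootstrap 95%% CI for mean difference: [%.4f, %.4f]%n", ci[0], ci[1]);
    }
}
```

With only 13 test points the bootstrap interval is itself rough, so it is best read as a sanity check alongside the t-based interval.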