How to create confidence intervals for numeric predictions?
Hey Community,
I would like to know how I can generate confidence intervals around the numeric predictions I get from different algorithms. I have around three years of data with values for my label, the amount of shipments that has to be processed. Until now I have only used the RMSE from the Performance operator to compare different algorithms or their parameters.
Can you please give me some advice on how I could create confidence intervals, for example a 95% interval around the predicted values, so I can show users the expected range of shipments?
Thanks in advance.
Answers
Dear Björn,
I think this is hardly possible, at least in a model-agnostic fashion. I do not know of an algorithm that does this for every model. There are some tricks, such as simulations, but they all make strong assumptions about the underlying distributions.
What model are you using?
cheers,
martin
Dortmund, Germany
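To make the "simulation" idea a bit more concrete, here is a minimal sketch in Python (not a RapidMiner process) of one such trick: resample the errors the model made on a hold-out set and add them to a new prediction. The function name, the example numbers and the choice of 10,000 draws are all assumptions for illustration. It also rests on exactly the kind of strong assumption mentioned above: that future errors look like the hold-out errors and do not depend on the inputs. Strictly speaking the result is a prediction interval rather than a confidence interval.

```python
import random

def simulate_interval(prediction, holdout_residuals, coverage=0.95,
                      n_draws=10000, seed=0):
    """Simulate plausible outcomes by adding resampled hold-out errors
    (actual - predicted) to a new point prediction."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    draws = sorted(prediction + rng.choice(holdout_residuals)
                   for _ in range(n_draws))
    alpha = (1.0 - coverage) / 2.0
    lower = draws[int(alpha * (n_draws - 1))]
    upper = draws[int((1.0 - alpha) * (n_draws - 1))]
    return lower, upper

# Hypothetical hold-out errors of a shipment model (actual - predicted).
residuals = [-30, -12, -5, 0, 4, 9, 15, 28]

# Range to show next to a new point prediction of 500 shipments.
low, high = simulate_interval(500, residuals)
print(low, high)
```

With only a handful of hold-out errors the bounds simply land on the most extreme errors seen so far, so in practice one would want many more residuals, ideally from cross-validation.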
Dear Martin,
thanks for your response. Right now the model I am using is Gradient Boosted Trees.
@mschmitz
I had a think about how to do something similar, based on the LIME approach. What about measuring the difference between the prediction and the actual value for each record, discretizing that into bands, and then building a second model to predict how accurately the first model is likely to predict a given record (that is, which error band it falls into)?
It's still a work in progress, and I think it needs a bit more thought about upper and lower bounds rather than just the difference between prediction and reality. But I'm posting the idea here to get some feedback.
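A rough sketch of the banding idea in Python, in case it helps the discussion. Everything here is an assumption for illustration: the band edges (5 and 15 shipments), the two made-up attributes, and the tiny k-nearest-neighbour vote, which merely stands in for whatever learner would actually be used as the second model. The errors should come from records the first model has not been trained on, otherwise the bands will look too optimistic.

```python
from collections import Counter

def error_band(actual, predicted, edges=(5.0, 15.0)):
    """Map an absolute error to a band index: 0 = small, 1 = medium, 2 = large."""
    err = abs(actual - predicted)
    return sum(err > e for e in edges)

def knn_band(query, features, bands, k=3):
    """Predict the error band of a new record by majority vote
    among its k nearest training records (squared Euclidean distance)."""
    ranked = sorted(
        range(len(features)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(features[i], query)),
    )
    votes = Counter(bands[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical hold-out records: (day_of_week, promo_flag), actual, predicted.
features = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 1)]
actuals = [100, 102, 98, 101, 140, 150, 160]
preds = [101, 100, 99, 103, 120, 128, 170]

# Step 1: discretize the first model's errors into bands.
bands = [error_band(a, p) for a, p in zip(actuals, preds)]

# Step 2: the second model predicts the band for a new record.
print(knn_band((6, 1), features, bands))
print(knn_band((2, 0), features, bands))
```

The output is a band label rather than an interval, so the open question about upper and lower bounds still applies: the bands say how large the error tends to be, not in which direction.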
@JEdward,
Clever. A PhD student I worked with earlier in my career uses a deep learning model in TensorFlow to predict abs(prediction - label), with some success. One needs to keep in mind what this does and does not include: this approach covers neither the uncertainty (standard deviations) of the model parameters nor the measurement uncertainty of the input attributes. The latter is tough to cover at all.
But it's a neat trick.
~Martin
Dortmund, Germany
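For anyone who wants to try the abs(prediction - label) idea, here is a minimal Python sketch. It swaps the deep learning model for a one-attribute least-squares line purely to keep the example short; the attribute, the data and the function names are hypothetical. The band it returns is the prediction plus or minus the expected absolute error, which is a "typical error" range and not a calibrated 95% interval, and it carries the limitations noted above: parameter uncertainty and input measurement uncertainty are not covered.

```python
def fit_line(x, y):
    """Ordinary least squares for one attribute: returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def error_range(prediction, attribute, slope, intercept):
    """Prediction +/- the expected absolute error for this attribute value."""
    expected_abs_error = max(0.0, slope * attribute + intercept)
    return prediction - expected_abs_error, prediction + expected_abs_error

# Hypothetical hold-out data: one attribute, first-model predictions, labels.
attribute = [1, 2, 3, 4, 5]
preds = [100, 100, 100, 100, 100]
labels = [102, 96, 106, 92, 110]

# Target of the second model: abs(prediction - label).
abs_errors = [abs(p - l) for p, l in zip(preds, labels)]
slope, intercept = fit_line(attribute, abs_errors)

# Range around a new prediction of 200 for a record with attribute value 6.
print(error_range(200, 6, slope, intercept))
```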