Linear Regression: error in calculation of tolerance
I am writing training materials for multiple regression. The Linear Regression operator appears to calculate tolerance incorrectly.
To illustrate, see the attached toy dataset. My process reads this data and uses Linear Regression to fit y = f(x1, x2, x3, x4). The model is then applied to the training data (just to keep things simple), and finally I use Performance to get R-squared. The result is:
Attribute Coefficient Standard Error Std. Coefficient Tolerance t-stat p-value code
I cross-checked the results with Minitab. RapidMiner and Minitab agree on everything except tolerance. Minitab reports VIFs rather than tolerances, but a VIF is simply the reciprocal of the tolerance. Here is the Minitab output:
Term      Coef       SE Coef   T-Value  P-Value  VIF
Constant  -0.328     0.161     -2.03    0.073
x1        0.6099     0.0971    6.28     0.000    2.53
x2        -0.000000  0.000000  -0.15    0.888    5.58
x3        0.1783     0.0821    2.17     0.058    19.54
x4        -0.001083  0.000783  -1.38    0.200    18.24
The VIFs are a long way from the reciprocals of the tolerances.
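To make the comparison concrete, here is a minimal Python sketch that converts the Minitab VIFs quoted above into the tolerances they imply (tolerance = 1 / VIF). The variable names are my own; only the four VIF values come from the Minitab output.

```python
# VIFs copied from the Minitab output above (the Constant term has no VIF).
minitab_vif = {"x1": 2.53, "x2": 5.58, "x3": 19.54, "x4": 18.24}

# Tolerance is the reciprocal of the VIF.
implied_tolerance = {term: 1.0 / vif for term, vif in minitab_vif.items()}

for term, tol in implied_tolerance.items():
    print(f"{term}: VIF = {minitab_vif[term]:.2f}, implied tolerance = {tol:.3f}")
```

These are the values the Tolerance column would need to show for the two packages to agree.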
I calculated the values directly: tolerance = 1 - R-sq, where R-sq is obtained by regressing that x on all the other xs. For example, if I drop y, make x4 the label, and re-run the process, I get an R-sq of 94.5%, so the tolerance for x4 should be 0.055, not 0.262.
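The manual check described above can be sketched in plain Python: each predictor is regressed on all the others (with an intercept) and the tolerance is taken as 1 - R-sq. This is only an illustration of that definition, not RapidMiner's or Minitab's implementation. The helper names (`solve`, `r_squared`, `tolerances`) and the small `toy` dataset are hypothetical, since the attached data is not reproduced here.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no perfectly collinear predictors).
    """
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_squared(target, predictors):
    """R-sq of a least-squares fit of target on predictors plus an intercept."""
    n = len(target)
    # Design matrix rows: [1, p1, p2, ...].
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean = sum(target) / n
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


def tolerances(columns):
    """Map each column name to 1 - R-sq from regressing it on all other columns."""
    result = {}
    for name, values in columns.items():
        others = [v for other, v in columns.items() if other != name]
        result[name] = 1.0 - r_squared(values, others)
    return result


# Hypothetical data for illustration only; NOT the attached toy dataset.
toy = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "x3": [1.0, 3.0, 2.0, 5.0, 4.0, 7.0],
}
for name, tol in tolerances(toy).items():
    print(f"{name}: tolerance = {tol:.3f}, VIF = {1.0 / tol:.2f}")
```

Passing only the x columns (y left out) mirrors the "drop y and make one x the label" step. The normal-equations solve is chosen for brevity; for larger or nearly collinear data a QR-based least-squares routine would be the more numerically stable choice.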
Am I going wrong, or is it an error?