LinearRegression vs W-LinearRegression
I have a data set that I've applied PCA to, obtaining 9 principal components as inputs to the regression.
I'm using XValidationParallel with 20 validations and shuffled sampling.
Within the XVal node, I build either a LinearRegression or a W-LinearRegression model, apply it, and measure its performance. The performance reported is the average RMS error.
Both regression nodes have attribute selection turned off and are not trying to eliminate collinear features. The other parameters are at their default settings.
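To make the setup concrete, the validation loop described above corresponds roughly to the NumPy sketch below. It is only an illustration under assumptions: the data is synthetic (not my real principal components), `shuffled_kfold_rmse` is a hypothetical helper name, and the fit is plain least squares with an intercept rather than either RapidMiner operator.

```python
import numpy as np

def shuffled_kfold_rmse(X, y, n_folds=20, seed=0):
    """Mean and std of the per-fold RMSE for a least-squares fit (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))          # shuffled sampling
    folds = np.array_split(indices, n_folds)   # 20 roughly equal test folds
    rmses = []
    for test_idx in folds:
        train_idx = np.setdiff1d(indices, test_idx)
        # Append an intercept column and fit ordinary least squares on the training part.
        A_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Apply the model to the held-out fold and measure its RMSE.
        A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
        residuals = A_test @ coef - y[test_idx]
        rmses.append(np.sqrt(np.mean(residuals ** 2)))
    return float(np.mean(rmses)), float(np.std(rmses))

# Synthetic stand-in for the 9 principal components (assumption, not the real data).
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 9))
y = X @ np.arange(1.0, 10.0) + 2.0 + rng.normal(scale=0.5, size=400)

mean_rmse, std_rmse = shuffled_kfold_rmse(X, y)
print(f"root_mean_squared_error: {mean_rmse:.3f} +/- {std_rmse:.3f}")
```

The printed value is the average of the per-fold errors with their standard deviation, which is how I read the first part of the performance figures in the results.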
The results I'm getting are below. Note that the coefficients differ, as do the RMS error estimates.
I thought the two models would yield near-identical results, so I'm confused about what's causing the difference, and whether I'd be better off using the Weka LinearRegression, since it yielded a lower error.
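The reason I expected agreement: principal components are uncorrelated, so an unregularized least-squares fit on them has a single well-defined solution, and two correct implementations should match to rounding error. A minimal sketch of that check, assuming NumPy and synthetic data (the function names are hypothetical, and neither function is the RapidMiner or Weka implementation):

```python
import numpy as np

def ols_normal_equations(X, y):
    """Least squares via the normal equations (A^T A) w = A^T y."""
    A = np.column_stack([X, np.ones(len(y))])   # last column is the intercept
    return np.linalg.solve(A.T @ A, A.T @ y)

def ols_lstsq(X, y):
    """Least squares via NumPy's SVD-based solver."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Synthetic correlated inputs (assumption), projected onto their principal components.
rng = np.random.default_rng(0)
raw = rng.normal(size=(300, 9)) @ rng.normal(size=(9, 9))
centered = raw - raw.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pcs = centered @ vt.T                            # 9 mutually uncorrelated columns
y = pcs @ rng.normal(size=9) + rng.normal(scale=1.0, size=300)

coef_a = ols_normal_equations(pcs, y)
coef_b = ols_lstsq(pcs, y)
max_diff = float(np.max(np.abs(coef_a - coef_b)))
print("max coefficient difference:", max_diff)
```

If two solvers disagree by much more than rounding on the same inputs, something other than the fit itself presumably differs, for example preprocessing or a regularization setting.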
This is with RM 4.4.
W-LinearRegression (Weka):
Linear Regression Model
5.5846 * pc_1 +
-1.757 * pc_2 +
-1.018 * pc_3 +
-1.3188 * pc_4 +
0.5875 * pc_5 +
-0.7379 * pc_6 +
3.8062 * pc_7 +
1.3037 * pc_8 +
0.5423 * pc_9 +
root_mean_squared_error: 17.360 +/- 0.512 (mikro: 17.367 +/- 0.000)
LinearRegression (RapidMiner):
3.547 * pc_1
- 0.473 * pc_2
- 1.579 * pc_3
- 1.314 * pc_4
- 1.693 * pc_5
- 0.131 * pc_6
- 0.111 * pc_7
- 1.802 * pc_8
- 1.016 * pc_9
root_mean_squared_error: 20.991 +/- 0.596 (mikro: 21.001 +/- 0.000)