I built a simple regression model (5 independent variables) and computed the predicted values to estimate performance indicators (mean squared error, absolute error and R2). Afterwards, I used RapidMiner to build a neural net (with cross validation to optimize the parameters). The issue is with the performance results when applying the best neural net to the whole dataset in order to compare it with the regression. I got an average squared error of 10.1379 with the regression and 11.749 +/- 15.630 with the neural net, but the squared correlation of the regression is only 0.732 while the RapidMiner output for the neural net is 0.814. This doesn't make sense to me. Could anyone help? Am I comparing different things?
Thanks in advance,
I may be missing some maths here, but to me it makes little sense to perform cross validation on linear regression. Unlike ANN (and also SVM and other methods), which require random initialization of the connection weights and may therefore lead to different solutions if local minima exist, LR does not suffer from this issue. The drawback is that the structure of the regression equation needs to be known or defined beforehand, whereas with the ANN it does not.
So, no, I haven't used cross validation with the LR. Moreover, I ended up using SPSS to build the regression because the R2 was higher (I assume there may be slight differences in the algorithms used to determine the regression coefficients). Also, after determining the best ANN model configuration using cross validation, I simply applied it once to the whole dataset so that it would be comparable with the LR. It made little sense to compare the performance of an LR on all the data with an ANN on only part of the data.
Still, my question was different: is it possible to have two models whose rankings differ depending on whether the average squared error or the R2 is used? I was expecting the model with the lowest average squared error to have the highest R2.
On LR and X-Val: I would argue that you should put an X-Val around any model. Of course, the effect of overfitting is more extreme the more parameters you estimate in your model. LR by itself (without any regularization, family choices and so on) has fairly limited complexity, so the effect is smaller than in "complex" models (even though k-NN is one of the worst ones without X-Val, and I wouldn't call it complex).
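To make the point about putting an X-Val around LR concrete, here is a minimal sketch in Python with NumPy. It is not the RapidMiner or SPSS procedure from this thread. The data are synthetic, and the helper names (fit_linear, predict_linear, kfold_mse) are made up for illustration. It compares the error of an LR fitted and scored on all the data with the error from 10-fold cross validation.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares with an explicit intercept column.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def kfold_mse(X, y, k=10, seed=0):
    # Shuffle once, split into k folds, and score each held-out fold
    # with a model fitted on the remaining k-1 folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coef = fit_linear(X[train], y[train])
        err = y[test] - predict_linear(coef, X[test])
        scores.append(np.mean(err ** 2))
    return np.array(scores)

# Synthetic stand-in data: 150 rows, 5 independent variables (assumption).
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + rng.normal(0.0, 3.0, size=150)

# Error when the model is fitted and scored on the same data.
coef_all = fit_linear(X, y)
train_mse = float(np.mean((y - predict_linear(coef_all, X)) ** 2))

# Error estimated on held-out folds.
cv_scores = kfold_mse(X, y, k=10)

print("MSE on all data:   ", round(train_mse, 3))
print("10-fold MSE (mean):", round(float(cv_scores.mean()), 3))
print("10-fold MSE (std): ", round(float(cv_scores.std()), 3))
```

For plain least squares the gap between the two numbers is usually small, which matches the remark that LR has limited complexity. The cross-validated figure is still the one that can fairly be compared with a cross-validated ANN.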
On the other point: first, just because a model is good on one measure does not mean that it is good on the others. Maybe squared error and R2 are measuring something different here.
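A small sketch of how the two measures can disagree, assuming the reported figure is the squared Pearson correlation between predictions and actual values (as the term "squared correlation" suggests). That quantity does not change if the predictions are shifted or rescaled, whereas the squared error does. The data and both "models" below are synthetic and purely illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def squared_correlation(y_true, y_pred):
    # Squared Pearson correlation between predictions and actual values.
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return float(r ** 2)

def r2_determination(y_true, y_pred):
    # Coefficient of determination, 1 - SSE/SST.
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - sse / sst)

rng = np.random.default_rng(0)
y = rng.normal(50.0, 6.0, size=200)

# Model A: unbiased predictions with moderate noise.
pred_a = y + rng.normal(0.0, 3.0, size=200)

# Model B: follows y more tightly, but its output is shrunk and shifted.
pred_b = 0.5 * (y + rng.normal(0.0, 1.5, size=200)) + 30.0

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name,
          "MSE:", round(mse(y, pred), 3),
          "squared correlation:", round(squared_correlation(y, pred), 3),
          "1 - SSE/SST:", round(r2_determination(y, pred), 3))
```

Here model B has the higher squared correlation and also the higher squared error. On one and the same dataset, 1 - SSE/SST always ranks models the same way as the mean squared error, so a disagreement like the one in the question points to the squared-correlation definition, to the two tools evaluating on different data, or to both.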
The interesting part is the std_dev of your ANN. It is extreme. In other words, on some data sets it works perfectly and on others it totally crashes. Maybe you were just on the lucky side of the world.
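Before reading too much into the +/- value, it is worth checking what it was computed over. The thread does not say whether it is the spread of the per-fold errors or the spread of the squared errors of the individual examples, so the sketch below (synthetic residuals, illustrative names) simply shows how different the two can be for the same predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, well-behaved residuals: zero mean, normally distributed.
residuals = rng.normal(0.0, 3.4, size=5000)
sq_err = residuals ** 2

# Spread over individual examples.
per_example_mean = float(sq_err.mean())
per_example_std = float(sq_err.std())

# Spread over 10 equally sized folds of the same squared errors.
fold_mse = np.array([fold.mean() for fold in np.array_split(sq_err, 10)])

print("mean squared error:       ", round(per_example_mean, 3))
print("std over examples:        ", round(per_example_std, 3))
print("std over 10 fold averages:", round(float(fold_mse.std()), 3))
```

For normally distributed residuals, the standard deviation of the individual squared errors is about 1.41 times their mean, so a spread larger than the mean is not unusual if it is taken over examples. Taken over fold averages, a spread that large would indeed indicate a very unstable model.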