
"Linear regression beats ANN"

chaosbringer Member Posts: 21 Contributor II
edited June 2019 in Help
Hi,
I have a dataset of 1000 samples and 19 attributes. It is housing data (living area, presence of heating, bath, neighborhood characteristics, etc.). The target value is the house price. Eight of the attributes are binary.
If I apply linear regression to this dataset, the results are far superior to those of an ANN, although from my understanding the data is too complex for linear regression.
Decision trees and SVMs are also inferior to linear regression.
Do you have any advice on how I can validate the results and check why linear regression is that good?


Thank you very much.

Answers

  • wessel Member Posts: 537 Maven
    Use cross validation.
    A linear model with 19 parameters is still a fairly complex model.
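As a rough illustration of the cross-validation advice above, here is a minimal pure-Python sketch. It is not RapidMiner's implementation; the helper names (`k_fold_mse`, `fit_mean`, `fit_line`) and the synthetic single-attribute data are assumptions made up for this example. It scores two models on the same folds so that their errors can be compared fold by fold.

```python
import random

def k_fold_mse(xs, ys, fit, k=10, seed=0):
    """Return the held-out mean squared error of each of k folds.

    `fit(train_x, train_y)` must return a function mapping x to a prediction.
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint index sets
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        sq_err = [(predict(xs[i]) - ys[i]) ** 2 for i in test_idx]
        scores.append(sum(sq_err) / len(sq_err))
    return scores

def fit_mean(train_x, train_y):
    """Baseline: always predict the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y

def fit_line(train_x, train_y):
    """Ordinary least squares with a single attribute."""
    n = len(train_x)
    mx, my = sum(train_x) / n, sum(train_y) / n
    sxx = sum((x - mx) ** 2 for x in train_x)
    sxy = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

# Synthetic stand-in data (an assumption, not the poster's dataset):
# price is roughly linear in living area, plus Gaussian noise.
rng = random.Random(42)
areas = [rng.uniform(50, 250) for _ in range(1000)]
prices = [1.5 * a + 20 + rng.gauss(0, 10) for a in areas]

mean_scores = k_fold_mse(areas, prices, fit_mean)
line_scores = k_fold_mse(areas, prices, fit_line)
print("baseline MSE per fold:", [round(s, 1) for s in mean_scores])
print("linear   MSE per fold:", [round(s, 1) for s in line_scores])
```

Because both models use the same seed, they are evaluated on identical folds, which is what makes a later fold-by-fold comparison meaningful.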
  • chaosbringer Member Posts: 21 Contributor II
    Hi,
    thank you for your answer.

    Yes, the data is still complex. But I still do not understand why the ANN is so bad.
    Even with cross validation I get:
    MSQE with lin. reg: 0.34
    MSQE with ANN: 0.54

    Is there an explanation for this? How can I shed some light on the details? Why is the ANN so bad in comparison to lin. reg?

    Thank you very much.

  • wessel Member Posts: 537 Maven
    Make a convergence plot.
    E.g. measure the RMSE at every iteration.
    Maybe you need to train your network for many more iterations.
    With 19 inputs, your network gets very big, very fast, so you have lots of weights to optimize.
    An alternative problem could be premature convergence, e.g. getting stuck in local optima.

    Best regards,

    Wessel
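The convergence plot suggested above can be approximated without any plotting library by recording the RMSE after every epoch. The following is a minimal sketch under stated assumptions: a tiny one-hidden-layer network written by hand (not RapidMiner's Neural Net operator), a made-up nonlinear target y = x², and hypothetical names such as `train` and `history`.

```python
import math
import random

def predict(net, x):
    """Forward pass: one input, tanh hidden layer, linear output."""
    hidden = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
    return net["b2"] + sum(w * h for w, h in zip(net["w2"], hidden))

def rmse(net, data):
    return math.sqrt(sum((predict(net, x) - y) ** 2 for x, y in data) / len(data))

def train(train_data, valid_data, hidden=4, epochs=200, lr=0.05, seed=0):
    """Train with per-sample gradient descent; return (net, history).

    history[0] is the RMSE before training, history[e] the RMSE after
    epoch e, each as a (train_rmse, valid_rmse) pair.
    """
    rng = random.Random(seed)
    net = {
        "w1": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
        "b2": 0.0,
    }
    history = [(rmse(net, train_data), rmse(net, valid_data))]
    order = list(train_data)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            h = [math.tanh(w * x + b) for w, b in zip(net["w1"], net["b1"])]
            err = net["b2"] + sum(w * hj for w, hj in zip(net["w2"], h)) - y
            for j in range(hidden):
                # Gradient w.r.t. the hidden pre-activation, using the old w2[j].
                grad_pre = err * net["w2"][j] * (1.0 - h[j] ** 2)
                net["w2"][j] -= lr * err * h[j]
                net["w1"][j] -= lr * grad_pre * x
                net["b1"][j] -= lr * grad_pre
            net["b2"] -= lr * err
        history.append((rmse(net, train_data), rmse(net, valid_data)))
    return net, history

# Made-up nonlinear data: y = x^2 on [-1, 1].
data_rng = random.Random(1)
points = [(x, x * x) for x in (data_rng.uniform(-1, 1) for _ in range(120))]
train_data, valid_data = points[:90], points[90:]

net, history = train(train_data, valid_data)
for epoch in range(0, len(history), 50):
    tr, va = history[epoch]
    print(f"epoch {epoch:3d}  train RMSE {tr:.3f}  validation RMSE {va:.3f}")
```

If the training RMSE is still falling at the last epoch, more iterations may help; if it flattens early at a poor value, that would be consistent with the premature-convergence explanation.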
  • chaosbringer Member Posts: 21 Contributor II
    Hi,
    thank you, that helped. Fiddling with the parameters improved the situation significantly.
    However, another problem arises:
    The T-Test says that the means are the same (p=1.0).
    If I deliberately modify the parameters of the neural net to produce a really bad result, the t-test still returns 1.
    How can it be that the t-test returns 1, even though the RMSEs are very different (0.5 vs 0.34)?


    Thank you
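One way to sanity-check a reported p-value is to compute a paired t statistic by hand from the per-fold errors. The sketch below is not what RapidMiner's T-Test operator does internally (the thread does not say), and the per-fold RMSE values are hypothetical numbers invented for illustration. A two-sided p-value of exactly 1 corresponds to t = 0, meaning the compared values have identical means, so one possibility worth checking is whether the test is really receiving two different sets of scores.

```python
import math

def paired_t(a, b):
    """t statistic of the paired differences a[i] - b[i] (df = len(a) - 1).

    Assumes the differences are not all identical (non-zero variance).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical per-fold RMSE values from 10-fold cross-validation.
# These are NOT taken from the thread.
lin_rmse = [0.33, 0.35, 0.34, 0.36, 0.32, 0.34, 0.35, 0.33, 0.34, 0.34]
ann_rmse = [0.52, 0.49, 0.51, 0.50, 0.48, 0.53, 0.50, 0.49, 0.51, 0.47]

t_stat = paired_t(ann_rmse, lin_rmse)

# Two-sided 5% critical value of Student's t with 9 degrees of freedom.
T_CRIT = 2.262
significant = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.2f}, significant at 5%: {significant}")
```

With differences this consistent across folds, the statistic lands far beyond the critical value, which is the behaviour one would expect when RMSEs differ as much as 0.5 and 0.34.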
  • fikio Member Posts: 3 Contributor I
    I come from a statistical background. To clarify: your dataset has 1000 observations and 19 variables, 8 of which are binary? Why do you believe the data is too complex for linear regression? Have you looked at the variables univariately with scatterplots, or done some modeling to determine that there are nonlinear relationships?

    I am used to evaluating models with AUCs. What numbers are you getting? Is MSQE the mean squared error?
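On the metric question: assuming MSQE stands for mean squared error (the thread does not confirm this), the two error measures used in this thread are related as follows, where $y_i$ is the actual price, $\hat{y}_i$ the predicted price, and $n$ the number of test samples.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

Since the two measures are on different scales, an MSE should only be compared with another MSE and an RMSE with another RMSE. AUC is defined for classifiers, so it does not apply directly to a regression target such as house price.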