Validating a linear regression

Asoka, edited June 2019 in Help
I figure the capabilities I'm looking for must be available - I just haven't been able to find them.

When generating a linear regression in RapidMiner v5 (.008 - the upgrade to .015 isn't working for me), I am trying to figure out how to get the measures and plots used to check the assumptions of a linear regression. With the standard output of the Linear Regression operator, I can find the R-squared and the t-test results for the individual variables. I can use the t-test results to infer the model-level F-test.

I am also looking for things like the adjusted R-squared, a plot of the errors (residuals), a QQ plot, the Variance Inflation Factor, Cook's distance, and that sort of thing. I originally learned validation of linear regression using PROC REG from SAS, if that helps frame the sort of information I'm looking for.
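For readers unsure what these measures are, here is a minimal sketch in plain Python (not RapidMiner, and not PROC REG) of how the adjusted R-squared, the Variance Inflation Factors, and Cook's distance can be computed from an ordinary least-squares fit. The function names and the small data set are hypothetical and only for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit(X, y):
    """OLS via the normal equations; an intercept column is added here."""
    D = [[1.0] + list(row) for row in X]
    k = len(D[0])
    XtX = [[sum(r[i] * r[j] for r in D) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(D, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(D, y)]
    return D, XtX, resid


def r_squared(y, resid):
    mean = sum(y) / len(y)
    sst = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - sum(e * e for e in resid) / sst


def diagnostics(X, y):
    n, p = len(X), len(X[0])  # p = number of predictors, intercept excluded
    D, XtX, resid = fit(X, y)
    r2 = r_squared(y, resid)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

    # VIF_j = 1 / (1 - R^2 of predictor j regressed on the other predictors)
    vif = []
    for j in range(p):
        others = [[v for i, v in enumerate(row) if i != j] for row in X]
        target = [row[j] for row in X]
        _, _, rj = fit(others, target)
        vif.append(1.0 / (1.0 - r_squared(target, rj)))

    # Cook's distance: e_i^2 / (k * MSE) * h_ii / (1 - h_ii)^2, with k = p + 1
    mse = sum(e * e for e in resid) / (n - p - 1)
    cooks = []
    for row, e in zip(D, resid):
        h = sum(a * b for a, b in zip(row, solve(XtX, row)))  # leverage h_ii
        cooks.append(e * e / ((p + 1) * mse) * h / (1.0 - h) ** 2)
    return {"r2": r2, "adj_r2": adj_r2, "vif": vif, "cooks": cooks}


# Hypothetical data: two predictors, six observations.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 12.0]
result = diagnostics(X, y)
```

The normal-equations solver keeps the sketch dependency-free; for real data a numerically more stable routine (for example a QR-based least-squares solver) would be the safer choice.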

I figure these tests and plots have to be available in RapidMiner - any hints or pointers to where I can get that info are greatly appreciated.



  • Asoka
    Bumping to give this another chance - am I truly limited to the t-test for validating a linear regression within RapidMiner?

  • MariusHelf (RapidMiner Certified Expert)

    Usually we use an X-Validation to validate the linear regression, the same way we do with all supervised learning algorithms.

    Basically, the X-Validation repeatedly splits the data into a training set and a test set, fits the linear regression model on the training set, applies it to the test set, and calculates a performance measure.
    The Performance (Regression) operator gives you a wide choice of measures to calculate.
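The split, fit, apply, and measure loop described above can be sketched in plain Python. This is only an illustration of the idea, not how the X-Validation operator is implemented: it assumes a single predictor, unshuffled interleaved folds, and RMSE as the performance measure, and the function names and data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_validate(xs, ys, k=5):
    """Return the test-set RMSE of each of the k folds."""
    n = len(xs)
    fold_rmse = []
    for fold in range(k):
        # Every k-th example, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, n, k))
        pairs = list(zip(xs, ys))
        train = [p for i, p in enumerate(pairs) if i not in test_idx]
        test = [p for i, p in enumerate(pairs) if i in test_idx]

        train_x, train_y = zip(*train)
        intercept, slope = fit_line(train_x, train_y)

        squared = [(y - (intercept + slope * x)) ** 2 for x, y in test]
        fold_rmse.append(math.sqrt(sum(squared) / len(squared)))
    return fold_rmse


# Hypothetical data: roughly y = 2x + 1 with small deviations.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.2, 2.9, 5.1, 7.0, 8.8, 11.1, 13.0, 14.9, 17.2, 19.0]
scores = cross_validate(xs, ys, k=5)
mean_rmse = sum(scores) / len(scores)
```

Averaging the per-fold scores gives one estimate of out-of-sample error. Note that this measures predictive performance and does not replace diagnostics such as the QQ plot or Cook's distance.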

    Best regards,
  • Asoka
    That much makes sense, Marius - I'll set that up and see how close I can get to what I'm looking for. At the very least, I'll be able to be more precise about what I'm finding or not finding. Setting up the X-Validation and Performance (Regression) operators is a sensible next step.
