Why does RapidMiner include a variable with a p-value > 0.05 in a multiple linear regression?

MariJAM Member Posts: 1 Newbie

I'm doing a multiple linear regression. For the regression I have chosen the M5 prime feature selection with a min tolerance of 0.05. The final model contains three independent variables. Two of them have a p-value below 0.05, and one is above it, with a p-value of 0.135 (and a t-Stat of 1.543).
Two other independent variables were not included in the model because of their high p-values and low t-Stat values.

Can anyone help and tell me why RapidMiner includes this one variable even though its p-value is above 0.05?

Thanks a lot!

Best Answer

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,505 RM Data Scientist
    Solution Accepted

    You are coming from a stats background, while RapidMiner comes more from a data science (DS) background. There are quite a few assumptions behind the p-value calculation. The DS mindset is more: if I can prove that this method works better than another one, I take that method. So what you would do is vary the cutoff and check the results.

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
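
To make "vary the cutoff and check the results" concrete, here is a minimal sketch in plain Python. It is not RapidMiner's M5 prime procedure. It assumes synthetic data, uses a cutoff on the absolute t-statistic instead of the p-value (so no t-distribution is needed), and scores each cutoff by hold-out RMSE. All function names (`solve`, `fit_ols`, `rmse`, `evaluate_cutoffs`) are made up for this illustration.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y, cols):
    """Fit y ~ intercept + selected columns; return coefficients and their t-statistics."""
    X = [[1.0] + [r[c] for c in cols] for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * xi for b, xi in zip(beta, x)) for x, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)
    t_stats = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        var_i = sigma2 * solve(xtx, unit)[i]  # i-th diagonal entry of sigma^2 (X'X)^-1
        t_stats.append(beta[i] / math.sqrt(var_i))
    return beta, t_stats


def rmse(rows, y, cols, beta):
    """Root mean squared error of the fitted model on the given rows."""
    preds = [beta[0] + sum(b * r[c] for b, c in zip(beta[1:], cols)) for r in rows]
    return math.sqrt(sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y))


def evaluate_cutoffs(train_x, train_y, test_x, test_y, cutoffs):
    """For each |t| cutoff, keep the features that pass it, refit, and score on hold-out data."""
    all_cols = list(range(len(train_x[0])))
    _, t_full = fit_ols(train_x, train_y, all_cols)
    results = {}
    for cutoff in cutoffs:
        kept = [c for c in all_cols if abs(t_full[c + 1]) >= cutoff]  # index 0 is the intercept
        beta, _ = fit_ols(train_x, train_y, kept)
        results[cutoff] = (kept, rmse(test_x, test_y, kept, beta))
    return results


# Synthetic data (an assumption for illustration): two strong predictors,
# one weak predictor, and two pure-noise columns.
rng = random.Random(0)
data_x = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
data_y = [3 * r[0] + 2 * r[1] + 0.3 * r[2] + rng.gauss(0, 1) for r in data_x]
train_x, test_x = data_x[:140], data_x[140:]
train_y, test_y = data_y[:140], data_y[140:]

results = evaluate_cutoffs(train_x, train_y, test_x, test_y, [0.0, 1.0, 2.0, 3.0])
for cutoff, (kept, err) in results.items():
    print(f"|t| >= {cutoff}: kept columns {kept}, hold-out RMSE {err:.3f}")
```

If a looser cutoff gives a lower hold-out error, that is the data-science argument for keeping a variable whose p-value sits above 0.05. A single train/test split is noisy, so cross-validation would be the more careful comparison.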