Why does RapidMiner include a variable with a p-value > 0.05 in a multiple linear regression?

MariJAM Member Posts: 1 Newbie

I'm doing a multiple linear regression. For the regression I have chosen the M5 prime feature selection with a min tolerance of 0.05. The final model contains three independent variables. Two of them have a p-value below 0.05, and one is above it, with a p-value of 0.135 (and a t-Stat of 1.543).
Two other independent variables were not included in the model because of their high p-values and low t-Stat values.

Can anyone help and tell me why RapidMiner includes this one variable even though its p-value is above 0.05?

Thanks a lot!

Best Answer

    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    Solution Accepted

    You are coming from a stats background, while RapidMiner (RM) comes more from a data science (DS) background. The p-value calculation rests on quite a few assumptions. The DS mindset is more: if I can show that one method works better than another, I take that method. So what you would do is vary the cutoff and check the results.
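
A minimal sketch of "vary the cutoff and check the results", written in Python with NumPy rather than as a RapidMiner process. Everything in it is an assumption made for illustration: the synthetic data, the helper names (`ols_p_values`, `backward_select`, `cv_rmse`), and the selection rule itself, which is a plain backward elimination by p-value and not RapidMiner's M5 prime selection. The p-values use a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import math
import numpy as np

def ols_p_values(X, y):
    """Fit OLS with an intercept; return (coefficients, two-sided p-values)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = A.shape[0] - A.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(A.T @ A)
    t_stats = beta / np.sqrt(np.diag(cov))
    # Assumption: normal approximation instead of the exact t distribution.
    p = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return beta, p

def backward_select(X, y, cutoff):
    """Drop the feature with the largest p-value until all are <= cutoff."""
    cols = list(range(X.shape[1]))
    while cols:
        _, p = ols_p_values(X[:, cols], y)
        worst = int(np.argmax(p[1:]))  # p[0] belongs to the intercept
        if p[1 + worst] <= cutoff:
            break
        cols.pop(worst)
    return cols

def cv_rmse(X, y, cutoff, k=5):
    """k-fold cross-validated RMSE; selection is redone inside each fold."""
    idx = np.arange(len(y))
    sq_errors = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        cols = backward_select(X[train], y[train], cutoff)
        beta, _ = ols_p_values(X[train][:, cols], y[train])
        pred = beta[0] + X[test][:, cols] @ beta[1:]
        sq_errors.extend((y[test] - pred) ** 2)
    return math.sqrt(np.mean(sq_errors))

# Synthetic data (hypothetical): two strong effects, one weak, two pure noise.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.15 * X[:, 2] + rng.normal(size=n)

# Vary the cutoff and compare out-of-sample error instead of trusting 0.05.
results = {c: cv_rmse(X, y, c) for c in (0.01, 0.05, 0.15, 0.50)}
for cutoff, rmse in results.items():
    print(f"cutoff={cutoff:.2f}  CV RMSE={rmse:.3f}")
```

The point of the comparison is that a variable with a p-value such as 0.135 is kept or dropped depending on which cutoff gives the lower validation error, not on a fixed 0.05 rule. Because the selection is repeated inside each fold, the error estimate is not flattered by having chosen the features on the test rows.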

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany