How to interpret highly significant predictors and low squared correlation?

patrick_de_win Member Posts: 2 Contributor I
edited December 2018 in Help

Hello. I ran a linear regression on my 42,000 records (online contest results). After building the model and calculating its performance, some of my variables turned out to be highly significant (4 stars in the tabular view of the model), with p-values of 0.000. But then we looked at the squared correlation, and it was low: 0.013. I don't quite understand this contradiction. How can variables be highly significant in predicting the target variable while the correlation value is very low at the same time? How should I interpret this? Thx in advance!
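The situation described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the data is synthetic (not the contest data), there is a single predictor, the effect size of 0.115 is chosen purely so that the squared correlation lands near 0.013, and the p-value uses a normal approximation to the t distribution, which is reasonable at this sample size. The function name `simple_ols` is made up for illustration.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return slope, R^2, t statistic, p-value."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)

    # Standard error of the slope from the residual variance.
    residual_var = syy * (1.0 - r_squared) / (n - 2)
    t_stat = slope / math.sqrt(residual_var / sxx)

    # Two-sided p-value, normal approximation (assumption: n is large).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return slope, r_squared, t_stat, p_value


random.seed(0)
n = 42_000  # same order of magnitude as in the question
x = [random.gauss(0.0, 1.0) for _ in range(n)]
# Weak true effect buried in a lot of noise (synthetic assumption).
y = [0.115 * xi + random.gauss(0.0, 1.0) for xi in x]

slope, r_squared, t_stat, p_value = simple_ols(x, y)
print(f"slope={slope:.3f}  R^2={r_squared:.4f}  t={t_stat:.1f}  p={p_value:.3g}")
```

With this many rows the standard error of the slope is tiny, so even a weak effect produces a very large t statistic and a p-value that rounds to 0.000, while the predictor still explains only about 1% of the variance. Significance says the effect is unlikely to be exactly zero; the squared correlation says how much of the target it explains.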

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Well, you really don't want your LR model to be built on 'correlated' data, since that gives rise to the multicollinearity problem.

  • yoni1961 Member Posts: 13 Contributor II
    Thanks for the great explanation. But does RapidMiner provide p-values for its Pearson correlation matrix?
  • yoni1961 Member Posts: 13 Contributor II
    And "switch to a modeling approach that is not primarily based on interpreting p-values at all"... Such as? 
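On the p-value question for a Pearson correlation: whether RapidMiner's correlation matrix reports p-values is not answered in this thread, but the value can be computed by hand from the correlation r and the sample size n using the standard t statistic t = r * sqrt((n - 2) / (1 - r^2)). The sketch below assumes n is large enough for a normal approximation to the t distribution; `pearson_p_value` is a hypothetical helper name, not a RapidMiner function.

```python
import math


def pearson_p_value(r, n):
    """Approximate two-sided p-value for H0: correlation = 0.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and a normal approximation
    to the t distribution (assumption: n is in the hundreds or more).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    if n < 3:
        raise ValueError("n must be at least 3")
    t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# A correlation of about 0.114 corresponds to a squared correlation near 0.013.
print(pearson_p_value(0.114, 42_000))  # extremely small with 42,000 rows
print(pearson_p_value(0.114, 100))     # the same r is not significant with 100 rows
```

The same correlation gives very different p-values depending on n, which is why a p-value alone says little about how strong a relationship is.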

     
