patrick_de_win
Hello. I ran a linear regression analysis on my 42,000 records (online contest results), and after building the model and calculating its performance, some of my variables turned out to be highly significant (4 stars in the tabular view of the model). The p-values were 0.000 for these variables. But then we looked at the squared correlation, and it was low: 0.013. I don't quite understand this contradiction. How can variables be highly significant in predicting the target variable while the squared correlation is very low at the same time? How should I interpret this? Thx in advance!


  Thomas_Ott

Well, you really don't want your LR model to be built on 'correlated' data, since that gives rise to the multicollinearity problem.
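On the original question of significance versus squared correlation, a small simulation may help. This is only a sketch under stated assumptions: the data are synthetic, the slope of 0.115 is a made-up value chosen so that the squared correlation lands near 0.013, and the p-value uses a normal approximation to the t distribution (reasonable at this sample size). It is not a reproduction of the contest data or of RapidMiner's calculation.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n = 42_000          # sample size taken from the question
true_slope = 0.115  # ASSUMPTION: weak effect chosen so R^2 is near 0.013

# Synthetic predictor and target: a tiny signal buried in unit-variance noise.
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [true_slope * xi + random.gauss(0.0, 1.0) for xi in x]

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Pearson correlation and its square (equals R^2 for a one-predictor regression).
r = sxy / math.sqrt(sxx * syy)
r_squared = r * r

# Test statistic for H0: no linear relationship. It grows with sqrt(n).
t_stat = r * math.sqrt((n - 2) / (1.0 - r_squared))

# Two-sided p-value, using the normal approximation to the t distribution.
p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))

print(f"r^2 = {r_squared:.4f}, t = {t_stat:.1f}, p = {p_value:.3g}")
```

The point of the sketch is that the p-value answers "is the effect distinguishable from zero?", while the squared correlation answers "how much of the variance does the model explain?". With tens of thousands of rows, even a very weak effect is easy to distinguish from zero, so both results can hold at once.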

  • yoni1961
    Thanks for the great explanation. Yet, does RapidMiner provide p-values for its Pearson correlation matrix?
  • yoni1961
    And "switch to a modeling approach that is not primarily based on interpreting p-values at all"... Such as? 


