Correlation matrix is weak. Decision tree accuracy 70%

mariozupan Member Posts: 15 Contributor II
edited November 2019 in Help
I have financial performance indicators as interdependent variables and the financial performance mark as the label. I used the Correlation Matrix operator and got very weak correlations between the label and the indicators, although the marks (from A to E) are derived from the indicators. Do I need to optimize the parameters, or do I need some other type of optimization? Do I need to normalize the variables? Do I need discretization?
The same questions apply to the decision tree. I got 70% accuracy with pre-pruning disabled.
I mentioned the correlation matrix before the decision tree because it seems logical to me that I need a very strong correlation before applying any learning operator. Correct me if I'm wrong.
Could you please show me the way?


  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn

    You don't necessarily need a strong correlation for good prediction results: correlation measures the impact of each single attribute on the label, but it does not capture attribute interactions. Suppose you have a shop and want to find good customers. It may be that the age of a customer alone tells you nothing and the city alone tells you nothing, but if you combine both attributes, you see that old customers from New York buy a lot, and so do young customers from Seattle.
    So here the predictive strength comes only from the combination of two attributes. This is not represented in the correlation matrix.
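The shop example can be checked with a small sketch in plain Python. The data is a made-up toy set (25 customers per age/city combination, labelled "good" exactly when old and from New York or young and from Seattle), and the helper names `pearson` and `lookup_accuracy` are hypothetical, not RapidMiner operators. The lookup on both attributes only stands in for what a two-level decision tree could learn.

```python
from collections import Counter, defaultdict
from itertools import product
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lookup_accuracy(keys, labels):
    """Accuracy when predicting the majority label for each distinct key."""
    groups = defaultdict(Counter)
    for key, label in zip(keys, labels):
        groups[key][label] += 1
    correct = sum(c.most_common(1)[0][1] for c in groups.values())
    return correct / len(labels)


# Assumed toy data: (is_old, in_new_york), 25 customers per combination.
rows = [combo for combo in product([0, 1], repeat=2) for _ in range(25)]
is_old = [old for old, _ in rows]
in_new_york = [ny for _, ny in rows]
# Good customers: old in New York (1, 1) or young in Seattle (0, 0).
good_customer = [1 if old == ny else 0 for old, ny in rows]

corr_age = pearson(is_old, good_customer)
corr_city = pearson(in_new_york, good_customer)
acc_age = lookup_accuracy(is_old, good_customer)   # age alone
acc_both = lookup_accuracy(rows, good_customer)    # age and city combined

print(corr_age, corr_city, acc_age, acc_both)
```

On this toy data each attribute has zero correlation with the label and is no better than guessing on its own, while the combination of both separates the classes completely.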

    For the accuracy: please keep in mind that this is the probability that a new example is classified correctly. If your two classes are equally distributed, anything above 50% is better than random guessing. However, if you have 70% positives in your data and your learner always predicts "positive", you already get an accuracy of 70%. So the accuracy must always be interpreted in combination with the class priors.
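To make the point about class priors concrete, here is a minimal sketch that computes the accuracy of a learner that always predicts the most frequent class. The label lists are invented for illustration and `majority_baseline_accuracy` is a hypothetical helper, not a RapidMiner operator.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Assumed example distributions, not real data.
imbalanced = ["positive"] * 70 + ["negative"] * 30
balanced = ["positive"] * 50 + ["negative"] * 50

baseline_imbalanced = majority_baseline_accuracy(imbalanced)
baseline_balanced = majority_baseline_accuracy(balanced)

print(baseline_imbalanced, baseline_balanced)
```

A model's accuracy is only informative relative to this baseline: 70% on the imbalanced list is no better than always answering "positive", while 70% on the balanced list is a real improvement. The same helper works for more than two classes, such as marks from A to E.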

    Happy Mining!