decision tree vs k-means

sim Member Posts: 18 Learner I
I have run a decision tree and k-means in RapidMiner, but the results from the two appear to conflict with each other. I have checked my processes, and they appear to be set up correctly.

Is there any possible reason for these contradictory results? I would just like to understand the reasoning behind them, so that I can better understand how RapidMiner works.

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,440 Community Manager
    Hi @sim, I'm a little confused. Decision Tree is a supervised learning algorithm; k-means clustering is an unsupervised learning algorithm. They are literally apples and oranges. How are you using these?

    Scott

  • sim Member Posts: 18 Learner I
    Hi Scott,
    Sorry if I'm being unclear. I'm new to RapidMiner and am just trying to understand why the results from these two methods contradict each other.
  • sim Member Posts: 18 Learner I
    I know that they belong to different types of machine learning, but surely there should be some correlation between the results?
  • varunm1 Member Posts: 724 Unicorn
    Hi @sim

    If possible, can you post your process XML and some sample data so we can check how the results contradict each other? As I understand it, k-means clusters the data, and a decision tree can help interpret the clustering. As an unsupervised algorithm, k-means uses only the numerical attributes to measure distances and divide the data into clusters. Supervised algorithms like a decision tree, by contrast, are driven mainly by the label rather than by the data as a whole: they are trained to fit their output labels. One big difference is that k-means considers all attributes, whereas a decision tree drops the attributes that are not useful for fitting the output (pruning). You can get similar output if one attribute is highly related to the output. But as @sgenzer said, a comparison between these two is not really suitable.
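To make the difference concrete, here is a minimal pure-Python sketch, not a RapidMiner process. Everything in it is an assumption made for illustration: the toy data, the helper names `kmeans` and `best_split`, and the use of a single-split "tree" in place of a full decision tree. The label depends only on the first attribute, while the second attribute has the larger spread, so the clusters follow one attribute and the tree follows the other.

```python
from collections import Counter

# Hypothetical toy data: (x, y, label). The label depends only on x,
# while y has a much larger spread and therefore dominates distances.
rows = [
    (0.1, 0.0, 0), (0.2, 10.0, 0), (0.3, 0.5, 0), (0.4, 10.5, 0),
    (0.6, 0.2, 1), (0.7, 10.2, 1), (0.8, 0.4, 1), (0.9, 10.4, 1),
]
points = [(x, y) for x, y, _ in rows]
labels = [label for _, _, label in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm with fixed starting centroids. Never sees the label."""
    for _ in range(iterations):
        assign = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        centroids = updated
    return assign

def best_split(points, labels):
    """One-split stand-in for a tree: attribute/threshold with the fewest errors."""
    best = None
    for attr in range(len(points[0])):
        values = sorted(set(p[attr] for p in points))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [l for p, l in zip(points, labels) if p[attr] <= threshold]
            right = [l for p, l in zip(points, labels) if p[attr] > threshold]
            errors = sum(len(side) - Counter(side).most_common(1)[0][1]
                         for side in (left, right))
            if best is None or errors < best[0]:
                best = (errors, attr, threshold)
    return best

clusters = kmeans(points, centroids=[points[0], points[1]])
errors, attr, threshold = best_split(points, labels)
crosstab = Counter(zip(clusters, labels))  # (cluster, label) -> count

print("clusters:", clusters)
print("split on attribute", attr, "with", errors, "errors")
print("cluster vs. label counts:", dict(crosstab))
```

On this data the split uses attribute 0 (x) and separates the label perfectly, while each cluster contains two examples of each label. Neither result is wrong: the clustering describes the overall geometry of the data, and the split describes what predicts the label.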

    Thanks 
    Varun
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,226 Unicorn
    What the others are saying is that it is unclear what you mean when you say the outputs of the algorithms are "contradicting" each other, since they are not solving the same problem.  It would be like saying a recipe for cookies was contradicting instructions for how to change the oil in your car.  They are not really doing the same thing at all.
    • In this case, the DT algorithm will look at your label and then generate a set of splits from all your other attributes that best separates the different values of the label.
    • The k-means algorithm will simply look at all your data and try to find the number of groups that you specify, such that the examples within each group are most similar (based on the similarity metric you select) across all the dimensions together.  (And if you don't normalize the data and you have numerical data, the result can easily get skewed, but that is another story.)
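The normalization caveat can be illustrated with a small pure-Python sketch. It shows only the distance computation that a distance-based method such as k-means relies on, not a full clustering run. The data, the attribute meanings (age and income), and the helper names are hypothetical, and min-max scaling is just one common way to normalize.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def min_max_normalize(rows):
    """Rescale every column to [0, 1]. Assumes no column is constant."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        scaled_columns.append([(v - lo) / (hi - lo) for v in col])
    return list(zip(*scaled_columns))

def nearest_to_first(rows):
    """Index of the row closest to rows[0]."""
    return min(range(1, len(rows)), key=lambda i: sq_dist(rows[0], rows[i]))

# Hypothetical examples: (age in years, income in dollars)
data = [(25, 50000), (60, 50500), (27, 52000), (45, 58000)]

raw_nearest = nearest_to_first(data)
scaled_nearest = nearest_to_first(min_max_normalize(data))

print("nearest to row 0 on raw data:", raw_nearest)           # 1 (age 60)
print("nearest to row 0 after normalizing:", scaled_nearest)  # 2 (age 27)
```

On the raw values, income differences of a few hundred dollars outweigh an age difference of 35 years, so the 25-year-old is judged closest to the 60-year-old. After rescaling both attributes to the same range, the 27-year-old becomes the nearest example.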
    I hope this helps clarify.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts