Where in the process to place the 'Cross validation' operator?

tonyboy9 Member Posts: 113 Contributor II
edited August 2020 in Help
In the customer segmentation process below, I believe the k-means cluster model answers which cluster of customers (by ID) to use. That would be the answer to my problem statement.

I'm confused about where to place 'Cross validation'. The tutorials seem to indicate placing the operator right after the 'Retrieve' operator that loads the data set. At that point, how does RapidMiner validate a model that k-means clustering has not yet built further down the process?

Any helpful suggestions are greatly appreciated.


Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Cross validation is an approach to model validation for supervised machine learning problems, where you have a defined target variable (called the label in RapidMiner). If you open the tutorial process for that operator, you can see that inside it the learner goes in the training part on the left, and the validation goes in the testing part on the right.
    But clustering is an unsupervised machine learning problem, where there is no defined label in advance that you are trying to obtain.  So generally speaking Cross Validation is not applicable when you are doing clustering.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • tonyboy9 Member Posts: 113 Contributor II
    Thanks for that, Brian. You wrote: "But clustering is an unsupervised machine learning problem, where there is no defined label in advance that you are trying to obtain.  So generally speaking Cross Validation is not applicable when you are doing clustering."

    So is there another way to validate a clustering model?
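
To make the point in the answer above concrete, here is a minimal sketch in plain Python of what k-fold cross validation does. It is not how RapidMiner implements the operator. The function names (k_fold_indices, train_centroids, predict, cross_validate), the toy data, and the nearest-centroid learner are all assumptions chosen for illustration. The thing to notice is that the score on each held-out fold is computed by comparing predictions against known labels, which is exactly what a plain clustering problem does not have.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_centroids(rows, labels):
    """Training side: compute one mean point (centroid) per label value."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: [mean(column) for column in zip(*members)]
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=distance)


def cross_validate(rows, labels, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    accuracies = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        # Left side in the analogy: fit the learner on the training rows.
        model = train_centroids(
            [rows[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        # Right side in the analogy: apply the model and compare with the
        # known labels. Without labels there is nothing to compare against.
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Toy labeled data (made up): two well-separated groups of ten rows each.
rows = [[x, x + 1] for x in range(10)] + [[x + 100, x + 101] for x in range(10)]
labels = ["low"] * 10 + ["high"] * 10

scores = cross_validate(rows, labels, k=5)
print("accuracy per fold:", scores)
print("mean accuracy:", mean(scores))
```

If the labels list is taken away, train_centroids and the accuracy comparison both have nothing to work with, which is why the operator does not fit a clustering process as it stands.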