Where in the process to place the 'Cross validation' operator?
In the customer segmentation process below, I believe the k-means cluster model already answers my problem statement: which cluster of customers (by ID) to use.
I'm confused about where to place 'Cross validation'. The tutorials seem to indicate placing the operator right after the 'retrieve' data set. At that point, how does RapidMiner validate a model that k-means clustering has not yet built further down the line?
Any helpful suggestions are greatly appreciated.
Best Answer

Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
I think the question is: what do you mean by validating a clustering model? Validation normally implies that you have a set of observations where you know the correct answer, so you can check the ML algorithm's prediction against a known outcome and "grade" its performance.
With clustering (or any unsupervised learning problem) there is no known correct answer in advance. You are simply using an algorithm to explore structures in your data and return results. You may or may not be happy with the outcome of any particular algorithm, but there is no objective way for the algorithm to "self-assess" its performance relative to other possible clustering solutions.
Now, RapidMiner does have performance operators for clustering that are worth a look. You can use them to understand the outcome of any particular clustering solution and to compare outcomes. People have also suggested different methods for evaluating or comparing clustering outcomes (like the elbow method). It is still somewhat subjective, though: there is no clear and compelling objective way to say that one clustering outcome is superior to another unless you specify in advance the exact criteria you are going to use for that determination.
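To make the elbow method mentioned above a bit more concrete, here is a rough sketch in plain Python rather than a RapidMiner process. It is not how RapidMiner's operators are implemented. The synthetic three-group data, the helper names (`make_blobs`, `farthest_point_init`, `elbow_curve`) and the deterministic farthest-point initialization are all assumptions made for illustration. The idea is only to run k-means for several values of k and watch where the within-cluster sum of squares stops dropping sharply.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_blobs(seed=42, per_group=50):
    """Synthetic 2-D data with three groups (illustrative, not customer data)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    return [
        (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in centers
        for _ in range(per_group)
    ]


def farthest_point_init(points, k):
    """Deterministic start: the first point, then repeatedly the point that is
    farthest from every centroid chosen so far (assumed here for repeatability)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's algorithm; returns the final centroids."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid alone
        if updated == centroids:
            break  # centroids stopped moving
        centroids = updated
    return centroids


def within_cluster_ss(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def elbow_curve(points, max_k=6):
    """Map each k in 1..max_k to its within-cluster sum of squares."""
    return {k: within_cluster_ss(points, kmeans(points, k)) for k in range(1, max_k + 1)}


curve = elbow_curve(make_blobs())
for k, value in curve.items():
    print(f"k={k}: within-cluster sum of squares = {value:.1f}")
```

On data like this, which was built with three groups, the curve should fall steeply up to k=3 and flatten afterwards. On real customer data the bend is usually less obvious, so picking k from it remains a judgment call rather than an objective test.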
Answers
But clustering is an unsupervised machine learning problem, where there is no defined label in advance that you are trying to obtain. So generally speaking Cross Validation is not applicable when you are doing clustering.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
So is there another way to go to validate a clustering model?