laurab Member Posts: 23 Maven
edited November 2018 in Help

I am using a very large dataset to train a model. I want to improve the prediction results by breaking the dataset down into smaller groups that share similar trends, rather than keeping one large group with many different trends. The data is so large and complex, and I am not familiar enough with it, that I cannot break it down into suitable subgroups by hand, so I have to use a clustering model.

I am using k-means clustering, together with the EvolutionaryParemterOptimizer to establish the optimum number of clusters, k. The problem is that I cannot see any distinguishing or correlating aspects between the clusters. What should I be looking for?

Am I using the best model for the task?




  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Laura,
    I'm not sure this approach is suitable after all. KMeans will group the most similar examples in the same cluster. Often these examples then mostly belong to the same class, which makes correct prediction within a cluster more difficult because of the class imbalance problem.
    This might work if your data is very inhomogeneous and you keep the number of clusters small enough. But since clustering does not provide a clear criterion to optimize, you will have to either guess the number of clusters or include the subsequent classification in the optimization. That, however, might take a huge amount of computing power.
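
To make "include the subsequent classification in the optimization" concrete, here is a minimal sketch in Python with scikit-learn rather than RapidMiner. Everything in it is an assumption for illustration: the synthetic data, the helper names `cluster_then_classify_accuracy` and `select_k`, logistic regression as the classifier, and plain hold-out accuracy as the criterion. It picks k by how well the per-cluster classifiers predict held-out data, with k = 1 acting as the single global model baseline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def cluster_then_classify_accuracy(X_train, y_train, X_val, y_val, k, seed=0):
    """Fit k-means on the training split, train one classifier per cluster,
    and return the accuracy on the validation split."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_train)
    train_clusters = km.labels_
    val_clusters = km.predict(X_val)

    majority = np.bincount(y_train).argmax()  # fallback for empty clusters
    predictions = np.empty_like(y_val)
    for c in range(k):
        train_mask = train_clusters == c
        val_mask = val_clusters == c
        if not val_mask.any():
            continue
        y_c = y_train[train_mask]
        classes = np.unique(y_c)
        if len(classes) < 2:
            # The cluster holds a single class (or nothing), so only a
            # constant prediction is possible for it.
            predictions[val_mask] = classes[0] if len(classes) else majority
        else:
            model = LogisticRegression(max_iter=1000).fit(X_train[train_mask], y_c)
            predictions[val_mask] = model.predict(X_val[val_mask])
    return float(np.mean(predictions == y_val))


def select_k(X, y, candidate_ks=(1, 2, 3, 4, 5), seed=0):
    """Score every candidate k by downstream validation accuracy."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    scores = {
        k: cluster_then_classify_accuracy(X_train, y_train, X_val, y_val, k, seed)
        for k in candidate_ks
    }
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for the real dataset (assumption for illustration only).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)
best_k, scores = select_k(X, y)
print("validation accuracy per k:", scores)
print("selected k:", best_k)
```

If no k above 1 beats the k = 1 score, that suggests the clustering step is not helping the prediction. The cost concern also shows up here: every candidate k needs a full clustering run plus k classifier fits, so the search gets expensive quickly on a very large dataset.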

