clustering

laurab · December 2008

Hi,

I am using a very large dataset to train a model. I want to improve the prediction results by breaking the dataset down into smaller groups that share similar trends, rather than keeping one large group with many different trends. The data is so large and complex, and I am not familiar enough with it, that I cannot break it down into suitable subgroups by hand, so I have to use a clustering model.

I am using k-means clustering, together with the EvolutionaryParameterOptimization operator to find the optimal number of clusters k. The problem is that I cannot see any distinguishing or correlating aspects between the clusters. What should I be looking for?

Am I using the best model for the task?

Thanks

Laura

land · December 2008

Hi Laura,
I'm not sure this approach is suitable at all. KMeans groups the most similar examples into the same cluster. Often these examples then belong to the same class, which makes correct prediction within a cluster more difficult because of the class imbalance problem.
This might work if your data is very inhomogeneous and you keep the number of clusters small enough. But since clustering does not provide a clear criterion to optimize, you will have to either guess the number of clusters or include the subsequent classification step in the optimization. That, however, might take a huge amount of computing power.
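
To illustrate what including the prediction step in the optimization could look like, here is a minimal sketch in plain Python. It is not a RapidMiner process, and every name in it (`kmeans_1d`, `assign`, `fit_line`, `validation_mse`) is made up for this example. It assumes a single numeric attribute, synthetic data with two different trends, and a least-squares line per cluster standing in for the real learner (a regression rather than a classification, to keep it short). The number of clusters is then scored by the error on held-out data instead of by a clustering criterion.

```python
import random

def assign(x, cents):
    """Index of the centroid closest to x."""
    return min(range(len(cents)), key=lambda j: abs(x - cents[j]))

def kmeans_1d(xs, k, iters=50):
    """Plain Lloyd's k-means on scalar inputs; returns the k centroids."""
    # Deterministic start: spread the initial centroids over the sorted data.
    s = sorted(xs)
    cents = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[assign(x, cents)].append(x)
        # An empty cluster keeps its old centroid.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, cents)]
        if new == cents:
            break
        cents = new
    return cents

def fit_line(pts):
    """Least-squares (slope, intercept) for (x, y) pairs."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    if sxx == 0:
        return 0.0, my  # all x identical: predict the mean
    slope = sum((p[0] - mx) * (p[1] - my) for p in pts) / sxx
    return slope, my - slope * mx

def validation_mse(train, valid, k):
    """Cluster the training x values, fit one line per cluster, score on valid."""
    cents = kmeans_1d([x for x, _ in train], k)
    fallback = fit_line(train)  # used for clusters with too few examples
    models = []
    for j in range(k):
        pts = [p for p in train if assign(p[0], cents) == j]
        models.append(fit_line(pts) if len(pts) >= 2 else fallback)
    err = 0.0
    for x, y in valid:
        slope, intercept = models[assign(x, cents)]
        err += (slope * x + intercept - y) ** 2
    return err / len(valid)

# Synthetic data (an assumption for illustration): two regimes, two trends.
rng = random.Random(0)
data = []
for _ in range(400):
    x = rng.uniform(0, 10)
    y = (2 * x if x < 5 else 30 - 3 * x) + rng.gauss(0, 0.5)
    data.append((x, y))
train, valid = data[:300], data[300:]

# The criterion being optimized is the downstream error, not a cluster measure.
scores = {k: validation_mse(train, valid, k) for k in range(1, 6)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Even in this toy version every candidate k costs one clustering run plus k model fits, which is where the computational effort comes from once the data is large and the learner is expensive.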

Greetings,
Sebastian
