Some Questions around Clustering

Gudi · January 2022

Dear all,

Since I'm neither a mathematician nor a computer scientist, the answers to the following questions might be easy for you, but they don't make sense to me at the moment. So I would kindly ask for your support with the following questions:

My goal is to run a clustering on this data set, a "customer-personality-analysis".
I want to answer the question of whether "education" (5 different types are available) has an impact on the "amount sweet products". To do that, I want to cluster the amounts of sweets purchased (k-means) and afterwards look at which education types are behind each group of amounts.
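For readers who prefer to see the idea outside a GUI tool, the plan above can be sketched in plain Python. This is only an illustration under stated assumptions: the rows are made-up toy data, the helper names `kmeans_1d` and `education_by_cluster` are hypothetical, k is 3 instead of 5 to keep the output short, and the centroids start at evenly spaced quantiles, which is a simplification and not necessarily what any particular tool does.

```python
from collections import Counter, defaultdict


def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values]


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on a single numeric column (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: start from evenly spaced quantiles so the run is deterministic.
    centroids = [float(ordered[int((i + 0.5) * n / k)]) for i in range(k)]
    for _ in range(max_iter):
        labels = assign(values, centroids)
        updated = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            # An empty cluster keeps its previous centroid.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(values, centroids), centroids


def education_by_cluster(educations, labels):
    """Count how often each education type appears in each cluster."""
    table = defaultdict(Counter)
    for education, label in zip(educations, labels):
        table[label][education] += 1
    return table


# Toy rows (education, amount of sweet products); NOT the real data set.
rows = [("Basic", 2), ("Graduation", 5), ("PhD", 40), ("Master", 45),
        ("Graduation", 120), ("PhD", 130), ("Basic", 3), ("Master", 125)]
educations = [r[0] for r in rows]
amounts = [r[1] for r in rows]

labels, centroids = kmeans_1d(amounts, k=3)
table = education_by_cluster(educations, labels)

for cluster in sorted(table):
    print(cluster, round(centroids[cluster], 1), dict(table[cluster]))
```

Because only one numeric column goes into the clustering, each cluster is simply a range of amounts. Whether education matters then shows up in how the education types are distributed across those ranges, not in the scatter plot itself.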

Image: https://us.v-cdn.net/6030995/uploads/editor/gz/8pc5vpros737.png

In the screenshot you can see 5 different clusters. How do I understand their meaning?
Will I have to compare 5 different charts because I have 5 different types of education? (The chart looks the same when I choose a different education type for "color column"; only the colors change.)
I have more than 2000 rows. Is it necessary to prepare the data so that I have n=1000 to get a better and more precise result?

What does the "Cluster Model" mean? I would have expected 5 different clusters in which the amount of items rises (e.g. cluster 1 = 5 items, cluster 2 = 349 items, cluster 3 = 500 items).

Image: https://us.v-cdn.net/6030995/uploads/editor/e4/m3fs3lnmslj5.png
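One way to check what a cluster model summary is reporting is to compute the per-cluster figures by hand. The sketch below assumes that the "items" count is the number of rows assigned to each cluster (an assumption, not confirmed by the screenshot) and uses made-up amounts and labels. The helper name `summarize_clusters` is hypothetical. K-means cluster numbers are arbitrary labels, so the sketch sorts the clusters by their mean amount to get a rising order.

```python
from collections import defaultdict


def summarize_clusters(values, labels):
    """Return one summary row per cluster, sorted by mean value."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    summary = [
        {
            "cluster": label,
            "items": len(members),           # number of rows in the cluster
            "min": min(members),
            "max": max(members),
            "mean": sum(members) / len(members),
        }
        for label, members in groups.items()
    ]
    # Cluster ids carry no order; sort by mean amount to read them low to high.
    return sorted(summary, key=lambda row: row["mean"])


# Made-up amounts and cluster labels, for illustration only.
amounts = [120, 2, 45, 3, 130, 5, 40, 1]
labels = [0, 2, 1, 2, 0, 2, 1, 2]

summary = summarize_clusters(amounts, labels)
for row in summary:
    print(row)
```

In this toy example the cluster with the most rows is the one with the smallest amounts, which shows that the item count and the purchased amount are two different things.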

Thank you for your help!

Kind regards
Gudi
