
Some Questions around Clustering

GudiGudi
Dear all,

Since I'm neither a mathematician nor a computer scientist, the answers to the following questions might be easy for you, but at the moment they don't make sense to me. So I would kindly ask for your support with these questions:

My goal is to run a clustering on this dataset, a "customer-personality-analysis".
I want to find out whether "education" (5 different types are available) has an impact on the "amount of sweet products". To do so, I want to use k-means clustering on the amounts of sweets purchased and afterwards understand which education levels are behind those amounts.
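A minimal sketch of the workflow described above (cluster only on the sweets amount, then look at how education is distributed inside each cluster), written in plain Python rather than RapidMiner so each step is visible. The `kmeans_1d` helper, the toy records and the education labels "A" to "E" are hypothetical; they stand in for the real dataset and are not RapidMiner operators.

```python
from collections import Counter

def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on a single numeric column; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: k values spread evenly across the sorted data
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [float(ordered[round(i * step)]) for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy records: (education label, amount spent on sweets)
customers = [("A", 2), ("B", 5), ("C", 3), ("D", 1), ("B", 40),
             ("E", 35), ("C", 45), ("B", 150), ("C", 160), ("E", 140)]

# 1) Cluster on the sweets amount only; education is NOT an input
amounts = [amount for _, amount in customers]
centroids, labels = kmeans_1d(amounts, k=3)

# 2) Afterwards, count how often each education label occurs per cluster
by_cluster = {c: Counter() for c in range(len(centroids))}
for (education, _), lab in zip(customers, labels):
    by_cluster[lab][education] += 1

for c, counts in by_cluster.items():
    print(f"cluster {c}: centroid {centroids[c]:.2f}, education {dict(counts)}")
```

In this sketch one per-cluster count table lists every education label at once, so all education types can be compared in a single view. With real data, the choice of k, the initialization and any scaling of the column will all influence the result.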

  • In the screenshot below you can see 5 different clusters. How do I interpret their meaning?
  • Will I have to compare 5 different charts since I have 5 different types of education? (The chart looks the same when I choose a different education type for "color column"; only the colors change.)
  • I have more than 2000 rows. Is it necessary to prepare the data so that n=1000 to get a better and more precise result?
  • What does the "Cluster Model" mean? I would have expected 5 different clusters in which the number of items rises (e.g., cluster 1 = 5 items, cluster 2 = 349 items, cluster 3 = 500 items).
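A hedged sketch of what a k-means cluster model typically summarizes: for each cluster, how many rows ("items") were assigned to it and its centroid (the mean of those rows). Cluster numbers are arbitrary labels, so the item counts are not expected to rise from one cluster to the next; if an ordering is wanted, clusters can be renumbered by centroid. The helpers `summarize_clusters` and `relabel_by_centroid` and the label assignment below are hypothetical and do not reproduce RapidMiner's exact output format.

```python
from collections import defaultdict

def summarize_clusters(values, labels):
    """Return {cluster_id: (size, centroid)}, roughly what a k-means model reports."""
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    # Size = number of rows assigned; centroid = mean of those rows
    return {lab: (len(vs), sum(vs) / len(vs)) for lab, vs in sorted(groups.items())}

def relabel_by_centroid(summary):
    """Map old cluster ids to new ids 0..k-1 ordered by ascending centroid."""
    order = sorted(summary, key=lambda lab: summary[lab][1])
    return {old: new for new, old in enumerate(order)}

# Hypothetical result of a k-means run: sweets amounts and arbitrary cluster ids
amounts = [2, 5, 3, 1, 40, 35, 45, 150, 160, 140]
labels = [2, 2, 2, 2, 0, 0, 0, 1, 1, 1]

summary = summarize_clusters(amounts, labels)
for lab, (size, centroid) in summary.items():
    print(f"cluster_{lab}: {size} items, centroid {centroid:.2f}")

mapping = relabel_by_centroid(summary)
print(mapping)
```

Here the cluster with the lowest centroid happens to carry id 2 and holds the most items, which illustrates that neither the id nor the size says anything about "low" or "high" spending; the centroid does.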

Thank you for your help!

Kind regards
