Clustering Dummy Variables
mario_sark
Member Posts: 13 Contributor I
Dear all,
I am working on segmenting a list of customers into clusters based on several variables, but some of these variables are dummy variables. Below is the list of variables I will use for clustering:
Unpaid: Yes/No (dummy)
Deposit: continuous (some customers have zero deposits)
Term Deposits: continuous (some customers have zero term deposits)
Number of Returned Checks: discrete (some customers have zero)
Insurance Product: discrete (some customers have zero); this can be transformed into Yes/No
Credit Card Spending: continuous (some customers have zero since they don't hold credit cards)
Number of Products (Loans): the number of car loans, personal loans, housing loans, ... (some customers have zero)
What is the best algorithm in RapidMiner I can use to cluster these customers into segments and highlight the less profitable group?
As far as I know, K-means can handle only continuous variables, and I am hesitant to normalize the dummy variables in the data set.
I hope you can help with this.
Thank you in advance,
Mario
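On the normalization concern above, a small illustration may help. This is a minimal Python sketch, not a RapidMiner process, and the column layout and sample values are made up for the example. It assumes min-max (range) scaling: a 0/1 dummy column that contains both values passes through unchanged, while the continuous and count columns are brought onto the same [0, 1] scale.

```python
def min_max_scale(rows):
    """Scale every column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled

# Hypothetical customers: [unpaid (0/1 dummy), deposit, returned checks]
customers = [
    [1, 0.0, 3],
    [0, 5000.0, 0],
    [0, 20000.0, 1],
]

scaled = min_max_scale(customers)
for row in scaled:
    print(row)
```

Z-score standardization, by contrast, would shift and rescale the dummy column, so whether that is acceptable depends on how much weight the dummy should carry in the distance.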
Answers
Thank you for your reply. The list I am going to cluster contains around 70,000 customers.
I was wondering if there is any algorithm other than K-means.
I am also looking forward to reading about other possibilities.
Thank you,
Mario
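As one possible alternative to K-means for mixed data, here is a minimal Python sketch (again not a RapidMiner process) of k-medoids with a Gower-style dissimilarity: numeric columns contribute their range-scaled absolute difference, and categorical columns contribute 0 on a match and 1 otherwise. The function names, the `kinds` parameter, and the sample records are all assumptions made for illustration. The update step is a simple alternating scheme, not full PAM.

```python
import random

def gower_distance(a, b, kinds, ranges):
    """Average per-column dissimilarity between two records."""
    total = 0.0
    for x, y, kind, span in zip(a, b, kinds, ranges):
        if kind == "num":
            total += abs(x - y) / span if span > 0 else 0.0
        else:  # categorical: simple mismatch
            total += 0.0 if x == y else 1.0
    return total / len(kinds)

def assign(rows, medoids, dist):
    """Label each record with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist(row, rows[medoids[c]]))
            for row in rows]

def k_medoids(rows, kinds, k, iterations=20, seed=0):
    """Alternate between assigning records and re-picking medoids."""
    ranges = [max(r[i] for r in rows) - min(r[i] for r in rows) if kind == "num" else 0.0
              for i, kind in enumerate(kinds)]

    def dist(a, b):
        return gower_distance(a, b, kinds, ranges)

    medoids = random.Random(seed).sample(range(len(rows)), k)
    for _ in range(iterations):
        labels = assign(rows, medoids, dist)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster is empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(
                min(members, key=lambda i: sum(dist(rows[i], rows[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(rows, medoids, dist), medoids

# Hypothetical customers: [unpaid, deposit, returned checks]
rows = [
    ["no", 20000.0, 0], ["no", 18000.0, 0], ["no", 21000.0, 1],
    ["yes", 0.0, 4], ["yes", 500.0, 5], ["yes", 0.0, 3],
]
kinds = ["cat", "num", "num"]
labels, medoids = k_medoids(rows, kinds, k=2)
print(labels, medoids)
```

The medoid update compares every pair of records within a cluster, so its cost grows quadratically with cluster size. For a list of around 70,000 customers it would probably be more practical to cluster a random sample first and then assign the remaining customers to the nearest medoid.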