Questions about customer clustering/segmentation

Franzi Member Posts: 2 Contributor I
edited November 2018 in Help



I'm new to RapidMiner. I did all the tutorials, but when I try my own cases, it's a bit difficult to find the right operators and parameters.


I want to cluster my customers (CustomerID) in three groups based on their transactions.


The transaction attributes are:


Date of transaction (datatype: date)

Value of transaction (datatype: integer)

Number of transactions (datatype: integer)


I would like to give customers with the following features a higher rate (weight):


  • more than one transaction
  • a transaction value higher than average
  • recent transactions (i.e. transactions in the last month)


Is there any way to create a process in RapidMiner that reflects my requirements?

Which operator would be best for that use case?


Thanks in advance for your help, and sorry for my poor English!


Best Answer

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    Solution Accepted

    The 'Generate Attributes' operator is your friend here. To achieve your goal 'to give the customers with following features a higher rate (weight)',

    you can create several indicator attributes. For instance, to tag the customers who have more than one transaction:


    attribute name        function expression

    AnyTransaction        if([Number of transactions] > 1, 1, 0)


    You can refer to the tutorial process for Generate Attributes and take inspiration from its example function expressions.
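For readers who want to check the indicator logic outside RapidMiner, here is a minimal Python sketch of the same idea. It is not a RapidMiner process. The sample rows, the field names `last_transaction`, `value` and `n_transactions`, the indicator names `AboveAverageValue` and `RecentTransaction`, the 30-day window and the unweighted `Score` sum are all assumptions made for illustration; only `AnyTransaction` comes from the answer above.

```python
from datetime import date, timedelta

# Hypothetical sample data: one row per customer, mirroring the three
# attributes from the question (date, value, number of transactions).
customers = [
    {"CustomerID": "C1", "last_transaction": date(2018, 11, 20), "value": 120, "n_transactions": 3},
    {"CustomerID": "C2", "last_transaction": date(2018, 6, 2), "value": 40, "n_transactions": 1},
    {"CustomerID": "C3", "last_transaction": date(2018, 11, 5), "value": 80, "n_transactions": 1},
]


def add_indicators(rows, today, recent_days=30):
    """Return copies of the rows with three 0/1 indicators and their sum."""
    average_value = sum(r["value"] for r in rows) / len(rows)
    cutoff = today - timedelta(days=recent_days)
    result = []
    for r in rows:
        out = dict(r)
        # Same rule as the expression in the answer: more than one transaction.
        out["AnyTransaction"] = 1 if r["n_transactions"] > 1 else 0
        # Assumed rule: value strictly above the average over all customers.
        out["AboveAverageValue"] = 1 if r["value"] > average_value else 0
        # Assumed rule: last transaction within the last `recent_days` days.
        out["RecentTransaction"] = 1 if r["last_transaction"] >= cutoff else 0
        # Assumed scoring: plain sum, i.e. every indicator has weight 1.
        out["Score"] = (
            out["AnyTransaction"] + out["AboveAverageValue"] + out["RecentTransaction"]
        )
        result.append(out)
    return result


scored = add_indicators(customers, today=date(2018, 11, 30))
for row in scored:
    print(row["CustomerID"], row["Score"])
```

If some features should count more than others, the sum could be replaced by a weighted sum; the weights would be a modelling choice, not something the thread specifies.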


    Happy RapidMining!



  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist

    Dear Franzi,


    My key question for you is: do you want to classify/cluster by your own rules, or by computer-generated rules based on statistical reasoning?


    RapidMiner has a lot of operators that group customers together by their attributes. They find the grouping rules that are best according to some statistical measure. Most likely the result will be similar to the groups you had in mind, but not necessarily.


    The operators for this would be k-Means, k-Medoids, DBSCAN or maybe Agglomerative Clustering. Please be aware that all of those operators use a distance measure and thus need normalized data. You can normalize your data with the Normalize operator.
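To illustrate why normalization matters before distance-based clustering, here is a small, self-contained Python sketch of z-score normalization followed by a plain k-means loop. It is not what the RapidMiner operators do internally. The data rows, the encoding of the transaction date as "days since last transaction", and the helper names `z_normalize` and `k_means` are assumptions for illustration only.

```python
import math
import random


def z_normalize(rows):
    """Scale every column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def k_means(points, k, iterations=100, seed=0):
    """Basic Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as starting centroids
    labels = None
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


# Hypothetical rows: [days since last transaction, value, number of transactions].
raw = [
    [5, 900, 12],
    [8, 850, 10],
    [40, 300, 3],
    [45, 280, 4],
    [200, 50, 1],
    [220, 40, 1],
]

normalized = z_normalize(raw)
labels = k_means(normalized, k=3)
print(labels)
```

Without the normalization step, the value column (hundreds) would dominate the distance over the transaction count (single digits), which is the concern raised above. The result of k-means depends on the starting centroids, so different seeds can give different groupings.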




    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Franzi Member Posts: 2 Contributor I

    Thanks a lot! The "Generate Attributes" operator helped me out.


