handling empty values
I'm currently working on my master's thesis. Part of the work is a customer segmentation by means of a cluster analysis.
One variable for the clustering should be the chronological sequence of product categories purchased. For example, customer 10 first bought from product category A, then from category C, and then an article from group X. In other words, changes in the customers' purchasing behavior should be included in the analysis.
But I don't know what the best way to represent this data would be.
My idea was to split this criterion into several variables. I wanted to create a new variable for each purchase of a new category, so that I end up with the variables "first selling category" to "10th selling category". These variables take the category names as values.
The problem with this approach is that each customer buys a different number of product categories. If a customer buys from 3 different product categories, the first 3 columns contain the desired values and the remaining columns contain no value.
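A minimal sketch of this layout, written in Python with pandas purely for illustration. The column names (`customer_id`, `category`, `category_1` ...), the helper `to_wide`, and the sample rows are all hypothetical and not taken from the real data; the purchase log is assumed to be already sorted chronologically.

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase, assumed to be in chronological order.
purchases = pd.DataFrame({
    "customer_id": [10, 10, 10, 11, 11, 11],
    "category": ["A", "C", "X", "B", "B", "A"],
})

def to_wide(log: pd.DataFrame, max_slots: int = 10) -> pd.DataFrame:
    """Return one row per customer with one column per newly purchased category."""
    # Keep only the first purchase of each category per customer.
    first = log.drop_duplicates(subset=["customer_id", "category"]).copy()
    # 1-based position of each new category within the customer's sequence.
    first["slot"] = first.groupby("customer_id").cumcount() + 1
    first = first[first["slot"] <= max_slots]
    wide = first.pivot(index="customer_id", columns="slot", values="category")
    # Make sure every slot exists as a column, even if no customer reaches it.
    wide = wide.reindex(columns=range(1, max_slots + 1))
    wide.columns = [f"category_{i}" for i in wide.columns]
    return wide

wide = to_wide(purchases, max_slots=4)
print(wide)
```

With this sample, customer 10 fills `category_1` to `category_3` and customer 11 only `category_1` and `category_2`; every later column is missing for them, which is exactly the situation described above.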
Because most clustering algorithms cannot handle missing values, I am now at a loss.
Is there another way to represent this criterion, or a way to work with the empty values?
I would be very grateful for a tip.
Thank you in advance for your help.