How to convert k-means clusters into a prediction with labeled data

Laura_Brongers Member Posts: 1 Newbie
Hi, for an assignment in the form of a Kaggle competition (called 2nd Assignment DMT, 2022 VU Data Mining Techniques Cup) I have a very large labeled dataset about customers who search for and book hotels on a website. Each row has a search ID (so one customer can have multiple searches). Each search row refers to a hotel with several properties such as location, price per night and star rating. Example attributes are the number of nights the search was for, the number of adults who would join the stay, and the star rating of the hotel. The outcome variable that I want to predict is whether the customer will book or not. This attribute is already included in the dataset and has the label role. Additionally, I would like to take into account the chance that a customer clicks on a hotel, which is also a binary attribute in the dataset (clicked yes/no).
I produced a k-means clustering in RapidMiner Studio by:
1. setting the role of the attribute booked (yes/no) to label
2. taking a sample of the data (20%)
3. selecting the attributes that we think are useful
4. normalizing the data
5. transforming the nominal attributes into numerical ones
6. applying k-means clustering with k=3 and 100 runs, adding the cluster as an attribute to the data, and leaving all other settings at their defaults
7. applying the performance measure for cluster distance
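
For readers who want to follow the steps above outside RapidMiner Studio, here is a rough pure-Python sketch of the same pipeline. It is not what RapidMiner does internally: the toy rows, the column names, the z-score normalization, the dummy coding and the averaged squared distance used as the performance value are all assumptions made for illustration.

```python
import math
import random

def zscore(column):
    """Z-transform a list of numbers (mean 0, standard deviation 1)."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var) or 1.0  # constant column: avoid dividing by zero
    return [(x - mean) / std for x in column]

def one_hot(column):
    """Dummy-code a nominal column into one 0/1 column per distinct value."""
    values = sorted(set(column))
    return [[1.0 if x == v else 0.0 for v in values] for x in column]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans_once(points, k, rng, max_iter=100):
    centroids = rng.sample(points, k)  # random points as starting centroids
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, labels) if a == c]
            # an empty cluster keeps its previous centroid
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    cost = sum(sq_dist(p, centroids[a]) for p, a in zip(points, labels))
    return labels, centroids, cost

def kmeans(points, k, runs=10, seed=0):
    """Repeat k-means from several random starts and keep the lowest-cost run."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(runs)),
               key=lambda result: result[2])

# Hypothetical toy rows: (search_id, price_per_night, star_rating, country, booked)
rng = random.Random(42)
rows = [(i, rng.uniform(40, 400), rng.randint(1, 5),
         rng.choice(["NL", "DE", "US"]), rng.random() < 0.1)
        for i in range(200)]

# Step 1: "booked" is the label, so it is kept out of the clustering input.
sample = rng.sample(rows, len(rows) // 5)        # step 2: 20% sample
prices = zscore([r[1] for r in sample])          # steps 3-4: selected attributes, normalized
stars = zscore([float(r[2]) for r in sample])
countries = one_hot([r[3] for r in sample])      # step 5: nominal -> numerical
points = [[p, s] + c for p, s, c in zip(prices, stars, countries)]

assignment, centroids, cost = kmeans(points, k=3, runs=100, seed=1)  # step 6
avg_sq_distance = cost / len(points)             # step 7: average within-cluster distance
```

After this, `assignment[i]` is the cluster of `sample[i]`, which corresponds to the cluster attribute that RapidMiner adds to the data.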

Now I want to predict the label booked (y/n) based on these clusters. The result has to be a list with the search ID and the chance that the customer books the hotel of that search. So my question is: how do I transform the data with the cluster as an attribute into a list of predictions?
It would be great if the predictions also took into account the chance that the customer clicked on the hotel.
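
One possible way to do this, sketched here under assumptions rather than as the expected solution for the assignment, is to use the observed booking rate within each cluster as the predicted chance for every search that falls into that cluster. The rows, the column order and `CLICK_WEIGHT` below are hypothetical. The combined value is a ranking score, not a probability.

```python
from collections import defaultdict

def cluster_rates(clusters, outcomes):
    """Return {cluster: share of rows in that cluster whose outcome is True}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for c, y in zip(clusters, outcomes):
        totals[c] += 1
        hits[c] += int(y)
    return {c: hits[c] / totals[c] for c in totals}

# Hypothetical labeled rows: (search_id, cluster, clicked, booked)
train = [
    (1, 0, True, True),
    (2, 0, True, False),
    (3, 1, False, False),
    (4, 1, True, False),
    (5, 2, False, False),
    (6, 0, False, False),
    (7, 2, False, False),
    (8, 0, True, True),
]
clusters = [r[1] for r in train]
book_rate = cluster_rates(clusters, [r[3] for r in train])
click_rate = cluster_rates(clusters, [r[2] for r in train])

CLICK_WEIGHT = 0.2  # assumption: how much a click counts relative to a booking

def score(cluster):
    """Blend booking and click rates into a single ranking score."""
    return book_rate[cluster] + CLICK_WEIGHT * click_rate[cluster]

# Hypothetical unlabeled rows that already have a cluster: (search_id, cluster)
new_rows = [(101, 0), (102, 1), (103, 2)]

# One line per search: (search_id, estimated booking chance, blended score)
predictions = [(sid, book_rate[c], score(c)) for sid, c in new_rows]
```

With k=3 this yields only three distinct values, so every search in the same cluster gets the same prediction. A common alternative is to keep the cluster as one input attribute among the others and train a supervised classifier on the label, which gives a separate confidence per row.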