Predict missing values

hatsjikidee · October 2019

Hello all,

I have a dataset with about 3000 records of rated songs. About half are rated, the other half is not. I'm trying to build a model that predicts the empty ratings based on what users rated. I have done the following:

My question is, is this correct? Do I need to make adjustments to make it more correct? Because when I for example already change the k I get different values. And another question: how do I show only the values that have been predicted instead of a full overview, including the already filled in values.

Thanks in advance!

Image: https://us.v-cdn.net/6030995/uploads/editor/nx/gofny4v48ndg.png

lionelderkrikor · October 2019

@hatsjikidee,

OK, I understand. In theory, your method is the good one....but as you mentionned for each k value, you have different results, but you can not evaluate the "performance" of each prediction.

From my point of view, to create a real recommender model, you need descriptive features of your song(s). For example
you need an associated dataset with for each song, its style (pop, rock etc.), its lenght, its author etc.

Hope this helps,

Regards,

Lionel

PS : There is a useful ressource (a book) for you :
- "RapidMiner, Data mining use cases and business analytics applications", (Chapter 9 : Constructing Recommender Systems in RapidMiner) , from Markus Hofmann and Ralf Klinkenberg.
- the associated extension "Recommenders" (to install from the MarketPlace).

lionelderkrikor · October 2019

Hi @hatsjikidee,

If you have some descriptive features of your songs, you can build a model based on your labeled data (your rated songs) and then apply this model to the unlabelled data (the unrated songs).

To help you further can you share your data ?

Hope this helps,

Regards,

Lionel

hatsjikidee · October 2019

Hi lionel,

The dataset has 3 attributes:
Song name - Rating - Name (of the rater)

Every user has about 40 songs, of which 20 rated and 20 not. So the goal is to predict the missing ones based on what the user rated on the ones he did rate. Hope this gives more clarification.

MarcoBarradas · October 2019

@hatsjikidee as @lionelderkrikor said you need to add more data to the one you already have and then you can predict the rate that the user is going to give. Also read something about how Netflix algorithm works. You may also add some complexity to the analysis by obtaining the lyrics and doing some text minning to obtain the words that are more repeated on the songs and how the presence of them may or may not impact de rating of the user.
I don't know if you are doing this as part of a class or just for the fun of doing it but in real life part of being a Data Scientist is analysing the problem, identify the data that may or may not predict an outcome and then extract it from it source. Sometimes the Example Set includes all the attributes you may need and sometimes you need to go out to the internet and find it to enhance your analysis.
Hope this helps and if you need help texts us and we'll be glad to guide you on the process.

Best Regards.

hatsjikidee · October 2019

So from what I understand, as far as it is possible I make good predictions with this process. Thank you both for th help and information!

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Predict missing values

Best Answer

Answers