05-15-2017 05:32 PM
I'm currently taking a class on data mining, and for a project I am trying to use RapidMiner to build multiple linear regression models from Steam data.
I have two Excel sheets: one listing 80 games that serves as my training data, and another listing 20 games.
Both lists contain each game's name, price, current owners, players in the past two weeks, median play time (in seconds), and score (1 = generally positive review scores, 0 = bad scores).
I am trying to figure out whether I can predict if a game will be popular or reviewed positively based on price, number of owners, and number of players.
I would also like to predict the number of players from the price, owners, and ratings.
I tried following other linear regression tutorials I've seen online, but couldn't quite figure out whether they fit my particular case.
Any advice would be greatly appreciated.
I attached my work so far.
05-15-2017 09:55 PM
05-16-2017 08:48 AM
Given that your label variable is actually binary, I would recommend logistic regression rather than linear regression. It was developed specifically for this type of label and addresses a number of conceptual limitations of linear regression in such cases. Treating the label as binary also lets you use other classification algorithms entirely, as @Thomas_Ott describes.
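
To make the difference concrete, here is a minimal sketch of logistic regression outside RapidMiner, written in plain Python with batch gradient descent. The rows are made-up placeholders, not your Steam data, and the function names (`fit_logistic`, `predict_proba`), the units, the learning rate, and the epoch count are all assumptions chosen for illustration. The point is that the model outputs a probability between 0 and 1 for the binary score, which a linear regression is not guaranteed to do.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function; maps any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.2, epochs=5000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss with respect to z
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that the label is 1 (positive score)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical rows, NOT real Steam data:
# [price in tens of dollars, owners in millions, players in millions]
X_train = [
    [6.0, 0.2, 0.01],
    [5.0, 0.1, 0.02],
    [4.0, 0.5, 0.05],
    [3.0, 0.3, 0.03],
    [2.0, 2.0, 0.40],
    [1.5, 3.0, 0.60],
    [1.0, 1.5, 0.30],
    [2.5, 2.5, 0.50],
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = generally positive, 0 = bad

w, b = fit_logistic(X_train, y_train)
probs = [predict_proba(w, b, x) for x in X_train]
preds = [1 if p >= 0.5 else 0 for p in probs]

for x, p, label in zip(X_train, probs, preds):
    print(x, round(p, 3), label)
```

In RapidMiner the equivalent would be to set the score column as a binominal label and use the Logistic Regression operator in place of Linear Regression. For the second goal, where the target is the number of players (a numeric value), linear regression remains a reasonable choice.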