"Some help for training a regression algorithm [SOLVED]"
Dear Rapid-I community,
I am testing RapidMiner's modeling capabilities to build a content-based recommender system. For that I downloaded the MovieLens 100K dataset, which contains information about movies and the ratings users gave them (http://www.grouplens.org/node/73). The ratings range from 0 to 5, and each movie has genre information (action, comedy, etc.). I am training a classifier on the user with the most ratings (uid = 405; number of reviews = 737). To do that I discretize the rating label (good >= 3.5; bad < 3.5), but because this user has many more reviews labeled bad, the classifier (libSVM) predicts every example as bad:
               true bad   true good   class precision
pred. bad      621        116         84.26%
pred. good     0          0           0%
class recall   100%       0%
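As a sanity check on the table above, here is a minimal Python sketch (not part of the RapidMiner process; the function name `confusion_metrics` is made up for illustration) that recomputes accuracy, class precision and class recall from the four counts. It shows why the 84.26% figure is misleading: it is exactly the share of bad examples, while recall for good is zero.

```python
def confusion_metrics(matrix, labels):
    """matrix[i][j] = examples predicted as labels[i] whose true class is labels[j]."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    metrics = {"accuracy": correct / total, "precision": {}, "recall": {}}
    for i, label in enumerate(labels):
        predicted = sum(matrix[i])               # row sum: everything predicted as this label
        actual = sum(row[i] for row in matrix)   # column sum: everything truly in this class
        # Guard against empty rows/columns (e.g. nothing was predicted as "good").
        metrics["precision"][label] = matrix[i][i] / predicted if predicted else 0.0
        metrics["recall"][label] = matrix[i][i] / actual if actual else 0.0
    return metrics


# Counts copied from the table above: rows are predictions, columns are true classes.
result = confusion_metrics([[621, 116], [0, 0]], ["bad", "good"])
print(f"accuracy: {result['accuracy']:.2%}")
for label in ("bad", "good"):
    print(f"{label}: precision {result['precision'][label]:.2%}, "
          f"recall {result['recall'][label]:.2%}")
```

A classifier that always answers "bad" would produce the same numbers, so accuracy alone says nothing about whether the model has learned anything here.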
So I tried another strategy: I used stratified sampling (http://rapid-i.com/rapidforum/index.php/topic,2190.0.html) to balance the good and bad labels. I got the following results:
               true bad   true good   class precision
pred. bad      58         80          42.03%
pred. good     57         35          38.04%
class recall   50.43%     30.43%
But as you can see, the performance is still poor. I would really appreciate any suggestions.
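For readers who want to see what the balancing step amounts to, the following is a rough Python sketch of undersampling the majority class. It is an assumption about the general idea, not the RapidMiner sampling operator used above; `balance_by_undersampling` and the stand-in `ratings` list are hypothetical, and only the class counts (621 bad, 116 good) come from the first table.

```python
import random
from collections import defaultdict


def balance_by_undersampling(examples, label_of, seed=0):
    """Keep as many examples of every class as the smallest class has."""
    rng = random.Random(seed)            # fixed seed so the sample is reproducible
    by_label = defaultdict(list)
    for example in examples:
        by_label[label_of(example)].append(example)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))   # sample without replacement
    rng.shuffle(balanced)                # avoid leaving the classes in blocks
    return balanced


# Hypothetical stand-in data: only the class counts are taken from the post.
ratings = [(f"movie_{i}", "bad") for i in range(621)]
ratings += [(f"movie_{i}", "good") for i in range(621, 737)]

balanced = balance_by_undersampling(ratings, label_of=lambda r: r[1])
counts = {label: sum(1 for r in balanced if r[1] == label) for label in ("bad", "good")}
print(counts)
```

Balancing this way throws away most of the bad examples, so the classifier is trained on a few hundred rows at most, which may be part of why the results stay weak.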
Thanks.
Eduardo
Edit: Sorry for the replicated message
Answers
To optimize the SVM parameters C and gamma, use an Optimize Parameters (Grid) operator. Good ranges for both C and gamma are something like 10^-5 to 10^5 on a logarithmic scale.
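Outside RapidMiner, the same idea can be sketched in a few lines of Python. This is only an illustration of a logarithmic grid over C and gamma, not what the Optimize Parameters (Grid) operator does internally; `log_grid`, `grid_search` and especially `placeholder_score` are hypothetical names, and in a real run the score would be the cross-validated performance of the SVM for each parameter pair.

```python
import itertools
import math


def log_grid(min_exp=-5, max_exp=5):
    """Powers of ten from 10^min_exp to 10^max_exp, one value per decade."""
    return [10.0 ** e for e in range(min_exp, max_exp + 1)]


def grid_search(score, c_values, gamma_values):
    """Try every (C, gamma) pair and return (best_score, best_c, best_gamma)."""
    best = None
    for c, gamma in itertools.product(c_values, gamma_values):
        s = score(c, gamma)
        if best is None or s > best[0]:
            best = (s, c, gamma)
    return best


def placeholder_score(c, gamma):
    # Stand-in for "train the SVM with (C, gamma) and return cross-validated accuracy".
    # It is an arbitrary function that peaks at C = 10, gamma = 0.01.
    return -((math.log10(c) - 1) ** 2) - (math.log10(gamma) + 2) ** 2


c_values = log_grid()        # 1e-05, 1e-04, ..., 1e+05 (11 values)
gamma_values = log_grid()
best_score, best_c, best_gamma = grid_search(placeholder_score, c_values, gamma_values)
print(f"best C = {best_c:g}, best gamma = {best_gamma:g}")
```

With 11 values per parameter the grid has 121 combinations, each of which costs one full cross-validation, so a coarse one-step-per-decade grid is a reasonable first pass before refining around the best pair.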
Best, Marius
At least now it is better to follow the classifier's prediction (instead of doing the opposite). The results were:
accuracy: 59.13% +/- 7.33%
               true bad   true good   class precision
pred. bad      86         65          56.95%
pred. good     29         50          63.29%
class recall   74.78%     43.48%
Maybe I have to try the MovieLens 1M dataset.
Thanks again.