"Help! Weighting of Key Words per Label/Class"
Hello,
thanks in advance for your help.
I have a problem with my RapidMiner results.
I have an example dataset of tweets which I classified manually into three different classes: Buy, Sell, Neutral.
I use the Naive Bayes and the k-NN algorithms with cross-validation on my data. But the accuracy is just 40% for Buy and Sell and 70% for Neutral. Thus overall I get an accuracy of nearly 60%.
My process looks very similar to the process Neill McGuigan used in his Vancouver blog. So I used tokenizing, stopword filtering, stemming...
My data is an Excel file with two columns: first, the class (nominal, label); second, the tweet.
I have a few questions:
Is it possible to assign some important words to the three classes, e.g. every time a tweet contains "buying" it is allocated to the Buy class? Or can I weight some words more than others in one document?
Is there a maximum number of stopwords in a stopword list? Whenever I update my own stopword list (it becomes longer), the process doesn't use the new one.
Do you have any ideas how I can improve my results?
Is there another algorithm which works better on tweets?
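To make the first question more concrete, here is a minimal sketch of the kind of keyword rule I mean. It is written in plain Python rather than as a RapidMiner process, and it is only an illustration: apart from "buying", the keyword lists are made up, and `classify_with_rules` and `always_neutral` are hypothetical names, with `always_neutral` standing in for the trained Naive Bayes or k-NN model.

```python
import re

# Hypothetical keyword lists; only "buying" comes from my example above.
KEYWORD_RULES = {
    "Buy": {"buying"},
    "Sell": {"selling"},
}


def classify_with_rules(tweet, fallback):
    """Return the class of the first keyword rule that matches,
    otherwise whatever the fallback classifier predicts."""
    # Lowercase the tweet and split it into alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", tweet.lower()))
    for label, keywords in KEYWORD_RULES.items():
        if tokens & keywords:
            return label
    return fallback(tweet)


def always_neutral(tweet):
    # Stand-in for the trained model (assumption for this sketch).
    return "Neutral"


print(classify_with_rules("I am buying more shares today!", always_neutral))
print(classify_with_rules("Market looks flat", always_neutral))
```

So a tweet containing one of the keywords would skip the learned model entirely, and everything else would still go through Naive Bayes or k-NN.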
Thanks for your help!
Kind regards!
thestony