"Help! Weighting of Key Words per Label/Class"

Hanepah · August 2014

Hello,

thanks in advance for your help.
I have a problem with my RapidMiner results.

I have an example dataset of tweets that I classified manually into three classes: Buy, Sell, Neutral.
I use Naive Bayes and k-NN with cross-validation on this data. But the accuracy is only about 40% for Buy and Sell and 70% for Neutral. Overall I get an accuracy of nearly 60%.
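
To make the per-class and overall numbers concrete, here is a minimal Python sketch (not RapidMiner) of how they relate. The labels are toy values invented so that they reproduce the percentages above; they are not my real data, and the function names are just illustrative.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Share of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

def overall_accuracy(y_true, y_pred):
    """Share of all examples that were predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels, invented purely for illustration (an assumption, not real data).
y_true = ["Buy"] * 5 + ["Sell"] * 5 + ["Neutral"] * 20
y_pred = (["Buy"] * 2 + ["Neutral"] * 3        # 2 of 5 Buy tweets correct
          + ["Sell"] * 2 + ["Neutral"] * 3     # 2 of 5 Sell tweets correct
          + ["Neutral"] * 14 + ["Buy"] * 6)    # 14 of 20 Neutral tweets correct

recall = per_class_recall(y_true, y_pred)
accuracy = overall_accuracy(y_true, y_pred)
print(recall, accuracy)
```

In this toy case the overall figure is pulled up by the larger Neutral class, which may be part of what I am seeing.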

My process looks very similar to the process Neill McGuigan used in his Vancouver blog. So I use tokenizing, stopword filtering, stemming...
My data is an Excel file with two columns: first the class (nominal, label), second the tweet.

I have two questions:
Is it possible to assign some important words to the three classes, e.g. so that every tweet containing "buying" is allocated to the Buy class? Or can I weight some words more heavily than others in a document?
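
To show what I mean, here is a rough Python sketch of both ideas outside RapidMiner. The keyword lists, the weights, and the function names are all hypothetical, and the `fallback` argument stands in for whatever trained classifier handles tweets without a clear keyword.

```python
import re

# Hypothetical keyword lists per class; the words are illustrative only.
KEYWORD_RULES = {
    "Buy": {"buying", "bought"},
    "Sell": {"selling", "sold"},
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def classify_with_rules(text, fallback):
    """Return a class if exactly one class's keywords occur, else defer to fallback."""
    tokens = set(tokenize(text))
    matches = [label for label, words in KEYWORD_RULES.items() if tokens & words]
    if len(matches) == 1:
        return matches[0]
    return fallback(text)  # no keyword, or conflicting keywords

def weighted_counts(text, weights):
    """Count tokens, multiplying each occurrence by its weight (default 1.0)."""
    counts = {}
    for token in tokenize(text):
        counts[token] = counts.get(token, 0.0) + weights.get(token, 1.0)
    return counts

# Example usage with a trivial stand-in for the trained model.
label = classify_with_rules("I am buying $AAPL today", lambda t: "Neutral")
vector = weighted_counts("buying buying now", {"buying": 3.0})
```

The first function is a hard rule that overrides the model; the second only makes chosen words count more in the word vector, so the model still decides.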

Is there a maximum number of stopwords in a stopword list? Whenever I update my own stopword list (it gets longer), the process doesn't use the new one.
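
For comparison, this is roughly the behaviour I expect, sketched in Python rather than RapidMiner. The stopword list is assumed to have one word per line, and the words and function names here are made up for illustration. A quick check is to add a new word to the list and confirm that it disappears from the tokens.

```python
import re

def load_stopwords(lines):
    """Build a stopword set from an iterable of lines (one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}

def remove_stopwords(text, stopwords):
    """Tokenize the text and drop every token found in the stopword set."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]

# Hypothetical list contents; with a real file one would pass the open file object instead.
old_list = load_stopwords(["the", "a", "is"])
new_list = load_stopwords(["the", "a", "is", "rt"])  # one word added

before = remove_stopwords("RT the market is up", old_list)
after = remove_stopwords("RT the market is up", new_list)
```

If the newly added word still shows up in the output, the process is reading an old copy of the list.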

Do you have any ideas how I can improve my results?
Is there another algorithm that works better on tweets?

Thanks for your help!

Kind regards!
thestony

Hanepah · August 2014

Does no one have any advice?

kind regards!


Altair RapidMiner Community
