Sentiment Analysis using WordNet Dictionary
RapidMiner's text mining capabilities provide several methods for sentiment analysis. One of the popular methods when dealing with English text is to use the WordNet dictionary together with the relevant operators from the RapidMiner WordNet extension. This article gives an overview of doing sentiment analysis using RapidMiner and the WordNet dictionary.
You will need the "Text Processing" extension from here.
You will also need to download the WordNet dictionary from here.
Setup steps for the WordNet dictionary
The WordNet dictionary is distributed as a file with the extension "gz". You will need a utility such as 7Zip to extract it. Once you have the "WordNet-3.0.tar" file, extract it again using the same 7Zip tool. You should then have a folder "WordNet-3.0" containing folders such as dict, doc, and include.
Once you have done this, you should be ready to build a text mining process with RapidMiner using the WordNet dictionary.
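If you prefer to script the two extraction steps instead of using 7Zip, the following is a minimal Python sketch. It assumes the download is named "WordNet-3.0.tar.gz" (inferred from the "gz" extension and the "WordNet-3.0.tar" file mentioned above); the function name extract_wordnet and the target folder are hypothetical, so adjust them to your own setup.

```python
import tarfile
from pathlib import Path


def extract_wordnet(archive_path, target_dir):
    """Unpack a WordNet .tar.gz archive and return the path to its dict folder.

    Assumes the archive contains a top-level "WordNet-3.0" folder, as
    described in the setup steps above.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Mode "r:gz" removes the gzip layer and the tar layer in one pass,
    # which replaces the two manual extraction steps.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)

    # The dict folder is the one RapidMiner needs to be pointed at later.
    dict_dir = target / "WordNet-3.0" / "dict"
    if not dict_dir.is_dir():
        raise FileNotFoundError(f"expected dictionary folder at {dict_dir}")
    return dict_dir


# Example call (file and folder names are assumptions):
# dict_dir = extract_wordnet("WordNet-3.0.tar.gz", "wordnet")
# print(dict_dir)
```

The returned path is the dict folder that the dictionary operator is pointed at later in the process.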
In the screenshot below, we search Twitter, change the data type of the column we want to use for text processing, and then pass the dataset (ExampleSet) to "Process Documents from Data". You can replace the Search Twitter step with any data source of your choice, such as a database or Excel files. If you would like to use files from a folder, you can use the "Process Documents from Files" operator, or for email, the "Process Documents from Mail Store" operator.
Then double-click the "Process Documents from Data" operator to build your text processing steps. Add your standard text processing steps, such as tokenize, transform cases, filter stopwords, and filter tokens, based on your specific needs. The two operators you need to get the sentiment score are "Open WordNet Dictionary" and "Extract Sentiment (English)", both of which come from the WordNet extension.
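To make the standard text processing steps concrete, here is a rough Python sketch of what a tokenize, transform cases, filter stopwords, and filter tokens chain does to a single document. This is only an illustration of the logic, not RapidMiner's implementation: the stopword list, the length limits, and the function name preprocess are all assumptions chosen for the example.

```python
import re

# Tiny illustrative stopword list (an assumption); a real process would
# rely on the stopword operator's own English list.
STOPWORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "to"}


def preprocess(text, min_length=3, max_length=25):
    """Apply the four preprocessing steps to one document, in order."""
    # 1. Tokenize: split on anything that is not a letter.
    tokens = re.findall(r"[A-Za-z]+", text)
    # 2. Transform cases: lower-case every token.
    tokens = [token.lower() for token in tokens]
    # 3. Filter stopwords: drop very common words.
    tokens = [token for token in tokens if token not in STOPWORDS]
    # 4. Filter tokens: keep only tokens within the length limits.
    tokens = [t for t in tokens if min_length <= len(t) <= max_length]
    return tokens


print(preprocess("This is a GREAT phone, love it!"))
# prints ['great', 'phone', 'love']
```

The order matters: lower-casing before the stopword filter ensures that "This" and "this" are treated the same way.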
Configure the "Open WordNet Dictionary" operator by selecting "directory" in the "resource type" parameter, and then configure the directory parameter to point to the ....\WordNet-3.0\dict folder.
Please explore the additional help provided with the "Extract Sentiment (English)" operator to understand its various parameters.
You can also use the WordNet operators for synonyms, hypernyms, and hyponyms to improve your process.
This process adds a new column, "sentiment", that provides a numeric value for sentiment. Negative sentiments are scored less than zero, and positive sentiments are scored greater than zero.
You can use the sentiment score and the "Generate Attributes" operator to flag documents as Positive, Neutral, Negative, etc., based on the actual score value.
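The flagging rule can be sketched in Python as follows. This is only an illustration of the logic you would express in the operator, not its expression syntax. The function name label_sentiment and the optional neutral_margin parameter are hypothetical; with the default margin of zero, the rule matches the scoring described above (below zero is negative, above zero is positive).

```python
def label_sentiment(score, neutral_margin=0.0):
    """Map a numeric sentiment score to a Positive/Neutral/Negative flag.

    neutral_margin is a hypothetical tuning knob: scores whose absolute
    value does not exceed it are treated as Neutral.
    """
    if score > neutral_margin:
        return "Positive"
    if score < -neutral_margin:
        return "Negative"
    return "Neutral"


# Example scores are made up for illustration.
for score in (0.4, -0.2, 0.0):
    print(score, label_sentiment(score))
```

Widening the margin (for example, neutral_margin=0.1) treats weakly scored documents as Neutral, which may help when many scores cluster near zero.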
See the attached process for the complete example.
You can open the process in RapidMiner Studio using File(Menu) >> Import Process.