"How to Build a Dictionary Based Sentiment Model in RapidMiner"
When you want to extract sentiment from a text, you usually have three options:
- Use a pretrained model like the sentiment tools from Aylien and Rosette
- Use a supervised learning method on annotated texts to build your own sentiment scorer
- Use a predefined dictionary where each word has a weight
This post describes a generic way to implement custom dictionary-based scoring.
In this example we assume that you have a dictionary with two columns: the word and its Weight.
A negative Weight means a negative sentiment. From this table we would like to build a scoring function like this:
score = 1.0 * good - 1.5 * bad
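To make the target concrete, here is a minimal Python sketch of what this equation computes, assuming that `good` and `bad` stand for the number of times each word occurs in a text. The function name `dictionary_score` and the sample counts are made up for illustration and are not part of the RapidMiner process.

```python
def dictionary_score(word_counts, weights):
    """Sum weight * count over all words that appear in the dictionary."""
    return sum(weights[word] * count
               for word, count in word_counts.items()
               if word in weights)


# The two dictionary entries from the equation above.
weights = {"good": 1.0, "bad": -1.5}

# Hypothetical word counts for one text; "movie" is not in the dictionary
# and therefore does not contribute to the score.
counts = {"good": 2, "bad": 1, "movie": 1}

print(dictionary_score(counts, weights))  # 2 * 1.0 - 1.5 * 1 = 0.5
```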
As we can see, this is a simple linear equation, so we can use simple linear regression to achieve our result. To do so, we need to prepare the table. First, we need to invert all weights; this can be done using a Generate Attributes operator.
The next step is to bring the table into a form with one row per dictionary entry and one attribute per word.
This is in principle a task for the Pivot operator. We combine it with a Generate ID operator to get a unique group key and with a Rename by Replacing operator to get the correct attribute names. A Replace Missing Values operator then replaces all missing values with zeros.
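Outside of RapidMiner, the reshaping described so far can be sketched in plain Python. This is only an illustration under two assumptions that the post does not spell out: "invert" is read as taking the reciprocal (1 / Weight), and the pivoted attributes are named after the words themselves. The helper `pivot_inverted` is a hypothetical name.

```python
def pivot_inverted(dictionary):
    """Return one row per dictionary entry and one column per word.

    Each row holds 1 / weight in the column of its own word
    (assumed meaning of "invert") and 0.0 everywhere else, which
    mirrors replacing the missing values after the pivot with zeros.
    Weights are assumed to be non-zero.
    """
    words = [word for word, _ in dictionary]
    rows = []
    # enumerate() plays the role of Generate ID: a unique key per row.
    for row_id, (word, weight) in enumerate(dictionary, start=1):
        row = {"id": row_id}
        for column in words:
            row[column] = 1.0 / weight if column == word else 0.0
        rows.append(row)
    return rows


dictionary = [("good", 1.0), ("bad", -1.5)]
for row in pivot_inverted(dictionary):
    print(row)
```

Each dictionary entry ends up in its own row, so every word column carries exactly one non-zero value.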
The next step is to generate a label attribute. For this task we use a Generate Attributes and a Set Role operator. The resulting example set looks like this.
We then train a Vector Linear Regression on this example set to get a model with our desired equation.
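Why this yields the desired equation can be checked with a small pure-Python sketch. It rests on assumptions that are not stated in the post: the inverted weights are reciprocals (1 / Weight), the label is the constant 1 for every row, and the regression is fitted without an intercept. The functions `solve` and `fit_no_intercept` are illustrative stand-ins, not the Vector Linear Regression operator itself.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_no_intercept(X, y):
    """Least squares without intercept via the normal equations X^T X b = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Columns: good, bad. One row per dictionary entry, holding 1 / Weight.
X = [[1.0 / 1.0, 0.0],
     [0.0, 1.0 / -1.5]]
y = [1.0, 1.0]  # assumed constant label

coefficients = fit_no_intercept(X, y)
print(coefficients)  # approximately [1.0, -1.5], the original weights
```

Because each row has a single non-zero entry 1 / Weight and must predict 1, the fitted coefficient for that word has to equal the original Weight.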
This model can be used on texts. These texts can be transformed into the right shape using Process Documents (from Data), Tokenize and Transform Cases. An example is shown in the attached process.
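For readers who want to see what "the right shape" means, here is a rough Python equivalent of the text preparation. It assumes that tokenizing splits on non-letters and that the word vector holds plain term occurrences; the function name `term_occurrences` and the sample sentence are invented for this sketch.

```python
import re
from collections import Counter


def term_occurrences(text, vocabulary):
    """Lower-case the text, split it into letter runs, and count dictionary words."""
    tokens = re.findall(r"[a-z]+", text.lower())  # Transform Cases + Tokenize
    counts = Counter(tokens)
    # One value per dictionary word, in the same order as the model's attributes.
    return [counts[word] for word in vocabulary]


vocabulary = ["good", "bad"]
row = term_occurrences("Good plot, good cast, BAD ending.", vocabulary)
print(dict(zip(vocabulary, row)))  # {'good': 2, 'bad': 1}
```

A row like this has the same attributes as the training table, so the regression model can be applied to it directly.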