The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here

"How to Build a Dictionary Based Sentiment Model in RapidMiner"

MartinLiebigMartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
edited June 2019 in Knowledge Base

When you want to extract a sentiment from a text you usually have three options to go

  1. Use a prelearned model like the sentiment tools from Aylien and Rosette
  2. Use a supervised learning method on annotated texts to built your own sentiment scorer
  3. Use a predefined dictionary where each word has a weight

Every method has it's pros and cons. In this post we will focus on #3 and built a model. For english language you can use Wordnet, which has it's own extension. For german you can use SentiWS

This post describes a generic way to implement a custom dictionary based scoring.

 

In this example we assume, that you have a dictionary with two coloums:

 

Word        Weight

good 1.0
bad -1.5

 

 

Where a negative Weight means a negative sentiment. From this table we would like to built a scoring function like this:

 

score = 1.0 * good - 1.5 * bad

As we can see, this is a simple linear equation. We can use simple linear regression archive our results. To do so we need to prepare the table. First of all we need to invert all weights - this can be done using a Generate Attributes operator.

 

The next step is to bring the table into a form like this

good    bad
1.0 0
0 -1.5

This is in pinciple a task for the Pivot operator. We combine this with a GenerateID operator to get a unique group key and with a Rename by Replacing to get the correct naming conventions. A Replace Missing Values operator allows us to replace all missing values with zeros.

 

Prep.png

 

 

 The next step is to generate a label attribute. For this task we use a Generate Attributes and a Set Role operator. The resulting example set looks like this.

PreparedExampleSet.png

On this example set we learn a Vector Linear Regression to get a model with our desired equation.

 

Equation.png

 

This model can be used on texts. These texts can be transformed into the right shape using Process Documents (from Data), Tokenize and Transform Cases. An example is shown in the attached process.

- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany

Comments

  • ar4oar4o Member Posts: 8 Contributor II

    Can you please reupload the process? Cannot download it.

  • hatem_alhasanihatem_alhasani Member Posts: 5 Contributor I

     

    first of all thanks for a good example,

    i add new value to dictionary for example

    Text = "best" with weight "1.0"

    without change the test set, i get the error "The input ExampleSet does not match the training ExampleSet. Missing Attribute: ''best".

    how to solve this problem?

Sign In or Register to comment.