Sentiment Analysis using Wordnet Dictionary

bhupendra_patilbhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist
edited November 2018 in Knowledge Base

 

RapidMiner's text mining capabilities provide several methods for sentiment analysis. One popular method for English text uses the WordNet dictionary together with the operators from the RapidMiner WordNet extension. This article gives an overview of sentiment analysis with RapidMiner and the WordNet dictionary.

 

Prerequisites

You will need to download and install the "Wordnet Extension" from here

You will also need the "Text Processing" Extension from here

You will need to download the wordnet dictionary from here 

 

Setup steps for wordnet dictionary

The WordNet dictionary download is a compressed archive with the extension "gz". Use a utility such as 7-Zip to extract it. This gives you the file "WordNet-3.0.tar", which you unpack again with the same tool. You should then have a folder "WordNet-3.0" containing subfolders such as dict, doc, and include.
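If you prefer to script the extraction rather than use 7-Zip, the two unpacking steps can be done in one pass. The following is a minimal Python sketch; the archive name "WordNet-3.0.tar.gz" is inferred from the file names mentioned above, and the function name `extract_wordnet` and the target folder are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_wordnet(archive_path, target_dir):
    """Unpack a .tar.gz archive and return the expected path of the dict folder.

    Assumes the archive contains a top-level "WordNet-3.0" folder,
    as described in the setup steps.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Mode "r:gz" handles the gzip layer and the tar layer together,
    # so no intermediate "WordNet-3.0.tar" file is written.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)
    return target / "WordNet-3.0" / "dict"


# Example call (paths are placeholders, adjust them to your machine):
# dict_dir = extract_wordnet("WordNet-3.0.tar.gz", "wordnet")
```

The returned path is the folder that the "Open WordNet Dictionary" operator should later point to.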

 

Once this is done, you are ready to build a text mining process with RapidMiner and the WordNet dictionary.

In the screenshot below we search Twitter, change the data type of the column we want to use for text processing, and then pass the dataset (ExampleSet) to "Process Documents from Data". You can replace the Search Twitter step with any data source of your choice, such as a database or Excel files. To work with files from a folder, use "Process Documents from Files" instead; for email, use the "Process Documents from Mail Store" operator.

wordnet sentiment analysis.png

Then double-click the "Process Documents from Data" operator to build your text processing steps. Add your standard text processing steps, such as tokenize, transform cases, filter stop words, and filter tokens, based on your specific needs. The two operators you need to get the sentiment score are "Open WordNet Dictionary" and "Extract Sentiment (English)", both of which come from the WordNet extension.
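To make the preprocessing steps concrete, here is a minimal Python sketch of what tokenizing, transforming case, filtering stop words, and filtering tokens do to a piece of text. It is an illustration only, not how the RapidMiner operators are implemented; the stop-word list, the regular expression, and the function name `preprocess` are assumptions made for the example.

```python
import re

# Tiny illustrative stop-word list (an assumption for this sketch;
# a real stop-word filter uses a much larger list).
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in"}


def preprocess(text, min_length=2):
    """Return the cleaned tokens of `text`."""
    tokens = re.findall(r"[A-Za-z]+", text)              # tokenize on non-letters
    tokens = [t.lower() for t in tokens]                 # transform cases
    tokens = [t for t in tokens if t not in STOPWORDS]   # filter stop words
    tokens = [t for t in tokens if len(t) >= min_length] # filter tokens by length
    return tokens


print(preprocess("The service is GREAT and the food is good!"))
# prints ['service', 'great', 'food', 'good']
```

The order matters: lowercasing before the stop-word filter ensures that "The" and "the" are both removed.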

 

Configure the "Open WordNet Dictionary" operator: select "directory" in the "resource type" parameter, then set the "directory" parameter to point to the ....\WordNet-3.0\dict folder.
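A frequent cause of errors at this step is pointing the operator at the wrong folder, for example the top-level folder instead of dict. The following Python sketch checks whether a folder looks like a WordNet dict folder before you enter it in the operator. The helper name `missing_dict_files` is hypothetical, and the check only looks for a few of the index and data files that WordNet 3.0 ships in dict.

```python
from pathlib import Path

# A few of the files found in the WordNet 3.0 "dict" folder.
EXPECTED_FILES = ("index.noun", "data.noun", "index.verb", "data.verb")


def missing_dict_files(dict_dir):
    """Return the expected WordNet files that are NOT present in dict_dir."""
    folder = Path(dict_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


# Example call (the path is a placeholder):
# problems = missing_dict_files("wordnet/WordNet-3.0/dict")
# An empty list means the folder looks usable.
```

If the list is not empty, the archive was probably not fully extracted or the path stops one level too high.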

processdocumentdetails.png

Please explore the additional help provided with the "Extract Sentiment(Dictionary)" operator to understand the various parameters.

You can also use the WordNet operators for synonyms, hypernyms, and hyponyms to improve your process.
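As a reminder of the terminology: a hypernym is a broader term and a hyponym is a narrower one. The Python sketch below illustrates the relationship with a hand-made toy taxonomy. It does not read the WordNet files, and the word pairs and function names are assumptions chosen for the example.

```python
# Hand-made toy taxonomy: each word maps to its direct hypernym (broader term).
HYPERNYM_OF = {
    "knife": "cutlery",
    "spoon": "cutlery",
    "fork": "cutlery",
    "cutlery": "tableware",
}


def hypernyms(word):
    """Return the chain of broader terms for `word`, nearest first."""
    chain = []
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        chain.append(word)
    return chain


def hyponyms(word):
    """Return the direct narrower terms of `word`, sorted alphabetically."""
    return sorted(child for child, parent in HYPERNYM_OF.items() if parent == word)


print(hypernyms("knife"))    # prints ['cutlery', 'tableware']
print(hyponyms("cutlery"))   # prints ['fork', 'knife', 'spoon']
```

Replacing tokens with their hypernyms groups related words under one term, which can reduce the vocabulary size before further analysis.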

 

This process adds a new column "sentiment" that holds a numeric sentiment value. Negative sentiment is scored below zero and positive sentiment is scored above zero.

You can use the sentiment score with the "Generate Attributes" operator to flag documents as Positive, Neutral, Negative, etc., based on the score value.
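The flagging logic is a simple threshold rule. The Python sketch below shows one way to express it; the function name `label_sentiment` and the optional `neutral_band` parameter are assumptions for illustration, and inside RapidMiner you would write the equivalent condition in "Generate Attributes".

```python
def label_sentiment(score, neutral_band=0.0):
    """Map a numeric sentiment score to a label.

    Scores above +neutral_band are Positive, scores below -neutral_band
    are Negative, and everything in between is Neutral.
    """
    if score > neutral_band:
        return "Positive"
    if score < -neutral_band:
        return "Negative"
    return "Neutral"


for score in (0.4, -0.2, 0.0):
    print(score, label_sentiment(score))
# prints:
# 0.4 Positive
# -0.2 Negative
# 0.0 Neutral
```

A non-zero `neutral_band` (for example 0.1) treats weakly scored documents as Neutral, which can be useful when small scores are mostly noise.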

 

See the attached process for the complete example.

You can open the process in RapidMiner Studio using File(Menu) >> Import Process.

 

 

 

 

 

 

Comments

  • jaijai Member Posts: 2 Contributor I

    I am facing the following issue. If anyone can address it, that would be really great.



    Screenshot from 2016-10-20 16_30_43.png

     

  • aluna04aluna04 Member Posts: 1 Contributor I

    Hi,

    I also have the same problem. Hope someone can help.

    Thx!

  • KostasBonikosKostasBonikos Member Posts: 25 Maven
    The trick is to first uncompress the archive completely and then navigate to the 'dict' folder, where the dictionary is stored.


    So, in the "Open WordNet Dictionary" operator, in the "directory" option, you have to put something like: "C:\...\WordNet-3.0\dict"
  • rtbarberrtbarber Member, University Professor Posts: 8 University Professor

    Note: you cannot open the WordNet dictionary inside your loop; it tries to open it multiple times and fails. Follow the instructions from @awchisholm in http://community.rapidminer.com/t5/forums/v3_1/forumtopicpage/board-id/Studio/thread-id/15219/page/4 to resolve that issue.

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hi...following up on this KB. Can someone explain what "hyponyms" and "hypernyms" are with examples? I'm having a hard time getting my head around them. @bhupendra_patil @Thomas_Ott?

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    So a hypernym groups a word into its higher-level taxonomy, and a hyponym is a more specific word within that group. For example, knife is a hyponym of cutlery, and cutlery is the hypernym of knife. The same goes for spoon: it's a kind of cutlery. Here's a great example: https://en.wikipedia.org/wiki/Hyponymy_and_hypernymy

     

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    ah that's perfect - exactly what I'm looking for. Thanks, @Thomas_Ott. Have you used those operators from the Wordnet extension? I'm trying to experiment and I can load the dictionary but cannot get any kind of result. Like this:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="wordnet:open_wordnet_dictionary" compatibility="5.3.000" expanded="true" height="68" name="Open WordNet Dictionary" width="90" x="715" y="442">
    <parameter key="directory" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/Projets Secondaires/Drift/WordNet-3.0/dict"/>
    </operator>
    <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="715" y="289">
    <parameter key="text" value="cost budget money currency"/>
    </operator>
    <operator activated="true" class="wordnet:find_hyponym_wordnet" compatibility="5.3.000" expanded="true" height="82" name="Find Hyponyms (WordNet)" width="90" x="916" y="340">
    <parameter key="use_prefix" value="false"/>
    <parameter key="keep_original_tokens" value="true"/>
    </operator>
    <connect from_op="Open WordNet Dictionary" from_port="dictionary" to_op="Find Hyponyms (WordNet)" to_port="dictionary"/>
    <connect from_op="Create Document" from_port="output" to_op="Find Hyponyms (WordNet)" to_port="document"/>
    <connect from_op="Find Hyponyms (WordNet)" from_port="document" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @sgenzer yes, I've used this extension quite a bit, but since I've moved machines I haven't had a chance to reinstall the WordNet libraries. This extension is quite nice; it gives users access to some powerful sentiment capabilities, but it's often overlooked and underused IMHO.

  • ahootanhaahootanha Member Posts: 69 Contributor I

    Does anyone know how to solve this problem?

    wo1.JPG

  • ethanlakemanethanlakeman Member Posts: 2 Contributor I

     Hey,   

     

    Is there any way to remove certain words or terms from text contained in an Excel file and then save a new version of the file with the same layout but with these words removed?

     

    I am in the process of analysing the text content of Tweets for language analysis and I want to remove external links (https) and tags (@...) before I run it through a different software.   

    I have used Data to Documents, Tokenize, and Delete Document Parts to find specific word frequencies and remove the above, but I was wondering if I could then generate a new Excel file with these words removed.

     

    Thanks,

     

    Ethan.  

  • ruuby815ruuby815 Member Posts: 1 Contributor I

    Hi,

     

    I have a question concerning extract sentiment (WordNet). In the result window, I only found one sentiment score for the whole document, can I expect to find scores for each row instead of just one score for the whole document?

     

    Thank you.

    螢幕快照 2018-08-25 下午2.46.35.png
