How to do sentiment analysis with Python or R

student_compute Member Posts: 73 Contributor II
edited December 2018 in Help

Hello,
I want to run sentiment analysis on some tweets using Python or R.
I have already done the preprocessing in RapidMiner.
But I do not know how to use R or Python for sentiment analysis inside a RapidMiner process.
Does anyone know how? Or is there an example?
Any help would be appreciated.
Thanks in advance.

Answers

  • kayman Member Posts: 662 Unicorn

    I like to use the VADER sentiment analyzer that ships with the NLTK toolkit. It works pretty well with social media data (sentiment analysis will always remain a bit of a challenge) and gives a bit more than the usual positive / negative indication.

    The attached sample uses this framework. The example splits each response into sentences and gives the 'vibe' per sentence. I typically use this method to make sure mixed data gets covered well too, but of course you could also run it on the full text.

    The input I provided looks like this:

     

    Review.Body | Review.ID | Review.Date | Review.Title | Review.Rating
    Sound is great. Picture is bad | XYZ123 | Wed Aug 01 10:08:34 CEST 2018 | My opinion | 3.0

     

    What it returns is as follows:

    Review.ID | sentence | compound | negative | possitive | neutral | Review.Date | Review.Title | Review.Rating
    XYZ123 | Sound is great. | 0.6249 | 0.0 | 0.672 | 0.328 | Wed Aug 01 10:08:34 CEST 2018 | My opinion | 3.0
    XYZ123 | Picture is bad | -0.5423 | 0.636 | 0.0 | 0.364 | Wed Aug 01 10:08:34 CEST 2018 | My opinion | 3.0

     

    The further the compound value (range -1 to +1) is from zero, the stronger the negative or positive sentiment of that sentence is likely to be.
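As a rough illustration of the per-sentence approach described above, here is a minimal Python sketch. The helper names `split_sentences` and `score_review` are hypothetical, and the regex splitter is only a naive stand-in for `nltk.tokenize.sent_tokenize`, which the attached process uses. The scorer is passed in as a function so that the sketch runs without NLTK installed.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Assumption: a simple stand-in for nltk.tokenize.sent_tokenize.
    """
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def score_review(review_id, body, polarity_scores, splitter=split_sentences):
    """Return one row (a dict) per sentence, tagged with the review ID.

    `polarity_scores` is any callable that maps a sentence to a dict of
    scores, for example SentimentIntensityAnalyzer().polarity_scores.
    """
    rows = []
    for sentence in splitter(body):
        row = {"Review.ID": review_id, "sentence": sentence}
        row.update(polarity_scores(sentence))
        rows.append(row)
    return rows
```

With NLTK and its `vader_lexicon` data installed, `SentimentIntensityAnalyzer().polarity_scores` from `nltk.sentiment.vader` can be passed as the `polarity_scores` argument and `tokenize.sent_tokenize` as `splitter`.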

     

     

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data_user_specification" compatibility="8.2.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="85">
    <list key="attribute_values">
    <parameter key="Review.Body" value="&quot;Sound is great. Picture is bad&quot;"/>
    <parameter key="Review.ID" value="&quot;XYZ123&quot;"/>
    <parameter key="Review.Date" value="date_now()"/>
    <parameter key="Review.Title" value="&quot;My opinion&quot;"/>
    <parameter key="Review.Rating" value="3"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="filter_example_range" compatibility="8.2.000" expanded="true" height="82" name="Filter Example Range" width="90" x="179" y="289">
    <parameter key="first_example" value="1"/>
    <parameter key="last_example" value="5"/>
    </operator>
    <operator activated="true" class="subprocess" compatibility="8.2.000" expanded="true" height="103" name="Subprocess" width="90" x="313" y="289">
    <process expanded="true">
    <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
    <parameter key="attribute_name" value="Review.ID"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    <description align="center" color="transparent" colored="false" width="126">define review as id</description>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="514" y="187">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value="review-body"/>
    <parameter key="attributes" value="|Review.Body"/>
    <description align="center" color="transparent" colored="false" width="126">Keep only the ones we need&lt;br&gt;&lt;br&gt;We focus only on body but we could also concatenate with the title</description>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes (3)" width="90" x="916" y="289">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Review.Body"/>
    <parameter key="invert_selection" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Get all other fields</description>
    </operator>
    <operator activated="true" class="replace" compatibility="8.2.000" expanded="true" height="82" name="Replace (3)" width="90" x="916" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Review.Body"/>
    <parameter key="replace_what" value="\n"/>
    <parameter key="replace_by" value=" "/>
    <description align="center" color="transparent" colored="false" width="126">replace linebreaks as python doesn't like these too much</description>
    </operator>
    <connect from_port="in 1" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Replace (3)" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="original" to_op="Select Attributes (3)" to_port="example set input"/>
    <connect from_op="Select Attributes (3)" from_port="example set output" to_port="out 2"/>
    <connect from_op="Replace (3)" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    <portSpacing port="sink_out 3" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Normalization and preparation</description>
    </operator>
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="447" y="34">
    <parameter key="script" value="import pandas&#10;from nltk.sentiment.vader import SentimentIntensityAnalyzer&#10;from nltk import tokenize&#10;&#10;def rm_main(data):&#10;&#10;    sent_all=[]&#10;    score_all=[]&#10;    id_all=[]&#10;&#10;    # create the analyzer once and reuse it for every sentence&#10;    sid = SentimentIntensityAnalyzer()&#10;    for index,row in data.iterrows():&#10;        review=row[&quot;Review.Body&quot;]&#10;        _id=row[&quot;Review.ID&quot;]&#10;&#10;        # split the review into sentences and score each one&#10;        lines_list = tokenize.sent_tokenize(review)&#10;        for sentence in lines_list:&#10;            ss = sid.polarity_scores(sentence)&#10;            id_all.append(_id)&#10;            sent_all.append(sentence)&#10;            score_all.append(ss)&#10;&#10;    # one output row per sentence, keyed by the review ID&#10;    data_split = pandas.DataFrame()&#10;    data_split['Review.ID']=id_all&#10;    data_split['sentence']=sent_all&#10;    data_split['scores']=score_all&#10;&#10;    return data_split"/>
    <description align="center" color="transparent" colored="false" width="126">We use nltk / vader framework to do sentiment analysis.&lt;br&gt;&lt;br&gt;Can be easily replaced with other frameworks or custom code</description>
    </operator>
    <operator activated="true" class="subprocess" compatibility="8.2.000" expanded="true" height="103" name="Subprocess (2)" width="90" x="581" y="289">
    <process expanded="true">
    <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="34">
    <list key="function_descriptions">
    <parameter key="compound" value="parse(replaceAll([scores],&quot;^.*?'compound': (-?\\d+.\\d+).*$&quot;,&quot;$1&quot;))"/>
    <parameter key="negative" value="parse(replaceAll([scores],&quot;^.*?'neg': (-?\\d+.\\d+).*$&quot;,&quot;$1&quot;))"/>
    <parameter key="possitive" value="parse(replaceAll([scores],&quot;^.*?'pos': (-?\\d+.\\d+).*$&quot;,&quot;$1&quot;))"/>
    <parameter key="neutral" value="parse(replaceAll([scores],&quot;^.*?'neu': (-?\\d+.\\d+).*$&quot;,&quot;$1&quot;))"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="380" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="scores"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role (2)" width="90" x="514" y="34">
    <parameter key="attribute_name" value="Review.ID"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (2)" width="90" x="514" y="187">
    <list key="key_attributes"/>
    </operator>
    <connect from_port="in 1" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_port="in 2" to_op="Join (2)" to_port="right"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Join (2)" to_port="left"/>
    <connect from_op="Join (2)" from_port="join" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="source_in 3" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">post processing</description>
    </operator>
    <connect from_op="Generate Data by User Specification" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
    <connect from_op="Filter Example Range" from_port="original" to_op="Subprocess" to_port="in 1"/>
    <connect from_op="Subprocess" from_port="out 1" to_op="Execute Python" to_port="input 1"/>
    <connect from_op="Subprocess" from_port="out 2" to_op="Subprocess (2)" to_port="in 2"/>
    <connect from_op="Execute Python" from_port="output 1" to_op="Subprocess (2)" to_port="in 1"/>
    <connect from_op="Subprocess (2)" from_port="out 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <description align="center" color="yellow" colored="false" height="108" resized="true" width="736" x="102" y="497">http://www.nltk.org/_modules/nltk/sentiment/vader.html&lt;br&gt;&lt;br&gt;http://t-redactyl.io/blog/2017/04/using-vader-to-handle-sentiment-analysis-with-social-media-text.html</description>
    </process>
    </operator>
    </process>
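
The Generate Attributes step in the process above pulls each number out of the text form of the score dictionary with a regular expression. For readers who would rather do that extraction in Python, a minimal sketch follows. The function name `parse_scores` and the output key names are my own choices, and the input format is assumed to be the usual string form of a Python dict as returned by `polarity_scores`.

```python
import re

# VADER key -> output column name (column names are illustrative)
SCORE_KEYS = {
    "compound": "compound",
    "neg": "negative",
    "pos": "positive",
    "neu": "neutral",
}


def parse_scores(scores_text):
    """Extract the four scores from a stringified score dictionary.

    Returns None for a key that cannot be found.
    """
    result = {}
    for key, column in SCORE_KEYS.items():
        match = re.search(r"'%s': (-?\d+\.\d+)" % key, scores_text)
        result[column] = float(match.group(1)) if match else None
    return result
```

If the post-processing is done inside the Execute Python script itself, the dictionaries can be used directly and no text parsing is needed; the regular expression is only required once the scores have been converted to text.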

     

     

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @student_compute,

    In addition to @kayman's solution, I propose a Python script using the "textblob" library.

    For your text attribute, this script returns a polarity score between -1 (negative) and +1 (positive).

    To execute this script, you have to set the name of your text attribute (with quotes) in the Set Macros operator:

    [Image: Spelling_Correction_3.png]

    The process:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA4">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.000-BETA4" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="9.0.000-BETA" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="85">
    <parameter key="connection" value="dkk"/>
    <parameter key="query" value="iphone"/>
    <parameter key="limit" value="20"/>
    <parameter key="language" value="en"/>
    </operator>
    <operator activated="true" class="set_macros" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Set Macros" width="90" x="246" y="85">
    <list key="macros">
    <parameter key="textAttribute" value="'Text'"/>
    </list>
    </operator>
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="380" y="85">
    <parameter key="script" value="import pandas&#10;from textblob import TextBlob&#10;&#10;textAtt = %{textAttribute}&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def sent(text) : &#10;&#10; testimonial = TextBlob(str(text))&#10; sentiment = testimonial.sentiment.polarity &#10; return sentiment&#10;&#10;&#10;def rm_main(data):&#10; &#10; data['polarity'] =data[textAtt].apply(sent)&#10; return data "/>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Set Macros" to_port="through 1"/>
    <connect from_op="Set Macros" from_port="through 1" to_op="Execute Python" to_port="input 1"/>
    <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Regards,

     

    Lionel

     

  • student_compute Member Posts: 73 Contributor II

    Hi, thank you very much.
    How can I turn the scores into a new column, so that each sentence is labeled with the word "positive" or "negative"?
    Thanks a lot

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @student_compute,

    I think you can use the Generate Attributes and Set Data operators and, if needed, the Reorder Attributes operator.

     

    Regards,

     

    Lionel

  • student_compute Member Posts: 73 Contributor II

    Hello,
    Thank you so much.
    I used them, but I do not know how to show the polarity of the sentences based on the scores.
    Please look at this:
    [Image: ۱.JPG]
    Is this possible?
    Thanks

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @student_compute,

    You can indeed create a new attribute "Pol" defined by (for example):

     - if polarity < -0.1, then Pol = "negative"

     - if -0.1 <= polarity <= 0.1, then Pol = "neutral"

     - if polarity > 0.1, then Pol = "positive"

    Note: You can, of course, choose thresholds other than -0.1 / 0.1.
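
The same thresholding can be sketched in Python, for example if you prefer to add the label inside the Execute Python script. The function name `polarity_label` and the `threshold` parameter are illustrative assumptions and do not appear in the process.

```python
def polarity_label(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a text label.

    Scores inside [-threshold, +threshold] count as neutral.
    """
    if polarity < -threshold:
        return "negative"
    if polarity > threshold:
        return "positive"
    return "neutral"
```

Inside `rm_main`, a line such as `data['Pol'] = data['polarity'].apply(polarity_label)` should then give the same column as the Generate Attributes expression.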

     

    Here is the associated process:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA4">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.000-BETA4" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="9.0.000-BETA" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="85">
    <parameter key="connection" value="dkk"/>
    <parameter key="query" value="iphone"/>
    <parameter key="limit" value="20"/>
    <parameter key="language" value="en"/>
    </operator>
    <operator activated="true" class="set_macros" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Set Macros" width="90" x="246" y="85">
    <list key="macros">
    <parameter key="textAttribute" value="'Text'"/>
    </list>
    </operator>
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="380" y="85">
    <parameter key="script" value="import pandas&#10;from textblob import TextBlob&#10;&#10;textAtt = %{textAttribute}&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def sent(text) : &#10;&#10; testimonial = TextBlob(str(text))&#10; sentiment = testimonial.sentiment.polarity &#10; return sentiment&#10;&#10;&#10;def rm_main(data):&#10; &#10; data['polarity'] =data[textAtt].apply(sent)&#10; return data "/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate Attributes" width="90" x="514" y="85">
    <list key="function_descriptions">
    <parameter key="Pol" value="if(polarity&lt;-0.1,&quot;negative&quot;,if(polarity&gt;0.1,&quot;positive&quot;,&quot;neutral&quot;))"/>
    </list>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Set Macros" to_port="through 1"/>
    <connect from_op="Set Macros" from_port="through 1" to_op="Execute Python" to_port="input 1"/>
    <connect from_op="Execute Python" from_port="output 1" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Regards,

     

    Lionel

  • student_compute Member Posts: 73 Contributor II

    Thank you so much.
    How can I download RapidMiner version 9?
    Thanks

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @student_compute,

    Here is the link to download the RapidMiner 9.0 Beta:

     

    http://static.rapidminer.com/rnd/html/rapidminer-9.0-preview.html

     

    Regards,

     

    Lionel

  • student_compute Member Posts: 73 Contributor II

    Hello,
    Thank you so much.
    Is there a perplexity parameter for LDA in the new version? Or any other new features?

  • student_compute Member Posts: 73 Contributor II

    Hello kayman,
    I used your code, but it did not recognize the NLTK package.
    How do I download this package and make it available to RapidMiner?
    I use Anaconda. I installed the textblob package, but I cannot get it to work.
    Could you help me with the installation?

    Thank you

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @student_compute,

    Yes, perplexity is one of the performance measures in the latest version of the LDA operator.

     

    Regards,

     

    Lionel

  • student_compute Member Posts: 73 Contributor II

    Dear lionelderkrikor,
    Thank you :heart:


  • student_compute Member Posts: 73 Contributor II

    Hello,

    How do I install the nltk package and use it? The process fails with an error saying that this package does not exist. Thank you.

    Also:

    I downloaded and ran RapidMiner 9, but I do not know how to find the perplexity measure for assessing LDA. Does anyone know?

    Thanks

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @student_compute,

    "How to install nltk package and use it?"

    Launch the Windows Command Prompt (type "cmd" in the Windows 10 search bar) and run the following command: pip install nltk
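
A common cause of a "package does not exist" error is that pip installed the package into a different Python installation than the one RapidMiner is configured to use. The following is a minimal check, assuming it is run with the same interpreter as the Execute Python operator. The helper names `is_installed` and `report` are hypothetical.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if the running interpreter can locate the package."""
    return importlib.util.find_spec(package_name) is not None


def report(packages=("nltk", "textblob")):
    """List the interpreter path and whether each package is visible to it."""
    lines = ["interpreter: " + sys.executable]
    for name in packages:
        status = "found" if is_installed(name) else "missing"
        lines.append("%s: %s" % (name, status))
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

If a package is reported as missing, running pip through that same interpreter (the printed path followed by `-m pip install nltk`) installs it where the script can see it.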

     

    "But I do not know how to find Perplexity mesure for assessing LDA"

    Connect the "per" output port of the LDA operator to a "res" (result) port.

     

    Regards,

     

    Lionel

     

  • student_compute Member Posts: 73 Contributor II

    Hello,
    Thank you.

    Regarding perplexity in LDA, could you send a sample screenshot?
    Thanks

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi,

    Here are the screenshots for LDA:

    [Image: LDA.png]

    [Image: LDA_2.png]

     

    Regards,

     

    Lionel

  • student_compute Member Posts: 73 Contributor II

    Hello,

    Thank you so much.

    Could you also tell me what the other values are used for?
    I mean the avg values.

  • student_compute Member Posts: 73 Contributor II

    Hello,

    Sorry for raising this topic again. I tried a lot. I installed nltk, but there is still an error when I run the process, and I could not solve it myself. Can anyone help me?

    Also, could someone explain the avg values reported in the LDA output? What are they used for?

    Thanks for all your help.

    [Image: Capture25.jpg]

     

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @student_compute,

    Can you share your process so that we can reproduce the error?

    Try adding the following line to the Python script, after the other nltk.download('xxxxxx') calls:

    nltk.download('vader_lexicon')

    and execute the process once.
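
To avoid downloading on every run, the download can be limited to resources that are not present yet. The sketch below is one way to do that. The function name `ensure_resources` is hypothetical, and the lookup and download functions are passed in as arguments so the logic can be tried without NLTK installed.

```python
def ensure_resources(resources, find, download):
    """Download every resource that `find` cannot locate.

    resources: list of (lookup_path, package_name) pairs.
    find:      callable that raises LookupError for a missing resource,
               as nltk.data.find does.
    download:  callable that fetches a package by name, e.g. nltk.download.
    Returns the list of package names that were downloaded.
    """
    fetched = []
    for path, name in resources:
        try:
            find(path)
        except LookupError:
            download(name)
            fetched.append(name)
    return fetched
```

In the Execute Python script this could be called as `ensure_resources([("sentiment/vader_lexicon.zip", "vader_lexicon")], nltk.data.find, nltk.download)` after `import nltk`. The lookup path shown is my assumption about where NLTK stores the lexicon, so please verify it against your installation.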

     

    Regards,

     

    Lionel
