RapidMiner

how sentiment analysis by python or R

Contributor II student_compute


Hello
I want to run sentiment analysis on some tweets using Python or R.
I have already done the preprocessing in RapidMiner,
but I do not know how to use R or Python for the sentiment analysis inside the process.
Does anyone know how, or is there an example?
Any help is appreciated.
Thanks in advance

17 REPLIES
Unicorn

Re: how sentiment analysis by python or R

I like to use the VADER sentiment analyzer from the NLTK toolkit. It works pretty well with social media data (sentiment analysis will always remain a bit of a challenge) and gives a bit more than the usual positive / negative indication.

 

The attached sample uses this framework. It splits each review into sentences and gives the 'vibe' per sentence. I typically use this method to make sure mixed reviews also get covered well, but of course you could also run it on the full text.

 

The input looks like this:

Review.Body | Review.ID | Review.Date | Review.Title | Review.Rating
Sound is great. Picture is bad | XYZ123 | Wed Aug 01 10:08:34 CEST 2018 | My opinion | 3.0

 

It returns the following:

Review.ID | sentence | compound | negative | possitive | neutral | Review.Date | Review.Title | Review.Rating
XYZ123 | Sound is great. | 0.6249 | 0.0 | 0.672 | 0.328 | Wed Aug 01 10:08:34 CEST 2018 | My opinion | 3.0
XYZ123 | Picture is bad | -0.5423 | 0.636 | 0.0 | 0.364 | Wed Aug 01 10:08:34 CEST 2018 | My opinion | 3.0

 

The compound value ranges from -1 to +1. The closer it is to -1 or +1, the more strongly negative or positive the sentiment of that sentence is likely to be.
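
For context on where the -1 to +1 range comes from: as far as I understand the VADER implementation, the compound value is the sum of the word valence scores squashed by x / sqrt(x^2 + alpha), with alpha = 15. The sketch below reproduces only that step. The valences 3.1 for "great" and -2.5 for "bad" are assumed lexicon values, and the real analyzer also applies rules for negation, punctuation and capitalization that are left out here.

```python
import math

def normalize(valence_sum, alpha=15.0):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)

# Assumed lexicon valences for the single scored word in each sentence.
print(round(normalize(3.1), 4))   # "Sound is great." -> 0.6249
print(round(normalize(-2.5), 4))  # "Picture is bad"  -> -0.5423
```

Under those assumptions the two results line up with the compound values in the output above.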

 

 

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data_user_specification" compatibility="8.2.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="85">
        <list key="attribute_values">
          <parameter key="Review.Body" value="&quot;Sound is great. Picture is bad&quot;"/>
          <parameter key="Review.ID" value="&quot;XYZ123&quot;"/>
          <parameter key="Review.Date" value="date_now()"/>
          <parameter key="Review.Title" value="&quot;My opinion&quot;"/>
          <parameter key="Review.Rating" value="3"/>
        </list>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="filter_example_range" compatibility="8.2.000" expanded="true" height="82" name="Filter Example Range" width="90" x="179" y="289">
        <parameter key="first_example" value="1"/>
        <parameter key="last_example" value="5"/>
      </operator>
      <operator activated="true" class="subprocess" compatibility="8.2.000" expanded="true" height="103" name="Subprocess" width="90" x="313" y="289">
        <process expanded="true">
          <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
            <parameter key="attribute_name" value="Review.ID"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
            <description align="center" color="transparent" colored="false" width="126">define review as id</description>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="514" y="187">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value="review-body"/>
            <parameter key="attributes" value="|Review.Body"/>
            <description align="center" color="transparent" colored="false" width="126">Keep only the ones we need&lt;br&gt;&lt;br&gt;We focus only on body but we could also concatenate with the title</description>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes (3)" width="90" x="916" y="289">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Review.Body"/>
            <parameter key="invert_selection" value="true"/>
            <description align="center" color="transparent" colored="false" width="126">Get all other fields</description>
          </operator>
          <operator activated="true" class="replace" compatibility="8.2.000" expanded="true" height="82" name="Replace (3)" width="90" x="916" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Review.Body"/>
            <parameter key="replace_what" value="\n"/>
            <parameter key="replace_by" value=" "/>
            <description align="center" color="transparent" colored="false" width="126">replace linebreaks as python doesn't like these too much</description>
          </operator>
          <connect from_port="in 1" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Replace (3)" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="original" to_op="Select Attributes (3)" to_port="example set input"/>
          <connect from_op="Select Attributes (3)" from_port="example set output" to_port="out 2"/>
          <connect from_op="Replace (3)" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
          <portSpacing port="sink_out 3" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Normalization and preparation</description>
      </operator>
      <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="447" y="34">
        <parameter key="script" value="import pandas&#10;from nltk.sentiment.vader import SentimentIntensityAnalyzer&#10;from nltk import tokenize&#10;&#10;def rm_main(data):&#10;&#10;    sent_all=[]&#10;    score_all=[]&#10;    id_all=[]&#10;    &#10;    data_split = pandas.DataFrame()&#10;    for index,row in data.iterrows():&#10;        review=row[&quot;Review.Body&quot;]&#10;        _id=row[&quot;Review.ID&quot;]&#10;        &#10;        lines_list = tokenize.sent_tokenize(review)&#10;        sid = SentimentIntensityAnalyzer()&#10;        for sentence in lines_list:&#10;&#10;            ss = sid.polarity_scores(sentence)&#10;            id_all.append(_id)&#10;            sent_all.append(sentence)&#10;            score_all.append(ss)&#10;            &#10;    data_split['Review.ID']=id_all&#10;    data_split['sentence']=sent_all&#10;    data_split['scores']=score_all&#10;&#10;    #print(data)&#10;    &#10;    return data_split"/>
        <description align="center" color="transparent" colored="false" width="126">We use nltk / vader framework to do sentiment analysis.&lt;br&gt;&lt;br&gt;Can be easily replaced with other frameworks or custom code</description>
      </operator>
      <operator activated="true" class="subprocess" compatibility="8.2.000" expanded="true" height="103" name="Subprocess (2)" width="90" x="581" y="289">
        <process expanded="true">
          <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="34">
            <list key="function_descriptions">
              <parameter key="compound" value="parse(replaceAll([scores],&quot;^.*?'compound': (-?\\d+\\.\\d+).*$&quot;,&quot;$1&quot;))"/>
              <parameter key="negative" value="parse(replaceAll([scores],&quot;^.*?'neg': (-?\\d+\\.\\d+).*$&quot;,&quot;$1&quot;))"/>
              <parameter key="possitive" value="parse(replaceAll([scores],&quot;^.*?'pos': (-?\\d+\\.\\d+).*$&quot;,&quot;$1&quot;))"/>
              <parameter key="neutral" value="parse(replaceAll([scores],&quot;^.*?'neu': (-?\\d+\\.\\d+).*$&quot;,&quot;$1&quot;))"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="380" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="scores"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role (2)" width="90" x="514" y="34">
            <parameter key="attribute_name" value="Review.ID"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (2)" width="90" x="514" y="187">
            <list key="key_attributes"/>
          </operator>
          <connect from_port="in 1" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_port="in 2" to_op="Join (2)" to_port="right"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Join (2)" to_port="left"/>
          <connect from_op="Join (2)" from_port="join" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="source_in 3" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">post processing</description>
      </operator>
      <connect from_op="Generate Data by User Specification" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
      <connect from_op="Filter Example Range" from_port="original" to_op="Subprocess" to_port="in 1"/>
      <connect from_op="Subprocess" from_port="out 1" to_op="Execute Python" to_port="input 1"/>
      <connect from_op="Subprocess" from_port="out 2" to_op="Subprocess (2)" to_port="in 2"/>
      <connect from_op="Execute Python" from_port="output 1" to_op="Subprocess (2)" to_port="in 1"/>
      <connect from_op="Subprocess (2)" from_port="out 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <description align="center" color="yellow" colored="false" height="108" resized="true" width="736" x="102" y="497">http://www.nltk.org/_modules/nltk/sentiment/vader.html&lt;br&gt;&lt;br&gt;http://t-redactyl.io/blog/2017/04/using-vader-to-handle-sentiment-analysis-with-social-media-text.html</description>
    </process>
  </operator>
</process>
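
The Generate Attributes step in the process above pulls each number out of the stringified score dictionary with a regular expression. A rough standalone Python equivalent, handy for checking the pattern outside RapidMiner, could look like this. The sample string and the helper name extract_score are only illustrative.

```python
import re

def extract_score(scores_text, key):
    """Return the float stored under `key` in a stringified VADER score dict."""
    match = re.search(r"'" + re.escape(key) + r"': (-?\d+\.\d+)", scores_text)
    if match is None:
        raise ValueError("no score found for key: " + key)
    return float(match.group(1))

# Illustrative string in the shape produced by str() on a polarity_scores() result.
sample = "{'neg': 0.0, 'neu': 0.328, 'pos': 0.672, 'compound': 0.6249}"
for key in ("compound", "neg", "pos", "neu"):
    print(key, extract_score(sample, key))
```

The dot is escaped in this pattern so that it only matches a literal decimal point.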

 

 


Re: how sentiment analysis by python or R

Hi @student_compute,

 

In addition to the solution of @kayman, I propose a Python script using the "textblob" library.

From your text attribute, this script computes a polarity score where:

-1 (negative) <= polarity <= +1 (positive).

 

To execute this script, you have to set the name of your text attribute (with quotes) in the Set Macros operator : 

Spelling_Correction_3.png
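
The quotes matter because, as far as I know, RapidMiner replaces %{textAttribute} textually in the script before Python runs it. The snippet below only imitates that substitution with a plain string replace (expand_macro is a made-up helper, not a RapidMiner function) to show what the interpreter ends up seeing:

```python
def expand_macro(script, name, value):
    """Imitate macro expansion: replace %{name} with the raw macro value."""
    return script.replace("%{" + name + "}", value)

line = "textAtt = %{textAttribute}"

# With quotes in the macro value the result is a valid string assignment.
print(expand_macro(line, "textAttribute", "'Text'"))  # textAtt = 'Text'

# Without quotes Python would look for a variable called Text and fail.
print(expand_macro(line, "textAttribute", "Text"))    # textAtt = Text
```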

 

The process : 

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA4">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.000-BETA4" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="9.0.000-BETA" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="85">
        <parameter key="connection" value="dkk"/>
        <parameter key="query" value="iphone"/>
        <parameter key="limit" value="20"/>
        <parameter key="language" value="en"/>
      </operator>
      <operator activated="true" class="set_macros" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Set Macros" width="90" x="246" y="85">
        <list key="macros">
          <parameter key="textAttribute" value="'Text'"/>
        </list>
      </operator>
      <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="380" y="85">
        <parameter key="script" value="import pandas&#10;from textblob import TextBlob&#10;&#10;textAtt = %{textAttribute}&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def sent(text) : &#10;&#10;  testimonial = TextBlob(str(text))&#10;  sentiment = testimonial.sentiment.polarity &#10;  return sentiment&#10;&#10;&#10;def rm_main(data):&#10; &#10;  data['polarity'] =data[textAtt].apply(sent)&#10;  return data  "/>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Set Macros" to_port="through 1"/>
      <connect from_op="Set Macros" from_port="through 1" to_op="Execute Python" to_port="input 1"/>
      <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Regards,

 

Lionel

 

Contributor II student_compute

Re: how sentiment analysis by python or R

Hi, thank you very much
Based on the scores, how can I add a new column that shows the word "positive" or "negative" next to each sentence?
Thanks a lot

Re: how sentiment analysis by python or R

Hi @student_compute,

 

I think you can use the Generate Attributes and Set Data operators and, if needed, the Reorder Attributes operator.

 

Regards,

 

Lionel

Contributor II student_compute

Re: how sentiment analysis by python or R

Hello
Thank you so much
I used them,
but I do not know how to show the polarity of each sentence based on the scores.
See the attached image:
۱.JPG
Can this be done?
Thanks

Re: how sentiment analysis by python or R

Hi @student_compute,

 

You can, indeed, create a new attribute "Pol" defined by (for example) : 

 

 - if -1 <= polarity < -0.1, then Pol = "negative"

 - if -0.1 <= polarity <= 0.1, then Pol = "neutral"

 - if 0.1 < polarity <= 1, then Pol = "positive"

 

Note: You can, of course, choose thresholds other than -0.1 / 0.1.
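
Written as plain Python, the rule above might look like the sketch below. The function name label_polarity and the threshold parameter are illustrative and not part of the attached process; 0.1 is simply the example cut-off used in this post.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to 'negative', 'neutral' or 'positive'."""
    if polarity < -threshold:
        return "negative"
    if polarity > threshold:
        return "positive"
    return "neutral"

# Scores exactly on a threshold fall into the neutral band.
for score in (-0.5, -0.1, 0.0, 0.1, 0.8):
    print(score, label_polarity(score))
```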

 

Here is the associated process:

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA4">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.000-BETA4" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="9.0.000-BETA" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="85">
        <parameter key="connection" value="dkk"/>
        <parameter key="query" value="iphone"/>
        <parameter key="limit" value="20"/>
        <parameter key="language" value="en"/>
      </operator>
      <operator activated="true" class="set_macros" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Set Macros" width="90" x="246" y="85">
        <list key="macros">
          <parameter key="textAttribute" value="'Text'"/>
        </list>
      </operator>
      <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="380" y="85">
        <parameter key="script" value="import pandas&#10;from textblob import TextBlob&#10;&#10;textAtt = %{textAttribute}&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def sent(text) : &#10;&#10;  testimonial = TextBlob(str(text))&#10;  sentiment = testimonial.sentiment.polarity &#10;  return sentiment&#10;&#10;&#10;def rm_main(data):&#10; &#10;  data['polarity'] =data[textAtt].apply(sent)&#10;  return data  "/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="9.0.000-BETA4" expanded="true" height="82" name="Generate Attributes" width="90" x="514" y="85">
        <list key="function_descriptions">
          <parameter key="Pol" value="if(polarity&lt;-0.1,&quot;negative&quot;,if(polarity&gt;0.1,&quot;positive&quot;,&quot;neutral&quot;))"/>
        </list>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Set Macros" to_port="through 1"/>
      <connect from_op="Set Macros" from_port="through 1" to_op="Execute Python" to_port="input 1"/>
      <connect from_op="Execute Python" from_port="output 1" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Regards,

 

Lionel

Contributor II student_compute

Re: how sentiment analysis by python or R

Thank you so much
How can I download RapidMiner version 9?
Thanks

Re: how sentiment analysis by python or R

Hi @student_compute,

 

The link to download RapidMiner 9.0 Beta : 

 

http://static.rapidminer.com/rnd/html/rapidminer-9.0-preview.html

 

Regards,

 

Lionel

Contributor II student_compute

Re: how sentiment analysis by python or R

Hello
Thank you so much
Is there a perplexity parameter for LDA in the new version? Or any other new features?