Sentiment Analysis of Twitter Data

marlene_boettch · August 2018

Hello there,

I would like to conduct a sentiment analysis of Twitter data. I already looked in the forums for a solution to my problem, but all the entries I found were incomprehensible to me. (I really am a desperate beginner with hardly any talent for anything to do with technology.)

My problem is probably quite simple: I have 10000 tweets stored in a CSV file. Now, as already mentioned, I would like to carry out a sentiment analysis. My teacher gave me a process for that. However, in this process the tweets must be read in as separate text files.

Now my question: How do I filter the tweets from the CSV file so that each tweet (and only the text, not the other information like username, ID and so on) is stored in a separate txt file?

As I said, my understanding of RapidMiner is unfortunately very limited, so I would be very grateful if someone could explain it to me as simply as possible.

Thank you very much and have a nice day
Marlene

lionelderkrikor · August 2018

Hi @marlene_boettch,

It is difficult to help without your .csv file and your process...

Can you share them so that we can better understand your problem?

Regards,

Lionel

MartinLiebig · August 2018

Hi,

Data to Documents, Loop Collection, and Write Document is one way to do this. But I think you can just use Read CSV + Data to Documents, and then you have the tweets in the format you need for your analysis.

BR,

Martin

Telcontar120 · August 2018

There are also a number of recent threads that have almost exactly the same type of problem. See the ongoing discussions here, for example, both of which have several example processes:

https://community.rapidminer.com/t5/Getting-Started-Forum/Errors-Twitter-data-Suddenly-Attribute-Label-Missing-Inside/m-p/52708#M3254

https://community.rapidminer.com/t5/Getting-Started-Forum/Non-nominal-label-the-lavel-attribute-must-be-nominal/m-p/35581#M263

marlene_boettch · August 2018

Hello to everyone who has answered so far!

From what has been written here so far, I have managed to bring my tweets from the CSV document into an IO Object Collection. This is now in the form of an .md file. Unfortunately I cannot use the tweets in this way because I need them as individual text files to be able to read them into the process and use them further. This is the process:

[Screenshot: The Sentiment Analysis Process]

[Screenshot: CSV file.PNG] I want to get the 'text' column into separate txt files.

I'm sorry if I'm being dumb, but my poor understanding of the subject matter and my mediocre English skills just make me a little desperate.

Thanks for your help so far!

Kind regards
Marlene

MartinLiebig · August 2018

Hi @marlene_boettch,

I think you only need one Process Documents operator per CSV file you have. Chaining two is a bit odd.

Attached are two processes. The first one separates the texts into individual files; I do not think you need this. The second one shows what I think you need to do.

BR,

Martin

Separating files:

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="8.1.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="85">
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
        <description align="center" color="transparent" colored="false" width="126">Read your CSV Here</description>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="text"/>
        <description align="center" color="transparent" colored="false" width="126">Select only text</description>
      </operator>
      <operator activated="true" class="text:data_to_documents" compatibility="8.1.000" expanded="true" height="68" name="Data to Documents" width="90" x="313" y="85">
        <list key="specify_weights"/>
      </operator>
      <operator activated="true" class="loop_collection" compatibility="8.1.001" expanded="true" height="68" name="Loop Collection" width="90" x="447" y="85">
        <process expanded="true">
          <operator activated="true" class="text:write_document" compatibility="8.1.000" expanded="true" height="82" name="Write Document" width="90" x="112" y="34">
            <parameter key="file" value="/my/path/%{a}"/>
            <description align="center" color="transparent" colored="false" width="126">Make sure to have a valid path here! &lt;br/&gt;&lt;br/&gt;%{a} will be replaced with the current iteration number.&lt;br/&gt;&lt;br/&gt;E.g.: 1 in the first round, 2 in the second etc.</description>
          </operator>
          <connect from_port="single" to_op="Write Document" to_port="document"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Data to Documents" to_port="example set"/>
      <connect from_op="Data to Documents" from_port="documents" to_op="Loop Collection" to_port="collection"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
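
For readers who want to check the result outside RapidMiner, the same splitting step can be sketched in plain Python. This is not part of the process above, just a minimal illustration under a few assumptions: the CSV has a header row with a column named `text` (as in the Select Attributes step), and the function name `write_tweets_to_files` and the sample rows are made up for the example.

```python
import csv
import io
import tempfile
from pathlib import Path


def write_tweets_to_files(csv_file, out_dir, text_column="text"):
    """Write the text_column value of every CSV row to its own .txt file.

    Files are named 1.txt, 2.txt, ... in row order, mirroring the
    iteration counter used in the RapidMiner loop.
    Returns the number of files written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    count = 0
    for count, row in enumerate(csv.DictReader(csv_file), start=1):
        # Only the tweet text is written; username, ID etc. are ignored.
        (out_path / f"{count}.txt").write_text(row[text_column], encoding="utf-8")
    return count


# Tiny in-memory CSV standing in for the real file (made-up sample rows).
sample_csv = io.StringIO(
    "id,username,text\n"
    '1,alice,"I love this, really!"\n'
    "2,bob,Not my day\n"
)
demo_dir = tempfile.mkdtemp()
written = write_tweets_to_files(sample_csv, demo_dir)
print(written, sorted(p.name for p in Path(demo_dir).iterdir()))
```

For a real file, the in-memory sample would be replaced by something like `open("tweets.csv", newline="", encoding="utf-8")`, where `tweets.csv` is a placeholder name.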

How to tokenize the CSV directly:

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="8.1.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="85">
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
        <description align="center" color="transparent" colored="false" width="126">Read your CSV Here</description>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="8.1.001" expanded="true" height="82" name="Nominal to Text" width="90" x="179" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="text"/>
        <description align="center" color="transparent" colored="false" width="126">Make sure only text is of type Text</description>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="85">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">do processing in here</description>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="42"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
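
The idea of tokenizing the `text` column directly, without writing any intermediate files, can also be sketched in Python. This is only a rough analogue and not what RapidMiner does internally: it assumes tokens are split at non-letter characters, and it produces plain token counts per tweet rather than whatever weighting Process Documents from Data is configured to create. The names `tokenize` and `token_counts` and the sample rows are hypothetical.

```python
import csv
import io
import re
from collections import Counter


def tokenize(text):
    """Return the runs of letters in text, dropping everything else."""
    # [^\W\d_]+ matches one or more Unicode letters (word characters
    # that are neither digits nor underscores).
    return re.findall(r"[^\W\d_]+", text)


def token_counts(csv_file, text_column="text"):
    """Return one Counter of tokens per CSV row, in row order."""
    return [Counter(tokenize(row[text_column])) for row in csv.DictReader(csv_file)]


# Tiny in-memory CSV standing in for the real file (made-up sample rows).
sample_csv = io.StringIO(
    "id,text\n"
    "1,Great day great mood\n"
    "2,Not my day\n"
)
counts = token_counts(sample_csv)
print(counts[0])
```

Note that the sketch is case-sensitive, so "Great" and "great" are counted separately; lowercasing would be an extra step.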

sgenzer · August 2018

Hello @marlene_boettch - have you tried looking at some of our community processes from within RapidMiner Studio? There are now two that do things very similar to what you are looking for:

[Screenshot: Screen Shot 2018-08-20 at 2.53.10 PM.png]

Scott
