Sentiment Analysis of Twitter Data

marlene_boettch Member Posts: 2 Contributor I
edited December 2018 in Help

Hello there,

I would like to conduct a sentiment analysis of Twitter data. I have already searched the forums for a solution to my problem, but I found all the entries I came across very hard to understand. (I really am a desperate beginner with hardly any talent for anything to do with technology.)

My problem is probably quite simple: I have 10000 tweets stored in a CSV file. Now, as already mentioned, I would like to carry out a sentiment analysis. My teacher gave me a process for that. However, in this process the tweets must be read in as separate text files.

Now my question: How do I extract the tweets from the CSV file so that each tweet (only the text, not the other information such as username, ID and so on) is stored in a separate .txt file?

As I said, my understanding of RapidMiner is unfortunately very limited, so I would be very grateful if someone could explain it to me as simply as possible. :)

Thank you very much and have a nice day
Marlene

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 716   Unicorn

    Hi @marlene_boettch,

     

    It is difficult to help you without your .csv file and your process...

    Can you share them so that we can better understand your problem?

     

    Regards,

     

    Lionel

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,029  RM Data Scientist

    Hi,

    Data to Documents, Loop Collection and Write Document are one way to do this. But I think you can just use Read CSV + Data to Documents, and you get the data in the format you need for your analysis.
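For readers who want to see what the first route (one file per tweet) does, here is a minimal plain-Python sketch of the same idea. It is not a RapidMiner process and not part of the original answer: it keeps only the `text` column and writes each tweet to its own numbered file, starting at 1. It assumes the CSV has a header row with a column named `text`, uses commas as separators and is UTF-8 encoded; the function name `split_tweets`, the `.txt` extension and any paths are hypothetical.

```python
import csv
from pathlib import Path


def split_tweets(csv_path, out_dir, column="text"):
    """Write the `column` value of every CSV row to its own numbered .txt file.

    Hypothetical helper: assumes a header row that contains `column`
    (here "text"); all other columns (username, ID, ...) are ignored.
    Returns the list of written file paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    written = []
    # newline="" lets the csv module keep commas and line breaks inside quoted tweets
    with open(csv_path, newline="", encoding="utf-8") as f:
        # start=1 numbers the files 1.txt, 2.txt, ... (one per tweet)
        for i, row in enumerate(csv.DictReader(f), start=1):
            target = out / f"{i}.txt"
            target.write_text(row[column], encoding="utf-8")
            written.append(target)
    return written
```

For example, `split_tweets("tweets.csv", "tweets_txt")` (hypothetical paths) would create `tweets_txt/1.txt`, `tweets_txt/2.txt`, and so on. Reading the file with the `csv` module instead of splitting lines by hand keeps tweets that contain commas or line breaks intact.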

     

    BR,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,125   Unicorn

    There are also a number of recent threads with almost exactly the same type of problem. See these ongoing discussions, for example; both contain several example processes:

    https://community.rapidminer.com/t5/Getting-Started-Forum/Errors-Twitter-data-Suddenly-Attribute-Label-Missing-Inside/m-p/52708#M3254

    https://community.rapidminer.com/t5/Getting-Started-Forum/Non-nominal-label-the-lavel-attribute-must-be-nominal/m-p/35581#M263

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • marlene_boettch Member Posts: 2 Contributor I

    Hello to everyone who has answered so far!

    Based on what has been written here so far, I have managed to get my tweets from the CSV file into an IO Object Collection, which is now stored as an .md file. Unfortunately, I cannot use the tweets in this form, because I need them as individual text files so that I can read them into the process and work with them further. This is the process:

    [Screenshot (Process.PNG): The Sentiment Analysis Process]
    [Screenshot (CSV file.PNG): I want to get the 'text' column into separate txt files]

    I'm sorry if I'm being dumb, but my poor understanding of the subject matter and my mediocre English skills just make me a little desperate.

    Thanks for your help so far! :)

    Kind regards
    Marlene

     

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,029  RM Data Scientist

    Hi @marlene_boettch,

     

    I think you only need one Process Documents operator per CSV file. Chaining two of them is a bit odd.

    Attached are two processes. The first one separates the texts into individual files; I do not think you need this. The second one shows what I think you need to do.

     

    BR,

    Martin

     

    Separating files:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.1.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="85">
    <list key="annotations"/>
    <list key="data_set_meta_data_information"/>
    <description align="center" color="transparent" colored="false" width="126">Read your CSV Here</description>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="text"/>
    <description align="center" color="transparent" colored="false" width="126">Select only text</description>
    </operator>
    <operator activated="true" class="text:data_to_documents" compatibility="8.1.000" expanded="true" height="68" name="Data to Documents" width="90" x="313" y="85">
    <list key="specify_weights"/>
    </operator>
    <operator activated="true" class="loop_collection" compatibility="8.1.001" expanded="true" height="68" name="Loop Collection" width="90" x="447" y="85">
    <process expanded="true">
    <operator activated="true" class="text:write_document" compatibility="8.1.000" expanded="true" height="82" name="Write Document" width="90" x="112" y="34">
    <parameter key="file" value="/my/path/%{a}"/>
    <description align="center" color="transparent" colored="false" width="126">Make sure to have a valid path here! &lt;br/&gt;&lt;br/&gt;%{a} will be replaced with the current iteration number.&lt;br/&gt;&lt;br/&gt;E.g.: 1 in the first round, 2 in the second etc.</description>
    </operator>
    <connect from_port="single" to_op="Write Document" to_port="document"/>
    <portSpacing port="source_single" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Data to Documents" to_port="example set"/>
    <connect from_op="Data to Documents" from_port="documents" to_op="Loop Collection" to_port="collection"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    </process>
    </operator>
    </process>

    How to tokenize the CSV directly:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.1.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="85">
    <list key="annotations"/>
    <list key="data_set_meta_data_information"/>
    <description align="center" color="transparent" colored="false" width="126">Read your CSV Here</description>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.1.001" expanded="true" height="82" name="Nominal to Text" width="90" x="179" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="text"/>
    <description align="center" color="transparent" colored="false" width="126">Make sure only text is of type Text</description>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="85">
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">do processing in here</description>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="42"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
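As a rough plain-Python analogue of this second process (again not RapidMiner itself, and not from the original answer), the sketch below reads the `text` column and tokenizes each tweet directly, so no intermediate text files are needed. Assumptions: the column is called `text`; tokens are runs of letters, which is assumed to approximate the Tokenize operator's splitting on non-letter characters; and each tweet becomes a simple term-count vector. RapidMiner's Process Documents operators can compute other weightings such as TF-IDF, which this sketch does not reproduce. The names `tokenize` and `term_counts` are hypothetical.

```python
import csv
import re
from collections import Counter

# Runs of letters (any alphabet); digits, punctuation, spaces and "_" act as separators.
LETTER_RUN = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Split one tweet into tokens made of letters only (case is kept)."""
    return LETTER_RUN.findall(text)


def term_counts(csv_path, column="text"):
    """Return one Counter of token occurrences per CSV row (one document per tweet).

    Hypothetical helper: only `column` is used; no .txt files are written.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [Counter(tokenize(row[column])) for row in csv.DictReader(f)]
```

For example, a tweet such as "Great day!! great" yields the counts {"Great": 1, "day": 1, "great": 1}. Case is kept on purpose, because lower-casing is a separate step (Transform Cases in RapidMiner).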

     

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,279  Community Manager

    Hello @marlene_boettch, have you tried looking at some of our community processes from within RapidMiner Studio? There are now two that do things very similar to what you are looking for:

     

    [Screenshot (2018-08-20): community processes in RapidMiner Studio]

     

    Scott
