Comparing movie performance

faizharry4 Member Posts: 5 Contributor I
edited November 2018 in Help

Hi, I'm doing a project in RapidMiner using Search Twitter and sentiment analysis. I'm trying to find a way to show that Marvel movies are better than DC movies, and I'm also trying to extract new attributes from the data I've collected. For example: what common words are used to describe the Avengers? Which words are associated with positive, negative, and neutral tweets? So far I have no idea how to do that. I've already collected the data using Search Twitter and sentiment analysis, but the later part is a puzzler. Can you please help me?

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @faizharry4 that's an interesting problem. It'll be hard to compare sentiment for Spiderman tweets vs Superman tweets. Have you thought about extracting the sentiment scores for DC vs Marvel movies and doing a weighted rolling average? Like 1,000 pos / 20,000 tweets for DC vs 500 pos / 6,000 tweets for Marvel, doing it per day and trending it? This way you might be able to see a rate of change before and after a movie is released.
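
As a rough illustration of the per-day rolling idea outside RapidMiner, here is a minimal Python sketch. The function name `rolling_positive_rate`, the window size, and the daily counts in the usage lines are all assumptions made up for the example; only the two totals (1,000 / 20,000 and 500 / 6,000) come from the post. Each window is weighted by tweet volume, so a day with few tweets cannot swing the trend much.

```python
from collections import deque


def rolling_positive_rate(daily_counts, window=3):
    """Tweet-weighted share of positive tweets over a trailing window.

    daily_counts: list of (positive_tweets, total_tweets), oldest day first.
    Returns one rate per day, computed over the last `window` days.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` days
    rates = []
    for positive, total in daily_counts:
        recent.append((positive, total))
        pos_sum = sum(p for p, _ in recent)
        tot_sum = sum(t for _, t in recent)
        # Guard against a window with no tweets at all
        rates.append(pos_sum / tot_sum if tot_sum else 0.0)
    return rates


# Overall shares from the totals quoted in the post
dc_share = 1000 / 20000      # 0.05
marvel_share = 500 / 6000    # about 0.083

# Hypothetical daily (positive, total) counts, for illustration only
example_days = [(40, 900), (55, 1000), (120, 1500), (90, 1100)]
trend = rolling_positive_rate(example_days, window=3)
```

Plotting `trend` for each movie around its release date would show the rate of change described above.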

  • faizharry4 Member Posts: 5 Contributor I

    Basically I'm trying to compare Infinity War vs Justice League. What I have done so far is retrieve data from Twitter using Search Twitter, analyze the sentiment with Aylien, then use Data to Documents, Categorize (Document), and Documents to Data, and finally Write Excel to store the retrieved data. So now I have 200 tweets for each movie, and I'm stuck on the next move, which is how to compare the two movies.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @faizharry4 200 tweets for each movie sounds awfully low. Maybe start by generating a Wordlist for each movie and see which words are most commonly used to describe each movie?

  • faizharry4 Member Posts: 5 Contributor I

    @Thomas_Ott the 200 tweets are only for the start, before it gets expanded. I will add more tweets once I have figured out the solution. Anyway, as you suggested, how can we generate a Wordlist for each movie in RapidMiner and see the most common words used to describe each movie?

    And can we import data directly from Metacritic, IMDb, and Rotten Tomatoes so that I can compare the performance of the two movies, and then import other data from any website that has the gross of both films?

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @faizharry4 use the Process Documents from Data operator, and embed a Tokenize operator and other text processing operators inside it. Then output the WOR port.
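
To make the idea of a per-movie word list concrete, here is a small Python sketch of the same kind of counting. It is not what the operator does internally: the letters-only tokenizer, the `word_list` name, the stopword set, and the sample tweets are all assumptions for illustration.

```python
import re
from collections import Counter


def word_list(texts, stopwords=frozenset()):
    """Count how often each lowercase word occurs across a list of texts."""
    counts = Counter()
    for text in texts:
        # Assumed tokenizer: keep runs of letters only, lowercased
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts


# Hypothetical tweets, one list per movie
tweets_by_movie = {
    "Infinity War": ["Amazing ending, amazing cast", "The ending was brutal"],
    "Justice League": ["Fun but messy", "Messy plot, fun cast"],
}

stop = frozenset({"the", "was", "but"})
for movie, tweets in tweets_by_movie.items():
    # The three most frequent words for each movie
    print(movie, word_list(tweets, stop).most_common(3))
```

Running the count separately per movie, as here, keeps the two word lists comparable side by side.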

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @faizharry4 also, you can get IMDb and Rotten Tomatoes info by using the Web Mining extension; you just have to create the process.

  • faizharry4 Member Posts: 5 Contributor I

    @Thomas_Ott thanks. I have tried to create a process for the word count, but I came up blank. I tried to do a word association, to see which words are associated with positive, negative, and neutral polarity, but the result is empty.

    [Attachments: 1.png, 2.png, 3.png]

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @faizharry4 you need the Process Documents from Data operator, not Process Documents from Files.

    Also, you will probably need to use a Nominal to Text conversion operator.

  • faizharry4 Member Posts: 5 Contributor I

    @Thomas_Ott I've tried another method, but it seems my luck is not there. It still won't give the result that I want. The sentiment analysis categorizes the polarity per tweet. Is it possible to find out which words are associated with neutral, positive, and negative?

    [Attachments: 4.png, 5.png, 6.png]

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @faizharry4 if you're passing the sentiment into the Process Documents operator, try setting it as a label role. Or, if you are using the Extract Sentiment operator and set Vector Creation to Binary Term Occurrences, you can output the EXA port and see the sentiment for each tweet ID and which words it's attached to.

    Like so:

     

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="8.1.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
    <parameter key="connection" value="Twitter"/>
    <parameter key="query" value="DonaldTrump"/>
    <parameter key="language" value="en"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Text"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="8.2.000" expanded="true" height="82" name="Nominal to Text" width="90" x="380" y="34"/>
    <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="313" y="187">
    <parameter key="vector_creation" value="Binary Term Occurrences"/>
    <parameter key="prune_method" value="percentual"/>
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="34"/>
    <operator activated="true" class="wordnet:open_wordnet_dictionary" compatibility="5.3.000" expanded="true" height="68" name="Open WordNet Dictionary" width="90" x="112" y="136">
    <parameter key="directory" value="C:\Users\TomOtt\OneDrive\wordnet\WordNet-3.0\dict"/>
    </operator>
    <operator activated="true" class="wordnet:stem_wordnet" compatibility="5.3.000" expanded="true" height="82" name="Stem (WordNet)" width="90" x="313" y="34"/>
    <operator activated="true" class="wordnet:find_sentiment_wordnet" compatibility="5.3.000" expanded="true" height="82" name="Extract Sentiment (English)" width="90" x="447" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_op="Stem (WordNet)" to_port="document"/>
    <connect from_op="Open WordNet Dictionary" from_port="dictionary" to_op="Stem (WordNet)" to_port="dictionary"/>
    <connect from_op="Stem (WordNet)" from_port="document" to_op="Extract Sentiment (English)" to_port="document"/>
    <connect from_op="Stem (WordNet)" from_port="dictionary" to_op="Extract Sentiment (English)" to_port="dictionary"/>
    <connect from_op="Extract Sentiment (English)" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="text:wordlist_to_data" compatibility="8.1.000" expanded="true" height="82" name="WordList to Data" width="90" x="447" y="238"/>
    <operator activated="false" class="pivot" compatibility="8.2.000" expanded="true" height="82" name="Pivot" width="90" x="648" y="34">
    <parameter key="group_attribute" value="sentiment"/>
    <parameter key="index_attribute" value="Id"/>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_op="WordList to Data" to_port="word list"/>
    <connect from_op="WordList to Data" from_port="example set" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
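
For readers who want to check the word-to-polarity association by hand, here is a hedged Python sketch of the same idea as binary term occurrences combined with a sentiment label. It counts, for each polarity, how many tweets contain each word at least once. The function name `words_by_polarity`, the tokenizer, and the labelled tweets are assumptions for illustration, not output from the process above.

```python
import re
from collections import Counter, defaultdict


def words_by_polarity(labelled_tweets):
    """Map each polarity to a Counter of words, counting each word once per tweet."""
    doc_freq = defaultdict(Counter)
    for text, polarity in labelled_tweets:
        # set() gives binary occurrence: a repeated word counts once per tweet
        for token in set(re.findall(r"[a-z]+", text.lower())):
            doc_freq[polarity][token] += 1
    return doc_freq


# Hypothetical (tweet text, polarity) pairs, e.g. after a sentiment step
sample = [
    ("good good film", "positive"),
    ("bad film", "negative"),
    ("good fun", "positive"),
]

association = words_by_polarity(sample)
for polarity, counts in association.items():
    print(polarity, counts.most_common(3))
```

Words that appear often under one polarity and rarely under the others are the candidates for "words that describe" that sentiment.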