
Sentiment Measure through Aggregate Operator

Titzaaa, Member, Posts: 12, Learner I
edited June 2019 in Help
Dear Community,

I am currently conducting a sentiment analysis of news media coverage of a specific company.
I have already assigned each article (one row) a sentiment score: +1 for each positive word it contains and -1 for each negative word it contains. For example, one article has a positivity score of 5 and a negativity score of -3.
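The scoring rule above can be sketched in Python as follows. This is a minimal illustration, not the RapidMiner operator: `POSITIVE_WORDS` and `NEGATIVE_WORDS` are hypothetical placeholder word sets standing in for the sentiment dictionary, tokenization is a simple letters-only split, and negations (which the process handles via the negation list) are ignored.

```python
import re
from typing import Tuple

# Hypothetical placeholder dictionaries; the real process uses a full
# sentiment lexicon plus a negation list, neither of which is modeled here.
POSITIVE_WORDS = {"gewinn", "wachstum", "stark"}
NEGATIVE_WORDS = {"verlust", "risiko", "schwach"}

def score_article(text: str) -> Tuple[int, int]:
    """Return (positivity, negativity): +1 per positive word, -1 per negative word."""
    # Lower-case and keep runs of letters only (umlauts included).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    positivity = sum(1 for t in tokens if t in POSITIVE_WORDS)
    negativity = -sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return positivity, negativity
```

For instance, `score_article("Gewinn und Wachstum trotz Risiko")` yields a positivity score of 2 and a negativity score of -1.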

Now I want to construct a sentiment measure.
The dominant sentiment of an article determines whether I classify the entire text as positive or negative. In the example above, the article is therefore positive and counts as +1 overall; another article with more negative than positive words is classified as negative and counts as -1.
One possible sentiment measure is then #negative articles / #articles (or #positive articles / #articles) within a given time frame, e.g. one week. The date of each article is included in the news media source data.
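Written out, the measure might look like this. Here $p_i \ge 0$ and $n_i \le 0$ are the positivity and negativity scores of article $i$, and $A_t$ is the set of articles published in week $t$. Treating ties ($p_i + n_i = 0$) as neutral ($s_i = 0$) is an assumption, since the rule above does not say how they are counted. With $p_i = 5$ and $n_i = -3$, $s_i = +1$, matching the example.

```latex
\[ s_i = \operatorname{sgn}(p_i + n_i) \in \{-1,\, 0,\, +1\} \]
\[ \mathrm{NEG}_t = \frac{\lvert \{\, i \in A_t : s_i = -1 \,\} \rvert}{\lvert A_t \rvert},
   \qquad
   \mathrm{POS}_t = \frac{\lvert \{\, i \in A_t : s_i = +1 \,\} \rvert}{\lvert A_t \rvert} \]
```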

This is what my process looks like until now:

<?xml version="1.0" encoding="UTF-8"?><process version="9.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.3.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.3.000" expanded="true" height="68" name="Retrieve" width="90" x="45" y="34">
        <parameter key="repository_entry" value="../Data/Finanzen.net"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="9.3.000" expanded="true" height="82" name="Nominal to Text" width="90" x="179" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Titel"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="text:data_to_documents" compatibility="8.2.000" expanded="true" height="68" name="Data to Documents" width="90" x="313" y="34">
        <parameter key="select_attributes_and_weights" value="false"/>
        <list key="specify_weights"/>
      </operator>
      <operator activated="true" class="loop_collection" compatibility="9.3.000" expanded="true" height="82" name="Loop Collection" width="90" x="447" y="34">
        <parameter key="set_iteration_macro" value="false"/>
        <parameter key="macro_name" value="iteration"/>
        <parameter key="macro_start_value" value="1"/>
        <parameter key="unfold" value="false"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="8.2.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="45" y="34">
            <parameter key="mode" value="non letters"/>
            <parameter key="characters" value=".:"/>
            <parameter key="language" value="English"/>
            <parameter key="max_token_length" value="3"/>
          </operator>
          <operator activated="true" class="text:transform_cases" compatibility="8.2.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="179" y="34">
            <parameter key="transform_to" value="lower case"/>
          </operator>
          <operator activated="true" class="text:filter_stopwords_german" compatibility="8.2.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="313" y="34">
            <parameter key="stop_word_list" value="Standard"/>
          </operator>
          <operator activated="true" class="text:filter_by_length" compatibility="8.2.000" expanded="true" height="68" name="Filter Tokens (2)" width="90" x="514" y="34">
            <parameter key="min_chars" value="3"/>
            <parameter key="max_chars" value="10000"/>
          </operator>
          <connect from_port="single" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
          <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
          <connect from_op="Filter Stopwords (2)" from_port="document" to_op="Filter Tokens (2)" to_port="document"/>
          <connect from_op="Filter Tokens (2)" from_port="document" to_port="output 1"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="retrieve" compatibility="9.3.000" expanded="true" height="68" name="Retrieve (2)" width="90" x="45" y="289">
        <parameter key="repository_entry" value="../Data/GRESD"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="9.3.000" expanded="true" height="68" name="Retrieve (3)" width="90" x="45" y="391">
        <parameter key="repository_entry" value="../Data/Negationsliste"/>
      </operator>
      <operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.0.001" expanded="true" height="82" name="Dictionary-Based Sentiment (Documents)" width="90" x="246" y="289">
        <parameter key="value_attribute" value="Klassifizierung"/>
        <parameter key="key_attribute" value="Wort"/>
        <parameter key="negation_attribute" value="Negationen"/>
        <parameter key="negation_window_size" value="5"/>
        <parameter key="use_symmetric_negation_window" value="true"/>
      </operator>
      <operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.0.001" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="715" y="289">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="write_excel" compatibility="9.3.000" expanded="true" height="103" name="Write Excel" width="90" x="876" y="136">
        <parameter key="excel_file" value="D:\Franziska C. Weis\Masterarbeit\03 Datenanalyse\Rapid_Miner_Analysis.xlsx"/>
        <parameter key="file_format" value="xlsx"/>
        <enumeration key="sheet_names"/>
        <parameter key="sheet_name" value="RapidMiner Data"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="number_format" value="#.0"/>
        <parameter key="encoding" value="SYSTEM"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Data to Documents" to_port="example set"/>
      <connect from_op="Data to Documents" from_port="documents" to_op="Loop Collection" to_port="collection"/>
      <connect from_op="Loop Collection" from_port="output 1" to_op="Apply Model (Documents)" to_port="doc"/>
      <connect from_op="Retrieve (2)" from_port="output" to_op="Dictionary-Based Sentiment (Documents)" to_port="exa"/>
      <connect from_op="Retrieve (3)" from_port="output" to_op="Dictionary-Based Sentiment (Documents)" to_port="neg"/>
      <connect from_op="Dictionary-Based Sentiment (Documents)" from_port="mod" to_op="Apply Model (Documents)" to_port="mod"/>
      <connect from_op="Apply Model (Documents)" from_port="exa" to_op="Write Excel" to_port="input"/>
      <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
I have attached the original source data, which includes the date column.

Which operator can I use to create such a sentiment measure, so that I can later run a vector autoregression on this measure and the share price?
Thanks in advance for your answers!

Franziska

Answers

    Telcontar120, Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn
    You will probably want to use the Aggregate operator for this, although you may first need a Date to Numerical conversion to group the articles into the appropriate time unit (e.g., week of the year). Depending on how you want to summarize each period (averaging, taking the best/worst, etc.), you may also find the Windowing operators in the time series operators folder useful.
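The "convert the date to a week, then aggregate" idea can be sketched outside RapidMiner in plain Python. This is a minimal sketch under stated assumptions: each article is available as a hypothetical `(date, positivity, negativity)` tuple, `weekly_sentiment` is a hypothetical helper name, ties count as neutral, and ISO week numbers stand in for the week of the year, which may not match exactly how Date to Numerical counts weeks.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Week = Tuple[int, int]  # (ISO year, ISO week number)

def weekly_sentiment(
    articles: Iterable[Tuple[date, int, int]],
) -> Dict[Week, Dict[str, float]]:
    """Share of positive and negative articles per ISO week (hypothetical helper)."""
    counts: Dict[Week, Dict[str, int]] = defaultdict(
        lambda: {"n": 0, "pos": 0, "neg": 0}
    )
    for published, positivity, negativity in articles:
        # Date-to-week step: map the publication date to (ISO year, ISO week).
        iso = published.isocalendar()
        week = (iso[0], iso[1])
        # Dominant sentiment: sign of the net score; ties are treated as neutral.
        net = positivity + negativity
        c = counts[week]
        c["n"] += 1
        if net > 0:
            c["pos"] += 1
        elif net < 0:
            c["neg"] += 1
    # Aggregate step: divide by the number of articles in each week.
    return {
        week: {"pos_share": c["pos"] / c["n"], "neg_share": c["neg"] / c["n"]}
        for week, c in sorted(counts.items())
    }
```

Each week's `neg_share` corresponds to #negative articles / #articles for that week, giving one value per week that could later be lined up with weekly share prices.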

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts