After Nominal to Text and Loop Collection, the date column is gone

Titzaaa · July 2019

Hi everyone,
I am doing some text mining and would like to know the date of each article after tokenization. However, at the end I only get the text columns to which the sentiment dictionary is applied. Is there a way to keep the date column, or to add it back later in the process?
Here is my code:

<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve" width="90" x="45" y="34">
        <parameter key="repository_entry" value="../Data/Lexis_Nexis_PAT"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve (2)" width="90" x="45" y="289">
        <parameter key="repository_entry" value="../Data/GRESD"/>
      </operator>
      <operator activated="true" class="operator_toolbox:dictionary_sentiment_learner" compatibility="2.0.001" expanded="true" height="82" name="Dictionary-Based Sentiment (Documents)" width="90" x="246" y="289">
        <parameter key="value_attribute" value="Klassifizierung"/>
        <parameter key="key_attribute" value="Wort"/>
        <parameter key="negation_attribute" value="Negationen"/>
        <parameter key="negation_window_size" value="5"/>
        <parameter key="use_symmetric_negation_window" value="true"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.3.001" expanded="true" height="82" name="Set Role" width="90" x="112" y="187">
        <parameter key="attribute_name" value="Datum"/>
        <parameter key="target_role" value="Datum"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="9.3.001" expanded="true" height="82" name="Nominal to Text" width="90" x="179" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value="Body Teil 1"/>
        <parameter key="attributes" value="|Body Teil 1|Body Teil 2"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="text:data_to_documents" compatibility="8.2.000" expanded="true" height="68" name="Data to Documents" width="90" x="313" y="34">
        <parameter key="select_attributes_and_weights" value="false"/>
        <list key="specify_weights"/>
      </operator>
      <operator activated="true" class="loop_collection" compatibility="9.3.001" expanded="true" height="82" name="Loop Collection" width="90" x="447" y="34">
        <parameter key="set_iteration_macro" value="false"/>
        <parameter key="macro_name" value="iteration"/>
        <parameter key="macro_start_value" value="1"/>
        <parameter key="unfold" value="false"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="8.2.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="45" y="34">
            <parameter key="mode" value="non letters"/>
            <parameter key="characters" value=".:"/>
            <parameter key="language" value="English"/>
            <parameter key="max_token_length" value="3"/>
          </operator>
          <operator activated="true" class="text:transform_cases" compatibility="8.2.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="179" y="34">
            <parameter key="transform_to" value="lower case"/>
          </operator>
          <operator activated="true" class="text:filter_stopwords_german" compatibility="8.2.000" expanded="true" height="68" name="Filter Stopwords (2)" width="90" x="313" y="34">
            <parameter key="stop_word_list" value="Standard"/>
          </operator>
          <operator activated="true" class="text:filter_by_length" compatibility="8.2.000" expanded="true" height="68" name="Filter Tokens (2)" width="90" x="514" y="34">
            <parameter key="min_chars" value="3"/>
            <parameter key="max_chars" value="10000"/>
          </operator>
          <connect from_port="single" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
          <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (2)" to_port="document"/>
          <connect from_op="Filter Stopwords (2)" from_port="document" to_op="Filter Tokens (2)" to_port="document"/>
          <connect from_op="Filter Tokens (2)" from_port="document" to_port="output 1"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="operator_toolbox:apply_model_documents" compatibility="2.0.001" expanded="true" height="103" name="Apply Model (Documents)" width="90" x="581" y="187">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="715" y="187">
        <list key="function_descriptions">
          <parameter key="#Pos_Wörter/(#Pos_Wörter+#Neg_Wörter)" value="Positivity/(Positivity-Negativity)"/>
          <parameter key="#Neg_Wörter/(#Pos_Wörter+#Neg_Wörter)" value="Negativity*-1/(Negativity*-1+Positivity)"/>
          <parameter key="Pos_Score" value="if(Positivity&gt;(Negativity*-1),1,0)"/>
          <parameter key="Neg_Score" value="if((Negativity*-1)&gt;Positivity,-1,0)"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="849" y="187">
        <list key="function_descriptions">
          <parameter key="Sentiment_Score" value="if(Pos_Score&gt;0,1,if(Neg_Score&lt;0,-1,0))"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <operator activated="true" class="write_excel" compatibility="9.3.001" expanded="true" height="103" name="Write Excel" width="90" x="983" y="187">
        <parameter key="excel_file" value="D:\Franziska C. Weis\Masterarbeit\03 Datenanalyse\Rapid_Miner_Analysis_IZ.xlsx"/>
        <parameter key="file_format" value="xlsx"/>
        <enumeration key="sheet_names"/>
        <parameter key="sheet_name" value="RapidMiner Data"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="number_format" value="#.0"/>
        <parameter key="encoding" value="SYSTEM"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Retrieve (2)" from_port="output" to_op="Dictionary-Based Sentiment (Documents)" to_port="exa"/>
      <connect from_op="Dictionary-Based Sentiment (Documents)" from_port="mod" to_op="Apply Model (Documents)" to_port="mod"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Data to Documents" to_port="example set"/>
      <connect from_op="Data to Documents" from_port="documents" to_op="Loop Collection" to_port="collection"/>
      <connect from_op="Loop Collection" from_port="output 1" to_op="Apply Model (Documents)" to_port="doc"/>
      <connect from_op="Apply Model (Documents)" from_port="exa" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Write Excel" to_port="input"/>
      <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

MarlaBot · July 2019

Hi @Titzaaa - this is MarlaBot. I found these great videos on our RapidMiner Academy that you may find helpful:

Instructional Video: Data Prep Challenge Lab (Viewing time: ~1m)

Instructional Video: Text Association Rules (Viewing time: ~10m)

Instructional Video: Loading Text into RapidMiner (Viewing time: ~6m)

Instructional Video: Turbo Prep - Introduction (Viewing time: ~6m)

MarlaBot

MarcoBarradas · July 2019

Hi @Titzaaa

I can't run the process, since it retrieves data from a repository on your computer.
You could add the Generate ID operator after the Retrieve and then use that ID to join your results to your original data set.
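
A minimal sketch of that Generate ID and Join idea, written as a fragment for the top-level subprocess of the process posted above. Several things here are assumptions, not confirmed in this thread: the operator names "Generate ID", "Generate ID (2)", "Multiply" and "Join" and all layout coordinates are illustrative; the class name `concurrency:join` and the parameter keys should be checked against an export from your own Studio version; and the second Generate ID only produces matching ids if Apply Model (Documents) returns one row per document in the original row order.

```xml
<!-- Sketch only: add these inside the top-level <process expanded="true"> element. -->
<!-- Number the articles right after Retrieve so every row has an id. -->
<operator activated="true" class="generate_id" compatibility="9.3.001" expanded="true" height="82" name="Generate ID" width="90" x="45" y="136">
  <parameter key="create_nominal_ids" value="false"/>
  <parameter key="offset" value="0"/>
</operator>
<!-- Keep one copy of the original data (including Datum) for the join. -->
<operator activated="true" class="multiply" compatibility="9.3.001" expanded="true" height="103" name="Multiply" width="90" x="45" y="391"/>
<!-- Assumption: the scored rows are still in the original order, so numbering them again yields the same ids. -->
<operator activated="true" class="generate_id" compatibility="9.3.001" expanded="true" height="82" name="Generate ID (2)" width="90" x="581" y="340">
  <parameter key="create_nominal_ids" value="false"/>
  <parameter key="offset" value="0"/>
</operator>
<!-- Join the original data and the scored data on the id attribute. -->
<operator activated="true" class="concurrency:join" compatibility="9.3.001" expanded="true" height="82" name="Join" width="90" x="715" y="340">
  <parameter key="remove_double_attributes" value="true"/>
  <parameter key="join_type" value="inner"/>
  <parameter key="use_id_attribute_as_key" value="true"/>
  <list key="key_attributes"/>
  <parameter key="keep_both_join_attributes" value="false"/>
</operator>
<!-- These connections replace the existing "Retrieve to Set Role" and "Apply Model (Documents) to Generate Attributes" connections. -->
<connect from_op="Retrieve" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Set Role" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="left"/>
<connect from_op="Apply Model (Documents)" from_port="exa" to_op="Generate ID (2)" to_port="example set input"/>
<connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
```

With `remove_double_attributes` set to true, columns that exist on both sides should appear only once in the joined result, so the output would carry Datum next to the sentiment columns.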

Titzaaa · July 2019

Hi @MarcoBarradas
unfortunately, when putting the Generate ID Operator directly after the retrieve, the ID also does not survive the rest of the process (the text mining and tokenization).
How can I solve that problem?
Many thanks!

kayman · July 2019

Use the Set Role operator before you start your document workflow. You can enter any role name you want (so don't be distracted by the dropdown options). If you give your date field the role "date", for instance, it becomes a special attribute and travels along your process as metadata. Just make sure you do not exclude special attributes anywhere in your process, or it will be gone again.

Titzaaa · July 2019

Hi Kayman,
thank you for your reply!
The date makes it all the way to the "Apply Model (Documents)" operator, and there it gets lost.
Any suggestions here?
Thanks a lot!

MartinLiebig · July 2019

Hi @Titzaaa,
this is almost certainly a bug. @sgenzer, can you please create a ticket and assign it to me?

BR,
Martin

MartinLiebig · July 2019

@Titzaaa,

this is fixed in the local dev branch. It will be released with the next version of Operator Toolbox.

BR,

Martin
