Process Problem

fokko Member Posts: 8 Contributor II
edited November 2018 in Help
Hello,

I have a question about my process. It seems to work, but the "Join" operator shows an exclamation mark on its left input. What is the problem, and how can I fix it?
Thanks


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.0.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files (2)" width="90" x="45" y="255">
        <list key="text_directories">
          <parameter key="positive" value="C:\Users\chris_000\Desktop\Master Doks\Arbeitsstand\Dictionary\General Inquirer\Positive"/>
        </list>
        <parameter key="vector_creation" value="Binary Term Occurrences"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize" width="90" x="313" y="30"/>
          <operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (2)" width="90" x="447" y="30"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Stem (2)" to_port="document"/>
          <connect from_op="Stem (2)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files (3)" width="90" x="45" y="345">
        <list key="text_directories">
          <parameter key="negative" value="C:\Users\chris_000\Desktop\Master Doks\Arbeitsstand\Dictionary\General Inquirer\Negative"/>
        </list>
        <parameter key="vector_creation" value="Binary Term Occurrences"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize (3)" width="90" x="180" y="30"/>
          <operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (3)" width="90" x="416" y="30"/>
          <connect from_port="document" to_op="Tokenize (3)" to_port="document"/>
          <connect from_op="Tokenize (3)" from_port="document" to_op="Stem (3)" to_port="document"/>
          <connect from_op="Stem (3)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="loop_files" compatibility="6.0.003" expanded="true" height="76" name="Loop Files" width="90" x="45" y="30">
        <parameter key="directory" value="C:\Users\chris_000\Desktop\Master Doks\Arbeitsstand\Textdaten\Post\Samsung\Split"/>
        <process expanded="true">
          <operator activated="true" class="text:read_document" compatibility="5.3.002" expanded="true" height="60" name="Read Document" width="90" x="45" y="30"/>
          <connect from_port="file object" to_op="Read Document" to_port="file"/>
          <connect from_op="Read Document" from_port="output" to_port="out 1"/>
          <portSpacing port="source_file object" spacing="0"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="text:filter_documents_by_content" compatibility="5.3.002" expanded="true" height="76" name="Filter Documents (by Content)" width="90" x="112" y="120">
        <parameter key="string" value="via twitter"/>
        <parameter key="invert condition" value="true"/>
      </operator>
      <operator activated="true" class="text:filter_documents_by_content" compatibility="5.3.002" expanded="true" height="76" name="Filter Documents (2)" width="90" x="246" y="120">
        <parameter key="string" value="GATE-0001"/>
        <parameter key="invert condition" value="true"/>
      </operator>
      <operator activated="true" class="text:filter_documents_by_content" compatibility="5.3.002" expanded="true" height="76" name="Filter Documents (3)" width="90" x="380" y="120">
        <parameter key="string" value="SOOK-127654"/>
        <parameter key="invert condition" value="true"/>
      </operator>
      <operator activated="true" class="text:process_documents" compatibility="5.3.002" expanded="true" height="94" name="Process Documents" width="90" x="514" y="30">
        <parameter key="keep_text" value="true"/>
        <process expanded="true">
          <connect from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="set_role" compatibility="6.0.003" expanded="true" height="76" name="Set Role" width="90" x="648" y="30">
        <parameter key="attribute_name" value="text"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="6.0.003" expanded="true" height="94" name="Multiply" width="90" x="782" y="30"/>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Data" width="90" x="179" y="255">
        <parameter key="vector_creation" value="Term Occurrences"/>
        <parameter key="keep_text" value="true"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize (2)" width="90" x="179" y="30"/>
          <operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (Porter)" width="90" x="447" y="30"/>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
          <connect from_op="Stem (Porter)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="generate_aggregation" compatibility="6.0.003" expanded="true" height="76" name="Generate Aggregation" width="90" x="313" y="255">
        <parameter key="attribute_name" value="positive"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="6.0.003" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="255">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="metadata_path|text|positive|label|metadata_date|metadata_file"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Data (2)" width="90" x="179" y="345">
        <parameter key="vector_creation" value="Term Occurrences"/>
        <parameter key="keep_text" value="true"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize (4)" width="90" x="180" y="30"/>
          <operator activated="true" class="text:stem_porter" compatibility="5.3.002" expanded="true" height="60" name="Stem (4)" width="90" x="484" y="30"/>
          <connect from_port="document" to_op="Tokenize (4)" to_port="document"/>
          <connect from_op="Tokenize (4)" from_port="document" to_op="Stem (4)" to_port="document"/>
          <connect from_op="Stem (4)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="generate_aggregation" compatibility="6.0.003" expanded="true" height="76" name="Generate Aggregation (2)" width="90" x="313" y="345">
        <parameter key="attribute_name" value="negative"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="6.0.003" expanded="true" height="76" name="Select Attributes (2)" width="90" x="447" y="345">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="metadata_path|text|negative|label|metadata_date|metadata_file"/>
      </operator>
      <operator activated="true" class="join" compatibility="6.0.003" expanded="true" height="76" name="Join" width="90" x="581" y="300">
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="metadata_path" value="metadata_path"/>
        </list>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="6.0.003" expanded="true" height="76" name="Generate Attributes" width="90" x="715" y="300">
        <list key="function_descriptions">
          <parameter key="Sentiment" value="(positive-negative)/(positive+negative)"/>
        </list>
      </operator>
      <operator activated="true" class="write_excel" compatibility="6.0.003" expanded="true" height="76" name="Write Excel" width="90" x="715" y="435">
        <parameter key="excel_file" value="C:\Users\chris_000\Desktop\Master Doks\Arbeitsstand\Textdaten\Post\RSA\RSA Output F2.xls"/>
        <parameter key="file_format" value="xlsx"/>
        <parameter key="sheet_name" value="RapidMiner Test"/>
      </operator>
      <connect from_op="Process Documents from Files (2)" from_port="word list" to_op="Process Documents from Data" to_port="word list"/>
      <connect from_op="Process Documents from Files (3)" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
      <connect from_op="Loop Files" from_port="out 1" to_op="Filter Documents (by Content)" to_port="documents 1"/>
      <connect from_op="Filter Documents (by Content)" from_port="documents" to_op="Filter Documents (2)" to_port="documents 1"/>
      <connect from_op="Filter Documents (2)" from_port="documents" to_op="Filter Documents (3)" to_port="documents 1"/>
      <connect from_op="Filter Documents (3)" from_port="documents" to_op="Process Documents" to_port="documents 1"/>
      <connect from_op="Process Documents" from_port="example set" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Process Documents from Data (2)" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Generate Aggregation" to_port="example set input"/>
      <connect from_op="Generate Aggregation" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Generate Aggregation (2)" to_port="example set input"/>
      <connect from_op="Generate Aggregation (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Write Excel" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    If you open the "Problems View", you can see an overview of all problems.

    There you will see this message for your Join operator:

    The attribute 'metadata_path' is missing in the input example set.
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • fokko Member Posts: 8 Contributor II
    Ok, I see.
    But I have no idea how to fix the problem.
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    Sometimes warnings like this appear even though the process runs OK.

    I wouldn't worry too much if you're sure the results are what you expect.

    regards

    Andrew
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    These warnings are based on the metadata on operator input ports. Metadata is designed to help you build processes, and it tries to stay as close to the actual data as possible. However, metadata sometimes simply cannot know which attributes you will get when the process is executed. For example, when reading in files, it is impossible to know what you will get unless you actually read them. As that may be very costly in terms of performance, metadata will fail in such cases. You can then safely ignore such warnings.

    Regards,
    Marco
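
The point made in the replies above can be sketched in a few lines of Python. This is a toy model only, not how RapidMiner implements metadata propagation: the functions `design_time_check` and `join_on_key` and the sample rows are hypothetical and invented for illustration. It shows a check that runs before any data is read and therefore warns about `metadata_path`, while the join on the real rows still succeeds at run time.

```python
# Illustrative sketch only: none of these function names come from RapidMiner.

def design_time_check(known_attributes, key):
    """Mimic a metadata check that runs before any data is read.

    known_attributes is the set of attribute names that could be predicted,
    or None when nothing can be predicted (e.g. files not read yet).
    """
    if known_attributes is None or key not in known_attributes:
        return f"warning: the attribute '{key}' is missing in the input example set"
    return "ok"


def join_on_key(left, right, key):
    """Inner join of two lists of row dicts on a shared key attribute."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Design time: the documents have not been read, so the attributes are unknown
# and the check can only warn.
print(design_time_check(None, "metadata_path"))

# Run time: the rows actually carry metadata_path, so the join works.
left = [{"metadata_path": "a.txt", "positive": 3}]    # made-up sample row
right = [{"metadata_path": "a.txt", "negative": 1}]   # made-up sample row
joined = join_on_key(left, right, "metadata_path")
print(joined)
```

If the joined result at run time contains the rows and columns you expect, the design-time warning carried no real information, which is the situation described in the replies.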