UNION - notpassing attribute names to the next OP

kludikovskykludikovsky Member Posts: 30 Maven
edited November 2018 in Help

Issue:

The Union operator does not pass the attribute names to the next operation. 

Situation

I have two example sets (participants lists) which I read from excel sheets.

As they might be slightly different I combine those using the union operator.

This works well, until I try to do some futher processing line filtering.

in the filter opperator I don't get attributes from UNION to see.

Nevertheless I can use the attribute names directly in any operation.

So I can use a Generate Attribute to create a combined search key if I directly input the attribute names.

in the filter I do afterwards I can only see the newly creaed attribute but not those from the UNION.
And I can't see anything in UNIO to do about.
If I check the dataset I the stat's looklike any other.
That's the code.

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="187">
<parameter key="excel_file" value="D:\Teilnehmerliste.xlsx"/>
<parameter key="sheet_number" value="1"/>
<parameter key="imported_cell_range" value="A1"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations"/>
<parameter key="date_format" value=""/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (3)" width="90" x="45" y="289">
<parameter key="excel_file" value="D:\Teilnehmerliste.xlsx"/>
<parameter key="sheet_number" value="1"/>
<parameter key="imported_cell_range" value="A1"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations"/>
<parameter key="date_format" value=""/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="179" y="289">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal (2)" width="90" x="179" y="187">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="union" compatibility="7.6.001" expanded="true" height="82" name="Union" width="90" x="313" y="187"/>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal (3)" width="90" x="447" y="187">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" breakpoints="after" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="581" y="187">
<list key="function_descriptions">
<parameter key="sel_crit" value="concat(Nachname,Vorname,Email)"/>
</list>
<parameter key="keep_all" value="true"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples" width="90" x="715" y="187">
<parameter key="parameter_expression" value="missing(sel_crit)"/>
<parameter key="condition_class" value="expression"/>
<parameter key="invert_filter" value="true"/>
<list key="filters_list"/>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
</process>

 

 

EDIT:
After I store and retieve the example set the attributes are there again.

EDIT 2:
Is there a difference if an example set ist read at the top process or within a sub-process?

Answers

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hello @kludikovsky - hmm hard to help as it seems that your XML is corrupted.  Can you please repost?

     

    Side note: this seems to be happening a lot these days where the posted XML looks repeated.  Any idea why this is happening?

     

    Scott

     

  • kludikovskykludikovsky Member Posts: 30 Maven

    Hi @sgenzer,

    I belive a made two mistakes

     

    1) the Code

    as it works so easily I just copied the operators from RM and pasted it in here. So it's not a complete process as when I export it.

    EDIT: Just checked it. You can copy the XML and post it into an open canvas.  May this is what many people here will do like me.

     

    So here is the code as export
    (Note:this might not be completely identical to the inital)

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="179" y="85">
    <parameter key="excel_file" value="D:\Teilnehmerliste.xlsx"/>
    <parameter key="sheet_number" value="1"/>
    <parameter key="imported_cell_range" value="A1"/>
    <parameter key="encoding" value="SYSTEM"/>
    <parameter key="first_row_as_names" value="true"/>
    <list key="annotations"/>
    <parameter key="date_format" value=""/>
    <parameter key="time_zone" value="SYSTEM"/>
    <parameter key="locale" value="English (United States)"/>
    <list key="data_set_meta_data_information"/>
    <parameter key="read_not_matching_values_as_missings" value="true"/>
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="data_management" value="auto"/>
    </operator>
    <operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (3)" width="90" x="179" y="187">
    <parameter key="excel_file" value="D:\Teilnehmerliste.xlsx"/>
    <parameter key="sheet_number" value="1"/>
    <parameter key="imported_cell_range" value="A1"/>
    <parameter key="encoding" value="SYSTEM"/>
    <parameter key="first_row_as_names" value="true"/>
    <list key="annotations"/>
    <parameter key="date_format" value=""/>
    <parameter key="time_zone" value="SYSTEM"/>
    <parameter key="locale" value="English (United States)"/>
    <list key="data_set_meta_data_information"/>
    <parameter key="read_not_matching_values_as_missings" value="true"/>
    <parameter key="datamanagement" value="double_array"/>
    <parameter key="data_management" value="auto"/>
    </operator>
    <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="313" y="187">
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="real"/>
    <parameter key="block_type" value="value_series"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_series_end"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    </operator>
    <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal (2)" width="90" x="313" y="85">
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="real"/>
    <parameter key="block_type" value="value_series"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_series_end"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    </operator>
    <operator activated="true" breakpoints="after" class="union" compatibility="7.6.001" expanded="true" height="82" name="Union" width="90" x="447" y="85"/>
    <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="85">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    </operator>
    <connect from_op="Read Excel (2)" from_port="output" to_op="Numerical to Polynominal (2)" to_port="example set input"/>
    <connect from_op="Read Excel (3)" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
    <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Union" to_port="example set 2"/>
    <connect from_op="Numerical to Polynominal (2)" from_port="example set output" to_op="Union" to_port="example set 1"/>
    <connect from_op="Union" from_port="union" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

     

    2) the issue itself

    Can it be that when I create a process and read an excel file the following operator is not able to know the attributes as the file has not been read, so therefore the metadata is not avilable.

    This would be logical.

    But on the other hand how can I then process a file unless I read and store it.

     

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    1. AHA!  Mystery solved.  Thank you.

    2. Yes well we can't have RapidMiner make EVERYTHING easy for you, right?  :)  On a more serious note, what you can do is use the Read Excel operator and set the metadata under "data set meta data information".  The "wizard" will also do this for you.  The metadata will propagate after this.

     

    Scott

     

    kludikovsky
  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @sgenzer do you think that could be the cause for the XML corruption, users just doing a select all in the process window and then pasting it the XML code? I'd try it myself but i don't have access to RM at the moment. 

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    yup.  Just tried it myself.  Weird but true.  Now the next question: if I make this clear in the new sidebar, will people read it?  :)

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @sgenzer as a former Community Manager, your guess is as good as mine!

    sgenzer
Sign In or Register to comment.