[SOLVED] split example set by value of nominal attribute

Regina Member Posts: 8 Contributor I
edited November 2018 in Help
Hello,

I would like to split an example set by the values of a nominal attribute.
The example set has about 40,000 examples, and the attribute takes values such as "A", "B" and "C".
I would like all examples with the same value in one example set: one example set for "A", one for "B" and one for "C".
I tried it with Loop Values and Filter Examples (MyAttribute=%{loop_value}), but it doesn't work that way.
Without Filter Examples I get several example sets, but they are not split into "A", "B" and "C"; with that operator I also get several example sets, but they contain no examples.
I need this because I want to calculate the values of another attribute separately for "A", "B" and "C".
Afterwards, the separate example sets should be combined again.

Can anybody help me, please?

Thanks in advance,
Regina

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Regina,

    can you please post the XML of your process as described in the post that is linked in my signature?

    Best regards,
    Marius
  • Regina Member Posts: 8 Contributor I
    Hi Marius,

    I wish you a happy new year, and thank you for your reply.

    I have found a way.
    I usually read the data with "Read Database" and define a (dynamic) query, but then I can't use the "Loop Values" operator.
    But why doesn't it work this way?

    So I exported the data to Excel and then imported it into the repository, and now it works with "Loop Values".

    Can you tell me how I can combine the split example sets again? I would like to calculate the average of a numerical attribute and put the results of all example sets in the output collection into one example set.

    Thank you and best regards,
    Regina
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve test_db" width="90" x="179" y="435">
            <parameter key="repository_entry" value="Datenbanken/test_db"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="5.3.015" expanded="true" height="76" name="Filter Example Range" width="90" x="313" y="390">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="50"/>
          </operator>
          <operator activated="false" class="read_database" compatibility="5.3.015" expanded="true" height="60" name="Read Database" width="90" x="45" y="120">
            <parameter key="connection" value="sqlserver"/>
            <parameter key="query" value="select DP_Name, TIME_ON from ALRHIST2012&#10;where CTRL_NAME like '%Alarm%' and TIME_OFF is not null&#10;and DP_NAME in ('AMS-A1_VERBINDUNG_ZU_K10','AMS-A1_VERBINDUNG_ZU_K1','AMS-A1_VERBINDUNG_ZU_A2')"/>
            <enumeration key="parameters"/>
          </operator>
          <operator activated="true" class="loop_values" compatibility="5.3.015" expanded="true" height="76" name="Loop Values" width="90" x="313" y="210">
            <parameter key="attribute" value="DP_Name"/>
            <process expanded="true">
              <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples" width="90" x="112" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="DP_Name=%{loop_value}"/>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve test_db" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Regina wrote:

    Can you tell me how I can combine the split example sets again? I would like to calculate the average of a numerical attribute and put the results of all example sets in the output collection into one example set.

    If you simply want to calculate the average of a numerical attribute after filtering by the values of a nominal attribute, you are probably better off with the Aggregate operator. Configure the nominal attribute as the group attribute, and the one for which you want to calculate the average as the aggregation attribute.
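
    As a rough sketch of that suggestion, the fragment below groups by the nominal attribute and averages a numeric one in a single step, without Loop Values. The parameter keys and port names are taken from the Aggregate operator in the process posted later in this thread; `MyNumericAttribute` is a hypothetical attribute name, and the source operator `Retrieve test_db` is assumed from the process above.

    ```xml
    <!-- Sketch only: "MyNumericAttribute" is a hypothetical numeric attribute. -->
    <!-- Groups by DP_Name and computes the average per group. -->
    <operator activated="true" class="aggregate" compatibility="5.3.015" expanded="true" name="Aggregate">
      <list key="aggregation_attributes">
        <parameter key="MyNumericAttribute" value="average"/>
      </list>
      <parameter key="group_by_attributes" value="DP_Name"/>
    </operator>
    <!-- Assumed wiring: feed the retrieved data straight into Aggregate. -->
    <connect from_op="Retrieve test_db" from_port="output" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
    ```

    The result should be a single example set with one example per distinct value of the group attribute, so no collection has to be recombined afterwards. This only fits when the per-group calculation is a plain aggregation.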

    Best regards,
    Marius
  • Regina Member Posts: 8 Contributor I
    I have already calculated the numerical attribute, and the result is the average. At the moment I have a collection of example sets that each contain one average. What I would like to know is how I can combine the single example sets so that they form one example set again, as before the split.
    If I have two separate example sets, I can combine them with the "Union" operator. But that doesn't work with a collection, because the example sets all come out of a single operator (Loop Values).
    I wanted to write the examples to an Excel file (operator "Write Excel" inside Loop Values), but it writes only the average of one example set.

    Sorry for my chaotic description.

    Best regards,
    Regina
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve Example2_DB" width="90" x="179" y="210">
            <parameter key="repository_entry" value="Datenbanken/Example2_DB"/>
          </operator>
          <operator activated="true" class="loop_values" compatibility="5.3.015" expanded="true" height="94" name="Loop Values" width="90" x="380" y="165">
            <parameter key="attribute" value="DP_NAME"/>
            <process expanded="true">
              <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples" width="90" x="112" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="DP_NAME=%{loop_value}"/>
              </operator>
              <operator activated="true" class="sort" compatibility="5.3.015" expanded="true" height="76" name="Sort (3)" width="90" x="246" y="30">
                <parameter key="attribute_name" value="TIME_ON"/>
              </operator>
              <operator activated="true" class="series:differentiate_example_set" compatibility="5.3.000" expanded="true" height="76" name="Differentiate" width="90" x="380" y="30">
                <parameter key="attribute_name" value="TIME_ON"/>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples (2)" width="90" x="514" y="30">
                <parameter key="condition_class" value="no_missing_attributes"/>
              </operator>
              <operator activated="true" class="rename" compatibility="5.3.015" expanded="true" height="76" name="Rename (3)" width="90" x="380" y="210">
                <parameter key="old_name" value="change(TIME_ON)"/>
                <parameter key="new_name" value="Differenz"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <operator activated="true" class="aggregate" compatibility="5.3.015" expanded="true" height="76" name="Aggregate" width="90" x="514" y="210">
                <list key="aggregation_attributes">
                  <parameter key="Differenz" value="average"/>
                </list>
                <parameter key="group_by_attributes" value="|DP_NAME"/>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Sort (3)" to_port="example set input"/>
              <connect from_op="Sort (3)" from_port="example set output" to_op="Differentiate" to_port="example set input"/>
              <connect from_op="Differentiate" from_port="example set output" to_op="Filter Examples (2)" to_port="example set input"/>
              <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Rename (3)" to_port="example set input"/>
              <connect from_op="Rename (3)" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
              <connect from_op="Aggregate" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Example2_DB" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
          <connect from_op="Loop Values" from_port="out 2" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    If all example sets in the collection have the same structure (same attributes etc.), then you can use Append, even with collections.

    Append should usually be preferred over Union anyway - you only need Union when the structure of the example sets is different.
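
    A minimal sketch of that wiring, assuming the Loop Values operator from the process above: its collection output is passed to Append, which returns a single example set. The operator class `append` and the port names `example set 1` and `merged set` are assumptions written from memory of RapidMiner 5.3, so check them against the XML the process editor generates when you add the operator yourself.

    ```xml
    <!-- Sketch only: replaces the direct connection from "Loop Values" to "result 1". -->
    <!-- Operator class and port names of Append are assumptions, see above. -->
    <operator activated="true" class="append" compatibility="5.3.015" expanded="true" name="Append"/>
    <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_port="result 1"/>
    ```

    In the process above, each element of the collection is the output of Aggregate for one value of DP_NAME, so the appended example set would hold one example per value together with its average.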

    Best regards and happy mining!

    ~Marius
  • Regina Member Posts: 8 Contributor I
    That's it  :D

    Thank you very much!

    Best regards,
    Regina