Options

[SOLVED] Median calculation

RapidEndUserRapidEndUser Member Posts: 10 Contributor II
edited November 2018 in Help
Hello Support,

Is there an operator to calculate median value in a column?

Thanks,
RapidEndUser.

Answers

  • Options
    awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    You could use the Aggregate operator although you have to add a constant value for all the examples so that the operator has something to group against in order to find the median for that grouping.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="120">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.007" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="120">
            <list key="function_descriptions">
              <parameter key="constant" value="1"/>
            </list>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.3.007" expanded="true" height="76" name="Aggregate" width="90" x="380" y="120">
            <parameter key="default_aggregation_function" value="median"/>
            <list key="aggregation_attributes">
              <parameter key="a1" value="median"/>
              <parameter key="a2" value="median"/>
              <parameter key="a3" value="median"/>
              <parameter key="a4" value="median"/>
            </list>
            <parameter key="group_by_attributes" value="|constant"/>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    If you wanted to find the median for different labels then you would change the group by parameter to be the label in the example above.

    regards

    Andrew
  • Options
    awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Actually it's even easier - I realized you don't need a constant attribute at all

    This process is even simpler
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="120">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.3.007" expanded="true" height="76" name="Aggregate" width="90" x="246" y="120">
            <parameter key="use_default_aggregation" value="true"/>
            <parameter key="default_aggregation_function" value="median"/>
            <list key="aggregation_attributes"/>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    regards

    Andrew
  • Options
    RapidEndUserRapidEndUser Member Posts: 10 Contributor II
    Thanks Andrew.

    After experimenting the different operators yes I found that median function in the aggregate operator.

    Currently the median value is only returning when you use aggregate function obvious since we are using aggregate function so it is returning that value only.
    Row No. median(a1) median(a2) median(a3)   median(4)
    1           5.8             3.0               4.4               1.3

    Actually I would like to keep all the rows in the table and add one more column which is called "MedianValue".

    For e.g. If you have 4 columns in a table and would like to add 5th column that will be "MedianValue" column.
    Since I want to do some calculation using "MedianValue" column with other columns and then produce 6th column, 7th column, etc.,

    The output I am expecting:-

    label                 a1  a2    a3    a4    median(a1)    a1_percentage
    Iris-setosa     5.1 3.5 1.4 0.2  5.800                0.879
    Iris-setosa     4.9 3.0 1.4 0.2  5.800                0.844
    Iris-setosa     4.7 3.2 1.3 0.2  5.800                0.810
    Iris-setosa     4.6 3.1 1.5 0.2  5.800                0.793
    Iris-setosa     5.0 3.6 1.4 0.2  5.800                0.862
    Iris-setosa     5.4 3.9 1.7 0.4  5.800                0.931
    Iris-setosa     4.6 3.4 1.4 0.3  5.800                0.793

    Is it possible to do this?


    Thanks,
    RapidEndUser.
  • Options
    awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    It can be done using the Extract Macro operator on the aggregate result example set then using the macros with Generate Attributes to create a new attribute with the original example set and the appropriate macro.

    With a bit of effort, it could be done using Loop operators but it's a bit more involved.

    regards

    Andrew
  • Options
    RapidEndUserRapidEndUser Member Posts: 10 Contributor II
    Hi Andrew,

    Can you provide me sample XML how to use Extract Macro with Generate Attributes operator?

    Thanks,
    RapidEndUser.
  • Options
    awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    I had a bit of spare time - so it's a lucky day.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="75">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.3.007" expanded="true" height="76" name="Aggregate" width="90" x="112" y="165">
            <parameter key="use_default_aggregation" value="true"/>
            <parameter key="default_aggregation_function" value="median"/>
            <list key="aggregation_attributes"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="5.3.007" expanded="true" height="76" name="Rename by Replacing" width="90" x="112" y="255">
            <parameter key="replace_what" value="(.*)\(+(.*)\)+"/>
            <parameter key="replace_by" value="$1_$2"/>
          </operator>
          <operator activated="true" class="loop_attributes" compatibility="5.3.007" expanded="true" height="76" name="Loop Attributes" width="90" x="112" y="345">
            <process expanded="true">
              <operator activated="true" class="extract_macro" compatibility="5.3.007" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="30">
                <parameter key="macro" value="%{loop_attribute}"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="%{loop_attribute}"/>
                <parameter key="example_index" value="1"/>
                <list key="additional_macros"/>
              </operator>
              <connect from_port="example set" to_op="Extract Macro" to_port="example set"/>
              <connect from_op="Extract Macro" from_port="example set" to_port="example set"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="subprocess" compatibility="5.3.007" expanded="true" height="94" name="enforce order" width="90" x="246" y="345">
            <process expanded="true">
              <connect from_port="in 1" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="source_in 3" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="loop_attributes" compatibility="5.3.007" expanded="true" height="76" name="Loop Attributes (2)" width="90" x="380" y="345">
            <process expanded="true">
              <operator activated="true" class="generate_macro" compatibility="5.3.007" expanded="true" height="76" name="Generate Macro" width="90" x="112" y="30">
                <list key="function_descriptions">
                  <parameter key="medianMacro" value="&quot;median_&quot;+&quot;%{loop_attribute}&quot;"/>
                  <parameter key="newAttributeName" value="&quot;ratio_&quot;+&quot;%{loop_attribute}&quot;"/>
                </list>
              </operator>
              <operator activated="true" class="generate_macro" compatibility="5.3.007" expanded="true" height="76" name="Generate Macro (2)" width="90" x="112" y="120">
                <list key="function_descriptions">
                  <parameter key="medianMacroValue" value="macro(&quot;%{medianMacro}&quot;)"/>
                </list>
              </operator>
              <operator activated="true" class="generate_attributes" compatibility="5.3.007" expanded="true" height="76" name="Generate Attributes" width="90" x="112" y="210">
                <list key="function_descriptions">
                  <parameter key="%{newAttributeName}" value="%{loop_attribute}/%{medianMacroValue}"/>
                </list>
              </operator>
              <connect from_port="example set" to_op="Generate Macro" to_port="through 1"/>
              <connect from_op="Generate Macro" from_port="through 1" to_op="Generate Macro (2)" to_port="through 1"/>
              <connect from_op="Generate Macro (2)" from_port="through 1" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_port="example set"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="original" to_op="enforce order" to_port="in 1"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
          <connect from_op="Loop Attributes" from_port="example set" to_op="enforce order" to_port="in 2"/>
          <connect from_op="enforce order" from_port="out 1" to_op="Loop Attributes (2)" to_port="example set"/>
          <connect from_op="Loop Attributes (2)" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Quite complex because it does the work reasonably automatically with Loops and macros - enjoy!

    regards

    Andrew
  • Options
    aborgaborg Member Posts: 66 Contributor II
    Would not be easier (maybe even more efficient) with a Join operator? (Obviously only for the first loop attributes.)
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="696" width="706">
          <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="75">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.2.008" expanded="true" height="76" name="Aggregate" width="90" x="179" y="75">
            <parameter key="use_default_aggregation" value="true"/>
            <parameter key="default_aggregation_function" value="median"/>
            <list key="aggregation_attributes"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="120">
            <list key="function_descriptions">
              <parameter key="One" value="1"/>
            </list>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="313" y="30">
            <list key="function_descriptions">
              <parameter key="One" value="1"/>
            </list>
          </operator>
          <operator activated="true" class="join" compatibility="5.2.008" expanded="true" height="76" name="Join" width="90" x="447" y="75">
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="One" value="One"/>
            </list>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="original" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="180"/>
        </process>
      </operator>
    </process>
    Cheers, gabor
  • Options
    RapidEndUserRapidEndUser Member Posts: 10 Contributor II
    Thanks Gabor.

    Perfect in my scenario. It is working great. I had a separate process to calculate median value and now I don't need that. Just I can do it in one process.


    Thanks,
    RapidEndUser.
Sign In or Register to comment.