Global sum of a column

StanK Member Posts: 3 Newbie
I am very disappointed with how hard it is to calculate a simple global sum in RapidMiner. If something this basic cannot be done easily, I see no sense in continuing with more complicated things.

In particular, I just need each row's amount as a percentage of the total sum of its column. In Excel this normally takes just seconds.

Neither Aggregate nor Pivot could help me, as I don't need a "Count"; I need a "Total Sum".

Best Answer

  • StanK Member Posts: 3 Newbie
    Solution Accepted
    Thank you, Martin! I will look at it!


  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,508 RM Data Scientist
    edited July 2020
    Hi @StanK,

    I think what you want is very easy to build with about three operators. I believe all of these operators are covered in the training and certification we offer free of charge on academy.rapidminer.com. Attached is a process that I think does exactly what you want.


    <?xml version="1.0" encoding="UTF-8"?><process version="9.7.001">
      <operator activated="true" class="process" compatibility="9.7.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="9.7.001" expanded="true" height="68" name="Generate Data" width="90" x="112" y="136">
            <parameter key="target_function" value="random"/>
            <parameter key="number_examples" value="100"/>
            <parameter key="number_of_attributes" value="5"/>
            <parameter key="attributes_lower_bound" value="0.0"/>
            <parameter key="attributes_upper_bound" value="10.0"/>
            <parameter key="gaussian_standard_deviation" value="10.0"/>
            <parameter key="largest_radius" value="10.0"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="9.7.001" expanded="true" height="82" name="Aggregate (2)" width="90" x="246" y="136">
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default_aggregation_function" value="average"/>
            <list key="aggregation_attributes">
              <parameter key="att1" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value=""/>
            <parameter key="count_all_combinations" value="false"/>
            <parameter key="only_distinct" value="false"/>
            <parameter key="ignore_missings" value="true"/>
          </operator>
          <operator activated="true" class="cartesian_product" compatibility="9.7.001" expanded="true" height="82" name="Cartesian" width="90" x="380" y="136">
            <parameter key="remove_double_attributes" value="true"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.7.001" expanded="true" height="82" name="Generate Attributes" width="90" x="514" y="136">
            <list key="function_descriptions">
              <parameter key="fraction_att1" value="att1/[sum(att1)]"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Aggregate (2)" to_port="example set input"/>
          <connect from_op="Aggregate (2)" from_port="example set output" to_op="Cartesian" to_port="left"/>
          <connect from_op="Aggregate (2)" from_port="original" to_op="Cartesian" to_port="right"/>
          <connect from_op="Cartesian" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
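
For readers who want to see the logic of the process outside of Studio, here is a minimal Python sketch of the same three steps: Aggregate computes the column sum, Cartesian Product attaches that single total to every row, and Generate Attributes divides each value by it. This is only an illustration and not RapidMiner code; the sample values and the helper names (`aggregate_sum`, `cartesian_product`, `add_fraction`) are assumptions made up for this example, while the attribute names `att1`, `sum(att1)` and `fraction_att1` follow the process above.

```python
# Hypothetical sample data standing in for the att1 column of Generate Data.
rows = [{"att1": 2.0}, {"att1": 3.0}, {"att1": 5.0}]


def aggregate_sum(rows, column):
    """Step 1 (Aggregate): collapse the column into a one-row table with its sum."""
    return [{f"sum({column})": sum(r[column] for r in rows)}]


def cartesian_product(left, right):
    """Step 2 (Cartesian Product): pair every left row with every right row."""
    return [{**l, **r} for l in left for r in right]


def add_fraction(rows, column):
    """Step 3 (Generate Attributes): divide each value by the attached total."""
    total_name = f"sum({column})"
    return [{**r, f"fraction_{column}": r[column] / r[total_name]} for r in rows]


totals = aggregate_sum(rows, "att1")      # one row: the global sum
joined = cartesian_product(totals, rows)  # the total repeated on every original row
result = add_fraction(joined, "att1")     # each row's share of the total

for r in result:
    print(r)
```

Because the aggregated table has exactly one row, the Cartesian product does not multiply the number of rows; it only copies the total onto each original row. Multiply the fraction by 100 if a percentage is wanted, and note that this sketch would fail with a division by zero if the column total were 0.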
  • StanK Member Posts: 3 Newbie
    Hi Martin, thank you for your reply! Where exactly should I enter the code you sent me?
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,508 RM Data Scientist
    Hi @StanK,


  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    edited July 2020

    For a couple of versions now, you can copy the XML to your clipboard and then paste it into Studio by pressing the paste button in the top right corner of the Process panel.
