Running Sum & Running Sumif

ZabSpec Member Posts: 4 Contributor I
edited November 2018 in Help
Hello Everyone

This is my first post and I am very new to RapidMiner.  I'm sorry if I don't describe this well, but I will do my best.  I won't go into great detail about what I am trying to achieve straight away, but I will ask whether something is possible:

I've loaded a dataset into RapidMiner and set up a process. The data is a list of customers, the store they belong to, and the postal area they reside in.  I have aggregated the data into groups of stores and postal areas with a count of customers.

I would now like to add another column/field that is a running sum down the table (based on the count of customers), or better still a running sumif that restarts for each store.  I've searched but can't find any operator that will give me this result.

Is this possible?  Any help for a new user is much appreciated.
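
To make the request concrete, here is a minimal sketch of the desired result in plain Python rather than RapidMiner. The store names, postal areas, and counts are made-up illustration data, and `add_running_sums` is a hypothetical helper name, not part of any RapidMiner API. It adds a running sum over the whole table and a second running sum that restarts for each store.

```python
from collections import defaultdict

def add_running_sums(rows):
    """Return copies of rows with 'running_sum' and 'running_sum_by_store' added.

    rows: list of dicts with keys 'store', 'postal_area', 'customers'.
    """
    total = 0
    per_store = defaultdict(int)  # one running total per store
    result = []
    for row in rows:
        total += row["customers"]
        per_store[row["store"]] += row["customers"]
        result.append({**row,
                       "running_sum": total,
                       "running_sum_by_store": per_store[row["store"]]})
    return result

# Made-up example data (not from the original post).
rows = [
    {"store": "A", "postal_area": "P1", "customers": 5},
    {"store": "A", "postal_area": "P2", "customers": 3},
    {"store": "B", "postal_area": "P1", "customers": 4},
    {"store": "B", "postal_area": "P3", "customers": 2},
]

for r in add_running_sums(rows):
    print(r["store"], r["postal_area"], r["customers"],
          r["running_sum"], r["running_sum_by_store"])
```

With this data the overall running sum keeps growing across stores, while the per-store column starts again at the first row of store B.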

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    I'm busy with some other things right now so can't answer in detail, but it sounds like what you want is a cumulative sum.  You can find this operator by downloading the finance & economics extension.
  • ZabSpec Member Posts: 4 Contributor I
    Hi, thanks for your response.  I've found a few other threads that reference this, and you are correct; I need an extension.  The extension needed is called the 'Series' extension and the operator is called 'integrate'.

    So great, I now have a running sum.  The next step for me is to make this running sum behave like a sumif: I have around 8 stores in this list and I want to carry out the running sum for each one.  When the calculation gets to a new store in the list, I want it to start the running sum again.

    Any help from anyone with this is much appreciated.  I will also keep trying to work it out for myself!
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Hey,

    what about Loop Values, cumulative sum, and Append? That seems like the way to go to me.

    ~Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    And here is an example

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.0.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.0.001" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="187">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="loop_values" compatibility="7.0.001" expanded="true" height="82" name="Loop Values" width="90" x="179" y="187">
            <parameter key="attribute" value="Play"/>
            <process expanded="true">
              <operator activated="true" class="filter_examples" compatibility="7.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
                <list key="filters_list">
                  <parameter key="filters_entry_key" value="Play.equals.%{loop_value}"/>
                </list>
              </operator>
              <operator activated="true" class="series:integrate_series" compatibility="5.3.000" expanded="true" height="82" name="Integrate" width="90" x="380" y="34">
                <parameter key="attribute_name" value="Humidity"/>
              </operator>
              <operator activated="false" class="aggregate" compatibility="7.0.001" expanded="true" height="82" name="Aggregate" width="90" x="313" y="238">
                <list key="aggregation_attributes">
                  <parameter key="Humidity" value="sum"/>
                </list>
              </operator>
              <operator activated="false" class="generate_attributes" compatibility="7.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="136">
                <list key="function_descriptions"/>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Integrate" to_port="example set input"/>
              <connect from_op="Integrate" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="7.0.001" expanded="true" height="82" name="Append" width="90" x="313" y="187"/>
          <connect from_op="Retrieve Golf" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
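
For readers who cannot open the process, the logic of the XML above can be sketched in plain Python. This is only an illustration under assumptions: it assumes the Integrate operator behaves as a cumulative sum (as described earlier in the thread), `running_sum_per_group` and the `cumulative` column name are hypothetical, and the rows are made-up values loosely shaped like the Golf sample data.

```python
from itertools import accumulate

def running_sum_per_group(rows, group_key, value_key, out_key="cumulative"):
    """Mimic Loop Values -> Filter Examples -> Integrate -> Append in plain Python."""
    # Loop Values: distinct group values, in order of first appearance.
    groups = list(dict.fromkeys(row[group_key] for row in rows))
    appended = []
    for value in groups:
        # Filter Examples: keep only the rows for the current group value.
        subset = [row for row in rows if row[group_key] == value]
        # Integrate (assumed to be a cumulative sum): restarts for every subset.
        sums = accumulate(row[value_key] for row in subset)
        # Append: stack the per-group results into one table.
        appended.extend({**row, out_key: s} for row, s in zip(subset, sums))
    return appended

# Made-up rows (illustrative only, not the real Golf data).
rows = [
    {"Play": "no", "Humidity": 85},
    {"Play": "yes", "Humidity": 78},
    {"Play": "no", "Humidity": 90},
    {"Play": "yes", "Humidity": 96},
]
for row in running_sum_per_group(rows, "Play", "Humidity"):
    print(row)
```

Note that in this sketch the appended result is ordered group by group rather than in the original row order. To adapt the idea to the stores data, the store attribute would take the place of Play and the customer count the place of Humidity.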
  • ZabSpec Member Posts: 4 Contributor I
    Hey Martin

    Thanks for replying to my post with an answer.  I'm afraid I have only just started using RapidMiner, so could you please provide a short explanation of what you're suggesting?  I'm unfamiliar with the code you have provided, so a little more detail about the process would help.

    Thanks in advance for this and sorry to be a pain.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Hi,

    if you add the XML panel (View > Show Panel) to your layout, then you can copy my XML into it and press the green check mark. Afterwards you can see my process. This is the easiest way to share processes :)

    ~Martin
  • ZabSpec Member Posts: 4 Contributor I
    Martin, thank you so much for your help.  I have applied the code you provided with success.

    Many Thanks