Generate percentages from absolute values

leonardormachad Member Posts: 3 Contributor I
edited November 2018 in Help
Hey everybody.

I have a CSV reader operator. It reads the following data from a CSV file:
INSERT, SELECT, UPDATE, DELETE
5,10,0,0
1,2,2,0
30,0,0,0
...

I want to use an operator that generates the following values (each value as a percentage of its row total) from the original ones:
INSERT, SELECT, UPDATE, DELETE
33,66,0,0
20,40,40,0
100,0,0,0
...

Is there a way to do that?

Thanks in advance,
Leonardo

Answers

  • SebastianLoh Member Posts: 99 Contributor II
    Hi leonardormachado,

    I uploaded a process to myExperiment. Search for "Generate percentages from absolute values" in the myExperiment plugin, then just open the process and its explanations in RM.

    Ciao Sebastian

    P.S.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
        <process expanded="true" height="446" width="534">
          <operator activated="true" class="generate_data" compatibility="5.0.10" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_of_attributes" value="4"/>
            <parameter key="attributes_lower_bound" value="0.0"/>
          </operator>
          <operator activated="true" class="generate_aggregation" compatibility="5.0.10" expanded="true" height="76" name="Generate Aggregation" width="90" x="246" y="30">
            <parameter key="attribute_name" value="SumAll"/>
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="att4|att3|att2|att1"/>
          </operator>
          <operator activated="true" class="loop_attributes" compatibility="5.0.10" expanded="true" height="60" name="Loop Attributes" width="90" x="45" y="165">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="att4|att3|att2|att1"/>
            <process expanded="true" height="446" width="534">
              <operator activated="true" class="branch" compatibility="5.0.10" expanded="true" height="76" name="Branch sum!=0?" width="90" x="179" y="30">
                <parameter key="condition_type" value="attribute_value_filter"/>
                <parameter key="condition_value" value="sum != 0"/>
                <process expanded="true" height="446" width="242">
                  <operator activated="true" class="generate_attributes" compatibility="5.0.10" expanded="true" height="76" name="Generate Attributes" width="90" x="45" y="30">
                    <list key="function_descriptions">
                      <parameter key="rel_%{loop_attribute}" value="%{loop_attribute}/SumAll"/>
                    </list>
                  </operator>
                  <connect from_port="condition" to_op="Generate Attributes" to_port="example set input"/>
                  <connect from_op="Generate Attributes" from_port="example set output" to_port="input 1"/>
                  <portSpacing port="source_condition" spacing="0"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="sink_input 1" spacing="0"/>
                  <portSpacing port="sink_input 2" spacing="0"/>
                </process>
                <process expanded="true" height="446" width="242">
                  <operator activated="true" class="generate_attributes" compatibility="5.0.10" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="45" y="30">
                    <list key="function_descriptions">
                      <parameter key="rel_%{loop_attribute}" value="0"/>
                    </list>
                  </operator>
                  <connect from_port="condition" to_op="Generate Attributes (2)" to_port="example set input"/>
                  <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="input 1"/>
                  <portSpacing port="source_condition" spacing="0"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="sink_input 1" spacing="0"/>
                  <portSpacing port="sink_input 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="example set" to_op="Branch sum!=0?" to_port="condition"/>
              <connect from_op="Branch sum!=0?" from_port="input 1" to_port="example set"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate Aggregation" to_port="example set input"/>
          <connect from_op="Generate Aggregation" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
          <connect from_op="Loop Attributes" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="126"/>
          <portSpacing port="sink_result 2" spacing="36"/>
        </process>
      </operator>
    </process>
  • onesix4 Member Posts: 7 Contributor II
    Hi,
    I have attempted to load this example to generate a percentage but the "Process" fails:

    Oct 18, 2012 12:25:00 PM SEVERE: Process failed: operator cannot be executed (Unknown attribute: 'sum'). Check the log messages...
    Oct 18, 2012 12:25:00 PM SEVERE: Here:          Process[1] (Process)
              subprocess 'Main Process'
                +- Generate Data[1] (Generate Data)
                +- Generate Aggregation[1] (Generate Aggregation)
                +- Loop Attributes[1] (Loop Attributes)
              subprocess 'Subprocess'
          ==>        +- Branch sum!=0?[1] (Branch)
              subprocess 'Then'
                          |  +- Generate Attributes[0] (Generate Attributes)
              subprocess 'Else'


    I would like to have an example that generates a percentage column from the data that is read.
    Thanks.
  • wessel Member Posts: 537 Maven
    Sebastian made a small typo: in the condition_value of the Branch operator, change 'sum' to 'SumAll'.
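
    A minimal sketch of that fix, assuming nothing else in the posted process changes: the Branch condition should name the attribute that Generate Aggregation actually creates (SumAll), so the parameter line would read something like:

    ```xml
    <!-- inside the operator named "Branch sum!=0?": refer to SumAll, the attribute created by Generate Aggregation -->
    <parameter key="condition_value" value="SumAll != 0"/>
    ```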

    It's not that hard to figure out what is going on.
    Basically, the values in each row are first summed across all columns (Generate Aggregation stores this as SumAll).
    Then each column's value gets divided by that row sum.
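
    If values on a 0 to 100 scale are wanted rather than fractions, the expression in Generate Attributes could be scaled. This is only a sketch: the factor of 100 is an addition here, not part of the posted process, and it assumes the expression parser in your RapidMiner version accepts the expression as written.

    ```xml
    <list key="function_descriptions">
      <!-- each looped attribute divided by the row total (SumAll), scaled to percent; the *100 is an assumed addition -->
      <parameter key="rel_%{loop_attribute}" value="%{loop_attribute}/SumAll*100"/>
    </list>
    ```

    With the first row from the question (5, 10, 0, 0), SumAll is 15, so INSERT becomes 5/15*100 ≈ 33.3 and SELECT becomes 10/15*100 ≈ 66.7. The posted process runs on generated data with attributes att1 to att4, so for the CSV the attribute lists would presumably need to name INSERT, SELECT, UPDATE and DELETE instead.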

    Best regards,

    Wessel