How can I compute/derive additional attributes?

ChrisNelson Member Posts: 11 Contributor II
edited November 2018 in Help
I have a dataset with two columns, A and B. For each record, I want to compute C = (B-A)/B. Can I do that with RapidMiner transformations? If so, can you direct me to the right operator?

I also want to compute the overall C.  That is ((B1 + B2 + B3 ...) - (A1 + A2 + A3 ...)) / (B1 + B2 + B3 ...).  Can I compute this aggregate function?  How?  (I think this is a special case of a weighted average which I think I've seen reference to but haven't found the method for computing.)
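
On the weighted-average hunch: with per-record C = (B-A)/B, the overall value equals the average of the per-record C values weighted by B, because B * C = B - A for every record. A minimal Python sketch to check this outside RapidMiner, assuming made-up sample values for A and B (they are placeholders, not data from a real example set):

```python
# Hypothetical sample values; any records with nonzero B will do.
A = [1.0, 2.0, 4.0]
B = [2.0, 3.0, 1.0]

# Per-record ratio C = (B - A) / B.
C = [(b - a) / b for a, b in zip(A, B)]

# Overall ratio computed from the column sums.
overall = (sum(B) - sum(A)) / sum(B)

# The same quantity as an average of C weighted by B.
weighted = sum(b * c for b, c in zip(B, C)) / sum(B)

# The two agree up to floating-point rounding.
assert abs(overall - weighted) < 1e-12
print(overall, weighted)
```

If the denominator were A instead of B, the weights would be A rather than B.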

Thanks.

Answers

  • ChrisNelson Member Posts: 11 Contributor II
    So, I found Generate Attributes as the means to create C = f(A,B). I'm still looking for a summary function.
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    Use Generate Aggregation to create new attributes based on a function applied to attributes within a single example of an example set

    regards

    Andrew
  • ChrisNelson Member Posts: 11 Contributor II
    In 5.2.008 on Linux, the help for Generate Aggregation lists many more parameters than the parameter form above it shows. Notably missing is "attribute". I see only:

    * attribute name
    * attribute filter type
    * invert selection
    * include special attributes
    * aggregation function
    * keep all
    * ignore missings

    Is this a bug or am I missing a step?
  • ChrisNelson Member Posts: 11 Contributor II
    When I pick "single" or "subset" instead of "all" for the filter type, a new control appears that lets me pick attributes. But it then appears that Generate Aggregation works across a single row. What I need is something that works down the columns. Perhaps this means creating new meta data?

    Given:

    A  B
    1  2
    2  3
    4  1

    I can use Generate Attributes to compute C = (B-A)/A for each row:

    A  B  C
    1  2   1.0
    2  3   0.5
    4  1  -0.75

    But I need to calculate ((B1 + B2 + B3) - (A1 + A2 + A3)) / (A1 + A2 + A3):

    A  B  C
    1  2   1.0
    2  3   0.5
    4  1  -0.75
    7  6  -0.14

    I guess I'm adding a new row, not a new column.  How do I do that?
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    Use Aggregate (not Generate Aggregation) to create column-wise sum aggregations. You will end up with a new example set, to which you can apply the formula (sum(B)-sum(A))/sum(A).
    Please note that Generate Attributes cannot handle attributes with parentheses in their names, so you have to rename the aggregation attributes first. Please have a look at the attached process.

    Happy Mining!
    ~Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="161" width="681">
          <operator activated="true" class="generate_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="aggregate" compatibility="5.3.000" expanded="true" height="76" name="Aggregate" width="90" x="179" y="30">
            <list key="aggregation_attributes">
              <parameter key="att1" value="sum"/>
              <parameter key="att2" value="sum"/>
            </list>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename" width="90" x="313" y="30">
            <parameter key="old_name" value="sum(att1)"/>
            <parameter key="new_name" value="sum_att1"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename (2)" width="90" x="447" y="30">
            <parameter key="old_name" value="sum(att2)"/>
            <parameter key="new_name" value="sum_att2"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.000" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="30">
            <list key="function_descriptions">
              <parameter key="value" value="(sum_att1-sum_att2) / sum_att2"/>
            </list>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
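
To check the numbers outside RapidMiner, the aggregate, rename, and compute steps can be mirrored in a few lines of Python. This is only a sketch of the data flow, not RapidMiner's API: the variable names are made up, the rows are the A/B example from earlier in the thread, and the formula assumed is the (sum(B)-sum(A))/sum(A) variant from that example.

```python
# Example rows from the thread: columns A and B.
rows = [{"A": 1, "B": 2}, {"A": 2, "B": 3}, {"A": 4, "B": 1}]

# Step 1 (Aggregate): collapse all rows into one record of column sums.
# The "sum(A)" style imitates the aggregated attribute names in the process.
aggregated = {f"sum({name})": sum(r[name] for r in rows) for name in ("A", "B")}

# Step 2 (Rename): swap the names containing parentheses for plain ones.
renamed = {"sum_A": aggregated["sum(A)"], "sum_B": aggregated["sum(B)"]}

# Step 3 (Generate Attributes): apply the formula to the single record.
renamed["C"] = (renamed["sum_B"] - renamed["sum_A"]) / renamed["sum_A"]

print(renamed)
```

The result is a single record holding the two sums and the overall C, which corresponds to the one-row example set the process produces rather than an extra row appended to the original data.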