How can I compute/derive additional attributes?

ChrisNelson Member Posts: 11 Contributor II
edited November 2018 in Help
I have a dataset with two columns, A and B. For each record, I want to compute C = (B-A)/B. Can I do that with a RapidMiner transformation? Can you direct me to the right one?

I also want to compute the overall C, that is ((B1 + B2 + B3 + ...) - (A1 + A2 + A3 + ...)) / (B1 + B2 + B3 + ...). Can I compute this aggregate, and if so, how? (I think this is a special case of a weighted average, which I believe I've seen referenced, but I haven't found a way to compute it.)
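Writing C_i = (B_i - A_i)/B_i for the per-record value (assuming every B_i and the total of B are nonzero), the overall C is indeed a weighted average of the per-record values, with each record weighted by its share of the total B:

```latex
C_{\text{overall}}
  = \frac{\sum_i B_i - \sum_i A_i}{\sum_j B_j}
  = \sum_i \frac{B_i - A_i}{\sum_j B_j}
  = \sum_i \frac{B_i}{\sum_j B_j} \cdot \frac{B_i - A_i}{B_i}
  = \sum_i w_i C_i,
\qquad w_i = \frac{B_i}{\sum_j B_j}, \quad \sum_i w_i = 1
```

For example, with made-up values A = (1, 2) and B = (2, 8), the per-record values are C_1 = 0.5 and C_2 = 0.75, whose plain mean is 0.625. The overall C is (10 - 3)/10 = 0.7, which equals 0.2 * 0.5 + 0.8 * 0.75. So averaging the per-row C column gives the wrong answer; summing A and B first gives the right one.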



    ChrisNelson Member Posts: 11 Contributor II
    So, I found Generate Attributes as the means to create C = f(A,B). I'm still looking for a summary function for the overall C.
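    For reference, here is a minimal sketch of that per-row step as a RapidMiner 5.3 process, modeled on the process XML MariusHelf posted in this thread. It assumes demo data from Generate Data, with att1 and att2 renamed to A and B as stand-ins for real columns; the operator names "Rename A" and "Rename B" are hypothetical. Generate Attributes then adds C = (B-A)/B to every example:

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="161" width="681">
          <!-- Demo data only: att1 and att2 stand in for the real columns A and B -->
          <operator activated="true" class="generate_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename A" width="90" x="179" y="30">
            <parameter key="old_name" value="att1"/>
            <parameter key="new_name" value="A"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename B" width="90" x="313" y="30">
            <parameter key="old_name" value="att2"/>
            <parameter key="new_name" value="B"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <!-- Row-wise formula: adds a new attribute C to each example -->
          <operator activated="true" class="generate_attributes" compatibility="5.3.000" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="30">
            <list key="function_descriptions">
              <parameter key="C" value="(B-A)/B"/>
            </list>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Rename A" to_port="example set input"/>
          <connect from_op="Rename A" from_port="example set output" to_op="Rename B" to_port="example set input"/>
          <connect from_op="Rename B" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    ```

    Generate Data produces random values, so some B values may be close to zero; C only becomes meaningful once Generate Data and the two Rename operators are replaced by your own data source.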
    awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn

    Use Generate Aggregation to create new attributes based on a function applied to attributes within a single example of an example set.


    ChrisNelson Member Posts: 11 Contributor II
    In 5.2.008 on Linux, the help for Generate Aggregation lists many more parameters than the parameter form above it shows. Notably, "attribute" is missing. I see only:

    * attribute name
    * attribute filter type
    * invert selection
    * include special attributes
    * aggregation function
    * keep all
    * ignore missings

    Is this a bug or am I missing a step?
    ChrisNelson Member Posts: 11 Contributor II
    When I pick "single" or "subset" instead of "all" as the filter type, a new control appears that lets me pick attributes. But Generate Aggregation still seems to work across a single row. What I need is something that works down the columns; perhaps that means creating new metadata?

    I can use Generate Attributes to compute C = (B-A)/A for each row.
    But I need to calculate ((B1 + B2 + B3) - (A1 + A2 + A3)) / (A1 + A2 + A3).
    I guess I'm adding a new row, not a new column. How do I do that?
    MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn

    Use Aggregate (not Generate Aggregation) to create column-wise sum aggregations. You will end up with a new example set (a single row of sums, since no grouping is used), to which you can apply the formula (sum(A)-sum(B))/sum(A).
    Please note that Generate Attributes cannot handle attribute names containing parentheses, and Aggregate names its results like sum(att1), so you have to rename the aggregation attributes first. Please have a look at the attached process.

    Happy Mining!
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="161" width="681">
          <operator activated="true" class="generate_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="aggregate" compatibility="5.3.000" expanded="true" height="76" name="Aggregate" width="90" x="179" y="30">
            <list key="aggregation_attributes">
              <parameter key="att1" value="sum"/>
              <parameter key="att2" value="sum"/>
            </list>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename" width="90" x="313" y="30">
            <parameter key="old_name" value="sum(att1)"/>
            <parameter key="new_name" value="sum_att1"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename (2)" width="90" x="447" y="30">
            <parameter key="old_name" value="sum(att2)"/>
            <parameter key="new_name" value="sum_att2"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.000" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="30">
            <list key="function_descriptions">
              <parameter key="value" value="(sum_att1-sum_att2) / sum_att2"/>
            </list>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
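    The process above works on the demo attributes att1 and att2. Assuming att1 stands in for A and att2 for B, the overall C as defined in the original question, ((B1 + B2 + ...) - (A1 + A2 + ...)) / (B1 + B2 + ...), should only need a different expression in the final Generate Attributes operator; Aggregate and the two Rename operators stay as they are. The result attribute name C_overall is a hypothetical choice. A sketch of the replacement operator:

    ```xml
    <!-- Overall C = (sum of B - sum of A) / sum of B, with att1 as A and att2 as B -->
    <operator activated="true" class="generate_attributes" compatibility="5.3.000" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="30">
      <list key="function_descriptions">
        <parameter key="C_overall" value="(sum_att2-sum_att1) / sum_att2"/>
      </list>
    </operator>
    ```

    Keeping the operator name "Generate Attributes" means the existing connect lines still apply unchanged.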