How can I compute/derive additional attributes?

ChrisNelson
edited November 2018
I have a dataset with two columns, A and B. For each record, I want to compute C = (B-A)/B. Can I do that with RapidMiner transformations? Can you direct me to the right one?

I also want to compute the overall C. That is ((B1 + B2 + B3 ...) - (A1 + A2 + A3 ...)) / (B1 + B2 + B3 ...). Can I compute this aggregate function? How? (I think this is a special case of a weighted average, which I think I've seen referenced, but I haven't found the method for computing it.)
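
As a rough illustration of the arithmetic in the question (plain Python, not a RapidMiner process), the sketch below computes the per-record C = (B-A)/B and the overall C from the column sums, and checks that the overall value equals the average of the per-record C weighted by B. The sample values and variable names are made up for illustration.

```python
# Hypothetical sample values for columns A and B (not from the thread).
A = [2.0, 3.0, 5.0]
B = [4.0, 6.0, 20.0]

# Per-record value: C_i = (B_i - A_i) / B_i
C = [(b - a) / b for a, b in zip(A, B)]

# Overall value computed from the column sums.
overall = (sum(B) - sum(A)) / sum(B)

# The same quantity written as a weighted average of C_i with weights B_i.
weighted = sum(c * b for c, b in zip(C, B)) / sum(B)

print(C)
print(overall, weighted)
```

With these sample values the plain (unweighted) mean of C differs from the overall value, which is why the columns have to be summed first.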



    ChrisNelson
    So, I found Generate Attributes as the means to create C = f(A,B). I'm still looking for a summary function.
    awchisholm

    Use Generate Aggregation to create new attributes based on a function applied to attributes within a single example of an example set.


    ChrisNelson
    In 5.2.008 on Linux, the help for Generate Aggregation lists many more parameters than the form above it shows. Notably missing is "attribute". I see only:

    * attribute name
    * attribute filter type
    * invert selection
    * include special attributes
    * aggregation function
    * keep all
    * ignore missings

    Is this a bug or am I missing a step?
    ChrisNelson
    When I pick "single" or "subset" instead of "all" for the filter type, a new control appears that allows me to pick attributes. But then it appears that Generate Aggregation works across a single row. What I need is something that works down the columns. Perhaps this means creating new meta data?

    I can use Generate Attributes to compute C = (B-A)/A for each row.
    But I need to calculate ((B1 + B2 + B3) - (A1 + A2 + A3)) / (A1 + A2 + A3).
    I guess I'm adding a new row, not a new column.  How do I do that?
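
The row-versus-column distinction raised here can be sketched in plain Python (not RapidMiner). The row-wise function adds an attribute C to every example, while the column-wise function collapses all examples into one summary row before applying the same formula, (B-A)/A. The data, the function names, and the `sum_A`/`sum_B` keys are hypothetical.

```python
# Hypothetical example set: each dict is one example (row).
rows = [
    {"A": 2.0, "B": 3.0},
    {"A": 4.0, "B": 5.0},
    {"A": 4.0, "B": 7.0},
]

def add_row_wise_c(examples):
    """Row-wise: return copies of the rows with a new attribute C = (B - A) / A."""
    return [dict(r, C=(r["B"] - r["A"]) / r["A"]) for r in examples]

def column_wise_c(examples):
    """Column-wise: sum each column into one summary row, then apply the formula."""
    sum_a = sum(r["A"] for r in examples)
    sum_b = sum(r["B"] for r in examples)
    return {"sum_A": sum_a, "sum_B": sum_b, "C": (sum_b - sum_a) / sum_a}

with_c = add_row_wise_c(rows)   # same number of rows, one extra column
summary = column_wise_c(rows)   # a single row of aggregates

print(with_c)
print(summary)
```

The row-wise result keeps one entry per example, whereas the column-wise result is a new, one-row data set rather than an extra row appended to the original.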
    MariusHelf

    Use Aggregate (not Generate Aggregation) to create column-wise sum aggregations. You will end up with a new example set, on which you can apply the formula (sum(A)-sum(B))/sum(A).
    Please note that Generate Attributes cannot handle attributes with parentheses in their names, so you have to rename the aggregation attributes. Please have a look at the attached process.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="161" width="681">
          <operator activated="true" class="generate_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="aggregate" compatibility="5.3.000" expanded="true" height="76" name="Aggregate" width="90" x="179" y="30">
            <list key="aggregation_attributes">
              <parameter key="att1" value="sum"/>
              <parameter key="att2" value="sum"/>
            </list>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename" width="90" x="313" y="30">
            <parameter key="old_name" value="sum(att1)"/>
            <parameter key="new_name" value="sum_att1"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename (2)" width="90" x="447" y="30">
            <parameter key="old_name" value="sum(att2)"/>
            <parameter key="new_name" value="sum_att2"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.000" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="30">
            <list key="function_descriptions">
              <parameter key="value" value="(sum_att1-sum_att2) / sum_att2"/>
            </list>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
