
# How can I compute/derive additional attributes?

Member Posts: 11 Contributor II
edited November 2018 in Help
I have a dataset with two columns, A and B. For each record, I want to compute C = (B-A)/B. Can I do that with RapidMiner's transformations? Can you point me to the right one?

I also want to compute the overall C, that is ((B1 + B2 + B3 ...) - (A1 + A2 + A3 ...)) / (B1 + B2 + B3 ...). Can I compute this aggregate, and if so, how? (I think this is a special case of a weighted average, which I've seen mentioned but haven't found a way to compute.)
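The weighted-average hunch can be checked with a short derivation. Assuming every B_i is nonzero (so the per-row C_i is defined), the overall C is the average of the per-row C_i weighted by B_i:

```latex
C_i = \frac{B_i - A_i}{B_i}, \qquad
C_{\text{all}} = \frac{\sum_i (B_i - A_i)}{\sum_i B_i}
  = \sum_i \frac{B_i}{\sum_j B_j} \cdot \frac{B_i - A_i}{B_i}
  = \sum_i w_i \, C_i,
\qquad w_i = \frac{B_i}{\sum_j B_j}, \quad \sum_i w_i = 1.
```

In practice this means the overall value only needs the column sums of A and B, not the per-row C values.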

Thanks.

Member Posts: 11 Contributor II
So, I found Generate Attributes as the way to create C = f(A,B). I'm still looking for a way to compute the summary value.
RapidMiner Certified Expert, Member Posts: 458 Unicorn
Hello

Use Generate Aggregation to create new attributes by applying a function to several attributes within a single example of an example set.

Regards

Andrew
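What Andrew describes is a row-wise operation: each example gets one new value computed from several of its own attributes, and the number of rows stays the same. Below is a minimal plain-Python sketch of that behavior, not RapidMiner code; the helper name `aggregate_rows` and the new attribute name `sum_AB` are hypothetical.

```python
# Row-wise aggregation sketch (plain Python, not RapidMiner's API).
# Each example (row) gets one new value computed from several of its own
# attributes; the number of rows does not change.

def aggregate_rows(rows, attributes, func, new_name):
    """Return copies of `rows`, each with `new_name` = func(values of `attributes`)."""
    result = []
    for row in rows:
        values = [row[a] for a in attributes]  # values within this single example
        new_row = dict(row)                    # copy so the input is not modified
        new_row[new_name] = func(values)
        result.append(new_row)
    return result

rows = [{"A": 1, "B": 2}, {"A": 2, "B": 3}, {"A": 4, "B": 1}]
summed = aggregate_rows(rows, ["A", "B"], sum, "sum_AB")
for row in summed:
    print(row)
# {'A': 1, 'B': 2, 'sum_AB': 3}
# {'A': 2, 'B': 3, 'sum_AB': 5}
# {'A': 4, 'B': 1, 'sum_AB': 5}
```

Because the output still has one row per input example, this kind of aggregation cannot by itself produce a single summary value across all examples.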
Member Posts: 11 Contributor II
In 5.2.008 on Linux, the help for Generate Aggregation lists many more parameters than the operator's parameter panel shows. Notably, "attribute" is missing. I see only:

* attribute name
* attribute filter type
* invert selection
* include special attributes
* aggregation function
* keep all
* ignore missings

Is this a bug or am I missing a step?
Member Posts: 11 Contributor II
When I pick "single" or "subset" instead of "all" as the filter type, a new control appears that lets me pick attributes. But Generate Aggregation then appears to work across a single row. What I need is something that works down the columns. Perhaps this amounts to creating new metadata?

Given:

| A | B |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 4 | 1 |

I can use Generate Attributes to compute C = (B-A)/A for each row:

| A | B | C |
|---|---|---|
| 1 | 2 | 1.0 |
| 2 | 3 | 0.5 |
| 4 | 1 | -0.75 |

But I need to calculate ((B1 + B2 + B3) - (A1 + A2 + A3)) / (A1 + A2 + A3):

| A | B | C |
|---|---|---|
| 1 | 2 | 1.0 |
| 2 | 3 | 0.5 |
| 4 | 1 | -0.75 |
| 7 | 6 | **-0.14** |
I guess I'm adding a new row, not a new column.  How do I do that?
RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
Hi,

use Aggregate (not Generate Aggregation) to compute column-wise sums. You will end up with a new example set containing a single row, on which you can apply the formula (sum(B)-sum(A))/sum(A).
Please note that Generate Attributes cannot handle attribute names containing parentheses, so you have to rename the aggregated attributes first (e.g. sum(att1) to sum_att1). Please have a look at the attached process, in which att1 and att2 from Generate Data stand in for A and B.

Happy Mining!
~Marius
```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
    <process expanded="true" height="161" width="681">
      <operator activated="true" class="generate_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
      <operator activated="true" class="aggregate" compatibility="5.3.000" expanded="true" height="76" name="Aggregate" width="90" x="179" y="30">
        <list key="aggregation_attributes">
          <parameter key="att1" value="sum"/>
          <parameter key="att2" value="sum"/>
        </list>
      </operator>
      <operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename" width="90" x="313" y="30">
        <parameter key="old_name" value="sum(att1)"/>
        <parameter key="new_name" value="sum_att1"/>
        <list key="rename_additional_attributes"/>
      </operator>
      <operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename (2)" width="90" x="447" y="30">
        <parameter key="old_name" value="sum(att2)"/>
        <parameter key="new_name" value="sum_att2"/>
        <list key="rename_additional_attributes"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.3.000" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="30">
        <list key="function_descriptions">
          <parameter key="value" value="(sum_att2-sum_att1) / sum_att1"/>
        </list>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
      <connect from_op="Rename (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
```
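The Aggregate, Rename, and Generate Attributes chain in the process above can be mimicked in plain Python to check the arithmetic on the example table from this thread. This is a sketch under stated assumptions, not RapidMiner's API: the dict-of-columns layout and all variable names are illustrative, and the `sum(...)` naming only imitates the names that the process's Rename operators expect.

```python
# Column-wise aggregation sketch mirroring Aggregate -> Rename -> Generate Attributes.
# Plain Python; the data are the A/B values from the example table in this thread.

columns = {"A": [1, 2, 4], "B": [2, 3, 1]}

# Step 1 ("Aggregate"): collapse each column to its sum. The result is a
# single summary "row" whose attribute names look like "sum(A)".
aggregated = {f"sum({name})": sum(values) for name, values in columns.items()}

# Step 2 ("Rename"): RapidMiner's Generate Attributes cannot parse names with
# parentheses, so turn "sum(A)" into "sum_A". (Python itself does not care.)
renamed = {name.replace("(", "_").rstrip(")"): value
           for name, value in aggregated.items()}

# Step 3 ("Generate Attributes"): apply the formula to the summary row:
# (sum(B) - sum(A)) / sum(A)
renamed["value"] = (renamed["sum_B"] - renamed["sum_A"]) / renamed["sum_A"]

print(round(renamed["value"], 2))  # -0.14
```

The result is a separate one-row table rather than an extra row appended to the original data, which matches the new example set that the Aggregate operator produces in the process.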