How do I smooth by bin means?

JamisonW Member Posts: 2 Contributor I
edited November 2018 in Help
For an assignment, I need to use smoothing by bin means: you sort the values, create bins of the same size, and replace each value with the mean of its bin. I'm having a tough time finding this feature. Discretization is the only section that discusses binning, and I didn't see anything dealing with means in the transformations section. Does RapidMiner support this?
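To make the definition concrete, here is a minimal sketch in plain Python (not a RapidMiner process). The function name `smooth_by_bin_means` and the sample numbers are illustrative assumptions, and the last bin is simply allowed to be shorter when the data does not divide evenly.

```python
def smooth_by_bin_means(values, bin_size):
    """Replace each value with the mean of its equal-size bin (hypothetical helper)."""
    # Sort positions by value so that each bin holds neighbouring values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    for start in range(0, len(order), bin_size):
        positions = order[start:start + bin_size]  # the last bin may be shorter
        mean = sum(values[i] for i in positions) / len(positions)
        for i in positions:
            smoothed[i] = mean  # results stay in the original row order
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

With three values per bin, the bins are (4, 8, 15), (21, 21, 24) and (25, 28, 34), so every value is replaced by 9, 22 or 29.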

After searching a bit, I've only seen this technique mentioned in academic papers and presentations. Is this not a common technique for professionals? What is a more preferred smoothing approach?



    Skirzynski Member Posts: 164 Maven
    I didn't find such an operator either (which does not mean there isn't one), but I have found a workaround that may help you:
    • Copy the attribute you want to smooth with the "Generate Attributes" operator
    • Apply your preferred discretization to the copied attribute
    • Aggregate with the copied attribute as the grouping attribute and the average function applied to the original attribute
    Here is an example process for this:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.009">
      <operator activated="true" class="process" compatibility="5.2.009" expanded="true" name="Process">
        <process expanded="true" height="558" width="696">
          <operator activated="true" class="generate_data" compatibility="5.2.009" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="sum"/>
            <parameter key="number_examples" value="10"/>
            <parameter key="number_of_attributes" value="1"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.009" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="30">
            <list key="function_descriptions">
              <parameter key="att1_group" value="att1"/>
            </list>
          </operator>
          <operator activated="true" class="discretize_by_bins" compatibility="5.2.009" expanded="true" height="94" name="Discretize" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="att1_group"/>
            <parameter key="number_of_bins" value="5"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.2.009" expanded="true" height="76" name="Aggregate" width="90" x="447" y="30">
            <list key="aggregation_attributes">
              <parameter key="att1" value="average"/>
            </list>
            <parameter key="group_by_attributes" value="|att1_group"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Discretize" to_port="example set input"/>
          <connect from_op="Discretize" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
          <connect from_op="Aggregate" from_port="original" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Now you can remove the copied attribute with the "Select Attributes" operator, and what remains is the bin mean of the original attribute.  8)
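    For readers who want to see the logic outside RapidMiner, the discretize-then-aggregate idea can be sketched in plain Python. This is only an approximation: the helper name `equal_width_bin_means` is made up for illustration, and the exact way RapidMiner assigns values on a bin boundary is an assumption here.

```python
from collections import defaultdict


def equal_width_bin_means(values, number_of_bins):
    """Average the values falling into each of number_of_bins equal-width ranges."""
    low, high = min(values), max(values)
    width = (high - low) / number_of_bins
    groups = defaultdict(list)
    for v in values:
        # Assumption: the maximum goes into the last bin; width == 0 means one bin.
        index = 0 if width == 0 else min(int((v - low) / width), number_of_bins - 1)
        groups[index].append(v)
    # One average per non-empty bin, like an average aggregation grouped by bin.
    return {index: sum(vs) / len(vs) for index, vs in sorted(groups.items())}


print(equal_width_bin_means([1, 2, 3, 10, 11, 20], 2))
# {0: 4.0, 1: 15.5}
```

    Note that the result has one row per bin rather than one per original example, and that the bins are ranges of equal width, so they need not contain the same number of values.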

    It isn't very elegant, especially if you want to smooth more than one attribute, but maybe it is sufficient for your needs. I will ask around for another way to accomplish this.
    JamisonW Member Posts: 2 Contributor I
    Thanks Marcn,

    That got me on the right track! I had to do one extra step to join the averages back into the original set.

    My bins are still not coming out the same as in Excel, so I'll need to review. I think the difference is that in Excel I created a bin every four rows whereas RapidMiner is creating ranges for the bins. This leads to some bins having 3 and some having 5 items. To resolve this I'm looking into sorting by my value and adding a row count column (can RM do this?). The row count column will become my field to discretize.
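    A small plain-Python sketch of why the counts differ (the function names and sample data are illustrative assumptions, not RapidMiner operators): range-based bins depend on how the values are spread, while row-based bins always hold a fixed number of rows.

```python
from collections import Counter


def equal_width_counts(values, number_of_bins):
    """Count values per equal-width range (assumes the values are not all equal)."""
    low, high = min(values), max(values)
    width = (high - low) / number_of_bins
    return Counter(min(int((v - low) / width), number_of_bins - 1) for v in values)


def fixed_size_counts(values, bin_size):
    """Count values per bin when a new bin starts every bin_size sorted rows."""
    return Counter(row // bin_size for row, _ in enumerate(sorted(values)))


data = [1, 2, 3, 4, 5, 10, 11, 20]
print(sorted(equal_width_counts(data, 2).items()))  # [(0, 6), (1, 2)]
print(sorted(fixed_size_counts(data, 4).items()))   # [(0, 4), (1, 4)]
```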

    I found "a" solution.
    1. Sort by Value
    2. Generate Id (this will be a row number based on the sort)
    3. Set Role of new Id to Regular
    4. Discretize by Size on Id from #2
    5. Multiply (to get a second copy of the example set)
    6. Aggregate values from #1 grouped by the discretized Id from #4
    7. Join original to #6

    You now have a data set in which each value is paired with its bin mean.
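    The seven steps above can be mirrored roughly in plain Python. This is a sketch under assumptions rather than the RapidMiner process itself: the column names `id`, `bin` and `bin_mean` are hypothetical, and the join is done on the bin label.

```python
from collections import defaultdict


def bin_means_pipeline(values, bin_size):
    # 1. Sort by value.
    rows = [{"value": v} for v in sorted(values)]
    # 2-3. Generate a row-number id and treat it as an ordinary column.
    for row_number, row in enumerate(rows):
        row["id"] = row_number
    # 4. Discretize the id so that every bin holds bin_size rows.
    for row in rows:
        row["bin"] = row["id"] // bin_size
    # 5-6. Aggregate a copy of the data: average value per bin.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["bin"]].append(row["value"])
    bin_mean = {b: sum(vs) / len(vs) for b, vs in grouped.items()}
    # 7. Join the averages back onto the original rows by bin.
    for row in rows:
        row["bin_mean"] = bin_mean[row["bin"]]
    return rows


for row in bin_means_pipeline([7, 1, 5, 3, 9, 11, 13, 15], 4):
    print(row["value"], row["bin"], row["bin_mean"])
```

    With four rows per bin, the values 1, 3, 5, 7 get the bin mean 4.0 and the values 9, 11, 13, 15 get 12.0.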
