"How to get the standard deviation from clustered data?"

stever1k Member Posts: 10 Contributor II
edited May 2019 in Help

After clustering my data, it has the following format:

id A B C Cluster
a x y z  0
.. .... .... 1
.. .... .... 1
.. .... .... 2
.. .... .... 0
.. .... ....
.. .... .... N

So the clustering algorithm found several clusters and created a new column holding the cluster attribute. I now want to calculate the standard deviation of the attributes A, B and C for cluster 0, and the same for cluster 1 up to cluster N. Any ideas how to do this?



    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Stever,
    this is a typical use case for the Aggregation operator. You can group the examples by cluster and then calculate an aggregation function over each attribute. I have done this in the following process:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="gaussian mixture clusters"/>
        </operator>
        <operator name="KMeans" class="KMeans">
            <parameter key="k" value="3"/>
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="cluster"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="att1" value="standard_deviation"/>
              <parameter key="att2" value="standard_deviation"/>
              <parameter key="att3" value="standard_deviation"/>
              <parameter key="att4" value="standard_deviation"/>
              <parameter key="att5" value="standard_deviation"/>
            </list>
            <parameter key="group_by_attributes" value="cluster"/>
        </operator>
    </operator>
    It should be easy to adapt it to your needs.
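
For anyone who wants to cross-check the numbers outside RapidMiner, the same group-then-aggregate idea can be sketched in plain Python. This is only an illustration: the sample rows, the function name `std_by_cluster` and the attribute names are made up for the example, and it uses the sample standard deviation (dividing by n-1), which may or may not be the convention the Aggregation operator applies.

```python
from collections import defaultdict
from statistics import stdev

# Hypothetical rows in the layout from the question: (id, A, B, C, cluster)
rows = [
    ("a", 1.0, 2.0, 3.0, 0),
    ("b", 3.0, 2.0, 5.0, 0),
    ("c", 10.0, 20.0, 30.0, 1),
    ("d", 14.0, 20.0, 36.0, 1),
]


def std_by_cluster(rows, attribute_names=("A", "B", "C")):
    """Group rows by cluster label, then compute the sample
    standard deviation of every attribute within each group."""
    groups = defaultdict(list)
    for _id, *values, cluster in rows:
        groups[cluster].append(values)

    result = {}
    for cluster, members in groups.items():
        # zip(*members) turns the list of rows into one tuple per attribute
        columns = zip(*members)
        result[cluster] = {
            name: stdev(column)
            for name, column in zip(attribute_names, columns)
        }
    return result


if __name__ == "__main__":
    for cluster, stats in sorted(std_by_cluster(rows).items()):
        print(cluster, stats)
```

Note that `statistics.stdev` needs at least two values, so a cluster with a single member would raise an error here; `statistics.pstdev` (population standard deviation) could be swapped in if that convention fits better.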

    stever1k Member Posts: 10 Contributor II
    Thanks a lot Sebastian, that is EXACTLY what I was looking for. My problem was that I was searching for a suitable operator in the preprocessing->attributes tree instead of the OLAP one!

    best regards,