
"Aggregate samples by cluster name and create average sample per cluster"

smarie Member Posts: 3 Contributor I
edited June 2019 in Help
Hello Rapidminers,

I was wondering how to simply display the results of a clustering. In particular, I would love to see the average sample of each cluster and display all of them in the same window. I have found several ways to do that, but none is satisfactory:

- a) Use the "Aggregate" operator with GroupBy="cluster" and add all 100 attributes, one by one, in the "aggregation attributes" parameter (I can't really do this!! :) ). I haven't found any wildcard there to say that I want all real-valued attributes to be averaged.

- b) Use the "Multiply" operator as many times as needed (once per cluster). In each branch, filter on the "cluster" attribute so that the branch contains only the subset of the sample set corresponding to "cluster_0", "cluster_1", and so on. Finally, transpose the sample set and use the "Generate Aggregation" operator so that a new attribute is created that is the average of all the others. Since the sample set has been transposed, this new attribute is actually the new sample (the average of the samples in that cluster).
> issue: now I have x different sample sets (one for each cluster), and it seems that there is no operator to put all of them together in a new sample set.


Is there an easy way to solve this problem, which is really just an average of all rows belonging to each cluster group? Maybe with the R plugin?
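To make the target computation concrete, here is a minimal sketch in plain Java of "one averaged row per cluster". It does not use the RapidMiner API: the class name `ClusterAverages`, the method `averageByCluster`, and the array-based input layout are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of RapidMiner: averages rows per cluster label.
public class ClusterAverages {

    /**
     * Returns one averaged row per cluster label.
     * clusters[i] is the label of rows[i]; all rows are assumed to have the same length.
     */
    public static Map<String, double[]> averageByCluster(String[] clusters, double[][] rows) {
        Map<String, double[]> sums = new LinkedHashMap<>();
        Map<String, Integer> counts = new LinkedHashMap<>();

        // First pass: accumulate per-cluster column sums and row counts.
        for (int i = 0; i < rows.length; i++) {
            double[] row = rows[i];
            double[] sum = sums.computeIfAbsent(clusters[i], k -> new double[row.length]);
            for (int j = 0; j < row.length; j++) {
                sum[j] += row[j];
            }
            counts.merge(clusters[i], 1, Integer::sum);
        }

        // Second pass: turn each sum vector into a mean vector in place.
        for (Map.Entry<String, double[]> entry : sums.entrySet()) {
            int n = counts.get(entry.getKey());
            double[] sum = entry.getValue();
            for (int j = 0; j < sum.length; j++) {
                sum[j] /= n;
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] clusters = { "cluster_0", "cluster_0", "cluster_1" };
        double[][] rows = { { 1.0, 2.0 }, { 3.0, 4.0 }, { 10.0, 20.0 } };

        Map<String, double[]> means = averageByCluster(clusters, rows);
        for (Map.Entry<String, double[]> entry : means.entrySet()) {
            System.out.println(entry.getKey() + " -> " + Arrays.toString(entry.getValue()));
        }
        // prints:
        // cluster_0 -> [2.0, 3.0]
        // cluster_1 -> [10.0, 20.0]
    }
}
```

All cluster averages end up in a single map, which is the "all of them in the same window" result that approach b) cannot assemble.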
Any help would be very much appreciated

Cheers

Sylvain

Answers

  • smarie Member Posts: 3 Contributor I
    Hi,

    I have thought of another way: maybe I can use a "Script" operator to generate the correct inputs (the list of all real-valued attributes) and then pass them to the "Aggregate" operator. I came up with the following Java code to use in the "Script" operator, but I still have to dig into the Java sources to understand how to correctly pass the parameters and trigger the "Aggregate" operator from there.

    Do you think this has a chance of working?

    ExampleSet exampleSet = operator.getInput(ExampleSet.class);

    // getParameterAsBoolean(PARAMETER_ONLY_DISTINCT)
    boolean onlyDistinctValues = false;

    // getParameterAsBoolean(PARAMETER_IGNORE_MISSINGS)
    boolean ignoreMissings = false;

    /*
     * we create a list of tuples (attribute_name, "average") for all
     * real-valued attributes
     */
    List<String[]> parameterList = new ArrayList<String[]>();
    Attributes attributes = exampleSet.getAttributes();
    Iterator<Attribute> r = attributes.allAttributes();
    while (r.hasNext()) {
        Attribute attribute = r.next();
        if (Ontology.ATTRIBUTE_VALUE_TYPE.isA(attribute.getValueType(), Ontology.REAL)) {
            parameterList.add(new String[] { attribute.getName(), "average" });
        }
    }

    Operator aggOperator = new AggregationOperator (TODO....)

    return aggOperator.apply();

    Thanks in advance for your help
    Best regards

    Sylvain
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    if it's a centroid-based clustering, this is already shown in the result screen.

    Otherwise I must admit, you have a problem. Well, I wasn't aware that this isn't possible. Please write a feature request for that in the bug tracker.

    Greetings,
    Sebastian
  • chikaya Member Posts: 5 Contributor II
    I know it's an old thread, but anyway, I would vote for this feature too; as Sebastian says, it is already implemented in the centroid-based operators. I work with Spectra and would like to see the "Centroids" when I use other clustering operators!