"Rank order of attributes to each cluster"

dorona Member Posts: 1 Contributor I
edited May 2019 in Help
I have just started to play around with clustering, and using RapidMiner I was able to get results. My problem now is how to characterize each cluster. Is there a way to get RapidMiner to output, for each cluster, a rank-ordered list of the attributes that best describe it?
In addition, it would be great to have an actual value of contribution to the model and a statistic to measure its statistical significance as well.



    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Yes, this is possible with RapidMiner. After clustering, each example in the input data set is assigned a cluster id. You can then use the new operator "AttributeConstruction" (which will replace the operator FeatureGeneration in future releases) together with the new ValueIterator operator: for each cluster value, construct a label that separates that cluster from all others and weight the attributes against this label with Relief. The whole setup looks like this:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="number_examples" value="200"/>
            <parameter key="number_of_attributes" value="10"/>
            <parameter key="target_function" value="gaussian mixture clusters"/>
        </operator>
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="KMeans" class="KMeans">
            <parameter key="k" value="5"/>
        </operator>
        <!-- Drop the cluster model; only the clustered example set is needed below. -->
        <operator name="IOConsumer" class="IOConsumer">
            <parameter key="io_object" value="ClusterModel"/>
        </operator>
        <!-- Loop over every value of the "cluster" attribute; the current value is available as %{loop_value}. -->
        <operator name="ValueIterator" class="ValueIterator" expanded="yes">
            <parameter key="attribute" value="cluster"/>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <!-- Build a "this cluster vs. other" attribute and use it as the label for Relief. -->
                <operator name="AttributeConstruction" class="AttributeConstruction">
                    <list key="function_descriptions">
                      <parameter key="inner_label_%{loop_value}" value="if (cluster == &quot;%{loop_value}&quot;, &quot;%{loop_value}&quot;, &quot;other&quot;)"/>
                    </list>
                </operator>
                <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
                    <parameter key="name" value="inner_label_%{loop_value}"/>
                    <parameter key="target_role" value="label"/>
                </operator>
                <operator name="Relief" class="Relief">
                </operator>
                <operator name="IOConsumer (2)" class="IOConsumer">
                    <parameter key="io_object" value="ExampleSet"/>
                </operator>
            </operator>
        </operator>
    </operator>
    Please note that both new operators are only available in the latest CVS version of RapidMiner; otherwise you will have to wait for the next release. By the way, this is also possible with older versions, but the process is much more complicated.
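
For readers who want to see the idea behind the process outside RapidMiner, here is a rough sketch in plain Python. It is not RapidMiner code and does not claim to reproduce what the Relief operator does internally: it implements the basic two-class Relief scheme (nearest hit and nearest miss, differences scaled by attribute range) and applies it once per cluster with a "this cluster vs. other" label, as the loop above does. The function names `relief_weights` and `rank_attributes_per_cluster` and the toy data are made up for illustration.

```python
def relief_weights(rows, labels):
    """Basic two-class Relief over numeric attributes (illustrative sketch).

    An attribute gains weight when it differs on the nearest example of the
    other class (miss) and loses weight when it differs on the nearest
    example of the same class (hit).
    """
    n_attrs = len(rows[0])
    # Scale differences by each attribute's range so attributes are comparable.
    spans = []
    for a in range(n_attrs):
        column = [row[a] for row in rows]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(a, x, y):
        return abs(x[a] - y[a]) / spans[a]

    def distance(x, y):
        return sum(diff(a, x, y) for a in range(n_attrs))

    weights = [0.0] * n_attrs
    used = 0
    for i, row in enumerate(rows):
        hits = [j for j in range(len(rows)) if j != i and labels[j] == labels[i]]
        misses = [j for j in range(len(rows)) if labels[j] != labels[i]]
        if not hits or not misses:
            continue  # cannot score this example without both a hit and a miss
        hit = min(hits, key=lambda j: distance(row, rows[j]))
        miss = min(misses, key=lambda j: distance(row, rows[j]))
        for a in range(n_attrs):
            weights[a] += diff(a, row, rows[miss]) - diff(a, row, rows[hit])
        used += 1
    return [w / used for w in weights] if used else weights


def rank_attributes_per_cluster(rows, cluster_ids, attr_names):
    """For each cluster, weight attributes against a one-vs-rest label."""
    ranking = {}
    for cluster in sorted(set(cluster_ids)):
        # True = "this cluster", False = "other" (mirrors the constructed inner label).
        labels = [cid == cluster for cid in cluster_ids]
        weights = relief_weights(rows, labels)
        ranking[cluster] = sorted(zip(attr_names, weights),
                                  key=lambda pair: pair[1], reverse=True)
    return ranking


# Toy data (made up): c1 stands out in x, c2 stands out in y, z is unrelated.
rows = [
    (0, 0, 0), (1, 0, 1), (0, 1, 2),      # c0
    (10, 0, 0), (9, 0, 1), (10, 1, 2),    # c1
    (0, 10, 0), (1, 10, 1), (0, 9, 2),    # c2
]
cluster_ids = ["c0"] * 3 + ["c1"] * 3 + ["c2"] * 3
ranking = rank_attributes_per_cluster(rows, cluster_ids, ["x", "y", "z"])

for cluster, ranked in ranking.items():
    print(cluster, [(name, round(weight, 3)) for name, weight in ranked])
```

On this toy data, x comes out on top for c1 and y for c2, while z ends up last for both. Note that the weights are only a relative ranking; they are not a significance test.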
