
"Help With Cluster Output"

hgwelec Member Posts: 31 Maven
edited May 2019 in Help
Hello to Rapid-I Team,



One quick question :


I have a dataset which consists of Age, Number of children, Income, etc. I am running K-means on the dataset and everything works OK. However, I would like to get output in the following format:

Cluster 0: Age: 22.5, Income: 1225, Children: 0.25
Cluster 1: Age: 34.2, Income: 2300, Children: 2



Can RM output such information, or can it only provide centroid distances?



Thanks!

Answers

  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi,
    hgwelec wrote:

    I have a dataset which consists of Age, Number of children, Income, etc. I am running K-means on the dataset and everything works OK. However, I would like to get output in the following format:

    Cluster 0: Age: 22.5, Income: 1225, Children: 0.25
    Cluster 1: Age: 34.2, Income: 2300, Children: 2

    Can RM output such information, or can it only provide centroid distances?
    Unfortunately, I do not really understand what the problem is here. When I run KMeans I get a lot of information, including a cluster centroid table like the one you want to see, but I do not see any information about distances. Information like the above is contained in the Centroid Table view of the ClusterModel.

    Kind regards,
    Tobias
  • hgwelec Member Posts: 31 Maven
    Tobias,


    Thanks. I just can't output the ClusterCentroidModel because it gets "consumed" somewhere. Here is my XML setup:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="D:\MyDocuments\Analyzer\data-numeric-.csv"/>
            <parameter key="label_name" value="class"/>
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="KMeans" class="KMeans">
                <parameter key="k" value="3"/>
                <parameter key="max_runs" value="5"/>
            </operator>
            <operator name="ClusterModelWriter" class="ClusterModelWriter">
                <parameter key="cluster_model_file" value="D:\Programs\Rapid-I\rm_workspace\cluster.clm"/>
            </operator>
            <operator name="ClusterCentroidEvaluator" class="ClusterCentroidEvaluator">
                <parameter key="keep_example_set" value="true"/>
            </operator>
        </operator>
        <operator name="ClusterModelReader" class="ClusterModelReader">
            <parameter key="cluster_model_file" value="D:\Programs\Rapid-I\rm_workspace\cluster.clm"/>
        </operator>
    </operator>



    Is this the way to do it?



    Many Thanks
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi,
    hgwelec wrote:

    Thanks. I just can't output the ClusterCentroidModel because it gets "consumed" somewhere. Here is my XML setup:
    You can check this yourself by clicking on the operators in the operator tree and then pressing F1. In the operator help dialog the inputs and outputs are listed. Another way is to use breakpoints in the process and inspect the intermediate results.

    In principle, your process setup is right. Alternatively, you can use the IOMultiplier, which generates a copy of an object so that one copy is still available after the other has been consumed. Another way would be to use the IOStorer / IORetriever mechanism, which does not require the object to be saved to disk.

    Kind regards,
    Tobias
  • hgwelec Member Posts: 31 Maven
    That is great.


    Thank you very much Tobias
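
For readers who want to see what the Centroid Table discussed above contains, here is a minimal sketch outside RapidMiner. It assumes you already have one cluster id per example (for instance, exported from a clustered example set) and computes the per-cluster attribute means, printed in the format asked for in the question. The function names `centroid_table` and `format_centroids` and the three sample rows are hypothetical and only for illustration; they are not part of RapidMiner or of the original data.

```python
from collections import defaultdict


def centroid_table(rows, assignments):
    """Return {cluster_id: {attribute: mean}} for numeric rows.

    rows        -- list of dicts mapping attribute name to a number
    assignments -- list of cluster ids, one per row (assumed to be given)
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row, cluster in zip(rows, assignments):
        counts[cluster] += 1
        for name, value in row.items():
            sums[cluster][name] += value
    # A centroid is the mean of every attribute over the cluster's members.
    return {
        cluster: {name: total / counts[cluster]
                  for name, total in sums[cluster].items()}
        for cluster in sorted(sums)
    }


def format_centroids(table):
    """Render one 'Cluster i: Attr: value, ...' line per cluster."""
    lines = []
    for cluster, means in table.items():
        parts = ", ".join(f"{name}: {mean:g}" for name, mean in means.items())
        lines.append(f"Cluster {cluster}: {parts}")
    return lines


# Hypothetical sample data, not taken from the poster's CSV file.
rows = [
    {"Age": 22, "Income": 1200, "Children": 0},
    {"Age": 23, "Income": 1250, "Children": 1},
    {"Age": 34, "Income": 2300, "Children": 2},
]
assignments = [0, 0, 1]

lines = format_centroids(centroid_table(rows, assignments))
for line in lines:
    print(line)
```

Inside RapidMiner itself none of this is needed, since the Centroid Table view of the cluster model already shows these values; the sketch only makes explicit that each table cell is a plain per-cluster mean.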