Set roles for unsupervised learning - clustering

lovefinearts198 Member Posts: 6 Contributor II
edited November 2018 in Help
Hi there,

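I have an example dataset as follows (the digits of the last three columns run together in the post and are shown here unsplit):

id_product  Color  reference  quantity_ordered / price_paid / weight_in_grams
1           Red    A          11000100
2           Red    A          22500200
3           Blue   A          22000200
4           Red    B          2020000800
5           Blue   B          82000650
6           Red    B          6005000012000
7           Blue   B          54500500
8           Blue   C          88000800
9           Blue   C          210001500
10          Red    D          9650065000
11          Blue   E          121200014050
12          Red    E          45450003350
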
I want to perform a clustering operation to detect anomalies, but I am not sure which role I should give to each of my attributes.

I was thinking of:
  • id_product : id
  • Color : cluster
  • reference : label
  • quantity_ordered : weight
  • price_paid : regular
  • weight_in_grams : regular
Is this right or wrong?

Thanks for your help.


  • lovefinearts198 Member Posts: 6 Contributor II
    Can anyone give me a lead or a way to understand the roles in RapidMiner?
  • lovefinearts198 Member Posts: 6 Contributor II
    Hello again,

    Is my question too easy or too complex?

    Here's an extract from the RapidMiner help:
    This operator can be used to change the role of an attribute of the input ExampleSet. If you want to change the attribute name you should use the Rename operator. The target role indicates if the attribute is a regular attribute (used by learning operators) or a special attribute (e.g. a label or id attribute).

    The following target attribute types are possible:
    • regular: only regular attributes are used as input variables for learning tasks
    • id: the id attribute for the example set
    • label: target attribute for learning
    • prediction: predicted attribute, i.e. the predictions of a learning scheme
    • cluster: indicates the membership to a cluster
    • weight: indicates the weight of the example
    • batch: indicates the membership to an example batch
    Users can also define own attribute types by simply using the desired name.

    Please be aware that roles have to be unique! Assigning a non-regular role a second time will cause the first attribute to be dropped from the example set. If you want to keep this attribute, you have to change its role first.
    So perhaps this is better?
    • id_product : id
    • Color : regular
    • reference : label
    • quantity_ordered : regular
    • price_paid : regular
    • weight_in_grams : regular
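
The uniqueness rule in the help extract can be pictured with a small toy model. This is a plain Python sketch, not RapidMiner code: the `set_role` function and the dictionary layout are made up for illustration, and the drop-on-clash behaviour simply follows the wording of the help text quoted above.

```python
# Toy model of the role rules quoted from the help text; not RapidMiner's API.
def set_role(roles, attribute, target_role):
    """Return a new {attribute: role} mapping after assigning target_role.

    Any number of attributes may be 'regular'. A special role (id, label,
    weight, ...) is unique: if another attribute already holds it, that
    attribute is dropped, as the help text describes.
    """
    updated = dict(roles)
    if target_role != "regular":
        for other, role in roles.items():
            if other != attribute and role == target_role:
                del updated[other]  # the earlier holder of the role is dropped
    updated[attribute] = target_role
    return updated


# Start with every attribute as 'regular', then apply the proposed roles.
roles = {name: "regular" for name in (
    "id_product", "Color", "reference",
    "quantity_ordered", "price_paid", "weight_in_grams")}
roles = set_role(roles, "id_product", "id")
roles = set_role(roles, "reference", "label")

# Only these attributes would be used as inputs by a learner or clusterer.
regular_inputs = [name for name, role in roles.items() if role == "regular"]

# Giving 'label' to a second attribute drops 'reference' in this model.
clashing = set_role(roles, "Color", "label")
```

With the second proposal, `regular_inputs` holds Color, quantity_ordered, price_paid and weight_in_grams, while id_product and reference are set aside as special attributes.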
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn

    Set the attributes you want to use to drive cluster membership to be "regular".

    All other roles will be ignored by the clustering.

    I don't know your data, but if the attribute called "reference" is some sort of pre-existing classification and you want to compare it with the final clustering, then it makes sense to set its role to label, as you have done. There is an operator called "Map Clustering on Labels" that can be used to determine which cluster is closest to each label. The resulting example set contains a prediction that can be used to compute a performance measure with the "Performance" operator.
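
As a rough picture of what comparing clusters with labels involves, here is a minimal Python sketch. It assumes a simple majority-vote mapping (each cluster is assigned the label that occurs most often among its members), which may not be how the RapidMiner operator does its matching. The function name, cluster ids and label values are all hypothetical and are not taken from the dataset in the question.

```python
from collections import Counter, defaultdict


def map_clusters_to_labels(clusters, labels):
    """Map each cluster id to the label seen most often among its members."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        counts[cluster][label] += 1
    return {cluster: counter.most_common(1)[0][0]
            for cluster, counter in counts.items()}


# Hypothetical clustering result and pre-existing labels, one entry per example.
clusters = ["cluster_0", "cluster_0", "cluster_0",
            "cluster_1", "cluster_1", "cluster_1"]
labels = ["A", "A", "B", "B", "B", "B"]

mapping = map_clusters_to_labels(clusters, labels)

# Turn cluster membership into a prediction, then score it against the labels.
predictions = [mapping[cluster] for cluster in clusters]
accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
```

In this toy case one of the six examples sits in a cluster whose majority label differs from its own, so the accuracy comes out at 5/6.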

