
How do I define the distance measure to be used for clustering methods?

DiePaupi Member Posts: 3 Contributor I
I'm currently working on different clustering methods to analyse music data.
I'm using RapidMiner as a library and want to use, e.g., the k-Means method. I've already initialized everything, but I'm still struggling to define the distance measure to be used. I'd like to be able to choose between all the measures RapidMiner offers for numerical values, but I can't find how to set one, or whether it's possible to get a list of the measures the clustering method supports.

I set the operator and its parameters in my class like this:
Operator clusterer = OperatorService.createOperator(FastKMeans.class);
clusterer.setParameter("k", String.valueOf(k));
...

However, the distance measure doesn't seem to be set via a parameter; it is derived from the given example set (e.g. in FastKMeans.class):
DistanceMeasure measure = this.getInitializedMeasure(eSet);

vs.

int k = this.getParameterAsInt("k");

So how do I set the measure?

Best Answer

  • jczogalla Employee, Member Posts: 144 RM Engineering
    Solution Accepted
    Hi @DiePaupi!

    You should be able to find those parameters in all the cluster operators. They should show up when calling
    getParameterTypes()
    In code, you can see that they are added by this line: 
    types.addAll(getMeasureParameterTypes());
    So setting these parameters should be as easy as setting the parameter k, which you are already doing.

    Cheers
    Jan

Answers

  • jczogalla Employee, Member Posts: 144 RM Engineering
    Hi @DiePaupi!

    You can find information on the measure-related parameters on GitHub.
    The measures are adjusted/initialized based on the given data, but they are selected via parameters, as you can see when looking at the operator's parameters in Studio.


    To set the measure to a specific one, you first have to set the measure type (using the constant PARAMETER_MEASURE_TYPES, i.e. "measure_types"), which has the possible values "MixedMeasures", "NominalMeasures", "NumericalMeasures", "BregmanDivergences".
    Secondly, you set the specific measure to use with the corresponding parameter (one of PARAMETER_[NOMINAL|NUMERICAL|MIXED]_MEASURE or PARAMETER_DIVERGENCE) and set the value to one of the possibilities provided in the different type arrays in the class mentioned above. You can of course just use the correct strings here, but we recommend using the constants where possible.

    If you have more questions, feel free to ask!

    Cheers
    Jan
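The two steps described above (measure type first, then the specific measure under the parameter that belongs to that type) can be sketched without RapidMiner on the classpath. This is only an illustrative stand-in: the class `MeasureSettings` and the method `settingsFor` are hypothetical, and every key string other than "measure_types" is an assumption about what the PARAMETER_* constants contain. In real code, prefer the constants from the class mentioned in the answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: collects the two settings in a plain map instead of
// calling setParameter on a RapidMiner operator.
class MeasureSettings {

    // The four measure types named in the answer above.
    static final String[] MEASURE_TYPES = {
        "MixedMeasures", "NominalMeasures", "NumericalMeasures", "BregmanDivergences"
    };

    // ASSUMED parameter keys, one per measure type, in the same order as MEASURE_TYPES.
    static final String[] MEASURE_KEYS = {
        "mixed_measure", "nominal_measure", "numerical_measure", "divergence"
    };

    // Returns the two key/value pairs that would be passed to setParameter.
    static Map<String, String> settingsFor(String measureType, String measureName) {
        for (int i = 0; i < MEASURE_TYPES.length; i++) {
            if (MEASURE_TYPES[i].equals(measureType)) {
                Map<String, String> settings = new LinkedHashMap<>();
                settings.put("measure_types", measureType); // step 1: the measure type
                settings.put(MEASURE_KEYS[i], measureName); // step 2: the specific measure
                return settings;
            }
        }
        throw new IllegalArgumentException("Unknown measure type: " + measureType);
    }

    public static void main(String[] args) {
        Map<String, String> settings = settingsFor("NumericalMeasures", "ManhattanDistance");
        settings.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Each entry of the returned map corresponds to one `setParameter(key, value)` call on the clustering operator; an unknown measure type is rejected early instead of being passed through silently.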
  • DiePaupi Member Posts: 3 Contributor I
    Hey @jczogalla, thanks for the quick reply!

    Where would I set those parameters? I couldn't find them in the clustering methods / operator class.

    Thank you for your help!
  • DiePaupi Member Posts: 3 Contributor I

    Thank you for your help! It works fine now. For the record, the specific lines I use are as follows:
    clusterer.setParameter(DistanceMeasures.PARAMETER_MEASURE_TYPES, 
    DistanceMeasures.MEASURE_TYPES[DistanceMeasures.NUMERICAL_MEASURES_TYPE]);
    clusterer.setParameter(DistanceMeasures.PARAMETER_NUMERICAL_MEASURE, measureType);
    where "measureType" is a String containing the name of the distance measure to be used, as specified in the DistanceMeasures class:
    NUMERICAL_MEASURES = new String[]{"EuclideanDistance", "CamberraDistance", "ChebychevDistance", 
    "CorrelationSimilarity", "CosineSimilarity", "DiceSimilarity", "DynamicTimeWarpingDistance",
    "InnerProductSimilarity", "JaccardSimilarity", "KernelEuclideanDistance", "ManhattanDistance",
    "MaxProductSimilarity", "OverlapSimilarity"};

    Cheers
    Paupi
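Since the measure is passed as a plain string, a misspelled name is easy to miss. A minimal sketch of a guard, assuming the list quoted above is complete for the RapidMiner version in use: the class `NumericalMeasureNames` and the method `requireNumericalMeasure` are hypothetical helpers, not part of RapidMiner, and the names are copied exactly as posted, including their spelling.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: checks a measure name against the list quoted in the thread
// before it is handed to setParameter.
class NumericalMeasureNames {

    // Copied verbatim from the post above; spelling is kept exactly as posted.
    static final List<String> NUMERICAL_MEASURES = Arrays.asList(
        "EuclideanDistance", "CamberraDistance", "ChebychevDistance",
        "CorrelationSimilarity", "CosineSimilarity", "DiceSimilarity",
        "DynamicTimeWarpingDistance", "InnerProductSimilarity", "JaccardSimilarity",
        "KernelEuclideanDistance", "ManhattanDistance", "MaxProductSimilarity",
        "OverlapSimilarity");

    // Returns the name unchanged if it is in the list, otherwise fails loudly.
    static String requireNumericalMeasure(String name) {
        if (!NUMERICAL_MEASURES.contains(name)) {
            throw new IllegalArgumentException(
                "Unknown numerical measure '" + name + "', expected one of " + NUMERICAL_MEASURES);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(requireNumericalMeasure("CosineSimilarity"));
    }
}
```

The checked value can then be used as `measureType` in the `setParameter` call shown above. Note that the lookup is an exact string match, so a conventionally spelled "CanberraDistance" is rejected because the list spells it "CamberraDistance".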