RapidMiner 9.7 is Now Available

Lots of amazing new improvements including true version control! Learn more about what's new here.


"Similarity Measure into Clustering"

B_MinerB_Miner Member Posts: 72  Maven
edited June 2019 in Help
Hi Guys,

Is it possible to use RM to create a distance matrix (say Jaccard Sim) and use this matrix into a cluster analysis? If so are there any examples?




  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Hi Brian,
    both is possible. You might create a distance matrix using the Data to Similarity operator and select Jaccard Simularity as distance function. And you might do clustering selecting the same distance function using for example kMedoids.

  • B_MinerB_Miner Member Posts: 72  Maven
    Hi Sebastian,

    I tried to hook up a Data to Similarity operator to kmeans and got an error. Is kMedoids the only clustering that can take a distance matrix as input? Example that causes error for type of input into kmeans:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="296" width="280">
          <operator activated="true" class="generate_nominal_data" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="165"/>
          <operator activated="true" class="data_to_similarity" expanded="true" height="76" name="Data to Similarity" width="90" x="112" y="30">
            <parameter key="measure_types" value="NominalMeasures"/>
            <parameter key="nominal_measure" value="JaccardSimilarity"/>
          <operator activated="true" class="k_means" expanded="true" height="76" name="Clustering" width="90" x="179" y="165"/>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Data to Similarity" to_port="example set"/>
          <connect from_op="Data to Similarity" from_port="similarity" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>

  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    K-Means does always use Euclidean distance, it's simply part of the algorithm. In Kmedoids, you might select the distance function, but you cannot forward a similarity matrix. It will calculate the similarities from the given example set as it needs them.

Sign In or Register to comment.