"Rocchio Algorithm"

dali Member Posts: 6 Contributor II
edited May 2019 in Help

Is there an implementation of the Rocchio algorithm in RapidMiner? If not, how could I turn k-Nearest-Neighbor into a Rocchio classifier by calculating the average word vector for each class and using only these averages for classification?

Thanks in advance.


  • dali Member Posts: 6 Contributor II
    Hello again,

    it's a pity that there is no Rocchio in RapidMiner. I'm now trying to build my own, but I'm already stuck on getting the mean of all word vectors of a class.

    Is there a function that averages all given word vectors so that I get one centroid vector? I can't find one.

    Thanks for any help.
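
For reference, the averaging asked about here can be sketched outside RapidMiner in a few lines of plain Python. This is only an illustration under assumptions, not a RapidMiner operator: word vectors are assumed to be sparse dicts mapping a term to its weight, cosine similarity is assumed as the comparison measure, and the function names and toy data are invented for the example.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average a list of sparse word vectors (dicts mapping term -> weight)."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    n = len(vectors)
    return {term: weight / n for term, weight in total.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fit_centroids(docs, labels):
    """Group documents by their known label and average each group."""
    by_class = defaultdict(list)
    for vec, label in zip(docs, labels):
        by_class[label].append(vec)
    return {label: centroid(vecs) for label, vecs in by_class.items()}


def classify(vec, centroids):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy data, invented for illustration only.
docs = [
    {"goal": 1.0, "match": 2.0},
    {"goal": 3.0, "team": 1.0},
    {"vote": 2.0, "party": 1.0},
    {"vote": 1.0, "election": 2.0},
]
labels = ["sports", "sports", "politics", "politics"]

centroids = fit_centroids(docs, labels)
prediction = classify({"goal": 1.0, "vote": 0.1}, centroids)
```

Because the class labels are already known, each centroid is computed in a single pass over the documents of its class; no clustering step is needed.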
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    This is possible if you somewhat misuse K-Medoids. See the following process for details:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.001">
      <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
        <process expanded="true" height="190" width="614">
          <operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve" width="90" x="112" y="75">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Set Role" width="90" x="246" y="75">
            <parameter key="name" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="k_medoids" compatibility="5.1.001" expanded="true" height="76" name="Clustering" width="90" x="514" y="75">
            <parameter key="k" value="3"/>
            <parameter key="measure_types" value="NominalMeasures"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
          <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Unfortunately, this won't work in the current version because of a bug: the nominal distance measure also uses the numerical attributes. This will be fixed in the update due at the end of next week.

  • dali Member Posts: 6 Contributor II
    Thanks for the reply. I'm really looking forward to trying it by the end of the week. I'll let you know if it worked.
  • dali Member Posts: 6 Contributor II
    Well, it looked like a good idea to "misuse K-Medoids", but it takes far too long to calculate; I stopped it after half an hour. I think the problem is that RM tries to find the classes itself, whereas using the given classes might speed up the whole process.

    Isn't there another operator that just calculates the mean of some word vectors? There must be something that averages all given vectors and returns the mean vector, but I just can't find it.

    Thanks a lot for any advice.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    I have just uploaded a process that calculates the average values of all attributes grouped by class and uses the resulting prototypes as input for the k-NN learner. You may need a recent RapidMiner version, since this process relies on a relatively new feature of the "Aggregate" operator, namely aggregating a whole set of attributes directly with the same default function. Otherwise you would have to define the aggregations for all attributes manually, which is of course not really feasible for word vectors...

    The description of the process on myExperiment can be found at


    You can directly download the process from myExperiment within RapidMiner (which I strongly recommend) by using the Community Extension of RapidMiner. Just install the extension and activate the "MyExperiment Browser" view. Then you can easily search for processes and download them. The process is called "Rocchio".
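
The idea described in this post (average all attributes per class, then feed the resulting prototypes to a nearest-neighbor learner) can be sketched in plain Python as follows. This is not the myExperiment process itself, only a minimal illustration under assumptions: rows are dense lists of numbers, the distance is assumed to be Euclidean with k = 1, and the function names and toy data are invented for the example.

```python
import math
from collections import defaultdict


def aggregate_mean_by_label(rows, labels):
    """Average every attribute column within each label group."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in col_sums]
        for label, col_sums in sums.items()
    }


def nearest_prototype(row, prototypes):
    """1-NN over the class prototypes using Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(row, prototypes[label]))


# Toy data, invented for illustration only.
rows = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
labels = ["a", "a", "b", "b"]

prototypes = aggregate_mean_by_label(rows, labels)
first = nearest_prototype([1.5, 0.5], prototypes)
second = nearest_prototype([0.5, 2.0], prototypes)
```

With only one prototype per class, the nearest-neighbor step compares each new example against as many vectors as there are classes, regardless of how many training examples were averaged.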

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Let me mention that this is possible only with version 5.1.002 or later, which was released a week ago.

    Some problems become outdated really fast...
