Grouping of classes

castecaste Member Posts: 9 Contributor II
edited November 2018 in Help

With RapidMiner, is it possible to automatically collapse the classes in a learning set into a given number of classes, based on their cardinality, so that the variance in class size is reduced? The goal is to improve the precision of methods such as SVM and KNN.

I have a learning set of 20,000 elements divided into more than 100 classes, with high variance in the number of elements per class, and I need to reduce them to 20 classes.

For example:

Class A - 3 elements
Class B - 4 elements
Class C - 8 elements

It would be nice to be able to reduce them to a given number of classes, e.g. 2, this way:

Class 1 - 7 elements (obtained from Classes A and B)
Class 2 - 8 elements (obtained from Class C)
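
The grouping described above can be sketched outside RapidMiner as a simple greedy heuristic: take the classes from largest to smallest and always add the next one to the group that currently has the fewest elements. This is only an illustration in Python, not a RapidMiner feature; the function name `group_classes` and the dictionary of class sizes are assumptions made for the example, and a greedy pass is not guaranteed to find the best possible balance.

```python
import heapq


def group_classes(class_sizes, n_groups):
    """Greedily assign classes to n_groups so that group totals stay balanced.

    class_sizes: dict mapping class name -> number of elements (hypothetical input).
    Returns a list of n_groups lists of class names.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")

    # Min-heap of (current total, group index): the lightest group is always on top.
    heap = [(0, i) for i in range(n_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(n_groups)]

    # Largest classes first; ties are broken by name so the result is deterministic.
    ordered = sorted(class_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return groups


# The three classes from the example above, reduced to 2 groups.
sizes = {"A": 3, "B": 4, "C": 8}
groups = group_classes(sizes, 2)
totals = [sum(sizes[c] for c in g) for g in groups]
print(groups, totals)
```

On the three-class example this puts C alone in one group (8 elements) and A and B together in the other (7 elements), which is the split asked for above, although the groups may come out in a different order.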

Please help me! I'm trying operations research methods, but I have so little time...

Thank you!


    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    I'm not quite sure I understood you correctly. You want to merge the most similar classes to improve the precision? That would not improve performance on the original problem; it would simply change the problem...
    But if you want to do this manually, you could use the MergeNominalValues operator. Perhaps you should take a look at the parameterIteration operator and its examples in the meta directory of the example processes. It could save you a lot of typing.
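
    To make the idea of a manual merge concrete, here is a minimal sketch in plain Python of what merging nominal values amounts to: every original label is replaced by the name of the merged class it belongs to. It does not use or reproduce the RapidMiner operator; the `merge_labels` helper, the mapping and the sample labels are all made up for illustration.

```python
def merge_labels(labels, mapping):
    """Replace each label by its merged class; labels not in the mapping are kept."""
    return [mapping.get(label, label) for label in labels]


# Hypothetical mapping: classes A and B are merged, class C stays on its own.
mapping = {"A": "Class 1", "B": "Class 1", "C": "Class 2"}

# Hypothetical label column of a small learning set.
labels = ["A", "C", "B", "A", "C"]
merged = merge_labels(labels, mapping)
print(merged)
```

    With more than 100 classes, the mapping would normally be generated rather than typed by hand.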

    castecaste Member Posts: 9 Contributor II
    I needed that because I'm building a hashing system to distribute a huge load of information. The semantic bonds are not that important, so I could collapse classes without regard to their names, considering only their weight in the context. This balancing helps the SVM recognition.

    Actually, I solved my problem using an Operations Research method: an implementation of the Assembly Line Balancing problem.

    Just a note: I tried the evolutionary parameter optimization from the examples, but even with the examples it took many hours, so I decided to change approach.

    Thanks for your help, and compliments on the software you built and on the choice to keep it open source: it is really great!
