WordNet in RM 5

simon_knoll Member Posts: 40 Maven
edited November 2018 in Help
Hello all,
Short question: RM 4.x had the WordNetSynonymStemmer operator. Is this operator gone in version 5, so that one has to use Groovy scripting instead?

simon knoll


  • Wanttoknow Member Posts: 6 Contributor II

    I was asking myself the same thing: where is the WordNet stemmer in RM 5?
  • TobiasMalbrecht Moderator, Employee, Member Posts: 294 RM Product Management

    I think the WordNet stemmer was removed because it did not work that well. We may try to revive it at some point, but that is only speculation.

    Kind regards,
  • simon_knoll Member Posts: 40 Maven
    I coded a WordNet operator myself; if someone is interested, I can share code snippets.
    What I can say is that on my test dataset I got some good results for k-means clustering by adding hyponyms.
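
    One way to see why extra WordNet terms can help clustering: two documents with no words in common have zero term overlap, but they gain overlap once both are extended with a shared related term. Below is a minimal sketch of that effect, not the setup used in this post. The documents, the added term "animal" (a hand-written stand-in for a WordNet lookup), the class name OverlapSketch, and the choice of cosine similarity as the overlap measure are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OverlapSketch {

    // Counts how often each term occurs in a token list.
    static Map<String, Integer> termCounts(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Cosine similarity of the term-count vectors of two token lists.
    static double cosine(List<String> a, List<String> b) {
        Map<String, Integer> ca = termCounts(a);
        Map<String, Integer> cb = termCounts(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : ca.entrySet()) {
            dot += e.getValue() * cb.getOrDefault(e.getKey(), 0);
        }
        double normA = 0.0;
        double normB = 0.0;
        for (int v : ca.values()) normA += v * v;
        for (int v : cb.values()) normB += v * v;
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Returns a copy of the tokens with one extra (hypothetical) related term appended.
    static List<String> withTerm(List<String> tokens, String extra) {
        List<String> out = new ArrayList<>(tokens);
        out.add(extra);
        return out;
    }

    public static void main(String[] args) {
        List<String> doc1 = List.of("dog", "barks");
        List<String> doc2 = List.of("cat", "sleeps");
        // Similarity of the raw documents, then of the documents with the shared term added.
        System.out.println(cosine(doc1, doc2));
        System.out.println(cosine(withTerm(doc1, "animal"), withTerm(doc2, "animal")));
    }
}
```

    With these toy documents the similarity goes from 0 to 1/3 once the shared term is added, which is the kind of extra signal a clustering algorithm can pick up.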

    all the best,
  • B_ Member Posts: 70 Guru

    I would appreciate seeing how you set this up.

  • simon_knoll Member Posts: 40 Maven
    1st, you'll have to install WordNet.
    2nd, you need a Java WordNet API. I took this one: http://projects.csail.mit.edu/jwi/ (not for commercial purposes, but the fastest I know).
    3rd, you'll have to implement an Operator (I added a new class in the "com.rapidminer.operator.text.io.wordfilter" package).
    For this I just copied an operator of the text plugin, deleted everything I did not need, and added the code for WordNet (here I add hypernyms).

    I hope this was more helpful than confusing ;)
    package com.rapidminer.operator.text.io.wordfilter;

    import java.io.File;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;

    import com.rapidminer.operator.OperatorDescription;
    import com.rapidminer.operator.OperatorException;
    import com.rapidminer.operator.text.Document;
    import com.rapidminer.operator.text.Token;
    import com.rapidminer.operator.text.io.AbstractTokenProcessor;
    import com.rapidminer.parameter.UndefinedParameterError;

    import edu.mit.jwi.Dictionary;
    import edu.mit.jwi.IDictionary;
    import edu.mit.jwi.item.IIndexWord;
    import edu.mit.jwi.item.ISynset;
    import edu.mit.jwi.item.ISynsetID;
    import edu.mit.jwi.item.IWord;
    import edu.mit.jwi.item.IWordID;
    import edu.mit.jwi.item.POS;
    import edu.mit.jwi.item.Pointer;
    import edu.mit.jwi.morph.WordnetStemmer;

    public class WordnetHyponymOperator extends AbstractTokenProcessor {

        private WordnetStemmer stemmer;
        private IDictionary dict;

        public WordnetHyponymOperator(OperatorDescription description) {
            super(description);

            // location of the local WordNet installation; adjust to your system
            String wnhome = "/usr/local/WordNet-3.0/";
            String path = wnhome + File.separator + "dict";
            URL url = null;
            try {
                url = new URL("file", null, path);
            } catch (MalformedURLException e) {
                // TODO Auto-generated catch block
            }

            // construct the dictionary object and open it
            // (the open() call itself is not part of the posted snippet; JWI needs
            // the dictionary to be opened before any lookup)
            this.dict = new Dictionary(url);
            this.stemmer = new WordnetStemmer(dict);
        }

        protected Document doWork(Document textObject) throws OperatorException {

            // NOTE: three statements in this method were cut off in the original post.
            // They are kept exactly as posted and marked "[cut off]".
            List<Token> newSequence = new ArrayList<Token>(textObject   // [cut off]
            for (Token token : textObject.getTokenSequence()) {
                // reduce the token to its noun stem(s) and look up the first one
                List<String> stems = stemmer.findStems(token.getToken(), POS.NOUN);
                if (stems != null && stems.size() > 0) {
                    String word2 = stems.get(0);
                    IIndexWord idxWord = dict.getIndexWord(word2, POS.NOUN);
                    if (idxWord != null && idxWord.getWordIDs().size() > 0) {
                        // use the first word ID listed for the stem
                        IWordID wordID = idxWord.getWordIDs().get(0);
                        IWord word = dict.getWord(wordID);
                        ISynset synset = word.getSynset();
                        // synsets related to this one through the chosen pointer type
                        List<ISynsetID> blub = synset.getRelatedMap().get(   // [cut off]

                        // add every lemma of every related synset as a new token
                        for (ISynsetID iSynsetID : blub) {
                            ISynset set = dict.getSynset(iSynsetID);
                            List<IWord> bla = set.getWords();
                            for (IWord iWord : bla) {
                                newSequence.add(new Token(iWord.getLemma(),   // [cut off]
                            }
                        }
                    }
                }
            }

            // as posted, newSequence is not written back to textObject before returning;
            // that step appears to be missing from the snippet
            return textObject;
        }
    }
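
    For readers who want to try the idea without a RapidMiner or WordNet installation, here is a minimal, self-contained sketch of the token-expansion step that doWork performs. It is not the operator above: the hand-written RELATED map is a hypothetical stand-in for the WordNet lookup (stemming plus related synsets), the class and method names are made up for illustration, and keeping the original tokens in front of the added lemmas is an assumption, since the posted snippet is cut off at that point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TokenExpansionSketch {

    // Hypothetical stand-in for WordNet: maps a noun to related lemmas.
    // In the operator above this information would come from the dictionary.
    static final Map<String, List<String>> RELATED = Map.of(
            "dog", List.of("canine", "domestic_animal"),
            "cat", List.of("feline"));

    // Returns the original tokens followed by the related lemmas of each token.
    // Tokens without an entry in the map contribute nothing extra.
    static List<String> expand(List<String> tokens, Map<String, List<String>> related) {
        List<String> expanded = new ArrayList<>(tokens);
        for (String token : tokens) {
            expanded.addAll(related.getOrDefault(token, List.of()));
        }
        return expanded;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("dog", "barks"), RELATED));
    }
}
```

    The loop iterates over the original token list while appending to a separate copy, so newly added lemmas are never expanded themselves.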

  • TobiasMalbrecht Moderator, Employee, Member Posts: 294 RM Product Management
    Hi Simon,

    thank you very much for sharing your work. At the moment, our work on the text processing extension is almost idle because of other tasks. But maybe we will have a look at it sometime ...?!

    Best regards,
  • simon_knoll Member Posts: 40 Maven
    Yes, it would be cool if this kind of feature were added to the text plugin again.
  • B_ Member Posts: 70 Guru
    Thanks for the example, Simon.