WordNet in RM 5

simon_knoll Member Posts: 40 Contributor II
edited November 2018 in Help
Hello all,
Short question: RM 4.x had the WordNetSynonymStemmer operator. Is this operator gone in version 5, so that one has to use Groovy scripting instead?

thx
simon knoll

Answers

  • Wanttoknow Member Posts: 6 Contributor II
    Hi,

    I was wondering the same thing: where is the WordNet stemmer in RM 5?
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi,

    I think the WordNet stemmer was removed because it did not work that well. We may revive it at some point, but that is only speculation.

    Kind regards,
    Tobias
  • simon_knoll Member Posts: 40 Contributor II
    hi,
    I wrote a WordNet operator myself; if anyone is interested, I can share code snippets.
    What I can say is that on my test dataset I got some good results by adding hyponyms for k-means clustering.

    all the best,
    simon
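
One way to picture why adding related WordNet terms can help clustering: two documents with no tokens in common can gain overlap once both are expanded with a shared related term. The snippet below is only a toy sketch of that idea. The lookup table is invented for illustration and is not WordNet data, it uses a shared broader term as the example, and Jaccard similarity is swapped in as a simple overlap measure; it is not what k-means computes and does not reproduce the results mentioned above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RelatedTermOverlapSketch {

    // Hypothetical stand-in for a WordNet lookup; the entries are invented.
    static final Map<String, List<String>> RELATED_TERMS = Map.of(
            "dog", List.of("animal"),
            "cat", List.of("animal"));

    // Returns the tokens plus any related terms found in the toy table.
    static Set<String> expand(List<String> tokens) {
        Set<String> expanded = new HashSet<>(tokens);
        for (String token : tokens) {
            expanded.addAll(RELATED_TERMS.getOrDefault(token, List.of()));
        }
        return expanded;
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        List<String> doc1 = List.of("dog", "barks");
        List<String> doc2 = List.of("cat", "sleeps");
        // Without expansion the two documents share no token.
        System.out.println(jaccard(new HashSet<>(doc1), new HashSet<>(doc2))); // prints 0.0
        // After expansion both contain "animal": 1 shared term out of 5 distinct terms.
        System.out.println(jaccard(expand(doc1), expand(doc2))); // prints 0.2
    }
}
```
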
  • B_ Member Posts: 70 Maven
    Simon

    would appreciate seeing how you set this up. 
    thanks

    b.
  • simon_knoll Member Posts: 40 Contributor II
    hi,
    1st, you'll have to install WordNet.
    2nd, you need a Java WordNet API. I took this one: http://projects.csail.mit.edu/jwi/ (not for commercial purposes, but the fastest I know).
    3rd, you'll have to implement an Operator (I added a new class in the "com.rapidminer.operator.text.io.wordfilter" package).
    For this I just copied an operator of the text plugin, deleted everything I did not need, and added the code for WordNet (here I add hypernyms).

    I hope this was more helpful than confusing ;)
    package com.rapidminer.operator.text.io.wordfilter;

    import java.io.File;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;

    import com.rapidminer.operator.OperatorDescription;
    import com.rapidminer.operator.OperatorException;
    import com.rapidminer.operator.text.Document;
    import com.rapidminer.operator.text.Token;
    import com.rapidminer.operator.text.io.AbstractTokenProcessor;
    import com.rapidminer.parameter.UndefinedParameterError;

    import edu.mit.jwi.Dictionary;
    import edu.mit.jwi.IDictionary;
    import edu.mit.jwi.item.IIndexWord;
    import edu.mit.jwi.item.ISynset;
    import edu.mit.jwi.item.ISynsetID;
    import edu.mit.jwi.item.IWord;
    import edu.mit.jwi.item.IWordID;
    import edu.mit.jwi.item.POS;
    import edu.mit.jwi.item.Pointer;
    import edu.mit.jwi.morph.WordnetStemmer;

    // Note: despite the class name, this operator adds hypernyms (Pointer.HYPERNYM).
    public class WordnetHyponymOperator extends AbstractTokenProcessor {

        private WordnetStemmer stemmer;
        private IDictionary dict;

        public WordnetHyponymOperator(OperatorDescription description) {
            super(description);
            // Location of the local WordNet installation; adjust to your system.
            String wnhome = "/usr/local/WordNet-3.0/";
            String path = wnhome + File.separator + "dict";
            URL url = null;
            try {
                url = new URL("file", null, path);
            } catch (MalformedURLException e) {
                e.printStackTrace();
            }

            // construct the dictionary object and open it
            this.dict = new Dictionary(url);
            this.dict.open();
            this.stemmer = new WordnetStemmer(this.dict);
        }

        @Override
        protected Document doWork(Document textObject) throws OperatorException {
            List<Token> newSequence = new ArrayList<Token>(textObject.getTokenSequence().size());
            for (Token token : textObject.getTokenSequence()) {
                // Reduce the token to its noun stem before looking it up.
                List<String> stems = stemmer.findStems(token.getToken(), POS.NOUN);
                if (stems != null && stems.size() > 0) {
                    IIndexWord idxWord = dict.getIndexWord(stems.get(0), POS.NOUN);
                    if (idxWord != null && idxWord.getWordIDs().size() > 0) {
                        // Only the first word ID of the index word is used.
                        IWordID wordID = idxWord.getWordIDs().get(0);
                        IWord word = dict.getWord(wordID);
                        ISynset synset = word.getSynset();
                        List<ISynsetID> hypernymIDs = synset.getRelatedMap().get(Pointer.HYPERNYM);
                        // get() returns null when the map has no HYPERNYM entry.
                        if (hypernymIDs != null) {
                            for (ISynsetID hypernymID : hypernymIDs) {
                                ISynset hypernym = dict.getSynset(hypernymID);
                                // Add every lemma of the hypernym synset with the token's weight.
                                for (IWord hypernymWord : hypernym.getWords()) {
                                    newSequence.add(new Token(hypernymWord.getLemma(), token.getWeight()));
                                }
                            }
                        }
                    }
                }
                // The original token is kept, after any hypernyms added for it.
                newSequence.add(token);
            }
            textObject.setTokenSequence(newSequence);
            return textObject;
        }
    }
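
For readers who want to try the expansion logic without RapidMiner or WordNet installed, the loop in doWork can be sketched on plain strings. This is a minimal sketch, not Simon's operator: the class name, the `expand` method, and the lookup table are hypothetical, and the function passed in stands in for the stemmer and dictionary calls above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class TokenExpansionSketch {

    // Same ordering as the doWork loop in the post: for each token,
    // first add its related lemmas (if any), then the token itself.
    static List<String> expand(List<String> tokens,
                               Function<String, List<String>> relatedLemmas) {
        List<String> result = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            List<String> related = relatedLemmas.apply(token);
            // A lookup may return null when nothing is known about the token.
            if (related != null) {
                result.addAll(related);
            }
            result.add(token);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical lookup table standing in for a WordNet dictionary.
        Map<String, List<String>> toyHypernyms = Map.of("dog", List.of("canine"));
        System.out.println(expand(List.of("dog", "runs"), toyHypernyms::get));
        // prints [canine, dog, runs]
    }
}
```

Passing the lookup in as a function keeps the loop testable on its own; a real dictionary-backed lookup could be substituted later without changing the loop.
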
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Simon,

    thank you very much for sharing your work. At the moment, our work on the text processing extension is almost at a standstill because of other projects. But maybe we will have a look at it sometime ...?!

    Best regards,
    Tobias
  • simon_knoll Member Posts: 40 Contributor II
    Yes, it would be cool if this kind of feature were added to the text plugin again.
  • B_ Member Posts: 70 Maven
    thanks for the example Simon