Operator Generate n-Grams (Terms)

Fred12
edited November 2018 in Help


I would like to know how n-grams are generated. I noticed that some words are grouped together as n-grams (terms) and others are not (single words). How does it decide which terms are grouped together and which are not? Many of the most frequently occurring terms have no n-gram groupings...


    Thomas_Ott (RapidMiner Certified Analyst, RapidMiner Certified Expert)

    Here is how n-gram generation works if you set the maximum length to 2. Take the sentence "RapidMiner Studio is the best." The operator combines every pair of adjacent words:

        RapidMiner_Studio
        Studio_is
        is_the
        the_best

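    A minimal Python sketch of that pairing step, assuming simple whitespace tokenization and the underscore joiner seen in terms like "is_the". `generate_ngrams` is a hypothetical helper, not RapidMiner's API, and keeping the single words alongside the n-grams is an assumption of this sketch.

```python
def generate_ngrams(tokens: list[str], max_length: int = 2) -> list[str]:
    """Return the tokens plus every run of 2..max_length adjacent tokens
    joined with "_" (hypothetical helper; keeping unigrams is an assumption)."""
    terms = list(tokens)  # assumption: the single words stay as terms too
    for n in range(2, max_length + 1):
        # Slide a window of n tokens across the sentence.
        for i in range(len(tokens) - n + 1):
            terms.append("_".join(tokens[i:i + n]))
    return terms


sentence = "RapidMiner Studio is the best."
tokens = sentence.rstrip(".").split()  # crude stand-in for the Tokenize operator
print(generate_ngrams(tokens, max_length=2))
```

    Raising `max_length` to 3 would additionally produce runs such as "RapidMiner_Studio_is".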
    Assuming your corpus consists of RapidMiner Studio reviews and you have TF-IDF set as your vector creation method, it will likely give "is_the" a very low weight, because a filler phrase like that shows up in almost every document, and give "RapidMiner_Studio" and "the_best" higher weights. Of course, if you also have stemming, filtering, and pruning set, "is_the" might be dropped completely, and that's probably what's happening in your process.
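    To see why a phrase like "is_the" ends up with a low weight, here is a rough Python sketch of TF-IDF on bigrams from three made-up review sentences. The formula (relative term count times log(N / document frequency)) is an assumed textbook variant, not necessarily RapidMiner's exact normalization, and `prune_above` is a hypothetical stand-in for pruning terms by document frequency.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Adjacent token runs of length n joined with "_", e.g. "is_the"."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Textbook tf-idf: (count / doc length) * log(N / document frequency).
    Assumed variant; RapidMiner may normalize differently."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights, df


def prune_above(df, n_docs, max_fraction):
    """Hypothetical pruning: keep terms found in at most max_fraction of docs."""
    return {t for t, d in df.items() if d / n_docs <= max_fraction}


reviews = [  # made-up example documents
    "RapidMiner Studio is the best",
    "the interface is the main selling point",
    "this is the tool we use",
]
docs = [ngrams(r.split()) for r in reviews]
weights, df = tf_idf(docs)
# "is_the" occurs in every document, so log(3 / 3) = 0 wipes out its weight.
print(round(weights[0]["is_the"], 3), round(weights[0]["RapidMiner_Studio"], 3))  # 0.0 0.275
kept = prune_above(df, len(docs), max_fraction=0.5)
print("is_the" in kept, "RapidMiner_Studio" in kept)  # False True
```

    With pruning switched on, a term that appears in too many documents is removed from the word vector entirely rather than just receiving a small weight.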

    Fred12

    Well, inside the Process Documents operator I had Tokenize, stemming, stopword filtering, and the n-gram operator, so this might have been the cause...
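    A small Python sketch of why operator order inside Process Documents can matter: if stopwords are filtered before n-grams are built, "is" and "the" are gone, so "is_the" can never form. The two-word stopword list is hypothetical, and the assumption that a stopword filter only removes tokens that exactly match a listed word (so a joined term like "is_the" passes through) is illustrative, not confirmed RapidMiner behavior.

```python
STOPWORDS = {"is", "the"}  # hypothetical, tiny stopword list for illustration


def with_bigrams(tokens):
    """Keep the tokens and append each adjacent pair joined with "_"."""
    return tokens + ["_".join(pair) for pair in zip(tokens, tokens[1:])]


def drop_stopwords(tokens):
    """Remove tokens that exactly match a stopword (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


tokens = "RapidMiner Studio is the best".split()  # stand-in for Tokenize

# Order A: filter stopwords first, then build n-grams -> "is_the" never appears,
# and a new pair "Studio_best" is formed across the removed words.
order_a = with_bigrams(drop_stopwords(tokens))

# Order B: build n-grams first, then filter -> "is_the" is not itself a
# stopword, so it survives while the single words "is" and "the" are removed.
order_b = drop_stopwords(with_bigrams(tokens))

print(order_a)
print(order_b)
```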
