operating generate N-Grams (terms)

Fred12 · October 2016

hi,

I would like to know how the n-grams are generated. I noticed that some words are grouped together as n-grams (terms) while others are left as single words. How does it decide which terms to group together and which not? Many of the most frequently occurring terms have no n-gram groupings...

Thomas_Ott · October 2016

N-grams work like this: if you set the length to 2, it will make the following combinations of adjacent words from the sentence "RapidMiner Studio is the best.":

RapidMiner_Studio

Studio_is

is_the

the_best
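To make the idea concrete, here is a minimal Python sketch of the same sliding-window pairing. It is not RapidMiner's actual implementation, just an illustration; the function name `generate_ngrams` and the whitespace tokenization are assumptions made for this example.

```python
def generate_ngrams(tokens, n=2, sep="_"):
    """Join every run of n consecutive tokens with sep (hypothetical helper)."""
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Simple whitespace tokenization of the example sentence (final period left off).
tokens = "RapidMiner Studio is the best".split()

bigrams = generate_ngrams(tokens, n=2)
print(bigrams)
# ['RapidMiner_Studio', 'Studio_is', 'is_the', 'the_best']
```

Every adjacent pair gets joined at this stage, so whether a pairing shows up in the final word list depends on the steps that run before and after it.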

Assuming your corpus of documents is about RapidMiner Studio reviews and you have TF-IDF set as your word vector creation, it will likely give "is_the" a very low value and "RapidMiner_Studio" and "the_best" higher values. Of course, if you have stemming, filtering, and pruning set, it might just drop "is_the" out completely, and that's probably what's happening with your process.
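As a rough illustration of how filtering can make pairings disappear, here is a small Python sketch that removes stopwords before building the bigrams. The stopword set and the helper name `make_bigrams` are hypothetical, and this only approximates the effect; it is not how RapidMiner implements its operators.

```python
# Hypothetical stopword list, chosen just for this example.
STOPWORDS = {"is", "the"}

def make_bigrams(tokens, sep="_"):
    """Join each pair of adjacent tokens with sep (hypothetical helper)."""
    return [sep.join(pair) for pair in zip(tokens, tokens[1:])]

tokens = "RapidMiner Studio is the best".split()

# Bigrams without any filtering.
unfiltered = make_bigrams(tokens)

# Drop stopwords first, then pair up whatever tokens are left.
kept_tokens = [t for t in tokens if t.lower() not in STOPWORDS]
filtered = make_bigrams(kept_tokens)

print(unfiltered)  # ['RapidMiner_Studio', 'Studio_is', 'is_the', 'the_best']
print(filtered)    # ['RapidMiner_Studio', 'Studio_best']
```

In this sketch, once "is" and "the" are filtered out, no bigram can contain them, so a frequent word that is usually surrounded by stopwords may end up with few or no groupings.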

Fred12 · October 2016

Well, inside the Process Documents operator I had tokenize, stemming, stopwords and the n-gram operator, so this might have been the cause...
