List of words that are filtered with Stopwords, Stemming and Tokenizing?

Jonas97 Member Posts: 2 Newbie

Is there a function in RapidMiner that I can use to create a list of the words, or a count of the words, that the process steps Filter Stopwords, Stemming and Tokenizing have identified and excluded from the analysis of the text corpus?

Thank you in advance!



    Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    I am not sure if there is a direct way to view this, but you could accomplish it in two passes. First, run your documents through with only tokenization. Then run them through a second time with tokenization plus the other text processing options you want (stopwords, stemming, etc.). Finally, take both resulting wordlist datasets and use Set Minus (join type) to get the non-matches. Keep in mind that words changed by stemming will also show up as non-matches, because their original forms no longer appear in the second wordlist.
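To illustrate the idea outside of RapidMiner, here is a minimal Python sketch of the same two-pass comparison. It is not RapidMiner code: the stopword list, the `naive_stem` helper and the sample sentence are all made up for illustration, and the suffix stripping is only a rough stand-in for a real stemmer.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "is", "a", "of", "and", "are"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def wordlist_difference(text):
    """Return words in the tokenize-only wordlist that are missing
    from the wordlist built with stopword filtering and stemming."""
    # Pass 1: tokenize only.
    tokens_only = set(tokenize(text))
    # Pass 2: tokenize, drop stopwords, then stem.
    processed = {naive_stem(t) for t in tokenize(text) if t not in STOPWORDS}
    # Set difference, analogous to Set Minus on the two wordlists.
    return sorted(tokens_only - processed)


text = "The miner is mining a list of words and the words are filtered"
removed = wordlist_difference(text)
print(removed)
```

The result contains both the stopwords that were dropped and the original forms of words that stemming changed (for example "words", which became "word"). If you only want the stopwords, compare against a second pass that filters stopwords but does not stem.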
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts