Finding most common words in text attribute

rickluger Member Posts: 1 Contributor I
edited August 2020 in Help

Hello all,

 

This is my first post on this forum, though I have been using RapidMiner for some time now, so hi to all of you! I hope you can help me out with a problem that I just can't seem to solve.

 

I want to get a list (like a top 10 or a top 20) of the most common words throughout a text attribute. I have already performed the basics (Nominal to Text, Process Documents, Tokenize, Filter Stopwords) and even developed some prediction models, but I can't find any operator that will show me the words that occur most often across the dataset (or better yet, the most common words per label). Can anyone help?

 

Thank you so much in advance. Regards, Rick

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @rickluger Have you tried connecting the WOR port from a Process Documents operator to RES? That'll give you what you seek. 

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    And if you want to do anything other than look at the wordlist, try the Wordlist to Data operator on that output. This will turn the wordlist into a normal exampleset so you can do things like filter, generate data visualizations, and anything else you might like.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
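
For anyone who wants to sanity-check the wordlist counts outside RapidMiner, here is a minimal Python sketch of the same idea: tokenize each text, drop stopwords, then count words overall and per label. It is not how the RapidMiner operators are implemented; the function names, the tiny stopword list, and the sample rows are all made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "it", "this", "was"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(rows, n=10):
    """Count words in (label, text) pairs.

    Returns a tuple: the n most common words overall, and a dict mapping
    each label to its own n most common words.
    """
    overall = Counter()
    per_label = defaultdict(Counter)
    for label, text in rows:
        tokens = [t for t in tokenize(text) if t not in STOPWORDS]
        overall.update(tokens)
        per_label[label].update(tokens)
    by_label = {lbl: counts.most_common(n) for lbl, counts in per_label.items()}
    return overall.most_common(n), by_label


# Made-up example data: (label, text) pairs standing in for the dataset.
rows = [
    ("pos", "The service was great and the food was great"),
    ("neg", "The food was cold"),
    ("pos", "Great food"),
]

overall, by_label = top_words(rows, n=2)
print(overall)
print(by_label)
```

Raising `n` to 10 or 20 gives the "top 10 / top 20" list from the original question, and `by_label` holds the per-label variant.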
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    and welcome to the community! :)

     

    Scott

     
