Filtering ExampleSet Keywords

Mireille Member Posts: 3 Contributor I
edited December 2018 in Help

Hi!

I have built a process around the "Process Documents from Files" operator, with Tokenize, Filter Stopwords, Filter Tokens, Transform Cases, Create n-Grams and Stem inside it. I also selected TF-IDF for vector creation. Since I am trying to find keywords across dozens of documents, the resulting ExampleSet has over 5000 columns. Does anyone know how I could filter these results down to the top 100 or so most relevant keywords?

Alternatively, is there a way to visualize all the keywords graphically, so that the most important ones become obvious?

Any help would be greatly appreciated :)

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Connect a WordList to Data operator to the wor (word list) output port of your Process Documents from Files operator, then use a Sort operator to sort the result in descending order. You will get an ExampleSet with the most frequent words at the top.

  • Mireille Member Posts: 3 Contributor I

    Thank you very much for your help!

    I am just a little confused about how to use the Sort operator: it asks me which attribute to sort by, but each word is listed as a separate attribute.

    Thanks again!

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Sort by Total. 
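
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of what the word list plus a descending sort on the total produces. It is only an illustration, not RapidMiner's implementation: the function name `top_words_by_total`, the sample documents, and the tuple layout (word, total occurrences, number of documents containing the word) are assumptions made for this example.

```python
from collections import Counter


def top_words_by_total(tokenized_docs, n=100):
    """Return the n most frequent tokens as (word, total, in_documents) tuples.

    Hypothetical helper: mimics a word list sorted by total occurrences.
    """
    total = Counter()    # occurrences of each word across all documents
    in_docs = Counter()  # number of documents that contain each word
    for tokens in tokenized_docs:
        total.update(tokens)
        in_docs.update(set(tokens))
    # Sort by total, descending; break ties alphabetically for stable output.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, count, in_docs[word]) for word, count in ranked[:n]]


# Made-up, already tokenized documents.
docs = [
    ["data", "mining", "data", "model"],
    ["model", "training", "data"],
    ["text", "mining"],
]
top = top_words_by_total(docs, n=3)
print(top)  # [('data', 3, 2), ('mining', 2, 2), ('model', 2, 2)]
```

Each word becomes one row rather than one column, which is why a single sort attribute is enough to bring the most frequent words to the top.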

  • Mireille Member Posts: 3 Contributor I

    Thank you,

    Is there any way to order words by their TF-IDF weighted value rather than their frequency?
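
The thread does not answer this question. One possible approach, sketched here in Python rather than as a RapidMiner process, is to add up each word's TF-IDF weight over all documents and sort the words by that sum. The sketch assumes a plain formulation (term frequency as count divided by document length, inverse document frequency as log(N / df)); RapidMiner's own TF-IDF vectors may be normalized differently, so the scores are illustrative only. The function name `rank_terms_by_tfidf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def rank_terms_by_tfidf(tokenized_docs, n=100):
    """Return up to n (word, summed_tfidf) pairs, highest score first.

    Hypothetical helper using tf = count / doc_length and idf = log(N / df).
    """
    n_docs = len(tokenized_docs)
    df = Counter()  # document frequency of each word
    for tokens in tokenized_docs:
        df.update(set(tokens))

    scores = Counter()  # TF-IDF weight summed over all documents
    for tokens in tokenized_docs:
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[word])
            scores[word] += tf * idf

    # Sort by score, descending; break ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Made-up, already tokenized documents.
docs = [
    ["data", "mining", "data", "model"],
    ["model", "training", "data"],
    ["text", "mining"],
]
ranked = rank_terms_by_tfidf(docs, n=3)
print([word for word, _ in ranked])  # ['text', 'training', 'data']
```

Note how the order differs from a frequency ranking: "text" and "training" each occur only once, but in a single document, so their higher inverse document frequency puts them ahead of "data".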
