Filtering ExampleSet Keywords
Hi!
I have created a process using the "Process Documents from Files" operator, and included Tokenize, Filter Stopwords, Filter Tokens, Transform Cases, Create n-Grams and Stem. I also selected TF-IDF for vector creation. Since I am trying to find keywords across dozens of documents, the result is an ExampleSet with over 5000 columns. Does anyone know how I could filter these results down to the top 100 or so most relevant keywords?
Alternatively, is there a way to visualize all the keywords graphically, so that the most important ones become obvious?
Any help would be greatly appreciated :)
Answers
Attach a "WordList to Data" operator to the WOR port of your Process Documents operator, then use a Sort operator to sort the result in descending order. You will get an ExampleSet with the most frequent words at the top.
Thank you very much for your help!
I am just a little confused about how to use the "Sort" operator: it asks me which attribute to sort by, but each word is listed as a separate attribute.
Thanks again!
Sort by Total.
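For anyone who wants to see the idea outside of RapidMiner, here is a minimal Python sketch of the same thing: count how often each token occurs across all documents, sort by that total in descending order, and keep the top N. This is not what the operators do internally; the function name `top_words_by_total` and the sample documents are made up for illustration, and the documents are assumed to be already tokenized.

```python
from collections import Counter


def top_words_by_total(documents, n=100):
    """Return the n most frequent tokens as (word, total) pairs, highest first.

    `documents` is assumed to be a list of token lists (already tokenized).
    """
    totals = Counter()
    for tokens in documents:
        totals.update(tokens)
    # Sort by total, descending; break ties alphabetically so the order is stable.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical sample data, just to show the shape of the result.
docs = [
    ["data", "mining", "text"],
    ["text", "mining", "text"],
    ["keyword", "text"],
]
top = top_words_by_total(docs, n=2)
print(top)
```

Each word ends up as one row with its total next to it, which is why sorting by a single "total" column works.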
Thank you,
Is there any way to order words by their TF-IDF weighted value rather than their frequency?
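One possible way to think about this, sketched in plain Python rather than as a RapidMiner process: compute a TF-IDF weight for every term in every document, add the weights up per term, and sort by that sum. The formula used here (term frequency divided by document length, times the natural log of documents over document frequency) is an assumption and may not match how RapidMiner normalizes its TF-IDF vectors, and the name `top_terms_by_tfidf` is hypothetical.

```python
import math
from collections import Counter


def top_terms_by_tfidf(documents, n=100):
    """Rank terms by their TF-IDF weight summed over all documents.

    `documents` is assumed to be a list of token lists (already tokenized).
    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    doc_count = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = Counter()
    for tokens in documents:
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(doc_count / doc_freq[term])
            scores[term] += tf * idf

    # Highest summed weight first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical sample data.
docs = [
    ["data", "mining", "text"],
    ["text", "mining", "text"],
    ["keyword", "text"],
]
ranked = top_terms_by_tfidf(docs)
for term, score in ranked:
    print(f"{term}\t{score:.3f}")
```

Note that with this weighting a term that appears in every document gets a weight of zero, so very common words drop to the bottom even if their raw frequency is the highest.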