"Dictionary based analysis"

mlctu Member Posts: 1 Learner III
edited May 2019 in Help
Hi!

Is there a way to use RapidMiner to perform dictionary-based analysis of document collections?

In particular, I'm interested in term frequency and other statistics on term occurrences in the documents, where the terms of interest are supplied by the user and already grouped into one or more user-defined category lists (dictionaries).

Thanks for your help!
Giulio

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Yes, this is easily possible with the Text Processing Extension. You can use dictionary-based filtering to remove all words that are not of interest.

    Alternatively, you could first count all words and then post-process the resulting word list using the "WordList to Data" operator.

    Greetings,
      Sebastian
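
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the same two-step logic: count all words per document, then keep only the terms found in the user-defined dictionaries and aggregate them per category. It does not use or reproduce any RapidMiner operator. The names `DICTIONARIES`, `tokenize`, and `dictionary_counts` and the sample documents are hypothetical, and the sketch assumes single-word terms and simple lowercase matching.

```python
import re
from collections import Counter

# Hypothetical user-defined dictionaries: category name -> terms of interest.
DICTIONARIES = {
    "positive": {"good", "great", "excellent"},
    "negative": {"bad", "poor"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def dictionary_counts(documents, dictionaries):
    """Return per-document term and category counts for dictionary terms only."""
    # Invert the dictionaries so each term maps to the categories it belongs to.
    term_to_categories = {}
    for category, terms in dictionaries.items():
        for term in terms:
            term_to_categories.setdefault(term.lower(), set()).add(category)

    rows = []
    for doc in documents:
        # Step 1: count every word in the document.
        word_counts = Counter(tokenize(doc))
        # Step 2: keep only the words that appear in some dictionary.
        term_counts = {
            term: count
            for term, count in word_counts.items()
            if term in term_to_categories
        }
        # Step 3: aggregate the kept counts per category.
        category_counts = Counter()
        for term, count in term_counts.items():
            for category in term_to_categories[term]:
                category_counts[category] += count
        rows.append(
            {
                "terms": term_counts,
                "categories": dict(category_counts),
                "total_tokens": sum(word_counts.values()),
            }
        )
    return rows


# Made-up sample documents, for illustration only.
docs = [
    "The service was good, really good, but the food was bad.",
    "Excellent value and great staff.",
]
results = dictionary_counts(docs, DICTIONARIES)
for row in results:
    print(row)
```

Dividing a category count by `total_tokens` gives a relative frequency per document. Multi-word terms would need a different matching step, for example n-gram generation before the lookup.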