How can I plot the frequency of words?

LindsayKelevra Member Posts: 5 Learner I
edited June 2020 in Help

Hello everyone!

I'm trying to use the Generate Gaussian operator to plot the frequency of words, but its output is very different from the results I calculated manually. I need this to decide which values to discard through pruning. What formula does RapidMiner use to create the Gaussian?

Thank you.

Answers

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Are you expecting your word frequencies to follow a normal distribution? It's not clear that a normal distribution is the best a priori model for word frequencies; that depends on the type of text.
    I'm also not clear on how conformity to a hypothetical, pure statistical distribution would affect pruning. You might be better off simply setting pruning thresholds by frequency or by percentage at a few different levels and seeing which words are dropped as a consequence. Typically, having a lot of words with only a handful of occurrences does nothing for model performance but can lead to large datasets and long runtimes.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
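
The suggestion above, trying pruning thresholds at a few levels and looking at which words are dropped, can be tried outside RapidMiner with a minimal Python sketch. This is an illustration only and does not reproduce RapidMiner's own pruning: the whitespace tokenizer, the sample `docs`, and the helper names `document_frequencies` and `prune` are all assumptions made for the example.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each word, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        df.update(set(doc.lower().split()))
    return df


def prune(df, n_docs, min_count=None, min_fraction=None):
    """Split the vocabulary into (kept, dropped) by a minimum document frequency.

    Pass either an absolute threshold (min_count) or a fraction of the
    number of documents (min_fraction).
    """
    if min_count is None and min_fraction is None:
        raise ValueError("pass min_count or min_fraction")
    threshold = min_count if min_count is not None else min_fraction * n_docs
    kept = sorted(w for w, c in df.items() if c >= threshold)
    dropped = sorted(w for w, c in df.items() if c < threshold)
    return kept, dropped


# Hypothetical toy corpus, only for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "a cat and a dog",
    "the bird flew away",
]

df = document_frequencies(docs)
for min_count in (2, 3):
    kept, dropped = prune(df, len(docs), min_count=min_count)
    print(f"min_count={min_count}: kept={kept} dropped={dropped}")
```

Running the loop at several thresholds makes it easy to check by eye whether the dropped words are ones you can afford to lose, without assuming any particular distribution for the frequencies.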