How to filter tokens by total occurrences?

cc4699 Member Posts: 6 Contributor I
edited November 2018 in Help

The prune method in the "Process Documents" operator filters tokens by the number of documents they occur in. How can I filter them by "Total Occurrences" instead?


Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,235 Unicorn

    You should be able to review the output wordlist, sort it by total occurrences to identify the tokens you want to eliminate, and create a small text file containing those words. Then you can use Filter Stopwords (Dictionary) to suppress those tokens during document processing.

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • cc4699 Member Posts: 6 Contributor I

    Thanks for the reply. I see what you are saying; however, the problem is that I cannot filter on or otherwise use the total occurrences field. Sorting seems like an unnecessary step, since I need to filter out the tokens below a certain threshold.

     

    Sorting is also not working properly. Nothing is populated in the attribute list of the Sort operator, but if I enter total_occurrences manually, it works with a warning.
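For anyone who wants to automate the workaround described in the first answer (find the low-count tokens, put them in a text file, feed that file to Filter Stopwords (Dictionary)), here is a minimal Python sketch of the threshold step done outside RapidMiner. It is not RapidMiner functionality. It assumes the documents are already tokenized into lists of strings, and the function names (`total_occurrences`, `document_occurrences`, `tokens_below`, `write_stopword_file`) and the one-token-per-line file format are illustrative assumptions, not taken from the thread.

```python
from collections import Counter
from pathlib import Path


def total_occurrences(documents):
    """Count every occurrence of each token across all documents."""
    counts = Counter()
    for tokens in documents:
        counts.update(tokens)
    return counts


def document_occurrences(documents):
    """Count how many documents each token appears in (the basis of document-based pruning)."""
    counts = Counter()
    for tokens in documents:
        counts.update(set(tokens))  # a token counts at most once per document
    return counts


def tokens_below(counts, threshold):
    """Return the tokens whose count is strictly below `threshold`, alphabetically."""
    return sorted(token for token, n in counts.items() if n < threshold)


def write_stopword_file(tokens, path):
    """Write one token per line (assumed format for the stopword dictionary file)."""
    Path(path).write_text("\n".join(tokens) + "\n", encoding="utf-8")


# Hypothetical toy corpus, already tokenized.
docs = [["data", "mining", "data"], ["data", "text"], ["text", "rare"]]
totals = total_occurrences(docs)
print(tokens_below(totals, 3))                      # ['mining', 'rare', 'text']
print(tokens_below(document_occurrences(docs), 3))  # ['data', 'mining', 'rare', 'text']
```

Passing `tokens_below(totals, 3)` to `write_stopword_file` would produce the kind of text file the first answer describes. The two printed lists show why the distinction matters: "data" occurs three times in total but in only two documents, so a document-based threshold of 3 would prune it, while a total-occurrence threshold of 3 keeps it.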
