How can I calculate the frequency of specific words for each row in Excel data

psyduck Member Posts: 1 Newbie
edited December 2018 in Help
I'm working with data in which each sentence is in a separate row. For each row, I want to count how often the words from a word list I have created occur. Then I would like to add these counts to my dataframe as a new variable.

For example:
Let's say my word list (my dictionary) contains apple and banana, and I have independent sentences in rows like this:
1. X x x apple x x banana x apple.
2. X apple x x x x.
3. X x banana x apple x.
Now I want to count, separately for each row, how many times the words in my list occur. The new column I want to create would be:
1. = 3
2. = 1
3. = 2
Thanks in advance.


  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 916   Unicorn
    If I understand your question, this is pretty straightforward in RapidMiner. Process your text data with the "Process Documents from Data" operator, which accepts both a predefined word list and your data source as inputs. Inside it, use Tokenize to split your text into words, and set the word vector option to "term occurrences". The output will contain a new attribute (column) for each word in your word list, holding the number of times that word occurs in each text (each text is its own row, or example).
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
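
For readers who want to check the expected numbers outside RapidMiner, here is a minimal Python sketch of the same idea: tokenize each row, count occurrences of the word-list terms, and sum them into one total per row. It is not the RapidMiner process itself. The function name `term_occurrences`, the lowercasing step, and splitting on non-letter characters are assumptions made for this illustration; the sentences and word list are the ones from the question.

```python
import re
from collections import Counter


def term_occurrences(text, wordlist):
    """Return {word: count} for each word in `wordlist` found in `text`.

    Hypothetical helper for illustration only.
    """
    # Assumption: split on anything that is not a letter and ignore case.
    tokens = [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]
    counts = Counter(tokens)
    # A Counter returns 0 for words that never occur in the text.
    return {word: counts[word.lower()] for word in wordlist}


wordlist = ["apple", "banana"]
sentences = [
    "X x x apple x x banana x apple.",
    "X apple x x x x.",
    "X x banana x apple x.",
]

rows = []
for sentence in sentences:
    per_word = term_occurrences(sentence, wordlist)
    # One column per word, plus the per-row total asked for in the question.
    rows.append({"text": sentence, **per_word, "total": sum(per_word.values())})

for row in rows:
    print(row["total"], row["apple"], row["banana"], row["text"])
```

The per-word entries correspond to the one-attribute-per-word output described above, and `total` is the single new column from the question (3, 1, and 2 for the three example rows).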