How can I calculate the frequency of specific words for each row in Excel data?

psyduck · November 2018

Hi,

I'm working with a dataset in which each sentence is in a separate row. I want to count how often the words from a word list I have created occur in each row. Then I would like to add these counts to my dataframe as a new variable.

For example:

Let's say I have a list of words that contains apple and banana (this is my dictionary), and I have independent sentences in rows like this:

1. X x x apple x x banana x apple.

2. X apple x x x x.

3. X x banana x apple x.
.
..
...

Now I want to count, separately for each row, how many times the words in my list occur. The new column I want to create would then be:

1. = 3

2. = 1

3. = 2
.
..
...

Thanks in advance.

Telcontar120 · November 2018

If I understand your question, this is pretty straightforward in RapidMiner. Process your text data with the "Process Documents from Data" operator, which accepts both a predefined word list and your data source as inputs. Inside it, use Tokenize to split your text into words, and set the operator's vector creation parameter to "Term Occurrences". The output will contain a new attribute (column) for each word in your word list, holding the number of occurrences in each text (each text is its own row, or example). Adding those columns together gives the single total per row that you asked for.
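For anyone who wants to check the numbers outside RapidMiner, here is a rough Python sketch of the same idea: tokenize each row, count term occurrences for the words in the word list, then sum them into one value per row. It is not what the operator does internally. The names `WORD_LIST`, `ROWS`, `tokenize` and `term_occurrences` are made up for illustration, the rows are typed in by hand instead of being read from Excel, and lowercasing the text before matching is my own assumption.

```python
import re
from collections import Counter

# Hypothetical word list and rows, copied from the example in the question.
WORD_LIST = ["apple", "banana"]
ROWS = [
    "X x x apple x x banana x apple.",
    "X apple x x x x.",
    "X x banana x apple x.",
]


def tokenize(text):
    """Lowercase the text and split it into runs of the letters a-z.

    Punctuation, digits and non-ASCII letters are dropped (an assumption
    that is good enough for the example rows).
    """
    return re.findall(r"[a-z]+", text.lower())


def term_occurrences(text, word_list):
    """Return {word: count} for every word in word_list, zeros included."""
    counts = Counter(tokenize(text))
    return {word: counts[word] for word in word_list}


# One dict per row: the equivalent of one new column per dictionary word.
per_word = [term_occurrences(row, WORD_LIST) for row in ROWS]

# One number per row: the single new column asked for in the question.
totals = [sum(counts.values()) for counts in per_word]

for number, (counts, total) in enumerate(zip(per_word, totals), start=1):
    print(number, counts, total)
```

For the three example rows this gives totals of 3, 1 and 2, matching the column described in the question. Because the match is on whole tokens, a word like "pineapple" would not be counted as "apple".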


Altair RapidMiner Community
