RapidMiner

Text filtering problem! Please help!

Contributor

Hey community,

I'm new to RapidMiner and I'm trying to filter multiple words from different PDF files.

First I tried to filter just one word after tokenizing the files, using the "Filter Tokens (by content)" operator.
I used the condition "contains" and specified my string. This works fine.
Now I want to filter multiple words, but I don't know how to do this.

Can you please help me? I would really appreciate it!

Background:
I'm trying to classify some documents using a word list of positive and negative words.
RapidMiner should analyse the given PDF files with regard to the number of positive and negative words they contain.

Any ideas?
1 REPLY
Super Contributor

Re: Text filtering problem! Please help!

Hello

You could use a word list to filter the document for those words only.

Here is an example that does more than you need.

http://rapidminernotes.blogspot.co.uk/2013/04/finding-needles-in-text-haystacks.html

You will need to make some changes for what you want.
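
To make the word-list idea concrete, here is a minimal sketch in plain Python rather than in RapidMiner. It is not the process from the link above and it does not use any RapidMiner API. The word lists, the function names and the simple "more positive than negative" rule are all assumptions for illustration; your own lists and rule will differ, and the text is assumed to have been extracted from the PDF already.

```python
import re
from collections import Counter

# Hypothetical word lists; in practice you would load your own from a file.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_sentiment_words(text, positive=POSITIVE, negative=NEGATIVE):
    """Keep only tokens found in either word list and count them."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t in positive or t in negative]
    counts = Counter(kept)
    pos = sum(counts[w] for w in positive)
    neg = sum(counts[w] for w in negative)
    return pos, neg, counts


def classify(text):
    """Label a document by whichever word list matched more often (assumed rule)."""
    pos, neg, _ = count_sentiment_words(text)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    sample = "The service was great, but the food was bad. Really bad."
    print(count_sentiment_words(sample)[:2])
    print(classify(sample))
```

The point is only that filtering for many words is a membership test against a list, not a chain of single "contains" filters.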

regards

Andrew