Text filtering problem! Please help!
Hey community,
I'm new to RapidMiner and I'm trying to filter multiple words from different PDF files.
First I tried to filter just one word after tokenizing the files with the "Filter Tokens (by content)" operator.
I used the condition "contains" and specified my string. This actually works fine.
Now I want to filter multiple words, but I just don't know how to do this.
Can you please help me? I would really appreciate it!
Background:
I'm trying to classify some documents using a word list of positive and negative words.
RapidMiner should analyse the given PDF files and count how many positive and negative words each one contains.
Any ideas?
Answers
You could use a word list to filter the document for those words only.
Here is an example that does more than you need.
http://rapidminernotes.blogspot.co.uk/2013/04/finding-needles-in-text-haystacks.html
You will need to adapt it to do what you want.
regards
Andrew
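
For anyone who wants to see the word-list idea outside of RapidMiner, here is a minimal Python sketch of the same logic: tokenize the text, keep only tokens that are in the word lists, then count positives and negatives. It is not the process from the linked blog post. The word lists, the function names and the "neutral" tie rule are all made up for illustration, and it assumes the PDF text has already been extracted to a plain string.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration; replace with your own.
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_tokens(tokens, wordlist):
    """Keep only the tokens that appear in the word list (exact match)."""
    return [t for t in tokens if t in wordlist]


def score_document(text):
    """Count positive and negative words and derive a simple label."""
    tokens = tokenize(text)
    kept = filter_tokens(tokens, POSITIVE_WORDS | NEGATIVE_WORDS)
    counts = Counter(kept)
    positive = sum(counts[w] for w in POSITIVE_WORDS)
    negative = sum(counts[w] for w in NEGATIVE_WORDS)
    # Assumed tie rule: equal counts are labelled "neutral".
    if positive > negative:
        label = "positive"
    elif negative > positive:
        label = "negative"
    else:
        label = "neutral"
    return {"positive": positive, "negative": negative, "label": label}


if __name__ == "__main__":
    sample = "The results were good, really great, but the layout was bad."
    print(score_document(sample))
```

Note that this matches whole tokens exactly, which is stricter than the "contains" condition mentioned in the question: "goodness" would not count as "good" here.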