How to filter text records out based on a wordlist
I am using the "Filter Stopwords" operator to filter out words that match a list. Is there also a function in RapidMiner that works the other way around, for example one that selects only the records whose text contains a word that is on a list?
I am only interested in text containing certain words, and I am looking for a way to select these records.
Does anybody know how to do this?
Try a Filter Documents or Filter Content operator. Both operators have an "Invert Condition" parameter that lets you select by the filter words rather than exclude them. Or you can use a WordList to Data operator and then run a generic Filter Examples on the result. There are a few ways to go about it, I believe.
Sorry for my late response. I looked at your suggestions and they will probably work. At the moment I use the "Cut Document" operator to cut reviews into sentences. I can use the "Filter Examples" operator to select the sentences containing certain keywords. The problem is that I have a huge list of keywords, a couple of thousand of them.
I could manually enter the keywords in the "Filter Examples" operator using the custom filter, but I hope there is an easier way, for example using the keyword list itself to select the sentences containing these keywords.
You could use macros and a loop to iterate over the wordlist and automatically drop each word into the custom filter parameter of Filter Examples. I've done that before and it works well.
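In case it helps to see the logic spelled out, here is a rough Python sketch of the same idea, outside RapidMiner: cut each review into sentences, then keep only the sentences that contain at least one word from the keyword list. It is an illustration rather than a RapidMiner process. The function names, the sample keywords, and the sample reviews are all made up for this example, and the sentence splitter is deliberately naive.

```python
import re


def split_sentences(review):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]


def build_keyword_pattern(keywords):
    """Collapse the whole keyword list into one case-insensitive, whole-word regex."""
    # Deduplicate, drop blanks, and put longer keywords first in the alternation.
    cleaned = sorted({k.strip() for k in keywords if k.strip()}, key=len, reverse=True)
    if not cleaned:
        return None
    alternation = "|".join(re.escape(k) for k in cleaned)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def filter_sentences(reviews, keywords):
    """Return every sentence that contains at least one keyword."""
    pattern = build_keyword_pattern(keywords)
    if pattern is None:
        return []
    kept = []
    for review in reviews:
        for sentence in split_sentences(review):
            if pattern.search(sentence):
                kept.append(sentence)
    return kept


# Hypothetical sample data; a real list would be read from a file, one keyword per line.
keywords = ["battery", "screen", "price"]
reviews = [
    "The battery lasts two days. Shipping was slow.",
    "Great Screen! I would buy it again.",
]

selected = filter_sentences(reviews, keywords)
print(selected)
```

Because the keywords are combined into a single pattern, a list of a few thousand entries needs no manual typing. Matching is on whole words, so "price" would not match "priceless" in this sketch.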