How to filter text records out based on a wordlist

ArnoG Member Posts: 22 Contributor II
edited November 2018 in Help

I am using "Filter Stopwords" to filter out words that match a list. Is there also a function in RapidMiner that works the other way around, for example one that selects only the records whose text contains a word from a list?

I am only interested in texts containing certain words, and I am looking for a way to select just those records.

Does anybody know how to do this?

 

Arno

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Try a Filter Documents or Filter Content operator. Those two operators have an "Invert Condition" parameter that lets you select the filter words. Or you can use a WordList to Data operator and then do a generic Filter Examples on it. There are a few ways to go about it, I believe.

  • ArnoG Member Posts: 22 Contributor II

    Hi Thomas,

    Sorry for my late response. I looked at your suggestions and they will probably work. At the moment I use the "Cut Document" operator to cut reviews into sentences. I can use the "Filter Examples" operator to select the sentences containing certain keywords. The problem is that I have a huge list of keywords, a couple of thousand.

    I could manually enter the keywords in the "Filter Examples" operator using the custom filter, but I hope there is an easier way, for example using the keyword list itself to select the sentences containing these keywords.

     

    Regards,

     

    Arno
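
To make the goal concrete, here is a minimal Python sketch of the behaviour being asked for: keep only the sentences that contain at least one word from a keyword list. It runs outside RapidMiner and is not a RapidMiner operator; the function names `load_keywords` and `filter_sentences` and the sample data are hypothetical, chosen only for illustration.

```python
import re


def load_keywords(lines):
    """Normalize a wordlist: one keyword per line, lower-cased, blanks dropped."""
    return {line.strip().lower() for line in lines if line.strip()}


def filter_sentences(sentences, keywords):
    """Keep only sentences that contain at least one keyword as a whole word."""
    kept = []
    for sentence in sentences:
        # Split the sentence into lower-case word tokens.
        tokens = set(re.findall(r"[a-z0-9']+", sentence.lower()))
        # A non-empty intersection means at least one keyword occurs.
        if tokens & keywords:
            kept.append(sentence)
    return kept


# Hypothetical sample data standing in for the real wordlist and sentences.
keywords = load_keywords(["battery", "Screen", "", "price"])
sentences = [
    "The battery lasts two days.",
    "Delivery was quick.",
    "Great screen, fair price.",
]
matching = filter_sentences(sentences, keywords)
print(matching)
```

Storing the keywords in a set keeps each lookup cheap, so a list of a couple of thousand keywords adds little cost per sentence.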

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    You could use macros and a loop to iterate over the wordlist and automatically drop each word into the custom filter parameter of Filter Examples. I've done that before and it works well.
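
The loop idea can be sketched in Python as follows. This is only an analogy for the macro-and-loop process described above, not the actual RapidMiner process: the loop variable `word` plays the role of the macro, the substring test stands in for a "contains" filter condition, and the function name `loop_filter` and the sample data are hypothetical.

```python
def loop_filter(sentences, wordlist):
    """Apply one 'contains' filter per keyword and merge the matches."""
    matched = set()
    for word in wordlist:  # 'word' plays the role of the loop macro
        needle = word.lower()
        for index, sentence in enumerate(sentences):
            # Substring test, comparable to a 'contains' condition.
            if needle in sentence.lower():
                matched.add(index)
    # Return each matching sentence once, in its original order.
    return [sentences[i] for i in sorted(matched)]


# Hypothetical sample data.
sentences = [
    "The screen cracked.",
    "Fast shipping.",
    "Battery and screen are fine.",
]
selected = loop_filter(sentences, ["screen", "battery"])
print(selected)
```

Collecting indices in a set means a sentence that matches several keywords is returned only once. Because this is a substring test, a keyword such as "art" would also match "start"; whole-word matching needs tokenization or a word-boundary pattern instead.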
