RapidMiner

how to remove rows containing a particular string/word from an excel file?

Contributor

Hi, I want to delete the rows that contain a specific word from an Excel file and get the output without those rows. I am using RapidMiner version 5.0.13 and have only started using it recently. Can anyone suggest how to go about it and which operators to choose?
I have read about the "Filter Examples" operator. Given an Excel file in .xls format, what is the best way to get the output without the rows containing a particular word? Please reply.
1 REPLY
Super Contributor

Re: how to remove rows containing a particular string/word from an excel file?

You have already imported the data with the Read Excel operator, right? Then just add a Filter Examples operator after it. In RapidMiner 5 it can filter on one column: select attribute_value_filter as the condition_class and set the parameter_string to

column1 != .*badWord.*

will keep all rows where column1 does not contain the string "badWord".
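
To try the pattern outside RapidMiner first, here is a minimal Python sketch of the same idea. It assumes the filter compares the pattern against the whole cell value, which the `.*` on both sides suggests. The sample rows are made up, and RapidMiner itself is Java-based, so its regex dialect may differ in small details.

```python
import re

# Hypothetical sample values standing in for column1 of the Excel sheet.
rows = [
    "good value",
    "contains badWord here",
    "badWordy suffix",
    "another good one",
]

# Same pattern as in the filter; fullmatch requires it to cover the whole value.
CONTAINS_BAD_WORD = re.compile(r".*badWord.*")

def keep_row(value):
    """Mimic 'column1 != .*badWord.*': keep the row when the pattern does not match."""
    return CONTAINS_BAD_WORD.fullmatch(value) is None

kept = [value for value in rows if keep_row(value)]
print(kept)
```

Note that "badWordy suffix" is dropped as well, because the pattern matches any value that merely contains the string.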

To match only whole words, your filter should look like this:

column1 != ^(.+\s)*badWord(\s.*)*$

The cryptic syntax used here is a regular expression :) Google for that term to get more information.
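
If you want to see what the whole-word pattern does and does not catch, this is a rough Python sketch. It again assumes whole-value matching, and the sample strings are invented for illustration. Python's `re` module is only an approximation of the Java regex engine behind RapidMiner.

```python
import re

# The whole-word pattern from the filter: badWord must be delimited by
# whitespace or by the start/end of the value.
WHOLE_WORD = re.compile(r"^(.+\s)*badWord(\s.*)*$")

def is_removed(value):
    """A row is removed by the '!=' filter when the pattern matches its value."""
    return WHOLE_WORD.fullmatch(value) is not None

# Hypothetical cell values covering the interesting cases.
samples = [
    "badWord",               # the word alone
    "this is badWord here",  # surrounded by whitespace
    "badWords are fine",     # longer word starting with badWord
    "notbadWord",            # longer word ending with badWord
    "badWord, indeed",       # followed by punctuation, not whitespace
]

for value in samples:
    print(repr(value), "removed" if is_removed(value) else "kept")
```

Only the first two samples are removed. Because the pattern looks for whitespace as the delimiter, a value like "badWord, indeed" is kept; if punctuation should count as a word boundary too, the pattern needs to be adapted.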

Best regards,
Marius