
Text mining problem: keyword search and customized tokenization

MasseAlarm (Member, Posts: 1, Learner I)
Dear RapidMiner Community,

For a university project I have to evaluate about 900 business reports, and I want to do this with RapidMiner. Unfortunately, I am still a complete beginner with the software and need your help.
I have installed the Text Processing extension for RapidMiner.

The problem:
I need to search the reports for 120 specified keywords. Whenever one of these keywords occurs, I must also extract the 20 words before it and the 20 words after it in order to understand the context.

My current state:
With "Tokenize" I can split the text into sentences, but how do I get exactly 20 words before and after the keyword?
With "Filter Tokens (by Content)" I can only get one of the 120 keywords displayed at a time. How do I make sure that all 120 keywords are searched for in a single run?
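
For illustration, the two requirements (match any of the 120 keywords, and keep 20 words on each side of a hit) can be sketched outside RapidMiner in plain Python. This is only a minimal sketch under stated assumptions: the function name `keyword_contexts` is hypothetical, keywords are single words matched case-insensitively, and a "word" is whatever the simple regular expression below matches. It does not use or represent any RapidMiner operator.

```python
import re


def keyword_contexts(text, keywords, window=20):
    """Return (keyword, context) pairs for every keyword hit in `text`.

    Assumptions: keywords are single words, matching is case-insensitive,
    and punctuation is dropped by the tokenizer.
    """
    # Word-level tokenization: alphanumeric runs, optionally joined by - or '
    tokens = re.findall(r"\w+(?:[-']\w+)*", text)
    # A set lets every token be checked against all keywords at once
    wanted = {k.lower() for k in keywords}

    hits = []
    for i, token in enumerate(tokens):
        if token.lower() in wanted:
            # Clamp the window at the start and end of the document
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            hits.append((token.lower(), " ".join(tokens[start:end])))
    return hits
```

Calling `keyword_contexts(report_text, keyword_list)` on each report would return one entry per occurrence, so a keyword that appears several times yields several contexts. Multi-word keywords would need a different matching step, which this sketch leaves out.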

I have been working on this for quite a while and have searched through many forum threads without finding a suitable solution. I hope you can help me. Thanks a lot!

Best regards

