Textmining Problem - Keyword search and customized tokenization

MasseAlarm · April 2019

Dear RapidMiner Community,

For a university project I have to evaluate about 900 business reports, and I want to do this in RapidMiner. Unfortunately, I'm still a complete beginner with the software and need your help.
I have installed the Text Processing extension for RapidMiner.

The problem:
I need to search the reports for 120 specified keywords. Whenever one of these keywords occurs, I must also extract the 20 words before and the 20 words after it in order to understand the context.

My current state:
With "Tokenize" I get sentence-level output, but how do I get exactly 20 words before and after the keyword?
With "Filter Tokens (by Content)" I can only get one of the 120 words displayed at a time. How do I make sure that all 120 words are taken into account at once?

I've been working on this for quite a while now and have searched through all kinds of forum entries without finding a suitable solution. I hope you can help me. Thanks a lot!

Best regards

sgenzer · April 2019

hi @MasseAlarm I would strongly recommend getting a foundation in RapidMiner before tackling this problem:

https://academy.rapidminer.com/learning-paths/get-started-with-rapidminer-and-machine-learning
https://academy.rapidminer.com/courses/text-and-web-mining-with-rapidminer

Scott
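
For anyone landing here with the same question: what is described above is usually called a keyword-in-context (KWIC) extraction. The following is only a minimal sketch of the idea in plain Python, outside RapidMiner, and not a RapidMiner process. It assumes single-word keywords, a simple regex word tokenizer, and case-insensitive matching; the function name `keyword_contexts` and the `window` parameter are made up for illustration.

```python
import re


def keyword_contexts(text, keywords, window=20):
    """Return (keyword, context) pairs for every keyword occurrence.

    Each context holds up to `window` words before and after the hit.
    Assumption: keywords are single words and matching ignores case.
    """
    tokens = re.findall(r"\w+", text)          # naive word tokenization
    lowered = [t.lower() for t in tokens]
    wanted = {k.lower() for k in keywords}     # all keywords checked in one pass

    hits = []
    for i, tok in enumerate(lowered):
        if tok in wanted:
            start = max(0, i - window)                 # clip at document start
            end = min(len(tokens), i + window + 1)     # clip at document end
            hits.append((tokens[i], " ".join(tokens[start:end])))
    return hits
```

Calling `keyword_contexts(report_text, my_120_keywords)` once per report would return every hit together with its surrounding words. Putting the keywords in a set is what lets all 120 be checked in a single pass over the tokens. Multi-word keywords would need a different matching step.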
