How to check if some specific indicators are mentioned in a set of business reports?

TENA Member Posts: 1 Newbie

We are analysing some business annual reports (13 reports in PDF format). We are new to RapidMiner, but thanks to the training resources and the answers in the community we managed to run a cluster analysis on the parts of the annual reports we are interested in. For that analysis we used the Process Documents from Files operator to extract the words, which are then used by the clustering operator.
Now we are interested in a different analysis. We do not want RapidMiner to extract the list of words from the reports, because we already have a given word list: we want to check whether each indicator (word) in that list is mentioned in the business reports or not. However, I have not found any example showing how to build a process that produces this result. I would be very grateful if you could help me with an example or an indication of which operators to use.
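Outside RapidMiner, the check described above boils down to a whole-word, case-insensitive search for each indicator in the text of each report. Here is a minimal Python sketch, assuming the PDF text has already been extracted into plain strings. The function names `indicator_pattern` and `indicator_presence` and the sample data are hypothetical and are not part of any RapidMiner process.

```python
import re


def indicator_pattern(indicator):
    """Compile a case-insensitive, whole-word pattern for one indicator.

    Words of a multi-word indicator may be separated by any whitespace,
    including the line breaks that PDF text extraction often introduces.
    """
    words = [re.escape(w) for w in indicator.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def indicator_presence(reports, indicators):
    """Return {report_name: {indicator: True/False}} for each report text."""
    patterns = {ind: indicator_pattern(ind) for ind in indicators}
    return {
        name: {ind: pat.search(text) is not None for ind, pat in patterns.items()}
        for name, text in reports.items()
    }


if __name__ == "__main__":
    # Hypothetical sample data standing in for the text of each PDF report.
    reports = {
        "report_2019": "Revenue grew by 5%. Our carbon\nemissions fell.",
        "report_2020": "We expanded the workforce and improved EBITDA.",
    }
    indicators = ["revenue", "carbon emissions", "EBITDA"]
    for name, found in indicator_presence(reports, indicators).items():
        print(name, found)
```

The `\b` anchors mean "revenues" does not count as a mention of "revenue"; drop them if plural or inflected forms should count.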


    lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @TENA,

    I worked on a similar project some months ago.
    The process in the attached file extracts the sentence(s) of each report in which the keyword(s) appear.
    To run the process in the attached file, you will need to:
     - install Python on your computer;
     - install the Python Scripting extension.
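For readers who cannot open the attachment, here is a minimal Python sketch of sentence-level keyword extraction. It is not the attached process. It assumes the report text is already available as a plain string and uses a naive splitter that breaks after ".", "!" or "?" followed by whitespace, so abbreviations such as "e.g." will split sentences incorrectly. The names `keyword_pattern` and `sentences_with_keywords` are hypothetical.

```python
import re

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def keyword_pattern(keyword):
    """Case-insensitive, whole-word pattern; words may be split by any whitespace."""
    words = [re.escape(w) for w in keyword.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def sentences_with_keywords(text, keywords):
    """Return (keyword, sentence) pairs for every sentence that mentions a keyword."""
    patterns = [(kw, keyword_pattern(kw)) for kw in keywords]
    hits = []
    for sentence in SENTENCE_BOUNDARY.split(text.strip()):
        # Collapse internal line breaks left over from PDF extraction.
        sentence = " ".join(sentence.split())
        for kw, pat in patterns:
            if pat.search(sentence):
                hits.append((kw, sentence))
    return hits


if __name__ == "__main__":
    # Hypothetical excerpt standing in for the text of one annual report.
    report = (
        "Revenue grew by 5%. Headcount was stable.\n"
        "Carbon emissions fell by 10%."
    )
    for keyword, sentence in sentences_with_keywords(report, ["revenue", "carbon emissions"]):
        print(f"{keyword}: {sentence}")
```

A sentence that mentions several keywords appears once per keyword, which makes it easy to count mentions per indicator afterwards.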

    If this process does not fit your use case, please provide at least two representative PDF reports and a list of indicators (words).

