Hi! Question on data extraction steps for Word & PDFs

pimlico35 Member Posts: 4 Newbie
Hi folks,
I'm new to this software and trying to figure out some basics... ;)

I have Word and PDF files (reports from various companies), and I want to search them for keywords (there are about 20 I'm interested in) to find out how often each one occurs. Ideally, I'd like to search the documents and pull the results into a spreadsheet. It's very basic, but I can't figure out how to do it... ;(

I've put the docs into the folder and tried to extract the data, but then I get lost, as I'm not sure what to do next. If there's a quick step-by-step guide, that would be great. Apologies if this has been asked before, but I couldn't find it.

Many thanks!

Best Answer

    pimlico35 Member Posts: 4 Newbie
    Solution Accepted
    Thanks Martin - I will try that now. I'm just trying to find my way around the operators and what the steps are to get it to work!



    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Did you try the Read Office operator?

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
    pimlico35 Member Posts: 4 Newbie
    Thanks - I needed to get the extension; works great now! I just need to figure out how to extract the keywords and their frequencies from the documents into a table... :)
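
For anyone landing on this thread at the same step: the thread itself does not show the RapidMiner process for the counting part, so what follows is only a rough sketch of the idea in plain Python, not the RapidMiner way of doing it. It assumes the text has already been extracted from each Word or PDF file into a string. The file names, sample text, keywords, and the helper names `count_keywords`, `build_table`, and `write_csv` are all made up for illustration.

```python
import csv
import re
from collections import Counter


def count_keywords(text, keywords):
    """Count case-insensitive, whole-word occurrences of each keyword."""
    counts = Counter()
    for kw in keywords:
        # \b keeps "risk" from matching inside "brisk" or "risky".
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def build_table(documents, keywords):
    """Build one row per document and one column per keyword.

    `documents` maps a document name to its already-extracted text.
    The first row returned is the header.
    """
    rows = [["document"] + list(keywords)]
    for name, text in documents.items():
        counts = count_keywords(text, keywords)
        rows.append([name] + [counts[kw] for kw in keywords])
    return rows


def write_csv(rows, path):
    """Write the table to a CSV file that a spreadsheet program can open."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Hypothetical sample data standing in for the extracted report text.
documents = {
    "company_a.docx": "Revenue grew. Risk remained low; revenue targets were met.",
    "company_b.pdf": "Risk factors include market risk and credit risk.",
}
keywords = ["revenue", "risk"]

table = build_table(documents, keywords)
for row in table:
    print(row)
```

Calling `write_csv(table, "keyword_counts.csv")` afterwards would save the same table to a file (the file name is again just an example). Whole-word matching is a design choice here; if variants such as "risks" should also count, the pattern would need to be loosened.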