Hi! Question on data extraction steps for Word & PDFs

pimlico35 Member Posts: 4 Newbie
Hi folks,
I'm new to this software and trying to figure out some basics. ;)

I have Word and PDF files (reports from various companies), and I want to search them for keywords (there are about 20 I'm interested in) to find out how often each one occurs. Ideally, I'd like to search the documents and pull the results into a spreadsheet. It's very basic, but I can't figure out how to do it. ;(

I've put the docs into a folder and tried to extract the data, but then I get lost, as I'm not sure what to do next. If there's a quick step-by-step guide, that would be great. Apologies if this has been asked before, but I couldn't find it.

Many thanks!

Best Answer

  • pimlico35 Member Posts: 4 Newbie
    Solution Accepted
    Thanks Martin - I will try that now. I'm just trying to find my way around the operators and work out which steps are needed to get it to work!


Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,120  RM Data Scientist
    Hi,
    Did you try the Read Office operator?

    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • pimlico35 Member Posts: 4 Newbie
    Thanks - I needed to get the extension; it works great now! I just need to figure out how to extract the keywords and their frequencies from the documents into a table. :)
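
For anyone who wants to check the remaining step outside RapidMiner, here is a minimal sketch in plain Python of the keyword-frequency table described above. It assumes the text has already been extracted from the Word and PDF files into plain strings; the document names, sample texts, keyword list, and helper functions (`keyword_counts`, `frequency_table`, `to_csv`) are all hypothetical and only illustrate the idea. It is not a RapidMiner process.

```python
import csv
import io
import re
from collections import Counter


def keyword_counts(text, keywords):
    """Count case-insensitive whole-word occurrences of each keyword in text."""
    counts = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        counts[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def frequency_table(documents, keywords):
    """Build one row per document: [name, count of keyword 1, count of keyword 2, ...]."""
    rows = []
    for name, text in documents.items():
        counts = keyword_counts(text, keywords)
        rows.append([name] + [counts[kw] for kw in keywords])
    return rows


def to_csv(rows, keywords):
    """Render the table as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["document"] + list(keywords))
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data standing in for text extracted from the reports.
documents = {
    "report_a": "Revenue grew. Risk remained low; revenue outlook is stable.",
    "report_b": "Risk factors include supply risk and currency risk.",
}
keywords = ["revenue", "risk"]

rows = frequency_table(documents, keywords)
csv_text = to_csv(rows, keywords)
print(csv_text)
```

Each row is one document and each column is one keyword, so the output can be saved as a `.csv` file and opened directly in a spreadsheet. Whole-word matching is a design choice here; counting substrings or word stems would need a different pattern.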