Question to get the date out of a document

BadBoy20 Member Posts: 5 Contributor I
edited November 2018 in Help
I have PDF files (articles), and each one has a date near the top of the page, though not at the very top. The date format is like 19 April 2012. I want to extract the first date that appears and set it as an attribute called "Mydate". Is that possible in RapidMiner, and how would I go about doing it? Thank you.

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,049  RM Data Scientist
    Hi,

    You probably need to use Read Document, Process Documents, and Keep Document Part, together with a clever regex. It is hard to say which regex without the document itself.

    Cheers,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 563   Unicorn
    This sounds very similar to a post from a few days back. Could you rework the process from that thread?

    rapid-i.com/rapidforum/index.php/topic,8874.msg29914.html
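
As a rough illustration of the kind of regex suggested above, here is a minimal Python sketch (not a RapidMiner process) that finds the first date written like 19 April 2012. It assumes full English month names, a one- or two-digit day, and a four-digit year; the names `DATE_RE` and `first_date` and the sample text are hypothetical and only for illustration.

```python
import re

# Assumption: dates use full English month names, e.g. "19 April 2012".
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# day (1-2 digits), whitespace, month name, whitespace, 4-digit year
DATE_RE = re.compile(r"\b(\d{1,2})\s+(" + MONTHS + r")\s+(\d{4})\b")


def first_date(text):
    """Return the first 'day Month year' date found in text, or None."""
    match = DATE_RE.search(text)
    return match.group(0) if match else None


# Hypothetical sample standing in for the text extracted from a PDF.
sample = "Some header text\nPublished 19 April 2012 by the editor. Updated 3 May 2013."
print(first_date(sample))
```

The pattern only uses constructs (`\b`, `\d`, `\s`, alternation) that Java regular expressions also support, so it should carry over to a regex field in RapidMiner, though the exact operator setup depends on the documents themselves.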