Question to get the date out of a document

BadBoy20 Member Posts: 5 Contributor II
edited November 2018 in Help
So I have PDF files (articles), and each of them has a date near the top of the page, though not at the very top. The date format is like 19 April 2012. I want to get the first date that shows up and set it as an attribute called "Mydate". Is that even possible in RapidMiner, and how would I go about doing that? Thank you.


    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,508 RM Data Scientist

    You probably need to use Read Document, Process Documents, and Keep Document Part, together with a clever regex. It is hard to say which regex without seeing the document itself.
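
As a rough illustration of the kind of regex meant here, the following is a minimal Python sketch, not a RapidMiner process. It assumes the PDF text has already been extracted to a plain string, that dates always look like "19 April 2012" (day, full English month name, four-digit year), and that the first valid match is the one wanted. The names `first_date`, `DATE_PATTERN`, and the sample text are made up for this example. The pattern uses only constructs that Java-style regex engines also accept, so it may serve as a starting point inside RapidMiner as well.

```python
import re
from datetime import date

# Full English month names, in calendar order (assumed date format: "19 April 2012").
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Day (1-2 digits), month name, 4-digit year, separated by any whitespace
# (including line breaks, which PDF text extraction often introduces).
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b"
)


def first_date(text):
    """Return the first valid 'D Month YYYY' date in text, or None if there is none."""
    for match in DATE_PATTERN.finditer(text):
        day, month_name, year = match.groups()
        try:
            return date(int(year), MONTHS.index(month_name) + 1, int(day))
        except ValueError:
            # Looks like a date but is not a real one (e.g. "31 February 2012"); keep looking.
            continue
    return None


# Hypothetical extracted text, standing in for the top of one article.
sample = "The Daily Example\nPublished 19 April 2012\nUpdated 3 May 2012 ..."
mydate = first_date(sample)
print(mydate)  # 2012-04-19
```

Matching on the month name rather than on digits alone keeps stray numbers near the top of the page (page numbers, issue numbers) from being picked up as the date.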

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
    JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    This sounds pretty similar to this post from a few days back. Could you rework the process from that one?
