"Form-Based Processing/Extraction Based on Templates"

thapli_64 Member Posts: 18  Maven
edited June 2019 in Help

Hi All,

I was wondering if there is any precedent (in RapidMiner) of processing and extracting text information from forms where we have the templates, using either a rule-based approach or possibly even Machine Learning? 

Best Answer

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959  Community Manager
    Solution Accepted
    hi @thapli_64 - wrote you a PM.

    Scott

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959  Community Manager

    hello @thapli_64 - so there are lots of ways to extract and process text from a variety of sources.  What format are your templates in?

     

    Scott

     

  • thapli_64 Member Posts: 18  Maven

    @sgenzer thanks for your response. I checked, and it seems we won't really have templates beforehand per se, but we'll have lots of documents (forms) that share a particular layout. We'll be performing OCR on them, so we'll have TXT and XML files. I guess those will serve as a starting point. I hope this clarifies things?

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959  Community Manager

    hi @thapli_64 - yes, that all makes sense and is all very doable.  You're going to want to use the Text Processing extension for your TXT files and the Read XML operator for your XML files.  Then you are off and running.  :)

     

    Scott

  • thapli_64 Member Posts: 18  Maven

    @sgenzer I've been trying to work with the Read XML operator and I'm running into serious issues. Can you point me to a tutorial or other resource that covers that operator so I can figure out if I'm working with it properly?

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959  Community Manager

    hmm that's a good question.  I actually don't know of a Read XML tutorial.  Perhaps you can post the xml file you're trying to read, along with your process XML, and I can take a look?

     

    Scott

     

  • thapli_64 Member Posts: 18  Maven

    @sgenzer That's the trouble I've been having. I have been very successful finding resources on almost every question I've had, but Read XML has been particularly vexing. Is there any place we can put in a request for this? Maybe it could also be covered in an upcoming blog post, webinar or office hours?

     

    My company is an RM customer. Is there someone we can reach out to for one on one help with this?

     

    Unfortunately, the XML files contain sensitive data so I can't share them. I will try to see if I can create or procure dummy files with fake data.

  • thapli_64 Member Posts: 18  Maven

    thanks Scott
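
For readers who land on this thread with the same problem: the rule-based idea discussed above (forms with a fixed layout, OCR output as XML) can be prototyped outside RapidMiner before building a Read XML process. The sketch below is not a RapidMiner process and does not show the Read XML operator's configuration. It only illustrates the general pattern of picking one XPath for the repeating element (one row per form page) and one XPath per field. The XML layout, the element and attribute names, and the `TEMPLATE` and `extract_rows` names are all hypothetical, since no real file was shared in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical OCR output. The real files in the thread were never shared,
# so every element and attribute name here is invented for illustration.
SAMPLE_XML = """
<document>
  <page number="1">
    <field name="invoice_no">INV-001</field>
    <field name="total">42.50</field>
  </page>
  <page number="2">
    <field name="invoice_no">INV-002</field>
    <field name="total">17.00</field>
  </page>
</document>
"""

# Hypothetical "template": output column -> XPath relative to each row element.
TEMPLATE = {
    "invoice_no": "field[@name='invoice_no']",
    "total": "field[@name='total']",
}


def extract_rows(xml_text, row_xpath, template):
    """Return one dict per element matched by row_xpath, one key per template field."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for node in root.findall(row_xpath):
        row = {}
        for column, xpath in template.items():
            match = node.find(xpath)
            # A field missing from a form becomes None instead of raising an error.
            has_text = match is not None and match.text is not None
            row[column] = match.text.strip() if has_text else None
        rows.append(row)
    return rows


rows = extract_rows(SAMPLE_XML, "page", TEMPLATE)
for row in rows:
    print(row)
```

Keeping the field-to-XPath mapping in one dictionary means a new form layout only needs a new template, not new code. The same XPath expressions are a reasonable starting point when working out what to select in an XML import step, though the exact settings there should be checked against the operator's own documentation.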
