
Text mining on a specific section in PDF files

Mahmud_elabo Member Posts: 7 Learner I
edited January 2021 in Help
I want to do text mining on a specific section (for example, just the abstracts) of PDF files.
Can anyone help here, please?
Thanks so much in advance.

Answers

  • kayman Member Posts: 662 Unicorn
    The Read Document operator lets you read your PDF as text, so you can apply all of the text mining / NLP magic as if it were a text file.
  • Mahmud_elabo Member Posts: 7 Learner I
    edited January 2021
    @kayman I tried that, but as I mentioned, I have 200 PDF files and I need to do text mining on just a specific section, such as the abstracts or the introductions.
  • kayman Member Posts: 662 Unicorn
    Then you need to combine it with loop documents. Point it to the folder with your PDFs and extract the data you need, one file at a time, until you have processed all 200.

    So basically, first create a process that works for one PDF, then use it to loop through all your PDFs one by one. Whether it's 1, 20, 200 or 2000 PDFs makes no difference.

    You just have to decide whether you want the results combined in a collection or finalised within the loop process.
  • Mahmud_elabo Member Posts: 7 Learner I
    @kayman Thank you so much. I wonder whether there is any video or tutorial showing this process.
  • kayman Member Posts: 662 Unicorn
    Have you tried RapidMiner Academy? There is plenty of training there on NLP / text mining, and on loops. You may need to combine a few courses, but there is a ton of info there.

    YouTube also has some good material on text mining with RapidMiner.
  • Mahmud_elabo Member Posts: 7 Learner I
    @kayman Yes, I have tried the RapidMiner community, and I looked for this process on YouTube, but I did not find anything covering what I need.
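
The approach suggested in this thread (read each PDF as text, cut out the section you need, then repeat for every file) can also be sketched outside RapidMiner. The following is a minimal Python sketch, not a RapidMiner process. It assumes the PDFs have already been converted to UTF-8 `.txt` files (for example with Read Document or a PDF library), and `END_HEADINGS`, `extract_section` and `extract_from_folder` are hypothetical names chosen for illustration. It looks for a line that starts with the wanted heading and keeps everything up to the next known heading.

```python
import re
from pathlib import Path

# Assumed headings that typically follow an abstract; adjust to your papers.
END_HEADINGS = ("Keywords", "Index Terms", "Introduction")

# Optional section number before a heading, e.g. "1 ", "2. " or "I. ".
NUMBER = r"(?:(?:\d+|[IVX]+)\.?\s*)?"


def extract_section(text, start="Abstract", end_headings=END_HEADINGS):
    """Return the text between a start heading and the next end heading.

    Headings must begin a line (case-insensitive). Returns "" if the
    start heading is not found; returns the rest of the text if no end
    heading follows it.
    """
    start_match = re.search(rf"^\s*{NUMBER}{re.escape(start)}\b[:.]?", text,
                            flags=re.IGNORECASE | re.MULTILINE)
    if start_match is None:
        return ""
    rest = text[start_match.end():]
    ends = "|".join(re.escape(h) for h in end_headings)
    end_match = re.search(rf"^\s*{NUMBER}(?:{ends})\b", rest,
                          flags=re.IGNORECASE | re.MULTILINE)
    section = rest[:end_match.start()] if end_match else rest
    # Collapse the line breaks left over from PDF-to-text conversion.
    return " ".join(section.split())


def extract_from_folder(folder, start="Abstract", end_headings=END_HEADINGS):
    """Run extract_section on every .txt file in `folder`.

    Hypothetical helper: assumes one UTF-8 text file per PDF and returns
    {file name: extracted section}, i.e. results are collected rather
    than processed inside the loop.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        results[path.name] = extract_section(text, start, end_headings)
    return results
```

For a paper whose text contains `Abstract: ...` followed later by a `1 Introduction` line, `extract_section(text)` returns only the abstract, and a call such as `extract_section(text, "Introduction", ("Related Work",))` would pull out the introduction instead (the "Related Work" heading is an assumption about the papers). Returning a dict mirrors the "combined in a collection" option; matching on line-initial headings is only a heuristic (a body line that happens to start with "Introduction" would end the section early), so spot-check a few files.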