pdf to database

r_esmaeilzadeh1 Member Posts: 1 Newbie
edited September 4 in Help
Hello everyone,
I am a new member and have studied the software, but I have a problem:
I need to read a lot of PDFs, delete the references sections, categorize them by year of publication, and then do text mining to find the most frequent words.
How can I do that?
Thanks in advance for your guidance.

Answers

  • kayman Member Posts: 652   Unicorn
    If your PDFs are text-based (so not scanned), you can use the Read Document operator and select PDF as the format. This will convert the PDF to plain text.

    Next, you can use the replace operators and regex to strip what you don't need, and use the document to data operators for the mining part.
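
For readers who want to see the logic outside RapidMiner, here is a minimal Python sketch of the same steps: strip the references section with a regex, group documents by year, and count the most frequent words. It assumes the PDF text has already been extracted to plain strings and that the publication year is known for each document (for example from the file name); neither step is shown. The heading pattern, the stop-word list, and the function names are all hypothetical choices for illustration, not anything RapidMiner provides.

```python
import re
from collections import Counter, defaultdict

# Assumption: the references section starts at a line that contains only
# "References" or "Bibliography". Real papers may need a broader pattern.
REFERENCES_HEADING = re.compile(
    r"^[ \t]*(references|bibliography)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

# Tiny illustrative stop-word list; a real run would need a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def strip_references(text):
    """Drop everything from the last references heading onward."""
    matches = list(REFERENCES_HEADING.finditer(text))
    return text[:matches[-1].start()] if matches else text


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_words_by_year(documents, n=3):
    """documents is an iterable of (year, text) pairs."""
    counts = defaultdict(Counter)
    for year, text in documents:
        counts[year].update(tokenize(strip_references(text)))
    return {year: counter.most_common(n) for year, counter in counts.items()}


# Made-up sample texts standing in for extracted PDF content.
docs = [
    (2019, "Text mining of text data.\nReferences\n[1] Smith, Mining text."),
    (2019, "Mining methods for text."),
    (2020, "Clustering documents.\nREFERENCES\n[1] Doe, Clustering."),
]
result = top_words_by_year(docs, n=2)
print(result[2019])  # [('text', 3), ('mining', 2)]
```

Cutting at the last matching heading rather than the first is a deliberate guess: it avoids truncating a paper that happens to have a line reading "References" early on, at the cost of keeping an appendix that follows the reference list.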