PDF to database

r_esmaeilzadeh1
New Altair Community Member
Hello everyone,
I am a new member and have been studying the software, but I have a problem:
I need to read a lot of PDFs, delete their references sections, categorize them by year of publication, and then do text mining to find the most frequent words.
How can I do that?
Thanks in advance for your guidance.
Answers
If your PDFs are text-based (not scanned), you can use the read document operator and select PDF as the format. This will convert the PDF to plain text.
Next, you can use the replace operators with a regex to strip what you don't need, and then use the document to data operators for the mining part.
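For anyone who wants to see the logic outside the operators, here is a rough Python sketch of the same steps: strip the references with a regex, group by year, count words. It is not the RapidMiner process itself. It assumes the PDF text has already been extracted to plain strings, that the references section starts at a line containing only "References" or "Bibliography", and that you supply the publication year yourself (for example from the file name). The function names and the small stop-word list are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: the references section begins at a line that contains only
# "References" or "Bibliography" (case-insensitive). Adjust for your PDFs.
REFERENCES_HEADING = re.compile(r"(?im)^[ \t]*(references|bibliography)[ \t]*$")

# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "for", "on", "with"}


def strip_references(text):
    """Drop everything from the last references heading to the end of the text."""
    matches = list(REFERENCES_HEADING.finditer(text))
    if not matches:
        return text
    return text[:matches[-1].start()]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_words_by_year(documents, n=3):
    """documents: iterable of (year, text) pairs; returns {year: [(word, count), ...]}."""
    counts = defaultdict(Counter)
    for year, text in documents:
        counts[year].update(tokenize(strip_references(text)))
    return {year: counter.most_common(n) for year, counter in counts.items()}


# Made-up sample texts standing in for extracted PDF content.
docs = [
    (2019, "Text mining of text data.\nReferences\n[1] Smith, Mining."),
    (2019, "Mining text with regex."),
    (2020, "Clustering documents.\nBibliography\n[1] Jones, Clustering."),
]
result = top_words_by_year(docs, n=2)
print(result)
```

Cutting at the last matching heading rather than the first is a deliberate choice here, so that the word "References" appearing alone on a line earlier in the body does not truncate the paper. Whether that holds for your documents is something to check on a few samples.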