Segmenting millions of text segments with TextSegmenter
I am building a text-mining pipeline that should be able to process various XML files with the following properties:
Each XML file contains several thousand blog posts plus some metadata for each post (author, time, etc.).
My question: Is there a way to process such a file while still taking the segments into account, but without splitting it into millions of separate files the way TextSegmenter does?
My assumption: processing millions of individual files to mine for knowledge or do sentiment analysis will take ages.
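What I have in mind is something like the sketch below: stream the XML and segment each post in memory, so no intermediate files are ever written. The `<post>` structure with `<author>`, `<time>`, and `<text>` children is a made-up stand-in for my real schema, and the sentence splitter is a naive placeholder for a proper segmenter:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical file layout; my real files look similar but larger.
SAMPLE = """<blog>
  <post><author>alice</author><time>2013-01-01</time><text>First sentence. Second sentence.</text></post>
  <post><author>bob</author><time>2013-01-02</time><text>Another post here.</text></post>
</blog>"""

def iter_posts(source):
    """Stream posts one at a time instead of loading the whole file."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "post":
            yield {
                "author": elem.findtext("author"),
                "time": elem.findtext("time"),
                "text": elem.findtext("text"),
            }
            elem.clear()  # release memory held by the finished element

def segment(text):
    """Naive in-memory sentence segmentation (placeholder for a real tool)."""
    return [s.strip() for s in text.split(".") if s.strip()]

posts = list(iter_posts(io.StringIO(SAMPLE)))
segments = [segment(p["text"]) for p in posts]
print(len(posts), segments[0])
```

This keeps everything in memory per post, but I am unsure whether an approach like this can replicate what TextSegmenter's file-based output gives me.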
Any help will be greatly appreciated.