Text mining PDF articles while omitting the references
A previous post, https://community.rapidminer.com/discussion/53107/text-mining-of-multiple-pdf-files-with-separate-key-word-counts, described an approach for mining multiple PDF files.
If the PDFs are articles, is there a way to exclude the References section from being mined? The section often starts with the same term (e.g. 'References'), so I tried to define a Split or a specific Tokenize option to cut the text at that point, but I could not get it to work.
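To make the goal concrete, here is a minimal sketch of the behaviour I am after, written in plain Python rather than as a RapidMiner process. It assumes the PDF text has already been extracted to a string and that the section heading sits on a line of its own; the function name `strip_references` is hypothetical and only for illustration.

```python
import re

# Assumption: the heading is the single word "References" alone on a line
# (any capitalisation, optional surrounding spaces or tabs).
_REFERENCES_HEADING = re.compile(
    r"^[ \t]*references[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_references(text: str) -> str:
    """Return the text up to the last standalone 'References' heading.

    If no such heading is found, the text is returned unchanged.
    """
    matches = list(_REFERENCES_HEADING.finditer(text))
    if not matches:
        return text
    # Use the last match so that an earlier line consisting only of the
    # word "References" (e.g. in a table of contents) does not cut the body.
    return text[: matches[-1].start()].rstrip()


if __name__ == "__main__":
    sample = "Intro\nSee references in text.\nReferences\n[1] A. Author"
    print(strip_references(sample))
```

Matching only a heading on its own line keeps ordinary sentences that merely mention the word "references" from triggering the cut.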
I would be grateful for any suggestion.