[SOLVED] Deleting text noise from large corpus
I have a PDF file that contains several thousand pages of emails. The problem is that each email contains its own string of noise (unique in the sense that no noise string repeats). For example:
x-Mail: hbcFNvIWLDtFlpP.yxyP9bkreUY5ZzdUGPpkOhYIoR
This noise sometimes fills entire pages.
Can anyone point me in the right direction on how to minimize this noise, or somehow work around it?
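One possible direction, sketched in Python rather than as a RapidMiner process: since the noise never repeats, it cannot be matched literally, but it may be detectable by shape. The sketch below assumes the PDF text has already been extracted to a plain string, that noise lines look like the `x-Mail:` example (a header-style prefix followed by one long unbroken token), and that stray noise tokens are long and either switch case irregularly or contain digits. The names `NOISE_HEADER`, `looks_like_noise`, and `strip_noise`, and the threshold of 20 characters, are hypothetical choices for illustration, not anything taken from the thread.

```python
import re

# Assumption: noise lines resemble "x-Mail: <one long unbroken token>".
NOISE_HEADER = re.compile(r"^x-[\w-]+:\s*\S{20,}\s*$", re.IGNORECASE)


def looks_like_noise(token, min_len=20):
    """Heuristic: long token with several lower->upper switches or a digit."""
    if len(token) < min_len:
        return False
    case_switches = sum(
        1 for a, b in zip(token, token[1:]) if a.islower() and b.isupper()
    )
    has_digit = any(c.isdigit() for c in token)
    return case_switches >= 3 or has_digit


def strip_noise(text):
    """Drop header-style noise lines, then drop noise-like tokens elsewhere."""
    kept = []
    for line in text.splitlines():
        if NOISE_HEADER.match(line):
            continue  # the whole line is noise
        tokens = [t for t in line.split() if not looks_like_noise(t)]
        kept.append(" ".join(tokens))
    return "\n".join(kept)


sample = (
    "Hello Bob\n"
    "x-Mail: hbcFNvIWLDtFlpP.yxyP9bkreUY5ZzdUGPpkOhYIoR\n"
    "See you Monday"
)
print(strip_noise(sample))
```

A shape-based filter like this will also discard legitimate long tokens that contain digits, such as URLs or order numbers, so the threshold and the rules would need tuning against a sample of the real corpus before trusting the result.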