Does anyone know how to process a corpus file, i.e. a .txt file in which the entries (documents) are separated by line breaks? For instance, how do you run a clustering algorithm on such a file?
Hi Jeremy,

Have you checked this how-to, for example? http://vancouverdata.blogspot.de/2010/11/text-analytics-with-rapidminer-loading.html

Otherwise, I think either the Read Document or the Read CSV operator should work fine for getting the file in. Just choose a delimiter that will not appear in the text, such as #####, so that each line is read in whole as one document.
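If it helps to see the whole pipeline outside RapidMiner (one document per line, turn each line into a vector, then cluster), here is a minimal Python sketch. It is not what RapidMiner does internally. The function names are hypothetical, the tokenizer is a plain regex, and the clustering step is spherical k-means (k-means on unit-length TF-IDF vectors using cosine similarity) with a deterministic farthest-first seeding instead of the usual random initialisation.

```python
import math
import re
from collections import Counter


def read_corpus(text):
    """Treat each non-blank line as one document (the line-per-document layout)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def normalise(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tfidf_vectors(docs):
    """One unit-length TF-IDF vector per document, using a smoothed IDF."""
    token_lists = [re.findall(r"[a-z0-9]+", d.lower()) for d in docs]
    n = len(docs)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        vectors.append(normalise(vec))
    return vectors


def spherical_kmeans(vectors, k, iterations=20):
    """k-means with cosine similarity on unit vectors; returns one label per vector."""
    # Deterministic farthest-first seeding: start from the first document, then
    # repeatedly add the document least similar to all centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = None
    for _ in range(iterations):
        # Assignment step: each document goes to its most similar centroid.
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: centroid = normalised sum of its members' vectors.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[j] = normalise(dict(total))
    return labels


if __name__ == "__main__":
    sample = "cat kitten purr\nkitten cat meow\n\nstock market shares\nshares stock market crash\n"
    print(spherical_kmeans(tfidf_vectors(read_corpus(sample)), k=2))  # prints [0, 0, 1, 1]
```

For a real corpus you would pass the file contents instead of the sample string, e.g. `read_corpus(open(path, encoding="utf-8").read())`, where `path` is your .txt file. The farthest-first seeding just keeps the example reproducible; with real data you would normally try several random initialisations and several values of `k`.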

