Text Processing

Rhmanig Member Posts: 9 Contributor II

I am using Process Document to tokenize text (plus transform case, filter stop words and generate n-grams). I wonder why RapidMiner does not make use of the free memory and CPU, and why the process takes such a long time.

The current data size is 1059 MB and the process has been running for almost 5 days :/ The system has four cores and 29 GB RAM. On average it uses 46% of the CPU, and right now it uses 75% of memory (the memory usage is going up slowly).

Please explain if you know why.
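
For readers unfamiliar with the steps named above, here is a minimal sketch of the same pipeline (tokenize, transform case, filter stop words, generate n-grams) in plain Python. It is not RapidMiner's implementation: the function names, the tiny stop-word list, the letters-only tokenizer and the underscore used to join n-grams are all assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not RapidMiner's built-in list).
STOP_WORDS = {"a", "an", "and", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Split text on runs of non-letter characters and drop empty pieces."""
    return [t for t in re.split(r"[^A-Za-z]+", text) if t]


def process_document(text, max_n=2):
    """Return term counts for one document, including n-grams up to max_n."""
    # Transform case: lower-case every token.
    tokens = [t.lower() for t in tokenize(text)]
    # Filter stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Start with the single tokens, then append the n-grams for n = 2..max_n.
    terms = list(tokens)
    for n in range(2, max_n + 1):
        terms.extend(
            "_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return Counter(terms)


counts = process_document("The cat sat on the mat and the cat slept")
print(len(counts), "distinct terms")
```

With max_n=2, a document of k remaining tokens contributes up to k-1 bigram terms on top of its single tokens, so the set of distinct terms can grow much faster than the raw text. Whether that accounts for the runtime and memory behaviour described here cannot be told from this sketch alone.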



    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist

    Could you provide me with the process itself? Are there any Loops inside?


