BUG REPORT: text mining, the clustering process

YungCheng Member Posts: 1 Newbie
When I try to run the clustering process for text mining, I get an error message. The process, the error message, and the CSV files are attached below.

Best Answer

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    Solution Accepted
    Hi, you have not included the actual RMP file, so I am only guessing what may have gone wrong. Your data has over 20K examples and your text has thousands of unique terms, and k-means clustering does not cope well with thousands of attributes. So I assume your computer has run out of memory. To test this, I suggest reducing your sample size to 1000 (just for testing). More importantly, you need to reduce the number of terms generated by the parsing process. I suggest enabling pruning within Process Documents from Data and keeping it simple, e.g. percentual pruning from 5% to 30%, which may bring the number of attributes below 300. If that works, run it again on 100% of the data. I also note that you have not normalised your data before clustering, so it will be difficult to analyse the results visually. Good luck!
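
To make the pruning idea above concrete outside RapidMiner, here is a minimal Python sketch. It is not the Process Documents from Data operator: it assumes that percentual pruning means dropping terms that appear in fewer than 5% or more than 30% of the documents. The function name `prune_terms` and its parameters `min_frac` and `max_frac` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def prune_terms(docs, min_frac=0.05, max_frac=0.30):
    """Return the terms whose document frequency lies in [min_frac, max_frac].

    docs is a list of token lists, one per document. The thresholds are
    fractions of the total number of documents (assumed meaning of
    "percentual" pruning; names here are illustrative only).
    """
    n_docs = len(docs)
    if n_docs == 0:
        return set()

    doc_freq = Counter()
    for tokens in docs:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))

    return {
        term
        for term, count in doc_freq.items()
        if min_frac <= count / n_docs <= max_frac
    }


if __name__ == "__main__":
    # Toy corpus: "the" is in every document, "cluster" in 2 of 10.
    corpus = [["the", "cluster"], ["the", "cluster"]] + [["the"]] * 8
    print(sorted(prune_terms(corpus)))
```

Terms that occur almost everywhere or almost nowhere carry little information for separating clusters, so removing them shrinks the attribute count that k-means has to handle.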