BUG REPORT: text mining, the clustering process

YungCheng Member Posts: 1 Newbie
When I try to run the clustering process for text mining, I get an error message. The process, error message, and CSV files are attached below.

Best Answer

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    Solution Accepted
    Hi, you have not included the actual RMP file, so I am only guessing at what may have gone wrong. Your data has over 20K examples and your text has 1000s of unique terms, and k-means clustering does not deal well with 1000s of attributes. So I assume you have run out of memory on your computer. To test this, I suggest reducing your sample size to 1000 (just for testing). More importantly, you need to reduce the number of terms generated by the parsing process. I suggest enabling pruning within Process Documents from Data; keep it simple, e.g. percentual from 5% to 30%, which would possibly bring the number of attributes to fewer than 300. If it works, use all 100% of the data. I also note that you have not normalised your data before clustering, so it will be difficult to analyse your data visually. Good luck!
    Jacob
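
To illustrate what percentual pruning does to the term list, here is a minimal Python sketch, not RapidMiner's actual implementation. It keeps only the terms that occur in at least 5% and at most 30% of the documents. The function name, the whitespace tokenisation, and the inclusive bounds are all assumptions made for this example.

```python
from collections import Counter


def prune_terms_by_document_frequency(documents, min_fraction=0.05, max_fraction=0.30):
    """Return the sorted terms whose document frequency lies within the given bounds.

    Hypothetical helper for illustration only: tokenisation is a plain
    lowercase whitespace split, and both bounds are treated as inclusive.
    """
    n_docs = len(documents)
    if n_docs == 0:
        return []

    # Count in how many documents each term appears (not how often overall).
    doc_freq = Counter()
    for text in documents:
        doc_freq.update(set(text.lower().split()))

    return sorted(
        term
        for term, count in doc_freq.items()
        if min_fraction <= count / n_docs <= max_fraction
    )


# Toy corpus of 40 documents: "common" is in all of them, "beta" in 31,
# "alpha" in 8 (20%), and "rare" in only 1 (2.5%).
docs = ["common alpha"] * 8 + ["common beta"] * 31 + ["common rare"]
print(prune_terms_by_document_frequency(docs))  # ['alpha']
```

Terms that appear almost everywhere or almost nowhere carry little information for separating clusters, so dropping them shrinks the attribute count that k-means has to handle.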