BUG REPORT: text mining, the clustering process

YungCheng · April 2021

When I try to run the clustering step of my text-mining process, I get an error message. The process, the error message, and the CSV files are attached below.

jacobcybulski · April 2021

Hi, you have not included the actual RMP file, so I can only guess what went wrong. Your data has over 20K examples and your text has thousands of unique terms, and k-means clustering does not deal well with thousands of attributes. So I assume you have run out of memory on your computer. To test this, I suggest reducing your sample size to 1000 (just for testing). More importantly, you need to reduce the number of terms generated by the parsing process. I suggest enabling pruning in Process Documents from Data and keeping it simple, e.g. percentual pruning from 5% to 30%, which would probably bring the number of attributes below 300. If that works, go back to using 100% of the data. I also note that you have not normalised your data before clustering, so it will be difficult to analyse the clusters visually. Good luck!
Jacob
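Jacob's advice is specific to RapidMiner (the pruning option of Process Documents from Data). To illustrate what percentual pruning and normalisation do to the word vectors before k-means, here is a minimal Python sketch that uses only the standard library. The helpers `prune_percentual`, `term_vectors`, and `z_normalize` are hypothetical. Raw term counts stand in for RapidMiner's default TF-IDF vectors, and treating both pruning bounds as inclusive is an assumption. This is not RapidMiner's implementation, and the clustering step itself is left out.

```python
from collections import Counter
from statistics import mean, pstdev


def prune_percentual(docs, lower=0.05, upper=0.30):
    """Hypothetical helper: keep terms whose document frequency (fraction of
    documents containing the term) lies within [lower, upper].
    Inclusive bounds are an assumption, not RapidMiner's documented rule."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return sorted(t for t, c in df.items() if lower <= c / n <= upper)


def term_vectors(docs, vocab):
    """Hypothetical helper: raw term counts per document, restricted to vocab
    (a simplification; RapidMiner defaults to TF-IDF)."""
    return [[tokens.count(t) for t in vocab] for tokens in docs]


def z_normalize(rows):
    """Hypothetical helper: z-transform each attribute (column) to mean 0 and
    standard deviation 1; constant columns are mapped to 0.0."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]


# Toy corpus of 10 tokenised documents (made up for illustration).
docs = [
    ["data", "mining", "cluster"],
    ["data", "mining", "kmeans"],
    ["data", "mining"],
    ["data", "mining"],
    ["data", "mining"],
    ["data", "cluster"],
    ["data"],
    ["data"],
    ["data"],
    ["data"],
]

vocab = prune_percentual(docs, 0.05, 0.30)
X = z_normalize(term_vectors(docs, vocab))
print(vocab)  # "data" (100%) and "mining" (50%) are pruned as too frequent
```

With 20K documents the same idea shrinks thousands of terms to the mid-frequency ones, which is what keeps the attribute count (and memory use) manageable for k-means; the z-transform then puts every remaining attribute on a comparable scale.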

Altair RapidMiner Community
