What's the best way to determine the number of topics in the Extract Topics from Data (LDA) operator?
I have a dataset made up of thousands of ways users have listed product names, for example Apple MacBook, MacBook, MacBookPro, etc. All sorts of products are included, and I'm trying to group the similar ways people have described them into clusters. The Extract Topics from Data operator seems to be doing the trick, but I have to choose the number of groups manually. Is there a way to determine the number of groups based on similarity? I hope this makes sense.
Best Answer
lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,124  
 Unicorn
Hi @cmoten,
As a first approximation, I see the following method in RapidMiner (to be confirmed by @mschmitz: the Extract Topics from Data (LDA) operator is Martin's baby...):
Use an Optimize Parameters (Grid) operator and plot the perplexity against the number of topics k.
The lower the perplexity, the better the model.
In my example, the "optimal" number of topics k is 6.
The attached file contains an example process that finds the optimal number of topics using the Optimize Parameters (Grid) operator.
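Outside RapidMiner, the same selection logic can be sketched in a few lines of Python. This is only an illustration of the idea (try each candidate k, record the perplexity, keep the k with the lowest value), not what the operators do internally. The function names `perplexity` and `grid_search_k` are hypothetical, the log-likelihood values are made up for the demonstration, and the sketch assumes perplexity is defined as the exponential of the negative log-likelihood per token.

```python
from __future__ import annotations

import math
from typing import Callable, Iterable


def perplexity(log_likelihood: float, n_tokens: int) -> float:
    """Perplexity as exp(-log-likelihood per token); lower is better."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(-log_likelihood / n_tokens)


def grid_search_k(
    candidates: Iterable[int],
    evaluate: Callable[[int], float],
) -> tuple[int, dict[int, float]]:
    """Evaluate every candidate k and return the k with the lowest score."""
    scores = {k: evaluate(k) for k in candidates}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Made-up log-likelihoods, for illustration only. In practice each value
# would come from fitting an LDA model with k topics on the data.
ILLUSTRATIVE_LOG_LIKELIHOODS = {
    2: -7200.0,
    4: -6900.0,
    6: -6700.0,
    8: -6750.0,
    10: -6820.0,
}
N_TOKENS = 1000  # hypothetical number of tokens in the evaluation set


def evaluate(k: int) -> float:
    """Stand-in for 'fit LDA with k topics and measure its perplexity'."""
    return perplexity(ILLUSTRATIVE_LOG_LIKELIHOODS[k], N_TOKENS)


best_k, scores = grid_search_k(ILLUSTRATIVE_LOG_LIKELIHOODS, evaluate)
for k, score in sorted(scores.items()):
    print(f"k={k:2d}  perplexity={score:.1f}")
print("best k:", best_k)
```

In a real run, `evaluate` would be replaced by a function that trains the topic model for the given k and returns its perplexity, ideally measured on held-out data so that larger k is not favored automatically.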
Regards,
Lionel