Summary of comments

jozeftomas_2020 Member Posts: 40
edited December 2018 in Help

Hello,

I have a question and would be grateful for an answer.
I searched for a solution but did not find one.
I have a set of comments that I want to summarize by content into four major categories.
Do you know how to do this?
This is an exercise for a data mining course at my university.
Thanks a lot.

Answers

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi Jozef,

     

    You can check the latest webinar by @sgenzer. It covers some web scraping and API concepts that you may not need, but it also introduces two techniques for grouping chatbot conversations: K-Means clustering and LDA. Both should apply to your problem.

     

    https://rapidminer.com/resource/text-mining-online-chats/?utm_content=buffere3fad&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

     

    Regards,

    Sebastian

  • jozeftomas_2020 Member Posts: 40

    Hello,
    Thank you so much for your answer.

    Did I get it right? Should I cluster the comments with K-means and then apply LDA to each cluster?
    How do I figure out what content is in each cluster?

    (Is it possible to view the shape of the clusters and their centers?)
    Thanks in advance for your help.
    Waiting for your reply.

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi Jozef,

     

    I'm not sure I understand the questions, but K-means and LDA are two different techniques. K-means assigns each sample to exactly one cluster, while LDA gives each sample a mix of topics, and its most probable topic can be treated as its cluster. I'm afraid that deciding which one to use, and with which parameters, is problem-dependent and requires a good dose of trial and error.

     

    Regarding the visualization, a direct plot is only possible with two dimensions (like the classic example of the iris dataset). Text data has far more dimensions, so it would first have to be projected down to two.

     

    Regards,

    Sebastian
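
To make the distinction in the reply above concrete, here is a minimal sketch in Python with scikit-learn (not RapidMiner, which the thread itself uses). The comments are invented placeholders, and the choice of TF-IDF for K-means, raw counts for LDA, and TruncatedSVD for the 2-D projection are all assumptions made for illustration, not something the thread prescribes.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical comments standing in for the real data set.
comments = [
    "The delivery was late and the package arrived damaged",
    "Late delivery again, the courier lost my package",
    "Great price and a good discount on my order",
    "The price is too high, no discount was offered",
    "Customer support answered quickly and solved my problem",
    "Support never answered my email about the problem",
    "The product quality is excellent and it works well",
    "Poor quality, the product stopped working after a week",
]
n_groups = 4  # four categories, as in the original question

# K-means on TF-IDF vectors: each comment gets exactly one cluster label.
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(comments)
kmeans = KMeans(n_clusters=n_groups, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X_tfidf)

# LDA on raw term counts: each comment gets a distribution over topics,
# and the most probable topic is used here as its group.
counts = CountVectorizer(stop_words="english")
X_counts = counts.fit_transform(comments)
lda = LatentDirichletAllocation(n_components=n_groups, random_state=0)
doc_topics = lda.fit_transform(X_counts)  # shape: (n_comments, n_groups)
lda_labels = doc_topics.argmax(axis=1)

# Project the high-dimensional TF-IDF vectors (and the K-means centers)
# down to two dimensions so they could be drawn on a scatter plot.
svd = TruncatedSVD(n_components=2, random_state=0)
points_2d = svd.fit_transform(X_tfidf)
centers_2d = svd.transform(kmeans.cluster_centers_)

for text, k, t in zip(comments, kmeans_labels, lda_labels):
    print(f"kmeans={k} lda={t}  {text}")
```

The two label columns will generally not agree, and the numbering of the groups is arbitrary in both methods. `points_2d` and `centers_2d` can be passed to any plotting library; a 2-D projection of text vectors loses a lot of information, so the picture is only a rough guide.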

  • jozeftomas_2020 Member Posts: 40

    Hi,

    Thanks so much, my friend @SGolbert.

    I want to know what content is in each cluster. Can I find that out with LDA? How can I use LDA to find the best K? Thanks in advance for your help. With respect.
