Interpreting LogLikelihood For LDA Topic Modeling

svtorykh Member Posts: 35 Guru
edited November 2018 in Help

Hi RM Community,

 

Based on the attached picture, how should I interpret the log-likelihood values as the number of topics changes? Is higher better, or lower? Does it need to be squared to be positive?

 

Thanks!

Best Answer

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Solution Accepted

    Hi @svtorykh,

     

    -240000 is better.

     

    BR,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

    It's the negative log-likelihood (LLH). The lower, the better.

     

    BR,
    Martin

  • svtorykh Member Posts: 35 Guru

    Thanks for the prompt reply. So in this case, is -230000 better than -240000, or vice versa?

  • svtorykh Member Posts: 35 Guru

    Thanks so much Martin!

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    By the way, @svtorykh,

    One of the next updates will include more performance measures for LDA; I just need to find time to implement them. LLH by itself is always tricky, because it naturally falls as the number of topics grows.

     

    BR,

    Martin

  • svtorykh Member Posts: 35 Guru

    That would be very nice to have! Please keep us posted Martin!

  • jozeftomas_2020 Member Posts: 40


    Hello. I want to find the optimal number of clusters K for k-means using the LDA log-likelihood value.

    For me, with alpha and beta set by the heuristics, the value is highest for 5 topics. Now, how do I choose K optimally? Does anyone know how to help? Thanks a lot. I searched a lot, but I did not find anything.
    :(

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hey @jozeftomas_2020,

    I am fairly confused. k-means and LDA are quite different models. Why and how do you want to combine them?

     

    ~Martin

  • jozeftomas_2020 Member Posts: 40

    I have seen articles that use LDA to find the optimal k, but I do not know how it is done.
    And how can I tell which LDA model gives the better result? Should alpha and beta be set low or high to get a better result?

    I'm so sorry
    Thanks a lot

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    @svtorykh,

    I've added perplexity as the default performance measure for LDA. Perplexity is defined as

    exp(-LLH/tokens)

    where LLH is the log-likelihood and tokens is the number of tokens, and it should be minimized. That's roughly what you see in common blog posts on LDA.
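To make the formula concrete, here is a minimal Python sketch of it. This is not the RapidMiner implementation; the function name `perplexity` and the token count of 100,000 are assumptions chosen only for illustration, and the two LLH values are simply the ones mentioned earlier in this thread, treated here as plain signed log-likelihoods.

```python
import math


def perplexity(log_likelihood: float, n_tokens: int) -> float:
    """Perplexity as defined above: exp(-LLH / tokens)."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(-log_likelihood / n_tokens)


# Hypothetical token count, assumed only for this illustration.
n_tokens = 100_000

# The two LLH values discussed in the thread, on the same corpus size.
p_a = perplexity(-230_000, n_tokens)  # exp(2.3)
p_b = perplexity(-240_000, n_tokens)  # exp(2.4)

print(f"LLH=-230000 -> perplexity {p_a:.3f}")
print(f"LLH=-240000 -> perplexity {p_b:.3f}")
```

Because the LLH is divided by the token count, perplexity is normalized per token. For a fixed token count, the formula maps a larger LLH to a smaller perplexity, so minimizing perplexity and maximizing the signed log-likelihood select the same model.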

     

    It's not yet on the marketplace. Let's see when we have enough features to publish.

     

    Cheers,

    Martin

  • svtorykh Member Posts: 35 Guru

    Thanks much!

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Always happy to help! Would it be possible for you to present your use case at RM Wisdom in October?

  • ayaRizk Member Posts: 6 Contributor II
    @svtorykh
    May I ask how you generated the evaluation plot? Is there a specific operator for that, or did you plot it outside of RapidMiner?

    Thanks!
    /Aya
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi Aya,

    Optimize Parameters (Grid) can create the log for it.

    Best,
    Martin
  • ayaRizk Member Posts: 6 Contributor II
    Hi @mschmitz
    Yes, this works well. Thanks a lot!
    /Aya