Latent Dirichlet Allocation topics (?)

ysmndhira Member Posts: 2 Newbie
edited July 2023 in Help
Hello, I have a question. I ran Latent Dirichlet Allocation using the Optimize Parameters (Grid) operator, but the results show that some words are repeated across topics. How can I keep a word's weight from being split across topics? For example, the word "shoe" has an overall weight of 8k+, but after running LDA that weight gets split up, so "shoe" appears in several topics: topic 1 has 1k+ weight of "shoe", topic 2 has 2k+, and so on. Is there a way to make each word stick to a single topic? I have also included my process.

Thank you in advance!

Answers

    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,509 RM Data Scientist
    Hey,
    you almost certainly want to play around with the alpha and beta parameters rather than relying on the heuristic defaults. Quote from https://www.thoughtvector.io/blog/lda-alpha-and-beta-parameters-the-intuition/

    Here, alpha represents document-topic density - with a higher alpha, documents are made up of more topics, and with lower alpha, documents contain fewer topics. Beta represents topic-word density - with a high beta, topics are made up of most of the words in the corpus, and with a low beta they consist of few words.

    Basically, you can control how much one document is allowed to belong to more than one topic (alpha) and how much one word is allowed to belong to more than one topic (beta).
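    To make the effect of beta concrete, here is a minimal pure-Python sketch of collapsed Gibbs sampling for LDA (this is an illustration of the algorithm in general, not RapidMiner's implementation; the corpus, function name, and parameter values are made up for the example). A small beta puts a sparse prior on each topic's word distribution, which pushes all occurrences of a word like "shoe" toward a single topic; a large beta lets its weight spread across topics.

    ```python
    import random
    from collections import defaultdict

    def lda_gibbs(docs, num_topics, alpha, beta, iters=200, seed=0):
        """Collapsed Gibbs sampler for LDA (toy illustration).
        alpha: document-topic prior (higher -> each doc mixes more topics)
        beta:  topic-word prior  (lower -> each word concentrates in fewer topics)
        Returns a list of per-topic word-count dictionaries.
        """
        rng = random.Random(seed)
        vocab_size = len({w for d in docs for w in d})
        ndk = [[0] * num_topics for _ in docs]               # doc-topic counts
        nkw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
        nk = [0] * num_topics                                # tokens per topic
        z = []                                               # topic of each token
        for d, doc in enumerate(docs):                       # random initialization
            zs = []
            for w in doc:
                k = rng.randrange(num_topics)
                zs.append(k)
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
            z.append(zs)
        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]                              # remove token's count
                    ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                    # full conditional: (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                    weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                               / (nk[t] + vocab_size * beta)
                               for t in range(num_topics)]
                    k = rng.choices(range(num_topics), weights=weights)[0]
                    z[d][i] = k                              # resample and re-add
                    ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        return nkw

    # Toy corpus: "shoe" occurs 3 times in total.
    docs = [["shoe", "boot", "sneaker"], ["shoe", "laptop", "screen"],
            ["laptop", "keyboard", "screen"], ["boot", "shoe", "sandal"]]
    # Low beta nudges all "shoe" tokens toward the same topic.
    topics = lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=300)
    shoe_total = sum(t["shoe"] for t in topics)  # the 3 tokens are split among topics
    ```

    Note that the total weight of a word is always conserved across topics; the priors only control how evenly it is divided. Exact one-word-one-topic assignments are not something vanilla LDA guarantees, but a very small beta gets you close in practice.
    
    
    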


    Best,

    Martin



    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany