
Latent Dirichlet Allocation topics (?)

ysmndhira Member Posts: 2 Learner I
edited July 2023 in Help
Hello, I have a question. I have run Latent Dirichlet Allocation (LDA) using the Optimize Parameters Grid operator, but the results show that some words are repeated across topics. How can I keep the weight of a word from being split across topics? For example, the word "shoe" has an overall weight of 8k+, but when I run LDA that weight gets split up, so "shoe" appears in several topics: topic 1 has 1k+ of the weight of "shoe", topic 2 has 2k+, and so on. I want each word to stick to one topic. Is there any way to do that? I have also included my process here.

Thank you in advance!

Answers

  • ysmndhira Member Posts: 2 Learner I
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist
    Hey,
    you almost certainly want to play around with the alpha and beta parameters and not use the heuristics. Quote from https://www.thoughtvector.io/blog/lda-alpha-and-beta-parameters-the-intuition/

    Here, alpha represents document-topic density - with a higher alpha, documents are made up of more topics, and with lower alpha, documents contain fewer topics. Beta represents topic-word density - with a high beta, topics are made up of most of the words in the corpus, and with a low beta they consist of few words.

    Basically, you can control how much one document is allowed to be in more than one topic (alpha) and how much one word is allowed to be in more than one topic (beta).


    Best,

    Martin



    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
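
To make the alpha/beta intuition from the answer above concrete, here is a minimal Python sketch. It is not the RapidMiner operator and does not run LDA; it only draws from a symmetric Dirichlet prior, which is the kind of prior that alpha and beta parameterize in LDA, and measures how concentrated the draws are. The function names (`sample_dirichlet`, `mean_top_share`) and the values 0.05 and 10.0 are made up for illustration. Note that LDA is a mixed-membership model, so a small beta can push a word toward fewer topics but will probably not guarantee that it lands in exactly one.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def mean_top_share(concentration, size=50, trials=200, seed=0):
    """Average share of the probability mass held by the largest component."""
    rng = random.Random(seed)
    shares = [max(sample_dirichlet(concentration, size, rng)) for _ in range(trials)]
    return sum(shares) / trials

# Hypothetical values: a small concentration vs. a large one.
low = mean_top_share(0.05)   # mass piles up on a few components
high = mean_top_share(10.0)  # mass is spread almost evenly

print(f"small concentration: largest component holds ~{low:.2f} of the mass")
print(f"large concentration: largest component holds ~{high:.2f} of the mass")
```

With the small value, most of the mass sits on a handful of components; with the large value, it is spread nearly evenly. In the answer's terms, lowering the prior pushes toward sparser assignments, and raising it lets the mass spread out.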