Predefined Topic lists

Manar Member Posts: 9 Learner I
Hello everyone,
I have a question.
I have predefined topic lists, each containing some words, and I want to use them to assign the most suitable topic to each of my Arabic documents.
Is cosine similarity a good solution for this problem,
or would latent Dirichlet allocation (LDA) be better?
Could you please guide me on how to do that in RapidMiner?
Thanks.

Answers

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    This is an interesting question.
    @mschmitz is the resident expert on LDA (well, at least he wrote the operator), but I am pretty sure LDA is not going to help you here, because I don't think you can feed the LDA algorithm a predefined set of topics.

    So I am not actually sure what the best way to accomplish this would be. I guess you could put together a wordlist with the words for each predefined topic and then try to build a polynominal classification model, but that might not give you the output you really want. @mschmitz, do you have another approach you would recommend here?

    P.S. I don't think the language is really an issue; it has more to do with the structure of the problem.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist
    Hi,
    So you have a word list that contains keywords for each topic, and the more of a topic's keywords appear in a text, the more likely the text should belong to that topic?

    That's not LDA.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Manar Member Posts: 9 Learner I
    OK, thank you.
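
The keyword-matching idea discussed in the answers above can be sketched outside RapidMiner in a few lines of Python. This is only a minimal illustration under stated assumptions, not a RapidMiner process and not a method anyone in the thread proposed in this form: the topic names, the keyword lists, the helper names (`tokenize`, `cosine`, `best_topic`) and the sample sentence are all hypothetical, English placeholder words stand in for the Arabic keywords, and the whitespace tokenizer is far too naive for real Arabic text, which would need proper tokenization and normalization. Each document is turned into a word-count vector and compared with each topic's keyword vector by cosine similarity; the topic with the highest score wins.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase + whitespace split. Real Arabic text
    # would need a proper tokenizer and normalization step.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_topic(text, topics):
    # Score the document against every predefined topic word list.
    doc = Counter(tokenize(text))
    scores = {name: cosine(doc, Counter(words)) for name, words in topics.items()}
    best = max(scores, key=scores.get)
    # If no keyword matched at all, report no topic instead of guessing.
    if scores[best] == 0.0:
        return None, scores
    return best, scores


# Hypothetical predefined topic lists (placeholders for the real keywords).
topics = {
    "sports": ["football", "match", "team", "goal"],
    "economy": ["bank", "market", "price", "trade"],
}

label, scores = best_topic("the team scored a late goal in the match", topics)
print(label, scores)
```

Because the topics are fixed in advance, this is a matching problem rather than topic discovery, which is why an unsupervised method such as LDA does not fit it directly. Weighting terms (for example with TF-IDF) instead of raw counts would be a natural refinement of the same idea.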

