Can I run an LDA model and emotion analysis on Chinese text with RapidMiner?

Polly Member Posts: 3 Learner I
Hi everyone,

I am a newbie here and this is my question.

I need to apply a Latent Dirichlet Allocation (LDA) model and emotion analysis to Chinese text, but I don't know whether I can do this with RapidMiner, or which additional extensions I would need to install.
I have already searched the discussions about Chinese/Mandarin and installed the Hanminer extensions mentioned in one of them. However, I don't think the Hanminer extensions are enough to conduct both analyses, and no one seems to have asked this question before.

Please give me some suggestions. Any ideas would be much appreciated!

Best,
Polly

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,509 RM Data Scientist
    Hi,
    from my understanding, it should work. But @yyhuang is our Mandarin expert.

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Polly Member Posts: 3 Learner I
    Hi Martin @mschmitz,

    Thank you for your reply. 
    I read other discussions about LDA and, just to make sure: if I want to run a Latent Dirichlet Allocation model, is 'Linear Discriminant Analysis' the operator I should use? Or is it the 'Extract Topic from Data' operator that most people mentioned in those discussions?

    Also, which operator should I use for emotion analysis? Is it Singular Value Decomposition (SVD)?

    Besides, in a discussion about LDA where the process showed no results, you asked: "is this 'western' text? LDA uses a default tokenization on this tokens like spaces and so on. This may totally fail if this is not in latin alphabet?" So I guess the language of the text has a great influence on the results. To analyze Chinese text, are there any extensions or operators I need to install or combine?

    Sorry for the large number of questions. I would much appreciate any advice you could give me. Thanks in advance!

    Regards,
    Polly

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,509 RM Data Scientist
    Hi @Polly,
    the operator you want to use is Extract Topics from Data, not Linear Discriminant Analysis.

    And yes, LDA uses tokenization internally. I just realized that the default tokenization splits on \s (whitespace) and cannot be changed, so I guess it would be very hard to apply to Mandarin. As I said, I only speak German and English and am not an expert on tokenizing Mandarin/Cantonese, so I don't know whether it would even help if I offered the tokenization as an option.
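
    To illustrate the whitespace problem outside of RapidMiner, here is a minimal Python sketch. It is not the operator's actual code. It assumes a tokenizer that splits only on whitespace, and it uses a hypothetical toy vocabulary (`TOY_VOCAB`) with a simple greedy longest-match segmenter as a stand-in for a real Chinese word segmenter. The idea is that text segmented beforehand and re-joined with spaces can at least be split by a whitespace-only tokenizer; whether that gives useful topics in Extract Topics from Data is not confirmed in this thread.

```python
import re

# Hypothetical toy vocabulary for illustration only; a real project would
# rely on a proper Chinese word segmenter with a full dictionary.
TOY_VOCAB = {"我", "喜欢", "机器", "学习", "机器学习", "文本", "分析"}


def whitespace_tokens(text):
    """Split on runs of whitespace, as a whitespace-only tokenizer would."""
    return [tok for tok in re.split(r"\s+", text) if tok]


def segment(text, vocab=TOY_VOCAB, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    vocabulary word, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


sentence = "我喜欢机器学习"

# Chinese text has no spaces, so the whole sentence stays one token.
print(whitespace_tokens(sentence))

# After pre-segmenting and joining with spaces, the same tokenizer
# sees individual words.
pre_segmented = " ".join(segment(sentence))
print(whitespace_tokens(pre_segmented))
```

    With this toy vocabulary, the unsegmented sentence comes back as a single token, while the pre-segmented version yields three separate words. In practice the segmentation step would be done before the text reaches the topic model.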

    Cheers,
    Martin
  • Polly Member Posts: 3 Learner I
    Hi Martin, 

    Thank you for your help :smiley:
    I hope @yyhuang can give me some advice on this.

    Cheers,
    Polly