sentiment extraction for non-English

wclaster Member, University Professor Posts: 43 University Professor
Hello. Are there sentiment analysis operators or tools for working with Japanese? How about Chinese? And how about other Asian languages? I saw the Sentiment Extract operator. It seems to have German and French versions for Vader. Thank you!

Best Answer

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,233 RM Data Scientist
    Solution Accepted
    In principle yes, but this is definitely not something one can do quickly.

    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany

Answers

  • wclaster Member, University Professor Posts: 43 University Professor
    Thank you! I will leave this question open because I am really looking for Japanese.
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,233 RM Data Scientist
    If you have Chinese or Japanese dictionaries, I can add them :). That is not a big thing. The bigger problem would be tokenization in those languages.

    Best,
    Martin
  • ceaperez Member Posts: 302 Unicorn
    Hi @wclaster,
    I hope you can solve this issue and then share your good practice.

    Regards
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,233 RM Data Scientist
    @ceaperez By the way, if you have a good Spanish dictionary, I am happy to add it as well :). I didn't find one in a quick search. Ideally I want to cover the big languages with a dictionary each.
  • wclaster Member, University Professor Posts: 43 University Professor
    Hello mschmitz, thank you. Yes, I think tokenization would be quite a challenge. MeCab is an open-source text segmentation library for Japanese, but I don't know how this would all fit together.
    From Wikipedia: "Besides segmenting the text, MeCab also lists the part of speech of the word, and, if applicable and in the dictionary, its pronunciation." (MeCab - Wikipedia)
    Would this be simple?
  • ceaperez Member Posts: 302 Unicorn
    @mschmitz, thanks for your help. I will check whether I have a good one.
    Regards.
  • kayman Member Posts: 662 Unicorn
    A bit late to the party, but we had some decent results using GiNZA together with spaCy, via the Python extension, in some of our RapidMiner workflows.
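
To make the point in this thread concrete (a dictionary is the easy part, tokenization is the hard part), here is a minimal Python sketch of dictionary-based scoring on Japanese text. It is only an illustration under stated assumptions: the lexicon entries and their scores are made up for this example, the function names are hypothetical, and the greedy longest-match segmenter is a toy stand-in for a real tokenizer such as MeCab or GiNZA, not how the Extract Sentiment operator works.

```python
# Toy illustration only: these entries and scores are invented for the example
# and are not taken from any real sentiment dictionary.
LEXICON = {
    "好き": 1.0,      # "like"
    "大好き": 2.0,    # "love"
    "嫌い": -1.0,     # "dislike"
    "美味しい": 1.5,  # "delicious"
    "最悪": -2.0,     # "the worst"
}


def longest_match_tokens(text, vocab):
    """Greedy longest-match segmentation (hypothetical helper).

    At each position, take the longest substring found in vocab;
    if nothing matches, emit a single character and move on.
    """
    max_len = max(len(word) for word in vocab)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def sentiment_score(text, lexicon=LEXICON):
    """Sum the lexicon scores of all tokens; unknown tokens count as 0."""
    return sum(lexicon.get(tok, 0.0) for tok in longest_match_tokens(text, lexicon))


sentence = "このラーメンは美味しいので大好き"

# Whitespace splitting yields the whole sentence as one token, so nothing matches.
naive = sum(LEXICON.get(tok, 0.0) for tok in sentence.split())

print(naive)                      # 0.0
print(sentiment_score(sentence))  # 3.5
```

Splitting on whitespace finds no dictionary entries because Japanese is written without spaces between words, while the segmented version matches 美味しい and 大好き. In a real workflow the toy segmenter would be replaced by a proper morphological analyzer, which also handles inflection and ambiguity that longest-match lookup cannot.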