sentiment extraction for non-English

wclaster Member, University Professor Posts: 43 University Professor
Hello. Are there sentiment analysis operators or tools for working with Japanese? How about Chinese? And how about other Asian languages? I saw the Sentiment Extract operator. It seems to have German and French versions for Vader. Thank you!

Best Answer

  • Options
    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    Solution Accepted
    In principle, yes, but this is definitely not something one can do quickly.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany

Answers

  • Options
    wclaster Member, University Professor Posts: 43 University Professor
    Thank you! I will leave this question open because I am really looking for Japanese.
  • Options
    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    If you have Chinese or Japanese dictionaries, I can add them :). Not a big thing. The bigger issue would be tokenization in those languages.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
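
To illustrate why tokenization is the harder part, here is a minimal Python sketch of dictionary-based scoring for text written without spaces. Everything in it is an assumption for illustration: `LEXICON` is a tiny hypothetical word list (not a real sentiment resource), and `longest_match_tokens` is a naive greedy segmenter, not what the Sentiment Extract operator does internally.

```python
from typing import Dict, List

# Hypothetical toy lexicon (word -> polarity). Not a real sentiment dictionary.
LEXICON: Dict[str, float] = {
    "美味しい": 1.0,   # delicious
    "好き": 1.0,       # like
    "最高": 2.0,       # the best
    "まずい": -1.0,    # tastes bad
    "嫌い": -1.0,      # dislike
    "最悪": -2.0,      # the worst
}


def longest_match_tokens(text: str, vocab: Dict[str, float]) -> List[str]:
    """Greedy longest-match segmentation against the lexicon keys.

    Characters not covered by any lexicon entry are emitted one at a time.
    """
    max_len = max(len(word) for word in vocab)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def score(text: str, lexicon: Dict[str, float] = LEXICON) -> float:
    """Sum the polarity of every token found in the lexicon."""
    return sum(lexicon.get(tok, 0.0) for tok in longest_match_tokens(text, lexicon))


print(score("このラーメンは最高"))      # one positive entry matched
print(score("この店は最悪でまずい"))    # two negative entries matched
```

A sketch like this ignores negation and inflection: "好きじゃない" (do not like) still matches "好き" and scores as positive, which is one reason a real morphological tokenizer is needed before a dictionary becomes useful.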
  • Options
    ceaperez Member Posts: 541 Unicorn
    Hi @wclaster
    I hope you can solve this issue and then share what worked for you.

    regards
  • Options
    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    @ceaperez by the way, if you have a good Spanish dictionary, I am happy to add this as well :). I didn't find one in a quick search. Ideally, I want to cover the big languages with a dictionary each.
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Options
    wclaster Member, University Professor Posts: 43 University Professor
    Hello mschmitz, thank you. Yes, I think tokenization would be quite a challenge. MeCab is an open-source text segmentation library for Japanese text, but I don't know how this would all fit together.

    From Wikipedia (MeCab):
    "Besides segmenting the text, MeCab also lists the part of speech of the word, and, if applicable and in the dictionary, its pronunciation."

    Would this be simple?
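
As a rough illustration of what MeCab hands back, the sketch below parses text in MeCab's default output layout with the IPAdic dictionary (surface form, a tab, then comma-separated features, with `EOS` closing the sentence). `SAMPLE` is a hand-written example in that layout rather than captured output, and the `Morpheme` type and `parse_mecab_output` helper are names made up for this sketch. In a real workflow the string would presumably come from MeCab itself, for example through a Python binding such as mecab-python3, which is an assumption here and not something confirmed in this thread.

```python
from typing import List, NamedTuple


class Morpheme(NamedTuple):
    surface: str    # the token as it appears in the text
    pos: str        # top-level part of speech
    base_form: str  # dictionary form, useful for lexicon lookup
    reading: str    # katakana reading, empty if the dictionary has none


# Hand-written sample in the IPAdic layout (illustrative, not captured output).
SAMPLE = (
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "も\t助詞,係助詞,*,*,*,*,も,モ,モ\n"
    "もも\t名詞,一般,*,*,*,*,もも,モモ,モモ\n"
    "EOS\n"
)


def parse_mecab_output(output: str) -> List[Morpheme]:
    """Turn MeCab-style text output into a list of Morpheme records."""
    morphemes: List[Morpheme] = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, feature_str = line.partition("\t")
        features = feature_str.split(",")
        # Unknown words can carry fewer fields, so guard the index lookups.
        has_base = len(features) > 6 and features[6] != "*"
        base_form = features[6] if has_base else surface
        reading = features[7] if len(features) > 7 else ""
        morphemes.append(Morpheme(surface, features[0], base_form, reading))
    return morphemes


for m in parse_mecab_output(SAMPLE):
    print(m.surface, m.pos, m.base_form, m.reading)
```

The `base_form` field is the piece a sentiment dictionary would most likely be matched against, since it normalizes inflected words back to their dictionary form.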
  • Options
    ceaperez Member Posts: 541 Unicorn
    @mschmitz. Thanks for your help. I will check if I have a good one. 
    regards. 
  • Options
    kayman Member Posts: 662 Unicorn
    Bit late to the party, but we had some decent results using GiNZA together with spaCy, via the Python extension in some of our RapidMiner workflows.
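
A minimal sketch of how the GiNZA and spaCy route might look in a Python script is shown below. It is not kayman's actual workflow. It assumes the `spacy`, `ginza`, and `ja_ginza` packages are installed, and the helper names, the `keep_pos` filter, and the one-word lexicon are hypothetical. The spaCy import is kept inside the function so the scoring part can be tried without the model.

```python
from typing import Dict, Iterable, List, Tuple


def analyze_with_ginza(text: str) -> List[Tuple[str, str]]:
    """Return (lemma, universal POS tag) pairs for a Japanese text.

    Assumes spaCy and the GiNZA model package are installed.
    """
    import spacy  # imported lazily so the rest of the sketch runs without it

    nlp = spacy.load("ja_ginza")
    return [(tok.lemma_, tok.pos_) for tok in nlp(text)]


def lemma_score(
    pairs: Iterable[Tuple[str, str]],
    lexicon: Dict[str, float],
    keep_pos: Tuple[str, ...] = ("ADJ", "NOUN", "VERB"),
) -> float:
    """Sum lexicon polarity over lemmas whose POS tag is in keep_pos."""
    return sum(lexicon.get(lemma, 0.0) for lemma, pos in pairs if pos in keep_pos)


# Hand-written pairs standing in for analyzer output (illustrative only).
example_pairs = [("ラーメン", "NOUN"), ("は", "ADP"), ("美味しい", "ADJ")]
print(lemma_score(example_pairs, {"美味しい": 1.0}))
```

Matching on lemmas instead of surface forms is the main design choice here: inflected forms collapse to one dictionary entry, so the sentiment lexicon can stay small.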