Sentiment Analysis in German

katyaegodigital Member Posts: 1 Newbie

I am new to RapidMiner, but I would like to use it for sentiment analysis of comments on social networks.
My question: is this possible with the German language?

Any help is appreciated.



  • kayman Member Posts: 318   Unicorn
    Depends a bit...

    The out-of-the-box options (VADER, SentiWordNet etc.) typically use models trained on English content, so you won't get far with German or any other language.

    However, if you have training data available, or can generate it, there is no reason at all why you couldn't work with German and build your own 'sentiment' set. In essence this is just a classification task.

    If you have neither labeled data nor the time to annotate a data set manually, the 'easiest' way to get labels is to crawl sites that offer reviews (for instance amazon.de or otto.de). Reviews there are typically rated 1 to 5 stars, so you could treat everything scoring 1 or 2 as negative and everything scoring 4 or 5 as positive. Use these ratings to define your labels, pre-process your source content (case normalization, stopword removal, lemmatization etc.), and extract the top keywords associated with each class (scheisse, schlecht etc., you get the idea).

    You'll need to combine a few different skills, but this is indeed possible with RapidMiner.
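
The labeling idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the reviews, the stopword list, and the function names are made up for the example, lemmatization is left out, and in RapidMiner the same steps would be built from text processing operators instead.

```python
import re
from collections import Counter

# Hypothetical (review text, star rating) pairs; real data would come from a crawl.
REVIEWS = [
    ("Das Produkt ist schlecht und die Lieferung war langsam", 1),
    ("Schlecht verarbeitet, leider kaputt angekommen", 2),
    ("Ganz okay, nichts Besonderes", 3),
    ("Sehr gut, die Qualität ist super", 5),
    ("Super Preis und sehr schnelle Lieferung", 4),
]

# Tiny illustrative stopword list (an assumption; use a fuller list in practice).
STOPWORDS = {"das", "ist", "und", "die", "war", "der", "ganz", "sehr", "leider"}


def rating_to_label(stars):
    """Map 1-2 stars to 'negative', 4-5 to 'positive', and drop 3-star reviews."""
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return None


def tokenize(text):
    """Lowercase the text, split it into word tokens, and remove stopwords."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]


def top_keywords(reviews, n=3):
    """Return the n most frequent tokens for each sentiment label."""
    counts = {"negative": Counter(), "positive": Counter()}
    for text, stars in reviews:
        label = rating_to_label(stars)
        if label is None:
            continue  # neutral reviews are not used for labeling
        counts[label].update(tokenize(text))
    return {label: [w for w, _ in c.most_common(n)] for label, c in counts.items()}


if __name__ == "__main__":
    print(top_keywords(REVIEWS))
```

The labeled texts produced this way are what you would feed into a classifier; the keyword counts are mainly a sanity check that the labels make sense.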
  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 340   Unicorn
    Guten Abend, @katyaegodigital,

    To complement the response by my sensei @kayman: it is indeed possible to accomplish a lot with RapidMiner. I used sentiment analysis as part of fraud research I conducted in Switzerland, and I have just two tips for you:
    1. If your coding skills are good, experiment with OdeNet: https://github.com/hdaSprachtechnologie/odenet. I was able to parse the XML (using Ruby, sorry) and use it as a WordNet. You may also want to get the GermaNet collection of words: http://www.sfs.uni-tuebingen.de/GermaNet/licenses.shtml
    2. You may want to play with POS tagging. Use the Python Scripting Extension and the pattern library for that: https://www.clips.uantwerpen.be/pattern. I don't know whether my limited German worked against me on this project (though @mschmitz thinks it's good), but I got much better results with a bit of Python embedded in my processes.
    Don't hesitate to contact me if you need a bit more help. (Though my NLP master is @ghislaine_gueri; I invoke her.)

    All the best,
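
For the first tip, here is a rough Python sketch of reading a WordNet-style XML file and grouping words by synset. The inline sample is hand-written for illustration; its element and attribute names (`LexicalEntry`, `Lemma`, `writtenForm`, `Sense`, `synset`) and the synset ids are assumptions about the file layout, so check them against the actual OdeNet file before relying on this.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written sample; the structure is an assumption, not copied from OdeNet.
SAMPLE_XML = """
<LexicalResource>
  <Lexicon id="odenet" language="de">
    <LexicalEntry id="w1">
      <Lemma writtenForm="gut" partOfSpeech="a"/>
      <Sense id="w1_1-a" synset="odenet-1-a"/>
    </LexicalEntry>
    <LexicalEntry id="w2">
      <Lemma writtenForm="prima" partOfSpeech="a"/>
      <Sense id="w2_1-a" synset="odenet-1-a"/>
    </LexicalEntry>
    <LexicalEntry id="w3">
      <Lemma writtenForm="schlecht" partOfSpeech="a"/>
      <Sense id="w3_2-a" synset="odenet-2-a"/>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""


def synonyms_by_synset(xml_text):
    """Group the written forms of all lexical entries by the synsets they belong to."""
    root = ET.fromstring(xml_text.strip())
    groups = defaultdict(list)
    for entry in root.iter("LexicalEntry"):
        lemma = entry.find("Lemma")
        if lemma is None:
            continue  # skip malformed entries without a lemma
        word = lemma.get("writtenForm")
        for sense in entry.findall("Sense"):
            groups[sense.get("synset")].append(word)
    return dict(groups)


if __name__ == "__main__":
    for synset, words in synonyms_by_synset(SAMPLE_XML).items():
        print(synset, words)
```

Words that share a synset can then be used to expand a small seed list of positive or negative terms.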

  • SGolbert RapidMiner Certified Analyst, Member Posts: 300   Unicorn

    As @kayman said, you can absolutely train sentiment analysis models if you have some sort of label. All of the text processing tools support German (RapidMiner originated in Germany, after all). With enough data, a model trained this way can actually perform better than the out-of-the-box models!

    Without labels you can also use the Dictionary-Based Sentiment operator, but then you need a dictionary with positive and negative words.
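
To make the dictionary-based idea concrete, here is a minimal Python sketch of how such scoring generally works. It is not the RapidMiner operator itself; the word list, the weights, and the function names are hypothetical, and negation ("nicht gut") is deliberately not handled.

```python
import re

# Hypothetical mini-dictionary: positive words get positive weights, negative words negative ones.
SENTIMENT_DICT = {
    "gut": 1.0,
    "super": 1.0,
    "toll": 1.0,
    "schlecht": -1.0,
    "kaputt": -1.0,
    "langsam": -0.5,
}


def score(text, lexicon=SENTIMENT_DICT):
    """Sum the weights of all dictionary words found in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(lexicon.get(t, 0.0) for t in tokens)


def classify(text):
    """Turn the summed score into a positive, negative, or neutral label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("Die Lieferung war super und das Produkt ist gut"))
    print(classify("Leider kaputt und schlecht verpackt"))
```

The quality of the result depends almost entirely on the dictionary, which is why a German word list with polarity weights is the piece you need to find or build.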


  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,020  RM Data Scientist
    Maybe this is the time when I need to add a German dictionary to Extract Sentiment? @sgenzer :)
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany