Web scraping & sentiment analysis in non-English language

linn_ansved_636 Member Posts: 4 Contributor I
edited December 2018 in Help

Hi,

 

I'm new to RapidMiner and I'm hoping to use it together with Aylien to scrape many different news pages and perform sentiment analysis on them. The problem is that the articles I want to analyze are written in Swedish. Does anyone know if this is possible and, if so, where I can find more information? I've already checked out this tutorial:

https://docs.aylien.com/textapi/rapidminer-extension/#step-3-categorizing-tweets

 

I've also looked at Aylien's news API, but don't know if that could help.

https://aylien.com/news-api/ 

 

Would really appreciate some guidance on this!

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi @linn_ansved_636 - welcome to the community. Web scraping websites in Swedish is no problem at all. Just use the various operators in the Web Mining extension as you would for English.

     

    The sentiment analysis is the more interesting question. Aylien does not appear to support native sentiment analysis in Swedish (see https://docs.aylien.com/textapi/#language-support), and it does not seem that IBM Watson Tone Analyzer supports Swedish either. So if you want to use one of these tools, I'd recommend pre-processing the text through a translation engine first (although some of the "tone" will likely be lost or distorted in translation).


    Scott
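
The translate-first approach suggested above can be sketched roughly as follows. This is only an illustration of the order of the steps, not Aylien's or RapidMiner's API: `toy_translate`, `TOY_SV_TO_EN` and `TOY_LEXICON` are made-up stand-ins, and a real setup would call an actual translation engine and an actual English sentiment service in their place.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a real translation engine (assumption):
# a real pipeline would call an external translation service here.
TOY_SV_TO_EN: Dict[str, str] = {
    "bra": "good",
    "dålig": "bad",
    "nyhet": "news",
    "mycket": "very",
}

# Hypothetical toy English sentiment lexicon (word -> polarity weight).
TOY_LEXICON: Dict[str, int] = {"good": 1, "bad": -1}


def toy_translate(text: str) -> str:
    """Word-by-word lookup; unknown words pass through unchanged."""
    return " ".join(TOY_SV_TO_EN.get(w, w) for w in text.lower().split())


def score_sentiment(english_text: str) -> int:
    """Sum the lexicon weights of the words in the English text."""
    return sum(TOY_LEXICON.get(w, 0) for w in english_text.lower().split())


def sentiment_of_swedish(
    text: str, translate: Callable[[str], str] = toy_translate
) -> str:
    """Step 1: translate Swedish to English. Step 2: score the English text."""
    score = score_sentiment(translate(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Because the translator is passed in as a function, the toy lookup can be swapped for a real engine without touching the scoring step. Any tone lost in translation will still carry through to the score.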

     

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi @linn_ansved_636,

     

    Most of the steps in text processing are language agnostic. The only steps that are specific to a language are stop word filtering and stemming. In both cases you can use the Filter Stopwords (Dictionary) and Stem (Dictionary) operators with external dictionaries.

     

    I hope that helps!
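
To make the idea of dictionary-driven stop word filtering and stemming concrete, here is a minimal sketch outside RapidMiner. The word lists are tiny, hand-made examples (an assumption, not a real Swedish dictionary), and the file format the RapidMiner operators expect is not shown here.

```python
import re
from typing import Dict, List, Set

# Tiny hand-made examples; a real run would load these from external files.
SWEDISH_STOPWORDS: Set[str] = {"och", "det", "att", "i", "en", "är", "som", "på"}
STEM_DICTIONARY: Dict[str, str] = {
    "nyheter": "nyhet",
    "nyheterna": "nyhet",
    "bilar": "bil",
    "bilarna": "bil",
}


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters (\\w covers å, ä, ö)."""
    return re.findall(r"\w+", text.lower())


def preprocess(text: str) -> List[str]:
    """Drop stop words, then map each remaining token to its dictionary stem."""
    kept = [t for t in tokenize(text) if t not in SWEDISH_STOPWORDS]
    return [STEM_DICTIONARY.get(t, t) for t in kept]
```

Tokens missing from the stem dictionary are kept unchanged, so the coverage of the dictionary decides how much normalization actually happens.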

  • linn_ansved_636 Member Posts: 4 Contributor I

    Great, thanks for the reply! Do you know if RapidMiner has a built-in translate function? If so, I could scrape websites written in Swedish, translate them into English, and then perform the sentiment analysis. My hope is that all of this can be done in RapidMiner. Any thoughts?

     

    Thanks,

    Linn

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    RM itself does not yet have this built in - maybe a nice feature to add?

     

    Maybe @koen can help?

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    There is currently no built-in feature, but hopefully our Google Cloud custom operators will improve over time so that we can include Google Translate. Meanwhile, I wrote this KB article a while back that will do the trick (albeit without an "out-of-the-box" custom operator).

     

    https://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/How-to-interact-with-Google-Cloud-APIs-with-the-Web-Mining/ta-p/35280


    Scott
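
For a sense of what such a call involves, the sketch below builds a request for the Google Cloud Translation v2 REST endpoint and reads the translated text out of a response body. It is not taken from the KB article: the endpoint, the `q`/`source`/`target`/`format` fields and the `data.translations[].translatedText` response shape are written from my understanding of the public v2 API and should be checked against the current Google documentation. The function names are hypothetical, and nothing is sent over the network here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed v2 endpoint; verify against the current Google Cloud docs.
ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(
    text: str, api_key: str, source: str = "sv", target: str = "en"
) -> Request:
    """Build (but do not send) a POST request asking for a translation."""
    url = f"{ENDPOINT}?{urlencode({'key': api_key})}"
    body = json.dumps(
        {"q": text, "source": source, "target": target, "format": "text"}
    ).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


def parse_translation(response_body: str) -> str:
    """Extract the first translated string from a v2-style JSON response."""
    payload = json.loads(response_body)
    return payload["data"]["translations"][0]["translatedText"]
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` with a valid API key and feeding the decoded response to `parse_translation`. Inside RapidMiner the same request would go through the Web Mining operators instead.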

     

  • linn_ansved_636 Member Posts: 4 Contributor I

    Great, thanks. Finally, do you know if there are any tutorials on how to web scrape and perform sentiment analysis in English using RapidMiner?

     

     

    /Linn

  • linn_ansved_636 Member Posts: 4 Contributor I

    Hi again,

     

    I'm also interested in getting a graph of how the sentiment changes over time, e.g. in May the number of positives is X, in June it is Y, etc.

     

    Any guidance?

     

    Thanks,

    /Linn
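
One way to get the numbers behind such a graph is to count sentiment labels per month once each article has a publication date and a label. The sketch below assumes the data is available as `(date, label)` pairs. The function name and that input shape are assumptions for illustration, not something RapidMiner produces in this form.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def monthly_sentiment_counts(
    records: Iterable[Tuple[date, str]]
) -> Dict[str, Dict[str, int]]:
    """Count sentiment labels per calendar month, keyed as 'YYYY-MM'."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for published, label in records:
        counts[published.strftime("%Y-%m")][label] += 1
    # Sort the month keys so the result is ready to plot in time order.
    return {month: dict(counts[month]) for month in sorted(counts)}
```

The resulting month-to-counts mapping can be fed to any charting tool, with one line or bar series per sentiment label.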

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hi @linn_ansved_636 - sure, there are lots of resources on that. Have you checked out our YouTube channel yet?

     

    https://www.youtube.com/channel/UCxneJBWWNLs-A6ckls1Rrug?view_as=subscriber

     

    Scott

     

     
