🦉 🎤   RapidMiner Wisdom 2020 - CALL FOR SPEAKERS   🦉 🎤

We are inviting all community members to submit proposals to speak at Wisdom 2020 in Boston.


Whether it's a cool RapidMiner trick or a use case implementation, we want to see what you have.
The form link is below, and the deadline for submissions is November 15. See you in Boston!

CLICK HERE TO GO TO ENTRY FORM

Web scraping & sentiment analysis in a non-English language

linn_ansved_636 Member Posts: 4 Contributor I
edited December 2018 in Help

Hi,

I'm new to RapidMiner, and I'm hoping to use RapidMiner and Aylien to scrape many different news sites and perform sentiment analysis on them. The problem is that the articles I want to gather are written in Swedish. Does anyone know whether this is possible and, if so, where I can find more information? I've already checked out this tutorial:

https://docs.aylien.com/textapi/rapidminer-extension/#step-3-categorizing-tweets

I've also looked at Aylien's News API, but I don't know whether it could help.

https://aylien.com/news-api/

I would really appreciate some guidance on this!

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,519 Community Manager

    Hi @linn_ansved_636, welcome to the community. Scraping websites in Swedish is no problem at all: just use the operators in the Web Mining extension as you would for English sites.

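For readers who want to see why the page language does not matter at the scraping step, here is a minimal sketch in Python (outside RapidMiner, standard library only). It assumes the article text sits inside `<p>` tags, which varies from site to site; `ParagraphExtractor`, `extract_paragraphs` and `fetch_html` are illustrative names, not part of the Web Mining extension. The only language-sensitive detail is decoding the page with the right character set so that å, ä and ö come through intact.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> elements; nothing here depends on the page language."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &aring; arrives as "å"
        self._depth = 0  # > 0 while we are inside a <p> element
        self._buffer: list[str] = []
        self.paragraphs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            # Collapse runs of whitespace and keep only non-empty paragraphs.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_paragraphs(html: str) -> list[str]:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def fetch_html(url: str) -> str:
    # Decode with the charset the server declares, falling back to UTF-8,
    # so Swedish letters such as å, ä and ö survive intact.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `extract_paragraphs(fetch_html(url))` returns the paragraph texts of one page; the parsing treats Swedish and English pages identically.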

    Sentiment analysis is the more interesting question. Aylien does not appear to support sentiment analysis natively in Swedish (see https://docs.aylien.com/textapi/#language-support), and IBM Watson Tone Analyzer does not seem to support Swedish either. So if you want to use one of these tools, I'd recommend running the text through a translation engine first, although some of the tone will likely be distorted by the translation.
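The translate-first route can be sketched as a small routing function. This is a hypothetical Python illustration, not Aylien's or IBM's API: `translate` and `score` are placeholders for whatever translation engine and sentiment service you plug in, and `NATIVELY_SUPPORTED` has to be filled in from the provider's own language-support page.

```python
from typing import Callable

# Placeholder: language codes your sentiment service handles natively.
# Fill this in from the provider's documentation; "en" alone is an assumption.
NATIVELY_SUPPORTED = {"en"}


def sentiment_with_fallback(
    text: str,
    lang: str,
    translate: Callable[[str, str, str], str],  # (text, source_lang, target_lang) -> translated text
    score: Callable[[str], float],  # text -> sentiment score
) -> float:
    """Score the text directly if its language is supported, otherwise translate to English first."""
    if lang not in NATIVELY_SUPPORTED:
        # Some tone may be lost here: the score reflects the translation, not the original wording.
        text = translate(text, lang, "en")
    return score(text)


if __name__ == "__main__":
    # Toy stand-ins for a real translation engine and sentiment service.
    toy_translate = lambda text, src, dst: {"Bra nyheter": "Good news"}.get(text, text)
    toy_score = lambda text: 1.0 if "good" in text.lower() else 0.0
    print(sentiment_with_fallback("Bra nyheter", "sv", toy_translate, toy_score))
```

Keeping the translator and scorer as parameters means you can swap services without touching the routing logic.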


    Scott

    ----------------------
    Don't forget to submit your great ideas for Wisdom 2020! Deadline is November 15.

    Wisdom 2020 – Call for Speakers Form 

  • SGolbert RapidMiner Certified Analyst, Member Posts: 341 Unicorn

    Hi @linn_ansved_636,

    Most text-processing steps are language-agnostic. The only language-specific steps are stopword filtering and stemming, and for both you can use the Filter Stopwords (Dictionary) and Stemming (Dictionary) operators with external dictionaries.

    I hope that helps!
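To illustrate what dictionary-based stopword filtering and stemming do, here is a minimal Python sketch. The tiny word lists are made-up samples, not a real Swedish stopword list or stem dictionary, and the code does not reproduce the file formats the Filter Stopwords (Dictionary) and Stemming (Dictionary) operators expect (check the operator documentation for those). Tokenizing and lowercasing work the same for any language; only the two dictionaries change.

```python
import re

# Hypothetical sample dictionaries. In practice you would load a full Swedish
# stopword list and stem dictionary from external files.
SWEDISH_STOPWORDS = {"och", "att", "det", "en", "är", "i", "på", "som"}
STEM_DICTIONARY = {  # inflected form -> stem
    "nyheter": "nyhet",
    "nyheten": "nyhet",
    "bilarna": "bil",
    "bilar": "bil",
}

# In Python 3, \w on str patterns is Unicode-aware, so å, ä and ö count as word characters.
TOKEN = re.compile(r"\w+")


def preprocess(text: str) -> list[str]:
    tokens = [t.lower() for t in TOKEN.findall(text)]
    kept = [t for t in tokens if t not in SWEDISH_STOPWORDS]  # stopword filtering
    return [STEM_DICTIONARY.get(t, t) for t in kept]  # dictionary stemming; unknown words pass through
```

For example, `preprocess("Nyheten och bilarna är i Stockholm")` returns `["nyhet", "bil", "stockholm"]`.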

  • linn_ansved_636 Member Posts: 4 Contributor I

    Great, thanks for the reply! Do you know whether RapidMiner has a built-in translation function? If so, I could scrape websites written in Swedish, translate the text into English, and then perform the sentiment analysis. Ideally, I'd like to do all of this in RapidMiner. Any thoughts?

    Thanks,

    Linn

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,155 RM Data Scientist

    Hi,

    RapidMiner itself does not have this built in yet. Maybe it would be a nice feature to add?

    Maybe @koen can help?

    Best,

    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,519 Community Manager

    There is no built-in feature at the moment, but hopefully our Google Cloud custom operators will improve over time so that we can include Google Translate. In the meantime, I wrote this KB article a while back that will do the trick (albeit without an out-of-the-box custom operator):

    https://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/How-to-interact-with-Google-Cloud-APIs-with-the-Web-Mining/ta-p/35280
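The KB article does this inside RapidMiner with the Web Mining extension. For reference, the underlying REST call to the basic (v2) Google Cloud Translation API can be sketched in Python as below. Assumptions: you have a Google Cloud project with the Translation API enabled and an API key (the article may use a different authentication setup), and `build_request`, `parse_response` and `translate_texts` are illustrative names. Usage is billed, so check Google's pricing and quotas first.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Basic (v2) Cloud Translation REST endpoint; API-key authentication is an assumption.
ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_request(texts: list[str], api_key: str, source: str = "sv", target: str = "en") -> Request:
    # "format": "text" asks for plain text back instead of HTML-escaped output.
    body = json.dumps({"q": texts, "source": source, "target": target, "format": "text"})
    return Request(
        f"{ENDPOINT}?{urlencode({'key': api_key})}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def parse_response(payload: str) -> list[str]:
    # v2 responses look like {"data": {"translations": [{"translatedText": ...}, ...]}}
    return [t["translatedText"] for t in json.loads(payload)["data"]["translations"]]


def translate_texts(texts: list[str], api_key: str, source: str = "sv", target: str = "en") -> list[str]:
    with urlopen(build_request(texts, api_key, source, target), timeout=30) as response:
        return parse_response(response.read().decode("utf-8"))
```

Sending several article texts in one `q` list keeps the number of HTTP calls down; the translations come back in the same order.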


    Scott

  • linn_ansved_636 Member Posts: 4 Contributor I

    Great, thanks. Finally, do you know of any tutorials on how to scrape websites and perform sentiment analysis in English using RapidMiner?

    /Linn

  • linn_ansved_636 Member Posts: 4 Contributor I

    Hi again,

    I'm also interested in getting a graph of how the sentiment changes over time, e.g. how many articles are positive in May, how many in June, and so on.

    Any guidance?

    Thanks,

    /Linn
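Once each article has a publication date and a sentiment label, the over-time view is a group-by on month. A minimal Python sketch, assuming labels are strings such as "positive"/"negative" and dates are `datetime.date` objects; how to extract the date from each scraped page is site-specific and not shown, and the sample data is made up. The resulting table can be plotted as a line or bar chart in RapidMiner or any other charting tool.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Iterable


def monthly_counts(records: Iterable[tuple[date, str]]) -> dict[str, Counter]:
    """Count sentiment labels per calendar month, keyed "YYYY-MM" in chronological order."""
    by_month: defaultdict[str, Counter] = defaultdict(Counter)
    for published, label in records:
        by_month[published.strftime("%Y-%m")][label] += 1
    # "YYYY-MM" strings sort chronologically, so sorting the keys orders the months.
    return dict(sorted(by_month.items()))


if __name__ == "__main__":
    # Made-up example data, not real results.
    articles = [
        (date(2018, 5, 3), "positive"),
        (date(2018, 5, 20), "negative"),
        (date(2018, 5, 21), "positive"),
        (date(2018, 6, 1), "positive"),
    ]
    for month, counts in monthly_counts(articles).items():
        # A Counter returns 0 for labels it has not seen.
        print(month, counts["positive"], counts["negative"])
```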

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,519 Community Manager

    Hi @linn_ansved_636, sure, there are lots of resources on that. Have you checked out our YouTube channel first?

    https://www.youtube.com/channel/UCxneJBWWNLs-A6ckls1Rrug?view_as=subscriber

    Scott