Entity Sentiment and Extraction - Analyzing English and French tweets using RapidMiner and Rosette
Vive la République et Vive la France!
Analyzing English and French tweets during the election weekend using RapidMiner and Rosette
After the surprising results of the U.S. presidential election and the UK “Brexit” vote, many expected another populist upset in France’s recent election. As we now know, Emmanuel Macron of En Marche! defeated populist candidate Marine Le Pen of the National Front. Did popularity and sentiment in social media reflect the election outcome? We took a look using Rosette API and RapidMiner.
We used Rosette API’s entity extraction on English and French tweets to see which people were mentioned most during that weekend. Marine Le Pen is often referred to as the “French Trump,” so we were curious to see whether the U.S. president would rank among the most mentioned people. We also ran sentiment analysis to look for trends on Twitter.
The free version of RapidMiner Studio 7.5.001 limited us to analyzing 10,000 tweets (1 tweet = 1 row), and a model built on such a small sample could lose as much as 30% in accuracy. For enterprise analysis, we naturally encourage you to upgrade so you can build models on more data for better results.
Unsurprisingly, most of the tweets we collected are in French (about 80%), followed by English and Italian. Apart from a single tweet in Uzbek, the other languages identified were native European languages (German, Lithuanian, Estonian, Czech, Polish, and a few others) and languages commonly spoken in Europe, including Arabic and Turkish.
Extracting entities is a powerful way to get an accurate snapshot of which figures are trending without reading each tweet. While a keyword search may return some information, its results are inherently biased by your expectations and are less valuable than letting the data speak for itself.
In this data set, the most frequently occurring entity types are Title (e.g. M., Mrs), Organization (e.g. Reuters, BBC), and Location (France naturally being the most common).
As you can imagine, the most popular names were “Marine Le Pen” and “Emmanuel Macron,” but Mme Le Pen outranked her opponent in mentions. Former French president François Hollande was the third most frequently mentioned French figure. Among Americans, Barack Obama and Donald Trump did make it into the top 15, following several French celebrities. As Obama officially endorsed Emmanuel Macron, he was more popular in our data than Trump (Obama placed #13, Trump #15).
Additionally, the majority of tweets include a URL, indicating that people are more likely to share information or an image than to draft a plain-text tweet from scratch.
Overall, the tweets we collected were more negative than positive. This trend is not unique to politics, as far more people tend to take to the internet to complain than to praise.
Being the most frequently mentioned person does not necessarily make you the most “desired person.” This theory is evidenced by the fact that Mme Le Pen lost the election despite being more widely talked about. However, we decided to go a step further with entity-specific sentiment analysis, which shows the feelings expressed about a given entity.
We applied entity sentiment to Emmanuel Macron and Marine Le Pen to see what people were “feeling” about each candidate. As expected, neutral sentiment dominated, since many people simply share URLs that mention a candidate. Among the remaining tweets, the results followed the same trend as the overall sample: negative tweets outnumbered positive ones for both candidates (see pie chart of Emmanuel Macron entity sentiment).
Now that we know that Emmanuel Macron has been elected the 8th President of the Fifth French Republic, we can also confirm that being the most popular/mentioned person on Twitter does not make you the winner. It was also interesting to see that there wasn’t a huge difference between Emmanuel Macron’s sentiment breakdown and Marine Le Pen’s: neutral first, then negative, then positive, in about the same proportions.
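To make the comparison of sentiment proportions concrete, here is a minimal Python sketch of how per-entity sentiment shares could be computed. It is an illustration only: the `(entity, label)` pair layout, the `pos`/`neu`/`neg` labels, and the sample rows are assumptions made up for this example. They are not the Rosette API response format, and they are not the data from our analysis.

```python
from collections import Counter, defaultdict

# Hypothetical input: one (entity, sentiment label) pair per entity mention.
# These rows are invented for illustration and do not reflect real results.
mentions = [
    ("Emmanuel Macron", "neu"),
    ("Emmanuel Macron", "neu"),
    ("Emmanuel Macron", "neg"),
    ("Emmanuel Macron", "pos"),
    ("Marine Le Pen", "neu"),
    ("Marine Le Pen", "neu"),
    ("Marine Le Pen", "neg"),
    ("Marine Le Pen", "pos"),
]

def sentiment_shares(mentions):
    """Return, for each entity, the fraction of its mentions per sentiment label."""
    counts = defaultdict(Counter)
    for entity, label in mentions:
        counts[entity][label] += 1

    shares = {}
    for entity, label_counts in counts.items():
        total = sum(label_counts.values())
        shares[entity] = {label: n / total for label, n in label_counts.items()}
    return shares

shares = sentiment_shares(mentions)
for entity, breakdown in shares.items():
    print(entity, breakdown)
```

Comparing the resulting dictionaries side by side shows whether two entities have a similar breakdown, which is the kind of check described above.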
Perform your own analysis
To process our 10,000 tweets we used several Rosette API operators, including:
- Identify language
- Entity extraction
- Sentiment analysis
- Entity sentiment
We also used RapidMiner operators to compare results between English tweets and French tweets:
- Filter Examples - to filter results to just English, for example
Then, to narrow the results, we used Grouping Aggregate and Sort.
- Grouping aggregate - to visualize results by entity
- Sort - to sort the results from most to least numerous
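For readers who want to see the logic of the filter, aggregate, and sort steps outside RapidMiner, here is a rough equivalent in plain Python. This is a sketch under stated assumptions, not our actual process: the row layout (`lang` and `entity` keys), the language codes, and the sample rows are hypothetical and do not come from the Rosette API or from our data set.

```python
from collections import Counter

# Hypothetical rows: one per extracted entity mention, tagged with the
# language detected for the tweet it came from. Invented for illustration.
rows = [
    {"lang": "fra", "entity": "Marine Le Pen"},
    {"lang": "fra", "entity": "Emmanuel Macron"},
    {"lang": "eng", "entity": "Marine Le Pen"},
    {"lang": "fra", "entity": "Marine Le Pen"},
    {"lang": "eng", "entity": "Emmanuel Macron"},
    {"lang": "eng", "entity": "Marine Le Pen"},
]

def top_entities(rows, lang):
    """Filter rows to one language, count mentions per entity, sort descending."""
    filtered = [r for r in rows if r["lang"] == lang]  # filter step
    counts = Counter(r["entity"] for r in filtered)    # grouping / aggregate step
    return counts.most_common()                        # sort step, most mentioned first

english_ranking = top_entities(rows, "eng")
print(english_ranking)
```

Calling `top_entities(rows, "fra")` instead gives the French ranking, so the two lists can be compared directly.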
Want to play around with RapidMiner Studio and Rosette API? Download the tools for free now. Learn how to get started with the Rosette operators by looking at our previous blog post about entity extraction.