Counting Emojis in Text Mining
Hello - there was a good question about how to do text mining with unusual characters such as emojis. I like to do a little "ETL jujitsu" when I work with text data like this: temporarily convert the text to URL-encoded UTF-8 hex so that each emoji becomes a unique, easily parsed token, swap those tokens for readable labels, and then convert back. Here's the idea:
1. Import your example set of text data:
2. Get your master set of emojis (I got them from here) and then put them into an Excel doc or whatever. I like putting the Unicode code point in brackets so I can find it easily and tokenize on it if desired (see "Unicode RM" column):
3. Use the Encode URL operator to convert your text to UTF-8 hex, replace each UTF-8 hex sequence with its Unicode label (or whatever you put in your Excel dictionary), and then convert back:
Voilà - perfect conversion (well not bad anyway!)
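For anyone who wants to try the same encode, replace, decode trick outside of RapidMiner, here is a rough Python sketch of the three steps above. It is not the attached process: the `EMOJI_LABELS` dictionary is a tiny hypothetical stand-in for the Excel dictionary, the function names are made up for illustration, and `urllib.parse.quote`/`unquote` play the role of the URL encoding and decoding.

```python
from urllib.parse import quote, unquote

# Hypothetical stand-in for the Excel dictionary:
# emoji character -> bracketed Unicode label (the "Unicode RM" idea).
EMOJI_LABELS = {
    "\U0001F600": "[U+1F600]",  # grinning face
    "\U0001F389": "[U+1F389]",  # party popper
    "\u2764": "[U+2764]",       # heavy black heart
}


def build_hex_dictionary(labels):
    """Map each emoji's percent-encoded UTF-8 form to its bracketed label."""
    return {quote(emoji, safe=""): label for emoji, label in labels.items()}


def replace_emojis(text, labels=EMOJI_LABELS):
    """Encode text to UTF-8 hex, swap emoji hex for labels, then decode back."""
    # Step 1: percent-encode everything, so an emoji becomes e.g. %F0%9F%98%80.
    encoded = quote(text, safe="")
    hex_dict = build_hex_dictionary(labels)
    # Step 2: dictionary replace. Longer keys go first so that a longer
    # sequence is not partially consumed by a shorter key it contains.
    for hex_key in sorted(hex_dict, key=len, reverse=True):
        encoded = encoded.replace(hex_key, quote(hex_dict[hex_key], safe=""))
    # Step 3: decode back to readable text.
    return unquote(encoded)


if __name__ == "__main__":
    print(replace_emojis("Great job \U0001F600! 100% \U0001F389"))
```

Working on the encoded string means every emoji is plain ASCII while you do the replacement, which is the whole point of the detour. Emojis that are not in the dictionary simply survive the round trip unchanged.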
If you want to put that in a process that counts emojis, just add on some text mining using Process Documents From Data and join back with the original data set:
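The counting step can be sketched the same way in Python. This assumes the text has already been converted so that emojis appear as bracketed labels like `[U+1F600]`; the row layout (`id`, `text`), the regular expression, and the function name are assumptions for illustration, not what Process Documents From Data does internally.

```python
import re
from collections import Counter

# Assumed label format: a bracketed code point such as [U+1F600].
TOKEN_PATTERN = re.compile(r"\[U\+[0-9A-F]{4,6}\]")


def count_emoji_tokens(rows):
    """Count bracketed emoji labels per row and join the counts back on.

    rows is a list of dicts with at least a 'text' key. Each returned row
    is a copy of the original plus one count column per label seen anywhere
    in the data (0 when the label does not occur in that row).
    """
    all_tokens = sorted(
        {token for row in rows for token in TOKEN_PATTERN.findall(row["text"])}
    )
    joined_rows = []
    for row in rows:
        counts = Counter(TOKEN_PATTERN.findall(row["text"]))
        joined = dict(row)  # keep the original attributes
        for token in all_tokens:
            joined[token] = counts.get(token, 0)
        joined_rows.append(joined)
    return joined_rows


if __name__ == "__main__":
    data = [
        {"id": 1, "text": "hi [U+1F600] [U+1F600]"},
        {"id": 2, "text": "party [U+1F389]"},
    ]
    for row in count_emoji_tokens(data):
        print(row)
```

Each label becomes its own count column next to the original text, which is roughly the shape you get from a term-occurrence word vector joined back to the source example set.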
Thanks to user @gjagiello for the data and the inspiration!
[process attached for those that want to take a look]
Don't forget to submit your great ideas for Wisdom 2020! Deadline is November 15.
Wisdom 2020 – Call for Speakers Form