
Counting Emojis

Survuel Member Posts: 4 Learner I
Hi guys, I created a community account just for this problem:
I have an Excel file full of comments extracted from a Facebook group, and I need to pull all the emojis out of it and count them. Could you please tell me how to do it? I've seen one post describing a solution, but it uses Encode/Decode operators that I don't have, and I don't really understand how to do these kinds of things (I'm also a newbie; I downloaded the trial version just for this one-time use). Any help is greatly appreciated.

Answers

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research

    I would recommend starting with the introduction videos at the RapidMiner Academy: https://academy.rapidminer.com/

    For an overview of text mining, check these tutorials:

    The reason you didn't find the operator is that you need to install the Text Processing extension from the RapidMiner Marketplace.
    In the top menu, go to Extensions -> Marketplace and search for "text processing".


    I hope that helps for a first start.


  • Survuel Member Posts: 4 Learner I
    edited April 2019
    Thanks @David_A, I have already installed Text Processing and know the basics, like Process Documents from Files followed by Tokenize, but sadly when I search for Encode/Decode URL, nothing shows up.
  • kayman Member Posts: 662 Unicorn
    The Encode/Decode URL operators are part of the Web Mining extension, but I'm not sure you really need them.
    Your emojis might already be stored as Unicode characters in your text; if not, the decode operator may be useful.
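    To see what an encode/decode round trip does to an emoji, here is a minimal Python illustration using standard percent-encoding of UTF-8 bytes. It assumes the Encode URL operator behaves like this; `encode_text` and `decode_text` are hypothetical helper names, not RapidMiner functions.

```python
from urllib.parse import quote, unquote


def encode_text(text: str) -> str:
    """Percent-encode text as UTF-8 bytes, so every emoji becomes a visible
    %XX sequence (e.g. U+1F600 -> %F0%9F%98%80)."""
    return quote(text, safe="", encoding="utf-8")


def decode_text(encoded: str) -> str:
    """Reverse the percent-encoding, restoring the original characters."""
    return unquote(encoded, encoding="utf-8")


comment = "great party \U0001F600"
encoded = encode_text(comment)
print(encoded)                          # great%20party%20%F0%9F%98%80
print(decode_text(encoded) == comment)  # True
```

    If your text already shows the emojis themselves rather than %XX sequences, the decode step is not needed.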

    The challenge is then to find the valid Unicode ranges and map each emoji to a meaningful name for grouping purposes.

    You can find the full emoji list here: https://unicode.org/emoji/charts/full-emoji-list.html
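    For the naming step, Python's standard `unicodedata` module already knows the official Unicode name of each code point, so a mapping table may not need to be built from the chart by hand. A minimal sketch that works on one code point at a time (sequences such as flags or skin-tone variants would need extra handling); `emoji_label` is a hypothetical helper name.

```python
import unicodedata


def emoji_label(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character, usable as a grouping key."""
    code_point = ord(ch)  # raises TypeError if ch is not exactly one character
    name = unicodedata.name(ch, "UNKNOWN")  # fallback for unnamed code points
    return f"U+{code_point:04X} {name}"


print(emoji_label("\U0001F4A9"))  # U+1F4A9 PILE OF POO
print(emoji_label("\U0001F600"))  # U+1F600 GRINNING FACE
```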

    So a possible workflow could be as follows:
    -> use the text operators to tokenize all your content, e.g. by splitting on spaces
    -> keep only the tokens that fall within the emoji Unicode ranges (most emojis live in 1F300 to 1FAFF, with some older ones in 2600 to 27BF)
    -> count these and optionally map them to something meaningful (like 1F4A9 = pile of poo). You could also use the link above to generate this mapping table.
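    The workflow above can be sketched in Python as follows, assuming the comments have already been read into a list of strings (reading the Excel file is left out). Two deliberate deviations: instead of splitting on spaces it scans character by character, since emojis are often glued to words or to each other, and the `EMOJI_RANGES` table is an approximation that includes some non-emoji symbols, counts multi-code-point emojis (skin tones, ZWJ sequences) piece by piece, and misses flags. All names here are hypothetical.

```python
from collections import Counter
import unicodedata

# Approximate emoji blocks (an assumption: covers most single-code-point
# emojis, but also some non-emoji symbols, and skips flag indicators).
EMOJI_RANGES = [
    (0x2600, 0x27BF),    # Miscellaneous Symbols + Dingbats
    (0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
    (0x1FA70, 0x1FAFF),  # Symbols and Pictographs Extended-A
]


def is_emoji(ch: str) -> bool:
    """True if the single character falls inside one of EMOJI_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def count_emojis(comments):
    """Count emoji code points across an iterable of comment strings."""
    counts = Counter()
    for text in comments:
        # Scan character by character rather than splitting on spaces,
        # because emojis are often attached to words or to each other.
        counts.update(ch for ch in text if is_emoji(ch))
    return counts


def named_counts(counts):
    """Replace each emoji key with its Unicode name, e.g. 'PILE OF POO'."""
    return {unicodedata.name(ch, f"U+{ord(ch):04X}"): n for ch, n in counts.items()}


comments = [
    "Great event!\U0001F600\U0001F600",
    "lol \U0001F4A9",
    "see you there \U0001F600",
]
counts = count_emojis(comments)
print(named_counts(counts))  # {'GRINNING FACE': 3, 'PILE OF POO': 1}
```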

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    I really think there is a problem with our search engine. :( I wrote this knowledge base (KB) article a while ago about exactly this use case:

    https://community.rapidminer.com/discussion/44237/counting-emojis-in-text-mining

    Scott

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research
    Once again I'm impressed by the amount of knowledge present in the community. :o
    Thanks Scott.
  • Survuel Member Posts: 4 Learner I
    edited April 2019
    Yes, I know. I'm trying to follow the post, but I'm completely lost. I downloaded the process you provided at the end, but I just don't know how to run it (please keep in mind that I only downloaded RapidMiner yesterday, so my knowledge and skills are really limited). I have a few screenshots, and it would help a lot if you could tell me what to put in each field.

    For example, in Encode URL, what do I put in the url attribute field? (Obviously not a cell range, lol, since that doesn't get me anywhere.) And is the selected encoding (UTF-8) right?

    Next, Replace (Dictionary): I have no idea what to do with it. Which attribute filter do I need? What do I enter for "from attribute" and "to attribute"?

    The same goes for Decode URL: what am I supposed to put in url attribute and encoding?
    I would share screenshots, but I haven't been a member long enough to post them.

    I mean, don't get me wrong, this program looks amazing; I just can't learn these things in one day (I was up until 4 AM last night trying to figure things out).
    Thanks :hushed:
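    The three operators asked about above form a chain: encode the text so each emoji becomes a visible byte sequence, replace known sequences using a lookup table (the "from" and "to" columns), then decode whatever is left. Below is a rough Python analogue of that idea, not the KB process itself. It assumes the encode step behaves like standard UTF-8 percent-encoding; the dictionary contents and the `replace_emojis` name are hypothetical.

```python
from urllib.parse import quote, unquote

# Hypothetical dictionary standing in for the dictionary example set:
# keys ("from") are percent-encoded UTF-8 emoji bytes, values ("to") are
# readable tokens padded with spaces so a later Tokenize step splits them off.
EMOJI_DICTIONARY = {
    "%F0%9F%98%80": " emoji_grinning_face ",  # U+1F600
    "%F0%9F%92%A9": " emoji_pile_of_poo ",    # U+1F4A9
}


def replace_emojis(text: str, dictionary: dict) -> str:
    # Step 1 (like Encode URL): every non-ASCII character becomes %XX bytes.
    encoded = quote(text, safe="", encoding="utf-8")
    # Step 2 (like Replace (Dictionary)): swap known emoji byte sequences for
    # readable tokens. A literal "%" in the input is encoded as %25, so a
    # "%" in the encoded string always starts a real %XX triplet.
    for source, target in dictionary.items():
        encoded = encoded.replace(source, target)
    # Step 3 (like Decode URL): turn the remaining %XX sequences back into text.
    return unquote(encoded, encoding="utf-8")


result = replace_emojis("party \U0001F600\U0001F4A9", EMOJI_DICTIONARY)
print(result.split())  # ['party', 'emoji_grinning_face', 'emoji_pile_of_poo']
```

    After this step the emoji tokens can be counted like ordinary words with Tokenize and a word list.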
  • Survuel Member Posts: 4 Learner I
    Well, I got your process working, but now I'm even more lost than before, so I'll probably just use Ctrl+F in Excel.
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @Survuel, well, you are doing pretty well for someone who downloaded RM yesterday :smile: I would strongly recommend taking a little time to go through the basic training before tackling tricky stuff like this. It will be well worth your time:

    https://academy.rapidminer.com/

    Scott
