Supervised Sentiment Analysis - Removing @

Anna_May1 Member Posts: 14 Learner I
edited October 2020 in Help
Hi there, 

I'm currently working on a supervised sentiment analysis of Instagram comments. One issue I'm having is that many of the comments are replies, which start by mentioning the name of the person the reply is directed at.
So one person comments on something, and another person replies to this comment by starting their reply with @nameofthecommenter. Because this name is part of the Excel sheet, and thus of the data I'm analyzing, it is rated as well and influences the outcome. I know that I can remove whole cells containing an @, but that would also remove the rest of the comment and thus valuable data.

Is there any way to remove only what directly follows the @, i.e. only the name of the person being replied to, without deleting the whole comment?

Thanks in advance!

Anna May

Best Answer


    BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi @Anna_May1,

    There are multiple ways to achieve this.
    Do you have the data in a table, with the comment in one nominal column? In that case, use Blending/Values/Replace with a regular expression. E.g. you would replace "^\@[a-zA-Z0-9_]+ *" (without the quotes) by nothing. This expression means:
    ^    Beginning of the string
    \@  The at sign, escaped with a backslash to remove any special meaning
    [a-zA-Z0-9_]+  One or more characters from the listed class (letters, digits, underscore), following the @ sign.
     *    Zero or more spaces (so the remaining text won't start with a space)

    The regular expression editor window has a drop-down menu with hints for these and other parts of regular expressions.

    You can leave the replacement empty, because you replace the user name with nothing. 
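For readers who want to try the expression outside RapidMiner, here is a minimal Python sketch of the same replacement. It assumes the underscore is part of the character class, as in the explanation above; the function name and the sample comments are made up for illustration and are not part of the original process.

```python
import re

# Leading @mention (letters, digits, underscore) plus any spaces after it.
# The backslash before @ is optional in Python, so it is omitted here.
MENTION_AT_START = re.compile(r"^@[a-zA-Z0-9_]+ *")

def strip_leading_mention(comment: str) -> str:
    """Remove a single @username at the very start of a comment."""
    # Empty replacement: the user name is replaced with nothing.
    return MENTION_AT_START.sub("", comment)

# Hypothetical sample comments, not taken from the attached data.
examples = [
    "@some_user I love this!",
    "Great picture",
    "thanks @some_user",
]
cleaned = [strip_leading_mention(c) for c in examples]
print(cleaned)
# ['I love this!', 'Great picture', 'thanks @some_user']
```

Because of the ^ anchor, only a mention at the start of the text is removed; the mention in the third sample stays in place.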

    If you work with already tokenized data (split into single words), you can use Replace Tokens with the same regular expression.

    Best regards,
    Anna_May1 Member Posts: 14 Learner I
    edited November 2020

    Thanks a lot for your reply. I have tried your suggestions, but sadly they didn't work for me. I'm not sure whether I did it the right way.

    I have attached my process as well as the raw data.
    The goals I'm trying to achieve are:
    - remove any word (not the whole row) starting with "@"
    - remove empty rows
    - remove duplicates
    - remove emojis (right now, with this process, I end up with question marks instead of the emojis in the output, so I'd rather remove the emojis right away)

    Do you have any input for me as to how to achieve that?

    Have a lovely day!

    Kind regards

    Anna May