🦉 🎤   RapidMiner Wisdom 2020 - CALL FOR SPEAKERS   🦉 🎤

We are inviting all community members to submit proposals to speak at Wisdom 2020 in Boston.


Whether it's a cool RapidMiner trick or a use case implementation, we want to see what you have.
Form link is below and deadline for submissions is November 15. See you in Boston!

CLICK HERE TO GO TO ENTRY FORM

Problem with special preprocessing of texts

gham Member Posts: 24
edited December 2018 in Help
Hi, is there any way to do the following preprocessing steps in RapidMiner? How?

Removing links/URLs and hashtags: Tweets may contain URLs, hashtags and words that start with the ‘@’ character. We removed these entities since we found no significance for them in our scoring approach.

Replacing words with contractions: Contractions such as ‘didn’t’, ‘ain’t’, ‘couldn’t’ are common in tweets.

Elongation replacer: People often use elongation like ‘loooooooove’ to emphasise words. Elongation can be at the beginning (‘ooooooh’), end (‘toooooo’) or in between (‘loooove’).
Example: ooooooooh what a coooooool breeze => ooh what a cool breeze

WordNet lemmatizing: The WordNet lemmatizer is used to get a valid, meaningful root word. Each word (except slang/abbreviations) is lemmatized after tokenizing.

Explicit negation handling: We used an antonym replacer based on WordNet to replace a word preceded by ‘not’, ‘never’, etc.

Thanks

Answers

  • gham Member Posts: 24

    hi

    please help me..

     

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,512  Community Manager

    hello @gham - so I again suspect that no one responded because no one can quite understand your question. Perhaps you could rephrase AND add your XML process so we can see what you are doing?


    Scott

     


  • gham Member Posts: 24

    Hello
    I want to preprocess Twitter data for sentiment analysis,
    as I mentioned above,
    but I do not know which operators to use or how to use them.
    For example, how can I remove links from tweets, or remove symbols from a tweet?
    Thank you

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 564   Unicorn

    It sounds like you should be able to use the operator Replace Tokens to do most of that list. 
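    For the contraction item on the list, a replace dictionary is the natural fit. Below is a rough Python sketch of that idea, not a RapidMiner process: the `CONTRACTIONS` table and the `expand_contractions` helper are made up for illustration, and the three entries are only examples (mapping ‘ain’t’ to "is not" is a simplification).

```python
import re

# Hypothetical, deliberately tiny replace dictionary (illustration only).
CONTRACTIONS = {
    "didn't": "did not",
    "couldn't": "could not",
    "ain't": "is not",  # simplification: "ain't" can also mean "am not", "are not", ...
}

# One alternation built from the dictionary keys, matched as whole words.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)

def expand_contractions(text: str) -> str:
    """Replace each known contraction with its expanded form."""
    text = text.replace("’", "'")  # normalise curly apostrophes first
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(1).lower()], text)

print(expand_contractions("I didn’t know, couldn't tell"))
```

    In Replace Tokens, the equivalent would presumably be one "replace what" / "replace by" row per contraction in the replace dictionary.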

     

     

  • gham Member Posts: 24

    Hello
    I need your help, please.
    How do I remove links from tweets? How do I remove "@"?
    ???

     

  • gham Member Posts: 24

    Hello
    I want to delete internet links like
    http: // jhghjgjh / jhghjgh
    from texts.
    I do not know which operator to use.
    And I want the repeated letters in elongated words like
    veeeeryyy
    to be deleted, so that the word becomes "very".
    But I do not know which operator does that.
    And I want to expand abbreviated words (contractions) to their original form. How?
    Please help

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,512  Community Manager

    hello @gham - you're likely going to want to learn how to write "Regular Expressions" (also known as "RegEx") to deal with situations like this. RegEx is a language that will take some time to learn and understand but it is well worth it. Some great resources for you are (1) this book, recommended to me by @Telcontar120 and now sits permanently on my desk, and (2) the website https://regexr.com/ which is the go-to reference for many many people.

     

    In RapidMiner, you're going to do this by using the Replace Tokens operator (as mentioned above) -> Parameters -> replace dictionary -> click on "Edit List..." -> click on the "RegEx" button as shown below -> typing in your RegEx expression or using one of the pre-made ones.

     

    Screen Shot 2018-04-12 at 9.30.12 AM.png Screen Shot 2018-04-12 at 9.30.19 AM.png Screen Shot 2018-04-12 at 9.30.28 AM.png
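    As a starting point for the kind of expression you might type into that dialog, here is a small Python sketch that strips URLs, @mentions and hashtags. The two patterns and the `strip_tweet_entities` helper are assumptions for illustration, not something taken from RapidMiner; they are simple on purpose and will not catch every URL form. The patterns use only basic syntax, so they should carry over to other regex engines, but test them on your own data first.

```python
import re

# Illustrative patterns (assumptions, not from the thread):
# a URL is "http://" or "https://" or "www." followed by non-space characters.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# an @mention or #hashtag is '@' or '#' followed by word characters.
MENTION_OR_HASHTAG = re.compile(r"[@#]\w+")

def strip_tweet_entities(text: str) -> str:
    """Remove URLs, @mentions and #hashtags, then tidy up whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_OR_HASHTAG.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

print(strip_tweet_entities("@bob check http://example.com/page #wow so cool"))
# prints: check so cool
```

    To keep the hashtag word and drop only the symbol, replacing `[@#]` alone with an empty string would be one option.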


  • gham Member Posts: 24

    Hello
    Thank you very much.
    I wrote this to remove the link, but the entire text was deleted!
    Please help me, master.
    [http://a-z.az/a-z]
    Please tell me how I should write it.
    And
    do you have a dictionary for removing stop words and for stemming? Do I need to download one? Thank you for helping me.
    And
    how do I write a regular expression to remove the repeated letters?
    Like veeerrryyyyy -> very
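    A likely reason the whole text disappeared: square brackets in a regular expression define a character class, so `[http://a-z.az/a-z]` matches any single character that is a lowercase letter, `:`, `/` or `.`, and every such character gets replaced. The Python sketch below demonstrates that, and then shows two possible ways to shorten elongated words with a back-reference. The function names and the sample tweet are invented for illustration; in Java-style engines the replacement back-reference is usually written `$1` rather than `\1`, so check this against your RapidMiner version.

```python
import re

tweet = "veeerrryyyyy nice http://example.com/abc"

# The bracketed pattern is a character class, not a URL template:
# it removes every lowercase letter, ':', '/' and '.' one by one.
bracket_pattern = re.compile(r"[http://a-z.az/a-z]")
print(repr(bracket_pattern.sub("", tweet)))  # only the two spaces survive

def collapse_to_one(text: str) -> str:
    """Reduce any run of a repeated word character to a single character."""
    return re.sub(r"(\w)\1+", r"\1", text)

def collapse_to_two(text: str) -> str:
    """Reduce runs of three or more identical characters to exactly two."""
    return re.sub(r"(\w)\1{2,}", r"\1\1", text)

print(collapse_to_one("veeerrryyyyy"))                       # very
print(collapse_to_one("coooooool"))                          # col (too aggressive)
print(collapse_to_two("ooooooooh what a coooooool breeze"))  # ooh what a cool breeze
```

    Neither rule is right for every word: collapsing to one letter turns "cool" into "col", while collapsing to two leaves "veerryy". Getting both cases right generally needs a dictionary check after the regex step.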
