Problem with special preprocessing of texts
Removing links/URLs and hashtags: A tweet may
contain URLs, hashtags, and words that start with the ‘@’
character. We removed these entities since we found they had no
significance in our scoring approach.
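A minimal Python sketch of this removal step, using only the standard `re` module. The function name `strip_entities` and the exact patterns are assumptions for illustration; the quoted text does not say which patterns were used.

```python
import re

# Assumed patterns: anything starting with http://, https:// or www. is a link;
# '@' or '#' followed by word characters is a mention or hashtag.
_URL = re.compile(r"(?:https?://|www\.)\S+")
_TAG = re.compile(r"(?<!\w)[@#]\w+")


def strip_entities(tweet):
    """Remove URLs, hashtags and @-mentions, then collapse extra whitespace."""
    tweet = _URL.sub(" ", tweet)
    tweet = _TAG.sub(" ", tweet)
    return " ".join(tweet.split())


print(strip_entities("Loving the breeze @anna #summer http://t.co/abc today"))
```

The lookbehind `(?<!\w)` keeps the pattern from cutting into strings such as e-mail addresses, where ‘@’ appears in the middle of a token.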
Replacing words with contractions: Contractions such as
‘didn’t’, ‘ain’t’, and ‘couldn’t’ are common in tweets.
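One way this step might look in Python, assuming it means expanding each contraction into its full form. The lookup table `CONTRACTIONS` is a small hypothetical sample (the mapping for ‘ain’t’ in particular is a guess, since it is ambiguous), and `expand_contractions` is an illustrative name.

```python
import re

# Hypothetical sample table; a real one would be much longer.
CONTRACTIONS = {
    "didn't": "did not",
    "couldn't": "could not",
    "ain't": "is not",  # ambiguous in practice: could also be "am not", "are not"
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text):
    """Replace known contractions with their expanded form (lower-cased)."""
    # Tweets often contain the typographic apostrophe; normalise it first.
    text = text.replace("\u2019", "'")
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


print(expand_contractions("He couldn't come and I didn\u2019t ask"))
```

Note that the replacement is always lower-case, so a capitalised ‘Didn’t’ at the start of a sentence becomes ‘did not’.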
Elongation replacer: People often use elongation like
‘loooooooove’ to emphasise words. Elongation can be
at the beginning (‘ooooooh’), at the end (‘toooooo’), or in
the middle of a word.
Example: ooooooooh what a coooooool breeze => ooh what a cool breeze
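A rough Python sketch of an elongation replacer that reproduces the example above. It collapses any letter repeated three or more times down to two; the optional `vocabulary` argument is an assumption added here so that a word like ‘loooooooove’ can fall back to a single letter when the two-letter form is not a known word. How the original authors decided on valid words is not stated in the quoted text.

```python
import re

# A letter followed by at least two more copies of itself.
_RUN = re.compile(r"([A-Za-z])\1{2,}")


def shorten_word(word, vocabulary=None):
    """Collapse runs of 3+ identical letters to two, or to one if only
    the single-letter form is found in the (optional) vocabulary."""
    doubled = _RUN.sub(r"\1\1", word)
    if vocabulary is None or doubled in vocabulary:
        return doubled
    single = _RUN.sub(r"\1", word)
    return single if single in vocabulary else doubled


def shorten_elongations(text, vocabulary=None):
    return " ".join(shorten_word(w, vocabulary) for w in text.split())


print(shorten_elongations("ooooooooh what a coooooool breeze"))
# → ooh what a cool breeze
print(shorten_word("loooooooove", vocabulary={"love", "cool"}))
# → love
```

Words that legitimately contain a doubled letter, such as ‘breeze’, are left alone because only runs of three or more are touched.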
WordNet lemmatizing: The WordNet lemmatizer is used to
get a valid, meaningful root word. Each word (except
slang/abbreviations) is lemmatized after tokenizing.
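The control flow of this step (tokenize, skip slang, lemmatize the rest) can be sketched in Python as follows. To keep the example self-contained, a tiny lookup table `TOY_LEMMAS` stands in for WordNet, and the `SLANG` set is hypothetical; neither comes from the quoted text.

```python
# Stand-in for WordNet: maps a few inflected forms to their root word.
TOY_LEMMAS = {"tweets": "tweet", "words": "word", "geese": "goose"}

# Hypothetical slang/abbreviation list that should be left untouched.
SLANG = {"lol", "omg", "idk"}


def lemmatize_tokens(text, lemmatize=None, slang=SLANG):
    """Whitespace-tokenize, then lemmatize every token that is not slang."""
    if lemmatize is None:
        lemmatize = lambda word: TOY_LEMMAS.get(word, word)
    result = []
    for token in text.lower().split():
        result.append(token if token in slang else lemmatize(token))
    return result


print(lemmatize_tokens("omg these tweets"))
```

If NLTK and its WordNet corpus are installed, `WordNetLemmatizer().lemmatize` from `nltk.stem` could be passed as the `lemmatize` argument in place of the toy table.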
Explicit negation handling: We used an antonym
replacer based on WordNet to replace a word preceded by
‘not’, ‘never’, etc.
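A simplified Python sketch of such an antonym replacer. Two assumptions are made here: the negation word is dropped together with the word it negates, and a small hypothetical dictionary `ANTONYMS` stands in for the WordNet antonym lookup.

```python
NEGATIONS = {"not", "never"}

# Hypothetical stand-in for antonyms that would be looked up in WordNet.
ANTONYMS = {"good": "bad", "happy": "unhappy", "like": "dislike"}


def replace_negations(tokens, antonyms=ANTONYMS):
    """Replace 'negation + word' with the word's antonym when one is known."""
    result = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        if word.lower() in NEGATIONS and i + 1 < len(tokens):
            antonym = antonyms.get(tokens[i + 1].lower())
            if antonym is not None:
                result.append(antonym)
                i += 2  # skip both the negation and the negated word
                continue
        result.append(word)
        i += 1
    return result


print(replace_negations("this is not good".split()))
```

When no antonym is known for the following word, the negation is kept as it is, so ‘never mind’ passes through unchanged.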