Preserve rows during text processing

JohnG22 Member Posts: 2 Newbie
Hello RapidMiner friends, I'm trying to process some text for sentiment analysis and have gotten stuck. I have an Excel spreadsheet with about 3,000 rows, each containing a free-text comment expressing sentiment about an experience. I would like to use the "Extract Sentiment" operator from the Operator Toolbox extension to assign a sentiment value to each individual comment.

I am importing the data, converting the nominal attribute to text, and then using Process Documents from Data with the sub-operators Tokenize, Transform Cases, Filter Stopwords (English), Filter Tokens (by Length) and Stem (Porter). When I check the results at this stage, each row of the output corresponds to a single token rather than to the original comment, i.e. the sequence of tokens that made up that row. Is there a way around this, or a way to stitch the tokens back together after these steps? I need to assign a sentiment to each row of the spreadsheet, not to the spreadsheet as a whole.
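To make the goal concrete, here is a rough plain-Python analogue of those steps (not RapidMiner code): each comment is preprocessed on its own, so its tokens stay grouped with their row and can be joined back into one string. The stopword list is a small hypothetical one, tokenization simply splits on non-letters, and stemming is left out because a Porter stemmer is not in the Python standard library.

```python
import re

# Small hypothetical stopword list; RapidMiner's Filter Stopwords (English)
# uses its own, much longer built-in list.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "it", "to", "of", "very"}

def preprocess(comment: str, min_len: int = 3, max_len: int = 25) -> list[str]:
    """Tokenize, lowercase, drop stopwords and filter by length for ONE comment."""
    tokens = re.findall(r"[A-Za-z]+", comment)                   # Tokenize (split on non-letters)
    tokens = [t.lower() for t in tokens]                          # Transform Cases
    tokens = [t for t in tokens if t not in STOPWORDS]            # Filter Stopwords
    return [t for t in tokens if min_len <= len(t) <= max_len]    # Filter Tokens (by Length)

comments = [
    "The staff was very friendly!",
    "Waiting time was terrible and the room was dirty.",
]

# Processing row by row keeps the tokens attached to their original comment,
# so each row can still receive its own sentiment score afterwards.
rows = [(i, c, preprocess(c)) for i, c in enumerate(comments, start=1)]
for row_id, text, tokens in rows:
    print(row_id, tokens, "|", " ".join(tokens))
```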

Many thanks for your help, and apologies if this is a newbie query! :)


Answers

  • JohnG22 Member Posts: 2 Newbie
    Thank you so much, @mschmitz, this has solved the problem. Hope you have a great weekend!
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi @JohnG22,

    If you have an Id attribute in your spreadsheet, you can use that; if not, just create one with Generate Id.

    Then duplicate the example set using Multiply. Do the preprocessing on one copy, and afterwards join the result back to the other copy, matching rows on the Id. Finally, select the attributes you need.
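    A rough Python analogue of the Id / Multiply / Join pattern (not RapidMiner code; the sample rows, column names and the stand-in preprocessing are hypothetical). The processed copy is deliberately reversed to show why matching on the Id, rather than on row position, keeps each comment paired with its own tokens.

```python
import re

# Hypothetical sample rows standing in for the spreadsheet.
comments = ["Great service, would come back.", "Terrible wait and rude staff."]

# Generate Id: give every row a stable identifier.
table = [{"id": i, "text": t} for i, t in enumerate(comments, start=1)]

# Multiply: work on an independent copy so the original rows stay untouched.
processed = [dict(row) for row in table]
for row in processed:
    # Stand-in preprocessing: lowercase and split on non-letters.
    row["tokens"] = re.findall(r"[a-z]+", row["text"].lower())
    del row["text"]

# Simulate a processing step that changes the row order.
processed.reverse()

# Join: inner join on "id" (not on position), so reordering is harmless.
by_id = {row["id"]: row for row in processed}
joined = [{**orig, **by_id[orig["id"]]} for orig in table if orig["id"] in by_id]

# Select Attributes: keep only the columns needed downstream.
result = [{"id": r["id"], "text": r["text"], "tokens": r["tokens"]} for r in joined]
for r in result:
    print(r["id"], r["text"], "->", " ".join(r["tokens"]))
```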

    Another way is to create a copy of the text attribute and keep that copy as Nominal, so the original comment is carried along without being tokenized.
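    The same idea as a rough Python analogue (not RapidMiner code; the column names are hypothetical): the raw comment is copied to a separate column before processing, so it survives next to the tokens and no join is needed.

```python
import re

# Hypothetical rows; "comment" plays the role of the text attribute.
rows = [{"comment": "Lovely views, friendly staff."}, {"comment": "Noisy room, never again."}]

for row in rows:
    # Keep an untouched copy of the raw comment (the copy that stays Nominal).
    row["comment_raw"] = row["comment"]
    # Processing then replaces the working column with its tokens only.
    row["comment"] = re.findall(r"[a-z]+", row["comment"].lower())

for row in rows:
    print(row["comment_raw"], "->", row["comment"])
```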

    Which approach is easier depends on the rest of your process.

    Best regards,

    Balázs