Sentence Extraction based on wordlist

kkdataminer Member Posts: 10 Contributor II
edited November 2018 in Help

How can I extract a whole sentence from a document when it contains a keyword match?

 

I followed the steps below, but the process returns only the matching words instead of the whole sentence. Any help on this would be really appreciated.

 

1) Retrieve Wordlist (this doc has keywords)

2) Nominal to Text

3) Process Documents

4) Connected the words output of Process Documents to the input of a 2nd Process Documents (this one has the whole sentences)

5) Furthermore, inside the 2nd Process Documents, used some other operators such as Tokenize.

 

The final output is a wordlist with the matching keywords. I want the whole sentence from the 2nd document to be displayed instead of just the matching keywords. Please help.

Answers

  • jeanne Member Posts: 4 Contributor I

    Hello, 

    Another option, if you're interested, would be to use Rosette Text Toolkit and to try processing the text as sentences from the beginning in an ExampleSet.

     

    First split your text into sentences using Rosette's Extract Sentences operator. That puts each sentence into an ExampleSet as its own Example.

     

    Now that the sentences are separated, you should be able to use Filter Examples with a custom filter along the lines of "contains a word in this wordlist". That last step, matching against an entire wordlist, might require some Text Processing or more Rosette work: you may need to split each sentence into tokens with Rosette's Tokenize operator, which gives you a token list linked to the original sentence that you can filter against the wordlist (carrying the original sentence through the process). Matching against an entire wordlist, rather than one or two specific tokens, is something I haven't tried yet (maybe someone else can help with that?). I'd assume you could somehow use the wordlist as an attribute to filter against...

     

    Good luck with that, 

     

    Jeanne
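
The sentence-first idea in the answer above (split into sentences, tokenize each one, then keep the sentences whose tokens appear in the wordlist) can be sketched outside RapidMiner in plain Python. This is only an illustration of the logic, not the Rosette or Text Processing operators themselves: the function names, the naive regex sentence splitter, and the sample document and wordlist are all assumptions made up for the example.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase word tokens, so matching is case-insensitive."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def sentences_matching(text, wordlist):
    """Return (sentence, matched_keywords) pairs for sentences containing any keyword."""
    keywords = {w.lower() for w in wordlist}
    matches = []
    for sentence in split_sentences(text):
        # The original sentence is carried along with its tokens,
        # so a token match can be reported as the whole sentence.
        hits = sorted(keywords.intersection(tokenize(sentence)))
        if hits:
            matches.append((sentence, hits))
    return matches


# Hypothetical sample data, invented for this sketch.
document = (
    "The pump failed after two hours. "
    "Maintenance replaced the seal. "
    "No leak was observed after the repair!"
)
wordlist = ["pump", "leak"]

results = sentences_matching(document, wordlist)
for sentence, hits in results:
    print(hits, "->", sentence)
```

The key point is that the filter runs on tokens but the output is the sentence the tokens came from, which is what the original process loses when it returns only the wordlist. A real document would likely need a more robust sentence splitter than this regex, for example one that handles abbreviations.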

     