Sentence Extraction based on wordlist

kkdataminer Member Posts: 10 Contributor II
edited November 2018 in Help

How can I extract the whole sentence from a document when it contains a keyword match?


I followed the steps below, but the process returns only the matching words instead of the whole sentence. Any help on this would be really appreciated.


1) Retrieve Wordlist (this document contains the keywords)

2) Nominal to Text

3) Process Documents

4) Connected the word list output of Process Documents to the word list input of a second Process Documents (this one contains the whole sentences)

5) In the second Process Documents, I also used some other operators such as Tokenize.


The final output is a word list with the matching keywords. I want the whole sentence from the second document to be displayed instead of just the matching keywords. Please help.
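One likely reason for this result is sketched below as a simplified Python model, not actual RapidMiner code: when a word list is fed into the second Process Documents, it acts as a fixed vocabulary, so the output only tells you which of those terms occur. Nothing links a term back to the sentence it came from. The function name `vocabulary_match` and the sample text are hypothetical.

```python
import re


def vocabulary_match(document: str, wordlist: set) -> list:
    """Simplified model of tokenizing a document against a fixed word list."""
    # Lowercase word tokens; punctuation and sentence boundaries are discarded here.
    tokens = re.findall(r"[a-z0-9']+", document.lower())
    # Only the vocabulary terms that occur survive, so the sentences are lost.
    return sorted(set(tokens) & wordlist)


doc = "The pump failed. Operators restarted it. A valve leaked overnight."
print(vocabulary_match(doc, {"pump", "valve"}))  # ['pump', 'valve']
```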

Best Answer


    jeanne Member Posts: 4 Contributor I


    Another option, if you're interested, would be to use the Rosette Text Toolkit and process the text as sentences in an ExampleSet from the beginning.


    First split your text into sentences using Rosette's Extract Sentences operator. That puts each sentence into its own Example in an ExampleSet.
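The idea of "one sentence per Example" can be sketched in plain Python as a list of rows. This is a stand-in under stated assumptions, not Rosette's API: the hypothetical `split_sentences` function uses a naive regular-expression split after `.`, `!` or `?`, whereas a dedicated sentence segmenter is generally needed to handle abbreviations such as "e.g." correctly.

```python
import re


def split_sentences(text: str) -> list:
    """Naively split text into one row per sentence (a toy ExampleSet)."""
    # Split on whitespace that follows sentence-ending punctuation.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings (e.g. when the input is blank) and number the rows.
    return [{"id": i, "sentence": s} for i, s in enumerate(parts) if s]


rows = split_sentences("The pump failed. Operators restarted it! Was the valve checked?")
for row in rows:
    print(row)
```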


    Once the sentences are separated, you should be able to use Filter Examples with a custom filter along the lines of "contains a word from this wordlist". That last step, matching against an entire wordlist, might require some Text Processing or more Rosette work: you may need to split the sentences into tokens with Rosette's Tokenize operator, which gives you a token list linked to the original sentence. You can then filter those tokens against the wordlist while carrying the original sentence through the process.

    What I haven't tried yet is matching against an entire wordlist rather than one or two specific tokens (maybe someone else can help with that?). I'd assume you could somehow use the wordlist as an attribute to filter against.
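The tokenize-and-filter step can be sketched in Python as follows. It assumes simple lowercase word tokens and whole-token matching; the function `sentences_matching` and the sample data are hypothetical, not part of RapidMiner or Rosette. Each sentence is tokenized, its tokens are checked against the whole wordlist at once with a set intersection, and the original sentence is kept whenever there is a hit.

```python
import re


def sentences_matching(text: str, wordlist: list) -> list:
    """Return the original sentences that contain at least one wordlist entry."""
    # Normalize the keywords once so the comparison is case-insensitive.
    keywords = {w.strip().lower() for w in wordlist if w.strip()}
    # Naive sentence split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    matches = []
    for sentence in sentences:
        # Tokenize the sentence but keep the untouched sentence string alongside.
        tokens = set(re.findall(r"[a-z0-9']+", sentence.lower()))
        # One intersection tests the sentence against the entire wordlist.
        if tokens & keywords:
            matches.append(sentence)
    return matches


text = "The pump failed. Operators restarted it. A valve leaked overnight."
print(sentences_matching(text, ["Pump", "valve"]))
# ['The pump failed.', 'A valve leaked overnight.']
```

Matching whole tokens rather than substrings keeps a keyword like "pump" from matching "pumpkin"; a substring check would be the alternative if partial matches are wanted.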


    Good luck with that!


