Negation in clinical note text mining
I'm working with a large set of clinical notes and it seems like the clinicians are trained to spend half their time writing down what is NOT going on with the patient. So, in order to apply many text mining techniques I'm having to learn how to handle negation in context.
I've seen a brief dialog about this topic in which @mschmitz and @SvenVanPoucke discussed the issue: https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Include-Negations-in-Dictionary-based-Sentiment-Approach/m-p/44266/highlight/true#M29247
And, I see that Martin added negation with a word window to the Operator toolkit Dictionary Based Sentiment. I think the way it was implemented is very flexible and I look forward to using it when I focus on sentiment.
Right now, I'm attempting to 'tag' my corpus of documents as "Suicide-Mentioned" vs. "Suicide-Deny-Mention" as a way to make our document search a little better. It's difficult to write the regex or Lucene queries needed to reliably find suicide-related notes, so I want to preprocess and tag the notes for the clinicians using Python or RapidMiner's more sophisticated toolsets.
There are 2M documents in the corpus, each of which may be as short as 1-2 sentences or as long as several pages. They are typical unstructured text notes, although there are patterns in how the different clinicians discuss suicidality (deny or endorse).
My first pass at the task used regex inside of SQL Server and ran for three days to get through the 2M documents. The quality is being reviewed now, but I don't think it will be acceptable to the clinical director. Recall may not be high enough for field use with this approach.
There are some negation tools for medical notes, such as NegEx and pyConText, and several papers that address the issue. I'm new to RapidMiner and would like to apply RM to this problem, so I thought I'd ask for advice on how folks here might approach something like this.
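To make the idea concrete, here is a minimal Python sketch of the kind of window-based negation rule I have in mind, loosely in the spirit of NegEx. It is not the NegEx or pyConText API; the trigger list, the 5-token window, the "suicid" prefix match, and the function name tag_note are all assumptions made up for illustration, and a real trigger list would be much longer.

```python
import re

# Hypothetical, deliberately tiny trigger list (an assumption for illustration).
NEG_TRIGGERS = {"no", "not", "without", "deny", "denies", "denied", "denying"}
WINDOW = 5  # assumed: a trigger within 5 tokens before the target negates it


def is_target(token):
    # Matches "suicide", "suicidal", "suicidality", etc.
    return token.startswith("suicid")


def tag_note(text):
    """Return 'Suicide-Mentioned', 'Suicide-Deny-Mention', or None."""
    affirmed = denied = False
    # Crude sentence split so a trigger cannot negate across sentences.
    for sentence in re.split(r"[.!?;\n]+", text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, tok in enumerate(tokens):
            if not is_target(tok):
                continue
            preceding = tokens[max(0, i - WINDOW):i]
            if NEG_TRIGGERS.intersection(preceding):
                denied = True
            else:
                affirmed = True
    # Assumed policy: any un-negated mention wins, to favor recall.
    if affirmed:
        return "Suicide-Mentioned"
    if denied:
        return "Suicide-Deny-Mention"
    return None


if __name__ == "__main__":
    for note in [
        "Patient denies suicidal ideation.",
        "Pt endorses suicidal thoughts with a plan.",
        "No fever. Reports suicidal ideation.",
        "Sleep is poor.",
    ]:
        print(tag_note(note), "|", note)
```

A rule like this only looks backward from the target and ignores things like "denies SI" abbreviations, historical mentions, or family history, which is part of why I'm asking what a more robust approach would look like.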
Thanks in advance for your help/advice...Steve