Need help in analyzing medical keywords in a column of free text. Excel does not work here..

arsalan_karimarsalan_karim Member Posts: 14 Contributor II
edited December 2018 in Help

hi Everyone.

I need some serious help. I have been working on an excel file. Just one column. It contains data coming from pHysicians offices. Its a string of free text that the doctor would write down when examining a patient. This column pertains to the daignosis information. I need to create a model to give this data some structure. 

I am specifically trying to filter out all conditions that are related to migraine. the way I am doing it in microsoft excel is that I am using the "if,error,search" functions to sniff out the keywords from the table. I need two kinds of Keywords:

includes: i.e all keywords that can be "Migraines"

excludes: i.e all keywords that if present can never be migraines.

Sometimes I have to combine "includes" and "excludes" to find out the actual migraine.. for example:

 

Includes = Migraine

Excludes = family History of

 

in this case I am trying to look for a patient with Migraine, not someone who has a family history of Migraine. So I need to exclude the text "family History of". its like "this string should include this keyword and exclude this keyword"

 

I think this should be faily simple in rapidminer. It is taking my hours and hours of formulas in excel and driving me crazy since i have about half a million rows to analyse and too many formulas. The objective is to create a model that i can scale up to other diseases as well. 

 

Can anyone help...

 

I am attaching the excel file with some data as well as some examples of includes and excludes I am using. Created a zip file with the excel file inserted

 

Thanks 

Arsalan (MD)

Answers

  • FBTFBT Member Posts: 106 Unicorn

    You could try the "Replace" operator. It allows you to replace your values with some regular expression logic. Afterwards it's just a matter of filtering the examples. However, be careful with respect to typos. You may for example want to just look for "migr", instead of "Migraine" to decrease the chance of missing something. 

  • arsalan_karimarsalan_karim Member Posts: 14 Contributor II

    Hi FBT

    Thanks for replying, but I cant figure out how to use the Replace function.

    I tried it but it does not seem to be changing anything in my original text. 

  • FBTFBT Member Posts: 106 Unicorn

    Ok, there is actually a simpler way. You can just use the "Filter Example" operator and select "Contain" together with your specified keyword to filter for what you are looking for. Take a look at the process below (just copy & paste it into your XML tab and press the green checkmark on the left top side).

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.5.000" expanded="true" height="68" name="Retrieve Migraine snipper test file" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Local Repository/Migraine snipper test file"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.5.000" expanded="true" height="103" name="Multiply" width="90" x="179" y="34"/>
    <operator activated="true" class="filter_examples" compatibility="7.5.000" expanded="true" height="103" name="Filter Examples - No Migraine" width="90" x="380" y="187">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Name_Clean.contains.Hemmeroids"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.5.000" expanded="true" height="103" name="Filter Examples - Migraine" width="90" x="380" y="34">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Name_Clean.contains.Migr"/>
    </list>
    </operator>
    <connect from_op="Retrieve Migraine snipper test file" from_port="output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Filter Examples - Migraine" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Filter Examples - No Migraine" to_port="example set input"/>
    <connect from_op="Filter Examples - No Migraine" from_port="example set output" to_port="result 2"/>
    <connect from_op="Filter Examples - Migraine" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

    The "No Migraine" filter is just for illustration purposes. You would need to enter whatever your relevant keyword is. If you have more, remember to select "match any" within the operators filtering panel.

  • arsalan_karimarsalan_karim Member Posts: 14 Contributor II

    Super - I have crossed stage 1 with your help. Please check attached file.

    Now for stage 2 - I need to create seperate attributes for each of these key words. I need a table like this below. 

     

    Name_Clean Type of keyword
    Migraines since Grade 7. Migraine
    ?episodic paroxysmal hemicrania Hemicrania
    severe headache 2010 - spinal tap negative headache

     

    By doing this i could easily generate aggregates and do my counts. Need help in creating this table.

  • FBTFBT Member Posts: 106 Unicorn

    See, if the below process does what you are looking for:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.5.000" expanded="true" height="68" name="Retrieve Migraine snipper test file" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Local Repository/Migraine snipper test file"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.5.000" expanded="true" height="124" name="Multiply" width="90" x="179" y="34"/>
    <operator activated="true" class="filter_examples" compatibility="7.5.000" expanded="true" height="103" name="Filter Examples - Rheumatic Fever" width="90" x="380" y="238">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Name_Clean.contains.Rheumatic"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.5.000" expanded="true" height="103" name="Filter Examples - Migraine" width="90" x="380" y="34">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Name_Clean.contains.Migr"/>
    </list>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.5.000" expanded="true" height="82" name="Generate Attributes - Migrane Keyword" width="90" x="581" y="34">
    <list key="function_descriptions">
    <parameter key="Type of Keyword" value="&quot;Migraine&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.5.000" expanded="true" height="82" name="Generate Attributes - Rheumatic Fever" width="90" x="581" y="238">
    <list key="function_descriptions">
    <parameter key="Type of Keyword" value="&quot;Rheumatic Fever&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.5.000" expanded="true" height="103" name="Filter Examples - Headache" width="90" x="380" y="391">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Name_Clean.contains.headac"/>
    </list>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.5.000" expanded="true" height="82" name="Generate Attributes - Headache" width="90" x="581" y="391">
    <list key="function_descriptions">
    <parameter key="Type of Keyword" value="&quot;Headache&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="append" compatibility="7.5.000" expanded="true" height="124" name="Append" width="90" x="782" y="238"/>
    <connect from_op="Retrieve Migraine snipper test file" from_port="output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Filter Examples - Migraine" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Filter Examples - Rheumatic Fever" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 3" to_op="Filter Examples - Headache" to_port="example set input"/>
    <connect from_op="Filter Examples - Rheumatic Fever" from_port="example set output" to_op="Generate Attributes - Rheumatic Fever" to_port="example set input"/>
    <connect from_op="Filter Examples - Migraine" from_port="example set output" to_op="Generate Attributes - Migrane Keyword" to_port="example set input"/>
    <connect from_op="Generate Attributes - Migrane Keyword" from_port="example set output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Generate Attributes - Rheumatic Fever" from_port="example set output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Filter Examples - Headache" from_port="example set output" to_op="Generate Attributes - Headache" to_port="example set input"/>
    <connect from_op="Generate Attributes - Headache" from_port="example set output" to_op="Append" to_port="example set 3"/>
    <connect from_op="Append" from_port="merged set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    It's based on your sample file again, which does not have an example of Hemicrania, hence I replaced it with Rheumatic Fever, for demo purposes. A word of caution: migraine and headache show up simultaniously in one example. This may become a cause of error in your further analysis, hence you may want to set a more elaborate filter, to make sure that the right keywords are assigned to such examples.

  • arsalan_karimarsalan_karim Member Posts: 14 Contributor II

    Oh yeah this is much better . we break the data into smaller filters and tag them with the keyword and join them back into one in the end...

    Thank you so much for your help FBT.

    Really appreciate it...

     

    Arsalan 

     

  • DocMusherDocMusher Member Posts: 333 Unicorn

    Hi,

    You could also take a look at Metamap (https://metamap.nlm.nih.gov/), we used it before with RM. 

    Cheers

    Sven

Sign In or Register to comment.