ICD 9/10 classification

rtrivedi Member Posts: 3 Contributor I
edited December 2019 in Help
Has anyone created a pipeline that can accurately return ICD-9/ICD-10 clinical codes from free text? Any insights on this topic would be much appreciated!

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,126   Unicorn

    Hi @rtrivedi,

     

    I'm not a specialist in medicine, so I searched the Internet to understand what the ICD-9 classification is.

    If I understand correctly, an ICD-9 code is an expression of the form abc.d (for example 123.5) or abc.de (for example 425.23), where a, b, c, d and e are digits, and it identifies a disease. Is that right?

    In that case, I propose the following process:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:process_document_from_file" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Files" width="90" x="179" y="85">
    <list key="text_directories">
    <parameter key="ID9" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Extract_informations"/>
    </list>
    <parameter key="file_pattern" value="*.pdf"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="34">
    <parameter key="mode" value="regular expression"/>
    <parameter key="expression" value="[ ]"/>
    </operator>
    <operator activated="false" class="text:extract_information" compatibility="8.1.000" expanded="true" height="68" name="Extract Information" width="90" x="380" y="34">
    <parameter key="query_type" value="Regular Expression"/>
    <list key="string_machting_queries"/>
    <list key="regular_expression_queries">
    <parameter key="IDx" value="/\[[0-9]+\]/"/>
    </list>
    <list key="regular_region_queries"/>
    <list key="xpath_queries"/>
    <list key="namespaces"/>
    <list key="index_queries"/>
    <list key="jsonpath_queries"/>
    </operator>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="9.0.003" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="85">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value=".*\..*"/>
    </operator>
    <connect from_op="Process Documents from Files" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    To execute this process, you have to:

     - Set the path to the folder where your text files are stored in the parameters of the Process Documents from Files operator.

     - If necessary, change the file pattern in the parameters of the Process Documents from Files operator.
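For anyone who wants to check the idea outside RapidMiner, here is a minimal Python sketch of the same approach: split the text on spaces and keep only the tokens that look like abc.d or abc.de. The function name and the sample sentence are made up for illustration, and the pattern only covers the purely numeric form described above.

```python
import re

# Three digits, a dot, then one or two digits (e.g. 123.5 or 425.23).
# Assumption: only the purely numeric form described in this post is covered.
ICD9_LIKE = re.compile(r"\d{3}\.\d{1,2}")

def extract_icd9_like(text):
    """Split on spaces (like the Tokenize operator above) and keep code-like tokens."""
    codes = []
    for token in text.split(" "):
        # Strip punctuation that may stick to a token, e.g. "425.23," or "(123.5)".
        token = token.strip(".,;:()[]")
        if ICD9_LIKE.fullmatch(token):
            codes.append(token)
    return codes

# Hypothetical sample sentence, for illustration only.
sample = "Patient seen for 123.5 and later for (425.23), see page 12."
print(extract_icd9_like(sample))  # ['123.5', '425.23']
```

Using a full match on each token, rather than "contains a dot", avoids picking up ordinary words that end a sentence.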

     

    Does this process meet your needs?

    If not, can you be more specific about what you want to do?

     

    Regards,

     

    Lionel

  • rtrivedi Member Posts: 3 Contributor I

    Hi, thank you for looking into this. I have attached the data file below, which contains the ICD-10 codes. My use case is as follows: if a user types "Typhoid Fever", the model should return the corresponding ICD-10 code, A04. I was planning to use this file to train my model and then take free text as input and return the ICD-10 code with a confidence interval.

     

    Here is the link to the ICD-10 training file:

     

    https://drive.google.com/open?id=19Y8gn3qRNmIsJdYB1FupSTiXTIGm1pdS
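One simple baseline for this use case, sketched here in plain Python rather than as a RapidMiner process: treat each (description, code) row as a reference entry, turn the typed text and every description into word-count vectors, and return the code whose description has the highest cosine similarity. The rows below are a tiny illustrative stand-in, not taken from the linked file, and the similarity is only a rough match score, not a calibrated confidence.

```python
import math
import re
from collections import Counter

# Illustrative stand-in rows (description, ICD-10 category); NOT from the linked file.
REFERENCE = [
    ("cholera", "A00"),
    ("type 2 diabetes mellitus", "E11"),
    ("alzheimer disease", "G30"),
    ("essential primary hypertension", "I10"),
    ("asthma", "J45"),
]

def vectorize(text):
    """Lowercase, keep alphanumeric words, and count them."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_code(free_text):
    """Return (code, score) of the most similar reference description."""
    query = vectorize(free_text)
    scored = [(cosine(query, vectorize(desc)), code) for desc, code in REFERENCE]
    score, code = max(scored)
    return code, round(score, 2)

print(best_code("Type 2 diabetes"))  # ('E11', 0.87)
```

A score of 0.0 means that no word overlapped with any description, so the returned code should be ignored in that case. A trained classifier or a search index would replace this lookup in a real pipeline.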

  • DocMusher Member Posts: 329   Unicorn

    Hi,

    We have been working on this topic too.

    Perhaps you could describe your use case in more detail.

    1. Negation and similar phenomena, such as uncertainty, are important to consider. The example universally used is the radiologist's note stating: "Probably a possible tumor."
    2. Next, what granularity do you need when coding? In other words, how precise should your ICD coding be? Is G30 sufficient, or do you need G30.1 (in the context of Alzheimer's)?
    3. What is your source free text? Notes, another coding system? How unstructured is the text? Are spelling mistakes and abbreviations part of your text?
    4. I have seen acceptable results using Elasticsearch in this setting.
    5. What language are you working in? More tools are available for English, for instance.

    Happy to help you with the next steps.
    Sven
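To make point 1 concrete, here is a minimal Python sketch that flags negation and uncertainty cues in a note before any code is assigned. The cue lists and the function name are illustrative assumptions and far from complete; real clinical text needs a proper negation-detection approach.

```python
import re

# Illustrative cue lists (assumptions, far from complete).
NEGATION_CUES = {"no", "not", "without", "denies", "negative"}
UNCERTAINTY_CUES = {"probably", "possible", "possibly", "suspected", "likely"}

def flag_cues(note):
    """Return which negation and uncertainty cue words appear in a note."""
    words = set(re.findall(r"[a-z]+", note.lower()))
    return {
        "negated": sorted(words & NEGATION_CUES),
        "uncertain": sorted(words & UNCERTAINTY_CUES),
    }

print(flag_cues("Probably a possible tumor."))
print(flag_cues("No evidence of pneumonia."))
```

A note that triggers either list could be routed to manual review instead of being coded automatically.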
  • rtrivedi Member Posts: 3 Contributor I

    Thank you for looking into this, Sven. My answers are below each point.

    1. Negation and similar, are important to consider. The example universally used is the radiologist note mentioning: "Probably a possible tumor."
       • This would be too high-level to get a proper estimate.
    2. Next, what is the granularity you need to use while coding? In other words how perfect should your ICD coding be? Is G30 sufficient or do you need G30.1 (in the context of Alzheimer)?
       • G30 would be ideal.
    3. What is your source free text? Notes, another coding? How unstructured is the text? Are spelling mistakes and abbreviations part of your text?
       • It is free text: users enter the diagnosis in a CRM application.
    4. I have seen acceptable results using Elastic Search in this setting.
    5. What is the language you are working in? More tools are available in English for instance.
       • All English.
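Given these answers (three-character categories such as G30 are enough, and the input is English free text typed by users), two small preprocessing steps can be sketched in Python: reducing detailed codes to their category, and tolerating typos with the standard library's difflib. The vocabulary, the cutoff of 0.8, and the function names are assumptions for illustration.

```python
import difflib

def to_category(code):
    """Reduce an ICD-10 code to its three-character category, e.g. G30.1 -> G30."""
    return code.strip().upper().split(".")[0][:3]

# Illustrative vocabulary; in practice it would come from the training descriptions.
VOCABULARY = ["alzheimer", "asthma", "diabetes", "hypertension"]

def correct_word(word, cutoff=0.8):
    """Replace a word with its closest vocabulary entry, if one is similar enough."""
    matches = difflib.get_close_matches(word.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else word.lower()

print(to_category("G30.1"))      # G30
print(correct_word("diabetis"))  # diabetes
```

Training on category-level labels reduces the number of classes, and correcting typos before matching helps with text typed directly into a CRM form.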