
Disease Ontology for text mining

DocMusher Member Posts: 333 Unicorn
edited November 2018 in Help
Hi,

Has anybody used the Disease Ontology (DO) with RapidMiner before?

Could somebody help me use it for text mining within RM?
http://disease-ontology.org/

Thanks
Sven


Does the DO browser have an API that can be accessed in a programmer-friendly way?
The DO browser does have an API that can be accessed programmatically via HTTP requests. Currently only retrieving term metadata is supported, but in the future we hope to expand the functionality. More information can be found on the tutorial page.
What database does the Disease Ontology browser use?
The Disease Ontology browser uses Neo4j to store the ontology metadata. Neo4j falls under the umbrella of NoSQL databases: it is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs. Because Neo4j is a graph-based persistence engine, representing graph structures that include multiple relationships is very easy and data retrieval is very fast. Where a relational database might store ontology terms in one table with a one-to-many connection to another table containing relationships, a graph database stores terms as nodes that are connected to each other by edges (relationships).
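To make the node-and-edge picture concrete, here is a minimal Python sketch. It is not the DO browser's actual Neo4j schema: the term IDs and the `is_a` edges below are made-up placeholders, and the helper names are hypothetical. It only shows why "find all ancestors of a term" is a simple walk over edges once terms are stored as a graph.

```python
from collections import defaultdict

# Hypothetical toy ontology as (child, relationship, parent) edges.
# The IDs are placeholders, not real DOID entries.
EDGES = [
    ("T:2", "is_a", "T:1"),
    ("T:3", "is_a", "T:1"),
    ("T:4", "is_a", "T:2"),
    ("T:4", "is_a", "T:3"),  # a term may have several parents
]


def build_parent_index(edges):
    """Map each node to the set of nodes it points to via any relationship."""
    parents = defaultdict(set)
    for child, _relationship, parent in edges:
        parents[child].add(parent)
    return parents


def ancestors(term, parents):
    """Collect every node reachable by following edges upward from `term`."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


PARENTS = build_parent_index(EDGES)
```

Calling `ancestors("T:4", PARENTS)` walks both parent paths without any join tables, which is the property the FAQ answer describes.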

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    Hi Sven,

    Getting the metadata in is easy with the attached process. But I guess you want to get the PDFs and work on them?

    Edit: It should be possible to loop around this and pull everything into RM with some wget-style fetching. Either way, I would store the results in a SQL database or, maybe even better, a Solr server.

    ~Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.5.002">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="6.5.002" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="web:get_webpage" compatibility="6.5.000" expanded="true" height="60" name="Get Page" width="90" x="179" y="120">
           <parameter key="url" value="http://www.disease-ontology.org/api/metadata/DOID:4"/>
           <list key="query_parameters"/>
           <list key="request_properties"/>
         </operator>
         <operator activated="true" class="text:json_to_data" compatibility="6.5.000" expanded="true" height="76" name="JSON To Data" width="90" x="313" y="120"/>
         <connect from_op="Get Page" from_port="output" to_op="JSON To Data" to_port="documents 1"/>
         <connect from_op="JSON To Data" from_port="example set" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
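For readers outside RapidMiner, the Get Page plus JSON To Data pair in the process above can be approximated in Python. This is a sketch under assumptions: the `flatten` and `fetch_term` helpers are hypothetical names, the bracketed path style (`children[0][1]`) only imitates how nested arrays appear to be named in this thread, and the sample payload is invented because the real response layout is not shown here.

```python
import json
from urllib.request import urlopen


def flatten(value, prefix=""):
    """Flatten nested JSON into {path: scalar}, using [i] for list indices."""
    flat = {}
    if isinstance(value, dict):
        for key, item in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(item, path))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            flat.update(flatten(item, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value
    return flat


def fetch_term(url, timeout=10):
    """Download one JSON document and return it flattened (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return flatten(json.load(response))


# Hypothetical payload; the real API response may use different keys.
SAMPLE = json.loads(
    '{"id": "X:1", "children": [["child a", "X:2"], ["child b", "X:3"]]}'
)
ROW = flatten(SAMPLE)
```

With network access, `fetch_term("http://www.disease-ontology.org/api/metadata/DOID:4")` would mirror the URL used in the process, provided that endpoint still responds.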
  • DocMusher Member Posts: 333 Unicorn
    Dear Martin,
    Thanks for the feedback!
    In fact, I would like to use the Disease Ontology (DO) for a text mining task. My data consists of one ID column and one column with a description of the diagnosis on admission to the critical care unit. The DO might provide a classification of patients based on that admission diagnosis.
    Cheers
    Sven
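One simple way to approach the task described above is dictionary matching: look for ontology names and synonyms inside each free-text diagnosis. The sketch below assumes a tiny hand-written term dictionary (the labels and synonyms are placeholders, not taken from the DO) and hypothetical helper names. A real setup would load the names from the ontology file and would probably need fuzzier matching.

```python
import re

# Hypothetical term dictionary: label -> list of names/synonyms (placeholders).
TERMS = {
    "sepsis": ["sepsis", "septic shock"],
    "pneumonia": ["pneumonia", "lung infection"],
}


def normalize(text):
    """Lower-case and reduce the text to letter/digit runs joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def classify(description, terms=TERMS):
    """Return the sorted labels whose name or synonym occurs as whole words."""
    padded = f" {normalize(description)} "
    return sorted(
        label
        for label, names in terms.items()
        if any(f" {normalize(name)} " in padded for name in names)
    )


# Two made-up rows in the (ID, description) layout described in the post.
ADMISSIONS = [
    (1, "Septic shock, suspected lung infection"),
    (2, "Elective post-op monitoring"),
]
RESULTS = {patient_id: classify(text) for patient_id, text in ADMISSIONS}
```

Padding both sides with spaces keeps the match on whole words, so a description such as "aseptic technique" does not trigger the "sepsis" label.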
  • DocMusher Member Posts: 333 Unicorn
    Hi all,
    This is the mail I got from the DO website.

    Hello Sven,
    Thank you for your interest in the Disease Ontology.
    The ontology is licensed to allow use for any application without restrictions. The data within the file is presented in tag: value pairs, such as name: Alzheimer's disease, which should be useful for text mining. Our ontology file also includes textual definitions and synonyms that should be useful for data retrieval.
    Please let us know if we can be of further help with your project.
    Regards,
    Lynn Schriml

    Does anyone have a suggestion (maybe somebody already has a similar XML process) for using this to classify medical diagnoses?
    Thanks
    Sven
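The "tag: value" layout mentioned in the mail can be read with a few lines of Python. This is a sketch under assumptions: it presumes the file groups its lines into `[Term]` stanzas, and the sample text is a simplified, invented excerpt (only `name: Alzheimer's disease` comes from the mail). The function name `parse_stanzas` is hypothetical, and real synonym lines may carry extra qualifiers that need further cleaning.

```python
def parse_stanzas(text):
    """Parse '[Term]' stanzas of 'tag: value' lines into a list of dicts of lists."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {}
            terms.append(current)
        elif line.startswith("["):
            current = None  # some other stanza type; ignore its lines
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            # Tags such as synonym can repeat, so every tag maps to a list.
            current.setdefault(tag, []).append(value)
    return terms


# Simplified, invented excerpt; IDs and synonyms are placeholders.
SAMPLE = """\
format-version: 1.2

[Term]
id: X:1
name: Alzheimer's disease
synonym: example synonym one
synonym: example synonym two

[Typedef]
id: some_relation
"""

PARSED = parse_stanzas(SAMPLE)
```

Each parsed term then offers its `name` and `synonym` lists as candidate strings to search for in the diagnosis descriptions.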

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Hi Sven,

    I agree with Martin that it could be very useful to load this into a Solr server, as its text search features are very powerful.
    The data format described in the mail, value pairs separated by a colon, sounds like JSON to me. I wonder if they store things in Mongo?
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    Hi Sven,

    I extended the process a bit. For a given root, it now crawls all the children and puts them into a nicer format. Is one of the xrefs something we can use to get the full text?

    ~Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.5.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.5.002" expanded="true" name="Process">
        <process expanded="true">
          <!-- Operators chained in canvas order: Get Page, JSON To Data, Transpose, Replace, Replace (2) -->
          <operator activated="true" class="web:get_webpage" compatibility="6.5.000" expanded="true" height="60" name="Get Page" width="90" x="112" y="120">
            <parameter key="url" value="http://www.disease-ontology.org/api/metadata/DOID:225"/>
            <parameter key="random_user_agent" value="false"/>
            <parameter key="connection_timeout" value="10000"/>
            <parameter key="read_timeout" value="10000"/>
            <parameter key="follow_redirects" value="true"/>
            <parameter key="accept_cookies" value="none"/>
            <parameter key="cookie_scope" value="global"/>
            <parameter key="request_method" value="GET"/>
            <list key="query_parameters"/>
            <list key="request_properties"/>
            <parameter key="override_encoding" value="false"/>
            <parameter key="encoding" value="SYSTEM"/>
          </operator>
          <operator activated="true" class="text:json_to_data" compatibility="6.5.000" expanded="true" height="76" name="JSON To Data" width="90" x="313" y="120">
            <parameter key="ignore_arrays" value="false"/>
            <parameter key="limit_attributes" value="false"/>
            <parameter key="skip_invalid_documents" value="false"/>
          </operator>
          <operator activated="true" class="transpose" compatibility="6.5.002" expanded="true" height="76" name="Transpose" width="90" x="447" y="120"/>
          <operator activated="true" class="replace" compatibility="6.5.002" expanded="true" height="76" name="Replace" width="90" x="581" y="120">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="id"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="replace_what" value="(\[[0-9]+\])\[1\]"/>
            <parameter key="replace_by" value="$1 DOI"/>
          </operator>
          <operator activated="true" class="replace" compatibility="6.5.002" expanded="true" height="76" name="Replace (2)" width="90" x="715" y="120">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="id"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="replace_what" value="(\[[0-9]+\])\[0\]"/>
            <parameter key="replace_by" value="$1 Name"/>
          </operator>
          <connect from_op="Get Page" from_port="output" to_op="JSON To Data" to_port="documents 1"/>
          <connect from_op="JSON To Data" from_port="example set" to_op="Transpose" to_port="example set input"/>
          <connect from_op="Transpose" from_port="example set output" to_op="Replace" to_port="example set input"/>
          <connect from_op="Replace" from_port="example set output" to_op="Replace (2)" to_port="example set input"/>
          <connect from_op="Replace (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
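The two Replace operators above rewrite values in the `id` column, which after Transpose holds the former attribute names. Their regular expressions can be tried outside RapidMiner with the Python sketch below. The patterns and replacement texts are copied from the process; the `rename` helper and the input strings are hypothetical examples of what flattened array names might look like.

```python
import re

# Same patterns as the two Replace operators; Java's "$1" becomes r"\1" in Python.
RULES = [
    (re.compile(r"(\[[0-9]+\])\[1\]"), r"\1 DOI"),
    (re.compile(r"(\[[0-9]+\])\[0\]"), r"\1 Name"),
]


def rename(value):
    """Apply both replacements in the same order as the process."""
    for pattern, replacement in RULES:
        value = pattern.sub(replacement, value)
    return value


# Hypothetical values; the real ones depend on the API response.
EXAMPLES = ["children[0][0]", "children[0][1]", "children[12][1]"]
RENAMED = [rename(value) for value in EXAMPLES]
```

The capture group keeps the first index (which child) and replaces only the second index (which field of that child), so each child ends up with one "Name" row and one "DOI" row.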