Options

Importing XML data from a database

zwembroekzwembroek Member Posts: 2 Contributor I
edited September 2019 in Help
Hello everyone.

Sorry if this is a repost, but I could not find the information.

I have an oracle database at hand, which I want to import. In my database, there is actually a very simple table, with some columns with metadata, and in one data column an xml structure is present, with in it all kind of information on which I want to mine after the data import.

I see that there is a read database service in rapidminer. In this service I can define a query to extract my data. I believe I can not use the read database service, as it extracts attributes 'directly' from the tables using the query. I saw that there is also a Read XML service, that allows you to use xpath to extract attributes from XML structures. This service uses a file as input, not a database. So my question is, how does the process look like in rapid miner that extracts an xml from the database, and then extracts the attributes out of the xml? Directly connecting the read database with the read xml gives 'Expected FileObject but received ExampleSet', which makes sense.

Thanks for your answer!
Tagged:

Answers

  • Options
    JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Assuming all the XML structures in the field are the same you can use the a write operator to create it as a file before reading it.

    Something like this below would do it, it turns your XML field stored as text into documents and then loops through the resulting collection creating XML files in memory creating XML files in memory which can then used by the ReadXML operator. 
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.4.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_database" compatibility="6.4.000" expanded="true" height="60" name="Read Database" width="90" x="45" y="120">
            <parameter key="query" value="SELECT id, xml FROM myTable; "/>
            <enumeration key="parameters"/>
          </operator>
          <operator activated="true" class="text:data_to_documents" compatibility="6.4.001" expanded="true" height="60" name="Data to Documents" width="90" x="179" y="120">
            <list key="specify_weights"/>
          </operator>
          <operator activated="true" class="loop_collection" compatibility="6.4.000" expanded="true" height="76" name="Loop Collection" width="90" x="313" y="120">
            <process expanded="true">
              <operator activated="true" class="text:write_document" compatibility="6.4.001" expanded="true" height="76" name="Write Document" width="90" x="112" y="120"/>
              <operator activated="true" class="read_xml" compatibility="6.4.000" expanded="true" height="60" name="Read XML" width="90" x="246" y="120">
                <enumeration key="xpaths_for_attributes"/>
                <list key="namespaces"/>
                <list key="annotations"/>
                <list key="data_set_meta_data_information"/>
              </operator>
              <connect from_port="single" to_op="Write Document" to_port="document"/>
              <connect from_op="Write Document" from_port="file" to_op="Read XML" to_port="file"/>
              <connect from_op="Read XML" from_port="output" to_port="output 1"/>
              <portSpacing port="source_single" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Database" from_port="output" to_op="Data to Documents" to_port="example set"/>
          <connect from_op="Data to Documents" from_port="documents" to_op="Loop Collection" to_port="collection"/>
          <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • Options
    zwembroekzwembroek Member Posts: 2 Contributor I
    Thanks. They are indeed the same structure. I will try this.
Sign In or Register to comment.