Options

[SOLVED] How to add column names from a database table to a headerless text file

gvanvuurengvanvuuren Member Posts: 8 Contributor II
edited September 2019 in Help
I've got a standard csv text file, but without headers, which I want to write to an existing database schema. So first I load both the csv file and the database table into RM. Then I need to extract the column names from the existing database table. Then I need to rename the default column names assigned by RM to the loaded csv file by using the extracted column names from the database. And only then can I write the modified csv file back to the database.

The problem is that I cannot seem to join the db column names with the headerless csv columns. I've tried all the operators I could think of, but it doesn't seem possible to extract the db table's meta information into a seperate row in order to prepend the row to the csv file.

I'm sure such a simple ETL task shouldn't be a problem for RM. So I'd appreciate it if someone could point me in the right direction. Is there a way to extract meta-information from a table?

Thanks
Gideon

Answers

  • Options
    MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hey, I'm sorry, it's not that easy. You need some macro magic and a loop. Please see the attached process for details. In your setting, instead of Generate Data you would read the CSV file.

    Maybe you can do the job also with some clever Transposings and Joins.

    Best, Marius
  • Options
    gvanvuurengvanvuuren Member Posts: 8 Contributor II
    Hi Marius,

    thanks. I appreciate the effort. Could you please re-attach the process? I cannot see it.

    But I think what's missing is an operator that appends rows to a reference table based on the column indices (and types) of the reference table, instead of expecting column names to match. I've been meaning to try my hand at writing an operator, so maybe this is my opportunity  ;)

    Best
    Gideon
  • Options
    MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Well, I can't see it neither. Probably someone  ::) forgot to insert the process. Here you go:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="341" width="547">
          <operator activated="true" class="generate_data" compatibility="5.3.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_examples" value="10"/>
            <parameter key="number_of_attributes" value="450"/>
          </operator>
          <operator activated="true" class="remember" compatibility="5.3.000" expanded="true" height="60" name="Remember" width="90" x="179" y="30">
            <parameter key="name" value="data"/>
            <parameter key="io_object" value="ExampleSet"/>
          </operator>
          <operator activated="true" class="read_database" compatibility="5.3.000" expanded="true" height="60" name="Read Database" width="90" x="45" y="120">
            <parameter key="connection" value="GezondeKas"/>
            <parameter key="query" value="SHOW COLUMNS FROM WeeklyAggregatedValue;"/>
            <enumeration key="parameters"/>
          </operator>
          <operator activated="true" class="extract_macro" compatibility="5.3.000" expanded="true" height="60" name="Extract Macro" width="90" x="179" y="120">
            <parameter key="macro" value="attributeCount"/>
            <list key="additional_macros"/>
          </operator>
          <operator activated="true" class="loop" compatibility="5.3.000" expanded="true" height="76" name="Loop" width="90" x="313" y="120">
            <parameter key="set_iteration_macro" value="true"/>
            <parameter key="iterations" value="%{attributeCount}"/>
            <process expanded="true" height="439" width="1003">
              <operator activated="true" class="extract_macro" compatibility="5.3.000" expanded="true" height="60" name="Extract Macro (2)" width="90" x="45" y="30">
                <parameter key="macro" value="attributeName"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="Field"/>
                <parameter key="example_index" value="%{iteration}"/>
                <list key="additional_macros"/>
              </operator>
              <operator activated="true" class="recall" compatibility="5.3.000" expanded="true" height="60" name="Recall" width="90" x="179" y="30">
                <parameter key="name" value="data"/>
                <parameter key="io_object" value="ExampleSet"/>
              </operator>
              <operator activated="true" class="rename" compatibility="5.3.000" expanded="true" height="76" name="Rename" width="90" x="313" y="30">
                <parameter key="old_name" value="att%{iteration}"/>
                <parameter key="new_name" value="%{attributeName}"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <operator activated="true" class="remember" compatibility="5.3.000" expanded="true" height="60" name="Remember (2)" width="90" x="648" y="30">
                <parameter key="name" value="data"/>
                <parameter key="io_object" value="ExampleSet"/>
              </operator>
              <connect from_port="input 1" to_op="Extract Macro (2)" to_port="example set"/>
              <connect from_op="Recall" from_port="result" to_op="Rename" to_port="example set input"/>
              <connect from_op="Rename" from_port="example set output" to_op="Remember (2)" to_port="store"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="recall" compatibility="5.3.000" expanded="true" height="60" name="Recall (2)" width="90" x="447" y="120">
            <parameter key="name" value="data"/>
            <parameter key="io_object" value="ExampleSet"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Remember" to_port="store"/>
          <connect from_op="Read Database" from_port="output" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Loop" to_port="input 1"/>
          <connect from_op="Recall (2)" from_port="result" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • Options
    gvanvuurengvanvuuren Member Posts: 8 Contributor II
    I still haven't got it working quite yet, but I got the idea, thanks!

    The operators "Extract Macro", "Set Macro", "Loop Attributes", and "Rename" will do the trick.
Sign In or Register to comment.