"normalizing error (works backwards in workflow??)"

michaelhecht · November 2011

Hello,

I have a workflow that

1. reads an Excel file,
2. then selects attributes,
3. then sends the original data to a CSV writer,
4. and sends the selected attributes to a normalizer for further processing.

If I now run the workflow and look at the written CSV file, all "real" columns are normalized, even though the normalizer is applied after the data is sent to the CSV writer. This is really strange.

So - what can I do to store the original data?

Marco_Boeck · November 2011

Hi,

The normalization "branch" of your process is executed before your Write CSV operator starts working. I suggest the following quick fix:


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
    <process expanded="true" height="235" width="681">
      <operator activated="true" class="read_excel" compatibility="5.1.014" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="30">
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="5.1.014" expanded="true" height="60" name="Write CSV" width="90" x="179" y="30">
        <parameter key="csv_file" value="C:\Users\boeck\Desktop\Test.csv"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.014" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Test"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="5.1.014" expanded="true" height="94" name="Normalize" width="90" x="447" y="30"/>
      <connect from_op="Read Excel (2)" from_port="output" to_op="Write CSV" to_port="input"/>
      <connect from_op="Write CSV" from_port="through" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

That way, your CSV gets created before anything else, and only then is your data modified.

Regards,
Marco

michaelhecht · December 2011

Thank you, this might work, but it isn't a solution to my original problem.
Nevertheless, I'm glad (not really) that it is a bug and not my own incompetence.

I've got data with several (more than one) ID columns that I want to pass through the workflow.
If I don't intervene, RapidMiner selects one column to be the only ID. The selected column unfortunately
isn't unique. Therefore I removed all unnecessary (non-unique) ID columns before the actual workflow,
but I want to add them again at the end of the workflow, before I write everything to the CSV file. To be able to
understand the result of the workflow, I also wanted to write the non-normalized columns, which didn't
work. That's why I need to write the CSV at the end of the workflow.

Meanwhile I found that I can join all data with the "ori" (original) output of the normalizer. This
seems to be a better workaround.

By the way: I wonder why there is no de-normalizer node to improve the readability of
the output.

Marco_Boeck · December 2011

Hi,

Just a hint: you can switch the ID role to the real ID column by using the Exchange Roles operator. That way, you don't need to remove any columns from your process.
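
As a rough sketch of that hint, an Exchange Roles operator in the process XML might look something like the fragment below. The attribute names `row_no` and `customer_id` are hypothetical placeholders (they are not from this thread), and the parameter keys are what I would expect for this operator in 5.1, so please check them against the parameter panel in your own installation:

```xml
<!-- Sketch only: swaps the roles of two attributes, so the ID role moves
     from the non-unique column to the unique one.
     "row_no" and "customer_id" are hypothetical attribute names. -->
<operator activated="true" class="exchange_roles" compatibility="5.1.014" expanded="true" name="Exchange Roles">
  <!-- attribute that currently holds the ID role (assumed name) -->
  <parameter key="first_attribute" value="row_no"/>
  <!-- attribute that should receive the ID role instead (assumed name) -->
  <parameter key="second_attribute" value="customer_id"/>
</operator>
```

Placed directly after the reader, this should leave all columns in the data set, so nothing has to be removed and joined back in later.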

Regards,
Marco

Altair RapidMiner Community