
"normalizing error (works backwards in workflow??)"

michaelhecht Member Posts: 89 Maven
edited June 2019 in Help

I have a workflow that

1. reads an Excel file,
2. selects attributes,
3. sends the original data to a CSV writer,
4. and sends the selected attributes to a normalizer for further processing.

If I now run the workflow and look at the written CSV file, all "real" columns are normalized, even though the normalizer is applied after the data has been sent to the CSV writer. This is really strange.

So - what can I do to store the original data?



  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering

    The normalization "branch" of your process is executed before your Write CSV operator starts working. I suggest the following quick fix:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.014">
      <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
        <process expanded="true" height="235" width="681">
          <operator activated="true" class="read_excel" compatibility="5.1.014" expanded="true" height="60" name="Read Excel (2)" width="90" x="45" y="30">
            <list key="annotations"/>
            <list key="data_set_meta_data_information"/>
          </operator>
          <!-- Write CSV sits directly after Read Excel, so the file is written before the data is modified -->
          <operator activated="true" class="write_csv" compatibility="5.1.014" expanded="true" height="60" name="Write CSV" width="90" x="179" y="30">
            <parameter key="csv_file" value="C:\Users\boeck\Desktop\Test.csv"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.1.014" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Test"/>
          </operator>
          <operator activated="true" class="normalize" compatibility="5.1.014" expanded="true" height="94" name="Normalize" width="90" x="447" y="30"/>
          <!-- Single chain: Read Excel, Write CSV (through port), Select Attributes, Normalize -->
          <connect from_op="Read Excel (2)" from_port="output" to_op="Write CSV" to_port="input"/>
          <connect from_op="Write CSV" from_port="through" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    That way, your csv gets created before anything else, and then your data is modified.
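
    The ordering effect described here can be illustrated outside RapidMiner with a minimal Python sketch. It assumes, as the observed behavior suggests, that both branches share one underlying table and that normalization overwrites the stored values. The function names and sample data are hypothetical and are not RapidMiner's API.

```python
import copy


def normalize_in_place(table, column):
    """Z-score one column of `table`, overwriting the stored values."""
    values = table[column]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    table[column] = [(v - mean) / std for v in values]


def write_csv_rows(table):
    """Stand-in for a CSV writer: return the rows it would write."""
    columns = list(table)
    row_count = len(table[columns[0]])
    return [tuple(table[c][i] for c in columns) for i in range(row_count)]


source = {"id": [1, 2, 3], "Test": [10.0, 20.0, 30.0]}

# Case 1: the normalize branch runs first, so the writer sees changed values.
shared = copy.deepcopy(source)
normalize_in_place(shared, "Test")
rows_written_late = write_csv_rows(shared)

# Case 2: write first (the quick fix), then normalize.
shared = copy.deepcopy(source)
rows_written_early = write_csv_rows(shared)
normalize_in_place(shared, "Test")
```

    In this sketch only `rows_written_early` still contains the raw values 10.0, 20.0 and 30.0. Writing from a private copy of the table would be another way to get the same result.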

  • michaelhecht Member Posts: 89 Maven
    Thank you, this might work, but it isn't a solution to my original problem.
    Nevertheless, I'm glad (not really) that it is a bug and not my own incompetence  ;)

    I have data with several id columns that I want to pass through the workflow.
    If I do nothing about it, RapidMiner picks one column to be the only id. Unfortunately,
    the selected column isn't unique. Therefore I removed all unnecessary (non-unique) id columns
    before the actual workflow, but I want to add them again at the end, before I write everything
    to the CSV file. To be able to understand the result of the workflow, I also wanted to write
    the non-normalized columns, which didn't work. That's why I need to write the CSV at the end
    of the workflow.

    In the meantime I found that I can join all data with the ori output (the original data)
    of the normalizer. This seems to be a better workaround.
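
    The join idea can be sketched in plain Python, assuming a unique key column is available in both the original and the normalized data. The column names (`row_id`, `batch_id`), the `_norm` suffix, and the sample values are hypothetical and only serve the illustration.

```python
def join_on_key(left_rows, right_rows, key):
    """Inner join of two lists of dicts on `key`.

    Columns from the right side that clash with a left column get a
    "_norm" suffix so that both versions survive.
    """
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is None:
            continue  # no partner on the right side: drop the row
        merged = dict(row)
        for name, value in match.items():
            if name == key:
                continue
            target = (name + "_norm") if name in row else name
            merged[target] = value
        joined.append(merged)
    return joined


# Original data, including a non-unique id column that should be kept.
original = [
    {"row_id": 1, "batch_id": "A", "Test": 10.0},
    {"row_id": 2, "batch_id": "A", "Test": 20.0},
    {"row_id": 3, "batch_id": "B", "Test": 30.0},
]
# Illustrative normalized values for the same rows.
normalized = [
    {"row_id": 1, "Test": -1.0},
    {"row_id": 2, "Test": 0.0},
    {"row_id": 3, "Test": 1.0},
]

result = join_on_key(original, normalized, "row_id")
```

    Each joined row then carries the raw value, the normalized value, and the extra id column side by side, which is the kind of table I want in the CSV.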

    By the way: I wonder why there is no de-normalizer node to improve the readability of
    the output.
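
    What I mean by a de-normalizer can be sketched in a few lines of Python. This assumes a Z-transformation and that the mean and standard deviation from the normalization step were kept. The function names are hypothetical, and other normalization methods would need their own inverse.

```python
def fit_z_score(values):
    """Return the mean and population standard deviation of `values`."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance ** 0.5


def normalize(values, mean, std):
    """Map values to z-scores using the given statistics."""
    return [(v - mean) / std for v in values]


def denormalize(values, mean, std):
    """Invert the z-score mapping: x = z * std + mean."""
    return [v * std + mean for v in values]


original = [10.0, 20.0, 30.0]
mean, std = fit_z_score(original)
restored = denormalize(normalize(original, mean, std), mean, std)
```

    Up to floating-point rounding, `restored` matches `original`. A constant column has a standard deviation of zero and would need special handling.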
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,996 RM Engineering

    Just a hint: you can switch the ID role to the real ID column by using the Exchange Roles operator. That way, you don't need to remove any columns for your process.
