Warning: Overwriting old id attribute = Hang

hughesfleming Member Posts: 14 Contributor II
edited November 2018 in Help
I have a problem that is driving me crazy. I am using two SVMs to predict values, then joining the results and writing them to CSV. At random, the process will hang after training; the log shows com.rapidminer.operator.preprocessing.IdTagging apply, then WARNING: Overwriting old attribute... and then nothing. Most of the time the process works without problems, but not always. This happens more often if I run the processes in sequence using batch files or as subprocesses in parallel.

I have attached a shortened script showing the joins. I would like to run this process daily unattended, but this error is making things difficult. Does anyone have ideas on how to avoid this?

Many thanks,

Alex


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <parameter key="parallelize_main_process" value="true"/>
    <process expanded="true" height="836" width="1732">
      <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing" width="90" x="849" y="30">
        <parameter key="horizon" value="1"/>
        <parameter key="window_size" value="1"/>
        <parameter key="create_label" value="true"/>
        <parameter key="label_attribute" value="High"/>
      </operator>
      <operator activated="true" class="series:sliding_window_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation" width="90" x="983" y="30">
        <parameter key="training_window_width" value="90"/>
        <parameter key="training_window_step_size" value="1"/>
        <parameter key="test_window_width" value="5"/>
        <parameter key="cumulative_training" value="true"/>
        <parameter key="parallelize_training" value="true"/>
        <parameter key="parallelize_testing" value="true"/>
        <process expanded="true" height="566" width="480">
          <operator activated="true" class="support_vector_machine" compatibility="5.2.008" expanded="true" height="112" name="SVM" width="90" x="277" y="383">
            <parameter key="C" value="-1.0"/>
          </operator>
          <connect from_port="training" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="435" width="346">
          <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="75">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="series:forecasting_performance" compatibility="5.1.002" expanded="true" height="76" name="Performance" width="90" x="179" y="120">
            <parameter key="horizon" value="1"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing (3)" width="90" x="849" y="390">
        <parameter key="horizon" value="1"/>
        <parameter key="window_size" value="1"/>
        <parameter key="create_label" value="true"/>
        <parameter key="label_attribute" value="Low"/>
      </operator>
      <operator activated="true" class="series:sliding_window_validation" compatibility="5.1.002" expanded="true" height="112" name="Validation (2)" width="90" x="983" y="390">
        <parameter key="training_window_width" value="90"/>
        <parameter key="training_window_step_size" value="1"/>
        <parameter key="test_window_width" value="5"/>
        <parameter key="cumulative_training" value="true"/>
        <parameter key="parallelize_training" value="true"/>
        <parameter key="parallelize_testing" value="true"/>
        <process expanded="true" height="757" width="523">
          <operator activated="true" class="support_vector_machine" compatibility="5.2.008" expanded="true" height="112" name="SVM (2)" width="90" x="288" y="30">
            <parameter key="C" value="-1.0"/>
          </operator>
          <connect from_port="training" to_op="SVM (2)" to_port="training set"/>
          <connect from_op="SVM (2)" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="757" width="523">
          <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model (5)" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="series:forecasting_performance" compatibility="5.1.002" expanded="true" height="76" name="Performance (2)" width="90" x="288" y="30">
            <parameter key="horizon" value="1"/>
          </operator>
          <connect from_port="model" to_op="Apply Model (5)" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model (5)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (5)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing (2)" width="90" x="849" y="120">
        <parameter key="window_size" value="1"/>
        <parameter key="label_attribute" value="High"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model (2)" width="90" x="983" y="165">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing (4)" width="90" x="849" y="480">
        <parameter key="window_size" value="1"/>
        <parameter key="label_attribute" value="Low"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model (6)" width="90" x="983" y="525">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="rename" compatibility="5.2.008" expanded="true" height="76" name="Rename" width="90" x="983" y="255">
        <parameter key="old_name" value="prediction(label)"/>
        <parameter key="new_name" value="Predicted_High"/>
        <list key="rename_additional_attributes"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role (5)" width="90" x="1117" y="30">
        <parameter key="name" value="Predicted_High"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="5.2.008" expanded="true" height="76" name="Generate ID (3)" width="90" x="1117" y="120"/>
      <operator activated="true" class="rename" compatibility="5.2.008" expanded="true" height="76" name="Rename (2)" width="90" x="983" y="615">
        <parameter key="old_name" value="prediction(label)"/>
        <parameter key="new_name" value="Predicted_Low"/>
        <list key="rename_additional_attributes"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role (6)" width="90" x="1117" y="390">
        <parameter key="name" value="Predicted_Low"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="5.2.008" expanded="true" height="76" name="Generate ID" width="90" x="1117" y="300"/>
      <operator activated="true" class="join" compatibility="5.1.008" expanded="true" height="76" name="Join" width="90" x="1117" y="210">
        <parameter key="remove_double_attributes" value="false"/>
        <list key="key_attributes">
          <parameter key="id" value="id"/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="1117" y="480">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="|Predicted_High|Predicted_Low"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes (5)" width="90" x="1117" y="570">
        <list key="function_descriptions">
          <parameter key="Predicted Pivot" value="(Predicted_High+Predicted_Low)/2"/>
          <parameter key="Predicted Range" value="(Predicted_High-Predicted_Low)"/>
          <parameter key="Predicted PV" value="(Predicted_High+Predicted_Low)/2"/>
        </list>
      </operator>
      <connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
      <connect from_op="Windowing (3)" from_port="example set output" to_op="Validation (2)" to_port="training"/>
      <connect from_op="Validation (2)" from_port="model" to_op="Apply Model (6)" to_port="model"/>
      <connect from_op="Validation (2)" from_port="averagable 1" to_port="result 2"/>
      <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Rename" to_port="example set input"/>
      <connect from_op="Windowing (4)" from_port="example set output" to_op="Apply Model (6)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (6)" from_port="labelled data" to_op="Rename (2)" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Set Role (5)" to_port="example set input"/>
      <connect from_op="Set Role (5)" from_port="example set output" to_op="Generate ID (3)" to_port="example set input"/>
      <connect from_op="Generate ID (3)" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Rename (2)" from_port="example set output" to_op="Set Role (6)" to_port="example set input"/>
      <connect from_op="Set Role (6)" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="Join" from_port="join" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes (5)" to_port="example set input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Alex,

    can you please check whether the problem still occurs if you disable all the "parallelize XXX" parameters? In some cases, parallelization does not work well and makes the process hang.

    Best,
      Marius
  • hughesfleming Member Posts: 14 Contributor II
    Hi Marius,

    I have disabled the parallel functions and will see what happens. The odd thing is that it could work fine several times in a row and then stop with the warning for no apparent reason. I have several of these to run, which I am running at night using batch files. As long as I am asleep, I won't notice the slowdown. :)

    Thanks!

    Alex
  • hughesfleming Member Posts: 14 Contributor II
    Hi Marius,

    The culprit seems to be the parallelize training and parallelize testing options of the sliding window validation. I switched on parallelize main process and can run on two threads without causing the error.

    Thanks for bringing this to my attention.

    regards,

    Alex
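
For reference, the configuration the thread ends up with might look like the fragment below. It is a sketch trimmed from the process posted above, not a complete loadable process: the training and testing subprocesses are omitted, and the value "false" for the two validation parameters is an assumption about how the options were switched off. The same change would apply to "Validation (2)".

```xml
<!-- Sketch only: trimmed from the posted process; subprocesses and connections omitted. -->
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
  <!-- Left enabled: reported to run on two threads without the hang. -->
  <parameter key="parallelize_main_process" value="true"/>
  <process expanded="true">
    <operator activated="true" class="series:sliding_window_validation" compatibility="5.1.002" expanded="true" name="Validation">
      <parameter key="training_window_width" value="90"/>
      <parameter key="training_window_step_size" value="1"/>
      <parameter key="test_window_width" value="5"/>
      <parameter key="cumulative_training" value="true"/>
      <!-- Assumed change: both options switched off, as discussed in the replies. -->
      <parameter key="parallelize_training" value="false"/>
      <parameter key="parallelize_testing" value="false"/>
    </operator>
  </process>
</operator>
```

Keeping parallelize_main_process enabled still lets the two validation branches run side by side, while each sliding window validation trains and tests its windows sequentially.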