
Memory Overflow

MuehliMan Member Posts: 85 Maven
edited November 2018 in Help
Hi,

My process either freezes my PC or fails due to low memory. Unfortunately, my dataset consists of 700 attributes, so the amount of data is really large. Therefore I already log only the top 200 models and free memory after every iteration. Any suggestions as to why it still does not run properly?

Thanks a lot in advance,
Markus
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.0" expanded="true" name="Process">
    <process expanded="true" height="449" width="1150">
      <operator activated="true" class="generate_data" compatibility="5.0.0" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="target_function" value="polynomial"/>
        <parameter key="number_examples" value="300"/>
        <parameter key="number_of_attributes" value="700"/>
      </operator>
      <operator activated="true" class="add_noise" compatibility="5.0.0" expanded="true" height="94" name="Add Noise" width="90" x="179" y="30">
        <parameter key="random_attributes" value="15"/>
        <list key="noise"/>
      </operator>
      <operator activated="true" class="discretize_by_user_specification" compatibility="5.0.0" expanded="true" height="94" name="Discretize" width="90" x="313" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="include_special_attributes" value="true"/>
        <list key="classes">
          <parameter key="first" value="0.5"/>
          <parameter key="last" value="Infinity"/>
        </list>
      </operator>
      <operator activated="true" class="nominal_to_binominal" compatibility="5.0.0" expanded="true" height="94" name="Nominal to Binominal" width="90" x="447" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="loop_attribute_subsets" compatibility="5.0.0" expanded="true" height="60" name="Loop Subsets" width="90" x="581" y="30">
        <parameter key="use_exact_number" value="true"/>
        <parameter key="exact_number_of_attributes" value="2"/>
        <process expanded="true" height="372" width="817">
          <operator activated="true" class="multiply" compatibility="5.0.0" expanded="true" height="94" name="Multiply" width="90" x="45" y="30"/>
          <operator activated="false" class="x_validation" compatibility="5.0.0" expanded="true" height="112" name="Validation" width="90" x="179" y="165">
            <parameter key="number_of_validations" value="5"/>
            <process expanded="true" height="390" width="303">
              <operator activated="false" class="decision_tree" compatibility="5.0.0" expanded="true" height="76" name="Decision Tree" width="90" x="112" y="30">
                <parameter key="criterion" value="gini_index"/>
                <parameter key="minimal_size_for_split" value="6"/>
                <parameter key="minimal_leaf_size" value="3"/>
                <parameter key="maximal_depth" value="4"/>
              </operator>
              <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="390" width="303">
              <operator activated="false" class="apply_model" compatibility="5.0.0" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="false" class="performance_binominal_classification" compatibility="5.0.0" expanded="true" height="76" name="Performance" width="90" x="174" y="30">
                <parameter key="youden" value="true"/>
                <parameter key="psep" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="decision_tree" compatibility="5.0.0" expanded="true" height="76" name="Decision Tree (2)" width="90" x="179" y="30">
            <parameter key="criterion" value="gini_index"/>
            <parameter key="minimal_size_for_split" value="6"/>
            <parameter key="minimal_leaf_size" value="3"/>
            <parameter key="maximal_depth" value="5"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.0.0" expanded="true" height="76" name="Apply Model (2)" width="90" x="313" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_binominal_classification" compatibility="5.0.0" expanded="true" height="76" name="Performance (2)" width="90" x="447" y="30">
            <parameter key="youden" value="true"/>
            <parameter key="psep" value="true"/>
          </operator>
          <operator activated="true" class="log" compatibility="5.0.0" expanded="true" height="94" name="Log" width="90" x="581" y="30">
            <list key="log">
              <parameter key="feature_names" value="operator.Loop Subsets.value.feature_names"/>
              <parameter key="youden" value="operator.Performance (2).value.youden"/>
              <parameter key="accuracy" value="operator.Performance (2).value.accuracy"/>
            </list>
            <parameter key="sorting_type" value="top-k"/>
            <parameter key="sorting_dimension" value="youden"/>
            <parameter key="sorting_k" value="200"/>
          </operator>
          <operator activated="true" class="free_memory" compatibility="5.0.0" expanded="true" height="76" name="Free Memory" width="90" x="717" y="30"/>
          <connect from_port="example set" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Decision Tree (2)" to_port="training set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Log" to_port="through 2"/>
          <connect from_op="Decision Tree (2)" from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Decision Tree (2)" from_port="exampleSet" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_op="Free Memory" to_port="through 1"/>
          <portSpacing port="source_example set" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="log_to_data" compatibility="5.0.0" expanded="true" height="94" name="Log to Data" width="90" x="715" y="30"/>
      <operator activated="true" class="store" compatibility="5.0.0" expanded="true" height="60" name="Store" width="90" x="849" y="30">
        <parameter key="repository_entry" value="../Logs/Training DT 2 atts"/>
      </operator>
      <operator activated="true" class="log_to_weights" compatibility="5.0.10" expanded="true" height="60" name="Log to Weights" width="90" x="849" y="120">
        <parameter key="attribute_names_column" value="feature_names"/>
      </operator>
      <operator activated="true" class="store" compatibility="5.0.10" expanded="true" height="60" name="Store (2)" width="90" x="983" y="120">
        <parameter key="repository_entry" value="../weights/Training DT 2 atts"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Add Noise" to_port="example set input"/>
      <connect from_op="Add Noise" from_port="example set output" to_op="Discretize" to_port="example set input"/>
      <connect from_op="Discretize" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
      <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Loop Subsets" to_port="example set"/>
      <connect from_op="Loop Subsets" from_port="example set" to_op="Log to Data" to_port="through 1"/>
      <connect from_op="Log to Data" from_port="exampleSet" to_op="Store" to_port="input"/>
      <connect from_op="Log to Weights" from_port="weights" to_op="Store (2)" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Markus,
    I have a suspicion about this problem. Could you please try executing this process? I replaced the Free Memory operator with a script. In fact, the Free Memory operator doesn't do much. Only when you are really at the memory limit might it help keep the Java Virtual Machine from spending all its time on freeing memory.

    Please tell me whether it works now; that would confirm my hypothesis.

    Greetings,
      Sebastian
  • MuehliMan Member Posts: 85 Maven
    This process?
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Yes, exactly this process. I guess you didn't get a memory overflow, did you?

    ...


    ****. Should have posted it here. Now it is lost. Sorry about this.

    Here's another idea I came up with recently that could solve the problem. It might be surprising, but the solution may be to copy the data first :)
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.0" expanded="true" name="Process">
        <process expanded="true" height="449" width="1150">
          <operator activated="true" class="generate_data" compatibility="5.0.0" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="polynomial"/>
            <parameter key="number_examples" value="300"/>
            <parameter key="number_of_attributes" value="700"/>
          </operator>
          <operator activated="true" class="add_noise" compatibility="5.0.0" expanded="true" height="94" name="Add Noise" width="90" x="179" y="30">
            <parameter key="random_attributes" value="15"/>
            <list key="noise"/>
          </operator>
          <operator activated="true" class="discretize_by_user_specification" compatibility="5.0.0" expanded="true" height="94" name="Discretize" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="label"/>
            <parameter key="include_special_attributes" value="true"/>
            <list key="classes">
              <parameter key="first" value="0.5"/>
              <parameter key="last" value="Infinity"/>
            </list>
          </operator>
          <operator activated="true" class="nominal_to_binominal" compatibility="5.0.0" expanded="true" height="94" name="Nominal to Binominal" width="90" x="447" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="label"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="loop_attribute_subsets" compatibility="5.0.0" expanded="true" height="60" name="Loop Subsets" width="90" x="581" y="30">
            <parameter key="use_exact_number" value="true"/>
            <parameter key="exact_number_of_attributes" value="2"/>
            <process expanded="true" height="372" width="817">
              <operator activated="false" class="x_validation" compatibility="5.0.0" expanded="true" height="112" name="Validation" width="90" x="179" y="165">
                <parameter key="number_of_validations" value="5"/>
                <process expanded="true" height="390" width="303">
                  <operator activated="false" class="decision_tree" compatibility="5.0.0" expanded="true" height="76" name="Decision Tree" width="90" x="112" y="30">
                    <parameter key="criterion" value="gini_index"/>
                    <parameter key="minimal_size_for_split" value="6"/>
                    <parameter key="minimal_leaf_size" value="3"/>
                    <parameter key="maximal_depth" value="4"/>
                  </operator>
                  <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
                  <connect from_op="Decision Tree" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true" height="390" width="303">
                  <operator activated="false" class="apply_model" compatibility="5.0.0" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="false" class="performance_binominal_classification" compatibility="5.0.0" expanded="true" height="76" name="Performance" width="90" x="174" y="30">
                    <parameter key="youden" value="true"/>
                    <parameter key="psep" value="true"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="materialize_data" compatibility="5.0.8" expanded="true" height="76" name="Materialize Data" width="90" x="45" y="165"/>
              <operator activated="true" class="multiply" compatibility="5.0.0" expanded="true" height="94" name="Multiply" width="90" x="45" y="30"/>
              <operator activated="true" class="decision_tree" compatibility="5.0.0" expanded="true" height="76" name="Decision Tree (2)" width="90" x="179" y="30">
                <parameter key="criterion" value="gini_index"/>
                <parameter key="minimal_size_for_split" value="6"/>
                <parameter key="minimal_leaf_size" value="3"/>
                <parameter key="maximal_depth" value="5"/>
              </operator>
              <operator activated="true" class="apply_model" compatibility="5.0.0" expanded="true" height="76" name="Apply Model (2)" width="90" x="313" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_binominal_classification" compatibility="5.0.0" expanded="true" height="76" name="Performance (2)" width="90" x="447" y="30">
                <parameter key="youden" value="true"/>
                <parameter key="psep" value="true"/>
              </operator>
              <operator activated="true" class="log" compatibility="5.0.0" expanded="true" height="94" name="Log" width="90" x="581" y="30">
                <list key="log">
                  <parameter key="feature_names" value="operator.Loop Subsets.value.feature_names"/>
                  <parameter key="youden" value="operator.Performance (2).value.youden"/>
                  <parameter key="accuracy" value="operator.Performance (2).value.accuracy"/>
                </list>
                <parameter key="sorting_type" value="top-k"/>
                <parameter key="sorting_dimension" value="youden"/>
                <parameter key="sorting_k" value="200"/>
              </operator>
              <connect from_port="example set" to_op="Materialize Data" to_port="example set input"/>
              <connect from_op="Materialize Data" from_port="example set output" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_op="Decision Tree (2)" to_port="training set"/>
              <connect from_op="Multiply" from_port="output 2" to_op="Log" to_port="through 2"/>
              <connect from_op="Decision Tree (2)" from_port="model" to_op="Apply Model (2)" to_port="model"/>
              <connect from_op="Decision Tree (2)" from_port="exampleSet" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_op="Log" to_port="through 1"/>
              <portSpacing port="source_example set" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="log_to_data" compatibility="5.0.0" expanded="true" height="94" name="Log to Data" width="90" x="715" y="30"/>
          <operator activated="true" class="store" compatibility="5.0.0" expanded="true" height="60" name="Store" width="90" x="849" y="30">
            <parameter key="repository_entry" value="../Logs/Training DT 2 atts"/>
          </operator>
          <operator activated="true" class="log_to_weights" compatibility="5.0.10" expanded="true" height="60" name="Log to Weights" width="90" x="849" y="120">
            <parameter key="attribute_names_column" value="feature_names"/>
          </operator>
          <operator activated="true" class="store" compatibility="5.0.10" expanded="true" height="60" name="Store (2)" width="90" x="983" y="120">
            <parameter key="repository_entry" value="../weights/Training DT 2 atts"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Add Noise" to_port="example set input"/>
          <connect from_op="Add Noise" from_port="example set output" to_op="Discretize" to_port="example set input"/>
          <connect from_op="Discretize" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Loop Subsets" to_port="example set"/>
          <connect from_op="Loop Subsets" from_port="example set" to_op="Log to Data" to_port="through 1"/>
          <connect from_op="Log to Data" from_port="exampleSet" to_op="Store" to_port="input"/>
          <connect from_op="Log to Weights" from_port="weights" to_op="Store (2)" to_port="input"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
    Greetings,
      Sebastian
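
    For a sense of scale, the number of iterations Loop Subsets runs in the process above can be worked out directly. This is only a small sketch: it assumes the 15 random attributes added by Add Noise count as regular attributes alongside the 700 generated ones, and that every subset of exactly 2 attributes is visited once. The class and method names are made up for illustration.

```java
public class SubsetCount {

    /**
     * Number of ways to choose k items out of n.
     * Multiplying and dividing step by step keeps every intermediate value an integer.
     */
    static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    public static void main(String[] args) {
        int generatedAttributes = 700; // number_of_attributes in Generate Data
        int noiseAttributes = 15;      // random_attributes in Add Noise (assumed to count as regular)
        int subsetSize = 2;            // exact_number_of_attributes in Loop Subsets

        long iterations = choose(generatedAttributes + noiseAttributes, subsetSize);
        System.out.println("Loop iterations: " + iterations); // prints "Loop iterations: 255255"
    }
}
```

    Under these assumptions the inner process runs a few hundred thousand times, so anything that is left behind on each pass has plenty of opportunity to add up.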
  • MuehliMan Member Posts: 85 Maven
    Hey Sebastian,

    no problem. I tried your workflow and it seems to work!

    Could you please give me some insight into why this new workflow runs? Does Materialize Data do what I assumed Free Memory would do? Is this the most efficient and fastest way (in terms of computation time) to run this workflow?

    Best regards,
    Markus
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    the problem lies deep in RapidMiner's current data core. Unfortunately it isn't easy to fix, otherwise we would have done so. Creating a complete copy that is discarded after use avoids the problem that causes the memory leak.
    Well, the best idea would probably be to pay us to finally implement the new data core we have had in mind for many months. At least that is true from my perspective and that of all the other (non-paying) users who would benefit from it :)
    OK, just joking. In fact there might be another way around it: you could write a Script operator that removes the prediction and all confidence columns from the example set and from the table(!) before the loop ends. This was my first solution, but since I forgot to save it, I can't post it here again.

    Greetings,
      Sebastian
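
    The two workarounds mentioned in this post (work on a throwaway copy, or remove the prediction and confidence columns before the loop ends) can be illustrated with a toy model. This is not RapidMiner's data core or its API: `ToyTable`, `applyModel` and the column names are hypothetical stand-ins, and the sketch only assumes that each model application appends one prediction column and two confidence columns to the table it is given.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedTableSketch {

    /** Toy stand-in for a data table: only the column names are tracked. */
    static final class ToyTable {
        final List<String> columns = new ArrayList<>();

        ToyTable(List<String> initial) {
            columns.addAll(initial);
        }

        ToyTable copy() {
            return new ToyTable(columns);
        }
    }

    /** Mimics applying a model: appends one prediction and two confidence columns. */
    static void applyModel(ToyTable table, int iteration) {
        table.columns.add("prediction#" + iteration);
        table.columns.add("confidence(first)#" + iteration);
        table.columns.add("confidence(last)#" + iteration);
    }

    /** Every iteration writes into the same table, so columns pile up. */
    static int runOnSharedTable(ToyTable base, int iterations) {
        for (int i = 0; i < iterations; i++) {
            applyModel(base, i);
        }
        return base.columns.size();
    }

    /** Every iteration works on a fresh copy that is dropped afterwards. */
    static int runOnCopies(ToyTable base, int iterations) {
        for (int i = 0; i < iterations; i++) {
            ToyTable scratch = base.copy();
            applyModel(scratch, i);
            // scratch becomes unreachable here and can be garbage collected
        }
        return base.columns.size();
    }

    /** Every iteration removes the columns it added before it ends. */
    static int runWithCleanup(ToyTable base, int iterations) {
        for (int i = 0; i < iterations; i++) {
            int before = base.columns.size();
            applyModel(base, i);
            base.columns.subList(before, base.columns.size()).clear();
        }
        return base.columns.size();
    }

    public static void main(String[] args) {
        List<String> start = List.of("att1", "att2", "label");
        System.out.println(runOnSharedTable(new ToyTable(start), 1000)); // 3003
        System.out.println(runOnCopies(new ToyTable(start), 1000));      // 3
        System.out.println(runWithCleanup(new ToyTable(start), 1000));   // 3
    }
}
```

    In the toy model the copy variant trades extra copying work per iteration for a base table that never grows, while the cleanup variant avoids the copy but has to undo exactly what the iteration added.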
  • MuehliMan Member Posts: 85 Maven
    Thank you for the detailed explanation.
    This script sounds very interesting. If you rebuild it for some reason, please save and post it. Many users would be grateful for that.

    Thanks,
    Markus