"[Solved] Memory used up with loop function"

olioli Member Posts: 6 Contributor II
edited June 2019 in Help
Hi,

I am having an issue with the amount of memory the process below uses. My computer is fairly low-spec, with only about 2 GB of spare memory, so I know I am limited.

In my data set I want to go through about 14,000 examples, which use about 50,000 lines of raw data, to find the K-NN.

At the moment I can only process batches of about 250 examples before I run out of memory. I have looked around the forum and tried a few different things, but nothing seems to reduce the memory use.

I am a little unsure why it uses so much memory. Once the loop has produced the K-NN prediction for one example and stored the result so that it can be written to the Excel file, it should be able to discard everything else, such as the model.

Any help pointing me in the right direction to read up on this would be much appreciated.

Thanks,

Oli
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="">
   <process expanded="true">
     <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve Name250" width="90" x="45" y="165">
       <parameter key="repository_entry" value="data/Name250"/>
     </operator>
     <operator activated="true" class="loop_values" compatibility="5.3.008" expanded="true" height="76" name="Loop Values" width="90" x="179" y="210">
       <parameter key="attribute" value="NAME"/>
       <process expanded="true">
         <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve data aw (2)" width="90" x="45" y="435">
           <parameter key="repository_entry" value="data/data aw"/>
         </operator>
         <operator activated="true" class="filter_examples" compatibility="5.3.008" expanded="true" height="76" name="Filter Examples (2)" width="90" x="45" y="255">
           <parameter key="condition_class" value="attribute_value_filter"/>
           <parameter key="parameter_string" value="NAME=%{loop_value}"/>
         </operator>
         <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve predict" width="90" x="313" y="300">
           <parameter key="repository_entry" value="data/predict"/>
         </operator>
         <operator activated="true" class="filter_examples" compatibility="5.3.008" expanded="true" height="76" name="Filter Examples (3)" width="90" x="313" y="165">
           <parameter key="condition_class" value="attribute_value_filter"/>
           <parameter key="parameter_string" value="NAME=%{loop_value}"/>
         </operator>
         <operator activated="true" class="k_nn" compatibility="5.3.008" expanded="true" height="76" name="k-NN (2)" width="90" x="315" y="30"/>
         <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model (2)" width="90" x="450" y="30">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="performance" compatibility="5.3.008" expanded="true" height="76" name="Performance (2)" width="90" x="585" y="30"/>
         <connect from_op="Retrieve data aw (2)" from_port="output" to_op="Filter Examples (2)" to_port="example set input"/>
         <connect from_op="Filter Examples (2)" from_port="example set output" to_op="k-NN (2)" to_port="training set"/>
         <connect from_op="Retrieve predict" from_port="output" to_op="Filter Examples (3)" to_port="example set input"/>
         <connect from_op="Filter Examples (3)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
         <connect from_op="k-NN (2)" from_port="model" to_op="Apply Model (2)" to_port="model"/>
         <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
         <connect from_op="Performance (2)" from_port="example set" to_port="out 1"/>
         <portSpacing port="source_example set" spacing="0"/>
         <portSpacing port="sink_out 1" spacing="0"/>
         <portSpacing port="sink_out 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="append" compatibility="5.3.008" expanded="true" height="76" name="Append" width="90" x="313" y="210"/>
     <operator activated="true" class="write_excel" compatibility="5.3.008" expanded="true" height="76" name="Write Excel" width="90" x="447" y="255">
       <parameter key="excel_file" value="C:\Users\Oliver\Documents\Gambling\Dump\write test.xlsx"/>
       <parameter key="file_format" value="xlsx"/>
     </operator>
     <connect from_op="Retrieve Name250" from_port="output" to_op="Loop Values" to_port="example set"/>
     <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
     <connect from_op="Append" from_port="merged set" to_op="Write Excel" to_port="input"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • olioli Member Posts: 6 Contributor II
    Hi,

    I think I found a solution for this by using the Free Memory and Materialize Data operators. Using them seems to keep memory use low. I ran a few tests to check that the data did not change, and my sample seemed to be OK.

    I have pasted the process below; if anyone sees any issues, I would be interested to know.

    Thanks,

    Oli
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="">
        <process expanded="true">
          <operator activated="true" class="free_memory" compatibility="5.3.008" expanded="true" height="60" name="Free Memory" width="90" x="179" y="75"/>
          <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve Names2" width="90" x="45" y="345">
            <parameter key="repository_entry" value="data/Names2"/>
          </operator>
          <operator activated="true" class="loop_values" compatibility="5.3.008" expanded="true" height="76" name="Loop Values" width="90" x="179" y="210">
            <parameter key="attribute" value="NAME"/>
            <process expanded="true">
              <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve data aw (2)" width="90" x="45" y="435">
                <parameter key="repository_entry" value="data/data aw"/>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="5.3.008" expanded="true" height="76" name="Filter Examples (2)" width="90" x="45" y="255">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="NAME=%{loop_value}"/>
              </operator>
              <operator activated="true" class="free_memory" compatibility="5.3.008" expanded="true" height="60" name="Free Memory (2)" width="90" x="112" y="75"/>
              <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve predict" width="90" x="313" y="480">
                <parameter key="repository_entry" value="data/predict"/>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="5.3.008" expanded="true" height="76" name="Filter Examples (3)" width="90" x="246" y="300">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="NAME=%{loop_value}"/>
              </operator>
              <operator activated="true" class="k_nn" compatibility="5.3.008" expanded="true" height="76" name="k-NN (2)" width="90" x="315" y="30"/>
              <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model (2)" width="90" x="450" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.3.008" expanded="true" height="76" name="Performance (2)" width="90" x="585" y="30"/>
              <operator activated="true" class="materialize_data" compatibility="5.3.008" expanded="true" height="76" name="Materialize Data" width="90" x="514" y="210"/>
              <connect from_op="Retrieve data aw (2)" from_port="output" to_op="Filter Examples (2)" to_port="example set input"/>
              <connect from_op="Filter Examples (2)" from_port="example set output" to_op="k-NN (2)" to_port="training set"/>
              <connect from_op="Retrieve predict" from_port="output" to_op="Filter Examples (3)" to_port="example set input"/>
              <connect from_op="Filter Examples (3)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="k-NN (2)" from_port="model" to_op="Apply Model (2)" to_port="model"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="example set" to_op="Materialize Data" to_port="example set input"/>
              <connect from_op="Materialize Data" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="5.3.008" expanded="true" height="76" name="Append" width="90" x="380" y="165"/>
          <operator activated="true" class="write_csv" compatibility="5.3.008" expanded="true" height="76" name="Write CSV" width="90" x="447" y="300">
            <parameter key="csv_file" value="C:\Users\Oliver\Documents\Gambling\Dump\write test.CSV"/>
            <parameter key="column_separator" value=","/>
          </operator>
          <connect from_op="Retrieve Names2" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_op="Write CSV" to_port="input"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
    </process>
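
    For anyone comparing the two processes, the change inside Loop Values comes down to the excerpt below, copied from the process above. It is a fragment rather than a complete process, and the comments are a reading of what it does, not a confirmed explanation: the assumption is that Materialize Data hands a fresh copy of each iteration's result to the loop output, so the result no longer keeps the larger retrieved data sets referenced.

    ```xml
    <!-- Excerpt only: the two operators added inside the Loop Values subprocess -->
    <operator activated="true" class="free_memory" compatibility="5.3.008" expanded="true" height="60" name="Free Memory (2)" width="90" x="112" y="75"/>
    <operator activated="true" class="materialize_data" compatibility="5.3.008" expanded="true" height="76" name="Materialize Data" width="90" x="514" y="210"/>
    <!-- Performance now feeds Materialize Data, which in turn feeds the loop output port -->
    <connect from_op="Performance (2)" from_port="example set" to_op="Materialize Data" to_port="example set input"/>
    <connect from_op="Materialize Data" from_port="example set output" to_port="out 1"/>
    ```

    The other differences are a second Free Memory operator at the top level, a different name list (Names2 instead of Name250) and Write CSV in place of Write Excel, so it is not certain from this thread alone which change accounts for the lower memory use.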