[SOLVED] Which is better for memory management when dealing with Collections

JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 564   Unicorn
edited November 2018 in Help
Hello,

Does anyone know which approach is better for memory management? I have a loop that generates a collection inside another loop, so the final output is several collections.

I need to append these collections together into a final example set, and I wonder whether anyone has an opinion on which is better for memory management: adding an Append operator once (just before the final output) or twice (just outside the nested loop and again just before the final output).
Below is a small example showing what I mean.

In my final model I've gone with the first option, as in my mind it seems better to join lots of small bits together at the end than to join small bits first and then medium-sized bits. However, I'm happy to be corrected.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="generate_data" compatibility="5.3.015" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
     <operator activated="true" class="multiply" compatibility="5.3.015" expanded="true" height="94" name="Multiply" width="90" x="45" y="120"/>
     <operator activated="true" class="subprocess" compatibility="5.3.015" expanded="true" height="76" name="Option 1" width="90" x="179" y="30">
       <process expanded="true">
         <operator activated="true" class="generate_attributes" compatibility="5.3.015" expanded="true" height="76" name="Generate Attributes (3)" width="90" x="45" y="120">
           <list key="function_descriptions">
             <parameter key="Option" value="&quot;Option 1&quot;"/>
           </list>
         </operator>
         <operator activated="true" class="loop" compatibility="5.3.015" expanded="true" height="76" name="Loop" width="90" x="112" y="30">
           <parameter key="set_iteration_macro" value="true"/>
           <parameter key="iterations" value="20"/>
           <process expanded="true">
             <operator activated="true" class="generate_attributes" compatibility="5.3.015" expanded="true" height="76" name="Generate Attributes" width="90" x="112" y="30">
               <list key="function_descriptions">
                 <parameter key="1stLoopID" value="%{iteration}"/>
               </list>
             </operator>
             <operator activated="true" class="loop" compatibility="5.3.015" expanded="true" height="76" name="NestedLoop" width="90" x="246" y="30">
               <parameter key="set_iteration_macro" value="true"/>
               <parameter key="macro_name" value="nestediteration"/>
               <parameter key="iterations" value="10"/>
               <process expanded="true">
                 <operator activated="true" class="generate_attributes" compatibility="5.3.015" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="112" y="75">
                   <list key="function_descriptions">
                     <parameter key="NestedLoopID" value="%{nestediteration}"/>
                   </list>
                 </operator>
                 <connect from_port="input 1" to_op="Generate Attributes (2)" to_port="example set input"/>
                 <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="output 1"/>
                 <portSpacing port="source_input 1" spacing="0"/>
                 <portSpacing port="source_input 2" spacing="0"/>
                 <portSpacing port="sink_output 1" spacing="0"/>
                 <portSpacing port="sink_output 2" spacing="0"/>
               </process>
             </operator>
             <connect from_port="input 1" to_op="Generate Attributes" to_port="example set input"/>
             <connect from_op="Generate Attributes" from_port="example set output" to_op="NestedLoop" to_port="input 1"/>
             <connect from_op="NestedLoop" from_port="output 1" to_port="output 1"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_output 1" spacing="0"/>
             <portSpacing port="sink_output 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="append" compatibility="5.3.015" expanded="true" height="76" name="Option 1 Append" width="90" x="246" y="30"/>
         <connect from_port="in 1" to_op="Generate Attributes (3)" to_port="example set input"/>
         <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Loop" to_port="input 1"/>
         <connect from_op="Loop" from_port="output 1" to_op="Option 1 Append" to_port="example set 1"/>
         <connect from_op="Option 1 Append" from_port="merged set" to_port="out 1"/>
         <portSpacing port="source_in 1" spacing="0"/>
         <portSpacing port="source_in 2" spacing="0"/>
         <portSpacing port="sink_out 1" spacing="0"/>
         <portSpacing port="sink_out 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="subprocess" compatibility="5.3.015" expanded="true" height="76" name="Option 2" width="90" x="179" y="120">
       <process expanded="true">
         <operator activated="true" class="generate_attributes" compatibility="5.3.015" expanded="true" height="76" name="Generate Attributes (4)" width="90" x="45" y="30">
           <list key="function_descriptions">
             <parameter key="Option" value="&quot;Option 2&quot;"/>
           </list>
         </operator>
         <operator activated="true" class="loop" compatibility="5.3.015" expanded="true" height="76" name="Loop (2)" width="90" x="180" y="30">
           <parameter key="set_iteration_macro" value="true"/>
           <parameter key="iterations" value="20"/>
           <process expanded="true">
             <operator activated="true" class="generate_attributes" compatibility="5.3.015" expanded="true" height="76" name="Generate Attributes (5)" width="90" x="45" y="30">
               <list key="function_descriptions">
                 <parameter key="1stLoopID" value="%{iteration}"/>
               </list>
             </operator>
             <operator activated="true" class="loop" compatibility="5.3.015" expanded="true" height="76" name="NestedLoop (2)" width="90" x="248" y="30">
               <parameter key="set_iteration_macro" value="true"/>
               <parameter key="macro_name" value="nestediteration"/>
               <parameter key="iterations" value="10"/>
               <process expanded="true">
                 <operator activated="true" class="generate_attributes" compatibility="5.3.015" expanded="true" name="Generate Attributes (6)">
                   <list key="function_descriptions">
                     <parameter key="NestedLoopID" value="%{nestediteration}"/>
                   </list>
                 </operator>
                 <connect from_port="input 1" to_op="Generate Attributes (6)" to_port="example set input"/>
                 <connect from_op="Generate Attributes (6)" from_port="example set output" to_port="output 1"/>
                 <portSpacing port="source_input 1" spacing="0"/>
                 <portSpacing port="source_input 2" spacing="0"/>
                 <portSpacing port="sink_output 1" spacing="0"/>
                 <portSpacing port="sink_output 2" spacing="0"/>
               </process>
             </operator>
             <operator activated="true" class="append" compatibility="5.3.015" expanded="true" height="76" name="Append (3)" width="90" x="246" y="120"/>
             <connect from_port="input 1" to_op="Generate Attributes (5)" to_port="example set input"/>
             <connect from_op="Generate Attributes (5)" from_port="example set output" to_op="NestedLoop (2)" to_port="input 1"/>
             <connect from_op="NestedLoop (2)" from_port="output 1" to_op="Append (3)" to_port="example set 1"/>
             <connect from_op="Append (3)" from_port="merged set" to_port="output 1"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_output 1" spacing="0"/>
             <portSpacing port="sink_output 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="append" compatibility="5.3.015" expanded="true" height="76" name="Option 2 Append" width="90" x="306" y="30"/>
         <connect from_port="in 1" to_op="Generate Attributes (4)" to_port="example set input"/>
         <connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Loop (2)" to_port="input 1"/>
         <connect from_op="Loop (2)" from_port="output 1" to_op="Option 2 Append" to_port="example set 1"/>
         <connect from_op="Option 2 Append" from_port="merged set" to_port="out 1"/>
         <portSpacing port="source_in 1" spacing="0"/>
         <portSpacing port="source_in 2" spacing="0"/>
         <portSpacing port="sink_out 1" spacing="0"/>
         <portSpacing port="sink_out 2" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Generate Data" from_port="output" to_op="Multiply" to_port="input"/>
     <connect from_op="Multiply" from_port="output 1" to_op="Option 1" to_port="in 1"/>
     <connect from_op="Multiply" from_port="output 2" to_op="Option 2" to_port="in 1"/>
     <connect from_op="Option 1" from_port="out 1" to_port="result 1"/>
     <connect from_op="Option 2" from_port="out 1" to_port="result 2"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • Marco_Boeck Team Lead Software Engineering Administrator, Moderator, Employee, Member, University Professor Posts: 1,850   RM Engineering
    Hi,

    When using option 1, all of the small example sets to be merged are in memory at the same time. With option 2, the small sets and the medium-sized sets are in memory at the same time, so I'd say the memory usage of both variants is almost identical. The Java garbage collector should not be too bothered either way. However, option 1 runs the Append operator once, while option 2 runs it 21 times (once per outer-loop iteration, 20 in total, plus the final Append). Neither should produce considerable memory overhead, but option 2 will take longer because of all the checks inside the operator that make sure the example sets are compatible.
    To sum it up: Option 1 is the better one in my opinion because it's faster and it reduces process complexity.
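    To make the counting concrete, here is a minimal Python sketch of the two layouts. It is not RapidMiner code: `append`, `make_set`, `option_1` and `option_2` are hypothetical stand-ins, and the "one compatibility check per input set" cost model is an assumption for illustration only, not how the Append operator is actually implemented. The loop sizes (20 outer, 10 nested) are taken from the example process.

```python
def append(sets, stats):
    """Toy stand-in for an append step: concatenates rows and counts the work done.

    Assumption: one call per invocation and one schema check per input set.
    """
    stats["calls"] += 1
    schema = sets[0]["schema"]
    rows = []
    for s in sets:
        stats["checks"] += 1
        if s["schema"] != schema:
            raise ValueError("incompatible example sets")
        rows.extend(s["rows"])
    return {"schema": schema, "rows": rows}


def make_set(outer, inner):
    """One small 'example set' as produced inside the nested loop."""
    return {"schema": ("1stLoopID", "NestedLoopID"), "rows": [(outer, inner)]}


def option_1(outer_n=20, inner_n=10):
    """Collect every small set, then append once at the very end."""
    stats = {"calls": 0, "checks": 0}
    small_sets = [make_set(o, i)
                  for o in range(1, outer_n + 1)
                  for i in range(1, inner_n + 1)]
    return append(small_sets, stats), stats


def option_2(outer_n=20, inner_n=10):
    """Append after each nested loop, then append the medium sets at the end."""
    stats = {"calls": 0, "checks": 0}
    medium_sets = []
    for o in range(1, outer_n + 1):
        small_sets = [make_set(o, i) for i in range(1, inner_n + 1)]
        medium_sets.append(append(small_sets, stats))
    return append(medium_sets, stats), stats


if __name__ == "__main__":
    result_1, stats_1 = option_1()
    result_2, stats_2 = option_2()
    print(stats_1)  # {'calls': 1, 'checks': 200}
    print(stats_2)  # {'calls': 21, 'checks': 220}
    print(result_1 == result_2)  # True
```

    Under these assumptions both layouts produce the same 200 rows, but the second one invokes the append step 21 times instead of once and repeats the compatibility checks on the intermediate sets.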

    Regards,
    Marco
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 564   Unicorn
    Cheers!