Free memory operator does not work

seshadotcom Member Posts: 33 Contributor II
edited November 2018 in Help
Folks,

I finally added logging for every database read I make. According to the timings in my log file, the Free Memory operator passes through very quickly: not even two seconds are spent in it, yet memory keeps rising until it reaches a peak. I don't think it works correctly. I am stuck with memory issues while running my experiments and need your advice on a workaround :(. I cannot run FP-Growth on one set and then on the other, because ultimately I need the combined dataset for association rule generation, which is again a problem for a huge data set. I have also tried loading the data from CSV, but it does not work :(

I love the RapidMiner tool as a wonderful idea, but the memory issues :( God

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Can you please post your process setup?

    Best regards,
    Marius
  • seshadotcom Member Posts: 33 Contributor II
    Hi Marius,

    Here is the process. Let me give you the basic structure of my plan. I join two tables at a time, then use the result for the next join, and so on. I realized that memory was hitting a peak when RapidMiner reads from one of the tables, so I use Free Memory after every Join block. The problem I see is that it does not free all the used memory; instead, the system freezes and bogs down even though Free Memory is used.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
       <parameter key="logverbosity" value="init"/>
       <parameter key="random_seed" value="2001"/>
       <parameter key="send_mail" value="never"/>
       <parameter key="notification_email" value=""/>
       <parameter key="process_duration_for_mail" value="30"/>
       <parameter key="encoding" value="SYSTEM"/>
       <parameter key="parallelize_main_process" value="false"/>
       <process expanded="true">
         <operator activated="true" class="read_database" compatibility="5.3.008" expanded="true" height="60" name="Transaction_Join1" width="90" x="45" y="30">
           <parameter key="define_connection" value="predefined"/>
           <parameter key="connection" value="Test"/>
           <parameter key="database_system" value="MySQL"/>
           <parameter key="define_query" value="query"/>
           <parameter key="query" value="SELECT * FROM `transaction_mapping` where transaction_mapping.diffbwddrd&gt;2 AND transaction_mapping.delivery_counter is not NULL limit 3000;"/>
           <parameter key="use_default_schema" value="true"/>
           <parameter key="prepare_statement" value="false"/>
           <enumeration key="parameters"/>
           <parameter key="datamanagement" value="double_array"/>
         </operator>
         <operator activated="true" class="read_database" compatibility="5.3.008" expanded="true" height="60" name="order_header" width="90" x="45" y="300">
           <parameter key="define_connection" value="predefined"/>
           <parameter key="connection" value="Test"/>
           <parameter key="database_system" value="MySQL"/>
           <parameter key="define_query" value="query"/>
           <parameter key="query" value="select * from order_header_mapping limit 3000;"/>
           <parameter key="use_default_schema" value="true"/>
           <parameter key="prepare_statement" value="false"/>
           <enumeration key="parameters"/>
           <parameter key="datamanagement" value="double_array"/>
         </operator>
         <operator activated="true" class="join" compatibility="5.3.008" expanded="true" height="76" name="Join_T_OH" width="90" x="112" y="165">
           <parameter key="remove_double_attributes" value="true"/>
           <parameter key="join_type" value="left"/>
           <parameter key="use_id_attribute_as_key" value="false"/>
           <list key="key_attributes">
             <parameter key="id_order_header" value="id_order_header"/>
           </list>
           <parameter key="keep_both_join_attributes" value="false"/>
         </operator>
         <operator activated="true" class="free_memory" compatibility="5.3.008" expanded="true" height="76" name="Free Memory" width="90" x="246" y="75"/>
         <operator activated="true" class="read_database" compatibility="5.3.008" expanded="true" height="60" name="order_line" width="90" x="112" y="435">
           <parameter key="define_connection" value="predefined"/>
           <parameter key="connection" value="Test"/>
           <parameter key="database_system" value="MySQL"/>
           <parameter key="define_query" value="query"/>
           <parameter key="query" value="SELECT *&#10;FROM order_line_mapping limit 3000;"/>
           <parameter key="use_default_schema" value="true"/>
           <parameter key="prepare_statement" value="false"/>
           <enumeration key="parameters"/>
           <parameter key="datamanagement" value="double_array"/>
         </operator>
         <operator activated="true" class="join" compatibility="5.3.008" expanded="true" height="76" name="Join" width="90" x="313" y="255">
           <parameter key="remove_double_attributes" value="true"/>
           <parameter key="join_type" value="right"/>
           <parameter key="use_id_attribute_as_key" value="false"/>
           <list key="key_attributes">
             <parameter key="id_order_line" value="id_order_line"/>
           </list>
           <parameter key="keep_both_join_attributes" value="false"/>
         </operator>
         <operator activated="true" class="free_memory" compatibility="5.3.008" expanded="true" height="76" name="Free Memory (2)" width="90" x="514" y="75"/>
         <operator activated="true" class="read_database" compatibility="5.3.008" expanded="true" height="60" name="Read State" width="90" x="112" y="570">
           <parameter key="define_connection" value="predefined"/>
           <parameter key="connection" value="Test"/>
           <parameter key="database_system" value="MySQL"/>
           <parameter key="define_query" value="query"/>
           <parameter key="query" value="select * from state_mapping limit 10000;"/>
           <parameter key="use_default_schema" value="true"/>
           <parameter key="prepare_statement" value="false"/>
           <enumeration key="parameters"/>
           <parameter key="datamanagement" value="double_array"/>
         </operator>
         <operator activated="true" class="join" compatibility="5.3.008" expanded="true" height="76" name="Join (2)" width="90" x="313" y="525">
           <parameter key="remove_double_attributes" value="true"/>
           <parameter key="join_type" value="right"/>
           <parameter key="use_id_attribute_as_key" value="false"/>
           <list key="key_attributes">
             <parameter key="id_state" value="id_state"/>
           </list>
           <parameter key="keep_both_join_attributes" value="false"/>
         </operator>
         <operator activated="true" breakpoints="after" class="write_csv" compatibility="5.3.008" expanded="true" height="76" name="Write CSV" width="90" x="514" y="525">
           <parameter key="csv_file" value="M:\Work\1.csv"/>
           <parameter key="column_separator" value=";"/>
           <parameter key="write_attribute_names" value="true"/>
           <parameter key="quote_nominal_values" value="true"/>
           <parameter key="format_date_attributes" value="true"/>
           <parameter key="append_to_file" value="false"/>
           <parameter key="encoding" value="SYSTEM"/>
         </operator>
         <operator activated="true" class="set_role" compatibility="5.3.008" expanded="true" height="76" name="Set Role ID" width="90" x="581" y="390">
           <parameter key="attribute_name" value="line_type"/>
           <parameter key="target_role" value="id"/>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="set_role" compatibility="5.3.008" expanded="true" height="76" name="Set Role Label" width="90" x="648" y="210">
           <parameter key="attribute_name" value="id_transaction"/>
           <parameter key="target_role" value="label"/>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="715" y="480">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attribute" value="delivery_status"/>
           <parameter key="attributes" value="|counter_line|current_price4unit|current_quantity|delivery_qty|diffbwddrd|diffbwsdrd|diffbwsugqtyreqqty|diffbwtdrd|id_order_line|id_order_header|id_network|id_modifier|id_manem_doctype|price_unit|payment|priority|received_qty|id_assegnee|id_supplier|id_transaction|issued_price4unit|issued_quantity|new_suggested_qty|new_suggested_price|new_requested_qty|new_requested_price|order_number|total_delivered|transport_doc_code|special_mark|id_order_type|id_icon|i_name"/>
           <parameter key="use_except_expression" value="false"/>
           <parameter key="value_type" value="attribute_value"/>
           <parameter key="use_value_type_exception" value="false"/>
           <parameter key="except_value_type" value="time"/>
           <parameter key="block_type" value="attribute_block"/>
           <parameter key="use_block_type_exception" value="false"/>
           <parameter key="except_block_type" value="value_matrix_row_start"/>
           <parameter key="invert_selection" value="false"/>
           <parameter key="include_special_attributes" value="false"/>
         </operator>
         <operator activated="true" class="numerical_to_binominal" compatibility="5.3.008" expanded="true" height="76" name="Numerical to Binominal" width="90" x="916" y="435">
           <parameter key="attribute_filter_type" value="all"/>
           <parameter key="attribute" value=""/>
           <parameter key="attributes" value=""/>
           <parameter key="use_except_expression" value="false"/>
           <parameter key="value_type" value="numeric"/>
           <parameter key="use_value_type_exception" value="false"/>
           <parameter key="except_value_type" value="real"/>
           <parameter key="block_type" value="value_series"/>
           <parameter key="use_block_type_exception" value="false"/>
           <parameter key="except_block_type" value="value_series_end"/>
           <parameter key="invert_selection" value="false"/>
           <parameter key="include_special_attributes" value="false"/>
           <parameter key="min" value="0.0"/>
           <parameter key="max" value="0.0"/>
         </operator>
         <operator activated="true" class="fp_growth" compatibility="5.3.008" expanded="true" height="76" name="FP-Growth" width="90" x="916" y="300">
           <parameter key="find_min_number_of_itemsets" value="true"/>
           <parameter key="min_number_of_itemsets" value="1000"/>
           <parameter key="max_number_of_retries" value="15"/>
           <parameter key="min_support" value="0.54"/>
           <parameter key="max_items" value="-1"/>
           <parameter key="keep_example_set" value="false"/>
         </operator>
         <operator activated="true" class="free_memory" compatibility="5.3.008" expanded="true" height="76" name="Free Memory (3)" width="90" x="916" y="165"/>
         <operator activated="true" class="create_association_rules" compatibility="5.3.008" expanded="true" height="76" name="Create Association Rules" width="90" x="983" y="30">
           <parameter key="criterion" value="laplace"/>
           <parameter key="min_confidence" value="0.3"/>
           <parameter key="min_criterion_value" value="0.2"/>
           <parameter key="gain_theta" value="0.4"/>
           <parameter key="laplace_k" value="1.0"/>
         </operator>
         <connect from_op="Transaction_Join1" from_port="output" to_op="Join_T_OH" to_port="left"/>
         <connect from_op="order_header" from_port="output" to_op="Join_T_OH" to_port="right"/>
         <connect from_op="Join_T_OH" from_port="join" to_op="Free Memory" to_port="through 1"/>
         <connect from_op="Free Memory" from_port="through 1" to_op="Join" to_port="left"/>
         <connect from_op="order_line" from_port="output" to_op="Join" to_port="right"/>
         <connect from_op="Join" from_port="join" to_op="Free Memory (2)" to_port="through 1"/>
         <connect from_op="Free Memory (2)" from_port="through 1" to_op="Join (2)" to_port="left"/>
         <connect from_op="Read State" from_port="output" to_op="Join (2)" to_port="right"/>
         <connect from_op="Join (2)" from_port="join" to_op="Write CSV" to_port="input"/>
         <connect from_op="Write CSV" from_port="through" to_op="Set Role ID" to_port="example set input"/>
         <connect from_op="Set Role ID" from_port="example set output" to_op="Set Role Label" to_port="example set input"/>
         <connect from_op="Set Role Label" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
         <connect from_op="Select Attributes" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
         <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
         <connect from_op="FP-Growth" from_port="frequent sets" to_op="Free Memory (3)" to_port="through 1"/>
         <connect from_op="Free Memory (3)" from_port="through 1" to_op="Create Association Rules" to_port="item sets"/>
         <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
  • haddock Member Posts: 849 Maven
    Hi there,

    It could be that the association rules operator is causing the problem, so I'd suggest putting a breakpoint after the FP-Growth operator. If it runs that far, then your memory problems are probably the same as those discussed in this post: http://rapid-i.com/rapidforum/index.php/topic,6837.0.html

    Best

    H
  • seshadotcom Member Posts: 33 Contributor II
    Hello Haddock,

    Thanks for your reply. I already tried this, and I get the FP-Growth frequent item sets, with true for the attributes. It is the association rule operator that gives the memory error.

    But what solution could I try? So it does not work with a growing number of attributes? In future my requirement might be 43-45 attributes.
  • seshadotcom Member Posts: 33 Contributor II
    By the way, Haddock/Marius, if you look at my process, I have used Free Memory after every block that I thought consumes memory. So even when it reaches the point where I evaluate the association rules (with CreateAssociationRules), there is a Free Memory before it. I am just trying to understand whether this operator clears any used memory at all, because the memory consumption does not go down. In fact, if you look at my queries, I have restricted the process far below what it should handle by using LIMIT in my SQL queries. Without LIMIT the data set may be close to 1 million rows, in which case I think it will definitely fail, since it does not work even for this smaller set.
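A note on why a garbage-collection request can return almost instantly without lowering memory use: a collector can only reclaim objects that nothing references any more. The sketch below uses Python's collector as a stand-in for the JVM's. It rests on the assumption (not confirmed in this thread) that Free Memory essentially asks the runtime to collect garbage. The `ExampleSet` class is hypothetical and only mimics a joined table that is still wired to the next operator.

```python
import gc
import weakref


class ExampleSet:
    """Hypothetical stand-in for a joined table held in memory."""

    def __init__(self, n_rows):
        self.rows = [[0.0] * 10 for _ in range(n_rows)]


joined = ExampleSet(1000)
probe = weakref.ref(joined)  # lets us observe the object without keeping it alive

# The join result is still referenced (it feeds the next operator),
# so a collection request has nothing to reclaim here.
gc.collect()
still_alive_while_referenced = probe() is not None

# Only once the last reference is dropped can the memory be reclaimed.
# (CPython frees it immediately at `del`; the collect call is for illustration.)
del joined
gc.collect()
freed_after_release = probe() is None

print(still_alive_while_referenced, freed_after_release)
```

In the process above, every join result is passed through Free Memory into the next join, so under this assumption the data would still be in use at the moment the operator runs.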
  • haddock Member Posts: 849 Maven
    Hi there,

    It's not the number of examples or attributes in the example set that matters; it's the number of attributes in the itemsets that are found by FP-Growth. So an itemset att1=1 & att200=1 & att996=1 has 3 attributes, and one that had 43 attributes would choke the association rules operator. I regularly mine datasets with 1000+ attributes and 1M+ examples, and have to resort to alternative techniques (CUDA) for long itemsets.

    On the dark arts of the Java stack, heap, and trail I defer to others further up the pond life scale!
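To put a number on why long itemsets are the problem, here is a minimal sketch assuming the textbook way of deriving rules: every non-empty proper subset of a frequent itemset is a possible premise, which gives 2^k - 2 candidate rules for a single k-item itemset. How RapidMiner's operator enumerates candidates internally is not stated in this thread, and the function names below are made up for illustration.

```python
from itertools import combinations


def candidate_rule_count(k):
    """Number of premise -> conclusion splits of one k-item itemset."""
    return 2 ** k - 2 if k >= 2 else 0


def enumerate_rules(itemset):
    """Yield every (premise, conclusion) split of a small itemset."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for premise in combinations(items, size):
            conclusion = tuple(i for i in items if i not in premise)
            yield premise, conclusion


# The 3-attribute itemset from the post above.
small = {"att1", "att200", "att996"}
rules = list(enumerate_rules(small))
assert len(rules) == candidate_rule_count(3)

# Growth of the candidate count with itemset length.
for k in (3, 10, 20, 43):
    print(k, candidate_rule_count(k))
```

For the 3-item example this gives 6 candidate rules; for a single 43-item itemset it gives 8,796,093,022,206, which no heap setting will accommodate.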
  • seshadotcom Member Posts: 33 Contributor II
    Hello Haddock,

    Thank you very much for the reply. I will wait for the reply from others.
  • seshadotcom Member Posts: 33 Contributor II
    Hello folks,

    I was searching previous posts to see whether someone else has experienced a similar problem with the FP-Growth or association rule operators, and guess what: I found another post with a similar memory problem.

    Here is an excerpt from that post. I have also asked the user who replied that he has a workaround what he did.
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    But in the current release there is also a bug that prevents RapidMiner from freeing some of the memory, even if in theory it would be releasable. That bug has already been fixed and will be included in the next release.
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    Maybe I am not the only person who has this problem?
  • seshadotcom Member Posts: 33 Contributor II
    Hello folks, does anyone have any other ideas for resolving this issue?

    I also increased the Java heap space today on the new system and tried again, but no success :(
  • Mandar Member Posts: 8 Contributor II
    Sesha,

    I read your post. You mention that you are joining tables read with ReadDatabase and then using the combined data set in the CreateAssociationRules operator.
    I would suggest doing the join in the database and creating a view there, instead of joining in RapidMiner. The ReadDatabase operator loads the data set into main memory, so it consumes memory. In addition, you join and then generate a new data set, so memory accumulates again. Try using a single ReadDatabase operator on the table or view that contains your final data and then apply CreateAssociationRules. I believe the memory consumption will be lower, since you load the huge data set only once.
    You can also explore the StreamDatabase operator and see if it helps.

    Regards,
    Mandar
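A minimal sketch of the view approach, using Python's built-in sqlite3 as a stand-in for the MySQL connection from the process. The table and key names come from the posted process; which table carries which key column, the sample rows, and the view name `mining_input` are assumptions for illustration. Only LEFT JOINs are used here, whereas the posted process mixes left and right joins.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL connection "Test".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_mapping (id_transaction INTEGER, id_order_header INTEGER, id_order_line INTEGER);
CREATE TABLE order_header_mapping (id_order_header INTEGER, order_number TEXT);
CREATE TABLE order_line_mapping (id_order_line INTEGER, id_state INTEGER, delivery_qty REAL);
CREATE TABLE state_mapping (id_state INTEGER, delivery_status TEXT);

-- Made-up sample rows.
INSERT INTO transaction_mapping VALUES (1, 10, 100), (2, 11, 101);
INSERT INTO order_header_mapping VALUES (10, 'A-1'), (11, 'A-2');
INSERT INTO order_line_mapping VALUES (100, 5, 3.0), (101, 6, 7.5);
INSERT INTO state_mapping VALUES (5, 'delivered'), (6, 'pending');

-- The database does the joins; the client reads one finished result.
CREATE VIEW mining_input AS
SELECT t.id_transaction, oh.order_number, ol.delivery_qty, s.delivery_status
FROM transaction_mapping t
LEFT JOIN order_header_mapping oh ON oh.id_order_header = t.id_order_header
LEFT JOIN order_line_mapping ol ON ol.id_order_line = t.id_order_line
LEFT JOIN state_mapping s ON s.id_state = ol.id_state;
""")

# A single read replaces four reads plus three in-memory joins.
rows = conn.execute(
    "SELECT * FROM mining_input ORDER BY id_transaction"
).fetchall()
for row in rows:
    print(row)
conn.close()
```

Selecting only the columns needed for mining in the view, rather than `SELECT *`, also keeps intermediate attributes out of memory. Note that this reduces the loading cost only; it does not shorten the itemsets that reach the association rules step.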
  • seshadotcom Member Posts: 33 Contributor II
    Hi Mandar,

    I will try this today and let you know the outcome.

    Regards
    Sesha