"De-Pivot Memory issue"

jlabado Member Posts: 3 Contributor I
edited June 11 in Help

Hello Everyone,

 

I am running a process that needs two De-Pivot operators. The dataset is large (approx. 900k examples). The first De-Pivot runs fine, but the second one runs out of memory and crashes, even though my machine has 16 GB of RAM.

 

I know there should be a workaround using the Loop Batches operator (with the second De-Pivot placed inside it) and the Append operator, but I just can't get it to work. Could you please explain exactly how to set up these operators?


Thanks a lot for your help,

Best,

Best Answer

  • Telcontar120 Posts: 1,226   Unicorn
    Solution Accepted

    Without seeing your process or your data, it is difficult to give you an exact solution. But take a look at this example, which is a simple modification of the De-Pivot tutorial process. It loops through the examples one by one and de-pivots each, then at the end it takes all the individual de-pivoted examples and appends them into one combined dataset. Something like this should work for you (assuming your resulting dataset can in fact be held in memory).
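The loop-then-append idea can be sketched outside RapidMiner in plain Python. This is not the process from the post, only an illustration of the pattern: the function names (`depivot`, `batches`, `depivot_in_batches`) and the column names (`id`, `q1`, `q2`) are assumptions made up for the example.

```python
from itertools import islice


def depivot(rows, id_key, value_keys):
    """Turn each wide row into one long row per value column."""
    for row in rows:
        for key in value_keys:
            yield {id_key: row[id_key], "attribute": key, "value": row[key]}


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def depivot_in_batches(rows, id_key, value_keys, batch_size):
    """De-pivot one batch at a time and append the results together."""
    combined = []
    for chunk in batches(rows, batch_size):
        combined.extend(depivot(chunk, id_key, value_keys))
    return combined


# Hypothetical wide data: one row per id, one column per measurement.
wide = [
    {"id": 1, "q1": 10, "q2": 20},
    {"id": 2, "q1": 30, "q2": 40},
    {"id": 3, "q1": 50, "q2": 60},
]
long_rows = depivot_in_batches(wide, "id", ["q1", "q2"], batch_size=2)
```

As in the post, the combined result still has to fit in memory; batching only limits how much is being de-pivoted at any one time.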

     

    Regards,

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,226   Unicorn

    I would suggest that you do some testing first to determine the maximum number of records that you can run successfully through your two-step de-pivoting process. You should be able to do that manually by adding a "Filter Example Range" operator and trying increasingly large ranges until the process fails.
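The trial-and-error search for a workable size could look roughly like the Python sketch below. It assumes a hypothetical callable `try_size(n)` that runs the process on the first `n` examples and raises `MemoryError` when it fails; `fake_process` is only a stand-in with a made-up limit, not a measurement.

```python
def largest_working_size(try_size, start=1000, limit=1_000_000):
    """Double the sample size until try_size fails; return the last size that worked."""
    last_ok = 0
    size = start
    while size <= limit:
        try:
            try_size(size)
        except MemoryError:
            break
        last_ok = size
        size *= 2
    return last_ok


def fake_process(n):
    """Stand-in for the real process: pretends anything above 50,000 rows fails."""
    if n > 50_000:
        raise MemoryError(f"{n} rows do not fit")


best = largest_working_size(fake_process)
```

Doubling finds a safe size in a handful of runs; the true limit lies somewhere between the returned value and twice that value.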

     

    Once you know how many examples you can successfully process, you should be able to create a loop that goes through the examples in suitable chunks, completes the de-pivoting, and stores each result in a repository entry. All those repository entries can then be appended together as the last step.
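A rough Python sketch of the chunk, store, then append pattern, with CSV files in a temporary directory standing in for repository entries. The helper names, file names, and columns are assumptions for illustration only.

```python
import csv
import tempfile
from pathlib import Path


def depivot_chunk_to_file(chunk, id_key, value_keys, path):
    """De-pivot one chunk and store the long rows in their own file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in chunk:
            for key in value_keys:
                writer.writerow([row[id_key], key, row[key]])


def append_files(paths, out_path, header):
    """Stream every stored part into one combined file, one part at a time."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        for path in paths:
            with open(path, newline="") as f:
                writer.writerows(csv.reader(f))


# Hypothetical wide data and a chunk size found by earlier testing.
wide = [{"id": i, "q1": i * 10, "q2": i * 20} for i in range(1, 6)]
chunk_size = 2

workdir = Path(tempfile.mkdtemp())
parts = []
for start in range(0, len(wide), chunk_size):
    part = workdir / f"part_{start}.csv"
    depivot_chunk_to_file(wide[start:start + chunk_size], "id", ["q1", "q2"], part)
    parts.append(part)

combined = workdir / "combined.csv"
append_files(parts, combined, ["id", "attribute", "value"])

with open(combined, newline="") as f:
    result = list(csv.reader(f))
```

Because each part is written out before the next chunk is processed, only one chunk's de-pivoted rows are held in memory at a time.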

     

     

  • jlabado Member Posts: 3 Contributor I

    Hi Brian,

     

    Thanks a lot for your input. Would you please give me more detail on which operator to select (just "loop" or another one?) and how to connect it with the corresponding appends? Would you mind sharing an XML?


    Thanks once again,

  • jlabado Member Posts: 3 Contributor I

    Thanks a lot Brian, that solved my problem!!!

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,226   Unicorn

    @jlabado I am glad that solved your problem! You may want to mark that post with "accept as solution", so that it shows up as solved when other community members search for this or a similar topic.
