"Out of Memory when doing text classification without GUI"

Merlot Member Posts: 12 Contributor II
edited June 2019 in Help
Hi all,

I want to run some text classification tasks from a self-written Java program using RapidMiner. I have already trained an SVM classification model and stored it in the repository. In my Java application, I read IDs from a database; these point to the location on my HDD where the text data is stored. This data is passed to RapidMiner. To save memory, the classification is not done for all data at once. Instead, I process the data in blocks. This is basically my application:
public class ApplyModel {

    static String process_definition_file = "apply_model.xml";
    static int num_of_domains = 100000;
    static int block_size = 100; // determines the number of examples classified at once
    static Boolean debug = true;

    public static void main(String[] args) {

        System.out.println("START OF APPLY MODEL");

        try {

            // set RapidMiner confs
            RapidMiner.setExecutionMode(ExecutionMode.COMMAND_LINE);

            int start = 0;
            int iteration = 1;
            while(start < num_of_domains) {

                // init RapidMiner
                RapidMiner.init();

                // read process definition
                Process rm = new Process(new File(process_definition_file));

                // avoid to fetch block size if limit is smaller than block size
                int current_limit = block_size;
                if(num_of_domains < block_size)
                    current_limit = num_of_domains;

                // get data
                ImmutableList<RapidMiner2Row> data = [...]

                // transform to ExampleSet
                ExampleSet ex = new CData2ExampleSet().getExampleSet(data);

                // create IO Object
                IOObject ioo = ex;
                IOContainer ioc = new IOContainer(new IOObject[] {ioo});

                // run RapidMiner process
                IOContainer res_ioc = rm.run(ioc);

                // analyze results
                if(res_ioc.getElementAt(0) instanceof ExampleSet) {

                    ExampleSet resultSet = (ExampleSet)res_ioc.getElementAt(0);

                    // go through results
                    for (Example example : resultSet) {
                        [...]
                    }

                }

                start += current_limit;
                iteration++;

                // clean up
                cdata = null;
                data = null;
                ex = null;
                ioo = null;
                ioc = null;
                rm = null;

            } // end of while

        }
        catch(Exception e) {
            [...]
        }

        System.out.println("END OF APPLY MODEL");

    }

}
Although the RapidMiner process is re-created for every data block, I am running into an OutOfMemoryError (GC overhead limit exceeded). The memory problem depends on the total amount of data rather than on the block size: it makes only a small difference whether I run 100 iterations with 10 data sets or 10 iterations with 100 data sets. Does anyone have an idea?

Regards
Merlot
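
A side note on the block handling above: the check `if(num_of_domains < block_size)` compares the block size with the total, not with what is still left to process, so the last block is only sized correctly when the total is a multiple of the block size. How `start` and `current_limit` feed into the elided data query is not shown, so the following is only a minimal sketch of computing start/limit pairs from the remaining count. `BlockRanges` and `split` are hypothetical names, not part of RapidMiner or of the original program, and this is unlikely to be related to the memory problem itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of RapidMiner.
final class BlockRanges {

    /** Returns {start, limit} pairs covering [0, total) in blocks of at most blockSize. */
    static List<int[]> split(int total, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < total; start += blockSize) {
            // The last block may be shorter than blockSize.
            int limit = Math.min(blockSize, total - start);
            ranges.add(new int[] {start, limit});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 250 items in blocks of 100 gives limits 100, 100 and 50.
        for (int[] range : split(250, 100)) {
            System.out.println("start=" + range[0] + " limit=" + range[1]);
        }
    }
}
```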

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Merlot,

    Can you process the data from within RapidMiner's GUI? If so, you have probably assigned more memory to the GUI application than to your own program. You can set the maximum amount of memory available to the Java Virtual Machine with the -Xmx parameter, e.g. java -Xmx2048m to assign 2 GB of RAM.

    If you are using Eclipse, you can set that parameter in the run configuration of your project.

    Best regards,
    Marius
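
To confirm that an -Xmx setting actually reached the JVM running the program (for example when it is started from an IDE run configuration), the limit can be printed at startup. This is a minimal sketch using only the standard `Runtime` API; the class name `HeapLimit` is made up for illustration, and the reported value may be slightly lower than the -Xmx value depending on the garbage collector in use.

```java
// Hypothetical helper for illustration only.
final class HeapLimit {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

Running it with, say, `java -Xmx2048m HeapLimit` should report a value close to 2048.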
  • Merlot Member Posts: 12 Contributor II
    Hi Marius,

    Thanks for your advice. I haven't tried to run the process within RM's GUI yet, as my data is split into database values (id + label) and files on my HDD (textual content), and I would like to avoid implementing that "logic" in RM.

    I have already set the -Xms and -Xmx options in Eclipse to 4 GB. As far as I can see, this amount of memory is really in use. Is there a way to destroy RapidMiner objects explicitly (maybe within my while loop) to free all used space after processing each data block?

    Regards
    Merlot
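
One way to tell whether memory grows steadily from block to block (which would suggest that something still holds references to earlier data) or only spikes within a single block is to log heap usage once per iteration. The following is a rough sketch using only the standard `Runtime` API; `HeapUsage`, `usedBytes` and `report` are hypothetical names, and the byte array in `main` merely stands in for one block of data.

```java
// Hypothetical helper for illustration only.
final class HeapUsage {

    /** Heap currently occupied (including garbage not yet collected), in bytes. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One log line per block; compare the numbers across iterations. */
    static String report(int iteration) {
        long mib = 1024L * 1024L;
        long usedMiB = usedBytes() / mib;
        long maxMiB = Runtime.getRuntime().maxMemory() / mib;
        return "iteration " + iteration + ": used " + usedMiB + " MiB of " + maxMiB + " MiB";
    }

    public static void main(String[] args) {
        for (int iteration = 1; iteration <= 3; iteration++) {
            byte[] block = new byte[8 * 1024 * 1024]; // stand-in for one block of data
            block[0] = 1;
            System.out.println(report(iteration));
        }
    }
}
```

Because the used figure includes objects that are unreachable but not yet collected, a single reading says little; the trend over many iterations is the more useful signal.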
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    you can drop a hint to the garbage collector that now might be a good time to do some work via use of

    System.gc();
    However, that is not guaranteed to work.

    You could also try to use a dirty hack, though I would not advise using it:

    Object obj = new Object();
    WeakReference<Object> ref = new WeakReference<>(obj);
    obj = null;
    while(ref.get() != null) {
          System.gc();
    }
    Use at your own risk.

    And please remove the RapidMiner.init() from the loop and place it just after

    RapidMiner.setExecutionMode(ExecutionMode.COMMAND_LINE);
    Regards,
    Marco
  • Merlot Member Posts: 12 Contributor II
    Hi Marco,

    I have already tried calling System.gc(); at the end of the while loop. No effect. :-(

    I would like to avoid using your dirty hack because this code will be part of my thesis. So it looks like I'm stuck with my memory problem.

    Regards
    Merlot