"Out of Memory when doing text classification without GUI"
Hi all,
I want to run some text classification tasks from a self-written Java program using RapidMiner. I have already trained an SVM classification model and stored it in the repository. In my Java application, I read IDs from a database; each ID points to a file on my HDD where the text data is stored. This data is passed to RapidMiner. To save memory, the classification is not done for all data at once. Instead, I process the data in blocks. This is basically my application:
public class ApplyModel {
    static String process_definition_file = "apply_model.xml";
    static int num_of_domains = 100000;
    static int block_size = 100; // determines the number of examples classified at once
    static Boolean debug = true;

    public static void main(String[] args) {
        System.out.println("START OF APPLY MODEL");
        try {
            // set RapidMiner confs
            RapidMiner.setExecutionMode(ExecutionMode.COMMAND_LINE);
            int start = 0;
            int iteration = 1;
            while(start < num_of_domains) {
                // init RapidMiner
                RapidMiner.init();
                // read process definition
                Process rm = new Process(new File(process_definition_file));
                // do not fetch a full block if the total is smaller than the block size
                int current_limit = block_size;
                if(num_of_domains < block_size)
                    current_limit = num_of_domains;
                // get data
                ImmutableList<RapidMiner2Row> data = [...]
                // transform to ExampleSet
                ExampleSet ex = new CData2ExampleSet().getExampleSet(data);
                // create IO Object
                IOObject ioo = ex;
                IOContainer ioc = new IOContainer(new IOObject[] {ioo});
                // run RapidMiner process
                IOContainer res_ioc = rm.run(ioc);
                // analyze results
                if(res_ioc.getElementAt(0) instanceof ExampleSet) {
                    ExampleSet resultSet = (ExampleSet)res_ioc.getElementAt(0);
                    // go through results
                    for (Example example : resultSet) {
                        [...]
                    }
                }
                start += current_limit;
                iteration++;
                // clean up
                cdata = null;
                data = null;
                ex = null;
                ioo = null;
                ioc = null;
                rm = null;
            } // end of while
        }
        catch(Exception e) {
            [...]
        }
        System.out.println("END OF APPLY MODEL");
    }
}

Although the RapidMiner process is re-initialized for every data block, I am running into an OutOfMemoryError (GC overhead limit exceeded). The memory problem depends on the actual amount of data. It only makes a small difference whether I run 100 iterations with 10 data sets or 10 iterations with 100 data sets. Does anyone have an idea?
Regards
Merlot
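The block-wise iteration described in the post can be sketched on its own, without RapidMiner. Note that the posted loop only shortens a block when the total is smaller than the block size; the sketch below assumes the final block should also be shortened when the total is not a multiple of the block size. BlockPlanner, blockLimit and plan are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of RapidMiner: plans the blocks for block-wise processing.
final class BlockPlanner {

    /** Size of the block beginning at start; the last block may be shorter. */
    static int blockLimit(int total, int blockSize, int start) {
        return Math.min(blockSize, total - start);
    }

    /** Lists all blocks as {start, limit} pairs covering 0..total-1. */
    static List<int[]> plan(int total, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<int[]> blocks = new ArrayList<>();
        for (int start = 0; start < total; start += blockSize) {
            blocks.add(new int[] {start, blockLimit(total, blockSize, start)});
        }
        return blocks;
    }

    public static void main(String[] args) {
        // 250 items in blocks of 100: two full blocks and one block of 50
        for (int[] block : plan(250, 100)) {
            System.out.println("start=" + block[0] + " limit=" + block[1]);
        }
    }
}
```

With the values from the post (100000 items, block size 100) every block is full, so this only matters for totals that do not divide evenly.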
Answers
Can you process the data from within RapidMiner's GUI? If so, you probably assigned more memory to the GUI application than to your own program. You can set the maximum amount of memory available to the Java Virtual Machine with the -Xmx parameter, e.g. java -Xmx2048m to assign 2 GB of RAM.
If you are using Eclipse, you can set that parameter in the run configuration of your project.
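To check which limit the program actually received, the JVM can be asked directly. This is a minimal sketch using only the standard java.lang.Runtime API; the class name HeapLimits and the helper toMiB are made up for illustration. The reported maximum is usually close to the -Xmx value, though it can be slightly lower depending on the garbage collector.

```java
// Prints the heap limits of the running JVM, e.g. to confirm that -Xmx took effect.
final class HeapLimits {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to (roughly the -Xmx setting)
        System.out.println("max heap:             " + toMiB(rt.maxMemory()) + " MiB");
        // Heap currently reserved from the operating system
        System.out.println("currently reserved:   " + toMiB(rt.totalMemory()) + " MiB");
        // Unused part of the reserved heap
        System.out.println("free within reserved: " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Running it once from the command line and once from the Eclipse run configuration shows whether both launches use the same setting.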
Best regards,
Marius
Thanks for your advice. I haven't tried to run the process within RM's GUI yet, because my data is split between database values (id + label) and files on my HDD (textual content), and I would like to avoid implementing that "logic" in RM.
I have already set the -Xms and -Xmx options in Eclipse to 4 GB. As far as I can see, this amount of memory is really in use. Is there a way to destroy RapidMiner objects explicitly (maybe within my while loop) to free all used space after processing each data block?
Regards
Merlot
You can drop a hint to the garbage collector that now might be a good time to do some work by calling System.gc(). However, that is not guaranteed to work.
You could also try a dirty hack, though I would not advise using it: [...] Use at your own risk.
And please remove the RapidMiner.init() call from the loop and place it just after [...]
Regards,
Marco
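Whether a hint such as System.gc() can help at all depends on whether the data of a finished block is still referenced somewhere. One way to find out is to log the used heap after each block. The sketch below uses only java.lang.Runtime; HeapProbe, usedHeap and processBlock are hypothetical names, and processBlock is just a stand-in that allocates temporary data in place of the real classification step.

```java
import java.util.ArrayList;
import java.util.List;

// Logs the used heap after each block to see whether memory is released between iterations.
final class HeapProbe {

    /** Bytes of heap currently occupied (reserved heap minus its free part). */
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Stand-in for processing one block: allocates temporary data that becomes garbage on return. */
    static int processBlock(int size) {
        List<byte[]> scratch = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            scratch.add(new byte[1024]);
        }
        return scratch.size();
    }

    public static void main(String[] args) {
        for (int iteration = 1; iteration <= 5; iteration++) {
            processBlock(10_000);
            System.gc(); // only a hint; the JVM is free to ignore it
            System.out.println("iteration " + iteration + ": used after GC = "
                    + usedHeap() / (1024 * 1024) + " MiB");
        }
    }
}
```

If the logged value stays roughly flat, the blocks are being released. If it climbs with every iteration even after the hint, something still holds references to earlier blocks, and setting local variables to null inside the loop will not change that.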
I already tried calling System.gc(); at the end of the while loop. No effect. :-(
I would like to avoid using your dirty hack because this code will be part of my thesis. So it looks like I'm stuck with my memory problem.
Regards
Merlot