"Java Heap space"

laurab · November 2008

Hi,

I am trying to integrate RapidMiner with a Java application. I have used a W-MultilayerPerceptron, which I set up and trained in RapidMiner. The model file is 31.1 MB, and when the Java application tries to load/apply the model, the following occurs:

P Nov 11, 2008 10:27:51 AM: Process:
Root[1] (Process)
+- ModelLoader[1] (ModelLoader)
+- ModelApplier[0] (ModelApplier)
com.rapidminer.operator.UserError: Could not read file 'C:\Documents and Settings\laurab\My Documents\java NN\DemandForecastingNN\mlp model\EF_GeneralV2.mod': Cannot read from XML stream, wrong format: Java heap space.
at com.rapidminer.operator.io.ModelLoader.apply(ModelLoader.java:97)
at com.rapidminer.operator.Operator.apply(Operator.java:656)

The model loads with no problems in RapidMiner. I have increased the Java heap space to 1024 MB via the command line. The model file is saved as XML Zipped; I assume that's already the most compressed format? When I load a smaller model file into the Java application, it works with no problems.
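Since the model loads fine inside RapidMiner, one possible explanation (an assumption, not confirmed in this thread) is that the -Xmx flag reached a different JVM than the one running the application, for example RapidMiner's launcher or an IDE run configuration. A minimal sketch for checking what the application's JVM actually received, using the standard `Runtime.maxMemory()` call; the class name `HeapCheck` is hypothetical:

```java
// Reports the maximum heap the running JVM will use, so you can confirm
// that a flag such as -Xmx1024m actually reached this JVM.
public class HeapCheck {
    /** Returns the JVM's maximum heap size in mebibytes. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Run it with the same command line as the application, e.g. `java -Xmx1024m HeapCheck`. The reported value may be somewhat below the -Xmx figure, since some JVMs exclude part of the heap from `maxMemory()`.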

Thanks

Laura

land · November 2008

Hi Laura,
you could try the binary format. That should be the smallest. I'm not quite sure what WEKA saves within its models, but 31.1 MB for a neural net seems far from reasonable. It might be that they save the complete training data. You could avoid this by using the RapidMiner NeuralNet.
Just out of curiosity: what does your application predict? Perhaps another learner would suit your needs better...

Greetings,
Sebastian

Legacy User · November 2008

Hi Sebastian,

I am using it for sales forecasting of pharmaceuticals. I have a fairly large training dataset, so that would explain the size of the model file if it does save the training data as well.

I would be interested to hear if you know of another method that might be more suitable, because I was looking to combine two methods and at the moment cannot think of another approach that gets results as good.

Also, I would like to model scenarios, such as what happens to the sales of the expensive drugs over the next six months when a new, cheap drug becomes available. I don't know what would be the best approach for this problem, or even how to approach it at all. It is a similar problem to the normal sales forecasting in that both are provided with six months of sales figures as attributes. In the normal sales forecasting (not in training), the prediction is appended to the end of the attributes and is then used to get the next month's prediction. The scenario modelling seems a bit different, though, as it's triggered by an event; the previous prediction cannot really be added as a new dimension for the next prediction. It would be better if the method could forecast all six months at once instead of using a kind of sliding-window approach. If you have any suggestions, I would be really interested to hear them.
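The "normal" forecasting described above is recursive multi-step forecasting: a one-step model predicts the next month from the last six, the prediction is appended to the window, and the oldest month drops out. A minimal sketch, assuming a hypothetical `OneStepModel` interface as a stand-in for the trained learner (it is not a RapidMiner API):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class RecursiveForecast {
    /** Hypothetical stand-in for a trained one-step-ahead model. */
    interface OneStepModel {
        double predict(double[] lastMonths);
    }

    /**
     * Recursive multi-step forecasting: each prediction is appended to the
     * window and the oldest month is dropped, so the next step uses it as input.
     */
    static List<Double> forecast(OneStepModel model, double[] history, int windowSize, int horizon) {
        if (history.length < windowSize) {
            throw new IllegalArgumentException("history shorter than window");
        }
        // Seed the window with the most recent windowSize months.
        Deque<Double> window = new ArrayDeque<>();
        for (int i = history.length - windowSize; i < history.length; i++) {
            window.addLast(history[i]);
        }
        List<Double> predictions = new ArrayList<>();
        for (int step = 0; step < horizon; step++) {
            double[] input = new double[windowSize];
            int j = 0;
            for (double v : window) {
                input[j++] = v;
            }
            double next = model.predict(input);
            predictions.add(next);
            window.removeFirst();  // drop the oldest month
            window.addLast(next);  // feed the prediction back in
        }
        return predictions;
    }

    public static void main(String[] args) {
        // Toy model for illustration only: predicts the mean of the window.
        OneStepModel mean = w -> Arrays.stream(w).average().orElse(0.0);
        double[] sales = {100, 110, 120, 130, 140, 150};
        System.out.println(forecast(mean, sales, 6, 6));
    }
}
```

The alternative wished for above, forecasting all six months at once, is usually called direct multi-step forecasting: one model per horizon (month 1, month 2, ...) is trained on the same input window, so errors in early predictions are not fed back into later ones.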

Thanks

Laura

land · November 2008

Hi Laura,
a detailed suggestion would exceed the scope of this forum, so here are some short hints.
- For this kind of problem, the good old Linear Regression is always worth a look. Since it's very regular, it doesn't tend to overfit the training data the way fully grown neural nets do. Constructing new features like sin, exp or polynomials of the old features helps it fit the function well enough.
- Insert a trigger attribute for your event. Suppose you have training examples with such events (otherwise you can't learn anything anyway) and have created a dummy variable coded 0 before and 1 after the cheap drug appears. You could then change the labels and try to predict from past sales whether such an event has occurred. This might then be used for predicting the sales.
- A short legal hint: the Community Edition of RapidMiner is licensed under AGPL3; if you include it within your software, your software needs to be under AGPL3, too. I only mention it because some people tend to overlook this tiny little detail of open source.
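The first two hints can be combined in a small sketch: a linear regression whose features include a constructed polynomial term and a 0/1 trigger attribute for the event. This is a hand-rolled ordinary least squares fit via the normal equations, written only to illustrate the idea; it is not RapidMiner's LinearRegression operator, and the feature choice (last month's sales, its square, the event dummy) is a hypothetical example.

```java
import java.util.Arrays;

public class EventRegression {
    /**
     * One feature row (hypothetical choice): intercept, last month's sales,
     * its square (a constructed polynomial feature), and a 0/1 dummy that is
     * 1 once the event (e.g. the cheap drug's launch) has happened.
     */
    static double[] features(double lastSales, boolean afterEvent) {
        return new double[] {1.0, lastSales, lastSales * lastSales, afterEvent ? 1.0 : 0.0};
    }

    /** Ordinary least squares: solves the normal equations (X^T X) b = X^T y. */
    static double[] fit(double[][] x, double[] y) {
        int p = x[0].length;
        double[][] a = new double[p][p + 1]; // augmented matrix [X^T X | X^T y]
        for (int i = 0; i < x.length; i++) {
            for (int r = 0; r < p; r++) {
                for (int c = 0; c < p; c++) {
                    a[r][c] += x[i][r] * x[i][c];
                }
                a[r][p] += x[i][r] * y[i];
            }
        }
        // Gauss-Jordan elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new ArithmeticException("features are linearly dependent");
            }
            for (int r = 0; r < p; r++) {
                if (r == col) {
                    continue;
                }
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        double[] b = new double[p];
        for (int r = 0; r < p; r++) {
            b[r] = a[r][p] / a[r][r];
        }
        return b;
    }

    /** Prediction is the dot product of coefficients and a feature row. */
    static double predict(double[] b, double[] row) {
        double s = 0.0;
        for (int j = 0; j < b.length; j++) {
            s += b[j] * row[j];
        }
        return s;
    }

    public static void main(String[] args) {
        // Synthetic data only: generated from known coefficients so the fit can be checked.
        double[][] x = new double[8][];
        double[] y = new double[8];
        for (int i = 0; i < 8; i++) {
            double last = i + 1;
            boolean after = i >= 4;
            x[i] = features(last, after);
            y[i] = 5 + 0.5 * last + 0.01 * last * last - 20 * (after ? 1 : 0);
        }
        double[] b = fit(x, y);
        System.out.println(Arrays.toString(b));
        System.out.println(predict(b, features(9, true)));
    }
}
```

The fitted coefficient on the dummy estimates the average level shift after the event, holding the other features fixed. For real data, scaling the features or using a QR-based solver would be more numerically robust than the normal equations.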

Greetings,
Sebastian

Legacy User · November 2008

Hi Sebastian,

Thanks for your reply; it is really useful. I have started having a look at linear regression, and the model file is only 4 KB.

Saving the file in binary results in a bigger model file than XML Zipped. Also, I am having trouble with the RapidMiner NeuralNet. Is it designed for regression? It is providing the same value for nearly all the predictions. I have used the default parameters. Also, the model file is as big for the NeuralNet as it is for the W-MultilayerPerceptron. Is there any way of not saving the training dataset with the model file?

Thanks

Laura

land · November 2008

Hi Laura,
yes, it's possible: just don't use NeuralNets.

Unfortunately, the NeuralNet learner used within RapidMiner includes the training set in the model; I remember that now. It's a "feature" of the library we use and can't be disabled until we implement our own neural net.
Another very powerful learner for regression is the LibSVMLearner. You could try it, but it's far more complicated to tune than linear regression.

Greetings,
Sebastian

Altair RapidMiner Community
