"RM Performance Optimization"

lubomir_karlik Member Posts: 4 Contributor I
edited May 2019 in Help

I have integrated RapidMiner into my application for event prediction. RapidMiner is a clear performance bottleneck. Do you know whether it is possible to optimize it?

I have 1,000 prediction models (neural networks), i.e. 1,000 RM scripts. Every script expects as input a training sample set that is built in iterations. Let's say 500 samples are required. Every sample is gathered by an RM script: the script connects to a DB, runs a SELECT, and then applies some simple data transformations. The samples are merged afterwards. A Java profiler shows that 99.9 percent of the time is spent running the RM script. Getting 500 samples takes about 10-15 minutes.
RapidMiner is initialized through RapidMiner.init(false, true, true, true);

It is not possible to reduce the number of RM scripts, e.g. by running one script to get all the data. I would like to know whether the RM script creates a new DB connection on every run or whether connection pooling is supported. Might there be another pitfall?

Thank you for your response in advance!

Lubomir Karlik


  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Without knowing what you are doing, I cannot say whether there are other pitfalls, besides the fact that neural networks aren't very fast in general.
    I'm not quite sure whether we already pool the database connections, but if not, we will put it on our agenda.

  • lubomir_karlik Member Posts: 4 Contributor I
    Thank you for your response!

    Meanwhile, I have used a profiler. I have realized that the low performance is caused mainly by the Nominal2Date operator (ca. 30% of the script execution time). The DB connection seems to remain open. Executing the query takes ca. 13%, and the DB is huge, so this is reasonable.

    This reminds me that I had to convert the date to nominal and back again because OLAP operators cannot group by non-nominal attributes like date or integer. Am I wrong here?

    Best regards,
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    You are correct: grouping is only possible by nominal values. Perhaps you could save some time if you don't convert the attribute back but instead keep both versions?
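
The "keep both versions" idea can be illustrated outside RapidMiner. The following is a minimal sketch in plain Java, not RapidMiner operator code: the class KeepBothVersions, the Row and DailySum types, and the sample values are all hypothetical and only stand in for an example set with a date attribute. Each row stores the original date next to a nominal (string) copy that is computed once. Grouping uses the nominal copy, and the original date is carried along with each group, so nothing has to be parsed back from text afterwards.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeepBothVersions {

    // Hypothetical row: the original date plus a nominal copy made once.
    static final class Row {
        final LocalDate date;
        final String dateNominal;
        final double value;

        Row(LocalDate date, double value) {
            this.date = date;
            // Single date-to-nominal conversion; there is no conversion back.
            this.dateNominal = DateTimeFormatter.ISO_LOCAL_DATE.format(date);
            this.value = value;
        }
    }

    // Hypothetical aggregate: keeps the original date next to the sum.
    static final class DailySum {
        final LocalDate date;
        double sum;

        DailySum(LocalDate date) {
            this.date = date;
        }
    }

    // Group by the nominal key while carrying the original date along.
    static Map<String, DailySum> groupByDay(List<Row> rows) {
        Map<String, DailySum> groups = new LinkedHashMap<>();
        for (Row row : rows) {
            DailySum group = groups.get(row.dateNominal);
            if (group == null) {
                group = new DailySum(row.date);
                groups.put(row.dateNominal, group);
            }
            group.sum += row.value;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        List<Row> rows = Arrays.asList(
                new Row(LocalDate.of(2019, 5, 1), 2.0),
                new Row(LocalDate.of(2019, 5, 1), 3.0),
                new Row(LocalDate.of(2019, 5, 2), 4.0));
        for (Map.Entry<String, DailySum> e : groupByDay(rows).entrySet()) {
            // e.getValue().date is still a real date and needs no parsing.
            System.out.println(e.getKey() + " " + e.getValue().date.getDayOfWeek()
                    + " " + e.getValue().sum);
        }
    }
}
```

Whether this saves time inside an actual RM script depends on how the process is built. The sketch only shows why the conversion back to a date becomes unnecessary once the original attribute is kept.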
