"RM Performance Optimization"

lubomir_karlik Member Posts: 4 Contributor I
edited May 2019 in Help
Hi!

I have integrated RapidMiner into my application for event prediction. RapidMiner is clearly the performance bottleneck. Do you know whether it can be optimized?

Situation:
I have 1,000 prediction models (neural networks), i.e. 1,000 RM scripts. Every script expects as input a training sample set that is built up in iterations. Let's say 500 samples are required. Every sample is gathered by an RM script. The RM script connects to a DB, runs a SELECT and then applies some simple data transformations. The samples are merged afterwards. The Java profiler shows that 99.9 percent of the time is spent running the RM script. Getting 500 samples takes about 10-15 minutes.
RapidMiner is initialized through RapidMiner.init(false, true, true, true);

It is not possible to reduce the number of RM scripts, e.g. by running one script to get all the data. I would like to know whether the RM script creates a new DB connection on every run, or whether connection pooling is supported. Might there be another pitfall?

Thank you in advance for your response!

Lubomir Karlik
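
For readers unfamiliar with the pooling idea asked about above, here is a minimal sketch in plain Java. It is not RapidMiner's API and does not show how RapidMiner handles connections; `SimplePool`, `withResource` and `createdCount` are hypothetical names chosen for illustration. The pool creates a fixed number of resources up front and hands them out repeatedly, so 500 runs do not imply 500 newly opened connections.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool for illustration; not part of RapidMiner or JDBC. */
public final class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final AtomicInteger created = new AtomicInteger();

    /** Creates all pooled resources eagerly using the given factory. */
    public SimplePool(int size, Supplier<T> factory) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
            created.incrementAndGet();
        }
    }

    /** Borrows a resource, runs the work, and always hands the resource back. */
    public <R> R withResource(Function<T, R> work) {
        T resource;
        try {
            resource = idle.take(); // blocks while every resource is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
        try {
            return work.apply(resource);
        } finally {
            idle.add(resource); // always fits: this caller just removed one element
        }
    }

    /** Number of resources the factory has produced so far. */
    public int createdCount() {
        return created.get();
    }
}
```

With real database access the factory would open a connection (for example through `java.sql.DriverManager`), and a production setup would more likely rely on an existing pooling library than on a hand-written class like this one.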

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Hi,
    without knowing what you are doing, I cannot say whether there are other pitfalls, besides the fact that neural networks aren't very fast in general.
    I'm not quite sure whether we already pool the database connections, but if not, we will put it on our agenda.

    Greetings,
    Sebastian
  • lubomir_karlik Member Posts: 4 Contributor I
    Thank you for the response!

    In the meantime, I have used a profiler. I found that the low performance is caused mainly by the Nominal2Date operator (ca. 30% of the script execution time). The DB connection appears to remain open. Execute query takes ca. 13%, and the DB is huge, so this is reasonable.

    This reminds me that I had to convert dates to nominal and back again because the OLAP operators cannot group by non-nominal attributes such as date or integer. Am I wrong here?

    Best regards,
    Lubomir
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Hi,
    you are correct: grouping is only possible by nominal values. Perhaps you could save some time by not converting the attribute back and instead keeping both versions?

    Greetings,
    Sebastian
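
The "keep both versions" suggestion can be sketched outside RapidMiner in plain Java. This is only an illustration of the idea, not a RapidMiner process: `DualDateColumn`, `Row` and `sumByDateKey` are hypothetical names, and the sum is just an example aggregation. Each row carries the typed date and a nominal (string) copy of it; grouping uses the nominal copy, and the typed date stays available afterwards, so no conversion back from nominal to date is needed.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of holding a date in both typed and nominal form. */
public final class DualDateColumn {

    /** One row carrying both representations of the same date. */
    public static final class Row {
        public final LocalDate date;  // typed value, kept for later date handling
        public final String dateKey;  // nominal value, used only as the grouping key
        public final double value;

        public Row(LocalDate date, double value) {
            this.date = date;
            this.dateKey = date.toString(); // ISO-8601 text, derived once per row
            this.value = value;
        }
    }

    /** Sums values per nominal key; the typed date is never parsed again. */
    public static Map<String, Double> sumByDateKey(List<Row> rows) {
        Map<String, Double> sums = new LinkedHashMap<>();
        for (Row row : rows) {
            sums.merge(row.dateKey, row.value, Double::sum);
        }
        return sums;
    }
}
```

The trade-off is extra memory for the second column in exchange for skipping the conversion step that the profiler flagged.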