Training and prediction elapsed time

pietro_fardella Member Posts: 2 Contributor I
edited December 2018 in Help

Hello there,

I'm working on my thesis, which is about time series forecasting. Specifically, I've built a framework for stock recommendation based on stocks' historical data. I've tested many regression models to see which one performs best in terms of prediction error and actual gain when used for trading.

I'd like to evaluate this framework in terms of scalability by measuring the time needed to make a prediction and the time needed to train the model, as a function of the number of stocks one wants to forecast. That way, I could show whether the framework can be adopted for online analysis (e.g. if the prediction time is low enough).

Is there an operator in RapidMiner to measure those quantities (e.g. the time a trained model needs to predict a single record, or the training time, although the latter is relatively simple to infer)?

I know this is an odd question, since the time depends on the underlying hardware, but I need a graph showing performance in terms of elapsed time.

Best Answer

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,858  RM Data Scientist
    Solution Accepted

    Hi @pietro_fardella,

    I think you need to distinguish between three different things here:

    1. Time for applying "the function" to a record

    2. Time for applying all other preprocessing needed for a record (e.g. normalization, but also Generate Attributes etc.)

    3. Time for the deployed service to respond.

     

    I think your initial question was about #1. From my commercial experience, the usual question is about the whole pipeline, including the invocation of the web service (WS) and so on.

     

    For #1 (and #2) you can use a Log operator and log the execution time of any operator. That will give you a good indication. I think you can also do this for Execute Process operators, which execute the whole application process (#1 + #2).

     

    I would personally recommend measuring the total response time of a web service that runs your scoring from the outside, using tools like JMeter. We did this when we designed the RealTimeScoring agents and got response times < 30 ms for models like a GBT. I would be very interested in your results, though.
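
A minimal Python sketch of the "measure from the outside" idea, for readers without JMeter at hand. It is only an illustration under stated assumptions: `fake_score` is a hypothetical stand-in for one request to a deployed scoring service, and the function names, request counts and the simple percentile rule are all choices made for this example, not anything RapidMiner provides.

```python
import statistics
import time


def measure_latency_ms(call, n_requests=200, warmup=20):
    """Call `call()` repeatedly and return per-request latencies in ms."""
    # Warm-up calls are discarded so one-off start-up costs do not skew results.
    for _ in range(warmup):
        call()
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Return median, 95th percentile (nearest rank) and maximum latency."""
    ordered = sorted(latencies)
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }


def fake_score():
    # Hypothetical stand-in for a scoring request; replace it with a real
    # call to your own service when measuring an actual deployment.
    return sum(i * 0.5 for i in range(1000))


if __name__ == "__main__":
    stats = summarize(measure_latency_ms(fake_score))
    print(stats)
```

Reporting the median together with a high percentile tends to be more informative than a single average, because occasional slow responses matter for online use.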

     

    Will your thesis be in English and available? Would love to read it.

     

    Cheers,
    Martin

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany

Answers

  • DocMusher Member Posts: 329   Unicorn

    Hi,

    Good luck with your thesis. I would love to read it too.

    Sven

  • pietro_fardella Member Posts: 2 Contributor I

    Hi, and thank you for the clarification. You've got the point exactly.

    Specifically, I should measure the time needed to create the record and to apply the model function to it. Model training can be done offline (e.g. when the market is closed), so measuring it is less important.
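
A rough Python sketch of that measurement, timing record creation plus model application for a growing number of stocks. Everything here is assumed for illustration: `build_record` and `predict` are hypothetical placeholders (a sliding window of prices and a plain linear model), and the price series are synthetic, so the absolute numbers say nothing about the real framework.

```python
import time


def build_record(prices, window=5):
    """Hypothetical feature step: keep the last `window` closing prices."""
    return prices[-window:]


def predict(record, weights):
    """Hypothetical model: a plain linear combination of the features."""
    return sum(w * x for w, x in zip(weights, record))


def time_scoring(n_stocks, window=5, history=250):
    """Time record creation plus prediction for `n_stocks` synthetic series."""
    weights = [1.0 / window] * window
    # Synthetic price histories, built before the clock starts.
    series = [
        [100.0 + s + 0.1 * t for t in range(history)] for s in range(n_stocks)
    ]
    start = time.perf_counter()
    predictions = [predict(build_record(p, window), weights) for p in series]
    elapsed = time.perf_counter() - start
    return elapsed, predictions


if __name__ == "__main__":
    for n in (10, 100, 1000):
        elapsed, _ = time_scoring(n)
        print(f"{n} stocks: {elapsed * 1000:.2f} ms total, "
              f"{elapsed / n * 1e6:.1f} microseconds per stock")
```

Plotting the total time against the number of stocks would give the kind of scalability graph described in the question. The timings remain hardware-dependent, so the machine used should be reported alongside them.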

     

    By the way, the framework is far from being a working web service. Essentially, I've designed and implemented three steps: data crawling and preprocessing, model learning, and stock recommendation. Then I validated it in terms of time needed, prediction error and profitability. The next step (as future work) would be to integrate these three functional blocks; a web service could be a nice idea.


    Unfortunately, I'm writing my thesis in Italian only, due to time constraints, but I could write an English version of it in the next two months.

     

    Cheers.

    Pietro

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,858  RM Data Scientist

    Hi,

     

    Okay, so I guess the easiest way would be to use an Execute Process operator to do what you want and use the Log operator to extract the time it needs. You may also want to run this on a Server and check how quick it is if you use background execution instead of normal execution (there is a difference).

    Keep in mind that normal Studio executions are not tuned for low latency. That's what the RTS is for. Please let me know if you need any more help.

     

    W.r.t. your thesis: a pity :/. The colleague I am working with in after-sales is Italian, but I guess this won't help since she is non-technical. It would be super cool from our end if we could turn this into some kind of consumable story for our users.

     

    CC: @sgenzer 

    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany