"RapidMiner Studio 8.2 Release - May 8, 2018"

sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959  Community Manager
edited June 2019 in Help

Hi all - just opening a thread today for the RM Studio 8.2 release. Any feedback (positive or "constructive") is very welcome; just reply on this thread. Bugs should be posted in the Product Feedback section as usual. Ideas for future releases should still be posted in the Product Ideas section. Thanks!






  • earmijo Member Posts: 265   Unicorn

    I noticed that FP-Growth is now accepting new formats. That is really good news.  My question is: Will it take the following format?


    Screen Shot 2018-05-08 at 10.28.34 AM.png

    I think this is the most efficient format to store transactions. (I know there is a process, Transactions2Basket, to perform the conversion. I was just wondering if this format would be accepted directly)


    Thanks in advance for any info

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,053  RM Data Scientist


    If I remember the discussion correctly - yes. I guess this is even the preferred format.




    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • gmeier Employee, Member Posts: 24   RM Engineering

    All the new input formats still require each basket to be in a single row. Please have a look at the tutorial process "The input formats of the FP-Growth Operator" in the Help for FP-Growth.

    What changed is that you need fewer operators to transform an input in earmijo's format into an accepted input format for FP-Growth. One Aggregate with concatenation, plus a Set Role, should do it.
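    For readers who cannot see the screenshot: the sketch below assumes the input has one row per (transaction id, item) pair and shows, in plain Python, the kind of result an aggregation with concatenation produces, namely one row per basket. It is only an illustration of the idea, not RapidMiner code; the function name `to_baskets`, the sample rows, and the `|` separator are all assumptions.

```python
def to_baskets(rows, separator="|"):
    """Group (transaction_id, item) rows into one concatenated basket per transaction.

    Hypothetical helper: it mimics an aggregate-with-concatenation step,
    it is not part of any RapidMiner API.
    """
    baskets = {}
    for transaction_id, item in rows:
        # Collect the items of each transaction in the order they appear.
        baskets.setdefault(transaction_id, []).append(item)
    # Concatenate each transaction's items into a single string (one "row" per basket).
    return {tid: separator.join(items) for tid, items in baskets.items()}


# Made-up sample data in the assumed long format.
rows = [(1, "bread"), (1, "milk"), (2, "beer"), (2, "bread"), (2, "chips")]
baskets = to_baskets(rows)
for tid, basket in baskets.items():
    print(tid, basket)
```

    In Studio, the concatenated column would then still need the appropriate role or selection before it goes into FP-Growth, as described in the tutorial process mentioned above.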

  • earmijo Member Posts: 265   Unicorn

     I had not seen the tutorials. It is certainly simpler now. Thank you @gmeier

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 290   Unicorn

    Hi all, 

    The announcement looks really intriguing, and the new real-time scoring feature should be a killer one :) 


    I do have a question about this, though: "The request latency for retrieving a score is less than 25ms".

    Specifically, how exactly is that latency measured (on what kind of hardware setup), and what exactly falls into this 25ms window? Only the model response (and for which model, in that case), or the response from some sample process (and what pipeline does it include, in that case)?


    In my understanding, the bottleneck is still network speed, while the model itself responds really fast. But if there is an underlying process, the latency also depends heavily on the data preprocessing included in it, DB queries and so on. I stress-tested different setups with RM Server 3 years ago and compared them with another setup I have used in production recently (just by sending POST requests to a web service), and the response times varied widely: from 80-100ms for a simple process like 'Read XML + Apply Model' to 6000-8000ms for a complex process that included several SQL queries with aggregations before applying the model.
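    To make the question concrete, here is a minimal sketch of the kind of client-side measurement meant above: time a request many times and look at the distribution rather than a single number. Everything in it is an assumption for illustration. `measure_latency_ms` is a hypothetical helper, and `fake_scoring_call` is a stand-in that only sleeps; a real test would replace it with a function that sends the POST request to the web service.

```python
import math
import statistics
import time


def measure_latency_ms(send_request, repetitions=100, warmup=10):
    """Time repeated calls to send_request and return summary statistics in ms."""
    # Warm-up calls are not recorded (connection setup, caches, JIT, etc.).
    for _ in range(warmup):
        send_request()

    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        send_request()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "min": samples[0],
        "median": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }


def fake_scoring_call():
    # Stand-in for a real request; it only simulates a delay of about 2 ms.
    time.sleep(0.002)


stats = measure_latency_ms(fake_scoring_call, repetitions=20, warmup=2)
print({name: round(value, 2) for name, value in stats.items()})
```

    Measured this way, the number includes network time and everything the process does, which is why it matters what exactly the quoted 25ms covers.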

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959  Community Manager

    cc @jpuente @Edin_Klapic



  • Nils_Woehler Member Posts: 463  Maven

    Hi @kypexin,


    we conducted the real-time scoring tests on AWS using JMeter. The test processes ranged from a simple process with no logic at all (baseline) to scoring processes with small, medium, and large models.

    This was the hardware the Scoring Agent was deployed on:

    Screen Shot 2018-05-11 at 10.03.08.png


    And here's an overview of our test results:

    Screen Shot 2018-05-11 at 10.03.01.png


    As you can see, the baseline for a process that just pipes the input to the output is about 6ms per request. Everything else done within the process adds to that latency.

    In a more recent version of Real-time Scoring, which will be released with v8.3, we have added input caching, which reduced the latency even further, by a magnitude of 2-3 for processes with larger models. Here's a preliminary test result for the caching mechanism:


    Screen Shot 2018-05-11 at 10.07.03.png


    Please note that the real-time scoring does not support any external connections, e.g. DB connections, at the moment.




  • SGolbert RapidMiner Certified Analyst, Member Posts: 344   Unicorn

    Hi @Nils_Woehler,


    can you elaborate on the differences between a call to the scoring agent and a normal web service call? Why is it faster?

  • Nils_Woehler Member Posts: 463  Maven

    Hi @SGolbert,


    sure. The web services in RapidMiner Server are not as light-weight, performant, and scalable as the ones of the new Real-time Scoring.

    Every time a request is made in RM Server, the process is loaded from the database, including permission checking, etc. This allows good but not real-time performance. Also, because the web services are part of RapidMiner Server, they do not scale as well as the new Real-time Scoring components: with RM Server it is a bit hard to run multiple instances to react when the load increases, whereas Real-time Scoring components can be scaled up as needed. Last but not least, RapidMiner Server web services can be edited while they are active, which might lead to errors in production. With RapidMiner Real-time Scoring we have changed the concept to a deployment-based one, which prevents users from accidentally changing web services that are used in production.







