"RapidMiner Studio 8.2 Release - May 8, 2018"

sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
edited June 2019 in Help

Hi all - just opening a thread today for the RM Studio 8.2 release. Any feedback (positive or "constructive") posted as a reply to this thread is very welcome. Bugs should be posted in the Product Feedback section as usual. Ideas for future releases should still be posted in the Product Ideas section. Thanks!

 

Scott

 


Answers

  • earmijo Member Posts: 270 Unicorn

    I noticed that FP-Growth now accepts new input formats. That is really good news. My question is: will it take the following format?

     

    Screen Shot 2018-05-08 at 10.28.34 AM.png

    I think this is the most efficient format to store transactions. (I know there is a process, Transactions2Basket, to perform the conversion. I was just wondering whether this format would be accepted directly.)

     

    Thanks in advance for any info

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

    If I remember the discussion correctly - yes. I guess this is even the preferred format.

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • gmeier Employee, Member Posts: 25 RM Engineering

    All the new input formats still require each basket to be in a single row. Please have a look at the tutorial process "The input formats of the FP-Growth Operator" in the Help for FP-Growth.

    What changed is that you need fewer operators to transform an input in earmijo's format into an input format that FP-Growth accepts. One Aggregate operator with concatenation, plus a Set Role, should do it.
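    To make the reshaping concrete outside of RapidMiner, here is a minimal Python sketch of the same idea: group rows by transaction and concatenate the items so that each basket ends up in a single row. It assumes the input is a "long" table with one (transaction id, item) pair per row, which may not match the screenshot exactly. The function name `to_baskets`, the sample data, and the `|` separator are illustrative assumptions, not part of the FP-Growth operator.

```python
from collections import defaultdict


def to_baskets(rows, sep="|"):
    """Group (transaction_id, item) pairs into one concatenated basket per id.

    Hypothetical helper: it mimics an aggregate-with-concatenation step,
    it is not RapidMiner's implementation.
    """
    baskets = defaultdict(list)
    for transaction_id, item in rows:
        # Keep first-seen order and skip duplicate items within a basket.
        if item not in baskets[transaction_id]:
            baskets[transaction_id].append(item)
    # One entry per transaction: all items joined into a single string.
    return {tid: sep.join(items) for tid, items in baskets.items()}


# Made-up sample data in the assumed long format.
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "beer"), (2, "milk"),
    (3, "beer"),
]

baskets = to_baskets(rows)
for tid, basket in baskets.items():
    print(tid, basket)
```

    In the RapidMiner process, the Set Role step would then mark which column FP-Growth should treat as the basket; that part has no equivalent in this sketch.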

  • earmijo Member Posts: 270 Unicorn

     I had not seen the tutorials. It is certainly simpler now. Thank you @gmeier

  • kypexin Moderator, RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi all, 

    The announcement looks really intriguing, and the new real-time scoring feature should be a killer one :) 

     

    I do have a question about this, though: "The request latency for retrieving a score is less than 25ms".

    Specifically, how exactly is that latency measured (on what kind of hardware setup), and what exactly falls into this 25ms window? Is it only the model response (and for which model), or the response from some sample process (and what pipeline does that process include)?

     

    In my understanding, the bottleneck is still network speed, since the model itself responds really fast. But if there is an underlying process, latency also depends heavily on the data preprocessing included in it, DB queries, and so on. I stress tested different setups with RM Server 3 years ago and compared them with another setup I used in production recently (just by sending POST requests to a web service), and the response times varied widely: from 80-100ms for a simple process like 'Read XML + Apply Model' to 6000-8000ms for a complex process that included several SQL queries with aggregations before applying the model.
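    For anyone who wants to reproduce this kind of measurement, below is a rough Python sketch of timing repeated requests and summarizing the latencies. It is only an illustration of the approach, not the setup used in the tests above. The helpers `post_json`, `measure_latency_ms`, and `summarize` are hypothetical names, and the example times a local stand-in function instead of a real endpoint so that it runs without a server.

```python
import json
import math
import statistics
import time
import urllib.request


def post_json(url, payload, timeout=10.0):
    """Send one JSON POST request and return the raw response body.

    The URL and payload shape depend entirely on your own web service.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def measure_latency_ms(call, n=100, warmup=5):
    """Call `call()` n times and return each wall-clock duration in ms."""
    for _ in range(warmup):
        call()  # warm-up calls are not recorded
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Return median, 95th percentile (nearest rank), and max latency."""
    ordered = sorted(samples)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }


def fake_scoring_call():
    """Local stand-in for a scoring request so the sketch runs offline."""
    sum(i * i for i in range(1000))


samples = measure_latency_ms(fake_scoring_call, n=50)
stats = summarize(samples)
print(stats)
```

    To time a real service, pass something like `lambda: post_json(url, payload)` as `call`, with your own `url` and `payload`. The numbers then include network round trips and whatever preprocessing the process does, which is exactly why the results can differ so much between setups.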

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    cc @jpuente @Edin_Klapic

     

     

  • Nils_Woehler Member Posts: 463 Maven

    Hi @kypexin,

     

    We ran the real-time scoring tests on AWS using JMeter. The test processes ranged from a simple process with no logic at all (baseline) to scoring processes with small, medium, and large models.

    This was the hardware the Scoring Agent was deployed on:

    Screen Shot 2018-05-11 at 10.03.08.png

     

    And here's an overview of our test results:

    Screen Shot 2018-05-11 at 10.03.01.png

     

    As you can see, the baseline for a process that just pipes the input to the output is about 6ms per request. Everything else done within the process adds to that latency.

    In a more recent version of the real-time scoring, which will be released with v8.3, we have added input caching, which reduced the latency even further, by a magnitude of 2-3, for processes with larger models. Here's a preliminary test result for the caching mechanism:

     

    Screen Shot 2018-05-11 at 10.07.03.png

     

    Please note that the real-time scoring does not support any external connections, e.g. DB connections, at the moment.

     

    Best,

    Nils

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi @Nils_Woehler,

     

    Can you elaborate on the differences between a call to the scoring agent and a normal web service call? Why is it faster?

  • Nils_Woehler Member Posts: 463 Maven

    Hi @SGolbert,

     

    Sure. The web services in RapidMiner Server are not as lightweight, performant, and scalable as the ones in the new Real-time Scoring.

    Every time a request is made in RM Server, the process is loaded from the database, including permission checking, etc. This allows good but not real-time performance. Also, because the web services are part of RapidMiner Server, they do not scale as well as the new Real-time Scoring components. With RM Server it is a bit hard to run multiple instances to react when the load increases, whereas Real-time Scoring components can be scaled up as needed. Last but not least, RapidMiner Server web services can be edited while they are active, which might lead to errors in production. With RapidMiner Real-time Scoring we have changed the concept to a deployment-based one, which prevents users from accidentally changing web services that are used in production.

     

    Cheers,

    Nils

     

     

     

     
