Bulk scoring in RapidMiner Server

Neel · January 2020

Hi everyone,

What is the best way to bulk score new records (hundreds of thousands, originating from an enterprise DB) using a deployed model (deployed via Deployment) in RapidMiner Server?

> I have tried using the web service, but it does not scale. The response time for a single record is currently around 3 seconds.
> There is no real-time scoring requirement. It is a single bulk request per day.
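
To put the scaling problem in numbers, here is a rough back-of-the-envelope estimate. It assumes 100,000 records (the low end of "hundreds of thousands") and strictly sequential requests at the reported 3 seconds each; both are assumptions for illustration, not measurements.

```python
# Rough estimate of scoring row by row through the web service.
# Assumptions: 100,000 records, one request at a time, ~3 seconds per request.
records = 100_000
seconds_per_record = 3

total_seconds = records * seconds_per_record
total_hours = total_seconds / 3600
total_days = total_hours / 24

print(f"{total_hours:.1f} hours (~{total_days:.1f} days)")
# prints: 83.3 hours (~3.5 days)
```

Under these assumptions a single run would not finish within the daily window.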

IngoRM · January 2020

Hi,

With the upcoming RM 9.6 version you can turn off the explanations for predictions, which slow down the scoring a lot. But for true bulk scoring, a single-row web service approach does not seem ideal anyway, IMHO.

If you check the repository folder of the deployed models, you will find a process called "score_set" which you can use as a blueprint. Make a copy of it and adapt it a bit: add a data source (reading from your DB) at the beginning and, for the operator "Explain Prediction", turn on the parameter "only predictions" to speed things up. If you also want monitoring, you can add the operator MDMLogging as well. That part is a bit more complicated, so I suggest dealing with it last, once everything else works and you know you want the logging.

Hope this helps,
Ingo
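
For readers who want to see the general pattern behind this advice (read the records as a set, score them in bulk, write the predictions back), here is a minimal sketch in plain Python. It is not the "score_set" process and does not use any RapidMiner API: the table names `new_records` and `predictions`, the column layout, and the placeholder `score_chunk` function are all hypothetical, and an in-memory SQLite database stands in for the enterprise DB.

```python
import sqlite3
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Tuple[int, float, float]  # (id, feature_a, feature_b): hypothetical schema


def read_in_chunks(conn: sqlite3.Connection, query: str, chunk_size: int) -> Iterator[List[Row]]:
    """Yield the query results in lists of at most chunk_size rows."""
    cursor = conn.execute(query)
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:
            break
        yield chunk


def score_chunk(rows: Sequence[Row]) -> List[Tuple[int, float]]:
    """Placeholder for the deployed model: scores a whole chunk in one call."""
    return [(row_id, 0.5 * a + 0.5 * b) for row_id, a, b in rows]


def bulk_score(
    conn: sqlite3.Connection,
    query: str,
    chunk_size: int = 10_000,
    score: Callable[[Sequence[Row]], List[Tuple[int, float]]] = score_chunk,
) -> int:
    """Score every row returned by query, store the predictions, return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS predictions (id INTEGER PRIMARY KEY, score REAL)")
    total = 0
    for chunk in read_in_chunks(conn, query, chunk_size):
        conn.executemany(
            "INSERT OR REPLACE INTO predictions (id, score) VALUES (?, ?)",
            score(chunk),
        )
        total += len(chunk)
    conn.commit()
    return total


# Small demo with made-up data in an in-memory database.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE new_records (id INTEGER PRIMARY KEY, feature_a REAL, feature_b REAL)")
demo.executemany(
    "INSERT INTO new_records VALUES (?, ?, ?)",
    [(i, float(i), 1.0) for i in range(25)],
)
print(bulk_score(demo, "SELECT id, feature_a, feature_b FROM new_records", chunk_size=10))
```

The point of the sketch is only that the per-request overhead is paid once per chunk instead of once per record; in RapidMiner the equivalent would be the adapted copy of "score_set" run once per day.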

Neel · January 2020

Hi @IngoRM,

Thank you. I could re-purpose "score_set" as a "bulk-score" process by:
1. setting "select which=1" for the "Define Target" block, since there should not be a target column at prediction time.
2. setting "select which=1" for the "Define ID" block, since the training mode does not need an identifier (it is optional) but prediction needs one.

It would actually be great to have a standard "bulk-score" process auto-generated from the deployment.

Cheers,
Neel
