Bulk scoring in RapidMiner Server
Hi everyone,
What is the best way to bulk score new records (hundreds of thousands of rows from an enterprise database) with a model deployed via Deployment on RapidMiner Server?
I have tried using the web service, but it does not scale: the response time for a single record is currently around 3 seconds.
There is no real-time scoring requirement. It is a single bulk request once a day.
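A quick estimate shows why per-record calls do not scale here. The sketch below assumes requests are sent one at a time (sequentially) at the quoted ~3 seconds each. The volumes 100,000 and 500,000 are illustrative stand-ins for "hundreds of thousands", not figures from the post.

```python
def sequential_hours(n_records: int, seconds_per_record: float = 3.0) -> float:
    """Wall-clock hours if every record is scored by its own sequential request."""
    return n_records * seconds_per_record / 3600.0


# Illustrative volumes only (assumption), bracketing "hundreds of thousands".
for n in (100_000, 500_000):
    hours = sequential_hours(n)
    print(f"{n:>7,} records -> {hours:,.1f} h ({hours / 24:.1f} days)")
```

Even the low end comes to roughly 83 hours (about 3.5 days) per run, well beyond a daily window. Parallel requests would shorten this, but the per-row overhead remains, which is why a batch process is the better fit.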
Best Answer
IngoRM
Hi,
With the upcoming RM 9.6 version you can turn off explanations for predictions, which slow down scoring a lot. But for true bulk scoring, a single-row web service approach does not seem like a good fit anyway, IMHO.
If you check the repository folder of the deployed models, you will find a process called "score_set" which you can use as a blueprint. Make a copy of it and adapt it a bit (in particular, turn on the parameter "only predictions" for the "Explain Prediction" operator to speed things up!), then add a data source at the beginning that reads from your DB. If you also want monitoring, add the MDMLogging operator as well. That part is a bit more complicated, so I suggest dealing with it last, once everything else works.
Hope this helps,
Ingo
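Outside RapidMiner, the same idea (one batch job that reads from the database, scores every row, and writes the results back) can be sketched generically in Python. This is not RapidMiner's API and not how "score_set" works internally: the table names `new_records` and `predictions`, their columns, and the lambda standing in for the deployed model are all hypothetical, and an in-memory SQLite database stands in for the enterprise DB. Reading in chunks keeps memory bounded regardless of table size.

```python
import sqlite3
from typing import Callable


def bulk_score(conn: sqlite3.Connection,
               score: Callable[[tuple], float],
               chunk_size: int = 10_000) -> int:
    """Read rows in chunks, score each one, and write predictions back."""
    read = conn.cursor()
    # Hypothetical source table: an ID column plus feature columns.
    read.execute("SELECT id, x1, x2 FROM new_records ORDER BY id")
    written = 0
    while True:
        rows = read.fetchmany(chunk_size)
        if not rows:
            break
        # Keep the ID with each prediction so results can be joined back.
        preds = [(row[0], score(row[1:])) for row in rows]
        conn.executemany(
            "INSERT INTO predictions (id, prediction) VALUES (?, ?)", preds)
        written += len(preds)
    conn.commit()
    return written


# Demo with made-up data; the lambda is a stand-in for the real model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_records (id INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
conn.execute("CREATE TABLE predictions (id INTEGER PRIMARY KEY, prediction REAL)")
conn.executemany("INSERT INTO new_records VALUES (?, ?, ?)",
                 [(i, i * 0.1, 1.0) for i in range(25_000)])
n = bulk_score(conn, score=lambda feats: 0.5 * feats[0] + feats[1])
print(n)  # 25000
```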
Answers
Thank you. I was able to repurpose "score_set" into a "bulk-score" process by:
1. Setting "select which=1" for the "Define Target" operator, since there is no target column when predicting.
2. Setting "select which=1" for the "Define ID" operator, since an identifier is optional in training mode but required for prediction.
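The reasoning behind the two switches amounts to a different input contract per mode: training needs a target column and can do without an ID, while scoring needs an ID (to join predictions back to source rows) and no target. The Python sketch below only illustrates that contract; the function `check_columns`, the column names `label` and `id`, and the mode strings are hypothetical and do not reflect how "score_set" is built.

```python
def check_columns(columns: set[str], mode: str,
                  target: str = "label", id_col: str = "id") -> None:
    """Validate input columns for a hypothetical train/score switch."""
    if mode == "train":
        if target not in columns:
            raise ValueError(f"training needs target column {target!r}")
        # An ID column is optional when training.
    elif mode == "score":
        if id_col not in columns:
            raise ValueError(f"scoring needs ID column {id_col!r}")
        # No target column is expected when scoring new records.
    else:
        raise ValueError(f"unknown mode {mode!r}")


check_columns({"id", "x1", "x2"}, mode="score")  # passes: no target needed
```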
It would actually be great to have a standard "bulk-score" process auto-generated from the deployment.
Cheers,
Neel