The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
bulk scoring in rapidminer server
Hi everyone,
What is the best way to bulk score new records (100s of thousands originating from enterprise DB) using a deployed model (deployed via Deployment) in the Rapidminer server?
>I have tried using the web service, but it does not scale. The response time for a single record is around 3 seconds currently.
> There's no 'real-time' scoring requirement. It is a daily single bulk request.
> There's no 'real-time' scoring requirement. It is a daily single bulk request.
1
Best Answer
-
IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM FounderHi,With the upcoming RM 9.6 version you can turn off explanations for predictions which slows down the scoring a lot. But for true bulk scoring a single row web service approach does not seem to be great anyway IMHO.If you check the repository folder of the deployed models, you will find a process called "score_set" which you can use as a blueprint. Make a copy of this and adapt it a bit (especially for the operator "Explain Prediction" turn on the parameter "only predictions" to speed things up!) and add a data source (reading from you DB) in the beginning. If you also want to add the monitoring, you may also want to add the operator MDMLogging to this (which is a bit more complicated - I suggest to deal with this last if everything else works and you want the logging...).Hope this helps,
Ingo7
Answers
Thank you. I could re-purpose the "score_set" to "bulk-score" by
1. setting the "select which=1" for "Define Target" block as there shouldn't be a target column for prediction.
2. setting the "select which=1" for "Define ID" block as the training mode doesn't need an identifier (optional) and prediction needed one.
It would actually be great to have a standard "bulk-score" process auto-generated from the deployment.
Cheers,
Neel