webservice performance
Hi...question from a potential new user of RM Server:
"I am curious about performance of the rapidminer api - if we determine that we want to build models using that tool is it capable of then serving those results to an app with a large audience?"
Thoughts? I assume the performance of any RapidMiner web service (API) depends purely on the server that hosts it...? Any insight or experience using an RM Server web service in high-capacity situations?
Scott
Best Answer
IngoRM (RM Founder)
Hi Scott,
There is probably no generic answer to this but I will try. First, it depends on what the process behind the service is doing. Data prep? Modeling? Scoring? Most integrations focus on scoring (plus the necessary data prep), so I will focus on that. Let's also assume that you made all the smart choices (e.g. you are not using a k-NN model trained on a billion examples and hoping for fast scoring ;-)).
In those situations, and of course this depends on the machines used, you can easily get up to millions of scores per second by executing the process directly. You will get the same throughput if you are using the Java API of RapidMiner. That is more than enough for most situations I have seen (with the notable exception of automatic trading).
The web service API slows this down a bit. It is hard to tell by how much, since this depends not only on the machines but also on the network infrastructure. On the other hand, you can easily overcome this by running multiple scoring instances in parallel.
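Since the slowdown depends on your own machines and network, the practical way to pin it down is to measure it. Below is a minimal sketch of such a measurement in Python. The names `measure_throughput` and `stub_score` are made up for illustration, and `stub_score` is only a stand-in for whatever actually does the scoring in your setup (for example an HTTP call to your web service); it is not a RapidMiner function.

```python
import time


def measure_throughput(score_fn, payloads):
    """Call score_fn once per payload and return the observed scores per second."""
    start = time.perf_counter()
    count = 0
    for payload in payloads:
        score_fn(payload)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse clocks.
    return count / elapsed if elapsed > 0 else float("inf")


def stub_score(row):
    # Hypothetical stand-in for a real scoring call; replace with your own client code.
    return sum(row) / len(row)


if __name__ == "__main__":
    rows = [[1.0, 2.0, 3.0]] * 100_000
    print(f"{measure_throughput(stub_score, rows):,.0f} scores per second")
```

Running the same harness once against the direct route and once against the web service route, on the hardware you intend to use, gives a like-for-like comparison.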
So in short: 10+ million scores per second is what you can achieve on a single instance for scoring plus simple data prep. But this is only true if you are using the Java API. It will be considerably less through the web service API. The advantage there is that you can easily scale out horizontally if you actually need to.
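To make the scale-out idea concrete, here is a rough back-of-the-envelope sketch in Python. The function name `instances_needed`, the `headroom` parameter and all numbers in the usage part are assumptions for illustration, not measured RapidMiner figures; plug in the per-instance rate you measure yourself.

```python
import math


def instances_needed(target_rate, per_instance_rate, headroom=1.0):
    """Estimate how many scoring instances cover target_rate requests per second.

    headroom is the fraction of each instance's measured capacity you are
    willing to use (e.g. 0.7 leaves 30% spare for traffic spikes).
    """
    if target_rate < 0:
        raise ValueError("target_rate must be non-negative")
    if per_instance_rate <= 0:
        raise ValueError("per_instance_rate must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    usable_rate = per_instance_rate * headroom
    # Always keep at least one instance, even for tiny loads.
    return max(1, math.ceil(target_rate / usable_rate))


if __name__ == "__main__":
    # Hypothetical numbers: 5,000 requests/s wanted, 800 requests/s measured per instance.
    print(instances_needed(5000, 800))
    print(instances_needed(5000, 800, headroom=0.7))
```

This assumes throughput scales roughly linearly with the number of instances, which only holds while the load balancer, network and any shared database are not the bottleneck.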
Hope this helps a bit,
Ingo
Answers
Thank you Ingo for your very quick and informative response. Yes I realize there are a million different use cases and hence it is impossible to answer this question with any degree of accuracy. I don't really know at this point what the client is looking for but yes, I agree that it likely is scoring of some flavor...
I apologize for my ignorance but I don't understand what you mean by the "Java API of RapidMiner". All I know is how to create a webservice via RapidMiner Server. Could you point me in a direction so I could see what this looks like? I almost never deal with real-time deployment services so this is new to me. Creating parallel scoring instances, on the other hand, makes sense to me.
Again THANK YOU!
Scott
I am also interested in this topic. @IngoRM what do you mean by speeding things up by running multiple scoring instances in parallel? Are you talking about actually running separate RapidMiner Servers and having the requests split up between them beforehand? Or something more complex that is happening inside a single RapidMiner server?
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Hi,
Sure, no prob. @sgenzer - here are a couple of links demonstrating how to embed RapidMiner processes (or single operators) directly into a Java application. The documentation on this is unfortunately not really great but Java coders should get the right idea from the examples:
The basic idea is that you build your process (either in code or by reading the process XML) and execute it directly from your Java code. That is typically the fastest way to get scores (on a single machine).
Integrating via web services is of course a lot easier and also works better with other languages than Java. You both know how to do this for a single server. For all the others, please read here: https://docs.rapidminer.com/server/advanced-topics/qlik-integration/expose-process.html
If you want to scale this, @Telcontar120, one way to do it is by putting a load balancer in front of the servers. I have not actually done this myself yet, but the general idea should be described here: https://docs.rapidminer.com/server/advanced-topics/high-availability/general-setup.html
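For readers who have not worked with a load balancer before, the following Python sketch shows the simplest policy one can apply: round-robin rotation over several servers, skipping any that are marked as down. It is only an illustration of the concept, not how RapidMiner Server or any particular load balancer product is implemented; the class name `RoundRobinBalancer` and the server URLs are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend addresses in rotation, skipping ones marked as down."""

    def __init__(self, backends):
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        self._healthy = set(self._backends)
        self._rotation = cycle(self._backends)

    def mark_down(self, backend):
        self._healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def next_backend(self):
        # Try each backend at most once per call so we never loop forever.
        for _ in range(len(self._backends)):
            candidate = next(self._rotation)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    # Hypothetical server addresses, for illustration only.
    balancer = RoundRobinBalancer([
        "http://rm-server-1:8080",
        "http://rm-server-2:8080",
        "http://rm-server-3:8080",
    ])
    for _ in range(4):
        print(balancer.next_backend())
```

In practice a dedicated load balancer does this rotation (plus health checks) for you, so client applications only ever see a single address.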
Hope this helps,
Ingo
Yes I was thinking the same thing about load balancing multiple RM Servers on EC2 (e.g. https://aws.amazon.com/elasticloadbalancing/) like in that diagram. Never done it before but nice to know that this seems to be a viable approach. Has anyone done this on AWS? I'd love to hear how it went.
Thank you for sharing your expert knowledge, @IngoRM. Those links are my next project...always learning something new.
Scott