"RapidMiner Studio 8.2 Release - May 8, 2018"

sgenzer · May 2018

Hi all - just opening a thread today for the RM Studio 8.2 release. Any feedback (positive or "constructive") is very welcome; just reply on this thread. Bugs should be posted in the Product Feedback section as usual. Ideas for future releases should still be posted in the Product Ideas section. Thanks!

Scott

earmijo · May 2018

I noticed that FP-Growth is now accepting new formats. That is really good news. My question is: Will it take the following format?

Screen Shot 2018-05-08 at 10.28.34 AM.png

I think this is the most efficient format to store transactions. (I know there is a process, Transactions2Basket, to perform the conversion. I was just wondering whether this format would be accepted directly.)

Thanks in advance for any info

MartinLiebig · May 2018

Hi,

If I remember the discussion correctly - yes. I guess this is even the preferred format.

Best,

Martin

gmeier · May 2018

All the new input formats still require each basket to be in a single row. Please have a look at the tutorial process "The input formats of the FP-Growth Operator" in the Help for FP-Growth.

What changed is that you need fewer operators to transform an input in earmijo's format into an input format that FP-Growth accepts. One Aggregate with concatenation, plus a Set Role, should do it.
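To illustrate what the Aggregate-with-concatenation step does conceptually, here is a minimal Python sketch. It is not RapidMiner code. It assumes the input is in a long format with one row per (transaction id, item) pair, which may or may not match the screenshot above, and the `to_baskets` helper and the `|` separator are illustrative choices only.

```python
from collections import defaultdict

def to_baskets(rows, separator="|"):
    """Collapse (transaction_id, item) rows into one concatenated string per transaction.

    Hypothetical helper: mimics grouping by transaction id and concatenating
    the items, so that each basket ends up in a single row.
    """
    grouped = defaultdict(list)
    for transaction_id, item in rows:
        # Keep each item only once per basket, in order of first appearance.
        if item not in grouped[transaction_id]:
            grouped[transaction_id].append(item)
    return {tid: separator.join(items) for tid, items in grouped.items()}

# Assumed long-format input: one row per (transaction id, item).
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "beer"), (2, "eggs"),
    (3, "milk"),
]

baskets = to_baskets(rows)
for tid, items in baskets.items():
    print(tid, items)
```

Each printed line corresponds to one basket in a single row, which is the shape the new input formats still require.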

earmijo · May 2018

I had not seen the tutorials. It is certainly simpler now. Thank you @gmeier

kypexin · May 2018

Hi all,

The announcement looks really intriguing, and the new real-time scoring feature should be a killer one.

I do have a question about this statement, though: "The request latency for retrieving a score is less than 25ms".

Specifically, how exactly is that latency measured (on what kind of hardware setup), and what exactly falls into this 25ms window? Only the model response (and for which model, in this case), or the response from some sample process (and what pipeline does it include)?

In my understanding, the bottleneck is still network speed, while the model itself responds really fast. But again, if there is an underlying process, the response time also depends heavily on the data preprocessing included in it, DB queries, and so on. I stress tested different setups with RM Server 3 years ago and compared them with another setup I have used in production recently (just by sending POST requests to a web service), and the response times varied widely: from 80-100ms for a simple process like 'Read XML + Apply Model' to 6000-8000ms for a complex process which included several SQL queries with aggregations before applying the model.
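For anyone who wants to run a comparable measurement, here is a rough Python sketch of a timing harness. It is only an illustration under stated assumptions: `measure_latency_ms` and `fake_scoring_call` are hypothetical names, and the stand-in call does local work only. In a real test the callable would wrap the POST request to the web service, so network time and process execution would both be included in the numbers.

```python
import math
import statistics
import time

def measure_latency_ms(call, repetitions=100, warmup=10):
    """Time repeated invocations of `call` and summarize latency in milliseconds."""
    # Warm-up calls are not recorded, so one-time setup costs do not skew results.
    for _ in range(warmup):
        call()

    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "min": samples[0],
        "median": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }

def fake_scoring_call():
    # Hypothetical stand-in for one scoring request; replace with a real
    # HTTP POST to the service under test.
    sum(i * i for i in range(1000))

stats = measure_latency_ms(fake_scoring_call, repetitions=50)
print({name: round(value, 3) for name, value in stats.items()})
```

Reporting the median and a high percentile alongside min and max makes it easier to compare a simple process against a complex one than a single average would.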

sgenzer · May 2018

cc @jpuente @Edin_Klapic

Nils_Woehler · May 2018

Hi @kypexin,

We conducted the real-time scoring tests on AWS via JMeter. The test processes ranged from a simple process with no logic at all (baseline) to scoring processes with small, medium, and large models.

This was the hardware the Scoring Agent was deployed on:

Screen Shot 2018-05-11 at 10.03.08.png

And here's an overview of our test results:

Screen Shot 2018-05-11 at 10.03.01.png

As you can see, the baseline for a process which just pipes the input to the output is about 6ms per request. Everything else done within the process adds to the latency.

In a more recent version of the real-time scoring, which will be released with v8.3, we have added input caching, which reduced the latency even further, by a magnitude of 2-3 for processes with larger models. Here's a preliminary test result for the caching mechanism:

Screen Shot 2018-05-11 at 10.07.03.png

Please note that the real-time scoring does not support any external connections, e.g. DB connections, at the moment.

Best,

Nils

SGolbert · May 2018

Hi @Nils_Woehler,

can you elaborate on the differences between a call to the scoring agent and a normal web service call? Why is it faster?

Nils_Woehler · May 2018

Hi @SGolbert,

Sure. The web services in RapidMiner Server are not as lightweight, performant, and scalable as the ones in the new Real-time Scoring.

Every time a request is made in RM Server, the process is loaded from the database, including permission checking, etc. This allows good, but not real-time, performance. Also, because the web services are part of RapidMiner Server, they do not scale as well as the new Real-time Scoring components. With RM Server it is a bit hard to run multiple instances to react when the load increases, whereas Real-time Scoring components can be scaled up as needed.

Last but not least, RapidMiner Server web services can be edited while they are active, which might lead to errors in production. With RapidMiner Real-time Scoring we have changed the concept to a deployment-based one, which prevents users from accidentally changing web services that are used in production.

Cheers,

Nils
