"RapidMiner Studio 8.2 Release - May 8, 2018"
sgenzer
Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
Hi all - just opening a thread today for the RM Studio 8.2 release. Any feedback (positive or "constructive") posted as a reply on this thread is very welcome. Bugs should be posted in the Product Feedback section as usual. Ideas for future releases should still be posted in the Product Ideas section. Thanks!
Scott
Answers
I noticed that FP-Growth is now accepting new formats. That is really good news. My question is: Will it take the following format?
I think this is the most efficient format to store transactions. (I know there is a process, Transactions2Basket, to perform the conversion. I was just wondering if this format would be accepted directly)
Thanks in advance for any info
Hi,
If I remember the discussion correctly, yes. I guess this is even the preferred format.
Best,
Martin
Dortmund, Germany
All the new input formats still require each basket to be in a single row. Please have a look at the tutorial process "The input formats of the FP-Growth Operator" in the Help for FP-Growth.
What changed is that you need fewer operators to transform an input of earmijo's format into an accepted input format for FP-Growth. One Aggregate with concatenation should do it plus a Set Role.
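For anyone who wants to see what that Aggregate-with-concatenation step does outside of RapidMiner, here is a rough Python sketch. It assumes the input is a long list with one row per (transaction id, item) pair; that layout is a guess, since the original example is not shown in this thread. The function name `to_baskets` and the `|` separator are illustrative choices only, not part of any RapidMiner API.

```python
def to_baskets(rows, sep="|"):
    """Collapse (transaction_id, item) rows into one concatenated basket per id.

    Assumption: `rows` is an iterable of 2-tuples, one row per item bought.
    This mimics an aggregation that concatenates items grouped by transaction.
    """
    grouped = {}
    for transaction_id, item in rows:
        # dict keeps insertion order, so baskets appear in first-seen order
        grouped.setdefault(transaction_id, []).append(item)
    return {tid: sep.join(items) for tid, items in grouped.items()}


# Hypothetical sample data, one row per purchased item
rows = [
    (1, "bread"), (1, "milk"),
    (2, "milk"),
    (3, "bread"), (3, "butter"), (3, "milk"),
]

for transaction_id, basket in to_baskets(rows).items():
    print(transaction_id, basket)
```

Each transaction ends up in a single row, which is the shape gmeier describes as still being required.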
I had not seen the tutorials. It is certainly simpler now. Thank you @gmeier
Hi all,
The announcement looks really intriguing, and the new real-time scoring feature should be a killer one.
I do have a question about this statement, though: "The request latency for retrieving a score is less than 25ms".
Specifically, how exactly is that latency measured (on what kind of hardware setup), and what exactly falls into this 25ms window? Is it only the model response (and for which model), or the response from some sample process (and what pipeline does that process include)?
In my understanding, the bottleneck is still network speed, while the model itself responds really fast. But again, if there is an underlying process, the response time also depends heavily on the data preprocessing included in it, DB queries and so on. I stress tested different setups with RM Server 3 years ago and compared them with another setup I used in production recently (just by sending POST requests to a web service), and the response times varied by orders of magnitude: from 80-100ms for a simple process like 'Read XML + Apply Model' to 6000-8000ms for a complex process that included several SQL queries with aggregations before applying the model.
Vladimir
http://whatthefraud.wtf
cc @jpuente @Edin_Klapic
Hi @kypexin,
We conducted the real-time scoring tests on AWS using JMeter. The test processes ranged from a simple process with no logic at all (the baseline) to scoring processes with small, medium, and large models.
This was the hardware the Scoring Agent was deployed on:
And here's an overview of our test results:
As you can see, the baseline for a process that just pipes the input to the output is about 6ms per request. Everything else done within the process adds to that latency.
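To illustrate the baseline idea, here is a minimal Python sketch that measures a pass-through endpoint first and then an endpoint that does extra work, so the difference can be attributed to the work itself. This is not the JMeter/AWS setup described above: it uses a toy local HTTP server, and the `time.sleep` call is only a stand-in for model scoring. All names (`EchoHandler`, `SlowHandler`, `run_comparison`) are hypothetical.

```python
import json
import statistics
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Baseline endpoint: pipes the POST body straight back to the caller."""
    extra_work_s = 0.0  # simulated processing time per request (assumption)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        time.sleep(self.extra_work_s)  # stand-in for applying a model
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet during the measurement


class SlowHandler(EchoHandler):
    extra_work_s = 0.02  # pretend scoring takes about 20ms


def median_latency_ms(url, payload, n):
    """Send n sequential POST requests and return the median round trip in ms."""
    data = json.dumps(payload).encode()
    samples = []
    for _ in range(n):
        req = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"})
        start = time.perf_counter()
        with urllib.request.urlopen(req) as resp:
            resp.read()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def run_comparison(n=20):
    """Measure the baseline and the 'scoring' endpoint, then subtract."""
    results = {}
    for name, handler in (("baseline", EchoHandler), ("scoring", SlowHandler)):
        server = HTTPServer(("127.0.0.1", 0), handler)  # port 0 = any free port
        threading.Thread(target=server.serve_forever, daemon=True).start()
        url = f"http://127.0.0.1:{server.server_address[1]}/"
        results[name] = median_latency_ms(url, {"x": 1.0}, n)
        server.shutdown()
        server.server_close()
    results["added"] = results["scoring"] - results["baseline"]
    return results


if __name__ == "__main__":
    for name, value in run_comparison().items():
        print(f"{name}: {value:.1f} ms")
```

The absolute numbers from a local toy server say nothing about the Scoring Agent itself; the point is only the method of subtracting a pass-through baseline from the measured total.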
In a more recent version of the real-time scoring, which will be released with v8.3, we have added input caching which even reduced the latency by a magnitude of 2-3 for processes with larger models. Here's a preliminary test result for the caching mechanism:
Please note that the real-time scoring does not support any external connections, e.g. DB connections, at the moment.
Best,
Nils
Hi @Nils_Woehler,
can you elaborate on the differences between a call to the scoring agent and a normal web service call? Why is it faster?
Hi @SGolbert,
Sure. The web services in RapidMiner Server are not as lightweight, performant, and scalable as those of the new Real-time Scoring.
Every time a request is made in RM Server, the process is loaded from the database, which includes permission checking, etc. This allows good, but not real-time, performance. Also, because the web services are part of RapidMiner Server, they do not scale as well as the new Real-time Scoring components. With RM Server it is a bit hard to run multiple instances to react when the load increases, whereas Real-time Scoring components can be scaled up as needed. Last but not least, RapidMiner Server web services can be edited while they are active, which might lead to errors in production. With RapidMiner Real-time Scoring we have changed the concept to a deployment-based one, which prevents users from accidentally changing web services that are used in production.
Cheers,
Nils