RapidMiner Server: memory management for repository access
Hi all, hi Marco!
We were wondering how RapidMiner Server handles the reading and writing of very big objects from and to the Server Repository.
Say we write an ExampleSet or a big model (e.g. a complex RandomForest model) of 2 GB to the Server repository. Does the Server cache the complete object in memory, or does it stream to the database? What happens when we read it back?
In other words: if the memory of the server is restricted to 2 GB, can we still reliably store bigger objects in the repository? (Whether this is good practice is another question, but sometimes you have no choice...)
Also, does accessing the repository count against the API limit of the free RapidMiner Server, or does the API limit only apply to processes that are exposed as a web service?
Cheers,
Marius
Best Answer
Marco_Boeck (Employee-RapidMiner, RM Engineering)
Hi Marius!
Let me provide a few more details here:
- If you create the objects on RM Server itself, they already have to be entirely in memory at that point, regardless of the object type. That naturally becomes tricky if they are larger than the maximum memory of RM Server.
- If you store an ExampleSet on Server via Studio or the REST API, it will be streamed. This means yes, you can upload larger sets than your memory limit.
- If you store a model (or any other IOObject really), it is stored as a binary blob. It will therefore be loaded completely into memory, which is problematic for objects larger than the memory limit.
- Indeed, only actual web services count against the limit. Testing them does not count, and neither does using the repository, whether via the REST API or via Studio (SOAP API).
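To illustrate the difference between the streamed case and the binary blob case described above, here is a minimal Python sketch. It is not RapidMiner Server code: the function names `store_streamed` and `store_as_blob` and the `CHUNK_SIZE` value are hypothetical, and in-memory byte buffers stand in for the upload and the repository database. It only shows why a streamed write needs a small, fixed buffer while a blob write needs memory for the whole object.

```python
import io

# Hypothetical buffer size for illustration only; not a RapidMiner Server setting.
CHUNK_SIZE = 64 * 1024


def store_streamed(source, sink, chunk_size=CHUNK_SIZE):
    """Copy source to sink chunk by chunk.

    Returns (bytes_written, largest_buffer_held). The largest buffer never
    exceeds chunk_size, no matter how big the object is.
    """
    total = 0
    peak = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
        peak = max(peak, len(chunk))
    return total, peak


def store_as_blob(source, sink):
    """Read the whole object into memory first, then write it in one go.

    Returns (bytes_written, largest_buffer_held). Here the largest buffer
    is the full object, which is what makes oversized objects a problem.
    """
    blob = source.read()
    sink.write(blob)
    return len(blob), len(blob)


# A 1 MiB payload stands in for a large ExampleSet or model.
payload = b"x" * (1024 * 1024)

streamed_total, streamed_peak = store_streamed(io.BytesIO(payload), io.BytesIO())
blob_total, blob_peak = store_as_blob(io.BytesIO(payload), io.BytesIO())

print("streamed:", streamed_total, "bytes written, peak buffer", streamed_peak)
print("blob:    ", blob_total, "bytes written, peak buffer", blob_peak)
```

Both variants write the same number of bytes, but only the blob variant has to hold the complete object at once, which matches the distinction between ExampleSets and other IOObjects made above.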
Cheers,
Marco
Answers
Hi Marius,
Good to hear from you :-)
Unless I am corrected by one of your Server experts, I think the answers to your questions are "yes" and "no". Yes, you can write larger objects to the repository as part of the process execution on Server. If the result is a data set, though, and you read it back into a free RapidMiner Studio, the row limit would still apply.
And no, the repository access does not count against the API limit. Only web service calls do.
Cheers,
Ingo
Hi Ingo, hi Marco,
thanks for your replies! That answers all my questions.
So basically, if you just need the former Collaboration Tier, the Free Server will do, unless you train overly complex models or otherwise create big IOObjects. The real value of course only comes in when you can also execute background and heavy-duty jobs on the server, so the limit of the Free edition will quickly be reached...
Cheers,
~Marius