Feature Request: Loop Repository without retrieving any files
christos_karras
Member Posts: 50 Guru
in Help
Inside its inner subprocess, the Loop Repository operator exposes a "rep" input that delivers the current repository entry already loaded in memory. This causes unnecessary delays in our use cases, because we have additional conditions inside the inner subprocess that decide which entries actually need to be loaded, and only a minority of them are needed. We then retrieve the entries we really need using the "Retrieve" operator and the %{repository_path} macro. The available filtering options, which are based on regular expressions, are not adequate for our use case because the decision depends on a lookup in another example set.
Even though our process is not using the "rep" input, RapidMiner still loads each matched repository entry in memory, which causes a process that should take a few seconds to run to instead take 30-60 minutes.
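To make the cost difference concrete, here is a minimal sketch in plain Python. It is not RapidMiner code: `make_repository`, `eager_loop` and `lazy_loop` are hypothetical names, and the "repository" is only a stand-in that records which paths were loaded. It assumes the set of needed paths comes from a separate lookup, as in our process.

```python
def make_repository():
    """Return a toy load function plus a log of every path it loaded."""
    loaded = []

    def load(path):
        loaded.append(path)  # stands in for an expensive read
        return f"contents of {path}"

    return load, loaded


def eager_loop(paths, load, needed):
    """Mimics the current behaviour: every matched entry is loaded."""
    results = {}
    for path in paths:
        entry = load(path)  # paid for even when the entry is not needed
        if path in needed:
            results[path] = entry
    return results


def lazy_loop(paths, load, needed):
    """Mimics the requested behaviour: decide first, load only if needed."""
    results = {}
    for path in paths:
        if path in needed:
            results[path] = load(path)
    return results
```

Both loops return the same result, but the eager one performs one load per matched entry while the lazy one performs one load per needed entry.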
I would like to request an option to "disable automatic loading of repository entries". This could either be an explicit option (checkbox), or maybe RapidMiner could automatically detect we do not want to load entries if nothing is connected to the "rep" input.
Thanks
Best Answer
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi @christos_karras,
What I could offer in the relatively short term is an operator that gives you a list of objects with their types. You can combine this with a Loop Values operator in which you retrieve each object using Retrieve.
Would that cut it?
Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
Answers
Yes, that's probably even better. The resulting ExampleSet would need to have the same attributes that are provided as macros in the Loop Repository operator:
* entry_name
* repository_path
* parent_folder
I would probably use Loop Examples instead of Loop Values because I would need to access, for example, both the entry_name and repository_path in each iteration of the loop.
If it's fast to do at the same time (and if the information is available), I suggest also adding a column with the Last Modified Timestamp for each repository entry.
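As a rough sketch of the shape I have in mind, here is a plain Python stand-in that lists entries without reading their contents. It is not the proposed operator: it walks an ordinary file system folder instead of a repository, and `list_entries`, `select_entries` and the `last_modified` column name are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def list_entries(root):
    """Return one row per file under root, without opening any file."""
    rows = []
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            rows.append({
                "entry_name": p.name,
                "repository_path": p.as_posix(),
                "parent_folder": p.parent.as_posix(),
                # Only file metadata is read here, not the content.
                "last_modified": datetime.fromtimestamp(p.stat().st_mtime),
            })
    return rows


def select_entries(rows, wanted_names):
    """Keep only the rows whose entry_name appears in the lookup set."""
    return [row for row in rows if row["entry_name"] in wanted_names]
```

Each row carries both entry_name and repository_path, which is what a row-by-row loop needs in order to decide on the name and then retrieve by path.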
Thanks