
Feature Request: Loop Repository without retrieving any files

christos_karras Member Posts: 50 Guru
The Loop Repository operator provides, in its inner subprocess, a "rep" input that contains the repository entry already loaded into memory. This causes unnecessary delays for our use case. Conditions inside the inner subprocess decide which entries actually need to be loaded, and only a minority of them are needed. We then retrieve only those entries with the "Retrieve" operator and the %{repository_path} macro. The available filtering options, based on regular expressions, are not adequate here because the decision depends on a lookup against another example set.

Even though our process does not use the "rep" input, RapidMiner still loads every matched repository entry into memory, so a process that should run in a few seconds instead takes 30 to 60 minutes.

I would like to request an option to "disable automatic loading of repository entries". This could be an explicit option (a checkbox), or RapidMiner could detect automatically that entries should not be loaded when nothing is connected to the "rep" input.
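To illustrate why lazy loading would help, here is a minimal Python sketch contrasting the current eager behavior with the requested one. It does not use any RapidMiner API: the repository is simulated in memory, and the names `SimulatedRepository`, `loop_repository`, `run`, and the sample paths are hypothetical. In lazy mode each iteration receives only the path plus a loader function, so an entry is read only when the inner logic actually asks for it.

```python
from typing import Callable, Dict, Iterator, Set, Tuple


class SimulatedRepository:
    """Hypothetical in-memory stand-in for a repository (not a RapidMiner API)."""

    def __init__(self, entries: Dict[str, str]) -> None:
        self.entries = entries
        self.loads = 0  # how many entries were actually read into memory

    def load(self, path: str) -> str:
        # Simulates the expensive step of loading one entry into memory.
        self.loads += 1
        return self.entries[path]


def loop_repository(
    repo: SimulatedRepository, lazy: bool
) -> Iterator[Tuple[str, Callable[[], str]]]:
    """Yields (repository_path, loader) pairs, one per matched entry.

    Eager mode loads every entry before the inner step sees it (the behavior
    described in the post); lazy mode defers loading until loader() is called.
    """
    for path in repo.entries:
        if lazy:
            yield path, (lambda p=path: repo.load(p))
        else:
            data = repo.load(path)
            yield path, (lambda d=data: d)


def run(repo: SimulatedRepository, lazy: bool, needed: Set[str]) -> Dict[str, str]:
    """Inner step: keeps an entry only if a lookup says it is needed."""
    results: Dict[str, str] = {}
    for path, loader in loop_repository(repo, lazy):
        if path in needed:  # stands in for the lookup on another example set
            results[path] = loader()
    return results


entries = {f"//Local/data/entry_{i}": f"payload {i}" for i in range(100)}
needed = {"//Local/data/entry_7"}

eager_repo = SimulatedRepository(entries)
run(eager_repo, lazy=False, needed=needed)
print(eager_repo.loads)  # 100

lazy_repo = SimulatedRepository(entries)
run(lazy_repo, lazy=True, needed=needed)
print(lazy_repo.loads)  # 1
```

Both modes return the same result; only the number of loads differs, which is where the time goes when most entries are discarded.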





Thanks




Answers

    christos_karras Member Posts: 50 Guru
    Hi @mschmitz,

    Yes, that's probably even better. The resulting ExampleSet would need to have the same attributes that the Loop Repository operator provides as macros:
    * entry_name
    * repository_path
    * parent_folder

    I would probably use Loop Examples instead of Loop Values, because in each iteration I would need to access several attributes at once, for example both entry_name and repository_path.

    If it's fast to compute at the same time (and the information is available), I also suggest adding a column with the last-modified timestamp of each repository entry.
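The proposed workflow (list entries as an ExampleSet, filter it, then retrieve only what is needed) can be sketched in plain Python. This is not a RapidMiner API: the `EntryRow` type, the `filter_by_lookup` helper, the `last_modified` field name, and the sample paths are all hypothetical. The rows mirror the macros listed above plus the suggested timestamp, and the filter is a simple membership lookup, so no entry is loaded until the final loop would call Retrieve on its path.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Set


@dataclass(frozen=True)
class EntryRow:
    """One row of the hypothetical 'list repository entries' ExampleSet."""
    entry_name: str
    repository_path: str
    parent_folder: str
    last_modified: datetime  # suggested extra column; the name is an assumption


def filter_by_lookup(rows: Iterable[EntryRow], lookup_names: Set[str]) -> List[EntryRow]:
    """Keeps only rows whose entry_name appears in the lookup example set.

    This is metadata-only work: no repository entry is loaded here.
    """
    return [row for row in rows if row.entry_name in lookup_names]


listing = [
    EntryRow("sales_2023", "//Local/data/sales_2023", "//Local/data", datetime(2023, 12, 31)),
    EntryRow("sales_2024", "//Local/data/sales_2024", "//Local/data", datetime(2024, 6, 30)),
    EntryRow("scratch", "//Local/tmp/scratch", "//Local/tmp", datetime(2024, 1, 2)),
]
lookup = {"sales_2024"}

# Analogous to Loop Examples: each iteration sees several attributes at once,
# e.g. entry_name and repository_path, and would pass the path to Retrieve.
for row in filter_by_lookup(listing, lookup):
    print(row.entry_name, row.repository_path)  # sales_2024 //Local/data/sales_2024
```

Iterating over whole rows rather than a single attribute is what makes Loop Examples a better fit than Loop Values for this case.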

    Thanks