"Loop over datasets in repository?"
Dear All,
How to execute the same process for different datasets in your repository?
I can't figure out how to use the "Loop Repository" operator.
Best regards,
Wessel
Answers
The following process shows the basic usage: it loops over the example sets in the samples directory and delivers them as a collection (of course, you could do anything else in the loop than just deliver the data). In addition, it collects all data set sizes with a logging operator, which demonstrates the usage of the predefined macros in the loop.
Cheers,
Ingo
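A minimal sketch of the basic loop described above (not the originally attached process). It assumes that each repository entry is handed to the subprocess on the inner "repository object" port and that whatever reaches "out 1" is gathered into a collection. The operator class, parameter keys, and port names are taken from the process XML posted later in this thread; layout attributes are dropped, and the sketch has not been run.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.017">
  <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
    <process expanded="true">
      <!-- Visit every IOObject stored in the sample data folder -->
      <operator activated="true" class="loop_repository" compatibility="5.1.017" expanded="true" name="Loop Repository">
        <parameter key="repository_folder" value="//Samples/data/"/>
        <parameter key="entry_type" value="IOObject"/>
        <process expanded="true">
          <!-- Assumption: the current entry arrives on the inner "repository object" port -->
          <connect from_port="repository object" to_port="out 1"/>
        </process>
      </operator>
      <!-- Pass the loop output on as the process result -->
      <connect from_op="Loop Repository" from_port="out 1" to_port="result 1"/>
    </process>
  </operator>
</process>
```

Any operator chain can be placed between the "repository object" port and "out 1" to process each data set before it is delivered.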
Thanks a lot.
This works like a charm.
Unfortunately it gives both warnings and errors:
- Expected ExampleSet but received IOObject.
- Meta data is underspecified. Cannot check precondition.
I use this in a process and get these errors more than 20 times.
This is a bit of a bummer because now I can't see other actually important errors.
Best regards,
Wessel
edit: I think the trick is to pass the meta data from the first dataset in the folder.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.017">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
    <process expanded="true" height="432" width="1043">
      <operator activated="true" class="loop_repository" compatibility="5.1.017" expanded="true" height="76" name="Loop Repository" width="90" x="179" y="184">
        <parameter key="repository_folder" value="//Samples/data/"/>
        <parameter key="entry_type" value="IOObject"/>
        <parameter key="entry_name_macro" value="Golf"/>
        <process expanded="true" height="432" width="705">
          <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/%{Golf}"/>
          </operator>
          <operator activated="true" class="remove_useless_attributes" compatibility="5.1.017" expanded="true" height="76" name="Remove Useless Attributes" width="90" x="180" y="30"/>
          <operator activated="true" class="extract_macro" compatibility="5.1.017" expanded="true" height="60" name="Extract Macro" width="90" x="315" y="30">
            <parameter key="macro" value="size"/>
          </operator>
          <operator activated="true" class="provide_macro_as_log_value" compatibility="5.1.017" expanded="true" height="76" name="Provide Macro as Log Value (2)" width="90" x="450" y="30">
            <parameter key="macro_name" value="repository_path"/>
          </operator>
          <operator activated="true" class="log" compatibility="5.1.017" expanded="true" height="76" name="Log" width="90" x="585" y="30">
            <list key="log">
              <parameter key="Dataset" value="operator.Provide Macro as Log Value (2).value.macro_value"/>
              <parameter key="Size" value="operator.Extract Macro.value.macro_value"/>
            </list>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Remove Useless Attributes" to_port="example set input"/>
          <connect from_op="Remove Useless Attributes" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Provide Macro as Log Value (2)" to_port="through 1"/>
          <connect from_op="Provide Macro as Log Value (2)" from_port="through 1" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_port="out 1"/>
          <portSpacing port="source_repository object" spacing="0"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Loop Repository" from_port="out 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
By the way: those are not "errors" but "potential problems", as stated at the top of the "Problem" view. And indeed, the meta data is underspecified, so without actually executing the process it cannot be guaranteed that it will run :P
Cheers,
Ingo