"How to loop through all values in a nominal set"

macaron Member Posts: 3 Contributor I
edited June 2019 in Help
Hi everyone.

The Loop Values operator's wiki states throughout that it iterates over "all the possible values of the selected attribute".

In view of this, I expected the process
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="generate_data_user_specification" compatibility="5.3.008" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="45" y="75">
       <list key="attribute_values">
         <parameter key="a" value="&quot;hello&quot;"/>
       </list>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="add" compatibility="5.3.008" expanded="true" height="76" name="Add" width="90" x="179" y="75">
       <parameter key="attribute_name" value="a"/>
       <parameter key="new_value" value="world"/>
     </operator>
     <operator activated="true" class="loop_values" compatibility="5.3.008" expanded="true" height="76" name="Loop Values" width="90" x="313" y="75">
       <parameter key="attribute" value="a"/>
       <process expanded="true">
         <operator activated="true" class="print_to_console" compatibility="5.3.008" expanded="true" height="76" name="Print to Console" width="90" x="112" y="30">
           <parameter key="log_value" value="%{loop_value}"/>
         </operator>
         <connect from_port="example set" to_op="Print to Console" to_port="through 1"/>
         <connect from_op="Print to Console" from_port="through 1" to_port="out 1"/>
         <portSpacing port="source_example set" spacing="0"/>
         <portSpacing port="sink_out 1" spacing="0"/>
         <portSpacing port="sink_out 2" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Generate Data by User Specification" from_port="output" to_op="Add" to_port="example set input"/>
     <connect from_op="Add" from_port="example set output" to_op="Loop Values" to_port="example set"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
   </process>
 </operator>
</process>
to print to the console:
hello
world
But it only prints
hello
Is there a way in which I can iterate over the entire nominal set of the attribute, even those values that do not appear in the particular data currently provided?

Regards,
Kurk

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Kurk,

    Unfortunately that is not possible. I think we even changed the behaviour deliberately so that unused values are ignored, because that is what the average user expects.
    The only workaround is probably some Groovy scripting with the Execute Script operator: first create a data set that contains all values in the mapping, and then loop over that data set.

    Best regards,
    Marius
  • macaron Member Posts: 3 Contributor I
    Thank you Marius. Would it be a big change to implement a boolean parameter "include unused values" for the Loop Values operator?
    Some nominal sets are inherently defined (e.g. weekdays, months), and in some cases the processing structure must not be altered by the data that happens to pass through it.

    It could be an "advanced parameter" so as not to bother the average user. Do you think this would be a reasonable request or is it too custom?

    Regards,
    Kurk
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    That's probably too specific. We actually want to hide the nominal mapping from users rather than expose it, because it is only an internal implementation detail and could in theory vanish completely one day.

    We run into such problems as well. In that case we would again create a data set that contains nothing but the loop values (e.g. days of the week), loop over that data set, and perform the necessary steps on the other data set.

    Best regards,
    Marius
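
The workaround described above (keep the loop values in their own data set and iterate over that instead of the values observed in the data) can be sketched outside RapidMiner in plain Python. This is only an illustration of the idea, not RapidMiner API: `WEEKDAYS`, `loop_over_all_values`, the `day`/`sales` columns, and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fixed value set, playing the role of the separate
# "loop values" data set (e.g. days of the week).
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def loop_over_all_values(loop_values, rows, attribute):
    """Return the matching rows for every loop value, including
    values that never occur in `rows` (they map to an empty list)."""
    by_value = defaultdict(list)
    for row in rows:
        by_value[row[attribute]].append(row)
    # Iterate over the fixed list, not over the values seen in the data.
    return {value: by_value.get(value, []) for value in loop_values}


# Hypothetical data that only contains two of the seven weekdays.
rows = [
    {"day": "Mon", "sales": 10},
    {"day": "Wed", "sales": 7},
    {"day": "Mon", "sales": 3},
]

result = loop_over_all_values(WEEKDAYS, rows, "day")
for day, matching in result.items():
    # Every weekday is visited; days without data sum to 0.
    print(day, sum(r["sales"] for r in matching))
```

Because the loop is driven by the fixed list, the processing structure stays the same regardless of which values happen to appear in the data.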
  • macaron Member Posts: 3 Contributor I
    Ok. Thanks for taking the time to explain.

    Just wondering: Why is there an Add operator then? For what could it be used if the set of possible nominal values is supposed to be hidden from the user?

    Thanks,
    Kurk
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    To be honest, until last week I had no idea that it even existed, and I have never used it. It's probably there for compatibility reasons with previous versions.

    Best regards,
    Marius