RapidMiner

How to "loop" over a filter?

SOLVED
Super Contributor

Hey everybody,

I have a dataset containing hourly records for every day of a whole year. What I want to do is filter out each day separately. Doing that manually would be very tedious, as I would have to do it 365 times. Is there a way to loop over the filter somehow?

Thanks :)

19 REPLIES
Moderator

Re: How to "loop" over a filter?

Hey,

 

Loop Values would do the job. Our new Group Into Collection operator from the Operator Toolbox may be even better: it gives you a collection with one example set per day. You can then work through the days with Loop Collection.
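As a rough illustration of the two approaches named above, here is a minimal Python sketch (not RapidMiner code). The sample rows, the column names `Day`, `Hour` and `value`, and both helper functions are assumptions made up for this example. The first helper mimics grouping into a collection in a single pass; the second mimics looping over the distinct values and filtering each time.

```python
from collections import defaultdict

# Hypothetical hourly records; the column names and values are assumptions.
rows = [
    {"Day": "2016-01-01", "Hour": 0, "value": 1.0},
    {"Day": "2016-01-01", "Hour": 1, "value": 1.5},
    {"Day": "2016-01-02", "Hour": 0, "value": 2.0},
]


def group_into_collection(rows, key):
    """Split rows into one list per distinct value of `key` in a single pass."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def loop_values_with_filter(rows, key):
    """Build the same groups by filtering the full data once per distinct value."""
    values = sorted({row[key] for row in rows})
    return {v: [r for r in rows if r[key] == v] for v in values}


collection = group_into_collection(rows, "Day")

# "Loop Collection": handle each day's subset on its own.
for day, day_rows in collection.items():
    print(day, len(day_rows))
```

Both helpers return the same groups; they differ only in how often the full data is scanned.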

 

~Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Super Contributor

Thanks for your reply,

that sounds pretty good. But could you be more specific? I downloaded the Operator Toolbox, but as soon as I put the Group Into Collection operator inside the Loop Collection operator, the error message "Expected IOObjectCollection but received Examples set" occurs. Since I would call myself a newbie, I would be grateful if you could show me how to do this :-).

Regards
Philipp

Super Contributor

Besides that, is it possible to group by two attributes?

_______________________________________________
Okay, I solved this by putting another Group Into Collection operator inside the Loop Collection. Now a new problem has occurred: I can't join a collection with another dataset.

Moderator

Hi,

 

I currently cannot run your process, but I think you need to use an Append before the Join to get an example set again.

 

Edit: Regarding the two attributes: that's on our list to add. The Toolbox extension is a community-like extension, even though it is a RapidMiner-internal community :). For now you need to use Generate Attributes with concat to combine the two attributes into one.
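The workaround of concatenating two attributes into a single grouping key can be sketched in Python as follows. This is only a conceptual illustration, not RapidMiner code: the sample rows, the helper `add_combined_key`, the new attribute name `group_key` and the `_` separator are assumptions; `Day` and `stid` are attribute names taken from the process in this thread.

```python
from collections import defaultdict

# Hypothetical sample rows; only the attribute names Day and stid come from the thread.
rows = [
    {"Day": "Mon", "stid": "A", "price": 1.30},
    {"Day": "Mon", "stid": "B", "price": 1.32},
    {"Day": "Mon", "stid": "A", "price": 1.31},
    {"Day": "Tue", "stid": "A", "price": 1.29},
]


def add_combined_key(rows, first, second, name="group_key", sep="_"):
    """Return copies of the rows with one extra attribute joining two values."""
    return [{**row, name: f"{row[first]}{sep}{row[second]}"} for row in rows]


keyed = add_combined_key(rows, "Day", "stid")

# Grouping by the single combined attribute is equivalent to grouping by both.
groups = defaultdict(list)
for row in keyed:
    groups[row["group_key"]].append(row)

print(sorted(groups))
```

A separator that cannot occur in either attribute keeps distinct pairs from collapsing into the same key.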

 

Loop Values with Filter Examples is, by the way, also a viable option, but slightly slower in execution time.

~Martin

Super Contributor

Oh, okay. That's maybe because I have so many datasets etc.

But if I append now, I get the same result as before. What I want to do is join every example set in the collection (in this case e.g. 365) with another example set (which contains e.g. the names of the days of the week). So Append wouldn't be an option, would it?

Moderator

I think you need to get the data set into your Loop Collection using Remember and Recall. See the attached process.

 

With 1-2 more operators we could use a usual Loop with a Select operator. The standard Loop has an additional input and works in parallel. There are quite a few options here :).

 

Best,

Martin

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="85">
        <parameter key="csv_file" value="/Users/Philipp/Desktop/Tank_Muenster.csv"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="UTF-8"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="brand.true.polynominal.attribute"/>
          <parameter key="1" value="name.true.polynominal.attribute"/>
          <parameter key="2" value="Day.true.polynominal.attribute"/>
          <parameter key="3" value="Time.true.polynominal.attribute"/>
          <parameter key="4" value="street.true.polynominal.attribute"/>
          <parameter key="5" value="lat.true.real.attribute"/>
          <parameter key="6" value="lng.true.real.attribute"/>
          <parameter key="7" value="place.true.polynominal.attribute"/>
          <parameter key="8" value="post_code.true.integer.attribute"/>
          <parameter key="9" value="Benzin e5 in ¨.true.polynominal.attribute"/>
          <parameter key="10" value="Diesel in ¨.true.polynominal.attribute"/>
          <parameter key="11" value="stid.true.polynominal.attribute"/>
          <parameter key="12" value="TagdW.true.polynominal.attribute"/>
          <parameter key="13" value="Feiertag.true.polynominal.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="nominal_to_date" compatibility="7.3.001" expanded="true" height="82" name="Nominal to Date" width="90" x="179" y="85">
        <parameter key="attribute_name" value="Time"/>
        <parameter key="date_type" value="time"/>
        <parameter key="date_format" value="h:mm a"/>
        <parameter key="locale" value="German (Germany)"/>
      </operator>
      <operator activated="true" class="date_to_numerical" compatibility="7.3.001" expanded="true" height="82" name="Date to Numerical (2)" width="90" x="313" y="85">
        <parameter key="attribute_name" value="Time"/>
        <parameter key="time_unit" value="minute"/>
        <parameter key="minute_relative_to" value="day"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="85">
        <list key="function_descriptions">
          <parameter key="Grid" value="if(Time&gt;0&amp;&amp;Time&lt;=15,15,&#10;if(Time&gt;15&amp;&amp;Time&lt;=30,30,&#10;if(Time&gt;30&amp;&amp;Time&lt;=45,45,&#10;if(Time&gt;45&amp;&amp;Time&lt;=60,60,&#10;if(Time&gt;60&amp;&amp;Time&lt;=75,75,&#10;if(Time&gt;75&amp;&amp;Time&lt;=90,90,&#10;if(Time&gt;90&amp;&amp;Time&lt;=105,105,&#10;if(Time&gt;105&amp;&amp;Time&lt;=120,120,&#10;if(Time&gt;120&amp;&amp;Time&lt;=135,135,&#10;if(Time&gt;135&amp;&amp;Time&lt;=150,150,&#10;if(Time&gt;150&amp;&amp;Time&lt;=165,165,&#10;if(Time&gt;165&amp;&amp;Time&lt;=180,180,&#10;if(Time&gt;180&amp;&amp;Time&lt;=195,195,&#10;if(Time&gt;195&amp;&amp;Time&lt;=210,210,&#10;if(Time&gt;210&amp;&amp;Time&lt;=225,225,&#10;if(Time&gt;225&amp;&amp;Time&lt;=240,240,&#10;if(Time&gt;240&amp;&amp;Time&lt;=255,255,&#10;if(Time&gt;255&amp;&amp;Time&lt;=270,270,&#10;if(Time&gt;270&amp;&amp;Time&lt;=285,285,&#10;if(Time&gt;285&amp;&amp;Time&lt;=300,300,&#10;if(Time&gt;300&amp;&amp;Time&lt;=315,315,&#10;if(Time&gt;315&amp;&amp;Time&lt;=330,330,&#10;if(Time&gt;330&amp;&amp;Time&lt;=345,345,&#10;if(Time&gt;345&amp;&amp;Time&lt;=360,360,&#10;if(Time&gt;360&amp;&amp;Time&lt;=375,375,&#10;if(Time&gt;375&amp;&amp;Time&lt;=390,390,&#10;if(Time&gt;390&amp;&amp;Time&lt;=405,405,&#10;if(Time&gt;405&amp;&amp;Time&lt;=420,420,&#10;if(Time&gt;420&amp;&amp;Time&lt;=435,435,&#10;if(Time&gt;435&amp;&amp;Time&lt;=450,450,&#10;if(Time&gt;450&amp;&amp;Time&lt;=465,465,&#10;if(Time&gt;465&amp;&amp;Time&lt;=480,480,&#10;if(Time&gt;480&amp;&amp;Time&lt;=495,495,&#10;if(Time&gt;495&amp;&amp;Time&lt;=510,510,&#10;if(Time&gt;510&amp;&amp;Time&lt;=525,525,&#10;if(Time&gt;525&amp;&amp;Time&lt;=540,540,&#10;if(Time&gt;540&amp;&amp;Time&lt;=555,555,&#10;if(Time&gt;555&amp;&amp;Time&lt;=570,570,&#10;if(Time&gt;570&amp;&amp;Time&lt;=585,585,&#10;if(Time&gt;585&amp;&amp;Time&lt;=600,600,&#10;if(Time&gt;600&amp;&amp;Time&lt;=615,615,&#10;if(Time&gt;615&amp;&amp;Time&lt;=630,630,&#10;if(Time&gt;630&amp;&amp;Time&lt;=645,645,&#10;if(
Time&gt;645&amp;&amp;Time&lt;=660,660,&#10;if(Time&gt;660&amp;&amp;Time&lt;=675,675,&#10;if(Time&gt;675&amp;&amp;Time&lt;=690,690,&#10;if(Time&gt;690&amp;&amp;Time&lt;=705,705,&#10;if(Time&gt;705&amp;&amp;Time&lt;=720,720,&#10;if(Time&gt;720&amp;&amp;Time&lt;=735,735,&#10;if(Time&gt;735&amp;&amp;Time&lt;=750,750,&#10;if(Time&gt;750&amp;&amp;Time&lt;=765,765,&#10;if(Time&gt;765&amp;&amp;Time&lt;=780,780,&#10;if(Time&gt;780&amp;&amp;Time&lt;=795,795,&#10;if(Time&gt;795&amp;&amp;Time&lt;=810,810,&#10;if(Time&gt;810&amp;&amp;Time&lt;=825,825,&#10;if(Time&gt;825&amp;&amp;Time&lt;=840,840,&#10;if(Time&gt;840&amp;&amp;Time&lt;=855,855,&#10;if(Time&gt;855&amp;&amp;Time&lt;=870,870,&#10;if(Time&gt;870&amp;&amp;Time&lt;=885,885,&#10;if(Time&gt;885&amp;&amp;Time&lt;=900,900,&#10;if(Time&gt;900&amp;&amp;Time&lt;=915,915,&#10;if(Time&gt;915&amp;&amp;Time&lt;=930,930,&#10;if(Time&gt;930&amp;&amp;Time&lt;=945,945,&#10;if(Time&gt;945&amp;&amp;Time&lt;=960,960,&#10;if(Time&gt;960&amp;&amp;Time&lt;=975,975,&#10;if(Time&gt;975&amp;&amp;Time&lt;=990,990,&#10;if(Time&gt;990&amp;&amp;Time&lt;=1005,1005,&#10;if(Time&gt;1005&amp;&amp;Time&lt;=1020,1020,&#10;if(Time&gt;1020&amp;&amp;Time&lt;=1035,1035,&#10;if(Time&gt;1035&amp;&amp;Time&lt;=1050,1050,&#10;if(Time&gt;1050&amp;&amp;Time&lt;=1065,1065,&#10;if(Time&gt;1065&amp;&amp;Time&lt;=1080,1080,&#10;if(Time&gt;1080&amp;&amp;Time&lt;=1095,1095,&#10;if(Time&gt;1095&amp;&amp;Time&lt;=1110,1110,&#10;if(Time&gt;1110&amp;&amp;Time&lt;=1125,1125,&#10;if(Time&gt;1125&amp;&amp;Time&lt;=1140,1140,&#10;if(Time&gt;1140&amp;&amp;Time&lt;=1155,1155,&#10;if(Time&gt;1155&amp;&amp;Time&lt;=1170,1170,&#10;if(Time&gt;1170&amp;&amp;Time&lt;=1185,1185,&#10;if(Time&gt;1185&amp;&amp;Time&lt;=1200,1200,&#10;if(Time&gt;1200&amp;&amp;Time&lt;=1215,1215,&#10;if(Time&gt;1215&amp;&amp;Time&lt;=1230,1230,&#10;if(Time&gt;1230&amp;&amp;Time&lt;=1245,1245,&#10;if(Time&gt;1245&amp;&amp;Time&lt;=1260,1260,&#10;if(Time&gt;1260&amp;&amp;Time&lt;=1275,1275,&#10;if(Time&gt;1275
&amp;&amp;Time&lt;=1290,1290,&#10;if(Time&gt;1290&amp;&amp;Time&lt;=1305,1305,&#10;if(Time&gt;1305&amp;&amp;Time&lt;=1320,1320,&#10;if(Time&gt;1320&amp;&amp;Time&lt;=1335,1335,&#10;if(Time&gt;1335&amp;&amp;Time&lt;=1350,1350,&#10;if(Time&gt;1350&amp;&amp;Time&lt;=1365,1365,&#10;if(Time&gt;1365&amp;&amp;Time&lt;=1380,1380,&#10;if(Time&gt;1380&amp;&amp;Time&lt;=1395,1395,&#10;if(Time&gt;1395&amp;&amp;Time&lt;=1410,1410,&#10;if(Time&gt;1410&amp;&amp;Time&lt;=1425,1425,&#10;if(Time&gt;1425&amp;&amp;Time&lt;=1440,1440,666))))&#10;))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))"/>
        </list>
      </operator>
      <operator activated="true" class="numerical_to_real" compatibility="7.3.001" expanded="true" height="82" name="Numerical to Real" width="90" x="313" y="238">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Grid"/>
      </operator>
      <operator activated="true" class="read_excel" compatibility="7.3.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="391">
        <parameter key="excel_file" value="/Users/Philipp/Desktop/Zeit_.xlsx"/>
        <parameter key="imported_cell_range" value="A1:B97"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Time.true.time.attribute"/>
          <parameter key="1" value="Timegrid.true.time.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="date_to_numerical" compatibility="7.3.001" expanded="true" height="82" name="Date to Numerical" width="90" x="179" y="391">
        <parameter key="attribute_name" value="Time"/>
        <parameter key="time_unit" value="minute"/>
        <parameter key="minute_relative_to" value="day"/>
      </operator>
      <operator activated="true" class="date_to_numerical" compatibility="7.3.001" expanded="true" height="82" name="Date to Numerical (3)" width="90" x="313" y="391">
        <parameter key="attribute_name" value="Timegrid"/>
        <parameter key="time_unit" value="minute"/>
        <parameter key="minute_relative_to" value="day"/>
      </operator>
      <operator activated="true" class="numerical_to_real" compatibility="7.3.001" expanded="true" height="82" name="Numerical to Real (2)" width="90" x="447" y="391"/>
      <operator activated="true" class="remember" compatibility="7.3.001" expanded="true" height="68" name="Remember" width="90" x="581" y="391">
        <parameter key="name" value="data"/>
      </operator>
      <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="0.1.000" expanded="true" height="82" name="Group Into Collection (2)" width="90" x="447" y="238">
        <parameter key="group_by_attribute" value="Day"/>
      </operator>
      <operator activated="true" class="delay" compatibility="7.3.001" expanded="true" height="103" name="Delay" width="90" x="648" y="238">
        <parameter key="delay" value="none"/>
        <description align="center" color="transparent" colored="false" width="126">Just to ensure execution order</description>
      </operator>
      <operator activated="true" class="loop_collection" compatibility="7.3.001" expanded="true" height="82" name="Loop Collection" width="90" x="782" y="238">
        <process expanded="true">
          <operator activated="false" class="operator_toolbox:group_into_collection" compatibility="0.1.000" expanded="true" height="82" name="Group Into Collection" width="90" x="112" y="238">
            <parameter key="group_by_attribute" value="stid"/>
          </operator>
          <operator activated="true" class="recall" compatibility="7.3.001" expanded="true" height="68" name="Recall" width="90" x="112" y="85">
            <parameter key="name" value="data"/>
          </operator>
          <operator activated="true" class="join" compatibility="7.3.001" expanded="true" height="82" name="Join" width="90" x="246" y="34">
            <parameter key="remove_double_attributes" value="false"/>
            <parameter key="join_type" value="right"/>
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="Grid" value="Time"/>
            </list>
          </operator>
          <connect from_port="single" to_op="Join" to_port="left"/>
          <connect from_op="Recall" from_port="result" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_port="output 1"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
      <connect from_op="Nominal to Date" from_port="example set output" to_op="Date to Numerical (2)" to_port="example set input"/>
      <connect from_op="Date to Numerical (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Numerical to Real" to_port="example set input"/>
      <connect from_op="Numerical to Real" from_port="example set output" to_op="Group Into Collection (2)" to_port="exa"/>
      <connect from_op="Read Excel" from_port="output" to_op="Date to Numerical" to_port="example set input"/>
      <connect from_op="Date to Numerical" from_port="example set output" to_op="Date to Numerical (3)" to_port="example set input"/>
      <connect from_op="Date to Numerical (3)" from_port="example set output" to_op="Numerical to Real (2)" to_port="example set input"/>
      <connect from_op="Numerical to Real (2)" from_port="example set output" to_op="Remember" to_port="store"/>
      <connect from_op="Remember" from_port="stored" to_op="Delay" to_port="through 2"/>
      <connect from_op="Group Into Collection (2)" from_port="col" to_op="Delay" to_port="through 1"/>
      <connect from_op="Delay" from_port="through 1" to_op="Loop Collection" to_port="collection"/>
      <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
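The long nested if() chain in the Generate Attributes operator above rounds Time (minutes since midnight) up to the next 15-minute mark and returns 666 for anything outside 1 to 1440. As a hedged aside, the Python sketch below shows that this is the same as a ceiling computation. The function names are made up for this illustration, and whether the same shortcut can be written in the RapidMiner expression editor is not verified here.

```python
import math


def grid_nested(time_min):
    """Mirror of the nested if() chain: first matching 15-minute bucket, else 666."""
    for upper in range(15, 1441, 15):
        if upper - 15 < time_min <= upper:
            return upper
    return 666


def grid_ceil(time_min):
    """Round up to the next multiple of 15 inside (0, 1440], else 666."""
    if 0 < time_min <= 1440:
        return math.ceil(time_min / 15) * 15
    return 666


# Compare both versions over all whole minutes of a day plus a margin.
mismatches = [t for t in range(-5, 1500) if grid_nested(t) != grid_ceil(t)]
print(len(mismatches))
```

Note that a Time of 0 falls through to 666 in both versions, just as in the original expression.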
Super Contributor

I think we are near the finish line. Thank you for your process; it looks like it can work. But one error message occurs in the Recall operator: "no object with name data was found", even though we set the name to "data" in the Remember operator.

Moderator

Could you check whether the Remember operator is executed before the Recall operator?

 

See: http://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Change-the-Execution-Order-of-Pr...

 

 

 

Best,

Martin

Super Contributor

Thanks for your fast response.

According to RapidMiner it is definitely executed before the recall.