How to "loop" over a filter?

eldenoso Member Posts: 65 Contributor I
edited November 2018 in Help

Hey everybody,

I have a dataset containing hourly records for every day of a whole year. What I want to do is filter out each day separately. Doing that manually would be very tedious, since I would have to do it 365 times. Is there a way to loop this?

Thanks :)

Best Answer

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Solution Accepted

    I think you need to get the data set into your Loop Collection using Remember/Recall. See the attached process.

     

    With one or two more operators, we could also use a regular Loop with a Select operator. The standard Loop has an additional input and can work in parallel. There are quite a few options here :).
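The Remember/Recall pattern in the attached process can be sketched in plain Python. The idea is to store the time-grid table once, outside the loop. Then the loop splits the readings into one set per day, recalls the grid in every iteration, and right-joins each day on Grid = Time so that every grid slot is kept. This is only an illustration under assumptions: the function names, the `store` dict and the sample rows are hypothetical, and unmatched slots are filled with `None` here.

```python
from collections import defaultdict

def group_into_collection(rows, key):
    """Split rows into one list per distinct value of `key` (first-seen order)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return list(groups.values())

def right_join(left, right, left_key, right_key):
    """Keep every row of `right`; attach matching `left` rows, or None-filled columns."""
    index = defaultdict(list)
    for row in left:
        index[row[left_key]].append(row)
    left_cols = list(left[0]) if left else []
    joined = []
    for r in right:
        matches = index.get(r[right_key]) or [dict.fromkeys(left_cols)]
        for l in matches:
            joined.append({**l, **r})
    return joined

# Hypothetical "Remember": the time grid is stored once, outside the loop.
store = {"data": [{"Time": t} for t in (15, 30, 45)]}

# Hypothetical readings; Grid is the 15-minute slot of each reading.
readings = [
    {"Day": "Mon", "Grid": 15, "Diesel": 1.10},
    {"Day": "Mon", "Grid": 45, "Diesel": 1.12},
    {"Day": "Tue", "Grid": 30, "Diesel": 1.08},
]

results = []
for day_rows in group_into_collection(readings, "Day"):  # like Loop Collection
    grid = store["data"]  # like Recall, with the object kept in the store
    results.append(right_join(day_rows, grid, "Grid", "Time"))
```

Each element of `results` holds one day with all grid slots. This matches the idea of joining every day's example set with the recalled grid inside the loop.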

     

    Best,

    Martin

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="85">
    <parameter key="csv_file" value="/Users/Philipp/Desktop/Tank_Muenster.csv"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="UTF-8"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="brand.true.polynominal.attribute"/>
    <parameter key="1" value="name.true.polynominal.attribute"/>
    <parameter key="2" value="Day.true.polynominal.attribute"/>
    <parameter key="3" value="Time.true.polynominal.attribute"/>
    <parameter key="4" value="street.true.polynominal.attribute"/>
    <parameter key="5" value="lat.true.real.attribute"/>
    <parameter key="6" value="lng.true.real.attribute"/>
    <parameter key="7" value="place.true.polynominal.attribute"/>
    <parameter key="8" value="post_code.true.integer.attribute"/>
    <parameter key="9" value="Benzin e5 in €.true.polynominal.attribute"/>
    <parameter key="10" value="Diesel in €.true.polynominal.attribute"/>
    <parameter key="11" value="stid.true.polynominal.attribute"/>
    <parameter key="12" value="TagdW.true.polynominal.attribute"/>
    <parameter key="13" value="Feiertag.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="nominal_to_date" compatibility="7.3.001" expanded="true" height="82" name="Nominal to Date" width="90" x="179" y="85">
    <parameter key="attribute_name" value="Time"/>
    <parameter key="date_type" value="time"/>
    <parameter key="date_format" value="h:mm a"/>
    <parameter key="locale" value="German (Germany)"/>
    </operator>
    <operator activated="true" class="date_to_numerical" compatibility="7.3.001" expanded="true" height="82" name="Date to Numerical (2)" width="90" x="313" y="85">
    <parameter key="attribute_name" value="Time"/>
    <parameter key="time_unit" value="minute"/>
    <parameter key="minute_relative_to" value="day"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="85">
    <list key="function_descriptions">
    <parameter key="Grid" value="if(Time&gt;0&amp;&amp;Time&lt;=1440,ceil(Time/15)*15,666)"/>
    </list>
    </operator>
    <operator activated="true" class="numerical_to_real" compatibility="7.3.001" expanded="true" height="82" name="Numerical to Real" width="90" x="313" y="238">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Grid"/>
    </operator>
    <operator activated="true" class="read_excel" compatibility="7.3.001" expanded="true" height="68" name="Read Excel" width="90" x="45" y="391">
    <parameter key="excel_file" value="/Users/Philipp/Desktop/Zeit_.xlsx"/>
    <parameter key="imported_cell_range" value="A1:B97"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Time.true.time.attribute"/>
    <parameter key="1" value="Timegrid.true.time.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="date_to_numerical" compatibility="7.3.001" expanded="true" height="82" name="Date to Numerical" width="90" x="179" y="391">
    <parameter key="attribute_name" value="Time"/>
    <parameter key="time_unit" value="minute"/>
    <parameter key="minute_relative_to" value="day"/>
    </operator>
    <operator activated="true" class="date_to_numerical" compatibility="7.3.001" expanded="true" height="82" name="Date to Numerical (3)" width="90" x="313" y="391">
    <parameter key="attribute_name" value="Timegrid"/>
    <parameter key="time_unit" value="minute"/>
    <parameter key="minute_relative_to" value="day"/>
    </operator>
    <operator activated="true" class="numerical_to_real" compatibility="7.3.001" expanded="true" height="82" name="Numerical to Real (2)" width="90" x="447" y="391"/>
    <operator activated="true" class="remember" compatibility="7.3.001" expanded="true" height="68" name="Remember" width="90" x="581" y="391">
    <parameter key="name" value="data"/>
    </operator>
    <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="0.1.000" expanded="true" height="82" name="Group Into Collection (2)" width="90" x="447" y="238">
    <parameter key="group_by_attribute" value="Day"/>
    </operator>
    <operator activated="true" class="delay" compatibility="7.3.001" expanded="true" height="103" name="Delay" width="90" x="648" y="238">
    <parameter key="delay" value="none"/>
    <description align="center" color="transparent" colored="false" width="126">Just to ensure execution order</description>
    </operator>
    <operator activated="true" class="loop_collection" compatibility="7.3.001" expanded="true" height="82" name="Loop Collection" width="90" x="782" y="238">
    <process expanded="true">
    <operator activated="false" class="operator_toolbox:group_into_collection" compatibility="0.1.000" expanded="true" height="82" name="Group Into Collection" width="90" x="112" y="238">
    <parameter key="group_by_attribute" value="stid"/>
    </operator>
    <operator activated="true" class="recall" compatibility="7.3.001" expanded="true" height="68" name="Recall" width="90" x="112" y="85">
    <parameter key="name" value="data"/>
    <parameter key="remove_from_store" value="false"/>
    </operator>
    <operator activated="true" class="join" compatibility="7.3.001" expanded="true" height="82" name="Join" width="90" x="246" y="34">
    <parameter key="remove_double_attributes" value="false"/>
    <parameter key="join_type" value="right"/>
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="Grid" value="Time"/>
    </list>
    </operator>
    <connect from_port="single" to_op="Join" to_port="left"/>
    <connect from_op="Recall" from_port="result" to_op="Join" to_port="right"/>
    <connect from_op="Join" from_port="join" to_port="output 1"/>
    <portSpacing port="source_single" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
    <connect from_op="Nominal to Date" from_port="example set output" to_op="Date to Numerical (2)" to_port="example set input"/>
    <connect from_op="Date to Numerical (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Numerical to Real" to_port="example set input"/>
    <connect from_op="Numerical to Real" from_port="example set output" to_op="Group Into Collection (2)" to_port="exa"/>
    <connect from_op="Read Excel" from_port="output" to_op="Date to Numerical" to_port="example set input"/>
    <connect from_op="Date to Numerical" from_port="example set output" to_op="Date to Numerical (3)" to_port="example set input"/>
    <connect from_op="Date to Numerical (3)" from_port="example set output" to_op="Numerical to Real (2)" to_port="example set input"/>
    <connect from_op="Numerical to Real (2)" from_port="example set output" to_op="Remember" to_port="store"/>
    <connect from_op="Remember" from_port="stored" to_op="Delay" to_port="through 2"/>
    <connect from_op="Group Into Collection (2)" from_port="col" to_op="Delay" to_port="through 1"/>
    <connect from_op="Delay" from_port="through 1" to_op="Loop Collection" to_port="collection"/>
    <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
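The Grid attribute in Generate Attributes takes the minute of the day produced by Date to Numerical (minute, relative to day). It maps that minute to the upper bound of its 15-minute slot: (0, 15] becomes 15, (15, 30] becomes 30, and so on up to 1440, with 666 as the fallback. The Python sketch below mirrors the long nested if() chain and checks it against a single rounding step. If RapidMiner's ceil() behaves like Python's math.ceil, which is an assumption here, the chain could be shortened to `if(Time>0 && Time<=1440, ceil(Time/15)*15, 666)`. Both function names are hypothetical.

```python
import math

def grid_nested(time):
    """Mirror of the nested if() chain: (0,15]->15, ..., (1425,1440]->1440, else 666."""
    for upper in range(15, 1441, 15):
        if upper - 15 < time <= upper:
            return upper
    return 666

def grid_ceil(time):
    """Same mapping expressed as one rounding step up to the next multiple of 15."""
    return math.ceil(time / 15) * 15 if 0 < time <= 1440 else 666
```

Exhaustively comparing both functions over every minute of the day (plus a few values outside the range) is a cheap way to convince yourself that the two forms agree.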

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hey,

     

    Loop Values would do the job. Maybe our new Group Into Collection operator from the Operator Toolbox is even better: it gives you a collection with one example set per day. You can then iterate through the days with Loop Collection.
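Conceptually, Group Into Collection followed by Loop Collection works like this: one pass over the data builds one subset per distinct Day value, and the loop then processes each subset on its own. In the minimal Python sketch below, a per-day average of a price column stands in for "whatever you do inside the loop". The function name and sample rows are hypothetical and not part of the Operator Toolbox.

```python
from collections import defaultdict
from statistics import mean

def group_into_collection(rows, key):
    """Return {value: [rows with that value]}: one 'example set' per distinct key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

rows = [
    {"Day": "2016-01-01", "Hour": 0, "Diesel": 1.10},
    {"Day": "2016-01-01", "Hour": 1, "Diesel": 1.14},
    {"Day": "2016-01-02", "Hour": 0, "Diesel": 1.08},
]

collection = group_into_collection(rows, "Day")

# The "loop collection" body: here, one aggregate per day.
daily_mean = {day: mean(r["Diesel"] for r in day_rows)
              for day, day_rows in collection.items()}
```

With a full year of data, `collection` would hold 365 subsets, one per day.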

     

    ~Martin

  • eldenoso Member Posts: 65 Contributor I

    Thanks for your reply,

    That sounds pretty good, but could you be more specific? I downloaded the Operator Toolbox, but as soon as I put the Group Into Collection operator inside the Loop Collection operator, the error message "Expected IOObjectCollection but received Examples set" appears. Since I would call myself a newbie, I would be grateful if you could show me how to do this :-).

    Regards
    Philipp

  • eldenoso Member Posts: 65 Contributor I

    Besides that, is it possible to group by two attributes?

    _______________________________________________
    Okay, I solved this by putting another Group Into Collection operator into the Loop Collection?! Now the problem is that I can't join a collection with another data set.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    I currently cannot run your process, but I think you need to use an Append before the Join to get an example set again.
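Join expects a single example set, not a collection. Append therefore first concatenates all example sets of the collection back into one, and the Join then runs once on the combined data. Below is a minimal Python sketch of that order of operations, assuming every set in the collection has the same attributes. The weekday lookup table, the `inner_join` helper and the sample rows are hypothetical.

```python
from itertools import chain

# A "collection": one list of rows per day.
collection = [
    [{"Day": "2016-01-01", "Diesel": 1.10}, {"Day": "2016-01-01", "Diesel": 1.14}],
    [{"Day": "2016-01-02", "Diesel": 1.08}],
]
# Hypothetical second example set to join with.
weekday = [{"Day": "2016-01-01", "TagdW": "Fri"}, {"Day": "2016-01-02", "TagdW": "Sat"}]

def append(collection):
    """Concatenate all sets of a collection into one (same attributes assumed)."""
    return list(chain.from_iterable(collection))

def inner_join(left, right, key):
    """Attach the matching `right` row to each `left` row; `key` assumed unique in `right`."""
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]

merged = inner_join(append(collection), weekday, "Day")
```

The result is again a single example set, which is what a downstream Join or other operator expects.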

     

    Edit: Regarding the two attributes: that's on our list to add. The Toolbox extension is a community-style extension, even though it is a RapidMiner-internal community :). For now, you need to use Generate Attributes with concat() to group by two attributes.

     

    By the way, Loop Values with Filter Examples is also a viable option, but it is slightly slower in execution time.
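One plausible reason for the speed difference: a Loop Values + Filter Examples setup scans the full data once per distinct value. Grouping, by contrast, touches each row only once. Both approaches produce the same subsets, as the Python sketch below shows. Both function names are hypothetical, and RapidMiner's internals may differ from this model.

```python
def loop_values_filter(rows, key):
    """Loop Values + Filter Examples style: one full scan of `rows` per distinct value."""
    values = sorted({r[key] for r in rows})
    return {v: [r for r in rows if r[key] == v] for v in values}

def group_single_pass(rows, key):
    """Group Into Collection style: each row is visited exactly once."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    return groups

# Three days of hourly rows as hypothetical sample data.
rows = [{"Day": d, "Hour": h} for d in ("d1", "d2", "d3") for h in range(24)]
```

For one year of hourly data (365 × 24 = 8,760 rows), the filtering approach performs 365 × 8,760 = 3,197,400 comparisons. The single grouping pass needs only 8,760 row visits.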

    ~Martin

  • eldenoso Member Posts: 65 Contributor I

    Oh, okay. That may be because I have so many data sets, etc.

    But if I append now, I get the same result as before. What I want to do is join every example set in the collection (in this case, 365 of them) with another example set (which contains, e.g., the names of the days of the week). So Append wouldn't be an option, or would it?

  • eldenoso Member Posts: 65 Contributor I

    I think we are near the finish line. Thank you for your process; it looks like it can work. But one error message appears in the Recall, "no object with name data was found", even though we set the name to "data" in the Remember operator.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Could you check whether the Remember operator is executed before the Recall?

     

    See: http://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/Change-the-Execution-Order-of-Processes/ta-p/31780

     

     

     

    Best,

    Martin

  • eldenoso Member Posts: 65 Contributor I

    Thanks for your fast response.

    According to RapidMiner, it is definitely executed before the Recall.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Got it. Could you uncheck the "remove from store" option in Recall? Otherwise the object is no longer available in the second iteration. Sorry about this.

     

    ~Martin

  • eldenoso Member Posts: 65 Contributor I

    If I set "remove from store" to false, it works :-). Does that make sense?

  • eldenoso Member Posts: 65 Contributor I

    Okay, I did it in parallel. Thank you very much for this long discussion and the helpful answers! The process works fine now :-)

    Best regards 

    Philipp

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Yes,

     

    Usually, objects are deleted once you recall them. This is to save memory. In your special case, you do not want the object to be deleted. If you deactivate this option, the object is deleted once your process finishes.
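The behaviour described above can be modelled as a small name-to-object store. When recalling removes the object, the first loop iteration succeeds and the second one fails with "no object with name data was found". When the object is kept, every iteration can recall it. The `ObjectStore` class below is a hypothetical Python model of this behaviour, not RapidMiner's actual implementation.

```python
class ObjectStore:
    """Hypothetical model of Remember/Recall: a name -> object store for one process run."""

    def __init__(self):
        self._objects = {}

    def remember(self, name, obj):
        self._objects[name] = obj

    def recall(self, name, remove_from_store=True):
        if name not in self._objects:
            raise KeyError(f"no object with name {name} was found")
        if remove_from_store:
            return self._objects.pop(name)  # default: free memory after first use
        return self._objects[name]          # kept for later loop iterations

store = ObjectStore()
store.remember("data", [15, 30])
first = store.recall("data")      # iteration 1 works
try:
    store.recall("data")          # iteration 2: object already removed
    second_failed = False
except KeyError:
    second_failed = True

store.remember("data", [15, 30])
kept = [store.recall("data", remove_from_store=False) for _ in range(3)]
```

This shows why the option must be disabled when a Recall sits inside a loop that runs more than once.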

     

    ~Martin

  • eldenoso Member Posts: 65 Contributor I

    Hello again,

    I have a question about the meta data: if I want to apply Replace Missing Values (Series) to each IOObject, I can't pick the attributes in the operator's dropdown :-(

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    You can simply type in the attribute names by hand. It works anyway.

     

    I think we need to investigate our meta data propagation there. But maybe it's enough to use the meta data from the last execution (under Process).

     

    ~Martin

  • eldenoso Member Posts: 65 Contributor I

    Thank you that also worked! :-)

    Now that I wanted to group by two attributes, I created a collection of collections. The input of the Join operator then reports that it's the wrong input type.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    Do you want to group by two attributes? If so, first build an indicator attribute like concat(att1,att2) and then do a single grouping on it.
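Grouping by two attributes through one combined key can be sketched in Python. Build a key from both values, like concat(att1, att2), and group once on that key. This avoids a collection of collections, which Join cannot take as input. The function name, the separator and the sample rows are hypothetical.

```python
from collections import defaultdict

def group_by_two(rows, att1, att2, sep="|"):
    """Build a combined key like concat(att1, att2) and group once on it."""
    groups = defaultdict(list)
    for r in rows:
        groups[f"{r[att1]}{sep}{r[att2]}"].append(r)
    return dict(groups)

rows = [
    {"Day": "2016-01-01", "stid": "A", "Diesel": 1.10},
    {"Day": "2016-01-01", "stid": "B", "Diesel": 1.12},
    {"Day": "2016-01-01", "stid": "A", "Diesel": 1.11},
    {"Day": "2016-01-02", "stid": "A", "Diesel": 1.09},
]

groups = group_by_two(rows, "Day", "stid")
```

A separator is worth adding: without one, the value pairs ("1", "23") and ("12", "3") both become "123" and would fall into the same group. In Generate Attributes, a nested call such as concat(concat(att1, "|"), att2) would presumably achieve the same effect.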

     

    ~Martin

  • eldenoso Member Posts: 65 Contributor I

    Okay, that worked. Thank you! I think it's a matter of routine, which hopefully will let me find this kind of solution myself, too.

    The whole process is finished now and works fine, but would there be a way to create, or rather get back, the meta data? It took quite some typing to enter all attribute names manually in several different operators.

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

    Usually, Process > Synchronize Meta Data with Real Data should do the job.

     

    Propagating meta data from Recall operators in complex loops is somewhat difficult.

     

    Best,

    Martin
