Data Preperation--pick the 1st,2nd,3rd and 4th Higest values

sunnyalsunnyal Member Posts: 44 Contributor II
edited December 2018 in Help

Mu source table.. I have multiple dates here.. showing only one sample date

Date_Time Load
7/24/17 12:00 AM 3987
7/24/17 1:00 AM 3748
7/24/17 2:00 AM 3608
7/24/17 3:00 AM 3526
7/24/17 4:00 AM 3493
7/24/17 5:00 AM 3545
7/24/17 6:00 AM 3683
7/24/17 7:00 AM 3827
7/24/17 8:00 AM 3942
7/24/17 9:00 AM 3956
7/24/17 10:00 AM 3985
7/24/17 11:00 AM 4000
7/24/17 12:00 PM 3917
7/24/17 1:00 PM 3834
7/24/17 2:00 PM 3901
7/24/17 3:00 PM 4132
7/24/17 4:00 PM 4388
7/24/17 5:00 PM 4497
7/24/17 6:00 PM 4675
7/24/17 7:00 PM 4713
7/24/17 8:00 PM 4743
7/24/17 9:00 PM 4704
7/24/17 10:00 PM 4540
7/24/17 11:00 PM 4227

 

My Data set needs to be transformed to the below. We need to pick the 1st,2nd,3rd and 4th Higest load values and mark them accordingly

Desired Table

Date_Time Load Peak-1 Peak-2 Peak-3 Peak-4
7/24/17 12:00 AM 3987        
7/24/17 1:00 AM 3748        
7/24/17 2:00 AM 3608        
7/24/17 3:00 AM 3526        
7/24/17 4:00 AM 3493        
7/24/17 5:00 AM 3545        
7/24/17 6:00 AM 3683        
7/24/17 7:00 AM 3827        
7/24/17 8:00 AM 3942        
7/24/17 9:00 AM 3956        
7/24/17 10:00 AM 3985        
7/24/17 11:00 AM 4000        
7/24/17 12:00 PM 3917        
7/24/17 1:00 PM 3834        
7/24/17 2:00 PM 3901        
7/24/17 3:00 PM 4132        
7/24/17 4:00 PM 4388        
7/24/17 5:00 PM 4497        
7/24/17 6:00 PM 4675       1
7/24/17 7:00 PM 4713   1    
7/24/17 8:00 PM 4743 1      
7/24/17 9:00 PM 4704     1  
7/24/17 10:00 PM 4540        
7/24/17 11:00 PM 4227        
Tagged:

Answers

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

    the attached process should do the trikc. Not sure if you can do this with less opertors.

     

    Best,

    Martin

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="subprocess" compatibility="7.5.003" expanded="true" height="82" name="Subprocess" width="90" x="45" y="136">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="7.5.003" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
    <parameter key="number_of_attributes" value="1"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
    <list key="function_descriptions">
    <parameter key="date" value="date_add(date_now(),round(1000*rand()),DATE_UNIT_DAY)"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.003" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="label"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="sort" compatibility="7.5.003" expanded="true" height="82" name="Sort" width="90" x="581" y="34">
    <parameter key="attribute_name" value="date"/>
    </operator>
    <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Generate Dummy Data&lt;br/&gt;</description>
    </operator>
    <operator activated="true" class="sort" compatibility="7.5.003" expanded="true" height="82" name="Sort (2)" width="90" x="179" y="136">
    <parameter key="attribute_name" value="att1"/>
    <parameter key="sorting_direction" value="decreasing"/>
    <description align="center" color="transparent" colored="false" width="126">sort by att1</description>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="7.5.003" expanded="true" height="68" name="Extract Macro" width="90" x="313" y="136">
    <parameter key="macro" value="noEx"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.5.003" expanded="true" height="103" name="Multiply" width="90" x="447" y="136"/>
    <operator activated="true" class="filter_example_range" compatibility="7.5.003" expanded="true" height="82" name="Filter Example Range (2)" width="90" x="581" y="187">
    <parameter key="first_example" value="5"/>
    <parameter key="last_example" value="%{noEx}"/>
    <description align="center" color="transparent" colored="false" width="126">other</description>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="715" y="187">
    <list key="function_descriptions">
    <parameter key="indicator" value="0"/>
    </list>
    </operator>
    <operator activated="true" class="filter_example_range" compatibility="7.5.003" expanded="true" height="82" name="Filter Example Range" width="90" x="581" y="34">
    <parameter key="first_example" value="1"/>
    <parameter key="last_example" value="4"/>
    <description align="center" color="transparent" colored="false" width="126">top4&lt;br/&gt;</description>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="715" y="34">
    <list key="function_descriptions">
    <parameter key="indicator" value="1"/>
    </list>
    </operator>
    <operator activated="true" class="append" compatibility="7.5.003" expanded="true" height="103" name="Append" width="90" x="849" y="85"/>
    <operator activated="true" class="sort" compatibility="7.5.003" expanded="true" height="82" name="Sort (3)" width="90" x="983" y="85">
    <description align="center" color="transparent" colored="false" width="126">Sort back</description>
    </operator>
    <connect from_op="Subprocess" from_port="out 1" to_op="Sort (2)" to_port="example set input"/>
    <connect from_op="Sort (2)" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Extract Macro" from_port="example set" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Filter Example Range" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Filter Example Range (2)" to_port="example set input"/>
    <connect from_op="Filter Example Range (2)" from_port="example set output" to_op="Generate Attributes (3)" to_port="example set input"/>
    <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Filter Example Range" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
    <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_op="Sort (3)" to_port="example set input"/>
    <connect from_op="Sort (3)" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Interesting problem.  Your data structure requirements are somewhat unsual, can you explain more why you need this particular structure?  

     

    Your orginal post is somewhat ambiguous, do you want the 4 peak loads defined for each date in your dataset, or do you want the peaks across all dates?  Martin's example process will do it across all dates, but if you want it for each date then you would need modifications.

     

    Not knowing exactly what your specified data structure is meant to convey, I myself would probably approach this differently. If the times are recorded at regular intervals then I would probably pivot this data grouping by date and making the time the index value.  That will give you a dataset with each row as a unique date and each time as a column with the respective load value.  From that point you could do additional transformations to get the 4 peaks for each row.  But what is the purpose of this transformation?

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
Sign In or Register to comment.