RapidMiner

Data Preperation--pick the 1st,2nd,3rd and 4th Higest values

Contributor II sunnyal
Contributor II

Data Preperation--pick the 1st,2nd,3rd and 4th Higest values

Mu source table.. I have multiple dates here.. showing only one sample date

Date_Time Load
7/24/17 12:00 AM 3987
7/24/17 1:00 AM 3748
7/24/17 2:00 AM 3608
7/24/17 3:00 AM 3526
7/24/17 4:00 AM 3493
7/24/17 5:00 AM 3545
7/24/17 6:00 AM 3683
7/24/17 7:00 AM 3827
7/24/17 8:00 AM 3942
7/24/17 9:00 AM 3956
7/24/17 10:00 AM 3985
7/24/17 11:00 AM 4000
7/24/17 12:00 PM 3917
7/24/17 1:00 PM 3834
7/24/17 2:00 PM 3901
7/24/17 3:00 PM 4132
7/24/17 4:00 PM 4388
7/24/17 5:00 PM 4497
7/24/17 6:00 PM 4675
7/24/17 7:00 PM 4713
7/24/17 8:00 PM 4743
7/24/17 9:00 PM 4704
7/24/17 10:00 PM 4540
7/24/17 11:00 PM 4227

 

My Data set needs to be transformed to the below. We need to pick the 1st,2nd,3rd and 4th Higest load values and mark them accordingly

Desired Table

Date_Time Load Peak-1 Peak-2 Peak-3 Peak-4
7/24/17 12:00 AM 3987        
7/24/17 1:00 AM 3748        
7/24/17 2:00 AM 3608        
7/24/17 3:00 AM 3526        
7/24/17 4:00 AM 3493        
7/24/17 5:00 AM 3545        
7/24/17 6:00 AM 3683        
7/24/17 7:00 AM 3827        
7/24/17 8:00 AM 3942        
7/24/17 9:00 AM 3956        
7/24/17 10:00 AM 3985        
7/24/17 11:00 AM 4000        
7/24/17 12:00 PM 3917        
7/24/17 1:00 PM 3834        
7/24/17 2:00 PM 3901        
7/24/17 3:00 PM 4132        
7/24/17 4:00 PM 4388        
7/24/17 5:00 PM 4497        
7/24/17 6:00 PM 4675       1
7/24/17 7:00 PM 4713   1    
7/24/17 8:00 PM 4743 1      
7/24/17 9:00 PM 4704     1  
7/24/17 10:00 PM 4540        
7/24/17 11:00 PM 4227        
2 REPLIES
RM Staff
RM Staff

Re: Data Preperation--pick the 1st,2nd,3rd and 4th Higest values

Hi,

the attached process should do the trikc. Not sure if you can do this with less opertors.

 

Best,

Martin

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="subprocess" compatibility="7.5.003" expanded="true" height="82" name="Subprocess" width="90" x="45" y="136">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="7.5.003" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
            <parameter key="number_of_attributes" value="1"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
            <list key="function_descriptions">
              <parameter key="date" value="date_add(date_now(),round(1000*rand()),DATE_UNIT_DAY)"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="7.5.003" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="label"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="sort" compatibility="7.5.003" expanded="true" height="82" name="Sort" width="90" x="581" y="34">
            <parameter key="attribute_name" value="date"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Sort" to_port="example set input"/>
          <connect from_op="Sort" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Generate Dummy Data&lt;br/&gt;</description>
      </operator>
      <operator activated="true" class="sort" compatibility="7.5.003" expanded="true" height="82" name="Sort (2)" width="90" x="179" y="136">
        <parameter key="attribute_name" value="att1"/>
        <parameter key="sorting_direction" value="decreasing"/>
        <description align="center" color="transparent" colored="false" width="126">sort by att1</description>
      </operator>
      <operator activated="true" class="extract_macro" compatibility="7.5.003" expanded="true" height="68" name="Extract Macro" width="90" x="313" y="136">
        <parameter key="macro" value="noEx"/>
        <list key="additional_macros"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.5.003" expanded="true" height="103" name="Multiply" width="90" x="447" y="136"/>
      <operator activated="true" class="filter_example_range" compatibility="7.5.003" expanded="true" height="82" name="Filter Example Range (2)" width="90" x="581" y="187">
        <parameter key="first_example" value="5"/>
        <parameter key="last_example" value="%{noEx}"/>
        <description align="center" color="transparent" colored="false" width="126">other</description>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="715" y="187">
        <list key="function_descriptions">
          <parameter key="indicator" value="0"/>
        </list>
      </operator>
      <operator activated="true" class="filter_example_range" compatibility="7.5.003" expanded="true" height="82" name="Filter Example Range" width="90" x="581" y="34">
        <parameter key="first_example" value="1"/>
        <parameter key="last_example" value="4"/>
        <description align="center" color="transparent" colored="false" width="126">top4&lt;br/&gt;</description>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.5.003" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="715" y="34">
        <list key="function_descriptions">
          <parameter key="indicator" value="1"/>
        </list>
      </operator>
      <operator activated="true" class="append" compatibility="7.5.003" expanded="true" height="103" name="Append" width="90" x="849" y="85"/>
      <operator activated="true" class="sort" compatibility="7.5.003" expanded="true" height="82" name="Sort (3)" width="90" x="983" y="85">
        <description align="center" color="transparent" colored="false" width="126">Sort back</description>
      </operator>
      <connect from_op="Subprocess" from_port="out 1" to_op="Sort (2)" to_port="example set input"/>
      <connect from_op="Sort (2)" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
      <connect from_op="Extract Macro" from_port="example set" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Filter Example Range" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Filter Example Range (2)" to_port="example set input"/>
      <connect from_op="Filter Example Range (2)" from_port="example set output" to_op="Generate Attributes (3)" to_port="example set input"/>
      <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Append" to_port="example set 2"/>
      <connect from_op="Filter Example Range" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 1"/>
      <connect from_op="Append" from_port="merged set" to_op="Sort (3)" to_port="example set input"/>
      <connect from_op="Sort (3)" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Highlighted
RM Certified Expert
RM Certified Expert

Re: Data Preperation--pick the 1st,2nd,3rd and 4th Higest values

Interesting problem.  Your data structure requirements are somewhat unsual, can you explain more why you need this particular structure?  

 

Your orginal post is somewhat ambiguous, do you want the 4 peak loads defined for each date in your dataset, or do you want the peaks across all dates?  Martin's example process will do it across all dates, but if you want it for each date then you would need modifications.

 

Not knowing exactly what your specified data structure is meant to convey, I myself would probably approach this differently. If the times are recorded at regular intervals then I would probably pivot this data grouping by date and making the time the index value.  That will give you a dataset with each row as a unique date and each time as a column with the respective load value.  From that point you could do additional transformations to get the 4 peaks for each row.  But what is the purpose of this transformation?

 

 

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts