Multi-level sorting

20155127011 Member Posts: 1 Learner I
edited November 2018 in Help

I have a table something like this:

Column A    Column B    Column C
Jane Doe    A1          1
Jane Doe    B2          2
John Doe    A1          1
John Doe    A2          2
John Doe    B1          1

 

I need to sort this table first by Column A, then by Column B, then by Column C. I can do this in Excel like this: [image: a.png]

 

I tried to chain multiple sort operators but it didn't work.

Answers

  • jreinoso RapidMiner Certified Analyst, Member Posts: 5 Contributor II
    Hi. I tried the Jackhammer Sort (Advanced) operator, but it does not seem to work if one of the attributes has the date data type.
  • hbajpai Member Posts: 102 Unicorn
    Hey @jreinoso,

    If you are familiar with Python scripting in RapidMiner, you can achieve this in a single line of code.

    Let's say you have this dataset,


    and use the pandas sort_values function:
    data.sort_values(['Date','ColA', 'ColB'], ascending = [True, False, True], inplace = True)
    You will get the following result.
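The result can also be reproduced outside RapidMiner. The following is a minimal standalone sketch, assuming the nine rows from the `input_csv_text` parameter of the demo process and a local pandas installation; the `CSV_TEXT` name and the explicit date parsing are illustrative choices that stand in for the Create ExampleSet and Nominal to Date operators.

```python
import io

import pandas as pd

# The nine rows from input_csv_text in the demo process (spaces after commas removed).
CSV_TEXT = """Date,ColA,ColB
2020/05/05,45,20
2020/05/05,415,2
2020/05/05,415,0
2020/05/03,-5,6
2020/05/08,4,8
2020/05/15,32,9
2020/05/08,4,8
2020/05/08,-9,21
2020/05/08,41,8
"""

data = pd.read_csv(io.StringIO(CSV_TEXT))

# Parse the dates so they sort chronologically (stands in for Nominal to Date).
data["Date"] = pd.to_datetime(data["Date"], format="%Y/%m/%d")

# Date ascending, then ColA descending, then ColB ascending.
data.sort_values(["Date", "ColA", "ColB"], ascending=[True, False, True], inplace=True)

# Renumber the rows so the index follows the new order.
data.reset_index(drop=True, inplace=True)

print(data.to_string(index=False))
```

Within each date the rows run from the largest ColA to the smallest, and ties on ColA are broken by ascending ColB. The two 2020/05/05 rows with ColA = 415 therefore come out with ColB = 0 before ColB = 2, ahead of the row with ColA = 45.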


    Check out the demo XML.
    <?xml version="1.0" encoding="UTF-8"?><process version="9.7.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.7.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.7.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="34">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="Date, ColA, ColB&#10;2020/05/05, 45, 20&#10;2020/05/05, 415, 2&#10;2020/05/05, 415, 0&#10;2020/05/03, -5, 6&#10;2020/05/08, 4, 8&#10;2020/05/15, 32, 9&#10;2020/05/08, 4, 8&#10;2020/05/08, -9, 21&#10;2020/05/08, 41, 8"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="nominal_to_date" compatibility="9.7.000" expanded="true" height="82" name="Nominal to Date" width="90" x="246" y="34">
            <parameter key="attribute_name" value="Date"/>
            <parameter key="date_type" value="date"/>
            <parameter key="date_format" value="yyyy/MM/dd"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="keep_old_attribute" value="false"/>
          </operator>
          <operator activated="true" class="python_scripting:execute_python" compatibility="9.6.000" expanded="true" height="103" name="Execute Python" width="90" x="447" y="34">
            <parameter key="script" value="import pandas&#10;&#10;def rm_main(data):&#10;    data.sort_values(['Date','ColA', 'ColB'], ascending = [True, False, True], inplace = True)&#10;    return data"/>
            <parameter key="notebook_cell_tag_filter" value=""/>
            <parameter key="use_default_python" value="true"/>
            <parameter key="package_manager" value="conda (anaconda)"/>
            <parameter key="use_macros" value="false"/>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
          <connect from_op="Nominal to Date" from_port="example set output" to_op="Execute Python" to_port="input 1"/>
          <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
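For the table in the original question, the script inside Execute Python would likely need only different column names. The sketch below assumes the attributes are literally named `Column A`, `Column B`, and `Column C` and that all three should sort ascending; `rm_main` is the entry point the demo script already uses, while `question_table` is a hypothetical stand-in for the ExampleSet that RapidMiner would pass in.

```python
import pandas as pd


def rm_main(data):
    # Sort by all three columns, ascending; earlier names take priority.
    # This returns a sorted copy rather than modifying the input in place.
    return data.sort_values(["Column A", "Column B", "Column C"]).reset_index(drop=True)


# Local check outside RapidMiner, with the question's rows deliberately shuffled.
question_table = pd.DataFrame({
    "Column A": ["John Doe", "Jane Doe", "John Doe", "Jane Doe", "John Doe"],
    "Column B": ["B1", "B2", "A2", "A1", "A1"],
    "Column C": [1, 2, 2, 1, 1],
})

sorted_table = rm_main(question_table)
print(sorted_table.to_string(index=False))
```

Passing a list of names to `sort_values` performs the whole multi-level sort in one call, so no chaining of separate sort steps is required.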
    


    Best,
    Harshit
  • dyoolyoos Member Posts: 6 Contributor II
    edited September 2020
    Sorry, my initial comment was wrong. I was looking for how to manually change the order of nominal labels, say, in a heat map (this may be off topic).