Multi-level sorting

20155127011 Member Posts: 1 Learner I
edited November 2018 in Help

I have a table something like this:

Column A    Column B    Column C
Jane Doe    A1          1
Jane Doe    B2          2
John Doe    A1          1
John Doe    A2          2
John Doe    B1          1

 

I need to sort this table first by Column A, then by Column B, then by Column C. I can do this in Excel like this: [image: a.png]

 

I tried to chain multiple sort operators but it didn't work.
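
One common reason chained sorts appear not to work is the order of the chain. As a minimal sketch in plain Python (not RapidMiner, and assuming every sort in the chain is stable, which is not confirmed here for RapidMiner's Sort operator), sorting by the least significant key first and the most significant key last gives the same result as a single multi-level sort. The rows are the ones from the table above, shuffled for illustration.

```python
from operator import itemgetter

# Rows from the table above, deliberately shuffled:
# (Column A, Column B, Column C)
rows = [
    ("John Doe", "B1", 1),
    ("Jane Doe", "B2", 2),
    ("John Doe", "A2", 2),
    ("Jane Doe", "A1", 1),
    ("John Doe", "A1", 1),
]

# Chained sorts: least significant key (Column C) first,
# most significant key (Column A) last. Python's sorted() is stable,
# so each pass keeps the order of the previous pass for equal keys.
chained = sorted(rows, key=itemgetter(2))
chained = sorted(chained, key=itemgetter(1))
chained = sorted(chained, key=itemgetter(0))

# Single multi-level sort on (Column A, Column B, Column C).
direct = sorted(rows, key=itemgetter(0, 1, 2))

for row in chained:
    print(row)
```

If the chain is applied in the opposite order (Column A first), the last sort wins and the table ends up ordered mainly by Column C.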

Answers

  • jreinoso RapidMiner Certified Analyst, Member Posts: 5 Contributor II
    Hi. I tried the Jackhammer Sort (Advanced) operator, but it does not seem to work if one of the attributes has a date data type.
  • hbajpai Member Posts: 102 Unicorn
    Hey @jreinoso,

    If you are familiar with Python scripting in RapidMiner, you can achieve this in a single line of code.

    Let's say you have this dataset,


    and use the pandas sort_values function:
    data.sort_values(['Date','ColA', 'ColB'], ascending = [True, False, True], inplace = True)
    You will get the following result.


    Check out the demo XML.
    <?xml version="1.0" encoding="UTF-8"?><process version="9.7.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.7.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.7.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="34">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="Date, ColA, ColB&#10;2020/05/05, 45, 20&#10;2020/05/05, 415, 2&#10;2020/05/05, 415, 0&#10;2020/05/03, -5, 6&#10;2020/05/08, 4, 8&#10;2020/05/15, 32, 9&#10;2020/05/08, 4, 8&#10;2020/05/08, -9, 21&#10;2020/05/08, 41, 8"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="nominal_to_date" compatibility="9.7.000" expanded="true" height="82" name="Nominal to Date" width="90" x="246" y="34">
            <parameter key="attribute_name" value="Date"/>
            <parameter key="date_type" value="date"/>
            <parameter key="date_format" value="yyyy/MM/dd"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="locale" value="English (United States)"/>
            <parameter key="keep_old_attribute" value="false"/>
          </operator>
          <operator activated="true" class="python_scripting:execute_python" compatibility="9.6.000" expanded="true" height="103" name="Execute Python" width="90" x="447" y="34">
            <parameter key="script" value="import pandas&#10;&#10;def rm_main(data):&#10;    data.sort_values(['Date','ColA', 'ColB'], ascending = [True, False, True], inplace = True)&#10;    return data"/>
            <parameter key="notebook_cell_tag_filter" value=""/>
            <parameter key="use_default_python" value="true"/>
            <parameter key="package_manager" value="conda (anaconda)"/>
            <parameter key="use_macros" value="false"/>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Nominal to Date" to_port="example set input"/>
          <connect from_op="Nominal to Date" from_port="example set output" to_op="Execute Python" to_port="input 1"/>
          <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    


    Best,
    Harshit
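
For readers who want to try the pandas call above outside RapidMiner, here is a minimal standalone sketch. It assumes pandas is installed and rebuilds by hand the rows from the Create ExampleSet operator in the demo process. The name rm_main comes from the demo script; unlike that script, this version returns a sorted copy rather than sorting in place, which is a choice made for this illustration only.

```python
import pandas as pd

# Same rows as the input_csv_text parameter of the demo process,
# with Date already converted to a datetime column (the demo does this
# with the Nominal to Date operator).
data = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-05-05", "2020-05-05", "2020-05-05",
        "2020-05-03", "2020-05-08", "2020-05-15",
        "2020-05-08", "2020-05-08", "2020-05-08",
    ]),
    "ColA": [45, 415, 415, -5, 4, 32, 4, -9, 41],
    "ColB": [20, 2, 0, 6, 8, 9, 8, 21, 8],
})


def rm_main(data):
    # Multi-level sort: Date ascending, then ColA descending,
    # then ColB ascending. Returns a new, sorted DataFrame.
    return data.sort_values(
        ["Date", "ColA", "ColB"],
        ascending=[True, False, True],
    )


# reset_index is only for readable printing; it is not needed in RapidMiner.
result = rm_main(data).reset_index(drop=True)
print(result)
```

Within each date, the rows with the largest ColA come first, and ties on ColA (the two 415 rows on 2020-05-05) are ordered by ascending ColB.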
  • dyoolyoos Member Posts: 6 Contributor I
    edited September 2020
    Sorry, my initial comment was wrong. I was looking for how to manually change the order of nominal labels, say, in a heat map. (This may be off topic.)