
How to invert the example order?

Newbie Member Posts: 2 Newbie
Hey everyone!

I'm new to RapidMiner and currently trying to use it on my first data set. I am desperately looking for a way to invert the order of my examples, i.e., put the first row last, the second row second-to-last, and so on. The Sort operator refuses to work on the row number (which kind of makes sense, since it isn't a real attribute). It's quite a big data set, and my current workarounds take way too much time. Any ideas?

For context: I actually want to do this to remove certain duplicates. The Remove Duplicates operator seems to keep the first example and delete every duplicate that comes after it. I would like to keep the last example and remove all the duplicates before it (I'm filtering on a subset of attributes for the Remove Duplicates operator). So my idea was to invert the order of the examples to achieve this.

Thank you for your help!

Best Answer

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited March 2019 Solution Accepted
    Hello @Newbie

    You can use the Generate ID operator, which generates an ID for every example in your data set, and then sort on the ID column in decreasing order, which inverts the examples. Sample XML code is below. To run it, open a blank process and go to View --> Show Panel --> XML. Paste the code into the XML window and click the green check mark to load the process into the process window. Run it to see how the sample is inverted.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.000" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="179" y="34">
            <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000" expanded="true" height="82" name="Generate ID" width="90" x="380" y="34">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="sort" compatibility="9.2.000" expanded="true" height="82" name="Sort" width="90" x="581" y="34">
            <parameter key="attribute_name" value="id"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Sort" to_port="example set input"/>
          <connect from_op="Sort" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
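
For the original goal (keeping the last duplicate instead of the first), the same inversion trick can be combined with a keep-first deduplication and a second inversion. The following is only a minimal sketch of that logic in plain Python, not RapidMiner code; the function name, the sample rows, and the `customer` key are made up for illustration, and it assumes the deduplication step keeps the first occurrence, as described in the question.

```python
def keep_last_duplicates(rows, key_fields):
    """Keep the last row for each key while preserving the original order."""
    seen = set()
    kept_reversed = []
    # Walk the rows in inverted order, so the first hit per key
    # is the last occurrence in the original order.
    for row in reversed(rows):
        key = tuple(row[field] for field in key_fields)
        if key not in seen:  # keep-first deduplication on the inverted data
            seen.add(key)
            kept_reversed.append(row)
    # Invert once more to restore the original ordering.
    return list(reversed(kept_reversed))


# Hypothetical sample data: customer "A" appears twice.
rows = [
    {"customer": "A", "status": "new"},
    {"customer": "B", "status": "new"},
    {"customer": "A", "status": "updated"},
]
result = keep_last_duplicates(rows, ["customer"])
print(result)
```

In the process above, this would correspond to placing the duplicate removal after the decreasing sort and, if the original order matters, sorting on the ID again in increasing order afterwards.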
    

    There might be other solutions as well. Hope this helps.

    PS: Once the examples are inverted, you can use the Select Attributes operator to remove the ID column.
    Regards,
    Varun
    https://www.varunmandalapu.com/


Answers

  • Newbie Member Posts: 2 Newbie
    Thank you very much for the suggestion, it worked perfectly!