[Solved] Ranking percentages by ID and label

Kate_Strydom Member Posts: 19 Contributor II
Hi,

Can anyone assist me, please?

I need to create a new attribute in RapidMiner called "Rank" that ranks the percentages in descending order, grouped by the unique ID and Category.

unique ID   Category        Percentage   Rank
001A        News            0.23         1
001A        Weather         0.15         2
001A        Sports          0.09         3
001B        Weather         0.64         1
001B        News            0.25         2
001B        Entertainment   0.02         3

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    Hi,

    You can use Sort together with Generate ID to solve your problem. If you need to calculate the percentages first, you can use Aggregate.

    Attached is a small process that uses the Iris dataset and ranks the examples by the attribute a1.

    I hope this helps!

    Best,

    Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.1.000">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="6.1.000" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="retrieve" compatibility="6.1.000" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
           <parameter key="repository_entry" value="//Samples/data/Iris"/>
         </operator>
         <operator activated="true" class="sort" compatibility="6.1.000" expanded="true" height="76" name="Sort" width="90" x="179" y="30">
           <parameter key="attribute_name" value="a1"/>
         </operator>
         <operator activated="true" class="generate_id" compatibility="6.1.000" expanded="true" height="76" name="Generate ID" width="90" x="313" y="30"/>
         <operator activated="true" class="set_role" compatibility="6.1.000" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
           <parameter key="attribute_name" value="id"/>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="rename" compatibility="6.1.000" expanded="true" height="76" name="Rename" width="90" x="581" y="30">
           <parameter key="old_name" value="id"/>
           <parameter key="new_name" value="Rank"/>
           <list key="rename_additional_attributes"/>
         </operator>
         <connect from_op="Retrieve Iris" from_port="output" to_op="Sort" to_port="example set input"/>
         <connect from_op="Sort" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
         <connect from_op="Generate ID" from_port="example set output" to_op="Set Role" to_port="example set input"/>
         <connect from_op="Set Role" from_port="example set output" to_op="Rename" to_port="example set input"/>
         <connect from_op="Rename" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
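
For readers who want to check the logic outside RapidMiner, the Sort, Generate ID, and Rename chain in the process above can be sketched in plain Python. This is only an illustration: the function name `add_rank` and the sample values are hypothetical, and it assumes the generated IDs count upward from 1.

```python
def add_rank(rows, value_key, descending=False):
    """Sort rows by value_key and number them 1..n in a new 'Rank' field."""
    ordered = sorted(rows, key=lambda row: row[value_key], reverse=descending)
    # enumerate(..., start=1) plays the part of Generate ID followed by Rename.
    return [{**row, "Rank": position} for position, row in enumerate(ordered, start=1)]


# Hypothetical sample values, not taken from the Iris data set.
sample = [{"a1": 5.1}, {"a1": 4.7}, {"a1": 6.3}]
ranked = add_rank(sample, "a1")
for row in ranked:
    print(row)
```

Rows with equal values receive consecutive numbers rather than a shared rank, because the sketch simply numbers the sorted rows.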
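    Edit: I overlooked that you want to do it grouped by your unique IDs. Here is an example process on the Sonar dataset. The difference is basically the Loop Values operator: it filters, sorts, and numbers the examples of each group separately, and Append then merges the groups again.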

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.1.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.1.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.1.000" expanded="true" height="60" name="Retrieve Sonar" width="90" x="112" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="loop_values" compatibility="6.1.000" expanded="true" height="76" name="Loop Values" width="90" x="313" y="30">
            <parameter key="attribute" value="class"/>
            <process expanded="true">
              <operator activated="true" class="filter_examples" compatibility="6.1.000" expanded="true" height="94" name="Filter Examples" width="90" x="45" y="30">
                <parameter key="parameter_string" value="class=%{loop_value}"/>
                <parameter key="parameter_expression" value="class==%{loop_value}"/>
                <parameter key="condition_class" value="attribute_value_filter"/>
                <list key="filters_list"/>
              </operator>
              <operator activated="true" class="sort" compatibility="6.1.000" expanded="true" height="76" name="Sort" width="90" x="179" y="30">
                <parameter key="attribute_name" value="attribute_1"/>
                <parameter key="sorting_direction" value="decreasing"/>
              </operator>
              <operator activated="true" class="generate_id" compatibility="6.1.000" expanded="true" height="76" name="Generate ID" width="90" x="313" y="30"/>
              <operator activated="true" class="set_role" compatibility="6.1.000" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
                <parameter key="attribute_name" value="id"/>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="rename" compatibility="6.1.000" expanded="true" height="76" name="Rename" width="90" x="581" y="30">
                <parameter key="old_name" value="id"/>
                <parameter key="new_name" value="Rank"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Sort" to_port="example set input"/>
              <connect from_op="Sort" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
              <connect from_op="Generate ID" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="Rename" to_port="example set input"/>
              <connect from_op="Rename" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="6.1.000" expanded="true" height="76" name="Append" width="90" x="447" y="30"/>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
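
The loop in the process above (filter one group, sort it in decreasing order, number the rows, append the groups) can also be sketched in plain Python to make the per-group ranking easier to follow. This is a minimal sketch, not a RapidMiner feature: the function name `rank_within_groups` is hypothetical, the rows are the ones from the question, and it assumes the numbering restarts at 1 for every group.

```python
from collections import defaultdict


def rank_within_groups(rows, group_key, value_key):
    """Return copies of rows with a 1-based 'Rank' per group, highest value first."""
    # Split the rows by group (the role of Loop Values plus Filter Examples).
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    ranked = []
    for group_rows in groups.values():
        # Sort one group in decreasing order, then number it (Sort, Generate ID).
        ordered = sorted(group_rows, key=lambda row: row[value_key], reverse=True)
        for position, row in enumerate(ordered, start=1):
            # Collecting into one list stands in for Append.
            ranked.append({**row, "Rank": position})
    return ranked


# Rows from the question, deliberately listed out of order.
rows = [
    {"unique ID": "001A", "Category": "Sports", "Percentage": 0.09},
    {"unique ID": "001B", "Category": "News", "Percentage": 0.25},
    {"unique ID": "001A", "Category": "News", "Percentage": 0.23},
    {"unique ID": "001B", "Category": "Entertainment", "Percentage": 0.02},
    {"unique ID": "001A", "Category": "Weather", "Percentage": 0.15},
    {"unique ID": "001B", "Category": "Weather", "Percentage": 0.64},
]
result = rank_within_groups(rows, "unique ID", "Percentage")
for row in result:
    print(row)
```

The original "unique ID" field is kept next to the new "Rank" field, which is the behaviour asked for in the question.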

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Kate_Strydom Member Posts: 19 Contributor II
    Hi Martin,

    Thanks for the quick reply.

    The problem is that it just replaces my Unique Id attribute with the generated ID, and no grouping takes place.

    So I have these operators:
    Retrieve data - Set Role - Sort by percentage - Sort by unique ID - Generate ID.

    I need Generate ID to number the rows within each group, instead of replacing my attribute.

    Thanks

    Regards
    Kate
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    Hi Kate,

    I am not completely sure I understand your problem.

    Am I right that the Generate ID operator "deletes" your UniqueID attribute? That might happen if the role of UniqueID is id. Simply change its role to something else beforehand. Then you can do the ranking. Afterwards, set the role of UniqueID back to id.

    Best,

    Martin
  • Kate_Strydom Member Posts: 19 Contributor II
    Hi Martin,

    I will try this.

    I am trying to implement the script you sent. I used an Execute Script operator, pasted the edited code you sent into it, and linked it to the results. It gives an error: undefined macro: loop_value.

    I presume I need to insert the Loop Values operator between the Sort and Generate ID operators?

    Will let you know how it works.

    Regards
    Kate
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    There is no need to use Execute Script. The XML code is a complete process definition. You can import it using the import dialog or by pasting it into the XML view. Be sure not to overwrite your own process with it!
  • Kate_Strydom Member Posts: 19 Contributor II
    Sorry Martin,

    It still does not work :(

    I must be doing something wrong. I also cannot see what the code you sent is supposed to do because of the error.

    Please help.

    Regards
    Kate
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    Maybe this thread helps you: http://rapid-i.com/rapidforum/index.php/topic,4654.0.html. Especially the last two sections.
  • Kate_Strydom Member Posts: 19 Contributor II
    Hi Martin,

    Thank you :) so very much. I managed to follow your instructions.

    Sorry, I am so new that I did not see the red notice saying that your post had come through. Hence some of the communication is out of order.

    I have successfully implemented the Sonar code as per your instructions. I have also managed to implement the code and the Loop Values operator in my own process. It is a very neat operator.

    Your advice has been extremely helpful after I spent so much time searching and trying to do this. I would never have succeeded without Loop Values and the filter macro. I wish I understood how to apply this in other areas of my RapidMiner processes.

    Regards
    Kate
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist
    Hi Kate,

    Great! It is always a pleasure to help.

    There is some documentation around, e.g. https://rapidminer.com/documentation/. I personally like the book "Data Mining for the Masses", which is linked there. Furthermore, there are several YouTube channels etc. One of our consultants has his own YouTube channel. See: https://www.youtube.com/user/neuralmarkettrends1

    But if you have further questions, feel free to ask on this board.

    Best,

    Martin