Updated Target Shuffling

npapan69 Member Posts: 17 Maven
Dear All,
I was wondering whether there is a recent implementation of target shuffling. One post contains XML code from 2010, but it doesn't work with Studio version 9.7.

Many thanks in advance

Nikos

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    What exactly are you referring to?
    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • npapan69 Member Posts: 17 Maven
    Thanks, Martin,
    I'm referring to the technique called "Target Shuffling", where you generate multiple datasets by randomly shuffling the labels, so that performance on the real data can be compared with performance on the "bogus" data. Relevant XML code was posted back in 2011, but I can't make it work in the current RM Studio (9.7), so I was wondering whether there is a process, or even better an operator, that achieves this. Here is the old post I mentioned above:

    amnonkhen Posts: 3  Contributor I
    Hi,

    I implemented Target Shuffling in RM.
    I saved it as a Building Block for easy inclusion in projects.
    The enclosed code is for a building block. Save it in a file called Target Shuffling.buildingblock in your repository directory.

    I hope you find it useful.

    I'll be happy to get any comments.

    Sincerely,
      Amnon Khen

    Target Shuffling
    Shuffles the labels of the input example set. Be sure to define the label and id attribute names.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!--
    This achieves "target shuffling".
    I don't know if it is the most elegant way.

    It does so by:
    1) multiplying the example set
    2) in one copy:
    2.1) leave only the label
    2.2) shuffle examples (which are only the labels)
    3) in the other:
    3.1)  remove the label
    3.2) rename the id to old_id
    3.3) make it a regular attribute
    4) add a "fake" id column to both copies
    5) join copies
    6) clean up:
    6.1) remove fake id
    6.2) rename old_id to id
    6.3) make it an id attribute

    Assumptions:
    1) Input ExampleSet has a label attribute
    2) Input ExampleSet has an id attribute

    Instructions:
    1) set up name of label attribute
    2) set up name of id attribute

    Created by Amnon Khen <amnon.is@gmail.com>
    -->
    <operator activated="true" class="subprocess" compatibility="5.1.011" expanded="true" height="76" name="Target Shuffling" width="90" x="179" y="300">
      <description>This achieves "target shuffling". I don't know if it is the most elegant way. It does so by: 1) multiplying the example set; 2) in one copy: 2.1) leave only the label, 2.2) shuffle examples (which are only the labels); 3) in the other: 3.1) remove the label, 3.2) rename the id to old_id, 3.3) make it a regular attribute; 4) add a "fake" id column to both copies; 5) join copies; 6) clean up: 6.1) remove fake id, 6.2) rename old_id to id, 6.3) make it an id attribute. Assumptions: 1) Input ExampleSet has a label attribute; 2) Input ExampleSet has an id attribute. Instructions: 1) set up name of label attribute; 2) set up name of id attribute.</description>
      <parameter key="parallelize_nested_chain" value="false"/>
      <process expanded="true" height="644" width="1054">
        <operator activated="true" class="print_to_console" compatibility="5.1.011" expanded="true" height="76" name="Print to Console (7)" width="90" x="45" y="30">
          <parameter key="log_value" value="shuffling labels"/>
        </operator>
        <operator activated="true" class="set_macro" compatibility="5.1.011" expanded="true" height="76" name="def. label attr. name" width="90" x="179" y="30">
          <parameter key="macro" value="label_attribute_name"/>
          <parameter key="value" value="Class"/>
        </operator>
        <operator activated="true" class="set_macro" compatibility="5.1.011" expanded="true" height="76" name="def. id attr." width="90" x="315" y="30">
          <parameter key="macro" value="id_attribute_name"/>
          <parameter key="value" value="id"/>
        </operator>
        <operator activated="true" class="print_to_console" compatibility="5.1.011" expanded="true" height="76" name="log label attr." width="90" x="447" y="30">
          <parameter key="log_value" value="label attribute: %{label_attribute_name}"/>
        </operator>
        <operator activated="true" class="print_to_console" compatibility="5.1.011" expanded="true" height="76" name="log id attr." width="90" x="585" y="30">
          <parameter key="log_value" value="id attribute: %{id_attribute_name}"/>
        </operator>
        <operator activated="true" class="multiply" compatibility="5.1.011" expanded="true" height="94" name="Multiply" width="90" x="45" y="210"/>
        <operator activated="true" class="set_role" compatibility="5.1.011" expanded="true" height="76" name="id -&gt; regular" width="90" x="112" y="345">
          <parameter key="name" value="%{id_attribute_name}"/>
          <parameter key="target_role" value="regular"/>
          <list key="set_additional_roles"/>
        </operator>
        <operator activated="true" class="rename" compatibility="5.1.011" expanded="true" height="76" name="rename id -&gt; old_id" width="90" x="246" y="345">
          <parameter key="old_name" value="%{id_attribute_name}"/>
          <parameter key="new_name" value="old_id"/>
          <list key="rename_additional_attributes"/>
        </operator>
        <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="remove label" width="90" x="380" y="345">
          <parameter key="attribute_filter_type" value="single"/>
          <parameter key="attribute" value="%{label_attribute_name}"/>
          <parameter key="attributes" value=""/>
          <parameter key="use_except_expression" value="false"/>
          <parameter key="value_type" value="attribute_value"/>
          <parameter key="use_value_type_exception" value="false"/>
          <parameter key="except_value_type" value="time"/>
          <parameter key="block_type" value="attribute_block"/>
          <parameter key="use_block_type_exception" value="false"/>
          <parameter key="except_block_type" value="value_matrix_row_start"/>
          <parameter key="invert_selection" value="true"/>
          <parameter key="include_special_attributes" value="true"/>
        </operator>
        <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="leave only labels" width="90" x="179" y="210">
          <parameter key="attribute_filter_type" value="single"/>
          <parameter key="attribute" value="%{label_attribute_name}"/>
          <parameter key="attributes" value=""/>
          <parameter key="use_except_expression" value="false"/>
          <parameter key="value_type" value="attribute_value"/>
          <parameter key="use_value_type_exception" value="false"/>
          <parameter key="except_value_type" value="time"/>
          <parameter key="block_type" value="attribute_block"/>
          <parameter key="use_block_type_exception" value="false"/>
          <parameter key="except_block_type" value="value_matrix_row_start"/>
          <parameter key="invert_selection" value="false"/>
          <parameter key="include_special_attributes" value="true"/>
        </operator>
        <operator activated="true" class="shuffle" compatibility="5.1.011" expanded="true" height="76" name="Shuffle labels" width="90" x="313" y="210">
          <parameter key="use_local_random_seed" value="false"/>
          <parameter key="local_random_seed" value="1992"/>
        </operator>
        <operator activated="true" class="generate_id" compatibility="5.1.011" expanded="true" height="76" name="generate fake id for shuffled labels" width="90" x="447" y="210">
          <parameter key="create_nominal_ids" value="false"/>
          <parameter key="offset" value="0"/>
        </operator>
        <operator activated="true" class="generate_id" compatibility="5.1.011" expanded="true" height="76" name="generate fake id for example set" width="90" x="514" y="345">
          <parameter key="create_nominal_ids" value="false"/>
          <parameter key="offset" value="0"/>
        </operator>
        <operator activated="true" class="join" compatibility="5.1.011" expanded="true" height="76" name="Join" width="90" x="648" y="300">
          <parameter key="remove_double_attributes" value="true"/>
          <parameter key="join_type" value="inner"/>
          <parameter key="use_id_attribute_as_key" value="true"/>
          <list key="key_attributes"/>
        </operator>
        <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="remove fake id" width="90" x="648" y="480">
          <parameter key="attribute_filter_type" value="single"/>
          <parameter key="attribute" value="id"/>
          <parameter key="attributes" value=""/>
          <parameter key="use_except_expression" value="false"/>
          <parameter key="value_type" value="attribute_value"/>
          <parameter key="use_value_type_exception" value="false"/>
          <parameter key="except_value_type" value="time"/>
          <parameter key="block_type" value="attribute_block"/>
          <parameter key="use_block_type_exception" value="false"/>
          <parameter key="except_block_type" value="value_matrix_row_start"/>
          <parameter key="invert_selection" value="true"/>
          <parameter key="include_special_attributes" value="true"/>
        </operator>
        <operator activated="true" class="rename" compatibility="5.1.011" expanded="true" height="76" name="rename old_id -&gt; id" width="90" x="782" y="480">
          <parameter key="old_name" value="old_id"/>
          <parameter key="new_name" value="%{id_attribute_name}"/>
          <list key="rename_additional_attributes"/>
        </operator>
        <operator activated="true" class="set_role" compatibility="5.1.011" expanded="true" height="76" name="Set Role" width="90" x="916" y="480">
          <parameter key="name" value="%{id_attribute_name}"/>
          <parameter key="target_role" value="id"/>
          <list key="set_additional_roles"/>
        </operator>
        <connect from_port="in 1" to_op="Print to Console (7)" to_port="through 1"/>
        <connect from_op="Print to Console (7)" from_port="through 1" to_op="def. label attr. name" to_port="through 1"/>
        <connect from_op="def. label attr. name" from_port="through 1" to_op="def. id attr." to_port="through 1"/>
        <connect from_op="def. id attr." from_port="through 1" to_op="log label attr." to_port="through 1"/>
        <connect from_op="log label attr." from_port="through 1" to_op="log id attr." to_port="through 1"/>
        <connect from_op="log id attr." from_port="through 1" to_op="Multiply" to_port="input"/>
        <connect from_op="Multiply" from_port="output 1" to_op="leave only labels" to_port="example set input"/>
        <connect from_op="Multiply" from_port="output 2" to_op="id -&gt; regular" to_port="example set input"/>
        <connect from_op="id -&gt; regular" from_port="example set output" to_op="rename id -&gt; old_id" to_port="example set input"/>
        <connect from_op="rename id -&gt; old_id" from_port="example set output" to_op="remove label" to_port="example set input"/>
        <connect from_op="remove label" from_port="example set output" to_op="generate fake id for example set" to_port="example set input"/>
        <connect from_op="leave only labels" from_port="example set output" to_op="Shuffle labels" to_port="example set input"/>
        <connect from_op="Shuffle labels" from_port="example set output" to_op="generate fake id for shuffled labels" to_port="example set input"/>
        <connect from_op="generate fake id for shuffled labels" from_port="example set output" to_op="Join" to_port="left"/>
        <connect from_op="generate fake id for example set" from_port="example set output" to_op="Join" to_port="right"/>
        <connect from_op="Join" from_port="join" to_op="remove fake id" to_port="example set input"/>
        <connect from_op="remove fake id" from_port="example set output" to_op="rename old_id -&gt; id" to_port="example set input"/>
        <connect from_op="rename old_id -&gt; id" from_port="example set output" to_op="Set Role" to_port="example set input"/>
        <connect from_op="Set Role" from_port="example set output" to_port="out 1"/>
        <portSpacing port="source_in 1" spacing="0"/>
        <portSpacing port="source_in 2" spacing="0"/>
        <portSpacing port="sink_out 1" spacing="432"/>
        <portSpacing port="sink_out 2" spacing="0"/>
      </process>
    </operator>
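
For readers who want to check the logic outside RapidMiner, the steps in the XML comment above (separate the labels, shuffle them, re-attach them by row position, keep the id) can be sketched in plain Python. This is only an illustration under stated assumptions, not a replacement for the building block: the function names, the toy data and the `rule_accuracy` scorer are all hypothetical, and the column names "id" and "Class" simply mirror the macro defaults in the XML. In a real analysis the scorer would retrain and validate a model on each shuffled dataset.

```python
import random


def shuffle_labels(rows, label_key, rng):
    """Return copies of `rows` with the label column randomly permuted.

    All other columns (including the id) stay attached to their row.
    """
    labels = [row[label_key] for row in rows]
    rng.shuffle(labels)
    return [{**row, label_key: label} for row, label in zip(rows, labels)]


def target_shuffling_p_value(rows, label_key, score, n_shuffles=200, seed=1992):
    """Share of shuffled datasets that score at least as well as the real data.

    `score` is a hypothetical callable that maps a list of rows to a number,
    where higher is better (for example, a cross-validated accuracy).
    """
    rng = random.Random(seed)
    real_score = score(rows)
    shuffled_scores = [
        score(shuffle_labels(rows, label_key, rng)) for _ in range(n_shuffles)
    ]
    at_least_as_good = sum(s >= real_score for s in shuffled_scores)
    # The +1 terms count the real data as one of the possible arrangements.
    p_value = (at_least_as_good + 1) / (n_shuffles + 1)
    return real_score, p_value


# Toy data invented for illustration; "id" and "Class" mirror the macro defaults.
rows = [{"id": i, "x": i / 10, "Class": "a" if i < 4 else "b"} for i in range(8)]


def rule_accuracy(data):
    """Accuracy of a fixed toy rule: predict "a" when x < 0.4, otherwise "b"."""
    hits = sum(("a" if r["x"] < 0.4 else "b") == r["Class"] for r in data)
    return hits / len(data)


real_score, p_value = target_shuffling_p_value(rows, "Class", rule_accuracy)
print(f"real score: {real_score}, share of shuffles at least as good: {p_value:.3f}")
```

A small share means that randomly relabelled data rarely matches the real result, which is the comparison target shuffling is meant to provide.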
  • npapan69 Member Posts: 17 Maven
    Dear Martin,
    Many thanks for providing the solution. Another question: how can I write the multiple xls files that your code produces? The Write Excel operator saves multiple sheets within the same file, whereas I need to save several separate xls files. Is there a solution to that?

    Best 
    Nikos
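
The "one file per shuffled dataset" idea in the question above can be sketched outside RapidMiner by putting the iteration number into each file name. This is a minimal sketch under assumptions, not a RapidMiner answer: it writes CSV rather than xls, because the Python standard library has no Excel writer, and the function name `write_shuffled_files`, the `shuffled_<n>.csv` naming scheme and the toy data are all hypothetical.

```python
import csv
import random
import tempfile
from pathlib import Path


def write_shuffled_files(rows, label_key, out_dir, n_files, seed=1992):
    """Write `n_files` label-shuffled copies of `rows`, one CSV file each."""
    rng = random.Random(seed)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, n_files + 1):
        labels = [row[label_key] for row in rows]
        rng.shuffle(labels)
        # The iteration number in the name keeps every dataset in its own file.
        path = out_dir / f"shuffled_{i}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            for row, label in zip(rows, labels):
                writer.writerow({**row, label_key: label})
        paths.append(path)
    return paths


# Toy data invented for illustration; "id" and "Class" are assumed column names.
rows = [{"id": i, "x": i / 10, "Class": "a" if i < 4 else "b"} for i in range(8)]
out_dir = tempfile.mkdtemp()  # temporary directory, used only for this demonstration
paths = write_shuffled_files(rows, "Class", out_dir, n_files=3)
print([p.name for p in paths])
```

Deriving the file name from the loop counter is the key design choice: each iteration targets a different path, so nothing is overwritten or merged into a single workbook.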