Target Shuffling

iason Member Posts: 20 Contributor II
edited November 2018 in Help
Is there any quick way to implement "Target Shuffling" in RapidMiner (RM)?
In a target shuffling model evaluation, performance should be measured for the actual dataset as well as for a number of datasets with randomly rearranged label values.
Drawing new random labels is not enough: the actual label values must be kept and reassigned to different examples.

Answers

  • wessel Member Posts: 537 Maven
    How does Target Shuffling work exactly?

    The way I understand it is as follows:
    Step 1. You train a classifier and observe that it has X percent accuracy.
    Step 2. You then randomize your labels, train another classifier, and observe that it has Y percent accuracy.
    Step 3. You repeat Step 2 multiple times and find Z = best(Y).

    When X is sufficiently better than Z, you claim that the model behind X is not a product of noise.

    Is this correct?
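
The three steps can be sketched in plain Python. This is only an illustration under stated assumptions: the tiny nearest-centroid learner, the toy data, and the names `nearest_centroid_accuracy` and `target_shuffling` are hypothetical stand-ins, not anything from RM. The labels are permuted rather than redrawn, so the class counts stay fixed.

```python
import random

def nearest_centroid_accuracy(features, labels):
    """Training accuracy of a 1-D nearest-centroid learner (stand-in for any model)."""
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    centroids = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    hits = 0
    for x, y in zip(features, labels):
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        hits += predicted == y
    return hits / len(labels)

def target_shuffling(features, labels, repeats=200, seed=0):
    rng = random.Random(seed)
    # Step 1: performance X on the actual labels.
    actual = nearest_centroid_accuracy(features, labels)
    # Steps 2 and 3: performance Y on each of many shuffled label vectors.
    shuffled_scores = []
    for _ in range(repeats):
        permuted = labels[:]
        rng.shuffle(permuted)
        shuffled_scores.append(nearest_centroid_accuracy(features, permuted))
    best = max(shuffled_scores)  # Z = best(Y)
    # Share of shuffled runs that match or beat the actual score.
    fraction = sum(s >= actual for s in shuffled_scores) / repeats
    return actual, best, fraction

# Hypothetical toy data: two well-separated classes.
features = [i * 0.1 for i in range(20)] + [5 + i * 0.1 for i in range(20)]
labels = ["a"] * 20 + ["b"] * 20

actual, best, fraction = target_shuffling(features, labels)
print("X =", actual, "Z =", best, "share of shuffles >= X:", fraction)
```

Besides comparing X with Z, the share of shuffled runs that reach X gives a rough sense of how often noise alone would do as well.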






  • iason Member Posts: 20 Contributor II
    Hello,

    Sorry for the late reply, I was out of office for a while.

    Target Shuffling works as you describe. The only restriction is that step 2 must not bias the label distribution. That is why the original set of labels is reused in a randomized order (hence the term shuffling rather than randomizing).
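
The difference between shuffling and randomizing can be seen in a small Python sketch. The label values and the seed are made up for illustration; the point is only that a permutation keeps the class counts, while independent random draws need not.

```python
import random
from collections import Counter

labels = ["yes"] * 8 + ["no"] * 2  # hypothetical imbalanced label column
rng = random.Random(42)

# Shuffling: permute the existing labels, so the class counts are unchanged.
shuffled = labels[:]
rng.shuffle(shuffled)

# Randomizing: draw each label independently, so the class counts can drift.
classes = sorted(set(labels))
randomized = [rng.choice(classes) for _ in labels]

print("original:  ", Counter(labels))
print("shuffled:  ", Counter(shuffled))
print("randomized:", Counter(randomized))
```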

    Such a shuffled dataset can easily be constructed using R or even Excel, but how could one implement the whole process in RM and get one final result?
    Sometimes the top-n random models are also required for comparison, together with some dataset similarity measures. These help avoid running too many repeats in step 3 relative to the number of examples, which would yield a large number of datasets that are not truly shuffled.
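
One possible reading of the similarity check, sketched in Python: with few examples, repeated shuffles start to repeat themselves or land close to the original labelling. The measure used here (share of positions with the same label), the 0.8 threshold, and the name `label_agreement` are assumptions for illustration; the post does not say which measure is meant.

```python
import random

def label_agreement(a, b):
    """Fraction of positions where two label vectors carry the same value."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

labels = ["a", "a", "a", "b", "b", "b"]  # hypothetical tiny dataset
rng = random.Random(7)

seen = set()
kept = []
for _ in range(50):
    candidate = labels[:]
    rng.shuffle(candidate)
    key = tuple(candidate)
    # Skip exact repeats and shuffles that stay too close to the original.
    if key in seen or label_agreement(candidate, labels) > 0.8:
        continue
    seen.add(key)
    kept.append(candidate)

print(len(kept), "usable shuffles out of 50 attempts")
```

With six examples split three and three there are only 20 distinct labellings, so most of the 50 attempts are discarded, which is the situation the similarity check is meant to expose.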
  • amnonkhen Member Posts: 6 Contributor II
    Hello iason,
    I am struggling with the same problem of implementing target shuffling in RM.
    Did you get a response or manage to find a solution?
    Thanks,
      Amnon
  • amnonkhen Member Posts: 6 Contributor II
    Hi,

    I implemented Target Shuffling in RM.
    I saved it as a Building Block for easy inclusion in projects.
    The enclosed code is for a building block. Save it in a file called "Target Shuffling.buildingblock" in your repository directory.

    I hope you find it useful.

    I'll be happy to get any comments.

    Sincerely,
      Amnon Khen

    Target Shuffling
    Shuffles the labels of the input example set. Be sure to define the label and id attribute names.
    sort_up_down.png
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!--
    This achieves "target shuffling".
    I don't know if it is the most elegant way.

    It does so by:
    1) multiplying the example set
    2) in one copy:
    2.1) leave only the label
    2.2) shuffle examples (which are only the labels)
    3) in the other:
    3.1) remove the label
    3.2) rename the id to old_id
    3.3) make it a regular attribute
    4) add a "fake" id column to both copies
    5) join copies
    6) clean up:
    6.1) remove fake id
    6.2) rename old_id to id
    6.3) make it an id attribute

    Assumptions:
    1) Input ExampleSet has a label attribute
    2) Input ExampleSet has an id attribute

    Instructions:
    1) set up name of label attribute
    2) set up name of id attribute

    Created by Amnon Khen <amnon.is@gmail.com>
    -->
    <operator activated="true" class="subprocess" compatibility="5.1.011" expanded="true" height="76" name="Target Shuffling" width="90" x="179" y="300">
      <description>This achieves "target shuffling". I don't know if it is the most elegant way. It does so by: 1) multiplying the example set; 2) in one copy: 2.1) leave only the label, 2.2) shuffle examples (which are only the labels); 3) in the other: 3.1) remove the label, 3.2) rename the id to old_id, 3.3) make it a regular attribute; 4) add a "fake" id column to both copies; 5) join copies; 6) clean up: 6.1) remove fake id, 6.2) rename old_id to id, 6.3) make it an id attribute. Assumptions: 1) Input ExampleSet has a label attribute; 2) Input ExampleSet has an id attribute. Instructions: 1) set up name of label attribute; 2) set up name of id attribute.</description>
      <parameter key="parallelize_nested_chain" value="false"/>
      <process expanded="true" height="644" width="1054">
        <operator activated="true" class="print_to_console" compatibility="5.1.011" expanded="true" height="76" name="Print to Console (7)" width="90" x="45" y="30">
          <parameter key="log_value" value="shuffling labels"/>
        </operator>
        <operator activated="true" class="set_macro" compatibility="5.1.011" expanded="true" height="76" name="def. label attr. name" width="90" x="179" y="30">
          <parameter key="macro" value="label_attribute_name"/>
          <parameter key="value" value="Class"/>
        </operator>
        <operator activated="true" class="set_macro" compatibility="5.1.011" expanded="true" height="76" name="def. id attr." width="90" x="315" y="30">
          <parameter key="macro" value="id_attribute_name"/>
          <parameter key="value" value="id"/>
        </operator>
        <operator activated="true" class="print_to_console" compatibility="5.1.011" expanded="true" height="76" name="log label attr." width="90" x="447" y="30">
          <parameter key="log_value" value="label attribute: %{label_attribute_name}"/>
        </operator>
        <operator activated="true" class="print_to_console" compatibility="5.1.011" expanded="true" height="76" name="log id attr." width="90" x="585" y="30">
          <parameter key="log_value" value="id attribute: %{id_attribute_name}"/>
        </operator>
        <operator activated="true" class="multiply" compatibility="5.1.011" expanded="true" height="94" name="Multiply" width="90" x="45" y="210"/>
        <operator activated="true" class="set_role" compatibility="5.1.011" expanded="true" height="76" name="id -&gt; regular" width="90" x="112" y="345">
          <parameter key="name" value="%{id_attribute_name}"/>
          <parameter key="target_role" value="regular"/>
          <list key="set_additional_roles"/>
        </operator>
        <operator activated="true" class="rename" compatibility="5.1.011" expanded="true" height="76" name="rename id -&gt; old_id" width="90" x="246" y="345">
          <parameter key="old_name" value="%{id_attribute_name}"/>
          <parameter key="new_name" value="old_id"/>
          <list key="rename_additional_attributes"/>
        </operator>
        <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="remove label" width="90" x="380" y="345">
          <parameter key="attribute_filter_type" value="single"/>
          <parameter key="attribute" value="%{label_attribute_name}"/>
          <parameter key="attributes" value=""/>
          <parameter key="use_except_expression" value="false"/>
          <parameter key="value_type" value="attribute_value"/>
          <parameter key="use_value_type_exception" value="false"/>
          <parameter key="except_value_type" value="time"/>
          <parameter key="block_type" value="attribute_block"/>
          <parameter key="use_block_type_exception" value="false"/>
          <parameter key="except_block_type" value="value_matrix_row_start"/>
          <parameter key="invert_selection" value="true"/>
          <parameter key="include_special_attributes" value="true"/>
        </operator>
        <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="leave only labels" width="90" x="179" y="210">
          <parameter key="attribute_filter_type" value="single"/>
          <parameter key="attribute" value="%{label_attribute_name}"/>
          <parameter key="attributes" value=""/>
          <parameter key="use_except_expression" value="false"/>
          <parameter key="value_type" value="attribute_value"/>
          <parameter key="use_value_type_exception" value="false"/>
          <parameter key="except_value_type" value="time"/>
          <parameter key="block_type" value="attribute_block"/>
          <parameter key="use_block_type_exception" value="false"/>
          <parameter key="except_block_type" value="value_matrix_row_start"/>
          <parameter key="invert_selection" value="false"/>
          <parameter key="include_special_attributes" value="true"/>
        </operator>
        <operator activated="true" class="shuffle" compatibility="5.1.011" expanded="true" height="76" name="Shuffle labels" width="90" x="313" y="210">
          <parameter key="use_local_random_seed" value="false"/>
          <parameter key="local_random_seed" value="1992"/>
        </operator>
        <operator activated="true" class="generate_id" compatibility="5.1.011" expanded="true" height="76" name="generate fake id for shuffled labels" width="90" x="447" y="210">
          <parameter key="create_nominal_ids" value="false"/>
          <parameter key="offset" value="0"/>
        </operator>
        <operator activated="true" class="generate_id" compatibility="5.1.011" expanded="true" height="76" name="generate fake id for example set" width="90" x="514" y="345">
          <parameter key="create_nominal_ids" value="false"/>
          <parameter key="offset" value="0"/>
        </operator>
        <operator activated="true" class="join" compatibility="5.1.011" expanded="true" height="76" name="Join" width="90" x="648" y="300">
          <parameter key="remove_double_attributes" value="true"/>
          <parameter key="join_type" value="inner"/>
          <parameter key="use_id_attribute_as_key" value="true"/>
          <list key="key_attributes"/>
        </operator>
        <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="remove fake id" width="90" x="648" y="480">
          <parameter key="attribute_filter_type" value="single"/>
          <parameter key="attribute" value="id"/>
          <parameter key="attributes" value=""/>
          <parameter key="use_except_expression" value="false"/>
          <parameter key="value_type" value="attribute_value"/>
          <parameter key="use_value_type_exception" value="false"/>
          <parameter key="except_value_type" value="time"/>
          <parameter key="block_type" value="attribute_block"/>
          <parameter key="use_block_type_exception" value="false"/>
          <parameter key="except_block_type" value="value_matrix_row_start"/>
          <parameter key="invert_selection" value="true"/>
          <parameter key="include_special_attributes" value="true"/>
        </operator>
        <operator activated="true" class="rename" compatibility="5.1.011" expanded="true" height="76" name="rename old_id -&gt; id" width="90" x="782" y="480">
          <parameter key="old_name" value="old_id"/>
          <parameter key="new_name" value="%{id_attribute_name}"/>
          <list key="rename_additional_attributes"/>
        </operator>
        <operator activated="true" class="set_role" compatibility="5.1.011" expanded="true" height="76" name="Set Role" width="90" x="916" y="480">
          <parameter key="name" value="%{id_attribute_name}"/>
          <parameter key="target_role" value="id"/>
          <list key="set_additional_roles"/>
        </operator>
        <connect from_port="in 1" to_op="Print to Console (7)" to_port="through 1"/>
        <connect from_op="Print to Console (7)" from_port="through 1" to_op="def. label attr. name" to_port="through 1"/>
        <connect from_op="def. label attr. name" from_port="through 1" to_op="def. id attr." to_port="through 1"/>
        <connect from_op="def. id attr." from_port="through 1" to_op="log label attr." to_port="through 1"/>
        <connect from_op="log label attr." from_port="through 1" to_op="log id attr." to_port="through 1"/>
        <connect from_op="log id attr." from_port="through 1" to_op="Multiply" to_port="input"/>
        <connect from_op="Multiply" from_port="output 1" to_op="leave only labels" to_port="example set input"/>
        <connect from_op="Multiply" from_port="output 2" to_op="id -&gt; regular" to_port="example set input"/>
        <connect from_op="id -&gt; regular" from_port="example set output" to_op="rename id -&gt; old_id" to_port="example set input"/>
        <connect from_op="rename id -&gt; old_id" from_port="example set output" to_op="remove label" to_port="example set input"/>
        <connect from_op="remove label" from_port="example set output" to_op="generate fake id for example set" to_port="example set input"/>
        <connect from_op="leave only labels" from_port="example set output" to_op="Shuffle labels" to_port="example set input"/>
        <connect from_op="Shuffle labels" from_port="example set output" to_op="generate fake id for shuffled labels" to_port="example set input"/>
        <connect from_op="generate fake id for shuffled labels" from_port="example set output" to_op="Join" to_port="left"/>
        <connect from_op="generate fake id for example set" from_port="example set output" to_op="Join" to_port="right"/>
        <connect from_op="Join" from_port="join" to_op="remove fake id" to_port="example set input"/>
        <connect from_op="remove fake id" from_port="example set output" to_op="rename old_id -&gt; id" to_port="example set input"/>
        <connect from_op="rename old_id -&gt; id" from_port="example set output" to_op="Set Role" to_port="example set input"/>
        <connect from_op="Set Role" from_port="example set output" to_port="out 1"/>
        <portSpacing port="source_in 1" spacing="0"/>
        <portSpacing port="source_in 2" spacing="0"/>
        <portSpacing port="sink_out 1" spacing="432"/>
        <portSpacing port="sink_out 2" spacing="0"/>
      </process>
    </operator>
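
For readers who want to check the logic outside RM, the same split, shuffle, and join idea can be mirrored in a few lines of Python. This is a rough analogue, not a translation of the building block: the function `shuffle_labels` and the sample rows are hypothetical, and the row position plays the part of the "fake" id used for the join.

```python
import random

def shuffle_labels(rows, label="Class", seed=None):
    """Return copies of rows with the label values permuted across examples."""
    rng = random.Random(seed)
    # Copy 1: keep only the label values and shuffle them.
    labels = [row[label] for row in rows]
    rng.shuffle(labels)
    # Copy 2: everything except the label (the real id stays with its row).
    rest = [{k: v for k, v in row.items() if k != label} for row in rows]
    # Join on row position, which acts as the "fake" id in both copies.
    return [{**r, label: y} for r, y in zip(rest, labels)]

# Hypothetical example set with an id, one regular attribute and a label.
rows = [
    {"id": 1, "x": 0.2, "Class": "a"},
    {"id": 2, "x": 0.4, "Class": "a"},
    {"id": 3, "x": 0.9, "Class": "b"},
]

shuffled = shuffle_labels(rows, seed=3)
for row in shuffled:
    print(row)
```

The ids and regular attributes keep their original rows; only the label column is rearranged, and the input rows are left unmodified.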
