RapidMiner

How to do Y-randomization in Rapidminer?

Regular Contributor

Hi,

I was wondering how to do Y-randomization in RapidMiner. In Y-randomization, the y value of each example is randomly exchanged with the y value of another example. It is used to validate QSAR models: the performance (r2) of the original model is compared with that of models built on the permuted (randomly shuffled) response.
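
To make the idea concrete outside RapidMiner, here is a minimal sketch of the shuffling step in plain Python. The function name `y_randomize` and the toy data are made up for illustration and are not part of any RapidMiner API; the only point is that the descriptor rows stay where they are while the response values are permuted among them.

```python
import random


def y_randomize(labels, rng):
    """Return a shuffled copy of the response values.

    The feature rows are not touched, so after shuffling each row is
    paired with the response of some randomly chosen example.
    """
    shuffled = list(labels)   # copy, so the original responses are kept
    rng.shuffle(shuffled)     # in-place random permutation of the copy
    return shuffled


# Hypothetical toy data: two descriptors per compound and a class label.
features = [[0.1, 1.2], [0.4, 0.9], [0.8, 0.3], [0.9, 0.1]]
labels = ["active", "active", "inactive", "inactive"]

rng = random.Random(42)       # fixed seed only to make the sketch repeatable
permuted = y_randomize(labels, rng)

# The permuted response contains the same values, only in a different order.
print(list(zip(features, permuted)))
```

A model trained on `features` and `permuted` should perform clearly worse than one trained on the original labels; if it does not, the original model's performance may be due to chance.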

Regards
7 REPLIES
Elite

Re: How to do Y-randomization in Rapidminer?

Hi,
although RapidMiner does not have an operator for Y-randomization yet, we can make use of its modularity. I have created a process that performs Y-randomization. You can encapsulate it in an OperatorChain to use it within your own process.

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="one third classification"/>
    </operator>
    <operator name="IdTagging" class="IdTagging">
    </operator>
    <operator name="IOMultiplier" class="IOMultiplier">
        <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
        <parameter key="attribute_name_regex" value="label|id"/>
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="keep_subset_only" value="true"/>
        <operator name="NoiseGenerator" class="NoiseGenerator">
            <parameter key="label_noise" value="0.0"/>
            <list key="noise">
            </list>
            <parameter key="random_attributes" value="1"/>
        </operator>
        <operator name="Sorting" class="Sorting">
            <parameter key="attribute_name" value="random"/>
        </operator>
        <operator name="IdTagging (2)" class="IdTagging">
        </operator>
    </operator>
    <operator name="IOSelector" class="IOSelector">
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="select_which" value="2"/>
    </operator>
    <operator name="ExampleSetJoin" class="ExampleSetJoin">
    </operator>
    <operator name="AttributeFilter (2)" class="AttributeFilter">
        <parameter key="condition_class" value="attribute_name_filter"/>
        <parameter key="invert_filter" value="true"/>
        <parameter key="parameter_string" value="random"/>
    </operator>
</operator>


Hope that helps.


Greetings,
  Sebastian
Regular Contributor

Re: How to do Y-randomization in Rapidminer?

Hi,

thank you for your help. The code worked perfectly. I am now trying to use RapidMiner to do Y-randomization, train a model, evaluate the model using leave-one-out, and repeat this 100 times to get an average classification error for the Y-randomization. I am using the following code:


<operator name="Root" class="Process" expanded="yes">
    <parameter key="random_seed" value="-1"/>
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="one third classification"/>
    </operator>
    <operator name="RepeatUntilOperatorChain" class="RepeatUntilOperatorChain" expanded="yes">
        <parameter key="max_iterations" value="100"/>
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="no">
            <parameter key="attribute_name_regex" value="label|id"/>
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="keep_subset_only" value="true"/>
            <operator name="NoiseGenerator" class="NoiseGenerator">
                <parameter key="label_noise" value="0.0"/>
                <list key="noise">
                </list>
                <parameter key="random_attributes" value="1"/>
            </operator>
            <operator name="Sorting" class="Sorting">
                <parameter key="attribute_name" value="random"/>
            </operator>
            <operator name="IdTagging (2)" class="IdTagging">
            </operator>
        </operator>
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="ExampleSetJoin" class="ExampleSetJoin">
        </operator>
        <operator name="AttributeFilter (2)" class="AttributeFilter">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="invert_filter" value="true"/>
            <parameter key="parameter_string" value="random"/>
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="leave_one_out" value="true"/>
            <operator name="NearestNeighbors" class="NearestNeighbors">
                <parameter key="k" value="3"/>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ClassificationPerformance" class="ClassificationPerformance">
                    <list key="class_weights">
                    </list>
                    <parameter key="classification_error" value="true"/>
                </operator>
            </operator>
        </operator>
    </operator>
</operator>


However, it seems to give me an error about RepeatUntilOperatorChain.
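
For readers who want to check the logic independently of RapidMiner, the workflow described in this post (shuffle the labels, evaluate a 3-nearest-neighbour model with leave-one-out, repeat 100 times, average the classification error) can be sketched in plain Python. All function names and the synthetic two-cluster data are assumptions made for this example; it does not reproduce the ExampleSetGenerator data or any RapidMiner operator.

```python
import random
from collections import Counter


def loo_knn_error(features, labels, k=3):
    """Leave-one-out classification error of a k-nearest-neighbour classifier."""
    errors = 0
    for i, query in enumerate(features):
        # Squared Euclidean distance from the held-out example to all others.
        distances = sorted(
            (sum((a - b) ** 2 for a, b in zip(query, other)), j)
            for j, other in enumerate(features)
            if j != i
        )
        # Majority vote among the k closest training examples.
        votes = Counter(labels[j] for _, j in distances[:k])
        predicted = votes.most_common(1)[0][0]
        errors += predicted != labels[i]
    return errors / len(features)


def y_randomization_error(features, labels, iterations=100, seed=0):
    """Average leave-one-out error over repeated label permutations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iterations):
        permuted = list(labels)
        rng.shuffle(permuted)          # fresh permutation in every iteration
        total += loo_knn_error(features, permuted)
    return total / iterations


# Synthetic data (an assumption): two well-separated clusters of 15 points.
data_rng = random.Random(1)
features = (
    [[data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)] for _ in range(15)]
    + [[data_rng.gauss(3, 0.3), data_rng.gauss(3, 0.3)] for _ in range(15)]
)
labels = ["a"] * 15 + ["b"] * 15

original_error = loo_knn_error(features, labels)
randomized_error = y_randomization_error(features, labels, iterations=100)
print("original:", original_error, "y-randomized:", randomized_error)
```

The comparison of interest is between `original_error` and `randomized_error`: a meaningful model should show a much lower error on the original labels than the average over the permuted ones.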
Moderator

Re: How to do Y-randomization in Rapidminer?

Hi,

just a hint: why not use the IteratingPerformanceAverage operator? It also iterates a predefined number of times, and it averages the performance vectors produced by the inner operator chain.

Regards,
Tobias
Regular Contributor

Re: How to do Y-randomization in Rapidminer?

Great hint!

I ran into another error: "Message: The attribute 'random' does not exist." After a bit of tracing, it seems that AttributeFilter (2) removes the attribute 'random' after the first round, but in the second round the NoiseGenerator generates an attribute named 'random1' instead of 'random', which causes the error.


<operator name="Root" class="Process" expanded="yes">
    <parameter key="random_seed" value="-1"/>
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="one third classification"/>
    </operator>
    <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes">
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="attribute_name_regex" value="label|id"/>
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="keep_subset_only" value="true"/>
            <operator name="NoiseGenerator" class="NoiseGenerator" breakpoints="after">
                <parameter key="label_noise" value="0.0"/>
                <list key="noise">
                </list>
                <parameter key="random_attributes" value="1"/>
            </operator>
            <operator name="Sorting" class="Sorting">
                <parameter key="attribute_name" value="random"/>
            </operator>
            <operator name="IdTagging (2)" class="IdTagging">
            </operator>
        </operator>
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="ExampleSetJoin" class="ExampleSetJoin">
        </operator>
        <operator name="AttributeFilter (2)" class="AttributeFilter">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="invert_filter" value="true"/>
            <parameter key="parameter_string" value="random"/>
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="leave_one_out" value="true"/>
            <operator name="NearestNeighbors" class="NearestNeighbors">
                <parameter key="k" value="3"/>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="no">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ClassificationPerformance" class="ClassificationPerformance">
                    <list key="class_weights">
                    </list>
                    <parameter key="classification_error" value="true"/>
                </operator>
            </operator>
        </operator>
    </operator>
</operator>
Elite

Re: How to do Y-randomization in Rapidminer?

Hi,
try our Permutation operator. I forgot about it myself in the previous solution. So many operators...

<operator name="Root" class="Process" expanded="yes">
    <parameter key="random_seed" value="-1"/>
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="one third classification"/>
    </operator>
    <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes">
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="attribute_name_regex" value="label|id"/>
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="keep_subset_only" value="true"/>
            <operator name="Permutation" class="Permutation">
            </operator>
            <operator name="IdTagging (2)" class="IdTagging">
            </operator>
        </operator>
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="ExampleSetJoin" class="ExampleSetJoin">
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="leave_one_out" value="true"/>
            <operator name="NearestNeighbors" class="NearestNeighbors">
                <parameter key="k" value="3"/>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="no">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ClassificationPerformance" class="ClassificationPerformance">
                    <list key="class_weights">
                    </list>
                    <parameter key="classification_error" value="true"/>
                </operator>
            </operator>
        </operator>
    </operator>
</operator>



This should help.

Greetings,
  Sebastian
Regular Contributor

Re: How to do Y-randomization in Rapidminer?

Thank you so much. It worked perfectly.  ;D

Just one last question: when I set a breakpoint at ExampleSetJoin, I noticed that the id numbers in the dataset keep increasing. Why is that, and will it have any impact on memory?
Elite

Re: How to do Y-randomization in Rapidminer?

Hi,
no, this won't increase memory consumption. The memory of an ExampleSet is freed once no ExampleSet references it any more. Keep in mind that it does not have to be freed immediately: Java frees memory when it considers it appropriate or needs it.

Greetings,
  Sebastian