"Customized X-fold cross-validation"

_paul_ Member Posts: 14 Contributor II
edited May 2019 in Help
Hi,

I want to perform an X-fold cross-validation which however does not
operate on sets that are defined by RapidMiner's XValidation "sampling_type"
parameter but on sets which are constructed using a "marker" in the
examples provided by an ExampleSource operator.

To be more accurate, my input examples (pairs of feature vectors and
labels) used for classification contain an attribute that defines the
application this particular example was extracted from. Let's say the
examples come from three applications "A", "B", and "C" and each
example contains an attribute holding one of the three characters.

Based on this, I would like to perform a 3-fold cross-validation where, in
the first run, the examples from "A" are held out, a model is learned on the
examples from "B" and "C" and is then tested on the examples from "A", and so on.

Is there an operator for that in RapidMiner?

Regards,
Paul

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Paul,
    there is a special variant of XValidation called BatchXValidation, which uses an attribute with the special role batch to define the splitting sets. I have posted a process below that makes use of this operator.

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="sum classification"/>
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="attribute_name_regex" value="att1"/>
            <operator name="BinDiscretization" class="BinDiscretization">
                <parameter key="number_of_bins" value="3"/>
                <parameter key="range_name_type" value="short"/>
            </operator>
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="att1"/>
            <parameter key="target_role" value="batch"/>
        </operator>
        <operator name="BatchXValidation" class="BatchXValidation" expanded="yes">
            <operator name="DecisionTree (2)" class="DecisionTree" breakpoints="before,after">
            </operator>
            <operator name="TestChain (2)" class="OperatorChain" expanded="no">
                <operator name="ModelApplier (2)" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance (2)" class="Performance">
                </operator>
            </operator>
        </operator>
    </operator>
    Greetings,
      Sebastian
  • _paul_ Member Posts: 14 Contributor II
    Hi Sebastian,

    sorry for the late answer. ;-)

    I didn't really understand the idea behind your process. How does the BatchXValidation
    operator work? I assume that it relies on the ChangeAttributeRole operator (and also on
    AttributeSubsetPreprocessing?), but it's not clear to me how the operators
    communicate.

    Let's say I have this example set:

    att1; att2; att3; label
    1; 2; A; YES
    2; 2; A; NO
    3; 4; B; YES
    1; 4; C; NO
    2; 4; C; NO
    4; 4; C; YES

    and I would like to have a 3-fold cross-validation where in each
    run of the validation I want to exclude the examples belonging
    to the class (A,B,C) specified by attribute "att3".

    Thus, the cross-validation would look something like this:
    Step 1: Exclude the examples from class A, learn a model on the examples
      from classes B and C, and apply this model to the examples from class A
    Step 2: Exclude B, learn on A and C, apply to B
    Step 3: Exclude C, learn on A and B, apply to C

    How can I model this type of validation?
    And is there a way to find out, within the BatchXValidation operator,
    which examples are currently excluded (such as att3=A in the first step)?

    Thank you.

    Regards,
    Paul
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Paul,
    the BatchXValidation never splits the examples of one batch across several folds. Instead, each batch is always placed completely into a single fold.
    So, if you define your attribute att3 as the batch attribute and set the number of validations of the BatchXValidation to the number of distinct values in att3, this should do the trick.
    In the first round, the first fold, which contains all the As, is held out for testing, and learning is carried out on the remaining folds. And so on...

    I hope this clarifies it?

    Greetings,
      Sebastian
  • _paul_ Member Posts: 14 Contributor II
    Hi Sebastian,

    yes, the x-validation is now clear. Thank you.

    Best,
    Paul
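
For readers who want to see the splitting logic outside RapidMiner, here is a minimal Python sketch of the batch-wise scheme discussed in this thread. It is not RapidMiner code and does not reproduce how BatchXValidation is implemented. The function names (`batch_folds`, `majority_label`, `run_batch_validation`) are hypothetical, the majority-label "learner" is only a stand-in for a real model such as the DecisionTree, and the data is Paul's toy example set with rows 4 to 6 read as att1; att2 = 1; 4, 2; 4 and 4; 4, which is an assumption about the intended values.

```python
from collections import Counter

# Paul's toy example set as (att1, att2, att3, label); att3 is the batch marker.
# Assumption: the rows written as "1,4", "2,4", "4,4" mean att1; att2.
EXAMPLES = [
    (1, 2, "A", "YES"),
    (2, 2, "A", "NO"),
    (3, 4, "B", "YES"),
    (1, 4, "C", "NO"),
    (2, 4, "C", "NO"),
    (4, 4, "C", "YES"),
]
BATCH_INDEX = 2
LABEL_INDEX = 3


def batch_folds(examples, batch_index):
    """Yield (held_out_batch, train_rows, test_rows), one fold per batch value."""
    batches = sorted({row[batch_index] for row in examples})
    for held_out in batches:
        # A batch is never split: all of its rows go to the test side together.
        train = [row for row in examples if row[batch_index] != held_out]
        test = [row for row in examples if row[batch_index] == held_out]
        yield held_out, train, test


def majority_label(rows, label_index):
    """Stand-in learner (hypothetical): predict the most frequent training label."""
    return Counter(row[label_index] for row in rows).most_common(1)[0][0]


def run_batch_validation(examples):
    """Return (held_out_batch, n_train, n_test, accuracy) for every fold."""
    results = []
    for held_out, train, test in batch_folds(examples, BATCH_INDEX):
        prediction = majority_label(train, LABEL_INDEX)
        correct = sum(1 for row in test if row[LABEL_INDEX] == prediction)
        results.append((held_out, len(train), len(test), correct / len(test)))
    return results


if __name__ == "__main__":
    for held_out, n_train, n_test, accuracy in run_batch_validation(EXAMPLES):
        print(f"held out {held_out}: train={n_train} test={n_test} accuracy={accuracy:.2f}")
```

Because every fold is tagged with the batch value it holds out, the sketch also shows one way to keep track of which examples are currently excluded. Whether BatchXValidation exposes this information inside a running process is not stated in the thread.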