Xvalidation

evgeny Member Posts: 11 Contributor II
edited November 2018 in Help
hi,

i am a rapid-i novice.

i want to train the model on a specific set of data and then test on another specific set. from what i can see (so far), there is a roundabout way of doing this by using:

Xvalidation and selecting sampling_type = "linear sampling" and number_of_validations = 2

although this requires that both training and testing data sets have the same number of elements and are in a particular order.

is there a more general / sensible way of doing this? in particular, can i base the sampling on one of the data attributes?

many thanks, evgeny.

Answers

  • haddock Member Posts: 849  Guru
    G'Day Evgeny,

    Welcome to the world of countless combinations! Sure, linear sampling in a validation wrapper would work, with the limitations you spot, but you can always go freestyle....

    1. Filter your examples by attribute value to make a training set.
    2. Add a learner to make a model, but do not keep the training examples.
    3. Load your test set and apply the model.
    4. Have a beer, and examine the results.

    Actually, the XVal operators just bundle these steps up so you can test repeatedly; they just don't stop for the beer (though you can always insert a break for that ;D).
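
For readers who want to see what "bundling this up" amounts to, here is a minimal Python sketch of linear sampling with a small number of folds. It is not RapidMiner code: the helper names `linear_folds` and `cross_validation_splits` are hypothetical, and the way uneven fold sizes and fold order are handled is an assumption made for illustration.

```python
def linear_folds(n_examples, n_folds):
    """Split indices 0..n_examples-1 into n_folds contiguous blocks, in order."""
    base, extra = divmod(n_examples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        # Assumption: any leftover examples go to the first folds.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validation_splits(n_examples, n_folds):
    """Yield (train_indices, test_indices), holding out each block once."""
    folds = linear_folds(n_examples, n_folds)
    for held_out in range(n_folds):
        train = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        yield train, folds[held_out]


splits = list(cross_validation_splits(6, 2))
for train, test in splits:
    print("train:", train, "test:", test)
```

With two folds, each half of the data is used once for training and once for testing, so the result covers both directions rather than a single "train on A, test on B" run. The split also depends only on row order, which is why the two sets have to be the same size and arranged in a particular order.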

    Good luck!

  • evgeny Member Posts: 11 Contributor II
    thanks for the quick response. i don't suppose you could post an example of points 1-3, i.e. how one would do it in practice?
  • haddock Member Posts: 849  Guru
    Just this once  ;)
    <operator name="Root" class="Process" expanded="yes">
        <description text="Check comments tab!"/>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <operator name="Copy exampleset" class="IOMultiplier">
            <description text="The Learner will consume the examples, so keep a copy for later."/>
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="1 Train on Att1 positive" class="ExampleFilter" breakpoints="after">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="att1&gt;0"/>
        </operator>
        <operator name="2 Make model" class="ID3Numerical">
        </operator>
        <operator name="3 Test on Att1 negative" class="ExampleFilter" breakpoints="after">
            <description text="As the learner has consumed the original exampleset, the copy made by the Copy exampleset operator is now on top of the stack."/>
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="att1&lt;0"/>
        </operator>
        <operator name="Get Beer" class="ModelApplier">
            <parameter key="keep_model" value="true"/>
            <list key="application_parameters">
            </list>
        </operator>
    </operator>
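
The flow of the process above (filter a training set by attribute value, build a model, filter a test set, apply the model) can also be sketched outside RapidMiner. The Python below is only an illustration under stated assumptions: the toy data, the helper names (`filter_examples`, `train_stump`, `apply_model`) and the simple threshold learner are all hypothetical stand-ins, and the learner is not the ID3Numerical tree used in the process.

```python
def filter_examples(examples, condition):
    """Keep only the rows for which condition(row) is true."""
    return [row for row in examples if condition(row)]


def train_stump(examples, attribute="att2"):
    """Stand-in learner: threshold halfway between the two class means."""
    means = {}
    for label in ("positive", "negative"):
        values = [row[attribute] for row in examples if row["label"] == label]
        means[label] = sum(values) / len(values)
    threshold = (means["positive"] + means["negative"]) / 2
    high_label = "positive" if means["positive"] > means["negative"] else "negative"
    low_label = "negative" if high_label == "positive" else "positive"

    def model(row):
        return high_label if row[attribute] > threshold else low_label

    return model


def apply_model(model, examples):
    """Return copies of the rows with a 'prediction' field added."""
    return [dict(row, prediction=model(row)) for row in examples]


# Hypothetical toy data: the label depends only on the sign of att2.
examples = [
    {"att1": a1, "att2": a2, "label": "positive" if a2 > 0 else "negative"}
    for a1 in (-3, -2, -1, 0, 1, 2, 3)
    for a2 in (-3, -2, -1, 1, 2, 3)
]

train_set = filter_examples(examples, lambda r: r["att1"] > 0)  # step 1
model = train_stump(train_set)                                  # step 2
test_set = filter_examples(examples, lambda r: r["att1"] < 0)   # step 3
scored = apply_model(model, test_set)

accuracy = sum(r["prediction"] == r["label"] for r in scored) / len(scored)
print(len(train_set), len(test_set), accuracy)
```

As with the two filters in the process (att1>0 and att1<0), rows where att1 is exactly 0 end up in neither set. Unlike linear sampling, the two sets here need not be the same size or in any particular order.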
  • evgeny Member Posts: 11 Contributor II
    thanks - that's very helpful.