
Splitting examples into training/test

kglowack (Member, Posts: 2, Contributor I)
edited November 2018 in Help
Hello,

I'm new to RapidMiner, but I have spent some time playing around with the software. However, I haven't been able to find a way to split the input file into training and test sets using an attribute: my dataset has an attribute specifying which examples belong to the training set and which to the test set. How can I train a model on the training examples only and test it on the test set? The only solution I found was BatchXValidation, but I want to build a single model (I believe BatchXValidation would build two models, correct?).

Any help would be very much appreciated.

Thanks!

Answers

  •
    land (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn)
    Hi,
    you could use the ExampleFilter operator with the attribute_value_filter condition to select only the examples of the first or of the second type.
    Combined with an IOMultiplier, which copies your input example set so that one copy can be filtered for training and the other for testing, this should let you do what you want.

    Here's an example process that demonstrates my suggestion (not a very sensible one, since it splits on the label itself, which makes no sense in practice):
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="sum classification"/>
        </operator>
        <!-- Copy the example set: one copy is used for training, the other for testing. -->
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <!-- Training copy: keep only the examples with label = positive.
             breakpoints="after" pauses the process so the filtered data can be inspected. -->
        <operator name="ExampleFilter" class="ExampleFilter" breakpoints="after">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label = positive"/>
        </operator>
        <!-- Learn a single model from the training examples. -->
        <operator name="DecisionTree" class="DecisionTree">
        </operator>
        <!-- Test copy: keep only the examples with label = negative. -->
        <operator name="ExampleFilter (2)" class="ExampleFilter" breakpoints="after">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label = negative"/>
        </operator>
        <!-- Apply the model to the test examples and measure its accuracy. -->
        <operator name="ModelApplier" class="ModelApplier">
            <list key="application_parameters">
            </list>
        </operator>
        <operator name="ClassificationPerformance" class="ClassificationPerformance">
            <parameter key="accuracy" value="true"/>
            <list key="class_weights">
            </list>
        </operator>
    </operator>

    Greetings,
      Sebastian
  •
    kglowack (Member, Posts: 2, Contributor I)
    Awesome, this should indeed work.

    Thanks!
    Karolina.
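The example process in the answer above filters on the label only for demonstration. For the setup described in the question, a sketch might look like the following, assuming the data contains a nominal attribute named "split" (a hypothetical name) with the values "train" and "test". The operator that reads the data file is deliberately left out, since the thread doesn't say how the file is loaded; it would go at the top of the process in place of the comment. Everything else reuses the operators and parameters from the answer, so a single DecisionTree model is learned from the training examples and evaluated on the test examples.

```xml
<operator name="Root" class="Process" expanded="yes">
    <!-- Assumption: insert here the operator that loads your data. The data is
         assumed to contain a nominal attribute "split" (hypothetical name)
         with the values "train" and "test". -->
    <!-- Copy the loaded example set: one copy for training, one for testing. -->
    <operator name="IOMultiplier" class="IOMultiplier">
        <parameter key="io_object" value="ExampleSet"/>
    </operator>
    <!-- First copy: keep only the training examples. -->
    <operator name="ExampleFilter (train)" class="ExampleFilter">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="split = train"/>
    </operator>
    <!-- Learn exactly one model from the training examples. -->
    <operator name="DecisionTree" class="DecisionTree">
    </operator>
    <!-- Second copy: keep only the test examples. -->
    <operator name="ExampleFilter (test)" class="ExampleFilter">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="split = test"/>
    </operator>
    <!-- Apply the trained model to the test examples and measure its accuracy. -->
    <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
    <operator name="ClassificationPerformance" class="ClassificationPerformance">
        <parameter key="accuracy" value="true"/>
        <list key="class_weights">
        </list>
    </operator>
</operator>
```

Because every training example has the same split value, that attribute cannot help the decision tree, but you may still prefer to remove it from the data before learning.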