ModelApplier on multiple Models

Legacy User Member Posts: 0 Newbie
edited November 2018 in Help
Hi,

I built a DecisionTree model on a training dataset and validated it with an XValidation. After writing the model to a file, I apply it to the test dataset with the ModelApplier. This whole process chain runs perfectly. But my idea is to find the model that best fits the test dataset, so I would like to build multiple models with Bagging and evaluate each of them on the test data. The problem is that the ModelApplier can only handle one model. Do you see a way to run multiple models on a test dataset and evaluate them with ClassificationPerformance?

Regards,
Thorsten

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,643  RM Founder
    Hi Thorsten,

    you could use the %{a} macro together with an IteratingOperatorChain, as described in this posting:

    http://rapid-i.com/rapidforum/index.php/topic,32.0.html

    This should also work in combination with a ClassificationPerformance evaluator, at least for manually checking which model is the best one. A completely automated selection would need a small amount of coding...

    By the way: are you sure that it is a good idea to select the best model on the test set? This is effectively overfitting, only now on the test set instead of the training set. In general, I would always suggest using all data for model building and using a validation scheme like cross-validation for performance evaluation only, not for model selection...

    Cheers,
    Ingo
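The warning about selecting on the test set can be illustrated with a small simulation. This is a minimal sketch in plain Python, not a RapidMiner process: every "model" here is a hypothetical coin-flipper whose true accuracy is 50%, and the set sizes and seeds are arbitrary assumptions. Picking the best of many such models on a small test set tends to report a score above 50%, while the same model on fresh data falls back to about 50%.

```python
import random

# Hypothetical sizes, chosen only for illustration.
N_MODELS, N_TEST, N_FRESH = 50, 40, 10_000

label_rng = random.Random(0)
test_labels = [label_rng.randint(0, 1) for _ in range(N_TEST)]
fresh_labels = [label_rng.randint(0, 1) for _ in range(N_FRESH)]


def coin_flip_accuracy(seed, labels):
    """Accuracy of a 'model' that guesses at random (true accuracy: 50%)."""
    rng = random.Random(seed)
    hits = sum(rng.randint(0, 1) == y for y in labels)
    return hits / len(labels)


# Score every model on the small test set and select the best one there.
test_acc = [coin_flip_accuracy(1 + i, test_labels) for i in range(N_MODELS)]
best = max(range(N_MODELS), key=lambda i: test_acc[i])

# Re-evaluate the selected model on data that played no part in the selection.
fresh_acc = coin_flip_accuracy(1000 + best, fresh_labels)

print(f"best model on test set: {test_acc[best]:.3f}")
print(f"same model, fresh data: {fresh_acc:.3f}")
```

The gap between the two printed numbers is the optimism introduced by the selection step itself, which is why a score used for choosing a model should not also be reported as its performance estimate.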
  • Thorsten Member Posts: 1 Contributor I
    Thanks for your fast reply and ideas. Using the IteratingOperatorChain is a good idea, but I still have problems with the ModelApplier. I thought that if I set the iterations value to ten, the ModelApplier would calculate ten predictions using ten different models derived by Bagging.
    <operator name="Root" class="Process" expanded="yes">
        <operator name="Trainingsdaten einlesen" class="ExampleSource">
        </operator>
        <operator name="Herausfiltern von Feature A" class="FeatureNameFilter">
            <parameter key="skip_features_with_name" value="A"/>
        </operator>
        <operator name="Feature AdvRatios herausfiltern" class="FeatureNameFilter">
            <parameter key="filter_special_features" value="true"/>
            <parameter key="skip_features_with_name" value="AdvRatios"/>
        </operator>
        <operator name="Kreuzvalidierung" class="XValidation" expanded="yes">
            <parameter key="create_complete_model" value="true"/>
            <parameter key="keep_example_set" value="true"/>
            <parameter key="leave_one_out" value="true"/>
            <operator name="Modell lernen" class="OperatorChain" expanded="yes">
                <operator name="Bagging" class="Bagging" expanded="yes">
                    <operator name="DecisionTree" class="DecisionTree">
                    </operator>
                </operator>
                <operator name="ModelWriter" class="ModelWriter">
                    <parameter key="model_file" value="model_%{a}.mod"/>
                    <parameter key="output_type" value="XML"/>
                </operator>
            </operator>
            <operator name="Modell testen und bewerten" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Bewertung des Modells" class="ClassificationPerformance">
                    <list key="class_weights">
                    </list>
                    <parameter key="classification_error" value="true"/>
                    <parameter key="correlation" value="true"/>
                    <parameter key="keep_example_set" value="true"/>
                </operator>
            </operator>
        </operator>
        <operator name="Testdaten vorbereiten" class="OperatorChain" expanded="yes">
            <operator name="Testdaten einlesen" class="ExampleSource">
            </operator>
            <operator name="Herausfiltern von Feature A (2)" class="FeatureNameFilter">
                <parameter key="skip_features_with_name" value="A"/>
            </operator>
            <operator name="Feature AdvRatios herausfiltern (2)" class="FeatureNameFilter">
                <parameter key="filter_special_features" value="true"/>
                <parameter key="skip_features_with_name" value="AdvRatios"/>
            </operator>
        </operator>
        <operator name="IteratingOperatorChain" class="IteratingOperatorChain" expanded="yes">
            <parameter key="iterations" value="10"/>
            <operator name="ModelLoader" class="ModelLoader">
                <parameter key="model_file" value="model_%{a}.mod"/>
            </operator>
            <operator name="ModelApplier (2)" class="ModelApplier">
                <list key="application_parameters">
                </list>
                <parameter key="keep_model" value="true"/>
            </operator>
            <operator name="Klassifikationssicherheit Test-/Trainingsdaten" class="ClassificationPerformance">
                <parameter key="accuracy" value="true"/>
                <list key="class_weights">
                </list>
                <parameter key="classification_error" value="true"/>
                <parameter key="kappa" value="true"/>
                <parameter key="keep_example_set" value="true"/>
            </operator>
        </operator>
    </operator>
    I also used the whole dataset to build a model. This is surely the best solution for discriminating all the classes in this dataset. But in my case the model gets far too complex because of the high variation within the classes. So I would like to have a simpler one that could also be used for similar data.

    Thanks so far,
    Thorsten
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,643  RM Founder
    Hi again,
    "Using the IteratingOperatorChain is a good idea, but I still have problems with the ModelApplier. I thought if I set the iteration value to ten the ModelApplier will calculate ten predictions by using ten different models derived by Bagging."
    You will not see the ten predictions from Bagging, since the ten base models are contained in the bagging model and are taken into account by the enclosing model, which outputs a single prediction. I am not sure, but you can probably "simulate" Bagging with a combination of the IteratingOperatorChain, a sampling operator, and a learner. Then you will get 10 models and can apply each of them on its own.

    Cheers,
    Ingo
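The "simulated" Bagging idea from the last reply can be sketched outside RapidMiner as follows. This is a rough illustration in plain Python under stated assumptions: the toy data, the one-feature threshold learner standing in for DecisionTree, and all function names are hypothetical. Each loop iteration draws a bootstrap sample and trains one model, so the ten models stay separate and can be scored individually; the majority vote at the end is what a bagging model would output instead.

```python
import random


def make_data(rng, n):
    """Hypothetical toy data: label is 1 when x > 0.5, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data


def train_stump(examples):
    """Fit a depth-1 decision tree: one threshold on a single feature."""
    best = None
    for threshold, _ in examples:
        for above in (0, 1):  # class predicted when x > threshold
            errors = sum(
                (above if x > threshold else 1 - above) != y
                for x, y in examples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return lambda x: above if x > threshold else 1 - above


def accuracy(model, examples):
    return sum(model(x) == y for x, y in examples) / len(examples)


rng = random.Random(0)
train, test = make_data(rng, 100), make_data(rng, 50)

# One iteration = sample with replacement, then train a separate model.
models = []
for _ in range(10):
    sample = [rng.choice(train) for _ in train]
    models.append(train_stump(sample))

# Each model can now be applied and scored on its own.
scores = [accuracy(m, test) for m in models]
for i, s in enumerate(scores, start=1):
    print(f"model {i}: test accuracy {s:.2f}")


def bagged(x):
    """Majority vote over all base models: a single combined prediction."""
    return int(2 * sum(m(x) for m in models) > len(models))


print(f"bagged vote: test accuracy {accuracy(bagged, test):.2f}")
```

Note that ranking the ten individual scores on the test data is exactly the selection step the earlier reply cautions against, so a held-out set that is not used for the final performance report would be the safer place to compare them.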