RapidMiner

Decision Tree Model Not Visible

Regular Contributor

Hello again,



I am trying to use a decision tree learner for a problem. If I run the stream with just the input file node and the decision tree learner, the resulting decision tree is shown fine. However, when I run the following stream (essentially, I perform cross-validation), I cannot see the resulting tree (and hence the resulting model). Here is the setup:

<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename" value="D:\MyDocumentsr\kvltrain.csv"/>
        <parameter key="label_name" value="zkvl"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name" value="(Age|Profession)"/>
    </operator>
    <operator name="XValidation" class="XValidation" expanded="yes">
        <operator name="DecisionTree" class="DecisionTree">
            <parameter key="keep_example_set" value="true"/>
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="ClassificationPerformance" class="ClassificationPerformance">
                <parameter key="absolute_error" value="true"/>
                <parameter key="accuracy" value="true"/>
                <list key="class_weights">
                </list>
                <parameter key="classification_error" value="true"/>
                <parameter key="normalized_absolute_error" value="true"/>
                <parameter key="root_mean_squared_error" value="true"/>
                <parameter key="root_relative_squared_error" value="true"/>
            </operator>
        </operator>
    </operator>
    <operator name="ProcessLog" class="ProcessLog">
        <parameter key="filename" value="D:\Programs\Rapid-I\rm_workspace\logger.log"/>
        <list key="log">
          <parameter key="accuracy" value="operator.CSVExampleSource.value.null"/>
        </list>
    </operator>
    <operator name="GnuplotWriter" class="GnuplotWriter">
        <parameter key="additional_parameters" value="set grid"/>
        <parameter key="name" value="ProcessLog"/>
        <parameter key="output_file" value="D:\Programs\Rapid-I\rm_workspace\log.gnu"/>
        <parameter key="values" value="accuracy"/>
        <parameter key="x_axis" value="accuracy"/>
    </operator>
</operator>



Any idea as to why this is happening?



Thanks,


Harry
3 REPLIES
Regular Contributor

Re: Decision Tree Model Not Visible

OK, I found out what happened: the model gets consumed (?) by the first operator inside the cross-validation. However, if I save the model first and then read it back at the end of the process chain, the decision tree shows fine.

Here is the setup:

<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename" value="D:\MyDocuments\kvltrain.csv"/>
        <parameter key="label_name" value="zkvl"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name" value="(Age|Profession)"/>
    </operator>
    <operator name="XValidation" class="XValidation" expanded="yes">
        <parameter key="number_of_validations" value="3"/>
        <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
            <operator name="DecisionTree" class="DecisionTree">
                <parameter key="keep_example_set" value="true"/>
            </operator>
            <operator name="ModelWriter" class="ModelWriter">
                <parameter key="model_file" value="D:\Programs\Rapid-I\rm_workspace\model.mod"/>
            </operator>
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="ClassificationPerformance" class="ClassificationPerformance">
                <parameter key="absolute_error" value="true"/>
                <parameter key="accuracy" value="true"/>
                <list key="class_weights">
                </list>
                <parameter key="classification_error" value="true"/>
                <parameter key="normalized_absolute_error" value="true"/>
                <parameter key="root_mean_squared_error" value="true"/>
                <parameter key="root_relative_squared_error" value="true"/>
            </operator>
        </operator>
    </operator>
    <operator name="ProcessLog" class="ProcessLog">
        <parameter key="filename" value="D:\Programs\Rapid-I\rm_workspace\logger.log"/>
        <list key="log">
          <parameter key="accuracy" value="operator.CSVExampleSource.value.null"/>
        </list>
    </operator>
    <operator name="GnuplotWriter" class="GnuplotWriter">
        <parameter key="additional_parameters" value="set grid"/>
        <parameter key="name" value="ProcessLog"/>
        <parameter key="output_file" value="D:\Programs\Rapid-I\rm_workspace\log.gnu"/>
        <parameter key="values" value="accuracy"/>
        <parameter key="x_axis" value="accuracy"/>
    </operator>
    <operator name="ModelLoader" class="ModelLoader">
        <parameter key="model_file" value="D:\Programs\Rapid-I\rm_workspace\model.mod"/>
    </operator>
</operator>
Moderator

Re: Decision Tree Model Not Visible

Hi,

hgwelec wrote:

Ok found out what happened : The Model gets consumed (?) in the first operator of cross validation. However if i save the model first and then read it at the end of the process chain, the decision tree shows fine :


You are right that a decision tree is shown, but it's probably not the decision tree you want to look at. XValidation is a kind of loop: in each iteration it learns a model (by applying the DecisionTree learner) on one portion of the data and tests its performance on the complementary portion, and the chosen portion differs from iteration to iteration. Hence, if you save the model inside the XValidation operator, you always save a model that was learned on only a portion of the data. If you want the complete model in addition to the performance estimate, simply turn on the parameter learn_complete_model of the XValidation operator; it will then apply the learner once more on the complete data set and output the resulting model. If you compare that model to the one you wrote out during the cross-validation, you will probably observe a difference between them.
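
As a rough sketch, assuming the rest of your process stays as you posted it, the XValidation part might look like the fragment below. The parameter key is written exactly as named above; if your version spells it differently, use the name shown in the operator's parameter list. The ModelWriter and ModelLoader from your workaround would then no longer be needed.

```xml
<operator name="XValidation" class="XValidation" expanded="yes">
    <!-- Assumption: key spelled as named in this reply; check the operator's parameter list -->
    <parameter key="learn_complete_model" value="true"/>
    <parameter key="number_of_validations" value="3"/>
    <!-- First inner operator: learns a model on the training portion of each fold -->
    <operator name="DecisionTree" class="DecisionTree">
        <parameter key="keep_example_set" value="true"/>
    </operator>
    <!-- Second inner operator: applies the model to the test portion and measures performance -->
    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
        <operator name="ModelApplier" class="ModelApplier">
            <list key="application_parameters">
            </list>
        </operator>
        <operator name="ClassificationPerformance" class="ClassificationPerformance">
            <parameter key="accuracy" value="true"/>
            <list key="class_weights">
            </list>
        </operator>
    </operator>
</operator>
```

With this setting, the tree shown among the results would be the one learned on the complete data set, while the performance values still come from the cross-validation folds.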

Regards,
Tobias

Regular Contributor

Re: Decision Tree Model Not Visible

Hello Tobias,


First, thanks for your reply. After some experimentation I found out about the learn_complete_model option, right after I sent my first reply in the spirit of "we share our knowledge with the community" :)

Your reply puts things in order. Thanks again!