
"Classification Performance"

linda (Member, Posts: 1, Contributor I)
edited May 2019 in Help
Hi,

My process is: Input > Learner (DecisionTree ID3) > Example Set Generator > Classification Performance

My data looks like this:

Attributes:  1   2   3   ...  10   Status
             A   B   C   ...       P  (Patient)
             B   A   A   ...       NP (Not Patient)

I set the value type of every attribute to nominal, and I set the attribute type to "attribute" for every column except Status, which I set to "label". Then I press the play button.

When I run this process, I get the error message: "The label attribute (label) must be nominal for the calculation of performance criteria for classification tasks."

How can I solve this problem? My aim is to draw a decision tree and obtain its accuracy and classification error.
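As a conceptual sketch (not RapidMiner's implementation), the two measures asked for here can be computed from parallel lists of true labels and predicted labels. The function name `classification_performance` and its numeric-label check are assumptions for illustration. The check only mimics the idea behind the error message: accuracy and classification error are defined for nominal class values such as P and NP, and they need a prediction next to each label.

```python
# Conceptual sketch only: NOT RapidMiner's Performance operator.
from numbers import Number


def classification_performance(labels, predictions):
    """Return (accuracy, classification_error) for nominal labels.

    labels and predictions are parallel sequences of class names,
    e.g. "P" (patient) and "NP" (not patient).
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    # Hypothetical check mimicking the error message: a numeric label
    # would be a regression target, not a class.
    if any(isinstance(y, Number) for y in labels):
        raise TypeError("label must be nominal for classification performance")
    # Count examples whose predicted class equals the true class.
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    accuracy = correct / len(labels)
    # Classification error is the share of misclassified examples.
    return accuracy, 1.0 - accuracy


acc, err = classification_performance(["P", "NP", "P", "NP"], ["P", "NP", "NP", "NP"])
print(acc, err)  # 0.75 0.25
```

Here three of the four predictions match their labels, so accuracy is 0.75 and classification error is 0.25.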

Thanks in advance,

Linda

Answers

    land (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn)
    Hi Linda,
    you have already done half of the work. Let's go through your process step by step so that you can see where the problems are.

    You first load the data and then learn the decision tree. Depending on your parameter settings, the decision tree learner consumes the example set, so it is no longer available to the operators that follow. (Posting the complete process would help, because then I could check your parameter settings.)
    Then you use the ExampleSetGenerator, which creates a new, randomly drawn example set following a selected distribution. It seems you have selected a distribution with a numerical label, so the last operator, which calculates the performance after a model has been applied, complains that the label is not nominal.
    So there are two problems to solve. First, you probably want to keep your data instead of generating a new example set, so that you test the decision tree on data with the same attributes as the data it was trained on. To do so, check the "keep_exampleset" parameter of the DecisionTree operator. Second, every performance operator needs a prediction in addition to the label, so you have to apply your model with a ModelApplier operator before calculating the performance.
    This gives you an estimate of the learner's accuracy, but only on the training data! For a robust estimate of its performance on new, unseen data, you have to use cross-validation. Here is an example process:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="sum classification"/>
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <operator name="DecisionTree" class="DecisionTree">
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
    </operator>
    By the way: you will learn more about such standard situations if you work through the examples of the online tutorial available on the welcome screen.

    Greetings,
      Sebastian