Beginner's question

shoneshone Member Posts: 3 Contributor I
edited November 2018 in Help
Hi, at my college we have a project on data mining, and the tool we use is RapidMiner. Since I'm new to RapidMiner, I have a question for you. My process looks like this:

Root
     ExampleSource
     FeatureSelection
           XValidation (number_of_validations = 10)
           MetaCost
                 DecisionTree
           OperatorChain
                 ModelApplier
                 ClassificationPerformance

As I understand it, the model is built over several iterations, and the model we get at the end is the one with the best results. When the process finishes, it shows me a PerformanceVector in the form of a confusion matrix. The question is: is that confusion matrix for the last model or for the best model?

Answers

  • steffensteffen Member Posts: 347 Maven
    Hello and welcome to RapidMiner

    The answer is: the last model. However, since FeatureSelection stops when no further improvement can be achieved (see the description of FeatureSelection in tutorial.pdf, or select the operator and press F1), the last model is also the best one found, although it may represent only a local maximum.

    See another example in <your-rm-workspace>\sample\05_Features\10_ForwardSelection.xml.

    regards,

    Steffen

  • shoneshone Member Posts: 3 Contributor I
    Thanks for the reply. :)
    The reason I asked is this: when I save the model (the result of the given process), load it in another process, and apply it to the same data set that was used in the first process, the confusion matrix produced by ClassificationPerformance is different from the one in the first process. Why is that?
  • steffensteffen Member Posts: 347 Maven
    OK, I think some terms have been mixed up. In the future, please provide the complete setup (just copy all the text from the XML tab in RapidMiner ... and paste it into the thread using the code (#) tag).

    Your posted setup, like the example I mentioned, does not produce a model. It only produces AttributeWeights. So, to get comparable results, you have to use a process like this one:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="Input" class="ExampleSource">
            <parameter key="attributes" value="../data/polynomial.aml"/>
        </operator>
        <operator name="AttributeWeightsLoader" class="AttributeWeightsLoader">
        </operator>
        <operator name="AttributeWeightsApplier" class="AttributeWeightsApplier">
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="create_complete_model" value="true"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <operator name="NearestNeighbors" class="NearestNeighbors">
                <parameter key="k" value="5"/>
            </operator>
            <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                <operator name="Applier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
        <operator name="ProcessLog" class="ProcessLog">
            <list key="log">
              <parameter key="generation" value="operator.FS.value.generation"/>
              <parameter key="performance" value="operator.FS.value.performance"/>
            </list>
        </operator>
    </operator>
    I said "comparable", not "the same", because to get exactly the same results you have to ensure that XValidation splits the data in exactly the same way as in the last iteration of FeatureSelection. You can achieve this by setting the parameter local_random_seed to a value > 0 (in both the FeatureSelection process and the process specified above). That said, I do not see why an exact match should matter.
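
    As a rough sketch of the seed setting described above, the XValidation operator from the process could look like this. The value 2001 is only an arbitrary placeholder, not a recommended setting; what matters is that the same positive value is used in both processes. The inner operators are copied from the process above and would need to match your own setup.

    ```xml
    <!-- Sketch only: the seed value 2001 is an arbitrary placeholder. -->
    <operator name="XValidation" class="XValidation" expanded="yes">
        <parameter key="create_complete_model" value="true"/>
        <parameter key="sampling_type" value="shuffled sampling"/>
        <!-- Use the same value > 0 in both processes so the folds are split identically. -->
        <parameter key="local_random_seed" value="2001"/>
        <operator name="NearestNeighbors" class="NearestNeighbors">
            <parameter key="k" value="5"/>
        </operator>
        <operator name="ApplierChain" class="OperatorChain" expanded="yes">
            <operator name="Applier" class="ModelApplier"/>
            <operator name="Performance" class="Performance"/>
        </operator>
    </operator>
    ```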

    If your process does produce a model, or if I have misunderstood anything else, please post it here. Otherwise I am restricted to guessing ...

    Hope this was helpful

    regards,

    Steffen

  • shoneshone Member Posts: 3 Contributor I
    I forgot to mention that I've added a ModelWriter after the ClassificationPerformance operator.
  • steffensteffen Member Posts: 347 Maven
    Fine.

    So do you save the model at every step of XValidation, or only the final model (by setting the related parameter)? Whichever is the case, make sure that you understand XValidation and/or read the documentation of the RapidMiner implementation (select the operator and press F1).