Confusion matrix as a file?

harris Member Posts: 8 Contributor II
Hi Ingo et al.

I would like to use RM's confusion matrix visualisation to report my results from Weka classifiers (several of them, trained on different arffs).

This works...:
<operator name="Root" class="Process" expanded="yes">
    <description text="The performance criteria accuracy and classification_error produces a confusion matrix for both binary and multi classification tasks. This confusion matrix even holds the information for per-class performance."/>
    <operator name="ArffExampleSource" class="ArffExampleSource">
        <parameter key="data_file" value="C:\Users\HarriS\Desktop\datasets\runnables-CEO\datasets\27-fold-test-CSPZ.arff"/>
        <parameter key="label_attribute" value="fold"/>
    </operator>
    <operator name="XValidation" class="XValidation" expanded="yes">
        <operator name="NearestNeighbors" class="NearestNeighbors">
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="ClassificationPerformance" class="ClassificationPerformance">
                <parameter key="accuracy" value="true"/>
                <list key="class_weights">
                </list>
                <parameter key="kappa" value="true"/>
            </operator>
        </operator>
    </operator>
</operator>

...but only with one classifier trained on one arff. I'm sure multiple classifiers on multiple arffs can be done in RM, and I would be obliged for any script revisions or sample scripts showing how to nest them here.

But perhaps this would be quicker and more in line with the results I already have: can I somehow give RM a pre-prepared confusion matrix that the ClassificationPerformance operator could accept? Then those nests of multiple classifier operators on multiple input sources wouldn't need to be run in RM. I'm not at all sure that EXACTLY the same results would emerge as they do with Weka 3.5.7 (I may be wrong, but running all of them again would require a lot of further work on my part).

If so, what would be the format of the confusion matrix?
Thanks, Harri S

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Harri,
    although this is not as easy as adding one operator, it is still feasible. Take a look at this example process:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="sample\data\iris.aml"/>
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <operator name="DecisionTree" class="DecisionTree">
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="PerformanceEvaluator" class="PerformanceEvaluator">
                    <parameter key="accuracy" value="true"/>
                </operator>
            </operator>
        </operator>
        <operator name="IOObjectWriter" class="IOObjectWriter">
            <parameter key="io_object" value="PerformanceVector"/>
            <parameter key="object_file" value="test.xml"/>
            <parameter key="output_type" value="XML"/>
        </operator>
    </operator>
    This will create a test.xml file in the processes directory. It will look like this:
    <object-stream>
      <PerformanceVector id="1">
        <currentValues id="2">
          <entry>
            <string>accuracy</string>
            <double>0.9400000000000001</double>
          </entry>
        </currentValues>
        <comparator class="com.rapidminer.operator.performance.PerformanceVector$DefaultComparator" id="3"/>
        <mainCriterion>first</mainCriterion>
        <averagesList id="4">
          <kappa id="5">
            <counter id="6">
              <double-array id="7">
                <double>50.0</double>
                <double>0.0</double>
                <double>0.0</double>
              </double-array>
              <double-array id="8">
                <double>0.0</double>
                <double>45.0</double>
                <double>5.0</double>
              </double-array>
              <double-array id="9">
                <double>0.0</double>
                <double>4.0</double>
                <double>46.0</double>
              </double-array>
            </counter>
            <classNames id="10">
              <string>Iris-setosa</string>
              <string>Iris-versicolor</string>
              <string>Iris-virginica</string>
            </classNames>
            <classNameMap id="11">
              <entry>
                <string>Iris-virginica</string>
                <int>2</int>
              </entry>
              <entry>
                <string>Iris-setosa</string>
                <int>0</int>
              </entry>
              <entry>
                <string>Iris-versicolor</string>
                <int>1</int>
              </entry>
            </classNameMap>
            <labelAttribute class="PolynominalAttribute" id="12">
              <nominalMapping class="PolynominalMapping" id="13">
                <symbolToIndexMap id="14">
                  <entry>
                    <string>Iris-virginica</string>
                    <int>2</int>
                  </entry>
                  <entry>
                    <string>Iris-setosa</string>
                    <int>0</int>
                  </entry>
                  <entry>
                    <string>Iris-versicolor</string>
                    <int>1</int>
                  </entry>
                </symbolToIndexMap>
                <indexToSymbolMap id="15">
                  <string>Iris-setosa</string>
                  <string>Iris-versicolor</string>
                  <string>Iris-virginica</string>
                </indexToSymbolMap>
              </nominalMapping>
              <attributeDescription id="16">
                <name>label</name>
                <valueType>1</valueType>
                <blockType>1</blockType>
                <defaultValue>0.0</defaultValue>
                <index>5</index>
              </attributeDescription>
              <transformations id="17">
                <com.rapidminer.example.set.AttributeTransformationRemapping id="18">
                  <overlayedMapping class="PolynominalMapping" reference="13"/>
                </com.rapidminer.example.set.AttributeTransformationRemapping>
              </transformations>
              <statistics class="linked-list" id="19">
                <NominalStatistics id="20">
                  <mode>-1</mode>
                  <maxCounter>0</maxCounter>
                </NominalStatistics>
                <UnknownStatistics id="21">
                  <unknownCounter>0</unknownCounter>
                </UnknownStatistics>
              </statistics>
              <constructionDescription>label</constructionDescription>
            </labelAttribute>
            <predictedLabelAttribute class="PolynominalAttribute" id="22">
              <nominalMapping class="PolynominalMapping" reference="13"/>
              <attributeDescription id="23">
                <name>prediction(label)</name>
                <valueType>1</valueType>
                <blockType>1</blockType>
                <defaultValue>0.0</defaultValue>
                <index>6</index>
              </attributeDescription>
              <transformations id="24"/>
              <statistics class="linked-list" id="25">
                <NominalStatistics id="26">
                  <mode>0</mode>
                  <maxCounter>45</maxCounter>
                  <scores id="27">
                    <long>45</long>
                    <long>45</long>
                    <long>45</long>
                  </scores>
                </NominalStatistics>
                <UnknownStatistics id="28">
                  <unknownCounter>0</unknownCounter>
                </UnknownStatistics>
              </statistics>
              <constructionDescription>prediction(label)</constructionDescription>
            </predictedLabelAttribute>
            <type>0</type>
            <meanSum>9.4</meanSum>
            <meanSquaredSum>8.84888888888889</meanSquaredSum>
            <averageCount>10</averageCount>
          </kappa>
        </averagesList>
        <source>PerformanceEvaluator</source>
      </PerformanceVector>
    </object-stream>
    You could then insert your values and reload the file using the IOObjectReader Operator.
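
If you would rather not edit the numbers by hand, the step above can be scripted. The following is only a minimal sketch in Python (not part of RapidMiner): the function name `set_confusion_counts` is made up for illustration, and it assumes the file has the layout shown above, with one `<double-array>` per class inside `<counter>`. It only replaces the counts. It does not recompute the stored accuracy value or fields such as `meanSum`, and the thread does not say which index of the matrix is the true class and which is the predicted one, so check that against a matrix you already know.

```python
import xml.etree.ElementTree as ET


def set_confusion_counts(xml_text, matrix):
    """Return xml_text with the values under the first <counter> replaced.

    matrix is a list of rows, one row per <double-array> element.
    Assumption: the orientation (true vs. predicted) matches the file.
    """
    root = ET.fromstring(xml_text)
    counter = root.find(".//counter")
    if counter is None:
        raise ValueError("no <counter> element found")

    row_elements = counter.findall("double-array")
    if len(row_elements) != len(matrix):
        raise ValueError("matrix and file have a different number of classes")

    for row_element, row in zip(row_elements, matrix):
        cells = row_element.findall("double")
        if len(cells) != len(row):
            raise ValueError("row length does not match the file")
        for cell, value in zip(cells, row):
            # The file stores counts as doubles, e.g. 45.0
            cell.text = str(float(value))

    return ET.tostring(root, encoding="unicode")


# Hypothetical usage with the file written by IOObjectWriter:
#   with open("test.xml", encoding="utf-8") as handle:
#       updated = set_confusion_counts(handle.read(), my_weka_matrix)
#   with open("test_weka.xml", "w", encoding="utf-8") as handle:
#       handle.write(updated)
```

The class names in `<classNames>`, `<classNameMap>` and the nominal mapping would still have to be changed to your own labels; the sketch leaves them untouched.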

    If it's only because of the plotter, you could simply load the files as an ExampleSet and set the plotter properties appropriately to get the same visualisation.
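
For the plotter route, one possible way to get a Weka confusion matrix into a loadable file is to flatten it into one row per matrix cell. This is a sketch under assumptions: the thread does not name a file format or plotter settings, so the CSV layout and the column names `row_class`, `column_class` and `count` are invented here for illustration, as are the function names.

```python
import csv


def confusion_to_rows(class_names, matrix):
    """Flatten a square confusion matrix into (row class, column class, count) rows."""
    rows = [("row_class", "column_class", "count")]  # hypothetical header names
    for i, row_name in enumerate(class_names):
        for j, column_name in enumerate(class_names):
            rows.append((row_name, column_name, matrix[i][j]))
    return rows


def write_csv(rows, handle):
    """Write the flattened rows to an open text file handle."""
    csv.writer(handle).writerows(rows)


# Hypothetical usage:
#   rows = confusion_to_rows(["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
#                            [[50, 0, 0], [0, 45, 5], [0, 4, 46]])
#   with open("confusion.csv", "w", newline="", encoding="utf-8") as handle:
#       write_csv(rows, handle)
```

Each row then becomes one example with two nominal attributes and one numeric count, which a plotter can map to two axes and a colour or size.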

    Greetings,
      Sebastian

    PS: It's possible to nest those learners and the files using the operators DirectoryIterator and ParameterIteration together with the OperatorSelector.