"Bug in Binary2MultiClassLearner Output Display"

mdc · December 2009

Hi,

I am doing text classification using LibSVMLearner inside the Binary2MultiClassLearner. In the GUI, the output of the Binary2MultiClassLearner is displayed as a set of tabs: "ClassX vs all other", "ClassY vs all other", "ClassZ vs all other", etc.
Here is what looks strange:
1. Each tab (class) shows top-weighted attributes that seem to belong to the next tab (class). For example, the top attributes shown in the ClassY tab look to me like they belong to ClassZ.
2. The first tab (class) seems to contain all the top attributes from the other classes.
3. I don't know how significant the Bias (offset) is, but the first tab has a bias of +1.001, while the other tabs have biases close to -1.

Has anybody encountered this? I'm sure I've placed the training data in the right directories and labelled them correctly.

thanks,
Matthew

land · December 2009

Hi,
could you post your process here, after replacing the data source operator with an example set generator?
Then I'll take a look at the problem.

Greetings,
Sebastian

mdc · December 2009

Hi Sebastian,

I apologize for taking so long to reply. Here is the process file I use for text mining. Unfortunately, I don't know how to replace the data source with an equivalent example set generator.

It is not really a problem, because the generated model classifies my input texts correctly. I just noticed that the Binary2MultiClassLearner output window seems to show interchanged labels (X vs all other, Y vs all other, etc.) and thought this could be a bug.

thanks,
Matthew

<operator name="Root" class="Process" expanded="yes">
    <operator name="FeatureExtraction" class="FeatureExtraction">
        <list key="texts">
          <parameter key="ADC"	value="../01 Data/Model Patents/ADC"/>
          <parameter key="DAC"	value="../01 Data/Model Patents/DAC"/>
          <parameter key="Supply"	value="../01 Data/Model Patents/Supply"/>
          <parameter key="ESD"	value="../01 Data/Model Patents/ESD"/>
          <parameter key="IO"	value="../01 Data/Model Patents/IO"/>
          <parameter key="Non_Volatile"	value="../01 Data/Model Patents/Flash"/>
          <parameter key="PLL"	value="../01 Data/Model Patents/PLL"/>
          <parameter key="DLL"	value="../01 Data/Model Patents/DLL"/>
          <parameter key="Process"	value="../01 Data/Model Patents/Process"/>
          <parameter key="Package"	value="../01 Data/Model Patents/Package"/>
          <parameter key="Amplifer"	value="../01 Data/Model Patents/Amplifier"/>
          <parameter key="MEMS"	value="../01 Data/Model Patents/MEMS"/>
          <parameter key="Optoelectronics"	value="../01 Data/Model Patents/Optoelectronics"/>
        </list>
        <parameter key="id_attribute_type"	value="short"/>
        <list key="attributes">
          <parameter key="XTitle"	value="//x:title[@language=&#39;en&#39;]/text()"/>
          <parameter key="XAbstract"	value="//x:abstract/x:paragraph/text()"/>
        </list>
        <list key="namespaces">
          <parameter key="x"	value="http://schemas.delphion.com/20031014/ippublication"/>
        </list>
    </operator>
    <operator name="Nominal2String" class="Nominal2String">
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="no">
        <parameter key="remove_original_attributes"	value="true"/>
        <parameter key="id_attribute_type"	value="short"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars"	value="2"/>
            <parameter key="max_chars"	value="15"/>
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
    </operator>
    <operator name="SVMWeighting" class="SVMWeighting">
    </operator>
    <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
        <parameter key="weight_relation"	value="top k"/>
        <parameter key="k"	value="500"/>
    </operator>
    <operator name="ExampleSet2AttributeWeights" class="ExampleSet2AttributeWeights">
    </operator>
    <operator name="AttributeWeightsWriter" class="AttributeWeightsWriter">
        <parameter key="attribute_weights_file"	value="%{process_name}_AttrWeight.wgt"/>
    </operator>
    <operator name="Binary2MultiClassLearner" class="Binary2MultiClassLearner" expanded="yes">
        <operator name="Weighted Class" class="LibSVMLearner">
            <parameter key="kernel_type"	value="linear"/>
            <parameter key="C"	value="10.0"/>
            <list key="class_weights">
              <parameter key="Clocking"	value="2.0"/>
              <parameter key="Memory"	value="2.0"/>
              <parameter key="Converter"	value="2.0"/>
              <parameter key="Process"	value="3.0"/>
              <parameter key="Package"	value="3.0"/>
              <parameter key="IO"	value="2.0"/>
              <parameter key="ESD"	value="2.0"/>
              <parameter key="Supply"	value="2.0"/>
            </list>
            <parameter key="calculate_confidences"	value="true"/>
        </operator>
    </operator>
    <operator name="ModelWriter" class="ModelWriter">
        <parameter key="model_file"	value="%{process_name}_Model.mod"/>
        <parameter key="output_type"	value="XML"/>
    </operator>
</operator>

land · December 2009

Hi,
where is the interchange in these titles? The Binary2MultiClassLearner learns each class against all the others, so there is one tab for each class.
I don't see the problem here.

Greetings,
Sebastian
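
For readers unfamiliar with the scheme Sebastian describes, here is a minimal Python sketch of one-vs-all ("class vs all other") training, assuming documents are sparse term-count dictionaries. To stay dependency-free it swaps in a plain perceptron for LibSVM. The function names and toy data are hypothetical and are not RapidMiner's API. The point is that each class gets its own weight vector and bias, learned with that class as the positive side and all other classes as the negative side.

```python
from collections import defaultdict


def train_perceptron(xs, ys, epochs=20):
    """Binary linear classifier (plain perceptron, a stand-in for LibSVM).

    xs: list of sparse feature dicts {attribute: value}; ys: +1 or -1.
    Returns (weights, bias).
    """
    w = defaultdict(float)
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            score = sum(w[a] * v for a, v in x.items()) + b
            if y * score <= 0:  # misclassified or on the boundary: update
                for a, v in x.items():
                    w[a] += y * v
                b += y
    return dict(w), b


def train_one_vs_all(xs, labels, epochs=20):
    """One binary model per class: that class is +1, every other class is -1."""
    models = {}
    for c in sorted(set(labels)):
        ys = [1 if lab == c else -1 for lab in labels]
        models[c] = train_perceptron(xs, ys, epochs)
    return models


def predict(models, x):
    """Pick the class whose 'c vs all other' model gives the highest score."""
    def score(model):
        w, b = model
        return sum(w.get(a, 0.0) * v for a, v in x.items()) + b
    return max(models, key=lambda c: score(models[c]))


# Hypothetical toy data: three stemmed "documents", one per class.
docs = [{"adc": 1, "sampl": 1}, {"dac": 1, "convert": 1}, {"pll": 1, "clock": 1}]
labels = ["ADC", "DAC", "PLL"]
models = train_one_vs_all(docs, labels)
print(predict(models, {"pll": 1}))  # prints PLL
```

With this toy data, only the "PLL vs all other" model ends up with a positive weight for `pll`, so that model wins the argmax. The weights for a given term differ from model to model because each model sees a different positive class.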

mdc · December 2009

Hi,

I'll try to illustrate the problem here.

X vs all other   Y vs all other   Z vs all other
Aattr1           Battr1           Cattr1
Aattr2           Battr2           Cattr2
Aattr3           Battr3           Cattr3
Aattr4           Battr4           Cattr4
...              ...              ...

Each column is a tab. In each tab, I opened the Weight Table View and sorted by Weight (descending) to see that tab's top attributes. But when I analyze the top attributes of one tab, they seem to belong to the next tab. In the example above, the Aattr attributes should belong to the Y tab, and the Battr attributes should belong to the Z tab.

Is this clear enough? As I said, I think this only affects the display, since I am still getting correct classifications.

thanks,
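
The per-tab check Matthew describes (sorting one tab's weights in descending order) can be sketched in Python as below, assuming each "c vs all other" model is exposed as a mapping from attribute name to weight. The function name and the weights are made up for illustration. Each tab's ranking is computed only from that tab's own weight vector, so an attribute such as `clock` can rank high in one tab and low (negative) in another.

```python
def top_attributes(weights, k=3):
    """Return the k attributes with the largest weight, sorted descending,
    like a descending sort on the Weight column of a single tab."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [attribute for attribute, _ in ranked[:k]]


# Hypothetical weights for two 'c vs all other' models.
per_class_weights = {
    "ADC": {"sampl": 0.9, "quantiz": 0.7, "clock": -0.4, "amplifi": 0.1},
    "PLL": {"clock": 1.2, "phase": 0.8, "sampl": -0.3, "oscil": 0.6},
}

for cls, weights in per_class_weights.items():
    print(f"{cls} vs all other:", top_attributes(weights))
```

This sorts by signed weight, which matches a plain descending sort. Sorting by absolute value instead would also surface strongly negative attributes, which argue against the tab's class rather than for it.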

land · December 2009

Hi,
now I understand what you mean. I have checked this, but the error you describe is simply impossible, so I guess you have to reinterpret your results.

If you check the text view, you will see that these weights belong to the class shown in the tab and are the same as in the Weight Table View.

Greetings,
Sebastian

Altair RapidMiner Community