"Bug in Binary2MultiClassLearner Output Display"

mdcmdc Member Posts: 58 Maven
edited June 2019 in Help
Hi,

I am doing text classification using LibSVMLearner inside the Binary2MultiClassLearner. The GUI displays the output of the Binary2MultiClassLearner as a set of tabs, one per class: "ClassX vs all other", "ClassY vs all other", "ClassZ vs all other", etc.
The strange part:
1. Each tab (class) lists, as its top-weighted attributes, attributes that I think belong to the next tab (class). For example, the top attributes shown in the ClassY tab look like they should belong to ClassZ.
2. The first tab (class) seems to contain all the top attributes from the other classes.
3. I don't know the significance of the Bias (offset), but the first tab has a bias of positive 1.001 while the other tabs have biases close to negative 1.

Has anybody encountered this? I'm sure I've placed the training data in the right directories and labelled them correctly.

thanks,
Matthew

Answers

  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Could you post your process here, after replacing the data source operator with an ExampleSetGenerator?
    Then I would take a look at the problem.


    Greetings,
    Sebastian
  • mdcmdc Member Posts: 58 Maven
    Hi Sebastian,

    I apologize for taking so long to reply. Here is the process file that I use for text mining. Unfortunately, I don't know how to replace the data source with an equivalent example set generator.

    It is not really a problem, because the generated model classifies my input texts correctly. I just noticed that the Binary2MultiClassLearner output window shows interchanged labels (x vs all other, y vs all other, etc.) and thought that this could be a bug.

    thanks,
    Matthew
    <operator name="Root" class="Process" expanded="yes">
        <operator name="FeatureExtraction" class="FeatureExtraction">
            <list key="texts">
              <parameter key="ADC" value="../01 Data/Model Patents/ADC"/>
              <parameter key="DAC" value="../01 Data/Model Patents/DAC"/>
              <parameter key="Supply" value="../01 Data/Model Patents/Supply"/>
              <parameter key="ESD" value="../01 Data/Model Patents/ESD"/>
              <parameter key="IO" value="../01 Data/Model Patents/IO"/>
              <parameter key="Non_Volatile" value="../01 Data/Model Patents/Flash"/>
              <parameter key="PLL" value="../01 Data/Model Patents/PLL"/>
              <parameter key="DLL" value="../01 Data/Model Patents/DLL"/>
              <parameter key="Process" value="../01 Data/Model Patents/Process"/>
              <parameter key="Package" value="../01 Data/Model Patents/Package"/>
              <parameter key="Amplifer" value="../01 Data/Model Patents/Amplifier"/>
              <parameter key="MEMS" value="../01 Data/Model Patents/MEMS"/>
              <parameter key="Optoelectronics" value="../01 Data/Model Patents/Optoelectronics"/>
            </list>
            <parameter key="id_attribute_type" value="short"/>
            <list key="attributes">
              <parameter key="XTitle" value="//x:title[@language=&#39;en&#39;]/text()"/>
              <parameter key="XAbstract" value="//x:abstract/x:paragraph/text()"/>
            </list>
            <list key="namespaces">
              <parameter key="x" value="http://schemas.delphion.com/20031014/ippublication"/>
            </list>
        </operator>
        <operator name="Nominal2String" class="Nominal2String">
        </operator>
        <operator name="StringTextInput" class="StringTextInput" expanded="no">
            <parameter key="remove_original_attributes" value="true"/>
            <parameter key="id_attribute_type" value="short"/>
            <list key="namespaces">
            </list>
            <operator name="StringTokenizer" class="StringTokenizer">
            </operator>
            <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
            </operator>
            <operator name="TokenLengthFilter" class="TokenLengthFilter">
                <parameter key="min_chars" value="2"/>
                <parameter key="max_chars" value="15"/>
            </operator>
            <operator name="PorterStemmer" class="PorterStemmer">
            </operator>
        </operator>
        <operator name="SVMWeighting" class="SVMWeighting">
        </operator>
        <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
            <parameter key="weight_relation" value="top k"/>
            <parameter key="k" value="500"/>
        </operator>
        <operator name="ExampleSet2AttributeWeights" class="ExampleSet2AttributeWeights">
        </operator>
        <operator name="AttributeWeightsWriter" class="AttributeWeightsWriter">
            <parameter key="attribute_weights_file" value="%{process_name}_AttrWeight.wgt"/>
        </operator>
        <operator name="Binary2MultiClassLearner" class="Binary2MultiClassLearner" expanded="yes">
            <operator name="Weighted Class" class="LibSVMLearner">
                <parameter key="kernel_type" value="linear"/>
                <parameter key="C" value="10.0"/>
                <list key="class_weights">
                  <parameter key="Clocking" value="2.0"/>
                  <parameter key="Memory" value="2.0"/>
                  <parameter key="Converter" value="2.0"/>
                  <parameter key="Process" value="3.0"/>
                  <parameter key="Package" value="3.0"/>
                  <parameter key="IO" value="2.0"/>
                  <parameter key="ESD" value="2.0"/>
                  <parameter key="Supply" value="2.0"/>
                </list>
                <parameter key="calculate_confidences" value="true"/>
            </operator>
        </operator>
        <operator name="ModelWriter" class="ModelWriter">
            <parameter key="model_file" value="%{process_name}_Model.mod"/>
            <parameter key="output_type" value="XML"/>
        </operator>
    </operator>
  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Where is the interchange in these titles? The Binary2MultiClassLearner learns each class against all others, so there is one tab per class.
    I don't see the problem here.

    Greetings,
      Sebastian
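
To make the one-vs-all scheme described above concrete, here is a minimal Python sketch of how a multi-class label column can be recoded into one binary problem per class. It is not RapidMiner's implementation: the function name `one_vs_all_labels`, the +1/-1 encoding, and the ordering of classes by first appearance are all assumptions made for illustration.

```python
def one_vs_all_labels(labels):
    """Recode a multi-class label list into one binary label list per class.

    Assumption: the class named in the tab is encoded as +1 and every
    other class as -1. The real operator may use a different encoding.
    """
    # Keep classes in order of first appearance (an assumption, not
    # necessarily the order RapidMiner uses for its tabs).
    classes = list(dict.fromkeys(labels))
    return {
        f"{c} vs all other": [1 if y == c else -1 for y in labels]
        for c in classes
    }


# Hypothetical label column using three of the class names from the process.
labels = ["ADC", "DAC", "PLL", "ADC", "PLL"]
problems = one_vs_all_labels(labels)

for tab, binary_labels in problems.items():
    print(tab, binary_labels)
```

Each entry corresponds to one tab: a separate binary learner is trained on the same examples with only the labels changed, which is why every tab has its own weights and its own bias.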
  • mdcmdc Member Posts: 58 Maven

    Hi,

    I'll try to illustrate the problem here.

    X vs all other    Y vs all other    Z vs all other
    Aattr1            Battr1            Cattr1
    Aattr2            Battr2            Cattr2
    Aattr3            Battr3            Cattr3
    Aattr4            Battr4            Cattr4
    ...               ...               ...

    Each column is a tab. In each tab I opened the Weight Table View and sorted by Weight (descending) to see the top attributes for that tab. But when I analyzed the top attributes for one tab, I think they belong to the next tab. In the example above, the Aattr attributes should belong to the Y tab, and the Battr attributes should belong to the Z tab.

    Is this clear enough? As I've said, I think this is only a display issue, since I am still getting correct classifications.

    thanks,

  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    Now I understand what you mean. I have checked this, but the error you describe is simply impossible, so I guess you have to reinterpret your results :) If you check the text view, you will see that these weights are connected to the class shown in the tab and are the same as in the weight table view.

    Greetings,
      Sebastian
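
When reinterpreting the weight tables, it may help to recall how a linear model uses them: the decision value is the weighted sum of the attribute values plus the bias. The Python sketch below illustrates this with made-up numbers. The attribute names, weights, and bias are hypothetical and are not taken from the model in this thread, and the sketch assumes that a positive decision value means "the tab's class". If the model encodes the classes the other way round, the signs of the weights and the bias flip together.

```python
def decision_value(weights, bias, doc):
    """Linear decision function: sum of weight * attribute value, plus bias."""
    return sum(weights.get(term, 0.0) * value for term, value in doc.items()) + bias


def top_attributes(weights, k=3):
    """Attributes with the largest signed weights (sorted descending)."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Hypothetical weights for a single "X vs all other" tab.
tab_weights = {"convert": 0.75, "sampl": 0.5, "oscil": -0.5, "lock": -0.25, "bit": 0.25}
tab_bias = -1.0

# Sorting descending lists the attributes that push the decision value up.
print(top_attributes(tab_weights))

# A document containing none of the attributes scores exactly the bias.
print(decision_value(tab_weights, tab_bias, {}))

# Enough positively weighted attributes can outweigh a negative bias.
print(decision_value(tab_weights, tab_bias, {"convert": 1.0, "sampl": 1.0}))
```

Under this assumption, a bias near -1 means that a document with no matching attributes falls on the "all other" side, and sorting by weight in descending order shows only the attributes that argue for the positive side, whichever class that is in the model.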