Options

"Mis-labeling bug in model applying"

mdcmdc Member Posts: 58 Maven
edited June 2019 in Help

Hi,

I think this bug is related to the bug mentioned in this post http://rapid-i.com/rapidforum/index.php/topic,776.msg2897.html#new. Unfortunately, I couldn't understand the exact problem and the suggested workaround did not work for me.

The process file, shown below and includes both training and scoring, works without problem. My documents are classified correctly. However when I run a process file that includes only the scoring (the operators after the MemoryCleanUp) the documents are mislabeled.

Can anyone suggest a workaround?

thanks,
Matthew
<operator name="Root" class="Process" expanded="yes">
    <operator name="FeatureExtraction" class="FeatureExtraction">
        <list key="texts">
          <parameter key="ADC" value="../01 Data/Model Patents/ADC"/>
          <parameter key="DAC" value="../01 Data/Model Patents/DAC"/>
          <parameter key="Supply" value="../01 Data/Model Patents/Supply"/>
          <parameter key="ESD" value="../01 Data/Model Patents/ESD"/>
          <parameter key="IO" value="../01 Data/Model Patents/IO"/>
          <parameter key="Non_Volatile" value="../01 Data/Model Patents/Flash"/>
          <parameter key="PLL" value="../01 Data/Model Patents/PLL"/>
          <parameter key="DLL" value="../01 Data/Model Patents/DLL"/>
          <parameter key="Process" value="../01 Data/Model Patents/Process"/>
          <parameter key="Package" value="../01 Data/Model Patents/Package"/>
          <parameter key="Amplifer" value="../01 Data/Model Patents/Amplifier"/>
          <parameter key="MEMS" value="../01 Data/Model Patents/MEMS"/>
          <parameter key="Optoelectronics" value="../01 Data/Model Patents/Optoelectronics"/>
        </list>
        <parameter key="id_attribute_type" value="short"/>
        <list key="attributes">
          <parameter key="XTitle" value="//x:title[@language=&amp;#39;en&#39;]/text()"/>
          <parameter key="XAbstract" value="//x:abstract/x:paragraph/text()"/>
        </list>
        <list key="namespaces">
          <parameter key="x" value="http://schemas.delphion.com/20031014/ippublication"/>
        </list>
    </operator>
    <operator name="Nominal2String" class="Nominal2String">
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="no">
        <parameter key="remove_original_attributes" value="true"/>
        <parameter key="id_attribute_type" value="short"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="2"/>
            <parameter key="max_chars" value="15"/>
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
    </operator>
    <operator name="SVMWeighting" class="SVMWeighting">
    </operator>
    <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
        <parameter key="weight_relation" value="top k"/>
        <parameter key="k" value="500"/>
    </operator>
    <operator name="ExampleSet2AttributeWeights" class="ExampleSet2AttributeWeights">
    </operator>
    <operator name="AttributeWeightsWriter" class="AttributeWeightsWriter">
        <parameter key="attribute_weights_file" value="%{process_name}_AttrWeight.wgt"/>
    </operator>
    <operator name="LibSVMLearner" class="LibSVMLearner">
        <parameter key="kernel_type" value="linear"/>
        <list key="class_weights">
        </list>
        <parameter key="calculate_confidences" value="true"/>
    </operator>
    <operator name="ModelWriter" class="ModelWriter">
        <parameter key="model_file" value="%{process_name}_Model.mod"/>
        <parameter key="output_type" value="XML"/>
    </operator>
    <operator name="MemoryCleanUp" class="MemoryCleanUp">
    </operator>
    <operator name="FeatureExtraction (2)" class="FeatureExtraction">
        <list key="texts">
          <parameter key="Uncategorized" value="../01 Data/Test Patents"/>
        </list>
        <parameter key="id_attribute_type" value="short"/>
        <list key="attributes">
          <parameter key="XTitle" value="//x:title[@language=&amp;#39;en&#39;]/text()"/>
          <parameter key="XAbstract" value="//x:abstract/x:paragraph/text()"/>
        </list>
        <list key="namespaces">
          <parameter key="x" value="http://schemas.delphion.com/20031014/ippublication"/>
        </list>
    </operator>
    <operator name="Nominal2String (2)" class="Nominal2String">
    </operator>
    <operator name="StringTextInput (2)" class="StringTextInput" expanded="no">
        <parameter key="id_attribute_type" value="short"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer (2)" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter (2)" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter (2)" class="TokenLengthFilter">
            <parameter key="min_chars" value="2"/>
            <parameter key="max_chars" value="13"/>
        </operator>
        <operator name="PorterStemmer (2)" class="PorterStemmer">
        </operator>
    </operator>
    <operator name="Title" class="ChangeAttributeRole">
        <parameter key="name" value="XTitle"/>
        <parameter key="target_role" value="XTitle"/>
    </operator>
    <operator name="Abstract" class="ChangeAttributeRole">
        <parameter key="name" value="XAbstract"/>
        <parameter key="target_role" value="XAbstract"/>
    </operator>
    <operator name="AttributeWeightsLoader (2)" class="AttributeWeightsLoader">
        <parameter key="attribute_weights_file" value="%{process_name}_AttrWeight.wgt"/>
    </operator>
    <operator name="AttributeWeightsApplier (2)" class="AttributeWeightsApplier">
    </operator>
    <operator name="ModelLoader" class="ModelLoader">
        <parameter key="model_file" value="%{process_name}_Model.mod"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
</operator>

Tagged:

Answers

  • Options
    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    what exactly do you mean by mislabeling? That the predictions are wrong, or that the label values get mingled? The later would be the case if something happens like: Each "A" is predicted as "B" and each "B" is predicted as "A".
    Greetings,
      Sebastian
  • Options
    mdcmdc Member Posts: 58 Maven
    Hi Sebastian,

    It's the second one, the label values get mingled.

    The process in my first post includes the training and the scoring ---and the labels are not mingled in the output. But when I run a process file with only the scoring part, the labels are mingled.

    thanks,
    Matthew
  • Options
    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    that should not happen. Hm, I will check that. Until I took a look inside the code, I would recommend a workaround:
    Unify both datasets and just filter out the examples with an undefined label before training and the examples with labels before applying. This should ensure that the internal mapping of nominal values is the same.

    Greetings,
      Sebastian
  • Options
    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I cannot reproduce your problem. Could you build a process with some sample data, reproducing the problem? Something I can easily load and test?

    Greetings,
      Sebastian
Sign In or Register to comment.