Options

Problem with Naive-Bayes [SOLVED]

josejose Member Posts: 16 Contributor II
edited November 2018 in Help
Hi,

I am using the operator "Naive Bayes"  and shows me the following error:

"the learning scheme Naive Bayes does not have sufficient capabilities for handling an example set with only one label.
There are existing special modeling operators if only examples for one class are know. They support the "one class label" capability"

Anyone can explain me better referred to the error?

Thanks.

Answers

  • Options
    MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Jose,

    Naive Bayes is trained on training data which must contain examples for each class. So if you want to create a Naive Bayes model to separate positive and negative examples, your training data must contain both examples with positive and negative label. If your data contains for example only positive examples, you get the error that you quoted.

    Best, Marius
  • Options
    josejose Member Posts: 16 Contributor II
    Hi Marius,

    What do you mean by that data must be labeled positive or negative?

    I have a text file (txt) with the positive feedback training in a folder. And in another folder I have the negative comments.
    Both files with positive and negative comments ... Should they be in the same folder?

    Thanks.
  • Options
    MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Having the different classes in separate folders is fine, maybe there is a configuration error in your process. Please post the xml of the process you are using, so that we can check it.

    Best, Marius
  • Options
    josejose Member Posts: 16 Contributor II

    ok, here is the xml process...


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
        <process expanded="true" height="235" width="480">
          <operator activated="true" class="text:process_document_from_file" compatibility="5.2.001" expanded="true" height="76" name="Process Documents from Files" width="90" x="45" y="30">
            <list key="text_directories">
              <parameter key="positivos" value="C:\Users\yop\Desktop\archivos de tesis\comentarios positivos"/>
              <parameter key="negativos" value="C:\Users\yop\Desktop\archivos de tesis\comentarios negativos"/>
            </list>
            <parameter key="encoding" value="UTF-8"/>
            <parameter key="vector_creation" value="Term Frequency"/>
            <process expanded="true" height="352" width="586">
              <operator activated="true" class="text:transform_cases" compatibility="5.2.001" expanded="true" height="60" name="Transform Cases" width="90" x="45" y="75"/>
              <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="179" y="75"/>
              <operator activated="true" class="text:filter_by_length" compatibility="5.2.001" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="313" y="75"/>
              <operator activated="true" class="text:stem_snowball" compatibility="5.2.001" expanded="true" height="60" name="Stem (Snowball)" width="90" x="447" y="75">
                <parameter key="language" value="Spanish"/>
              </operator>
              <connect from_port="document" to_op="Transform Cases" to_port="document"/>
              <connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
              <connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
              <connect from_op="Stem (Snowball)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="x_validation" compatibility="5.2.000" expanded="true" height="130" name="Validation" width="90" x="179" y="30">
            <parameter key="number_of_validations" value="2"/>
            <process expanded="true" height="352" width="268">
              <operator activated="true" class="naive_bayes" compatibility="5.2.000" expanded="true" height="76" name="Naive Bayes" width="90" x="99" y="59">
                <parameter key="laplace_correction" value="false"/>
              </operator>
              <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="352" width="279">
              <operator activated="true" class="apply_model" compatibility="5.2.000" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.2.000" expanded="true" height="76" name="Performance" width="90" x="112" y="165"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
              <portSpacing port="sink_averagable 3" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="write_model" compatibility="5.2.000" expanded="true" height="60" name="Write Model" width="90" x="380" y="165">
            <parameter key="model_file" value="C:\Users\yop\Desktop\archivos de tesis\model\model"/>
          </operator>
          <connect from_op="Process Documents from Files" from_port="example set" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_op="Write Model" to_port="input"/>
          <connect from_op="Validation" from_port="training" to_port="result 1"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
          <connect from_op="Validation" from_port="averagable 2" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
  • Options
    MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hm, that looks fine. Is it possible that you have a type in one of the input paths, such that the example set contains only positives or only negatives? You can check that e.g. by setting a breakpoint after process documents and checking the values of the label attribute.
    Problems could also occur if your data contains only *very* few examples for one or both of the classes.

    Best, Marius
  • Options
    josejose Member Posts: 16 Contributor II


    Thanks Marius.

      the amount of data was small. I increased the amount of data and it worked perfectly.
    Thanks for your help.
Sign In or Register to comment.