RapidMiner

Learner I hugo_gha

Problem with naive bayes

Hi,

I am trying to use a Naive Bayes model to train my sentiment analysis, but when I try to apply the model to my scoring set I get the following error:

 

"The learned model "Simple Distribution" does not support the parameter "create_view". Some models support parameters for the predictions of values. This model does not support the given parameter". 

 

I would love it if someone can help me with this. Thanks!

10 REPLIES
Maven

Re: Problem with naive bayes

Does your test set have the same characteristics (number of attributes and attribute types) as your training set? 

Learner I hugo_gha

Re: Problem with naive bayes

Thanks for the response. It does. My training set has 2 columns, one for the text I'm analyzing and one for the sentiment. The test set is a different data set that has the same columns, one for the text and one for the sentiment. The sentiments for my test set are fake, though. What I really want to do is predict the sentiment of my test set based on the trained model and keep the predicted labels.

 

Maven

Re: Problem with naive bayes

It is a bit difficult to figure out why you get the error without seeing your process. Is the error already occurring during training of the model, or only when you try to apply the trained model to the test data? If it is the latter, it would indicate that you are feeding your model data that is somehow different from the training data.
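
To make "somehow different from the training data" concrete, here is a minimal Python sketch (not RapidMiner code) that compares the attribute names of two data sets. The function name `compare_attributes` and the sample attribute names are hypothetical, chosen only for illustration.

```python
def compare_attributes(train_attrs, score_attrs):
    """Report attribute names present in one data set but not the other."""
    train, score = set(train_attrs), set(score_attrs)
    return {
        "missing_in_scoring": sorted(train - score),
        "unexpected_in_scoring": sorted(score - train),
    }

# Hypothetical attribute names, e.g. the word columns produced by text processing.
train_attrs = ["battery", "great", "love"]
score_attrs = ["battery", "screen"]

diff = compare_attributes(train_attrs, score_attrs)
print(diff)
```

If either list in the result is non-empty, the model is being applied to data whose attributes do not line up with what it was trained on.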

Learner I hugo_gha

Re: Problem with naive bayes

Indeed, it happens in the Apply Model operator. The cross validation occurs with no issues, it seems. 

 

Here are two captures of the first few rows of the data sets. I can't tell if there is any difference between them.

 

Thanks for your help, I appreciate it.

 

 

RM Staff

Re: Problem with naive bayes

It seems you skipped some necessary "text processing" on the input data?

Learner I hugo_gha

Re: Problem with naive bayes

What do you mean by necessary? I do have a Process Documents operator that tokenizes, stems, removes stopwords, and filters hashtags for both data sets in my process.

 

Thanks for replying!!

RM Certified Expert

Re: Problem with naive bayes

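It sounds like you didn't use the word list from your training data as a preprocessing input to your scoring set.

Here is an example process. Just swap my Cross Validation for your own Cross Validation with Naive Bayes, and set your Sentiment attribute as the label. Note that the word list output of the first Process Documents from Data operator is connected to the word list input of the second one.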

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
        <parameter key="connection" value="NewConnection"/>
        <parameter key="query" value="machinelearning"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Text|Retweet-Count"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="7.5.001" expanded="true" height="82" name="Nominal to Text" width="90" x="380" y="34"/>
      <operator activated="true" class="set_role" compatibility="7.5.001" expanded="true" height="82" name="Set Role" width="90" x="514" y="34">
        <parameter key="attribute_name" value="Retweet-Count"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="648" y="34">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="7.5.001" expanded="true" height="145" name="Validation" width="90" x="782" y="34">
        <parameter key="sampling_type" value="shuffled sampling"/>
        <process expanded="true">
          <operator activated="true" class="h2o:generalized_linear_model" compatibility="7.5.000" expanded="true" height="103" name="Generalized Linear Model" width="90" x="257" y="34">
            <list key="beta_constraints"/>
            <list key="expert_parameters"/>
          </operator>
          <connect from_port="training set" to_op="Generalized Linear Model" to_port="training set"/>
          <connect from_op="Generalized Linear Model" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <description align="left" color="green" colored="true" height="113" resized="true" width="284" x="195" y="156">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Make sure that you only put numerical attributes into a linear regression!</description>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="7.5.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
          <description align="left" color="blue" colored="true" height="107" resized="true" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
        </process>
        <description align="center" color="transparent" colored="false" width="126">A cross validation including a linear regression.</description>
      </operator>
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter (2)" width="90" x="112" y="187">
        <parameter key="connection" value="NewConnection"/>
        <parameter key="query" value="machinelearning"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="246" y="187">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Text|Retweet-Count"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="7.5.001" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="380" y="187"/>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="648" y="340">
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="179" y="34"/>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="983" y="289">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Validation" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
      <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Validation" from_port="performance 1" to_port="result 1"/>
      <connect from_op="Search Twitter (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Text (2)" to_port="example set input"/>
      <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Process Documents from Data (2)" to_port="example set"/>
      <connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
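
As a rough illustration of why the word list connection matters, here is a small Python sketch (not RapidMiner code). It assumes a simple whitespace tokenizer and term-count vectors; the function names and sample sentences are hypothetical. The point is that the scoring set must be vectorized with the word list learned from the training set, so both sets end up with the same attributes in the same order.

```python
from collections import Counter

def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def build_word_list(documents):
    """Collect the sorted vocabulary of a document collection."""
    return sorted({token for doc in documents for token in tokenize(doc)})

def vectorize(documents, word_list):
    """Turn each document into term counts over a fixed word list."""
    rows = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        rows.append([counts[word] for word in word_list])
    return rows

# Hypothetical sample texts.
train_docs = ["great phone love it", "terrible battery hate it"]
score_docs = ["love the battery", "awful screen"]

# Word list learned from the training data only.
word_list = build_word_list(train_docs)
train_vectors = vectorize(train_docs, word_list)

# Scoring data reuses the training word list, so the columns match.
score_vectors = vectorize(score_docs, word_list)

# Building a separate word list from the scoring data gives different columns.
own_word_list = build_word_list(score_docs)

print(word_list)
print(score_vectors)
print(own_word_list)
```

Words that appear only in the scoring data (such as "screen" here) are simply dropped, because the model has never seen them and has no attribute for them.
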
Learner I hugo_gha

Re: Problem with naive bayes

Hi Thomas,

Thanks for the response. I am having a hard time following the code you attached to your post. Is there a chance you could add an image or something like that? Let me see if I can follow your suggestion: you're saying that I should have a Naive Bayes operator before the Cross Validation operator, and then do the cross validation with my test set?

I appreciate the help. As you can tell, I am no expert in this subject.

Thanks again!

RM Certified Expert

Re: Problem with naive bayes

Just copy and paste my XML code into the XML view. To activate that view, go to View > Show Panel in Studio and select XML.

Click the green check mark and the process will populate.
