Problem with Naive Bayes

hugo_gha Member Posts: 5 Contributor I
edited December 2018 in Help

Hi, 

 

I am trying to train a Naive Bayes model for sentiment analysis, but when I apply the model to my scoring set I get the following error:

 

"The learned model "Simple Distribution" does not support the parameter "create_view". Some models support parameters for the predictions of values. This model does not support the given parameter". 

 

I would love it if someone could help me with this. Thanks!

Answers

  • FBT Member Posts: 106 Unicorn

    Does your test set have the same characteristics (number of attributes and attribute types) as your training set? 

  • hugo_gha Member Posts: 5 Contributor I

    Thanks for the response. It does. My training set has 2 columns, one for the text I'm analyzing and one for the sentiment. The test set is a different data set with the same columns, one for the text and one for the sentiment. The sentiments in my test set are fake, though. What I really want is to predict the sentiment of my test set with the trained model and keep the predicted labels.

     

  • FBT Member Posts: 106 Unicorn

    It is a bit difficult to figure out why you get the error without seeing your process. Does the error already occur while training the model, or only when you apply the trained model to the test data? If it is the latter, it would indicate that you are feeding your model data that is somehow different from the training data.
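
A mismatch of this kind can be pictured with a minimal Python sketch. This is an illustration outside RapidMiner, not part of any process in this thread; the function name `attribute_diff` and the attribute names are hypothetical. It compares the attributes the model was trained on with the attributes present in the scoring data.

```python
def attribute_diff(train_attrs, score_attrs):
    """Return (missing, unexpected) attributes of the scoring data.

    missing:    attributes the model was trained on but the scoring data lacks
    unexpected: attributes in the scoring data the model never saw
    """
    train, score = set(train_attrs), set(score_attrs)
    return sorted(train - score), sorted(score - train)


# Hypothetical attribute names, as they might look after each data set
# has been tokenized on its own.
train_attrs = ["good", "bad", "movie", "love"]
score_attrs = ["good", "movie", "awful", "plot"]

missing, unexpected = attribute_diff(train_attrs, score_attrs)
print("missing from scoring set:", missing)
print("not seen in training:", unexpected)
```

If either list is non-empty, the two data sets do not have the same structure from the model's point of view, even when the raw input columns look identical.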

  • hugo_gha Member Posts: 5 Contributor I

    Indeed, it happens in the Apply Model operator. The cross-validation seems to run with no issues.

     

    Here are two captures of the first few rows of the data sets. I can't tell if there is any difference between them.

     

    Thanks for your help, I appreciate it.

     

     

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist

    It seems you skipped some necessary "text processing" on the input data?

  • hugo_gha Member Posts: 5 Contributor I

    What do you mean by necessary? I do have a Process Documents operator in my process that tokenizes, stems, eliminates stopwords, and filters hashtags for both data sets.

     

    Thanks for replying!!

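  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    It sounds like you didn't use the trained word list as a preprocessing input to your scoring set.

     

    Here is an example process. Just swap out my Cross Validation for your Cross Validation with Naive Bayes, and set your Sentiment attribute as the label. Note that the word list output of the first Process Documents from Data operator is connected to the word list input of the second one.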

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="112" y="34">
    <parameter key="connection" value="NewConnection"/>
    <parameter key="query" value="machinelearning"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Text|Retweet-Count"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="7.5.001" expanded="true" height="82" name="Nominal to Text" width="90" x="380" y="34"/>
    <operator activated="true" class="set_role" compatibility="7.5.001" expanded="true" height="82" name="Set Role" width="90" x="514" y="34">
    <parameter key="attribute_name" value="Retweet-Count"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="648" y="34">
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="34"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="7.5.001" expanded="true" height="145" name="Validation" width="90" x="782" y="34">
    <parameter key="sampling_type" value="shuffled sampling"/>
    <process expanded="true">
    <operator activated="true" class="h2o:generalized_linear_model" compatibility="7.5.000" expanded="true" height="103" name="Generalized Linear Model" width="90" x="257" y="34">
    <list key="beta_constraints"/>
    <list key="expert_parameters"/>
    </operator>
    <connect from_port="training set" to_op="Generalized Linear Model" to_port="training set"/>
    <connect from_op="Generalized Linear Model" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    <description align="left" color="green" colored="true" height="113" resized="true" width="284" x="195" y="156">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Make sure that you only put numerical attributes into a linear regression!</description>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance" compatibility="7.5.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
    <connect from_op="Performance" from_port="example set" to_port="test set results"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    <description align="left" color="blue" colored="true" height="107" resized="true" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
    </process>
    <description align="center" color="transparent" colored="false" width="126">A cross validation including a linear regression.</description>
    </operator>
    <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter (2)" width="90" x="112" y="187">
    <parameter key="connection" value="NewConnection"/>
    <parameter key="query" value="machinelearning"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="246" y="187">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Text|Retweet-Count"/>
    </operator>
    <operator activated="true" class="nominal_to_text" compatibility="7.5.001" expanded="true" height="82" name="Nominal to Text (2)" width="90" x="380" y="187"/>
    <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="648" y="340">
    <list key="specify_weights"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="179" y="34"/>
    <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
    <connect from_op="Tokenize (2)" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="983" y="289">
    <list key="application_parameters"/>
    </operator>
    <connect from_op="Search Twitter" from_port="output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
    <connect from_op="Nominal to Text" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="example set" to_op="Validation" to_port="example set"/>
    <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
    <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_op="Validation" from_port="performance 1" to_port="result 1"/>
    <connect from_op="Search Twitter (2)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Text (2)" to_port="example set input"/>
    <connect from_op="Nominal to Text (2)" from_port="example set output" to_op="Process Documents from Data (2)" to_port="example set"/>
    <connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
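
The role of the shared word list can also be shown with a minimal Python sketch. This is an analogy outside RapidMiner, not how the operators are implemented; the helper names (`tokenize`, `build_word_list`, `vectorize`) and the example documents are hypothetical. The vocabulary is built from the training documents only and then reused to vectorize the scoring documents, so both data sets end up with the same attributes in the same order.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real tokenizer."""
    return text.lower().split()


def build_word_list(documents):
    """Collect the sorted vocabulary of the training documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def vectorize(documents, word_list):
    """Count each word-list entry per document; words not in the list are dropped."""
    rows = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        rows.append([counts[word] for word in word_list])
    return rows


train_docs = ["great movie", "terrible movie"]
score_docs = ["great plot", "terrible terrible acting"]

# The word list comes from the training data only ...
word_list = build_word_list(train_docs)
train_matrix = vectorize(train_docs, word_list)
# ... and is reused for the scoring data instead of building a new one.
score_matrix = vectorize(score_docs, word_list)

print(word_list)
print(score_matrix)
```

Words that appear only in the scoring documents ("plot", "acting") are ignored, which is what keeps the scoring data compatible with a model trained on the training vocabulary.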
  • hugo_gha Member Posts: 5 Contributor I

    Hi, Thomas,

     

    Thanks for the response. I am having a hard time following the code you attached to your post. Is there a chance you could add an image or something like that? Let me see if I follow your suggestion: you're saying that I should have a Naive Bayes operator before the Cross Validation operator, and then do the cross-validation with my test set?

     

    I appreciate the help. As you can tell I am no expert in this subject.

     

    Thanks again!

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Just copy and paste my XML code into the XML view. To activate that view, go to View > Show Panel in Studio and select XML.

     

    Click the green check mark and the whole process will populate.

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    This article explains in more detail how to take the XML code posted above and turn it into the graphical process you are used to: http://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/How-can-I-share-processes-without-RapidMiner-Server/ta-p/37047

     

    Hope this helps,

    Ingo
