Regression unable to use polynominal label (or any label)

green_duck Member Posts: 4 Newbie
Hello all,

I'm new here and new to RM (which will be made obvious shortly). I'm trying to do a simple regression analysis on an attribute (the label) with the values (-1, 0, 1). I've followed the steps provided to me, but every time I add a regression operator, I get an error saying the operator cannot handle polynominal or numerical labels. I'm stumped.

Any help would be greatly appreciated! Thanks!

Best Answers

  • lionelderkrikor Posts: 1,068 Unicorn
    Solution Accepted
    @green_duck,

    The working process is in the attached file.
    As I said previously, you have a classification problem, so you need a classifier model (here I used a Naive Bayes model).
    The Linear Regression operator you used is dedicated to regression tasks, which is why it raised an error in your case.
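
    The attached file is not reproduced in this thread, so here is a minimal sketch of what the changed part could look like. It is not the attached process itself. It assumes the operator and port names from the process XML posted in this thread, swaps Linear Regression for a Naive Bayes operator, trains on the 80% partition ("out 1") and scores the 20% partition ("out 2"). The layout coordinates and the choice of partitions are assumptions.

    ```xml
    <!-- Sketch only: replaces the Linear Regression, Apply Model and Performance
         operators and the top-level connections of the posted process. -->
    <operator activated="true" class="naive_bayes" compatibility="9.6.000" expanded="true" height="82" name="Naive Bayes" width="90" x="313" y="85">
      <!-- Laplace correction avoids zero probabilities for rarely seen nominal values -->
      <parameter key="laplace_correction" value="true"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="9.6.000" expanded="true" height="82" name="Apply Model" width="90" x="581" y="238">
      <list key="application_parameters"/>
      <parameter key="create_view" value="false"/>
    </operator>
    <operator activated="true" class="performance" compatibility="9.6.000" expanded="true" height="82" name="Performance" width="90" x="782" y="136">
      <parameter key="use_example_weights" value="true"/>
    </operator>
    <connect from_op="Retrieve Tweets_sequence" from_port="output" to_op="Subprocess" to_port="in 1"/>
    <!-- Assumption: partition 1 (80%) is used for training -->
    <connect from_op="Subprocess" from_port="out 1" to_op="Naive Bayes" to_port="training set"/>
    <!-- The trained model, not the example set, goes into Apply Model -->
    <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model" to_port="model"/>
    <!-- Assumption: partition 2 (20%) is held out and scored -->
    <connect from_op="Subprocess" from_port="out 2" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="result 1"/>
    ```

    The Subprocess can stay as it is, because it already converts "sentiment" to polynominal and sets it as the label, which is what a classifier expects.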

    To go further and find the best model for your use case, I advise you to use the Auto-Model tool: click on Auto-Model, submit your data, choose Predict, select your label attribute (in your case "sentiment"), and then follow the instructions.

    Good luck!

    Hope this helps,

    Regards,

    Lionel


Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,068 Unicorn
    Hi @green_duck,

    If your label attribute has (-1, 0, 1) as values, it is a classification problem and not a regression problem.
    A regression problem is characterized by a continuous label attribute.
    Can you share your process and your data so that we can fix your error?

    Regards,

    Lionel
  • green_duck Member Posts: 4 Newbie
    Hi Lionel,

    Thanks for getting back to me. I had a feeling this might be the case, as I was also trying to use cross-validation and couldn't get that operator to work either. I've attached the data (I should have done this earlier).

    XML below:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="retrieve" compatibility="9.6.000" expanded="true" height="68" name="Retrieve Tweets_sequence" width="90" x="45" y="34">
            <parameter key="repository_entry" value="data/Tweets_sequence"/>
          </operator>
          <operator activated="true" class="subprocess" compatibility="9.6.000" expanded="true" height="103" name="Subprocess" width="90" x="179" y="85">
            <process expanded="true">
              <operator activated="true" class="select_attributes" compatibility="9.6.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value="sentiment"/>
                <parameter key="attributes" value="1_word|2_word|3_word|4_word|5_word|6_word|7_word|8_word|9_word|10_word|11_word|12_word|13_word|14_word|15_word|16_word|17_word|18_word|19_word|20_word|21_word|22_word|23_word|24_word|25_word|26_word|27_word|28_word|29_word|30_word"/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="true"/>
              </operator>
              <operator activated="true" class="numerical_to_polynominal" compatibility="9.6.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="313" y="34">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="sentiment"/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="numeric"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="real"/>
                <parameter key="block_type" value="value_series"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_series_end"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="9.6.000" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
                <parameter key="attribute_name" value="sentiment"/>
                <parameter key="target_role" value="label"/>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="split_data" compatibility="9.6.000" expanded="true" height="103" name="Split Data" width="90" x="849" y="34">
                <enumeration key="partitions">
                  <parameter key="ratio" value="0.8"/>
                  <parameter key="ratio" value="0.2"/>
                </enumeration>
                <parameter key="sampling_type" value="shuffled sampling"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
              </operator>
              <connect from_port="in 1" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
              <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="Split Data" to_port="example set"/>
              <connect from_op="Split Data" from_port="partition 1" to_port="out 1"/>
              <connect from_op="Split Data" from_port="partition 2" to_port="out 2"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="linear_regression" compatibility="9.6.000" expanded="true" height="103" name="Linear Regression" width="90" x="313" y="85">
            <parameter key="feature_selection" value="M5 prime"/>
            <parameter key="alpha" value="0.05"/>
            <parameter key="max_iterations" value="10"/>
            <parameter key="forward_alpha" value="0.05"/>
            <parameter key="backward_alpha" value="0.05"/>
            <parameter key="eliminate_colinear_features" value="true"/>
            <parameter key="min_tolerance" value="0.05"/>
            <parameter key="use_bias" value="true"/>
            <parameter key="ridge" value="1.0E-8"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="9.6.000" expanded="true" height="82" name="Apply Model" width="90" x="581" y="238">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="performance" compatibility="9.6.000" expanded="true" height="82" name="Performance" width="90" x="782" y="136">
            <parameter key="use_example_weights" value="true"/>
          </operator>
          <connect from_op="Retrieve Tweets_sequence" from_port="output" to_op="Subprocess" to_port="in 1"/>
          <connect from_op="Subprocess" from_port="out 2" to_op="Linear Regression" to_port="training set"/>
          <connect from_op="Linear Regression" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

  • green_duck Member Posts: 4 Newbie
    @lionelderkrikor

    Thank you so much! This was very helpful. I just have one last question: are there any deep learning models (NNs) that you would suggest for this same dataset?