
How to use Polynomial Regression in RapidMiner correctly

rookierookie Member Posts: 3 Newbie
edited March 27 in Help

Hello, everyone. This is my first forum post; I have a question about polynomial regression in RapidMiner.

The original data is: x: 4194.06, 3466.45, 2070.08, 874.98, with corresponding y: 91540.07, 109460.36, 120338.64, 102182.19.

In the first process, shown below, I obtain the first result expression with the Polynomial Regression operator.

<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">

  <context>

    <input/>

    <output/>

    <macros/>

  </context>

  <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">

    <parameter key="logverbosity" value="init"/>

    <parameter key="random_seed" value="2001"/>

    <parameter key="send_mail" value="never"/>

    <parameter key="notification_email" value=""/>

    <parameter key="process_duration_for_mail" value="30"/>

    <parameter key="encoding" value="SYSTEM"/>

    <process expanded="true">

      <operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="85">

        <parameter key="excel_file" value="C:\Users\1\Desktop\question data.xlsx"/>

        <parameter key="sheet_selection" value="sheet number"/>

        <parameter key="sheet_number" value="1"/>

        <parameter key="imported_cell_range" value="A1"/>

        <parameter key="encoding" value="SYSTEM"/>

        <parameter key="first_row_as_names" value="true"/>

        <list key="annotations"/>

        <parameter key="date_format" value=""/>

        <parameter key="time_zone" value="SYSTEM"/>

        <parameter key="locale" value="English (United States)"/>

        <parameter key="read_all_values_as_polynominal" value="false"/>

        <list key="data_set_meta_data_information">

          <parameter key="0" value="x.true.real.attribute"/>

          <parameter key="1" value="y.true.real.attribute"/>

        </list>

        <parameter key="read_not_matching_values_as_missings" value="false"/>

        <parameter key="datamanagement" value="double_array"/>

        <parameter key="data_management" value="auto"/>

      </operator>

      <operator activated="true" class="set_role" compatibility="9.6.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="85">

        <parameter key="attribute_name" value="y"/>

        <parameter key="target_role" value="label"/>

        <list key="set_additional_roles">

          <parameter key="x" value="regular"/>

        </list>

      </operator>

      <operator activated="true" class="polynomial_regression" compatibility="9.6.000" expanded="true" height="82" name="Polynomial Regression" width="90" x="313" y="85">

        <parameter key="max_iterations" value="5000"/>

        <parameter key="replication_factor" value="2"/>

        <parameter key="max_degree" value="2"/>

        <parameter key="min_coefficient" value="-100.0"/>

        <parameter key="max_coefficient" value="100.0"/>

        <parameter key="use_local_random_seed" value="false"/>

        <parameter key="local_random_seed" value="1992"/>

      </operator>

      <connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>

      <connect from_op="Set Role" from_port="example set output" to_op="Polynomial Regression" to_port="training set"/>

      <connect from_op="Polynomial Regression" from_port="model" to_port="result 1"/>

      <portSpacing port="source_input 1" spacing="0"/>

      <portSpacing port="sink_result 1" spacing="0"/>

      <portSpacing port="sink_result 2" spacing="0"/>

    </process>

  </operator>

</process>


The second process starts from the same data, generates a new attribute z = x^2, and uses the Linear Regression operator to obtain the second result expression.

<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000">

  <context>

    <input/>

    <output/>

    <macros/>

  </context>

  <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">

    <parameter key="logverbosity" value="init"/>

    <parameter key="random_seed" value="2001"/>

    <parameter key="send_mail" value="never"/>

    <parameter key="notification_email" value=""/>

    <parameter key="process_duration_for_mail" value="30"/>

    <parameter key="encoding" value="SYSTEM"/>

    <process expanded="true">

      <operator activated="true" class="read_excel" compatibility="9.6.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="85">

        <parameter key="excel_file" value="C:\Users\1\Desktop\question data.xlsx"/>

        <parameter key="sheet_selection" value="sheet number"/>

        <parameter key="sheet_number" value="1"/>

        <parameter key="imported_cell_range" value="A1"/>

        <parameter key="encoding" value="SYSTEM"/>

        <parameter key="first_row_as_names" value="true"/>

        <list key="annotations"/>

        <parameter key="date_format" value=""/>

        <parameter key="time_zone" value="SYSTEM"/>

        <parameter key="locale" value="English (United States)"/>

        <parameter key="read_all_values_as_polynominal" value="false"/>

        <list key="data_set_meta_data_information">

          <parameter key="0" value="x.true.real.attribute"/>

          <parameter key="1" value="y.true.real.attribute"/>

        </list>

        <parameter key="read_not_matching_values_as_missings" value="false"/>

        <parameter key="datamanagement" value="double_array"/>

        <parameter key="data_management" value="auto"/>

      </operator>

      <operator activated="true" class="generate_attributes" compatibility="9.6.000" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="85">

        <list key="function_descriptions">

          <parameter key="z" value="x*x"/>

        </list>

        <parameter key="keep_all" value="true"/>

      </operator>

      <operator activated="false" class="rename" compatibility="9.6.000" expanded="true" height="82" name="Rename" width="90" x="246" y="238">

        <parameter key="old_name" value="x"/>

        <parameter key="new_name" value="x^2"/>

        <list key="rename_additional_attributes"/>

      </operator>

      <operator activated="true" class="set_role" compatibility="9.6.000" expanded="true" height="82" name="Set Role" width="90" x="313" y="85">

        <parameter key="attribute_name" value="y"/>

        <parameter key="target_role" value="label"/>

        <list key="set_additional_roles">

          <parameter key="x" value="regular"/>

        </list>

      </operator>

      <operator activated="true" class="linear_regression" compatibility="9.6.000" expanded="true" height="103" name="Linear Regression" width="90" x="514" y="85">

        <parameter key="feature_selection" value="none"/>

        <parameter key="alpha" value="0.05"/>

        <parameter key="max_iterations" value="10"/>

        <parameter key="forward_alpha" value="0.05"/>

        <parameter key="backward_alpha" value="0.05"/>

        <parameter key="eliminate_colinear_features" value="false"/>

        <parameter key="min_tolerance" value="0.05"/>

        <parameter key="use_bias" value="true"/>

        <parameter key="ridge" value="1.0E-8"/>

      </operator>

      <operator activated="false" class="polynomial_regression" compatibility="9.6.000" expanded="true" height="82" name="Polynomial Regression" width="90" x="581" y="238">

        <parameter key="max_iterations" value="5000"/>

        <parameter key="replication_factor" value="2"/>

        <parameter key="max_degree" value="2"/>

        <parameter key="min_coefficient" value="-100.0"/>

        <parameter key="max_coefficient" value="100.0"/>

        <parameter key="use_local_random_seed" value="false"/>

        <parameter key="local_random_seed" value="1992"/>

      </operator>

      <connect from_op="Read Excel" from_port="output" to_op="Generate Attributes" to_port="example set input"/>

      <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>

      <connect from_op="Set Role" from_port="example set output" to_op="Linear Regression" to_port="training set"/>

      <connect from_op="Linear Regression" from_port="model" to_port="result 1"/>

      <portSpacing port="source_input 1" spacing="0"/>

      <portSpacing port="sink_result 1" spacing="0"/>

      <portSpacing port="sink_result 2" spacing="0"/>

    </process>

  </operator>

</process>
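
As a cross-check outside RapidMiner, the substitution used in the second process can be sketched in plain Python. This is only a minimal sketch, not what either operator does internally: it assumes ordinary least squares on the columns 1, x and z = x^2, uses exact fractions to sidestep rounding problems with the large x^2 values, and ignores the small ridge term (1.0E-8) set on the Linear Regression operator. The helper names `solve` and `fit_quadratic` are hypothetical.

```python
from fractions import Fraction

# Data from the post, parsed exactly so that no rounding error creeps in.
xs = [Fraction(s) for s in ("4194.06", "3466.45", "2070.08", "874.98")]
ys = [Fraction(s) for s in ("91540.07", "109460.36", "120338.64", "102182.19")]


def solve(a, b):
    """Solve the square system a * w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Pick a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*z with z = x*x (normal equations)."""
    rows = [[Fraction(1), x, x * x] for x in xs]  # columns: bias, x, z
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(ata, aty)


a, b, c = fit_quadratic(xs, ys)
print(f"y = {float(a):.2f} + {float(b):.4f}*x + {float(c):.6f}*x^2")
```

Because z is just another column, the model is linear in its coefficients even though it is quadratic in x, which is why a linear regression can fit it.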

Why are the results of the two processes not the same? The original data shows a quadratic (nonlinear) relationship, so why can't Polynomial Regression produce the quadratic expression?

Thank you very much!



Answers

  • rookierookie Member Posts: 3 Newbie
    hi @yyhuang
    Sorry in advance, I don't know how to use the functions of this forum; that's why it took so long to reply.
    First of all, thank you for your answer <3. If I understand your description correctly, the poor result comes from having too little data and from the data not being standardized? But these four samples are real data, and I need these four points to construct a quadratic polynomial in one variable. Because a nonlinear equation can be converted into a linear one, I used z in place of x^2 and obtained the linear regression equation. But why does Polynomial Regression not produce the same result? How do you explain that, please? Is there any limit to this operator?
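
    On the question of limits: the first process sets min_coefficient = -100.0 and max_coefficient = 100.0 on the Polynomial Regression operator. The sketch below assumes these bounds apply to every coefficient and to the offset (as the parameter names suggest; this is not verified against the operator's implementation). It checks whether an exact least-squares quadratic for the four points would fit inside that range, both in the original units and after dividing x and y by 1000. The factor 1000 and the helper names `det3`, `fit_quadratic` and `within_bounds` are illustrative choices, not taken from the process.

```python
from fractions import Fraction

# Bounds copied from min_coefficient / max_coefficient in the first process.
LOWER, UPPER = -100, 100


def det3(m):
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(xs, ys):
    """Exact least-squares fit of y = a + b*x + c*x^2 via Cramer's rule."""
    rows = [[1, x, x * x] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    d = det3(ata)
    coeffs = []
    for k in range(3):
        # Replace column k of A^T A with A^T y.
        mk = [[aty[i] if j == k else ata[i][j] for j in range(3)] for i in range(3)]
        coeffs.append(det3(mk) / d)
    return coeffs  # [a, b, c]


def within_bounds(coeffs):
    """True if every coefficient lies inside the assumed [LOWER, UPPER] range."""
    return all(LOWER <= c <= UPPER for c in coeffs)


xs = [Fraction(s) for s in ("4194.06", "3466.45", "2070.08", "874.98")]
ys = [Fraction(s) for s in ("91540.07", "109460.36", "120338.64", "102182.19")]

raw = fit_quadratic(xs, ys)
scaled = fit_quadratic([x / 1000 for x in xs], [y / 1000 for y in ys])

for label, coeffs in (("original units", raw), ("x and y divided by 1000", scaled)):
    print(label, [float(c) for c in coeffs], "within bounds:", within_bounds(coeffs))
```

    For this data the offset of the exact fit is far larger than 100 in the original units, while all three coefficients of the rescaled fit fall inside the range. If the assumption about the bounds holds, the quadratic found by the second process is simply out of reach for the first one unless the bounds are widened or the data is rescaled.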