"Problem building a regression model in RapidMiner"

huaiyanggongzi Member Posts: 39 Contributor II
edited June 2019 in Help
I tried linear regression on the following data set:
x y1 z1 label
0 85.2475654 245.1558442 99.69204152
-1 36.00008409 -50.37614679 95.61016949
-2 257.1300917 517.2790698 189
-2 194.4923912 10.50413223 593.6107784
1 602.6111798 410.6153846 345.1538462
1 36.2366869 608.7922078 1.076124567
-5 13.09949256 16.59633028 -4.389830508
-5 660.3381923 468.0886076 353.7486034
3 52.75862603 724.5955056 -20.92633223
-5 37.49788729 64.61607143 -2.71990172
The column "label" is the response variable, and the other three columns are the predictor variables. I built the RapidMiner workflow as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="224" width="346">
      <operator activated="true" class="read_csv" compatibility="5.2.008" expanded="true" height="60" name="Read CSV" width="90" x="59" y="95">
        <parameter key="csv_file" value="C:\Users\Desktop\training.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="x.true.integer.attribute"/>
          <parameter key="1" value="y1.true.real.attribute"/>
          <parameter key="2" value="z1.true.real.attribute"/>
          <parameter key="3" value="label.true.real.label"/>
        </list>
      </operator>
      <operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" height="94" name="Linear Regression" width="90" x="246" y="75"/>
      <connect from_op="Read CSV" from_port="output" to_op="Linear Regression" to_port="training set"/>
      <connect from_op="Linear Regression" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
The resulting model is not correct. R, on the other hand, builds the linear regression model for this data set without any problem. I am not sure why RapidMiner has a problem with this data set. Thanks.


  • earmijo Member Posts: 263   Unicorn
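    You should get exactly the same result as R if you set the Linear Regression operator's feature selection to "None". By default, RapidMiner applies M5Prime feature selection, so it may drop some predictors before fitting, whereas R fits all of them. From what I understand, M5Prime is roughly equivalent to choosing the attribute subset with the best Akaike information criterion (AIC).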
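
To make the suggestion above concrete, here is a minimal sketch of how the Linear Regression operator from the posted process might look with feature selection switched off. The parameter key `feature_selection` and the value `none` are assumptions based on how RapidMiner usually serializes this setting; the simplest way to confirm them is to change the option in the operator's parameter panel and inspect the process XML that RapidMiner generates.

```xml
<!-- Sketch only: key "feature_selection" and value "none" are assumed, not taken from the original post. -->
<!-- All other attributes are copied unchanged from the posted process. -->
<operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" height="94" name="Linear Regression" width="90" x="246" y="75">
  <!-- Disable the default M5Prime selection so all three predictors stay in the model -->
  <parameter key="feature_selection" value="none"/>
</operator>
```

This would replace the self-closing Linear Regression operator element in the process; the Read CSV operator and the connections stay as they are.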
