split validation linear regression connection error

1630717bremmers · June 2018

So I have this project I've been working on. I split my dataset with Split Validation to run linear regression on it, and for the connections I used Performance (Regression) after Apply Model. But the outcome of the process is not quite right. I believe it's because the mod port of Apply Model is not connected to the end. Is there a way to fix that?

Thanks in advance.

lionelderkrikor · June 2018

Hi @1630717bremmers,

"...but the outcome of the process is not quite right..."

Could you be more precise and explain what you mean?

Thank you,

Regards,

Lionel

1630717bremmers · June 2018

Hi Lionel,
The only output I get is the root mean squared error from the Performance operator. I'm not getting the predictions from the linear regression.

lionelderkrikor · June 2018

Hi @1630717bremmers,

A few things:

1. To visualize your model, connect the mod output port of the Split Validation operator to a res port.

2. To calculate and display other performance metrics (not only the RMSE), select them in the parameters of the Performance (Regression) operator.

3. There are two ways to calculate and display the predictions:

a. Use the Remember / Recall pair to recover the labelled dataset from inside the Split Validation operator, as in this process (adapt it to your own data; it uses the Deals sample and a Decision Tree learner, so swap in your Linear Regression):

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Deals" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Deals"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
        <parameter key="attribute_name" value="Future Customer"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="nominal_to_numerical" compatibility="8.2.000" expanded="true" height="103" name="Nominal to Numerical" width="90" x="380" y="34">
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="coding_type" value="unique integers"/>
        <list key="comparison_groups"/>
      </operator>
      <operator activated="true" class="split_validation" compatibility="8.2.000" expanded="true" height="145" name="Validation" width="90" x="514" y="34">
        <process expanded="true">
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="112" y="34">
            <parameter key="criterion" value="least_square"/>
          </operator>
          <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_regression" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
            <parameter key="absolute_error" value="true"/>
          </operator>
          <operator activated="true" class="remember" compatibility="8.2.000" expanded="true" height="68" name="Remember" width="90" x="313" y="85">
            <parameter key="name" value="Labelled"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <connect from_op="Performance" from_port="example set" to_op="Remember" to_port="store"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
          <portSpacing port="sink_averagable 3" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="recall" compatibility="8.2.000" expanded="true" height="68" name="Recall" width="90" x="648" y="136">
        <parameter key="name" value="Labelled"/>
      </operator>
      <connect from_op="Retrieve Deals" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
      <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="model" to_port="result 3"/>
      <connect from_op="Validation" from_port="training" to_port="result 1"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
      <connect from_op="Recall" from_port="result" to_port="result 4"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>

b. Use a Cross Validation operator instead of the Split Validation operator and connect the tes (test result set) output port of Cross Validation to a res port.

Indeed, the performance calculated by cross-validation is considered more representative of your model's real performance on future, unseen data, because every example is used for testing exactly once instead of relying on a single train/test split.
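For option b, here is a minimal sketch of the process above rewritten with Cross Validation instead of Split Validation and Linear Regression instead of the Decision Tree. It assumes RapidMiner 8.2 operator and port names (`concurrency:cross_validation`, `linear_regression`, the inner `test set results` sink and the outer `test result set` port), and it still runs on the Deals sample, so point Retrieve and Set Role at your own data and label. The Performance (Regression) operator also has a couple of extra criteria ticked, which illustrates point 2.

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <!-- Same preprocessing as the Split Validation example: Deals sample, numeric label -->
      <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Deals" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Deals"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
        <parameter key="attribute_name" value="Future Customer"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="nominal_to_numerical" compatibility="8.2.000" expanded="true" height="103" name="Nominal to Numerical" width="90" x="313" y="34">
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="coding_type" value="unique integers"/>
        <list key="comparison_groups"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Cross Validation" width="90" x="447" y="34">
        <parameter key="number_of_folds" value="10"/>
        <process expanded="true">
          <!-- Training side: fit a linear regression on each training fold -->
          <operator activated="true" class="linear_regression" compatibility="8.2.000" expanded="true" height="103" name="Linear Regression" width="90" x="112" y="34"/>
          <connect from_port="training set" to_op="Linear Regression" to_port="training set"/>
          <connect from_op="Linear Regression" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <!-- Testing side: apply the model to the held-out fold and score it -->
          <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_regression" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
            <parameter key="root_mean_squared_error" value="true"/>
            <parameter key="absolute_error" value="true"/>
            <parameter key="squared_correlation" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <!-- The labelled fold goes to "test set results", which feeds the outer tes port -->
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Deals" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
      <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
      <!-- mod, tes and per outputs of Cross Validation wired to res ports -->
      <connect from_op="Cross Validation" from_port="model" to_port="result 1"/>
      <connect from_op="Cross Validation" from_port="test result set" to_port="result 2"/>
      <connect from_op="Cross Validation" from_port="performance 1" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
```

With this wiring, result 2 should hold the whole example set with a prediction(Future Customer) column, since each example is predicted once while it sits in a test fold; result 1 is the model trained on all the data.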

I hope it helps,

Regards,

Lionel

Telcontar120 · June 2018

You can also just use Store to save the model, then later Retrieve it and use Apply Model with any other dataset (including the original development dataset) to generate predictions from that model.
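A minimal sketch of that route, assuming the RapidMiner 8.2 `store`, `retrieve` and `apply_model` operators. In the training process you would put a Store operator on the mod output of Split Validation, with its repository_entry set to something like `//Local Repository/models/linear_regression_model` (a hypothetical path). A separate scoring process could then look like this; both repository paths are placeholders, and the new data should go through the same preprocessing as the training data (Set Role, Nominal to Numerical) so the model finds the attributes it was trained on.

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <!-- Hypothetical path: the model saved earlier with Store -->
      <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Model" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Local Repository/models/linear_regression_model"/>
      </operator>
      <!-- Hypothetical path: data to score, preprocessed like the training data -->
      <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve New Data" width="90" x="45" y="136">
        <parameter key="repository_entry" value="//Local Repository/data/new_data"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="246" y="34">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve Model" from_port="output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Retrieve New Data" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <!-- lab output: the input data plus a prediction column -->
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
```

Keeping training and scoring in separate processes means the model is fitted once and can be reused on any later dataset without rerunning the validation.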

Altair RapidMiner Community