split validation linear regression connection error

1630717bremmers Member Posts: 2 Contributor I
edited March 2020 in Help

I have been working on a project in which I split my dataset with Split Validation in order to use Linear Regression on it. For the connections, I used Performance (Regression) after Apply Model, but the outcome of the process is not quite right. I believe it's because the mod port of Apply Model is not connected to the end. Is there a way to fix that?


Thanks in advance.


  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @1630717bremmers,


    "...but the outcome of the process is not quite right..."


    Could you be more precise and explain what you mean?


    Thank you,





  • 1630717bremmers Member Posts: 2 Contributor I
    Hi Lionel,
    The only outcome I get is the root mean squared error from the Performance operator. I'm not getting the predictions from the linear regression.
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @1630717bremmers,


    A few things:

    1. To visualize your model, connect the mod output port of the Split Validation operator to the res port.

    2. To calculate and display other performance metrics (not only the RMSE), tick them in the parameters of the Performance (Regression) operator.

    3. To calculate and display the predictions, there are two ways:

      a. Use the Remember / Recall pair to recover the labelled dataset from inside the Split Validation operator, as in this process (adapt it to your own data):

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
      <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Deals" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Deals"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
            <parameter key="attribute_name" value="Future Customer"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="8.2.000" expanded="true" height="103" name="Nominal to Numerical" width="90" x="380" y="34">
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="coding_type" value="unique integers"/>
            <list key="comparison_groups"/>
          </operator>
          <operator activated="true" class="split_validation" compatibility="8.2.000" expanded="true" height="145" name="Validation" width="90" x="514" y="34">
            <!-- Training subprocess: learn the model on the training split -->
            <process expanded="true">
              <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.2.000" expanded="true" height="103" name="Decision Tree" width="90" x="112" y="34">
                <parameter key="criterion" value="least_square"/>
              </operator>
              <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <!-- Testing subprocess: score the test split, measure performance, remember the labelled data -->
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_regression" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
                <parameter key="absolute_error" value="true"/>
              </operator>
              <operator activated="true" class="remember" compatibility="8.2.000" expanded="true" height="68" name="Remember" width="90" x="313" y="85">
                <parameter key="name" value="Labelled"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <connect from_op="Performance" from_port="example set" to_op="Remember" to_port="store"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
              <portSpacing port="sink_averagable 3" spacing="0"/>
            </process>
          </operator>
          <!-- Recall the labelled (scored) test data outside the validation -->
          <operator activated="true" class="recall" compatibility="8.2.000" expanded="true" height="68" name="Recall" width="90" x="648" y="136">
            <parameter key="name" value="Labelled"/>
          </operator>
          <connect from_op="Retrieve Deals" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_port="result 3"/>
          <connect from_op="Validation" from_port="training" to_port="result 1"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
          <connect from_op="Recall" from_port="result" to_port="result 4"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <portSpacing port="sink_result 5" spacing="0"/>
        </process>
      </operator>
    </process>

      b. Use a Cross Validation operator instead of the Split Validation operator and connect the test result set output port (tes) of Cross Validation to the res port.

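    Indeed, the performance calculated in a cross validation is generally considered more representative of the real performance of your model on future unseen data than the estimate from a single split, because every example is used for testing exactly once.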
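
    As a rough illustration of option b, here is a minimal sketch of a process that wraps Linear Regression in a Cross Validation and delivers the model, the scored test examples and the performance as results. It is an assumption-laden example, not a tested export: it uses the Polynomial sample dataset (assumed to be available under //Samples/data/ with its label role already set), and the operator classes and port names are written as I recall them from 8.x exports, so please verify them by rebuilding the process in Studio.

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <!-- Assumption: sample dataset with a numeric label already defined -->
      <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Polynomial" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Cross Validation" width="90" x="179" y="34">
        <parameter key="number_of_folds" value="10"/>
        <!-- Training subprocess -->
        <process expanded="true">
          <operator activated="true" class="linear_regression" compatibility="8.2.000" expanded="true" height="103" name="Linear Regression" width="90" x="45" y="34"/>
          <connect from_port="training set" to_op="Linear Regression" to_port="training set"/>
          <connect from_op="Linear Regression" from_port="model" to_port="model"/>
        </process>
        <!-- Testing subprocess: the scored examples leave through "test set results" -->
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_regression" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
            <parameter key="root_mean_squared_error" value="true"/>
            <parameter key="absolute_error" value="true"/>
            <parameter key="squared_correlation" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
        </process>
      </operator>
      <connect from_op="Retrieve Polynomial" from_port="output" to_op="Cross Validation" to_port="example set"/>
      <!-- mod, tes and per of Cross Validation all go to res ports -->
      <connect from_op="Cross Validation" from_port="model" to_port="result 1"/>
      <connect from_op="Cross Validation" from_port="test result set" to_port="result 2"/>
      <connect from_op="Cross Validation" from_port="performance 1" to_port="result 3"/>
    </process>
  </operator>
</process>
```

    With this wiring, the second result should be the example set with the prediction column added, which is what was missing when only the performance vector was connected.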


    I hope it helps,





  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    You can also just use Store to save the model, and then use Apply Model with it and any other dataset (including the original development dataset) in the future to generate the set of predictions using that model.
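
    A minimal sketch of that idea, under a few assumptions: the repository path //Local Repository/models/linear_regression_model is hypothetical, the Polynomial sample dataset stands in for your own data, and the operator classes and port names are written from memory of 8.x exports, so check them in Studio. The model is trained, written to the repository with Store, and then passed on to Apply Model.

```xml
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Polynomial" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
      </operator>
      <operator activated="true" class="linear_regression" compatibility="8.2.000" expanded="true" height="103" name="Linear Regression" width="90" x="179" y="34"/>
      <!-- Hypothetical repository location for the saved model -->
      <operator activated="true" class="store" compatibility="8.2.000" expanded="true" height="68" name="Store" width="90" x="313" y="34">
        <parameter key="repository_entry" value="//Local Repository/models/linear_regression_model"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="447" y="34">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve Polynomial" from_port="output" to_op="Linear Regression" to_port="training set"/>
      <connect from_op="Linear Regression" from_port="model" to_op="Store" to_port="input"/>
      <!-- Store passes the model through unchanged -->
      <connect from_op="Store" from_port="through" to_op="Apply Model" to_port="model"/>
      <connect from_op="Linear Regression" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
    </process>
  </operator>
</process>
```

    In a later process, a Retrieve operator pointing at the same repository entry can replace the training part and feed the model port of Apply Model, together with whichever dataset you want to score.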

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts