
Is my model good enough?

oneponep Member Posts: 20 Maven
edited November 2018 in Help

Hello.

I'm trying to build a predictive regression model, but I'm having a hard time telling whether it is good enough.

I'm using X-Validation, and I've read somewhere that you can tell whether a model is a good fit from the difference between the training error and the validation error. But how do I get X-Validation to report the training error?

 

Currently my model has an RMSE of about 1,000, and my label ranges from 0 to about 32,000. From these numbers alone I can't really tell whether it is any good. Is there another way to measure whether it's a good model?
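One way to put these numbers in context is to express the RMSE as a fraction of the label range and to compare it with a trivial baseline that always predicts the mean label. The sketch below is plain Python rather than a RapidMiner process; the function names and the sample labels and predictions are hypothetical, and only the RMSE of 1,000 and the range 0 to 32,000 come from the post.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rmse_relative_to_range(rmse_value, label_min, label_max):
    """RMSE expressed as a fraction of the label range."""
    return rmse_value / (label_max - label_min)

def baseline_rmse(y_true):
    """RMSE of a trivial model that always predicts the mean label."""
    mean = sum(y_true) / len(y_true)
    return rmse(y_true, [mean] * len(y_true))

# Numbers quoted in the post: RMSE of about 1,000, label range 0 to about 32,000.
print(rmse_relative_to_range(1000, 0, 32000))  # 0.03125, roughly 3% of the range

# Hypothetical labels and predictions, only to show the baseline comparison.
labels = [1000.0, 8000.0, 15000.0, 24000.0, 32000.0]
predictions = [1500.0, 7000.0, 16000.0, 23000.0, 31000.0]
print(rmse(labels, predictions), baseline_rmse(labels))
```

A model is only useful if its RMSE is clearly below the baseline figure; whether about 3% of the range is acceptable still depends on the use case.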

 

Oh, and one more thing: I can make my model better if I use a k-NN Global Anomaly Score and remove some of the outliers coming from noise, but I'm afraid that I would remove too much information. How can I decide how many outliers I can remove?

 

Thanks in advance!

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,531 RM Data Scientist

    Hi Mathias,

    It really depends on the use case whether this is good or not, so it is hard to judge. But I would of course look at the testing error, not the training error.
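To illustrate why the training error is a poor guide here, the following minimal Python sketch compares the training RMSE with a 10-fold cross-validated RMSE for a 1-nearest-neighbour regressor, the same k as in the first process posted further down in this thread. The data is synthetic and all function names are hypothetical; this is not RapidMiner's implementation.

```python
import math
import random

def predict_1nn(train_x, train_y, query):
    """Return the label of the training point closest to query (Euclidean)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def training_rmse(xs, ys):
    """Error when the model is scored on the same rows it was built from."""
    return rmse(ys, [predict_1nn(xs, ys, x) for x in xs])

def cross_validated_rmse(xs, ys, folds=10, seed=0):
    """Average RMSE over folds; each fold is scored by a model that never saw it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        preds = [predict_1nn(tx, ty, xs[i]) for i in test]
        scores.append(rmse([ys[i] for i in test], preds))
    return sum(scores) / len(scores)

# Synthetic data (an assumption for illustration): a noisy linear signal.
rng = random.Random(42)
xs = [(rng.uniform(0, 10),) for _ in range(200)]
ys = [3.0 * x[0] + rng.gauss(0, 1.0) for x in xs]

print("training RMSE:", training_rmse(xs, ys))
print("cross-validated RMSE:", cross_validated_rmse(xs, ys))
```

With k = 1 every training row is its own nearest neighbour, so the training RMSE is 0 no matter how well the model generalizes; only the cross-validated figure says something about unseen data.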

     

    What might give you a better feeling is a plot of the scored set returned by the new Cross Validation operator in 7.3. Just plot label against prediction(label) in a scatter plot. From it you can read off statements like "if the truth is 5, my prediction is between 3 and 7".
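The same kind of statement can also be read off the scored set numerically. The sketch below, in plain Python, groups scored rows by label bin and reports the smallest and largest prediction in each bin. The function name, the bin width, and the sample rows are hypothetical and are not taken from the posted data.

```python
from collections import defaultdict

def prediction_range_by_label(labels, predictions, bin_width):
    """Group scored rows by label bin and return (min, max) prediction per bin."""
    bins = defaultdict(list)
    for label, pred in zip(labels, predictions):
        bin_start = (label // bin_width) * bin_width
        bins[bin_start].append(pred)
    return {start: (min(p), max(p)) for start, p in sorted(bins.items())}

# Hypothetical scored rows: true label and the model's prediction for it.
labels = [5, 5, 5, 10, 10, 10]
predictions = [3.0, 7.0, 5.5, 8.9, 11.5, 9.8]

ranges = prediction_range_by_label(labels, predictions, bin_width=5)
for start, (low, high) in ranges.items():
    print(f"label in [{start}, {start + 5}): prediction between {low} and {high}")
```

On real data, percentiles per bin would be less sensitive to single extreme rows than the plain minimum and maximum used here.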

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • oneponep Member Posts: 20 Maven

    Okay, I've tried to do that.

     

    I figured out that if I use the k-NN Global Anomaly Score and filter out some outliers, I can get the RMSE lower. Is that OK to do?

  • oneponep Member Posts: 20 Maven

    Here is my process, if you would like to look it over:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Retrieve data-windows-avg" width="90" x="45" y="187">
    <parameter key="repository_entry" value="//iMac/data-windows-avg"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.3.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="187">
    <parameter key="attribute_name" value="Shaftpower (avg)"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.3.000" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="187">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value="Shaftpower (avg)|1212001PIT (avg)|1212001PIT (coe)|1212001PIT (var)|1212001ROL (avg)|1212001ROL (coe)|1212001ROL (var)|1215001SI3 (avg)|1215001SI3 (coe)|1215001SI3 (var)|1223001ZT1_Angle (avg)|1223001ZT1_Angle (coe)|1223001ZT1_Angle (var)|1223001ZT2_Angle (avg)|1223001ZT2_Angle (coe)|1223001ZT2_Angle (var)|1225001PS_crosswind (avg)|1225001PS_crosswind (coe)|1225001PS_crosswind (var)|1225001PS_headwind (avg)|1225001PS_headwind (coe)|1225001PS_headwind (var)|1907001ZT_Lin (avg)|1907001ZT_Lin (coe)|1907001ZT_Lin (var)|1907002ZT_Lin (avg)|1907002ZT_Lin (coe)|1907002ZT_Lin (var)|1225001DFTM (var)|1225001DFTM (coe)|1225001DFTM (avg)|1212001ROLR (var)|1212001ROLR (coe)|1212001ROLR (avg)|1212001PITR (var)|1212001PITR (coe)|1212001PITR (avg)|1212001HEV (var)|1212001HEV (coe)|1212001HEV (avg)|0603703ZT2 (var)|0603703ZT2 (coe)|0603703ZT2 (avg)|0603703ZT1 (var)|0603703ZT1 (coe)|0603703ZT1 (avg)"/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.3.000" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="187">
    <parameter key="parameter_expression" value=""/>
    <parameter key="condition_class" value="no_missing_attributes"/>
    <parameter key="invert_filter" value="false"/>
    <list key="filters_list"/>
    <parameter key="filters_logic_and" value="true"/>
    <parameter key="filters_check_metadata" value="true"/>
    </operator>
    <operator activated="true" class="anomalydetection:k-NN Global Anomaly Score" compatibility="2.3.002" expanded="true" height="103" name="k-NN Global Anomaly Score" width="90" x="45" y="442">
    <parameter key="k" value="1"/>
    <parameter key="use k-th neighbor distance only (no average)" value="false"/>
    <parameter key="measure_types" value="MixedMeasures"/>
    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
    <parameter key="nominal_measure" value="NominalDistance"/>
    <parameter key="numerical_measure" value="EuclideanDistance"/>
    <parameter key="divergence" value="GeneralizedIDivergence"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_sigma1" value="1.0"/>
    <parameter key="kernel_sigma2" value="0.0"/>
    <parameter key="kernel_sigma3" value="2.0"/>
    <parameter key="kernel_degree" value="3.0"/>
    <parameter key="kernel_shift" value="1.0"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
    <parameter key="parallelize evaluation process" value="false"/>
    <parameter key="number of threads" value="4"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.3.000" expanded="true" height="103" name="Filter Examples (2)" width="90" x="179" y="442">
    <parameter key="parameter_expression" value=""/>
    <parameter key="condition_class" value="custom_filters"/>
    <parameter key="invert_filter" value="false"/>
    <list key="filters_list">
    <parameter key="filters_entry_key" value="outlier.le.5"/>
    </list>
    <parameter key="filters_logic_and" value="true"/>
    <parameter key="filters_check_metadata" value="true"/>
    </operator>
    <operator activated="true" class="normalize" compatibility="7.3.000" expanded="true" height="103" name="Normalize" width="90" x="313" y="442">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="real"/>
    <parameter key="block_type" value="value_series"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_series_end"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="method" value="range transformation"/>
    <parameter key="min" value="0.0"/>
    <parameter key="max" value="1.0"/>
    </operator>
    <operator activated="true" class="weight_by_correlation" compatibility="7.3.000" expanded="true" height="82" name="Weight by Correlation" width="90" x="447" y="442">
    <parameter key="normalize_weights" value="false"/>
    <parameter key="sort_weights" value="true"/>
    <parameter key="sort_direction" value="ascending"/>
    <parameter key="squared_correlation" value="false"/>
    </operator>
    <operator activated="true" class="scale_by_weights" compatibility="7.3.000" expanded="true" height="82" name="Scale by Weights" width="90" x="581" y="442"/>
    <operator activated="true" class="split_data" compatibility="7.3.000" expanded="true" height="103" name="Split Data" width="90" x="715" y="442">
    <enumeration key="partitions">
    <parameter key="ratio" value="0.8"/>
    <parameter key="ratio" value="0.2"/>
    </enumeration>
    <parameter key="sampling_type" value="shuffled sampling"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="7.3.000" expanded="true" height="145" name="Cross Validation" width="90" x="782" y="187">
    <parameter key="split_on_batch_attribute" value="false"/>
    <parameter key="leave_one_out" value="false"/>
    <parameter key="number_of_folds" value="10"/>
    <parameter key="sampling_type" value="automatic"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    <parameter key="enable_parallel_execution" value="true"/>
    <process expanded="true">
    <operator activated="true" class="k_nn" compatibility="7.3.000" expanded="true" height="82" name="k-NN" width="90" x="380" y="34">
    <parameter key="k" value="1"/>
    <parameter key="weighted_vote" value="false"/>
    <parameter key="measure_types" value="MixedMeasures"/>
    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
    <parameter key="nominal_measure" value="NominalDistance"/>
    <parameter key="numerical_measure" value="EuclideanDistance"/>
    <parameter key="divergence" value="GeneralizedIDivergence"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_sigma1" value="1.0"/>
    <parameter key="kernel_sigma2" value="0.0"/>
    <parameter key="kernel_sigma3" value="2.0"/>
    <parameter key="kernel_degree" value="3.0"/>
    <parameter key="kernel_shift" value="1.0"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
    </operator>
    <connect from_port="training set" to_op="k-NN" to_port="training set"/>
    <connect from_op="k-NN" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.3.000" expanded="true" height="82" name="Apply Model (3)" width="90" x="179" y="187">
    <list key="application_parameters"/>
    <parameter key="create_view" value="false"/>
    </operator>
    <operator activated="true" class="performance_regression" compatibility="7.3.000" expanded="true" height="82" name="Performance (3)" width="90" x="313" y="187">
    <parameter key="main_criterion" value="first"/>
    <parameter key="root_mean_squared_error" value="true"/>
    <parameter key="absolute_error" value="false"/>
    <parameter key="relative_error" value="false"/>
    <parameter key="relative_error_lenient" value="false"/>
    <parameter key="relative_error_strict" value="false"/>
    <parameter key="normalized_absolute_error" value="false"/>
    <parameter key="root_relative_squared_error" value="false"/>
    <parameter key="squared_error" value="false"/>
    <parameter key="correlation" value="false"/>
    <parameter key="squared_correlation" value="false"/>
    <parameter key="prediction_average" value="false"/>
    <parameter key="spearman_rho" value="false"/>
    <parameter key="kendall_tau" value="false"/>
    <parameter key="skip_undefined_labels" value="true"/>
    <parameter key="use_example_weights" value="true"/>
    </operator>
    <connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (3)" to_port="labelled data"/>
    <connect from_op="Performance (3)" from_port="performance" to_port="performance 1"/>
    <connect from_op="Performance (3)" from_port="example set" to_port="test set results"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="apply_model" compatibility="7.3.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="916" y="544">
    <list key="application_parameters"/>
    <parameter key="create_view" value="false"/>
    </operator>
    <operator activated="true" class="performance_regression" compatibility="7.3.000" expanded="true" height="82" name="Performance (2)" width="90" x="1050" y="544">
    <parameter key="main_criterion" value="root_mean_squared_error"/>
    <parameter key="root_mean_squared_error" value="true"/>
    <parameter key="absolute_error" value="true"/>
    <parameter key="relative_error" value="true"/>
    <parameter key="relative_error_lenient" value="true"/>
    <parameter key="relative_error_strict" value="true"/>
    <parameter key="normalized_absolute_error" value="true"/>
    <parameter key="root_relative_squared_error" value="true"/>
    <parameter key="squared_error" value="true"/>
    <parameter key="correlation" value="true"/>
    <parameter key="squared_correlation" value="true"/>
    <parameter key="prediction_average" value="true"/>
    <parameter key="spearman_rho" value="true"/>
    <parameter key="kendall_tau" value="true"/>
    <parameter key="skip_undefined_labels" value="true"/>
    <parameter key="use_example_weights" value="true"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.3.000" expanded="true" height="82" name="Generate Attributes" width="90" x="1184" y="595">
    <list key="function_descriptions">
    <parameter key="Diff" value="[Shaftpower (avg)]-[prediction(Shaftpower (avg))]"/>
    </list>
    <parameter key="keep_all" value="true"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.3.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="1318" y="595">
    <list key="function_descriptions">
    <parameter key="Accuracy" value="100-((abs(Diff)/[Shaftpower (avg)])*100)"/>
    </list>
    <parameter key="keep_all" value="true"/>
    </operator>
    <connect from_op="Retrieve data-windows-avg" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="k-NN Global Anomaly Score" to_port="example set"/>
    <connect from_op="k-NN Global Anomaly Score" from_port="example set" to_op="Filter Examples (2)" to_port="example set input"/>
    <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Weight by Correlation" to_port="example set"/>
    <connect from_op="Weight by Correlation" from_port="weights" to_op="Scale by Weights" to_port="weights"/>
    <connect from_op="Weight by Correlation" from_port="example set" to_op="Scale by Weights" to_port="example set"/>
    <connect from_op="Scale by Weights" from_port="example set" to_op="Split Data" to_port="example set"/>
    <connect from_op="Split Data" from_port="partition 1" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Cross Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_op="Cross Validation" from_port="test result set" to_port="result 3"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="result 4"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
    <connect from_op="Performance (2)" from_port="performance" to_port="result 1"/>
    <connect from_op="Performance (2)" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
    <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    </process>
    </operator>
    </process>
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,531 RM Data Scientist

    Mathias,

     

    I do think it makes some sense to filter out the outliers; it often makes models better. The downside is that your model does not cover examples with a high outlier score. I would argue that you want to do it anyway, because you cannot find good statistical reasoning for these outliers.
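As a rough way to see what a given score threshold costs, the sketch below computes a global k-NN anomaly score as the average distance to the k nearest neighbours (in line with the process above, where "use k-th neighbor distance only (no average)" is false) and reports the share of rows each threshold would drop. It is plain Python with made-up data and hypothetical function names, not the Anomaly Detection extension's code.

```python
import math

def knn_global_anomaly_scores(points, k=1):
    """Average Euclidean distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def fraction_removed(scores, threshold):
    """Share of rows dropped by a filter that keeps only score <= threshold."""
    return sum(s > threshold for s in scores) / len(scores)

# Hypothetical 2-D data: a tight cluster plus two isolated points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0), (30.0, 0.0)]
scores = knn_global_anomaly_scores(points, k=1)

for threshold in (1.0, 5.0, 15.0, 30.0):
    print(threshold, fraction_removed(scores, threshold))
```

Rows above the threshold are exactly the ones the filtered model will not cover, so the printed share is a direct measure of how much of the data is being given up.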

     

    ~Martin

  • oneponep Member Posts: 20 Maven

    Hello again.

    I have tried to remove the outliers, but it turns out that it doesn't really have an effect on my RMSE.

     

    I'm having a hard time telling how I could improve my model. As of right now I get an RMSE of around 850, and I would like it to be at most half of that. Could someone tell me what I'm doing wrong?

     

    Here is my process:

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Retrieve traindata" width="90" x="45" y="187">
    <parameter key="repository_entry" value="//Local Repository/traindata"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.3.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="187">
    <parameter key="attribute_name" value="Shaftpower (avg)"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.3.000" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="187">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value="0603703ZT1 (avg)|0603703ZT1 (coe)|0603703ZT1 (var)|0603703ZT2 (avg)|0603703ZT2 (coe)|0603703ZT2 (var)|1212001HEV (avg)|1212001HEV (coe)|1212001HEV (var)|1212001PIT (avg)|1212001PIT (coe)|1212001PIT (var)|1212001PITR (avg)|1212001PITR (coe)|1212001PITR (var)|1212001ROL (avg)|1212001ROL (coe)|1212001ROL (var)|1212001ROLR (avg)|1212001ROLR (coe)|1212001ROLR (var)|1215001SI3 (avg)|1215001SI3 (coe)|1215001SI3 (var)|1223001ZT1_Angle (avg)|1223001ZT1_Angle (coe)|1223001ZT1_Angle (var)|1223001ZT2_Angle (avg)|1223001ZT2_Angle (coe)|1223001ZT2_Angle (var)|1225001DFTM (avg)|1225001DFTM (coe)|1225001DFTM (var)|1225001PS_crosswind (avg)|1225001PS_crosswind (coe)|1225001PS_crosswind (var)|1225001PS_headwind (avg)|1225001PS_headwind (coe)|1225001PS_headwind (var)|1907001ZT_Lin (avg)|1907001ZT_Lin (coe)|1907001ZT_Lin (var)|1907002ZT_Lin (avg)|1907002ZT_Lin (coe)|1907002ZT_Lin (var)|Shaftpower (avg)|1215001SI2 (var)|1215001SI2 (coe)|1215001SI2 (avg)|1215001SI1 (var)|1215001SI1 (coe)|1215001SI1 (avg)"/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="attribute_value"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="time"/>
    <parameter key="block_type" value="attribute_block"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_matrix_row_start"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.3.000" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="187">
    <parameter key="parameter_expression" value=""/>
    <parameter key="condition_class" value="no_missing_attributes"/>
    <parameter key="invert_filter" value="false"/>
    <list key="filters_list"/>
    <parameter key="filters_logic_and" value="true"/>
    <parameter key="filters_check_metadata" value="true"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="7.3.000" expanded="true" height="103" name="Filter Examples (2)" width="90" x="581" y="187">
    <parameter key="parameter_expression" value=""/>
    <parameter key="condition_class" value="custom_filters"/>
    <parameter key="invert_filter" value="false"/>
    <list key="filters_list">
    <parameter key="filters_entry_key" value="1215001SI3 (avg).ge.0\.1"/>
    </list>
    <parameter key="filters_logic_and" value="true"/>
    <parameter key="filters_check_metadata" value="true"/>
    </operator>
    <operator activated="true" class="normalize" compatibility="7.3.000" expanded="true" height="103" name="Normalize" width="90" x="313" y="595">
    <parameter key="return_preprocessing_model" value="false"/>
    <parameter key="create_view" value="false"/>
    <parameter key="attribute_filter_type" value="all"/>
    <parameter key="attribute" value=""/>
    <parameter key="attributes" value=""/>
    <parameter key="use_except_expression" value="false"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="use_value_type_exception" value="false"/>
    <parameter key="except_value_type" value="real"/>
    <parameter key="block_type" value="value_series"/>
    <parameter key="use_block_type_exception" value="false"/>
    <parameter key="except_block_type" value="value_series_end"/>
    <parameter key="invert_selection" value="false"/>
    <parameter key="include_special_attributes" value="false"/>
    <parameter key="method" value="range transformation"/>
    <parameter key="min" value="0.0"/>
    <parameter key="max" value="1.0"/>
    </operator>
    <operator activated="true" class="weight_by_correlation" compatibility="7.3.000" expanded="true" height="82" name="Weight by Correlation" width="90" x="447" y="595">
    <parameter key="normalize_weights" value="false"/>
    <parameter key="sort_weights" value="true"/>
    <parameter key="sort_direction" value="ascending"/>
    <parameter key="squared_correlation" value="false"/>
    </operator>
    <operator activated="true" class="scale_by_weights" compatibility="7.3.000" expanded="true" height="82" name="Scale by Weights" width="90" x="581" y="595"/>
    <operator activated="true" class="concurrency:cross_validation" compatibility="7.3.000" expanded="true" height="145" name="Cross Validation" width="90" x="715" y="595">
    <parameter key="split_on_batch_attribute" value="false"/>
    <parameter key="leave_one_out" value="false"/>
    <parameter key="number_of_folds" value="10"/>
    <parameter key="sampling_type" value="automatic"/>
    <parameter key="use_local_random_seed" value="false"/>
    <parameter key="local_random_seed" value="1992"/>
    <parameter key="enable_parallel_execution" value="true"/>
    <process expanded="true">
    <operator activated="true" class="k_nn" compatibility="7.3.000" expanded="true" height="82" name="k-NN" width="90" x="179" y="34">
    <parameter key="k" value="2"/>
    <parameter key="weighted_vote" value="false"/>
    <parameter key="measure_types" value="NumericalMeasures"/>
    <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
    <parameter key="nominal_measure" value="NominalDistance"/>
    <parameter key="numerical_measure" value="ManhattanDistance"/>
    <parameter key="divergence" value="GeneralizedIDivergence"/>
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="1.0"/>
    <parameter key="kernel_sigma1" value="1.0"/>
    <parameter key="kernel_sigma2" value="0.0"/>
    <parameter key="kernel_sigma3" value="2.0"/>
    <parameter key="kernel_degree" value="3.0"/>
    <parameter key="kernel_shift" value="1.0"/>
    <parameter key="kernel_a" value="1.0"/>
    <parameter key="kernel_b" value="0.0"/>
    </operator>
    <connect from_port="training set" to_op="k-NN" to_port="training set"/>
    <connect from_op="k-NN" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.3.000" expanded="true" height="82" name="Apply Model (3)" width="90" x="179" y="187">
    <list key="application_parameters"/>
    <parameter key="create_view" value="false"/>
    </operator>
    <operator activated="true" class="performance_regression" compatibility="7.3.000" expanded="true" height="82" name="Performance" width="90" x="380" y="187">
    <parameter key="main_criterion" value="root_mean_squared_error"/>
    <parameter key="root_mean_squared_error" value="true"/>
    <parameter key="absolute_error" value="true"/>
    <parameter key="relative_error" value="true"/>
    <parameter key="relative_error_lenient" value="true"/>
    <parameter key="relative_error_strict" value="true"/>
    <parameter key="normalized_absolute_error" value="true"/>
    <parameter key="root_relative_squared_error" value="true"/>
    <parameter key="squared_error" value="true"/>
    <parameter key="correlation" value="true"/>
    <parameter key="squared_correlation" value="true"/>
    <parameter key="prediction_average" value="true"/>
    <parameter key="spearman_rho" value="true"/>
    <parameter key="kendall_tau" value="true"/>
    <parameter key="skip_undefined_labels" value="true"/>
    <parameter key="use_example_weights" value="true"/>
    </operator>
    <connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
    <connect from_op="Performance" from_port="example set" to_port="test set results"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve traindata" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Filter Examples (2)" to_port="example set input"/>
    <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Weight by Correlation" to_port="example set"/>
    <connect from_op="Weight by Correlation" from_port="weights" to_op="Scale by Weights" to_port="weights"/>
    <connect from_op="Weight by Correlation" from_port="example set" to_op="Scale by Weights" to_port="example set"/>
    <connect from_op="Scale by Weights" from_port="example set" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Cross Validation" from_port="test result set" to_port="result 2"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

    And here is my data:

    https://www.dropbox.com/s/w9a5545nn1vs0b8/traindata.csv?dl=0

     

    I'm trying to predict the shaftpower for a ship.

     

    Thank you in advance!
