Optimize Parameter (Grid) Parameter

dome Member Posts: 12 Newbie
Hello,

I tried the Optimize Parameters (Grid) operator in my process to find optimal parameters for a decision tree. The operator worked fine. My problem is that when I use those parameters in a separate decision tree, the accuracy differs from what the Optimize operator reported.

For example:

The Optimize Parameters operator reports an accuracy of 80% with the criterion gini_index, a depth of 3, and a cross validation using stratified sampling and 5 folds.
If I apply these parameters to the exact same process (without Optimize Parameters), with the same data and operators, the accuracy drops to 60%.

Is there a solution to this problem, or an explanation for it?

Thanks!

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @dome,

    Can you share your process and your data so that we can reproduce what you observe?

    Regards,

    Lionel
  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Hello,

    Yes, it might be. It all depends on what your data looks like and how you are testing it. It would be nice to have a copy of your XML process so that we can take a look.

    A simple, educated guess:

    With a split validation inside Optimize Parameters, the best combination of parameters may give 80% when the model is trained on, let's say, 90% of your data and tested on the remaining 10%. If you connect the mod port to a Store operator, you get the decision tree you need, but it is trained with the best parameters on the entire data set, so your results may vary. However, without knowing your data or your process it's hard to know what's happening.
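
    One way to rule out differences that come only from how the examples are split, sketched here as an assumption rather than a confirmed diagnosis: use the same validation settings in both processes and fix the random seed. The fragment below reuses the parameter keys of a Cross Validation operator in RapidMiner 9.3 (number_of_folds, sampling_type, use_local_random_seed, local_random_seed); the fold count of 5 is the value from the example in the question, and the seed value is arbitrary. It is a fragment for illustration, not an importable process.

    ```xml
    <operator name="Cross Validation" class="concurrency:cross_validation" compatibility="9.3.001" activated="true" expanded="true">
      <!-- fold count and sampling type as reported for the best run in the question -->
      <parameter value="5" key="number_of_folds"/>
      <parameter value="stratified sampling" key="sampling_type"/>
      <!-- assumption: a fixed local seed, set identically in both processes, so both draw the same folds -->
      <parameter value="true" key="use_local_random_seed"/>
      <parameter value="1992" key="local_random_seed"/>
      <!-- training and testing subprocesses are unchanged and omitted here -->
    </operator>
    ```

    If the two accuracies still differ with identical folds, the cause is more likely a difference in the learner's parameters or in the data preparation than in the splitting.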

    All the best,

    Rod.




  • dome Member Posts: 12 Newbie
    edited July 2019
    Hello,

    Unfortunately I can't show you the data, but here is my process.

    I read the parameters from the results, where the log of all iterations is listed. After that I put these parameters into another process with a decision tree and a cross validation.

    Thanks!


    <?xml version="1.0" encoding="UTF-8"?>

    <process version="9.3.001">


    <context>

    <input/>

    <output/>

    <macros/>

    </context>


    <operator name="Process" expanded="true" compatibility="9.3.001" class="process" activated="true">

    <parameter value="init" key="logverbosity"/>

    <parameter value="2001" key="random_seed"/>

    <parameter value="never" key="send_mail"/>

    <parameter value="" key="notification_email"/>

    <parameter value="30" key="process_duration_for_mail"/>

    <parameter value="SYSTEM" key="encoding"/>


    <process expanded="true">


    <operator name="Nominal to Numerical" expanded="true" compatibility="9.3.001" class="nominal_to_numerical" activated="true" y="187" x="179" width="90" height="103">

    <parameter value="false" key="return_preprocessing_model"/>

    <parameter value="true" key="create_view"/>

    <parameter value="subset" key="attribute_filter_type"/>

    <parameter value="Geschlecht" key="attribute"/>

    <parameter value="Geschlecht|Kader|Phase|Sportart|Gruppe" key="attributes"/>

    <parameter value="false" key="use_except_expression"/>

    <parameter value="nominal" key="value_type"/>

    <parameter value="false" key="use_value_type_exception"/>

    <parameter value="file_path" key="except_value_type"/>

    <parameter value="single_value" key="block_type"/>

    <parameter value="false" key="use_block_type_exception"/>

    <parameter value="single_value" key="except_block_type"/>

    <parameter value="false" key="invert_selection"/>

    <parameter value="false" key="include_special_attributes"/>

    <parameter value="dummy coding" key="coding_type"/>

    <parameter value="false" key="use_comparison_groups"/>

    <list key="comparison_groups"/>

    <parameter value="all 0 and warning" key="unexpected_value_handling"/>

    <parameter value="false" key="use_underscore_in_name"/>

    </operator>


    <operator name="Generate Attributes (2)" expanded="true" compatibility="9.3.001" class="generate_attributes" activated="true" y="187" x="313" width="90" height="82">


    <list key="function_descriptions">

    <parameter value="if ((abs ([MmaxExt60re]-[MmaxExt60li])/ max([MmaxExt60li],[MmaxExt60re]))> 0.1, 1, 0)" key="MA60"/>

    <parameter value="if ((abs ([MmaxExt180re]-[MmaxExt180li])/ max([MmaxExt180li],[MmaxExt180re]))> 0.1, 1, 0)" key="MA180"/>

    <parameter value="if ([MA60] || [MA180], true, false)" key="MA"/>

    </list>

    <parameter value="true" key="keep_all"/>

    </operator>


    <operator name="Select Attributes (3)" expanded="true" compatibility="9.3.001" class="select_attributes" activated="true" y="187" x="447" width="90" height="82">

    <parameter value="subset" key="attribute_filter_type"/>

    <parameter value="" key="attribute"/>

    <parameter value="" key="attributes"/>

    <parameter value="false" key="use_except_expression"/>

    <parameter value="attribute_value" key="value_type"/>

    <parameter value="false" key="use_value_type_exception"/>

    <parameter value="time" key="except_value_type"/>

    <parameter value="attribute_block" key="block_type"/>

    <parameter value="false" key="use_block_type_exception"/>

    <parameter value="value_matrix_row_start" key="except_block_type"/>

    <parameter value="false" key="invert_selection"/>

    <parameter value="false" key="include_special_attributes"/>

    <description width="126" colored="false" color="transparent" align="center"/>

    </operator>


    <operator name="Set Role" expanded="true" compatibility="9.3.001" class="set_role" activated="true" y="187" x="581" width="90" height="82">

    <parameter value="MA" key="attribute_name"/>

    <parameter value="label" key="target_role"/>

    <list key="set_additional_roles"/>

    </operator>


    <operator name="Optimize Parameters (Grid)" expanded="true" compatibility="9.3.001" class="concurrency:optimize_parameters_grid" activated="true" y="187" x="782" width="90" height="124">


    <list key="parameters">

    <parameter value="gain_ratio,information_gain,gini_index,accuracy" key="Decision Tree (3).criterion"/>

    <parameter value="[-1.0;100.0;100;linear]" key="Decision Tree (3).maximal_depth"/>

    <parameter value="[2.0;10;10;linear]" key="Cross Validation (3).number_of_folds"/>

    </list>

    <parameter value="fail on error" key="error_handling"/>

    <parameter value="true" key="log_performance"/>

    <parameter value="false" key="log_all_criteria"/>

    <parameter value="false" key="synchronize"/>

    <parameter value="true" key="enable_parallel_execution"/>


    <process expanded="true">


    <operator name="Cross Validation (3)" expanded="true" compatibility="9.3.001" class="concurrency:cross_validation" activated="true" y="34" x="514" width="90" height="145">

    <parameter value="false" key="split_on_batch_attribute"/>

    <parameter value="false" key="leave_one_out"/>

    <parameter value="10" key="number_of_folds"/>

    <parameter value="stratified sampling" key="sampling_type"/>

    <parameter value="false" key="use_local_random_seed"/>

    <parameter value="1992" key="local_random_seed"/>

    <parameter value="true" key="enable_parallel_execution"/>


    <process expanded="true">


    <operator name="Decision Tree (3)" expanded="true" compatibility="9.3.001" class="concurrency:parallel_decision_tree" activated="true" y="34" x="246" width="90" height="103">

    <parameter value="accuracy" key="criterion"/>

    <parameter value="100" key="maximal_depth"/>

    <parameter value="false" key="apply_pruning"/>

    <parameter value="0.1" key="confidence"/>

    <parameter value="false" key="apply_prepruning"/>

    <parameter value="Infinity" key="minimal_gain"/>

    <parameter value="2" key="minimal_leaf_size"/>

    <parameter value="4" key="minimal_size_for_split"/>

    <parameter value="3" key="number_of_prepruning_alternatives"/>

    </operator>

    <connect to_port="training set" to_op="Decision Tree (3)" from_port="training set"/>

    <connect to_port="model" from_port="model" from_op="Decision Tree (3)"/>

    <portSpacing spacing="0" port="source_training set"/>

    <portSpacing spacing="0" port="sink_model"/>

    <portSpacing spacing="0" port="sink_through 1"/>

    </process>


    <process expanded="true">


    <operator name="Apply Model (3)" expanded="true" compatibility="9.3.001" class="apply_model" activated="true" y="34" x="112" width="90" height="82">

    <list key="application_parameters"/>

    <parameter value="false" key="create_view"/>

    </operator>


    <operator name="Performance (3)" expanded="true" compatibility="9.3.001" class="performance_classification" activated="true" y="34" x="313" width="90" height="82" origin="GENERATED_TUTORIAL">

    <parameter value="first" key="main_criterion"/>

    <parameter value="true" key="accuracy"/>

    <parameter value="false" key="classification_error"/>

    <parameter value="false" key="kappa"/>

    <parameter value="false" key="weighted_mean_recall"/>

    <parameter value="false" key="weighted_mean_precision"/>

    <parameter value="false" key="spearman_rho"/>

    <parameter value="false" key="kendall_tau"/>

    <parameter value="false" key="absolute_error"/>

    <parameter value="false" key="relative_error"/>

    <parameter value="false" key="relative_error_lenient"/>

    <parameter value="false" key="relative_error_strict"/>

    <parameter value="false" key="normalized_absolute_error"/>

    <parameter value="false" key="root_mean_squared_error"/>

    <parameter value="false" key="root_relative_squared_error"/>

    <parameter value="false" key="squared_error"/>

    <parameter value="false" key="correlation"/>

    <parameter value="false" key="squared_correlation"/>

    <parameter value="false" key="cross-entropy"/>

    <parameter value="false" key="margin"/>

    <parameter value="false" key="soft_margin_loss"/>

    <parameter value="false" key="logistic_loss"/>

    <parameter value="true" key="skip_undefined_labels"/>

    <parameter value="true" key="use_example_weights"/>

    <list key="class_weights"/>

    </operator>

    <connect to_port="model" to_op="Apply Model (3)" from_port="model"/>

    <connect to_port="unlabelled data" to_op="Apply Model (3)" from_port="test set"/>

    <connect to_port="labelled data" to_op="Performance (3)" from_port="labelled data" from_op="Apply Model (3)"/>

    <connect to_port="performance 1" from_port="performance" from_op="Performance (3)"/>

    <connect to_port="test set results" from_port="example set" from_op="Performance (3)"/>

    <portSpacing spacing="0" port="source_model"/>

    <portSpacing spacing="0" port="source_test set"/>

    <portSpacing spacing="0" port="source_through 1"/>

    <portSpacing spacing="0" port="sink_test set results"/>

    <portSpacing spacing="0" port="sink_performance 1"/>

    <portSpacing spacing="0" port="sink_performance 2"/>

    </process>

    </operator>

    <connect to_port="example set" to_op="Cross Validation (3)" from_port="input 1"/>

    <connect to_port="model" from_port="model" from_op="Cross Validation (3)"/>

    <connect to_port="performance" from_port="performance 1" from_op="Cross Validation (3)"/>

    <portSpacing spacing="0" port="source_input 1"/>

    <portSpacing spacing="0" port="source_input 2"/>

    <portSpacing spacing="0" port="sink_performance"/>

    <portSpacing spacing="0" port="sink_model"/>

    <portSpacing spacing="0" port="sink_output 1"/>

    </process>

    </operator>

    <connect to_port="example set input" to_op="Generate Attributes (2)" from_port="example set output" from_op="Nominal to Numerical"/>

    <connect to_port="example set input" to_op="Select Attributes (3)" from_port="example set output" from_op="Generate Attributes (2)"/>

    <connect to_port="example set input" to_op="Set Role" from_port="example set output" from_op="Select Attributes (3)"/>

    <connect to_port="input 1" to_op="Optimize Parameters (Grid)" from_port="example set output" from_op="Set Role"/>

    <connect to_port="result 1" from_port="performance" from_op="Optimize Parameters (Grid)"/>

    <connect to_port="result 2" from_port="model" from_op="Optimize Parameters (Grid)"/>

    <connect to_port="result 3" from_port="parameter set" from_op="Optimize Parameters (Grid)"/>

    <portSpacing spacing="0" port="source_input 1"/>

    <portSpacing spacing="0" port="sink_result 1"/>

    <portSpacing spacing="0" port="sink_result 2"/>

    <portSpacing spacing="0" port="sink_result 3"/>

    <portSpacing spacing="0" port="sink_result 4"/>

    </process>

    </operator>

    </process>
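
    For comparison, here is a sketch of how the learner in the separate process could be configured so that it matches the optimized run. This is an illustration under assumptions, not the actual second process (which was not posted): the operator name "Decision Tree" is hypothetical, criterion and maximal_depth are the values from the example in the question, and the remaining values are copied from Decision Tree (3) in the process above. The grid only overrides the parameters listed under "parameters"; everything else, such as apply_pruning and apply_prepruning, keeps the value set on the operator and still shapes the tree.

    ```xml
    <operator name="Decision Tree" class="concurrency:parallel_decision_tree" compatibility="9.3.001" activated="true" expanded="true">
      <!-- values reported for the best run in the question -->
      <parameter value="gini_index" key="criterion"/>
      <parameter value="3" key="maximal_depth"/>
      <!-- copied from Decision Tree (3) above: not part of the grid, but they must match as well -->
      <parameter value="false" key="apply_pruning"/>
      <parameter value="false" key="apply_prepruning"/>
      <parameter value="2" key="minimal_leaf_size"/>
      <parameter value="4" key="minimal_size_for_split"/>
    </operator>
    ```

    A freshly added Decision Tree operator may come with different settings for these keys, so it is worth comparing them one by one against the operator inside Optimize Parameters (Grid).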

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @dome

    Please post your XML code based on the following screenshot.


    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • dome Member Posts: 12 Newbie
    Thank you! This helps a lot!