Optimizer Parameter Grid (No result from optimizer)

dasoxori Member Posts: 9 Learner I
edited February 8 in Help
Some time ago I used Optimize Parameters (Grid) on a dataset similar to the one I have today, and it improved the performance of the model. The data comes from patient records, so by its nature a lot of it is missing, but the optimizer still worked. Now that I have modified the dataset, I get the following error. When I feed Optimize Parameters (Grid) one of the sample datasets included with RapidMiner, I get results, so the problem seems to lie in the dataset. I have included examples of the data in the following screenshots. Does anyone have an idea how to solve this problem?





Answers

  • ClaudioKeck Employee, Member Posts: 38 Guru
    edited February 8
    Hi,

    Would it be possible to share the XML of the process and the data, so we can investigate the issue further?

    Have you tried replacing the missing values?

    Thank you in advance!
  • dasoxori Member Posts: 9 Learner I
    For now I have kept only the columns that are fully populated to test the optimizer, and the result is the same. The purpose is to do a multi-parameter analysis on the different stages of the dataset.

    <?xml version="1.0" encoding="UTF-8"?><process version="10.1.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="10.1.002" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="10.1.002" expanded="true" height="68" name="Retrieve Mean" width="90" x="45" y="34">
            <parameter key="repository_entry" value="../data/stroke/Mean"/>
          </operator>
          <operator activated="true" class="numerical_to_polynominal" compatibility="10.1.002" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="45" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="hospital_expire_flag"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="real"/>
            <parameter key="block_type" value="value_series"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_series_end"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="blending:set_role" compatibility="10.1.002" expanded="true" height="82" name="Set Role" width="90" x="179" y="187">
            <list key="set_roles">
              <parameter key="hospital_expire_flag" value="label"/>
            </list>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="10.1.002" expanded="true" height="103" name="Training" width="90" x="313" y="34">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="row_count.le.39040"/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <operator activated="true" class="blending:select_attributes" compatibility="10.1.002" expanded="true" height="82" name="Select Attributes" width="90" x="514" y="85">
            <parameter key="type" value="exclude attributes"/>
            <parameter key="attribute_filter_type" value="a subset"/>
            <parameter key="select_attribute" value=""/>
            <parameter key="select_subset" value="hadm_id␞subject_id␞Time_Zone"/>
            <parameter key="also_apply_to_special_attributes_(id,_label..)" value="false"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="10.1.002" expanded="true" height="103" name="Test" width="90" x="313" y="289">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="row_count.gt.39040"/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <operator activated="true" class="blending:select_attributes" compatibility="10.1.002" expanded="true" height="82" name="Select Attributes (2)" width="90" x="514" y="187">
            <parameter key="type" value="exclude attributes"/>
            <parameter key="attribute_filter_type" value="a subset"/>
            <parameter key="select_attribute" value=""/>
            <parameter key="select_subset" value="hadm_id␞subject_id␞Time_Zone"/>
            <parameter key="also_apply_to_special_attributes_(id,_label..)" value="false"/>
          </operator>
          <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="10.1.002" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="782" y="85">
            <list key="parameters">
              <parameter key="XGBoost.learning_rate" value="[0.001;0.15;10;linear]"/>
            </list>
            <parameter key="error_handling" value="fail on error"/>
            <parameter key="log_performance" value="true"/>
            <parameter key="log_all_criteria" value="false"/>
            <parameter key="synchronize" value="false"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="xgboost:xgboost" compatibility="0.1.003" expanded="true" height="103" name="XGBoost" width="90" x="246" y="34">
                <parameter key="booster" value="tree booster"/>
                <parameter key="rounds" value="25"/>
                <parameter key="early_stopping" value="none"/>
                <parameter key="early_stopping_rounds" value="10"/>
                <parameter key="learning_rate" value="0.01"/>
                <parameter key="min_split_loss" value="0.0"/>
                <parameter key="max_depth" value="6"/>
                <parameter key="min_child_weight" value="1.0"/>
                <parameter key="subsample" value="1.0"/>
                <parameter key="tree_method" value="auto"/>
                <parameter key="lambda" value="1.0"/>
                <parameter key="alpha" value="0.0"/>
                <parameter key="sample_type" value="uniform"/>
                <parameter key="normalize_type" value="tree"/>
                <parameter key="rate_drop" value="0.0"/>
                <parameter key="skip_drop" value="0.0"/>
                <parameter key="updater" value="shotgun"/>
                <parameter key="feature_selector" value="cyclic"/>
                <parameter key="top_k" value="0"/>
                <enumeration key="expert_parameters"/>
              </operator>
              <operator activated="true" class="apply_model" compatibility="10.1.002" expanded="true" height="82" name="Apply Model" width="90" x="514" y="442">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" compatibility="10.1.002" expanded="true" height="82" name="Performance" width="90" x="648" y="34">
                <parameter key="main_criterion" value="first"/>
                <parameter key="accuracy" value="true"/>
                <parameter key="classification_error" value="true"/>
                <parameter key="kappa" value="true"/>
                <parameter key="weighted_mean_recall" value="true"/>
                <parameter key="weighted_mean_precision" value="false"/>
                <parameter key="spearman_rho" value="false"/>
                <parameter key="kendall_tau" value="false"/>
                <parameter key="absolute_error" value="false"/>
                <parameter key="relative_error" value="false"/>
                <parameter key="relative_error_lenient" value="false"/>
                <parameter key="relative_error_strict" value="false"/>
                <parameter key="normalized_absolute_error" value="false"/>
                <parameter key="root_mean_squared_error" value="false"/>
                <parameter key="root_relative_squared_error" value="false"/>
                <parameter key="squared_error" value="false"/>
                <parameter key="correlation" value="false"/>
                <parameter key="squared_correlation" value="false"/>
                <parameter key="cross-entropy" value="false"/>
                <parameter key="margin" value="false"/>
                <parameter key="soft_margin_loss" value="false"/>
                <parameter key="logistic_loss" value="false"/>
                <parameter key="skip_undefined_labels" value="true"/>
                <parameter key="use_example_weights" value="true"/>
                <list key="class_weights"/>
              </operator>
              <connect from_port="input 1" to_op="XGBoost" to_port="training set"/>
              <connect from_port="input 2" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="XGBoost" from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Apply Model" from_port="model" to_port="model"/>
              <connect from_op="Performance" from_port="performance" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="source_input 3" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Mean" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Training" to_port="example set input"/>
          <connect from_op="Training" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Training" from_port="original" to_op="Test" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <connect from_op="Test" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 2"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="model" to_port="result 2"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="parameter set" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>


  • ClaudioKeck Employee, Member Posts: 38 Guru
    edited February 9
    Would it be possible to share the dataset as well?

    You could try one of the operators in the "Missing" folder, such as "Replace Missing Values".
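
    A minimal sketch of how that suggestion might look in the process XML, assuming the operator is placed between "Set Role" and "Training" from the process posted above. The choice of "average" as the replacement and the layout coordinates are illustrative assumptions, not something taken from the original process, and the two connect lines would stand in for the existing "Set Role" to "Training" connection.

    ```xml
    <!-- Fragment to place inside the top-level <process> element. -->
    <!-- Assumption: missing values in all regular attributes are replaced by the attribute average. -->
    <operator activated="true" class="replace_missing_values" compatibility="10.1.002" expanded="true" height="103" name="Replace Missing Values" width="90" x="179" y="34">
      <parameter key="attribute_filter_type" value="all"/>
      <parameter key="default" value="average"/>
      <list key="columns"/>
    </operator>
    <!-- Assumption: these replace the direct Set Role -> Training connection. -->
    <connect from_op="Set Role" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="example set output" to_op="Training" to_port="example set input"/>
    ```

    Placing the operator before the Training/Test split means both subsets receive the same replacement values; whether that is acceptable for the analysis is a judgment call for the process author.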
  • dasoxori Member Posts: 9 Learner I
    Unfortunately the dataset cannot be shared, as it contains patient data. I carefully chose the screenshot of the dataset as a sample of what it contains.

    As for the missing values, I don't think they are the cause, because I created a dataset containing only columns that were fully populated and the result was the same.
  • ClaudioKeck Employee, Member Posts: 38 Guru
    Could you also please check your .RapidMiner/rapidminer-studio.log file?
  • dasoxori Member Posts: 9 Learner I
    edited February 25
    After I built the model from scratch, it worked.
