Effect of Normalisation in Gradient Boosting Trees

k_vishnu772 Member Posts: 34 Contributor I
edited November 2018 in Help

Hi all,

I am working with a small data set of 90 rows and 29 features. I tried several algorithms and settled on gradient boosting. As far as I know, gradient boosting does not require any normalisation of numerical data. However, when I applied Normalize to my data, the result improved slightly, and the weights produced by the gradient boosting algorithm changed.

One of the numerical features that had zero weight without normalisation got the highest weight with normalisation. Could you please help me decide which result to rely on?
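
To illustrate the expectation stated in the question, here is a minimal sketch in plain Python (not RapidMiner or H2O code). It assumes an exhaustive threshold search on a single feature with Gini impurity; the data values, `gini`, `best_split` and `zscore` are all made up for the example. Under those assumptions, z-score normalisation leaves the chosen partition of the rows unchanged, because it preserves the ordering of the values.

```python
from statistics import mean, pstdev


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Exhaustively search one feature for the threshold with the lowest
    weighted Gini impurity. Returns (rows sent left, impurity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best = None
    for k in range(1, len(order)):
        if values[order[k - 1]] == values[order[k]]:
            continue  # no threshold fits between equal values
        left, right = order[:k], order[k:]
        score = (
            len(left) * gini([labels[i] for i in left])
            + len(right) * gini([labels[i] for i in right])
        ) / len(order)
        if best is None or score < best[1]:
            best = (frozenset(left), score)
    return best


def zscore(values):
    """Z-score normalisation: subtract the mean, divide by the std. dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical toy feature and binary label (not the poster's data).
feature = [120.0, 3.5, 47.0, 88.0, 15.0, 230.0, 61.0, 9.0]
label = [1, 0, 0, 1, 0, 1, 1, 0]

raw_split = best_split(feature, label)
scaled_split = best_split(zscore(feature), label)
print(raw_split == scaled_split)
```

This only covers exact threshold search. An implementation that bins feature values into histograms or samples rows and columns at random may not reproduce the same splits after rescaling, and with only 90 rows the weights can also shift from run to run, so these are possible (not confirmed) sources of the difference described here.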

Answers

  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    Hi @k_vishnu772,

     

    Are you able to post the XML process, so that we can see if there is something wrong? Thanks in advance.

     

    All the best,

     

  • k_vishnu772 Member Posts: 34 Contributor I

    @rfuentealba please find the XML of my process below. I am very interested in identifying the influential parameters for my model from the weights output of gradient boosting. When I run the process with normalisation, "att5" gets the highest weight; when I run it without normalisation, the same attribute gets zero weight and "att22" gets the highest weight. This confuses me: as far as I know, gradient boosting can handle data without normalisation, but here I see a real difference.

     

    I ran the Tukey test for outlier detection and found some outliers in "att5" and in some other attributes. Could you please let me know how I should take this into account?

     

     

     

     

     

     

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="9.0.000" expanded="true" height="68" name="Retrieve XML Process Data" width="90" x="45" y="136">
    <parameter key="repository_entry" value="../Data/XML Process Data"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="9.0.000" expanded="true" height="103" name="Multiply" width="90" x="179" y="289"/>
    <operator activated="true" class="normalize" compatibility="9.0.000" expanded="true" height="103" name="Normalize" width="90" x="313" y="136"/>
    <operator activated="true" class="concurrency:cross_validation" compatibility="9.0.000" expanded="true" height="145" name="Cross Validation" width="90" x="447" y="34">
    <process expanded="true">
    <operator activated="true" class="h2o:gradient_boosted_trees" compatibility="9.0.000" expanded="true" height="103" name="Gradient Boosted Trees" width="90" x="112" y="85">
    <parameter key="number_of_trees" value="81"/>
    <parameter key="learning_rate" value="0.01"/>
    <list key="expert_parameters"/>
    </operator>
    <operator activated="true" class="remember" compatibility="9.0.000" expanded="true" height="68" name="Remember" width="90" x="246" y="187">
    <parameter key="name" value="wei"/>
    <parameter key="io_object" value="AttributeWeights"/>
    </operator>
    <connect from_port="training set" to_op="Gradient Boosted Trees" to_port="training set"/>
    <connect from_op="Gradient Boosted Trees" from_port="model" to_port="model"/>
    <connect from_op="Gradient Boosted Trees" from_port="weights" to_op="Remember" to_port="store"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="9.0.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_binominal_classification" compatibility="9.0.000" expanded="true" height="82" name="Performance (2)" width="90" x="179" y="238">
    <parameter key="classification_error" value="true"/>
    <parameter key="kappa" value="true"/>
    <parameter key="AUC (optimistic)" value="true"/>
    <parameter key="AUC" value="true"/>
    <parameter key="AUC (pessimistic)" value="true"/>
    <parameter key="precision" value="true"/>
    <parameter key="recall" value="true"/>
    <parameter key="lift" value="true"/>
    <parameter key="fallout" value="true"/>
    <parameter key="f_measure" value="true"/>
    <parameter key="false_positive" value="true"/>
    <parameter key="false_negative" value="true"/>
    <parameter key="true_positive" value="true"/>
    <parameter key="true_negative" value="true"/>
    <parameter key="sensitivity" value="true"/>
    <parameter key="specificity" value="true"/>
    <parameter key="positive_predictive_value" value="true"/>
    <parameter key="negative_predictive_value" value="true"/>
    <parameter key="psep" value="true"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
    <connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="model_simulator:model_simulator" compatibility="9.0.001" expanded="true" height="103" name="Model Simulator" width="90" x="648" y="34"/>
    <operator activated="true" class="recall" compatibility="9.0.000" expanded="true" height="68" name="Recall" width="90" x="715" y="340">
    <parameter key="name" value="wei"/>
    <parameter key="io_object" value="AttributeWeights"/>
    </operator>
    <connect from_op="Retrieve XML Process Data" from_port="output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_port="result 4"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Cross Validation" from_port="model" to_op="Model Simulator" to_port="model"/>
    <connect from_op="Cross Validation" from_port="example set" to_op="Model Simulator" to_port="training data"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="result 2"/>
    <connect from_op="Model Simulator" from_port="model output" to_port="result 1"/>
    <connect from_op="Recall" from_port="result" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    </process>
    </operator>
    </process>
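
For readers unfamiliar with the Tukey test mentioned in the post above, a minimal sketch of Tukey's fences in plain Python follows. It assumes the conventional multiplier of 1.5 times the interquartile range and the "inclusive" quartile method from the standard library; the function name `tukey_outliers` and the sample values are hypothetical, and the poster's actual settings are not known.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences
    [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample, not the poster's "att5" column.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(tukey_outliers(sample))
```

Different quartile conventions give slightly different fences on small samples, so borderline points may be flagged by one tool and not by another.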

  • Maerkli Member Posts: 84 Guru

    The XML works fine. Is it possible to post the data, if it is not confidential?

    Maerkli
