Effect of Normalisation in Gradient Boosting Trees

k_vishnu772 · August 2018

Hi all,

I am working with a small data set of 90 rows and 29 features. I tried many algorithms and settled on gradient boosting. As far as I know, gradient boosting does not require any normalisation of numerical data. But when I applied normalisation to my data, the result improved a little, and the change I noticed is in the weights coming from the gradient boosting algorithm.

One of the numerical features that got zero weight without normalisation got the highest weight with normalisation. Could you please help me work out which result I should consider?
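
For reference, the expectation that tree-based learners do not need normalisation can be checked on a toy example. The sketch below is only an illustration in plain Python, not the H2O implementation behind the Gradient Boosted Trees operator: it finds the best single threshold split (by Gini impurity) on a made-up feature, once on the raw values and once after min-max scaling. The data, the helper names (`min_max`, `gini`, `best_split`) and the choice of impurity measure are all assumptions made for the illustration. Because min-max scaling preserves the order of the values, both runs should select the same partition of rows. A real implementation may bin values or use random sampling, so small differences between runs are still possible there, especially with only 90 rows.

```python
def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (weighted impurity, rows on the left side) of the best threshold split."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best = None
    for k in range(1, len(order)):
        # A threshold can only be placed between two distinct values.
        if values[order[k - 1]] == values[order[k]]:
            continue
        left, right = order[:k], order[k:]
        score = (
            len(left) * gini([labels[i] for i in left])
            + len(right) * gini([labels[i] for i in right])
        ) / len(order)
        if best is None or score < best[0]:
            best = (score, frozenset(left))
    return best


# Hypothetical feature on a large scale, with binary labels.
raw = [120.0, 3500.0, 80.0, 9100.0, 640.0, 15.0, 7200.0, 410.0]
labels = [0, 1, 0, 1, 0, 0, 1, 0]

raw_split = best_split(raw, labels)
norm_split = best_split(min_max(raw), labels)

# The same rows end up on the same side of the split in both cases.
same_partition = raw_split == norm_split
print(same_partition)
```

If the weights in a real process change a lot after normalisation, the cause is more likely to lie in how the model is trained and validated on such a small sample than in the scaling itself.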

rfuentealba · August 2018

Hi @k_vishnu772,

Are you able to post the XML process, so that we can see if there is something wrong? Thanks in advance.

All the best,

k_vishnu772 · August 2018

@rfuentealba please find the XML of my process below. I am very interested in finding the influential parameters for my model from the weights output of gradient boosting. When I run the process with normalisation, "att5" gets the highest weight. When I run the model without normalisation, the same attribute gets zero weight and "att 22" gets the highest weight. This confuses me: as far as I know, gradient boosting can handle data without normalisation, but here I can see a real difference.

I ran the Tukey test for outlier detection and found some outliers in "att5" and in some other attributes as well. Could you please let me know how to take this into account?

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.0.000" expanded="true" height="68" name="Retrieve XML Process Data" width="90" x="45" y="136">
<parameter key="repository_entry" value="../Data/XML Process Data"/>
</operator>
<operator activated="true" class="multiply" compatibility="9.0.000" expanded="true" height="103" name="Multiply" width="90" x="179" y="289"/>
<operator activated="true" class="normalize" compatibility="9.0.000" expanded="true" height="103" name="Normalize" width="90" x="313" y="136"/>
<operator activated="true" class="concurrency:cross_validation" compatibility="9.0.000" expanded="true" height="145" name="Cross Validation" width="90" x="447" y="34">
<process expanded="true">
<operator activated="true" class="h2o:gradient_boosted_trees" compatibility="9.0.000" expanded="true" height="103" name="Gradient Boosted Trees" width="90" x="112" y="85">
<parameter key="number_of_trees" value="81"/>
<parameter key="learning_rate" value="0.01"/>
<list key="expert_parameters"/>
</operator>
<operator activated="true" class="remember" compatibility="9.0.000" expanded="true" height="68" name="Remember" width="90" x="246" y="187">
<parameter key="name" value="wei"/>
<parameter key="io_object" value="AttributeWeights"/>
</operator>
<connect from_port="training set" to_op="Gradient Boosted Trees" to_port="training set"/>
<connect from_op="Gradient Boosted Trees" from_port="model" to_port="model"/>
<connect from_op="Gradient Boosted Trees" from_port="weights" to_op="Remember" to_port="store"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="9.0.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_binominal_classification" compatibility="9.0.000" expanded="true" height="82" name="Performance (2)" width="90" x="179" y="238">
<parameter key="classification_error" value="true"/>
<parameter key="kappa" value="true"/>
<parameter key="AUC (optimistic)" value="true"/>
<parameter key="AUC" value="true"/>
<parameter key="AUC (pessimistic)" value="true"/>
<parameter key="precision" value="true"/>
<parameter key="recall" value="true"/>
<parameter key="lift" value="true"/>
<parameter key="fallout" value="true"/>
<parameter key="f_measure" value="true"/>
<parameter key="false_positive" value="true"/>
<parameter key="false_negative" value="true"/>
<parameter key="true_positive" value="true"/>
<parameter key="true_negative" value="true"/>
<parameter key="sensitivity" value="true"/>
<parameter key="specificity" value="true"/>
<parameter key="positive_predictive_value" value="true"/>
<parameter key="negative_predictive_value" value="true"/>
<parameter key="psep" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="model_simulator:model_simulator" compatibility="9.0.001" expanded="true" height="103" name="Model Simulator" width="90" x="648" y="34"/>
<operator activated="true" class="recall" compatibility="9.0.000" expanded="true" height="68" name="Recall" width="90" x="715" y="340">
<parameter key="name" value="wei"/>
<parameter key="io_object" value="AttributeWeights"/>
</operator>
<connect from_op="Retrieve XML Process Data" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Normalize" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_port="result 4"/>
<connect from_op="Normalize" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
<connect from_op="Cross Validation" from_port="model" to_op="Model Simulator" to_port="model"/>
<connect from_op="Cross Validation" from_port="example set" to_op="Model Simulator" to_port="training data"/>
<connect from_op="Cross Validation" from_port="performance 1" to_port="result 2"/>
<connect from_op="Model Simulator" from_port="model output" to_port="result 1"/>
<connect from_op="Recall" from_port="result" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
</process>
</operator>
</process>

Maerkli · August 2018

The XML works fine. Is it possible to post the data, if not confidential?

Maerkli
