Random Forest

Liverpool_Reds Member Posts: 1 Contributor I
edited June 2019 in Help

Could anyone please explain how RapidMiner's implementation of the Random Forest operator handles missing values in attributes?


    SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn



    In both Random Forest and Decision Tree, missing values are treated as a separate value, for numerical as well as nominal attributes. You can check this yourself with the following process:


    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.000">
      <operator activated="true" class="process" compatibility="9.0.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.0.000" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="112" y="34">
            <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
          </operator>
          <operator activated="true" class="declare_missing_value" compatibility="9.0.000" expanded="true" height="82" name="Declare Missing Value" width="90" x="246" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Sex"/>
            <parameter key="mode" value="nominal"/>
            <parameter key="nominal_value" value="Female"/>
          </operator>
          <operator activated="true" class="declare_missing_value" compatibility="9.0.000" expanded="true" height="82" name="Declare Missing Value (2)" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Age"/>
            <parameter key="mode" value="expression"/>
            <parameter key="nominal_value" value="Female"/>
            <parameter key="expression_value" value="Age&gt;40"/>
          </operator>
          <operator activated="true" class="concurrency:parallel_random_forest" compatibility="9.0.000" expanded="true" height="103" name="Random Forest" width="90" x="648" y="34"/>
          <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Declare Missing Value" to_port="example set input"/>
          <connect from_op="Declare Missing Value" from_port="example set output" to_op="Declare Missing Value (2)" to_port="example set input"/>
          <connect from_op="Declare Missing Value (2)" from_port="example set output" to_op="Random Forest" to_port="training set"/>
          <connect from_op="Random Forest" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Note that for numerical attributes this results in a 3-way split: one branch on each side of the split value, plus a third branch for the missing values.
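To make the 3-way split concrete, here is a minimal Python sketch of the idea. It is not RapidMiner's code; the function name `three_way_split`, the threshold, and the toy rows are all made up for illustration. It only shows how treating "missing" as its own value turns a binary numerical split into three branches.

```python
import math
from collections import Counter


def three_way_split(rows, attribute, threshold):
    """Partition rows into <= threshold, > threshold, and missing branches.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    branches = {"le": [], "gt": [], "missing": []}
    for row in rows:
        value = row.get(attribute)
        # Treat both None and NaN as "missing"
        if value is None or (isinstance(value, float) and math.isnan(value)):
            branches["missing"].append(row)
        elif value <= threshold:
            branches["le"].append(row)
        else:
            branches["gt"].append(row)
    return branches


# Toy rows with invented values (not taken from the Titanic sample data)
rows = [
    {"Age": 22.0, "Survived": "No"},
    {"Age": 35.0, "Survived": "Yes"},
    {"Age": None, "Survived": "No"},
    {"Age": float("nan"), "Survived": "Yes"},
    {"Age": 8.0, "Survived": "Yes"},
]

branches = three_way_split(rows, "Age", threshold=30.0)
for name, members in branches.items():
    # Show the label distribution that would sit in each branch
    print(name, dict(Counter(r["Survived"] for r in members)))
```

Each branch then gets its own label distribution, so examples with a missing Age are routed down their own path instead of being dropped or forced to one side of the threshold.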


    With Decision Tree models, imputing missing values doesn't improve the model unless you have a very precise way to do it.



