Regression with Random Forest?

phivuphivu Member Posts: 34 Guru
edited November 2018 in Help

Hi RapidMiner,

I'm doing regression with 480 input features. I tried the Deep Learning operator, but the training root mean square error (RMSE) is still quite high. Now I'd like to try Random Forest because of its random subspace approach, but I found that the Random Forest operator cannot handle a numerical label. How can I deal with this?

Thank you very much for your support.

Best regards,

phivu

Best Answers

  • earmijoearmijo Member Posts: 270 Unicorn
    Solution Accepted

    You cannot do it in RapidMiner unless you are willing to use R scripts. However, the latest version of RapidMiner has a new operator, Gradient Boosted Trees, which is competitive with Random Forest and can handle both numerical and polynominal labels. It is worth exploring.

  • earmijoearmijo Member Posts: 270 Unicorn
    Solution Accepted

    Install the R Script Extension. Verify that you have R installed on your computer, then run the process below. I adapted the code that comes with the application to run Random Forest on a regression problem.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" breakpoints="after" class="retrieve" compatibility="7.3.001" expanded="true" height="68" name="Retrieve Polynomial" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
    <description align="center" color="blue" colored="true" width="126">Fetch example data</description>
    </operator>
    <operator activated="true" class="split_data" compatibility="7.3.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="34">
    <enumeration key="partitions">
    <parameter key="ratio" value="0.5"/>
    <parameter key="ratio" value="0.5"/>
    </enumeration>
    <description align="center" color="purple" colored="true" width="126">Split the data in a training and a test set</description>
    </operator>
    <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Learn Model" width="90" x="380" y="34">
    <parameter key="script" value="# train a random Forest on the training data and return the learned model&#10;&#10;rm_main = function(data)&#10;{&#10; library(randomForest) &#10;&#9;Model.rf &lt;- randomForest(label~., data =data,mtry=3,importance=FALSE,na.action=na.omit)&#10; &#9;return(Model.rf)&#10;}&#10;"/>
    <description align="center" color="red" colored="true" width="126">Train a RandomForest model in R and return it as an R object</description>
    </operator>
    <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="103" name="Apply R Model" width="90" x="514" y="238">
    <parameter key="script" value="## load the trained model and apply it on the test data&#10;&#10;rm_main = function(model, data)&#10;{&#10; library(randomForest)&#10; # apply the model and build a prediction&#10; result &lt;-predict(model, data)&#10;&#10; # add the prediction to the example set&#10; data$prediction &lt;- result&#10; &#10; # update the meta data&#10; metaData$data$prediction &lt;&lt;- list(type=&quot;real&quot;, role=&quot;prediction&quot;)&#10; &#10; return(data)&#10;}&#10;"/>
    <description align="center" color="red" colored="true" width="126">Apply the trained model on the test data</description>
    </operator>
    <connect from_op="Retrieve Polynomial" from_port="output" to_op="Split Data" to_port="example set"/>
    <connect from_op="Split Data" from_port="partition 1" to_op="Learn Model" to_port="input 1"/>
    <connect from_op="Split Data" from_port="partition 2" to_op="Apply R Model" to_port="input 2"/>
    <connect from_op="Learn Model" from_port="output 1" to_op="Apply R Model" to_port="input 1"/>
    <connect from_op="Apply R Model" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
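
The process above returns the test partition with an added prediction column, but it does not compute a performance value. To compare the result with the RMSE from the Deep Learning operator, the error can be calculated from the label and prediction columns. Below is a minimal Python sketch, assuming the two columns have been exported as plain lists of numbers. The function name `rmse` and the sample values are illustrative only and are not part of the process or of RapidMiner.

```python
import math


def rmse(actual, predicted):
    """Return the root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one example is required")
    # Squared difference between each label value and its prediction.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    # Mean of the squared errors, then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up values for illustration only; replace them with the
    # label and prediction columns exported from the scored test set.
    label = [3.0, -0.5, 2.0, 7.0]
    prediction = [2.5, 0.0, 2.0, 8.0]
    print(f"test RMSE: {rmse(label, prediction):.4f}")
```

Computing this on the test partition rather than the training partition gives a fairer comparison, since a random forest usually fits its own training data very closely.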
