RapidMiner

Regression with Random Forest ?

SOLVED
Regular Contributor

Hi RapidMiner,

I'm doing regression with 480 input features. I tried the Deep Learning operator, but the training root mean square error (RMSE) is still quite high. Now I'm trying Random Forest because of its random subspace approach, but I found that the Random Forest operator cannot handle a numerical label. How can I deal with this?

Thank you very much for your support.

Best Regards,

phivu

4 REPLIES
Elite II
Solution
Accepted by IngoRM (RMStaff)
‎02-23-2017 08:27 AM

Re: Regression with Random Forest ?

The Random Forest operator in RapidMiner cannot do this; you would have to use R scripts instead. However, the latest version of RapidMiner has a new operator, Gradient Boosted Trees, which is competitive with Random Forest and handles both numerical and polynominal labels. It is worth exploring.

Regular Contributor

Re: Regression with Random Forest ?

Thank you, Earmijo. Could you elaborate on how to use RapidMiner with R to do regression with Random Forest?

Elite II
Solution
Accepted by IngoRM (RMStaff)
‎02-23-2017 08:27 AM

Re: Regression with Random Forest ?

Install the R Scripting extension. Verify that R and the randomForest package are installed on your computer, then run the process below. I adapted the sample process that ships with the extension so that it runs Random Forest on a regression problem.


<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" breakpoints="after" class="retrieve" compatibility="7.3.001" expanded="true" height="68" name="Retrieve Polynomial" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
        <description align="center" color="blue" colored="true" width="126">Fetch example data</description>
      </operator>
      <operator activated="true" class="split_data" compatibility="7.3.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="34">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.5"/>
          <parameter key="ratio" value="0.5"/>
        </enumeration>
        <description align="center" color="purple" colored="true" width="126">Split the data in a training and a test set</description>
      </operator>
      <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Learn Model" width="90" x="380" y="34">
        <parameter key="script" value="# train a random forest on the training data and return the learned model&#10;&#10;rm_main = function(data)&#10;{&#10;    library(randomForest)&#10;    # label is numeric, so randomForest fits a regression forest&#10;    Model.rf &lt;- randomForest(label ~ ., data = data, mtry = 3, importance = FALSE, na.action = na.omit)&#10;    return(Model.rf)&#10;}&#10;"/>
        <description align="center" color="red" colored="true" width="126">Train a RandomForest model in R and return it as an R object</description>
      </operator>
      <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="103" name="Apply R Model" width="90" x="514" y="238">
        <parameter key="script" value="## load the trained model and apply it on the test data&#10;&#10;rm_main = function(model, data)&#10;{&#10;   library(randomForest)&#10;   # apply the model and build a prediction&#10;   result &lt;- predict(model, data)&#10;&#10;   # add the prediction to the example set&#10;   data$prediction &lt;- result&#10;&#10;   # update the meta data so RapidMiner treats the new column as a real-valued prediction&#10;   metaData$data$prediction &lt;&lt;- list(type=&quot;real&quot;, role=&quot;prediction&quot;)&#10;&#10;   return(data)&#10;}&#10;"/>
        <description align="center" color="red" colored="true" width="126">Apply the trained model on the test data</description>
      </operator>
      <connect from_op="Retrieve Polynomial" from_port="output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Learn Model" to_port="input 1"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Apply R Model" to_port="input 2"/>
      <connect from_op="Learn Model" from_port="output 1" to_op="Apply R Model" to_port="input 1"/>
      <connect from_op="Apply R Model" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
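
The two R scripts are hard to read inside the escaped XML, so here is roughly the same logic as a plain R script you could try outside RapidMiner first. This is only a sketch: the synthetic data, the feature count `p`, and the variable names are made up for illustration, and it assumes the randomForest package is installed. It also computes the test RMSE, which the process above does not do. The process hard-codes `mtry=3`; with many features (such as the 480 in the question) you may prefer the package's regression default of one third of the features, which is what the sketch uses.

```r
# Standalone sketch of the "Learn Model" and "Apply R Model" scripts.
# Assumption: install.packages("randomForest") has already been run.
library(randomForest)

set.seed(42)  # reproducible synthetic data and split

# Hypothetical data: 200 rows, 10 numeric features (V1..V10), numeric target "label"
n <- 200
p <- 10
data <- as.data.frame(matrix(rnorm(n * p), nrow = n))
data$label <- 2 * data$V1 - data$V2^2 + rnorm(n, sd = 0.1)

# 50/50 split, mirroring the Split Data operator
train_idx <- sample(n, n / 2)
train <- data[train_idx, ]
test  <- data[-train_idx, ]

# Because label is numeric, randomForest() fits a regression forest.
# mtry is the number of features tried at each split; floor(p / 3) is
# the package default for regression.
model <- randomForest(label ~ ., data = train,
                      mtry = max(floor(p / 3), 1),
                      na.action = na.omit)

# Predict on the held-out half and compute the test RMSE
test$prediction <- predict(model, test)
rmse <- sqrt(mean((test$label - test$prediction)^2))
print(rmse)
```

Inside RapidMiner the same two steps are wrapped in `rm_main` functions, and the data frame arrives from the input port instead of being generated.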
Regular Contributor

Re: Regression with Random Forest ?

That's great, thanks!