Optimization Grid with Random Forest - Not Working.

CarlN Member Posts: 6 Contributor II
RapidMiner Unicorns 🦄,

I am trying to run an optimization grid with my Random Forest model and I am getting an error. It states that the gain_ratio criterion cannot be used for numeric labels (see pictures below). I checked all my parameters and I am not using gain_ratio in the optimization grid (see pictures below). So, specifically, how do you use an optimization grid with cross validation and a random forest predicting a real number in RapidMiner?

Can you send a basic working example of this workflow process, with well-documented comments explaining each step?



Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,528 RM Data Scientist
    Hi,
    can you show us your optimization settings? Likely you use least_square there.

    Also: be careful using Explain Pred in the X-Val. This can take an enormous amount of time.

    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • CarlN Member Posts: 6 Contributor II
    Please see below. Also, I am sending the results of the optimization to a log. Let me know what the issue is, or share an example workflow process showing how this works in RapidMiner.




  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,528 RM Data Scientist
    Hi,
    You have a numeric label and are trying to vary the criterion between [information_gain, gain_ratio, gini_index, accuracy]. This cannot work, since none of those criteria work on numeric labels.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
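The point above can be sketched as a small pre-flight check. This is a minimal illustration in plain Python, not RapidMiner's API: the helper name `criteria_invalid_for_numeric_label` is hypothetical, and the rule it encodes (only least_square is usable with a numeric label) is taken from the replies in this thread.

```python
# Per this thread, least_square is the only criterion usable with a numeric label.
NUMERIC_LABEL_CRITERIA = {"least_square"}


def criteria_invalid_for_numeric_label(criteria):
    """Return the criteria from a grid that would fail on a numeric label."""
    return [c for c in criteria if c not in NUMERIC_LABEL_CRITERIA]


# The criterion values being varied in the optimization grid discussed above.
grid_criteria = ["information_gain", "gain_ratio", "gini_index", "accuracy"]
rejected = criteria_invalid_for_numeric_label(grid_criteria)
print(rejected)
```

Every value in that grid is rejected, which would explain why the error mentions gain_ratio even though the Random Forest operator itself is set to a different criterion: the grid overrides the operator's setting on each iteration.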
  • CarlN Member Posts: 6 Contributor II
    Okay, thanks for the explanation, but the solution is not clear from your response. 

    Specifically, what configuration/setup tasks are needed to make the grid optimization operator work and simply find the optimal parameters for a Random Forest model? Do you have a sample workflow showing how this can work?
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi!

    Just select correct and applicable settings for the optimization. Leave the criterion alone (it has to be least_square for numerical prediction) and optimize parameters like the number of trees and the maximum depth. 

    Regards,
    Balázs
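The advice above (fix the criterion, vary only the tree count and depth) can be sketched as the set of combinations a grid would enumerate. This is a plain-Python illustration under assumptions, not a RapidMiner process: `build_grid` is a hypothetical helper, the candidate values are made up for the example, and the keys `number_of_trees` and `maximal_depth` are assumed to match the Random Forest operator's parameter names.

```python
from itertools import product


def build_grid(trees, depths, criterion="least_square"):
    """Enumerate grid combinations with the criterion held fixed."""
    return [
        {"criterion": criterion, "number_of_trees": t, "maximal_depth": d}
        for t, d in product(trees, depths)
    ]


# Hypothetical candidate values; pick ranges that suit your data.
grid = build_grid(trees=[50, 100, 200], depths=[5, 10, 20])
for combination in grid:
    print(combination)
```

Each combination would then be evaluated inside the cross validation, and the one with the best regression performance kept. The criterion never appears in the list of varied parameters, so it stays at least_square throughout.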
  • CarlN Member Posts: 6 Contributor II
    I am using least_square in the Random Forest operator and it's still giving me an error (see below). I still don't understand why it's not working. Please walk me through the specific, step-by-step instructions to make this work. Thank you very much.

