Heritage Health: problems creating a useable Random Forest model
Hello,
I am having problems developing a useable Random Forest model in the RapidMiner GUI.
The dataset is from the Heritage Healthcare contest. It has approximately 144 attributes and over 70k examples.
The datatypes are mostly numeric and binomial. The label is numeric.
I am new to RapidMiner GUI and am trying to create a simple Random Forest model.
The process is straightforward. It reads in a .csv file, sets the roles, discretizes the numeric label into 10 bins, splits the data into modeling and validation sets, and writes out the model.
When I initially ran the process, all the trees contained one node with a range for the predicted value of negative infinity to 0.278.
When I turned off pruning and pre-pruning, the process failed with an error message of "cannot clone example set".
When I turned off pre-pruning but left pruning on, the process didn't fail, but it didn't produce better results either. When I switched the algorithm type to gini_variance, the model produced trees with multiple nodes.
However, when I checked the performance of the model from the validation process, the model predicts only the range negative infinity to 0.287. The Performance operator reports a performance of 84% for this.
Do you know how to modify the model so that more ranges are used in the prediction?
I lowered the gain needed to create a new node to 0.05 and decreased the confidence level from 0.25 to 0.05.
Thanks!
Answers
Just experiment with the possibilities
Best, Marius
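One thing that may be worth checking before tuning the forest further is how the examples are spread across the 10 label bins. The sketch below is plain Python, not a RapidMiner process, and it rests on two assumptions that the post does not confirm: that the label was discretized into equal-width bins, and that the 84% figure is accuracy. The label values are made up to be heavily skewed, and the helper names `equal_width_bin` and `majority_baseline` are hypothetical. It shows that when most examples fall into the lowest bin, a model that always predicts that bin already reaches a high accuracy.

```python
from collections import Counter


def equal_width_bin(values, n_bins=10):
    """Assign each value to one of n_bins equal-width bins (index 0..n_bins-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything lands in a single bin.
        return [0 for _ in values]
    # The maximum value would otherwise get index n_bins; clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def majority_baseline(bins):
    """Return the most common bin and the share of examples that fall into it."""
    counts = Counter(bins)
    top_bin, top_count = counts.most_common(1)[0]
    return top_bin, top_count / len(bins)


# Made-up, heavily skewed label (an assumption, not the contest data):
# 84 examples at 0.0 and 16 examples spread over larger values.
labels = [0.0] * 84 + [0.5 + 0.15 * i for i in range(16)]

bins = equal_width_bin(labels, n_bins=10)
top_bin, share = majority_baseline(bins)

# Accuracy of a "model" that always predicts the most common bin.
print(f"most common bin: {top_bin}, share: {share:.2f}")
```

If the real label shows a similar split, the single predicted range would be the majority class rather than a fault in the pruning settings, and it may be more useful to look at how the label is binned or at per-class results than at the overall performance figure.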