
# Optimise Parameters operator

Member Posts: 18 Contributor II
Hi

I am using the Optimise Parameters operator for a Decision Tree analysis and have a question about which parameters to select within this operator:

Inside the Optimise Parameters operator I have an X-Validation operator, which contains a Decision Tree on the training side and an Apply Model operator and a Performance operator on the testing side.
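For readers less familiar with the setup, the nested structure can be sketched in plain Python. This is a conceptual illustration, not RapidMiner's implementation: `majority_learner` is a hypothetical stand-in for the Decision Tree, the folds are plain random splits (not stratified), and the test predictions of all folds are pooled into one set of (predicted, true) counts. RapidMiner may aggregate fold results differently.

```python
import random
from collections import Counter
from typing import Callable, Sequence

# A learner takes training rows and labels and returns a predict(row) function.
Learner = Callable[[Sequence, Sequence[str]], Callable[[object], str]]


def k_fold_confusion(rows, labels, learner: Learner, k=10, seed=0) -> Counter:
    """Pool the test predictions of k train/test splits into one confusion count.

    Keys are (predicted, true) pairs. Folds are plain random splits here, not
    stratified ones.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    counts = Counter()
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in indices if i not in test_set]
        # "Training side": fit the (hypothetical) learner on the other k-1 folds.
        predict = learner([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        # "Testing side": apply the model and count predicted vs true labels.
        for i in test_idx:
            counts[(predict(rows[i]), labels[i])] += 1
    return counts


def majority_learner(train_rows, train_labels):
    """Stand-in for a decision tree: always predicts the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda row: majority


rows = list(range(20))
labels = ["No"] * 15 + ["Yes"] * 5
print(k_fold_confusion(rows, labels, majority_learner, k=10))
```

Every example lands in exactly one test fold, so the pooled counts always add up to the number of examples.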

The results from the Performance operator look something like this:

Accuracy:
                  true Yes    true No    class precision
predicted Yes        100          1          99.01%
predicted No          40        460          92.00%
class recall      71.43%     99.78%
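The percentages in the table follow directly from the four counts: class precision is computed along each "predicted" row and class recall down each "true" column. A short check in Python; the overall accuracy (560 of 601 correct) is computed here from the counts rather than copied from the original output.

```python
# Confusion counts from the example above, keyed by (predicted, true).
counts = {
    ("Yes", "Yes"): 100, ("Yes", "No"): 1,
    ("No", "Yes"): 40, ("No", "No"): 460,
}
classes = ["Yes", "No"]


def precision(cls):
    # Share of examples predicted as `cls` that really are `cls` (a row of the matrix).
    predicted = sum(counts[(cls, t)] for t in classes)
    return counts[(cls, cls)] / predicted


def recall(cls):
    # Share of true `cls` examples that were predicted as `cls` (a column of the matrix).
    actual = sum(counts[(p, cls)] for p in classes)
    return counts[(cls, cls)] / actual


accuracy = sum(counts[(c, c)] for c in classes) / sum(counts.values())

for c in classes:
    print(f"{c}: precision {precision(c):.2%}, recall {recall(c):.2%}")
print(f"accuracy {accuracy:.2%}")  # prints accuracy 93.18%
```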

In the Optimise Parameters operator I have selected the Decision Tree's criterion values (accuracy, gini_index, gain_ratio and information_gain) as the parameters to optimise, but I'm not sure this is correct. Should I be choosing something in the Performance operator instead? Ideally, I would like a result that balances the two class recall values (71.43% and 99.78% in the example above) as much as possible.
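Plain accuracy is dominated by the 461 "No" examples, so it barely reflects the weak recall on "Yes". Two ways to express "balance the recalls" as a single number are sketched below. Neither is named in this thread: balanced accuracy (the mean of the class recalls) is a common alternative, and the worst class recall is an illustrative choice; both are assumptions here.

```python
# Class recalls from the example confusion matrix: correct / all true examples of that class.
recall_yes = 100 / (100 + 40)  # about 71.43 %
recall_no = 460 / (460 + 1)    # about 99.78 %


def balanced_accuracy(recalls):
    """Mean of the per-class recalls; each class counts equally, whatever its size."""
    return sum(recalls) / len(recalls)


def worst_class_recall(recalls):
    """The lowest recall; maximising it pushes the recalls towards each other."""
    return min(recalls)


recalls = [recall_yes, recall_no]
print(f"balanced accuracy:  {balanced_accuracy(recalls):.2%}")  # prints 85.61%
print(f"worst class recall: {worst_class_recall(recalls):.2%}")  # prints 71.43%
```

Compared with the 93.18% accuracy of the same matrix, both numbers make the missed "Yes" examples much more visible.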

Thanks

Neil

Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,524 RM Data Scientist
Hi,

The Optimize Parameters operator optimizes for the main criterion of your Performance operator, which I guess is accuracy here. If you want to optimize for something else, you need to select another main criterion there (or define the value yourself; the Data to Performance operator is helpful for that).
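Conceptually, a grid-style optimizer evaluates every parameter value and keeps the one with the best main criterion, so changing the criterion can change the winner. The sketch below is a pure-Python illustration of that idea, not RapidMiner's code: `evaluate` is a hypothetical callback standing in for the inner X-Validation and is assumed to return pooled (predicted, true) counts, and `balanced_accuracy` is an assumed example of an alternative criterion.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Counts = Dict[Tuple[str, str], int]  # (predicted, true) -> number of examples


def accuracy(counts: Counts) -> float:
    """Share of all examples whose prediction matches the true label."""
    correct = sum(n for (pred, true), n in counts.items() if pred == true)
    return correct / sum(counts.values())


def balanced_accuracy(counts: Counts) -> float:
    """Mean of the per-class recalls, so each class counts equally."""
    classes = {true for (_, true), n in counts.items() if n > 0}
    recalls = []
    for c in classes:
        actual = sum(n for (_, true), n in counts.items() if true == c)
        recalls.append(counts.get((c, c), 0) / actual)
    return sum(recalls) / len(recalls)


def optimize(values: Iterable[str],
             evaluate: Callable[[str], Counts],
             main_criterion: Callable[[Counts], float]) -> Tuple[Optional[str], float]:
    """Evaluate every parameter value and keep the one that scores best on main_criterion."""
    best_value: Optional[str] = None
    best_score = float("-inf")
    for value in values:
        score = main_criterion(evaluate(value))  # evaluate() stands in for the inner X-Validation
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score
```

Passing `main_criterion=accuracy` mirrors the default behaviour described above; passing `balanced_accuracy` is the analogue of choosing a different main criterion, and a model with slightly lower accuracy but more even recalls can then win.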

In your case the problem might be class imbalance. Have you considered using weights? And have you considered switching to AUC as the measure?
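AUC works on the model's confidence scores rather than the hard Yes/No predictions, and it compares positives only against negatives, so it is not dominated by the larger class the way accuracy is. A minimal sketch of the pairwise (Mann-Whitney) form, assuming one confidence value for the positive class per example; the function name and arguments are illustrative, and the double loop is quadratic, which is fine for small examples only.

```python
def auc(scores, labels, positive="Yes"):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    It is the probability that a randomly chosen positive example gets a higher
    confidence score than a randomly chosen negative one; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# One positive is ranked below one negative, so 3 of 4 pairs are ordered correctly.
print(auc([0.9, 0.4, 0.6, 0.2], ["Yes", "Yes", "No", "No"]))  # prints 0.75
```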

Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
Member Posts: 21 Contributor II
I'd be curious to see how a weighting scheme would work for balancing data. I have used sampling to balance datasets before but not weights.
Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,524 RM Data Scientist
Hi,

Use a Generate Attributes operator to create a new attribute. Set it to 1 for one class and to 10 for the other, then use Set Role to give this attribute the weight role. Every example of the second class then counts ten times as much.
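The effect of such weights can be illustrated outside RapidMiner. The sketch below assumes examples are dicts with a "label" field (an illustrative assumption) and gives the minority class "Yes" the weight 10, as suggested above. It only shows how weighting changes what an error costs when the evaluation respects weights; whether training is affected depends on the learner honouring example weights.

```python
def add_weight(examples, heavy_class, heavy_weight=10.0, light_weight=1.0):
    """Mimic the Generate Attributes + weight-role step: attach a per-example weight."""
    return [
        {**ex, "weight": heavy_weight if ex["label"] == heavy_class else light_weight}
        for ex in examples
    ]


def weighted_accuracy(examples, predictions):
    """Accuracy where each example counts with its weight instead of 1."""
    total = sum(ex["weight"] for ex in examples)
    correct = sum(ex["weight"] for ex, pred in zip(examples, predictions) if pred == ex["label"])
    return correct / total


# Rebuild the example confusion matrix as (predicted, true) pairs.
pairs = [("Yes", "Yes")] * 100 + [("Yes", "No")] * 1 + [("No", "Yes")] * 40 + [("No", "No")] * 460
examples = add_weight([{"label": true} for _, true in pairs], heavy_class="Yes")
predictions = [pred for pred, _ in pairs]
print(f"{weighted_accuracy(examples, predictions):.2%}")  # prints 78.45%
```

With the weights, the 40 missed "Yes" examples cost 400 weight units instead of 40, so weighted accuracy drops from 93.18% to 78.45% (1460 of 1861) and a model that improves "Yes" recall is rewarded more.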

Cheers,
Martin