Optimise Parameters operator

neilduggan Member Posts: 18 Contributor I
Hi

I am using the Optimise Parameters operator for a Decision Tree analysis and have a question about which parameters to select within this operator.

Inside the Optimise Parameters operator I have an X-Validation operator. Inside that, a Decision Tree sits on the training side, and an Apply Model operator and a Performance operator sit on the testing side.
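
The nesting described above (Optimise Parameters around X-Validation, with the learner on the training side and Apply Model plus Performance on the testing side) can be sketched in plain Python. This is a toy illustration under stated assumptions: the folds are contiguous rather than stratified, all helper names are hypothetical, and a one-feature threshold rule stands in for the Decision Tree. It is not how RapidMiner implements these operators.

```python
# Toy sketch of Optimise Parameters > X-Validation > (training side | testing side).
# Assumptions: plain k-fold split without shuffling or stratification, and
# caller-supplied train/apply/measure functions standing in for the operators.

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(param, data, k, train, apply, measure):
    """Average the per-fold score of one parameter value (the X-Validation step)."""
    scores = []
    for test_idx in k_fold_indices(len(data), k):
        held_out = set(test_idx)
        training = [row for i, row in enumerate(data) if i not in held_out]
        testing = [data[i] for i in test_idx]
        model = train(param, training)               # training side: the learner
        predictions = apply(model, testing)          # testing side: Apply Model
        labels = [label for _, label in testing]
        scores.append(measure(labels, predictions))  # testing side: Performance
    return sum(scores) / len(scores)

def optimize_parameters(values, data, k, train, apply, measure):
    """Try every candidate value and keep the one with the best average score."""
    results = {v: cross_validate(v, data, k, train, apply, measure) for v in values}
    best = max(results, key=results.get)
    return best, results

# Usage with a hypothetical one-feature threshold learner (not a decision tree).
def train_threshold(threshold, rows):
    return threshold  # the toy "model" is just the threshold; rows are ignored

def apply_threshold(threshold, rows):
    return ["Yes" if x >= threshold else "No" for x, _ in rows]

def accuracy(labels, predictions):
    return sum(lab == pred for lab, pred in zip(labels, predictions)) / len(labels)

data = [(0.1, "No"), (0.2, "No"), (0.3, "No"), (0.6, "Yes"), (0.7, "Yes"), (0.9, "Yes")]
best, results = optimize_parameters([0.25, 0.5, 0.8], data, 3,
                                    train_threshold, apply_threshold, accuracy)
print(best)  # 0.5
```

The outer loop only ever sees the single averaged number returned for each candidate value, which is why the choice of measure on the testing side decides which parameter value "wins".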

The results from the Performance operator look something like this:
Accuracy:
                True Yes   True No  Class precision
Predicted Yes        100         1           99.01%
Predicted No          40       460           92.00%
Class recall      71.43%    99.78%
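
As a quick check on how the numbers in the table relate, here is a small Python sketch that recomputes class recall (per true class, i.e. per column) and class precision (per predicted class, i.e. per row) from the four counts. The dictionary layout and function names are illustrative only.

```python
# Confusion-matrix counts copied from the table above.
# Keys are (predicted, true) pairs.
counts = {
    ("Yes", "Yes"): 100, ("Yes", "No"): 1,
    ("No", "Yes"): 40,   ("No", "No"): 460,
}
classes = ["Yes", "No"]

def class_recall(c):
    """Share of truly-c examples that were predicted as c (column-wise)."""
    return counts[(c, c)] / sum(counts[(p, c)] for p in classes)

def class_precision(c):
    """Share of predicted-c examples that truly are c (row-wise)."""
    return counts[(c, c)] / sum(counts[(c, t)] for t in classes)

accuracy = sum(counts[(c, c)] for c in classes) / sum(counts.values())

for c in classes:
    print(f"{c}: recall {class_recall(c):.2%}, precision {class_precision(c):.2%}")
print(f"accuracy {accuracy:.2%}")
# Yes: recall 71.43%, precision 99.01%
# No: recall 99.78%, precision 92.00%
# accuracy 93.18%
```

With these counts the overall accuracy comes out at 560/601, roughly 93.18%, even though 40 of the 140 true Yes examples are missed. That is why accuracy alone can hide the imbalance between the two recalls.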

In the Optimise Parameters operator, I have selected the Decision Tree's criterion values (accuracy, gini_index, gain_ratio and information_gain) as the parameters to optimise, but I'm not sure this is correct. Should I be choosing something in the Performance operator instead? Ideally, I would like a result that balances the two class recalls (71.43% and 99.78% in the example above) as much as possible.

Any advice appreciated

Thanks

Neil

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,083  RM Data Scientist
    Hi,

    The Optimize Parameters operator optimizes for the main criterion of your Performance operator, which I guess is accuracy here. If you want to optimize for something else, you need to set another measure as the main criterion (or define the specific value yourself; the Data to Performance operator is helpful for that).
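
    To illustrate why the main criterion matters, the sketch below picks the better of two parameter settings once by accuracy and once by balanced accuracy (the mean of the class recalls). Balanced accuracy is swapped in here as one example of a value you might define yourself; it is not named in this thread. Setting "A" reuses the counts from the question; setting "B" is an invented confusion matrix, not a real result.

```python
# Each matrix: (true Yes/pred Yes, true Yes/pred No, true No/pred Yes, true No/pred No)
candidates = {
    "A": (100, 40, 1, 460),   # counts from the question
    "B": (125, 15, 50, 411),  # hypothetical, made up for illustration
}

def accuracy(m):
    tp, fn, fp, tn = m
    return (tp + tn) / (tp + fn + fp + tn)

def balanced_accuracy(m):
    """Mean of the two class recalls; only high when both classes are recovered."""
    tp, fn, fp, tn = m
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

def pick_best(criterion):
    # Like the optimizer: keep whichever setting scores highest on the main criterion.
    return max(candidates, key=lambda name: criterion(candidates[name]))

print(pick_best(accuracy))           # A
print(pick_best(balanced_accuracy))  # B
```

    Setting "A" has the higher accuracy (about 93.2% vs 89.2%), but "B" has the higher balanced accuracy (about 89.2% vs 85.6%) because its two recalls are close together, so the same search returns a different winner depending on the criterion.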

    In your case the problem might be class imbalance. Have you considered weights? And have you considered switching to AUC as the measure?
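
    AUC looks at how well the model's confidences rank true Yes examples above true No examples, rather than at thresholded predictions, so a model cannot score well on it just by predicting the majority class. Below is a minimal pairwise sketch of that definition with made-up confidence values; RapidMiner's own AUC implementation may treat ties differently.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical confidence(Yes) values, for illustration only.
pos = [0.9, 0.8, 0.4]  # truly "Yes"
neg = [0.7, 0.3, 0.2]  # truly "No"
print(round(auc(pos, neg), 4))  # 0.8889
```

    Here 8 of the 9 positive/negative pairs are ranked correctly (only 0.4 < 0.7 is wrong), giving 8/9. This pairwise loop is quadratic, which is fine for a sketch; rank-based formulas compute the same value faster.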

    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • MBA_Data_Miner Member Posts: 21 Contributor I
    I'd be curious to see how a weighting scheme would work for balancing data. I have used sampling to balance datasets before but not weights.
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,083  RM Data Scientist
    Hi,

    Use a Generate Attributes operator to create a new attribute. Set its value to 1 for one class and to 10 for the other. Afterwards, use Set Role to give this attribute the weight role. Every example of the second class then counts 10x.
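
    The effect of such a weight attribute on a weight-aware measure can be sketched as follows, assuming the minority class "Yes" receives weight 10 and "No" receives 1 (which class gets the larger weight is your choice). The per-example lists are rebuilt from the confusion matrix in the question; the function names are hypothetical.

```python
# Assumption: "Yes" (the minority class in the question) gets weight 10, "No" gets 1.
def weight(label):
    """Mimics the generated weight attribute: 10 for "Yes", 1 otherwise."""
    return 10.0 if label == "Yes" else 1.0

def weighted_accuracy(labels, predictions):
    """Accuracy where each example counts with its weight instead of 1."""
    total = sum(weight(lab) for lab in labels)
    correct = sum(weight(lab) for lab, pred in zip(labels, predictions) if lab == pred)
    return correct / total

# Rebuild per-example labels/predictions from the question's confusion matrix.
labels = ["Yes"] * 140 + ["No"] * 461
predictions = ["Yes"] * 100 + ["No"] * 40 + ["Yes"] * 1 + ["No"] * 460

plain = sum(lab == pred for lab, pred in zip(labels, predictions)) / len(labels)
print(round(plain, 4))                                   # 0.9318
print(round(weighted_accuracy(labels, predictions), 4))  # 0.7845
```

    Weighted this way, the 40 missed Yes examples cost 400 weight units instead of 40, so accuracy drops from about 93.18% to about 78.45%. A learner or measure that honours the weight role is therefore pushed toward recovering the minority class; whether a given operator uses the weights depends on that operator.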

    Cheers,
    Martin