Hi @yyhuang, it would be much easier for people to understand these complex parameters if RapidMiner provided a detailed technical manual. Many users can't read Java.
This decision tree operator is the same as Quinlan's C4.5 or CART, depending on the criterion you choose (e.g. gain_ratio for C4.5, Gini for CART). Useful reference: Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.
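To make the difference between the two criteria concrete, here is a minimal sketch that scores one candidate split both ways: the gain ratio used by C4.5 (information gain divided by split information) and the weighted Gini impurity used by CART. The class and method names are hypothetical, and RapidMiner's own implementation may differ in details such as numeric-attribute handling or minimal-gain thresholds.

```java
import java.util.Arrays;

/** Hypothetical helper comparing C4.5-style gain ratio with CART-style Gini impurity. */
public class SplitCriteria {

    /** Gini impurity of a class-count vector: 1 - sum(p_i^2). */
    static double gini(int[] counts) {
        int total = Arrays.stream(counts).sum();
        if (total == 0) return 0.0;
        double sumSq = 0.0;
        for (int c : counts) {
            double p = (double) c / total;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    /** Shannon entropy in bits: -sum(p_i * log2(p_i)), skipping empty classes. */
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** CART: size-weighted Gini impurity of the branches (lower is better). children[j] = class counts in branch j. */
    static double giniOfSplit(int[][] children) {
        int total = 0;
        for (int[] child : children) total += Arrays.stream(child).sum();
        double weighted = 0.0;
        for (int[] child : children) {
            int n = Arrays.stream(child).sum();
            weighted += (double) n / total * gini(child);
        }
        return weighted;
    }

    /** C4.5: information gain divided by split information (higher is better). */
    static double gainRatio(int[] parent, int[][] children) {
        int total = Arrays.stream(parent).sum();
        int[] branchSizes = new int[children.length];
        double remainder = 0.0;
        for (int j = 0; j < children.length; j++) {
            branchSizes[j] = Arrays.stream(children[j]).sum();
            remainder += (double) branchSizes[j] / total * entropy(children[j]);
        }
        double infoGain = entropy(parent) - remainder;
        // Split information is the entropy of the branch sizes themselves;
        // it penalises attributes that split the data into many small branches.
        double splitInfo = entropy(branchSizes);
        return splitInfo == 0.0 ? 0.0 : infoGain / splitInfo;
    }

    public static void main(String[] args) {
        int[] parent = {9, 5};                      // {yes, no} counts at the node
        int[][] outlook = {{2, 3}, {4, 0}, {3, 2}}; // counts in each of three branches
        System.out.printf("Weighted Gini of split: %.3f%n", giniOfSplit(outlook));
        System.out.printf("Gain ratio of split:    %.3f%n", gainRatio(parent, outlook));
    }
}
```

Both criteria rank candidate attributes, but gain ratio divides by split information, so it is less biased toward attributes with many distinct values than plain information gain.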
Answers
I'm not sure how f and q are defined in your probability function, but you can refer to the source code of the decision tree learner (DecisionTreeLearner.java) and the pruner (PessimisticPruner.java) here:
https://github.com/rapidminer/rapidminer-studio-modular/blob/master/rapidminer-studio-core/src/main/java/com/rapidminer/operator/learner/tree/DecisionTreeLearner.java
https://github.com/rapidminer/rapidminer-studio-modular/blob/master/rapidminer-studio-core/src/main/java/com/rapidminer/operator/learner/tree/PessimisticPruner.java
If the tree grows too large, the model easily overfits. The confidence level determines whether or not a branch is pruned, based on its pessimistic error estimate.
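As an illustration, assume that your f is a node's observed error rate E/N and q its unknown true error rate, as in the common textbook description of C4.5 pruning (this is our assumption, not something confirmed by the RapidMiner source). The confidence level c then fixes a z with Pr[(f - q) / sqrt(q(1 - q)/N) > z] = c, and solving for q gives a pessimistic upper bound e on the error rate. A subtree is replaced by a single leaf when the leaf's estimated errors are no worse than the summed estimates of the subtree's leaves. The sketch below uses the normal-approximation form of that bound. The class name, method names, and example counts are hypothetical, and PessimisticPruner.java (as well as Quinlan's original C4.5 code) may compute the bound differently.

```java
/**
 * Hypothetical sketch of pessimistic pruning with a normal-approximation
 * upper confidence bound. Not taken from RapidMiner's PessimisticPruner.
 */
public class PessimisticPruningSketch {

    /**
     * Upper bound e on the true error rate q of a node that misclassifies
     * `errors` of its `n` training examples. f = errors / n is the observed
     * error rate; z is the standard-normal quantile for the confidence level
     * (a smaller confidence level means a larger z and more pruning).
     */
    static double pessimisticErrorRate(double errors, double n, double z) {
        double f = errors / n;
        double z2 = z * z;
        double numerator = f + z2 / (2 * n)
                + z * Math.sqrt(f / n - f * f / n + z2 / (4 * n * n));
        return numerator / (1 + z2 / n);
    }

    /** Estimated number of errors: pessimistic rate times node size. */
    static double pessimisticErrors(double errors, double n, double z) {
        return n * pessimisticErrorRate(errors, n, z);
    }

    /**
     * True if collapsing the subtree into one leaf is estimated to be no worse.
     * children[i] = {errors_i, n_i} for each leaf of the subtree.
     */
    static boolean shouldPrune(double leafErrors, double leafN, double[][] children, double z) {
        double subtreeErrors = 0.0;
        for (double[] child : children) {
            subtreeErrors += pessimisticErrors(child[0], child[1], z);
        }
        return pessimisticErrors(leafErrors, leafN, z) <= subtreeErrors;
    }

    public static void main(String[] args) {
        double z = 0.674; // standard-normal quantile for a 25% upper-tail probability
        // Hypothetical counts: three leaves with {errors, examples}.
        double[][] children = {{2, 6}, {1, 2}, {2, 6}};
        // Collapsed into one leaf predicting the majority class: 5 errors of 14.
        double leafErrors = 5, leafN = 14;
        for (double[] c : children) {
            System.out.printf("child f=%.3f e=%.3f%n", c[0] / c[1], pessimisticErrorRate(c[0], c[1], z));
        }
        System.out.printf("leaf  f=%.3f e=%.3f%n", leafErrors / leafN, pessimisticErrorRate(leafErrors, leafN, z));
        System.out.println("prune: " + shouldPrune(leafErrors, leafN, children, z));
    }
}
```

Small leaves get a large penalty (the leaf with 1 error out of 2 is estimated well above its observed 50%), which is why a bushy subtree can lose to a single leaf even though it has the same number of training errors.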
HTH!
YY
Useful reference:
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.