Decision Tree Gini Index criterion

nesebaz · December 2020

Hi,
I am trying to diagnose migraine using decision tree techniques. I have just started using RapidMiner. I have a training dataset and a test dataset, and neither has any missing values. I can't tell whether my result is correct, because the branches of the tree do not each end in a single class. I am adding screenshots. How can I increase the accuracy, and how can I get a better decision tree?
Thank you

nesebaz · December 2020

My accuracy is 55%.

Image: https://us.v-cdn.net/6030995/uploads/editor/8k/wt2qhxpcm5ag.png

Image: https://us.v-cdn.net/6030995/uploads/editor/ht/jzwcn5h681sy.png

Image: https://us.v-cdn.net/6030995/uploads/editor/28/9tx03x1h0gas.png

jacobcybulski · December 2020

I think you are asking the wrong question. You ask how to increase accuracy AND how to end up with a better decision tree. I'd probably ask a series of very different questions, such as the following. What do the label classes mean? How large is my training data set? Are the test and training data sets consistent? What model is most suitable for the data? What data pre-processing could improve model training? How can the selected model be tuned to produce the best performance? Etc.

First, I observe that your test data set is not very large. Is the same true of the training data? I can see that your label has three classes. Which of these classes is positive, or which of them are you most interested in and would like the model to predict accurately? Given that, what measure of model performance is most appropriate for your objective? (Perhaps accuracy is not the best measure of model performance.) The label classes seem to be non-exclusive. Is this a problem for prediction? Your label classes are unbalanced, so what can be done to balance the classes for model training, as this often improves model performance? Your post title implies that your decision tree uses the Gini index. Have you tried different node-splitting criteria? Have you tried other model parameters? Have you studied those parameters in a more systematic way? Have you tried other classifiers with this data? So, start by answering those questions first.
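
To illustrate why accuracy can mislead when the classes are unbalanced, here is a minimal Python sketch (outside RapidMiner, standard library only). The class names and label lists are made up for illustration and are not taken from your data. It compares plain accuracy with per-class recall and with their unweighted average, often called balanced accuracy.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """For each true class: correctly predicted examples / all examples of that class."""
    actual = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: correct[cls] / actual[cls] for cls in actual}

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

# Hypothetical, deliberately unbalanced labels: 8 of class_a, 1 each of class_b and class_c.
y_true = ["class_a"] * 8 + ["class_b"] + ["class_c"]
# A trivial "model" that always predicts the majority class.
y_pred = ["class_a"] * 10

print("accuracy:", accuracy(y_true, y_pred))
print("recall per class:", per_class_recall(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

In this made-up case the always-majority model gets 8 of 10 examples right, yet it never finds a single example of the two minority classes, which the per-class recall makes visible. If one of your three classes matters most, its recall and precision are probably more informative than overall accuracy.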
