
Decision Tree Gini Index criterion

nesebaz Member Posts: 3 Learner I
Hi,
I am trying to diagnose migraine using decision tree techniques. I have just started using RapidMiner. I have a training dataset and a test dataset, and neither has any missing values. I cannot tell whether my result is correct, because the branches of the tree do not each end in a single class. I am adding screenshots. How can I increase the accuracy, and how can I get a better decision tree?
Thank you

Answers

  • nesebaz Member Posts: 3 Learner I

    My accuracy is 55%.





  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    I think you are asking the wrong question. You ask how to increase accuracy and how to end up with a better decision tree. I would instead ask a series of rather different questions, such as the following. What do the label classes mean? How large is my training data set? Are the test and training data sets consistent? What model is most suitable for the data? What data pre-processing could improve model training? How can the selected model be tuned to produce the best performance? And so on.

    First, I observe that your test data set is not very large. Is the same true of the training data? I can see that your label has three classes. Which of these classes is the positive one, or which are you most interested in and would like the model to predict accurately? Given that, what measure of model performance is most appropriate for your objective? (Accuracy may not be the best measure.) The label classes seem to be non-exclusive; is this a problem for prediction? Your label classes are unbalanced, so what can be done to balance them for model training, as this often improves model performance? Your post title implies that your decision tree uses the Gini index; have you tried different node-splitting criteria? Have you tried other model parameters? Have you studied those parameters in a more systematic way? Have you tried other classifiers with this data? So, start by answering those questions.
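To make two of the questions above concrete, here is a minimal Python sketch (outside RapidMiner, standard library only). It computes the Gini impurity and the entropy of a single tree node, which are the quantities behind the Gini index and information gain splitting criteria, and it shows why accuracy can mislead when classes are unbalanced. The class names `A`, `B`, `C` and their counts are hypothetical and are not taken from the poster's data; the helper function names are my own.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of a node in bits (the basis of information gain)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Hypothetical, unbalanced three-class labels (NOT the poster's data).
y_true = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
# A trivial model that always predicts the majority class.
y_pred = ["A"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
# Balanced accuracy: the unweighted mean of the per-class recalls.
balanced_accuracy = sum(recall.values()) / len(recall)

print(f"Gini impurity of the node: {gini_impurity(y_true):.3f}")
print(f"Entropy of the node:       {entropy(y_true):.3f} bits")
print(f"Accuracy:                  {accuracy:.2f}")
print(f"Per-class recall:          {recall}")
print(f"Balanced accuracy:         {balanced_accuracy:.2f}")
```

In this made-up case the majority-class model reaches 60% accuracy while never predicting `B` or `C`, which is why per-class measures are worth checking alongside accuracy. A leaf that still contains several classes simply has non-zero impurity; that alone does not mean the tree is wrong.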