{Solved} Strange problem with Decision trees
Hi, I have a really strange problem with decision tree models on my data.
My data range is: Reston (7), Zaire (7), Sudan (4), Bundibugyo (2), Cote d'Ivoire (1)
But when I run a decision tree model I get strange results.
For example, I got a model that was correct in the graphical view, but when I switched to the text view I see this:
Tree
CountofAlaGly > 5: Zaire {Reston=0, Zaire=6, Sudan=0, Bundibugyo=0, Cote d'Ivoire=0}
CountofAlaGly ≤ 5
| CountofIleAsn > 2.500: Cote d'Ivoire {Reston=0, Zaire=0, Sudan=0, Bundibugyo=0, Cote d'Ivoire=2}
| CountofIleAsn ≤ 2.500
| | CountofIleAsn > 1.500: Sudan {Reston=0, Zaire=0, Sudan=2, Bundibugyo=0, Cote d'Ivoire=0}
| | CountofIleAsn ≤ 1.500
| | | CountofLeuThr > 4.500: Reston {Reston=8, Zaire=0, Sudan=0, Bundibugyo=0, Cote d'Ivoire=0}
| | | CountofLeuThr ≤ 4.500: Bundibugyo {Reston=0, Zaire=0, Sudan=0, Bundibugyo=3, Cote d'Ivoire=0}
It's so strange... as you can see, the model has mixed things up: I have 7 Zaire but the model says I have just six, and I have just one Cote d'Ivoire but the model shows two, and so on.
Can someone explain what I should do?
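The mismatch described above can be tallied directly. The following is a minimal sketch in plain Python (not part of the original process); the numbers are copied from the class sizes and the leaf counts in the post, and the helper names `tally_leaves` and `count_differences` are made up for illustration.

```python
from collections import Counter

# Class sizes as stated in the post.
actual = {"Reston": 7, "Zaire": 7, "Sudan": 4, "Bundibugyo": 2, "Cote d'Ivoire": 1}

# Non-zero label counts shown at each leaf of the text view, top to bottom.
leaves = [
    {"Zaire": 6},
    {"Cote d'Ivoire": 2},
    {"Sudan": 2},
    {"Reston": 8},
    {"Bundibugyo": 3},
]


def tally_leaves(leaves):
    """Sum the per-class counts over all leaves (hypothetical helper)."""
    totals = Counter()
    for leaf in leaves:
        totals.update(leaf)
    return totals


def count_differences(actual, reported):
    """Return reported minus actual for every class where the two differ."""
    return {
        label: reported.get(label, 0) - n
        for label, n in actual.items()
        if reported.get(label, 0) != n
    }


reported = tally_leaves(leaves)
diffs = count_differences(actual, reported)

print("total examples (stated, in leaves):", sum(actual.values()), sum(reported.values()))
print("per-class difference (leaves - stated):", diffs)
```

Both totals come to 21, so every example is accounted for in the leaves; only the per-class split differs from the stated class sizes.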
Answers
Which tree operator are you using?
Best, Marius
classes and just had 4 of them
I can't say whether this classification is right or wrong. It might make sense and could be discussed, but it needs lab confirmation, which is impossible for me.
I have used cross-validation to get average performances.
Does it affect my other operators, such as SVM and Bayesian? Is this a problem with my data?
How should I solve this problem?
Thanks a lot
I suppose with "criteria" you mean attributes?
However, Decision Tree and Decision Tree (Parallel) use the same algorithm; the parallel tree just uses several threads (and thus several CPUs) to calculate the tree.
What about your 5 classes? Are they equally sized, or does one of them contain significantly fewer examples than the others? If so, it is possible that the trees just drop that class because they don't consider it worth a branch of its own.
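For reference, the class sizes listed in the first post can be turned into shares of the data set. This is only a small sketch in plain Python; `class_shares` is a made-up helper name, and the counts are the ones given in the question.

```python
# Class sizes as listed in the first post of this thread.
class_sizes = {"Reston": 7, "Zaire": 7, "Sudan": 4, "Bundibugyo": 2, "Cote d'Ivoire": 1}


def class_shares(sizes):
    """Return each class's fraction of all examples (hypothetical helper)."""
    total = sum(sizes.values())
    return {label: n / total for label, n in sizes.items()}


shares = class_shares(class_sizes)

# Print the classes from smallest to largest share.
for label, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{label:15s} {class_sizes[label]:2d} examples  {share:6.1%}")
```

With 21 examples in total, the smallest class is a single example, so the classes are clearly not equally sized.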
Of course the creation of a decision tree is totally independent of an SVM or Naive Bayes - how should it affect an SVM?
So, all in all I need a bit more information about the data, and as always it would be a good idea to post your process setup - you'll find a description on how to ask good questions in the post linked in my signature.
Best, Marius
And about the sizes: I gave them in my first post. The sizes are
Reston (7), Zaire (7), Sudan (4), Bundibugyo (2), Cote d'Ivoire (1)
Here is my code, but it is not the whole process... I had to delete some operators because my code was too long to be posted here.
Concerning the differences in text view/graphical view, you can test which of the trees is used in the end by applying the tree to a piece of data and see according to which of the trees the examples are classified. If you can post the results of that, this would indeed help us a lot to fix the problem.
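To make that test concrete, the text-view tree from the first post can be written out by hand and applied to a few rows. This is a rough sketch in plain Python, not RapidMiner code. It assumes that the second branch of each split means "less than or equal", and the example rows are invented purely to exercise every branch; they are not taken from the poster's data.

```python
def classify_text_view(example):
    """Follow the text-view tree from the first post.

    Assumption: the non-'>' branch of every split is read as '<='.
    """
    if example["CountofAlaGly"] > 5:
        return "Zaire"
    if example["CountofIleAsn"] > 2.5:
        return "Cote d'Ivoire"
    if example["CountofIleAsn"] > 1.5:
        return "Sudan"
    if example["CountofLeuThr"] > 4.5:
        return "Reston"
    return "Bundibugyo"


# Hypothetical rows, one per leaf (NOT real data from the thread).
examples = [
    {"CountofAlaGly": 6, "CountofIleAsn": 0, "CountofLeuThr": 0},
    {"CountofAlaGly": 5, "CountofIleAsn": 3, "CountofLeuThr": 0},
    {"CountofAlaGly": 4, "CountofIleAsn": 2, "CountofLeuThr": 0},
    {"CountofAlaGly": 3, "CountofIleAsn": 1, "CountofLeuThr": 5},
    {"CountofAlaGly": 3, "CountofIleAsn": 1, "CountofLeuThr": 4},
]

predictions = [classify_text_view(row) for row in examples]
for row, label in zip(examples, predictions):
    print(row, "->", label)
```

Running the real examples through a function like this and comparing the labels with the predictions that Apply Model produces in RapidMiner would show whether the text view describes the tree that is actually used.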
Probably the class which is missing from some trees is Cote d'Ivoire, since it makes up only about 5% of the data (1 of 21 examples), and the tree creation algorithm probably did not consider it large enough to create a branch for it. The default Decision Tree, for example, has a lot of parameters which control the growing of the tree; maybe if you play around with them, the missing class will appear. But be careful: a bad choice of parameter settings can cause the tree to be too specialized on the training data ("overfitting") or too general. As always, creating good models is a process of trial and error and of optimization. Here, too, the Loop Parameters or Optimize Parameters operator will help you.
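One way a growth parameter can make a class disappear can be sketched as follows: if every leaf must hold at least a certain number of examples, a class with fewer examples than that can never get a pure leaf of its own. The snippet below is plain Python for illustration only; `minimal_leaf_size` is used here as a hypothetical threshold, and nothing in this thread confirms which settings were used in the actual process.

```python
# Class sizes as listed in the first post of this thread.
class_sizes = {"Reston": 7, "Zaire": 7, "Sudan": 4, "Bundibugyo": 2, "Cote d'Ivoire": 1}


def classes_too_small_for_leaf(sizes, minimal_leaf_size):
    """Return the classes with fewer examples than the leaf-size threshold.

    `minimal_leaf_size` is a hypothetical parameter for this sketch.
    """
    return sorted(label for label, n in sizes.items() if n < minimal_leaf_size)


# Try a few thresholds and see which classes could not fill a leaf alone.
results = {
    threshold: classes_too_small_for_leaf(class_sizes, threshold)
    for threshold in (1, 2, 3)
}
for threshold, too_small in results.items():
    print(f"minimal_leaf_size={threshold}: {too_small}")
```

Lowering such a threshold gives the smallest class a chance to get its own leaf, but a leaf built on a single example is exactly the kind of over-specialization warned about above, so any change should be checked with cross-validation.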
Hope this helps!
Happy Mining,
Marius