Classification algorithms
Hello,
So I've sort of finished my model. The last thing I now need to do is check which classification algorithm gives the best performance. I've made a selection of seven algorithms:
Naïve Bayes
SVM
kNN
Neural Network
Logistic Regression
Decision Tree
Random Forest
I've tried them all on my model, but some don't work.
So for Logistic Regression and SVM I understand that I need the operator Polynominal by Binominal Classification. This doesn't change any of my data, right? It only changes the format to suit these algorithms?
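For intuition, that kind of conversion is essentially a one-vs-rest wrapper: the multi-class problem is re-posed as several binary ones, and the examples themselves are not altered. A minimal sketch with scikit-learn (which only mirrors the idea, it is not RapidMiner itself):

```python
# Sketch (scikit-learn, not RapidMiner): a polynominal-by-binominal style
# conversion works like a one-vs-rest wrapper. The examples are not altered;
# the multi-class problem is just re-posed as several binary problems.
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = ["a", "a", "b", "b", "c", "c"]                 # three classes
clf = OneVsRestClassifier(LogisticRegression()).fit(X, y)
print(len(clf.estimators_))  # 3: one binary model per class
```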
When I try Naïve Bayes, it suddenly gives me an error that my ExampleSet does not match my training set, but I don't get this error when I use kNN, Decision Tree or Random Forest.
Decision Tree gives me exactly the same confidence for all my examples, which is very strange.
Lastly, Neural Network takes forever to run for some reason and I don't know why.
Thanks for answering these questions
Prentice
Best Answers

[Deleted User]: @Prentice
How many rows do you have? Do you use Split Data? If you do, divide your data into 0.7 for training and 0.3 for testing.
If there is an error, you will see a yellow triangle on the operator; its message will help you.
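The suggested 0.7 / 0.3 split can be sketched like this (scikit-learn, not RapidMiner; the names `X`, `y` and `random_state` are stand-ins):

```python
# Hypothetical illustration of a 0.7 / 0.3 train/test split using
# scikit-learn. X and y are stand-ins for real rows and labels.
from sklearn.model_selection import train_test_split

X = [[i] for i in range(100)]        # 100 example rows
y = [i % 2 for i in range(100)]      # stand-in labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42, stratify=y)
print(len(X_train), len(X_test))  # 70 30
```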
varunm1 (Moderator): Hello @Prentice
I see that you have a filter set to Maximum < 0.7, but with Naive Bayes the Maximum attribute is 1 in all the examples. Since the filter condition is never satisfied, no examples are passed on, and I think that is what causes the issue at the Apply Model operator. When I tested kNN and SVM, they had examples with Maximum < 0.7 and were able to produce results. When I changed the filter value to <= 1.0, Naive Bayes gave results without any errors. So my understanding is that with Naive Bayes no examples satisfy your filter value.
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
varunm1 (Moderator): Hello @Prentice
I am not sure why the confidences are either 0 or 1, but I don't think this is wrong. I tried testing on the Titanic data set and can see predictions with high levels of confidence. @IngoRM might be able to suggest something.
Question 2: It is not automatic, because the process is continuous and each operator expects appropriate inputs; if it doesn't get them, the process fails. I think if it were automatic, a mistake in the process could go unnoticed, because nothing would fail. If you really want the condition Maximum < 0.7, then the Naive Bayes and Decision Tree results simply don't satisfy this criterion, which means you can either use other algorithms or change the filter. This is my understanding.
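The manual guard described here can be sketched in plain Python (the `apply_if_nonempty` helper and the 0.7 threshold are illustrative, taken from the thread, not from any RapidMiner API):

```python
# Sketch of a manual guard: only run the downstream step when the filtered
# example set is non-empty, instead of letting the process fail.
def apply_if_nonempty(examples, threshold=0.7):
    filtered = [e for e in examples if e["confidence"] < threshold]
    if not filtered:
        return []          # skip the downstream operators
    # placeholder for "apply the next model to the filtered examples"
    return [dict(e, reprocessed=True) for e in filtered]

examples = [{"confidence": 1.0}, {"confidence": 1.0}]
print(apply_if_nonempty(examples))  # [] -> nothing to re-predict, no error
```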
Thanks,
Varun
Answers
Hi,
For Logistic Regression and SVM, you can see the result of converting the data in your table.
For Decision Tree, try using information gain or gain ratio as the criterion.
Good luck
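For reference, the two split criteria mentioned above can be sketched as follows (toy Python with made-up labels, not RapidMiner's implementation):

```python
# Toy sketch of the two Decision Tree split criteria mentioned above:
# information gain, and gain ratio (information gain / split information).
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(labels, groups):
    # entropy before the split minus the weighted entropy after it
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

def gain_ratio(labels, groups):
    # normalizes information gain by how finely the split partitions the data
    n = len(labels)
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups)
    return info_gain(labels, groups) / split_info if split_info else 0.0

labels = ["flat", "flat", "chain", "chain"]
groups = [["flat", "flat"], ["chain", "chain"]]   # a perfect split
print(info_gain(labels, groups))   # 1.0
print(gain_ratio(labels, groups))  # 1.0
```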
Thanks for your reply, but the cross validation does not work for the Naïve Bayes; I get the same error. When I set the Decision Tree to information gain I get this same error, while if I set it to gain ratio I get the same values everywhere.
I have put the word list from my training data into the word list of my testing data, so it's not that.
Prentice
I see an error, but it says that the input set needs to have one attribute with a label (which I have). That is also not the problem, since it works fine with kNN.
Do you have one data set with two different names in the repository? If you do, delete one of them.
For the label, if you use Read Excel I think the problem will be solved, because I had the same story.
I already use Read Excel.
I wish I could add my process, but it's way too big to put here.
I think you should first import your data into RapidMiner and check it there; maybe some mistake happened.
I hope it helps.
Can you export the process and attach it here (File > Export Process)?
Thank you,
Varun
I think I also need to explain it.
I have two things going on: the training set, and the testing set which I apply to my trained model.
Then, when the confidence for the predicted outcome of a testing example is below 0.7, it takes all those examples and makes another prediction, this time with a different testing set that has added information compared to the previous one. If those confidences are again under 0.7, it generates a second prediction.
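The cascade described above can be sketched like this (plain Python; `model_1`, `model_2` and the toy lambdas are placeholders for the real trained models):

```python
# Sketch of the described cascade: predict, keep the confident results, and
# re-predict the low-confidence examples using the enriched "Text additional".
def cascade(examples, model_1, model_2, threshold=0.7):
    results, retry = [], []
    for ex in examples:
        label, conf = model_1(ex["text"])
        if conf >= threshold:
            results.append((label, conf))
        else:
            retry.append(ex)                 # below 0.7: try again
    for ex in retry:
        results.append(model_2(ex["text_additional"]))
    return results

# Toy models: the short text is uncertain, the enriched text is confident.
m1 = lambda t: ("flat tire", 0.9 if "repaired" in t else 0.5)
m2 = lambda t: ("flat tire", 0.95)
examples = [{"text": "My bike's tire is flat",
             "text_additional": "My bike's tire is flat. The flat tire is repaired"}]
print(cascade(examples, m1, m2))  # [('flat tire', 0.95)]
```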
The imported data is in this format:
Training data:
Text | Category
My bike has a flat tire and I cannot use it. The flat tire has been repaired. | Flat tire
The bike's chain is worn. The chain is replaced and regreased. | Worn chain
Testing data:
Text | Text additional
My bike's tire is flat | My bike's tire is flat. The flat tire is repaired
I hope that this is enough information.
Thanks for sharing your process. It looks fine, but without the data I cannot reproduce the error. Did you check the two articles below to see if either is suitable for your problem?
https://community.rapidminer.com/discussion/31723/textminingandthewordlist/p1
https://community.rapidminer.com/discussion/26803/solvedrapidminersentimentanalysisproblem
I see that you are sharing the word list correctly, so I don't think these threads are useful here.
Varun
Unfortunately the data is confidential and I'm not allowed to share it, but I can try to make up some examples and hope that I can reproduce the error.
If you don't mind, I'll do that tomorrow; it's getting late (in my timezone).
Anyway, you'll hear from me.
Thanks
I've managed to come up with data and reproduce the error.
If you replace the Naive Bayes with kNN, SVM or Random Forest it works, but the same error also happens with Decision Tree and Logistic Regression.
Aah, this explains a lot, thanks! Can't believe I couldn't figure that out myself, haha. However, this brings me to two other questions.
1. Why do only Naïve Bayes and Decision Tree have a confidence of only 0 or 1 everywhere? This even applies to my real testing set, which has a lot more examples. It can't be right that the confidence is 0 or 1 for everything; even if I put in an example with multiple categories, it still only has 0s and 1s.
2. For cases where this does happen, what can I do to resolve the problem? Can I do something like: if there are no examples after the filter, don't run the next operators? I thought that was already the case, because why would it execute if there are no examples to execute on?
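On question 1, a plausible mechanism (sketched with scikit-learn, which only mirrors the setup, not RapidMiner itself): a fully grown decision tree reports as its confidence the class fraction in the leaf an example lands in, so pure leaves always yield exactly 0 or 1; Naive Bayes on text tends to saturate similarly because many per-word probabilities are multiplied together.

```python
# Sketch: an unpruned decision tree's confidence is the class fraction in
# the leaf, so a tree that fits its training data perfectly gives only 0/1.
from sklearn.tree import DecisionTreeClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
tree = DecisionTreeClassifier().fit(X, y)
probs = tree.predict_proba([[0], [3]])
print(probs)  # each row is exactly [1. 0.] or [0. 1.]
```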
1: Aha, I just thought it was pretty remarkable, but maybe that's just how they work. I can take this information into my analysis, so that's a good thing.
2: OK, well if it's not possible then I guess it's not a big deal to change. It's only one operator anyway. Or, when I use one of these two algorithms, I'll just deactivate those operators so I don't get the error.
To round things up: thanks a lot for your help with my problem, I appreciate it a lot. It's at times like these that I appreciate this community the most: just when you're stuck on something, ask it here and you'll get a valid response in no time!
Prentice