Classification algorithms

Prentice · May 2019

Hello,

So I've sort of finished my model. The last thing I need to do is check which classification algorithm gives the best performance. I've made a selection of seven algorithms:

Naïve Bayes
SVM
k-NN
Neural Network
Logistic Regression
Decision Tree
Random Forest

I've tried them all on my model, but some don't work.

- For Logistic Regression and SVM, I understand that I need the Polynominal by Binominal Classification operator. This doesn't change any of my data, right? It only changes the format to suit these algorithms?
- When I try Naïve Bayes, it suddenly gives me an error that my example set does not match my training set, but I don't get this error when I use k-NN, Decision Tree or Random Forest.
- Decision Tree gives me exactly the same confidence for all my examples, which is very strange.
- Lastly, Neural Network takes forever to load for some reason, and I don't know why.
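On the first question: as far as I understand, Polynominal by Binominal Classification wraps a two-class learner and applies it to a multi-class label using a strategy such as one-versus-all. The sketch below illustrates only that idea, not RapidMiner's internals; the helper names (`one_vs_all_labels`, `pick_class`) and the third category "Loose brake" are hypothetical. It shows that the attribute values are never modified: only the label column is recoded, once per class, and the binary models are combined afterwards.

```python
from typing import Dict, List

# Hypothetical helper names; this illustrates the one-vs-all idea only,
# not RapidMiner's internal implementation.
POSITIVE, NEGATIVE = "positive", "negative"


def one_vs_all_labels(labels: List[str]) -> Dict[str, List[str]]:
    """For each class, recode the label column as <class> vs. <everything else>.

    The attribute values (word vectors, etc.) are not touched at all; only
    the label is rewritten, once per class, so that a binary learner such as
    logistic regression or SVM can be trained on each copy.
    """
    classes = sorted(set(labels))
    return {
        cls: [POSITIVE if y == cls else NEGATIVE for y in labels]
        for cls in classes
    }


def pick_class(positive_scores: Dict[str, float]) -> str:
    """Combine the binary models: choose the class whose model is most confident."""
    return max(positive_scores, key=positive_scores.get)


if __name__ == "__main__":
    # "Loose brake" is a made-up third category for illustration.
    labels = ["Flat tire", "Worn chain", "Flat tire", "Loose brake"]
    for cls, binary in one_vs_all_labels(labels).items():
        print(f"{cls:12} -> {binary}")
    print(pick_class({"Flat tire": 0.81, "Worn chain": 0.12, "Loose brake": 0.30}))
```

Other strategies (such as one-versus-one) pair up classes differently, but they likewise recode the label rather than the data.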

Thanks for answering these questions
-Prentice

[Deleted User] · May 2019

@Prentice
How many rows do you have? Do you use Split Data? If you do, split your data 0.7 for training and 0.3 for testing.
If there is an error, the operator shows a yellow triangle; check it, because it will help you find the problem.
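The 0.7/0.3 suggestion amounts to a random split of one data set into a training part and a test part. A minimal sketch of that idea, assuming a plain (non-stratified) random split with a fixed seed for repeatability; the function name `train_test_split` is a hypothetical helper here, not the Split Data operator's API:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    rows: Sequence[T], train_ratio: float = 0.7, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows, then cut it into a training and a test part.

    Plain random split (no stratification); a fixed seed makes it repeatable.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    rows = list(range(190))  # e.g. 170 training + 20 test rows pooled together
    train, test = train_test_split(rows)
    print(len(train), len(test))  # 133 57
```

With only a few examples per category, a stratified split (keeping class proportions equal in both parts) is usually the safer choice.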
mbs

varunm1 · May 2019

Hello @Prentice

I see that you have a filter set to Maximum < 0.7, but with Naive Bayes the Maximum attribute is 1 in all examples. Since no example satisfies the filter condition, the filter returns no examples at all. I think this is the reason for the error in the Apply Model operator. When I tested k-NN and SVM, they produce examples with a Maximum attribute < 0.7 and are able to give results. I tried changing this filter to <= 1.0, and Naive Bayes gave results without any errors. So my understanding is that with Naive Bayes, no example satisfies your filter condition.

Image: https://us.v-cdn.net/6030995/uploads/editor/qi/yqwkiwmvor5t.png
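The explanation above can be reproduced outside RapidMiner. This sketch assumes each prediction is a dictionary of class confidences and that "Maximum" is the highest confidence of an example; the helper name `low_confidence_indices` and all confidence values are made up. A model that is always fully confident leaves the `Maximum < 0.7` filter with nothing to pass on, while `<= 1.0` keeps everything.

```python
from typing import Dict, List

Confidences = Dict[str, float]  # class name -> confidence for one example


def low_confidence_indices(
    predictions: List[Confidences], threshold: float = 0.7, inclusive: bool = False
) -> List[int]:
    """Indices of examples whose highest class confidence is below the threshold.

    inclusive=False mimics a 'Maximum < threshold' filter,
    inclusive=True mimics 'Maximum <= threshold'.
    """
    result = []
    for i, conf in enumerate(predictions):
        top = max(conf.values())  # the 'Maximum' of this example's confidences
        if top < threshold or (inclusive and top == threshold):
            result.append(i)
    return result


if __name__ == "__main__":
    # Made-up confidences: a k-NN-like model with some uncertain examples ...
    knn = [{"Flat tire": 0.6, "Worn chain": 0.4}, {"Flat tire": 1.0, "Worn chain": 0.0}]
    # ... and a naive-Bayes-like model that is always fully confident.
    nb = [{"Flat tire": 1.0, "Worn chain": 0.0}, {"Flat tire": 0.0, "Worn chain": 1.0}]
    print(low_confidence_indices(knn))            # [0]
    print(low_confidence_indices(nb))             # [] -> nothing left for the next Apply Model
    print(low_confidence_indices(nb, 1.0, True))  # [0, 1]
```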

varunm1 · May 2019

Hello @Prentice

I am not sure why the confidences are either 0 or 1, but I don't think this is wrong. I tested on the Titanic data set and also see predictions with high confidence levels. @IngoRM might be able to suggest something.

Image: https://us.v-cdn.net/6030995/uploads/editor/ox/g4e2umrho3rz.png
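One plausible reason for 0/1 confidences from a decision tree: a tree usually reports, as confidences, the class distribution of the training examples in the leaf that an example falls into. I believe RapidMiner's Decision Tree works this way too, but treat that as an assumption. With around 170 text examples and many word attributes, the tree can keep splitting until every leaf is pure, which gives exactly 0 or 1. The helper `leaf_confidences` below is hypothetical and only illustrates the arithmetic.

```python
from collections import Counter
from typing import Dict, List


def leaf_confidences(leaf_labels: List[str], classes: List[str]) -> Dict[str, float]:
    """Confidence of each class = its share of the training examples in the leaf."""
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    return {cls: counts[cls] / n for cls in classes}


if __name__ == "__main__":
    classes = ["Flat tire", "Worn chain"]
    # A fully grown tree usually ends in pure leaves ...
    print(leaf_confidences(["Flat tire"] * 3, classes))
    # {'Flat tire': 1.0, 'Worn chain': 0.0}
    # ... while a leaf that has to keep mixed examples (e.g. because of a
    # larger minimal leaf size or pruning) gives intermediate confidences.
    print(leaf_confidences(["Flat tire"] * 3 + ["Worn chain"], classes))
    # {'Flat tire': 0.75, 'Worn chain': 0.25}
```

If that is what happens here, tree parameters that stop growth earlier (such as pruning or a larger minimal leaf size) should produce less extreme confidences.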

Question 2: This is not automatic functionality. The process runs continuously, and each operator expects appropriate inputs; if it doesn't get them, the process fails. I think making this automatic would cause problems whenever we make a mistake in the process, because we could not identify the mistake without the process failing. If you really want the condition Maximum < 0.7, the Naive Bayes and Decision Tree results don't satisfy it, which means you can either use other algorithms or change the filter. That is my understanding.
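If you do want the process to continue when the filter returns nothing, the usual pattern is an explicit check before the downstream step. A minimal sketch of that guard, with a hypothetical helper `run_if_nonempty`; it logs a warning when it skips, so the concern above about mistakes going unnoticed is at least partly addressed.

```python
import logging
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_if_nonempty(
    examples: List[T], step: Callable[[List[T]], R], name: str = "next step"
) -> Optional[R]:
    """Run `step` only when the filtered example set is not empty.

    Skipping is logged on purpose: silently doing nothing would hide
    mistakes in the process.
    """
    if not examples:
        logging.warning("No examples passed the filter; skipping %s.", name)
        return None
    return step(examples)


if __name__ == "__main__":
    low_confidence: List[str] = []  # e.g. nothing had Maximum < 0.7
    print(run_if_nonempty(low_confidence, len, "second prediction"))  # None (plus a warning)
    print(run_if_nonempty(["example 1", "example 2"], len))          # 2
```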

Thanks
Varun

[Deleted User] · May 2019

@Prentice
Hi
For Logistic Regression and SVM, you can see the result of converting the data in your table.
For Decision Tree, try using information gain or gain ratio as the criterion.
Good luck
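For reference, here are the textbook definitions of the two criteria (as in Quinlan's C4.5); RapidMiner's implementation may differ in details, and real word attributes are numeric and split on thresholds, which this sketch simplifies to nominal values. Gain ratio divides the information gain by the entropy of the attribute itself, so it penalises attributes with many distinct values, such as a row ID. All function and attribute names below are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(values: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(labels: Sequence[str], attribute: Sequence[str]) -> float:
    """Entropy of the label minus the weighted entropy after splitting on the attribute."""
    groups: Dict[str, List[str]] = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(labels: Sequence[str], attribute: Sequence[str]) -> float:
    """Information gain divided by the split information (entropy of the attribute itself)."""
    split_info = entropy(attribute)
    if split_info == 0.0:
        return 0.0  # the attribute has a single value and cannot split anything
    return information_gain(labels, attribute) / split_info


if __name__ == "__main__":
    labels = ["Flat tire", "Flat tire", "Worn chain", "Worn chain"]
    contains_flat = ["yes", "yes", "no", "no"]  # a useful two-valued attribute
    row_id = ["r1", "r2", "r3", "r4"]           # unique per row, useless for new data
    for name, attr in [("contains_flat", contains_flat), ("row_id", row_id)]:
        print(name, information_gain(labels, attr), gain_ratio(labels, attr))
    # contains_flat 1.0 1.0
    # row_id 1.0 0.5
```

Both attributes separate the training labels perfectly (information gain 1.0), but gain ratio prefers the one that would generalise.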

Prentice · May 2019

Hello,

Thanks for your reply, but cross-validation does not work for Naïve Bayes either; I get the same error. When I set the decision tree to information gain I get the same error, while with gain ratio I get the same values everywhere.
I have passed the word list from my training data to the word list input of my testing data, so it's not that.

-Prentice
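The word-list point matters because the model can only be applied to test data with the same attributes it was trained on. A minimal sketch of that idea, assuming simple lower-case tokenisation and raw term counts (not TF-IDF or RapidMiner's tokenizer settings); the helper names are hypothetical. The vocabulary is learned from the training texts only, and test texts are mapped onto it, so unseen words are dropped and the columns always line up.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lower-case and split on anything that is not a letter or apostrophe."""
    return re.findall(r"[a-z']+", text.lower())


def build_word_list(training_texts: List[str]) -> List[str]:
    """Vocabulary learned from the training documents only."""
    return sorted({tok for text in training_texts for tok in tokenize(text)})


def vectorize(texts: List[str], word_list: List[str]) -> List[List[int]]:
    """Term counts over a fixed word list; words outside the list are ignored."""
    index: Dict[str, int] = {w: i for i, w in enumerate(word_list)}
    rows = []
    for text in texts:
        row = [0] * len(word_list)
        for tok in tokenize(text):
            if tok in index:
                row[index[tok]] += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    train = [
        "My bike has a flat tire and I cannot use it. The flat tire has been repaired.",
        "The bike's chain is worn. The chain is replaced and regreased.",
    ]
    test = ["My bike's tire is flat"]
    words = build_word_list(train)
    train_x = vectorize(train, words)
    test_x = vectorize(test, words)  # same word list -> same columns as training
    print(len(words), len(train_x[0]), len(test_x[0]))
```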

Prentice · May 2019

I have about 170 rows of training data and 20 rows of test data. I import my training and test data separately, so I don't need Split Data.
I do see an error, but it says that the input example set needs to have a label attribute (which I have). That is also not the problem, since it works fine with k-NN.

[Deleted User] · May 2019

Do you import your data from the repository or use Read Excel?
Do you have the same data stored under two different names in the repository? If so, delete one of them.
For the label problem, I think using Read Excel will solve it, because I had the same issue.

Prentice · May 2019

I already use Read Excel.

I wish I could add my process, but it's way too big to post here.

[Deleted User] · May 2019

@Prentice
I think you should first import your data into RapidMiner and check it there; maybe a mistake happened during the import.
I hope it helps.

varunm1 · May 2019

Hello @Prentice

Can you export the process and attach it here (File --> Export Process)?

Thank you

Prentice · May 2019

Here it is.

I think that I also need to explain it.

I have two things going on: the training set, and the testing set, which I apply to my trained model.

Then, when the confidence of the predicted outcome for the testing set is below 0.7, the process takes all these examples and makes another prediction, this time with a different testing set that adds information to the previous one. If these confidences are again below 0.7, it generates a second prediction.

The imported data is in this format:

Training data:

Text | Category
My bike has a flat tire and I cannot use it. The flat tire has been repaired. | Flat tire
The bike's chain is worn. The chain is replaced and regreased. | Worn chain

Testing data:

Text | Text additional
My bike's tire is flat | My bike's tire is flat. The flat tire is repaired

I hope that this is enough information.
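The two-stage process described above can be sketched as follows. Assumptions: the model is any function from a text to class confidences (a hypothetical stand-in, not a RapidMiner API), the first stage uses the "Text" column and the second the "Text additional" column, and I read "it generates a second prediction" as also reporting the runner-up class when the second stage is still below 0.7; that reading is an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

Confidences = Dict[str, float]
Model = Callable[[str], Confidences]  # hypothetical: text -> class confidences


def ranked(conf: Confidences) -> List[Tuple[str, float]]:
    """Classes sorted from most to least confident."""
    return sorted(conf.items(), key=lambda kv: kv[1], reverse=True)


def cascade_predict(
    texts: List[str], additional_texts: List[str], model: Model, threshold: float = 0.7
) -> List[dict]:
    results = []
    for text, extra in zip(texts, additional_texts):
        conf, stage = model(text), 1
        if max(conf.values()) < threshold:
            # Stage 2: predict again from the enriched "Text additional" column.
            conf, stage = model(extra), 2
        (label, score), *rest = ranked(conf)
        # Still unsure after stage 2: also report the runner-up class
        # (assumed reading of "it generates a second prediction").
        runner_up: Optional[str] = rest[0][0] if score < threshold and rest else None
        results.append({"label": label, "confidence": score, "stage": stage, "runner_up": runner_up})
    return results


if __name__ == "__main__":
    # Toy stand-in for a trained model: a fixed lookup table of made-up confidences.
    fake = {
        "My bike's tire is flat": {"Flat tire": 0.55, "Worn chain": 0.45},
        "My bike's tire is flat. The flat tire is repaired": {"Flat tire": 0.9, "Worn chain": 0.1},
    }
    print(cascade_predict(
        ["My bike's tire is flat"],
        ["My bike's tire is flat. The flat tire is repaired"],
        fake.__getitem__,
    ))
```

Written this way, an empty low-confidence group simply means the second stage never runs, which is exactly the case that fails when it is built as a fixed chain of operators.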

varunm1 · May 2019

Hello @Prentice

Thanks for sharing your process. It looks fine, but without the data I cannot reproduce the error. Have you checked the two articles below to see whether either of them applies to your problem?

https://community.rapidminer.com/discussion/31723/text-mining-and-the-word-list/p1
https://community.rapidminer.com/discussion/26803/solved-rapidminer-sentiment-analysis-problem

I see that you are sharing the word list correctly, though, so I don't think these threads are useful here.

Prentice · May 2019

@varunm1, yes, I've seen these articles before, and as you said, I've already connected the word lists.
Unfortunately, the data is confidential and I'm not allowed to share it, but I can try to make up some examples and hopefully reproduce the error.

But if you don't mind, I'll do that tomorrow. It's getting late in my time zone.
Anyway, you'll hear from me.

Thanks

Prentice · May 2019

@varunm1

I've managed to come up with some data and reproduce the error.
If you replace Naive Bayes with k-NN, SVM or Random Forest it works, but the same error also happens with Decision Tree and Logistic Regression.

Prentice · May 2019

Hi @varunm1

Aah, this explains a lot, thanks! Can't believe I couldn't figure that out myself haha. However, this brings me to two other questions.

1. Why do only Naïve Bayes and Decision Tree have confidences of exactly 0 or 1 everywhere? This even applies to my real testing set, which has a lot more examples. It can't be right that every confidence is 0 or 1; even if I enter an example that fits multiple categories, it still only gets 0s and 1s.

2. For cases where this does happen, what can I do to resolve the problem? Can I do something like: if no examples pass the filter, don't run the next operators? I thought this was already the case, because why would it execute if there are no examples to execute it on?
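Regarding question 1, one commonly cited reason for Naive Bayes: it multiplies one likelihood per word as if the words were independent, so with many word attributes even small per-word differences compound and push the posterior towards 0 or 1. This is a general property of the method, not a diagnosis of this particular process. The sketch uses made-up numbers (every word twice as likely under one class) and a hypothetical helper name, and works in log space to avoid underflow.

```python
import math
from typing import Dict, List


def naive_bayes_posterior(
    log_prior: Dict[str, float], log_likelihoods: Dict[str, List[float]]
) -> Dict[str, float]:
    """Posterior P(class | words) from log prior + summed per-word log-likelihoods.

    Normalised with the log-sum-exp trick to avoid underflow.
    """
    scores = {c: log_prior[c] + sum(log_likelihoods[c]) for c in log_prior}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}


if __name__ == "__main__":
    prior = {"Flat tire": math.log(0.5), "Worn chain": math.log(0.5)}
    # Made-up numbers: every word is twice as likely under "Flat tire".
    for n_words in (1, 5, 20, 40):
        lik = {"Flat tire": [math.log(0.02)] * n_words, "Worn chain": [math.log(0.01)] * n_words}
        p = naive_bayes_posterior(prior, lik)["Flat tire"]
        print(n_words, round(p, 4))
```

With one word the confidence is about 0.67; with 20 such words it already rounds to 1.0 at four decimals, which is how a display can end up showing only 0s and 1s.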

Prentice · May 2019

Hi @varunm1

1: Aha, I just thought it was pretty remarkable, but maybe that's just how they work. I can include this in my analysis, so that's a good thing.

2: OK, if it's not possible, then I guess it's not a big deal to change. It's only one operator anyway. Or, when I use one of these two algorithms, I can just deactivate those operators so I don't get the error.

To round things up: thanks a lot for your help with my problem, I really appreciate it. It's at times like these that I appreciate this community the most: when you're stuck on something, ask here and you'll get a valid response in no time!

Prentice

Altair RapidMiner Community
