Classification algorithms

Prentice Member Posts: 66 Maven
Hello,

So I've sort of finished my model. The last thing I need to do now is check which classification algorithm gives the best performance. I've made a selection of seven algorithms:

Naïve Bayes
SVM
k-NN
Neural Network
Logistic Regression
Decision Tree
Random Forest

I've tried them all on my model, but some don't work.

- For Logistic Regression and SVM I understand that I need the operator Polynominal by Binominal Classification. This doesn't change any of my data, right? It only changes the format to suit these algorithms?
- When I try Naïve Bayes it suddenly gives me an error that my example set does not match my training set, but I don't get this error when I use k-NN, Decision Tree or Random Forest.
- Decision Tree gives me exactly the same confidence for all my examples, which is very strange.
- Lastly, Neural Network for some reason takes forever to load and I don't know why.

Thanks for answering these questions
-Prentice

Best Answers

  • [Deleted User] Posts: 0 Learner III
    edited May 2019 Solution Accepted
    @Prentice
    How many rows do you have? Do you use Split Data? If you do, split your data 0.7 for training and 0.3 for testing.
    If there is an error, you will see a yellow triangle on the operator; you can use that to find out what is wrong.
    mbs
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited May 2019 Solution Accepted
    Hello @Prentice

    I see that you have a filter set to Maximum < 0.7, but with Naive Bayes the Maximum attribute is 1 for all the examples. Since no example satisfies the filter condition, the filter returns nothing, and I think this is the reason for the error at the Apply Model operator. When I tested k-NN and SVM, they had examples with a Maximum attribute < 0.7 and were able to produce results. When I changed the filter value to <= 1.0, Naive Bayes gave results without any errors. So my understanding is that with Naive Bayes no example satisfies your filter condition.
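To illustrate the point above, here is a minimal Python sketch (not RapidMiner code) of what a "Maximum < 0.7" filter does to two sets of predictions. The confidence values, the class names and the helper `low_confidence` are made up for illustration; only the 0.7 threshold and the <= 1.0 variant come from this thread.

```python
# Hypothetical confidences for three test examples per learner (made-up values).
knn_rows = [
    {"flat tire": 0.60, "worn chain": 0.40},
    {"flat tire": 0.90, "worn chain": 0.10},
    {"flat tire": 0.55, "worn chain": 0.45},
]
naive_bayes_rows = [
    {"flat tire": 1.0, "worn chain": 0.0},
    {"flat tire": 0.0, "worn chain": 1.0},
    {"flat tire": 1.0, "worn chain": 0.0},
]


def low_confidence(rows, threshold=0.7, inclusive=False):
    """Keep rows whose highest class confidence is below the threshold.

    With inclusive=True the comparison is <= instead of <.
    """
    def passes(row):
        top = max(row.values())  # plays the role of the "Maximum" attribute
        return top <= threshold if inclusive else top < threshold

    return [row for row in rows if passes(row)]


# k-NN style confidences: some rows are below 0.7, so the next step has input.
print(len(low_confidence(knn_rows)))
# Naive Bayes style confidences: every maximum is 1.0, so nothing is left.
print(len(low_confidence(naive_bayes_rows)))
# Relaxing the condition to <= 1.0 lets every row through again.
print(len(low_confidence(naive_bayes_rows, threshold=1.0, inclusive=True)))
```

Whatever comes after the filter then receives an empty set in the Naive Bayes case, which would match the error you are seeing.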


    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited May 2019 Solution Accepted
    Hello @Prentice

    I am not sure why the confidences are either 0 or 1, but I don't think this is wrong. I tested on the Titanic data set and also see predictions with high levels of confidence. @IngoRM might have a suggestion.
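One possible explanation, offered as an assumption and not something confirmed in this thread: Naive Bayes multiplies one likelihood per attribute, and a text process produces many word attributes, so even weak evidence per word can push the posterior very close to 0 or 1. The sketch below is plain Python, not RapidMiner code; the function `posterior` and the ratio 1.5 are hypothetical and only show the effect for two classes.

```python
import math


def posterior(log_ratio_per_feature, n_features, prior=0.5):
    """Posterior of class A in a two-class naive Bayes model.

    Assumes every feature favours class A by the same likelihood ratio,
    so the log-odds grow linearly with the number of features.
    """
    log_odds = math.log(prior / (1 - prior)) + n_features * log_ratio_per_feature
    return 1 / (1 + math.exp(-log_odds))


# Each word is only 1.5 times more likely under class A than under class B.
weak_evidence = math.log(1.5)

for n in (1, 5, 20, 50):
    print(n, posterior(weak_evidence, n))
```

With a single word the posterior stays moderate, but with dozens of words it is practically 1. For a decision tree, the confidence is typically the class proportion in the leaf an example ends up in, so leaves that contain only one class give exactly 0 or 1.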



    Question 2: This is not automatic. The process runs continuously and each operator expects appropriate input; if it does not get it, the process fails. I think making this automatic would be a problem, because a mistake in the process could go unnoticed if the process did not fail. If you really want the condition Maximum < 0.7, the Naive Bayes and Decision Tree results don't satisfy it, which means you can either use other algorithms or change the filter. This is my understanding.
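The kind of guard you describe would look roughly like this in plain Python. This is only a sketch of the logic, not a RapidMiner feature; `second_pass`, `fake_model` and the row layout are hypothetical names chosen for illustration.

```python
def second_pass(rows, predict_again, threshold=0.7):
    """Re-predict only the low-confidence rows.

    If no row is below the threshold, return an empty list and never
    call the model, instead of handing it an empty input.
    """
    uncertain = [row for row in rows if row["confidence"] < threshold]
    if not uncertain:
        return []
    return [predict_again(row) for row in uncertain]


calls = []  # records which rows actually reached the second model


def fake_model(row):
    """Stand-in for the second prediction step (hypothetical)."""
    calls.append(row["id"])
    return {"id": row["id"], "prediction": "flat tire"}


all_certain = [{"id": 1, "confidence": 1.0}, {"id": 2, "confidence": 1.0}]
mixed = [{"id": 3, "confidence": 0.55}, {"id": 4, "confidence": 0.95}]

print(second_pass(all_certain, fake_model))  # the model is skipped entirely
print(second_pass(mixed, fake_model))        # only the row with id 3 is re-predicted
```

The trade-off is the one mentioned above: with a guard like this the process no longer fails, so an unexpectedly empty filter result is easier to miss.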

    Thanks
    Varun

Answers

  • [Deleted User] Posts: 0 Learner III
    edited July 2019
    Prentice
    Hi
    For Logistic Regression and SVM, you can check the result of the conversion by looking at your data table.
    For Decision Tree, try using information gain or gain ratio as the criterion.
    Good luck :)
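On the Polynominal by Binominal Classification question: as far as I understand, a wrapper like this trains one two-class model per category on a temporary relabeling and leaves the original data as it is. The Python sketch below shows that one-against-all idea only; it is not how RapidMiner implements it. The toy learner `train_centroid_scorer`, the feature vectors and the category "broken light" are made up for illustration.

```python
def train_centroid_scorer(X, y):
    """Toy two-class learner: higher score means closer to the positive centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    pos = centroid([x for x, flag in zip(X, y) if flag])
    neg = centroid([x for x, flag in zip(X, y) if not flag])
    return lambda x: dist2(x, neg) - dist2(x, pos)


def train_one_vs_all(X, labels):
    """Train one binary model per class; `labels` itself is never modified."""
    models = {}
    for cls in sorted(set(labels)):
        binary = [label == cls for label in labels]  # temporary "cls vs. rest" labels
        models[cls] = train_centroid_scorer(X, binary)
    return models


def predict(models, x):
    """Pick the class whose binary model gives the highest score."""
    return max(models, key=lambda cls: models[cls](x))


# Made-up word-weight vectors and categories.
X = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
     [0.0, 1.0, 0.0], [0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]
labels = ["flat tire", "flat tire", "worn chain", "worn chain",
          "broken light", "broken light"]

models = train_one_vs_all(X, labels)
print(predict(models, [1.0, 0.0, 0.0]))
print(labels)  # still the original three categories
```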


  • Prentice Member Posts: 66 Maven
    Hello,

    Thanks for your reply, but the cross validation does not work for Naïve Bayes; I get the same error. When I set the Decision Tree to information gain I get this same error, while if I set it to gain ratio I get the same values everywhere.
    I have put the word list from my training data into the word list of my testing data, so it's not that.

    -Prentice
  • Prentice Member Posts: 66 Maven
    I have about 170 rows of training data and 20 rows of test data. I import my training and test data separately, so I don't need Split Data.
    I see an error, but it says that the input set needs to have one attribute with a label (which I have). That is also not the problem, since it works fine with k-NN.
  • [Deleted User] Posts: 0 Learner III
    Do you import your data from the repository or use Read Excel?
    Do you have the same data under two different names in the repository? If so, delete one of them.
    As for the label, if you use Read Excel I think the problem will be solved; I had the same issue.

  • Prentice Member Posts: 66 Maven

    I already use Read Excel.

    I wish that I could add my process but it's way too big to put it here

  • [Deleted User] Posts: 0 Learner III
    @Prentice
    I think you should first import your data into RapidMiner and check it; maybe a mistake happened during the import.
    I hope it helps
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hello @Prentice

    Can you export the process and attach it here (File --> Export Process)?

    Thank you
    Regards,
    Varun

  • Prentice Member Posts: 66 Maven
    Here it is. 

    I think that I also need to explain it.

    I have two things going on: the training set, and the testing set to which I apply my trained model.

    Then when the confidence is below 0.7 for the predicted outcome of the testing set, it takes all these examples and it does another prediction, this time with a different testing set that has added information from the previous one. Then if these confidences are again under 0.7, it generates a second prediction. 

    The imported data is in this format:

    Training data:

    Text                                                                            Category
    My bike has a flat tire and I cannot use it. The flat tire has been repaired.   Flat tire
    The bike's chain is worn. The chain is replaced and regreased                   Worn chain


    Testing data:

    Text                                                       Text additional
    My bike's tire is flat                                My bike's tire is flat. The flat tire is repaired


    I hope that this is enough information. 




  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited May 2019
    Hello @Prentice

    Thanks for sharing your process. It looks fine, but without the data I cannot reproduce the error. Did you check the two threads below to see if either of them applies to your problem?

    https://community.rapidminer.com/discussion/31723/text-mining-and-the-word-list/p1
    https://community.rapidminer.com/discussion/26803/solved-rapidminer-sentiment-analysis-problem

    I see that you are sharing the word list correctly, so I don't think these threads are useful here.
    Regards,
    Varun

  • Prentice Member Posts: 66 Maven
    @varunm1, yes I've seen these articles before and as you said, I've linked them.
    Unfortunately the data is confidential and I'm not allowed to share it but I can try to make up some examples and hope that I can reproduce the error. 

    But if you don't mind, I'll do that tomorrow. It's getting late (in my timezone) 
    Anyway, you'll hear from me

    Thanks
  • Prentice Member Posts: 66 Maven
    @varunm1

    I've managed to come up with data and reproduced the error.
    If you replace Naive Bayes with k-NN, SVM or Random Forest it works, but the same error also happens with Decision Tree and Logistic Regression.
  • Prentice Member Posts: 66 Maven
    Hi @varunm1

    Aah, this explains a lot, thanks! Can't believe I couldn't figure that out myself haha. However, this brings me to two other questions.

    1. Why do only Naïve Bayes and Decision Tree have confidences of only 0 and 1 everywhere? This even applies to my real testing set, which has a lot more examples. It can't be right that the confidence is 0 or 1 for everything; even if I put in an example that fits multiple categories, it still only has 0s or 1s.

    2. For cases where this does happen, what can I do to resolve this problem? Can I do something like if there are no examples for the filter don't run the next operators? I thought that this was already the case, because why would it execute if there are no examples to execute it on.

  • Prentice Member Posts: 66 Maven
    Hi @varunm1

    1: Aha, I just thought it was pretty remarkable, but maybe that's just how they work. I can take this information into my analysis, so that's a good thing.

    2: OK, if it's not possible then I guess this is not a big deal to change. It's only one operator anyway. Or when I use one of these two algorithms, I'll just deactivate those operators in order to not get the error.

    To round things up: Thanks a lot for your help with my problem, I appreciate it a lot. It's these times that I appreciate this community the most, just when you're stuck on something, ask it here and you'll get a valid response in no time!


    Prentice