Email classification models using Naive Bayesian, SVM and Neural Networks
Hello,
I am a student at the University of Gloucestershire and have decided to extend some of the email classification work that we did earlier this year for my dissertation. Please forgive me if my question is too vague or I do not provide enough information; I have read through the manual and scoured the forums but cannot find an answer.
I am trying to compare the performance of the 3 classification models (mentioned above) when tasked with classifying SPAM and non-SPAM email. I have a corpus of emails that is already categorized into SPAM and non-SPAM (the corpus is in the form of text files and is used as an example in the book "Machine Learning for Hackers" [O'Reilly, 2012]).
I have managed to make a start on my models but keep running into problems. I have not accomplished a great deal. Basically, I have got as far as Process Documents from Files, creating a word vector (with tokenizing and stemming to remove some of the unwanted data), then Wordlist to Data, then Write Excel. That is where I get a bit stuck: I'm not sure how to complete the models, or even whether what I have done so far is correct.
I know it's a big ask but I would really appreciate it if somebody would be kind enough to take me through creating one of the models step-by-step (I assume that once I have completed one model, the other 2 should be very similar).
Thanks for your time.
Elliot
Answers
Did you already check out the video tutorials on our website? They explain quite well how to create and validate models in general, and there are videos specifically tailored to text processing. If you combine the knowledge from both video series, you are almost there.
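As a rough illustration of the create-and-validate flow those tutorials cover, here is a minimal sketch in plain Python rather than RapidMiner. It tokenizes each message, counts words per class, trains a Naive Bayes model and scores it on held-out messages. Everything in it is an assumption made for illustration: the function names are hypothetical, the four training messages are invented stand-ins for the real corpus, and stemming is left out. An SVM or neural network would slot in at the training step in the same way.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: share of training documents in this class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(docs)


# Invented toy data; the real corpus would be read from the text files.
train_docs = [
    "win cash prize now",
    "cheap pills buy now",
    "meeting agenda for monday",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Held-out messages that the model never saw during training.
test_docs = ["buy cheap prize now", "agenda for the monday meeting"]
test_labels = ["spam", "ham"]

model = train_naive_bayes(train_docs, train_labels)
print("held-out accuracy:", accuracy(model, test_docs, test_labels))
```

The important part is the separation between training and held-out messages: the same split (or a cross-validation) has to be applied to each of the three models so that their scores are comparable.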
If you have any specific problems, please let us know.
Best regards,
Marius