Looking for a data set classifiable by both humans and data mining - possibly email/spam
I am looking for a collection of email messages to classify as spam or regular mail. The only data set I've found is the Spambase set (http://archive.ics.uci.edu/ml/datasets/Spambase). Unfortunately, it does not include the actual messages, only attributes extracted from them.
Finding spam mail should be easy. My spam folder has plenty. Finding regular email messages that can be made public is more difficult. The only collection I've found is Sarah Palin's emails (http://www.crivellawest.net/palin2011/allList.html). Unfortunately, they are all addressed to the same person and are only available in PDF format anyway.
Email is just the first kind of data set I came up with. If you have ideas for other kinds of data that could be classified both by humans and by data mining methods, please let me know. A data set that is already tried and tested would be an advantage.