Looking for a data set classifiable by both humans and data mining methods - possibly email/spam
Hi
I am looking for a collection of email messages to classify as spam or regular mail. The only data set I've found is the Spambase set (http://archive.ics.uci.edu/ml/datasets/Spambase). Unfortunately, it contains only derived attributes, not the actual messages.
Finding spam should be easy: my spam folder has plenty. Finding regular email messages that can be shared publicly is more difficult. The only collection I've found is Sarah Palin's emails (http://www.crivellawest.net/palin2011/allList.html). Unfortunately, they are all addressed to the same person and are only available in PDF format.
Email is just the first kind of data set I came up with. If you have ideas for other kinds of data that could be classified both by humans and by data mining methods, please let me know. A data set that is already tried and tested would be an advantage.
Best regards,
Steinar