Email classification models using Naive Bayesian, SVM and Neural Networks
Hello,
I am a student at the University of Gloucestershire and have decided to extend some of the email classification work that we did earlier this year for my dissertation. Please forgive me if my question is too vague or I do not provide enough information; I have read through the manual and scoured the forums but cannot find an answer.
I am trying to compare the performance of the 3 classification models (mentioned above) when tasked with classifying SPAM and non-SPAM email. I have a corpus of emails that is already categorized into SPAM and non-SPAM (the corpus is in the form of text files and is used as an example in the book "Machine Learning for Hackers [O'Reilly, 2012]").
I have managed to make a start on my models but keep running into problems. I have not accomplished a great deal. Basically, I have got to the stage of Processing Documents from Files, creating a Vector which removes some of the unwanted data through stemming and tokenizing, then Wordlist to Data, then Write to Excel. That is where I get a bit stuck: I'm not sure how to complete the models, or even whether what I have done so far is correct.
I know it's a big ask but I would really appreciate it if somebody would be kind enough to take me through creating one of the models step-by-step (I assume that once I have completed one model, the other 2 should be very similar).
Thanks for your time.
Elliot
Answers
Did you already check out the video tutorials on our website? They explain quite well how to create and validate models in general, and there are videos specifically tailored to text processing. If you combine the knowledge from both video series, you are almost there.
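For orientation only, here is a rough sketch of the "create and validate a model" idea in plain Python rather than in the tool itself. It is not the process the tutorials build: it tokenizes text (no stemming), trains a multinomial Naive Bayes classifier with add-one smoothing, and estimates accuracy with a simple k-fold cross-validation. The function names and the tiny `TOY_CORPUS` are made up for illustration; a real run would read the SPAM and non-SPAM text files instead.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy data standing in for the real SPAM / non-SPAM text files.
TOY_CORPUS = [
    ("Win free money now", "spam"),
    ("Meeting agenda for tomorrow", "ham"),
    ("Claim your free prize today", "spam"),
    ("Lunch with the project team", "ham"),
    ("Free money prize claim now", "spam"),
    ("Project meeting notes attached", "ham"),
]

def tokenize(text):
    """Lowercase the text and keep runs of letters (no stemming here)."""
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(docs, k):
    """k-fold cross-validation: fold i tests on documents i, i+k, i+2k, ..."""
    correct = 0
    for i in range(k):
        test = docs[i::k]
        train = [d for j, d in enumerate(docs) if j % k != i]
        model = train_naive_bayes(train)
        correct += sum(predict(model, text) == label for text, label in test)
    return correct / len(docs)

model = train_naive_bayes(TOY_CORPUS)
accuracy = cross_validate(TOY_CORPUS, k=3)
```

The overall shape (turn documents into word counts, train a learner, wrap it in a validation that holds out part of the data) stays the same when the learner is swapped for an SVM or a neural network, which is why the three models should end up looking very similar.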
If you have any specific problems, please let us know.
Best regards,
Marius