Email classification models using Naive Bayesian, SVM and Neural Networks

ejr3gan · February 2013

Hello,

I am a student at the University of Gloucestershire and have decided to extend some of the email classification work that we did earlier this year for my dissertation. Please forgive me if my question is too vague or does not provide enough information; I have read through the manual and scoured the forums but cannot find an answer.

I am trying to compare the performance of the 3 classification models (mentioned above) when tasked with classifying SPAM and non-SPAM email. I have a corpus of emails that is already categorized into SPAM and non-SPAM (the corpus is in the form of text files and is used as an example in the book "Machine Learning for Hackers" [O'Reilly, 2012]).

I have managed to make a start on my models but keep running into problems. I have not accomplished a great deal: basically, I have got to the stage of Processing Documents from Files, creating a Vector which removes some of the unwanted data through stemming and tokenizing, then Wordlist to Data, then Write to Excel. That is where I get a bit stuck; I'm not sure how to complete the models, or even whether what I have done so far is correct.

I know it's a big ask but I would really appreciate it if somebody would be kind enough to take me through creating one of the models step-by-step (I assume that once I have completed one model, the other 2 should be very similar).

Thanks for your time.
Elliot

MariusHelf · February 2013

Hi Elliot,

Did you already check out the video tutorials on our website? They explain quite well how to create and validate models in general, and there are videos specifically tailored to text processing. If you combine the knowledge from both video series, you are almost there.
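
For readers who want to see the idea outside RapidMiner, here is a rough Python sketch of the same kind of workflow: tokenize the emails, count words per class, train a Naive Bayes model, and check it on held-out messages. It is not a RapidMiner process and not the method from the tutorials. The function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`) and the toy messages are made up for illustration, stemming is left out, and Laplace (add-one) smoothing is an assumed choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(documents, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter(labels)
    word_counts = {label: Counter() for label in class_docs}
    for text, label in zip(documents, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return class_docs, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_docs):
        # Log prior: fraction of training documents in this class.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for unseen class/word pairs.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, documents, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict(model, doc) == label for doc, label in zip(documents, labels))
    return hits / len(labels)


# Toy data, invented for this sketch; a real run would read the corpus files.
train_docs = [
    "Win a free prize now",
    "Cheap pills, free offer",
    "Meeting agenda for Monday",
    "Lunch on Monday with the project team",
]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = ["Free prize offer", "Project meeting on Monday"]
test_labels = ["spam", "ham"]

model = train_naive_bayes(train_docs, train_labels)
print("held-out accuracy:", accuracy(model, test_docs, test_labels))
```

Evaluating on messages that were not used for training is the part that corresponds to validating a model. With a real corpus, a single small hold-out like this would normally be replaced by cross-validation, and the same train/evaluate split would be reused for each classifier being compared.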

If you have any specific problems, please let us know.

Best regards,
Marius

roohishahid · November 2013

Have you found a solution to this?
