Content Based Spam Email Classification using Supervised SVM, Decision Trees and Naive Bayes

Conference: ICMLCA 2021 - 2nd International Conference on Machine Learning and Computer Application
12/17/2021 - 12/19/2021 at Shenyang, China

Proceedings: ICMLCA 2021

Pages: 4
Language: English
Type: PDF

Authors:
Cui, Jiaqi; Li, Xiaoxi (College of Computer Science, Sichuan University, Chengdu, China)

Abstract:
The prevalence of spam emails has created an urgent need for anti-spam filters, and many algorithms have been proposed to classify spam emails. In this paper, we build a supervised classification pipeline that labels emails as spam or legitimate, and we evaluate three machine learning algorithms (SVMs, decision trees, and Naive Bayes) on this task. One of the main steps in spam email classification is feature selection. We implement Term Frequency-Inverse Document Frequency (TF-IDF) and choose the top 20 most frequently used words in spam and legitimate emails. We conduct experiments on SVMs, decision trees, and Naive Bayes with the selected features and evaluate their capability and performance in spam email detection and classification.
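
The feature selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the top-20 word lists are combined with TF-IDF, so the sketch assumes the vocabulary is the union of the most frequent words in each class and that each email is then weighted by a smoothed TF-IDF variant. The tokenizer, the function names, and the toy emails are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def top_k_words(docs, k=20):
    """Return the k most frequent words across a list of documents."""
    counts = Counter(word for doc in docs for word in tokenize(doc))
    return [word for word, _ in counts.most_common(k)]


def build_vocabulary(spam_docs, ham_docs, k=20):
    """Assumed selection rule: union of the top-k words of each class."""
    vocab = []
    for word in top_k_words(spam_docs, k) + top_k_words(ham_docs, k):
        if word not in vocab:
            vocab.append(word)
    return vocab


def inverse_document_frequency(docs, vocab):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1; the exact variant is an assumption."""
    n = len(docs)
    token_sets = [set(tokenize(doc)) for doc in docs]
    idf = {}
    for word in vocab:
        df = sum(word in tokens for tokens in token_sets)
        idf[word] = math.log((1 + n) / (1 + df)) + 1
    return idf


def tfidf_vector(doc, vocab, idf):
    """Map one email to a TF-IDF vector over the selected vocabulary."""
    tokens = tokenize(doc)
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty emails
    return [counts[word] / total * idf[word] for word in vocab]


# Toy, made-up emails used only to exercise the functions above.
spam_docs = ["Win a free prize now", "Free money, claim your prize"]
ham_docs = ["Meeting agenda for Monday", "Please review the project agenda"]

vocab = build_vocabulary(spam_docs, ham_docs, k=2)
idf = inverse_document_frequency(spam_docs + ham_docs, vocab)
vec = tfidf_vector(spam_docs[0], vocab, idf)
```

Vectors produced this way could then be passed, together with the spam or legitimate labels, to any SVM, decision tree, or Naive Bayes implementation for training and evaluation.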