Content-Based Spam Classification of Academic E-mails: A Machine Learning Approach

Iddrisu W.A.; Adjei-Gyabaa S.K.; Akoto I.

dc.contributor.author	Iddrisu W.A.
dc.contributor.author	Adjei-Gyabaa S.K.
dc.contributor.author	Akoto I.
dc.contributor.editor	Goar V.; Kuri M.; Kumar R.; Senjyu T.
dc.date.accessioned	2025-03-06T18:11:43Z
dc.date.accessioned	2025-03-06T18:58:58Z
dc.date.issued	2023
dc.description.abstract	In the academic environment, university faculty are overstretched by various stakeholders and have little available time. Reading and deleting electronic mail (e-mail) spam consumes much of the little time they have. Consequently, automated methods for separating spam from legitimate e-mails are becoming increasingly important. Automatically classifying spam e-mail is difficult because of the unstructured nature of the content, the additional features involved, and the vast number of documents. Given these challenges, the growing volume of e-mail spam directly affects both the quality and the speed of spam classification. Most recent algorithms consider only relevant features when classifying e-mails as spam or legitimate. The main objective of this work was to use a machine-learning algorithm to classify the e-mail messages of university faculty as spam or non-spam and to identify the most frequently occurring attributes in those messages. These attributes were used to build a classification model based on the Random Forest algorithm. Exploratory analysis of the academic e-mails revealed that the most frequently occurring attributes associated with spam messages included “your”, “open access”, and “remove”. The Random Forest model consisted of 500 decision trees and achieved a classification accuracy of 0.942 (94.2%). The model's performance was compared with that of similar models based on the random forest algorithm; the comparison revealed that classification accuracy depends on the type of e-mails used. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
dc.identifier.doi	10.1007/978-981-19-9888-1_7
dc.identifier.isbn	978-981-19-9887-4
dc.identifier.issn	2367-3370
dc.identifier.uri	http://162.250.124.58:4000/handle/123456789/389
dc.language.iso	en
dc.publisher	Springer Science and Business Media Deutschland GmbH
dc.source	Lecture Notes in Networks and Systems
dc.subject	Academic e-mails
dc.subject	Classification
dc.subject	Machine learning
dc.subject	Random forest
dc.subject	Spam
dc.title	Content-Based Spam Classification of Academic E-mails: A Machine Learning Approach
dc.type	Other
oaire.citation.conferenceDate	17 December 2022 through 18 December 2022
oaire.citation.conferencePlace	Bikaner
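The abstract describes two steps: finding the attributes that occur most often in spam e-mails, and training a Random Forest of 500 decision trees on those attributes. The Python sketch below illustrates both under stated assumptions; it is not the authors' code, which this record does not include. Assumptions: a crude whitespace tokenizer producing unigram and bigram terms (so that phrases like "open access" count as one attribute), binary presence features, labels 1 = spam and 0 = legitimate, and hypothetical function names (`terms`, `top_terms`, `vectorize`, `fit_stump`, `fit_forest`, `predict`). To stay dependency-free, each ensemble member is a depth-one decision tree (a stump) fit on a bootstrap sample with a random feature subset, so this is a simplified random-forest-style ensemble rather than a full Random Forest with deep trees. The toy e-mails are invented for illustration.

```python
import random
from collections import Counter


def terms(text):
    """Lower-cased unigrams and bigrams of one e-mail (crude tokenizer: an assumption)."""
    words = [w.strip(".,;:!?\"()").lower() for w in text.split()]
    words = [w for w in words if w]
    bigrams = {f"{a} {b}" for a, b in zip(words, words[1:])}
    return set(words) | bigrams


def top_terms(emails, labels, target, k=3):
    """k terms found in the most e-mails labelled `target`; ties broken alphabetically."""
    counts = Counter()
    for text, label in zip(emails, labels):
        if label == target:
            counts.update(terms(text))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def vectorize(emails, vocab):
    """Binary presence (1) / absence (0) of each vocabulary term in each e-mail."""
    rows = []
    for text in emails:
        present = terms(text)
        rows.append([1 if t in present else 0 for t in vocab])
    return rows


def majority(labels, default=0):
    """Majority of 0/1 labels; ties and empty input return `default`."""
    ones = sum(labels)
    zeros = len(labels) - ones
    if ones == zeros:
        return default
    return 1 if ones > zeros else 0


def fit_stump(X, y, features):
    """Depth-one tree: the candidate feature whose split misclassifies the fewest rows."""
    fallback = majority(y)
    best = None
    for f in features:
        on = [label for row, label in zip(X, y) if row[f]]
        off = [label for row, label in zip(X, y) if not row[f]]
        pred_on, pred_off = majority(on, fallback), majority(off, fallback)
        errors = sum(lab != pred_on for lab in on) + sum(lab != pred_off for lab in off)
        if best is None or errors < best[0]:
            best = (errors, f, pred_on, pred_off)
    _, f, pred_on, pred_off = best
    return f, pred_on, pred_off


def fit_forest(X, y, n_trees=500, max_features=None, seed=0):
    """Bagged stumps, each fit on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = min(d, max_features or max(1, int(d ** 0.5)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), m)             # sample features without replacement
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, X):
    """Majority vote of all stumps per e-mail (1 = spam, 0 = legitimate)."""
    return [majority([on if row[f] else off for f, on, off in forest]) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of e-mails whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy e-mails, invented for illustration (not the study's data).
    emails = [
        "Open access journal: submit your paper now, click to remove",
        "Your manuscript fee waived, open access, reply remove to unsubscribe",
        "Faculty meeting moved to Friday at 10am",
        "Please review the attached exam timetable",
    ]
    labels = [1, 1, 0, 0]
    vocab = sorted(set().union(*(terms(e) for e in emails)))
    X = vectorize(emails, vocab)
    forest = fit_forest(X, labels, n_trees=500, seed=42)
    print("frequent spam terms:", top_terms(emails, labels, target=1, k=6))
    print("training accuracy:", accuracy(labels, predict(forest, X)))
```

The accuracy printed here is measured on the training e-mails, which overstates performance; a held-out set of e-mails is needed for an estimate comparable in spirit to the reported 0.942. In practice a library implementation such as scikit-learn's `RandomForestClassifier(n_estimators=500)` would replace the hand-written ensemble, since it grows full trees and samples features at every split.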

Collections

Conference Papers