Content-Based Spam Classification of Academic E-mails: A Machine Learning Approach
| dc.contributor.author | Iddrisu W.A. | |
| dc.contributor.author | Adjei-Gyabaa S.K. | |
| dc.contributor.author | Akoto I. | |
| dc.contributor.editor | Goar V.; Kuri M.; Kumar R.; Senjyu T. | |
| dc.date.accessioned | 2025-03-06T18:11:43Z | |
| dc.date.accessioned | 2025-03-06T18:58:58Z | |
| dc.date.issued | 2023 | |
| dc.description.abstract | In the academic environment, players have overstretched University faculty with less available time. The task of reading and deleting electronic mail (e-mail) spam tends to consume or steal the little available time they have at their disposal. Due to the spam issue, automated processes or methods for separating spam from valid emails are becoming important. Due to the unstructured nature of the material, additional features, and a vast number of documents, the process of automatically classifying spam email presents significant difficulties. Increasing usage of the e-mail spam directly affects the performance of these spam classifications with regards to the quality and speed based on the challenges stated above. Most of the recent algorithms consider only relevant features or characteristics for the classification of the e-mails as spam or legitimate. The main objective of this work was to use a machine-learning algorithm to detect and categorize e-mail messages of university faculty into spam and non-spam with the identification of the most occurring attributes that are contained in the messages. These attributes helped in the generation of a classification model based on the Random Forest algorithm. Exploratory analysis of the academic e-mails revealed that the most occurring attributes that are easily associated with spam messages included "your", "open access" and "remove". Five hundred decision trees were generated in the Random Forest Model, which had an excellent classification accuracy of 0.942 (94.2%). The performance of the model was compared with similar models based on the random forest algorithm. The comparative analysis revealed that classification accuracy differs depending on the type of e-mails used. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. | |
| dc.identifier.doi | 10.1007/978-981-19-9888-1_7 | |
| dc.identifier.isbn | 978-981199887-4 | |
| dc.identifier.issn | 23673370 | |
| dc.identifier.uri | http://162.250.124.58:4000/handle/123456789/389 | |
| dc.language.iso | en | |
| dc.publisher | Springer Science and Business Media Deutschland GmbH | |
| dc.source | Lecture Notes in Networks and Systems | |
| dc.subject | Academic e-mails | |
| dc.subject | Classification | |
| dc.subject | Machine learning | |
| dc.subject | Random forest | |
| dc.subject | Spam | |
| dc.title | Content-Based Spam Classification of Academic E-mails: A Machine Learning Approach | |
| dc.type | Other | |
| oaire.citation.conferenceDate | 17 December 2022 through 18 December 2022 | |
| oaire.citation.conferencePlace | Bikaner |
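
The abstract mentions an exploratory step that identifies the most frequently occurring attributes in spam messages before a Random Forest model is fitted. The sketch below illustrates only that counting step, using the Python standard library. The messages, the `PHRASES` tuple, and the function names are hypothetical and are not taken from the paper; the authors' actual preprocessing and feature set are not described in this record.

```python
from collections import Counter
import re

# Hypothetical toy messages for illustration; NOT data from the paper.
MESSAGES = [
    ("spam", "Submit your paper to our open access journal"),
    ("spam", "Open access call for papers, reply remove to unsubscribe"),
    ("spam", "Claim your prize now, or reply remove"),
    ("ham", "Meeting moved to Monday, see the agenda attached"),
    ("ham", "Please review the attached exam timetable"),
]

# Multi-word attributes that should be counted as a single token (assumption).
PHRASES = ("open access",)


def tokenize(text):
    """Lower-case the text and split it into phrase and word tokens."""
    text = text.lower()
    tokens = []
    for phrase in PHRASES:
        # Count each phrase occurrence, then blank it out so its words
        # are not counted a second time as single-word tokens.
        tokens.extend([phrase] * text.count(phrase))
        text = text.replace(phrase, " ")
    tokens.extend(re.findall(r"[a-z]+", text))
    return tokens


def attribute_counts(messages, label):
    """Count token occurrences across all messages carrying the given label."""
    counts = Counter()
    for msg_label, text in messages:
        if msg_label == label:
            counts.update(tokenize(text))
    return counts


spam_counts = attribute_counts(MESSAGES, "spam")
ham_counts = attribute_counts(MESSAGES, "ham")
```

Calling `spam_counts.most_common(10)` lists candidate attributes for inspection. In a full pipeline, per-message counts of such attributes would typically become the feature columns passed to a Random Forest classifier with 500 trees, the number reported in the abstract.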
