Received 18 August 2022, accepted 29 August 2022, date of publication 5 September 2022, date of current version 22 September 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3204593

Feature Selection by Multiobjective Optimization: Application to Spam Detection System by Neural Networks and Grasshopper Optimization Algorithm

SANAA A. A. GHALEB 1,2,3,4, MUMTAZIMAH MOHAMAD4, WAHEED ALI H. M. GHANEM 1,2,3,5, ABDULLAH B. NASSER6, (Member, IEEE), MOHAMED GHETAS7, AKIBU MAHMOUD ABDULLAHI8, SAMI ABDULLA MOHSEN SALEH9, HUMAIRA ARSHAD10, ABIODUN ESTHER OMOLARA11, AND OLUDARE ISAAC ABIODUN11
1Faculty of Engineering, University of Aden, Aden, Yemen
2Faculty of Education Aden, University of Aden, Aden, Yemen
3Faculty of Education Saber, University of Lahej, Lahej, Yemen
4Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, Terengganu 21300, Malaysia
5Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, Terengganu 32030, Malaysia
6School of Technology and Innovation, University of Vaasa, 65200 Vaasa, Finland
7Faculty of Computer Science, Nahda University, Beni Suef Governorate 62764, Egypt
8Faculty of Computing and Informatics, Albukhary International University, Kedah 05200, Malaysia
9School of Electrical and Electronic Engineering, Universiti Sains Malaysia, Pulau Pinang 14300, Malaysia
10Department of Computer Science, Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
11Department of Computer Science, University of Abuja, Gwagwalada 900110, Nigeria
Corresponding authors: Sanaa A. A. Ghaleb (sanaaghaleb.sg@gmail.com) and Waheed Ali H. M. Ghanem (waheed.ghanem@gmail.com)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

ABSTRACT Networks are strained by spam, which overloads email servers and clogs mailboxes with unwanted messages and files. Setting the protection level for spam filtering becomes even more crucial for email users when malicious steps are taken, since users must also cope with an increase in the number of valid communications being marked as spam. Spam detection systems (SDS) have been developed to keep track of spammers and filter email activity by finding patterns in email communications, and they have improved spam detection tools by reducing the false positive rate and increasing detection accuracy. The main difficulty facing spam classifiers is the abundance of features. Feature selection (FS) is important because it guides the search for the feature subsets that improve the SDS's classification performance and accuracy. As a means of enhancing the performance of the SDS, this study uses a wrapper technique based on the multi-objective grasshopper optimization algorithm (MOGOA) for feature selection and the recently revised EGOA algorithm for multilayer perceptron (MLP) training. The suggested system's performance was verified using the SpamBase, SpamAssassin, and UK-2011 datasets, on which it achieved accuracies of 97.5%, 98.3%, and 96.4%, respectively, outperforming a variety of established approaches from the literature.

INDEX TERMS Spam detection system (SDS), grasshopper optimization algorithm (GOA), feature selection (FS), multi-objective optimization (MOO), multilayer perceptron (MLP).

The associate editor coordinating the review of this manuscript and approving it for publication was Bilal Alatas.

I. INTRODUCTION
Spam is electronic mail that is not requested yet is sent to large numbers of people.
Advertising is the best-known type of spam and the most common reason for sending it, but it would be incorrect to think that such a strategy is used only in commercial environments. The number of spam e-mails has grown over recent years, whereby a recipient regularly receives heaps of emails on a daily basis, of which 92% are spam [1]. The battle between spam detection tools and spammers is ongoing, as each side seeks new ways to neutralize the other's presence [2].
Ensuring the integrity and privacy of email data is turning into a real challenge. The Simple Mail Transfer Protocol (SMTP) can transmit and receive emails over the Internet, but it has no security measures built into it. As the SMTP specification itself acknowledges [3], the designers of the protocol were aware of its security challenges: SMTP does not include data integrity, encryption, or authentication services [4].
Spam sent to a private mailbox can cause havoc throughout the system. Nowadays, it creates many problems in business life, such as occupying network bandwidth and space in users' mailboxes. Research has been conducted in this area to resolve this issue, and spam detection systems (SDS) have been developed to monitor spammers and filter email activities by identifying patterns in email messages, thus improving spam detection tools [5], [6].
Both the knowledge filtering and the guideline filtering strategies are used to detect spam. Both have advantages and disadvantages, but neither is effective against all threats [7], [8]. The guideline detection method works well for identifying recognised communications but not new spam [8]. In comparison, the knowledge detection strategy is effective at finding new messages, but it has a low detection rate and a high percentage of false positives [9]. As such, our study introduces a new method. Most investigations into spam detection in the literature have focused on the knowledge detection strategy, since it has seemed more promising.
Recently, several methods, including machine learning, statistical analysis, and artificial intelligence techniques, have emerged in the field of knowledge detection [8], [9], [10], [11], [12]. Unsupervised, semi-supervised, and supervised machine learning techniques are the three types used, and in general, supervised learning performs better than the other techniques. Several machine learning algorithms (MLA) can be employed for knowledge identification, including Naive Bayes (NB), Artificial Neural Networks (ANN), Support Vector Machines (SVM), and k-Nearest Neighbor (KNN) [10], [11], [12], [13].
The majority of categorization data is highly dimensional, and dimensionality reduction is required for effectiveness and accuracy. As a result, the main disadvantage of content classification is its high dimensionality.
Spam classifiers must deal with this high dimensionality, an excessive number of candidate features (a large vocabulary consisting of all the distinct terms that occur at least once within the collection of emails). This drawback degrades the performance of the majority of content classifiers and therefore of the system as a whole, and it also makes the system more complex overall. Dimensionality reduction is therefore crucially required to handle high-dimensionality problems and mitigate their effects. This work is centred on the dimensionality of spam email classifiers.
Feature selection addresses this curse of dimensionality by choosing appropriate features for classification. By eliminating redundant features, the number of features and the training time can be reduced, thereby improving the classification performance. This analysis also discusses several drawbacks of the well-known methods used in earlier feature selection studies. The two types of feature selection algorithms are filters and wrappers. Gain ratio, information gain, chi-squared, and correlation-based feature selection are a few examples of the statistical, information-theoretic, or search-based methods that can be used to apply filters [13], [14]. Wrappers evaluate candidate feature subsets using a machine learning technique to determine the subset that best represents the dataset. They are built on the following components: a learning algorithm, which may be any classifier, and a feature search, such as sequential search or genetic search. The wrapper technique often requires more processing than the filter strategy, but it yields better results [14].
Some researchers have classified the proposals that are based on artificial intelligence optimization algorithms into the following categories: biology-based, social-based, chemical-based, physics-based, mathematics-based, music-based, sports-based, swarm-based, plant-based, light-based, and water-based [15], [16], [17]. Based on this categorization, our proposal is swarm-based. The contributions of this work are summarized as follows:
1. MOBGOA is proposed as a wrapper-based feature selection method to determine features from the emails in the first stage.
2. The EGOA algorithm is adapted for the training of supervised Multi-Layer Perceptrons (MLPs) in the second stage.
3. The final SDS approach (MOBEGOAMLP) is tested on three spam datasets (SpamBase, SpamAssassin, and UK-2011 Webspam) using ten statistical measures.
Section II provides the background of this study. Section III presents related research. Section IV discusses the methodology. Section V describes the performance evaluation. The assessment of the contributions is presented in Section VI, alongside results and discussion. The conclusion is in Section VII.

II. BACKGROUND
A. GRASSHOPPER OPTIMIZATION ALGORITHM (GOA)
The GOA was inspired by the behaviour of grasshopper insects and is one of the metaheuristic algorithms that [18] presented in 2017. Grasshopper swarms go through two stages in their life cycle: nymphs and adults.
The nymph grasshopper travels slowly over short distances, which lets it take advantage of its habitat and consume all the vegetation in its way. The adult grasshopper, on the other hand, has two primary responsibilities: locating food and migrating. It has a greater region to explore because it can jump quite high and travel a long way to obtain food. We can infer that the grasshopper's two movements, slow movement over a small distance and abrupt movement over a wide distance, are indicative of exploitation and exploration, respectively. Grasshoppers prefer to move locally during the exploitation stage, whereas during exploration they prefer to wander over long distances in search of food. The accomplishment of these two tasks, as well as locating a food source, is a natural process for grasshoppers. The mathematical model presented in [19], which is replicated here, describes the grasshopper swarming behaviour as follows:

$$X_i = S_i + G_i + A_i \qquad (1)$$

where $X_i$ stands for the $i$th grasshopper's location, $S_i$ for the social interaction, $G_i$ for the gravity acting on the $i$th grasshopper, and $A_i$ for the wind advection. Eq. (1) can be expanded by substituting the expressions for $S_i$, $G_i$, and $A_i$, and then rewritten as follows:

$$X_i = \sum_{j=1,\, j \neq i}^{N} s\left(\left|x_j - x_i\right|\right)\frac{x_j - x_i}{d_{ij}} - g\hat{e}_g + u\hat{e}_w \qquad (2)$$

where $N$ is the number of grasshoppers and $s(r) = f e^{-r/l} - e^{-r}$ is a function that simulates the effects of the social interactions, with $f$ the intensity of attraction and $l$ the attractive length scale. The term $g\hat{e}_g$, where $g$ is the gravitational constant and $\hat{e}_g$ is a unit vector pointing toward the centre of the earth, is the expanded $G_i$ component. The expanded $A_i$ component is represented as $u\hat{e}_w$, where $u$ is a constant drift and $\hat{e}_w$ is a unit vector in the direction of the wind. Here $d_{ij} = |x_j - x_i|$ denotes the distance between the $i$th and $j$th grasshoppers. Because grasshoppers reach their comfort zones rapidly and the swarm then converges poorly, and because the effects of wind and gravity are much weaker than the interactions between grasshoppers, this mathematical model is modified as follows:

$$X_i^d = c\left(\sum_{j=1,\, j \neq i}^{N} c\,\frac{ub_d - lb_d}{2}\, s\left(\left|x_j^d - x_i^d\right|\right)\frac{x_j - x_i}{d_{ij}}\right) + \hat{T}_d \qquad (3)$$

In Eq. (3), $ub_d$ and $lb_d$ stand for the upper and lower bounds in the $d$th dimension, respectively, and $\hat{T}_d$ denotes the best solution found so far in the $d$th dimension. The parameter $c$ must be reduced in accordance with the number of iterations: the more iterations have elapsed, the more exploitation is encouraged. The coefficient $c$, which shrinks the comfort zone as the iterations proceed, is calculated as follows:

$$c = c_{max} - Iter\,\frac{c_{max} - c_{min}}{iter_{max}} \qquad (4)$$

In Eq. (4), $c_{max}$ and $c_{min}$ stand for the maximum and minimum values of $c$, respectively, $Iter$ stands for the current iteration, and $iter_{max}$ denotes the maximum number of iterations.
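To make Eqs. (2)-(4) concrete, the following minimal Python sketch performs one GOA position update with the linearly shrinking coefficient c. It is illustrative only, not the authors' implementation; the function names are placeholders, and the default values of f, l, c_max, and c_min are assumptions borrowed from the commonly cited GOA formulation rather than settings reported in this paper.

```python
import numpy as np

def s_func(r, f=0.5, l=1.5):
    """Social-interaction function s(r) = f*exp(-r/l) - exp(-r) used in Eq. (2)."""
    return f * np.exp(-r / l) - np.exp(-r)

def comfort_coeff(it, max_it, c_max=1.0, c_min=1e-5):
    """Linearly decreasing coefficient c from Eq. (4)."""
    return c_max - it * (c_max - c_min) / max_it

def goa_step(X, target, c, lb, ub):
    """One position update following Eq. (3).
    X: (N, D) grasshopper positions; target: (D,) best solution so far (T-hat);
    c: current comfort-zone coefficient; lb, ub: (D,) per-dimension bounds."""
    N, D = X.shape
    X_new = np.empty_like(X)
    for i in range(N):
        total = np.zeros(D)
        for j in range(N):
            if i == j:
                continue
            dist = np.linalg.norm(X[j] - X[i]) + 1e-12          # d_ij
            unit = (X[j] - X[i]) / dist                         # (x_j - x_i) / d_ij
            total += c * (ub - lb) / 2.0 * s_func(np.abs(X[j] - X[i])) * unit
        X_new[i] = np.clip(c * total + target, lb, ub)
    return X_new

# Tiny usage example: 5 grasshoppers in a 3-dimensional search space.
rng = np.random.default_rng(0)
lb, ub = np.zeros(3), np.ones(3)
X = rng.uniform(lb, ub, size=(5, 3))
target = X[0].copy()                  # stand-in for the best solution found so far
for it in range(1, 101):
    X = goa_step(X, target, comfort_coeff(it, 100), lb, ub)
```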
FIGURE 1. Classification of MOO algorithms, highlighting the methods used in this research.

B. JUSTIFYING THE GOA ALGORITHM
The inherent benefit of GOA is that it enhances convergence quality by merging single-based and population-based methods. The following are some additional advantages of the GOA that encourage scholars to use it to address classification problems [19]:
• During their initial search, grasshoppers can make a number of abrupt, large-step hops and can automatically seek out areas where potentially superior solutions have already been discovered.
• The search is carried out through an automatic transition from exploratory movement to locally focused exploitation. As a result, the GOA converges quickly in the initial phases of the iteration process.
• The GOA updates a position by taking into account not only the current position of the grasshopper and the position of the target, but also the positions of every other grasshopper.
• The majority of metaheuristic algorithms use pre-tuned, preset parameters. The GOA, in contrast, uses parameter control, varying the values of the parameters (C2 and C1) throughout each cycle. This helps the GOA switch automatically from exploration to exploitation when that is the optimal course of action.

C. MULTI-OBJECTIVE OPTIMIZATION (MOO)
MOO is important because it helps make the best decision possible, especially when there are trade-offs between at least two different objective functions. It may involve increasing or decreasing several competing objective functions [20]. An n-objective minimization problem is formulated as follows:

$$\text{Minimise:}\quad F(x) = [f_1(x), f_2(x), f_3(x), \ldots, f_n(x)] \qquad (5)$$

$$\text{Subject to:}\quad g_i(x) \le 0,\; i = 1, 2, 3, \ldots, m; \qquad h_i(x) = 0,\; i = 1, 2, 3, \ldots, l \qquad (6)$$

where $x$ is a decision vector and $n$ is the total number of objective functions to be minimised. The model in Eqs. (5)-(6) becomes a single-objective problem when $n$ equals 1, and the ideal solution minimises that single objective. When $n > 1$, $f_i(x)$ denotes the $i$th objective function, whereas $g_i(x)$ and $h_i(x)$ denote the inequality and equality constraint functions of the problem being maximised or minimised.
In MOO, a solution's quality is indicated by the trade-off between its $n$ different objectives. The optimum solutions to an MOO problem are all the non-dominated solutions; a solution $x$ dominates a solution $y$ if the following criteria are satisfied. These solutions form the Pareto set/front [20]:

$$\forall i:\; f_i(x) \le f_i(y) \quad \text{and} \quad \exists j:\; f_j(x) < f_j(y) \qquad (7)$$

MOO algorithms are used to gather a collection of trade-off, non-dominated solutions. A Pareto-optimal solution is one that is not dominated by any other solution in a given situation, and all such solutions define the Pareto front, a trade-off surface [21]. Scalar methods, criterion-based methodologies, dominance-based methodologies, and indicator-based approaches are the four main groups of MOO metaheuristics. Figure 1 presents further details [22]; this graphic also shows the MOO approach that we propose.
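As a small illustration of the dominance test in Eq. (7), the sketch below checks whether one objective vector dominates another and filters a set of solutions down to its Pareto front (all objectives are assumed to be minimised, as in Eq. (5)). It is a generic helper written for this explanation, not code from the paper.

```python
from typing import Sequence

def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """Eq. (7): x dominates y if it is no worse on every objective
    and strictly better on at least one."""
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep only the non-dominated objective vectors (the Pareto set)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Example with three (feature count, error rate, FPR) vectors:
print(pareto_front([(15, 0.025, 0.033), (20, 0.030, 0.040), (10, 0.028, 0.035)]))
# -> [(15, 0.025, 0.033), (10, 0.028, 0.035)]
```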
1) SCALAR APPROACHES
This group of MOO metaheuristics includes approaches that transform an MOO problem into a single-objective problem or a collection of such problems; the strategy adopted in Section II follows this scalar methodology. The methodology includes the aggregation strategy, weighted metrics, goal programming, achievement functions, goal attainment, and ε-constraint techniques. Scalarization techniques are used to construct Pareto-optimal solutions, which is their justification. The scalar method is an a priori technique; it requires sufficient preference information to be communicated before the solution procedure. Commonly used examples of a priori methods are the utility function strategy, goal programming, and the lexicographic technique.

2) AGGREGATION METHOD
The aggregation (or weighted aggregation) approach is one of the most important and frequently used methods for producing Pareto-optimal solutions. In this approach, an aggregation function is used to combine numerous objective functions $f_i$ linearly into a single objective function $f$, converting an MOO problem into a single-objective problem:

$$f(x) = \sum_{i=1}^{n} \omega_i f_i(x) \qquad (8)$$

where the weights $\omega_i \in [0, 1]$ and $\sum_{i=1}^{n} \omega_i = 1$. The trade-off in FS for SDS comprises the minimisation of the classification error rate, the reduction of false alarms, and the number of selected features. FS for SDS is thus posed as a three-objective minimization problem. Several approaches are used to optimise the FS process; the evaluation of this particular technique in this research, however, was motivated by the fact that the multi-objective binary GOA algorithm for FS in SDS has not been studied recently [22].

III. RELATED WORK
Other publications in the literature address different detection strategies; however, knowledge detection remains the most popular strategy because of its effectiveness in detecting new messages, even though concentrating on knowledge detection is expensive. The hybrid detection strategy has made some progress in recent years [23], [24], but it is still a long way behind the knowledge approach and the guideline approach.
Additionally, other SDSs based on knowledge detection have been developed [25], employing a variety of techniques and classifiers. Despite the rise in the usage of hybrid classifiers and ensemble classifiers [26], single classifiers are still used and can produce high-quality results. The UK-2011 Webspam, SpamBase, and SpamAssassin datasets are still the three most commonly used datasets for SDS performance evaluation in the literature, and ten measures are the evaluation criteria most papers use to evaluate the performance of their approaches [8].
Instead of using the complete feature space, much spam detection research uses a feature selection procedure to choose the best subset of features to represent the whole dataset [27]. Reducing the feature space can affect both the size of the dataset used and the classification performance of various algorithms [25]. This can be accomplished using a variety of techniques. Even though it takes more time and computing resources, the wrapper approach to feature selection performs better than the alternatives [28]. Intriguingly, email spam detection has recently seen a large increase in the adoption of nature-inspired methodologies.
A new hybrid SDS is proposed in [9] that uses a GA to select features, without a fixed number of selected features, with Bayesian theory as the fitness function and validation over different numbers of repetitions. From the 57 features in the SpamBase dataset, they retained only 38. The examples are finally classified by applying a Naive Bayes classifier to the reduced dataset. However, the large number of retained characteristics still results in a high-dimensional space.
Using particle swarm optimization (PSO) and correlation-based feature selection (CFS), [11] proposed a novel hybrid SDS. The CFS-PSO combination leads the method to create a logical model with enhanced performance. From the 24 features in the UK 2006 dataset, they extracted only 6 features. The instances are finally classified using MLP and NB classifiers on the reduced dataset, which results in classification AUCs of 16.13% and 8.23%.
In [27], a new hybrid SDS that incorporates the Water Cycle algorithm and Simulated Annealing (WCSA) was proposed. The WCSA is used to remove redundant and unnecessary features that could obstruct performance. The instances are finally classified using an SVM classifier on the reduced dataset. From the 57 features in the SpamBase dataset, they extracted 26 features.
In [25], the WOA develops solutions in its search space using the prey siege and encirclement process, bubble-net attack, and search-for-prey mechanisms in an effort to improve the solutions of the FS problem, and the cases are finally classified using a KNN classifier, which yields a classification accuracy of 94% on the reduced dataset. In addition, the FPA enhances the solutions of the FS problem using global and local search processes in a search space that is opposite to that of the WOA solutions. In effect, they employed every potential solution to the FS problem from both the solution search space and its opposite. Experiments were run in two stages to assess the performance of the suggested method; ten FS datasets from the UCI data repository were used for the tests in the first stage.
A new hybrid SDS using the Binary Firefly Algorithm (BFA) was proposed in [29]. Feature selection is based on a fitness function that relies on the accuracy obtained with a Naive Bayesian Classifier (NBC), and the BFA explores the space of candidate feature subsets. The FA approach has a sluggish convergence rate and requires expensive computing. Of the 57 features in the SpamBase dataset, 21 features were extracted. The examples are finally classified using an NBC on the reduced dataset, which yields a classification accuracy of 95.14%.
Using a Genetic Algorithm (GA) and Random Weight Network (RWN), [30] suggested a novel hybrid SDS. Using the RWN, the GA determines the optimum feature subsets based on the accuracy it has been able to accomplish. In spite of this, the GA uses a lot of resources. From the 57 features in the SpamAssassin dataset, they isolated only 25 features.
The examples are finally classified using RWN classifiers on the reduced dataset, which yields a 92% classification accuracy.
In [31], a novel spam classification technique using Naive Bayes (NB) and Support Vector Machines (SVM) was proposed. They retained 80 of the 140 features contained in the SpamAssassin dataset. Finally, the SVM and NB classifiers are used on the reduced dataset to classify the instances, achieving classification accuracies of 97% and 98%, respectively.
In [32], a web spam detection method was proposed that extracts novel feature sets from homepage source code and chooses a random forest (RF) as the classifier on the UK-2011 dataset. Finally, an RF classifier is used on the reduced dataset to classify the instances, achieving a classification accuracy of 93%.
To overcome the problem of false drift, [33] presented a Disposition Based Drift Detection Method (DBDDM). In order to determine the actual drift, this study uses the approximate randomization test to calculate the frequency of successive drifts and compares the frequency with a threshold. When the Naive Bayes (NB) and Hoeffding tree (HT) classifiers are used, it shows maximum accuracy gains of 24% and 28% and increases of 2.50 and 1.91 in average rank, respectively.
A novel hybrid SDS based on PSO and Fruit Fly Optimization (FFO) for feature selection was proposed in [34]. The FFO is utilised to optimise the PSO. These methods do not, however, perform as well in local and global searches. From the 57 features in the SpamBase dataset, they extracted only 10 features. The cases are finally classified using the FFOPSO algorithm on the reduced dataset.
In [35], a new SDS that incorporates the Harris Hawks Optimizer (HHO) was proposed. The HHO is used to remove redundant and unnecessary features that could obstruct performance. The instances are finally classified using a KNN classifier on the reduced dataset. A summary of the related work is given in Table 1.

TABLE 1. Comparison of related work.

In light of the MOBGOA algorithm, this study provides an SDS model using a different metaheuristic, MOBGOA, with the ultimate goal of multi-objective FS. The wrapper method of FS is used in this strategy, and the most promising updated GOA model is used to train the MLP because it is well suited to tackling the problems MLPs face.

IV. METHODOLOGY OF THE STUDY
The present methods have performed well in addressing the spam detection issue. The ideal system, however, has yet to be developed, as it must be able to detect all spam messages without creating false alerts in order to provide complete protection from spam. Researchers must contend with a number of obstacles, such as the constant growth of hacking tools, the vast array of existing and emerging data mining and machine learning approaches, and the high dimensionality of datasets.
A feature selection approach within the framework of spam detection is laid out in this section. Wrappers outperform filters and deliver better results, but use more processing resources [36]. For that reason, we used MOBGOA as the wrapper method to carry out the feature selection. The most efficient metaheuristic can handle this challenge, since the range of features is crucial [37].
Three phases are integrated into the implementation of the suggested SDS solution: preprocessing, feature selection, and classification. Figure 2 presents the suggested SDS.

FIGURE 2. System architecture for the proposed SDS.

A. PREPROCESSING PHASE
Different types of characteristics, including symbols and characters, are present in the datasets employed in the context of spam identification [38], [39]. The normalisation in Eq. (9) is used to normalise these numerical values. A feature extraction tool is used to transform the raw email formats into numerical values, as is the case with the SpamAssassin dataset; the tool may be found at https://github.com/7ossam81/EmailFeaturesExtraction [40]. Normalization's primary goal is to bring the numerical values of the various attributes into the same range. Before using a dataset in the training and testing phases, all of its characteristics must be normalized so that the feature values carry consistent semantics. Through Eq. (9), the values are transformed into the range [0, 1], putting all features on the same scale:

$$x_{new} = \frac{x_{current} - x_{min}}{x_{max} - x_{min}} \qquad (9)$$

Each feature vector in each of the three datasets used in this work has a class, which is either non-spam or spam email, so each entry in a dataset falls into either the non-spam or the spam category. Each class is assigned a numeric value: the non-spam email class is assigned the number 0 and the spam email class is assigned the number 1. Preprocessing an entire dataset takes time, since it is large to load into memory, so records from the dataset are chosen at random as samples. Two subsets of this random sample are then created and used as the training and testing datasets.
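The min-max normalisation of Eq. (9) can be expressed in a few lines of Python. This is a generic sketch of the transformation described above, not the authors' preprocessing code; the column-wise handling of constant features is an added assumption.

```python
import numpy as np

def min_max_normalize(X: np.ndarray) -> np.ndarray:
    """Scale every feature column into [0, 1] following Eq. (9):
    x_new = (x_current - x_min) / (x_max - x_min)."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # avoid division by zero
    return (X - x_min) / span

# Example: three instances with two features on very different scales.
X = np.array([[0.0, 100.0],
              [5.0, 250.0],
              [10.0, 400.0]])
print(min_max_normalize(X))   # every column now lies in [0, 1]
```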
B. THE FEATURE SELECTION PHASE
1) DESIGN OF THE MULTI-OBJECTIVE BINARY GRASSHOPPER OPTIMIZATION ALGORITHM (MOBGOA)
The most important issue to consider when developing a reliable approach for spam detection systems (SDS) is to focus on two stages: 1) selecting important features and excluding unimportant features from the email data; and 2) developing an approach with a high potential for detecting spam email. The general feature selection procedure used in this study is divided into five basic steps.
The first step begins with initialising the original feature set found in each of the three datasets. The dimensionality of the search space frequently determines the initialization method for the multi-objective binary GOA algorithm; in this paradigm, the dimension is defined as the total number of possible features. This step corresponds to the initialization phase of MOBGOA.
The candidate features are then discovered in the second step. It is a discovery procedure that starts with the creation of random subsets of features, which MOBGOA treats as potential solutions.
The third step is an evaluation procedure for the candidate features. It begins by using the E2GOAMLP algorithm to train multi-layer neural networks; the E2GOAMLP algorithm is described in detail in our earlier research [41]. The feature selection process is one of the most crucial processes and is based on a wrapper algorithm; this step is critical in directing the algorithm's selection of an optimal subset of attributes.
The fourth step is a conditioning procedure that determines the relevant, or optimal, feature subset. It decides whether to continue or stop the search for other feature subsets by testing the stop criterion; here, the stop criterion is reaching either the maximum number of predefined iterations or a predefined number of selected features.

FIGURE 3. MOBGOA workflow.

In the fifth step, a result validation of the candidate features is performed against the three datasets, and the findings of this phase are reviewed against those of the earlier phases. The MOBGOA algorithm is illustrated in Figure 3, the general feature selection process in Figure 5, and the following subsections provide additional detail on the method's main components.

2) WRAPPER FEATURE SELECTION METHOD USING EGOAMLP
MOBGOA is employed as a wrapper-based feature selection algorithm. As a result, a wrapper classifier is required for the MOBGOA algorithm to evaluate the subsets. In other words, the MOBGOA algorithm presented in this section is the multi-objective binary feature selector, and the EGOAMLP-based evaluator is the wrapper classifier. Figure 3 shows the important role played by the EGOAMLP algorithm in the bottom loop of the workflow.
With each new generation, the MOBGOA algorithm generates new solutions (a new subset of features is generated). Each subset is fed into the MLP trained by the best enhanced GOA (introduced in our previous work [41]). The new feature set is applied, and feedback on it is obtained from the performance of the E2GOAMLP algorithm, which calculates the three objectives and ranks the new solution.

3) MOBGOA PARAMETERS
The MOBGOA algorithm uses the same parameters as the original GOA model. C1, C2, and the maximum number of generations to hunt for solutions are the control parameters in GOA. In this study, the maximum number of generations was 1000 and the population size NP was 50. The MOBGOA algorithm is run 100 times, and the generations in each experiment are terminated upon reaching the maximum number. The number of features in each dataset used in the study determines the size of the solution space.
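The wrapper loop described above can be summarised with the following Python sketch: candidate binary feature masks are proposed each generation and each mask is scored by training and evaluating a classifier on the reduced feature set. It is illustrative only; scikit-learn's MLPClassifier stands in for the EGOA-trained MLP of the paper, the masks are generated randomly rather than by the GOA update, and only the population size (50) and generation limit (1000) follow the settings quoted above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def evaluate_subset(mask, X_tr, y_tr, X_te, y_te):
    """Wrapper evaluation of one binary feature mask: train on the reduced
    feature set and return (feature fraction, error rate, false positive rate)."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:                                   # an empty subset is useless
        return 1.0, 1.0, 1.0
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=300)
    clf.fit(X_tr[:, cols], y_tr)                         # stand-in for EGOAMLP training
    y_pred = clf.predict(X_te[:, cols])
    err = float(np.mean(y_pred != y_te))
    ham = (y_te == 0)
    fpr = float(np.mean(y_pred[ham] == 1)) if ham.any() else 0.0
    return cols.size / mask.size, err, fpr

def wrapper_search(X_tr, y_tr, X_te, y_te, pop_size=50, max_gen=1000, seed=0):
    """Skeleton of the generational wrapper search: MOBGOA would propose the
    masks via its position update; random masks are used here for illustration."""
    rng = np.random.default_rng(seed)
    history = []
    for gen in range(max_gen):
        masks = rng.random((pop_size, X_tr.shape[1])) > 0.5
        for mask in masks:
            history.append((mask, evaluate_subset(mask, X_tr, y_tr, X_te, y_te)))
    return history   # aggregation of the three objectives follows Eq. (10)
```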
The network traffic is set 527 aside as a dataset that is typically handled as a table, with 528 each row addressing a particular occurrence and each column 529 addressing a different network element. 530 In the MOBGOA, a solution is represented by an n-bit 531 string, where n is the total No. of features in the dataset. The 532 solution’s (xd ) value at the d th place is in the range [0, 1], 533 showing the likelihood that the d th feature will be selected. 534 Using the threshold is an additional strategy. A threshold (θ) 535 is used to determine whether or not a feature is selected. 536 If (xd > θ), the d th feature is enabled; otherwise, it is not. 537 Thus, the normal features are used to create the new sub- 538 features. MOBGOA employs the threshold strategy. A novel 539 feature in Figure 4 that can be seen as a potential solution is 540 a subset that is uniquely recognised by a binary string. 541 5) MULTI-OBJECTIVE OPTIMIZATION (MOO) 542 The concept ofMOOof theMOBGOAmodel is themain fea- 543 ture. The coordination of binary strings onmultiple objectives 544 to evaluate solutions in feature selection (FS) problems, rather 545 than visualising on one criterion as accuracy. If the needed 546 solution is a minimization problem, that is, the minimum 547 value of the fitness role, the result is best, and vice versa 548 for maximisation problems. If many goals that require a 549 corresponding fitness function are found, there is a potential 550 VOLUME 10, 2022 98481 S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System FIGURE 4. Representation of a possible solution as binary string. FIGURE 5. The general feature selection process. conflict between their judgement about the quality of the551 same solution.552 It is worth mentioning that the three objectives with their553 desirable characteristics are described above:554 • FS→ to be minimum555 • ER→ to be minimum556 • FPR→ to be minimum557 The weighted aggregation objective (WAO) that MOB-558 GOA uses to determine performance subsets of feature sets559 of MLP ratings is illustrated:560 (WAO) = w1×FS + w2×ER+ w3 × FPR (10)561 FromEq. (10), wherew1 refers to the feature weight andw2562 refers to the error weight, then w3 refers to the stand for false563 positive weight. Furthermore, the weights w2 and w3 refer to564 more than w1. In addition, the number of selected features565 (FS) is no more important than the false positive rate (FPR)566 of error rate (ER). 
6) COMPUTATIONAL COMPLEXITY
The computational complexity of the MOBGOA algorithm is essentially defined by the dimensionality of the solutions, D, and the number of solutions in the population, the population size NP. The total computational complexity in the worst case is O(NP·D) ≈ O(calculating the GOA position of all solutions and evaluating their fitness) + O(sorting the solutions of the population and the GOA population).
The time complexity of one generation of the MOBGOA algorithm is analysed as follows. Creating the starting population is the primary activity of Stage 1, with time complexity O(NP·D). Stage 2, decision-making based on the stop/termination criterion, has time complexity O(1). Stage 3, calculating the value of the aggregated objective from the three objectives, namely the number of features (NF), the error rate (ER), and the false positive rate (FPR), has time complexity O(1). Stage 4, updating the solutions, has time complexity O(N). Stage 5 continues the generation process and returns to Stage 2. Consequently, the MOBGOA algorithm's time complexity is O(NP·D).

FIGURE 6. Integrating MOBGOA with EGOAMLP.

7) INTEGRATING MOBGOA WITH E2GOAMLP FOR SPAM DETECTION
This design is a spam detection strategy based on an EGOA-trained MLP and a set of optimised features. There are two primary components to this goal: feature selection is the first stage, and classification is the second stage. The MOBGOA method handles the feature selection portion, while the MLP trained with the E2GOAMLP algorithm handles the classification. Figure 6 depicts how each of these components fits into the overall spam detection picture.
It is worth mentioning that E2GOAMLP is also utilised as the wrapper classifier for feature selection within the MOBGOA box in the diagram; the selection of the best features uses the MLP trained by E2GOAMLP as the wrapper classifier. After the features are extracted through MOBGOA, the performance of the extracted features and the resulting spam model, termed MOBE2GOAMLP, are both tested using this unified model in the following experimental assessments.
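The two-stage integration can be summarised by a short sketch: stage one runs the wrapper feature selection to obtain a binary mask, and stage two trains the final MLP on the reduced feature set and predicts on the test set. The function names run_mobgoa_feature_selection and train_egoamlp_classifier are placeholders for the components described above, not APIs from the paper, so this is an outline of the data flow rather than an implementation.

```python
import numpy as np

def spam_detection_pipeline(X_train, y_train, X_test,
                            run_mobgoa_feature_selection,
                            train_egoamlp_classifier):
    """Two-stage MOBE2GOAMLP-style pipeline (illustrative outline only)."""
    # Stage 1: wrapper feature selection -> binary mask over the feature columns.
    mask = run_mobgoa_feature_selection(X_train, y_train)
    cols = np.flatnonzero(mask)

    # Stage 2: classification with the EGOA-trained MLP on the reduced features.
    model = train_egoamlp_classifier(X_train[:, cols], y_train)
    y_pred = model.predict(X_test[:, cols])
    return y_pred, cols.size       # predictions and number of selected features
```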
The split of the 628 raw data into the training and testing sets is a crucial stage. 629 The following component uses it as input information. The 630 approaching data sources should fit into the range of 0 to 631 1 before the data is fed into the ANN model. This normali- 632 sation technique is important for the training in the following 633 module. 634 The third stage is when the MLP model begins to function 635 after receiving training features for the input data measure- 636 ment from the information input components. This part is 637 designed as an MLP, or organisation, using Feed-Forward 638 Neural Networks (FFNN). The three-layered neurons that 639 make up the MLP’s design are divided into an info layer, 640 a concealed layer, and a yield layer. The MLP module 641 98482 VOLUME 10, 2022 S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System FIGURE 6. Integrating mobgoa with EGOAMLP. receives the information from the information input mod-642 ule that is regarded as the designing information (designing643 dataset) for designing the MLP. It is noteworthy that the644 EGOA component receives the loads and inclinations in order645 to carry out the preparation interaction in this module.646 The EGOA module is used in the fourth stage as a stand-647 alone framework (Black Box) to create novel arrangements648 that rely on the periodic refreshing of synaptic loads and649 inclinations. The EGOA module delivers each arrangement650 as a collection of loads and predispositions into the MLP651 component during each cycle of the preparation interaction.652 In this way, each preparation dataset-dependent arrangement653 is evaluated, and then its wellness values are restored. In this654 work, the Mean Square Error (MSE) and Fitness Function655 (FF) are used to process wellbeing. By reducing the MSE656 estimation of the mistake rate, the loads and inclinations are657 acquired.658 Once the maximal number of cycles is reached, the prepa-659 ration interaction ends. The loads and predispositions knowl-660 edge base is then updated. The EGOA algorithm is linked to661 other systems for streamlining. As a result, the goal is under-662 stood as either increasing or decreasing a measure achieved663 through this FF. The goal of such a FF should be similar to664 its value in enhancing calculations. Other than that, its objec-665 tive is to reduce general error, similar to studying methods666 demonstrated by previous exams [42], [43]. Therefore, the667 FF stated before might apply any of the MLP error estimation668 equations or derive another wellness metric from the recipes.669 MSE is used in this work as the primary quality component of670 the proposed EGOA preparation calculation. The preparation671 goal is to, at its most basic, restrict the MSE to arriving at the672 highest aggregate of emphasis.673 The best classification, approximation, or prediction accu-674 racy for training and testing samples is the main goal of675 training theMLP. Figure 7 shows the forward pass calculation676 measure. The fitness function was calculated in this work677 using a methodology that has been employed in a number678 of studies [42], [43]. The output of the ith hidden node is679 determined as follows: If the number of input nodes is N , the680 number of hidden nodes isH , and the number of output nodes681 is O.682 f ( Sj ) = Sigmoid (Sj)683 = 1 /( 1+ exp ( − (∑N i=1Wij.Xi − βj ))) ,684 j = , 2, . . . 
$$f(S_j) = \mathrm{sigmoid}(S_j) = \frac{1}{1 + \exp\left(-\left(\sum_{i=1}^{N} W_{ij}\,X_i - \beta_j\right)\right)}, \quad j = 1, 2, \ldots, H \qquad (11)$$

where $W_{ij}$ is the connection weight from the $i$th node in the input layer to the $j$th node in the hidden layer, $X_i$ is the $i$th input, $\beta_j$ is the bias (threshold) of the $j$th hidden node, and $S_j = \sum_{i=1}^{N} W_{ij}\,X_i - \beta_j$. After computing the hidden nodes' outputs, the final output can be described as follows:

$$O_k = \sum_{j=1}^{H} W_{kj}\,f(S_j) - \beta_k, \quad k = 1, 2, \ldots, O \qquad (12)$$

where $\beta_k$ is the bias (threshold) of the $k$th output node and $W_{kj}$ is the connection weight from the $j$th hidden node to the $k$th output node. The learning error $E$ (fitness function) is calculated as follows:

$$E_k = \sum_{i=1}^{O} \left(O_i^k - d_i^k\right)^2 \qquad (13)$$

$$MSE = \frac{\sum_{k=1}^{q} E_k}{q} \qquad (14)$$

where $d_i^k$ is the desired output of the $i$th output unit when the $k$th training sample is used, $O_i^k$ is the actual output of the $i$th output unit when the $k$th training sample is used, and $q$ is the number of training samples. Consequently, the fitness function of the $i$th solution is defined as:

$$Fitness(x_i) = MSE(x_i) \qquad (15)$$
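Equations (11)-(15) describe a standard single-hidden-layer forward pass scored by mean squared error. The NumPy sketch below, written for illustration rather than taken from the authors' implementation, shows how a candidate solution (a flat vector of weights and biases, as produced by the EGOA trainer) could be evaluated; the packing order of the weight vector is an assumption.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def mlp_forward(x, W_ih, b_h, W_ho, b_o):
    """Eqs. (11)-(12): one forward pass through an N-H-O perceptron."""
    hidden = sigmoid(W_ih @ x - b_h)      # f(S_j), j = 1..H
    return W_ho @ hidden - b_o            # O_k,   k = 1..O

def mse_fitness(solution, X, D, n_in, n_hidden, n_out):
    """Eqs. (13)-(15): unpack a flat solution vector into weights and biases,
    then return the mean squared error over the q training samples."""
    i = 0
    W_ih = solution[i:i + n_hidden * n_in].reshape(n_hidden, n_in); i += n_hidden * n_in
    b_h = solution[i:i + n_hidden]; i += n_hidden
    W_ho = solution[i:i + n_out * n_hidden].reshape(n_out, n_hidden); i += n_out * n_hidden
    b_o = solution[i:i + n_out]
    errors = [np.sum((mlp_forward(x, W_ih, b_h, W_ho, b_o) - d) ** 2)   # E_k
              for x, d in zip(X, D)]
    return float(np.mean(errors))                                        # MSE, Eq. (14)
```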
V. PERFORMANCE EVALUATION
A. SPAM DATASETS
The evaluation of the proposed ANN system for the specific purpose of SDS calls for benchmark datasets suited to this particular framework, unlike generic classification datasets. In this section, three datasets that can be used to test SDSs are briefly described.

1) SPAMBASE DATASET
Hopkins provided the SpamBase dataset in 1999 [44], and several authors have utilised it for categorization. The dataset contains 4601 emails described by 57 attributes, of which 1813 (39%) are spam and 2788 (61%) are not. The dataset's features are all displayed in Table 2. Six of the features give the percentage of times the special characters ';', '(', '[', '!', '$', and '#' appear, and three further features provide various measures of the capitalization used in the messages' text. Finally, each instance's class label is either 0 for non-spam or 1 for spam. The SpamBase dataset is one of the best for learning and assessment methodologies.

TABLE 2. All features of the SpamBase dataset.

2) SPAMASSASSIN DATASET
The most well-known and often used dataset for identifying spam is the SpamAssassin dataset, which Justin Mason created in 2002 [45]. Information about this dataset can be found at https://wiki.apache.org/spamassassin. The 6047 messages that comprise this dataset include 1897 unsolicited (spam) emails (31.4%), 3900 easy-ham emails, and 250 difficult but genuine emails that in many ways resemble spam. The characteristics of the SpamAssassin email messages are displayed in Table 3.

TABLE 3. All features of the SpamAssassin email messages.

3) UK-2011 WEBSPAM DATASET
The UK-2011 Webspam dataset consists of 3766 Web pages with 11 features, of which 1768 are non-spam and 1998 (53%) are spam, making the data unbalanced and hence more difficult. All of the dataset's features are listed in Table 4, and a detailed description of each feature can be found in [46] and [47].

TABLE 4. Analysis of UK-2011 Webspam email messages.

B. EVALUATION METRICS
The effectiveness of the proposed technique is evaluated using the following metrics: ACC, FAR, DR, specificity, sensitivity, F-measure, Matthews correlation coefficient (MCC), and G-mean (GM). The true positive (TP), true negative (TN), false positive (FP), and false negative (FN) cases are used to determine the FAR, DR, MCC, GM, and ACC. These four key quantities come from the confusion matrix for two-class classification in Table 5. The performance indicators used to describe the confusion matrix are listed in Table 6, and the performance metrics, given in Equations (16)-(25), are shown in Table 7.

TABLE 5. The confusion matrix for classification.
TABLE 6. Performance indicators used to describe the confusion matrix.
TABLE 7. Mathematical formulae of performance metrics.
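Although Tables 5-7 are not reproduced here, the metrics they summarise are the standard confusion-matrix quantities. The sketch below computes the metrics named above from TP, TN, FP, and FN using their conventional definitions, so it is a generic illustration rather than a copy of the paper's Equations (16)-(25).

```python
import math

def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard two-class metrics derived from the confusion matrix (Table 5)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = dr = tp / (tp + fn)           # detection rate / recall
    specificity = tn / (tn + fp)
    far = fp / (fp + tn)                        # false alarm (false positive) rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    g_mean = math.sqrt(sensitivity * specificity)
    return {"ACC": acc, "DR": dr, "FAR": far, "specificity": specificity,
            "sensitivity": sensitivity, "F-measure": f_measure,
            "MCC": mcc, "G-mean": g_mean}

# Example with hypothetical counts:
print(confusion_metrics(tp=880, tn=560, fp=19, fn=17))
```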
VI. EVALUATION OF MOBEGOAMLP
The suggested MOBGOA framework is thoroughly evaluated in relation to MOBEGOAMLP, thereby confirming the performance of the resulting SDS method. The three datasets given in Section V-A are used to test the approach.

A. SPAMBASE RESULTS
In Scenario 1, MOBGOA was first applied to the SpamBase dataset to select suitable features from the search space using the fitness function, reducing the 57 features to 15, as shown in Figure 2. The classification of the resulting features for training and the results obtained are presented in Figure 8. Classification results using the selected features extracted by MOBGOA are displayed for each training set in Table 8; the proposed MOBE2GOAMLP algorithm is highlighted in bold text.
As per Figure 9, the same results are described in a confusion matrix. Using the definitions in Section V-B, the experimental results of the suggested EGOAMLP models are calculated in Table 8. The spam detection model EGOAMLP achieves the best ratios across the three criteria, a DR of 98.1%, an ACC of 97.5%, and a FAR of 0.033, according to the acquired results, which were obtained using 15 features.
Figure 8 illustrates the convergence curves resulting from sample runs of the GOAMLP, E1GOAMLP, E2GOAMLP, E3GOAMLP, E4GOAMLP, E5GOAMLP, and E6GOAMLP algorithms against selected results from the SpamBase dataset. Figure 9 shows that the MOBGOA algorithm enhanced the classification accuracy by selecting a subset of 15 features. All the results in the matrices match those listed in Table 8. Due to the constrained space, Figure 9 presents only the revised proposed model MOBE2GOAMLP with confusion matrices; it is important to highlight that this selection was arbitrary.

TABLE 8. The classification results after using the algorithms against selected subsets testing of the SpamBase.
FIGURE 8. Convergence curves of the proposed algorithms for training on the SpamBase dataset.
FIGURE 9. Confusion matrices for MOBE2GOAMLP against the SpamBase dataset.

B. SPAMASSASSIN RESULTS
In Scenario 2, MOBGOA was first applied to the SpamAssassin dataset to select suitable features from the search space using the fitness function, reducing the 140 features to 48, as shown in Figure 2. The classification of the resulting features for training and the results obtained are presented in Figure 10. Classification results using the selected features extracted by MOBGOA are displayed for each training set in Table 9; the proposed MOBE2GOAMLP algorithm is highlighted in bold text. As per Figure 11, the same results are described in a confusion matrix. The spam detection model EGOAMLP achieves the best ratios across the three criteria, a DR of 98.3%, an ACC of 98.3%, and a FAR of 0.018, according to the acquired results, which were obtained using 48 features.
Figure 10 illustrates the convergence curves resulting from sample runs of the GOAMLP, E1GOAMLP, E2GOAMLP, E3GOAMLP, E4GOAMLP, E5GOAMLP, and E6GOAMLP algorithms against selected results from the SpamAssassin dataset. Figure 11 shows that the MOBGOA algorithm enhanced the classification accuracy by selecting a subset of 48 features. All the results in the matrices match those listed in Table 9. Figure 11 presents the new proposed model MOBE2GOAMLP using confusion matrices due to the space constraints; it should be mentioned that this selection was arbitrary and that the intention was to showcase the most effective trainers using SpamAssassin.

TABLE 9. The classification results after using the algorithms against selected subsets testing of the SpamAssassin.
FIGURE 10. Convergence curves of the proposed algorithms for training on the SpamAssassin dataset.
FIGURE 11. Confusion matrices for MOBE2GOAMLP against the SpamAssassin dataset.

C. UK-2011 WEBSPAM RESULTS
In Scenario 3, MOBGOA was first applied to the UK-2011 Webspam dataset to select suitable features from the search space using the fitness function, reducing the 11 features to 5, as shown in Figure 2. The classification of the resulting features for training and the results obtained are presented in Figure 12. Classification results using the selected features extracted by MOBGOA are displayed for each training set in Table 10; the proposed MOBE2GOAMLP algorithm is highlighted in bold text. As per Figure 13, the same results are described in a confusion matrix.
From the discussion in Subsections A and B, it is apparent that on the SpamAssassin and SpamBase datasets, using MOBGOA feature selection has improved the overall performance of the E2GOAMLP classifier. Figure 12 shows the convergence curves resulting from sample runs of the GOAMLP, E1GOAMLP, E2GOAMLP, E3GOAMLP, E4GOAMLP, E5GOAMLP, and E6GOAMLP algorithms against selected results of the UK-2011 Webspam dataset. Figure 13 illustrates that the MOBGOA algorithm enhanced the classification accuracy by selecting a subset of 5 features. All the results in the matrices match those listed in Table 10. Figure 13 presents the new proposed model MOBE2GOAMLP using confusion matrices due to the space constraints.

TABLE 10. The classification results after using the algorithms against selected subsets testing of the UK-2011.
FIGURE 12. Convergence curves of the proposed algorithms for training on the UK-2011 dataset.
FIGURE 13. Confusion matrices for MOBE2GOAMLP against the UK-2011 Webspam dataset.
It should be mentioned that this selection was arbitrary and that the intention was to showcase the most effective trainers using the UK-2011 dataset.

D. THE ADVANTAGE OF THE MOBGOA
Table 11 compares the outcomes of analysing the resulting EGOAMLP models and the final MOBEGOAMLP models using the three datasets. The evaluation comprised a comparison between the MOBEGOAMLP models, which used the selected characteristics extracted by MOBGOA, and the EGOAMLP models, which used all features. ACC, DR, and FAR were used to gauge performance. The results clearly reveal that the most recent MOBEGOAMLP model exhibits superior classification ACC and DR across all datasets. These results offer the first proof of the benefit of the approach, with the final model showing a higher ACC and DR on all datasets, including SpamBase, SpamAssassin, and UK-2011 Webspam.

TABLE 11. Comparison between the EGOAMLP and MOBEGOAMLP.

E. COMPARISON OF THE RESULTS OF THIS STUDY AND THE PUBLISHED WORK
This section summarises the current state-of-the-art spam detection systems recorded in Table 12. The overall results are much more satisfactory and compare well with the others on the same datasets. This approach closely follows or matches the performance of the best-performing methods on the evaluation criteria, and the records have been classified more effectively by the proposed version than by the other techniques.

TABLE 12. Comparison of the study's findings with previously published research.

F. EVALUATION USING T-TEST
In this section, we analyse the statistics of the previous results in Table 13 and conduct a statistical t-test (T) to estimate the practical performance of the proposed algorithms compared with the standard algorithm (GOA). The proposed models' findings show statistically significant differences from those of the standard GOA method, with P values less than 0.05. In comparison with the standard GOA algorithm, the P values greater than 0.05 (underlined) are not significant. The table shows that, for all three datasets, the proposed models were consistently superior to the standard algorithm (GOA).

TABLE 13. Comparison between the MOBEGOAs and MOBGOA at α = 0.05 on a two-tailed t-test.

VII. CONCLUSION
This work introduces a novel method for SDS: MOBGOA feature selection combined with an EGOA-trained MLP. It centres on the pertinence of a modern algorithm, referred to as MOBGOA, for preparing the EGOAMLP. The MOB-EGOAMLP trained with the three datasets achieved accuracies of 97.5%, 98.3%, and 96.4%, respectively. The results of this study show the highly positive impact of this approach on delivering a better SDS. Future research efforts will develop and extend the approach so that it can robustly detect other malicious attacks such as phishing and botnets.

REFERENCES
[1] D. M. Ablel-Rheem, "Hybrid feature selection and ensemble learning method for spam email classification," Int. J. Adv. Trends Comput. Sci. Eng., vol. 9, no. 1.4, pp. 217–223, Sep. 2020.
[2] G. Mujtaba, L. Shuib, R. G. Raj, N. Majeed, and M. A. Al-Garadi, "Email classification research trends: Review and open issues," IEEE Access, vol. 5, pp. 9044–9064, 2017.
VII. CONCLUSION
This work introduces a novel method for SDS, the MOBGOA-trained EGOAMLP. It centres on the suitability of a recent algorithm, referred to as MOBGOA, for training EGOAMLP. The MOBEGOAMLP trained with the three datasets achieved accuracies of 97.5%, 98.3%, and 96.4%, respectively. The results of this study show the highly positive impact of this approach on delivering a better SDS. Future research will aim to develop and extend the approach so that it can robustly detect other malicious attacks, such as phishing and botnets.

REFERENCES
[1] D. M. Ablel-Rheem, "Hybrid feature selection and ensemble learning method for spam email classification," Int. J. Adv. Trends Comput. Sci. Eng., vol. 9, no. 1.4, pp. 217–223, Sep. 2020.
[2] G. Mujtaba, L. Shuib, R. G. Raj, N. Majeed, and M. A. Al-Garadi, "Email classification research trends: Review and open issues," IEEE Access, vol. 5, pp. 9044–9064, 2017.
[3] A. Kumari, N. Agrawal, and U. Lilhore, "Clustering malicious spam in email systems using mass mailing," in Proc. 2nd Int. Conf. Inventive Syst. Control (ICISC), Jan. 2018, pp. 870–875.
[4] S. A. A. Ghaleb, M. Mohamad, S. A. Fadzli, and W. A. H. M. Ghanem, "E-mail spam classification using grasshopper optimization algorithm and neural networks," Comput., Mater. Continua, vol. 71, no. 3, pp. 4749–4766, 2022.
[5] S. A. A. Ghaleb, M. Mohamad, S. A. Fadzli, and W. A. H. M. Ghanem, "Spam classification based on supervised learning using grasshopper optimization algorithm and artificial neural network," Commun. Comput. Inf. Sci., vol. 1347, pp. 420–434, Dec. 2021.
[6] M. Shuaib, S. M. Abdulhamid, O. S. Adebayo, O. Osho, I. Idris, J. K. Alhassan, and N. Rana, "Whale optimization algorithm-based email spam feature selection method using rotation forest algorithm for classification," Social Netw. Appl. Sci., vol. 1, no. 5, p. 390, May 2019.
[7] S. A. A. Ghaleb, M. Mohamad, S. A. Fadzli, and W. A. H. M. Ghanem, "An integrated model to email spam classification using an enhanced grasshopper optimization algorithm to train a multilayer perceptron neural network," Commun. Comput. Inf. Sci., vol. 1347, pp. 402–419, Dec. 2020.
[8] I. Idris, A. Selamat, N. T. Nguyen, S. Omatu, O. Krejcar, K. Kuca, and M. Penhaker, "A combined negative selection algorithm-particle swarm optimization for an email spam detection system," Eng. Appl. Artif. Intell., vol. 39, pp. 33–44, Nov. 2015.
[9] O. M. E. Ebadati and F. Ahmadzadeh, "Classification spam email with elimination of unsuitable features with hybrid of GA-naive Bayes," J. Inf. Knowl. Manage., vol. 18, no. 1, Mar. 2019, Art. no. 1950008.
[10] A. Karim, S. Azam, B. Shanmugam, and K. Kannoorpatti, "An unsupervised approach for content-based clustering of emails into spam and ham through multiangular feature formulation," IEEE Access, vol. 9, pp. 135186–135209, 2021.
[11] A. K. Singh and S. Singh, "Detection of spam using particle swarm optimisation in feature selection," Pertanika J. Sci. Technol., vol. 26, no. 3, pp. 1–15, 2018.
[12] K. Wang, W. Mao, W. Feng, and H. Wang, "Research on spam filtering technology based on new mutual information feature selection algorithm," J. Phys., Conf. Ser., vol. 1673, no. 1, Nov. 2020, Art. no. 012028.
[13] R. A. Atta, "Spam classification using genetic algorithm," Iraqi J. Inf. Technol., vol. 9, no. 2, pp. 142–170, 2018.
[14] W. A. H. M. Ghanem and A. Jantan, "Novel multi-objective artificial bee colony optimization for wrapper based feature selection in intrusion detection," Int. J. Adv. Soft Comput. Appl., vol. 8, no. 1, pp. 70–81, 2016.
[15] B. Alatas and H. Bingol, "Comparative assessment of light-based intelligent search and optimization algorithms," Light Eng., vol. 28, no. 6, pp. 51–59, 2020.
[16] H. Bingol and B. Alatas, "Chaotic league championship algorithms," Arabian J. Sci. Eng., vol. 41, no. 12, pp. 5123–5147, Dec. 2016.
[17] H. Bingol and B. Alatas, "Chaos based optics inspired optimization algorithms as global solution search approach," Chaos, Solitons Fractals, vol. 141, Dec. 2020, Art. no. 110434.
[18] S. Saremi, S. Mirjalili, and A. Lewis, "Grasshopper optimisation algorithm: Theory and application," Adv. Eng. Softw., vol. 105, pp. 30–47, Mar. 2017.
[19] S. A. A. Ghaleb, M. Mohamad, S. A. Fadzli, and W. A. H. M. Ghanem, Integrating Mutation Operator Into Grasshopper Optimization Algorithm for Global Optimization, vol. 25, no. 13. Berlin, Germany: Springer, 2021.
[20] A. Saad, S. A. Khan, and A. Mahmood, "A multi-objective evolutionary artificial bee colony algorithm for optimizing network topology design," Swarm Evol. Comput., vol. 38, pp. 187–201, Feb. 2018.
[21] X. S. Yang, "Bat algorithm for multi-objective optimisation," Int. J. Bio-Inspired Comput., vol. 3, no. 5, pp. 267–274, 2012.
[22] E. G. Talbi, "A unified taxonomy of hybrid metaheuristics with mathematical programming, constraint programming and machine learning," Stud. Comput. Intell., vol. 434, pp. 3–76, Dec. 2013.
[23] Z. Hassani, V. Hajihashemi, K. Borna, and I. S. Dehmajnoonie, "A classification method for E-mail spam using a hybrid approach for feature selection optimization," J. Sci., Islamic Republic Iran, vol. 31, no. 2, pp. 165–173, 2020.
[24] A. Jantan, W. A. H. M. Ghanem, and S. A. A. Ghaleb, "Using modified bat algorithm to train neural networks for spam detection," J. Theor. Appl. Inf. Technol., vol. 95, no. 24, pp. 6788–6799, 2017.
[25] H. Mohmmadzadeh and F. S. Gharehchopogh, "A novel hybrid whale optimization algorithm with flower pollination algorithm for feature selection: Case study email spam detection," Comput. Intell., vol. 37, no. 1, pp. 1–28, 2020.
[26] J. R. Méndez, T. R. Cotos-Yañez, and D. Ruano-Ordás, "A new semantic-based feature selection method for spam filtering," Appl. Soft Comput., vol. 76, pp. 89–104, Mar. 2019.
[27] G. Al-Rawashdeh, R. Mamat, and N. H. B. A. Rahim, "Hybrid water cycle optimization algorithm with simulated annealing for spam E-mail detection," IEEE Access, vol. 7, pp. 143721–143734, 2019.
[28] T. Gangavarapu and C. D. J. B. Chanduka, Applicability of Machine Learning in Spam and Phishing Email Filtering: Review and Approaches. Amsterdam, The Netherlands: Springer, 2020.
[29] B. Ahmed, "Wrapper feature selection approach based on binary firefly algorithm for spam E-mail filtering," J. Soft Comput. Data Mining, vol. 2, no. 1, pp. 44–52, 2020.
[30] H. Faris, A.-Z. Ala'M, A. A. Heidari, I. Aljarah, M. Mafarja, M. A. Hassonah, and H. Fujita, "An intelligent system for spam detection and identification of the most relevant features based on evolutionary random weight networks," Inf. Fusion, vol. 48, pp. 67–83, Aug. 2019.
[31] H. B. Ozkan and B. Can, "Analysis of adversarial attacks against traditional spam filters," in Proc. Int. Conf. All Aspects Cyber Secur., 2019.
[32] J. Liu, Y. Su, S. Lv, and C. Huang, "Detecting web spam based on novel features from web page source code," Secur. Commun. Netw., vol. 2020, pp. 1–14, Dec. 2020.
[33] S. Agrahari and A. K. Singh, "Disposition-based concept drift detection and adaptation in data stream," Arabian J. Sci. Eng., vol. 47, no. 8, pp. 10605–10621, Aug. 2022.
[34] F. Soleimanian and S. K. Mousavi, "A new feature selection in email spam detection by particle swarm optimization and fruit fly optimization algorithms," J. Comput. Knowl. Eng., vol. 2, no. 2, pp. 49–62, 2019.
[35] A. S. Mashaleh, N. F. B. Ibrahim, M. A. Al-Betar, H. M. J. Mustafa, and Q. M. Yaseen, "Detecting spam email with machine learning optimized with Harris hawks optimizer (HHO) algorithm," Proc. Comput. Sci., vol. 201, pp. 659–664, Sep. 2022.
[36] E. Alba and J. F. Chicano, "Training neural networks with GA hybrid algorithms," in Proc. Genetic Evol. Comput. Conf. Berlin, Germany: Springer, 2004, pp. 852–863.
[37] S. Kang, J. Choi, and J. Choi, "A method of securing mass storage for SQL server by sharing network disks-on the Amazon EC2 windows environments," J. Internet Comput. Services, vol. 17, no. 2, pp. 1–9, Apr. 2016.
[38] A. A. Aburomman and M. B. I. Reaz, "A novel SVM-kNN-PSO ensemble method for intrusion detection system," Appl. Soft Comput., vol. 38, pp. 360–372, Jan. 2016.
[39] N. Saidani, K. Adi, and M. S. Allili, "A semantic-based classification approach for an enhanced spam detection," Comput. Secur., vol. 94, Jul. 2020, Art. no. 101716.
[40] W. Hijawi, H. Faris, J. Alqatawna, I. Aljarah, A. M. Al-Zoubi, and M. Habib, "EMFET: E-mail features extraction tool," 2017, arXiv:1711.08521.
[41] S. A. A. Ghaleb, M. Mohamad, S. A. Fadzli, and W. A. H. M. Ghanem, "Training neural networks by enhanced grasshopper optimization algorithm for spam detection system," IEEE Access, vol. 9, pp. 116768–116813, 2021.
[42] W. A. H. M. Ghanem and A. Jantan, Training a Neural Network for Cyberattack Classification Applications Using Hybridization of an Artificial Bee Colony and Monarch Butterfly Optimization, vol. 51, no. 1. Cham, Switzerland: Springer, 2020.
[43] W. A. H. M. Ghanem, S. A. A. Ghaleb, A. Jantan, A. B. Nasser, S. A. M. Saleh, A. Ngah, and A. C. Alhadi, "Cyber intrusion detection system based on a multiobjective binary bat algorithm for feature selection and enhanced bat algorithm for parameter optimization in neural networks," IEEE Access, vol. 10, pp. 76318–76339, 2022.
[44] Hopkins. (1999). UCI Machine Learning Repository: Spambase Data Set. Accessed: Nov. 1, 2021. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/spambase
[45] SpamAssassin. (2005). SpamAssassin Public Corpus, Kaggle. Accessed: Nov. 1, 2021. [Online]. Available: https://www.kaggle.com/beatoa/spamassassin-public-corpus
[46] H. A. Wahsheh, M. N. Al-Kabi, and I. M. Alsmadi, "A link and content hybrid approach for Arabic web spam detection," Int. J. Intell. Syst. Appl., vol. 5, no. 1, pp. 30–43, Dec. 2012.
[47] H. A. Wahsheh, M. N. Al-Kabi, and I. M. Alsmadi, "A link and content hybrid approach for Arabic web spam detection," Int. J. Intell. Syst. Appl., vol. 5, no. 1, pp. 30–43, Dec. 2012.
[48] K. F. Rafat, Q. Xin, A. R. Javed, Z. Jalil, and R. Z. Ahmad, "Evading obscure communication from spam emails," Math. Biosciences Eng., vol. 19, no. 2, pp. 1926–1943, 2021.
[49] A. Makkar and S. Goel, "Spammer classification using ensemble methods over content-based features," Adv. Intell. Syst. Comput., vol. 547, pp. 1–9, Jun. 2017.

SANAA A. A. GHALEB received the bachelor's degree from the University of Aden, Yemen, in 2011, and the master's degree from Universiti Sains Malaysia, Malaysia, in 2017. She is currently pursuing the Ph.D. degree with the Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin.
Her research interests include technology-enhanced learning, instructional design and technology, computer networks and information security, cybersecurity, machine learning, artificial intelligence, swarm intelligence, and metaheuristics.

MUMTAZIMAH MOHAMAD was born in Terengganu, Malaysia. She received the bachelor's degree in information technology from Universiti Kebangsaan Malaysia, in 2000, the M.Sc. degree in computer science from Universiti Putra Malaysia, and the Ph.D. degree in computer science from Universiti Malaysia Terengganu, in 2014. She was a Junior Lecturer, in 2000. Currently, she is an Associate Professor with the Department of Computer Science, Faculty of Informatics and Computing (FIK), Universiti Sultan Zainal Abidin, Terengganu, Malaysia. She has published over 50 research articles in peer-reviewed journals, book chapters, and proceedings. She has been appointed as a reviewer and technical committee member for many conferences and journals and has worked as a researcher on several nationally funded research and development projects. Her research interests include pattern recognition, machine learning, artificial intelligence, and parallel processing.

WAHEED ALI H. M. GHANEM received the B.Sc. degree in computer sciences and engineering from Aden University, Yemen, in 2003, and the M.Sc. degree in computer science and the Ph.D. degree in network and communication protocols from Universiti Sains Malaysia, in 2013 and 2019, respectively. His research interests include computer and network security, cybersecurity, machine learning, artificial intelligence, swarm intelligence, optimization algorithms, and information technology.

ABDULLAH B. NASSER (Member, IEEE) received the B.Sc. degree from Hodeidah University, Yemen, in 2006, the M.Sc. degree from Universiti Sains Malaysia, Malaysia, in 2014, and the Ph.D. degree from Universiti Malaysia Pahang, Malaysia, in 2018, all in computer science. He is currently an Assistant Professor with the Faculty of Computing, Universiti Malaysia Pahang. He has authored many scientific papers published in renowned journals and conferences. His research interests include software testing and soft computing, specifically the use of artificial intelligence methods (metaheuristic algorithms) for solving different software engineering problems.

MOHAMED GHETAS received the M.Sc. and Ph.D. degrees in computer science from Universiti Sains Malaysia (USM). He is a Lecturer with the Faculty of Computer Science, Nahda University. His research interests include cloud computing, fog computing, robust optimization, evolutionary algorithms, federated learning, artificial neural networks, and deep learning.

AKIBU MAHMOUD ABDULLAHI received the B.A. degree in Arabic language from Bayero University Kano, Nigeria, in 2011, the B.S. degree in information technology (IT) from Almadinah International University, Selangor, Malaysia, in 2016, the M.S. degree in instructional multimedia from Universiti Sains Malaysia (USM), Penang, Malaysia, in 2017, and the Ph.D. degree in computer science from Taylor's University, Malaysia, in 2021.
From 2016 to 2018, he was an IT Help Desk Technician at Labtech International Ltd., Malaysia. He is currently a Lecturer with Albukhary International University, Kedah, Malaysia. His research interests include data science, machine learning, learning analytics, and big data analytics.

SAMI ABDULLA MOHSEN SALEH received the B.Eng. degree in computer engineering from Hodeidah University, Yemen, in 2005, and the M.Sc. degree in electronic systems design engineering and the Ph.D. degree in computer vision and machine learning from Universiti Sains Malaysia, in 2013 and 2022, respectively. He was a Researcher at the Intelligent Biometric Group, School of Electrical and Electronic Engineering, Universiti Sains Malaysia. He is currently a Researcher with the Aerial Vehicle and Surveillance System Research Group, Aerospace Engineering School. His research interests include computer vision, deep learning, swarm intelligence, and soft biometrics. He has served as a Reviewer for several well-known conferences and international journals, such as the Pattern Recognition Letters journal.

HUMAIRA ARSHAD received the master's degree in information technology from the National University of Science and Technology (NUST), Pakistan, and the Ph.D. degree from the School of Computer Science, Universiti Sains Malaysia. She joined the Faculty of Computer Sciences & IT in 2004. She is an Associate Professor with the Department of Computer Sciences & IT, Islamia University of Bahawalpur, Pakistan. Her research interests include digital and social media forensics, information security, online social networks, cybersecurity, intrusion detection, reverse engineering, and the semantic web.

ABIODUN ESTHER OMOLARA received the Ph.D. degree from the School of Computer Sciences, Universiti Sains Malaysia. Her research interests include computer and network security, cybersecurity, cryptography, artificial intelligence, natural language processing, network and communication protocols, forensics, and IoT security.

OLUDARE ISAAC ABIODUN received the Ph.D. degree in nuclear and radiation physics from the Nigerian Defence Academy, Kaduna, and the Ph.D. degree in computer science from Universiti Sains Malaysia, Penang, Malaysia, with specialization in security and digital forensics. His research interests include artificial intelligence, robotics, cybersecurity, digital forensics, nuclear security, terrorism, national security, and IoT security.