Received 18 August 2022, accepted 29 August 2022, date of publication 5 September 2022, date of current version 22 September 2022.
Digital Object Identifier 10.1109/ACCESS.2022.3204593
Feature Selection by Multiobjective Optimization:
Application to Spam Detection System by Neural
Networks and Grasshopper Optimization
Algorithm
SANAA A. A. GHALEB 1,2,3,4, MUMTAZIMAH MOHAMAD4,
WAHEED ALI H. M. GHANEM 1,2,3,5, ABDULLAH B. NASSER6, (Member, IEEE), MOHAMED GHETAS7,
AKIBU MAHMOUD ABDULLAHI8, SAMI ABDULLA MOHSEN SALEH9, HUMAIRA ARSHAD10,
ABIODUN ESTHER OMOLARA11, AND OLUDARE ISAAC ABIODUN11
1Faculty of Engineering, University of Aden, Aden, Yemen
2Faculty of Education Aden, University of Aden, Aden, Yemen
3Faculty of Education Saber, University of Lahej, Lahej, Yemen
4Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, Terengganu 21300, Malaysia
5Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, Terengganu 32030, Malaysia
6School of Technology and Innovation, University of Vaasa, 65200 Vaasa, Finland
7Faculty of Computer Science, Nahda University, Beni Suef Governorate 62764, Egypt
8Faculty of Computing and Informatics, Albukhary International University, Kedah 05200, Malaysia
9School of Electrical and Electronic Engineering, Universiti Sains Malaysia, Pulau Pinang 14300, Malaysia
10Department of Computer Science, Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
11Department of Computer Science, University of Abuja, Gwagwalada 900110, Nigeria
Corresponding authors: Sanaa A. A. Ghaleb (sanaaghaleb.sg@gmail.com) and Waheed Ali H. M. Ghanem (waheed.ghanem@gmail.com)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
ABSTRACT Networks are strained by spam, which also overloads email servers and blocks mailboxes
with unwanted messages and files. Setting the protective level for spam filtering might become even more
crucial for email users when malicious steps are taken since they must deal with an increase in the number of
valid communications being marked as spam. By finding patterns in email communications, spam detection
systems (SDS) have been developed to keep track of spammers and filter email activity. SDS has also
enhanced the tool for detecting spam by reducing the rate of false positives and increasing the accuracy
of detection. The difficulty with spam classifiers is the abundance of features. The importance of feature
selection (FS) comes from its role in directing the feature selection algorithm’s search for ways to improve the
SDS’s classification performance and accuracy. As ameans of enhancing the performance of the SDS, we use
a wrapper technique in this study that is based on the multi-objective grasshopper optimization algorithm
(MOGOA) for feature extraction and the recently revised EGOA algorithm for multilayer perceptron
(MLP) training. The suggested system’s performance was verified using the SpamBase, SpamAssassin,
and UK-2011 datasets. Our research showed that our novel approach outperformed a variety of established
practices in the literature by as much as 97.5%, 98.3%, and 96.4% respectively.
15
16
INDEX TERMS Spam detection system (SDS), grasshopper optimization algorithm (GOA), feature selec-
tion (FS), multi-objective optimization (MOO), multilayer perceptron (MLP).
The associate editor coordinating the review of this manuscript and
approving it for publication was Bilal Alatas .
I. INTRODUCTION 17
Spam is electronic mail that is not requested yet is sent to 18
large numbers of people. Advertising is usually a known 19
type of spam and is the most common way to send it. But 20
VOLUME 10, 2022
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ 98475
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
it is incorrect to think that such a strategy is used only21
in commercial environments. The number of spam e-mails22
has grown over recent years, whereby a recipient regularly23
receives heaps of emails on a daily basis, of which 92% are24
spam [1]. Currently, the battle between spam detection tools25
and spammers is an ongoing battle as each side seeks new26
ways to neutralize the other’s presence [2].27
Ensuring the integrity and privacy of those statistics is turn-28
ing into an actual challenge. The Simple Mail Transfer Pro-29
tocol (SMTP) has the ability to transmit and receive emails30
over the internet, but it does not have security measures31
built into it. To quote the SMTP [3], the designers of these32
protocols are aware of the security challenges of SMTP. The33
SMTP protocol does not include data integrity, encryption,34
or authentication services [4].35
Spam to a private email can cause havoc throughout the36
system. Nowadays, it has created many problems in business37
life, such as occupying network bandwidth and the space in38
users’ mailboxes. Research has been conducted in this area39
to resolve this issue and spam detection systems (SDS) have40
been developed to monitor spammers and filter email activi-41
ties by identifying patterns in email messages, thus improving42
the tool to detect spam [5], [6].43
Both the knowledge filtering and the guideline filter-44
ing strategies are used to detect spam. Both have advan-45
tages and disadvantages, but neither is effective against46
all threats [7], [8]. The guideline detection method works47
well for identifying recognised communications but not48
spam [8]. In comparison, the knowledge detection strat-49
egy is effective at finding new messages, but it has a low50
detection rate and a high percentage of false positives [9].51
As such, our study introduces a new method. Most inves-52
tigations into spam detection in the literature have focused53
on the knowledge detection strategy since it seemed more54
promising.55
Recently, several methods, including machine learning,56
statistical analysis, and artificial intelligence techniques, have57
emerged in the field of knowledge detection [8], [9], [10],58
[11], [12]. Unsupervised, semi-supervised, and supervised59
machine learning techniques are the three types used, and in60
general, supervised learning performs better than the other61
techniques. Several Machine Learning Algorithms (MLA)62
can be employed for knowledge identification, including63
Naive Bayes (NB), Artificial Neural Networks (ANN),64
Support Vector Machines (SVM), and k-Nearest Neighbor65
(KNN) [10], [11], [12], [13].66
The majority of categorization information is highly67
dimensional, and for effectiveness and accuracy, natural68
dimensionality reduction is also required. As a result, the69
main disadvantage of content classification is its high dimen-70
sionality. The features with area addresses will act in con-71
junction with high dimensionality or an excessive array of72
options (a large assortment of vocabulary that consists of all73
the special terms that occur a minimum of once or over once74
within the collection of emails). Due to the performance of75
the majority of content classifiers, this drawback worsens the76
system as a whole. Additionally, it will make the systemmore 77
complex overall. 78
Dimensionality reduction is crucially required to handle 79
and combat high spatiality problems as well as mitigate their 80
effects. This work is centred on the dimensionality of unso- 81
licited mail email classifiers. 82
Thus, the feature selection mechanism may be a curse for 83
the dimensionality of the selection of appropriate features 84
and its classification. However, many features may be low- 85
ered, and the training time may increase with the elimina- 86
tion and reduction of redundant features, therby improving 87
the classification’s performance. This analysis discussed the 88
several drawbacks of the well-known methods used in earlier 89
feature selection studies. The two types of feature selection 90
algorithms are filters and wrappers. Gain ratio, information 91
gain, chi-squared, and correlation-based feature selection 92
are a few examples of statistical, information theory-based, 93
or searching methods that can be used to apply filters [13], 94
[14]. Wrappers evaluate and categorise capabilities using a 95
machine learning technique to determine the subset that, for 96
the most part, makes up the dataset. 97
They have been built entirely on the following components: 98
a learning algorithm of a set of rules thatmay be any classifier, 99
and a feature search, sequential search, genetic search, etc. 100
The wrapper technique often requires less processing than the 101
clear-out strategy, but the latter yields the best results [14]. 102
Some researchers have classified the proposals which 103
are based on artificial intelligence optimization algorithms 104
into the following categories: biology-based, social-based, 105
chemical-based, physics-based, mathematics-based, music- 106
based, sports-based, swarm-based, plant-based, light-based, 107
and water-based [15], [16], [17]. 108
Based on this categorization, our proposal is based on 109
swarms the contributions of this work are summarized as 110
follows: 111
1. The proposed MOBGOA as a wrapper-based feature 112
selection to determine features from the emails in the first 113
stage. 114
2. Adapted the EGOAMLPs for the training of supervised 115
Multi-Layer Perceptrons (MLPs) in a second stage. 116
3. The final SDS approach (MOBEGOAMLP) was tested 117
by three spam datasets (SpamBase, SpamAssassin, and 118
UK-2011 Webspam) on ten statistics. 119
Section II provides an overview of this study. Section III 120
presents related research. Section IV discusses the method- 121
ology. Section V discusses the performance evaluation. The 122
assessment of contributions is depicted in Section VI, along- 123
side results and discussions. The conclusion is in Section VII. 124
II. BACKGROUND 125
A. GRASSHOPPER OPTIMIZATION ALGORITHM (GOA) 126
The GOA was inspired by the behaviour of grasshopper 127
insects and is one of the metaheuristic algorithms that [18] 128
presented in 2017. The grasshopper swarms go through two 129
stages in their life cycle: nymphs and adults. The nymph 130
98476 VOLUME 10, 2022
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
grasshopper travels slowly over a short distance, which lets131
them take advantage of their habitat and consume all the veg-132
etation in their way. The adult grasshopper, on the other hand,133
has two primary responsibilities: locating food andmigrating.134
It has a greater region to explore because it can jump quite135
high and travel a longway to obtain food.We can infer that the136
grasshopper’s two movements, slow movement over a small137
distance and abrupt movement over a wide distance, are both138
indicative of exploitation and exploration. The grasshoppers139
prefer to move locally during the exploitation stage, whereas140
during exploration they prefer to wander over long distances141
in search of food. The accomplishment of these two tasks,142
as well as locating a food source, is a natural process for143
grasshoppers. The mathematical model presented in [19],144
which is replicated here, describes the grasshopper swarming145
behaviour as follows:146
Xi = Si + Gi + Ai (1)147
where Xi stands for the ith grasshopper’s location, Si for the148
social interaction,Gi for gravity acting on the ith grasshopper,149
and Ai for the wind advection. Eq. (1) can be expanded to150
include Si, Gi, and Ai, and then rewritten as follows:151
Xi =
∑N
j=1,j6=i s
(∣∣xj − xi∣∣)xj − xidij − geˆg + eˆw (2)152
where N is the number of grasshoppers and s (r) = fer/l −153
e−r is a function that simulates the effects of social inter-154
actions. geˆg where g is gravitational force and eˆg is a unit155
vector pointing toward the center of the earth, is the enlarged156
Gi component. The extended Ai component is represented as157
ueˆw, where u is a constant drift and eˆw is a unit vector heading158
toward the wind. Where dij equals
∣∣xi − xj∣∣ and denotes the159
separation between the ith and jth grasshoppers. The effects160
of wind and gravity are much smaller than the relationships161
between grasshoppers since they discover comfortable zones162
rapidly and have poor convergence, hence this mathematical163
model should be changed as follows:164
Xdi = c
(∑N
j=1,j6=i c
ubd − lbd
2
s
(∣∣∣xdj − xdi ∣∣∣ xj − xidij
))
+ Tˆd165
(3)166
In Eq. (3) the parameter stands in for the upper and lower167
ubd and lbd bounds in the Dth dimension, respectively, and168
Tˆd parameter denotes the best solution value in the Dth169
dimension at the time. As a result, the parameter c must be170
reduced in accordance with the quantity of iterations. The171
more iterations there are, the more exploitation is encouraged172
by this system. The calculation for the argument c, which173
shrinks the comfort zone according to iterations, is as follows:174
c = cmax−Iter cmax−cminitermax (4)175
In Eq. (4), the parameters cmin and cmax stand for the176
maximum and minimum values, respectively. Iter stands for177
the most recent iteration and itermax denotes the maximum178
number of iterations.179
FIGURE 1. Classification of moo algorithms, highlighting the methods
used in this research.
B. JUSTIFYING THE GOA ALGORITHM 180
The inherent benefit of GOA is that it enhances convergence 181
quality by merging single-based and population-based meth- 182
ods. The following are some additional advantages of the 183
GOA that encourage scholars to use it to address classifica- 184
tion issues [19]: 185
 During their initial search, grasshoppers can make a 186
number of abrupt large step hops and can automatically 187
seek into areas where potentially superior solutions have 188
already been discovered. 189
 The automatic transition from exploratory movement 190
to local focused exploitation is used to carry out this 191
search. As a result, the GOA converges quickly in the 192
initial phases of the iteration process. 193
 The GOA updates the position by taking into account 194
not only the current position of the grasshopper and the 195
position of the target, but also the positions of every other 196
grasshopper. 197
Themajority ofmetaheuristic algorithms use pre-tuned 198
preset parameters. The GOA, in contrast, used param- 199
eter control, which involved varying the values of the 200
parameters (C2 andC1) throughout each cycle. This aids 201
in automatically switching the GOA from exploration 202
to exploitation when searching is the optimal course of 203
action. 204
C. MULTI-OBJECTIVE OPTIMIZATION (MOO) 205
The MOO is important as it helps make the best decision 206
possible, especially when there are trade-offs between at least 207
two different objective functions. It may involve increasing 208
or decreasing several changing objective functions [20]. The 209
equation for an n-objective minimization challenge’s equa- 210
tion is as follows: 211
Minimise : F (x) = [f 1 (x) , f 2 (x) , f 3 (x) , . . . , fn (x)] 212
(5) 213
Subject to : gi (x) ≤ 0, i 214
= 1, 2, 3, . . .m, hi (x)= 0, i = 1, 2, 3, . . . l (6) 215
The total number of objective functions that must be low- 216
ered, where x is a selection vector, is n. The model in Eq. (6) 217
transforms into a single-objective issue when n is equal to 1, 218
and the perfect solution minimises the objective. On the other 219
hand, when n > 1, fi (x) denotes the objective function, 220
whereas gi(x) and hi(x) denote the utility functions of the 221
issue being maximised or minimised. 222
VOLUME 10, 2022 98477
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
In MOO, a solution’s nature is indicated by the trade-223
off between its n different aims. The optimum solutions224
to the MOO problems are all non-dominated arrangements225
if the following criteria are satisfied: x is dominant over y.226
The Pareto set/front refers to these solutions [20]:227
∀i: fi (x) ≤ fi (y) and ∃j:fi (x) < fi (y) (7)228
MOOs are used to gather a collection of trade-offs, draw-229
backs, or non-dominant options. The Pareto-optimal solution230
is one that does not outperform any other solution in a given231
situation.232
The Pareto front, a trade-off surface, is defined by all233
solutions [21]. Scalar methods, criterion-based methodolo-234
gies, dominance-based methodologies, and indicator-based235
approaches are the four main groups of MOOmetaheuristics.236
More information is shown in Figure 1 which presents further237
details [22]. This graphic also shows the MOO approach that238
we propose.239
1) SCALAR APPROACHES240
This group of MOO metaheuristics includes approaches that241
change a MOO problem into a single objective or a collec-242
tion of similar problems. The strategy shown in Section II is243
modified to become a scalar methodology. The methodology244
includes the accumulation strategy, weighted measurements,245
goal programming, achievement capacities, goal achieve-246
ment, and -constraint techniques. Scalarization techniques247
are used to construct Pareto ideal layouts, which is the justi-248
fication. The scalar method is an a priori technique; it calls249
for the communication of sufficient inclination data before250
the solution procedure. Commonly used examples of priori251
methods are the utility capacity strategy, goal programming,252
and lexicographic technique.253
2) AGGREGATION METHOD254
The aggregation (or weighted aggregation) approach is one of255
the most important and frequently used methods for produc-256
ing Pareto optimal solutions. In this approach, an aggregation257
function is used to connect numerous objective functions f_i258
linearly into a single objective function f, converting a MOO259
problem to a single-objective problem.260
f (x) =
∑n
i=1 ωifi (x) (8)261
where the weights ωi ∈ [0. . . 1] and∑ni=1 ωi = 1. The trade-262
off in FS for SDS comprises theminimisation of classification263
error rate, the reduction of false alarms, and the quantity of264
features. FS approaches for SDSs are thus given as a three265
objective minimization problem. Several approaches are used266
to optimise the FS process. The evaluation of this particular267
technique in this research, however, was motivated by the fact268
that themulti-objective binaryGOA algorithm for FS in SDSs269
has not been studied recently [22].270
III. RELATED WORK271
There are other related publications in the literature that272
address different detection strategies. However, knowledge273
detection remains to be the most popular strategy. Of course, 274
because of its effectiveness in detecting new messages, it is 275
expensive to concentrate on knowledge detection. The hybrid 276
detection strategy has made some progress in recent years 277
[23], [24], but it is still a long way from the knowledge 278
approach and the guideline approach. 279
Additionally, other SDSs based on knowledge detection 280
have been developed [25] employing a variety of techniques 281
and classifiers. Despite the rise in usage of hybrid classi- 282
fiers and ensemble classifiers [26], single classifiers are still 283
used and can produce high-quality results. The UK-2011 284
Webspam dataset, SpamBase, and SpamAssassin are still the 285
three most commonly used datasets for SDS performance 286
evaluation in the literature. Ten measures are the evaluation 287
criteria most papers use to evaluate the performance of their 288
approaches [8]. 289
Instead of using the complete feature space, many spam 290
detection-related research uses a feature selection procedure 291
to choose the best subset of features to represent the whole 292
dataset [27]. The size of the dataset utilised and the classifica- 293
tion performance of various algorithms can both be impacted 294
by reducing the feature space [25]. This can be accomplished 295
using a variety of techniques. Even though it takes more time 296
and computer resources, the wrapper approach for feature 297
selection performs better than the alternatives [28]. Intrigu- 298
ingly, email spam detection has recently seen a large increase 299
in the adoption of natural inspired methodologies. 300
With the help of Bayesian theory as a fitness function and 301
checked with a different number of repeats, a new hybrid 302
SDS is proposed in [9] that uses a GA algorithm to select 303
features without a fixed number of feature selections. From 304
the 57 features in the SpamBase dataset, they only obtained 305
38. The examples are finally classified using a Naive Bayes 306
classifier technique on the smaller data set. However, a large 307
number of characteristics contributed to a high-dimensional 308
space. 309
Using particle swarm optimization (PSO) and correlation- 310
based feature selection (CFS), [11] proposed a novel hybrid 311
SDS. The CFS-PSO leads the method to create a logical 312
model with enhanced performance. From the 24 features in 313
the UK 2006 dataset, they only managed to extract 6 features. 314
The instances are finally classified using an MLP and NB 315
classifier technique in the smaller data set, which results in 316
classification AUCs of 16.13% and 8.23%. 317
In [27] proposed a new hybrid SDS that incorporates the 318
Water Cycle and Simulated Annealing (WCSA). The WCSA 319
is used to remove redundant and unnecessary features that 320
could obstruct performance. The instances are finally classi- 321
fied using an SVM classifier technique in the smaller data set. 322
From the 57 features in the SpamBase dataset, they extracted 323
26 features. 324
The case is finally classified using a KNN classifier, which 325
yields a classification accuracy of 94% for the smaller dataset. 326
As part of the algorithm, WOA develops solutions in their 327
search space [25] using the prey siege and encirclement 328
process, bubble invasion, and search for prey methods in an 329
98478 VOLUME 10, 2022
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
effort to improve the FS problem’s solutions. In addition,330
FPA enhances the FS problem’s solutions using two global331
and local search processes in a search space that is opposite332
from the solutions ofWOA. In actuality, they employed every333
potential answer to the FS problem from both the solution334
search space and its opposite. Experiments were run in two335
steps to assess the performance of the suggested method. Ten336
FS datasets from the UCI data repository were used for the337
tests in the first stage.338
A new hybrid SDS using the Binary Firefly Algorithm339
(BFA) was proposed in [29]. The choice of a feature is340
based on a fitness function that is reliant on the acquired341
accuracy when using a Naive Bayesian Classifier (NBC),342
and BFA explores the space of the best feature subsets. The343
FA approach has a sluggish convergence rate and requires344
expensive computing. Of the 57 features in the Spambase345
dataset, 21 features were extracted. The examples are finally346
classified using an NBC algorithm in the smaller dataset,347
which yields a classification accuracy of 95.14%.348
Using a Genetic Algorithm (GA) and Random Weight349
Network (RWN), [30] suggested a novel hybrid SDS. Using350
RWN GA determines the optimum feature subsets based on351
the accuracy it has been able to accomplish. In spite of this,352
GA uses a lot of resources. From the 57 features in the353
SpamAssassin dataset, they only isolated 25 features. The354
examples are finally classified using an RWNclassifiers algo-355
rithm in the smaller dataset, which yields a 92% classification356
accuracy.357
In [31], proposed a novel spam classification technique358
using Naive Bayes (NB) and Support Vector Machines359
(SVM). They obtained 80 of the 140 features contained360
in the Spamassassin dataset. Finally, the SVM/NB clas-361
sifiers are used in the reduced dataset to classify the362
instances and achieve a classification accuracy of 97%,363
and 98%.364
In [32] a web spam detection method by extracting novel365
feature sets from the homepage source code and choosing366
the random forest (RF) as the classifier against the UK-2011367
dataset was proposed. Finally, an RF classifier is used in368
the reduced dataset to classify the instances and achieve a369
classification accuracy of 93%.370
To get over the problem of false drift, [33] presented371
a Disposition Based Drift Detection Method (DBDDM), a372
DBDDM. In order to determine the actual drift, this study373
uses the approximation randomization test to calculate the374
frequency of successive drift and compares the frequency375
with the threshold. When Naive Bayes (NB) and the Hoeffd-376
ing tree (HT) classifier are used, it shows a maximum gain377
in accuracy of 24% and 28% and an increase of 2.50 and378
1.91 average ranks, respectively.379
A novel hybrid SDS based on PSO and Fruit Fly Optimiza-380
tion (FFO) based on PSO for Feature Selection was proposed381
as a novel hybrid SDS in [34]. An FFO is utilised to optimise382
the PSO. These methods do not, however, perform as well in383
local and international searches. From the 57 features in the384
SpamBase dataset, they only extracted 10 features. The cases385
TABLE 1. Comparisons of related work.
are finally classified using an FFOPSO classifier algorithm 386
in the smaller data set. 387
In [35] proposed a new SDS that incorporates the Har- 388
ris Hawks Optimizer (HHO). The HHO is used to remove 389
redundant and unnecessary features that could obstruct per- 390
formance. The instances are finally classified using a KNN 391
classifier technique in the smaller data set. A summary of the 392
related work is given in Table 1. 393
In light of the MOBGOA algorithm, this study provided an 394
SDS model using a different metaheuristic, MOBGOA, with 395
the ultimate goal of MOO FS. The wrapper method of FS is 396
used in this strategy, and the most promising updated GOA 397
model is used to train the MLP because it is acclimated to 398
dealing with tackling the problems MLPs face. 399
IV. METHODOLOGY THE STUDY 400
The present methods have performed well in terms of 401
addressing the SD issue. The ideal device, however, has yet to 402
be developed, as it must be able to detect all messages without 403
creating a fake alert in order to provide complete protection 404
from spam. Researchers must contend with a number of 405
obstacles, such as the constant growth of hacking tools, the 406
vast array of existing and emerging data mining and machine 407
learning approaches, the high dimensionality of datasets, etc. 408
A function selection approach within the framework of 409
SD is laid out in this section. Wrappers outperform filters 410
and deliver better results, but use more processing resources 411
[36]. For that reason, we used MOBGOA as the wrapper 412
method to carry out the function selection. The most efficient 413
metaheuristic can handle this challenge since the range of 414
functions is crucial [37]. Three phases: preprocessing, feature 415
selection, and classification, are integrated into the imple- 416
mentation of the suggested SDS solution. Figure 2 presents 417
the system SDS suggested. 418
A. PREPROCESSING PHASE 419
Different types of characteristics, including symbols and 420
characters, are present in the dataset employed in the con- 421
text of spam identification [38], [39]. The normalisation 422
Eq. 9. is used to normalise these numerical values. 423
A feature extraction tool is used to transform the raw 424
email formats into numerical values, as is the case 425
VOLUME 10, 2022 98479
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
FIGURE 2. System architecture for the proposed SDS.
with the Spam Assassin data set, and it may be found426
at this URL:‘‘https://github.com/7ossam81/EmailFeatures427
Extraction’’ [40]. Normalization’s primary goal is to bring the428
numerical values of various attributes into the same range.429
Before using the dataset in the training and testing phases,430
all of the dataset’s characteristics must be normalized. In the431
datasets, the feature values are meant to give regular seman-432
tics. Through the use of Eq. (9), the values are transformed433
into the range [0, 1], putting all features on the same scale.434
xnew = xcurrent − xminxmax − xmin (9)435
Each collection of features in each of the 3 datasets utilised436
in this work has a class, which is either not spam or spam437
email. As a result, each entry in the dataset falls into either the438
non-spam or spam category. Each class’s value is assigned a439
numeric value, with the non-spam email class being assigned440
the No. 0 and the spam email class being assigned the number441
1. Preprocessing the entire dataset takes time since it is large442
to load into memory. Records from the dataset are chosen at443
random as samples. Then two subsets of this random sample444
are created; the first is referred to as the training and testing445
dataset.446
B. THE FEATURE SELECTION PHASE447
1) DESIGN OF MULTI-OBJECTIVE BINARY GRASSHOPPER448
OPTIMIZATION ALGORITHM (MOBEGOA)449
The most important issue to consider when developing a450
reliable approach for spam detection systems (SDS) is to451
focus on two stages for functions are: 1) selecting important452
features and excluding unimportant features from email data;453
and 2) developing an approach with a high potential for 454
detecting spam email. The general concept used in this study 455
of the normal feature algorithm selection, which is divided 456
into five basic steps, The first step begins with initialising the 457
original feature set found in all three datasets. 458
The dimensionality of the search space (SS) frequently 459
affects the initialization method for the MOO binary GOA 460
algorithm. It is important to note that in this paradigm, fea- 461
tures are frequently defined as the total number of all possible 462
features. The first step of the protocol corresponds to the 463
initialization phase of the MOBGOA. The candidate features 464
are then discovered in the second step. It’s a method of 465
discovery that starts with the creation of a random subset of 466
features that MOBGOA has identified as potential solutions. 467
The third step is an evaluation procedure of the candidate 468
features. It is an evaluation procedure that begins with using 469
the E2GOAMLP algorithm to train multi-layered neural net- 470
works. The E2GOAMLP algorithm is better understood by 471
the interested reader compared to earlier research [41]. The 472
feature selection process is one of the most crucial processes, 473
and the feature selection algorithm is based on a wrapper 474
algorithm. This step is critical in directing the algorithm’s 475
selection of an optimal subset of attributes. 476
In the fourth step, a conditioning procedure to determine 477
the relevant subset or optimal feature subset. It is a condi- 478
tioning procedure that begins with determining whether to 479
continue or stop the search for other subsets of features by 480
testing the stop criterion. Here, the stop criterion depends on 481
either reaching the maximum No. of predefined iterations or 482
a predefined No. of selected features. 483
98480 VOLUME 10, 2022
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
FIGURE 3. MOBGOA workflow.
In the fifth step, a result validation of the candidate features484
is performed. It is a discovery procedure that begins with485
validating against the 3 datasets. The findings of this phase486
will be reviewed with those of earlier phases. The MOBGOA487
algorithm is illustrated in Figure 3, which shows the essential488
processes in Figure 5, while the following subsections pro-489
vide additional detail on the method’s main components.490
2) WRAPPER FEATURE SELECTION METHOD USING491
EGOAMLP492
MOBGOA is employed as a wrapper-based feature selection493
algorithm. As a result, a wrapper classifier is required for494
the MOBGOA algorithm to evaluate the subsets. In other495
words, the Section IV presented MOBGOA algorithm is the496
multi-objective binary feature selector and EGOAMLP-based497
evaluator wrapper classifier. FIGURE 3 shows the important498
role played by the EGOAMLPs algorithm in the bottom loop499
of the workflow.500
With each new generation, the MOBGOA algorithm gen-501
erates new solutions (a new subset of features is generated).502
It is entered into the MLP that is trained by the best enhanced503
GOA (which is introduced in our previous work [41]). The504
method is employed by employing the novel feature set, and505
feedback on the method is obtained from the performance506
of the E2GOAMLP algorithm, which calculates the three507
objectives and arranges a new solution.508
3) MOBGOA PARAMETERS509
The MOBGOA algorithm utilises similar parameters to the510
first GOA model. C1, C2, and the maximum number of511
generations to hunt for solutions are the control parameters512
in GOA. In this study, the maximum number of generations513
was 1000 and the population size of NP is 50. TheMOBGOA514
algorithm is run 100 times, and the generations in each exper-515
iment are terminated upon reaching the maximum number.516
The number of characteristics to evaluate the final strategy517
determines the size of the solution space for each dataset used 518
in the study. 519
4) BINARY ENCODING 520
The representation and formatting of data is a crucial step 521
before processing data using anyML technique. In the major- 522
ity of ML classification algorithms, a good representation 523
model is crucially important. In this study, the feature-value 524
representation system was investigated. In keeping with this, 525
every instance in this framework is shown as a vector for 526
characterising the problem domain. The network traffic is set 527
aside as a dataset that is typically handled as a table, with 528
each row addressing a particular occurrence and each column 529
addressing a different network element. 530
In the MOBGOA, a solution is represented by an n-bit 531
string, where n is the total No. of features in the dataset. The 532
solution’s (xd ) value at the d th place is in the range [0, 1], 533
showing the likelihood that the d th feature will be selected. 534
Using the threshold is an additional strategy. A threshold (θ) 535
is used to determine whether or not a feature is selected. 536
If (xd > θ), the d th feature is enabled; otherwise, it is not. 537
Thus, the normal features are used to create the new sub- 538
features. MOBGOA employs the threshold strategy. A novel 539
feature in Figure 4 that can be seen as a potential solution is 540
a subset that is uniquely recognised by a binary string. 541
5) MULTI-OBJECTIVE OPTIMIZATION (MOO) 542
The concept ofMOOof theMOBGOAmodel is themain fea- 543
ture. The coordination of binary strings onmultiple objectives 544
to evaluate solutions in feature selection (FS) problems, rather 545
than visualising on one criterion as accuracy. If the needed 546
solution is a minimization problem, that is, the minimum 547
value of the fitness role, the result is best, and vice versa 548
for maximisation problems. If many goals that require a 549
corresponding fitness function are found, there is a potential 550
VOLUME 10, 2022 98481
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
FIGURE 4. Representation of a possible solution as binary string.
FIGURE 5. The general feature selection process.
conflict between their judgement about the quality of the551
same solution.552
It is worth mentioning that the three objectives with their553
desirable characteristics are described above:554
• FS→ to be minimum555
• ER→ to be minimum556
• FPR→ to be minimum557
The weighted aggregation objective (WAO) that MOB-558
GOA uses to determine performance subsets of feature sets559
of MLP ratings is illustrated:560
(WAO) = w1×FS + w2×ER+ w3 × FPR (10)561
FromEq. (10), wherew1 refers to the feature weight andw2562
refers to the error weight, then w3 refers to the stand for false563
positive weight. Furthermore, the weights w2 and w3 refer to564
more than w1. In addition, the number of selected features565
(FS) is no more important than the false positive rate (FPR)566
of error rate (ER). The three weights (w1, w2, and w3) values567
in the evaluated tests are as follows (0.1, 0.5, and 0.4).568
6) COMPUTATIONAL COMPLEXITY569
The number of solutions, also known as the D, and the No.570
of populations, also known as the population size, of the571
MOBGOA algorithm, are what essentially define the com-572
puting complexity of the enhanced GOA method.573
The total computing complexity, in the worst case, is O574
(DNP) ≈ O (O (calculate the GOA position of all solutions575
and evaluate its fitness)+O (sort solutions of population and576
GOA population)).577
The MOBGOA algorithm’s generative process analyses578
the time complexity of the generation as follows:579
The starting population’s creation is the primary activity in580
stage 1, and the time complexity is O (NPD).581
Stage 2 temporal complexity for decision-making based on582
stop/termination criteria is O (1).583
Stage 3 involves calculating the value of an aggregated584
objective parameter based on 3 objectives, namely the No.585
of features (NF), the error rate (ER), and the false positives586
(FP), time complexity is O (1). The time complexity in 587
Stage 4, updating the answer, is O (N). Generating continues 588
in Stage 5 and returns back to Phase 2. Consequently, the 589
MOBGOA algorithm’s time complexity is O. (NPD). 590
7) INTEGRATING MOBGOA WITH E2GOAMLP FOR SPAM 591
DETECTION 592
This design is a spam detection strategy based on 593
EGOA-trained MLP and a set of optimised features. There 594
are two primary components to this goal: Feature selection is 595
the first stage, and classification is the second stage. 596
The MOBGOA method handles the feature selection por- 597
tion, while the MLP trained with the E2GOAMLP algorithm 598
handles the classification. Figure 6 depicts how each of these 599
components fits into the overall spam detection image. 600
It is worth mentioning that E2GOAMLP is also utilised 601
as a wrapper classifier for feature selection with the MOB- 602
GOA box in the diagram. The following selection of the 603
best features utilises the MLP trained by E2GOAMLP as the 604
wrapper classifier based on the characteristics. After extract- 605
ing the features through MOBGOA, the performance of the 606
extracted features is tested, as well as the spammodel, termed 607
MOBE2GOAMLP, are both tested using this unifiedmodel in 608
the following experimental assessments. 609
C. THE CLASSIFICATION PHASE 610
The parameter initialization, data input, ANN training, and 611
EGOA module are the four basic phases of the model. The 612
EGOA system and the ANN model’s parameters are ini- 613
tialised during the phase before. The Population Size (NP) 614
parameter, which deals with the population’s total number of 615
solutions, is one of many variables in the EGOA algorithm. 616
Each answer (I = 1, 2,. . . , D) deals with a D-dimensional 617
vector, where D is the total number of elements that influence 618
a decision. 619
The best solution vectors found up to this point are organ- 620
ised into a grid called Solution Memory (SM). It is an 621
expanded NP-by-D matrix. Before starting the operation, the 622
FS size is modified. In light of the objective function f(x), 623
each solution vector is additionally coupled with a positive 624
value. The algorithm is shown in Figure 7. 625
The data entry stage is the crucial part of data input in the 626
following step. It relies on how the raw data is transformed, 627
filtered, and how the features are extracted. The split of the 628
raw data into the training and testing sets is a crucial stage. 629
The following component uses it as input information. The 630
approaching data sources should fit into the range of 0 to 631
1 before the data is fed into the ANN model. This normali- 632
sation technique is important for the training in the following 633
module. 634
The third stage is when the MLP model begins to function 635
after receiving training features for the input data measure- 636
ment from the information input components. This part is 637
designed as an MLP, or organisation, using Feed-Forward 638
Neural Networks (FFNN). The three-layered neurons that 639
make up the MLP’s design are divided into an info layer, 640
a concealed layer, and a yield layer. The MLP module 641
98482 VOLUME 10, 2022
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
FIGURE 6. Integrating mobgoa with EGOAMLP.
receives the information from the information input mod-642
ule that is regarded as the designing information (designing643
dataset) for designing the MLP. It is noteworthy that the644
EGOA component receives the loads and inclinations in order645
to carry out the preparation interaction in this module.646
The EGOA module is used in the fourth stage as a stand-647
alone framework (Black Box) to create novel arrangements648
that rely on the periodic refreshing of synaptic loads and649
inclinations. The EGOA module delivers each arrangement650
as a collection of loads and predispositions into the MLP651
component during each cycle of the preparation interaction.652
In this way, each preparation dataset-dependent arrangement653
is evaluated, and then its wellness values are restored. In this654
work, the Mean Square Error (MSE) and Fitness Function655
(FF) are used to process wellbeing. By reducing the MSE656
estimation of the mistake rate, the loads and inclinations are657
acquired.658
Once the maximal number of cycles is reached, the prepa-659
ration interaction ends. The loads and predispositions knowl-660
edge base is then updated. The EGOA algorithm is linked to661
other systems for streamlining. As a result, the goal is under-662
stood as either increasing or decreasing a measure achieved663
through this FF. The goal of such a FF should be similar to664
its value in enhancing calculations. Other than that, its objec-665
tive is to reduce general error, similar to studying methods666
demonstrated by previous exams [42], [43]. Therefore, the667
FF stated before might apply any of the MLP error estimation668
equations or derive another wellness metric from the recipes.669
MSE is used in this work as the primary quality component of670
the proposed EGOA preparation calculation. The preparation671
goal is to, at its most basic, restrict the MSE to arriving at the672
highest aggregate of emphasis.673
The best classification, approximation, or prediction accu-674
racy for training and testing samples is the main goal of675
training theMLP. Figure 7 shows the forward pass calculation676
measure. The fitness function was calculated in this work677
using a methodology that has been employed in a number678
of studies [42], [43]. The output of the ith hidden node is679
determined as follows: If the number of input nodes is N , the680
number of hidden nodes isH , and the number of output nodes681
is O.682
f
(
Sj
) = Sigmoid (Sj)683
= 1
/(
1+ exp
(
−
(∑N
i=1Wij.Xi − βj
)))
,684
j = , 2, . . . ,H (11)685
Wij is the connection weight from the ith node in the input 686
layer to the jth node in the hidden layer, Xi is the ith input 687
and βj is the bias (threshold) of the jth hidden node. Where 688
Sj=∑Ni=1Wij.Xi − βj. The final output can be described as 689
follows after computing the hidden nodes’ outputs: 690
Ok =
∑N
i=1Wkj.f
(
Sj
)− βk , k = 1, 2, . . . ,O, (12) 691
where βk is the bias (threshold) of the k th output node 692
and Wkj is the connection weight from the ith hidden 693
node to the k th output node. The following are the cal- 694
culations made to determine the learning error E (fitness 695
function). 696
Ek =
∑O
i=1
(
Oki − dki
)2
(13) 697
MSE =
∑q
k=1
Ek
q
(14) 698
dki is the desired output of the i
th input unit when the k th 699
training sample is used, and Oki is the actual output of the i
th
700
input unit when the k th training sample is used. Where q is 701
the number of training samples, consequently, the following 702
definition applies to the fitness function of the ith training 703
sample: 704
Fitness(xi) = MSE(xi) (15) 705
V. PERFORMANCE EVALUATION 706
A. SPAM DATASETS 707
The evaluation of the proposed ANN system for the specific 708
purpose of SDS defines the usage of benchmark datasets 709
for this particular framework, unlike the datasets used for 710
classification. In this section, three datasets that can be used 711
to test SDSs are briefly explained. 712
1) SPAMBASE DATASET 713
In 1999, Hopkins provided the SpamBase dataset [44]. 714
Several writers have utilised this dataset for categoriza- 715
tion. This dataset included 4601 emails with an average of 716
57 attributes, of which 1813 (39%) were spam and 2788 717
(61%) were not. The dataset’s features are all displayed 718
in Table 2. 719
The percentage of times the special characters 720
‘‘;’’,‘‘(‘‘,’’[‘‘, ‘‘!,’’ ‘‘$,’’ and ‘‘#’’ appear among the remaining 721
six features is unknown. The other three elements serve as 722
a visual depiction of various capitalization measures used in 723
the messages’ text. Finally, each instance’s class label can be 724
either 0 for non-spam or 1 for spam. The SpamBase dataset 725
is one of the best for learning and assessment methodologies. 726
2) SPAMASSASSIN DATASET 727
The most well-known and often used dataset for identify- 728
ing spam is the SpamAssassin dataset, which Justin Mason 729
created in 2002 [45]. Information about this dataset can 730
be found at (https://wiki.apache.org/spamassassin) contains 731
information on it. In the 6047 communications that com- 732
prised this dataset, there were 1897 unsolicited (spam) emails 733
VOLUME 10, 2022 98483
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
FIGURE 7. The egoamlp training algorithm flowchart.
(31.4%), 3,900 easy ham emails, and 250 difficult but genuine734
emails that in many ways resemble spam. In Table 3, the735
following characteristics of SpamAssassin email messages736
are displayed:737
3) UK-2011 WEBSPAM DATASET738
The UK-2011 Webspam dataset consists of 3,766 Web pages739
with 11 features, 1768 of which are non-spam, and 1998 of740
which are spam (53%) of the emails, making the data unbal-741
anced and hence more difficult. All of the dataset’s features742
are listed in Table 4, and a detailed description of each feature743
can be found in [46] and [47].744
B. EVALUATION METRICS745
Utilizing the accompanying metrics for ACC, FAR, DR,746
specificity, sensitivity, F-measure, Matthews correlation747
coefficient (MCC), and G-mean (GM), the effectiveness of748
the proposed technique is evaluated. The true positives TP,749
true negatives TN, false positives FP, and false negatives750
FN cases are used to determine the FAR, DR, MCC, GM,751
and ACC.752
The confusion matrix for a two-class classification in753
Table 5 yielded these four key criteria. Some performance754
indicators are used to describe the confusion matrix in755
Table 6. The performance metrics given in Equations (16–25)756
are shown in Table 7.757
VI. EVALUATION OF MOBEGOAMLP758
The suggested MOBGOA framework is thoroughly evalu-759
ated in relation to MOBEGOAMLP, thereby confirming the760
execution of the subsequent SDS method. The three datasets761
given in Section B are used to test the approach.762
A. SPAMBASE RESULTS763
In this Scenario 1, the MOBGOA was first applied to764
the SpamBase dataset to select suitable features from the765
dimensionality of the search space using the fitness function,766
resulting in 57 to 15 features as shown in Figure 2. The clas-767
sification of the resulting features for training and the results768
TABLE 2. All features of the spambase dataset.
TABLE 3. All features of the spamassassin email messages.
TABLE 4. Analysis of UK-2011 WEBSPAM email messages.
obtained are presented in Figure 8. Classification results are 769
displayed using the selected features extracted by MOBGOA 770
training for each training set as presented in Table 9. The 771
proposed MOBE2GOAMLP algorithm is highlighted in bold 772
text. 773
As per Figure 9, the same results are described in a con- 774
fusion matrix. Using the definitions in Section B, the exper- 775
imental results of the suggested EGOAMLPs models are 776
calculated in Table 8. The spam detection model EGOAMLP 777
is able to achieve the very best ratios across the three criteria: 778
DR records of 98.1%, ACC records of 97.5 %, and FAR 779
records of 0.033, according to the acquired results, which 780
were carried out utilising 15 features. 781
Figure 8 illustrates the convergence curve resulting from 782
sample runs of the GOAMLP, E1GOAMLP, E2GOAMLP, 783
E3GOAMLP, E4GOAMLP, E5GOAMLP, and E6GOAMLP 784
algorithms against selected results from the SpamBase 785
dataset. 786
Figure 9 shows that the MOBGOA algorithm enhanced the 787
classification accuracy by selecting a subset of 15 features. 788
All the results in the matrices match those listed in Table 8. 789
Due to the constrained space, Figure 9 presents the revised 790
98484 VOLUME 10, 2022
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
TABLE 5. The confusion matrix for classification.
TABLE 6. Performance indicators used to describe the confusion matrix.
TABLE 7. Mathematical formulae of performance metrics.
TABLE 8. The classification results after using algorithms against selected
subsets testing of the SPAMBASE.
proposed model MOBE2GOAMLP with confusion matrices.791
It’s important to highlight that their selection was arbitrary.792
B. SPAMASSASSIN RESULTS793
In this scenario 2, the MOBGOA was first applied to the794
SpamAssassin dataset to select suitable features from the795
dimensionality of the search space using the fitness function,796
resulting in 140 to 48 features as shown in Figure 2. The797
classification of the resulting features for training and the798
TABLE 9. The classification results after using algorithms against selected
subsets testing of the SpamAssassin.
FIGURE 8. Convergence curve of proposed for training the spambase
dataset.
FIGURE 9. Confusion matrices for mobe2goamlp against the spambase
dataset.
results obtained are presented in Figure 10. Classification 799
results are displayed using the selected features extracted 800
by MOBGOA training for each training set as presented 801
in Table 9. The proposed MOBE2GOAMLP algorithm is 802
highlighted in bold text. As per Figure 11, the same results are 803
described in a confusion matrix. The spam detection model 804
EGOAMLP is able to achieve the very best ratios across the 805
three criteria: DR records of 98.3%, ACC records of 98.3%, 806
and FAR records of 0.018, according to the acquired results, 807
which were carried out utilising 48 features. 808
Figure 10 illustrates the convergence curve resulting from 809
sample runs of the GOAMLP, E1GOAMLP, E2GOAMLP, 810
E3GOAMLP, E4GOAMLP, E5GOAMLP, and E6GOAMLP 811
algorithms against selected results from the SpamAssas- 812
sin dataset. Figure 11 shows that the MOBGOA algorithm 813
enhanced the classification accuracy by selecting a subset of 814
48 features. All the results in the matrices match those listed 815
in Table 9. 816
Figure 11 presents the new proposed model 817
MOBE2GOAMLP using confusion matrices due to the space 818
constraints. It should be mentioned that their selection was 819
VOLUME 10, 2022 98485
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
FIGURE 10. Convergence curve of proposed for training the spamassassin
dataset.
TABLE 10. The classification results after using algorithms against
selected subsets testing of the UK-2011.
TABLE 11. Comparison between the EGOAMLP and MOBEGOAMLP.
arbitrary and that the intention was to showcase the most820
effective trainers using SpamAssassin.821
C. UK-2011 WEBSPAM RESULTS822
In this Scenario 3, the MOBGOA was first applied to the823
UK-2011 Webspam dataset to select suitable features from824
the dimensionality of the search space using the fitness func-825
tion, resulting in 11 to 5 features as shown in Figure 2. The826
classification of the resulting features for training and the827
results obtained are presented in Figure 12. Classification828
results are displayed using the selected features extracted829
by MOBGOA training for each training set as presented in830
Table 10.831
The proposed MOBE2GOAMLP algorithm is highlighted832
in bold text. As per Figure 12, the same results are described833
in a confusion matrix. From the discussion Subsections834
A and B, it is apparent that in the SpamAssassin and835
SpamBase datasets, using MOBGOA feature selection has836
improved the overall performance of the E2GOAMLP classi-837
fier. Figure 13 shows the convergence curve resulting from838
sample runs of the GOAMLP, E1GOAMLP, E2GOAMLP,839
E3GOAMLP, E4GOAMLP, E5GOAMLP, and E6GOAMLP840
algorithms against selected results of the UK-2011Webspam841
dataset. Figure 13 illustrations that the MOBGOA algorithm842
enhanced the classification accuracy by selecting a subset of843
5 features. All the results in the matrices match those listed844
in Table 10.845
TABLE 12. Comparison of the study’s findings with previously published
research.
FIGURE 11. Confusion matrices for mobeg2oamlp against the
spamassassin dataset.
Figure 13 presents the new proposed model 846
MOBE2GOAMLP using confusion matrices due to the space 847
constraints. It should be mentioned that their selection was 848
arbitrary and that the intention was to showcase the most 849
effective trainers using the UK-2011 dataset. 850
D. THE ADVANTAGE OF THE MOBGOA 851
Table 11 compares the outcomes of analysing the resultant 852
EGOAMLP models and the last MOBEGOAMLP mod- 853
els using three datasets. The evaluation comprised a com- 854
parison between the MOBEGOAMLP models, which used 855
the chosen characteristics extracted by MOBGOA, and the 856
EGOAMLP models, which used all features. ACC, DR, and 857
FAR were used to gauge performance. The results clearly 858
reveal that the most recent MOBEGOAMLP model exhibits 859
a superior classification of ACC and DR across all data sets. 860
These results offer the first proof that the EGOAMLP system 861
is superior, with the last model showing a higher ACC and 862
DR on all data sets, including SpamBase, SpamAssassin, and 863
UK-2011Webspam. 864
98486 VOLUME 10, 2022
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
FIGURE 12. Convergence curve of proposed for training the UK-2011.
FIGURE 13. Confusion matrices for mobe2goamlp against the UK-2011
WEBSPAM dataset.
TABLE 13. Comparison between mobegoas and mobgoa at A = 0.05 on a
two-tailed t-test.
E. COMPARISON OF THE RESULTS OF THIS STUDY AND865
THE PUBLISHED WORK866
This section summarises the current state-of-the-art spam867
detection systems recorded in Table 12.868
The overall results are much more satisfactory and fare869
well in comparison with the others, including datasets. This870
approach closely follows the performance of the best per-871
forming methods in the evaluation criteria. The records have872
been efficaciously classified through the proposed version as873
compared to the ones classified through other techniques.874
F. EVALUATION USING T-TesT875
In this section, we have analyzed the statistical analysis of the876
previous results in Table 13 and conducted the statistical t-877
test (T) to estimate the practical performance of the proposed878
algorithms compared with the standard algorithm (GOA).879
The proposed models’ findings show statistically significant880
differences from those of the standard GOA method, with P881
values less than 0.05. In comparison to the standard algo-882
rithm GOA, the P values greater than 0.05 (underlined) are883
not significant. This table shows that, for all three datasets,884
the proposed models were always superior to the standard 885
algorithm (GOA). 886
VII. CONCLUSION 887
This work introduces a novel method for SDS, the 888
MOBGOA-trained EGOAMLP. It centres around the perti- 889
nence of a modern algorithm, referred to as MOBGOA, for 890
preparing EGOAMLP. The MOB-EGOAMLP trained with 891
the datasets had an accuracy of 97.5%, 98.3%, and 96.4% 892
respectively. The results of this study show the highly positive 893
impact of this approach on delivering a better SDS. Future 894
research efforts will be to develop and extend an approach 895
that can robustly be implemented in detecting other malicious 896
attacks such as phishing and botnets. 897
REFERENCES 898
[1] D. M. Ablel-Rheem, ‘‘Hybrid feature selection and ensemble learning 899
method for spam email classification,’’ Int. J. Adv. Trends Comput. Sci. 900
Eng., vol. 9, no. 1.4, pp. 217–223, Sep. 2020. 901
[2] G. Mujtaba, L. Shuib, R. G. Raj, N. Majeed, and M. A. Al-Garadi, ‘‘Email 902
classification research trends: Review and open issues,’’ IEEE Access, 903
vol. 5, pp. 9044–9064, 2017. 904
[3] A. Kumari, N. Agrawal, and U. Lilhore, ‘‘Clustering malicious spam in 905
email systems using mass mailing,’’ in Proc. 2nd Int. Conf. Inventive Syst. 906
Control (ICISC), Jan. 2018, pp. 870–875. 907
[4] S. A. A. Ghaleb, M. Mohamad, S. A. Fadzli, and W. A. H. M. Ghanem, 908
‘‘E-mail spam classification using grasshopper optimization algorithm 909
and neural networks,’’ Comput., Mater. Continua, vol. 71, no. 3, 910
pp. 4749–4766, 2022. 911
[5] S. A. A. Ghaleb, M. Mohamad, S. A. Fadzli, and W. A. H. M. Ghanem, 912
‘‘Spam classification based on supervised learning using grasshopper opti- 913
mization algorithm and artificial neural network,’’ Commun. Comput. Inf. 914
Sci., vol. 1347, pp. 420–434, Dec. 2021. 915
[6] M. Shuaib, S. M. Abdulhamid, O. S. Adebayo, O. Osho, I. Idris, 916
J. K. Alhassan, and N. Rana, ‘‘Whale optimization algorithm-based email 917
spam feature selection method using rotation forest algorithm for classifi- 918
cation,’’ Social Netw. Appl. Sci., vol. 1, no. 5, p. 390, May 2019. 919
[7] S. A. A. Ghaleb, M. Mohamad, S. A. Fadzli, and W. A. H. M. Ghanem, 920
‘‘An integrated model to email spam classification using an enhanced 921
grasshopper optimization algorithm to train a multilayer perceptron neural 922
network,’’ Commun. Comput. Inf. Sci., vol. 1347, pp. 402–419, Dec. 2020. 923
[8] I. Idris, A. Selamat, N. T. Nguyen, S. Omatu, O. Krejcar, K. Kuca, and 924
M. Penhaker, ‘‘A combined negative selection algorithm-particle swarm 925
optimization for an email spam detection system,’’ Eng. Appl. Artif. Intell., 926
vol. 39, pp. 33–44, Nov. 2015. 927
[9] O. M. E. Ebadati and F. Ahmadzadeh, ‘‘Classification spam email with 928
elimination of unsuitable features with hybrid of GA-naive Bayes,’’ J. Inf. 929
Knowl. Manage., vol. 18, no. 1, Mar. 2019, Art. no. 1950008. 930
[10] A. Karim, S. Azam, B. Shanmugam, and K. Kannoorpatti, ‘‘An unsu- 931
pervised approach for content-based clustering of emails into spam and 932
ham through multiangular feature formulation,’’ IEEE Access, vol. 9, 933
pp. 135186–135209, 2021. 934
[11] A. K. Singh and S. Singh, ‘‘Detection of spam using particle swarm 935
optimisation in feature selection,’’ Pertanika J. Sci. Technol., vol. 26, no. 3, 936
pp. 1–15, 2018. 937
[12] K. Wang, W. Mao, W. Feng, and H. Wang, ‘‘Research on spam filtering 938
technology based on newmutual information feature selection algorithm,’’ 939
J. Phys., Conf., vol. 1673, no. 1, Nov. 2020, Art. no. 012028. 940
[13] R. A. Atta, ‘‘Spam classification using genetic algorithm,’’ Iraqi J. Inf. 941
Technol., vol. 9, no. 2, pp. 142–170, 2018. 942
[14] W. A. H. M. Ghanem and A. Jantan, ‘‘Novel multi-objective artificial 943
bee colony optimization for wrapper based feature selection in intruction 944
detectoin,’’ Int. J. Adv. Soft Comput. Appl., vol. 8, no. 1, pp. 70–81, 2016. 945
[15] B. Alatas and H. Bingol, ‘‘Comparative assessment of light-based intel- 946
ligent search and optimization algorithms,’’ Light Eng., vol. 28, no. 6, 947
pp. 51–59, 2020. 948
[16] H. Bingol and B. Alatas, ‘‘Chaotic league championship algorithms,’’ 949
Arabian J. Sci. Eng., vol. 41, no. 12, pp. 5123–5147, Dec. 2016. 950
VOLUME 10, 2022 98487
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
[17] H. Bingol and B. Alatas, ‘‘Chaos based optics inspired optimization951
algorithms as global solution search approach,’’ Chaos, Solitons Fractals,952
vol. 141, Dec. 2020, Art. no. 110434.953
[18] S. Saremi, S. Mirjalili, and A. Lewis, ‘‘Grasshopper optimisation algo-954
rithm: Theory and application,’’ Adv. Eng. Softw., vol. 105, pp. 30–47,955
Mar. 2017.956
[19] S. A. A. Ghaleb, M. Mohamad, S. A. Fadzli, and W. A. H. M. Ghanem,957
Integrating Mutation Operator Into Grasshopper Optimization Algorithm958
for Global Optimization, vol. 25, no. 13. Berlin, Germany: Springer, 2021.959
[20] A. Saad, S. A. Khan, and A. Mahmood, ‘‘A multi-objective evolutionary960
artificial bee colony algorithm for optimizing network topology design,’’961
Swarm Evol. Comput., vol. 38, pp. 187–201, Feb. 2018.962
[21] X. S. Yang, ‘‘Bat algorithm for multi-objective optimisation,’’ Int. J. Bio-963
Inspired Comput., vol. 3, no. 5, pp. 267–274, 2012.964
[22] E. G. Talbi, ‘‘A unified taxonomy of hybrid metaheuristics with mathemat-965
ical programming, constraint programming and machine learning,’’ Stud.966
Comput. Intell., vol. 434, pp. 3–76, Dec. 2013.967
[23] Z. Hassani, V. Hajihashemi, K. Borna, and I. S. Dehmajnoonie, ‘‘A clas-968
sification method for E-mail spam using a hybrid approach for feature969
selection optimization,’’ J. Sci., Islamic Republic Iran, vol. 31, no. 2,970
pp. 165–173, 2020.971
[24] A. Jantan, W. A. H. M. Ghanem, and S. A. A. Ghaleb, ‘‘Using modified972
bat algorithm to train neural networks for spam detection,’’ J. Theor. Appl.973
Inf. Technol., vol. 95, no. 24, pp. 6788–6799, 2017.974
[25] H.Mohmmadzadeh and F. S. Gharehchopogh, ‘‘A novel hybrid whale opti-975
mization algorithm with flower pollination algorithm for feature selection:976
Case study email spam detection,’’Comput. Intell., vol. 37, no. 1, pp. 1–28,977
2020.978
[26] J. R. Méndez, T. R. Cotos-Yañez, and D. Ruano-Ordás, ‘‘A new semantic-979
based feature selection method for spam filtering,’’ Appl. Soft Comput.,980
vol. 76, pp. 89–104, Mar. 2019.981
[27] G. Al-Rawashdeh, R. Mamat, and N. H. B. A. Rahim, ‘‘Hybrid water982
cycle optimization algorithm with simulated annealing for spam E-mail983
detection,’’ IEEE Access, vol. 7, pp. 143721–143734, 2019.984
[28] T. Gangavarapu and C. D. J. B. Chanduka, Applicability of Machine985
Learning in Spam and Phishing Email Filtering: Review and Approaches.986
Amsterdam, The Netherlands: Springer, 2020.987
[29] B. Ahmed, ‘‘Wrapper feature selection approach based on binary firefly988
algorithm for spam E-mail filtering,’’ J. Soft Comput. Data Mining, vol. 2,989
no. 1, pp. 44–52, 2020.990
[30] H. Faris, A.-Z. Ala’M, A. A. Heidari, I. Aljarah, M. Mafarja,991
M. A. Hassonah, and H. Fujita, ‘‘An intelligent system for spam detection992
and identification of the most relevant features based on evolutionary993
random weight networks,’’ Inf. Fusion, vol. 48, pp. 67–83, Aug. 2019.994
[31] H. B.Ozkan andB. Can, ‘‘Analysis of adversarial attacks against traditional995
spam filters,’’ in Proc. Int. Conf. All Aspects Cyber Secur., 2019.996
[32] J. Liu, Y. Su, S. Lv, and C. Huang, ‘‘Detecting web spam based on novel997
features from web page source code,’’ Secur. Commun. Netw., vol. 2020,998
pp. 1–14, Dec. 2020.999
[33] S. Agrahari and A. K. Singh, ‘‘Disposition-based concept drift detection1000
and adaptation in data stream,’’ Arabian J. Sci. Eng., vol. 47, no. 8,1001
pp. 10605–10621, Aug. 2022.1002
[34] F. Soleimanian and S. K. Mousavi, ‘‘A new feature selection in email1003
spam detection by particle swarm optimization and fruit fly optimization1004
algorithms,’’ J. Comput. Knowl. Eng., vol. 2, no. 2, pp. 49–62, 2019.1005
[35] A. S. Mashaleh, N. F. B. Ibrahim, M. A. Al-Betar, H. M. J. Mustafa, and1006
Q. M. Yaseen, ‘‘Detecting spam email with machine learning optimized1007
with Harris hawks optimizer (HHO) algorithm,’’ Proc. Comput. Sci.,1008
vol. 201, pp. 659–664, Sep. 2022.1009
[36] E. Alba and J. F. Chicano, ‘‘Training neural networks with GA hybrid algo-1010
rithms,’’ in Proc. Genetic Evol. Comput. Conf. Berlin, Germany: Springer,1011
2004, pp. 852–863.1012
[37] S. Kang, J. Choi, and J. Choi, ‘‘A method of securing mass storage for1013
SQL server by sharing network disks-on the Amazon EC2 windows envi-1014
ronments,’’ J. Internet Comput. Services, vol. 17, no. 2, pp. 1–9, Apr. 2016.1015
[38] A. A. Aburomman and M. B. I. Reaz, ‘‘A novel SVM-kNN-PSO ensem-1016
ble method for intrusion detection system,’’ Appl. Soft Comput., vol. 38,1017
pp. 360–372, Jan. 2016.1018
[39] N. Saidani, K. Adi, and M. S. Allili, ‘‘A semantic-based classification1019
approach for an enhanced spam detection,’’ Comput. Secur., vol. 94,1020
Jul. 2020, Art. no. 101716.1021
[40] W. Hijawi, H. Faris, J. Alqatawna, I. Aljarah, A. M. Al-Zoubi,1022
and M. Habib, ‘‘EMFET: E-mail features extraction tool,’’ 2017,1023
arXiv:1711.08521.1024
[41] S. A. A. Ghaleb, M. Mohamad, S. A. Fadzli, and W. A. H. M. Ghanem, 1025
‘‘Training neural networks by enhance grasshopper optimization algorithm 1026
for spam detection system,’’ IEEE Access, vol. 9, pp. 116768–116813, 1027
2021. 1028
[42] W. A. H. M. Ghanem and A. Jantan, Training a Neural Network for Cyber- 1029
attack Classification Applications Using Hybridization of an Artificial 1030
Bee Colony and Monarch Butterfly Optimization, vol. 51, no. 1. Cham, 1031
Switzerland: Springer, 2020. 1032
[43] W. A. H. M. Ghanem, S. A. A. Ghaleb, A. Jantan, A. B. Nasser, 1033
S. A. M. Saleh, A. Ngah, andA. C. Alhadi, ‘‘Cyber intrusion detection sys- 1034
tem based on amultiobjective binary bat algorithm for feature selection and 1035
enhanced bat algorithm for parameter optimization in neural networks,’’ 1036
IEEE Access, vol. 10, pp. 76318–76339, 2022. 1037
[44] Hopkins. (1999). UCI Machine Learning Repository: Spambase Data 1038
Set. Accessed: Nov. 1, 2021. [Online]. Available: https://archive. 1039
ics.uci.edu/ml/datasets/spambase 1040
[45] SpamAssassin. (2005). Spamassassin Public Corpus Kaggle. 1041
Accessed: Nov. 1, 2021. [Online]. Available: https://www.kaggle.com/ 1042
beatoa/spamassassin-public-corpus 1043
[46] H. A. Wahsheh, M. N. Al-Kabi, and I. M. Alsmadi, ‘‘A link and content 1044
hybrid approach for Arabic web spam detection,’’ Int. J. Intell. Syst. Appl., 1045
vol. 5, no. 1, pp. 30–43, Dec. 2012. 1046
[47] H. A. Wahsheh, M. N. Al-Kabi, and I. M. Alsmadi, ‘‘A link and content 1047
hybrid approach for Arabic web spam detection,’’ Int. J. Intell. Syst. Appl., 1048
vol. 5, no. 1, pp. 30–43, Dec. 2012. 1049
[48] K. F. Rafat, Q. Xin, A. R. Javed, Z. Jalil, and R. Z. Ahmad, ‘‘Evad- 1050
ing obscure communication from spam emails,’’ Math. Biosciences Eng., 1051
vol. 19, no. 2, pp. 1926–1943, 2021. 1052
[49] A. Makkar and S. Goel, ‘‘Spammer classification using ensemble methods 1053
over content-based features,’’ Adv. Intell. Syst. Comput., vol. 547, pp. 1–9, 1054
Jun. 2017. 1055
SANAA A. A. GHALEB received the bache- 1056
lor’s degree from the University of Aden, Yemen, 1057
in 2011, and the master’s degree from Univer- 1058
siti Sains Malaysia, Malaysia, in 2017. She is 1059
currently pursuing the Ph.D. degree with the 1060
Faculty of Informatics and Computing, Univer- 1061
siti Sultan Zainal Abidin. Her research interests 1062
include technology-enhanced learning, instruc- 1063
tional design and technology, computer networks 1064
and information security, cybersecurity, machine 1065
learning, artificial intelligence, swarm intelligence, and metaheuristic. 1066
MUMTAZIMAH MOHAMAD was born in 1067
Terengganu, Malaysia. She received the bach- 1068
elor’s degree in information technology from 1069
Universiti Kebangsaan Malaysia, in 2000, the 1070
M.Sc. degree in computer science from Universiti 1071
Putra Malaysia, and the Ph.D. degree in computer 1072
science from Universiti Malaysia Terengganu, 1073
in 2014. She was a Junior Lecturer, in 2000. 1074
Currently, she is an Associate Professor with 1075
the Department of Computer Science, Faculty 1076
of Informatics and Computing (FIK), Universiti Sultan Zainal Abidin, 1077
Terengganu, Malaysia. She has published over 50 research articles in peer- 1078
reviewed journals, book chapters, and proceeding. She has appointed a 1079
reviewer and technical committee for many conferences and journals and 1080
worked as a researcher in several national funded Research and Develop- 1081
ment projects. Her research interests include pattern recognition, machine 1082
learning, artificial intelligence, and parallel processing. 1083
98488 VOLUME 10, 2022
S. A. A. Ghaleb et al.: Feature Selection by Multi-Objective Optimization: Application to Spam Detection System
WAHEED ALI H. M. GHANEM received the1084
B.Sc. degree in computer sciences and engi-1085
neering from Aden University, Yemen, in 2003,1086
and the M.Sc. degree in computer science and1087
the Ph.D. degree in network and communica-1088
tion protocols from Universiti Sains Malaysia,1089
in 2013 and 2019, respectively. His research1090
interests include computer and network secu-1091
rity, cybersecurity, machine learning, artificial1092
intelligence, swarm intelligence, optimization1093
algorithm, and information technology.1094
ABDULLAH B. NASSER (Member, IEEE)1095
received the B.Sc. degree from Hodeidah Univer-1096
sity, Yemen, in 2006, the M.Sc. degree from the1097
Universiti Sains Malaysia, Malaysia, in 2014, and1098
the Ph.D. degree fromUniversiti Malaysia Pahang,1099
Malaysia, in 2018, all in computer science. He is1100
currently an Assistant Professor with the Faculty1101
of Computing, Universiti Malaysia Pahang. He has1102
authored of many scientific papers published in1103
renowned journals and conferences. His research1104
interests include software testing and soft computing, specifically, the use1105
of artificial intelligence methods (metaheuristic algorithms) for solving1106
different software engineering problems.1107
MOHAMED GHETAS received the M.Sc. and1108
Ph.D. degrees in computer science from Universiti1109
Sains Malaysia. He is a Lecturer with the Faculty1110
of Computer Science, Nahda University (USM).1111
His research interests include cloud computing,1112
fog-computing, robust optimization, evolutionary1113
algorithm, federated learning, artificial neural net-1114
works, and deep learning.1115
AKIBU MAHMOUD ABDULLAHI received the1116
B.A. degree in arabic language from Bayero1117
University Kano, Nigeria, in 2011, the B.S.1118
degree in information technology (IT) from1119
Almadinah International University, Selangor,1120
Malaysia, in 2016, the M.S. degree in instruc-1121
tional multimedia from University Sains Malaysia1122
(USM), Penang, Malaysia, in 2017, and the Ph.D.1123
degree in computer science from Taylor’s Univer-1124
sity,Malaysia, 2021. From 2016 to 2018, hewas an1125
IT Help Desk Technician at Labtech International Ltd., Malaysia. He is cur-1126
rently a Lecturer with Albukhary International University, Kedah, Malaysia.1127
His research interests include the data science, machine learning, learning1128
analytics, and big data analytics.1129
SAMI ABDULLA MOHSEN SALEH received 1130
the B.Eng. degree in computer engineering from 1131
Hodeidah University, Yemen, in 2005, and the 1132
M.Sc. degree in electronic systems design engi- 1133
neering and the Ph.D. degree in computer vision 1134
and machine learning from Universiti Sains 1135
Malaysia, in 2013 and 2022, respectively. He was 1136
a Researcher at the Intelligent Biometric Group, 1137
School of Electrical and Electronic Engineer- 1138
ing, Universiti Sains Malaysia. He is currently 1139
a Researcher with the Aerial Vehicle and Surveillance System Research 1140
Group, Aerospace Engineering School. His research interests include com- 1141
puter vision, deep learning, swarm intelligence, and soft biometrics. He has 1142
served as a Reviewer for several well-known conferences and international 1143
journals, such as Pattern Recognition Letters journal. 1144
HUMAIRA ARSHAD received the master’s degree in information tech- 1145
nology from the National University of Science and Technology (NUST), 1146
Pakistan, and the Ph.D. degree from the School of Computer Science, 1147
Universiti Sains Malaysiais. She joined at the Faculty of Computer Sci- 1148
ences & IT, in 2004. She is an Associate Professor with the Department 1149
of Computer Sciences & IT, Islamia University of Bahawalpur, Pakistan. 1150
Her research interests include digital & social media forensics, information 1151
security, online social networks, cybersecurity, intrusion detection, reverse 1152
engineering, and semantic web. 1153
ABIODUN ESTHER OMOLARA received the 1154
Ph.D. degree from the School of Computer Sci- 1155
ences, Universiti Sains Malaysia. Her research 1156
interests include computer and network secu- 1157
rity, cyber-security, cryptography, artificial intelli- 1158
gence, natural language processing, network and 1159
communication protocol, forensics, and the IoT 1160
security. 1161
OLUDARE ISAAC ABIODUN received the Ph.D. 1162
degree in nuclear and radiation physics from 1163
the Nigerian Defence Academy, Kaduna, and the 1164
Ph.D. degree in computer science from the Uni- 1165
versiti Sains Malaysia, Penang, Malaysia, with 1166
specialization in security and digital forensic. His 1167
research interests include artificial intelligence, 1168
robotics, cybersecurity, digital forensics, nuclear 1169
security, terrorism, national security, and the IoT’s 1170
security. 1171
1172
VOLUME 10, 2022 98489