
Sayawu Yakubu Diaba 
On cyber security 
evaluations in 
smart grid using 
machine learning 
ACTA  WASAENSIA 528 
ISBN 78-952-395-126-6 (print)
78-952-395-127-3 (online)
ISSN 0355-2667 (Acta Wasaensia 528, print)
2323-9123 (Acta Wasaensia 528, online)
URN http://urn.f/URN:ISBN:978-952-395-127-3
Hansaprint Oy, Turenki, 2023.
II
ACADEMIC DISSERTATION 
To be presented, with the permission of the Board of the School of Technology and 
Innovations of the University of Vaasa, for public examination on the 13th of De-
cember, 2023, at noon. 
III
Article based dissertation of the School of Technology and Innovations at the Uni-
versity of Vaasa in the feld of Telecommunication Engineering.
Author Sayawu Yakubu Diaba https://orcid.org/0000-0002-7910-4026
Supervisor(s) Professor Mohammed Elmusrati
School of Technology and Innovations
University of Vaasa
Professor Miadreza Shafe-khah
School of Technology and Innovations
University of Vaasa
Custos Professor Mohammed Elmusrati
School of Technology and Innovations
University of Vaasa
Reviewers Professor Timo Ha¨ma¨la¨inen
Faculty of Information Technology
University of Jyva¨skyla
Professor Faisal A. Mohamed Elabdli
Libyan Authority for Scientifc Research
Tripoli Libya
Opponent Professor Juan Manuel Corchado
Department of Computer Science and Automatics
University of Salamanca
Salamanca Spain
IV
 Tiivistelmä 
Älyverkko pyrkii parantamaan sähköverkon luotettavuutta, turvallisuutta 
ja tehokkuutta käyttämällä digitaalista tieto- ja ohjausteknologiaa. 
Kasvava riippuvuus viestintätekniikasta altistaa kuitenkin nämä 
järjestelmät kyberhyökkäyksille, mikä aiheuttaa merkittäviä kyberuhkia 
älyverkon saatavuudelle ja toiminnallisuudelle. Kyetäksemme vähen-
tämään tällaisia uhkia, tehokkaat tunkeutumisen havaitsemisalgoritmit 
ovat ratkaisevan tärkeitä. 
Tässä yhteydessä ehdotamme hybridi syväoppimisalgoritmia, joka 
keskittyy hajautettuihin palvelunestohyökkäyksiin (DDoS) älyverkon 
viestintäinfrastruktuurissa. Ehdotettu algoritmi yhdistää konvolutio-
naalisen neuroverkon (CNN) ja portitetun toistoyksikön (GRU) algoritmit 
tarjotakseen reaaliaikaista analyysia ja tila-arviopohjaisia tekniikoita 
tehokkaalle ohjausten toteutukselle. Työssä suoritetaan simulointeja 
käyttäen Kanadan Kyberturvallisuusinstituutin tunkeutumisen havait-
semisjärjestelmän vertailutietojoukkoa. 
Tulokset osoittavat, että hybridi syväoppimisalgoritmimme suoriutuu 
paremmin kuin olemassa olevat tunkeutumisen havaitsemisalgoritmit, 
saavuttaen vaikuttavan kokonaistarkkuuden 99,7 prosenttia. Teollisuuden 
koneita valvovien ja ohjaavien valvonta- ja tiedonkeruujärjestelmien 
(SCADA) yhteydessä tietoliikenneverkkojen haavoittuvuudet voivat johtaa 
kyberhyökkäyksiin, joissa väärää tietoa tuodaan operatiiviseen verkkoon. 
Ehdotamme rajoitettuun Boltzmannin koneeseen perustuvaa ja luonnon 
inspiroimaa juurten etsinnän optimointialgoritmia kyber-hyökkäysten 
tunnistamiseen ja luokitteluun. Optimoimme dataominaisuuksia tällä 
algoritmilla ja arvioimme sen suorituskykyä perinteisiä valvotun kone-
oppimisen algoritmeja, kuten tekoälyä hyödyntävät neuroverkot, konvo-
lutionaaliset neuroverkot ja tuen vektorikoneet, vastaan. Ehdotettu 
algoritmi päihittää vertailukohteensa tarkkuudessa, toistettavuudessa ja f1 
-pisteissä. Lisäksi työssä käsitellään SCADA-järjestelmien tietoturva-
aukkoja esittelemällä geneettisesti alustetun muuntavan neuroverkon 
(GSFTNN) tunkeutumisen havaitsemisalgoritmin. Toisin kuin allekirjoi-
tuksiin perustuvat menetelmät, GS-FTNN havaitsee muutokset toiminnal-
listen mallien perusteella, jotka viittaavat tunkeutujan osallistumiseen 
verkkoliikenteessä. Ehdotettua algoritmia arvioidaan käyttäen WUSTL-
IIOT-2018 ICS SCADA -kyberturvallisuus tietojoukkoa. Työssä osoitetaan 
sen ylivoimaisuus perinteisiin algoritmeihin, kuten jäännösneuro-
verkkoihin, toistaviin neuroverkkoihin ja pitkäkestoisiin lyhytaikamuis-
teihin, verrattuna tarkkuuden ja tehokkuuden suhteen. 
Avainsanat: kyberturvallisuus, syväoppiminen, 
tunkeutumisenhavaitsemisjärjestelmät, koneoppiminen, älyverkot, valvova 
tarkastus ja datan keruu 
V
Abstract 
The smart grid aims to enhance the electric grid’s dependability, security, and ef-
fciency by deploying digital information and control technology. However, the
increasing reliance on communication technology exposes these systems to cyber-
attacks, posing signifcant cyber threats to the availability and functionality of the
smart grid. To mitigate such threats, effective intrusion detection algorithms are
crucial. In this context, we propose a hybrid deep learning algorithm that focuses
on distributed denial of service (DDoS) attacks on the communication infrastructure
of the smart grid. The proposed algorithm combines convolutional neural network
(CNN) and gated recurrent unit (GRU) algorithms to provide real-time analysis and
state estimation-based techniques for effcient control implementation. We conduct
simulations using a benchmark cyber-security dataset from the Canadian institute
of cybersecurity intrusion detection system. The results demonstrate that our hy-
brid deep learning algorithm outperforms existing intrusion detection algorithms,
achieving an impressive overall accuracy rate of 99.7 %.
In the context of supervisory control and data acquisition (SCADA) systems, which
monitor and control industrial machinery, communication network vulnerabilities
can lead to cyber-attacks introducing false data into the operational network. We
propose a restricted Boltzmann machine-based nature-inspired artifcial root forag-
ing optimization algorithm for identifying and classifying cyber-attacks to address
this issue. We optimize data features using this algorithm and evaluate its perfor-
mance against traditional supervised machine learning algorithms such as artifcial
neural networks, convolutional neural networks, and support vector machines. The
proposed algorithm outperforms its counterparts in accuracy, precision, recall, and
f1 score. Furthermore, we address the security vulnerabilities in SCADA systems
by introducing the genetically seeded fora transformer neural network (GSFTNN)
intrusion detection algorithm. Unlike signature-based methods, GSFTNN detects
changes in operational patterns indicative of intruder involvement. We evaluate
the proposed algorithm using the WUSTL IIOT 2018 ICS SCADA cyber security
dataset and demonstrate its superiority over traditional algorithms like residual neu-
ral networks, recurrent neural networks, and long short-term memory (LSTM) in
terms of accuracy and effciency.
Keywords: Cyber-security, deep learning, intrusion detection systems, machine
learning, smart grid, supervisory control and data acquisition.
ACKNOWLEDGEMENTS 
This study was carried out at the University of Vaasa’s School of Technology and
Innovations, with essential fnancial support from the Evald and Hilda Nissi Schol-
arships Foundation and the University of Vaasa.
I want to express my heartfelt gratitude to my adviser, Prof. Mohammed Elmusrati.
Your unwavering support, invaluable guidance, and faith in my capabilities have
been pivotal in my academic journey. Your insightful counsel during challenging
times in my studies and personal life has been a source of great comfort. Professor
Mohammed’s dedication to fostering my academic and personal growth has been
instrumental in my accomplishments. I genuinely appreciate your ability to inspire
and motivate me, pushing me to surpass my boundaries. I am most grateful.
I want to express my sincere gratitude to Prof. Miadreza Shafe-khah, my second
supervisor, for the invaluable insights and ideas that you have generously shared.
Your contributions have been truly instrumental, and I hold them in high regard. I
would also like to sincerely thank Prof. Tommi Lehtonen, Prof. Heidi Kuusniemi,
Prof. Emmanuel Nzibah, Prof. Andrew Adewale Alola, and Prof. Marcelo Godey
for their guidance and support. Thank you all for your signifcant roles in my jour-
ney.
I wish to thank Dr. Raine Hermans, the School of Technology and Innovations dean,
for the invaluable provision of scholarships. I am grateful to Prof. Tomi Pasanen,
who leads the School of Technology and Innovations, and the entire faculty and
staff. Thanks also go to Prof. Timo Mantere, Prof. Tero Valtainen, Prof. Mike
Mekkanen, Prof. Petri Välisuo and Janne Koljonen, whose dedicated efforts have
contributed to cultivating an exceptional research environment.
To my dissertation reviewers, Professor Timo Ha¨ma¨la¨inen and Professor Faisal A.
Mohamed Elabdli, your insights, feedback, warm comments and rigorous scrutiny
have challenged me to refne my work and delve deeper into the realm of knowl-
edge. Your collective expertise has shaped my research into something I am im-
mensely proud of. I am especially thankful to Professor Juan M. Corchado for
VII
accepting the role of opponents. Thank you for the insightful discussion.
My time at the University of Vaasa has been enriched by the camaraderie and sup-
port of my colleagues and fellow students, Mohammed Saffaf, Arshed Iqbal, Mah-
moud Elsanhoury, Abdul Hamid, Tajudeen Ola Hassan, Akpojoto Siemuri, Dalbert
Zimuzochukwu, Olaitan Fashanu, Francis Oyeyiola, Ahmed Maruf, Lukumanu Id-
drisu ; thank you for your time. To Divine Fomenya, Abdul Rauf Hussein, Zakir
Hossain, and Manmeet Singh, the cafes, the late-night discussions, the shared tri-
umphs, and the mutual encouragement have turned classmates into lifelong friends.
I am grateful for the community that we have built together. To my brother and
motivator, Rabiu Alawo, you are indeed a brother!.
An exceptional thanks go to Dr. Virpi Juppo. You helped me regain my true
identity with your kind words when external factors tried to alter it. I say metaphor-
ically; you revived the ”eagle-hood” in me. I re-began my Ph.D. journey from that
day when I left Cafe Oskar with restored confdence and determination. I am im-
mensely grateful.
I want to acknowledge the broader academic community and the University of
Vaasa for providing an environment conducive to growth, learning, and innovation.
The opportunities I have had here, the resources at my disposal, and the experiences
I have gained have been instrumental in shaping the scholar I have become. As I
stand on the cusp of a new chapter, I carry forward the title of ”Doctor” and the
lessons learned, the friendships built, and the memories cherished. The emotions
swirling within me are a testament to the signifcance of this achievement. My heart
is full of gratitude, humility, and immense accomplishment. Thank you, University
of Vaasa, for helping me realize my dreams and for allowing me to contribute to the
world of knowledge.
To my cherished friends beyond the academic realm, your companionship and the
time we’ve shared are invaluable to me. You’ve brought joy and balance into my
life, for which I am deeply grateful. A special shoot-out to my football team, Vaasan
Pallo-Veikot; Jeremias Ketonen, Pietari Ketonen, Miro Mikael Roukus, and Joel
Eino-Pekka Ma¨kela¨; your camaraderie on and off the feld has been a sanctuary for
me. Amidst the rigors of research, your support and our moments together have
VIII
kept me grounded and mentally resilient. Thank you for being an integral part of
my journey. I am also grateful to the Vaasa Islamic Society.
Finally, I am forever indebted to my wife, Amishetu Dicko, and son, Israr Yakubu.
Thank you for your understanding, and to my family, Shafe, Laila, Safa, Mo-
hammed, Rubatu, Noora, Sikira, and Zahra, I say thank you for your support. To
my mom and best friend, Mariam Hussain, your unwavering love and sacrifces
have been the bedrock of my journey. Your support, even from afar, has given me
the strength to persevere through challenges and celebrate triumphs. This accom-
plishment is as much yours as it is mine. All praise is due to Allah (AWJ).
IX
Contents
Acknowledgements VII
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Problem statement . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Outline of the thesis . . . . . . . . . . . . . . . . . . . . . 7
2 REVIEW OF LITERATURE . . . . . . . . . . . . . . . . . . . . . 8
2.1 Overview of smart grid . . . . . . . . . . . . . . . . . . . . 8
2.1.1 Cyber-security in smart grid . . . . . . . . . . . . 10
2.1.2 Terms of cybersecurity . . . . . . . . . . . . . . 11
2.1.3 Importance of cyber security in smart grid . . . . 14
2.1.4 Implications for cyber-security in smart grid . . . 14
2.2 Traditional cyber-security mechanisms in the smart grid . . 15
2.3 Emerging cyber-security threats and attacks in the smart grid 16
3 MACHINE LEARNING AND CYBER-SECURITY IN SMART
GRID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1 Machine learning . . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Machine learning algorithms for cyber-security in smart grid 28
3.2.1 Decision tree . . . . . . . . . . . . . . . . . . . . 35
3.2.1.1 Classifcation and regression trees . . . 37
3.2.1.2 Entropy . . . . . . . . . . . . . . . . . 39
3.2.1.3 Information gain . . . . . . . . . . . . . 40
3.2.1.4 Gini index . . . . . . . . . . . . . . . . 41
3.2.2 Support vector machine . . . . . . . . . . . . . . 42
3.2.3 Random forests . . . . . . . . . . . . . . . . . . 48
3.2.4 Deep learning . . . . . . . . . . . . . . . . . . . 49
3.2.5 Linear regression . . . . . . . . . . . . . . . . . 61
3.2.5.1 Logistic regression . . . . . . . . . . . 64
X
3.3 Machine learning applications in cybersecurity . . . . . . . 67
3.4 Risk analysis of machine learning applications . . . . . . . 69
3.4.1 Algorithmic risks in machine learning . . . . . . 74
3.4.2 Algorithmic risks management . . . . . . . . . . 76
4 METHODOLOGY OF MACHINE LEARNING IN CYBER SE-
CURITY OF SMART GRIDS . . . . . . . . . . . . . . . . . . . . 78
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2 Data preprocessing and feature selection . . . . . . . . . . . 78
4.2.1 Data collection . . . . . . . . . . . . . . . . . . . 78
4.2.2 Data cleaning and preprocessing . . . . . . . . . 79
4.2.3 Feature extraction . . . . . . . . . . . . . . . . . 79
4.2.4 Feature selection techniques . . . . . . . . . . . . 81
4.3 Performance evaluation of machine learning algorithms . . 82
4.3.1 Evaluation metrics . . . . . . . . . . . . . . . . . 82
4.3.2 Experimental setup . . . . . . . . . . . . . . . . 84
4.3.3 Baseline models . . . . . . . . . . . . . . . . . . 85
4.3.4 Comparison of machine learning algorithms . . . 85
4.4 Implementation of machine learning models . . . . . . . . . 85
4.4.1 Model selection . . . . . . . . . . . . . . . . . . 85
4.4.2 Model training . . . . . . . . . . . . . . . . . . . 86
4.4.3 Hyperparameter tuning . . . . . . . . . . . . . . 87
5 RESULTS AND DISCUSSION . . . . . . . . . . . . . . . . . . . 88
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.2 Simulation results and discussion . . . . . . . . . . . . . . 88
5.3 Model evaluation . . . . . . . . . . . . . . . . . . . . . . . 89
5.4 Sensitivity analysis . . . . . . . . . . . . . . . . . . . . . . 90
5.5 Risk analysis . . . . . . . . . . . . . . . . . . . . . . . . . 93
6 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.2 Future research . . . . . . . . . . . . . . . . . . . . . . . . 97
XI
List of Figures
1 Illustration of traditional power system . . . . . . . . . . . . . . . 2
2 2014 Fiscal year incidents reported by different sectors (ICS-CERT,
2015). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3 The state estimation within an energy management system . . . . . 5
4 The architecture of smart grids. . . . . . . . . . . . . . . . . . . . . 8
5 Different types of cyber-attacks. . . . . . . . . . . . . . . . . . . . 23
6 Machine learning layers. . . . . . . . . . . . . . . . . . . . . . . . 26
7 A categorization of major machine learning techniques with rele-
vant examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
8 Pictorial representation of a decision tree structure. . . . . . . . . . 35
9 Support vector machine. . . . . . . . . . . . . . . . . . . . . . . . 45
10 Description of an artifcial neural network. . . . . . . . . . . . . . . 49
11 Illustration of single-layer neural network . . . . . . . . . . . . . . 50
12 Description of a ReLU activation function. . . . . . . . . . . . . . . 52
13 Description of a sigmoid activation function. . . . . . . . . . . . . . 52
14 Description of a hyperbolic tangent activation function. . . . . . . . 53
15 Convolutional neural network. . . . . . . . . . . . . . . . . . . . . 54
16 Recurrent neural network. . . . . . . . . . . . . . . . . . . . . . . 55
17 Long short-term memory. . . . . . . . . . . . . . . . . . . . . . . . 57
18 Gated recurrent unit. . . . . . . . . . . . . . . . . . . . . . . . . . 61
19 The concept of regression. . . . . . . . . . . . . . . . . . . . . . . 62
20 Data cleaning process. . . . . . . . . . . . . . . . . . . . . . . . . 79
21 The correlation matrix for the wustle-2021 dataset. . . . . . . . . . 82
22 The confusion matrix. . . . . . . . . . . . . . . . . . . . . . . . . . 83
23 Overall performance comparison of the considered algorithms. . . . 90
24 Performance of the best-performing algorithms on the WUSTL IIoT 2018
dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
XII
25 Performance of the best-performing algorithms on the WUSTL IIoT 2021
dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
XIII
List of Tables
1 The confguration of the hyperparameters. . . . . . . . . . . . . . . 89
XIV
ABBREVIATIONS 
4G Fourth Generation
5G Fifth Generation
AMI Advanced Meter Infrastructure
AI Artifcial Intelligence
AUC Area Under Curve
ANN Artifcial Neural Network
BPTT Backpropagation Through Time
BRS Bias Risk Score
CIC-IDS Canadian Institute of Cybersecurity Intrusion Detection Systems
CIS Center for Internet Security
CIR Class Imbalance Ratio
CART Classifcation and Regression Trees
CNN Convolutional Neural Networks
CRNN Convolutional Recurrent Neural Networks
CTI Cyber Threat Intelligence
DLP Data Loss Prevention
DIA Data Integrity Attacks
DNN Deep Neural Networks
DoS Denial of Service
DER Distributed Energy Resources
DDoS Distributed Denial of Service
EMS Energy Management System
FP False Positive
FN False Negative
GRU Gated Recurrent Unit
GMM Gaussian Mixture Models
GEE Generalized Estimating Equations
GPRS General Packet Radio Services
GSF Genetically Seeded Flora
GSFTNN Genetically Seeded Flora Transformer Neural Network
GPS Global Positioning System
GSM Global System for Mobile
IAM Identity and Access Management
XV
ICT Information and Communication Technologies
IDS Intrusion Detection Systems
IPS Intrusion Prevention System
IT Information Technology
IoT Internet of Things
ISO International Organization for Standardization
IaaS Infrastructure-as-a-Service
KNN K-Nearest Neighbors
LAN Local Area Networks
LSTM Long Short-Term Memory
LoRaWAN Long Range Wide Area Network
NIST National Institute of Standards and Technology
NSL-KDD Network Security Laboratory Knowledge Discovery in Databases
OaT One-at-a-Time
OvO One-vs-One
OvR One-vs-Rest
OTA Over-the-air Attacks
PCC Pearson Correlation Coeffcient
PLC Power Line Carrier
PCA Principal Component Analysis
PCS Process Control Security
PSSS Power System State Security
RNN Recurrent Neural Networks
ReLU Rectifed Linear Unit
ROC Receiver Operating Characteristic
RTU Remote Terminal Units
RES Renewable Energy Sources
RBM Restricted Boltzmann Machines
SSL Secure Socket Layer
SIEM Security Incident and Event Management
SMS Smart Meter Security
SE State Estimator
SARSA State-Action-Reward-State-Action
SGD Stochastic Gradient Descent
SVM Support Vector Machines
SCADA Supervisory Control and Data Acquisition
XVI
TLS Transport Layer Security
TN True Negative
TP True Positive
UEBA User and Entity Behavior Analytics
SGCPS Smart Grid Communication Protocol Security
WUSTL-IIoT Washington University in St. Louis- Industrial IoT
WAN Wide Area Networks
XVII
LIST OF PUBLICATIONS 
This thesis represents a comprehensive compilation of meticulously organized peer-
reviewed papers that signifcantly contribute to the advancement of knowledge in
the feld of cyber security within smart grids. It encapsulates the outcomes of ex-
tensive and rigorous research on the subject matter, effectively summarizing the
fndings. Roman numerals (I-V) citations are utilized throughout the text to refer-
ence the corresponding publications.
(I) On the performance metrics for cyber-physical attack detection in smart grid.
Diaba, S.Y., Shafe-khah, M. and Elmusrati, M. Published in Soft Computing,
26(23), pp.13109-13118., 2022. doi.org/10.1007/s00500-022-06761-1
(II) Proposed algorithm for smart grid DDoS detection based on deep learning.
Diaba, S.Y. and Elmusrati, M. Published in Neural Networks, 159, pp.175-
184. 2023. doi.org/10.1016/j.neunet.2022.12.011
(III) Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning
Algorithms. Diaba, S.Y., Shafe-Khah, M. and Elmusrati, M. Published in
IEEE Access, 11, pp.18660-18672. 2023.
doi:10.1109/ACCESS.2023.3247193
(IV) SCADA securing system using deep learning to prevent cyber infltration.
Diaba, S.Y., Anafo, T., Tetteh, L.A., Oyibo, M.A., Alola, A.A., Shafe-Khah,
M. and Elmusrati, M. Published in Neural Networks. 2023.
doi.org/10.1016/j.neunet.2023.05.047
(V) Risk Assessment of Machine Learning Algorithms on Manipulated Dataset
in Power Systems. Diaba, S.Y., Shafe-Khah, M., Mekkanen, M., Vartiainen,
T. and Elmusrati, M. Published in International Conference on Future Energy 
Solutions (FES 2023) (pp. 1-5). IEEE. doi: 10.1109/FES57669.2023.10182751
All the articles are reprinted with the permission of the copyright owners. 
XVIII
AUTHOR’S CONTRIBUTION 
Publication I: “On the performance metrics for cyber-physical attack
detection in smart grid”
The author diligently executed the experimental procedures for all algorithms con-
sidered in the study. This phase involved designing the experiments, setting up
the necessary environments, collecting and processing data, and conducting rigor-
ous analyses. As the primary investigator and researcher, the author took the lead
in writing the primary manuscript. The immediate and second supervisors played
integral roles in providing invaluable guidance and support. Their expertise and
experience were instrumental in shaping the manuscript and elevating its academic
rigor.
Publication II: “Proposed algorithm for smart grid DDoS detection based
on deep learning ”
The author took the lead in all aspects of the manuscript, from conceptualization
and investigation to validation and writing of the initial manuscript. The primary
and secondary supervisors played crucial roles, offering invaluable direction and
support, signifcantly contributing to the manuscript’s overall quality and academic
rigor.
Publication III: “Cyber Security in Power Systems Using Meta-Heuristic
and Deep Learning Algorithms”
The author conceptualized the idea and conducted the experimental analysis with
respect to all the algorithms considered. The author wrote the primary manuscript.
The immediate and second supervisors help in shaping the manuscript.
XIX
Publication IV: “SCADA securing system using deep learning to pre-
vent cyber infltration”
The author played a pivotal role by conceiving the original idea and meticulously
conducting the experimental analysis for all the algorithms under consideration. He
authored the primary manuscript. Throughout the process, the immediate and sec-
ond supervisors provided invaluable support and guidance, playing essential roles
in shaping the manuscript and elevating its academic merit.
Publication V: “Risk Assessment of Machine Learning Algorithms on
Manipulated Dataset in Power Systems ”
The author had the responsibility of conducting the experimental analysis. The
author wrote the primary draft of the manuscript. The other authors reviewed and
contributed at different stages of the manuscript preparation.
XX
1 INTRODUCTION 
The traditional energy system refers to how energy has been generated and dis-
tributed for many decades, relying primarily on fossil fuels such as coal, oil, and
natural gas (Belkebir et al., 2018). The generated power is transmitted through an
intricate grid system comprising electricity substations, transformers, and power
lines connecting electricity producers and consumers. The grids are interconnected
for reliability and effciency, generating more prominent and reliable networks that
improve energy supply coordination and planning. The electrical grid is an ex-
tensive network of high-voltage power lines distributing for thousands of kilome-
ters and low-voltage power lines with distribution transformers connecting electric
power to millions of customers (Fang et al., 2012; Merabet et al., 2017). The elec-
tric power generation system produces energy used to power transportation, heat
buildings, and supply electricity to homes, companies, and industries. ie., small
and large utilities. This system has typical characteristics of massive infrastructure,
centralized power generation, and unidirectional energy transfer from producers to
consumers driving economic development and technological advancements world-
wide.
Although it has been a reliable and effective source of energy for humanity, it has
also caused signifcant environmental damage, including air and water pollution.
One of the main challenges is its environmental effect, as it contributes to a sig-
nifcant portion of global greenhouse gas emissions, causing climate change. For
example, fossil fuels are fnite resources, and their extraction and production can
negatively affect local communities and the environment. Specifcally, they are
facing several challenges, including; the increasing demand for electricity from a
growing population. The need to integrate more renewable energy sources (RES)
into the grid. The grid’s aging infrastructure is becoming more vulnerable to out-
ages. The traditional electric network is vulnerable to disruption (Blumsack and
Fernandez, 2012) from natural disasters, vandalism, and other threats, leading to
power outages and other disruptions to daily life.
As a result of these challenges, many communities and countries are transitioning
to more sustainable RES (V. Kumar et al., 2016), such as solar, and wind power
(Ahmed et al., 2020). These energy sources are often distributed and can be gen-
erated near the point of use (Huang et al., 2012), reducing the need for large-scale
infrastructure and enabling greater energy independence and resilience.
The smart grid is an emerging technology that has the potential to revolutionize the
way electric power is produced and consumed. Smart grids for energy manage-
ment are becoming increasingly popular in addressing climate change and energy
security issues. They are intelligent systems that use information and communica-
tion technologies (ICTs) to improve energy delivery effciency, reliability, stability,
and sustainability (Yigit et al., 2014). It has become the core of the power net-
2 Acta Wasaensia 
work due to its vital benefts, such as self-healing, good power quality, a broader
electrical market, the involvement of customers, the ability to accommodate diverse
generation sources, storage options, and robustness (Espe et al., 2018). The new
generation standard of the power grid permits communication between the smart
grid domains. i.e. (generation, transmission, distribution, system operators, service
providers, customers, and the power market) (Fang et al., 2012; Metke and Ekl,
2010). This is possible with the introduction of information technology (IT) and
communication infrastructure in the power network (Dehalwar et al., 2014; Haluk
Go¨zde; M. Cengiz Taplamacioglo; Murat Ari; Hamza Shalaf, 2015).
Networking these domains makes the smart grid network very complex, leading to
severe possible security holes in the grid. These risks are linked to its automation,
communication, protection and data collection systems (Tuballa & Abundo, 2016).
Using computer software and data transmission in the electrical power grid allows
for improved management and optimization of the power system, with real-time
monitoring, control, and integration of RESs.
Figure 1: Illustration of traditional power system
Information exchange is vital in ICT systems that are used in the smart grids. This
is possible with the aid of the communication network (wireless and wireline) in-
tegrated into the power systems. An essential part of smart grids is based on dis-
tributed generation (DG) through RES. Hence, wireless communication will offer
effective, fexible, and inexpensive solutions for networking. There are several is-
sues associated with wireless networking applications in the protection of power
systems, such as latency and reliability. However, with new communication stan-
dards such as the 5G networks and beyond, these problems might be handled. The
issues of latency and reliability are not discussed in this thesis.
Nevertheless, the vital issue of cyber security will be the main research topic. Since
the wireless communication network has its cyber security vulnerabilities, intro-
ducing it into the power network would mean tackling the vulnerabilities from
3Acta Wasaensia 
the power network perspective. The smart grid cyber continuation security, in its
entirety, covers the privacy and security of the communication and automation of
the grid (Diptiben Ghelani, 2022). That facilitates the communication between the
smart grid domain and the control of circuit breakers and switches as well as data
transmission throughout the power grid.
Vulnerabilities in the power system sense mean the security holes in the network
that could be exploited for illegal or unauthorized tasks. These vulnerabilities are
mainly; network vulnerability, platform vulnerability, and management vulnerabil-
ity. Network vulnerability is linked to wireless connections and the communication
aspect of the power network. Platform vulnerabilities encapsulate the hardware, the
software, the system confguration, and the data-keeping system. Procedure and
security standards are considered under management vulnerabilities.
Figure 2: 2014 Fiscal year incidents reported by different sectors (ICS-CERT,
2015).
The energy sector could be the newest target area for cyber-attacks (ICS-CERT,
2015). Therefore, research on cyber attack detection and prevention schemes is nec-
essary. The prevention schemes must be capable of defending the smart grid from
cyber-attacks launched by terrorists, dissatisfed or dismissed employees, politi-
cians, industrial spymasters, market competitors, and natural disasters (Yang et al.,
2011). Based on the ramifcation of the smart grid, the security aspect is divided
into fve categories. Namely: Process control security (PCS), smart meter security
(SMS), power system state security (PSSS), smart grid communication protocol
4 Acta Wasaensia 
security (SGCPS), and smart grid simulation for security analysis (Diptiben Ghe-
lani, 2022). Looking at different industry sectors, PCS has been used by automated
systems to monitor and control processes using computer networks. Yet, it was in
isolation and without outside network links. Currently, PCSs are run in connection
with other networks from within and outside. This undoubtedly creates holes for
cyber attackers to penetrate.
Smart meters are digitalized type of traditional kilowatt-hour (kWh) meter, mainly
measuring power, current, and voltage (Avancini et al., 2018). They usually have a
programmable microcontroller, memory, and a communication board (e.g., a wire-
less modem). It might have an output port for remote control as well. This forms the
advanced meter infrastructure (AMI). The AMI is arguably the most implemented
element (Novosel, 2012) of the smart grid worldwide, as it is seen as the frst step
for power providers to locomote toward the smart grid. A metering device is clas-
sifed as smart when it can frequently keep records of the power and other vital
parameters and can interconnect this information to the central system for monitor-
ing and analytical purposes. These data can be used to monitor network conditions
such as distribution state estimation, advanced distribution operations, transmission
operations, and asset management (Hussain et al., 2020).
The AMI communicates to the central system through communication means such
as local area networks (LAN), wide area networks (WAN), like LoRaWAN, Blue-
tooth, Zigbee, and mobile networks (eg., GSM, 4G, 5G, etc.). It can also be commu-
nicated through a wireline network such as an Ethernet or power line carrier (PLC).
Hence, the AMI implementation establishes the common telecommunications and
IT infrastructure (Q. Zhang et al., 2010). These communication requirements in-
clude bandwidth, latency, reliability, and security. This means that there will need
to be many different protocols used in the smart grid to enable communication be-
tween components. The state estimator (SE) has been used in the power sector to
monitor, estimate, and control the power network since the evolution of the smart
grid. The SE uses the transferred data from all the remote terminal units (RTU) on
the network to estimate the true state of the network (Mingkui Wei, 2016). Thus,
estimating based on false data will result in bad results by default. Data integrity
attacks (DIA) must be defended in all senses.
In order to control and supervise the generation, transmission, and distribution
of electricity, an energy management system (EMS) is used. It is a sophisticated
software-based system that integrates real-time data and provides monitoring, con-
trol, and optimization capabilities to ensure effcient and reliable operation of power
systems (Su et al., 2017). EMS typically incorporates supervisory control and data
acquisition (SCADA) functionality as a part of its overall architecture (He and Yan,
2016). SCADA systems collect real-time data from various feld devices and sen-
sors, such as RTUs, and send that data to a central control center where operators
can monitor and control the power system. RTUs are devices deployed at various
5Acta Wasaensia 
points in the electrical grid, such as substations or distribution centers, to measure
electrical observations throughout the grid (Oyewole and Jayaweera, 2020). They
are equipped with sensors and meters to collect data on various electrical param-
eters, including voltage levels, current fows, power factors, frequency, and other
relevant data points. The RTUs gather real-time data from the feld and transmit it
to a central control center where operators can monitor and analyze the performance
of the electrical grid. They act as intermediaries between the feld devices and the
central SCADA system.
The measurements telemetered to the data acquisition module of the EMS are used
by the SE to provide an estimate of the grid’s state and to, monitor and optimize
the electric power grid’s condition and provide real-time troubleshooting when a
critical system component fails. This is illustrated in Figure 3.
Figure 3: The state estimation within an energy management system
Keeping the smart grid secure is a vital task. However, using very complex proto-
cols and adding complicated layers on the communication part to enhance security
will, on the other hand, increase the communication delays between sensors (e.g.,
AMI) and controllers as well as between the controllers and actuators (e.g., circuit
breakers). Such delay might cause colossal damage or operation instability. In some
faults, the protection system must react within 5 milliseconds or less to prevent se-
vere damage to the grid. Pre-learned machine learning is one potential technology
that can be useful to preserve security and privacy in smart grids.
The application of machine learning in cyber-security has shown promising re-
sults in other domains, such as fnance, healthcare, and transportation. Therefore,
a strong motivation exists to investigate the potential of machine learning-based
cyber-security techniques in smart grids. Machine learning can help to detect and
predict cyber-attacks in real-time, which can signifcantly enhance the security of
the smart grid. Moreover, machine learning models can learn from historical data
6 Acta Wasaensia 
and adapt to new and emerging threats, making them more effective than traditional
cyber-security mechanisms. This dissertation aims to evaluate the effectiveness of
machine learning-based cyber-security techniques in enhancing the security of the
smart grid against cyber-attacks. The research fndings provide valuable insights
into the potential benefts of using machine learning for cyber-security in the smart
grid and contribute to developing more effective and reliable cyber-security mecha-
nisms.
1.1 Problem statement 
The smart grid is an evolving power grid infrastructure that integrates advanced
communication and information technologies to improve the electricity supply chain
effciency, reliability, and security. However, incorporating these technologies in-
troduces new cyber-security challenges that can compromise the grid’s availabil-
ity, integrity, and confdentiality, potentially leading to catastrophic consequences
(Tufail et al., 2021). Traditional cyber-security approaches in the smart grid are
insuffcient to address emerging threats’ complexity and diversity. Therefore, there
is a need for advanced cyber-security mechanisms that can leverage the power of
machine learning to provide real-time attack detection, prediction, and mitigation.
1.2 Objective 
This dissertation aims to evaluate the effectiveness of machine learning-based cyber-
security algorithms in enhancing the security of the smart grid against cyber-attacks.
It also assesses the risks of trusting machine learning on the network operation and
compromising between network effciency and availability. The specifc objectives
are:
1. To review the state-of-the-art cyber-security threats and attacks in the smart
grid and the traditional cyber-security techniques used to mitigate them.
2. To identify the limitations of traditional cyber-security techniques in address-
ing the emerging threats and the potential benefts of machine learning-based
cyber-security techniques
3. To explore the relevant machine learning algorithms that can be used for real-
time threat detection, prediction, and mitigation in the smart grid.
4. To design and implement a machine learning-based cyber-security framework
for the smart grid and evaluate its effectiveness in detecting, predicting, and
mitigating cyber-attacks.
5. To compare and analyze the performance of the proposed machine learning-
based cyber-security algorithms with traditional cyber-security techniques and
identify its strengths and weaknesses.
7Acta Wasaensia 
6. To address some future work paths for research continuation in this area, such
as risk management issues in AI-based decision-making.
1.3 Outline of the thesis 
The remainder of the dissertation is organized as follows: The literature on smart
grids and cyber-security is reviewed in detail in Chapter 2. It includes a summary
of the smart grid, defnition of words related to cyber security, the value of cyber
security in the smart grid, and the consequences of cyber security failures in the
smart grid. It provides information on the traditional cyber-security measures used
by the smart grid as well as the most recent cyber-security threats and assaults. A
brief introduction to machine learning is provided in the third chapter, which dis-
cusses cyber security in smart grid. The smart grid’s cyber-security using machine
learning algorithms is given. Decision trees, classifcation and regression trees, en-
tropy, information gain, Gini index, support vector machines, random forests, deep
learning, long short-term memory, convolutional neural networks, recurrent neu-
ral network, linear regression, and logistic regression are captured. The chapter
highlights machine learning applications in cybersecurity. Risk analysis of machine
learning applications, algorithmic risks in machine learning, and algorithmic risk
management.
The fourth chapter discusses machine learning methodology for smart grid cyber
security. It includes feature extraction, feature selection, data collecting, data clean-
ing, and preprocessing procedures. It also discusses experimental setup, model
selection, model training, hyper-parameter adjustment, and performance evaluation
of machine learning algorithms. A model evaluation, sensitivity analysis, risk anal-
ysis, and simulation results and discussion are all presented in Chapter 5. In Chapter
6, we present the dissertation conclusions and outline some future research direc-
tions in this area.
8 Acta Wasaensia 
2 REVIEW OF LITERATURE 
This chapter thoroughly assesses existing studies on cyber intrusions in smart grids,
with a particular emphasis on using machine learning algorithms in smart grids.
The review aims to identify research gaps and propose innovative ways to build on
existing knowledge to fll them. The chapter will, thus, contribute to a more thor-
ough understanding of the possible benefts and limitations of utilizing machine
learning algorithms for cybersecurity in smart grids by assessing and synthesizing
the present state of knowledge. The proposed methodologies will try to fll identi-
fed research gaps and provide fresh insights to inform the development of effective
smart grid cybersecurity tactics.
2.1 Overview of smart grid 
The smart grid is a cutting-edge electrical distribution system that aims to transform
how we distribute and use power. It is a modern improvement to the existing power
grid that uses advanced technologies and communication networks to improve ef-
fciency, reliability, and sustainability (Lamba et al., 2019; Y. Zhang et al., 2018).
This smart grid overview looks into its defnition and concept and the advantages
and benefts it provides. It also investigates numerous problems and future devel-
opments in installing smart grid technologies.
Figure 4: The architecture of smart grids.
9Acta Wasaensia 
A modernized electricity network designed to deliver energy more effciently and
fexibly is shown in Figure 4. According to (Espe et al., 2018), the smart grid is a
self-healing, self-optimizing, and self-protecting network that leverages advanced
sensors, communication technologies, and control systems to enhance power deliv-
ery’s reliability, security, and sustainability. One of the critical features of the smart
grid is its ability to integrate renewable energy resources (RES) and distributed en-
ergy resources (DERs), such as solar panels, wind turbines, and energy storage sys-
tems, into the grid. This integration enables more effcient and cost-effective (Sadiq
et al., 2021) utilization of renewable energy (Steimer, 2009), reduces greenhouse
gas emissions (Mutani et al., 2019), and enhances energy independence. Another
important aspect is its capability to enable demand response and dynamic pricing,
which allow customers to manage their energy consumption and costs based on real-
time information and incentives. It supports advanced meter infrastructure (AMI),
which provides utilities and customers with granular information on energy usage,
peak demand, and outage management (Youssef et al., 2018).
Smart grid technology can potentially revolutionize the power sector by provid-
ing numerous advantages and benefts. In (Hledik, 2009), the paper outlined some
key advantages of smart grid technology, such as improved power quality, reduced
power outages, and increased energy effciency. With advanced sensors and com-
munication technologies, it can detect and respond to power outages in real-time,
minimizing the impact of outages on customers (Blumsack & Fernandez, 2012;
Fang et al., 2012). It is, therefore, a sophisticated and innovative energy system that
offers numerous benefts for customers, utilities, and society.
The implementation of smart grid technology has brought about numerous chal-
lenges that require attention for the system to function effectively. The challenges
include interoperability, cybersecurity, and data privacy (Zerbst et al., 2010). Inter-
operability challenges arise due to the integration of various communication tech-
nologies (Ma et al., 2013) that are not necessarily compatible, leading to data loss
and system failures. Cybersecurity threats are also a signifcant challenge, given
that smart grid systems are susceptible to attacks from hackers seeking to exploit
vulnerabilities in the system (Kappagantu and Daniel, 2018). Data privacy alarms
grow due to smart grid systems collecting vast amounts of data, creating questions
about how it is utilized and who has access to it.
Despite all the diffculties and challenges, adopting smart electrical networks is a
necessity imposed by reality and necessity. There are several considerable techno-
logical breakthroughs addressing these issues. For example, advances in creating
secure communication protocols and data encryption technologies have aided in ad-
dressing cybersecurity problems. Blockchain technology is being investigated as a
tool for improving data privacy and security in smart grid systems (Falahi et al.,
2022; Gao et al., 2022; Kuzlu et al., 2020). While smart grid deployment confronts
various obstacles, the technology’s future appears bright, with advances that solve
10 Acta Wasaensia 
these challenges and make the system more effcient and safer.
2.1.1 Cyber-security in smart grid 
The smart grid represents a modernized version of the traditional generation, trans-
mission, distribution, and metering infrastructures. This is achieved by upgrading
existing systems with digital technologies such as microprocessors, software, and
network communication channels. The new technologies can supplement existing
components that perform the same function as before, but now provide and commu-
nicate information to a centralized system, or offer entirely new functionalities that
enable human operators and the grid to respond intelligently to changing conditions
(Smith and Pate-Cornell, 2018).
Technology integration in modern power grids has brought about a new era of eff-
ciency and convenience in energy distribution. However, this advancement has also
brought about new risks (D. Kumar and S., 2020), particularly in cyber-attacks. As
the world increasingly depends on technology, cyber-security detection and preven-
tion have become crucial to ensuring smart grid systems’ safety and reliability (Sahu
and Davis, 2023). Cyber-security is a critical concern in smart grid technology due
to the increased use of interconnected communication systems and devices. Inte-
grating various communication networks and technologies such as sensors, moni-
toring systems, and control systems creates a complex web of communication sus-
ceptible to cyber-attacks (More et al., 2022).
Cyber-attacks on a smart grid can cause catastrophic power outages, resulting in f-
nancial losses, environmental harm, and public safety issues. Various procedures
must be adopted to ensure the smart grid’s cyber-security, including the use of
secure communication protocols, frewalls, and intrusion detection systems (IDS)
(Baig and Amoudi, 2013). Communication connections and data storage systems
must be encrypted to avoid unauthorized access and data breaches. Another critical
part of smart grid cyber-security is establishing access controls and authentication
procedures to prevent access to vital infrastructure and data networks. Strong pass-
words, multi-factor authentication, and role-based access restrictions are used to
restrict access to authorized individuals exclusively (Diamantoulakis et al., 2015).
The paper (Anwar et al.; 2014) proposed several cybersecurity measures to protect
smart grid systems. One of the measures is the implementation of access controls,
which restrict access to critical components of the smart grid system. This measure
ensures that only authorized personnel can access the system, reducing cyber-attack
risk. Another measure is the implementation of IDSs, which monitor and detects
any unauthorized access or malicious activity within the smart grid system (Eken,
2013). This measure allows for quick identifcation and response to potential cyber-
attacks. The authors suggested the implementation of encryption technology, which
protects smart grid system data from unauthorized access by encrypting it. The
Acta Wasaensia 11
measure ensures that data transmitted through the system is secure and cannot be
accessed by unauthorized parties. Implementing these vital cybersecurity measures
can protect smart grid systems from cyber-attacks.
In light of the increasing adoption of smart grid technology, cyber-security is a crit-
ical consideration in the design and implementation of smart grid technology. Ef-
fective cyber-security measures must be put in place to protect against cyber-attacks
and ensure the reliable and secure operation of the smart grid. Implementing these
measures requires collaboration between stakeholders, including utilities, govern-
ment agencies, and cybersecurity experts, to develop comprehensive and effective
cybersecurity strategies.
2.1.2 Terms of cybersecurity 
In the digital era, where technology is deeply integrated into our lives, protecting
sensitive information and digital assets has become paramount. Cybersecurity safe-
guards computer systems, networks, and data from unauthorized access (Ye Yan,
Yi Qian, Hamid Sharif, 2013), which has emerged as a crucial discipline in our in-
creasingly interconnected world. Understanding the terminologies associated with
cybersecurity is essential for comprehending the concepts, tools, and measures em-
ployed to mitigate cyber threats. Cybersecurity involves several crucial elements,
including:
Preventive measures: These measures are aimed at preventing cyber-attacks from
occurring in the frst place. It involves implementing security controls such as fre-
walls, antivirus software, strong passwords, and encryption to protect against po-
tential threats. Detective measures: These measures are focused on identifying
and detecting potential security breaches or unauthorized activities. They involve
monitoring systems and networks for suspicious activities, analyzing logs, and us-
ing IDSs and intrusion prevention systems (IPSs).
Responsive measures: If a cyber-attack occurs or a security breach is detected,
responsive measures are taken to mitigate the impact and prevent further damage.
Incident response plans, backup and recovery systems, and disaster recovery plans
play a crucial role in minimizing the consequences of an attack. Security aware-
ness and training: Educating users and employees about potential threats and best
practices is essential for maintaining a secure environment. Training programs,
workshops, and ongoing awareness campaigns help raise awareness about cyberse-
curity risks and promote responsible online behavior. Security policies and pro-
cedures: Establishing robust security policies and procedures is vital for organi-
zations to ensure consistent and effective cybersecurity practices. These policies
outline guidelines for the secure use of technology, handling of data, and response
to security incidents.
Some key terminologies associated with cybersecurity are:
12 Acta Wasaensia 
Malware: Short for malicious software, malware refers to any software designed to
cause harm, damage, or gain unauthorized access to computer systems (Gunduz and
Das, 2020). It includes viruses, worms, Trojans, ransomware, and spyware. Phish-
ing: Phishing is a technique cybercriminals use to trick individuals into providing
sensitive information, such as passwords, credit card details, or personal data, by
posing as a trustworthy entity. This is typically done through deceptive emails,
messages, or websites. Firewall: A frewall is a network security device that acts
as a barrier between a trusted internal network and an untrusted external network
(usually the Internet). It monitors and controls incoming and outgoing network
traffc based on predefned security rules to prevent unauthorized access and protect
against malicious activities.
Encryption: Encryption is the process of converting information or data into a
coded form to deter unauthorized access or interception. It ensures that even if the
data is intercepted, it remains unreadable unless decrypted with the appropriate key.
Vulnerability: A vulnerability refers to a weakness or faw in a system or software
that attackers can exploit to compromise the system’s security (Yeboah-Ofori and
Islam, 2019). Vulnerabilities can exist in operating systems, applications, networks,
or confgurations. Penetration testing: Also known as ethical hacking or white-hat
hacking, penetration testing involves simulating real-world attacks on a system or
network to identify vulnerabilities and weaknesses. It helps organizations assess
their security posture and address any vulnerabilities before malicious hackers can
exploit them.
Two-factor authentication (2FA): Two-factor authentication adds an extra layer of
security by requiring users to provide two forms of identifcation to access a system
or account. It typically involves combining a password or PIN with a second factor,
such as a unique code generated by a mobile app, a fngerprint scan, or a physical to-
ken. Intrusion Detection System (IDS) and Intrusion Prevention System (IPS):
IDS and IPSs are security mechanisms that detect and respond to potential security
breaches. An IDS monitors network traffc and systems for suspicious activities or
signs of an attack (Sun et al., 2018). At the same time, an IDS detects and takes
active measures to prevent or block such activities. Data breach: A data breach
occurs when unauthorized individuals access sensitive or confdential data. It can
result in the exposure or theft of personal information, fnancial data, trade secrets,
or other valuable information, leading to potential misuse or harm (Y. Li and Liu,
2021).
Social engineering: Social engineering is a tactic used by attackers to manipulate
individuals into revealing sensitive information or performing certain actions. It of-
ten involves psychological manipulation and deception, exploiting human trust and
vulnerabilities to gain unauthorized access to systems or data. These are some of
the many terminologies used in the feld of cybersecurity. The dynamic nature of
cyber threats means that new terms and concepts emerge as the feld evolves. Some
Acta Wasaensia 13
more are: Patch management: Patch management refers to keeping software and
systems up to date with the latest security patches and updates released by vendors.
Regularly applying patches helps address known vulnerabilities and strengthens the
overall security posture. Zero-day vulnerability: A zero-day vulnerability is a se-
curity faw or weakness in a software or system that is unknown to the vendor or
developers. Attackers exploit these vulnerabilities before a patch or fx is avail-
able, making it challenging for organizations to defend against such attacks. Data
loss prevention (DLP): Data loss prevention involves implementing measures and
technologies to prevent the unauthorized disclosure, leakage, or loss of sensitive or
confdential data. DLP solutions monitor and control data fow within the organiza-
tion and when data is shared with external entities.
Identity and access management (IAM): IAM refers to the policies, processes,
and technologies used to manage and control user identities and access to systems,
applications, and data. It ensures that only authorized individuals have appropri-
ate access privileges and reduces the risk of unauthorized access (Suicimezov and
Georgescu, 2014; Wulf et al., 2019). Cyber threat intelligence (CTI): CTI involves
gathering and analyzing information about potential and existing cyber threats to
identify patterns, tactics, techniques, and indicators of compromise. This informa-
tion helps organizations proactively detect and respond to cyber threats effectively.
Security incident and event management (SIEM): SIEM is a centralized security
solution that combines security event management and log management to provide
real-time monitoring, correlation, and analysis of security events across an organi-
zation’s network. It helps identify and respond to security incidents and provides
valuable insights for security operations.
Some more are, Bring your own device: Bring your own device refers to the policy
allowing employees to use their personal devices (such as smartphones, laptops, or
tablets) for work purposes. While it offers fexibility, it introduces security risks
as organizations must manage and secure corporate and personal data on these de-
vices. Cybersecurity frameworks: Cybersecurity frameworks provide guidelines
and best practices for organizations to establish and improve their cybersecurity
programs. Examples include the NIST cybersecurity framework, ISO 27001, and
center for internet security (CIS) controls, which provide a structured approach to
managing cybersecurity risks. Secure socket layer/transport layer security (SS-
L/TLS): SSL/TLS protocols provide encryption and secure communication over
networks, typically used for securing web traffc. They establish a secure con-
nection between a client and a server, ensuring the confdentiality and integrity of
the data transmitted (Wulf et al., 2019). Cyber insurance: Cyber insurance is
insurance coverage that helps organizations mitigate fnancial losses resulting from
cyber-attacks or data breaches. It may cover costs related to incident response, legal
liabilities, data recovery, and business interruption.
14 Acta Wasaensia 
2.1.3 Importance of cyber security in smart grid 
Cybersecurity is an increasingly important topic for any networked informatics sys-
tem, for example, connected to the internet, and smart grids are no exception. The
importance of cybersecurity in smart grids stems from the fact that they are complex
systems with many interconnected devices and systems that control the bidirectional
fow of electricity. This makes them vulnerable to cyber-attacks, which can have se-
vere consequences in terms of the stability and reliability of the grid (Hahn et al.,
2013). Smart grids are designed to be more effcient and reliable than traditional
power grids, but they also present new challenges in terms of cybersecurity. With
so many devices connected to the grid, there are more potential entry points for
attackers to exploit (Mylrea and Gourisetti, 2017). This is particularly true for the
software and communication systems that control the grid, which is often the weak-
est link in the system (Tufail et al., 2021).
To address these vulnerabilities, cybersecurity measures must be integrated into ev-
ery aspect of the smart grid, from the hardware and software to the communication
protocols and data management systems. This requires a multi-layered approach
that includes strong encryption, frewalls, IDSs, and regular software updates to
patch vulnerabilities (Gunduz and Das, 2020). It also requires training and aware-
ness programs for those who operate and maintain the grid and protocols for re-
sponding to cybersecurity incidents. The importance of cybersecurity in smart grids
cannot be overstated. A successful cyber-attack on a smart grid could lead to power
outages, fnancial losses, and even physical harm to people and property. It could
also impact critical infrastructure and national security (Kimani et al., 2019).
Therefore, it is essential that the cybersecurity of smart grids is taken seriously
and that all stakeholders work together to ensure that these systems are as secure
as possible. This includes policymakers, regulators, utilities, manufacturers, and
cybersecurity experts. By working together, they can protect the smart grid and
ensure that it continues to provide reliable and sustainable energy for years to come.
2.1.4 Implications for cyber-security in smart grid 
As digitization escalates in the electrical system, so does the number of access
points and potential attack paths multiply. Exploitation can be directed at various
smart grid components such as software, hardware, and communication networks.
A successful cyber-attack will disrupt critical infrastructure (Kimani et al., 2019)
such as hospitals, transportation networks, and fnancial institutions (Diptiben Ghe-
lani, 2022). Cyber-security ramifcations in smart grid technologies include the
danger of illegal system access, cyber espionage, data breaches, and other types
of cyber-attacks. Unauthorized system access is one of the most dangerous conse-
quences of cyber-security in smart grid technology. Smart grid technology mainly
relies on communication networks to exchange data between system components.
Acta Wasaensia 15
These communication networks are frequently vulnerable to cyber-attacks, and unau-
thorized system access can cause severe interruptions in energy distribution. Cyber-
criminals can exploit system vulnerabilities to access sensitive data, such as con-
sumer information, and manipulate the system to create outages or other damages
(Sun et al., 2018).
The use of sensors and monitoring systems to collect data on energy distribution
and consumption is tangled in smart grid technology. The data’s sensitivity and
valuableness make it a target for cybercriminals who seek to use this data to their
advantage. Cyber espionage can lead to intellectual property theft, fnancial losses,
and other damage to the energy sector. Data cracks are also a substantial insinua-
tion of cyber-security in smart grid technology. Informed decision-making in the
smart grid energy distribution heavily relies upon the data. Any breach of this data
can lead to substantial interruptions in the energy supply chain (Liu et al., 2012).
Cybercriminals can use data cracks to snip sensitive information, manipulate data,
or cause other types of damage to the smart grid system.
Though, integrating smart grid technology has brought momentous benefts to the
energy sector. The use of technology in the energy sector also boons several security
challenges that must be tackled to ensure the safe and secure operation of the power
grid. Cybersecurity is critical for the smart grid system; thus, any compromise of
the system can have severe implications for energy consumers and the energy sector
(Yan et al., 2012). To mitigate these implications, stakeholders in the energy sector
must work together to develop and implement robust cyber-security procedures that
can shield the smart grid system from cyber-attacks.
2.2 Traditional cyber-security mechanisms in the smart grid 
Traditional cyber-security procedures have been utilized to combat cyber-attacks,
but the smart grid provides a distinct set of issues that necessitate specifc atten-
tion. This section overviews traditional cyber-security methods, emphasizes special
smart grid diffculties, and addresses possible remedies and future perspectives for
smart grid security. Traditional cyber-security measures include the use of tech-
nologies, policies, and processes to prevent unauthorized access, use, disclosure,
disruption, modifcation, or destruction of computer systems.
According to (Cone et al., 2007), these methods include frewalls, IDSs, antivirus
software, access control mechanisms, and encryption technologies. Firewalls are
crucial in safeguarding private networks against unauthorized access by selectively
fltering incoming and outgoing traffc according to predetermined criteria. On the
contrary, IDSs are designed to identify and address abnormal network activities,
such as exploiting vulnerabilities, spreading malware, or pilfering sensitive data
(Workman et al., 2008). Antivirus software detects and eliminates malware from
compromised systems, including viruses, worms, and Trojans. Access control mea-
16 Acta Wasaensia 
sures like passwords, biometrics, and smart cards are implemented to restrict the
access of authorized users solely to sensitive data and applications. Encryption
technologies, such as SSL and TLS, ensure data security during transmission and
while at rest. These technologies convert data into an unintelligible format that
can only be decoded with a confdential key. Although conventional cyber-security
measures have successfully mitigated numerous cyber threats, they are not infalli-
ble and can be susceptible to sophisticated and persistent attacks that exploit design
and implementation vulnerabilities. Consequently, organizations must embrace a
comprehensive and forward-thinking approach to cyber-security that integrates tra-
ditional measures with emerging technologies and industry best practices.
Smart grid technology presents inherent challenges requiring comprehensive so-
lutions for effcient and effective system operation. As highlighted by the paper
(Moslehi and Kumar, 2010), integrating RES within the grid is a prominent concern.
The variable and unpredictable nature of renewable sources like wind and solar
power can introduce grid instability (Svendsen et al., 2017). Overcoming this hur-
dle requires implementing sophisticated control systems capable of real-time power
supply-demand balancing. Another signifcant issue revolves around the realm of
cyber-security. Given the heavy reliance of the smart grid on communication and
information technologies, any compromise in security could lead to severe disrup-
tions in the system. As stated (Moslehi and Kumar, 2010), safeguarding the smart
grid necessitates the deployment of critical security mechanisms, including encryp-
tion, authentication, and IDSs.
Moreover, handling voluminous data poses a formidable obstacle for the smart grid.
Effectively processing, storing, and analyzing the vast amount of generated data
is vital in enabling informed decision-making. The paper (Moslehi and Kumar,
2010) propose leveraging modern data analytics techniques such as machine learn-
ing and artifcial intelligence to tackle this challenge. In conclusion, traditional
cyber-security techniques used in smart grids have proven insuffcient in protecting
against new cyber threats. Cyber-security measures must grow in tandem as the
smart grid evolves and becomes more sophisticated. This necessitates transition-
ing to more advanced and adaptive security technology capable of detecting and
responding to new cyber threats in real-time. Furthermore, all players in the smart
grid ecosystem must collaborate to create a culture of cyber-security awareness and
best practices. Only by taking a thorough and coordinated approach can ensure the
safety and security of the smart grid infrastructure.
2.3 Emerging cyber-security threats and attacks in the smart 
grid 
With the advantages that technologies bring to the smart grid, there are also emerg-
ing cyber-security threats and attacks that can jeopardize the integrity and security
of the smart grid (Procopiou and Komninos, 2015). One of the most concerning
Acta Wasaensia 17
threats is the potential for hackers to gain unauthorized access to the smart grid’s
control systems. This could lead to power outages, equipment damage, or even
physical harm to individuals (Radoglou-Grammatikis and Sarigiannidis, 2019). An-
other emerging threat is the rise of ransomware attacks which can compromise the
data and systems of the smart grid. Ransomware attacks can restrict access to crit-
ical data and systems, and attackers can demand payment to restore access (Basnet
et al., 2021). Attacks on smart grid infrastructure, such as power substations, could
cause widespread power outages and disrupt essential services, such as hospitals,
emergency response units, and transportation systems. Phishing attacks are also
a signifcant concern for smart grids. Cybercriminals can use phishing attacks to
trick employees or customers accessing the smart grid’s systems into providing lo-
gin credentials or other sensitive information. Once attackers have access to these
systems, they can cause signifcant damage and may even disrupt power distribu-
tion. Integrating RES into the smart grid presents unique security challenges (Ayar
et al., 2017). Smart grids are vulnerable to new types of cyber-attacks that exploit
the vulnerabilities of renewable energy systems (Ding et al., 2022). For example,
hackers could target the system’s inverters or microcontrollers and cause them to
malfunction, leading to power outages.
The paper (Arabo, 2015) highlights some emerging cyber-security threats and at-
tacks in the smart grid, including advanced persistent threats (APTs), insider threats,
and supply chain attacks. APTs are sophisticated attacks that use multiple infection
vectors and evasion techniques to gain persistent access to the system and steal
sensitive information or cause damage. Insider threats are attacks from within the
organization, such as disgruntled employees, contractors, or partners with access
to critical systems and data (Yang et al., 2011). Supply chain attacks are attacks
that target the vulnerabilities of the third-party components and software used in
the smart grid, such as communication protocols or sensors, to gain unauthorized
access or control over the system. These emerging threats and attacks require new
and innovative cyber-security solutions that can detect, prevent, and respond to cy-
ber threats in real-time and improve the resilience and robustness of the smart grid
against cyber-attacks. One of the potential threats to the smart grid is the APT
attacks (Leszczyna, 2018b). These attacks are diffcult to detect and can remain un-
detected for an extended period, allowing the attacker to access sensitive informa-
tion and control the system. The insider threat is another signifcant cyber-security
threat to the smart grid. Insiders may abuse their access privileges to compromise
the system’s confdentiality, integrity, and availability. These threats pose a signif-
icant challenge to the smart grid’s cyber-security, necessitating the implementation
of robust security measures to protect against them (Workman et al., 2008).
According to (Leszczyna, 2018), cyber-security threats and smart grid attacks can
come from external and internal sources. External threats may include criminal
hackers, nation-states, and terrorist organizations, while internal threats may in-
clude disgruntled employees and contractors. The attacks can result in various
18 Acta Wasaensia 
consequences, including loss of data, disruption of service, and physical damage
to the infrastructure. In order to handle such threats, the energy sector must adopt
a comprehensive approach that includes risk assessment (Rohmeyer and Ben-zvi,
2015), cyber-security awareness training, and robust security measures such as fre-
walls, intrusion detection, and prevention systems. The smart grid’s widespread
adoption presents signifcant cyber-security challenges, and the energy sector must
take proactive measures to address them to ensure the reliable and secure delivery
of energy services (Leszczyna, 2018).
With the benefts of smart grids come new cyber-security threats and attacks. These
threats range from simple phishing attempts to more sophisticated attacks, such as
malware and denial of service (DOS) attacks. These threats can result in a wide
range of negative consequences, including the theft of sensitive data, disruption to
the electricity supply, and even physical damage to equipment. As smart grids be-
come more complex, it is becoming increasingly challenging to protect against these
threats. One way to address this issue is to implement a multi-layered approach
to cyber-security, including technical and non-technical measures. Technical mea-
sures include frewalls, IDSs, and encryption, while non-technical measures include
training and awareness programs for employees and customers. By taking a com-
prehensive approach to cyber-security, smart grid providers can help to mitigate the
risks associated with emerging threats and attacks (Eder-Neuhauser et al., 2017).
The paper (Kimani et al., 2019) highlight some emerging cyber-security threats and
attacks in the smart grid, including phishing attacks, ransomware attacks, and DOS
attacks. Phishing attacks involve cybercriminals sending fraudulent emails to util-
ity companies or smart grid customers to steal sensitive information. On the other
hand, ransomware attacks involve hackers encrypting utility company data, mak-
ing it inaccessible until a ransom is paid. DOS attacks involve hackers fooding the
smart grid system with traffc, causing it to crash (Ashok et al., 2017). These attacks
can have severe consequences, including power outages, fnancial losses, and even
loss of life.
Consequently, utility companies must implement robust cyber-security measures to
protect the smart grid from cyber-attacks. This can include regular cyber-security
training for employees, implementing frewalls and IDSs, and regularly testing the
system for vulnerabilities. With the continued evolution of technology, utility com-
panies need to remain vigilant and proactive in mitigating cyber-security threats
in the smart grid (Kimani et al., 2019). As technology advances and the grid be-
comes increasingly interconnected, it is crucial to stay vigilant, continually update
defenses, and adopt proactive measures to mitigate these risks.
With a more interconnected and digitally reliant future, these advancements come
a new frontier of cyber-security threats and attacks that loom ominously over the
stability and reliability of the smart grid. We delve into this landscape of potential
chaos and visualize the vivid threats that cast a shadow over the smart grid. These
Acta Wasaensia 19
are examples of emerging cyber-security threats and attacks in the smart grid.
APTs: APTs are sophisticated and long-lasting cyberattacks designed to infltrate
and compromise the smart grid infrastructure. Attackers obtain unauthorized ac-
cess, remain undetected for an extended period, and target specifc assets or systems
to disrupt operations or steal sensitive data (Gunduz and Das, 2020). Ransomware
Attacks: Ransomware has become a prevalent menace across many industries, in-
cluding the smart grid. Attackers can use ransomware to encrypt vital fles and
systems, effectively holding the grid infrastructure hostage until the ransom is paid.
These attacks can disrupt power generation, distribution, and management systems,
resulting in extensive blackouts. Attacks on the supply chain smart grid systems rely
on a complex supply chain involving multiple vendors and suppliers. By injecting
malicious code or compromising hardware components, attackers can exploit the
vulnerabilities in this chain (Yeboah-Ofori and Islam, 2019). This enables unau-
thorized access and manipulation of the smart grid infrastructure. Threats from
insiders: Insiders with privileged access to the smart grid infrastructure can pose
serious risks. Malicious employees or contractors may abuse their positions to in-
tentionally compromise vital systems, pilfer sensitive data, or disrupt operations. It
emphasizes the need for effective access controls and monitoring mechanisms.
Zero-day exploits: A zero-day exploit refers to a cyberattack that takes advantage
of a software vulnerability that is unknown to the software vendor or developer. At-
tackers exploit these vulnerabilities before the software creator becomes aware of
them, which means there is ”zero days” of advanced notice to patch or remedy the
issue. Phishing and social engineering: Cybercriminals frequently use phishing to
deceive employees or users into divulging sensitive information or granting unau-
thorized access. They may pose as vendors or colleagues to acquire trust and ma-
nipulate individuals into compromising the security of the smart grid (Tony Flick,
2010). Malware and botnets: Malware, including viruses and botnets, can in-
fect smart grid infrastructure devices. These infected devices join a larger network,
enabling remote control and coordinated attacks. Malware can disrupt operations,
compromise data, and facilitate further intrusion. DoS and distributed denial of
service (DDoS) attacks: DoS and DDoS attacks overwhelm targeted systems or
networks with a deluge of traffc, rendering them inaccessible or degrading perfor-
mance. These attacks can disrupt communication channels, compromise monitoring
systems, and cause the failure of vital smart grid infrastructure components (Yang
et al., 2011).
Alongside insider threats, there is also concern regarding insider subversion. In-
dividuals within the organization who cause damage or disruption to the smart
grid infrastructure on purpose. Whether motivated by personal beneft, ideology,
or vengeance, insider sabotage can have devastating effects, resulting in prolonged
power outages or system failures. AMI attacks: Smart grids frequently rely on
AMI, which consists of smart meters installed in residences and businesses to mon-
20 Acta Wasaensia 
itor and control energy consumption. These meters communicate with the grid and
provide optimization-relevant data. In addition, they introduce new attack vectors.
Attackers may attempt to compromise or manipulate smart meters, resulting in erro-
neous readings, fraudulent billing, or targeted assaults against specifc customers or
regions (Tony Flick, 2010). Global positioning system (GPS) spoofng and time
synchronization attacks: The exact timing and synchronization of devices within
the smart grid are essential to its correct operation. Using spoofng techniques, at-
tackers can target and manipulate GPS signals used for time synchronization. This
can result in inaccurate time references, disrupting the coordination and synchro-
nization of vital grid operations (Gunduz and Das, 2020). Such disruption in time
synchronization can lead to serious protection problems and large damage to the
power grid.
Vulnerabilities of cloud-based infrastructure: Smart grids increasingly utilize
cloud-based platforms and services for data storage, analytics, and remote manage-
ment. These cloud environments, however, introduce new vulnerabilities. Attackers
may target cloud infrastructure, leveraging misconfgurations or insuffcient access
controls to obtain unauthorized access to vital grid data or control systems. Attacks
using artifcial intelligence and machine learning: As artifcial intelligence and
machine learning technologies fnd applications in the smart grid for optimization
and automation; they also become potential attack targets. The integrity and eff-
cacy of artifcial intelligence models may be jeopardized if adversaries manipulate
or contaminate training data sets. This can result in erroneous decisions, cascad-
ing failures, and even targeted attacks against grid systems propelled by artifcial
intelligence. The capabilities of artifcial intelligence can be leveraged to orches-
trate severe security breaches against smart grids. In these instances, the conven-
tional methods employed for preventing and detecting cyber-security attacks may
fall short of providing adequate protection.
Firmware and hardware tampering: The frmware and hardware components
within the smart grid infrastructure are potential targets for tampering. Attackers
may modify or replace the frmware in devices or embedded systems, introducing
malicious code or backdoors that can compromise the entire system’s security. Sim-
ilarly, compromised or counterfeit hardware components can pose signifcant risks
to the grid’s integrity and functionality (Voas, 2016). Social engineering targeting
grid operators: Social engineering attacks specifcally target grid operators and
utility corporations. Attackers may impersonate executives, technical support per-
sonnel, or government offcials to manipulate employees into providing sensitive
information or granting unauthorized access to critical systems (Tony Flick, 2010).
Educating and training personnel to identify and mitigate social engineering risks is
crucial, as these attacks exploit human vulnerabilities.
Blockchain exploitation: Blockchain technology promises to enhance the security
and reliability of smart grids. However, attackers may exploit faws in blockchain
Acta Wasaensia 21
implementations or smart contracts to obtain unauthorized access, tamper with
transaction records, or disrupt the operation of blockchain-based systems. Integrity
and resilience must be ensured for blockchain networks facilitating smart grid op-
erations (Falahi et al., 2022; Gao et al., 2022; Kuzlu et al., 2020). Data integrity
attacks: The smart grid heavily relies on accurate and reliable data for decision-
making and control. Data integrity can be compromised by tampering with or ma-
nipulating data in transit or at rest. By compromising the integrity of grid data,
adversaries can deceive operators, disrupt system operations, or contribute to oper-
ational failures or safety hazards (Gunduz and Das, 2020). Interdependencies with
other systems: The smart grid is interdependent with a variety of other critical in-
frastructure systems, such as transportation, water supply, and telecommunications.
Attackers may exploit the interdependencies between these systems to initiate co-
ordinated attacks. Degrading telecommunications infrastructure, for instance, can
impede grid monitoring and response capabilities, amplifying the effects of a cy-
berattack on the smart grid.
While cyberattacks dominate smart grid security discussions, the physical infras-
tructure can also be a target. Attackers may attempt to physically damage or elimi-
nate essential grid components, such as substations, transmission lines, and control
centers. Such assaults can result in prolonged outages, cascading failures, and sig-
nifcant monetary and societal repercussions. Nation-state attacks: Smart grids
are vital components of the nation’s infrastructure, making them potential targets for
cyberattacks orchestrated by nation-states. Nation-state actors are capable of con-
ducting sophisticated, long-term espionage, disruption, or subversion campaigns.
These attacks may employ APTs, zero-day exploits, and complex attack vectors de-
signed to exploit specifc vulnerabilities and obtain strategic advantages.
Insider privilege abuse: In addition to insider threats, insiders with excessive priv-
ileges can abuse their access to critical systems. Privilege abuse can involve unau-
thorized confguration changes, bypassing security controls, or granting access to
unauthorized individuals or external attackers. This poses a signifcant risk to the
smart grid’s security and operational integrity. Insider-enabled physical attacks:
Insiders with knowledge of the smart grid’s physical infrastructure and security
systems can facilitate physical attacks by providing critical information to external
threat actors. This collaboration between insiders and external adversaries can lead
to targeted attacks on critical components, such as disabling physical security mea-
sures, compromising control systems, or damaging critical infrastructure.
Coordinated grid-scale attacks: Attackers may simultaneously and systematically
target multiple components or regions of the smart grid. These grid-scale attacks
can potentially overwhelm the system’s defenses, resulting in pervasive disruptions
or cascading failures. By exploiting vulnerabilities across multiple grid infrastruc-
ture layers, adversaries can maximize their impact and disrupt the grid’s overall
functionality. Wireless network exploitation: Wireless communication networks
22 Acta Wasaensia 
play a crucial role in the smart grid, enabling data transmission, device control,
and monitoring. Attackers may exploit vulnerabilities in wireless protocols or com-
promise wireless access points to gain unauthorized access to the grid’s network.
They can eavesdrop on communications (Voas, 2016), inject malicious commands,
or disrupt wireless connectivity, leading to control system failures or unauthorized
access to critical infrastructure.
Energy theft and fraud: Intelligent grids rely on precise measurement and in-
voicing systems to guarantee equitable energy distribution and billing. However,
attackers may attempt to manipulate (Voas, 2016), smart meters, energy data, or in-
voicing systems to commit energy theft or fraud. This can lead to revenue losses for
utility companies and compromise the grid’s overall reliability (Tan et al., 2017).
Third-parties integration risks: The smart grid ecosystem frequently integrates
third-party applications, services, or devices. While this integration may provide
benefts, it may also introduce security hazards. Attackers may target vulnerabil-
ities in third-party systems to obtain unauthorized access to the smart grid infras-
tructure or exploit vulnerabilities at integration points, thereby compromising the
grid’s overall security posture.
Over-the-air attacks(OTA): Smart grid components, such as smart meters or grid
sensors, often receive frmware updates or confguration changes OTA. Attackers
may attempt to intercept OTA communications or tamper with the update process,
injecting malicious frmware or commands into devices. These OTA can compro-
mise the integrity of devices, leading to unauthorized control, data manipulation,
or even physical damage to the grid infrastructure. Cryptocurrency mining mal-
ware: Cryptocurrency mining malware, also known as crypto-jacking, has become
prevalent in various industries. Attackers can compromise devices within the smart
grid infrastructure to mine cryptocurrencies, causing resource depletion, reduced
performance, and increased energy consumption. This impacts the grid’s opera-
tional effciency, can strain the power supply, and lead to fnancial losses. The
emergence of quantum computing: While still in its early stages, it poses op-
portunities and challenges for smart grid security. Quantum computers have the
potential to break encryption algorithms commonly used to secure grid communi-
cations and data. As quantum computing advances, it becomes crucial to develop
post-quantum encryption methods to ensure the long-term security of the smart grid.
Lack of standardized security practices: The smart grid ecosystem involves var-
ious stakeholders, including utility companies, manufacturers, vendors, and regula-
tory bodies. The lack of standardized security practices and protocols across these
entities can create vulnerabilities and inconsistencies in security implementations
(Vaos et al., 2018). Harmonizing security practices and establishing industry-wide
standards can help ensure a more robust and resilient smart grid infrastructure.
Social media and open-source intelligence exploitation: Attackers can use in-
Acta Wasaensia 23
formation disseminated on social media platforms and other public sources (Tony
Flick, 2010) to gather intelligence about the smart grid infrastructure. By analyzing
publicly accessible data, adversaries can identify grid vulnerabilities, weaknesses,
and potential targets, allowing them to plan and execute more effective cyberattacks.
IoT device vulnerabilities: The proliferation of IoT devices in the smart infras-
tructure introduces a new class of vulnerabilities. IoT devices may have insuffcient
security features, obsolete frmware, or default credentials, making them vulnerable
to compromise. Intruders may target these devices to obtain unauthorized access
(Vaos et al., 2018) to the grid network, launch attacks, or as entry points for further
infltration.
Figure 5: Different types of cyber-attacks.
Privacy breaches: Smart grid systems generate vast amounts of data about energy
consumption, user behavior, and grid operations. Ensuring the privacy of this sen-
sitive information is crucial. Attackers may target privacy vulnerabilities to gain
unauthorized access to personal data or compromise the confdentiality (Vaos et al.,
2018) of grid-related information. Privacy breaches can lead to public distrust, reg-
ulatory issues, and legal consequences. Artifcial intelligence-generated attacks:
Adversaries can leverage artifcial intelligence technologies to automate and en-
hance attack capabilities. They can use artifcial intelligence algorithms to generate
sophisticated phishing emails, create targeted spear-phishing campaigns, or even
develop artifcial intelligence-driven malware that can adapt and evolve to bypass
traditional security defenses. Artifcial intelligence-generated attacks pose signif-
24 Acta Wasaensia 
cant challenges in detection and mitigation.
5G network exploitation: As 5G networks are increasingly deployed, smart grids
are anticipated to exploit this technology’s benefts, such as decreased latency and
increased bandwidth. Nevertheless, the implementation of 5G networks introduces
potential security hazards. Attackers may exploit 5G infrastructure vulnerabilities,
such as signaling protocols or network slicing, to disrupt grid communications or
obtain unauthorized access to the smart grid. Implementing intricate security lay-
ers within the 5G network could potentially result in elevated latency levels. Given
that numerous control and protection applications within smart grids are excep-
tionally latency-sensitive, this consideration becomes paramount. Virtualization
and cloud-based attacks: Virtualization and cloud computing are indispensable to
modernizing the smart grid. However, these technologies introduce their security
risks. In order to obtain unauthorized access to grid systems, manipulate data, or
disrupt vital operations, attackers may exploit vulnerabilities in virtualized environ-
ments or compromise cloud-based services (Vaos et al., 2018).
Insider threats from external contractors: The smart grid ecosystem frequently
includes transient grid infrastructure access for external contractors. These con-
tractors may have access to privileged systems and data, making them potential
insider threats. Organizations must implement robust access controls (Eken, 2013),
monitoring mechanisms, and rigorous verifcation procedures to mitigate the risks
associated with external contractors. Infrastructure-as-a-service (IaaS) attacks:
Smart grid operators may host critical applications or infrastructure components
with IaaS providers. To obtain unauthorized access to the smart grid infrastructure,
attackers can target vulnerabilities in IaaS platforms or exploit misconfgurations.
IaaS environments must implement robust security measures and undergo regular
security assessments. Physical security system vulnerabilities: Physical security
systems, such as surveillance cameras, access control systems, and IDS, are crit-
ical for protecting the smart grid’s physical infrastructure. Attackers may target
vulnerabilities in these systems to bypass or disable them, facilitating unauthorized
physical access to critical grid components.
As technology evolves and cybercriminals become more sophisticated, new cyber
threats and attack vectors will emerge. To remain ahead of cyber adversaries and
ensure the secure and dependable operation of the smart grid, ongoing research,
collaboration, and the continuous improvement of security measures are essential.
The evolving threat landscape necessitates continuous monitoring, evaluation, and
enhancement of the smart grid’s security measures. Implementing a defense-in-
depth strategy consisting of technical controls, security awareness training, incident
response plans, and regulatory frameworks are crucial for mitigating the risks posed
by these emergent cyber-security threats and attacks. Emerging cyber threats and
cyberattacks highlight the need for a comprehensive, multi-layered smart grid secu-
rity strategy. To ensure the resilience and dependability of the smart grid, it is nec-
Acta Wasaensia 25
essary to secure not only the cyber-infrastructure but also physical security, training,
awareness programs, incident response capabilities, regulatory frameworks, and
collaborations between industry stakeholders, government agencies, and cyberse-
curity experts.
26 Acta Wasaensia 
3 MACHINE LEARNING AND CYBER-SECURITY IN 
SMART GRID 
3.1 Machine learning 
Machine learning is a subfeld of artifcial intelligence that focuses on developing
algorithms and models that enable computers to learn from data and make pre-
dictions or in general modeling without being explicitly programmed. It involves
studying statistical techniques, linear algebra, and computational models that auto-
matically learn patterns and relationships from data, allowing machines to improve
their performance over time (Batta, 2018). It has gained signifcant attention in
cybersecurity due to its ability to analyze large-scale data, detect anomalies, and
identify patterns that may indicate malicious activities.
Figure 6: Machine learning layers.
Acta Wasaensia 27
In machine learning, the learning process starts with training data consisting of input
variables (features or attribute) and corresponding output variables (labels or target
values). The algorithm analyses the training data to identify patterns, dependencies,
and statistical relationships. The goal is to create a model that can generalize from
the training data and make accurate predictions or decisions on unseen or future
data. There are various machine learning algorithms, such as supervised, unsuper-
vised, semi-supervised, and reinforcement learning (Teixeira et al., 2018).
• Supervised learning: This type of learning involves training a model with
labeled data, where the input features and corresponding output labels are
provided. The model learns to map the input features to the output labels,
enabling it to make predictions on new unseen data (Alkuwari et al., 2022;
Sarker, 2022a).
• Unsupervised learning: Unsupervised learning deals with unlabeled data,
where only the input features are provided. The algorithm learns to discover
hidden patterns, structures, or clusters within the data without explicit guid-
ance. It helps fnd insights and understand the data’s structure (Alkuwari et
al., 2022; Sarker, 2022a). In essence, the task involves discovering the opti-
mal multivariate probability density function characterized by parameters that
accurately capture the underlying data.
• Semi-supervised learning: Semi-supervised learning combines elements of
both supervised and unsupervised learning. It leverages a small amount of
labeled data and a more considerable amount of unlabeled data to create mod-
els that can make predictions or decisions (Alkuwari et al., 2022; Azad et al.,
2019).
• Reinforcement learning involves training an agent to interact with an envi-
ronment and learn through trial and error. The agent receives feedback in the
form of rewards or penalties based on its actions, allowing it to learn optimal
strategies or policies to maximize long-term rewards (Alkuwari et al., 2022;
Sarker, 2022a; Syrmakesis et al., 2022)
Machine learning algorithms utilize various techniques and methods such as regres-
sion, classifcation, clustering, dimensionality reduction, neural networks, decision
trees, support vector machines (SVM), and more. These algorithms can be applied
to a wide range of domains, including image recognition, speech recognition, nat-
ural language processing, recommendation systems, fraud detection, autonomous
vehicles, healthcare, fnance, and many others (Sarker et al., 2021). In summary,
machine learning enables computers to learn from data and make predictions or
choices without explicit programming. It makes use of algorithms and models that
examine patterns and relationships in the data, enabling machines to perform better
and offer insightful analyses and predictions across a range of industries.
28 Acta Wasaensia 
Figure 7: A categorization of major machine learning techniques with relevant ex-
amples.
3.2 Machine learning algorithms for cyber-security in smart grid 
Due to the smart grid system’s complexity, magnitude, and interconnectedness, it
is extremely susceptible to cyber-attacks (Dogaru & Dumitrache, 2019; Y. Li et
al., 2022); machine learning algorithms are vital tools in identifying and mitigating
such attacks. Smart grids face threats such as data breaches, unauthorized access,
malware attacks, and denial-of-service (DoS) attacks. These threats can have se-
vere consequences, including disruption of electricity supply, compromising cus-
tomer privacy, and even physical damage to the grid infrastructure. In smart grid
cyber-security, machine learning algorithms must be used to safeguard the infras-
tructure from such threats. Machine learning algorithms make real-time analysis
and detection of aberrant activity feasible, allowing for the quick implementation of
preventative actions. Cyber dangers are evolving along with technology; thus, it is
essential to create machine learning algorithms to strengthen smart grids’ security.
Some essential machine-learning techniques that can help identify and mitigate po-
tential threats are:
Intrusion detection systems (IDS): Machine learning algorithms are extensively
used in IDS to detect and prevent cyber-attacks in smart grids (Siraj, 2014). These
algorithms analyze network traffc, system logs, and other relevant data to identify
patterns and anomalies associated with malicious activities. They can learn from
Acta Wasaensia 29
historical data to identify new and emerging attack patterns, enabling the system to
detect and respond to previously unknown threats (Nguyen & Reddi, 2021; Prab-
hakar et al., 2022). These models are then deployed to continuously monitor the
smart grid infrastructure and detect any suspicious or abnormal behavior that devi-
ates from expected patterns. Examples of machine learning algorithms used in IDS
include decision trees, SVM, random forests, and deep learning models like convo-
lutional neural networks (CNN) and recurrent neural networks (RNN). It is worth
noting that machine learning algorithms used in IDS should be regularly updated
and trained with the latest data to maintain their effectiveness.
Besides, the performance of these algorithms depends on the quality and representa-
tiveness of the training data, as well as the proper selection and tuning of algorithm
parameters. So, a comprehensive and proactive approach to smart grid security
should incorporate machine learning-based IDS and other security measures such
as network segmentation, access control, encryption, and regular security audits.
Malware detection: Smart grids are vulnerable to malware attacks that can disrupt
operations and compromise the entire system’s security. Machine learning algo-
rithms can help in the early detection of malware by analyzing system behavior
and identifying malicious code or activities. These algorithms can learn from large
datasets of known malware samples and detect variations or new types of malware
(Liang et al., 2018). Techniques such as static and dynamic analysis, feature ex-
traction, and clustering algorithms are employed for effective malware detection in
smart grids.
In the context of smart grids, machine learning algorithms can be trained using large
datasets of known malware samples. These datasets may include various types of
malware, such as viruses, worms, Trojans, and ransomware, that have been previ-
ously identifed and analyzed. By learning from these datasets, the algorithms can
develop models that can effectively detect variations or new types of malware that
have not been previously encountered. Feature extraction is an important step in the
process of malware detection. Machine learning algorithms analyze system data
and extract relevant features or characteristics that can distinguish between normal
and malicious behavior. These features include network traffc patterns, system
resource usage, communication protocols, and data access patterns. By extracting
meaningful features, the algorithms can identify behavioral patterns associated with
malware activities and fag any suspicious behavior for further investigation.
Clustering algorithms play a crucial role in organizing and categorizing malware
samples. These algorithms group similar malware samples based on their charac-
teristics, enabling security analysts to understand the commonalities and variations
among different types of malware. By clustering malware samples, machine learn-
ing algorithms can effectively classify and identify new or unknown malware based
on their similarity to previously encountered samples. This enables the early de-
tection and classifcation of emerging malware threats, even if they have not been
30 Acta Wasaensia 
previously identifed.
Applying machine learning algorithms in smart grid security provides an intelli-
gent, automated approach to detecting malware. By continuously analyzing system
behavior, these algorithms can identify potential threats, alert security personnel,
and facilitate prompt response and mitigation actions. However, while machine
learning algorithms can signifcantly improve malware detection in smart grids, a
comprehensive security strategy should include other measures such as robust net-
work architecture, secure communication protocols, and regular software updates
to ensure the system’s overall resilience.
Anomaly detection is crucial for identifying unusual behavior or deviations from
normal patterns in smart grids. Machine learning algorithms can analyze vast amounts
of sensor data, network logs, and system performance metrics to establish nor-
mal behavior profles and identify deviations that may indicate potential security
breaches. The machine learning algorithms employed for anomaly detection in
smart grids often fall under the category of unsupervised learning. Unlike super-
vised learning, unsupervised learning algorithms do not require labeled data for
training. Instead, they focus on discovering patterns and structures in the data,
including deviations from normal behavior (Shampa et al., 2023). Unsupervised
learning algorithms like clustering, autoencoders, and Gaussian mixture models are
commonly used for anomaly detection in smart grids (Xiufeng Liu, 2016). By con-
tinuously monitoring and analyzing system behavior, these algorithms can detect
intrusions, unauthorized access attempts, or abnormal activities in real-time.
Autoencoders are another powerful technique for anomaly detection. Autoencoders
are highly non-linear black box like neural networks architectures that are trained to
reconstruct the input data from a compressed representation called the latent space.
During the training phase, the autoencoder learns to encode and decode normal
data accurately. When presented with anomalous data, the reconstruction error is
typically higher, indicating a deviation from the expected patterns. Autoencoders
can detect real-time anomalies by setting a threshold for the reconstruction error.
Gaussian mixture models (GMMs) are probabilistic models that assume the data is
generated from a mixture of Gaussian distributions. GMMs can capture the statisti-
cal properties of the normal data distribution and estimate the likelihood of new data
points belonging to the same distribution. Data points with low likelihood values
are considered anomalous. GMMs are particularly useful for detecting anomalies
in multi-dimensional data where complex distributions can characterize normal be-
havior (de Souza et al., 2022).
Machine learning algorithms for anomaly detection in smart grids can detect vari-
ous security breaches by continuously monitoring and analyzing system behavior.
They can identify intrusions, unauthorized access attempts, system malfunctions,
abnormal energy consumption, or other activities deviating from established nor-
Acta Wasaensia 31
mal patterns. Real-time anomaly detection enables rapid response and mitigation,
minimizing the potential impact of cyber-attacks or operational failures. Predic-
tive maintenance: Machine learning algorithms can also contribute to the security
of smart grids through predictive maintenance. These algorithms can predict po-
tential failures or vulnerabilities (Chicco et al., 2020) in the grid infrastructure by
analyzing historical sensor data, system logs, and maintenance records. Predictive
maintenance models can help identify potential security weaknesses and proactively
address them before attackers exploit them. Timely maintenance and security up-
dates can signifcantly reduce the risk of cyber-attacks and system failures.
Predictive maintenance plays a crucial role in ensuring the security and reliability
of smart grid infrastructure. By leveraging machine learning algorithms, historical
data, and advanced analytics, predictive maintenance can identify potential failures,
vulnerabilities, or deteriorating conditions in the grid components. This proactive
approach enables operators to address security weaknesses and mitigate risks before
they lead to cyber-attacks or system failures (T. Li et al., 2019). Machine learning
algorithms used in predictive maintenance analyze various data sources, including
historical sensor data, system logs, maintenance records, and external data such as
weather conditions or grid load patterns. These algorithms can identify patterns,
correlations, and anomalies in the data to develop models that predict the future
behavior of grid components. The models can provide insights into the strength
and performance of smart grid infrastructure. They can predict the likelihood and
timing of component failures, detect abnormal behaviors, and identify potential se-
curity vulnerabilities that attackers may exploit. By leveraging historical data, the
algorithms can learn the patterns associated with failures or security incidents and
use this knowledge to make accurate predictions.
Integrating security-related data into predictive maintenance models allows for iden-
tifying potential vulnerabilities in the grid infrastructure. For example, machine
learning algorithms can detect unusual access patterns, unauthorized activities, or
signs of cyber-attacks by analyzing system logs and network traffc data. The mod-
els can highlight areas where security updates or patches are needed to address
vulnerabilities and prevent potential breaches. Timely maintenance and security
updates based on the predictions of machine learning algorithms can signifcantly
reduce the risk of cyber-attacks and system failures. By proactively addressing po-
tential failures or vulnerabilities, operators can mitigate the impact on the smart
grid’s security and performance. This approach not only enhances the security of
the grid but also improves the overall effciency and reliability of the system.
Additionally, predictive maintenance can optimize maintenance schedules and re-
source allocation. By predicting when specifc components are likely to fail or
require maintenance, operators can plan maintenance activities more effectively.
This approach minimizes the downtime and disruption caused by unexpected fail-
ures, reduces maintenance costs, and maximizes the lifespan of grid assets. To
32 Acta Wasaensia 
ensure the effectiveness of predictive maintenance algorithms, continuously updat-
ing and retraining the models with new data is essential. As the smart grid evolves,
new technologies are introduced, and operational conditions change, the algorithms
must adapt to capture the changing patterns and behaviors. Integrating real-time
data streams into the predictive maintenance models can further enhance their ac-
curacy and responsiveness to evolving security threats. In conclusion, predictive
maintenance powered by machine learning algorithms offers a proactive and intel-
ligent approach to ensuring the security and reliability of smart grid infrastructure.
By analyzing historical data and identifying potential failures, vulnerabilities, and
security weaknesses, operators can take timely actions to prevent cyber-attacks, re-
duce downtime, optimize maintenance activities, and enhance the overall perfor-
mance of the smart grid.
User encryption and authorization: Ensuring the identity and authorization of users
accessing the smart grid infrastructure is vital for maintaining security. Machine
learning algorithms can be utilized for user authentication and access control by an-
alyzing user behavior patterns, biometric data, and historical usage patterns (Wang
and Lu, 2013). These algorithms can identify suspicious login attempts, detect
unauthorized access, and fag potential security threats in real time. Advanced tech-
niques like deep learning-based facial recognition and voice authentication systems
are gaining prominence for user authentication in smart grids. The facial recog-
nition systems utilize neural networks to analyze facial features and match them
against a database of known users. By capturing and analyzing facial biometric
data during the login process, deep learning algorithms can accurately verify the
identity of users, making it diffcult for unauthorized individuals to gain access.
Another advanced technique used for user authentication in smart grids is voice
authentication, which analyzes voice patterns, pitch, intonation, and other acous-
tic characteristics to verify the identity of users. By comparing the voice samples
provided during the authentication process with pre-registered voice templates, the
algorithms can determine whether or not the user is legitimate. Machine learning
algorithms can also be utilized for continuous authentication, where user behavior
is monitored throughout a session to detect suspicious activities. These algorithms
can analyze user actions, mouse movements, keystrokes, and other behavioral pat-
terns to ensure the ongoing legitimacy of user access. If the algorithms detect unex-
pected changes in behavior, such as erratic mouse movements or a sudden change
in typing patterns, they can prompt additional authentication steps or terminate the
session. It is crucial to highlight that privacy concerns must be carefully addressed
(Al Ameen et al., 2012) when implementing machine learning-based user authen-
tication systems. Proper data anonymization, encryption (Diamantoulakis et al.,
2015), and secure storage of biometric data are essential to protect user privacy.
Also, transparency and user consent should be ensured, and appropriate legal and
ethical guidelines should be followed when collecting and utilizing biometric data.
Acta Wasaensia 33
Secure communication networks: Securing communication networks within a smart
grid is essential to protect against various types of attacks and maintain data con-
fdentiality, integrity, and availability. Machine learning algorithms offer valuable
techniques for enhancing the security of communication channels by encrypting
data, authenticating communication nodes, detecting tampering attempts, and opti-
mizing secure communication protocols. Encryption is a fundamental method for
protecting data during transmission. Machine learning algorithms can assist in de-
veloping robust encryption algorithms and optimizing encryption parameters based
on various factors such as data sensitivity, network conditions, and security require-
ments. These algorithms can analyze historical data and patterns to identify po-
tential vulnerabilities in encryption schemes and contribute to the development of
stronger encryption methods. Authentication of communication nodes ensures that
only authorized entities can participate (Bhattarai et al., 2019) in the smart grid net-
work. Machine learning algorithms can play a role in authentication processes by
analyzing authentication data, user behavior, and historical usage patterns. These
algorithms can learn from data and develop models that accurately verify the iden-
tity of communication nodes, preventing unauthorized access or impersonation.
The smart grid can establish a more reliable and secure communication infrastruc-
ture by incorporating machine learning into authentication mechanisms. Analyzing
network traffc, monitoring data authentication, and identifying patterns indicative
of tampering or intrusion attempts are crucial for securing communication networks
in smart grids. Historical data can be leveraged to establish normal behavior pro-
fles, enabling the detection of deviations or anomalies in real time. Timely alerts
are raised when suspicious activities are detected, allowing for proactive detection
and mitigation of tampering attempts. This approach enhances the overall security
of the communication network.
Reinforcement learning algorithms can optimize secure communication protocols
by learning from environmental interactions and adapting to emerging threats. These
algorithms can utilize feedback from the system to improve the effciency and ef-
fectiveness of communication protocols, dynamically adjusting parameters and con-
fgurations to mitigate vulnerabilities or adapt to changing attack techniques. Re-
inforcement learning enables the development of adaptive and resilient communi-
cation protocols that can respond to new threats in real time. Machine learning
algorithms can also contribute to anomaly detection in communication networks.
By analyzing network traffc data, system logs, and performance metrics, these al-
gorithms can identify abnormal communication patterns that may indicate poten-
tial security breaches or attacks. Anomalies in network traffc, such as unexpected
spikes in data volume or unusual communication patterns, can be fagged as poten-
tial threats, prompting further investigation and response.
It is important to note that the security of communication networks requires a multi-
layered approach that combines machine learning techniques with other security
34 Acta Wasaensia 
measures such as frewalls, intrusion detection systems, and secure protocols. Ma-
chine learning algorithms complement and enhance existing security mechanisms,
providing intelligent and adaptive capabilities to protect communication channels
within the smart grid. In summary, machine learning algorithms offer valuable
contributions to securing communication networks in smart grids. They can assist
in encryption, authentication, tampering detection, optimization of communication
protocols, anomaly detection, and intrusion detection. By leveraging the power
of machine learning, smart grids can establish robust and resilient communication
infrastructures that protect against eavesdropping, tampering, and replay attacks,
ensuring the secure and reliable transmission of data within the grid.
Threat intelligence and risk assessment: Threat intelligence and risk assessment
are crucial aspects of maintaining the security and resilience of smart grid systems.
Machine learning algorithms can collect, analyze, and interpret vast amounts of
security-related data to provide valuable insights into potential risks and vulnerabil-
ities. By mining and processing data from diverse sources, these algorithms enable
security teams to prioritize threats, assess their impact, and develop effective mitiga-
tion strategies. Once the data is collected, machine learning algorithms can analyze
and process it to identify patterns, correlations, and emerging trends. These algo-
rithms can learn from historical data and discover relationships between different
threat indicators, helping to identify potential attack vectors or vulnerabilities spe-
cifc to the smart grid environment. By detecting patterns associated with known
attacks or indicators of compromise, the algorithms can provide early warnings and
insights into potential risks.
Risk assessment is a critical component of security management. Machine learning
algorithms can contribute to risk assessment by evaluating the impact and likeli-
hood of various threats (Lamba et al., 2019). By considering factors such as the
severity of vulnerabilities, the possibility of exploitation, and the potential impact
(Mohammed et al., 2023) on the smart grid infrastructure, these algorithms can help
prioritize risks and guide the allocation of resources for mitigation efforts. They can
also assist in developing risk-scoring models. These models assign scores or ratings
to different threats based on their potential impact and likelihood. By integrating
various data sources and applying advanced analytics techniques, the algorithms
can quantitatively assess risks, enabling security teams to prioritize and address the
most critical threats frst.
Furthermore, the algorithms can aid in the development of effective mitigation
strategies. These algorithms can learn from past experiences and recommend suit-
able countermeasures for specifc threats by analyzing historical data on success-
ful and unsuccessful mitigation efforts. They can identify patterns of successful
responses and provide insights into the most effective mitigation techniques for
different types of attacks. The continuous learning and adaptation capabilities of
machine learning algorithms make them well-suited for the dynamic nature of the
Acta Wasaensia 35
threat landscape. These algorithms can adapt to changing attack techniques, evolv-
ing vulnerabilities, and emerging risks. By continuously analyzing new data and
learning from ongoing incidents, the algorithms can update their models and pro-
vide timely and relevant threat intelligence to security teams.
3.2.1 Decision tree 
Decision trees are a popular and widely used machine learning algorithm that can
be applied to a variety of problem domains, including cyber-security in the smart
grid. Decision trees are supervised machine learning algorithms that learn from
labeled training data to make predictions or decisions (Charbuty & Abdulazeez,
2021). They construct a tree-like model where each internal node represents a fea-
ture or attribute, each branch represents a decision rule based on that attribute, and
each leaf/terminal node represents a class label or an outcome (Pandey & Kumar
Sharma, 2013).
Figure 8: Pictorial representation of a decision tree structure.
The construction of a decision tree involves recursively partitioning the training
data based on different features, selecting the best feature that provides the most
signifcant information gain or impurity reduction at each step. The goal is to create
partitions that are as pure as possible, meaning they contain mostly instances of a
single class. This process results in a tree structure that can be used for classifcation
or regression tasks. It works by recursively partitioning the training data based on
different features, selecting the best feature that provides the most signifcant infor-
mation gain or impurity reduction at each step. The overview of how the decision
tree algorithm works are:
36 Acta Wasaensia 
Data preparation: The decision tree algorithm begins by preparing the training data.
Each data instance contains features or attributes and a corresponding class label or
outcome. If the data contains categorical features, they may need to be encoded into
numerical values to be processed by the algorithm. Attribute Selection: The algo-
rithm selects the best attribute or feature that will serve as the root of the decision
tree. The attribute selection is based on a measure of impurity or information gain,
which evaluates how well an attribute can split the data and improve the classifca-
tion or regression performance. Common impurity measures include Gini impurity
and information gain (based on entropy).
Partitioning: Once the root attribute is selected, the data is partitioned into subsets
based on the attribute’s possible values. Each subset corresponds to a branch of
the decision tree emanating from the root node. The partitioning process divides
the data based on the attribute’s values, assigning each instance to the appropriate
branch.
Recursion: The algorithm recursively repeats attribute selection and partitioning for
each subset or branch created in the previous step. It selects the best attribute for
each subset based on the impurity or information gain criterion. This process con-
tinues until a stopping condition is met, such as reaching a predefned maximum
depth or having a minimum number of instances in a leaf node. Leaf node creation:
At each step, if a stopping condition is met, the algorithm creates a leaf node and
assigns a class label or outcome based on the majority class of the instances in that
subset. For regression tasks, the leaf node may contain a predicted numerical value
based on the average or median of the instances in that subset.
Tree pruning: After the decision tree is constructed, pruning techniques may be
applied to reduce overftting. Pruning involves removing branches or nodes that do
not contribute signifcantly to improving the overall performance on unseen data.
This helps the decision tree generalize better and avoid memorizing the training
examples. Prediction: Once the decision tree is constructed, it can be used to make
predictions on unseen data. For a given instance, the algorithm follows the decision
rules in the tree, traversing from the root to a leaf node based on the attribute values
of the instance. The predicted class label or outcome is the value associated with
the leaf node reached. Some key characteristics and advantages of decision trees
are:
1. Interpretability: Decision trees offer high interpretability, as humans can eas-
ily visualize and understand the resulting tree structure. Each decision rule
and branching point in the tree can be interpreted as a set of conditions or
criteria that lead to a particular outcome. This interpretability makes decision
trees valuable for gaining insights into the decision-making process.
2. Handling both categorical and numerical data: Decision trees can handle both
categorical and numerical features, making them versatile for various types
of data in the context of smart grids. The algorithm can handle features with
Acta Wasaensia 37
discrete categories or continuous values and automatically determines the best
splitting points during the tree construction process.
3. Feature importance and selection: Decision trees provide a measure of feature
importance by evaluating the contribution of each feature to the overall tree
structure. Features higher up in the tree and resulting in signifcant impurity
reduction or information gain are considered more important. This informa-
tion can be utilized for feature selection and identifying the most infuential
factors in cyber-security analysis.
4. Non-linear relationships: Decision trees are capable of capturing non-linear
relationships between features and class labels. By recursively partitioning
the data, decision trees can effectively model complex decision boundaries
that separate different classes.
5. Robustness to outliers and noise: Decision trees are relatively robust to out-
liers and noisy data points. The tree construction process is guided by impu-
rity reduction, making it less sensitive to individual data points that deviate
from the majority. However, preprocessing and cleaning the data is still im-
portant to minimize the impact of outliers and noise.
6. Scalability and effciency: Decision tree algorithms, such as the popular clas-
sifcation and regression trees (CART) algorithm, have effcient implementa-
tion techniques and can handle large datasets with reasonable computational
resources. The prediction time complexity of decision trees is logarithmic
concerning the number of instances in the tree, making them suitable for real-
time applications in smart grids.
While decision trees offer several advantages, they also have some limitations:
Overftting: Decision trees are prone to overftting, particularly when the tree be-
comes too complex and captures noise or specifc characteristics of the training
data. Overftting occurs when the tree memorizes the training examples instead of
generalizing well to unseen data. Techniques like pruning, maximum depth setting,
or ensemble methods like random forests can help alleviate overftting. Lack of
robustness to small changes: Decision trees are sensitive to small changes in the
training data, which can lead to different tree structures. This sensitivity can make
decision trees less stable compared to other algorithms. Ensemble methods like
random forests can address this issue by combining multiple decision trees to make
more robust predictions. Bias towards features with more levels: Decision trees
favor features with more levels or categories during tree construction. This bias can
lead to uneven splits and potentially overlook important features with fewer levels.
3.2.1.1 Classifcation and regression trees
CARTs are powerful machine learning algorithms that can be used for both clas-
sifcation and regression tasks. CART builds a binary tree structure by recursively
38 Acta Wasaensia 
partitioning the input space based on feature values, resulting in a series of decision
rules (Taghavinejad et al., 2020). These decision rules are used to classify or pre-
dict the target variable for new data instances. As a result, because it enables feature
selection, it has emerged as the perfect tool for data analysis. Regression trees are
frequently used when the goal variable has a numerical form, and the mean of the
responses in a given region will serve as the terminal node’s mean value. The mean
value will be used to predict any new or unusual data or observations. When the
targeted variable is categorical, classifcation techniques are used, and the response
mode throughout that region will represent the value found at the terminal node.
Thus, any new data or observation within that category will have its prediction
made based on the modal value. CART algorithms have several key characteristics
and benefts:
Tree structure: CART algorithms create a tree-like structure where each internal
node represents a feature test or decision rule, and each leaf node represents a class
label (in classifcation) or a predicted value (in regression). The tree structure pro-
vides an intuitive representation of the decision-making process.
Recursive partitioning: CART employs a top-down recursive partitioning approach.
It begins with the entire dataset and recursively splits it into subsets based on the
values of different features. The splitting process continues until a stopping crite-
rion is met, such as reaching a maximum tree depth, achieving a minimum number
of samples per leaf, or when further splitting does not improve the model’s perfor-
mance signifcantly.
Feature selection: CART determines the optimal feature and splitting point at each
internal node based on criteria that maximize the separation between classes or
minimize the variance in the target variable. Gini impurity and information gain
are commonly used criteria for classifcation, while mean squared error and mean
absolute error are used for regression.
Handling categorical and numeric features: CART algorithms can handle both
categorical and numeric features. The algorithm partitions the data based on equal-
ity or inequality with specifc categories for categorical features. The algorithm
selects a threshold for numeric features to split the data into two groups based on
whether the feature value is greater than or equal to the threshold.
Model interpretability: CART models are highly interpretable. The decision rules
represented by the tree structure provide clear explanations for the classifcation or
regression outcomes. It is easy to understand how each feature contributes to the
fnal prediction and trace the decision path through the tree.
Handling missing values and outliers: CART algorithms can assign them to the
most probable class or use surrogate splits. They are also robust to outliers since
the splitting criteria are based on relative measures, rather than absolute values.
Ensemble methods: CART can be combined with ensemble methods, such as ran-
dom forests or gradient boosting, to improve predictive performance. These ensem-
ble methods create multiple CART models and combine their predictions to achieve
Acta Wasaensia 39
better generalization and reduce overftting.
Zooming on to some various applications of the CART algorithms:
Classifcation: CART is widely used for classifcation tasks, such as email spam
detection, fraud detection, disease diagnosis, and sentiment analysis. It can handle
both binary and multi-class classifcation problems.
Regression: CART can also be applied to regression problems, such as predicting
house prices, demand forecasting, or energy consumption prediction. It builds a re-
gression tree to estimate the continuous target variable based on the input features.
Feature selection: CART can be used to select relevant features by measuring their
importance in the tree construction process. Features that contribute signifcantly to
the tree structure are important and can be used for subsequent analysis.
Anomaly detection: CART can be utilized for anomaly detection by constructing a
classifcation tree and identifying instances that deviate signifcantly from the nor-
mal class. In conclusion, CARTs are versatile machine learning algorithms that can
handle both classifcation and regression tasks. They provide interpretable models,
handle missing values and outliers, and can be combined with ensemble methods.
CART algorithms have a wide range of applications in various domains, making
them valuable tools in data analysis and decision-making processes.
3.2.1.2 Entropy
Entropy is a concept from information theory that measures the impurity or disorder
in a data set. In the context of machine learning, entropy is commonly used as a
measure of impurity or disorder in a dataset. In the context of decision tree algo-
rithms like CART, entropy is used as a criterion to determine the quality of splits
during the tree-building process. In classifcation tasks, entropy is used to quantify
the uncertainty or randomness associated with the distribution of class labels in a
dataset. The entropy of a node represents the impurity of that node. A node with
low entropy means the class labels are relatively pure, while a node with high en-
tropy indicates a more mixed distribution of class labels. The entropy of node N is
calculated using the following formula (Safavian and Landgrebe, 1991):
c 
E(N) = − å Pi × log2(Pi) (1)
i=1
where pi is the proportion of instances belonging to class i in node N and c denotes
the number of classes (Kurniabudi et al., 2020). The summation goes over all the
classes in the dataset. The entropy value ranges from 0 to log2(C) of the number of
classes in the dataset. A value of 0 indicates a node with pure class labels (all in-
stances belong to the same class), while a value of 1 indicates a node with an equal
distribution of class labels (maximum impurity). If you have a binary classifcation
40 Acta Wasaensia 
problem (two classes), then the maximum entropy value is 1. However, if there are
more than two classes, the maximum entropy value can be higher. In general, the
maximum entropy value for a dataset with C classes is log2(C).
In the context of CART, entropy is used to evaluate the quality of a split. When
deciding which feature to use for splitting a node, CART considers the reduction in
entropy achieved by the split. The idea is to select the feature and threshold result-
ing in the largest entropy decrease among the child nodes. The split that minimizes
entropy is considered the most informative and provides the greatest separation of
classes. The reduction in entropy, often referred to as information gain, is calcu-
lated by subtracting the weighted average of the child nodes entropies from the
parent node’s entropy. The information gain quantifes the amount of information
gained by splitting the data based on a particular feature. CART uses information
gain (or Gini impurity, another node impurity measure) as the splitting criterion to
construct decision trees. The algorithm recursively applies this criterion to select
the best splits and build a tree until a stopping criterion is met. By using entropy as
a measure of impurity and information gain as the splitting criterion, CART aims
to create decision trees that effectively separate classes and make accurate predic-
tions. The nodes with lower entropy represent more homogeneous subsets of data,
allowing for better classifcation accuracy and interpretability of the resulting tree
model.
3.2.1.3 Information gain
Information gain is a measure used in decision tree algorithms, such as CART, to
evaluate the usefulness of a feature for splitting a node and building a decision
tree. It quantifes the amount of information gained by splitting the data based on
a particular feature. The amount of information that a node provides is measured
by information gain. It evaluates how successfully a feature categorizes the class,
where the maximum information-providing node is chosen. In other words, the
information gain determines the entropy’s decrease by quantifying the amount of
information gained after the data split. The information gain is calculated by com-
paring the entropy (or Gini impurity) of the parent node with the weighted average
of the entropies (or Gini impurities) of the resulting child nodes after the split. The
idea is to select the feature and threshold that maximize the information gain, which
indicates the most informative split. The information gain can be computed using
the following formula:
Gain = Eparent − å[weighted average of child entropies] (2)
where Eparent is the entropy of the parent node or the Gini impurity of the parent
Acta Wasaensia 41
node before the split. å represents the summation of all child nodes resulting from
the split. The weighted average of child entropies or child Gini impurities is the
average entropy or Gini impurity of each child node, weighted by the proportion of
instances in that child node.
A more signifcant information gain implies a better split since it shows that the split
successfully divides the classes and lessens uncertainty or impurity in the resulting
child nodes. A feature with a larger information gain is seen as more informative
and offers more class separation. After assessing the information gain for each
feature, the CART algorithm chooses the feature with the largest information gain as
the splitting criterion for the current node. Until a stopping requirement is satisfed,
such as reaching a maximum tree depth or a minimum number of instances per leaf,
this process is repeated iteratively for each child node. By maximizing information
gain, CART aims to create decision trees that effciently partition the data based
on the most informative features, resulting in nodes with lower entropy or Gini
impurity and improved classifcation accuracy. It is important to note that while
information gain is a commonly used criterion, other measures such as gain ratio
and Gini index can also be used in decision tree algorithms to assess the quality
of splits. The choice of criterion depends on the specifc algorithm and problem at
hand.
3.2.1.4 Gini index
The decision tree methods employed in CART, notably the Gini index, measure
diversity or impurity. Based on the distribution of class labels in that node, it cal-
culates the likelihood of incorrectly categorizing a randomly selected instance from
that node. The Gini index is used to assess a node’s impurity in terms of the distri-
bution of class labels in the context of classifcation tasks. A node with a low Gini
index is considered relatively pure, meaning that most of its instances are members
of one class. On the other hand, a node with a high Gini index suggests a more er-
ratic distribution of class labels, which denotes increased impurity. The Gini index
of a node N is calculated using the following formula:
c 
Gini(N) = − å Pi × (1− Pi) (3)
i=1
where pi is the proportion of instances belonging to class i in node N. The summa-
tion is performed over all unique class labels in node N. The Gini index ranges from
0 to 1, where a value of 0 represents a node with pure class labels (all instances be-
long to the same class), and a value of 1 represents a node with an equal distribution
of class labels (maximum impurity).
42 Acta Wasaensia 
In the context of CART, the Gini index is used as a criterion to evaluate the quality
of splits during the tree-building process. When deciding which feature to use for
splitting a node, CART considers the reduction in the Gini index achieved by the
split. The idea is to select the feature and threshold that result in the largest de-
crease in the Gini index among the child nodes. The split that minimizes the Gini
index is considered the most informative and provides the greatest separation of
classes. Similar to information gain, CART aims to create decision trees that effec-
tively separate classes by selecting splits that minimize the Gini index. The nodes
with lower Gini index represent more homogeneous subsets of data, allowing for
better classifcation accuracy and interpretability of the resulting tree model. It is
important to note that information gain and the Gini index are two commonly used
criteria for evaluating splits in decision tree algorithms. Both measures similarly as-
sess the quality of splits and select informative features for building decision trees.
The choice between information gain and the Gini index depends on the specifc
algorithm and problem context.
3.2.2 Support vector machine 
SVM is a powerful and widely used machine learning algorithm for both classi-
fcation and regression tasks. It effectively solves complex problems with high-
dimensional data and non-linear decision boundaries. It maps input data into a
high-dimensional space and attempts to fnd an optimal hyperplane that separates
different classes, such as normal and attack instances. SVMs are effective in han-
dling complex and non-linear data, making them suitable for detecting sophisticated
cyber-attacks. They are based on the concept of fnding an optimal hyperplane that
maximally separates the data into different classes or predicts continuous values (A.
Halimaa, K. Sundarakantham 2019).
Considering a data set of n points as (x1;y1); :::;(xn;yn) where y1 represents the
class to which xi belongs by taking on a value of -1 or 1. Each xi is a p-dimensional
real vector. The goal is to determine the ”maximum-margin hyperplane” that di-
vides the group of points for which yi = −1 and the group of points for which yi = 1.
This is to ensure the distance between the hyperplane and the nearest xi point from
both groups is maximized. In other words, we want to fnd a hyperplane that acts
as a decision boundary between the two classes, ensuring the largest possible gap
or margin between the points of each class. This margin should be such that the
hyperplane is equidistant from the nearest points in both classes. The task at hand
involves a binary classifcation problem where the goal is to fnd an optimal hyper-
plane that achieves the best separation between the two classes while maximizing
the distance to the closest data points. Any hyperplane can be expressed as the sat-
isfed set of x as
WT × x − b = 0 (4)
Acta Wasaensia 43
where W is the not necessarily normalization vector. The decision boundaries (pos-
itive and negative) parallel to the main hyperplanes are expressed respectively as
follows:
w0+ wT × x+ve = 1 (5)
w0+ wT × x−ve = −1 (6)
subtracting equations (4) and (5) from each other yields
wT × (x+ve − x−ve) = −1 (7)
normalizing equation (6) by the length of the vector w defned ass 
m 
||w|| = å w2j (8)
j=1
Thus, arriving at
wT (x+ve − x−ve) 2
= (9)||w|| ||w|| 
The margin to get maximized, can then be deduced from the left side of the pre-
ceding equation as the distance between the positive and negative hyperplane. A
constraint must be added on either side to keep the margin at its maximum to pre-
vent the data points from falling within the margin. This can be written as
w0+ wT × xi ≥ 1; if yi = 1 (10)
w0+ wT × xi ≥ 1; if yi = −1 (11)
Equations (10) and (11) aim to distinguish between negative and positive samples
by utilizing hyperplanes. For the negative samples, the equation ensures that all of
them fall on one side of the negative hyperplane. This means that when the negative
samples are projected onto the hyperplane, they should be classifed as negative by
the model. Similarly, for the positive samples, the equation guarantees that they
should fall behind the positive hyperplane. When the positive samples are projected
44 Acta Wasaensia 
onto the positive hyperplane, the model should classify them as positive.
By formulating these conditions mathematically, the equations establish the nec-
essary criteria for separating of negative and positive samples in the feature space.
The model can then use these hyperplanes to make accurate predictions and classify
future instances as either negative or positive based on their position relative to the
hyperplanes. This can be written as
yi(w0+ wT × xi) ≥ 1; for all 1≤ i ≤ n (12)
To get the optimization problem,
minimize ||w||22w;w0
subject to yi(xi + wT x j) ≥ 1 ∀i ∈ 1; :::;n (13)
The hinge loss function is commonly used in SVM to handle cases where the data is
not linearly separable. The hinge loss function introduces a notion of a soft margin 
that allows for some misclassifcation while still aiming to maximize the margin
between the decision boundary and the data points. The hinge loss function is
defned as follows:
max(0;1− yi(w0+ wT × xi)) (14)
It measures the degree of misclassifcation or violation of the margin constraint. If
a data point is correctly classifed and falls on the correct side of the margin, the
hinge loss function will yield a value of zero, indicating that there is no penalty for
that particular sample. The optimization’s objective is to reduce.
" # 
n 
Tl ||w||2+ 1å max 
� 
0;1− yi(w0+ w × xi)) (15)n i=1
where l > 0 determines the trade-off between increasing the margin size and en-
suring that the xi falls on the correct side of the margin. This optimization problem
can be reduced to the following by dissecting the hinge loss:
Acta Wasaensia 45
n 
minimize ||w||22+C å z w;w0 i=1
subject to yi(xi + wT xi) ≥ 1 ∀i ∈ 1; :::;n (16)
We may then regulate the penalty for misclassifcation using the variable C. Large
values of C result in signifcant error penalties, whereas lesser values of C result in
less stringent penalties for misclassifcation errors.
Figure 9: Support vector machine.
Extensive overviews of SVMs are:
1. Basic concept: The fundamental idea behind SVM is to fnd a hyperplane that
best separates the data into distinct classes. In a binary classifcation scenario,
the hyperplane is a decision boundary that maximizes the margin, which is the
distance between the hyperplane and the closest data points from each class,
known as support vectors. SVM aims to fnd the hyperplane that achieves the
maximum margin while minimizing classifcation errors.
2. Linear SVM: Linear SVM deals with linearly separable data, where a straight
line or hyperplane can separate the data into different classes. The goal is to
fnd the hyperplane that maximizes the margin. This optimization problem
is formulated as a quadratic programming problem, where the objective is to
minimize the norm of the weight vector while satisfying certain constraints.
The support vectors, which lie closest to the decision boundary, are critical
for defning the hyperplane.
3. Non-linear SVM: In many real-world scenarios, data is not linearly separa-
ble, and a linear hyperplane cannot effectively classify the data. Non-linear
46 Acta Wasaensia 
SVM addresses this by transforming the original feature space into a higher-
dimensional space using the kernel trick. The kernel function computes the
inner products between pairs of transformed data points without explicitly
computing the transformation itself. This allows SVM to implicitly operate
in a higher-dimensional space, where the data may become linearly separa-
ble. Commonly used kernel functions include linear, polynomial, Gaussian
radial basis function, and sigmoid.
4. Margin and soft margin: The margin in SVM refers to the region between the
decision boundary and the support vectors. SVM aims to fnd the hyperplane
with the maximum margin, which is believed to generalize better on unseen
data. However, in practice, data may not be perfectly separable, or there
may be outliers. A soft margin approach is introduced to handle such cases,
allowing some misclassifcations and errors. The objective becomes a trade-
off between maximizing the margin and minimizing the classifcation errors
or violations of the margin.
5. C-parameter: The C-parameter in SVM controls the regularization or the bal-
ance between achieving a larger margin and allowing misclassifcations. A
smaller C-parameter results in a wider margin and tolerates more misclassif-
cations, leading to a more robust model but potentially with lower accuracy.
A larger C-parameter emphasizes classifcation accuracy and may lead to a
narrower margin, potentially resulting in overftting the training data.
6. Multi-class classifcation: SVM is naturally a binary classifer but can be
extended to handle multi-class classifcation problems. One approach is to
use the one-vs-rest (OvR) strategy, where multiple binary SVM classifers
are trained, each distinguishing one class from the rest. Another approach is
the one-vs-one (OvO) strategy, where separate binary classifers are trained
for each pair of classes. The fnal prediction is made by majority voting or by
considering pairwise classifcation results.
7. SVM regression: In addition to classifcation, SVM can be used for regression
tasks, where the goal is to predict continuous values. The objective is to fnd
a hyperplane that maximizes the margin while allowing a specifed margin of
tolerance for deviations or errors. The prediction is based on the distance of
the test instance from the hyperplane.
SVM offers several advantages, including:
• Effective in high-dimensional spaces: SVM can handle datasets with a large
number of features and is less susceptible to the ”curse of dimensionality” 
compared to other algorithms. It is able to fnd complex decision boundaries
and capture intricate relationships between features.
Acta Wasaensia 47
• Robust to outliers: SVM is less affected by outliers since it focuses on the
support vectors closest to the decision boundary. Outliers have a minimal
impact on the fnal hyperplane.
• Versatility through kernel functions: Using kernel functions allows SVM
to handle non-linearly separable data by implicitly mapping it to a higher-
dimensional space. This fexibility enables SVM to capture complex patterns
and relationships.
• Global solution: SVM aims to fnd the optimal hyperplane that maximizes
the margin. The solution is determined by the support vectors, which are a
subset of the training data. Thus, the solution is not dependent on the entire
dataset and is more likely to generalize well.
However, the SVM has some limitations, which include:
• Computational complexity: SVM can become computationally expensive, es-
pecially when dealing with large datasets or high-dimensional feature spaces.
SVM’s training time complexity is generally cubic concerning the number of
training instances. However, various optimization techniques, such as eff-
cient solvers and kernel approximations, can mitigate this issue.
• Sensitivity to parameter tuning: SVM performance heavily relies on appropri-
ate parameter selection, including the choice of the kernel function and reg-
ularization parameter C. These parameters are often selected through cross-
validation or grid search, which can be time-consuming and require expertise.
• Interpretability: The resulting model may lack interpretability, while SVM
provides accurate predictions. The decision boundary is represented by a
complex hyperplane in a transformed feature space, making it challenging to
directly interpret the relationship between the original features and the pre-
dictions.
• Memory requirements: SVM models require storing the support vectors, which
can be memory-intensive, especially when dealing with large datasets. Addi-
tionally, in the case of non-linear kernels, the kernel matrix may need to be
stored, resulting in increased memory usage.
• Regularization control: The C-parameter in SVM allows users to control the
balance between the margin size and the number of misclassifcation, pro-
viding fexibility in managing the trade-off between model complexity and
training accuracy.
In conclusion, SVMs are versatile and powerful machine learning algorithms that
effectively handle classifcation and regression tasks. They are particularly benef-
cial in scenarios with high-dimensional data, non-linear relationships, and the need
48 Acta Wasaensia 
for robustness to outliers. However, the computational complexity and parameter
tuning challenges should be considered, and interpretability may be limited in some
cases. SVM remains a valuable tool in the machine learning toolkit, offering accu-
rate predictions and the ability to capture complex patterns in the data.
3.2.3 Random forests 
Random forests are like a bustling forest of decision trees, working together to
create a robust and accurate prediction system. Imagine stepping into this vibrant
ecosystem, where each tree has its unique character and role. As you enter the
forest, you notice the diversity and randomness surrounding you. This is the key
strength of random forests. Each decision tree is constructed using a random subset
of the training data, creating an element of unpredictability (Mellor et al., 2015).
It is as if each tree has its own story, having been trained on a different data slice.
Moving deeper into the forest, you witness the individual trees at work. They in-
dependently grow and branch out, forming their own rules and decisions. These
decision trees, with their branches representing different features and attribute val-
ues, capture the essence of the data they were trained on (Kulkarni & Sinha, 2012).
It is fascinating to see how each tree learns from its unique perspective, forming
its understanding of the patterns and relationships within the data. But random
forests are not just a collection of independent trees. They are a tightly-knit com-
munity, collaborating to make accurate predictions. The magic happens when the
forest comes together, combining the wisdom of all the trees. It is like a demo-
cratic voting system, where each decision tree gets a say in the fnal prediction. In
classifcation tasks, the class receiving the most votes becomes the predicted class.
This voting mechanism ensures a robust and reliable prediction, as it considers the
collective intelligence of the entire forest (Long et al., 2019).
Random forests are incredibly resilient to outliers and noisy data points. Outliers
may infuence a single decision tree, but their impact is diluted when considering
the majority vote of multiple trees. The forest as a whole is less swayed by these
outliers, providing a more balanced and accurate prediction. One striking feature
of random forests is their ability to handle high-dimensional data (Belgiu & Dragu,
2016). It is as if these trees have developed a knack for navigating through the
complexity of the forest. They effortlessly evaluate different features and select the
most informative ones at each split. This adaptability allows them to capture intri-
cate relationships, making them highly effective in scenarios where the data exhibits
non-linear patterns.
Moreover, random forests are renowned for their versatility. They can seamlessly
transition between classifcation and regression tasks. In regression, instead of pre-
dicting classes, the forest predicts numerical values based on the collective knowl-
edge of the trees. It is like the forest whispering its combined wisdom to estimate
Acta Wasaensia 49
continuous outcomes. Each tree is crucial in this lively and dynamic forest, but the
random forests’ collective power shines through. The forest thrives on the principle
that the whole is greater than the sum of its parts. It brings together diverse perspec-
tives, harnesses the strength of randomness, and leverages the voting mechanism to
create a robust and accurate prediction system. In conclusion, random forests are
an ensemble learning technique that combines multiple decision trees to improve
accuracy and robustness in IDS. Random forests create a set of decision trees using
different training data and feature subsets. The fnal classifcation is based on the
individual decision trees’ majority vote or average prediction. Random forests ef-
fectively handle in handling high-dimensional data and can detect various types of
cyber-attacks.
3.2.4 Deep learning 
Deep learning is a subfeld of machine learning that focuses on training artif-
cial neural networks (ANN) with multiple layers, known as deep neural networks
(DNN), to learn and extract hierarchical representations from data (Taji et al., 2018).
It has revolutionized various domains, including computer vision, natural language
processing, speech recognition, and many others, by achieving state-of-the-art per-
formance on complex tasks (Jamil et al., 2021).
Figure 10: Description of an artifcial neural network.
Deep learning heavily relies on ANNs, which are inspired by the structure and func-
tioning of biological brains. ANNs consist of interconnected nodes called neurons,
organized in layers. Each neuron receives input, performs a computation, and gen-
erates an output. Neural networks can have multiple layers, including an input layer,
 50 Acta Wasaensia 
one or more hidden layers, and an output layer (Buczak & Guven, 2016; Niculescu,
2003). Deep learning extends traditional neural networks by adding more hidden
layers, resulting in DNN. These deep architectures allow more complex and abstract
representations to be learned from the data (Sarker, 2022b).
Deep learning excels at representation learning, which involves automatically learn-
ing hierarchical representations of data. In DNNs, each layer learns progressively
more abstract features by building upon the representations learned in the previous
layers. The initial layers capture low-level features, such as edges or textures, while
subsequent layers combine these features to represent higher-level concepts. With
DNNs, the learning process becomes more effcient at capturing intricate patterns
and relationships, leading to improved performance on challenging tasks. As we
have inputs fowing through weighting and processing to the output layer by layer,
we have a feedforward neural network that can be mathematically represented as
given in (Deb et al., 2018) ! 
y = f 
N 
å xiw1 (17)
i=1
This hierarchical representation learning is a crucial factor behind the success of
deep learning in capturing complex patterns. They are trained using a technique
called backpropagation. During training, the network receives input data, makes
predictions, and compares them to the true labels. The error or loss between the
predicted and true labels is then backpropagated through the network, adjusting the
weights and biases of each neuron to minimize the error. This is illustrated in Figure
9 (M. Elmusrati, 2022).
Figure 11: Illustration of single-layer neural network
   
 
Acta Wasaensia 51
Considering that x1
k = 1, the w1 is the bias (offset) of the neuron.  
k k k k k k = f x 2+ ::: + x (18)yk 1w1+ x2w NwN 
The error between the desired output and the obtained output is computed as
ek = yd;k − yt (19)
The learning principle is based on adjusting the weight to reduce the cost function.
The cost function is an average error-related norm. To reduce the average error be-
tween the desired and actual outputs while taking into account all learning samples.
This can be written as
h 2i h 2i 
J(wk)+ E = ek = E (yd;k − yt ) (20)
Substituting equation (18) into equation (20) yields (M. Elmusrati, 2022),h  N  2i k kJ(wk)+ E = E yd;k − f å xi wi (21)
i=1
This iterative optimization process, often coupled with gradient descent algorithms,
fne-tunes the network’s parameters to improve its predictive capabilities. Activa-
tion functions are essential in neural networks as they introduce non-linearity into
the computations of each neuron. Non-linear activation functions, such as the recti-
fed linear unit (ReLU), sigmoid, or hyperbolic tangent, enable the network to model
complex relationships and capture non-linear patterns in the data. The choice of ac-
tivation function infuences the network’s ability to learn and generalize from the
data.
A ReLU is a commonly used activation function in deep learning and ANNs. It
is a simple but powerful mathematical function that introduces non-linearity to the
network, enabling it to learn and model complex relationships in the data. ReLu is
mathematically written as (Javid et al., 2022; Qiumei et al., 2019)(
1 if x > 0
f (x) = (22)
0 if x < 0
where x represents a neuron’s input.
52 Acta Wasaensia 
Figure 12: Description of a ReLU activation function.
The sigmoid activation function is a well-known nonlinear function in ANN. Be-
cause of its shape, which resembles an S curve, it is also known as the logistic
function.
Figure 13: Description of a sigmoid activation function.
The sigmoid function is excellent for binary classifcation applications because it
transfers every real-valued number to a value between 0 and 1. The sigmoid func-
tion is mathematically expressed as (Javid et al., 2022; Qiumei et al., 2019):
1
f (z) (23)
(1+ e−z) 
Acta Wasaensia 53
It takes an input value and transforms it using the exponential function, subse-
quently scaling the result to a range between 0 and 1. One of the main issues is
that the sigmoid function saturates at extreme values, where the output becomes
close to 0 or 1.
This saturation can cause gradients to vanish, making it diffcult for the network
to learn and converge. Although it has some limitations, such as saturation and
vanishing gradients, it still fnds its use in specifc scenarios where the output needs
to be bounded between 0 and 1.
The mathematical expression of the hyperbolic tangent function is given by (Qiumei
et al., 2019)
ex − e−x 
f (x) (24)
ex + e−x 
In this formula, e represents Euler’s number (2.71828), and x is the input to the
function. The hyperbolic tangent function takes the difference of two exponential
terms and divides them by their sum. The hyperbolic tangent function shares some
similarities with the sigmoid function,
Figure 14: Description of a hyperbolic tangent activation function.
but it has a steeper slope around the origin, which means it can produce stronger
and more decisive activations. Like the sigmoid function, the hyperbolic tangent
function also exhibits saturation at extreme values, where the output approaches -1
or 1, leading to vanishing gradients. The advantages of using the hyperbolic tangent
function include its non-linearity and its ability to capture complex relationships in
the data. It is commonly used as an activation function in RNNs and certain feed-
forward neural networks. The zero-centered property of hyperbolic tangent can be
54 Acta Wasaensia 
benefcial in certain cases where the data distribution is symmetric around zero.
However, similar to the sigmoid function, the hyperbolic tangent function may suf-
fer from the vanishing gradient problem, especially in DNNs. When gradients be-
come extremely small, the network’s ability to learn and make meaningful updates
to its weights diminishes.
Convolutional neural networks (CNNs): CNNs are a specialized type of DNN
widely used in computer vision tasks. They leverage the concept of convolution,
where flters are applied to local regions of the input image, allowing the network
to learn spatial hierarchies of features. CNNs have proven to be highly effective in
tasks such as image classifcation, object detection, and image segmentation. CNNs
consist of convolutional layers, pooling layers, and fully connected layers.
Figure 15: Convolutional neural network.
Convolutional layers learn local patterns and features from input data by apply-
ing flters or kernels. These flters capture spatially local patterns like edges and
textures. Pooling layers reduce spatial dimensions while preserving important in-
formation. This pooling can be done in a variety of ways, such as by taking the
mean, maximum, or a learned linear combination of the neurons in the block. Al-
ways taking the maximum of the block they are pooling. They down sample feature
maps using operations like max pooling. Fully connected layers map high-level
features to output classes, enabling predictions based on learned features. CNNs
are trained using backpropagation, adjusting parameters to minimize the error be-
tween predicted and ground truth labels. They learn hierarchical representations,
with lower layers capturing low-level features and higher layers learning complex
and abstract features.
Acta Wasaensia 55
CNNs beneft from parameter sharing, where flters are applied to different image
regions, reducing parameters and improving computational effciency. Pretrained
CNN models like ResNet, and Inception, trained on large datasets like ImageNet,
demonstrate excellent transfer learning capabilities. These models can be fne-tuned
on specifc datasets with limited labeled data. CNNs have transformed computer vi-
sion because they can learn hierarchical features and achieve high accuracy. They
are integral to state-of-the-art models and crucial in advancing computer vision ap-
plications.
Figure 16: Recurrent neural network.
RNNs: RNNs are another important class of DNNs designed to process sequential
data, such as time series or natural language. RNNs have connections that allow
information to fow in loops, enabling them to capture temporal dependencies and
contextual information. RNNs are a type of ANN designed to process sequen-
tial and temporal data. Unlike feedforward neural networks, which process inputs
independently, RNNs have connections that form directed cycles, allowing them
to maintain and propagate information across different time steps (Jozwiak et al.,
2020).
The recurrent structure enables RNNs to capture dependencies and patterns in se-
quences, making them suitable for natural language processing, speech recognition,
time series analysis, and more. The fundamental building block of an RNN is the
recurrent neuron, which maintains an internal state or memory that evolves, as il-
lustrated in Figure 16. This memory allows the network to retain information about
past inputs and use it to infuence future predictions. At each time step, the recurrent
neuron combines an input with the previous state to produce an output and update
its internal memory. This process is repeated for every time step, creating a recur-
rent feedback loop. Mathematically, an RNN can be defned as
56 Acta Wasaensia 
  
ht = sh Whxt +Uhht −1+ bh (25)
where ht represents the hidden state or memory at time step t, xt is the input at time
step t, s is an activation function, W is the input weight matrix, U is the recurrent
weight matrix, and bh is the bias term. The input weight matrix W determines how
the current input xt is combined with the previously hidden state ht−1, while the
recurrent weight matrixU controls the infuence of the previous state on the current
state.
Various activation functions can be used in RNNs, such as the tanh or ReLU. These
functions introduce non-linearities to the network, enabling it to learn complex re-
lationships and capture non-linear dependencies in the data. One of the challenges
with traditional RNNs is the vanishing gradient problem, which occurs when the
gradients diminish exponentially as they propagate backward in time during the
training process. This issue limits the network’s ability to capture long-term de-
pendencies and can hinder learning. To mitigate this problem, several variants of
RNNs have been proposed, such as the long short-term memory (LSTM) (Dong et
al., 2018) and gated recurrent unit (GRU) architectures. These variants incorporate
gating mechanisms that allow the network to selectively update and forget informa-
tion, improving the model’s ability to handle long-term dependencies. LSTMs, for
example, have memory cells that can retain information over long periods. They
use gates, such as the input gate, forget gate, and output gate, to control the fow
of information into and out of the memory cells. This gating mechanism helps ad-
dress the vanishing gradient problem and allows LSTMs to capture and remember
important information across multiple time steps.
Training RNNs typically involves using techniques like backpropagation through
time (BPTT) or variants of it, which adapt the network’s parameters based on the
error between predicted and target outputs. Optimization aims to minimize the loss
function by adjusting the weights and biases using optimization algorithms such
as stochastic gradient descent (SGD) or its variants. RNNs can be used in vari-
ous applications. They can model and generate text in natural language processing,
perform sentiment analysis, and machine translation. In speech recognition, RNNs
can convert spoken language into written text. They are also effective in time se-
ries analysis, forecasting, and anomaly detection. In recent years, RNNs have been
further combined with other neural network architectures to create more powerful
models. For example, the combination of CNNs and RNNs, known as convolu-
tional recurrent neural networks (CRNNs), can process sequential data with both
local and temporal dependencies, as in image captioning tasks.
Acta Wasaensia 57
In conclusion, the LSTM is a variant of RNNs designed to address the limitations of
traditional RNNs, particularly in capturing and retaining long-term dependencies in
sequential data. LSTMs have become one of the most widely used architectures for
tasks involving sequential data, such as natural language processing, speech recog-
nition, and time series analysis. The key innovation of LSTM is the introduction
of memory cells, which allow the network to selectively retain and forget informa-
tion over extended periods. This memory mechanism enables LSTMs to effectively
capture long-range dependencies and overcome the vanishing gradient problem typ-
ically associated with traditional RNNs. At the core of an LSTM cell are three main
components: the forget gate, the input gate, and the output gate. These gates regu-
late the fow of information into and out of the memory cell, allowing the LSTM to
control the information it retains and utilizes.
Figure 17: Long short-term memory.
Forget gate: The forget gate decides what information in the memory cell should
be discarded. Similar to the input gate, it takes the current input and the previous
hidden state as input and applies a sigmoid activation function. The output of the
forget gate, ranging from 0 to 1, determines the amount of information to be dis-
carded from the memory cell. A value of 0 signifes complete forgetting, while a
value of 1 means the information is retained entirely.
  
ft = s Wf [ht−1;Xt ]+ b f (26)
Input gate: The input gate determines how much new information should be stored
in the memory cell. It takes the current input and the previous hidden state as in-
put and applies a sigmoid activation function to generate a value between 0 and 1
58 Acta Wasaensia 
(Rajput et al., 2021). This value represents the amount of information stored in the
memory cell, with 0 indicating no new information and 1 indicating full informa-
tion retention.
  
ii = s Wi[ht−1;Xt ]+ bi (27)
The new memory network is a neural network trained to generate a ”new memory
update vector” by integrating the prior hidden state with the current input data. This
vector contains information from the incoming data and considers the context pro-
vided by the preceding concealed state. The new memory update vector defnes how
much each long-term memory component (cell state) should be updated depending
on the most recent data.
  
Cˆt = tanh Wc[ht−1;Xt ]+ bc (28)
Output gate: The output gate determines how much of the memory cell’s content
should be exposed as the output of the LSTM cell. It takes the current input and the
previous hidden state as input and applies a sigmoid activation function.
  
Ot = s Wo[ht−1;Xt ]+ bo (29)
The updated cell state represents the updated long-term memory of the network.
The internal state is updated with this rule.
Ci = it ×Cˆ+ ft :Ct − 1 (30)
A tanh activation function is also applied to the current input and the previous hid-
den state. The output gate then multiplies the tanh output with the sigmoid output,
producing the fnal output of the LSTM cell.
ht = Ot × tanh(Ct ) (31)
Acta Wasaensia 59
These three gates work together to teach LSTMs to retain and use information over
long sequences selectively. The LSTM may learn which information is relevant to
retain, forget, or output by adjusting the weights and biases of these gates during
training. Deep LSTM networks can be formed by stacking LSTMs, allowing for
modeling progressively more intricate dependencies in sequential data. The output
of one LSTM layer is used as the input for the next layer, allowing the network
to record hierarchical data representations. Training LSTMs involves techniques
such as BPTT, which propagates gradients through the entire sequence, updating
the LSTM’s parameters to minimize the error between predicted and target outputs.
Optimization algorithms like SGD or its variants are commonly used for this pur-
pose. LSTM is a form of RNN designed to address the vanishing gradient problem
and capture long-term dependencies in sequential data. Memory cells and gating
mechanisms are used by LSTMs to selectively keep, forget, and output informa-
tion. They have been widely utilized in a variety of disciplines, including natural
language processing, speech recognition, and time series analysis, and have signif-
cantly increased neural networks capabilities in dealing with sequential data.
GRU is another variant of RNNs that, like LSTM, addresses the challenges of cap-
turing long-term dependencies in sequential data. GRU was introduced as a simpli-
fed alternative to LSTM, offering comparable performance and a simpler architec-
ture with fewer parameters. GRU retains the concept of memory cells and gating
mechanisms but combines the input and forget gates of LSTM into a single update
gate. This simplifcation allows for more effcient computation and more straight-
forward implementation. The main components of a GRU cell are the update gate
and the reset gate. These gates control the fow of information within the cell, en-
abling it to selectively update or reset the internal state.
• Update gate: The update gate in a GRU cell determines how much of the
past internal state should be combined with the current input. It inputs the
previous hidden state and the current input and applies a sigmoid activation
function. The output of the update gate, ranging from 0 to 1, represents the
portion of the previous hidden state to be retained. A value of 1 means fully
retaining the previous state, while a value of 0 means discarding it entirely.
• Reset gate: The reset gate regulates how much of the previous hidden state
should be forgotten or reset. It inputs the previous hidden state and the current
input and applies a sigmoid activation function. The output of the reset gate
determines the extent to which the previous hidden state should be reset. A
value of 1 signifes no resetting, while a value of 0 indicates a complete reset.
The combination of the update and reset gates allows the GRU cell to update its
internal state adaptively, selectively remembering or forgetting information based
on the current input and previous hidden state. It also introduces a new candidate
hidden state that is computed based on the reset gate and the current input. This
60 Acta Wasaensia 
candidate hidden state contains new information that can be potentially added to
the updated hidden state.
Mathematically, the computations in a GRU cell can be summarized as follows (Di-
aba et al., 2022):
 h i  
x˜ (t−1) tGU = s wU ;x + bU (32) h i  
x˜ (t−1) tGR = s wU ;x + bR (33)
Representing the update gate and the reset gate are the GU and GR respectively. The
range of the GRU gate ∈ {0;1}. Where wU stands for the weight function of the
update gate and wR stands for the weight function of the reset gate. The bias vec-
tors for the update and reset gates are denoted by bU and bR respectively. x˜ (t−1) is
the current unit input which is obtained from the previous unit output and xt is the
inputs of training data. Thus, the recurrent unit’s candidate activation function can
be written as
 h i  
x˜ (t) GR × x˜ (t−1) t = tanh wV ;x + bV (34)
where x˜ (t) is the candidate activation function,wV is the weight functions of the ac-
tivation function, xt is the inputs of training data, and bV is the bias vector. The
output of a single GRU unit is given correspondingly as
   
tx (t) = (1− GR) × x˜ (t−1) + GR ∗ x (35)
GRU networks can be stacked to form deep GRU architectures, allowing for the
modeling of complex sequential dependencies. The output of one GRU layer serves
as the input to the next layer, facilitating hierarchical representations. Training
GRU networks typically involves techniques such as BPTT, where gradients are
propagated through the entire sequence to update the network’s parameters.
Optimization algorithms like SGD or its variants, such as Adam or RMSprop, are
commonly employed in training GRU networks. GRUs have shown strong perfor-
mance in various sequential data applications, such as natural language processing,
speech recognition, and time series analysis. They offer a balance between mod-
eling capability and computational effciency, making them a popular choice when
capturing long-term dependencies is crucial.
Acta Wasaensia 61
Figure 18: Gated recurrent unit.
In summary, GRU is a variant of RNNs that addresses the challenge of capturing
long-term dependencies in sequential data. GRU simplifes the architecture by com-
bining the input and forget gates of LSTM into a single update gate and introducing
a reset gate (de Souza et al., 2022). The update gate controls how much of the
previous hidden state should be retained, while the reset gate determines the extent
to which the previous hidden state should be reset. This allows the GRU cell to
adaptively update its internal state, selectively remembering or forgetting informa-
tion based on the current input and previous hidden state. GRUs have demonstrated
strong performance in various applications, including natural language processing,
speech recognition, and time series analysis. They offer a balance between compu-
tational effciency and modeling capability, making them a popular choice for tasks
involving sequential data.
3.2.5 Linear regression 
Linear regression is a fundamental statistical technique that aims to model the re-
lationship between a dependent variable and one or more independent variables. It
belongs to the broader feld of regression analysis and is widely used in various
domains, including economics, fnance, social sciences, and machine learning. At
its core, linear regression assumes a linear relationship between the dependent and
independent variables. This implies that a straight line in a scatter plot can represent
the relationship.
The goal of linear regression is to estimate the parameters of this line to best ft
the observed data. The dependent variable, also known as the target variable or
response variable, is the variable being predicted or explained.
62 Acta Wasaensia 
Figure 19: The concept of regression.
The independent variables, also called predictors or explanatory variables, are the
variables used to predict or explain the dependent variable.
Given a dataset {yi;xi; :::;xni=1} of n statistical units, a linear regression model pre-
supposes that the relationship between the dependent variable y and the vector of
regressors x is linear. The relationship can be modeled as
y = ax + b (36)
Where x is the input, y is the output, a represents the slope coeffcient (the coeff-
cient associated with the independent variable x), and b represents the y-intercept
(the constant term). There are two primary types of linear regression: simple lin-
ear regression and multiple linear regression. Simple linear regression involves a
single independent variable, whereas multiple linear regression incorporates two or
more independent variables. The distinction lies in the number of predictors used
to explain the dependent variable. The multiple linear regression formula extends
the concept of one independent to include multiple independent variables as
y = b + a1x1+ a2x2+ ::: + anxn (37)
The linear regression model seeks to fnd the best-ft line that minimizes the dif-
Acta Wasaensia 63
ference between the predicted values and the actual values of the dependent vari-
able. This is termed an error with an error vector [e1;e2; :::en]. This difference is
often measured using a metric called the residual, which represents the vertical dis-
tance between the observed data points and the line. To estimate the parameters of
the line, the model uses a method called the least squares estimation (LSE). This
method calculates the sum of the squared residuals and adjusts the line parameters
to minimize this sum. The model aims to fnd the line that best fts the data by
minimizing the sum of squared residuals.
n n  2  2n1 1 12å å åx = yi − axi − byi − yˆ1 = (38)e1 = n n ni=1 i=1 i=1
Since the goal is to minimize the mean square error values, the Karush-Kuhn-Tucker
(KKT) conditions for the minimum point are applied as
¶x 
= 0
¶a 
and
¶x 
= 0
¶b 
Thus, solving the frst derivative yields (M. Elmusrati, 2022)
n  ¶x n n n2 å x21+ bå xi =åå xi yi − axi − b = 0⇒ a xiyi (39)= −¶a n i=1 i=1 i=1 i=1
n  ¶x 2å yi − axi − b = 0⇒ a åxi + nb = åyi (40)= −¶b n i=1
Equations (38) and Equation (39) in the matrix notation can be written as
     
ån 2 ån a ån i=1 xi i=1 xi i=1 xiyi = (41)
ån n b ån i=1 xi i=1 yi 
64 Acta Wasaensia 
   −1 2 åna åni=1 x åin =1 xi i=1 xiyi = i (42)b åin =1 xi n åin =1 yi 
It is important to note that linear regression assumes several underlying assumptions
to provide reliable results. These assumptions include linearity, independence of
errors, homoscedasticity/ homogeneity of variances (constant variance of residuals),
and normality of errors. Violations of these assumptions can affect the validity and
accuracy of the regression model.
3.2.5.1 Logistic regression
Logistic regression is a statistical model used to predict the probability of an event
occurring or not occurring in binary classifcation tasks. It is an extensively utilized
and interpretable machine learning and statistics algorithm (Phom et al., 2010). Lo-
gistic regression is used in numerous disciplines, including machine learning, med-
ical felds, and the social sciences. Logistic regression is an extension of linear
regression that uses a logistic function, also known as the sigmoid function as given
in Equation (23), to account for binary outcomes. The dependent variable or target
variable in logistic regression is binary or dichotomous, meaning it can take on one
of two values. Typically, these values are represented as 0 and 1, or occasionally as
negative and positive classes. The independent variables, also known as features or
predictors, can be categorical or continuous. The logistic regression model implies a
linear relationship between independent variables and the target variable’s log odds.
1
P(y = 1|x) (43)
(1+ e−x) 
1
P(y = 0|x) (44)
(1+ e−x) 
Where P(y = 1|x) represents the probability of the dependent variable y being one
given the predictors x and P(y = 0|x) represents the probability of the dependent
variable y being zero given the predictors x. Referring to Equation (23) and assum-
ing that z is the linear combination (b + a1x1+ a2x2+ ::: + anxn) of the predictors
and their corresponding coeffcients, substituting z into Equation (43) yields
1
P(y = 1|x) (45)
(1+ eb +a1x1+a2x2+:::+anxn ) 
Acta Wasaensia 65
The log-odds, also known as the logit function, is the natural logarithm of the prob-
ability that the objective variable is equal to 1. The logit (log odds) equals the
probability that an event will occur divided by the probability that it will not occur.
P(y = 1)
logit(P) = In = b + a1x1+ a2x2+ ::: + anxn (46)1− P(y = 1) 
exponentiating both sides result
P(y = 1) b +a1x1+a2x2+:::+anxn= e (47)
1− P(y = 1) 
The exponential function of the linear regression expression is identical to the prob-
ability that the dependent variable equals a case, given some linear combination x 
of the predictors. This shows how the logit acts as a link function between the prob-
ability and the equation for the linear regression. The logit gives an appropriate
criterion on which to execute linear regression, and the logit is easily transformed
back into the odds due to its range between zero and infnity. Thus, we write the
odds of the dependent variable matching a case as
b +a1x1+a2x2+:::+anxnodds = e (48)
For a continuous independent variable, the odds ratio can be de determined as
P(x+1) b +a1(x+1)odds(x + 1) 1−p(x+1) eOR = = (49)
odds(x) p(x) eb +a1x 
1−p(x) 
subjecting Equation (48) to the exponential rule eeh
m 
, Equation (48) can be rewritten
as
odds(x + 1) a1x1= e (50)
odds(x) 
Logistic regression offers several benefts for binary classifcation problems. It is a
straightforward and interpretable model that can shed light on the relationship be-
66 Acta Wasaensia 
tween predictors and the dependent variable. The coeffcients can be interpreted as
the quantity of change in the target variable’s log odds for a one-unit change in the
corresponding predictor, with all other predictors held constant. This interpretabil-
ity is particularly advantageous when communicating the results to non-technical
stakeholders.
In addition, logistic regression can handle both continuous and categorical predic-
tors. Typically, categorical predictors are embedded as dummy variables, with each
category represented as a binary variable. This enables the logistic regression model
to incorporate categorical data. Once the model has been trained, it can be used to
predict new data by computing the predicted probabilities using the logistic func-
tion. Using a certain threshold, probabilities can be converted into class designa-
tions. Generally, a threshold of 0.5 is employed, with predicted probabilities above
0.5 being classifed as positive and those below 0.5 as negative. However, optimiz-
ing the threshold value is important. It depends on the different risks of each class
as well as the imbalance between available training data in each class.
Nevertheless, logistic regression has limitations as well. It presupposes a linear re-
lationship between the predictors and the log odds, which may not be appropriate in
all situations. The model may not effectively represent the complex data patterns if
the relationship is nonlinear. In such instances, alternative nonlinear models, such
as decision trees or neural networks, may be more appropriate. Logistic regres-
sion assumes the independence of observations, meaning that each data point is
presumed independent of the others. It may not yield accurate results if there is a
correlation or dependence between the observations, as in time series or clustered
data. Specialized models such as generalized estimating equations (GEE) or mixed-
effects models may be preferable in such situations.
Furthermore, logistic regression assumes that the predictors have a linear relation-
ship with the log odds. If higher-order interactions or nonlinear relationships exist
between the predictors and the dependent variable, logistic regression may not ef-
fectively capture them. Techniques such as polynomial terms, interaction terms,
and basic functions can be used to introduce non-linearity into a model, but coef-
fcient interpretation becomes more complicated. Noting that logistic regression is
susceptible to overftting if the model is too complex relative to the available data is
essential. This is overftting when the model learns noise or random fuctuations in
the training data instead of the underlying patterns. Regularization techniques such
as L1 regularization (Lasso) and L2 regularization (Ridge) can be used to reduce
overftting by incorporating a penalty term into the loss function, which discour-
ages excessive coeffcient values.
In practice, logistic regression is utilized in numerous felds, such as healthcare,
fnance, marketing, engineering, and the social sciences. It can be used to predict
the likelihood of disease occurrence based on patient characteristics, determine the
Acta Wasaensia 67
probability of default in credit risk analysis, predict customer attrition in marketing,
predict the likelihood of cyber-attack, and assess the infuence of socio-economic
factors on educational outcomes.
3.3 Machine learning applications in cybersecurity 
Machine learning has become a potent tool in many felds, and cybersecurity is one
of them. It has signifcantly advanced this discipline. Traditional rule-based ap-
proaches are frequently insuffcient to identify and stop sophisticated attacks due
to the complexity and number of cyber threats that are on the rise. By enabling
automated analysis of massive volumes of data, the detection of abnormalities, and
the discovery of patterns that may suggest a hostile activity, machine learning tech-
niques have the potential to strengthen cybersecurity defenses. Several important
uses of machine learning in cybersecurity are listed below.
IDS: IDS is crucial for identifying and responding to network attacks. Traditional
rule-based IDS often struggle to keep pace with sophisticated attack techniques.
Machine learning algorithms, particularly supervised learning methods such as SVM,
random forests, and neural networks, have successfully detected various attacks.
These algorithms can learn from labeled datasets, capturing the characteristics of
both known and emerging threats. Machine learning-based IDS can identify anoma-
lies and fag potentially malicious activities by continuously analyzing network traf-
fc patterns, enabling prompt response and mitigation.
Malware detection: Malware continues to be a signifcant threat, constantly evolv-
ing to evade detection. Machine learning techniques offer effective solutions for
malware detection. Feature-based approaches extract relevant attributes from fles
or code snippets and use them as input to machine learning algorithms. These al-
gorithms can learn to distinguish between malicious and benign fles, leveraging
labeled datasets of known malware samples. Moreover, behavior-based approaches
analyze the actions of programs or processes, identifying malicious behavior pat-
terns indicative of malware. Machine learning models like hidden Markov models
(HMM) or RNNs can learn from these patterns and detect new and unknown mal-
ware variants.
Anomaly detection: Anomaly detection is crucial for identifying suspicious ac-
tivities or deviations from normal behavior. Machine learning provides effective
tools for anomaly detection in cybersecurity. Unsupervised learning algorithms,
including clustering, autoencoders, and one-class SVM, can learn the normal pat-
terns in data and fag instances that deviate signifcantly from those patterns. This
approach lets organizations detect novel attacks or unusual activities that tradi-
tional rule-based systems may miss (Ashok et al., 2017). Anomaly detection is
applied to network traffc analysis, system logs, user behavior monitoring, and
other cybersecurity-relevant data sources. Spam and phishing detection: Machine
learning plays a vital role in combating the ever-growing problem of spam emails
68 Acta Wasaensia 
and phishing attacks. Supervised learning algorithms, such as naive Bayes, deci-
sion trees, and logistic regression, are commonly used to classify emails as spam
or legitimate. These algorithms learn from labeled datasets containing examples
of spam and non-spam emails, capturing patterns and features indicative of spam.
Furthermore, natural language processing techniques are employed to analyze email
content and identify phishing attempts by detecting malicious URLs, suspicious at-
tachments, or social engineering techniques. Machine learning models can adapt
and evolve to tackle evolving spam and phishing tactics.
User and entity behavior analytics (UEBA): UEBA leverages machine learning to
detect anomalies in user activities and identify potential insider threats or compro-
mised accounts. Machine learning algorithms can establish baselines of normal
behavior for each user by monitoring user behavior patterns. Deviations from these
baselines, such as unusual access patterns, abnormal data transfers, or unauthorized
privilege escalations, can trigger alerts. UEBA systems employ various machine
learning techniques, including clustering, sequence mining, and anomaly detection,
to identify patterns indicative of malicious or suspicious activities. Threat intelli-
gence and predictive analytics: Machine learning enables organizations to leverage
vast amounts of threat intelligence data to predict and proactively defend against cy-
ber threats. Machine learning models can identify patterns and trends by analyzing
historical data, anticipating potential vulnerabilities, and predicting future attacks.
Predictive analytics can help organizations prioritize security measures, allocate re-
sources effectively, and develop proactive defense strategies.
Network traffc analysis: Machine learning algorithms are employed for network
traffc analysis, enabling organizations to detect and respond to network-based at-
tacks and anomalies. Machine learning algorithms can extract meaningful features
and patterns by analyzing network packets, fow data, or log fles. These algo-
rithms can learn to identify malicious activities, such as DoS attacks, port scanning,
or command and control communications. Machine learning-based network traffc
analysis enhances the ability to detect and mitigate threats in real time, providing a
proactive defense against network-based attacks.
Vulnerability assessment and penetration testing: Machine learning techniques can
be applied to vulnerability assessment and penetration testing processes. By analyz-
ing historical vulnerability data and exploit information, the models can prioritize
vulnerabilities based on their potential impact and likelihood of exploitation. These
models can help security teams focus their efforts on critical vulnerabilities and
perform targeted penetration testing to identify potential weaknesses in the system.
Machine learning also aids in automating vulnerability scanning and reducing false
positives, improving the effciency and effectiveness of security assessments.
Security log analysis: Machine learning algorithms excel at analyzing large vol-
umes of security log data generated by various systems, including frewalls, IDS,
Acta Wasaensia 69
and authentication logs. Organizations can identify log data patterns, correlations,
and anomalies by applying machine learning techniques, such as clustering or clas-
sifcation algorithms. This analysis helps detect security incidents, identify indi-
cators of compromise, and understand the context of security events (Radoglou-
Grammatikis & Sarigiannidis, 2019). Machine learning also enables the develop-
ment of intelligent security information and event management (SIEM) systems
that can automate log analysis and provide real-time alerts for potential security
breaches.
Fraud detection: In the fnance and e-commerce sectors, machine learning is exten-
sively used for fraud detection. The algorithms can identify fraudulent activities,
such as credit card fraud, identity theft, and account takeover, by analyzing trans-
actional data, user behavior, and historical patterns. Supervised learning models,
anomaly detection techniques, and ensemble methods are commonly employed to
develop fraud detection systems. These systems continuously learn from new data,
adapting to evolving fraud techniques and minimizing false positives.
Threat hunting and incident response: Machine learning plays a signifcant role
in threat hunting and incident response. By analyzing diverse data sources, such
as network logs, endpoint data, or threat intelligence feeds, machine learning al-
gorithms can identify potential threats and indicators of compromise. These algo-
rithms can automate the correlation and analysis of multiple data points, enabling
security teams to proactively hunt for threats and respond effectively to security in-
cidents. Machine learning-powered incident response systems can accelerate inci-
dent triage, reduce response times, and provide actionable insights for remediation.
It is important to note that while machine learning brings signifcant advancements
to cybersecurity, it is not a silver bullet solution. Domain expertise, human analy-
sis, and a layered defense approach are crucial in conjunction with machine learning
techniques to ensure comprehensive and effective cybersecurity. Moreover, the eth-
ical implications, interpretability, and potential vulnerabilities of machine learning
models in the context of adversarial attacks must be carefully considered and ad-
dressed in the cybersecurity landscape.
3.4 Risk analysis of machine learning applications 
Risk analysis of machine learning applications is crucial to ensuring the responsible
and effective deployment of machine learning models in various domains. While
machine learning brings tremendous opportunities, it also introduces potential risks
and challenges that need to be carefully evaluated and managed. The following are
some crucial considerations for machine learning application risk analysis.
Data quality and bias: One of the primary risks in machine learning is related to
the quality and bias present in the training data. Machine learning models heavily
rely on data for training, and the quality of the data used can signifcantly impact
their performance. Inaccurate, incomplete, or biased data can lead to biased predic-
70 Acta Wasaensia 
tions or erroneous outcomes. It’s important to assess the quality of the training data
to mitigate potential risks. Biased or incomplete datasets can lead to biased models,
perpetuating unfair or discriminatory outcomes. It is essential to assess the training
data’s representativeness, diversity, and accuracy to ensure fair and unbiased pre-
dictions. Rigorous data preprocessing, data cleaning, and careful feature selection
can help mitigate this risk.
Overftting and generalization: Overftting occurs when a machine learning model
performs well on the training data but fails to generalize to new, unseen data. Over-
ft models can result in poor performance and erroneous predictions in real-world
scenarios. It is critical to assess and manage the risk of overftting by utilizing ap-
propriate model validation techniques, such as cross-validation and regularization
methods. Regularization techniques like L1 or L2 regularization can help control
the complexity of the model and reduce overftting.
Security and privacy: Machine learning applications often deal with sensitive data,
including personal information, fnancial records, or proprietary business data. Se-
curity breaches or data leakage risks must be carefully evaluated and addressed.
Robust security measures, data encryption, access controls, and privacy-preserving
techniques like differential privacy should be implemented to protect sensitive data
from unauthorized access or misuse. Interpretability and explainability: Ma-
chine learning models, particularly complex ones like DNNs, can be opaque and
challenging to interpret. A lack of interpretability can hinder understanding model
decisions, making identifying potential biases, errors, or undesirable behaviors dif-
fcult. Risk analysis should consider the need for model interpretability and explore
techniques like feature importance analysis, model-agnostic methods, or rule ex-
traction approaches to enhance the transparency and explainability of the model.
Adversarial attacks: Machine learning models are susceptible to adversarial at-
tacks, where malicious actors intentionally manipulate or deceive the model’s input
to produce incorrect or malicious outputs. Adversarial attacks can have severe con-
sequences, especially in critical domains such as healthcare, fnance, or autonomous
systems. Risk analysis should encompass evaluating potential vulnerabilities, un-
derstanding attack vectors, and exploring robust defense mechanisms like adversar-
ial training, input sanitization, or anomaly detection to mitigate the risk of adver-
sarial attacks.
Ethical and legal considerations: Machine learning applications raise ethical and
legal concerns, including issues related to bias, fairness, transparency, and account-
ability. Evaluating the potential ethical implications of deploying machine learning
models and ensuring compliance with relevant regulations, such as data protec-
tion laws (e.g., GDPR), anti-discrimination laws, or industry-specifc guidelines, is
crucial. Risk analysis should include assessing the machine learning application’s
social impact, unintended consequences, and potential legal ramifcations.
Acta Wasaensia 71
System robustness and resilience: Machine learning models are typically devel-
oped and trained in controlled environments. However, real-world scenarios can
introduce uncertainties, adversarial conditions, or concept drift, challenging the per-
formance and robustness of the models. Risk analysis should consider the potential
risks associated with model failures, system resilience, and the ability to handle
novel or unforeseen situations. Techniques like ensemble models, anomaly detec-
tion, or continuous model monitoring can enhance system robustness and resilience.
Governance and human oversight: Machine learning applications should be gov-
erned by robust policies, guidelines, and human oversight. Risk analysis should
encompass evaluating the organizational structure, accountability frameworks, and
decision-making processes surrounding the use of machine learning models. Clearly
defned roles and responsibilities, human-in-the-loop approaches, and mechanisms
for monitoring and auditing the performance and outcomes of machine learning
models are essential to ensure responsible and accountable use. Scalability and
performance: As machine learning applications often deal with large datasets and
computationally intensive algorithms, scalability and performance risks should be
considered. Risk analysis should evaluate the scalability of the infrastructure and
the ability to handle increased data volumes or model complexity. Additionally, the
computational resources required for training and inference should be carefully as-
sessed to ensure effcient and cost-effective operations.
Change management and model updates: Machine learning models are not static
entities and may require updates or retraining over time. Risk analysis should con-
sider the challenges associated with change management, version control, and the
impact of model updates on existing systems and processes. Proper validation, test-
ing, and deployment strategies should be in place to manage the risks associated
with model updates and ensure a smooth transition. Vendor or third-party risks: In
cases where machine learning applications rely on external vendors or third-party
services, additional risks may arise. Assessing the reputation, reliability, and secu-
rity measures of the vendors or service providers is crucial. Contracts and agree-
ments should include provisions for data protection, intellectual property rights,
service-level agreements, and mechanisms for addressing potential breaches or dis-
putes.
Compliance and regulatory risks: Machine learning applications in regulated
industries like healthcare or fnance are subject to compliance requirements and
industry-specifc regulations. Risk analysis should consider the potential risks as-
sociated with non-compliance, such as legal penalties, reputational damage, or loss
of trust. Ensuring compliance with relevant regulations, maintaining proper docu-
mentation, and conducting regular audits are vital to mitigate compliance risks.
In summary, risk analysis of machine learning applications involves a comprehen-
72 Acta Wasaensia 
sive assessment of various factors, including data quality and bias, overftting, se-
curity and privacy, interpretability, adversarial attacks, ethical and legal considera-
tions, system robustness, governance and human oversight, scalability, change man-
agement, vendor risks, and compliance. By proactively identifying and managing
these risks, organizations can foster responsible and effective deployment of ma-
chine learning models, ensuring the desired outcomes while minimizing potential
pitfalls. Considering a simplifed model for assessing the risk of bias in a machine
learning model due to imbalanced training data.
Assumptions:
• We have a binary classifcation problem, where the machine learning model
predicts whether an applicant will be approved or rejected based on their
credit history.
• In the feld of cybersecurity; a machine learning model is utilized to predict
whether a class can be categorized as attack or bening specifc measurements.
• In medicine; considering various measurements, should the physician deter-
mine whether the patient’s symptoms indicate illness A or B?
• Based on some measurements, should the seismic system monitoring associ-
ation give an alarm or not?
• The training dataset consists of historical credit data with two classes: ap-
proved and rejected 
The mathematical model of the class imbalance ratio (CIR) can be defned as the
ratio of the number of instances in the minority class (rejected) to the number of
instances in the majority class (approved)
Instances in the minority class
CIR = (51)
Instances in the majority
To assess the risk of bias in the machine learning model due to data imbalance, we
can calculate a metric called the bias risk score (BRS). The BRS is a metric that
represents the potential bias introduced by the imbalanced data. It quantifes the
level of risk that the model’s predictions might be biased due to the class imbal-
ance. Mathematically written as
CIR
BRS = (52)
(1+CIR) 
Acta Wasaensia 73
When the CIR is close to 0, the BRS approaches 0. This means that the risk of
bias due to data imbalance is relatively low. As the CIR becomes larger, the BRS
also increases. This indicates that the potential risk of bias due to data imbalance
is higher. The formula suggests that the BRS becomes more signifcant as the class
imbalance becomes more extreme. It’s a simple mathematical transformation that
takes into account the relationship between the class imbalance ratio and the po-
tential bias risk. The BRS ranges from 0 to 1, a BRS of 0 basically suggests that
the way the classes are distributed won’t likely introduce bias in the predictions,
regardless of whether the classes are balanced or not. A BRS of 1 indicates a signif-
icant risk of bias due to data imbalance. In simpler terms, it means that the way the
classes are unevenly distributed could strongly affect the fairness of the predictions.
The higher the BRS goes toward 1, the more likely it is that the model’s predictions
might be unfairly infuenced by the imbalanced class distribution. By assessing the
calculated BRS, we can establish risk categories to evaluate the level of bias risk
more effectively:
• BRS ≤ 0:2 is categorized as low BRS, indicating that the data imbalance is
minimal, and the risk of bias is low.
• BRS ≥ 0:2≤ 0:5 means there is some degree of data imbalance, indicating a
moderate risk of bias.
• BRS ≥ 0:5 indicating that the data imbalance is signifcant, suggesting a high
risk of bias in the machine learning model.
This mathematical model quantitatively assesses the bias risk associated with im-
balanced training data. By calculating the BRS and categorizing the risk level,
informed decisions regarding data collection, preprocessing techniques, or algo-
rithmic adjustments to mitigate the potential bias and ensure fair and accurate pre-
dictions can be made. However, when the assumption becomes
• We have an N multi-class classifcation problem, where the machine learning
model is trained to predict the class or category of x input data based on M 
features. The goal is to assign each instance or sample to one and only one
class out of a set of N possible classes. The probability of a feature belonging
to a specifc class i is denoted as (M. Elmusrati, 2022)
P = (Ci|x1;x2; :::xM);∀i = 1; :::;N (53)
The probability value assigned to a feature offers valuable insights into its corre-
lation with a particular class i out of the N available classes. By analyzing and
comparing these probabilities across different classes, we can assess the relevance
and importance of the feature for each class. This analysis allows us to understand
74 Acta Wasaensia 
the degree to which the feature contributes to the classifcation decision and helps us
make informed judgments about its signifcance in relation to each class. Given the
probability that a feature belongs to class i, by applying the Bayesian theorem and
with the assumption that the feature is a discrete value, then (M. Elmusrati, 2022):
P(x1;x2; :::;xM|Ci)P(Ci)P = (Ci|x1;x2; :::xM) ;∀i = 1; :::;N (54)P(x1;x2; :::;xm) 
where P = (Ci|x1;x2; :::xM) is the posterior probability, P(x1;x2; :::;xM|Ci) is the
likelihood, P(Ci) denotes class priori probability, and represents evidence. Com-
puting Equation (54) is often diffcult due to the unknown interactions between
features. To tackle this issue, an assumption of attribute independence is crucial.
By assuming independence among the attributes, we simplify the calculations and
make the problem more tractable. This assumption allows us to treat each feature
separately and focus on their individual contributions to the classifcation process,
rather than trying to model complex interdependencies between the features. Equa-
tion (54) can thus, be manipulated as (M. Elmusrati, 2022)
P(x1;x2; :::;xM|Ci)P(Ci)P = (Ci|x1;x2; :::xM) ;∀i = 1; :::;N (55)P(x1);P(x2); :::;P(xm) 
3.4.1 Algorithmic risks in machine learning 
Algorithmic risks in machine learning refer to the potential challenges, limitations,
and undesirable outcomes that can arise from the design, implementation, and usage
of machine learning algorithms. These risks stem from various factors, including
algorithmic choices, data characteristics, model assumptions, and the interaction
between the algorithm and its environment. Here are some critical algorithmic risks
in machine learning:
• Bias and discrimination: Machine learning algorithms can exhibit bias and
discrimination if the training data refects existing group biases or if the algo-
rithm introduces bias during the learning process. Biased algorithms can lead
to unfair or discriminatory outcomes, particularly in sensitive domains such as
hiring, lending, or criminal justice. Careful attention should be given to iden-
tifying and mitigating bias, ensuring fairness and ethical decision-making.
• Overftting and underftting: Overftting occurs when a machine learning
model learns the training data too well, resulting in poor generalization to
new, unseen data. Conversely, underftting happens when the model fails to
Acta Wasaensia 75
capture the underlying patterns in the data. Both overftting and underftting
can lead to suboptimal performance and inaccurate predictions. Proper model
validation techniques, regularization methods, and dataset partitioning strate-
gies are crucial to mitigate these risks.
• Lack of interpretability: Many advanced machine learning algorithms, such
as DNNs, are inherently complex and lack interpretability. Interpreting how
and why a model arrived at a particular decision or prediction can be chal-
lenging. A lack of interpretability can hinder transparency, trust, and identify-
ing potential biases or errors. Techniques like model-agnostic interpretability
methods, rule extraction, or explainable AI approaches can help address this
risk.
• Adversarial attacks: Machine learning models can be vulnerable to adver-
sarial attacks, where malicious actors intentionally manipulate or deceive the
model’s input to produce incorrect or malicious outputs. Adversarial attacks
can have severe consequences, particularly in security-sensitive domains or
safety-critical applications. Adversarial training, robust optimization, and in-
put sanitization techniques are commonly employed to enhance model re-
silience against such attacks.
• Adversarial attacks: Machine learning models can be vulnerable to adver-
sarial attacks, where malicious actors intentionally manipulate or deceive the
model’s input to produce incorrect or malicious outputs. Adversarial attacks
can have severe consequences, particularly in security-sensitive domains or
safety-critical applications. Adversarial training, robust optimization, and in-
put sanitization techniques are commonly employed to enhance model re-
silience against such attacks.
• Model transparency and explainability: In some applications, such as health-
care or fnance, it is crucial to provide explanations or justifcations for the
decisions made by machine learning models. Lack of transparency or ex-
plainability can hinder user acceptance, regulatory compliance, and identify-
ing and rectifying potential errors or biases. Developing interpretable models
or post-hoc explanation techniques can help mitigate this risk.
• Concept drift and model degradation: Machine learning models assume that
the underlying data distribution remains stationary. However, real-world data
can exhibit concept drift, where the statistical properties of the data change
over time. Concept drift can lead to model degradation and deteriorating
performance. Continuous monitoring, model retraining, and adaptation tech-
niques are necessary to address this risk and maintain model effectiveness.
Addressing these algorithmic risks requires a combination of careful algorithm se-
lection, appropriate data preprocessing, model validation techniques, robustness
measures, and ongoing monitoring and maintenance. Understanding and manag-
ing these risks is essential to ensure responsible and effective deployment.
76 Acta Wasaensia 
3.4.2 Algorithmic risks management 
1. Risk identifcation: Identify and document the specifc algorithmic risks rel-
evant to the machine learning application. This involves understanding the
characteristics of the algorithm, potential biases, vulnerabilities to attacks,
interpretability challenges, scalability issues, and other risks associated with
the chosen algorithm.
2. Risk assessment: Quantify and evaluate the severity and likelihood of each
identifed risk. Assess the potential impact on different stakeholders, such as
end-users, decision-makers, or affected individuals. Prioritize risks based on
their signifcance, potential consequences, and the application context.
3. Risk mitigation strategies: Develop and implement risk mitigation strategies
to address the identifed algorithmic risks. These strategies may include:
• Data quality and preprocessing: Ensure data quality by performing thor-
ough data cleaning, outlier detection, and handling missing values. Ap-
ply appropriate preprocessing techniques to address data biases, imbal-
ances, or noise.
• Bias and fairness mitigation: Apply fairness-aware techniques to detect
and mitigate biases in the training data and algorithmic decisions. Ex-
plore approaches such as bias-correction methods, fairness-aware learn-
ing, or demographic parity to ensure equitable outcomes.
• Security and robustness: Implement security measures to protect against
adversarial attacks. Use techniques like adversarial training, robust op-
timization, or anomaly detection to enhance model resilience against
manipulation or exploitation.
• Explainability and interpretability: Employ techniques for model in-
terpretability, such as feature importance analysis, rule extraction, or
model-agnostic methods, to enhance transparency and explainability of
algorithmic decisions.
• Continuous monitoring and maintenance: Establish mechanisms for mon-
itoring the model’s performance, data drift, and potential risks. Regu-
larly update and retrain the model to adapt to changing conditions and
ensure continued effectiveness.
4. Validation and testing: Thoroughly validate and test the machine learning
model to assess its performance, robustness, and adherence to desired spec-
ifcations. Use appropriate validation techniques, such as cross-validation or
holdout sets, and conduct sensitivity analyses to evaluate the model’s behav-
ior under different conditions.
5. Ethical considerations: Integrate ethical considerations into the algorithmic
risk management process. Ensure compliance with applicable regulations and
Acta Wasaensia 77
guidelines related to data protection, privacy, fairness, and transparency. Es-
tablish principles for responsible AI development and use, including ongoing
monitoring and assessment of the societal impact of the algorithm.
6. Documentation and accountability: Maintain comprehensive documentation
of the algorithmic risk management process, including risk assessments, mit-
igation strategies, validation results, and ongoing monitoring activities. As-
sign clear roles and responsibilities to individuals or teams responsible for
managing algorithmic risks. Foster a culture of accountability and trans-
parency within the organization.
7. Stakeholder engagement and communication: Engage stakeholders, includ-
ing end-users, domain experts, legal and compliance teams, and ethicists, in
the algorithmic risk management process. Communicate the machine learn-
ing model’s risks, mitigation strategies, and limitations effectively to ensure
understanding, trust, and informed decision-making.
Algorithmic risk management is an ongoing process that requires continuous mon-
itoring, assessment, and adaptation as new risks emerge or the operating environ-
ment evolves. Organizations can effectively manage algorithmic risks and promote
responsible and trustworthy AI systems by integrating risk management practices
into the machine learning development lifecycle.
78 Acta Wasaensia 
4 METHODOLOGY OF MACHINE LEARNING IN CY-
BER SECURITY OF SMART GRIDS 
4.1 Introduction 
This chapter focuses on the methodology of machine learning in the cybersecurity
of smart grids. Essential points covered include data collection, data cleaning and
preprocessing, feature selection techniques, feature extraction, performance evalu-
ation of machine learning models, evaluation metrics, experimental setup, model
selection, model training, hyperparameter tuning, and model evaluation. The chap-
ter aims to provide an overview of the essential steps in applying machine learning
techniques to enhance the security of smart grids against cyber threats.
4.2 Data preprocessing and feature selection 
4.2.1 Data collection 
Comprehensive datasets from diverse sources in the feld of cybersecurity were uti-
lized for conducting experiments in smart grid infrastructure. The datasets aimed
to capture various aspects of the smart grid’s operations and included historical net-
work traffc records, sensor readings, device logs, and cybersecurity events. The
dataset comprised historical network traffc records, including information on com-
munication protocols, packet-level details, and network fow data. Furthermore,
the dataset included cybersecurity events and incidents recorded in the smart grid
environment. These events encompassed a wide range of cybersecurity threats, in-
cluding unauthorized access attempts, malware infections, network intrusions, and
data breaches (Hussain et al., 2020). Using these datasets, the study of network be-
havior, spotting anomalies, and identifying potential cyber threats in the smart grid
infrastructure can be done.
The data providers are examined to assess the sources and verify the integrity of the
acquired information to ensure data quality and reliability. This included checking
the data gathering methods, data storage protocols, and security measures in place to
secure sensitive data. Various operational conditions, such as different periods, ge-
ographical locations, and system confgurations, were considered to capture a wide
range of cybersecurity events and activities. Overall, the dataset authenticity, repre-
sentativeness, and comprehensiveness permitted their usage for conducting exper-
iments that closely simulated real-world cybersecurity scenarios in the smart grid
infrastructure. The employed datasets are Network Security Laboratory Knowl-
edge Discovery in Databases (NSL-KDD Cup 99), the cyber security dataset of the
Canadian Institute of Cybersecurity Intrusion Detection (CICIDS-2017), the power
system attack detection dataset developed by the Oak Ridge national laboratory of
Mississippi State University and the Washington University in St. Louis-Industrial
Acta Wasaensia 79
IoT-2018 and IoT 2021 (wustl-IIOT-2018, wustl-IIOT-2021 ) dataset for Industrial
Control System SCADA cybersecurity.
4.2.2 Data cleaning and preprocessing 
Before the analysis, the collected data underwent a series of preprocessing steps
(Ibrahim et al., 2020) to ensure its quality and compatibility with machine learning
algorithms. It included removing duplicate entries, handling missing values, nor-
malization, standardization, and addressing any inconsistencies or outliers in the
data.
Figure 20: Data cleaning process.
4.2.3 Feature extraction 
Feature extraction is crucial in data preprocessing, particularly in machine learning
tasks. It involves transforming the raw data into representative features that can ef-
fectively capture the relevant information for the given problem or task. The goal
is to derive a set of informative and discriminative features from the original data
(Spataru, 2013) that could help differentiate between normal and malicious activi-
ties in the smart grid. This process involves applying various techniques to extract
and represent the underlying patterns and structures in the data more meaningfully
and compactly. There are several methods commonly used for feature extraction:
1. Dimensionality reduction techniques: These techniques aim to reduce the di-
mensionality of the data by transforming it into a lower-dimensional space
80 Acta Wasaensia 
while preserving its essential characteristics. Examples include principal
component analysis (PCA) and linear discriminant analysis.
2. Statistical methods: These methods involve computing statistical measures
or descriptors from the data, such as mean, standard deviation, or histogram-
based features. These features can provide insights into the distribution and
variability of the data. They provide insights into the data distribution, cen-
tral tendency, and dispersion, enabling the models to identify anomalies and
abnormal behavior.
3. Transformations: Transforming the data using mathematical functions can
help highlight specifc patterns or relationships. For example, applying loga-
rithmic, exponential, or Fourier transformations can reveal certain properties
in the data.
In addition to these techniques, the domain-specifc methods. The smart grid do-
main possesses unique characteristics and operational considerations that require
specialized feature extraction approaches. Domain-specifc features encompassed
various aspects, including communication protocols, network topologies, operating
parameters, and device confgurations. Also, features derived from the physical pa-
rameters, such as voltage levels, current readings, and power consumption, can be
extracted from sensor data with the domain-specifc method. Feature extraction is
performed in a way that balances the need for capturing relevant information with
the goal of reducing dimensionality.
Dimensionality reduction techniques, such as restricted Boltzmann machines (RBM)
and the genetically seeded fora (GSF) feature selection algorithms, were employed
to mitigate the curse of dimensionality and remove redundant or irrelevant features.
The extracted features formed the input variables for the machine learning models,
enabling them to learn and make predictions based on the discriminative character-
istics of the data. The models’ ability to accurately classify and detect cybersecurity
threats in the smart grid infrastructure is enhanced by extracting relevant features.
The feature extraction process involved employing statistical analysis techniques,
signal processing methods, and domain-specifc knowledge to extract meaningful
and discriminative features from the raw data. The extracted features captured the
statistical properties, temporal dependencies, and domain-specifc indicators of cy-
bersecurity threats in the smart grid. These features formed the input for the ma-
chine learning models, enabling them to effectively learn and make predictions in
the context of securing the smart grid infrastructure.
From a different perspective, feature extraction can be seamlessly automated, elim-
inating the need for explicit data processing. This feat becomes achievable through
the utilization of highly intricate deep learning algorithms, replete with several hid-
den layers. In such instances, time-series data has the potential to be transmuted
Acta Wasaensia 81
into 2D images or even into video signals. This approach has garnered signifcant
acclaim for its ability to deliver heightened accuracy across a multitude of applica-
tions. Nonetheless, it’s worth noting that the effcacy of this approach is contingent
upon access to an extensive dataset, a resource that, unfortunately, is not easily
accessible.
4.2.4 Feature selection techniques 
With a potentially large number of features, feature selection techniques to identify
the most informative and discriminative features for the analysis are employed. It
involved evaluating each feature’s correlation, signifcance, and relevance to iden-
tify and retain the most informative and relevant features that contribute signif-
cantly to the predictive performance of our models. This process helps to streamline
the data, reducing dimensionality and mitigating the risk of overftting, and ensur-
ing that the features chosen capture the critical patterns and traits needed for reliable
model training and evaluation.
Pearson correlation coeffcient (PCC) analysis is carried out on the dataset to de-
termine if it includes correlated features. PCC, or Pearson’s r or simply correlation
coeffcient, is a statistical measure that quantifes the strength and direction of the
linear relationship between two variables. Mathematically written as
å(xi − x j)(yi − y j) p (56)
å (xi − x j)2(yi − y j)2
The symbol xi represents the content of the variable in the dataset, while x j refers
to that variable’s average value. Similarly, yi represents the values of the sample y j 
represents the average value of that variable. A correlation matrix is a square table
listing each variable in rows and columns.
The diagonal elements of the matrix are always 1, indicating a perfect correlation
of each variable with itself. The off-diagonal components represent the correla-
tion coeffcients between pairs of variables. Positive correlation coeffcients close
to 1 indicate that the variables tend to increase or decrease together, while nega-
tive correlation coeffcients close to -1 indicate they move in opposite directions.
Correlation matrices ranging from -1 to +1 are valuable for analyzing variable rela-
tionships, identifying data patterns, and discerning trends. Through thorough data
preprocessing and feature selection, efforts are made to optimize the quality and
representativeness of the input data, ultimately enhancing the overall performance
and interpretability of the machine learning models in the subsequent stages of the
study.
82 Acta Wasaensia 
Figure 21: The correlation matrix for the wustle-2021 dataset.
4.3 Performance evaluation of machine learning algorithms 
4.3.1 Evaluation metrics 
In order to comprehensively assess the performance of the machine learning algo-
rithms, a set of evaluation metrics are considered. The evaluation of the algorithm’s
categorization effectiveness is conducted using a confusion matrix, which serves as
a method to assess the performance of a classifer algorithm. The confusion ma-
trix is a fundamental scheme for evaluating the performance of machine learning
models.
It consists of four essential parameters: true positive (TP), true negative (TN), false
positive (FP), and false negative (FN). These parameters provide valuable insights
into the model’s predictions. TP represents the number of correct predictions of
the positive class, TN represents the number of correct predictions of the negative
class, FP indicates the number of incorrect predictions of the positive class, and FN
signifes the number of incorrect predictions of the negative class. These metrics
encompass a range of key performance indicators, including accuracy, precision,
recall, and F1 score (Tang et al., 2023).
Accuracy: This is a frequently used metric in classifcation tasks, defned as the
ratio of the number of correct predictions made by the model to the total number of
predictions (Tang et al., 2023). Mathematically, the accuracy can be written as
Acta Wasaensia 83
TP + TN 
Acuuracy = (57)
TP + TN + FP + FN 
Precision: This metric is used to measure the precision of a classifer, and it is de-
fned as the number of true positive predictions made by the model divided by the
total number of positive predictions.
TP 
Precision = (58)
TP + FP 
Figure 22: The confusion matrix.
Recall: This metric is used to measure the recall of a classifer, and it is defned as
the number of true positive predictions made by the model divided by the number
of positive cases in the dataset.
TP 
Recall = (59)
TP + FN 
F1 score: This is a metric that combines precision and recall, and it is defned as the
84 Acta Wasaensia 
harmonic mean of precision and recall.
2(Precision × recall)
F1− score = (60)
Precision + recall 
These measures were considered because they provided essential insights into the
model’s performance in several areas. They made it possible to evaluate the model’s
overall performance, ability to classify cases accurately, capture pertinent informa-
tion, and discriminatory power. With the help of a multi-dimensional evaluation
strategy like this, it is possible to accurately assess the performance of the machine
learning models and make decisions on their applicability and utility for the study.
To contextualize the presented confusion matrix within the realm of our cyber-
security concerns in the smart grid, let’s delve into a straightforward scenario.
Imagine a situation where a remote sensor sends a message signaling a critical fault
within the grid. The standard response to such a signal involves transmitting a pro-
tective message to trigger the opening of specifc circuit breakers. This action aims
to isolate the fault promptly, mitigating potential extensive and costly damages.
However, it’s imperative to recognize that the fault message from the sensor might
be genuine or fabricated due to cyber-security attacks. This intricate determination
falls under the purview of our proposed machine learning algorithm.
Consider that a positive output from the algorithm signifes the authenticity of the
message, while a negative output suggests that the message is a result of a security
breach. In this context, the evaluation pivots on comprehending the ramifcations
of both FP and FN. An FP arises when the machine learning algorithm inaccu-
rately identifes a genuine fault, leading to the unnecessary activation of actions
like switching the circuit breakers. The consequence of this scenario encompasses
unwarranted energy loss and fnancial repercussions. Moreover, potential power
outages and their associated adversities might ensue.
Conversely, an FN emerges when the algorithm erroneously classifes the message
as fake, thus forgoing the activation of necessary actions even though a legitimate
fault exists. This situation exposes the grid to substantial physical damages that
could escalate to catastrophic levels in certain instances. This prompts an explo-
ration into the critical theme of risk minimization. Risk minimization is one of the
interesting future research topics our research team in Vaasa will handle.
4.3.2 Experimental setup 
The dataset was meticulously divided into three sets to establish a robust experimen-
tal framework: training, validation, and testing. In some scenarios, the dataset is
Acta Wasaensia 85
usually divided into two subsets: training and testing datasets. Various approaches
exist for partitioning datasets based on the percentages allocated for different pur-
poses. Numerous studies have suggested confgurations such as 80% for training
and 20% for validation (Steimer, 2009), (Waqar et al., 2021), 70% for training and
30% for validation, and the emerging trend of 50% for training and 50% for vali-
dation. This partitioning yield effectively trains, fne-tunes, and assesses the perfor-
mance of the machine-learning models.
To ensure the reliability and generalizability of the results, cross-validation tech-
niques, which effectively mitigated potential biases and variance in the model per-
formance are employed. The training set was the foundation for model training
and hyperparameter tuning, optimizing the model’s learning process and confgura-
tion. The validation set played a crucial role in model selection, providing an inde-
pendent evaluation platform to identify the best-performing models based on their
performance on unseen data. By employing this meticulous experimental setup, it
became possible to minimize overftting, choose models with superior performance,
and attain a more comprehensive and trustworthy evaluation of the machine learn-
ing algorithms utilized.
4.3.3 Baseline models 
Traditional cybersecurity detection methods commonly employed in the smart grid
domain were implemented to establish a baseline for comparison. These methods
included rule-based systems, anomaly detection algorithms, and signature-based
approaches. The performance of the machine learning algorithms proposed in this
study were evaluated against these baselines algorithms.
4.3.4 Comparison of machine learning algorithms 
Experimentation was done with various machine learning algorithms, including de-
cision trees, random forests, support vector machines (SVM), neural networks, and
ensemble methods. For each algorithm, an evaluation is done with respect to their
performance using the selected evaluation metrics. Statistical tests are conducted to
identify signifcant differences between the models and determine the most effective
approach.
4.4 Implementation of machine learning models 
4.4.1 Model selection 
The machine learning models’ performance metrics and computational complexity
are considered. The objective is to identify the most promising models that bal-
ance accuracy and effciency, ensuring practical feasibility in real-world smart grid
environments. The selection criteria incorporated multiple factors. The model’s
86 Acta Wasaensia 
performance metrics, such as accuracy, precision, recall, and F1 score, are exam-
ined. Models that consistently demonstrated high performance across these metrics,
indicating their ability to detect and classify cybersecurity threats in the smart grid
effectively, were sought.
Smart grid systems often require real-time or near-real-time analysis, making eff-
ciency crucial. Therefore, the study assessed the computational requirements, in-
cluding the training and inference times, memory utilization, and the complexity
of the model architecture. Models that offered a good trade-off between perfor-
mance and computational effciency were prioritized. Consideration was also given
to the scalability and adaptability of the models. The smart grid environment is
dynamic, with new threats emerging and evolving. Models that could easily incor-
porate new data, adapt to changing conditions, and handle potential concept drift
were preferred. This ensured the models’long-term applicability and sustainability.
Furthermore, the feasibility of implementation was assessed. The selected models
needed to be implementable within the existing smart grid infrastructure, taking
into account the availability of resources, compatibility with the data collection and
processing systems, and any specifc constraints or limitations of the smart grid en-
vironment.
After careful evaluation and comparison, a set of machine learning models that ful-
flled the aforementioned selection criteria were chosen for further analysis. These
models formed the basis for the subsequent steps, including model training, hyper-
parameter tuning, and performance evaluation. A comprehensive evaluation of per-
formance metrics, computational complexity, interpretability, scalability, adaptabil-
ity, and feasibility drove the model selection process. By taking into account these
factors, the objective was to identify machine learning models that show promise in
enhancing the cybersecurity of the smart grid while also ensuring practical imple-
mentation and operational effciency.
4.4.2 Model training 
The selected machine learning models were trained using the training dataset care-
fully prepared during the data preprocessing stage. The training process involved
utilizing appropriate training algorithms and optimization techniques to optimize
the models’ parameters and improve their performance. For each model, widely-
used training algorithms were utilized, specifcally tailored to the specifc charac-
teristics of the selected machine-learning technique. The choice of the training al-
gorithm depended on the particular requirements and underlying principles of each
machine learning model. The models iteratively adjusted their parameters during
training to minimize the selected loss or error function. The optimization objec-
tive was to fnd the optimal parameter values that enable the models to accurately
classify and detect cybersecurity threats in the smart grid.
Acta Wasaensia 87
4.4.3 Hyperparameter tuning 
For the fne-tuning of the models, an extensive hyperparameter tuning process was
conducted. This involved systematically exploring different combinations of hy-
perparameters and evaluating their impact on the models’ performance. Grid and
random search techniques were utilized to fnd the optimal hyperparameter val-
ues. Grid search involves exhaustively trying all possible combinations of hyper-
parameters within predefned ranges. It systematically covers the hyperparameter
space and evaluates each combination’s performance. On the other hand, random
search randomly samples combinations from the hyperparameter space, which can
be more effcient in certain scenarios. The performance of the models was assessed
after each iteration of hyperparameter tuning. Hyperparameters are the confgu-
ration settings that control the behavior of the machine learning models, such as
learning rates, regularization parameters, kernel choices, or the number of hidden
layers. Hyperparameter tuning is essential to fne-tune the models and fnd the op-
timal combination of hyperparameter values. This process involved systematically
exploring different combinations of hyperparameters and evaluating their impact on
the models’ performance.
Cross-validation techniques were employed to assess the models’ performance across
various hyperparameter settings. Cross-validation partitions the training dataset into
multiple subsets, with one subset used as a training set while the rest are used for
testing or validation. Once the models have been trained, evaluating their perfor-
mance using validation and testing datasets is essential. The validation dataset is
used during training to assess the model’s performance on data it has not seen be-
fore. This helps monitor the model’s progress and decide when to stop training to
prevent overftting. The testing dataset, which is independent of the training and
validation data, is used as a fnal assessment of the model’s performance. It ob-
jectively evaluates how well the model generalizes to unseen data. This enabled
determining the hyperparameter settings that result in the best generalization and
estimating the models’ performance on hypothetical data. The model was trained
using the training dataset for each combination of hyperparameters, and its per-
formance was evaluated using the validation dataset. Performance metrics were
computed to assess the effectiveness of the models under different hyperparameter
settings.
88 Acta Wasaensia 
5 RESULTS AND DISCUSSION 
5.1 Introduction 
The preceding chapter presented a theoretical framework for detecting cyber-attacks
on the smart grid infrastructure. The framework established a comprehensive sys-
tem model through mathematical deductions. This section delves into the practical
aspects of identifying cyber-attacks embedded in various smart grids using machine
learning algorithms. The situations involved in this process and detailed simulations
to demonstrate the effectiveness of the algorithms are described. The analysis fnd-
ings, offering valuable insights into the performance and accuracy of the detection
system, are accompanied by discussions of the implications and interpretations de-
rived from the results.
5.2 Simulation results and discussion 
In this section, the performance of various machine learning algorithms is evaluated
through simulations. The simulations in this work were conducted using two soft-
ware platforms: MATLAB version R2021a and Jupyter Notebook. MATLAB is a
high-level programming language and interactive environment developed by Math-
Works. It provides a wide range of mathematical and scientifc functions for numer-
ical computation, data visualization, algorithm development, and application devel-
opment. MATLAB enables matrix computations, linear algebra, optimization, data
analysis, and signal processing. It offers powerful tools for visualizing data through
various types of plots and supports the development of algorithms for mathematical
models, machine learning, image processing, and more. MATLAB also allows for
the creation of standalone applications and user interfaces. Additionally, it supports
integration with other programming languages, enabling leveraging existing code
or using MATLAB functions within other environments.
On the other hand, Jupyter Notebook is another robust platform known for its
strength in handling machine learning algorithms. It is an open-source web ap-
plication that provides an interactive computing environment for creating and shar-
ing documents containing live code, equations, visualizations, and explanatory text.
With its interactive and fexible nature, Jupyter Notebook provides an ideal environ-
ment for developing and implementing machine learning models. By leveraging the
capabilities of both MATLAB and Jupyter Notebook, this study explores and com-
pares the performance of different machine learning algorithms. This combination
of software platforms ensures a comprehensive analysis and allows for a thorough
evaluation of the algorithms under consideration. The utilization of MATLAB and
Jupyter Notebook empowers this research to effectively investigate and assess the
performance of machine learning algorithms, ultimately contributing to a compre-
hensive understanding of their effectiveness in the context of this study.
Acta Wasaensia 89
5.3 Model evaluation 
After training and hyperparameter tuning, the selected models were evaluated. The
evaluation aimed to measure the models’ performance on unseen data and assess
their generalization capability. The evaluation metrics discussed earlier, such as
accuracy, precision, recall, and F1 score, were computed to gauge the models’ per-
formance (Berghout & Benbouzid, 2022). These metrics provided insights into
the models’ability to correctly classify cybersecurity threats in the smart grid and
their overall effectiveness in enhancing cybersecurity. The evaluation results were
used to compare the performance of different models, identify the best-performing
model, and provide intuitions about the models strengths and weaknesses.
In that sense is publication II, where the proposed algorithm, hybridized by the
convolutional neural network (CNN) and the gated recurrent unit (GRU) perfor-
mance, is compared to other deep learning methods, such as CNN, GRU, and long
short-term memory (LSTM) to detect cyber-attacks in smart grids. Using the hy-
perparameters depicted in Table 1 and employing the CICIDS2017 dataset for ex-
perimentation, the compared algorithms for intrusion detection are evaluated using
the standard evaluation metrics.
Number Parametric Quantity
1 Input layer 78
2 Hidden layer 55
3 Activation function ReLU
4 Iteration limit 1000
5 Cost function Cross entropy
6 Batch size 128
Table 1: The confguration of the hyperparameters.
Figure 23 illustrates the overall performance comparison of the considered algo-
rithms as presented in publication II. These fndings underscore the effectiveness
and potential of the proposed algorithm in accurately detecting and classifying the
target. The achieved accuracy of 99.7% demonstrates a signifcant improvement
over the existing techniques, highlighting the superiority of the proposed algorithm.
By achieving such high accuracy, the proposed algorithm showcases its ability
to handle the complexities of the task effectively. This result holds considerable
promise for practical applications where accurate detection is paramount.
90 Acta Wasaensia 
Figure 23: Overall performance comparison of the considered algorithms.
5.4 Sensitivity analysis 
Sensitivity analysis examines how modifcations to a model’s input parameters im-
pact the model’s output. It is a method of fguring out how sensitive a model’s output
is to changes in its inputs (Saltelli et al., 2010; Shin et al., 2013; Ballester-Ripoll
et al., 2019; Guo et al., 2011; Wagener & Pianosi, 2019). It can be used to answer
questions such as: How much will the model’s output change with a change of a
certain value of the input variable? Which input parameters have the most signif-
icant infuence on the model’s output? Given the ambiguity in the input variables,
what range of values can the model’s output assume? There are two main types of
sensitivity analysis: One-factor sensitivity analysis: This type of analysis examines
the impact of changes in a single input variable on the model’s output. Multi-factor
sensitivity analysis: This type of analysis examines the effect of changes in multiple
input variables on the model’s output.
A sensitivity analysis was conducted to assess the employed machine learning al-
gorithms’ robustness. The study evaluated the models’ performance under various
scenarios, such as introducing noise in the data, simulating adversarial attacks, or
altering the data distribution. The purpose is to gain insights into the behavior and
performance of the algorithms under different conditions, helping to:
1. Identify critical factors: By varying one input parameter at a time while keep-
ing others constant, sensitivity analysis helps identify which factors infuence
the output most. These critical factors are essential for understanding the
model’s behavior and can aid in focusing resources on the most impactful
areas.
Acta Wasaensia 91
2. Assess robustness and reliability: Sensitivity analysis is valuable in evalu-
ating how robust and reliable a model or system is when faced with uncer-
tainties or changes in input parameters. It provides an understanding of the
system’s output sensitivity to fuctuations in its inputs, offering insights into
the stability and accuracy of the model.
3. Optimize decision-making: Sensitivity analysis can guide decision-making
processes by determining the range of input values within which the model
performs satisfactorily. This helps in identifying the optimal conditions or
parameters for achieving desired outcomes.
4. Risk assessment: By evaluating the effects of different inputs on the model’s
output, sensitivity analysis aids in assessing potential risks and uncertainties
associated with the model or process. It highlights areas where uncertainties
may have the most substantial impact on outcomes, enabling risk mitigation
strategies to be developed.
5. Model validation: Sensitivity analysis is critical in model validation and ver-
ifcation. It helps researchers and analysts ensure that their models are not
overly sensitive to specifc inputs or assumptions and provide reasonable and
consistent results.
Sensitivity analysis can take various forms, depending on the system’s complexity.
Some standard techniques include:
• One-at-a-time (OAT) sensitivity analysis: This involves varying one input pa-
rameter at a time while keeping all others constant and observing the resulting
changes in the output (Federico Ferretti , Andrea Saltelli, 2016).
• Local sensitivity analysis: Local sensitivity analysis (Federico Ferretti , An-
drea Saltelli, 2016) involves making small perturbations to the input param-
eters around their current values and observing how these changes affect the
model’s output. This permits assessing the model’s sensitivity to variations in
the input parameters in the immediate vicinity of their current settings.
• Global sensitivity analysis: This method explores the entire input parameter
space to determine the collective impact of multiple inputs on the output. It
helps understand how uncertainties in different input parameters interact and
infuence the system’s behavior.
Sensitivity analysis is widely used in various felds, including engineering, fnance,
environmental sciences, and decision analysis, where understanding the effects of
uncertain factors is crucial for making informed decisions and optimizing perfor-
mance. This was considered in publication V, where the WUSTL IIoT 2021 dataset
and the WUSTL IIoT 2018 dataset used for analysis were manipulated to refect
this notion. The performance of some leading algorithms was evaluated, and the
results are shown in Figures 24 and Figure 25, respectively.
92 Acta Wasaensia 
Figure 24: Performance of the best-performing algorithms on the
WUSTL IIoT 2018 dataset.
Figure 25: Performance of the best-performing algorithms on the
WUSTL IIoT 2021 dataset.
The accuracy of the fne tree, linear discriminant, and linear SVM algorithms slightly
declined when trained on manipulated data, as shown in Figure 24. The fne KNN
algorithm showed little degradation, while the Gaussian Naive Bayes algorithm ex-
perienced a signifcant decline in accuracy. Surprisingly, the booted tree algorithm’s
accuracy increased from 96.1% with un-manipulated data to 98% with manipulated
Acta Wasaensia 93
data. Figure 24 shows the best-performing algorithms’ performance analysis. When
tested on manipulated data, the coarse tree, linear discriminant, quadratic SVM, and
bagged tree classifers experienced a moderate decrease in their performance. On
the other hand, the fne KNN classifer showed a signifcant drop in accuracy, while
the Kernel Naive Bayes classifer performed poorly on both un-manipulated and
manipulated data. The results of these experiments suggest that the accuracy of ma-
chine learning algorithms can be affected by the quality of the training data.
In this case, the manipulated data caused the accuracy of some algorithms to de-
cline, while the accuracy of other algorithms increased. This suggests that the im-
pact of data manipulation on machine learning algorithms can vary depending on
the specifc algorithm and the type of manipulation performed. The experimental
outcomes shed light on the critical infuence of training data quality on the accuracy
of machine learning algorithms. It becomes evident that the performance of these
algorithms can be signifcantly affected by the nature of the data they are trained on.
Interestingly, the effect of data manipulation was not uniform across all algorithms;
instead, it led to varying outcomes. Specifcally, the experiment revealed that some
algorithms experienced a decline in accuracy when confronted with manipulated
data. On the other hand, some algorithms improved accuracy under the same condi-
tions. These fndings emphasize the intricate connection between data manipulation
and algorithm behavior. It demonstrates that the impact of data manipulation on al-
gorithms is complex and multifaceted. Different algorithms respond differently to
data alterations, leading to varied outcomes in terms of their performance. This em-
phasizes the need to understand how each algorithm behaves under different data
conditions thoroughly.
5.5 Risk analysis 
In the context of applying machine learning algorithms to classifcation problems,
risk analysis is the process of identifying, assessing, and managing potential risks
associated with the model’s predictions and performance. Risk analysis plays a
crucial role in applying machine learning algorithms as it aims to identify and un-
derstand the machine learning algorithm’s uncertainties, limitations, and possible
drawbacks, mainly when deployed in real-world applications. When utilizing ma-
chine learning algorithms for classifcation tasks, some common risk factors to con-
sider include:
• Accuracy and performance risks: Assessing the accuracy and performance of
the classifcation model is crucial. Understanding its strengths and limitations
in different scenarios is essential to avoid incorrect predictions and costly
errors.
• False positives and false negatives: In binary classifcation problems, false
positives and false negatives can have varying consequences depending on
94 Acta Wasaensia 
the application. Risk analysis helps understand the implications of these mis-
classifcations and how they might impact decision-making.
• Data quality and bias: Machine learning models are highly dependent on the
quality and representativeness of the training data. If that data is biased or
contains inaccuracies, the algorithm may reproduce those biases or inaccu-
racies in its predictions. It is essential to thoroughly analyze the dataset to
identify any existing biases or inaccuracies and address them appropriately
before applying the algorithm. Risk analysis involves investigating potential
biases in the data and understanding how they might lead to biased or unfair
predictions.
• Robustness to adversarial attacks: The model’s vulnerability to adversarial
attacks should be evaluated for specifc applications, especially in security-
sensitive areas. Adversarial attacks intentionally perturb the input data to
cause misclassifcations, which can be a signifcant risk in critical applica-
tions.
• Generalization and overftting: The concept of generalization and overftting
is crucial in assessing the model’s ability to perform well on new, unseen data.
Generalization refers to how effectively the model can extend its learned pat-
terns to new instances, ensuring its robustness and reliability. On the other
hand, overftting occurs when the model becomes excessively tailored to the
training data, losing its ability to generalize accurately to new data. Conse-
quently, an overftted model may yield poor performance and unreliable pre-
dictions when applied in real-world situations. Thus, it is essential to strike a
balance during model training to avoid overftting and achieve better general-
ization on unseen data.
• Interpretability and explainability: Understanding how the machine learning
model arrives at its predictions for some applications is crucial. Risk analysis
may involve assessing the interpretability and explainability of the model to
ensure it can be trusted and understood by stakeholders. Some machine learn-
ing algorithms, such as deep neural networks, are known for their black-box
nature, meaning it can be challenging to understand why they make specifc
predictions or decisions. This lack of interpretability can be problematic, es-
pecially in sensitive domains like smart grids, healthcare, or fnance.
• Model drift: The data distribution in real-world applications may change over
time, leading to model drift. Risk analysis involves monitoring the model’s
performance over time and taking measures to mitigate the effects of drift.
• Legal and ethical considerations: Deploying machine learning models for
classifcation tasks may raise legal and ethical issues, particularly when deal-
ing with sensitive data or making decisions with signifcant consequences.
Acta Wasaensia 95
Risk analysis includes identifying potential legal or ethical implications and
ensuring compliance with relevant regulations.
Conducting risk analysis involves performing sensitivity analyses, employing per-
formance metrics that consider misclassifcation consequences, evaluating model
robustness under various conditions, and conducting comprehensive testing in di-
verse real-world scenarios. By conducting a thorough risk analysis, developers and
stakeholders can make informed decisions about deploying machine learning mod-
els, manage potential risks, and enhance the trustworthiness and reliability of the
system in real-world applications.
96 Acta Wasaensia 
6 CONCLUSION 
6.1 Conclusion 
This doctoral dissertation thoroughly examines machine learning techniques to im-
prove cybersecurity in the smart grid. The smart grid holds enormous promise for
effective and sustainable energy management due to its integration of digital tech-
nology and communication networks. However, this digital transition also brings
cybersecurity issues, peaking the creation of solid safeguards for sensitive data and
this critical infrastructure. To address these challenges, we delved into the realm of
machine learning and its role in bolstering smart grid cybersecurity. Various ma-
chine learning algorithms were explored, evaluating their suitability for cybersecu-
rity applications in the smart grid context. This understanding of machine learning’s
potential and limitations positioned the dissertation to create innovative solutions.
Building upon this foundation, comprehensive methodologies were proposed for
applying machine learning to the cybersecurity of the smart grid.
Specifcally:
a) We proposed a machine learning algorithm for an intrusion detection sys-
tem (IDS) in the supervisory control and data acquisition (SCADA) system
applied in the smart grid. We found that the proposed algorithm achieves
a detection accuracy much better than existing traditional IDSs for SCADA
systems. This is shown in paper I.
b) We propose a hybrid deep learning algorithm that focuses on distributed de-
nial of service (DDoS) attacks on the communication infrastructure of the
smart grid. The convolutional neural network (CNN) and the gated recurrent
unit (GRU) algorithms hybridize the proposed algorithm. The proposed algo-
rithm outperforms the counter-algorithms in terms of overall accuracy. This
is shown in paper II.
c) We evaluated the performance of traditional supervised machine learning al-
gorithms like artifcial neural networks (ANN), CNN, and support vector
machines (SVM) against a proposed Restricted Boltzmann Machine-based
nature-inspired artifcial root foraging optimization algorithm. The proposed
algorithm yielded better results in comparison. This is shown in paper III.
d) A genetically seeded fora transformer neural network (GSFTNN) algorithm
stands in stark contrast to the signature-based method employed by tradi-
tional IDSs was proposed. The proposed algorithm outperforms traditional
algorithms such as residual neural networks (ResNet), recurrent neural net-
works (RNN), and long short-term memory (LSTM) in terms of accuracy
and effciency. This is shown in paper IV.
e) We evaluated the performance of some machine learning algorithms by manu-
ally introducing adversarial attacks on the dataset. We determined that train-
Acta Wasaensia 97
ing machine learning algorithms with ill-nature data could affect the algo-
rithm’s performance. This is shown in paper V.
Drawing from our research endeavors, it becomes evident that AI and machine
learning possess the capacity to elevate the realm of cybersecurity tools to unprece-
dented heights. Our exploration has encompassed a selection of algorithms, yet
the landscape abounds with many untapped options with the potential for superior
performance. Take, for instance, reinforcement learning - an emerging contender.
However, the successful integration of reinforcement learning mandates deploy-
ment within a dynamic simulation environment, one replete with authentic data that
mirrors reality. This setting facilitates the crucial self-learning process through iter-
ative massive trial and error, guided by the feedback of rewards and penalties.
6.2 Future research 
These future research directions build upon the signifcant fndings and contribu-
tions of the Ph.D. dissertation, offering exciting opportunities to advance the felds
of machine learning and cybersecurity for the smart grid. By addressing these chal-
lenges and exploring novel approaches, researchers can contribute to creating more
resilient, secure, and effcient smart grid systems that can better withstand the ever-
evolving landscape of cyber threats.
a) Adversarial robustness and resilience: Adversarial attacks pose signifcant
threats to machine learning models in the smart grid. Future research should
explore methods to improve the robustness and resilience of machine learn-
ing algorithms against various adversarial attacks, including those specifcally
tailored to target smart grid applications. This research could involve develop-
ing novel defense mechanisms, data augmentation techniques, and advanced
training methodologies to mitigate the impact of adversarial manipulations.
b) Real-time cybersecurity monitoring and response: The smart grid operates
in real-time, making it essential to have cybersecurity solutions that can re-
spond quickly to emerging threats. Future research can focus on develop-
ing real-time cybersecurity monitoring and response systems that leverage
machine learning for rapid threat detection and adaptive countermeasures.
These systems should be able to detect anomalies, identify cyberattacks, and
autonomously implement defensive actions to protect the smart grid infras-
tructure.
c) Blockchain for smart grid security: Blockchain technology offers the poten-
tial to enhance the security and integrity of smart grid data and communi-
cation. Future research could investigate how machine learning algorithms
can be combined with blockchain to create secure and tamper-resistant smart
grid applications. This includes exploring the use of blockchain for secure
data storage, provenance tracking, and decentralized cybersecurity decision-
making.
98 Acta Wasaensia 
d) Multi-modal data fusion for enhanced security: Combining data from various
sources, such as smart meters, IoT devices, and weather sensors, can provide
a more comprehensive understanding of the smart grid’s security posture. Fu-
ture research could investigate multi-modal data fusion techniques using ma-
chine learning to improve situational awareness and early threat detection.
e) Cyber-physical security integration: The smart grid is a cyber-physical sys-
tem, and future research could explore integrating machine learning tech-
niques with physical security measures. This could involve using sensor data,
video analytics, and machine learning algorithms to detect physical intrusions
or tampering attempts that might compromise the smart grid’s security.
f) Human-centric cybersecurity: In the smart grid context, human operators and
users play a crucial role in ensuring cybersecurity. Future research can fo-
cus on incorporating human-centric approaches into machine learning-based
cybersecurity systems. This involves considering the cognitive aspects of hu-
man decision-making, usability, and user-centric design principles to create
cybersecurity solutions that are intuitive and easy for human operators to in-
teract with effectively.
g) Hardware-supported cybersecurity: Research can explore the integration of
machine learning algorithms with hardware-level security features to enhance
the resilience of smart grid systems. Hardware-based security mechanisms,
such as hardware-enforced isolation and trusted execution environments, can
complement machine learning algorithms to provide additional protection
against attacks.
Acta Wasaensia 99
REFERENCES 
Anwar, A., & Mahmood, A. N. (2014). Cyber security of smart grid infrastructure.
arXiv preprint arXiv:1401.3936.
Ahmed, S. D., Al-Ismail, F. S. M., Shafullah, M., Al-Sulaiman, F. A., & El-Amin,
I. M. (2020). Grid Integration Challenges of Wind Energy: A Review. IEEE Ac-
cess, 8(type 1), 10857-10878.https://doi.org/10.1109/ACCESS.2020.2964896
Al Ameen, M., Liu, J., & Kwak, K. (2012). Security and privacy issues in wireless
sensor networks for healthcare applications. Journal of Medical Systems, 36(1),
93-101. https://doi.org/10.1007/s10916-010-9449-4.
Alkuwari, A. N., Al-Kuwari, S., & Qaraqe, M. (2022). Anomaly Detection in
Smart Grids: A Survey from Cybersecurity Perspective*. 3rd International Con-
ference on Smart Grid and Renewable Energy, SGRE 2022 - Proceedings, 1-7.
https://doi.org/10.1109/SGRE53517.2022.9774221
Anish Halimaa, K. S. (2019). Machine Learning Based Intrusion Detection System.
Arabo, A. (2015). Cyber Security Challenges within the Connected Home Ecosys-
tem Futures. Procedia Computer Science, 61(0), 227-232.
https: doi.org/10.1016/j.procs.2015.09.201
Ashok, A., Govindarasu, M., & Wang, J. (2017). Cyber-Physical Attack-Resilient
Wide-Area Monitoring, Protection, and Control for the PowerGrid. 1-17.
Avancini, D. B., Martins, S. G. B., Rabelo, R. A. L., Solic, P., & Rodrigues, J. J.
P. C. (2018). A Flexible IoT Energy Monitoring Solution. 2018 3rd International
Conference on Smart and Sustainable Technologies, SpliTech 2018, 1-6.
Ayar, M., Obuz, S., Trevizan, R. D., Bretas, A. S., & Latchman, H. A. (2017).
A Distributed Control Approach for Enhancing Smart Grid Transient Stability and
Resilience. IEEE Transactions on Smart Grid, 8(6), 3035-3044.
https://doi.org/10.1109/TSG.2017.2714982
Azad, S., Sabrina, F., & Wasimi, S. (2019). Transformation of smart grid using
machine learning. 2019 29th Australasian Universities Power Engineering Confer-
ence,
AUPEC 2019. https://doi.org/10.1109/AUPEC48547.2019.211809
Baig, Z. A., & Amoudi, A. R. (2013). An analysis of smart grid attacks and coun-
100 Acta Wasaensia 
termeasures. Journal of Communications, 8(8), 473-479.
https://doi.org/10.12720/jcm.8.8.473-479
Basnet, M., Poudyal, S., Ali, M. H., & Dasgupta, D. (2021). Ransomware detection
using deep learning in the SCADA system of electric vehicle charging station. 2021
IEEE PES Innovative Smart Grid Technologies Conference - Latin America, ISGT
Latin America 2021, 1-5.
https://doi.org/10.1109/ISGTLatinAmerica52371.2021.9543031
Batta, M. (2018). Machine Learning Algorithms - A Review. International Journal
of Science and Research (IJSR), 18(8), 381-386. https://doi.org/10.21275/ART202
03995
Belgiu, M., & Dragu, L. (2016). Random forest in remote sensing: A review of
applications and future directions. ISPRS Journal of Photogrammetry and Remote
Sensing, 114, 24-31.
https://doi.org/10.1016/j.isprsjprs.2016.01.011
Belkebir, N., Maarouf, M., Khallaayoun, A., & Lghoul, R. (2018). The Future De-
velopment of Smart Grid The case of Morocco. c, 1-9.
Berghout, T., & Benbouzid, M. (2022). EL-NAHL: Exploring labels autoencod-
ing in augmented hidden layers of feedforward neural networks for cybersecurity
in smart grids. Reliability Engineering and System Safety, 226(January), 108680.
https://doi.org/10.1016/j.ress.2022.108680
Bhattarai, B. P., Paudyal, S., Luo, Y., Mohanpurkar, M., Cheung, K., Tonkoski, R.,
Hovsapian, R., Myers, K. S., Zhang, R., Zhao, P., Manic, M., Zhang, S., & Zhang,
X. (2019). Big data analytics in smart grids: State-of-theart, challenges, opportuni-
ties, and future directions. IET Smart Grid, 2(2), 141-154. https://doi.org/10.1049
/iet-stg.2018.0261
Blumsack, S., & Fernandez, A. (2012). Ready or not, here comes the smart grid!
Energy, 37(1), 61-68. https://doi.org/10.1016/j.energy.2011.07.054
Buczak, A. L., & Guven, E. (2016). A Survey of Data Mining and Machine Learn-
ing Methods for Cyber Security Intrusion Detection. IEEE Communications Sur-
veys and Tutorials, 18(2), 1153-1176. https://doi.org/10.1109/COMST.2015.
2494502
Butun, I., Sari, A., & Osterberg, P. (2019). Security Implications of Fog Computing
on the Internet of Things. 2019 IEEE International Conference on Consumer Elec-
tronics, ICCE 2019, 20201010, 1?6. https://doi.org/10.1109/ICCE.2019.8661909
Acta Wasaensia 101
Charbuty, B., & Abdulazeez, A. (2021). Classifcation Based on Decision Tree Al-
gorithm for Machine Learning. Journal of Applied Science and Technology Trends,
2(01), 20-28. https://doi.org/10.38094/jastt20165
Chicco, G., Riaz, S., Mazza, A., & Mancarella, P. (2020). Flexibility from Dis-
tributed Multienergy Systems. Proceedings of the IEEE, 108(9), 1496-1517.
https://doi.org/10.1109/JPROC.2020.2986378
Cone, B. D., Irvine, C. E., Thompson, M. F., & Nguyen, T. D. (2007). A video
game for cyber security training and awareness. Computers and Security, 26(1),
63-72. https://doi.org/10.1016/j.cose.2006.10.005
de Souza, C. A., Westphall, C. B., Machado, R. B., Loff, L., Westphall, C. M., &
Geronimo, G. A. (2022). Intrusion detection and prevention in fog based IoT envi-
ronments: A systematic literature review. Computer Networks, 214(July), 109154.
https://doi.org/10.1016/j.comnet.2022.109154
Deb, S., Vatwani, T., Chattopadhyay, A., Basu, A., & Fong, X. (2018). Domain
Wall Motion-Based Dual-Threshold Activation Unit for Low-Power Classifcation
of Non-Linearly Separable Functions. IEEE Transactions on Biomedical Circuits
and Systems, 12(6), 1410-1421.
https://doi.org/10.1109/TBCAS.2018.2867038
Dehalwar, V., Kalam, A., & Zayegh, A. (2014). Infrastructure for real-time com-
munication in smart grid. 2014 Saudi Arabia Smart Grid Conference, SASG 2014,
2-5. https://doi.org/10.1109/SASG.2014.7274281
Diaba, S. Y., Shafe-khah, M., & Elmusrati, M. (2022). On the performance met-
rics for cyber-physical attack detection in smart grid. Soft Computing, 26(23),
13109?13118. https://doi.org/10.1007/s00500-022-06761-1
Diamantoulakis, P. D., Kapinas, V. M., & Karagiannidis, G. K. (2015). Big Data
Analytics for Dynamic Energy Management in Smart Grids. Big Data Research,
2(3), 94-101.
https://doi.org/10.1016/j.bdr.2015.03.003
Ding, J., Qammar, A., Zhang, Z., Karim, A., & Ning, H. (2022). Cyber Threats to
Smart Grids: Review, Taxonomy, Potential Solutions, and Future Directions.
Energies, 15(18), 1-37. https://doi.org/10.3390/en15186799
Diptiben Ghelani, (2022). Cyber Security in Smart Grids, Threats, and Possible
Solutions. American Journal of Applied Scientifc Research, Vol. x(x.2022).
Dogaru, D. I., & Dumitrache, I. (2019). Cyber security of smart grids in the context
102 Acta Wasaensia 
of big data and machine learning. Proceedings - 2019 22nd International Confer-
ence on Control Systems and Computer Science, CSCS 2019, 61-67.
https://doi.org/10.1109/CSCS.2019.00018
Dong, H., Supratak, A., Pan, W., Wu, C., Matthews, P. M., & Guo, Y. (2018).
Mixed Neural Network Approach for Temporal Sleep Stage Classifcation. IEEE
Transactions on Neural Systems and Rehabilitation Engineering, 26(2), 324-333.
https://doi.org/10.1109/TNSRE.2017.2733220
Eder-Neuhauser, P., Zseby, T., Fabini, J., & Vormayr, G. (2017). Cyber attack
models for smart grid environments. Sustainable Energy, Grids and Networks, 12,
10-29.
https://doi.org/10.1016/j.segan.2017.08.002
Eken, H. (2013). Security Threats and Solutions in Cloud Computing. World
Congress on Internet Security (WorldCIS-2013).
Espe, E., Potdar, V., & Chang, E. (2018). Prosumer communities and relation-
ships in smart grids: A literature review, evolution and future directions. Energies,
11(10). https://doi.org/10.3390/en11102528
Falahi, M., Vasilateanu, A., Goga, N., Suciu, G., Sachian, M. A., Florescu, R.,
Ali, H. A., & Qian, Y. (2022). An Innovative Blockchain System for Smart Grids.
2022 IEEE International Conference on Blockchain, Smart Healthcare and Emerg-
ing Technologies, SmartBlock4Health 2022.
https://doi.org/10.1109/SmartBlock4Health56071.2022.10034523
Fang, X., Misra, S., Xue, G., & Yang, D. (2012). Smart grid - The new and im-
proved power grid: A survey. IEEE Communications Surveys and Tutorials, 14(4),
944-980.
https://doi.org/10.1109/SURV.2011.101911.00087
Federico Ferretti , Andrea Saltelli, S. T. (2016). Trends in sensitivity analysis prac-
tice in the last decade.Science of The Total Environment, 568, 666-670.
https://doi.org/10.1016/j.scitotenv.2016.02.133
Gao, L., Pi, Z., Wang, J., & Sun, J. (2022). Smart Grid Data Traceability System
based on Blockchain Technologies. Proceedings - 2022 8th Annual International
Conference on Network and Information Systems for Computers, ICNISC 2022,
721-726. https://doi.org/10.1109/ICNISC57059.2022.00145
Gunduz, M. Z., & Das, R. (2020). Cyber-security on smart grid: Threats and poten-
tial solutions. Computer Networks, 169, 107094. https://doi.org/10.1016/j.comnet.
2019.107094
Acta Wasaensia 103
Hahn, A., Ashok, A., Sridhar, S., & Govindarasu, M. (2013). Cyber-physical secu-
rity testbeds: Architecture, application, and evaluation for smart grid. IEEE Trans-
actions on Smart Grid, 4(2), 847-855. https://doi.org/10.1109/TSG.2012.2226919
Haluk Gozde; M. Cengiz Taplamacioglu; Murat Ari; Hamza Shalaf. (2015). 4G/LTE
technology for smart grid communication infrastructure. 2015 3rd International Is-
tanbul Smart Grid Congress and Fair (ICSG).
He, H., & Yan, J. (2016). Cyber-physical attacks and defences in the smart grid: a
survey. IET Cyber-Physical Systems: Theory & Applications, 1(1), 13-27.
https://doi.org/10.1049/iet-cps.2016.0019
Hledik, R. (2009). How Green Is the Smart Grid? Electricity Journal, 22(3), 29-41.
https://doi.org/10.1016/j.tej.2009.03.001
Huang, B. B., Xie, G. H., Kong, W. Z., & Li, Q. H. (2012). Study on smart
grid and key technology system to promote the development of distributed gen-
eration. 2012 IEEE Innovative Smart Grid Technologies - Asia, ISGT Asia 2012,
1-4. https://doi.org/10.1109/ISGT-Asia.2012.6303265
Hussain, H. M., Narayanan, A., Nardelli, P. H. J., & Yang, Y. (2020). What is en-
ergy internet? concepts, technologies, and future directions. IEEE Access, 8(iv),
183127-183145.
https://doi.org/10.1109/ACCESS.2020.3029251
Ibrahim, M. S., Dong, W., & Yang, Q. (2020). Machine learning driven smart
electric power systems: Current trends and new perspectives. Applied Energy,
272(June), 115237. https://doi.org/10.1016/j.apenergy.2020.115237
ICS-CERT. (2015). ICS-CERT Monitor September 2014 - Feburary 2015. ICS-
CERT Monitor, February, 1?15.
Jamil, F., Iqbal, N., Imran, Ahmad, S., & Kim, D. (2021). Peer-to-Peer Energy
Trading Mechanism Based on Blockchain and Machine Learning for Sustainable
Electrical Power Supply in Smart Grid. IEEE Access, 9, 39193-39217.
https://doi.org/10.1109/ACCESS.2021.3060457
Javid, I., Ghazali, R., Syed, I., Husaini, N. A., & Zulqarnain, M. (2022). Devel-
oping Novel T-Swish Activation Function in Deep Learning. 2022 International
Conference on IT and Industrial Technologies, ICIT 2022, 1-7.
https://doi.org/10.1109/ICIT56493.2022.9989151
Jozwiak, D., Pillai, J. R., Ponnaganti, P., Bak-Jensen, B., Jantzen, J., Wu, X., Dai,
104 Acta Wasaensia 
H., Zhang, N., Kong, W., O’Malley, M. J., Anwar, M. B., Heinen, S., Kober, T., Mc-
Calley, J., McPherson, M., Muratori, M., Orths, A., Ruth, M., Schmidt, T. J., Ma-
niatakos, M. (2020). Electric Power Grid Resilience to Cyber Adversaries: State of
the Art. IEEE Access, 8(5), 4929-4934. https://doi.org/10.1109/TII.2021.3112095
Kappagantu, R., & Daniel, S. A. (2018). Challenges and issues of smart grid imple-
mentation: A case of Indian scenario. Journal of Electrical Systems and Information
Technology, 5(3), 453-467.
https://doi.org/10.1016/j.jesit.2018.01.002
Kimani, K., Oduol, V., & Langat, K. (2019). Cyber security challenges for IoT-
based smart grid networks. International Journal of Critical Infrastructure Protec-
tion, 25, 36-49.
https://doi.org/10.1016/j.ijcip.2019.01.001
Kulkarni, V. Y., & Sinha, P. K. (2012). Pruning of random forest classifers: A sur-
vey and future directions. Proceedings-2012 International Conference on Data Sci-
ence and Engineering, ICDSE 2012, 64-68. https://doi.org/10.1109/ICDSE.2012.
6282329
Kumar, D., & S., D. S. (2020). Enhancing Security Mechanisms for Healthcare
Informatics Using Ubiquitous Cloud. Journal of Ubiquitous Computing and Com-
munication Technologies, 2(1), 19-28.
https://doi.org/10.36548/jucct.2020.1.003
Kumar, V., Pandey, A. S., & Sinha, S. K. (2016). Grid integration and power qual-
ity issues of wind and solar energy system: A review. International Conference on
Emerging Trends in Electrical, Electronics and Sustainable Energy Systems, ICE-
TEESES 2016, 2011, 71-80.
https://doi.org/10.1109/ICETEESES.2016.7581355
Kurniabudi, Stiawan, D., Darmawijoyo, Bin Idris, M. Y. Bin, Bamhdi, A. M., &
Budiarto, R. (2020). CICIDS-2017 Dataset Feature Analysis with Information Gain
for Anomaly Detection. IEEE Access, 8, 132911-132921. https://doi.org/10.1109/
ACCESS.2020.3009843
Kuzlu, M., Sarp, S., Pipattanasomporn, M., & Cali, U. (2020). Realizing the po-
tential of blockchain technology in smart grid applications. 2020 IEEE Power and
Energy Society Innovative Smart Grid Technologies Conference, ISGT 2020, 1-5.
https://doi.org/10.1109/ISGT45199.2020.9087677
Lamba, V., Simkova, N., & Rossi, B. (2019). Recommendations for smart grid se-
curity risk management. Cyber-Physical Systems, 5(2), 92-118.
https://doi.org/10.1080/23335777.2019.1600035
Acta Wasaensia 105
Leszczyna, R. (2018a). Cybersecurity and privacy in standards for smart grids - A
comprehensive survey. Computer Standards and Interfaces, 56(September 2017),
62-73. https://doi.org/10.1016/j.csi.2017.09.005
Leszczyna, R. (2018b). Standards on cyber security assessment of smart grid. In-
ternational Journal of Critical Infrastructure Protection, 22, 70-89.
https://doi.org/10.1016/j.ijcip.2018.05.006
Li, T., Zhang, W., Chen, N., Qian, M., & Xu, Y. (2019). Blockchain Technology
Based Decentralized Energy Trading for Multiple-Microgrid Systems. 2019 3rd
IEEE Conference on Energy Internet and Energy System Integration: Ubiquitous
Energy Network Connecting
Everything, EI2 2019, 631-636. https://doi.org/10.1109/EI247390.2019.9061928
Li, Y., & Liu, Q. (2021). A comprehensive review study of cyber-attacks and cyber
security; Emerging trends and recent developments. Energy Reports, 7, 8176-8186.
https://doi.org/10.1016/j.egyr.2021.08.126
Li, Y., Wei, X., Li, Y., Dong, Z., Shahidehpour, M. (2022). Detection of False Data
Injection Attacks in Smart Grid: A Secure Federated Deep Learning Approach.
IEEE Transactions on Smart Grid, 13(6), 4862-4872.
https://doi.org/10.1109/TSG.2022.3204796
Xiao, L., Wan, X., Lu, X., Zhang, Y., & Wu, D. (2018). IoT security techniques
based on machine learning: How do IoT devices use AI to enhance security?. IEEE
Signal Processing Magazine, 35(5), 41-49.
Liu, J., Xiao, Y., Li, S., Liang, W., & Chen, C. L. P. (2012). Cyber security and
privacy issues in smart grids. IEEE Communications Surveys and Tutorials, 14(4),
981-997.
https://doi.org/10.1109/SURV.2011.122111.00145
Long Cheng, Xuewu Chen, Jonas De Vos, Xinjun Lai, F. W. (2019). Applying a
random forest method approach to model travel mode choice behavior. Travel Be-
haviour and Society, 14, 1-10.
https://doi.org/https://doi.org/10.1016/j.tbs.2018.09.002
Mellor, A., Boukir, S., Haywood, A., & Jones, S. (2015). Exploring issues of train-
ing data imbalance and mislabelling on random forest performance for large area
land cover classifcation using the ensemble margin. ISPRS Journal of Photogram-
metry and Remote Sensing,105, 155-168.
https://doi.org/10.1016/j.isprsjprs.2015.03.014
106 Acta Wasaensia 
Merabet, A., Ahmed, K. T., Ibrahim, H., Beguenane, R., & Ghias, A. M. Y. M.
(2017). Laboratory Scale Microgrid Based Wind-PV-Battery. IEEE Transactions
on Sustainable Energy, 8(1), 145-154.
Metke, A. R., & Ekl, R. L. (2010). Security technology for smart grid networks.
IEEE Transactions on Smart Grid, 1(1), 99-107. https://doi.org/10.1109/TSG.2010.
2046347
Mingkui Wei, W. W. (2016). Data-centric threats and their impacts to real-time
communications in smart grid. Computer Networks, 104, 174-188.
https://doi.org/10.1016/j.comnet.2016.05.003
M. Elmusrati. (2022). Lecture Notes on Machine Learning Algorithms”, ICAT3120
Machine Learning Course. School of Technology and Innovations, University of
Vaasa.
Hasan, M. K., Habib, A. A., Shukur, Z., Ibrahim, F., Islam, S., & Razzaque, M. A.
(2023). Review on cyber-physical and cyber-security system in smart grid: Stan-
dards, protocols, constraints, and recommendations. Journal of Network and Com-
puter Applications, 209, 103540.
More, S., Hajari, S., Majeed, M.A., Singh, N.K., Mahajan, V. (2022). Cyber Secu-
rity for Smart Grid: Vulnerabilities, Attacks, and Solution. Sustainable Technology
and Advanced Computing in Electrical Engineering. Lecture Notes in Electrical
Engineering, 939. https://doi.org/https://doi.org/10.1007/978-981-19-4364-5-60
Moslehi, K., & Kumar, R. (2010). A reliability perspective of the smart grid. IEEE
Transactions on Smart Grid, 1(1), 57-64. https://doi.org/10.1109/TSG.2010.2046346
Mutani, G., Todeschi, V., Tartaglia, A., & Nuvoli, G. (2019). Energy commu-
nities in piedmont region (IT). the case study in pinerolo territory. INTELEC,
International Telecommunications Energy Conference (Proceedings), 2018-Octob.
https://doi.org/10.1109/INTLEC.2018.8612427
Mylrea, M., & Gourisetti, S. N. G. (2017). Blockchain for smart grid resilience:
Exchanging distributed energy at speed, scale and security. Proceedings - 2017 Re-
silience Week, RWS 2017, 18-23. https://doi.org/10.1109/RWEEK.2017.8088642
Nguyen, T. T., & Reddi, V. J. (2021). Deep Reinforcement Learning for Cyber
Security. IEEE Transactions on Neural Networks and Learning Systems, 1-17.
https://doi.org/10.1109/TNNLS.2021.3121870
Niculescu, S. P. (2003). Artifcial neural networks and genetic algorithms in QSAR.
Journal of Molecular Structure: THEOCHEM, 622(1-2), 71-83.
https://doi.org/10.1016/S0166-1280(02)00619-X
Acta Wasaensia 107
Novosel, D. (2012). Experiences with deployment of smart grid projects. 2012
IEEE PES Innovative Smart Grid Technologies, ISGT 2012.
https://doi.org/10.1109/ISGT.2012.6175600
Oyewole, P. A., & Jayaweera, D. (2020). Power System Security with Cyber-
Physical Power System Operation. IEEE Access, 8, 179970-179982.
https://doi.org/10.1109/ACCESS.2020.3028222
Pandey, M., & Kumar Sharma, V. (2013). A Decision Tree Algorithm Pertaining
to the Student Performance Analysis and Prediction. International Journal of Com-
puter Applications, 61(13), 1-5. https://doi.org/10.5120/9985-4822
Phom, H. S., Kuntze, N., Rudolph, C., Cupelli, M., Liu, J., & Monti, A. (2010). A
user-centric privacy manager for future energy systems. 2010 International Confer-
ence on Power System Technology: Technological Innovations Making Power Grid
Smarter, POWERCON2010, 1-7.
https://doi.org/10.1109/POWERCON.2010.5666447
Prabhakar, P., Arora, S., Khosla, A., Beniwal, R. K., Arthur, M. N., Arias-Gonza´les,
J. L., & Areche, F. O. (2022). Cyber Security of Smart Metering Infrastructure Us-
ing Median Absolute Deviation Methodology. Security and Communication Net-
works, 2022. https://doi.org/10.1155/2022/6200121
Procopiou, A., & Komninos, N. (2015). Current and future threats framework in
smart grid domain. 2015 IEEE International Conference on Cyber Technology
in Automation, Control and Intelligent Systems, IEEE-CYBER 2015, 1852-1857.
https://doi.org/10.1109/CYBER.2015.7288228
Qiumei, Z., Dan, T., & Fenghua, W. (2019). Improved Convolutional Neural Net-
work Based on Fast Exponentially Linear Unit Activation Function.IEEE Access,
7, 151359-151367.
https://doi.org/10.1109/ACCESS.2019.2948112
Radoglou-Grammatikis, P. I., & Sarigiannidis, P. G. (2019). Securing the Smart
Grid: A Comprehensive Compilation of Intrusion Detection and Prevention Sys-
tems. IEEE Access, 7, 46595-46620. https://doi.org/10.1109/ACCESS.2019.2909807
Rajput, G., Raut, G., Chandra, M., & Vishvakarma, S. K. (2021). VLSI imple-
mentation of transcendental function hyperbolic tangent for deep neural network
accelerators. Microprocessors and Microsystems, 84(April 2020), 104270.
https://doi.org/10.1016/j.micpro.2021.104270
Rohmeyer, P., & Ben-zvi, T. (2015). Managing Cloud Computing Risks in Finan-
108 Acta Wasaensia 
cial Services Institutions. 2015 Portland International Conference on Management
of Engineering and Technology (PICMET),
519-526. https://doi.org/10.1109/PICMET.2015.7273004
Sadiq, M., Ali, S. W., Terriche, Y., Mutarraf, M. U., Hassan, M. A., Hamid, K.,
Ali, Z., Sze, J. Y., Su, C. L., & Guerrero, J. M. (2021). Future Greener Seaports: A
Review of New Infrastructure, Challenges, and Energy Effciency Measures. IEEE
Access, 9, 75568-75587.
https://doi.org/10.1109/ACCESS.2021.3081430
Safavian, S. R., & Landgrebe, D. (1991). A Survey of Decision Tree Classifer
Methodology. IEEE Transactions on Systems, Man and Cybernetics, 21(3), 660-
674. https://doi.org/10.1109/21.97458
Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., & Tarantola, S.
(2010). Variance based sensitivity analysis of model output. Design and estimator
for the total sensitivity index. Computer Physics Communications, 181(2), 259-
270. https://doi.org/10.1016/j.cpc.2009.09.018
Sarker, I. H. (2022a). AI-Based Modeling: Techniques, Applications and Research
Issues Towards Automation, Intelligent and Smart Systems. SN Computer Science,
3(2), 1-20. https://doi.org/10.1007/s42979-022-01043-x
Sarker, I. H. (2022b). Machine Learning for Intelligent Data Analysis and Au-
tomation in Cybersecurity: Current and Future Prospects. Annals of Data Science.
https://doi.org/10.1007/s40745-022-00444-2
Sarker, I. H., Furhad, M. H., & Nowrozy, R. (2021). AI-Driven Cybersecurity: An
Overview, Security Intelligence Modeling and Research Directions. SN Computer
Science, 2(3), 1-18.
https://doi.org/10.1007/s42979-021-00557-0
Shampa Banik, Sohag Kumar Saha, Trapa Banik, S. M. M. H. (2023). Anomaly
Detection Techniques in Smart Grid Systems: A Review. 2023 IEEE World AI IoT
Congress (AIIoT).
Shin, M. J., Guillaume, J. H. A., Croke, B. F. W., & Jakeman, A. J. (2013). Ad-
dressing ten questions about conceptual rainfall-runoff models with global sensitiv-
ity analyses in R. Journal of Hydrology, 503, 135-152.
https://doi.org/10.1016/j.jhydrol.2013.08.047
Siraj, V. F. and A. (2014). Applications of machine learning in cyber security. 27th
International Conference on Computer Applications.
Acta Wasaensia 109
Smith, M. D., & Pate-Cornell, M. E. (2018). Cyber risk analysis for a smart
grid: How smart is smart enough? A multiarmed bandit approach to cyber secu-
rity investment. IEEE Transactions on Engineering Management, 65(3), 434-447.
https://doi.org/10.1109/TEM.2018.2798408
Spataru, C. (2013). The future whole energy system stability, reliability and secu-
rity: WITH or WITHOUT fear of blackouts? IET Seminar Digest, 2013(15377).
https://doi.org/10.1049/ic.2013.0155
Steimer, P. K. (2009). Power electronics, a key technology for future more elec-
trical energy systems. 2009 IEEE Energy Conversion Congress and Exposition,
ECCE 2009, 1161-1165.
https://doi.org/10.1109/ECCE.2009.5316175
Suicimezov, N., & Georgescu, M. R. (2014). Emerging Markets Queries in Finance
and Business IT Governance in Cloud. Procedia Economics and Finance, 15(14),
830-835. https://doi.org/10.1016/s2212-5671(14)00531-0
Sun, C. C., Hahn, A., & Liu, C. C. (2018). Cyber security of a power grid: State-of-
the-art. International Journal of Electrical Power and Energy Systems, 99(Novem-
ber 2017), 45-56.
https://doi.org/10.1016/j.ijepes.2017.12.020
Svendsen, H. G., Shetaya, A. A., & Loudiyi, K. (2017). Integration of renewable
energy and the beneft of storage from a grid and market perspective - Results from
Morocco and Egypt case studies. Proceedings of 2016 International Renewable and
Sustainable Energy Conference, IRSEC 2016, 1164-1168.
https://doi.org/10.1109/IRSEC.2016.7984007
Syrmakesis, A. D., Alcaraz, C., & Hatziargyriou, N. D. (2022). Classifying re-
silience approaches for protecting smart grids against cyber threats. International
Journal of Information Security, 21(5), 1189-1210. https://doi.org/10.1007/s10207-
022-00594-7
Taghavinejad, S. M., Taghavinejad, M., Shahmiri, L., Zavvar, M., & Zavvar, M.
H. (2020). Intrusion Detection in IoT-Based Smart Grid Using Hybrid Decision
Tree. 2020 6th International Conference on Web Research, ICWR 2020, 152-156.
https://doi.org/10.1109/ICWR49608.2020.9122320
Taji, B., Chan, A. D. C., & Shirmohammadi, S. (2018). False Alarm Reduction in
Atrial Fibrillation Detection Using Deep Belief Networks. IEEE Transactions on
Instrumentation and Measurement, 67(5), 1124-1131.
https://doi.org/10.1109/TIM.2017.2769198
110 Acta Wasaensia 
Tan, S., De, D., Song, W. Z., Yang, J., & Das, S. K. (2017). Survey of Security Ad-
vances in Smart Grid: A Data Driven Approach. IEEE Communications Surveys
and Tutorials, 19(1), 397?422.
Tang, D., Fang, Y. P., & Zio, E. (2023). Vulnerability analysis of demand-response
with renewable energy integration in smart grids to cyber attacks and online detec-
tion methods. Reliability Engineering and System Safety, 235(January), 109212.
https://doi.org/10.1016/j.ress.2023.109212
Teixeira, M. A., Salman, T., Zolanvari, M., Jain, R., Meskin, N., & Samaka, M.
(2018). SCADA system testbed for cybersecurity research using machine learning
approach. Future Internet, 10(8). https://doi.org/10.3390/f10080076
Tony Flick, J. M. (2010). Securing the Smart Grid Next Generation Power Grid
Security. Elsevier.
Tuballa, M. L., & Abundo, M. L. (2016). A review of the development of Smart
Grid technologies. Renewable and Sustainable Energy Reviews, 59, 710-725.
https://doi.org/10.1016/j.rser.2016.01.011
Tufail, S., Parvez, I., Batool, S., & Sarwat, A. (2021). A survey on cybersecu-
rity challenges, detection, and mitigation techniques for the smart grid. Energies,
14(18), 1-22. https://doi.org/10.3390/en14185894
Vaos, J., Kuhn, R., Laplante, P., & Applebaum, S. (2018). Internet of Things (IoT)
Trust Concerns Draft NISTIR 822. Draft NISTIR 8222, 50-50.
https://csrc.nist.gov/publications/detail/white-paper/2018/10/17/iot-trust-concerns/draft
Voas, J. (2016). Networks of ”Things.” NIST Special Publication, 800(183), 800-
183.
Wang, W., & Lu, Z. (2013). Cyber security in the Smart Grid: Survey and chal-
lenges. Computer Networks, 57(5), 1344-1371. https://doi.org/10.1016/j.comnet.
2012.12.017
Waqar, A., Hu, J., Awais, M., Xia, S., & Ai, X. (2021). Distributed Operation of
Multi-Microgrids under Censored Communication. 2021 4th International Confer-
ence on Energy, Electrical and Power Engineering, CEEPE 2021, 800-806.
https://doi.org/10.1109/CEEPE51765.2021.9475633
Workman, M., Bommer, W. H., & Straub, D. (2008). Security lapses and the omis-
sion of information security measures: A threat control model and empirical test.
Computers in Human Behavior, 24(6), 2799-2816. https://doi.org/10.1016/j.chb.
2008.04.005
Acta Wasaensia 111
Wulf, F., Strahringer, S., & Westner, M. (2019). Information security risks, benefts,
and mitigation measures in cloud sourcing. Proceedings - 21st IEEE Conference on
Business Informatics, CBI 2019, 1, 258-267. https://doi.org/10.1109/CBI.2019.
00036
Liu, X., & Nielsen, P. S. (2016). Regression-based online anomaly detection for
smart grid data. arXiv preprint arXiv:1606.05781.
Yan, Y., Qian, Y., Sharif, H., & Tipper, D. (2012). A survey on cyber security for
smart grid communications.IEEE Communications Surveys and Tutorials, 14(4),
998-1010.
https://doi.org/10.1109/SURV.2012.010912.00035
Yang, Y., Littler, T., Sezer, S., McLaughlin, K., & Wang, H. F. (2011). Impact of
cyber-security issues on Smart Grid. IEEE PES Innovative Smart Grid Technolo-
gies Conference Europe, 1-7.
https://doi.org/10.1109/ISGTEurope.2011.6162722
Ye Yan, Yi Qian, Hamid Sharif, and D. T. (2013). A Survey on Smart Grid Commu-
nication Infrastructures: Motivations, Requirements and Challenges. IEEE COM-
MUNICATIONS SURVEYS & TUTORIALS, 15(1). https://doi.org/10.1109/SURV.
2012.021312.00034
Yeboah-Ofori, A., & Islam, S. (2019). Cyber security threat modeling for supply
chain organizational environments. Future Internet, 11(3). https://doi.org/10.3390/
f11030063
Yigit, M., Gungor, V. C., & Baktir, S. (2014). Cloud Computing for Smart Grid
applications. Computer Networks, 70, 312-329. https://doi.org/10.1016/j.comnet.
2014.06.007
Youssef, T. A., Hariri, M. El, Elsayed, A. T., & Mohammed, O. A. (2018). A
DDS-Based Energy Management Framework for Small Microgrid Operation and
Control. IEEE Transactions on Industrial Informatics, 14(3), 958-968.
https://doi.org/10.1109/TII.2017.2756619
Z Su, L Xu, S Xin, W Li, Z Shi, Q. G. (2017). A future outlook for cyber-physical
power system. 2017 IEEE Conference on Energy Internet and Energy System.
Zhang, Q., Sun, Y., & Cui, Z. (2010). Application and analysis of ZigBee technol-
ogy for Smart Grid. Proceedings of ICCIA 2010 - 2010 International Conference on
Computer and Information Application, 171-174. https://doi.org/10.1109/ICCIA.
2010.6141563
112 Acta Wasaensia 
Zhang, Y., Huang, T., & Bompard, E. F. (2018). Big data analytics in smart grids: a
review. Energy Informatics, 1(1), 1-24. https://doi.org/10.1186/s42162-018-0007-5
Acta Wasaensia 113
 
 
    
Publication I 
114 Acta Wasaensia 
FOCUS
On the performance metrics for cyber-physical attack detection
in smart grid
Sayawu Yakubu Diaba1 • Miadreza Shafie-khah2 • Mohammed Elmusrati1
Accepted: 3 January 2022 / Published online: 21 January 2022
 The Author(s) 2022
Abstract
Supervisory Control and Data Acquisition (SCADA) systems play an important role in Smart Grid. Though the rapid
evolution provides numerous advantages it is one of the most desired targets for malicious attackers. So far security
measures deployed for SCADA systems detect cyber-attacks, however, the performance metrics are not up to the mark. In
this paper, we have deployed an intrusion detection system to detect cyber-physical attacks in the SCADA system
concatenating the Convolutional Neural Network and Gated Recurrent Unit as a collective approach. Extensive experi-
ments are conducted using a benchmark dataset to validate the performance of the proposed intrusion detection model in a
smart metering environment. Parameters such as accuracy, precision, and false-positive rate are compared with existing
deep learning models. The proposed concatenated approach attains 98.84% detection accuracy which is much better than
existing techniques.
Keywords Supervisory control and data acquisition (SCADA) systems  Intrusion detection system (IDS) 
Industrial control system (ICS)  Cyber-physical security  Smart grid  Convolutional neural network (CNN) 
Gated recurrent unit (GRU)
1 Introduction
Most of the Intrusion Detection Systems (IDS) used in
Supervisory Control and Data Acquisition (SCADA) in
power distribution networks are currently concentrated on
the cyber sector by ignoring the process states in the
physical field (Rakas et al. 2020). Attacks on protocol
traffic are being detected, but attacks on processes like
Replay and Man-in-the-Middle (MITM) attacks are
complex to detect. The performance criteria, risk man-
agement requirements, and coordination requirements vary
between Information Technology System (IT System) and
Industrial Control System (ICS) networks. In an IT system,
a long delay could be appropriate, but coming to ICS,
reaction time is crucial (Ghosh and Sampalli 2019). In IT
systems, data security and integrity are most significant,
whereas fault tolerance is less significant, while in ICS,
human safety is most crucial, followed by process security,
and fault tolerance is necessary (Paridari et al. 2018). Many
specific communication protocols without ID certification,
encryption, and timestamps were utilized in ICS, whereas
standard communication protocols are utilized in IT sys-
tems. Zero-day, Denial of Service (DoS), Replay and
MITM attacks to ICS will trigger the above-mentioned
delay, fault, and information leakage caused by protocols
(Hu et al. 2019).
Cyber-physical systems are widely used to integrate the
physical process and computations so that the system can
be controlled effectively. The performance of the system
relies on proper control of its elements, like sensors and
actuators. Efficient and secure communication between the
system element is most important as it directly affects the
Communicated by Joy Iong-Zong Chen.
& Sayawu Yakubu Diaba
saywu.diaba@student.uwasa.fi
Miadreza Shafie-khah
mshafiek@uwasa.fi
Mohammed Elmusrati
moel@uwasa.fi
1 School of Technology and Innovation, Department of
TelecommunicationsEngineering, University of Vaasa,
Vaasa, Finland
2 School of Technology and Innovations, Department of
Electrical Engineering, University of Vaasa, Vaasa, Finland
123
Soft Computing (2022) 26:13109–13118
https://doi.org/10.1007/s00500-022-06761-1(0123456789().,-volV)(0123456789().,- volV)
Acta Wasaensia 115
system performance. Malfunctioning the device charac-
teristics might lead to a serious issue in industrial control
systems. The system elements face serious security threats
which affect the sensing and data actuation. Attacks on IT
system networks cause congestion or data leakage, but
attacks on ICS networks may result in both data leakage as
well as harm to physical infrastructure. As a result, for the
SCADA systems, which are commonly utilized in the
power distribution networks to ensure the security of the
controlled processes, cyber-security is considered a sig-
nificant part of SCADA (Zhang et al. July 2019). The
protection of communication protocols, asset management,
physical infrastructures, and controlled processes will
come under the security of the SCADA system that is the
most important element of the smart grid, and these cannot
be handled the same as IT system contemporaries. Some of
the key components are supporting software such as
Human Machine Interface (HMI), Distributed Control
Systems (DCS), Programmable Logical Controllers (PLC),
Remote Terminal Units (RTU), network equipment, ser-
vers, and computers (Co´mbita et al. 2019, Pang et al. 2020,
Sun et al. 2020, Elnour et al. 2020). Hence, it is essential to
protect the system against attacks and secure
communication.
Intrusion detection systems are used to detect the
security threats and attacks in a system where the systems
can able to detect but not able to prevent the attacks.
However, by training the detection systems properly the
attacks can be detected efficiently without any manual
intervention which may reduce the huge loss compared to
the loss acquired in a system without an intrusion detection
system. These systems will work as a second-line defense
in any architecture and plays a vital role in cyber-physical
systems to detect different types of attacks. Intrusion
detection systems classify normal and abnormal behavior
which helps the system to detect unknown attacks. This
essential feature is adopted for cyber-physical systems. A
wide range of devices and dynamic computing resources,
different software, and operating systems are generally
included in cyber-physical systems. Detecting intrusion in
such systems using machine learning algorithms-based
models is crucial due to the heterogeneous deployment
nature. Obtaining labels for attacks can be very time-con-
suming, challenging, and sometimes even impossible.
Therefore, unsupervised learning techniques, capable of
detecting cyber-attacks without a need for labels, are
deemed best for this task (Keshk et al. 2021, Gumaei et al.
2020). However, the most existing unsupervised techniques
are not able to deal with the nonlinearity and inherent
correlations of multivariate time series, which represent a
considerable amount of real-world data, including data
streams generated by sensors in CPSs (Hu et al. 2019;
Rodofile et al. 2019; Homay et al. 2020). Therefore, a new
unsupervised technique independent from any prior
knowledge of cyberattacks is needed to detect intrusions in
CPSs.
The major contributions of this paper are summarized as
follows.
(1) CNN and GRU are combined to obtain an intrusion
detection system for detecting attacks in smart grid
metering infrastructure.
(2) An intense experimental analysis is presented using
benchmark datasets to obtain improved accuracy and
detection rate performance for the proposed model.
The rest of the paper is arranged as follows: a brief
literature analysis is presented in Sect. 2; the proposed
intrusion detection model is presented in Sect. 3. Experi-
mental results and observations are discussed in Sect. 4 and
finally, the conclusion is presented in Sect. 5.
2 Related works
Recent research works in industrial systems and their
evolutions are discussed in this section. Intrusion detection
is the major objective and the research directed toward
analyzing the features of existing intrusion detection sys-
tems. The authors of Khan et al. (2019) have introduced a
new method called anomaly detection for ICS. This method
utilized a hybrid approach by taking the benefits of the
reliable and predictable nature related to communication
patterns, which perform in-ground devices in ICS. Initially,
few preprocessing approaches were implemented for scal-
ing and standardizing the data. To enhance the perfor-
mance of anomaly detection, dimensionality reduction
algorithms were used. Later, the nearest-neighbor rule
algorithm was utilized for balancing the dataset. A signa-
ture database was created by noting the system in a time
using a bloom filter. Subsequently, a hybrid approach was
created for anomaly detection by combining the instance-
based learner and package contents-level detection. Here,
the developed model has attained the best results when
compared to other state-of-the-art models.
The authors of Qian et al. (2020) have suggested a
method in a physical way as well as a cyber-way for attack
detection. In order to detect malicious behaviors for
physical component prevention, process states validation
was utilized and being damaged by Zero-day, MITM, and
Replay attacks. For branching shaped data sets classifica-
tion, a nonparallel hyperplane-based fuzzy classifier was
developed that was quite complex, complex to classify
using two parallel hyperplanes of the Support Vector
Machine (SVM) to detect DoS and other cyber-attacks. To
test the developed model and validation part, Modbus/
Transmission Control Protocol (TCP) traffic data and
13110 S. Y. Diaba et al.
123
116 Acta Wasaensia 
simulation process states were used. Thus, it has been
proved that the suggested approach was superior to other
approaches.
In (Sheng et al. 2021), a cyber-physical technique in the
SCADA system for intrusion detection has analyzed the
risk levels faced in industries. This was utilized to char-
acterize the structure of the network and SCADA system’s
process by the extraction and correlation of communication
patterns and the ICS device condition. If any violation
occurs, then this was considered as an abnormal behavior
that was caused due to network attacks. A risk estimation
approach was suggested to measure the damage degree of
the attack on the infrastructure by associating network
intrusions with the state of the SCADA system, providing
network teams with more knowledge regarding network
attacks. Furthermore, the proposed approach outperformed
existing approaches in identifying and evaluating numer-
ous cyber-attacks against the SCADA system.
Privacy-Preserving Anomaly Detection framework
named PPAD-CPS is reported in the research work of
Keshk et al. (Keshk et al. 2021). The aim is to secure
confidential data and discover malicious attacks in power
systems and their network traffic. This framework included
two modules namely data preprocessing and anomaly
detection modules. To filter and transform the real data into
a new kind of data, a data preprocessing module was rec-
ommended that has attained privacy preservation target. By
using Kalman Filter (KF) and Gaussian Mixture Model
(GMM), an anomaly detection model was employed for
analyzing the posterior probabilities of anomalous and
legitimate events. Two public datasets such as UNSW-
NB15 and Power System were used for analyzing the
proposed framework. This analysis has been proved that
the developed PPAD-CPS outperformed the existing
methodologies.
In (Kalech 2019) cyber-attack detection models based
on temporal pattern recognition (TPR), which searched for
anomalies in the data sent by the SCADA elements in the
network and found anomalies that were occurred when
legal commands like incorrect and unauthorized time
intervals were misused. Artificial Neural Networks (ANN)
and Hidden Markov Model (HMM) were suggested for
evaluating the performance. The evaluation was done on
both simulated and real SCADA data using five various
feature extraction approaches. The outcomes have shown
that TPR models were performed well in detecting cyber-
attacks.
Gumaei et al. (Gumaei et al. 2020) have proffered a new
security control method for cyber-attack detection in smart
grid, which merged feature detection and reduction models
for decreasing the features count and attained a better
detection rate. For eliminating irrelevant features, the
Correlation-based Feature Selection (CFS) technique was
utilized that enhanced the detection rate. With the help of
optimal features that were selected, cyber and normal
attack events were classified using the Instance-Based
Learning (IBL) algorithm. By using public datasets of
SCADA power system, the experiments were performed
relied on tenfold cross-validation approach. This has been
revealed that the suggested model consisted of huge
detection rate.
Rodofile et al. (Rodofile et al. 2019) have presented a
cyber-attack structure, which detects attacks in SCADA
systems. The developed model recognized ‘‘traditional IT-
based attacks, protocol specific attacks, configuration-
based attacks and control process attacks’’ for describing
the practical attacks. The recognition of attacks in the
whole system has an advantage of allowing us to protect
over them with more effectiveness and awareness. A case
study was presented by illustrating the sequence of attacks
on Distributed Network Protocol 3 (DNP3), which facili-
tated to affirm the reliability of the developed model.
Reference (Homay et al. 2020) has implemented a
robust security control solution as a logic level security on
DCS and SCADA systems. In order to establish trust
among DCS device components, the developed model
ensured message integrity, but this was not considered as
the protection layer on industrial automation systems.
Malicious attacks like Stuxnet were avoided by the
developed solution called low-level security process. From
the analysis, the following points are observed as research
gaps. The security of SCADA systems has been disrupted
using earlier IC system attacks. The significance of
defending and securing ICS networks has increased attacks
on critical infrastructures. The features and challenges of
various methodologies of detecting and blocking cyber-
security attack on SCADA systems that has existed earlier
are given subsequently. HML-IDS can detect anomalies
very fast and it can detect unseen attacks also it can deal
with data samples that seem to be hybrid which is complex.
But it has some demerits like enhancement of detection
rate (Feng et al. 2017).
NHFC is capable of modeling a given problem into any
degree of accuracy and has high detection accuracy in
detecting zero-day, Replay, and MITM attacks. It needs to
improve in detecting the attacks for securing the manu-
facturing process (Wang et al. August 2020). Cyber-
physical model is used to detect the network intrusions. It
has high accuracy. Though, it is not considered as the
secure appliance, because of the lack of multi-factor
authentication models. PPAD-CPS has huge privacy levels.
It attains the best accuracy, detection rate, and processing
times. It needs to perform a principal and independent
component analysis for transforming the high-dimensional
space into low-dimensional space in order to enhance the
performance. TPR can detect cyber-attacks including
On the performance metrics for cyber-physical attack… 13111
123
Acta Wasaensia 117
legitimate functions. It has high detection accuracy. It
needs to reduce the count of false alarms by considering
PLC identity (Dhaya 2021; Jacob and Ebby Darney 2021;
Haoxiang and Smys 2021; Smys et al. 2021).
CFS-KNN resorts to various correlation measures for
removing the irrelevant features and retaining those using
predictive power. It is robust to exploit inter-feature rela-
tionships. It is sensitive to noise and has an over-fitting
problem, which leads to reduce the system performance.
DNP3 consists of efficient Internet Provider Security (IPS)
and IDS technologies. It can combine four categories for
describing the practical attacks. However, troubleshooting
the system is quite complex because of distribution over
many servers.
To overcome these limitations concatenated deep
learning approach is presented in this paper. Deep learning
approach has the ability to process high-dimensional data
and produce better results than machine learning approa-
ches. (Fig. 1)
3 Proposed work
The proposed concatenated deep learning model is pre-
sented in this section to detect intrusions in Cyber-Physical
Systems. The simple environment considered for this paper
is depicted in Fig. 2. The framework consists of two enti-
ties such as industrial agents and cloud servers. The
industrial agents are the industrial Cyber-Physical System
owners and they oversee the local intrusion detection
model. Collecting industrial cyber-physical system data
and updating the essential parameters for intrusion detec-
tion are some of the regular activities of industrial agents.
Industrial agents interact with cloud servers to update the
data. Whereas cloud servers take the responsibility of
building a comprehensive intrusion detection system using
the model parameters obtained from the locally trained
model in the industrial agents. The final intrusion detection
model could be obtained through multiple interactions
between each agent and cloud server.
The threat model we have considered for analysis
includes four different threats such as reconnaissance
attacks, command injection attacks, response injection
attacks, DoS attacks. Cyber threats targeting industrial
cyber-physical security are considered for the proposed
concatenated deep learning-based intrusion detection sys-
tem. The reconnaissance attacks are performed to collect
industrial cyber-physical security system valuable infor-
mation. Network architecture details, device features, net-
work protocols are the major aim of the intruder to perform
reconnaissance attacks. To mislead or deviate the industrial
physical security system behavior, command injection
attacks are performed. In this attack, the intruder injects
some false information to control a system or provides
wrong configuration commands to collapse the system
behavior. Unauthorized access, invalid communications,
wrong set points are the outcomes of a command injection
attack.
To monitor and observe the remote process state in the
industrial cyber-physical security system, response injec-
tion attacks are performed. These attacks interfere with the
system process and provide false responses to the service
queries which affect the system state. DoS attacks are quite
common and familiar in the network, in the case of
industrial cyber-physical security systems the attacker
flooded the targets with redundant requests to deplete the
server resources in the industrial cyber-physical security
system. Due to these boundless requests, the system pre-
vents legitimate requests which affect the system services.
Figure 3 depicts the proposed concatenated deep learning
Cloud server
Industrial Agent 1
Industrial Agent 2
Industrial Agent n
Intruder
Fig. 1 Proposed system model
13112 S. Y. Diaba et al.
123
118 Acta Wasaensia 
model for intrusion detection in the industrial cyber-phys-
ical security system.
The proposed intrusion detection model includes a CNN
model and a GRU unit module. The outputs of both
modules are combined and processed by a fully connected
layer followed by the SoftMax layer. The building block of
the convolutional network model includes four convolution
blocks. Each block includes a convolutional layer, a max-
pooling layer. Batch normalization is performed between
the convolution layer and max-pooling layer. The GRU is
comprised of three GRU layers. The results of the CNN
and GRU model are concatenated and then processed using
two fully connected layers. In order to prevent over-fitting,
a dropout layer is included after the fully connected layer.
Finally, the SoftMax layer is used to map the output of
fully connected to probability distribution and predicts the
attack types.
3.1 System model
Consider a feature vector v as input for the proposed model
which is a one-dimensional vector function representing
the numerical features of industrial data. The input is
processed by CNN and GRU model. GRU is a modified
LSTM model which includes a gated recurrent neural
network. LSTM consists of three gates such as forget gate,
input gate, and output gate whereas GRU comprises of two
gates such as update gate and reset gate. Due to this it
requires less parameter for training which provides quick
convergence compared to LSTM. Owing to this reason
instead of LSTM, GRU is adopted in the proposed design.
The long dependency features are captured by the GRU
module and learn essential information from the historical
data using memory cell. The reset gate is used to forget or
remove unnecessary information. Generally, the input for
GRU module will be time sequence data and the given
input feature vector is a multivariate time series data with a
single time step. Therefore, prior to GRU module, a
dimension shuffling process is performed that transposes
the dimensions of the feature vector v. The dimension
shuffling is given as
~v ¼ shuf ðvÞ ð1Þ
The GRU module process the dimensional shuffled data
~v and produces output. The output of first layer is given as
input to the second layer and it repeats. The output of GRU
module is obtained using two activation functions such as
Fig. 2 Proposed concatenated deep learning model
multivariate time series data 
with single time step
Univariate time series data 
with multiple time steps
Train GRU module using 
Time sequences
Train CNN module using 
Time sequences
Concatenate CNN and GRU 
outputs  
Find the output elements 
using fully connected layer 
Classify the results using 
SoftMax layer 
Detect the threats types
Halt
Start
Input Feature vector
Fig. 3 Process flow of proposed Intrusion detection model
On the performance metrics for cyber-physical attack… 13113
123
Acta Wasaensia 119
tanh and r. The essential formulations observed for the
GRU module is summarized as follows
CU ¼ r xU ~v t1ð Þ; x tð Þ
h i
þ bU
 
ð2Þ
CR ¼ r xR ~v t1ð Þ; x tð Þ
h i
þ bR
 
ð3Þ
where CU and CR represents the update gate and reset gate
respectively. The range of CU 2 0; 1f g and the range of
CR 2 1; 1f g. xU and xR are the weight functions of
update and reset gate. bU and bR represents the bias vectors
for update and reset gate correspondingly. The candidate
activation function for the recurrent unit is given as
~v tð Þ ¼ tanh xv CR  ~v t1ð Þ; x tð Þ
h i
þ bv
 
ð4Þ
where xv are the weight functions of activation function, bv
is the bias vector and x tð Þ is the inputs of training data. The
output of a single GRU unit is given as
v tð Þ ¼ 1 CUð Þ  ~v t1ð Þ
 
þ CU  ~v tð Þ
 
ð5Þ
where ~v t1ð Þ is the current unit input which is obtained from
the previous unit output. The final output of GRU module
is given as I. The same input used for GRU model is
parallelly provided to CNN module. The input feature
vector v is considered as a univariate time series data with
multiple time steps. Since CNN is suitable to process high-
dimensional data it does not require any dimensional
shuffling module as like GRU. One-dimensional layer is
used in the proposed model and the convolution operation
is represented as
h1 ¼ convblock1ðvÞ ð6Þ
h2 ¼ convblock2ðh1Þ ð7Þ
h3 ¼ convblock3ðh2Þ ð8Þ
h4 ¼ convblock4ðh3Þ ð9Þ
where h1, h2, h3 and h4 are the hidden vectors. After each
convolution layer, a batch normaliztion and pooling layer
is included. The normalization layer normalizes the fea-
tures which is obtained after the convolution proces and the
pooling layer is used to reduce the dimensionality of the
data. There are two types of pooling functions such as max
pooling an average pooling. In this research work, average
pooling is used in the pooling layer. The reduced output
from the pooling layer is provided as the input to the next
convolution block. The average pooling layer is math-
matically expressed as
xi ¼ avgP xi1  ð10Þ
where xi is the ouput of pooling layers, and xi1 is the
previous values obtained from the convolution layer. i
represents the number of pooling layers. The final output of
convolution layer is given to flatten layer that converts the
data into one-dimensional vector and it is given as
J ¼ flattenðh4Þ ð11Þ
The output I from the GRU module and J from the CNN
module are concatenated which is described as
ct ¼ concateðI; JÞ. Followed by two fully connected layers
are used in the proposed design. To prevent data over-
fitting a dropout function is used. Then the dropout layer,
the final SoftMax layer provides the classification results
which provide the attack types. The SoftMax function is
described as
by ¼ softmaxðuÞ ð12Þ
where u is the output of dropout layer. In order to evaluate
the loss function for the proposed model cross-entropy
function is used and it is given as
l ¼  1
b
Xn
i¼1
yilogy
0
i ð13Þ
where the batch size is given as b, n denotes the training
sample size, the predicted value is represented as y
0
i athe nd
actual value is represented as yi. The process flow of the
proposed approach is depicted in Fig. 3.
Figure 3 depicts the process flow of the proposed
intrusion detection model. Initially, the process starts with
the selection of input feature vectors. The multivariate time
series data with a single time step is used to train the GRU
model and univariate series data with multiple time steps is
used to train the CNN model. The output features of each
model are concatenated, and the elements are provided as
input to the fully connected layer. The final classified
results are obtained from the SoftMax layer and they pro-
vide the details of threats and their types.
4 Results and discussion
The proposed intrusion detection model is being experi-
mented on a smart metering environment and the model is
comprised of three network configurations, such as Home
Area Network (HAN), Wide Area Network (WAN), and
Neighborhood Area Network (NAN). The household
applications, concentrators, smart electricity meters, data
processing centers, and nodes are included in the smart
infrastructure. Wireless communication or wired commu-
nication is used for communicating the elements. The
communication is bidirectional and mainly internal com-
munications are performed through HAN, which is vul-
nerable to attacks. A DoS attack and a probing attack are
the major attacks on HAN. The NAN is used for short-
13114 S. Y. Diaba et al.
123
120 Acta Wasaensia 
distance communication, and it is vulnerable to the user to
Root (U2R) attack. WAN is used for long-distance com-
munication and it is vulnerable to Remote to Local (R2L)
attacks.
To evaluate the performance of the proposed intrusion
detection system with a standard benchmark dataset, the
NSL-KDD dataset is used. The data include 125,973
attacks and normal data which are provided as input to the
proposed intrusion detection. Deep learning models are not
able to process character-based features, so to simplify and
process the input data, preprocessing steps such as nor-
malization and feature screening are performed before the
input is fed into the deep learning model. The attack data
present in the dataset is categorized into four types, such as
DoS, Probe attack, U2R attack, and R2L attack. The sub-
classes of attacks are 39 and they cover the attack types of
smart metering infrastructure. The experimentation process
trains and tests the data in the ratio of 80:20 according to
the fivefold cross-validation method. The dataset distribu-
tion for the NSL-KDD dataset is listed in Table 1.
In the data preprocessing, the characteristic features are
converted into numerical values, specifically as Eigenvec-
tors, using a one-hot encoding process. For this process,
flag features, services, and protocol types are considered in
the dataset. The protocol type considers attributes such as
user datagram protocol (UDP), TCP, and Internet control
message protocol (ICMP). The numerical data and one-hot
encoding represent the feature vectors in 1 9 3 dimension
as like (0,0,0), (0,1,1), etc. The service feature has 70
attributes and flag features include 11 attributes and these
attributes. The features after preprocessing are mapped into
numerical features and combined with existing numerical
features in the dataset. Also, the labels in the dataset are
numerically processed such that, normal behavior is rep-
resented as ‘0’, DoS is represented as ‘1’, Probe label is
represented as ‘2’, R2L as ‘3’, and U2R as ‘4’. In order to
reduce the feature differences, the dataset is normalized
and uniformly mapped. The interval range of uniform
mapping is [0, 1].
The proposed intrusion detection model is implemented
in Spyder3.0 (Python3.6) operating on Windows 10 OS
installed on an i5 Intel processor at 4.20 GHz and 8 GB of
memory. The learning rate was set at 0.006 and the number
of epochs was 100. The hyperparameter details used in the
proposed work are listed in Table 2.
The proposed model performance is further evaluated
based on the convergence ability and classification per-
formance in terms of accuracy, detection ratio, and false-
positive rate. The training loss and epoch for the proposed
concatenated model are depicted in Fig. 4. It is observed
from the results. The training loss is gradually decreased
and stable after the eighth epoch. This indicates the
selection of hyperparameters is reasonable and validates
the convergence ability of the proposed model. The con-
fusion matrix for the proposed approach is depicted in
Table 3.
The performance of the proposed model in terms of
precision, detection rate, f1-score, and false-positive rate is
depicted for the four types of attacks in Fig. 5. Based on
the values obtained from the confusion matrix, the
parameters are evaluated.
From Fig. 5, it can be observed that the detection abil-
ities for DoS and Probe are above 95%, whereas the
detection abilities for U2R and R2L are below 90% and
50% due to the limited number of training data. The per-
formance of the proposed model is validated using fivefold
cross-validation and the confusion matrix obtained after
fivefold validation is depicted in Table 4.
The performance of the proposed model in terms of
precision, detection rate, f1-score, and false-positive rate is
depicted for the four types of attacks in Fig. 6. Based on
the values obtained from the confusion matrix given in
Table 4, the parameters are evaluated. It is observed from
the results that the detection rate of the proposed model is
maximum for DoS and Probe attacks and it has been
reduced for R2L and U2R attacks. The reduced perfor-
mance is due to the minimum number of samples in the
Table 1 Dataset Distribution of NSL-KDD
Type Total Training set Test set
Normal 67,343 53,874 13,469
Dos 45,927 36,742 9185
Probing 11,656 9325 2331
R2L 995 796 199
U2R 52 42 10
Total 1,25,973 100,778 25,195
Table 2 Hyperparameter settings
S. no Parameter Filter/Neurons
1 Number of filters in CNN 8
Number of Neurons in CNN and ReLu 16
2 Number of Hidden nodes 60
3 Activation function ReLU
4 Dense layer 256
5 Cost function Cross entropy
6 Batch size 128
7 Epoch 100
On the performance metrics for cyber-physical attack… 13115
123
Acta Wasaensia 121
R2L and U2R attacks whereas for DoS and Probe attacks
the number of samples is sufficient to obtain the desired
training accuracy which improves the test accuracy.
The performance of the proposed model is compared
with existing deep learning techniques like Convolutional
Neural Network (CNN), Gated Recurrent Unit (GRU), and
Long short-term memory (LSTM) based intrusion detec-
tion models. The performance of non-concatenated models
is included in the experimental analysis. The results of
CNN and GRU are considered as the results of non-con-
catenated systems as the results are obtained separately by
applying the models without any feature fusion. Results
clearly depict that the non-concatenated CNN and GRU
model exhibits less performance than the proposed model
for all the parameters.
The parameters like precision and detection rate (Recall)
are considered for analysis, and it is depicted in Figs. 7 and
8. It is observed from the results the performance of the
proposed model is better compared to other models. The
maximum precision and detection rate attained by the
proposed model indicates that all the normal and abnormal
activities in the network are detected effectively. The
average precision attained by the proposed model consid-
ering all the normal and attack categories is 93.6% whereas
LSTM attains 89.1% and GRU attains 85.8% and 84%
attained by the CNN-based detection model.
Based on the precision and detection rate values
obtained in the previous analysis, the performance is
measured in terms of F1-score and depicted in Fig. 9. The
maximum F1-score attained by the proposed model indi-
cates the maximum detection performance compared to
existing deep learning methods. The overall accuracy based
on the above parameters is calculated for the proposed
model and existing models and depicted in Fig. 10. It is
observed that the maximum accuracy is attained by the
proposed model. 98.84% is the acquired detection accuracy
of the proposed model whereas the existing techniques like
LSTM, GRU, and CNN attain accuracy of 94.11%,
96.65%, and 97.07%, respectively. Due to efficient feature
selection and concatenated process, the proposed model
exhibits maximum accuracy compared to other models.
From the results, it can be observed that the proposed
model efficiently detects attacks on the network and pro-
vides better detection rate and accuracy. The results were
Fig. 4 Training loss vs Epochs
Table 3 Confusion matrix
Predicted class
T
ru
e 
C
la
ss
Normal 13358 1 2 1 1 99.78%
Dos 2 9204 0 0 0 99.94%
Probe 5 0 2380 0 0 99.52%
R2L 11 0 0 200 0 90.11%
U2R 2 0 0 0 4 50%
99.70% 99.73% 98.84% 99.01% 66.78%
Normal DoS Probe R2L U2R
Fig. 5 Performance evaluation of proposed model
Table 4 Confusion matrix
Predicted class
T
ru
e 
C
la
ss
Normal 26814 8 9 7 2 99.84%
Dos 0 18404 0 0 0 99.98%
Probe 15 0 4534 0 0 99.32%
R2L 44 0 0 310 0 78.54%
U2R 5 0 0 0 12 49%
99.54% 99.89% 98.94% 95.84% 66.74%
Normal DoS Probe R2L U2R
13116 S. Y. Diaba et al.
123
122 Acta Wasaensia 
obtained for the standard dataset and the same performance
can be expected in real-time smart grid data. The compu-
tational complexity of the proposed model is slightly above
the mark than the existing techniques due to the initial
parameters for two models used in the setup. However, for
an industrial system intrusion detection model the proposed
model is sufficient to detect the attacks efficiently. There
may be a slight change in the detection performance due to
environmental and system changes, which is the minor
limitation of this paper.
5 Conclusion
This paper presents an efficient intrusion detection system
for cyber-physical attack detection in the smart grid
metering infrastructure. Cyber physical attacks on the
SCADA systems are considered for the smart grid metering
infrastructure and various types of attacks are identified.
The attack types are related to standard benchmark dataset
types and evaluation is performed to avoid real-time
computational complexities. The NSL-KDD dataset is used
for experimentation and the performance is evaluated in
terms of accuracy, detection rate, precision, and false
positive rate. Existing methods such as CNN, GRU, LSTM
are compared with the proposed model and the results
clearly demonstrate that the performance of the proposed
model is superior to others. Further, this paper can be
improved by focusing on the other parameters related to the
grid environment.
Acknowledgements Sayawu Yakubu Diaba would like to thank the
Evald and Hilda Nissi Foundation for awarding me scholarship.
Fig. 6 Performance evaluation of proposed model
Fig. 7 Precision analysis
Fig. 8 Detection rate analysis
Fig. 9 F1-Score analysis
Fig. 10 Comparative analysis of Accuracy
On the performance metrics for cyber-physical attack… 13117
123
Acta Wasaensia 123
Funding Open Access funding provided by University of Vaasa
(UVA). No funding was received to assist with the preparation of this
manuscript.
Declarations
Conflict of interests The authors declare that they have no conflict of
interest to declare.
Ethical approval This article does not contain any studies with human
participants or animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as
long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons licence, and indicate
if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless
indicated otherwise in a credit line to the material. If material is not
included in the article’s Creative Commons licence and your intended
use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright
holder. To view a copy of this licence, visit http://creativecommons.
org/licenses/by/4.0/.
References
Co´mbita LF, Ca´rdenas A´A, Quijano N (2019) Mitigating sensor
attacks against industrial control systems. IEEE Access
7:92444–92455
Dhaya R (2021) Light weight CNN based robust image watermarking
scheme for security. J Inf Technol Digital World 3(2):118–132
Elnour M, Meskin N, Khan K, Jain R (2020) A Dual-isolation-forests-
based attack detection framework for industrial control systems.
IEEE Access 8:36639–36651
Feng C, Li T, Chana D (2017) Multi-level Anomaly Detection in
Industrial Control Systems via Package Signatures and LSTM
Networks. In: 47th Annual IEEE/IFIP International Conference
on Dependable Systems and Networks (DSN), pp 261–272
Ghosh S, Sampalli S (2019) A survey of security in SCADA
networks: current issues and future challenges. IEEE Access
7:135812–135831
Gumaei A, Hassan MM, Huda S, Hassan MdR, Camacho D, Ser JD,
Fortino G (2020) A robust cyberattack detection approach using
optimal features of SCADA power systems in smart grids. Appl
Soft Comput 96:106658
Haoxiang W, Smys S (2021) A survey on digital fraud risk control
management by automatic case management system. J Electr
Eng Autom 3(1):1–14
Homay A, Chrysoulas C, El Boudani B, Sousa MD, Wollschlaeger M
(2020) A security and authentication layer for SCADA/DCS
applications. Microprocess Microsyst 6:103479
Hu Y, Sun Y, Wang Y, Wang Z (2019) An enhanced multi-stage
semantic attack against industrial control systems. IEEE Access
7:156871–156882
Jacob IJ, EbbyDarney P (2021) Design of deep learning algorithm for
IoT application by image based recognition. J ISMAC
3(3):276–290
Kalech M (2019) Cyber-attack detection in SCADA systems using
temporal pattern recognition techniques. Comput Secur
84:225–238
Keshk M, Sitnikova E, Moustafa N, Hu J, Khalil I (2021) An
integrated framework for privacy-preserving based anomaly
detection for cyber-physical systems. IEEE Trans Sustain
Comput 6(1):66–79
Khan IA, Pi D, Khan ZU, Hussain Y, Nawaz A (2019) HML-IDS: a
hybrid-multilevel anomaly prediction approach for intrusion
detection in SCADA systems. IEEE Access 7:89507–89521
Pang Y, Xia H, Grimble MJ (2020) Resilient nonlinear control for
attacked cyber-physical systems. IEEE Trans Syst, Man, Cybern:
Syst 50(6):2129–2138
Paridari K, O’Mahony N, El-Din Mady A, Chabukswar R, Boubekeur
M, Sandberg H (2018) A framework for attack-resilient indus-
trial control systems: attack detection and controller reconfigu-
ration. Proceedings of the IEEE 106(1):113–128
Qian J, Du X, Chen B, Qu B, Zeng K, Liu J (2020) Cyber-physical
integrated intrusion detection scheme in SCADA system of
process manufacturing industry. IEEE Access 8:147471–147481
Rakas SVB, Stojanovic´ MD, Markovic´-Petrovic´ JD (2020) A Review
of research work on network-based SCADA intrusion detection
systems. IEEE Access 8:93083–93108
Rodofile NR, Radke K, Foo E (2019) Extending the cyber-attack
landscape for SCADA-based critical infrastructure. Int J Crit
Infrastruct Prot 25:14–35
Sheng C, Yao Y, Fu Q, Yang W (2021) A cyber-physical model for
SCADA system and its intrusion detection. Computer Netw
185:107677
Smys S, Vijesh Joe C (2021) Metric routing protocol for detecting
untrustworthy nodes for packet transmission. J Inf Technol
3(2):67–76
Sun Q, Zhang K, Shi Y (2020) Resilient model predictive control of
cyber-physical systems under DoS attacks. IEEE Trans Industr
Inf 16(7):4920–4927
Wang C, Wang B, Liu H, Qu H (2020) Anomaly detection for
industrial control system based on autoencoder neural network.
Wireless Commun Mobile Comput 2020:3
Zhang F, Kodituwakku HADE, Hines JW, Coble J (2019) Multilayer
data-driven cyber-attack detection system for industrial control
systems based on network, system, and process data. IEEE Trans
Industr Inf 15(7):4362–4369
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
13118 S. Y. Diaba et al.
123
124 Acta Wasaensia 
 
 
    
Publication II 
Acta Wasaensia 125
Neural Networks 159 (2023) 175184
Contents lists available at ScienceDirect
Neural Networks
journal homepage: www.elsevier.com/locate/neunet
Proposed algorithm for smart grid DDoS detection based on deep
learning
Sayawu Yakubu Diaba ∗, Mohammed Elmusrati
Department of Telecommunication Engineering, School of Technology and Innovations, University of Vaasa, Vaasa, Finland
a r t i c l e i n f o
Article history:
Received 16 August 2022
Received in revised form 27 October 2022
Accepted 14 December 2022
Available online 21 December 2022
Keywords:
State estimation
Smart grid
Distributed denial of service
Intrusion detection
Gated recurrent unit
Convolutional neural network
a b s t r a c t
The Smart Grid's objective is to increase the electric grid's dependability, security, and efficiency
through extensive digital information and control technology deployment. As a result, it is necessary
to apply real-time analysis and state estimation-based techniques to ensure efficient controls are
implemented correctly. These systems are vulnerable to cyber-attacks, posing significant risks to
the Smart Grid's overall availability due to their reliance on communication technology. Therefore,
effective intrusion detection algorithms are required to mitigate such attacks. In dealing with these
uncertainties, we propose a hybrid deep learning algorithm that focuses on Distributed Denial of
Service attacks on the communication infrastructure of the Smart Grid. The proposed algorithm is
hybridized by the Convolutional Neural Network and the Gated Recurrent Unit algorithms. Simulations
are done using a benchmark cyber security dataset of the Canadian Institute of Cybersecurity Intrusion
Detection System. According to the simulation results, the proposed algorithm outperforms the current
intrusion detection algorithms, with an overall accuracy rate of 99.7%.
' 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).
1. Introduction
The modernized grid enables a two-way flow of electricity and
information while providing efficient, dependable, computerized,
and decentralized energy distribution. The Supervisory Control
and Data Acquisition (SCADA) Master Terminal Unit (MTU) and
the Intelligent Electronic Devices (IED) on the electric network es-
tablish communication. The Remote Terminal Units (RTUs), Pha-
sor Measurement Unit (PMU), Micro Phasor Measurement Unit
(mPMU), and Programmable Logic Controls (PLC) mounted at
various locations on the electric network provide telemetry data
to the SCADA's server (Oyewole & Jayaweera, 2020). Electric
utilities all around the world use various SCADA protocols to
communicate between IEDs on the network and control center
applications using different SCADA protocols, such as Interna-
tional Electrotechnical Commission (IEC) 61850, Modbus, and Dis-
tributed Network Protocol 3 (DNP3) (Mohan, Ravikumar, & Govin-
darasu, 2020). With these SCADA protocols, parameters are mea-
sured, processes are monitored, and operations are controlled us-
ing measurement and control systems (Yohanandhan, Elavarasan,
Manoharan, & Mihet-Popa, 2020), which are frequently utilized in
operational technology (OT) such as Smart Grid.
The SCADA system in the context of the electric network is
a crucial infrastructure made up of computer-based networked
∗ Corresponding author.
E-mail address: sdiaba@uwasa.fi (S.Y. Diaba).
systems that exchange important data across networks. Such
systems are vulnerable to intrusion attacks owing to the ex-
tensive use of information technology (Liu, Li, Shuai, & Wen,
2017). Therefore, one crucial task is to evaluate the system se-
curity by considering the probable attack that could be launched
by network intruders from the communication network lateral.
Knowing the system security valuation would help maintain the
modern electric infrastructure's security and operational stabil-
ity (Fu et al., 2019).
Intrusion detection is an approach to identifying attacks before
or after gaining access to a secure network. Incorporating this
approach into the gateway is the quickest way to integrate it
with an IEC61850-based network. Even though attack detection
and self-healing are not specified in IEC 61850, a specific tech-
nique like Intrusion Detection System (IDS) may be employed
within the grid to support IEC 61850's security (Elgargouri, Vir-
rankoski, & Elmusrati, 2015). As machine-to-machine (m2 m),
and human-machine-interface (HMI) connectivity increases, the
potential hostile threats in the electric infrastructure become
prevalent. The IDS is essential for monitoring Smart Grid security
and situational awareness (Hu, Yan, & Liu, 2020; Ullah & Mah-
moud, 2017). Likewise, the transmission of data via the radio
medium which represents the fundamental pillar by which all
devices in the Smart Grid network communicate has become
prone to cyber-attack. Due to the interconnectivity (Chen, Zhang,
Liu, & Tang, 2018) of the various technologies (Attia, Sedjelmaci,
Senouci, & Aglzim, 2015) which was not historically known in the
https://doi.org/10.1016/j.neunet.2022.12.011
0893-6080/' 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
126 Acta Wasaensia 
S.Y. Diaba and M. Elmusrati Neural Networks 159 (2023) 175184
Fig. 1. Depicts a cyberattack on the smart grid.
electric networks. This makes the system vulnerable to intrusion
attacks (Mahmud, Vallakati, Mukherjee, Ranganathan, & Nejad-
pak, 2015), which can result in significant financial losses (Gao,
Li, Jiang, Li, & Quan, 2020; Jiang, Xu, Zhang, Hong, & Cai, 2020)
but, more crucially, put public safety at risk. The risk is increased
when new connections are added to such critical infrastructures.
Therefore, a high-priority area of study in the realm of cyber
security is intrusion detection in the SCADA network of a Smart
Grid (Hosseinzadehtaher, Khan, Shadm, & Abu-Rub, 2020; Xu,
2020) (see Fig. 1).
On the other hand, distributed generation (DG) has been the
means to shift toward renewable energy sources (RES). Estab-
lishing DG at various points of an existing network affects the
primary contour of the electric network. This causes alterations
in voltage and current at different nodal points and also increases
the points of entry into the electric network (de Figueiredo,
Ferst, & Denardin, 2019). The total Smart Grid's communication
technologies and supporting infrastructure are directly impacted
by the scale of the electrical network (Talha & Ray, 2016).
Looking at the shortcomings of the current Smart Grids com-
munication mechanisms has inspired several researchers to ex-
plore cyber risks to Smart Grids. We propose an algorithm for
detecting Distributed Denial of Service (DDoS) in Smart Grid in
response to the aforementioned facts. The DDoS includes bom-
barding a target with a large volume of data and internet traffic,
typically with the aid of a network of compromised machines. The
following summarizes this paper:
• To identify DDoS attacks we propose an algorithm hy-
bridized by a Convolutional Neural Network (CNN) and a
Gated Recurrent Unit (GRU) for DDoS attacks in the cyber
physical system of the Smart Grid.
• Utilizing benchmark datasets from the Canadian Institute of
Cybersecurity Intrusion Detection System (CICIDS2017), in-
depth simulation studies are presented. Comparative analy-
ses are drawn and the proposed algorithm performed better
in comparison to other state-of-the-art algorithms with a
99.7% accuracy and 99.9% detection rate.
The remainder of the paper is organized as follows. A review of
the literature is in Section 2. Presenting the proposed hybrid algo-
rithm is given in Section 3. The proposed algorithm's performance
is compared to current algorithms in simulations described in
Section 4. Finally, concluding observations are made in Section 5.
2. Related studies
Communication networks' reliability, confidentiality, and in-
tegrity are just a few of the difficulties involved in protect-
ing sensitive infrastructure, such as Smart Grid. To protect this
crucial infrastructure, the Smart Grid requires a security strat-
egy. It is necessary to meet requirements for data authentica-
tion, confidentiality and integrity assurance, and other security-
related issues (Subasi et al., 2018). Owing to the above-stated
reasons, researchers have evaluated intrusion detection in the
cyberphysical of the Smart Grid from different perspectives. For
example, Li et al. suggested various monitoring measures to track
suspicious branch flow changes and abnormal load deviations.
Two-stage approaches are suggested to identify false data in-
jection (FDI) attacks. The article introduces the FDI cyber-attack
to investigate the impact of FDI attacks on system reliability (Li
& Hedman, 2020). The alert system with the developed unique
metrics serves as the foundation for the suggested FDI detection
approach.
A customized firewall model SCADAWall was proposed to
address the limitations of the traditional firewall system in pro-
tecting the SCADA networks (Li, Guo, Zhou, Zhou, & Wong, 2019).
The traditional SCADA systems were working in the principle of
deep packet inspection that was designed to inspect the pay-
load contents in the communication. A proprietary industrial
protocols extension algorithm and an out-of-sequence detection
algorithm were added to the SCADAWall to improve its abil-
ity to identify abnormal changes in industrial operations. The
experimental analysis indicates that the SCADAWall framework
is effective in the detection process by maintaining the latency
parameters of the SCADA system (Li et al., 2019). A testbed model
was developed for SCADA systems (Almgren, 2018) to confirm
the effectiveness of the suggested algorithms in a real-time sce-
nario. The virtual model is equipped with an energy management
model monitored by a SCADA system. The testbed was created
to give various real-world scenarios like attack generation and
defense algorithms. An anomaly-based method was created to
detect malicious packet movement in the SCADA network (Singh,
Ebrahem, & Govindarasu, 2018). The experimental work indi-
cates a better latency and detection rate. The rule-based intru-
sion detection system presented in Yang et al. (2013) employs a
deep packet inspection technique and was designed specifically
for SCADA systems. It also contains signature-based and model-
based techniques. The suggested signature-based rules are capa-
ble of correctly identifying several known suspicious or malicious
assaults.
An algorithm was made to address the SCADA system's Dy-
namic Link Library (DLL) injection attack (Lee & Hong, 2020).
The model utilizes the Windows Application Programming In-
terface (API) function that verifies the changes in the DLL load
and enables the diversion algorithm when an attack is detected.
A security layer was structured between the physical and link
layer of the SCADA system to overcome the issues observed from
the existing firewall and authentication mechanisms (Cherifi &
176
Acta Wasaensia 127
S.Y. Diaba and M. Elmusrati Neural Networks 159 (2023) 175184
Hamami, 2018). An IEC 60870−5−101 communication protocol
was employed in the work and that is un-routable by the intru-
sion algorithms. The simulation implementation of the security
layer protection in an electrical substation testbed indicates a
satisfactory performance over the previous models.
An analysis was performed to identify the effectiveness of
artificial intelligence (AI)-based techniques in detecting Denial
of Service (DoS) attacks in SCADA systems (Aldossary, Ali, &
Alasaadi, 2021). The experimental result indicates that a model
developed as Bidirectional Long Short-Term Memory (Bi-LSTM)
was capable of detecting intrusions against the other methods. To
identify intrusion detection and DoS in the smart meter, a cyber
physical monitoring system was proposed (Sun, Guan, Liu, & Liu,
2013). The idea is predicated on the informational fusion of online
occurrences and objective data. The test shows that by linking the
cyber and physical signals, the model successfully detects threats.
A temporal pattern recognition technique was proposed to ob-
serve the cyber-attack intrusions in the SCADA systems (Kalech,
2019). The technique was also designed to monitor the abnormal
changes in the operation of the connected system. This was
achieved by implementing the model with a hidden Markov
model and the artificial neural network (ANN) algorithm. The ef-
fectiveness of the proposed model was verified with simulations
and real-time scenarios with five different feature extraction
strategies and the approach that was implemented with the time
feature extraction model was found satisfactory.
A C4.5 decision tree algorithm was proposed to give a secu-
rity model over the SCADA system implemented in gas and oil
plants. The performance analysis of the proposed model explores
a betterment in handling large-scale distributed attacks in the
SCADA setup (Yang, Liu, & Zhang, 2019). A SCADA network attack
detection technique was developed with a random forest algo-
rithm and its attainments were compared over the support vector
machine (SVM). It indicates a 96.47% of f1 score on detecting the
DoS attacks (Lopez Perez, Adamsky, Soua, & Engel, 2018). The per-
formances of the decision tree and K-nearest neighbor algorithms
(KNN) were analyzed on cyber security identification. The exper-
imental work was performed with three different cybersecurity
datasets. The work findings found satisfactory results with a fine
tree and weighted KNN (Ahakonye, Nwakanma, Lee, & Kim, 2021).
A DDoS attack detection approach on the SCADA system was
performed with J48, Naïve Bayes, and random forest algorithms.
The experimental work utilizes the KDDCUP99 dataset for the
analysis and was found satisfied with the accuracy rate of 99.99%
in the random forest algorithm (Alhaidari & AL-Dahasi, 2019).
For Software Defined Networking (SDN), the authors of
Fouladi, Ermi³, and Anarim (2022) provided a DDoS attack de-
tection and countermeasure technique based on discrete wavelet
transform and auto-encoder neural network. In the suggested
method, wavelet transform was used to extract statistical features
that are then processed by an auto-encoder neural network to
identify samples of DDoS attacks. In order to effectively resist
DDoS attacks, a novel feature selection-whale optimization al-
gorithm deep neural network approach is presented in Agarwal,
Khari, and Singh (2021). The usual data are homomorphically
encrypted and safely stored in the cloud to increase the security
of the proposed paradigm. A 95.35% accuracy in detecting DDoS
attacks was shown by simulation results. A swarm intelligence
technique was developed to identify the optimum features for
making a good accuracy rate in the intrusion detection system
process. An Aquila optimizer model was also employed in the
work after the feature selection process for assigning desirable
weights to the extracted features. The work offered a reasonable
result when implemented with a CNN classifier with a parti-
cle swarm optimization model (Fatani, Dahou, Al-qaness, Lu, &
Abd Elaziz, 2022).
Concerning internet-based computer network attacks, a neu-
ral network-based intrusion detection method is presented in
Shum and Malki (2008). IDS were developed to foresee and
stop potential attacks. To find and forecast anomalous system
behavior, neural networks were used. The study specifically used
feedforward neural networks with the back-propagation training
algorithm. The experimental outcomes utilizing real data demon-
strated positive outcomes for neural-network-based IDS. In Peng,
Kong, Peng, Li, and Wang (2019), a deep learning-based technique
for network intrusion detection is presented. In the model, net-
work monitoring data features are extracted using deep neural
networks, and intrusion types are classified at the top-level using
back propagation neural networks. The KDDCUP99 dataset from
the Massachusetts Institute of Technology's Lincoln Laboratory
was used to validate the approach. The findings indicate that
the proposed method meaningfully outperforms the accuracy of
conventional machine learning. In Hai-He (2018) the authors
proposed an IDS based on the improved neural network where
feature extracting was carried out using the adaptive weighted
control method. The model showed higher accuracy using a back
propagation neural network for classification and detection. How-
ever, the back propagation neural network algorithm proposed in
Jaiganesh, Sumathi, and Mangayarkarasi (2013) with the primary
duty of detecting threats to the resources demonstrated a poor
attack detection rate.
To categorize network threats, the study in Lin, Lin, Wang,
Wu, and Tsai (2018) concentrated on network intrusion detec-
tion utilizing CNNs based on LeNet-5. The experiment's find-
ings indicate that with samples larger than 10,000, intrusion
detection prediction accuracy increases and gains overall accu-
racy of 97.53%. The authors of Khan, Zhang, Alazab, and Ku-
mar (2019) offer a network intrusion detection approach using
CNN. The approach is intended to efficiently categorize intrusion
data by automatically extracting useful features from intrusion
samples. An automated vision-based android malware detection
algorithm was proposed with a fine-tuned CNN algorithm. The
byte codes extracted from the various malware devices are col-
lected in the work for training the classifiers. The experimental
work attains an accuracy of 99.4% and 98.05% on both balance
and imbalanced datasets (Almomani, Alkhayer, & El-Shafai, 2022).
In the blockchain-based energy network, Ferrag and Maglaras
(2019) presented a learning-based method to identify network
threats and fraudulent transactions. The suggested system gen-
erates blocks using short signatures and hash functions to thwart
Smart Grid attacks.
Peng (2020) propose a hybrid CNN-based intrusion detection
approach. The hybrid deep learning network structure extracts
and encapsulates the features of unfamiliar malicious behavior
as well as more complex structure aspects of the full network
traffic matrix, in contrast to the typical machine learning ap-
proach. In the network traffic matrix, a CNN first extracts the
correlation between several features. Then, by using a Recurrent
Neural Network (RNN) to fully mine the temporal and spatial
features of the entire network traffic matrix, the accuracy of the
intrusion detection model is boosted. Al-Emadi, Al-Mohannadi,
and Al-Senaid (2020) developed an intelligent detection system
that can recognize various network intrusions using deep learning
approaches, specifically CNN and RNN. The authors compared the
results of the offered solution and evaluated the performance of
the proposed solution using several evaluation matrices to select
the best model for the network IDS. Koutsandria et al. suggested a
hybrid control paradigm that constantly tracks and examines the
network traffic that is transferred inside the physical system. It
detects communication patterns that diverge from expectations
or physical constraints that can put the system in a dangerous
mode of operation. The simulations show that, by utilizing data
177
128 Acta Wasaensia 
S.Y. Diaba and M. Elmusrati Neural Networks 159 (2023) 175184
on the physical component of the power system, the paradigm is
capable of identifying a wide variety of attack scenarios intended
to compromise the physical process (Koutsandria et al., 2014).
In Vijayanand, Devaraj, and Kannapiran (2019) a unique attack
detection system that uses deep learning algorithms to detect
attacks by carefully examining smart meter communications is
presented. To detect cyber-attacks accurately, the attack detec-
tion system uses several multi-layer deep algorithms that are
set up in a hierarchical order. In Farrukh, Ahmad, Khan, and
Elavarasan (2021) the authors proposed a two-layer hierarchical
machine learning model with 95.44% accuracy in detecting cyber-
attacks. Using the model's first layer, the two modes of operation,
normal state and cyberattack are identified. The authors of Zhao,
Chen, and Luo (2011) suggested a methodology incorporating
real-time neural network training and expert system detection to
improve detection accuracy. The model employs neural networks
to detect and converts pattern recognition into numerical calcula-
tion to speed up the detection rate. The state is divided into many
categories of cyberattacks using the second layer.
In our humble opinion, as so many articles have consisted
of IDS in power systems annals with little reference to the hy-
bridization milieu, a revisit of that background could yield a
novelty. This paper seeks to present one.
3. System model
Fig. 2 shows the proposed hybrid deep learning algorithm.
In our earlier study (Diaba, Shafie-khah, & Elmusrati, 2022), this
algorithm was tested using the Network Security Laboratory-
Knowledge Discovery and Data Mining (NSL-KDD99) dataset, and
the results were compared with CNN, GRU, and LSTM algorithms.
The algorithm performed better in terms of accuracy, detection
rate, precision, and force positive rate (FPR). However, Elmrabit,
Zhou, Li, and Zhou (2020) argued that the NSL-KDD99 dataset
had expired. Since the network traffic in that dataset was es-
tablished in 1998, the authors claimed that it is impossible for
it to accurately reflect the most recent network topologies and
attack dynamics. We, therefore, seek to apply the CICIDS2017
cyber security dataset to the algorithm because of the presence
of a large variety of up-to-date attack scenarios in the dataset,
which satisfy real-world requirements.
The proposed IDS integrates a CNN model and a GRU model.
It is believed that CNN is effective at capturing position-invariant
characteristics, thus the choice. The GRU module collects the
long-dependence features and uses memory cells to extract key
information from the previous data. The reset gate is employed
to erase or eliminate pointless data. These influenced the decision
to use the GRU model (Aldossary et al., 2021). Three GRU blocks
and four CNN blocks are mounted in the algorithmic architecture
to deepen the network (Huang, Li, Deng, Yu, & Ma, 2022). The
purpose of the convolution layer is to produce a feature map by
separating features from the input data. To capture the feature
mapping, the input data are multiplied by the convolutional
kernel in the convolutional network, which is then activated by
a nonlinear function. The convolution kernel randomly initializes
weights and biases (Liang, Ye, Zhou, & Yang, 2021). After each
CNN layer, a normalization layer and a max-pooling layer are
added. The procedure of obtaining the maximum or average value
for all features within the immediate area is referred to as a
``pooling operation''.
The concatenation layer, where the GRUs output and the CNN
outputs are combined, receives the flattened final output of the
CNN layers. Two completely connected layers are connected after
the concatenation layer. A dropout layer is used after the last fully
connected layer to prevent overfitting. The SoftMax layer con-
nects to the classification layer to map the output to a probability
distribution, which allows the classification layer to predict the
types of labels.
3.1. Deep neural network structure
Artificial neural networks were inspired by research on bio-
logical neural network processing, a type of computer structure.
An artificial neural network is a self-motivated system made up
of highly connected, parallel nonlinear processing components,
units, or nodes that exhibit extremely high levels of computation
efficiency. It can alternatively be viewed as versatile mathemat-
ical structures that can recognize intricate nonlinear correla-
tions between input and output datasets (Suppitaksakul & Saelee,
2009). A typical neural network comprises numerous small, in-
terconnected processes called neurons, each generating a string
of activations with real values. Environmental sensors activate
input neurons, and weighted connections from previously active
neurons excite more neurons (Komyakov, Erbes, & Ivanchenko,
2015; Liang et al., 2021; Schmidhuber, 2015) (see Fig. 3).
3.2. System description
The mathematical formulation of the proposed algorithm con-
siders a features vector  , given as
 = [1; 2; : : : ; n]T (1)
as inputs to the proposed model. The GRU's first layer processes
the data and generates the outputs. The first layer's outputs are
fed into the second layer. Again, the outputs of the second layer
are inputted into the third layer. The final outputs of the GRU
model are achieved by using an activation function. We apply
the most used activation functions, the sigmoid, and the tanh,
respectively, given as in Ismail et al. (2022) and Valdes, Macwan,
and Backes (2016).
s .x/ = 1
1+ e1−x (2)
tanh(x) = 2
1+ e−2x − 1 (3)
Mathematically, the GRU gate ∈{0,1} and thus, the model can be
written mathematically as
&0 = 
(
!0
[
˜ (t−1); x.t/
]+ b0) (4)
&1 = 
(
!1
[
˜ (t−1); x.t/
]+ b1) (5)
where, &0, &1 represent the update gate and the reset gate, re-
spectively. The !0 and !1 are weights functions representing the
update and reset gates in the order given. Correspondingly, the
b0 and b1 represents the bias vectors for reset and update gates.
Where ˜ .t−1/ is the input of the current layer and the output of
the prior layer. The recurrent unit's candidate activation function
is written as
˜ .t/ = tanh (!0 [&0 × ˜ (t−1); x.t/]+ b0) (6)
where, ˜ .t/ represents the candidate activation function, !0 is the
activation functions weight, the bias vector is b0 and x.t/ is the
inputs of the training data. One GRU unit's output is provided as
 .t/ = (.1− 1/× ˜ .t−1/)+ (1 ×  .t/) (7)
where,  .t/ is the output of a single GRU unit. The proposed
algorithm uses a single-dimensional layer with the convolution
operation represented as
h1 = convblock1 ./ (8)
h2 = convblock2 .h1/ (9)
h3 = convblock3 .h2/ (10)
178
Acta Wasaensia 129
S.Y. Diaba and M. Elmusrati Neural Networks 159 (2023) 175184
Fig. 2. Process flow of proposed intrusion-detection system model.
Fig. 3. Illustrations of a feed-forward multilayer perceptron.
h4 = convblock4 .h3/ (11)
The hidden vectors are h1, h2, h3, and h4 respectively. A normal-
ization layer is fixed next to the convolutional layer to speed up
training. By using the pooling layer, the features map is down-
sampled by summarizing the presence of features in patches of
the feature map, hence reducing the dimension of the features.
The main pooling techniques are average and max pooling. The
average pooling determines the average value of the patches of
the features map. The average pooling at the pooling layer is given
as
	 n = Pavg
(
 n−1
)
(12)
where  n represents the pooling layers, and output and  n−1
represent previously acquired values from the convolution layer.
The pooling layers are denoted by n and the flattening layer is
mounted to convert the data into a one-dimensional vector.
L = flatten .h4/ (13)
ct = concate .K ; L/ (14)
Table 1
Dataset considered for the simulation.
Type Total Training set Test set Label
BENIGN 67,343 53874 13469 0
DDoS 45,927 36742 9185 1
The outputs from the GRU's K , and the outputs from the CNN's
L, are concatenated as written in Eq. (14). The normalized expo-
nential function (SoftMax) yˆ :Rct → {0; 1} is written when ct is
greater than 1 as
yˆ(z)i = e
zi∑ct
n=1 ezn
(15)
For i = 1, 2,. . . , ct and z = (z1, z2,. . . ,zc t) ∈ Rct . Where z is
the input vector taken from the ct . The loss function for the
proposed model assessment is the cross-entropy function (Graves
& Schmidhuber, 2005), which is given as
Ep (l) = −1b
n∑
i=1
yi log2 y′i (16)
b is for the batch size given, whilst n represents the training
sample size, the actual value is represented by yi and it is y′i for
the predicted value.
3.3. Description of dataset
The simulation evaluation phase of our proposed model is
carried out using the CICIDS-2017 (Radoglou-Grammatikis & Sa-
rigiannidis, 2018) dataset, specifically, the Friday WorkingHours
Afternoon DDos dataset (Sharafaldin, Habibi, & Ghorbani, 2018)
which is publicly accessible and utilized by related studies in the
cyber security community. The benign and most recent common
attacks such as DDoS are included in the CICIDS-2017 dataset,
which closely reflects data from the actual world. Additionally, it
contains the outcomes of the CICFlowMeter network traffic anal-
ysis with flows categorized according to the source, timestamp,
destination IP addresses, destination ports, protocols, and attacks.
The features present there in the dataset are shown in Table 1.
Cleaning up the data and replacing not a number (NaN) and
infinite fields with the column's mean value are the first steps in
the preprocessing stage. The features are converted to numerical
features and integrated with already-existing numerical features
179
130 Acta Wasaensia 
S.Y. Diaba and M. Elmusrati Neural Networks 159 (2023) 175184
Fig. 4. The correlation heatmap for the employed dataset.
in the dataset. Additionally, the labels in the dataset are numer-
ically processed so that the two labels in the dataset, benign is
represented by 0, and DDoS is represented by 1. The dataset is
equally mapped and normalized in order to lessen the feature
discrepancies. The uniform mapping interval range is [0, 1]. Since
there are no irrelevant characteristics in the dataset and the
dataset contains correlated features as shown in the correlation
matrix in Fig. 4, feature selection was not used in the study.
Therefore, the model's decision-making was influenced by all of
the available features.
The list in Table 1 is the results after the normalization had
been performed on the data set and therefore all the character
features had been converted to their numerical values. Then, the
data set is split into a training set and a testing set in a 70:30
ratio. The training is done using 70 percent of the data, and the
validation and testing are done using the remaining 30 percent
of the data.
The four fundamental characters that make up the confusion
matrix are utilized to specify the classifier's measurement pa-
rameters. They are as follows: True Positive (TP) describes an
algorithm's accurate prediction that is accurate. Also, the True
Negative (TN) designates a truly negative prediction made by
the algorithm that is negative. False Positive (FP) describes sit-
uations where the algorithm predicted a positive class but the
actual class is negative. False Negative (FN) is a label that was
predicted by the algorithm to be negative but is actually posi-
tive. An algorithm's performance measurements are its accuracy,
precision, recall, and f1-score. These scenarios are mathemat-
ically represented as in Albulayhi and Sheldon (2021), Khoei,
Aissou, Hu, and Kaabouch (2021), Peng et al. (2019), Radoglou-
Grammatikis and Sarigiannidis (2018), Sharafaldin et al. (2018)
and Siniosoglou, Radoglou-Grammatikis, Efstathopoulos, Fouliras,
and Sarigiannidis (2021) and written in subsequent equations as
Accuracy = TP + TN
TP + TN + FP + FN (17)
Precision = TP
TP + FP (18)
Recall = TP
TP + FN (19)
F1score = 2(precision× recall)
(precision× recall) (20)
4. Results and analysis
The simulation results of our proposed algorithm are all con-
tained in this section. Figures representing each outcome are
presented step-by-step along with explanations of the findings.
We give a succinct explanation of our proposed algorithm's per-
formance and comparisons to that of some of its main contestants
such as CNN, GRU, and LSTM. The heatmap depicts the correla-
tion matrix between the target variable and the input features,
including the destination port, flow bytes, forward header length,
subflow forward packet, active mean, minimum packet length,
packet length mean, packet length variance, average packet size,
active max, ideal mean, ideal max, etc.
180
Acta Wasaensia 131
S.Y. Diaba and M. Elmusrati Neural Networks 159 (2023) 175184
Fig. 5. Convergence ability of the considered algorithms.
Fig. 6. The confusion matrices..
Table 2
The configuration of the hyperparameters.
Number Parametric Quantity
1. Input layer 78
2. Hidden layer 55
3. Activation function ReLU
4. Iteration limit 1000
5. Cost function Cross entropy
6. Batch size 128
A heat map is a graphic representation of a two-dimensional
tabular representation of multivariate data that is set up as a
matrix. The heat map shows the relationships between several
numerical variables, which can be used to identify patterns and
anomalies. It helps to find characteristics that are best for devel-
oping machine learning models and transforms the correlation
matrix into a color designation. It generates color coding from
the correlation matrix and the correlation matrix shows the rela-
tionships between the variables on a scale from a perfect positive
correlation to a perfect negative correlation with the perfect
positive correlation showing the association between the vari-
ables. Each cell represents a square region of space in a certain
measuring distance, and the colors signify the intensity of the
investigated event that occurred on each mapping cell. A heat
map provides a visual representation of data and facilitates the
understanding of large data sets. A range of values is represented
by various colors in a two-dimensional tabular depiction of the
data.
Further simulations are run with the hyperparameters settings
in Table 2. The proposed algorithm's convergence ability outper-
forms that of the other comparative algorithms. The algorithm's
best validation performance is achieved at 0.018817 at epoch
321. The GRU is the next best-performing algorithm, with the
best validation performance at 0.024437 on the 158th epoch. The
CNN algorithm also outperformed the LSTM algorithm, achiev-
ing the best validation performance of 0.029039 at the 174th
epoch, while the LSTM achieved its best validation performance
of 0.03596 at an epoch of 58 (see Fig. 5).
The confusion matrices of the performance of the algorithms
are depicted in Fig. 6a, b, c, and d. A confusion matrix is used to
evaluate the algorithms based on parameters such as accuracy,
precision, recall, and the false positive rate (Aldossary et al.,
2021).
The error histograms are depicted in Fig. 7 to determine the
error between the predicted and target values. Bins are the ver-
tical bars seen on the graph. The total error range is divided into
20 smaller bins on the x-axis. The Y-axis represents the number
of samples from the input dataset that fall into a given bin. On
the plot, the midpoint bin corresponds to an error of 0.01599, the
height of the bin for the training dataset is below 2 × 104 and the
height of the bin for validation is between a little below 2 × 104
and halfway above 2 × 104. The test dataset is halfway between
2:5×104 and 2 × 105.
In terms of overall accuracy, precision, recall, and f1-score,
Fig. 8 shows how well the proposed algorithm performed against
the other algorithms Simulation results show that the proposed
algorithm achieves accuracy, precision, recall, and f1-score, of
99.7%, 98.1%, 99.9%, and 98.9%, respectively. The GRU achieves an
accuracy of 98.6%, precision of 99.5%, recall of 97.4%, and an f1-
score of 98.5. The accuracy of the CNN is 98.5%, the precision is
99.8, the recall is 97.3% and the f1-score is 98.5%. The LSTM ob-
tains 98.5% accuracy, 99.9% precision, 97% recall, and an f1-score
of 98% FPR. The proposed model outperformed the comparative
algorithms in all categories except the recall category. This is
a result of the algorithm's high value of the false positive (FP).
Since the FP is a denominative factor in determining the recall,
its higher value caused the recall of the proposed algorithm to
drop (see Table 3).
5. Conclusion
Finding vulnerabilities in SCADA networks used by Smart
Grids is a top research objective in the field of cyber secu-
rity. However, it is very challenging to choose an efficient deep
181
132 Acta Wasaensia 
S.Y. Diaba and M. Elmusrati Neural Networks 159 (2023) 175184
Table 3
Comparison of algorithms.
The proposed algorithm is compared to the existing algorithms altogether
Algorithms Detection rate % Precision % F1-score Accuracy % Data Year Reference
ANN 96.18 96.9 96.94 Simulated data 2018 Subasi et al. (2018)
SVM 97.25 97.8 97.8 Simulated data 2018 Subasi et al. (2018)
K-NN 98.05 98.4 98.44 Simulated data 2018 Subasi et al. (2018)
Randon forest 98.67 0.98 98.94 Simulated data 2018 Subasi et al. (2018)
Feed-forward neural
network
90.13 88 87.4 88.2 Power system attack 2021 Aldossary et al. (2021)
Hybrid Deep belief
network GRU
93.5 93.57 93.68 94.14 Power system attack 2021 Aldossary et al. (2021)
Recommended
Bi-LSTMIDS
99.89 95.89 95.94 95.93 Power system attack 2021 Aldossary et al. (2021)
Random forest 99.9 KDDCup'99 2019 Alhaidari and AL-Dahasi (2019)
Naïve Bayes 97.74 KDDCup'99 2019 Alhaidari and AL-Dahasi (2019)
Proposed scheme 100 99.9 99.9 99.9 MAWI and world cup
traffic dataset
2022 Fouladi et al. (2022)
Random forest 94 94 CICDDoS 2019 2021 Khoei et al. (2021)
Naïve Bayes 87 77.1 CICDDoS 2019 2021 Khoei et al. (2021)
KNN 94.4 94.6 CICDDoS 2019 2021 Khoei et al. (2021)
Stacking 96 97.3 CICDDoS 2019 2021 Khoei et al. (2021)
Logistic regression 72.2 72.2 90.7 Distribution substation
operational dataset
2021 Siniosoglou et al. (2021)
Decision tree 99.1 99.1 97.7 Distribution substation
operational dataset
2021 Siniosoglou et al. (2021)
Multi-layer perceptron 73.3 73.3 91.1 Distribution substation
operational dataset
2021 Siniosoglou et al. (2021)
Proposed algorithm 99.9 98.1 98.9 99.7 CICIDSS2017 2022
Fig. 7. Error histograms.
Fig. 8. Overall performance comparison of the considered algorithms.
learning-based intrusion detection algorithm. As a result, we
proposed an algorithm for intrusion detection in Smart Grid, by
hybridizing CNN and GRU algorithms. In evaluating the efficacy
of our proposed algorithm, the accuracy, precision, recall, and f1-
score, are evaluated to strengthen the SCADA system's security
framework and make it more resistant to DDoS attacks. Using the
CICIDSS2017 dataset, we carried out a thorough systematic sim-
ulation using MATLAB 2021a. We used the supervised machine
learning approach after normalizing the data. Results demon-
strate that the proposed algorithm can classify cyberattacks with
a 99.7% accuracy and a detection rate of 99.9%, outperforming the
accuracy and the detection rate of the comparative existing intru-
sion detection techniques. In general, the proposed algorithm can
improve network intrusion detection performance.
Declaration of competing interest
The authors declare that they have no known competing finan-
cial interests or personal relationships that could have appeared
to influence the work reported in this paper.
Data availability
Data will be made available on request.
References
Agarwal, A., Khari, M., & Singh, R. (2021). Detection of DDOS attack using deep
learning model in cloud storage application. Wireless Personal Communication,
http://dx.doi.org/10.1007/s11277-021-08271-z.
182
Acta Wasaensia 133
S.Y. Diaba and M. Elmusrati Neural Networks 159 (2023) 175184
Ahakonye, L. A. C., Nwakanma, C. I., Lee, J. M., & Kim, D. S. (2021). Efficient
classification of enciphered SCADA network traffic in smart factory using
decision tree algorithm. IEEE Access, 9, 154892154901. http://dx.doi.org/10.
1109/ACCESS.2021.3127560.
Al-Emadi, S., Al-Mohannadi, A., & Al-Senaid, F. (2020). Using deep learning
techniques for network intrusion detection. In 2020 IEEE international con-
ference on informatics, IoT, and enabling technologies (ICIoT) (pp. 171176).
http://dx.doi.org/10.1109/ICIoT48696.2020.9089524.
Albulayhi, K., & Sheldon, F. T. (2021). An adaptive deep-ensemble anomaly-based
intrusion detection system for the internet of things. http://dx.doi.org/10.
1109/AIIoT52608.2021.9454168, 0187-0196.
Aldossary, L. A., Ali, M., & Alasaadi, A. (2021). Securing SCADA systems against
cyber-attacks using artificial intelligence. In 2021 international conference on
innovation and intelligence for informatics, computing, and technologies (3ICT)
(pp. 739745). http://dx.doi.org/10.1109/3ICT53449.2021.9581394.
Alhaidari, F. A., & AL-Dahasi, E. M. (2019). New approach to determine ddos
attack patterns on SCADA system using machine learning. In 2019 inter-
national conference on computer and information sciences (pp. 16). http:
//dx.doi.org/10.1109/ICCISci.2019.8716432.
Almgren, M. (2018). Building a national testbed for research and training on
SCADA security (short paper). In 13th international conference, CRITIS 2018,
Kaunas, Lithuania. Springer.
Almomani, I., Alkhayer, A., & El-Shafai, W. (2022). An automated vision-based
deep learning model for efficient detection of android malware attacks. IEEE
Access, 10, 27002720. http://dx.doi.org/10.1109/ACCESS.2022.3140341.
Attia, M., Sedjelmaci, H., Senouci, S. M., & Aglzim, E.-H. (2015). A new
intrusion detection approach against lethal attacks in the smart grid: tem-
poral and spatial based detections. In 2015 global information infrastructure
and networking symposium (pp. 13). http://dx.doi.org/10.1109/GIIS.2015.
7347186.
Chen, X., Zhang, L., Liu, Y., & Tang, C. (2018). Ensemble learning methods
for power system cyber-attack detection. In 2018 IEEE 3rd international
conference on cloud computing and big data analysis (pp. 613616). http:
//dx.doi.org/10.1109/ICCCBDA.2018.8386588.
Cherifi, T., & Hamami, L. (2018). A practical implementation of unconditional
security for the IEC 60780 − 5 − 101 SCADA protocol. International Journal
of Critical Infrastructure Protection, 20, 6884.
de Figueiredo, H. F. M., Ferst, M. K., & Denardin, G. W. (2019). An overview
about detection of cyber-attacks on power SCADA systems. In 2019 IEEE
15th Brazilian power electronics conference and 5th IEEE southern power elec-
tronics conference (COBEP/SPEC) (pp. 16). http://dx.doi.org/10.1109/COBEP/
SPEC44138.2019.9065353.
Diaba, S. Y., Shafie-khah, M., & Elmusrati, M. (2022). On the performance
metrics for cyberphysical attack detection in smart grid. Soft Computing,
http://dx.doi.org/10.1007/s00500-022-06761-1.
Elgargouri, A., Virrankoski, R., & Elmusrati, M. (2015). IEC 61850 based smart
grid security. In 2015 IEEE international conference on industrial technology
(pp. 24612465). http://dx.doi.org/10.1109/ICIT.2015.7125460.
Elmrabit, N., Zhou, F., Li, F., & Zhou, H. (2020). Evaluation of machine learning
algorithms for anomaly detection. In 2020 international conference on cyber
security and protection of digital services (cyber security) (pp. 18). http:
//dx.doi.org/10.1109/CyberSecurity49315.2020.9138871.
Farrukh, Y. A., Ahmad, Z., Khan, I., & Elavarasan, R. M. (2021). A sequential
supervised machine learning approach for cyber attack detection in a smart
grid system. In 2021 north American power symposium (pp. 16). http://dx.
doi.org/10.1109/NAPS52732.2021.9654767.
Fatani, A., Dahou, A., Al-qaness, M. A. A., Lu, S., & Abd Elaziz, M. (2022).
Advanced feature extraction and selection approach using deep learning
and aquila optimizer for IoT intrusion detection system. Sensors, 22, 140.
http://dx.doi.org/10.3390/s22010140.
Ferrag, M. A., & Maglaras, L. (2019). DeepCoin: A novel deep learning
and blockchain-based energy exchange framework for smart grids. IEEE
Transactions on Engineering Management, 67(4), 12851297.
Fouladi, R. F., Ermi³, O., & Anarim, E. (2022). A ddos attack detection and
countermeasure scheme based on DWT and auto-encoder neural network
for SDN. Computer Networks, 214, Article 109140.
Fu, R., Huang, X., Xue, Y., Wu, Y., Tang, Y., & Yue, D. (2019). Security assessment
for cyber physical distribution power system under intrusion attacks. IEEE
Access, 7, 7561575628. http://dx.doi.org/10.1109/ACCESS.2018.2855752.
Gao, J., Li, J., Jiang, H., Li, Y., & Quan, H. (2020). A new detection approach against
attack/intrusion in measurement and control system with fins protocol. In
2020 Chinese automation congress (pp. 36913696). http://dx.doi.org/10.1109/
CAC51589.2020.9327136.
Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with
bidirectional LSTM and other neural network architectures. Neural Networks,
18, 56.
Hai-He, T. (2018). Intrusion detection method based on improved neural net-
work. In 2018 international conference on smart grid and electrical automation
(pp. 151154). http://dx.doi.org/10.1109/ICSGEA.2018.00045.
Hosseinzadehtaher, M., Khan, A., Shadm, M. B., & Abu-Rub, H. (2020). Anomaly
detection in distribution power system based on a condition monitoring vec-
tor and ultra- short demand forecasting. In 2020 IEEE CyberPELS (CyberPELS)
(pp. 16). http://dx.doi.org/10.1109/CyberPELS49534.2020.9311534.
Hu, C., Yan, J., & Liu, X. (2020). Adaptive feature boosting of multi-sourced deep
autoencoders for smart grid intrusion detection. In 2020 IEEE power & energy
society general meeting (pp. 15). http://dx.doi.org/10.1109/PESGM41954.
2020.9281934.
Huang, K., Li, S., Deng, W., Yu, Z., & Ma, L. (2022). Structure inference of
networked system with the synergy of deep residual network and fully
connected layer network. Neural Networks, 145.
Ismail, et al. (2022). A machine learning-based classification and prediction
technique for DDoS attacks. IEEE Access, 10, 2144321454. http://dx.doi.org/
10.1109/ACCESS.2022.3152577.
Jaiganesh, V., Sumathi, P., & Mangayarkarasi, S. (2013). An analysis of intrusion
detection system using back propagation neural network. In 2013 interna-
tional conference on information communication and embedded systems (pp.
232236). http://dx.doi.org/10.1109/ICICES.2013.6508202.
Jiang, Y., Xu, A., Zhang, Y., Hong, C., & Cai, X. (2020). Anticipate fault sets
generation methods for cyber physical power system considering cyber-
attacks. In 2020 12th IEEE PES Asia-Pacific power and energy engineering
conference (pp. 15). http://dx.doi.org/10.1109/APPEEC48164.2020.9220404.
Kalech, M. (2019). Cyber-attack detection in SCADA systems using temporal
pattern recognition techniques. Computers & Security, 84, 225238.
Khan, R. U., Zhang, X., Alazab, M., & Kumar, R. (2019). An improved convolu-
tional neural network model for intrusion detection in networks. In 2019
cybersecurity and cyberforensics conference (pp. 7477). http://dx.doi.org/10.
1109/CCC.2019.000-6.
Khoei, T. T., Aissou, G., Hu, W. C., & Kaabouch, N. (2021). Ensemble learning
methods for anomaly intrusion detection system in smart grid. In 2021 IEEE
international conference on electro information technology (pp. 129135). IEEE.
Komyakov, A. A., Erbes, V. V., & Ivanchenko, V. I. (2015). Application of artificial
neural networks for electric load forecasting on railway transport. In 2015
IEEE 15th international conference on environment and electrical engineering
(pp. 4346). http://dx.doi.org/10.1109/EEEIC.2015.7165296.
Koutsandria, G., Muthukumar, V., Parvania, M., Peisert, S., McParl, C., &
Scaglione, A. (2014). A hybrid network IDS for protective digital relays in the
power transmission grid. In 2014 IEEE international conference on smart grid
communications (SmartGridComm) (pp. 908913). http://dx.doi.org/10.1109/
SmartGridComm.2014.7007764.
Lee, J. M., & Hong, S. (2020). Keeping host sanity for security of the SCADA
systems. IEEE Access, 8, 6295462968. http://dx.doi.org/10.1109/ACCESS.2020.
2983179.
Li, D., Guo, H., Zhou, J., Zhou, L., & Wong, J. W. (2019). SCADAWall: A CPI-enabled
firewall model for SCADA security. Computers & Security, [ISSN: 0167-4048]
80, 134154.
Li, X., & Hedman, K. W. (2020). Enhancing power system cyber-security with
systematic two-stage detection strategy. IEEE Transactions on Power Systems,
35(2), 15491561. http://dx.doi.org/10.1109/TPWRS.2019.2942333.
Liang, H., Ye, C., Zhou, Y., & Yang, H. (2021). Anomaly detection based on
edge computing framework for AMI. In 2021 IEEE international conference
on electrical engineering and mechatronics technology (pp. 385390). http:
//dx.doi.org/10.1109/ICEEMT52412.2021.9601888.
Lin, W. H., Lin, H. C., Wang, P., Wu, B. H., & Tsai, J. Y. (2018). Using convolutional
neural networks to network intrusion detection for cyber threats. In 2018
IEEE international conference on applied system invention (pp. 11071110).
http://dx.doi.org/10.1109/ICASI.2018.8394474.
Liu, X., Li, Z., Shuai, Z., & Wen, Y. (2017). Cyber attacks against the economic
operation of power systems: A fast solution. IEEE Transactions on Smart Grid,
8(2), 10231025. http://dx.doi.org/10.1109/TSG.2016.2623983.
Lopez Perez, R., Adamsky, F., Soua, R., & Engel, T. (2018). Machine learning
for reliable network attack detection in SCADA systems. In 2018 17th
IEEE international conference on trust, security and privacy in computing and
communications/ 12th IEEE international conference on big data science and
engineering (TrustCom/BigDataSE) (pp. 633638). http://dx.doi.org/10.1109/
TrustCom/BigDataSE.2018.00094.A.
Mahmud, R., Vallakati, R., Mukherjee, A., Ranganathan, P., & Nejadpak, A. (2015).
A survey on smart grid metering infrastructures: Threats and solutions.
In 2015 IEEE international conference on electro/information technology (pp.
386391). http://dx.doi.org/10.1109/EIT.2015.7293374.
Mohan, S. N., Ravikumar, G., & Govindarasu, M. (2020). Distributed intrusion
detection system using semantic-based rules for SCADA in smart grid. In
2020 IEEE/PES transmission and distribution conference and exposition (T & D)
(pp. 15). http://dx.doi.org/10.1109/TD39804.2020.9299960.
Oyewole, P. A., & Jayaweera, D. (2020). Power system security with cyber-
physical power system operation. IEEE Access, 8, 179970179982. http://dx.
doi.org/10.1109/ACCESS.2020.3028222.
Peng, Y. (2020). Application of convolutional neural network in intrusion de-
tection. In 2020 international conference on advance in ambient computing
and intelligence (pp. 169172). http://dx.doi.org/10.1109/ICAACI50733.2020.
00043.
183
134 Acta Wasaensia 
S.Y. Diaba and M. Elmusrati Neural Networks 159 (2023) 175184
Peng, W., Kong, X., Peng, G., Li, X., & Wang, Z. (2019). Network intrusion
detection based on deep learning. In 2019 international conference on com-
munications, information system and computer engineering (pp. 431435).
http://dx.doi.org/10.1109/CISCE.2019.00102.
Radoglou-Grammatikis, P. I., & Sarigiannidis, P. G. (2018). An anomaly-based
intrusion detection system for the smart grid based on CART decision tree.
In 2018 global information infrastructure and networking symposium (pp. 15).
http://dx.doi.org/10.1109/GIIS.2018.8635743.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural
Networks, 61.
Sharafaldin, I., Habibi, A. L., & Ghorbani, A. A. (2018). Toward generating a new
intrusion detection dataset and intrusion traffic characterization. In ICISSP.
Shum, J., & Malki, H. A. (2008). Network intrusion detection system using neural
networks. In 2008 fourth international conference on natural computation (pp.
242246). http://dx.doi.org/10.1109/ICNC.2008.900.
Singh, V. K., Ebrahem, H., & Govindarasu, M. (2018). Security evaluation of two
intrusion detection systems in smart grid SCADA environment. In 2018 north
American power symposium (pp. 16). http://dx.doi.org/10.1109/NAPS.2018.
8600548.
Siniosoglou, I., Radoglou-Grammatikis, P., Efstathopoulos, G., Fouliras, P., &
Sarigiannidis, P. (2021). A unified deep learning anomaly detection and
classification approach for smart grid environments. IEEE Transactions on
Network and Service Management, 18(2), 11371151. http://dx.doi.org/10.
1109/TNSM.2021.3078381.
Subasi, A., et al. (2018). Intrusion detection in smart grid using data mining
techniques. In 2018 21st Saudi computer society national computer conference
(pp. 16). http://dx.doi.org/10.1109/NCG.2018.8593124.
Sun, Y., Guan, X., Liu, T., & Liu, Y. (2013). A cyberphysical monitoring system
for attack detection in smart grid. In 2013 IEEE conference on computer
communications workshops (INFOCOM WKSHPS) (pp. 3334). http://dx.doi.org/
10.1109/INFCOMW.2013.6970712.
Suppitaksakul, C., & Saelee, V. (2009). Application of artificial neural networks
for electrical losses estimation in three-phase transformer. In 2009 6th
international conference on electrical engineering/electronics, computer, telecom-
munications and information technology (pp. 248251). http://dx.doi.org/10.
1109/ECTICON.2009.5137002.
Talha, B., & Ray, A. (2016). A framework for MAC layer wireless intrusion detec-
tion & response for smart grid applications. In 2016 IEEE 14th international
conference on industrial informatics (pp. 598605). http://dx.doi.org/10.1109/
INDIN.2016.7819232.
Ullah, I., & Mahmoud, Q. H. (2017). An intrusion detection framework for the
smart grid. In 2017 IEEE 30th Canadian conference on electrical and computer
engineering (pp. 15). http://dx.doi.org/10.1109/CCECE.2017.7946654.
Valdes, A., Macwan, R., & Backes, M. (2016). Anomaly detection in electrical
substation circuits via unsupervised machine learning. In 2016 IEEE 17th
international conference on information reuse and integration (pp. 500505).
http://dx.doi.org/10.1109/IRI.2016.74.
Vijayanand, R., Devaraj, D., & Kannapiran, B. (2019). A novel deep learning based
intrusion detection system for smart meter communication network. In 2019
IEEE international conference on intelligent techniques in control, optimization
and signal processing (pp. 13). http://dx.doi.org/10.1109/INCOS45849.2019.
8951344.
Xu, Y. (2020). A review of cyber security risks of power systems: from static to
dynamic false data attacks. Protection and Control of Modern Power Systems,
5, 19. http://dx.doi.org/10.1186/s41601-020-00164-w.
Yang, L., Liu, J., & Zhang, Y. (2019). An intelligent security defensive model
of SCADA based on multi-agent in oil and gas fields. International Journal
of Pattern Recognition and Artificial Intelligence, 34, http://dx.doi.org/10.1142/
S021800142059003X.
Yang, Y., McLaughlin, K., Littler, T., Sezer, S., Pranggono, B., & Wang, H. F.
(2013). Intrusion detection system for IEC 60870 − 5 − 104 based SCADA
networks. In 2013 IEEE power & energy society general meeting (pp. 15).
http://dx.doi.org/10.1109/PESMG.2013.6672100.
Yohanandhan, R. V., Elavarasan, R. M., Manoharan, P., & Mihet-Popa, L. (2020).
Cyber-physical power system (CPPS): A review on modeling, simulation, and
analysis with cyber security applications. IEEE Access, 8, 151019151064.
http://dx.doi.org/10.1109/ACCESS.2020.3016826.
Zhao, J., Chen, M., & Luo, Q. (2011). Research of intrusion detection system
based on neural networks. In 2011 IEEE 3rd international conference on
communication software and networks (pp. 174178). http://dx.doi.org/10.
1109/ICCSN.2011.6013688.
184
Acta Wasaensia 135
 
 
    
Publication III 
136 Acta Wasaensia 
Received 22 January 2023, accepted 14 February 2023, date of publication 22 February 2023, date of current version 27 February 2023.
Digital Object Identifier 10.1109/ACCESS.2023.3247193
Cyber Security in Power Systems Using
Meta-Heuristic and Deep Learning Algorithms
SAYAWU YAKUBU DIABA , (Graduate Student Member, IEEE),
MIADREZA SHAFIE-KHAH, (Senior Member, IEEE),
AND MOHAMMED ELMUSRATI , (Senior Member, IEEE)
School of Technology and Innovations, University of Vaasa, 65200 Vaasa, Finland
Corresponding author: Sayawu Yakubu Diaba (sdiaba@uwasa.)
ABSTRACT Supervisory Control and Data Acquisition system linked to Intelligent Electronic Devices over
a communication network keeps an eye on smart grids' performance and safety. The lack of algorithms
protecting the power system communication protocols makes them vulnerable to cyberattacks, which can
result in a hacker introducing false data into the operational network. This can result in delayed attack
detection, which might harm the infrastructure, cause nancial loss, or even result in fatalities. Similarly,
attackers may be able to feed the system with fake information to hoax the operator and the algorithm into
making bad decisions at crucial moments. This paper attempts to identify and classify such cyber-attacks by
using numerous deep learning algorithms and optimizing the data features with a metaheuristic algorithm.
We proposed a Restricted Boltzmann Machine-based nature-inspired articial root foraging optimization
algorithm. Using a publicly available dataset produced in Mississippi State University's Oak Ridge National
Laboratory, simulations are run on the Jupiter Notebook. Traditional supervised machine learning algorithms
like Articial Neural Networks, Convolutional Neural Networks, and Support VectorMachines aremeasured
with the proposed algorithm to demonstrate the effectiveness of the algorithms. Simulations show that the
proposed algorithm produced superior results, with an accuracy of 97.8% for binary classication, 95.6%
for three-class classication, and 94.3% for multi-class classication. Thereby outperforming its counterpart
algorithms in terms of accuracy, precision, recall, and f1 score.
INDEX TERMS Articial neural network, articial root foraging, cyber security, deep learning, machine
learning, metaheuristic algorithm, restricted Boltzmann machines, supervisory control and data acquisition,
smart grid.
I. INTRODUCTION
The extraordinarily intricate architectural design of the elec-
trical power systems must be handled cautiously and with
the best control strategy feasible to ensure both the protec-
tion of human life and the system's safety [1]. The sys-
tem becomes more complex as the control process must
run more quickly [2]. Automated devices are introduced to
modern power systems to make operating them easier. The
number of pieces of protective equipment that are part of
the system is directly impacted by operational demand and
The associate editor coordinating the review of this manuscript and
approving it for publication was Christos Anagnostopoulos .
consumer count [3]. Recent years have seen the development
of automated systems for connected power module protec-
tion, automation, and control [4]. Protective device perfor-
mances have somewhat improved as a result of developments
in algorithms and power systems architecture [5], [6].
However, the likelihood of security problems increases
as the number of connections to the power system modules
intensies. Hence the quality of control is expected to be in
the higher range for modern power systems. The contempo-
rary power systems are implemented with various Interna-
tional Electrotechnical Commission (IEC) standards [7], [8]
and are generally operated with six signicant components,
as depicted in Figure 1. Generators, transformers, and safety
18660
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/ VOLUME 11, 2023
Acta Wasaensia 137
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
FIGURE 1. Component of the power system.
equipment are all part of the power system's electrical com-
ponents. These primary hardware ranges and ratings change
depending on the loads connected to the network. The pro-
tection mechanisms built into the electrical system also differ
depending on the linked equipment's location and nature [9].
The control components include the synchronization model
and operational modules for transmitting the required signal
to the digital modules used for the operation. The power sys-
tem's information and communication devices, which trans-
mit control signals between linked systems and components
across wired or wireless networks, are represented by dig-
ital modules [10]. The convergence network regulates the
power ow in the connected system by analyzing the load
requirement and the power system state. The importance of
the convergence networks increases when the power system
is linked to Distributed Energy Resources (DERs) [11], [12].
The regulatory components ensure that the integration of
power is constantly smooth and efcient.
In order to solve the problems with conventional digital
components, which were designed to have certain limitations,
smart grid power systems were developed. This is achieved
by integrating distributed intelligence algorithms into the
system. The distributed intelligence algorithms swiftly and
efciently support making decisions on the present digital
components [13].
Smart grids, however, have more security concerns due to
the distributed location of the control units. The architecture
of the smart grid power systems includes the following four
layers [14]. Physical Layer: It is identical to the layer found
in every fundamental power system, which consists of a
generation station, transmission lines, and a distribution unit.
Communication Layer: The layer between the user and the
service provider; this layer offers a network that allows for the
discovery of the status of the power system's operation. Sys-
tem Integration Layer: This layer includes the computing and
security infrastructure. It controls the data analytics process
so that the control units can make several decisions. This is
FIGURE 2. The architecture of a smart grid system.
realized by importing a powerful algorithmicmodel. Software
Layer: It enables the service provider to access the power
consumption details from the user side. This layer provides
information about the user and their nature to the system
integration layer for future predictions.
Based on their general characteristics, the four kinds of
cyber security problems for smart grid systems may be clas-
sied. They are issues with, connectivity, trust, privacy, and
software vulnerability [15], [16].
Connectivity:Compared to other physical systems, the sys-
tems that make up the smart grid are more widely distributed.
As a result, the smart grid power system's communica-
tion protocol necessitates constant operation and higher data
transmission rates. The system transfers the data regularly;
it poses numerous security concerns for the models. Trust:
The smart grid systems are open to everyone. Some key
equipment, specically the Automated Meter Infrastructure
(AMI) is situated in the user area. As a result, there is a greater
chance that the system may be interfered with, and this risk
is directly correlated with the user's level of trust, given that
operational costs and other factors are involved. Privacy: The
smart meters connected to the system contain the user's basic
information, which is the most targeted device for intruders.
Software Vulnerabilities: The smart grid systems are mostly
monitored with Supervisory Control and Data Acquisition
(SCADA) computer software. The SCADA system's mod-
ernization, standardization of communication protocols, and
increasing interconnectivity have all contributed to a sharp
rise in cyberattacks on the system over time, rendering it
vulnerable to assault from anywhere around the globe [16].
Hence it is a must to protect the smart grids' SCADA systems
from malicious cyber-attacks and malware disruption.
The aforementioned issues prompted the following goals
for this study, which are as follows:
1) To employ a nature-inspired articial root foraging
optimization algorithm with a Restricted Boltzmann
VOLUME 11, 2023 18661
138 Acta Wasaensia 
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
Machine (RBM), to provide an enhanced algorithm that
reliably detects and classies attack intrusions in the
smart grids' SCADA systems.
2) Enhanced adaptability: Nature-inspired optimization
algorithms are designed to be highly adaptable, and can
be modied or ne-tuned to meet the specic needs of
a particular application. By combining an RBM with a
nature-inspired algorithm, we seek to create a system
that is highly adaptable and able to learn and adapt to
new threats as they arise.
3) Increased efciency: Nature-inspired optimization
algorithms are typically more efcient than traditional
optimization methods, as they are able to explore a
more extensive search space more quickly. By combin-
ing an RBM with a nature-inspired algorithm, we seek
to propose an algorithm that is able to analyze large
amounts of data more quickly and efciently, allowing
for faster and more effective threat detection.
4) Reduced reliance on labelled data: RBMs are capa-
ble of performing unsupervised learning, which means
they can learn from data that is not labelled or cate-
gorized. By combining an RBM with a nature-inspired
algorithm, it is possible to create a system that can learn
from a larger and more diverse dataset, which may be
particularly useful in cases where labelled data is scarce
or difcult to obtain.
5) To demonstrate the performance of the proposed algo-
rithm's efciency to other existing algorithms in terms
of accuracy, precision, recall, and f1 score.
Section I captures the introduction of the paper. Section II
contains background information on related studies and the-
oretical frameworks from the literature. The proposed algo-
rithm is covered in Section III, while the simulation results
are described in Section IV. Section V serves as the paper's
conclusion.
II. RELATED STUDIES
The smart grid protection strategy uses local measures or
external devices to build a smart grid protection system that
is both effective and efcient. However, one of the key issues
is the ability to connect physical and digital components to
suit the conguration of the system. Measurement of data
source authentication system was developed to analyze the
data ow of a power system by extracting the features through
an ensemble empirical mode decomposition model with the
Fast Fourier Transform (FFT) technique. The experiment
was conducted with a back-propagation neural network for
data classication. An accuracy of 80.9% is achieved, and
comparatively, it is better than the traditional long short-term
memory (LSTM) model's accuracy of 77.8% [17]. To train
the neural network algorithms, a sizable dataset is required.
The performance of a neural network algorithm's prediction
process is inuenced by the amount of training data present
in the network. The authors of [18] generated a power sys-
tem dataset based on IEC 61850 Generic Object-Oriented
Substation Event (GOOSE) communication for developing a
reliable cybersecurity system.
The components of the power system are divided into
numerous categories to monitor the load demand in different
areas. Due to environmental conditions, the associated eld
will see variations in demand in particular. The system is
more vulnerable to cyber threats since the scattered devices
are connected through different channels [19]. The testbed-
based power system quality analysis is one of the familiar
methods widely used for observing the response of the power
system in different scenarios. The test bed generates different
kinds of cyber security issues to analyze and formulate a
defending algorithm. An OMNeT++-based simulation tech-
nique was structured [20] to analyze the nature of cyberat-
tacks in a bidirectional communication network. The model
was integrated with Power Systems Computer Aided Design
(PSCAD) for the power simulation.
The physical power systems are open to dynamic data
injection attacks. An example is the ease with which the
energy consumption values on smart meters could be altered.
So, an interval state estimation method was developed to
analyze the possible variations in the readings with respect to
time. A kernel quantile regression is also incorporated in the
work to estimate the uncertainties in renewable and electric
load forecasting applications [21]. The cyberattack on the
Internet of Things (IoT)-based smart grids may affect the
costly and important systems that are connected to the power
system. The hospital equipment and electric train are some of
the costlier andmost needed systems that always depend upon
the quality of the power supply. Therefore, a blockchain-
based technique was equipped with Hilbert-Huang transform
to estimate power quality through the data collected from
voltage and current sensors. The experimental work founds
satised with the performance of the proposed model on false
data injection attacks [22].
The false data injection process can also be observed
by estimating the phasor measurements of the connected
loads. A two-layer defense system was developed [23] to
observe the change in the values of the power system.
The defense resources are optimized in the work with a
zero-sum static game algorithm. It is demonstrated that the
proposed two-layer model is useful for examining false
data injection attacks. Providing cybersecurity to DER, such
as photovoltaic systems (PV), is one of the challenging
tasks in power systems. To accomplish this, the connected
syste's active and reactive power is analyzed along with its
permitted voltage level for transmission. The syste's net-
work topology is used to observe the power changes on
each terminal. The change in the difference in various esti-
mations makes the work to predict the attack output on
its class [24]. A decision-making algorithm was outlined
to estimate the cyberattacks in multi-microgrid systems.
A fuzzy static Bayesian game model was utilized in the work
for predicting the optimal security strategy, and a hybrid
approach based on a fuzzy algorithm was used to reach a
consensus [25].
18662 VOLUME 11, 2023
Acta Wasaensia 139
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
A cybersecurity risk management systemwas developed to
predict attacks in cyber-physical systems. The work analyzes
the criticality of the assets in cyberattacks and their effect
on the output of the system. The attack scenario, control,
and threats are considered in the work for estimations [26].
A stochastic coupling strategy was designed to estimate
the cascading process in cyber-physical systems. This has
been performed by keeping two asymmetric subnetworks for
increasing the accuracy of random and frequent cyberattacks.
The experimental projection indicates a reduced estimation
time for frequent attacks over the randommodels [27]. A deep
reinforcement learning technique was structured to provide
cybersecurity protection on distributed power systems. The
performance of the system was experimented on the IEEE
13-bus model and the simulation results are not found satis-
factory under the greedy attack conditions [28].
When responding to hostile attacks on industrial control
systems, machine learning techniques are particularly accus-
tomed. The results of an experiment using the random forest
and J48 algorithms to identify intrusions in control systems
were found to be good in forecasting cyber-attack behav-
iors [29]. A dimensionality reduction and statistical hypothe-
sis techniques were merged to ensure cybersecurity on smart
grids. A concept drift methodologywas utilized in the work to
observe the differences between the physical grid change and
data manipulation. Experimental work was performed in the
work with and without concept drift and found satisfactory
with the concept drift technique [30]. A physics-informed
spline learning technique was developed to detect anomalies
in power electronic circuits. The experiment was found satis-
factory even when trained with minimal data [31].
The review of the literature looks at the various strategies
developed to address security issues in power systems. The
majority of the systems, however, were created to recognize
the introduction of false data into power systems. This was
accomplished by analyzing the system's typical behavior to
anticipate the system's abnormal response when ctitious
data was injected. Because their analysis is feature-based,
deep learning and machine learning algorithms are quite
good at making these kinds of predictions. In the part that
follows, a feature optimization technique based on a meta-
heuristics algorithm is used to assess the effectiveness of deep
learning-based algorithms to observe security vulnerabilities
in SCADA systems for smart grids.
III. METHODOLOGY
The overall articial root foraging, RMB architecture, and
our variation are all introduced in this section.
A. OVERVIEW OF THE SYSTEM
The proposed model utilizes a nature-inspired articial root
foraging method for optimizing the information collected
through the power systems sensor and data transmitters.
Voltage and power sensors are used to detect the anomaly
of the power system; the abnormality of the power system
is observed and forwarded to the base station through an
FIGURE 3. The workflow of the proposed model.
IoT network. The receiving station tabulates the collected
information and projects the outcome as a database. The
dataset creation process makes the base station verify all the
collected information and separates the readings that came
up with errors and missing information. The dataset creation
process can be limited with respect to time as it may provide
the amount of data to be stored in the database. Figure 3
represents the workow of the proposed model.
B. PREPROCESSING
Preprocessing is the fundamental technique for organizing
the data gathered from the remote terminal unit (RTU) and
other Intelligent Electronic Devices (IED) modules. In this
step, the unstructured and unformatted data are organized to
make the information reliable before it is used in the training
process.
In general, the data can be segregated into two categories:
numerical data and categorical data. The binary format is used
for categorical data, and whole numbers or fractions are used
for numerical data. Information about the power system is
gathered in numerical form for the proposed task.
The quality of the feature extraction mostly depends on
the caliber of the data used in the operation. Therefore, the
paper makes use of the data translation process, data cleaning
process, and data quality assessment process. As previously
shown, the data quality assessment sends the available data
to the data cleaning process while moving the missing data to
the trash. The data cleaning procedure enables the removal of
duplicate data and requires the manual insertion of data when
it is discovered to be abnormal or missing.
VOLUME 11, 2023 18663
140 Acta Wasaensia 
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
C. ARTIFICIAL ROOT FORAGING OPTIMIZATION
1) CLASSICAL PLANT ROOT GROWTH MODEL
The biological root growth optimization algorithm served as
the basis for designing the articial root foraging optimiza-
tion algorithm. A biological plant's primary root advances
toward the ground, while its lateral roots spread outward like
a branch from the main root. Similarly, the lateral roots are
also permitted to develop numerous lateral roots in diverse
directions. While the primary roots are not permitted to do
so, the lateral roots are permitted to form in all directions
with varying degrees of movement. Hence, the articial root
foraging algorithm is also constructed using the conventional
optimization model that is used to predict the growth of
plant roots. Root growth is thought to be hindered by the
nature of the soil, and the main root movement and lateral
root movement are thought to be the best solutions. The
change in direction and length adjustments are regarded as the
ne-tuning parameters for the problems [32]. The following
factors are considered for ideal plant growth, and the same
has been followed in the articial model.
Factor 1: The spatial structure of the roots is heavily inu-
enced by the auxin concentration in the plants. It allows the
root to be automatically structured by observing the problem.
Factor 2:A single root apex advances in the same direction
and can generate children's root apices.
Factor 3: Auxin availability causes the root system to
develop a variety of lateral roots and branches.
Factor 4:Hydrotropism allows the tip of the main root and
lateral roots to move in their respective directions along the
trajectory.
2) AUXIN REGULATION
The auxin concentration is the primary parameter for devel-
oping a new branch count and movement operations [33].
Therefore, the nutrition availability of the soil is formulated
as follows.
fx = tnessx − flowfhigh − flow (1)
Mathematically, the auxin concentration is written as
Ax = fxs∑
y=1
fx
(2)
where the function value is nessx , fx is the normalization
value of the root tness, fhigh and flow represent the current
root population count and s is the population size.
3) STRATEGY ON MAIN ROOT GROWTH
The growing probability of the main root is free from the
probability of branch and re-growing factor. The movement
of the main root depends upon the best individual operation
formulated from its current position [34]. It is mathematically
represented as
I tx = I t−1x + l:":
(
Ilbest − I t−1x
)
(3)
here, I tx implies a new location, I
t−1
x represents the location
of root x. Learning inertia takes l, " is the uniform random
coefcient between 0 and 1 and Ilbest stands for the best
individual from the present location.
4) BRANCHING OPERATOR
The branching operator develops a new individual based
on the root apex estimations. It is predicted by estimating
the available auxin concentration over the threshold value
included in the branch [35]. The number of individuals gen-
erated from the branch is calculated as{
branch individuals wx if Ax > threshold value
stop branching otherwise
(4)
Therefore, the numbers of newly generated apices are esti-
mated from the following equatio
Wx = ":Ax(Bmax − Bmin)+ Bmin (5)
" is the uniform random coefcient between 0 and 1, Ax
is the auxin concentration level at the root. Bmax and Bmin
represent the branching count. The location for developing a
new branch root is predicted from the primary root through
Gaussian distribution N (I tx ; 
2). The standard deviation is
written as
 =
(
xmax − x
xmax
)2
× (ini − n)+ n (6)
where xmax is the maximum iteration, i is the current iteration
index, ini is the initial standard deviation, and n is the nal
standard deviation.
5) LATERAL OR BRANCH ROOT GROWTH
The lateral roots are allowed to conduct a random search on
every feeding state [36], [37]. The length and growing degree
of the lateral roots are changed between each other, and that
can be mathematically projected as
I tx = I t−1x + " .lmaxDi ∗ / (7)
 = i√
Ti × i
(8)
where lmax stands for the maximum length of the lateral root,
Di is the dimension growth direction of the lateral root i,
and  stands for the growth angle formulated with a random
vector i.
6) DEAD ROOT GROWTH SHRINKABLE
The growing process might not be supported by the roots if
they were unable to absorb nutrients. The auxin distribution
evaluates the likelihood that the lateral roots will grow and,
if they do not, they are removed from the main root.
D. RESTRICTED BOLTZMANN MACHINES
The RBM technique was primarily created for regression,
feature learning, and dimensionality reduction applications.
It is a subset of the family of energy-based models, where
18664 VOLUME 11, 2023
Acta Wasaensia 141
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
FIGURE 4. The architecture of the RBM.
each conguration of the relevant variables corresponds to a
training-relevant nite scalar energy value. The RBM algo-
rithms are typically shallow and only use two levels of net-
work connection [38]. As a result of their simplicity, RBMs
are widely used in a variety of applications. The primary
layer of the RBM is represented as the visible layer, and the
second layer is mentioned as the hidden layer. The number
of neural nodes included in the layer varies with respect to
the count of inputs made to the approach and the intercon-
nection between the nodes makes a neurological connection
like a human brain. The RBM connections are very special,
and there the intra-connections are restricted. The node ana-
lyzes the input received by the, computes it, and decides
whether to permit it or not for neighbor node connection [39].
The bipartite interactional graph of the RBM is depicted
in Figure 4.
The feature that the visible layer node collects is denoted
by the letter  and it is passed to the hidden layer by multi-
plying the weighted value w and adding the bias b [40]. The
following expression can be used to describe the outcome
of this operation as an activation function of the supplied
input.
f (( × w)+ b) = a (9)
where f represents the activation function,  is the input, and
w stands for the weights. The bias is represented by b and a
is for the activation function.
The hidden layer activations are considered as input in the
reconstruction step, where the input is given to the hidden
layer. Same as the input path, the reconstruction model also
operates the input with the same multiplication factor. Hence
the output gives a value to the original input. Figures 5 and 6
indicate the input path of an RBM and the reconstruction
model of the RBM, respectively.
Generally, the values of the weights included are assumed
randomly, and presumably, there will always be a huge devia-
tion between the input and output of the RBM. So, theweights
FIGURE 5. The input path of an RBM.
FIGURE 6. The reconstruction of RBM.
are modied continuously to reduce the error observations in
estimating the reconstruction r value. The nodes are designed
to take the low-level feature present in all the attributes avail-
able in the dataset. This paper considers that the RBM has a
total of n visible neurons as v = v1; v2; : : : ; vn and total hid-
den neurons m has hidden neurons as h= h1; h2; : : : ; hm. The
model uses binary values since the study examines the binary
problem (natural or attack) of the existence of anomalies. the
random variable takes the values (v; h) ∈ : {0; 1}m+n. Thus,
the probability distribution according to [41] can be written
as
P(v; h) = 1
z
e−E(v;h) (10)
Z is the partition function. An energy function E(v,h) of the
model can be dened as [42] and [43]
E(v; h)=−
n∑
=1
m∑
=1
whmvn−
n∑
=1
gnvn−
m∑
=1
qmhm (11)
VOLUME 11, 2023 18665
142 Acta Wasaensia 
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
TABLE 1. Description of the employed dataset.
Equation (11) can be re-written as
E(v;h) = gT v− qTh− vTWh (12)
where the considered features for the training process of 
is  ∈ {1; 2; : : : ; n} and  ∈ {1; 2; : : : ;m}. The weight
is denoted by w , gn is the nth feature of the  th input of
the vth visible neurons. Similarly, qm is the mth feature of
the  th input of the hth hidden neuron. Due to the RBM's
bipartite nature, there is no connection between a hidden
neuron and a hidden neuron, just as there is no connection
between a visible neuron and a visible neuron. The model for
conditional independence is described as
p(v;h) = 5m=1p(v |h) (13)
p(v;h) = 5n=1p(v |h) (14)
E. DATA DESCRIPTION
This paper utilizes the power system attack detection dataset
developed by the Oak Ridge national laboratory of Missis-
sippi State University [44]. The dataset is separated into three
types, binary class, three class, and multi-class. It is created
from a single dataset consisting of 15 sets of information from
37 types of power system events. Except for the multi-class
dataset, the details are in CSV format. The content of the
dataset is shown in Table 1.
Figure 7 shows a three-bus two-line transmission sys-
tem modied from the IEEE four-bus three-generator sys-
tem, it explores the architectural view of the test framework
used for the analysis. Despite being a very modest system,
it embodies the core of the broader power system and is
simple enough to be understood in its entirety. The classier
suggested in this work would be used multiple times to mon-
itor different parts of a power system. The framework merges
two generator models consisting of four IEDs, specically,
relays (R1 to R4) for providing a switching operation to
the circuit breakers (Bk1 to Bk4). Each circuit breaker is
connected with a separate IED [44]. Therefore, it trips off
the breaker unit when a real or fake fault is detected in the
circuit. The IEDs are not equipped with any algorithm so far
for analyzing the nature of the fault. Thus, this kind of model
requires a manual operation to re-enable the circuit from its
faulty condition. The major type of faults and attacks that can
happen in a power system model is as follows [45].
FIGURE 7. Overview of the power system framework.
1) FAULTS
a: SHORT CIRCUIT
These kinds of faults may happen in a power system owing
to natural and manual errors at any location. The location
of the fault can be identied by observing the current and
voltage changes in the circuit. A short circuit fault in a power
system occurs when there is an abnormal connection between
two points in the electrical circuit that are not intended to
be connected. This can cause a sudden and large increase in
the ow of electrical current, which can damage or destroy
electrical equipment and pose a risk of injury to personnel.
Short circuit faults can be caused by a variety of factors,
including damaged or faulty electrical components, loose
connections, and the presence of foreign objects or debris
in the electrical circuit. They can also be caused by natural
disasters such as lightning strikes or earthquakes [46].
When a short circuit fault occurs, the electrical system
is designed to automatically detect the fault and interrupt
the ow of current to prevent damage to the equipment and
protect personnel. This is typically done by using protective
devices such as circuit breakers, fuses, and relays, which are
designed to detect abnormal electrical conditions and inter-
rupt the ow of current. It is important to promptly address
short circuit faults in order to minimize the risk of damage
to the electrical system and ensure the safe and reliable
operation of the power system. This may involve identifying
and repairing the root cause of the fault, as well as testing and
inspecting the affected equipment to ensure it is safe to return
to service [46].
b: LINE MAINTENANCE
For the duration of the maintenance period, the relay modules
connected to the power system model are disconnected from
the circuit. These kinds of errors are intentional and are
18666 VOLUME 11, 2023
Acta Wasaensia 143
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
simple to x. Linemaintenance in power systems refers to the
activities that are performed to ensure that transmission and
distribution lines are operating safely and efciently. These
activities can include inspections, repairs, and upgrades of
transmission and distribution lines, as well as the associated
equipment such as transformers, switches, and other electri-
cal components.
Line maintenance is an essential part of the overall oper-
ation and maintenance of a power system, as it helps to
ensure the reliability and safety of the electrical grid. Line
maintenance activities can be performed on both overhead
and underground transmission and distribution lines, and
may involve a range of tasks, such as: Inspecting and test-
ing electrical equipment to identify any potential issues
or problems. Replacing damaged or worn-out components.
Upgrading equipment to improve performance or increase
capacity. Cleaning and maintaining transmission and distri-
bution lines to remove debris and vegetation that could cause
problems. Performing preventive maintenance activities to
prevent potential problems from occurring. Line maintenance
is typically carried out by trained and certied professionals
with the necessary knowledge and skills to work safely on
high-voltage electrical equipment. In some cases, specialized
equipment such as bucket trucks or aerial lifts may be used
to access transmission and distribution lines for maintenance
activities [46].
2) ATTACKS
a: DATA INJECTION ATTACK
A data injection attack in power systems, also known as a
manipulation attack, is a type of cyber-attack that involves
injecting false or malicious data into the control systems of
a power grid. The goal of this type of attack is to disrupt
the normal operation of the power grid and potentially cause
damage to the system.
Data injection attacks can take many different forms, but
they generally involve the attacker injecting false ormalicious
data into the control systems of the power grid to mislead the
operators or cause the system to malfunction. For example,
an attacker might inject false data into the control systems of a
power grid to indicate that there is a fault in the system, when
in fact there is not. This could lead to the operators taking
inappropriate or unnecessary actions to respond to the false
fault, which could potentially cause damage to the power grid.
Data injection attacks can be difcult to detect, as they
often involve the injection of small amounts of false data
into the control systems of the power grid. They can also be
difcult to prevent, as they require a high level of access to
the control systems of the power grid. Power grid operators
need to implement robust cybersecurity measures to protect
against these types of attacks [45].
b: RELAY SETTINGS CHANGE ATTACK
A relay settings change attack in power systems is a type of
cyber-attack that involves altering the settings of protective
relays in the power grid. Protective relays are electrical
devices that are used to automatically detect and respond to
abnormal conditions in the power grid, such as short circuits
or over currents. They are an essential component of the
power grid's protection system, as they help to ensure the
stability and reliability of the grid [45].
In a relay settings change attack, an attacker may attempt
to manipulate the settings of protective relays to disrupt the
regular operation of the power grid. For example, the attacker
may change the settings of the relays so that they do not
respond to certain types of fault conditions, or so that they
respond in a way that is not appropriate for the specic fault
condition. This can lead to widespread power outages and
other disruptions in the power grid [45].
Relay settings change attacks can be challenging to detect,
as they often involve subtle changes to the settings of the
protective relays. They can also be difcult to prevent, as they
require a high level of access to the power grid's control sys-
tems. Power grid operators need to implement robust cyberse-
curity measures to protect against these types of attacks [45].
c: TRIPPING COMMAND INJECTION ATTACK
It is a command kind of attack that makes the relay open
the circuit with a command received from a remote location.
A tripping command injection attack in power systems is a
type of cyber-attack that involves injecting false or malicious
commands into the control systems of a power grid in order
to disrupt the normal operation of the system. The goal of
this type of attack is to cause equipment to trip or shut down,
potentially leading to widespread power outages and other
disruptions in the power grid [45].
In a tripping command injection attack, an attacker may
inject false or malicious commands into the control systems
of the power grid in an effort to cause equipment to trip or
shut down. For example, the attacker might inject a command
to trip a circuit breaker or shut down a generator. This could
lead to widespread power outages and other disruptions in
the power grid. Tripping command injection attacks can be
challenging to detect, as they often involve the injection of
small amounts of false or malicious data into the control
systems of the power grid. They can also be difcult to
prevent, as they require a high level of access to the control
systems of the power grid. To defend against these kinds of
attacks, power grid operators must install strong cybersecu-
rity safeguards [45].
F. DEEP LEARNING PERFORMANCE EVALUATION
METRICS
Deep learning is a type of machine learning that uses deep
neural networks to learn and make predictions or decisions.
The performance metrics used to evaluate the effectiveness
of a deep learning model are similar to those used for other
types of machine learning models. Because of the task under
study and the kind of model being employed, we concentrate
only on the four threshold parameters that the classication
problem's performance metric is dened by
VOLUME 11, 2023 18667
144 Acta Wasaensia 
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
TABLE 2. Confusion matrix for a binary classifier.
Accuracy: This is a common metric for classication
tasks, and it is dened as the number of correct predictions
made by themodel divided by the total number of predictions.
Mathematically represented as [56]
Accuracy = TP+ TN
TP+ TN + FP+ FN (15)
Precision: This metric is used to measure the precision of
a classier, and it is dened as the number of true positive
predictions made by the model divided by the total number
of positive predictions [56].
Precision = TP
TP+ FP (16)
Recall: This metric is used to measure the recall of a classi-
er, and it is dened as the number of true positive predictions
made by the model divided by the number of positive cases
in the dataset [56].
Recall = TP
TP+ FN (17)
F1 score: This is a metric that combines precision and recall,
and it is dened as the harmonic mean of precision and
recall [56].
F1score = 2 .Presicion× Recall/
Presicion+ Recall (18)
Equations (15), (16), (17), and (18) are derived using the
confusion matrix. The confusion matrix is a table that is used
to evaluate the performance of a classier, and it is often used
in conjunction with various performance metrics to provide a
more complete picture of the classier's effectiveness.
IV. EXPERIMENTAL ANALYSIS
The experiment was performed in a Jupyter notebook on a
16GBRAM Intel 7 processor system. The proposedRF-RBM
technique was tested against conventional CNN, ANN, and
SVM algorithms because those were found to be success-
ful models in several intrusion detection studies [37], [49].
In this, the SVM is a machine learning-based technique,
whereas CNN and ANN are deep learning-based techniques.
We utilize the hyperparameters given in Table 3 for the
simulations, and we classify the network intrusion through
different algorithms.
One of the most used neural network algorithms, CNN,
can provide a higher accuracy rate when the training data
samples are plentiful. However, because CNN learns char-
acteristics from a large dataset, preprocessing of the training
data is minimal. Three layers make up a conventional CNN:
a convolution layer, a pooling layer, and a fully connected
TABLE 3. Hyperparameter settings.
layer. The convolution layer is set up to separate the kernel's
learnable parameters from the input data. The kernel claries
to the layer the kind of information that is available [47]
and [49]. Data is forwarded by the kernel to different neurons
in the pooling layer, which lowers the spatial complexity
of the retrieved information in the convolution layer. All of
the CNN's neurons are interconnected in the fully connected
layer with their biases toward comprehending the data that
has been gathered [50].
The ANN is one of the successful models that can mimic
the nature of the human brain. All neurons are interconnected
between them as different layers, just like in a human brain.
The input, output, and hidden layers are the principal layers
of an ANN, and the number of hidden layers can be increased
depending on the demands of a situation. The hidden layer is
used to extract different features and patterns from the input
data, while the input layer is used to provide diverse informa-
tion to the neural network design. Additionally, the hidden
layer applies a bias value to the gathered characteristics to do
an efcient calculation [48], [51].
The SVM is a supervised machine learning technique that
handles the classication problem by drawing the best dis-
tinction between the various classes. The optimal boundary
line can be determined by locating an extreme vector point
in the available dimension space. SVMs are frequently used
for binary classication and can be applied to multiple clas-
sications by generating a non-linear function that generates
new variables as the kernel [49], [52].
In machine learning, feature selection is a crucial operation
[53]. We opted for our algorithm because the meta-heuristic
nature-inspired algorithm can provide a strong foundation for
identifying patterns and anomalies in the data, by using the
input and output without needing gradient information [54].
The RBM can be used to learn and recognize more complex
features that may be indicative of an intrusion. Together, these
two approaches can provide a powerful tool for detecting and
responding to threats in smart grid systems.
A. RESULTS
The 15 sets of information from 37 types of power system
events were combined into a single dataset. For the experi-
ments in this paper, 70% of the data is used for training, and
30% is used for testing. Using the hyperparameter settings in
Table 3, the three distinct experiments are conducted.
18668 VOLUME 11, 2023
Acta Wasaensia 145
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
FIGURE 8. The accuracy of the conducted experiments.
FIGURE 9. The precision of the conducted experiments.
Figures 8, 9, 10, and 11 show the experiment's nd-
ings, demonstrating the accuracy, precision, recall, and f1
score of the veried algorithms in more detail. Figure 8
depicts the performance of the veried algorithms mea-
sured in terms of accuracy across all three experiments.
The results show that the accuracy of the algorithms in
the binary classication experiment consistently outper-
formed the other two experiments, with the exception of
the ANN algorithm in the three-class classication experi-
ment. In this case, the ANN algorithm performed slightly
better in the three-class classication experiment compared
to the binary classication experiment and the multi-class
classication.
According to the results depicted in Figure 9, the preci-
sion of the multi-class classication experiment improved
considering the three-class classication experiment, but this
improvement was only observed for the ANN algorithm.
These results suggest that the ANN algorithm may be more
effective at achieving higher precision in multi-class classi-
cation tasks. However, the performance of the multi-class
classication experiment was subpar when utilizing the CNN
and SVM algorithms.
FIGURE 10. The recall score of the conducted experiments.
FIGURE 11. The f1 score of the conducted experiments.
As illustrated in Figure 10, the recall of the experiment
revealed an improvement in the three-class classication for
both theANNand SVMalgorithms compared to the other two
experiments. The performance of the binary classication
experiment is higher for the proposed RF-RBM because the
sample counts on either one class in the binary classica-
tion are very large. However, the irregular distribution of
the three-class classication experiment is a result of the
signicant drop in data for the no-event class, leading to a
decrease in performance.
The results of the f1 score estimations shown in Figure 11
indicate that the outcomes of the three-class classication
are better in all the experiments, except for the proposed
RF-RBM. The proposed algorithm outperforms the other
three algorithms in three-class classication and multi-class
classication, but it extremely outperforms them in binary
classication.
Furthermore, we compare the results of this paper to the
result of comparable papers that employed the same dataset.
The comparison using the binary classication dataset is
shown in Table 4, and the three-class classication dataset
is shown in Table 5.
VOLUME 11, 2023 18669
146 Acta Wasaensia 
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
TABLE 4. Comparison of models results with binary classification dataset.
TABLE 5. Comparison of models results with three-class classification
dataset.
V. CONCLUSION
In this study, we present a nature-inspired restricted
Boltzmann machine algorithm to detect and classify the types
of attacks in the smart grids' SCADA systems. The funda-
mental notion is that the articial root foraging optimization
method is designed on the biological root growth optimiza-
tion algorithm. To demonstrate the optimization capability,
the dataset features were ne-tuned using the articial root
foraging algorithm before the neural network algorithm.
The proposed RF-RBM algorithm is compared to three
cutting-edge neural network algorithms in the experimen-
tal study, which was conducted in three categories: binary
classication, three-class classication, and multi-class clas-
sication. The outcomes of the experiments demonstrate that
the proposed algorithm RF-RBM is best suited for cyberat-
tack detection and classication in SCADA systems for smart
grids. This is shown by the excellent accuracy, sufcient pre-
cision, respectable recall, and a high f1 score demonstrated
by the proposed algorithm.
REFERENCES
[1] A. N. Milioudis, G. T. Andreou, and D. P. Labridis, `` Enhanced protection
scheme for smart grids using power line communications techniquesPart
I: Detection of high impedance fault occurrence,'' IEEE Trans. Smart Grid,
vol. 3, no. 4, pp. 16211630, Dec. 2012, doi: 10.1109/TSG.2012.2208987.
[2] C. P. Vineetha and C. A. Babu, `` Smart grid challenges, issues and solu-
tions,'' in Proc. Int. Conf. Intell. Green Building Smart Grid (IGBSG),
Apr. 2014, pp. 14, doi: 10.1109/IGBSG.2014.6835208.
[3] M. Zhengyou, `` Study on the application of advanced power electronics
in smart grid,'' in Proc. 6th Int. Conf. Future Gener. Commun. Technol.
(FGCT), Aug. 2017, pp. 14, doi: 10.1109/FGCT.2017.8103739.
[4] M. Cao, K. Cao, B. Wu, andM. Tan, `` Intelligent condition monitoring and
management for power transmission and distribution equipments in Yun-
nan power grid,'' in Proc. Int. Conf. High Voltage Eng. Appl., Sep. 2012,
pp. 811, doi: 10.1109/ICHVE.2012.6357153.
[5] J. Shair, H. Li, J. Hu, and X. Xie, `` Power system stability issues, clas-
sications and research prospects in the context of high-penetration of
renewables and power electronics,'' Renew. Sustain. Energy Rev., vol. 145,
Jul. 2021, Art. no. 111111.
[6] K. Ullah, A. Basit, Z. Ullah, S. Aslam, and H. Herodotou, `` Automatic
generation control strategies in conventional and modern power systems:
A comprehensive overview,'' Energies, vol. 14, no. 9, p. 2376, Apr. 2021.
[7] Y. Himri, S. M. Muyeen, F. H. Malik, S. Himri, K. A. bin Ahmad,
N. K. Merzouk, and M. Merzouk, `` A review on applications of the
standard series IEC 61850 in smart grid applications,'' in Cyberphysical
Smart Cities Infrastructures: Optimal Operation and Intelligent Decision
Making. 2022, pp. 197253.
[8] H. F. Habib, N. Fawzy, and S. Brahma, `` Performance testing and assess-
ment of protection scheme using real-time hardware-in-the-loop and IEC
61850 standard,'' IEEE Trans. Ind. Appl., vol. 57, no. 5, pp. 45694578,
Sep. 2021.
[9] A. Draz, M. M. Elkholy, and A. A. El-Fergany, `` Soft computing methods
for attaining the protective device coordination including renewable ener-
gies: Review and prospective,''Arch. Comput. Methods Eng., vol. 28, no. 7,
pp. 43834404, Dec. 2021.
[10] Y.-F. Li and C. Jia, `` An overview of the reliability metrics for power grids
and telecommunication networks,'' Frontiers Eng. Manage., vol. 8, no. 4,
pp. 531544, Dec. 2021.
[11] Y. Shi, Y. Li, Y. Zhou, R. Xu, D. Feng, Z. Yan, and C. Fang, `` Optimal
scheduling for power system peak load regulation considering short-time
startup and shutdown operations of thermal power unit,'' Int. J. Elect.
Power Energy Syst., vol. 131, Oct. 2021, p. 107012.
[12] A. Oshnoei, M. Kheradmandi, S. M. Muyeen, and N. D. Hatziargyriou,
`` Disturbance observer and tube-based model predictive controlled electric
vehicles for frequency regulation of an isolated power grid,'' IEEE Trans.
Smart Grid, vol. 12, no. 5, pp. 43514362, Sep. 2021.
[13] D. K. Panda and S. Das, `` Smart grid architecture model for control, opti-
mization and data analytics of future power networks with more renewable
energy,'' J. Cleaner Prod., vol. 301, Jun. 2021, Art. no. 126877.
[14] A. Ghasempour, `` Internet of Things in smart grid: Architecture, applica-
tions, services, key technologies, and challenges,'' Inventions, vol. 4, no. 1,
p. 22, Mar. 2019.
[15] M. Z. Gunduz and R. Das, `` Cyber-security on smart grid: Threats and
potential solutions,'' Comput. Netw., vol. 169, Mar. 2020, Art. no. 107094.
[16] M. Srivastava, `` An overview of cyber-security issues in smart grid,''
in Computer Networks, Big Data and IoT (Lecture Notes on Data
Engineering and Communications Technologies), vol. 66, A. Pandian,
X. Fernando, and S. M. S. Islam, Eds. Singapore: Springer, 2021,
pp. 643650, doi: 10.1007/978-981-16-0965-7_49.
[17] S. Liu, S. You, H. Yin, Z. Lin, Y. Liu, W. Yao, and L. Sundaresh, `` Model-
free data authentication for cyber security in power systems,'' IEEE Trans.
Smart Grid, vol. 11, no. 5, pp. 45654568, Sep. 2020.
18670 VOLUME 11, 2023
Acta Wasaensia 147
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
[18] P. P. Biswas, H. C. Tan, Q. Zhu, Y. Li, D. Mashima, and B. Chen, `` A syn-
thesized dataset for cybersecurity study of IEC 61850 based substation,''
in Proc. IEEE Int. Conf. Commun., Control, Comput. Technol. Smart Grids
(SmartGridComm), Oct. 2019, pp. 17.
[19] C. Mu, T. Ding, M. Qu, Q. Zhou, F. Li, and M. Shahidehpour, `` Decen-
tralized optimization operation for the multiple integrated energy systems
with energy cascade utilization,'' Appl. Energy, vol. 280, Dec. 2020,
Art. no. 115989.
[20] E. Hammad, M. Ezeme, and A. Farraj, `` Implementation and development
of an ofine co-simulation testbed for studies of power systems cyber secu-
rity and control verication,'' Int. J. Electr. Power Energy Syst., vol. 104,
pp. 817826, Jan. 2019.
[21] H. Wang, J. Ruan, B. Zhou, C. Li, Q. Wu, M. Q. Raza, and
G.-Z. Cao, `` Dynamic data injection attack detection of cyber physical
power systems with uncertainties,'' IEEE Trans. Ind. Informat., vol. 15,
no. 10, pp. 55055518, Oct. 2019.
[22] M. Ghiasi, M. Dehghani, T. Niknam, A. Kavousi-Fard, P. Siano, and
H. H. Alhelou, `` Cyber-attack detection and cyber-security enhancement
in smart DC-microgrid based on blockchain technology and Hilbert Huang
transform,'' IEEE Access, vol. 9, pp. 2942929440, 2021.
[23] Q.Wang,W. Tai, Y. Tang,M.Ni, and S. You, `` A two-layer game theoretical
attack-defense model for a false data injection attack against power sys-
tems,'' Int. J. Electr. Power Energy Syst., vol. 104, pp. 169177, Jan. 2019.
[24] A. Khan, M. Hosseinzadehtaher, M. B. Shadmand, D. Saleem, and
H. Abu-Rub, `` Intrusion detection for cybersecurity of power electronics
dominated grids: Inverters PQ set-points manipulation,'' in Proc. IEEE
CyberPELS (CyberPELS), Oct. 2020, pp. 18.
[25] B. Hu, C. Zhou, Y.-C. Tian, X. Hu, and X. Junping, `` Decentralized
consensus decision-making for cybersecurity protection in multimicro-
grid systems,'' IEEE Trans. Syst., Man, Cybern., Syst., vol. 51, no. 4,
pp. 21872198, Apr. 2021.
[26] H. Kure, S. Islam, and M. Razzaque, `` An integrated cyber security risk
management approach for a cyber-physical system,'' Appl. Sci., vol. 8,
no. 6, p. 898, May 2018.
[27] R. Lai, X. Qiu, and J. Wu, `` Robustness of asymmetric cyber-
physical power systems against cyber attacks,'' IEEE Access, vol. 7,
pp. 6134261352, 2019.
[28] T. Bailey, J. Johnson, and D. Levin, `` Deep reinforcement learning for
online distribution power system cybersecurity protection,'' in Proc. IEEE
Int. Conf. Commun., Control, Comput. Technol. Smart Grids (SmartGrid-
Comm), Oct. 2021, pp. 227232.
[29] E. Anthi, L. Williams, M. Rhode, P. Burnap, and A. Wedgbury, `` Adversar-
ial attacks on machine learning cybersecurity defences in industrial control
systems,'' J. Inf. Secur. Appl., vol. 58, May 2021, Art. no. 102717.
[30] M. Mohammadpourfard, Y. Weng, M. Pechenizkiy, M. Tajdinian, and
B.Mohammadi-Ivatloo, `` Ensuring cybersecurity of smart grid against data
integrity attacks under concept drift,'' Int. J. Electr. Power Energy Syst.,
vol. 119, Jul. 2020, Art. no. 105947.
[31] V. S. B. Kurukuru, M. A. Khan, and S. Sahoo, `` Cybersecurity
in power electronics using minimal dataA physics-informed spline
learning approach,'' IEEE Trans. Power Electron., vol. 37, no. 11,
pp. 1293812943, Nov. 2022.
[32] Y. Liu, J. Liu, L. Ma, and L. Tian, `` Articial root foraging optimizer
algorithm with hybrid strategies,'' Saudi J. Biol. Sci., vol. 24, no. 2,
pp. 268275, Feb. 2017.
[33] Y. Liu, J. Liu, L. Tian, and L. Ma, `` Hybrid articial root foraging opti-
mizer based multilevel threshold for image segmentation,'' Comput. Intell.
Neurosci., vol. 2016, pp. 116, 2016.
[34] X. He, H. Chen, B. Niu, and J. Wang, `` Root growth optimizer with self-
similar propagation,''Math. Problems Eng., vol. 2015, pp. 112, 2015.
[35] Z. Wang, M. V. Kleunen, H. J. During, and M. J. A. Werger, `` Root
foraging increases performance of the clonal plant potentilla reptans in
heterogeneous nutrient environments,'' PLoS ONE, vol. 8, no. 3, 2013,
Art. no. e58602.
[36] L. Ma, K. Hu, Y. Zhu, and H. Chen, `` A hybrid articial bee colony
optimizer by combining with life-cycle, Powell's search and crossover,''
Appl. Math. Comput., vol. 252, pp. 133154, Feb. 2015.
[37] L. Ma, Y. Zhu, Y. Liu, L. Tian, and H. Chen, `` A novel bionic algorithm
inspired by plant root foraging behaviors,'' Appl. Soft Comput., vol. 37,
pp. 95113, Dec. 2015.
[38] M. Kuchhold, M. Simon, and T. Sikora, `` Restricted Boltzmann machine
image compression,'' in Proc. Picture Coding Symp. (PCS), Jun. 2018,
pp. 243247, doi: 10.1109/PCS.2018.8456279.
[39] Z. Liu, R.Wang, N. Japkowicz, D. Tang,W. Zhang, and J. Zhao, `` Research
on unsupervised feature learning for Android malware detection based on
restricted Boltzmann machines,'' Future Gener. Comput. Syst., vol. 120,
pp. 91108, Jul. 2021.
[40] R. W. R. de Souza, D. S. Silva, L. A. Passos, M. Roder,
M. C. Santana, P. R. Pinheiro, and V. H. C. de Albuquerque, `` Computer-
assisted Parkinson's disease diagnosis using fuzzy optimum-path forest
and restricted Boltzmann machines,'' Comput. Biol. Med., vol. 131,
Apr. 2021, Art. no. 104260.
[41] L. Xing, K.Demertzis, and J. Yang, `` Identifying data streams anomalies by
evolving spiking restricted Boltzmann machines,'' Neural Comput. Appl.,
vol. 32, pp. 66996713, Jun. 2020.
[42] X. Lü, L. Meng, C. Chen, and P. Wang, `` Fuzzy removing redundancy
restricted Boltzmann machine: Improving learning speed and classica-
tion accuracy,'' IEEE Trans. Fuzzy Syst., vol. 28, no. 10, pp. 24952509,
Oct. 2020.
[43] K. Demertzis, L. Iliadis, E. Pimenidis, and P. Kikiras, `` Variational
restricted Boltzmann machines to automated anomaly detection,'' Neural
Comput. Appl., vol. 1, pp. 1520715220, Mar. 2022.
[44] Mississippi State University Critical Infrastructure Protection Center.
(Apr. 2014). Industrial Control System Cyber Attack Data Set.
[Online]. Available: http://www.ece.msstate.edu/wiki/index.php/ICS_
Attack_Dataset
[45] S. Pan, T. Morris, and U. Adhikari, `` Developing a hybrid intru-
sion detection system using data mining for power systems,'' IEEE
Trans. Smart Grid, vol. 6, no. 6, pp. 31043113, Nov. 2015, doi:
10.1109/TSG.2015.2409775.
[46] S. Pan, T. Morris, and U. Adhikari, `` Classication of disturbances and
cyber-attacks in power systems using heterogeneous time-synchronized
data,'' IEEE Trans. Ind. Informat., vol. 11, no. 3, pp. 650662, Jun. 2015,
doi: 10.1109/TII.2015.2420951.
[47] R. C. Borges Hink, J. M. Beaver, M. A. Buckner, T. Morris, U. Adhikari,
and S. Pan, `` Machine learning for power system disturbance and
cyber-attack discrimination,'' in Proc. 7th Int. Symp. Resilient Con-
trol Syst. (ISRCS), Aug. 2014, pp. 18, doi: 10.1109/ISRCS.2014.
6900095.
[48] B. Riyaz and S. Ganapathy, `` A deep learning approach for effective intru-
sion detection in wireless networks using CNN,'' Soft Comput., vol. 24,
no. 22, pp. 1726517278, Nov. 2020.
[49] L. Haghnegahdar and Y. Wang, `` A whale optimization algorithm-trained
articial neural network for smart grid cyber intrusion detection,'' Neural
Comput. Appl., vol. 32, no. 13, pp. 94279441, Jul. 2020.
[50] J. Qian, X. Du, B. Chen, B. Qu, K. Zeng, and J. Liu, `` Cyber-physical
integrated intrusion detection scheme in SCADA system of process man-
ufacturing industry,'' IEEE Access, vol. 8, pp. 147471147481, 2020.
[51] J. Kim, J. Kim, H. Kim, M. Shim, and E. Choi, `` CNN-based network
intrusion detection against denial-of-service attacks,'' Electronics, vol. 9,
no. 6, p. 916, Jun. 2020.
[52] M. Chora± and M. Pawlicki, `` Intrusion detection approach based on opti-
mised articial neural network,'' Neurocomputing, vol. 452, pp. 705715,
Sep. 2021.
[53] G. O. Young, `` Synthetic structure of industrial plastics,'' in Plastics,
vol. 3, J. Peters, Ed., 2nd ed. New York, NY, USA: McGraw-Hill, 1964,
pp. 1564.
[54] P. Agrawal, H. F. Abutarboush, T. Ganesh, and A. W. Mohamed, `` Meta-
heuristic algorithms on feature selection: A survey of one decade of
research (20092019),'' IEEE Access, vol. 9, pp. 2676626791, 2021, doi:
10.1109/ACCESS.2021.3056407.
[55] L. Wang, Q. Cao, Z. Zhang, S. Mirjalili, and W. Zhao, `` Articial
rabbits optimization: A new bio-inspired meta-heuristic algorithm for
solving engineering optimization problems,'' Eng. Appl. Artif. Intell.,
vol. 114, Sep. 2022, Art. no. 105082, doi: 10.1016/j.engappai.2022.
105082.
[56] X. Li, A. Zheng, X. Zhang, C. Li, and L. Zhang, `` Rolling element
bearing fault detection using support vector machine with improved
ant colony optimization,'' Measurement, vol. 46, no. 8, pp. 27262734,
Oct. 2013.
[57] O. A. Alimi, K. Ouahada, A. M. Abu-Mahfouz, and S. Rimer, `` Power sys-
tem events classication using genetic algorithm based feature weighting
technique for support vector machine,'' Heliyon, vol. 7, no. 1, Jan. 2021,
Art. no. e05936, doi: 10.1016/j.heliyon.2021.e05936.
[58] C. L. Huang and J. F. Dun, `` A distributed PSOSVM hybrid system with
feature selection and parameter optimization,'' Appl. Soft Comput., vol. 8,
pp. 13811391, Sep. 2008.
VOLUME 11, 2023 18671
148 Acta Wasaensia 
S. Y. Diaba et al.: Cyber Security in Power Systems Using Meta-Heuristic and Deep Learning Algorithms
SAYAWU YAKUBU DIABA (Graduate Student
Member, IEEE) was born in Suhum, Ghana.
He received the B.Eng. and M.Sc. degrees in
telecommunications engineering from the Kwame
Nkrumah University of Science and Technol-
ogy. He is currently pursuing the D.Sc. (Tech.)
degree in telecommunication engineering with the
University of Vaasa, Finland. He was formerly
employed with Electricity Company of Ghana,
where he worked as a Power Distribution Special-
ist for 13 years. His research interests include the use of machine learning
in smart grids, developing cyber security algorithms for smart grids SCADA
networks, and performance analysis of smart grids. He is also interested in
wireless communication and automation.
MIADREZA SHAFIE-KHAH (Senior Member,
IEEE) received the rst Ph.D. degree in electri-
cal engineering from Tarbiat Modares University,
Tehran, Iran, the second Ph.D. degree in elec-
tromechanical engineering from the University of
Beira Interior (UBI), Covilha, Portugal. He held
postdoctoral positions at UBI and the University of
Salerno, Salerno, Italy. Currently, he is a Professor
(tenure-track) with the University of Vaasa, Vaasa,
Finland. He has coauthored more than 500 papers
that received more than 12000 citations with an H-index of 62. His research
interests include electricity markets, power system optimization, demand
response, electric vehicles, price and renewable forecasting, and smart grids.
He has won ve best paper awards at IEEE conferences. He was con-
sidered one of the Outstanding Reviewers of the IEEE TRANSACTIONS ON
SUSTAINABLE ENERGY, in 2014 and 2017, the IEEE TRANSACTIONS ON POWER
SYSTEMS, in 2017 and 2018, and the IEEE OPEN ACCESS JOURNAL OF POWER
AND ENERGY, in 2020 and 2021; and one of the Best Reviewers of the IEEE
TRANSACTIONS ON SMART GRID, in 2016 and 2017. He is a Top Scientist in
the Research.com ranking in engineering and technology. He is an Editor of
the IEEE TRANSACTIONS ON SUSTAINABLE ENERGY and the IEEE OPEN ACCESS
JOURNAL OF POWER AND ENERGY; an Associate Editor of the IEEE SYSTEMS
JOURNAL, IEEE ACCESS, and IET-RPG; the Guest Editor-in-Chief of the IEEE
OPENACCESS JOURNALOF POWERAND ENERGY; and the Guest Editor of the IEEE
TRANSACTIONS ON CLOUD COMPUTING and more than 14 special issues. He is
also the Volume Editor of the book titled Blockchain-Based Smart Grids
(Elsevier, 2020).
MOHAMMED ELMUSRATI (Senior Member,
IEEE) received the B.Sc. and M.Sc. degrees
(Hons.) in electrical and electronic engineer-
ing from the University of Benghazi, Libya, in
1991 and 1995, respectively, and the Licentiate
of Science degree (Hons.) in technology and the
D.Sc. degree in technology, automation and con-
trol engineering from Aalto University, Finland, in
2002 and 2004, respectively. He is a Full Professor
of communication, automation, and digitalization
with the School of Technology and Innovations, University ofVaasa, Finland.
He has developed several international programs, such as the Communi-
cation and Systems Engineering Program and the Industrial Digitalization
Program.Now, he is theHead of the International Program of Sustainable and
Autonomous Systems (SAS). He has published about 160 papers, books, and
book chapters. His research interests include wireless communications, arti-
cial intelligence, machine learning, biotechnology, data analysis, stochastic
systems, and game theory.
18672 VOLUME 11, 2023
Acta Wasaensia 149
 
 
    
Publication IV 
150 Acta Wasaensia 
Neural Networks 165 (2023) 321332
Contents lists available at ScienceDirect
Neural Networks
journal homepage: www.elsevier.com/locate/neunet
SCADA securing system using deep learning to prevent cyber
infiltration
Sayawu Yakubu Diaba a,∗, Theophilus Anafo b, Lord Anertei Tetteh c,
Michael Alewo Oyibo d, Andrew Adewale Alola e,f, Miadreza Shafie-khah g,
Mohammed Elmusrati a
a Department of Telecommunication Engineering, School of Technology and Innovations, University of Vaasa, Vaasa, Finland
b Department of Electrical/Electronic Engineering, Cape Coast Technical University 5P3+F7H, Cape Coast, Ghana
c Department of Electrical and Electronic Engineering, Koforidua Technical University, 3P8P+F5F, Koforidua, Ghana
d National Bureau of Statistics, Abuja, Nigeria
e CREDS-Centre for Research on Digitalization and Sustainability, Inland Norway University of Applied Sciences, Norway
f Faculty of Economics, Administrative and Social Sciences, Nisantasi University, Istanbul, Turkey
g Department of Electrical Engineering, School of Technology and Innovations, University of Vaasa, Vaasa, Finland
a r t i c l e i n f o
Article history:
Received 15 February 2023
Received in revised form 14 May 2023
Accepted 23 May 2023
Available online 2 June 2023
Keywords:
Genetically seeded flora
Intrusion detection systems
Long short-term memory
Recurrent neural network
Residual neural network
And transformer neural network
a b s t r a c t
Supervisory Control and Data Acquisition (SCADA) systems are computer-based control architectures
specifically engineered for the operation of industrial machinery via hardware and software models.
These systems are used to project, monitor, and automate the state of the operational network through
the utilization of ethernet links, which enable two-way communications. However, as a result of
their constant connectivity to the internet and the lack of security frameworks within their internal
architecture, they are susceptible to cyber-attacks. In light of this, we have proposed an intrusion
detection algorithm, intending to alleviate this security bottleneck. The proposed algorithm, the
Genetically Seeded Flora (GSF) feature optimization algorithm, is integrated with Transformer Neural
Network (TNN) and functions by detecting changes in operational patterns that may be indicative of an
intruder's involvement. The proposed Genetically Seeded Flora Transformer Neural Network (GSFTNN)
algorithm stands in stark contrast to the signature-based method employed by traditional intrusion
detection systems. To evaluate the performance of the proposed algorithm, extensive experiments
are conducted using the WUSTL-IIOT-2018 ICS SCADA cyber security dataset. The results of these
experiments indicate that the proposed algorithm outperforms traditional algorithms such as Residual
Neural Networks (ResNet), Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM)
in terms of accuracy and efficiency.
' 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).
1. Introduction
Industry 4.0, also known as the Fourth Industrial Revolut-
ion (Smith & Fressoli, 2021), began in the early 2000s (Mon-
talban, Iradier, & Member, 2020) as a result of advancements
in internet communication and the development of automated
software and frameworks (Hoffmann Souza, da Costa, de Oliveira
Ramos, & da Rosa Righi, 2021). This has made it possible to
control the manufacturing process using simple computer pro-
grams and microcontrollers, leading to increased customization
of products (Chen & Chang, 2020; Rousopoulou et al., 2022).
∗ Corresponding author.
E-mail address: sdiaba@uwasa.fi (S.Y. Diaba).
URL: http://dx.doi.org/10.1016/j.cviu.2017.00.000 (S.Y. Diaba).
Research is ongoing to develop self-decision-making control sys-
tems for manufacturing processes and to enable remote moni-
toring and control of these processes (Hassan Malik, Alam, Ku-
usik, & Moullec, 2020; Jasperneite, Sauter, & Wollschlaeger, 2020;
Sarker, 2022). SCADA systems are mainly used to control and
monitor (Kumar & S, 2020), components of vital infrastructures
(Kirubakaran, 2020), such as smart grids, pipelines, transporta-
tion, telecommunication, and manufacturing plants (Lee & Hong,
2020). The SCADA systems can also act as a status projector
for monitoring the operation of the Industrial Control Systems
(ICS). It can be integrated with a Programmable Logic Controller
(PLC) and other control technologies like Proportional Integral
Derivative (PID) controllers (V, 2020). SCADA devices have a high
operational speed, enabling real-time data analysis.
The high integration of communication infrastructure in the
smart grid and the connections to the internet (Cherifi & Hamami,
2018; Yang, McLaughlin, Sezer, Yuan, & Huang, 2014) in the
https://doi.org/10.1016/j.neunet.2023.05.047
0893-6080/' 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Acta Wasaensia 151
S.Y. Diaba et al. Neural Networks 165 (2023) 321332
Fig. 1. The architectural overview of a SCADA system.
Table 1
Comparison between SCADA and IoT.
Parameters SCADA IoT
Communication
medium
Semi-wireless Wireless
Storage Local Cloud
System integration Limited integration to
the peripheral
Easily integrate with
the peripheral
Operational reliability High Low
Suitability Suitable for a big
production line
Suitable for minor
applications
Threat possibility Medium to high High
SCADA system have created a lacuna for cyber-attacks (Altunay,
Albayrak, Ozalp, & Cakmak, 2021; Singh, Ebrahem, & Govindarasu,
2019). A technology designed with the sole purpose of grant-
ing cyber protection for computers, computer networks, data
transmissions, and legitimate access is termed Cybersecurity. The
cyber-security systems are primarily a setup of computer (host)
security systems and network security systems. Each of these
has, at minimum, antivirus software, a firewall, and Intrusion
Detection System (IDS) (Singh, Garg, Kumar, & Saquib, 2015).
SCADA system security concerns are receiving more attention
as the frequency of security incidents against these crucial infras-
tructures is rising (Samdarshi, Sinha, & Tripathi, 2016). Though,
the presence of cyber threats in SCADA systems are comparatively
less compared to the Internet of Things (IoT) systems because the
SCADA networks are not connected to an open internet as the IoT
systems (see Fig. 1).
Yet, the SCADA systems are in the third top position in terms
of attaining frequent threat disturbances amongst the other appli-
cations. The cyber-attacks are targeted at the SCADA systems for
manipulating the operational control of the network. The hackers
may cause damage to the power system when they have control
over the switches, isolators, and relays under the command of the
SCADA systems (see Table 1).
These attacks' repercussions may jeopardize the safety, avail-
ability, reputation, profitability, and reliability of the targeted
organizations (Liu & Wang, 2022). Traditional SCADA systems
were not created with a cyber-securing protocol, albeit at the
moment some models do have certain firewall security measures
activated. The SCADA systems are still a target for hackers (Altaha,
Lee, Aslam, & Hong, 2020) who exploit the firewall's weaknesses.
To protect the communication infrastructure of the smart grid, it
is essential to develop a SCADA network intrusion detection solu-
tion that considers both operational requirements and particular
traffic characteristics of SCADA systems (see Table 2).
Motivated by the above facts, the following are regarded as
the encapsulation of the primary contributions of this paper:
• We investigate the application of deep learning techniques
in detecting cyber threats in both industrial and general
settings with a specific focus on SCADA systems used in the
smart grid. We explore the potential of these techniques
in identifying and mitigating cyber-attacks in various en-
vironments, highlighting their effectiveness and limitations.
We also investigate the possible integration of deep learning
techniques with existing security systems to enhance their
performance and overall security.
• We propose an algorithm for detecting cyber intrusions by
analyzing changes in operational patterns that are related to
intrusion activity. To achieve this, we utilize a GSFTNN algo-
rithm that is custom-made for the specific task of intrusion
detection. The algorithm is designed to identify anoma-
lies in the operational patterns and flag them as potential
intrusions.
• We perform extensive simulations using the WUSTL-IIOT-
2018 ICS SCADA cyber security dataset to evaluate the ef-
fectiveness of deep learning techniques in detecting cyber-
attacks in SCADA systems. The simulation process consists
of two stages: binary classification and multiclass classifica-
tion. In the binary classification stage, the data is classified
as normal and attacks. In the multiclass detection stage, the
data is further classified into three categories: exploiting
attacks, aggressive attacks, and normal traffic. The results of
the simulation are used to analyze the effectiveness of the
deep learning techniques and to pinpoint possible areas for
development.
The remainder of the paper is structured as follows; related stud-
ies are presented in Section 2. The methodology is in Section 3
where we give the data description as well as the types of attacks
and the summary of the attacks therein. In Section 4, experi-
mental analysis is presented and finally, the paper's conclusion
is presented in Section 5.
2. Related studies
In the area of SCADA security, older papers have investigated
the use of machine learning techniques to enhance security. For
example, the authors of Maglaras and Jiang (2014) proposed
a machine learning-based approach for detecting anomalies in
SCADA systems and compares the performance of several differ-
ent algorithms, including neural networks, support vector ma-
chines, and decision trees. The authors proposed a novel method
for detecting intrusions in the SCADA system, which can identify
abnormal activity even if an attacker attempts to conceal it in the
control layer of the system. To assess the effectiveness of the al-
gorithms, supervised machine learning models were examined to
categorize normal and abnormal behaviors in an ICS. The authors
322
152 Acta Wasaensia 
S.Y. Diaba et al. Neural Networks 165 (2023) 321332
Table 2
Some types of attacks involved in the SCADA system.
Attack types Reflection Initiation
Denial of Service Enforcing maximum traffic to the network to block the actual
communication
Poor authentication platform
Ransomware attacks Malfunction and operational block of PLCs Vulnerable hardware
Malicious node attacks Execution of unauthorized operation Web interface with an outdated operating system
Phishing attacks Control over the SCADA system Absence of network isolation and weak authentication
Worm attacks Blocks access/operation No network isolation
Honeypot attacks Reframe the device function Weak servers and vulnerable policies on security
used several machine learning models in examining the models,
and they performed well at spotting abnormalities, particularly
stealthy attacks. According to the findings, random forest out-
performs other classifier algorithms (Mokhtari, Abbaspour, Yen,
& Sargolzaei, 2021).
In Lopez Perez, Adamsky, Soua, and Engel (2018) a machine
learning approach for intrusion detection in SCADA systems was
accessed on a real-world dataset. The authors find that the ran-
dom forest detects intrusion effectively. Older papers demon-
strate that machine learning has been a topic of interest in the
area of SCADA security for at least a decade and that the use of
machine learning for enhancing SCADA security is not a new idea.
The authors of Teixeira et al. (2018) looked at cyber-attacks that
use AI-based techniques and found some mitigation techniques
that can be used to stop such attacks. Also, they examined current
trends in AI-based cyber-attacks and were able to identify the
methodologies and strategies currently used in executing AI-
based cyber-attacks as well as what future scenarios will likely
be conceivable to control such attacks.
Several studies have investigated the use of artificial neural
networks (ANN), convolutional neural networks (CNN), and RNN
to detect and prevent cyber-attacks in SCADA systems. These
methods have been demonstrated to be successful in detecting
and preventing a wide range of cyber-attacks, including malware,
phishing, and distributed denial-of-service attacks (Al Husaini,
Habaebi, Hameed, Islam, & Gunawan, 2020; Balla, Habaebi, Is-
lam, & Mubarak, 2022; Khan, Zhang, Alazab, & Kumar, 2019). A
CNN (P, Hong, Gao, Yao, & Zhang, 2020; Wu, Hong, & Chanussot,
2022) defining the significant temporal patterns of SCADA com-
munication and pinpointing time windows that are vulnerable
to network attacks rather than hand-crafted characteristics for
specific network packets or flows was proposed. The authors
provided a re-training method to manage instances of network
attacks that have never been detected before. The study utilized
actual SCADA traffic datasets and the results demonstrate that the
deep-learning-based technique that has been proposed is suitable
for SCADA systems' network intrusion detection, attaining high
detection accuracy and offering the capacity to address newly
emerging threats (Altaha et al., 2020; Yang, Cheng, & Chuah,
2019).
The use of deep learning for enhancing the security of SCADA
systems has been a growing area of research in recent years. Stud-
ies that focus on deep learning, suggest that this area of research
has advanced significantly (Gao et al., 2023; Wu, Hong, & Chanus-
sot, 2023), and that deep learning is a promising direction (Yang
& Chen, 2019) for enhancing the security of SCADA systems. With
the increasing use of technology in critical infrastructures, such as
medical devices, power plants, and water treatment facilities (Lee
& Hong, 2020; Pliatsios, Sarigiannidis, Lagkas, & Sarigiannidis,
2020), the need for robust and secure SCADA systems is more
pressing than ever. Cyber-attacks on SCADA systems can result
in significant harm, including disruption of essential services, loss
of sensitive information, and physical damage to equipment. To
address these concerns, many researchers have turned to deep
learning as a promising solution for enhancing the security of
SCADA systems (Avola, Cinque, Fagioli, & Foresti, 2022).
The research in Wang, Harrou, Bouyeddou, Senouci, and Sun
(2022) presented a stacked deep learning-driven method for de-
tecting cyber-attacks. The relevant aspects of the suspicious be-
haviors were thoroughly learned by the proposed stacked deep
learning model, which then distinguishes them from normal ac-
tions. As a result, the stacked deep learning-based intrusion de-
tection approach performs better than some cutting-edge shallow
methods, such as the standalone deep learning models, naive
Bayes, random forests, nearest neighbor, oneR, AdaBoost, and
support vector machine. The research in Jmila and Houda (2022)
focuses more on shallow classifiers, which are still often em-
ployed in machine learning-based IDS because of their maturity
and ease of usage. The authors tested the resistance to vari-
ous adversarial approaches often utilized in the state-of-the-art
of AdaBoost, bagging, decision tree, gradient boosting, logistic
regression, random forest, support vector classifier, and even a
deep learning network. A Gaussian data augmentation defensive
method was implemented and its impact on increasing classifier
robustness was assessed. The findings demonstrate that not all
classifiers are affected equally by attacks, that a classifier's ro-
bustness relies on the attack, and that depending on the network
intrusion detection scenario, a trade-off between performance
and robustness must be considered.
It is worth noting that while deep learning has shown great
promise in enhancing SCADA security, there are still many chal-
lenges to overcome. For example, deep learning models can be
vulnerable to adversarial attacks (Jmila & Houda, 2022; Ozdag,
2018) and the quality of training data can significantly impact
the performance of these models. Nevertheless, the research in
this area suggests that deep learning is a promising direction for
enhancing the security of SCADA systems. It has the potential
to enhance the security of SCADA systems in a variety of ways,
including detecting and preventing cyber-attacks, mitigating sys-
tem failures, protecting sensitive information, and enhancing the
security of communication networks. However, as with any new
technology, there are still many challenges to overcome, such as
improving the robustness of deep learning models, addressing
the issue of data scarcity, and developing secure deep learning
systems that are resistant to adversarial attacks.
3. Methodology
Addressing cyber intrusion in SCADA is the main motive of the
proposed algorithm and it is implemented in this paper with a
deep learning-based approach. The algorithm is a hybrid of GSF
and TNN and it is compared to ResNet, RNN, and LSTM models for
identifying the best-performing algorithm in detecting intrusions
in SCADA systems. Washington University St. Louis-Industrial IoT-
2018 (WUSTL-IIOT-2018) dataset for ICS SCADA cybersecurity
used in Ahakonye, Nwakanma, Lee, and Kim (2023) is the dataset
used in this study to examine the efficiency and accuracy of the
above-mentioned algorithms.
3.1. Data
The WUSTL-IIOT-2018 ICS SCADA is a collection of network
traffic data captured from a real-world ICS that was intentionally
323
Acta Wasaensia 153
S.Y. Diaba et al. Neural Networks 165 (2023) 321332
Fig. 2. The summary of attacks available in the dataset.
subjected to cyber-attacks. The dataset was created to evaluate
intrusion IDS in cyberphysical systems. It was collected from
a water treatment testbed in the United States, a representative
example of a real-world industrial control system. To generate the
intrusion data, the testbed is connected to a network monitoring
system accompanied by a scan tool, and its features are summa-
rized to form a dataset. The following are the attacks available in
the dataset along with their data generation procedure.
3.1.1. Port scanner attack
The port scanner attacks are included in the SCADA system for
observing its active ports in the operation and control process.
To generate such attacks, certain targeted nodes are generated at
different frequencies of time using the Nmap tool. At the same
time, the Transmission Control Protocol (TCP) connection is also
partially disabled for allowing the attack to generate the data
attributes in the testbed.
3.1.2. Address scanner attack
The address scan attacks are generated to observe the Modbus
server address. This allows the attacker to reach the connected
hardware devices of the SCADA system for implementing the
malfunctioning algorithm. In general, the SCADA systems are
engaged with only one Modbus address and it gives an open path
to the intruder for generating different types of attacks when it
is tracked.
3.1.3. Device identification attack
The attackers are creating the device identification attack to
find out the specification and model numbers of the connected
devices to the SCADA network. It can be produced by tracking
the Modbus slave identification of the targeted SCADA system.
Thus, the vulnerable hardware devices must be verified regularly
in the SCADA systems. In some cases, device identification attacks
are avoided by having an additional authentication process.
3.1.4. Aggressive model device attack
In the aggressive mode of device identification attack, the
information of all the slave buses is collected along with the Mod-
bus slave identification. These kinds of attacks are employed in
the SCADA network to freeze all the connected hardware without
sending any malicious nodes. In real-time applications, each con-
nected hardware is implemented with a separate authentication
process. This improves the complexity of the security algorithm
and restricts the success rate of aggressive mode attacks.
Table 3
Sample of the employed dataset.
Sport Tpkt Tbyte Spkt Dpkt Sbyte Tgt
143 2 180 2 0 180 0
68 2 684 2 0 684 0
0 1 60 1 0 60 0
61845 20 127 10 10 644 0
61846 20 127 10 10 644 0
44287 6 372 4 2 248 1
48456 20 128 12 8 776 1
48458 20 139 12 8 782 1
44460 20 128 12 8 776 1
61850 12 780 6 6 396 0
61849 12 780 6 6 396 0
61848 18 1152 10 8 644 0
3.1.5. Exploit attack
In exploit attacks, information about the operational state
of PLC coils is obtained to understand how the SCADA system
is currently functioning. This allows attackers to replicate the
manufacturing process and produce identical products. A system
for generating and using inspection records was used to observe
the movements of normal and malicious nodes in a testbed model
during the dataset generation process. The testbed model was
made to run continuously for 25 h to monitor the changes in the
network. Fig. 2 presents the summary of attacks available in the
dataset utilized.
Table 3 includes several features that describe various aspects
of network communications. One of these features is the source
port (sport), which represents the number of unique source ports.
Another feature is the total packets (TotPkt), which represent
the total number of packets involved in the communications.
Additionally, the total bytes (TotBytes) feature indicates the to-
tal number of bytes transferred. Two other features included in
Table 3 are the source packets (SrcPkts) and destination packets
(DstPkts). These features represent the number of packets trans-
mitted from the source and the number of packets received at
the destination, respectively. Finally, the Source bytes (SrcBytes)
feature indicates the number of bytes transferred from the source
to the destination during communications. Together, these fea-
tures provide a comprehensive picture of the different aspects
of network communications that can be used to analyze and
understand network traffic patterns (see Fig. 3).
3.2. Preprocessing
The total number of traffic data available in the dataset is
7 037983 counts. In that 427206 instances are attack-oriented
324
154 Acta Wasaensia 
S.Y. Diaba et al. Neural Networks 165 (2023) 321332
Fig. 3. The design display of the proposed model.
Table 4
Summary of the dataset.
Data type Timetable
Number of variables 6
Number of rows 7037983
Number of variables with missing 0
Number of variables with duplicate 6
Timestamp is regularly spaced True
Timestamp has missing False
Timestamp has duplicates False
Timestamp sorted True
and the remaining 6610777 belong to the normal traffic cate-
gory. In the proposed algorithm, a dataset split ratio of 70:30
is employed after cross-validation of five folds is implemented
in the phase 1 simulations, the proposed algorithm is utilized
to identify the normal and attack traffics, and in phase 2, the
proposed algorithm was experimented to categorize the traffics
as normal traffic, exploiting attacks and aggressive mode attack.
The workflow followed in the proposed algorithm does not
have any pre-processing step as the data available in the dataset
are already pre-processed using the data cleaning and labeling
process. However, the raw data values collected from the network
monitoring tools may have some missing and error data due to
sudden fluctuations in its operation. Therefore, a data cleaning
(data cleaner app in MATLAB 2022b) process was used to ma-
nipulate the missing values in the dataset. The data cleaner app
in MATLAB 2022b is an interactive tool for locating messed-up
column-oriented data, cleaning numerous variables of data at
once, and improving the cleaning process. A total of 7 037983
samples and 6 variables dataset was loaded into the data cleaner
app. We set the data cleaner app to use only standard indicators
to detect missing values such as not a number (NaN), not a
time (NaT), and cell of character vectors. The remove missing
method is used to remove the data rows with missing entries.
The outliers are another type of error usually present in the
data with an unusual entry. To handle it, we used the fill outlier
cleaning method with linear interpolation as the filling method.
The method of detection is the moving mean and the threshold
was set at 3. Tables 4 and 5 explore the data summary after
processing.
3.2.1. Feature optimization
The feature optimization process is employed in this work
to handle the abnormalities in the available dataset. In some
cases, the feature attributes may remain almost the same on
different attack data. Hence it worsens the misclassification and
reduces the precision level of the classifier system. A GSF feature
optimization algorithm is utilized in the proposed algorithm to
avoid such limitations. GSF is an upgraded version of an artificial
flora optimization technique that selects the connecting point
relevancy based on the seed-growing property of the respective
points. The GSF model is equipped with a genetic algorithm for
estimating the best seed-growing points. The genetic algorithm
estimates the location by analyzing the propagation distance
among the points along with the plant weights. The propagation
distance dy of the seeds is predicted using the equation written
in (Cheng, Wu, & Wang, 2018; Selvarajan, Shaik, Ameerjohn, &
Kannan, 2020).
dy = dy1 . × j1/+ dy2 . × j2/ (1)
where j1 and j2 stand for searching coefficients. The uniform
random numbers between 0 and 1 are generated by rand and
denoted by  . The grandparent's propagation distance and the
parent's propagation distance are denoted by dy1 and dy2 respec-
tively. The two main steps of the flora optimization technique
are the spreading and selection behavior. Thus, the position of
the plant is determined using the matrix Pi;y, the dimension is
denoted by i and y represents the total number of plants in the
flora. The equation for the spreading process can be written as
Pi;y =  × d .2− d/ (2)
where d is for the maximum limit area. Since the weight value
may be determined by the standard deviation of the propagation
distance between the parent plant and offspring plant when
updating the plant position, we can express the equation as
 =
√∑N
i=1
(
Pi;y − P ′i;y
)2
N
(3)
The present position of the offspring plant is estimated as
P ′i;y∗j = Di;y∗j + Pi;y (4)
where, the position of the original plant is denoted by Pi;y, j is
the maximum number of seeds that a single plant can produce,
Di;y∗j is for the random estimation of Gaussian distribution value
with zero mean and j variance. The final best estimation of the
offspring plant is estimated by the probability of survival given
as
P =
⏐⏐⏐⏐⏐⏐⏐⏐
√F (P ′i;y∗j)
Fmax
⏐⏐⏐⏐⏐⏐⏐⏐ ∗ Q
y∗j−1
x (5)
325
Acta Wasaensia 155
S.Y. Diaba et al. Neural Networks 165 (2023) 321332
Table 5
Data summary after preprocessing.
Missing count Minimum Maximum Mean Median Mode Standard deviation
Time 0 00:00:00 69:46:38 34:53:19 53.09 00:00:00 20000
Sport 0 0 42238 0.00003589 35458 354 0.00621
TotPkt 0 1 96057 9.4518 2 2 66196
TotBytes 0 60 98180744 0.0003317 124 124 4.156e−5
SrcPkts 0 1 96057 8.0302 2 2
DstPkts 0 0 4881 1.6534 0 0 35.7538
SrcBytes 0 60 6942046 915.1435 124 124 4.81e−4
The selective probability is represented by Qx and the value
of Qx must fall within the range of 0 and 1. According to the
authors of Cheng et al. (2018), having a higher value of Qx is
desirable for problems that can easily get into a local optimal
solution. P ′i;y∗j represents the fitness of the jth solution. Fmax
represents the flora's maximum fitness. The best characteristics
among the overall attributes are chosen from this phase and sent
to the classification step. In doing so, the classifier automates the
rule-generation process to predict the class label. This improves
classification accuracy.
3.3. Classifier
Classifiers are like algorithmic filters used to segregate the
given data samples into their respective classes based on the
instructions learned from their training samples. The learning
process of a classifier model can be refined by implementing a
customized preprocessing or feature selection model. The neural
network algorithm assigns the testing sample into a particular
class based on their similarity score calculated from the compari-
son. The proposed algorithm explores the following classifiers on
the SCADA cyber-attack dataset to find the most suitable model
for the real-time application. Classifiers are used to classify data
samples into different classes based on the instructions learned
from the training samples. Many different types of classifiers can
be used for various applications. Some common types include:
Decision Trees: These classifiers use a tree-like structure to make
decisions. Each internal node represents a feature of the input
data, and each leaf node represents a class label. The algorithm
starts at the root node and follows the branches based on the
feature values of the input data until a leaf node is reached, which
determines the class label of the input.
Naive Bayes: This is a probabilistic classifier that makes class
predictions based on the probability of each class given the input
features. The ``naïve'' part of the name refers to the assumption
that the features are independent of each other, which is not
always true in real-world data.
Neural Networks: These classifiers use a network of artificial
neurons to make class predictions. The neurons are organized into
layers, with the input layer receiving the input features, one or
more hidden layers processing the information, and the output
layer producing the class predictions. The network learns to make
accurate predictions by adjusting the weights of the connections
between the neurons.
Random Forest: This ensemble technique combines different
decision trees to make class predictions. Each tree is trained using
a different random subset of the input data, and the consensus of
all the trees is used to get the final prediction.
Support Vector Machine: This is a type of linear classifier that
finds the best boundary (or hyperplane) to separate the input data
into different classes. The algorithm is based on finding the line
that maximizes the margin, which is the distance between the
boundary and the closest points from each class.
In the case of SCADA cyber-attack dataset, it is crucial to
find the most suitable model that can quickly and accurately
identify cyber-attacks in real-time. The proposed algorithm may
compare the performance of these different classifiers on the
dataset and select the one that achieves the highest accuracy or
lowest false-positive rate.
3.4. Transformer neural network
The TNN algorithm was proposed in the year 2017 to over-
come the limitation of computational complexity in many neural
network algorithms. It is achieved by utilizing the Graphic Pro-
cessing Unit (GPU) sources effectively by processing the input
data simultaneously. Therefore, the time required for the training
process is also limited in the TNN. TNNs are structured with
a multi-headed attention layer for learning the input data that
allows processing the data in a parallel process. However, in the
traditional RNN, the values are considered in sequential order.
The TNN is very efficient in natural language processing problems
and data mining problems, the architecture is shown in Fig. 4. The
encoder and decoder are the two major blocks involved in the
TNN architecture and it has a positional encoding block right in
front of the encoder block. The role of the positional embedding
block is to determine the value of the inputs denoted by (x) at
different attributes' places.
In the proposed algorithm the features are counted from
`F1 : : : F5'. The value of x at F1may not have the same weight at
F3. The values on certain features may remain the same even if
the output classes are different. The positional encoding model
addresses such issues and assigns the weights of x into a unique
parameter while storing it in the neurons of the TNN. The encoder
consists of multi-head attention and a feed-forward block, where
the multi-head attention (Selvarajan et al., 2020) has a pair of
sub-layers. The input parameters are learned by the multi-head
attention block in terms of queries, keys, and range format. The
collected parameters are operated with a learnable linear trans-
formation for n times. A constant value is applied in this block as
a tuning parameter for operating it to the product of the query
with all keys. The output values from the blocks are observed
from a SoftMax function from the value of its corresponding
weight. The attention outputs are linearly gathered to form a final
output where a normalization step is added. Hence the residual
connection of the input data is estimated.
Qini = xWQinp (6)
Kinp = xW Kinp (7)
Vinp = xW Vinp (8)
where, x represents the input, and W represents the customized
constant value of the input. The query, keys, and value parameters
of the input are represented by Q, K, and V respectively.
headinp = 
(
Qinp ∗ Kinp ∗ Vinp
)
(9)

(
Qinp ∗ Kinp ∗ Vinp
) = softmax (Qinp ∗ K Tinp√
d
)
Vinp (10)
326
156 Acta Wasaensia 
S.Y. Diaba et al. Neural Networks 165 (2023) 321332
Fig. 4. The architecture of the TNN.
norm = x+ headinp (11)
head2inp = norm
(
x+ headinp
)
(12)
The output attributes obtained from the multi-head attention
blocks are moved to a feed-forward network (FFN) for observing
an improved output. The normalization process is also included
in the output of FFNs and the value of FFN is analyzed as follows
FFN =  ./W1 + b1 (13)
xout = norm(head2inp + FFN) (14)
where  represents the rectified linear product and  is for the
gated recurrent block.
3.5. Recurrent neural network
The RNN models were developed to regularize the data move-
ment inside the neural network. In the traditional neural network
models, the input parameters were allowed to move from one
neuron to another neuron without considering anything. As a
result, some neurons are unaware of the status of other attributes
taken from the input. The RNN regularizes this by making all the
attributes follow a sequence movement inside the neural net-
works. The involvement of the hidden layer makes the neurons
store the hidden information regarding the previous attributes,
so a small amount of data storage is allocated to each neuron. In
some cases, the RNN models are implemented with more than
one hidden layer block. There the weight and bias of each hidden
will get change from each other to store the different feature
information from the given input. Hence the layers included
between the input and output layers are independent and do not
consider the formation of other hidden layers. The independence
of hidden layer weights and biases is making the RNN more
complex than their previous models. In some applications, the
weights and biases are regularized with the same value, improv-
ing computational efficiency. The current state of the neurons is
analyzed by
Curs = f .Curs−1; Inps/ (15)
where Cur s represents the present state and Inps denotes the
input state. The activation function of the current state is applied
as Eq. (15) and the output is predicted by Eq. (16).
Curs = tanh
(
WrecCurs−1 +WinpInps
)
(16)
Outs = tanh .WoutCurs/ (17)
327
Acta Wasaensia 157
S.Y. Diaba et al. Neural Networks 165 (2023) 321332
Fig. 5. The architecture of a residual learning.
3.6. Long short-term memory
The LSTM network (Karim, Fazle, Majumdar, & Darabi, 2019)
was developed to address the vanishing and exploding gradients
problems found in RNN. The LSTM networks were trained to
erase the irrelevant information stored in the neuron from the
given input. It is achieved by implementing the network with a
customized activation function called gates. The internal cell state
of the neurons is having the useful information extracted from the
training data which is required for the upcoming operation. The
LSTM network reads the state of the input gate, modulation gate,
forget gate, and output gate by calculating it with element-wise
multiple vectors of the given input. Then the neurons will erase
the information of the forget gate and combine the information
of the input and modulation gate to form output as
Curs =
(
Ginp ∗ Gmodinp
)+ (Gfor ∗ Curs−1) (18)
where Ginp takes gate input, Gmodinp stands for gate-modulated input
and it is Gfor for forget gate
3.7. Residual network
ResNet models were proposed to observe more complex fea-
tures from the given input attributes with a greater number
of hidden layers. Each layer in the ResNet is allowed to take
some specific feature from the given input. The main idea behind
ResNet is to allow the network to learn residual functions, rather
than learning the full mapping from input to output. In a residual
network, each layer has a shortcut connection that bypasses one
or more layers and directly connects the input of the current layer
to the output. This allows the network to learn the residual, or
the difference between the input and the output of the layer,
which is easier to optimize than the full mapping. The residual
functions are then added to the output of the corresponding
layer, allowing the network to effectively learn the full mapping.
ResNet has shown impressive results on a wide range of com-
puter vision tasks, including image classification, object detection,
and semantic segmentation. Its architecture has inspired many
subsequent neural network models and has become a benchmark
for deep learning research (He, Zhang, Ren, & Sun, 2016a, 2016b).
However, in some cases, the ResNet was giving poor accuracy by
having some unwanted features in its operations. It is addressed
by the recent year ResNets by adding dropout and regularization
blocks. The architectural view of the ResNet is shown in Fig. 5
with its operational outcome.
4. Experimental analysis and discussion
The experimental work is performed in two phases, binary
classification, and multiclass classification. In binary classification,
the given information is segregated as normal and attacks. In
multiclass detection, the data are classified as exploiting attacks,
aggressive attacks, and normal traffic. The performances of the
Table 6
The hyperparameter setting.
Parameter Total
Validation scheme Cross-validation
Cross-validation folds 5 folds
Epoch 1000
Activation functions Softmax, ReLu
Maximum number of split 100
Split criterion Gini's diversity index
Fig. 6. Phase 1 performance analysis on the verified algorithms.
GSFTNN, RNN, LSTM, and ResNet models are verified with their
accuracy, precision, recall, detection rate, and f1 score in both
phases (see Tables 6 and 7).
Tables 8 and 9 are representing the performances of the veri-
fied algorithms in phase 1 and phase 2 respectively.
The results presented in Fig. 6 demonstrate that the proposed
GSFTNN model achieves a high accuracy of 98.54%, indicating its
ability to correctly classify a significant majority of the cases.
In comparison, the RNN model achieves an accuracy of 94.22%,
while the LSTM and ResNet models exhibit accuracy rates of
95.98% and 97.7%, respectively. The proposed GSFTNN model also
shows an average precision of 98.7% across all normal and attack
categories, outperforming the RNN (93.64%), LSTM (96.3%), and
ResNet (98.1%) models. The recall values were 98.42% (Proposed
GSFTNN), 93.73% (RNN), 95.78% (LSTM), and 97.54% (ResNet). The
F1 score provides a well-rounded evaluation of system perfor-
mance. In the case of the proposed GSFTNN model, the F1 score
is reported as 98.61%, with the RNN, LSTM, and ResNet achieving
scores of 94.38%, 96.17%, and 97.89%, respectively.
The result depicted in Fig. 7 indicates the proposed GSFTNN
model attained an accuracy of 99.12%, outperforming the compar-
ative models, which attained accuracies of 96.4% (RNN), 97.25%
(LSTM), and 98.1% (ResNet), respectively. Additionally, the pro-
posed model attained an average precision of 99.26%, while the
precision values for RNN, LSTM, and ResNet were 96.82%, 97.57%,
and 98.33%, respectively. The recall values were 98.85% (Proposed
328
158 Acta Wasaensia 
S.Y. Diaba et al. Neural Networks 165 (2023) 321332
Table 7
Data split up into phases.
Phase 1  (2 Class) Phase 2  (3 Class)
Classes Training Testing Classes Training Testing
Normal traffic 3 966466 2644311 Normal traffic 3 966466 2644311
Attacks 256324 170882 Exploiting attacks 47769 31845
Aggressive mode attacks 208222 138814
Table 8
Performance of the verified algorithms on the phase 1 dataset.
Algorithms Accuracy Precision Recall F1 score
GSFTNN 98.54 98.7 98.42 98.61
RNN 94.22 93.64 93.73 94.38
LSTM 95.98 96.3 95.78 96.17
ResNet 97.7 98.1 97.54 97.89
Table 9
Performance of the verified algorithms on the phase 2 dataset.
Algorithms Accuracy Precision Recall F1 score
GSFTNN 99.12 99.26 98.85 99.2
RNN 96.4 96.82 93.73 96.5
LSTM 97.25 97.57 96.97 97.41
ResNet 98.1 98.33 97.74 98.26
Fig. 7. Phase 2 performance analysis on the verified algorithms.
GSFTNN), 93.73% (RNN), 96.97% (LSTM), and 97.74% (ResNet).
Finally, the GSFTNN model achieved an F1 score of 99.2%, while
the RNN, LSTM, and ResNet models scored 96.5%, 97.41%, and
98.26%, respectively.
Fig. 8 indicates the accuracy comparison between the phase
1 and the phase 2 analyses. The phase 2 accuracies are compar-
atively high in all the algorithms because the attack classes in
phase 2 contain only 2 attacks but in phase 1 the data attack
model count is 5. The two major attack classes are considered
in phase 2, whereas in phase 1 the minor classes with fewer
sample counts were considered for the analysis. It indicates that
all the classifiers are performing well when their sample counts
are high for the training process. The phase 1 accuracy can also
be improved if the remaining 3 data sample counts are averaged
using some data augmentation process.
The performance of the proposed GSFTNN model shows better
accuracy on both phase operations. This is achieved because of its
multi-head attention block. At the same time, the performance
of its previous model RNN shows a lesser accuracy rate when
compared to all the other models due to its sequential operation
process. Also, the performances of the LSTM show a slighter
Fig. 8. Comparative analysis of accuracies at both phases.
Fig. 9. Comparative analysis of training time with both phases.
improvement in its experiment by eradicating unwanted infor-
mation from its neurons. The ResNet models are very efficient in
general but their nature of having more layer count makes the
model suffer from getting the optimum features for the analytic
process. The training time attainments of the verified algorithms
are shown in Fig. 9 where the performances of GSFTNN indicate a
betterment due to the nature of the simultaneous operation. All
the algorithms are showing a betterment in the phase 2 model
where the sample counts are comparatively minimum than the
phase 1 operation.
Zooming on to measure the effectiveness of the proposed
algorithm, further comparative analysis of the four (4) deep learn-
ing algorithms was conducted. Figs. 10 and 11 illustrate the
confusion matrix and the receiver operating characteristic (ROC),
respectively.
A confusion matrix is a table that is used to evaluate the
performance of a classifier by comparing the predicted classes
with the true classes. It is a useful tool for understanding the
strengths and weaknesses of a classifier, and it can be used
to identify areas for improvement. The matrix is made up of
four quadrants that represent the number of true positives, false
positives, true negatives, and false negatives. These values can
329
Acta Wasaensia 159
S.Y. Diaba et al. Neural Networks 165 (2023) 321332
Fig. 10. (a) Confusion matrix of the GSFTNN; (b) Confusion matrix of the ResNet; (c) Confusion matrix of the LSTM; (d) Confusion matrix of the RNN.
Fig. 11. (a) ROC of the GSFTNN; (b) ROC of the ResNet; (c) ROC of the LSTM; (d) ROC of the RNN.
Table 10
Comparison of existing algorithms.
Reference Year Dataset Algorithm Accuracy Recall F1 score Precision
Altaha et al. (2020) 2020 Generated dataset
Generated dataset
Generated dataset
Generated dataset
Generated dataset
CNN
FNN
GRU
LSTM
RNN
98.1
98.8
98.1
98.0
98.0
Chen, Dewi, Huang, and Caraka (2020) 2020 Bank marketing dataset
Bank marketing dataset
Bank marketing dataset
RF+SVM
RF+KNN
RF+RF
89.0
88.6
90.99
91.37
90.8
91.22
97.91
96.91
98.10
Khoei, Aissou, Hu, and Kaabouch (2021) 2021 CICDDoS-2019
CICDDoS-2019
CICDDoS-2019
CICDDoS-2019
KNN
RF
Stacking
Naïve Bayes
94.6
94.0
96.0
87.0
94.4
94.0
97.3
77.1
Abdelkhalek and Govindarasu (2022) 2022 WUSTL-IIoT-2018 ANN 98.40 98.02 98.97 99.57
2023 WUSTL-IIoT-2018 GSFTNN 99.12 98.85 99.2 99.26
then be used to calculate various performance metrics such as
f1 score, recall, precision, and accuracy. A binary classifier sys-
tem's performance as the discrimination threshold changes are
graphically depicted by a ROC curve. The genuine positive rate
(sensitivity) against the false positive rate (specificity) at various
threshold settings is plotted on the ROC curve. By comparing a
classifier algorithm's performance to that of a random guessing
classifier, it is frequently possible to gauge how well it performs.
The area under curve (AUC) is a frequently used performance
statistic for classifiers. While a classifier that performs no better
than random guessing has an AUC of 0.5, a perfect classifier has
1. ROC curves are frequently used to compare the effectiveness
of various classifiers or the effectiveness of a single classifier in
various scenarios (see Table 10).
5. Conclusion
Cybersecurity in the smart grid has become critically impor-
tant on a multi-stakeholder scale and worldwide for academics
and entrepreneurs. The danger to smart grid cyber security is
significantly expanding in scope as energy systems gain pervasive
intelligence and communications capabilities throughout their
operational processes. Numerous SCADA networks have been the
target of significant cyber-attacks that badly damaged the opera-
tional control circuits and related components. In other instances,
a cyber-attacker creates a knockoff by imitating the distinctive
algorithmic flow embedded into the SCADA network. The internet
and wireless connectivity have made it possible for hackers to
quickly achieve their objectives in several industries. As a result,
we proposed a GSFTNN approach with a GSF feature selection
model to develop a trustworthy deep learning algorithm. Exten-
sive experiments were conducted using the WUSTL-IIOT-2018 ICS
SCADA cyber security dataset. The experimental results reveal
that the proposed GSFTNN algorithm surpasses RNN, LSTM, and
ResNet in both accuracy and training time. The proposed algo-
rithm's adeptness in categorizing data and predicting outcomes
expeditiously serves as a testament to its robustness. The re-
sults provide empirical evidence that the GSFTNN algorithm is a
330
160 Acta Wasaensia 
S.Y. Diaba et al. Neural Networks 165 (2023) 321332
more efficacious and efficient algorithm than the aforementioned
algorithms. Performance comparison of the proposed GSFTNN
model to its latest counterparts' results as in Ahakonye et al.
(2023) will be investigated in the future. In addition, we will also
focus on the key factors such as spectral variability in SCADA
systems that could influence the model's performance. We will
further improve the model's generalization ability to unfamiliar
scenarios.
Declaration of competing interest
The authors declare that they have no known competing finan-
cial interests or personal relationships that could have appeared
to influence the work reported in this paper.
Data availability
The data is available online.
References
Abdelkhalek, M., & Govindarasu, M. (2022). ML-based anomaly detection system
for DER DNP3 communication in smart grid. In Proc. 2022 IEEE int. conf. cyber
secur. resilience, CSR 2022 (pp. 209214). http://dx.doi.org/10.1109/CSR54599.
2022.9850313.
Ahakonye, L. A. C., Nwakanma, C. I., Lee, J. M., & Kim, D.-S. (2023). Agnostic
CH-DT technique for SCADA network high-dimensional data-aware intrusion
detection system. IEEE Internet of Things Journal, 1. http://dx.doi.org/10.1109/
jiot.2023.3237797.
Al Husaini, M. A. S., Habaebi, M. H., Hameed, S. A., Islam, M. R., & Gunawan, T. S.
(2020). A systematic review of breast cancer detection using thermography
and neural networks. IEEE Access, 8, 208922208937. http://dx.doi.org/10.
1109/ACCESS.2020.3038817.
Altaha, M., Lee, J. M., Aslam, M., & Hong, S. (2020). Network intrusion de-
tection based on deep neural networks for the SCADA system. Journal
of Physics: Conference Series, 1585(1), http://dx.doi.org/10.1088/1742-6596/
1585/1/012038.
Altunay, H. C., Albayrak, Z., Ozalp, A. N., & Cakmak, M. (2021). Analysis of
anomaly detection approaches performed through deep learning methods
in SCADA systems. In HORA 2021-3rd int. congr. human-computer interact.
optim. robot. appl. proc. http://dx.doi.org/10.1109/HORA52670.2021.9461273.
Avola, D., Cinque, L., Fagioli, A., & Foresti, G. L. (2022). SIRe-networks: Convolu-
tional neural networks architectural extension for information preservation
via skip/residual connections and interlaced auto-encoders. Neural Networks,
153, 386398. http://dx.doi.org/10.1016/j.neunet.2022.06.030.
Balla, A., Habaebi, M. H., Islam, M. R., & Mubarak, S. (2022). Applications of deep
learning algorithms for supervisory control and data acquisition intrusion
detection system. Cleaner Engineering and Technology, 9(June), Article 100532.
http://dx.doi.org/10.1016/j.clet.2022.100532.
Chen, J. I.-Z., & Chang, J.-T. (2020). Applying a 6-axis mechanical arm com-
bine with computer vision to the research of object recognition in plane
inspection. Journal of Artificial Intelligence and Capsule Networks, 2(2), 7799.
http://dx.doi.org/10.36548/jaicn.2020.2.002.
Chen, R. C., Dewi, C., Huang, S. W., & Caraka, R. E. (2020). Selecting critical
features for data classification based on machine learning methods. Journal
of Big Data, 7(1), http://dx.doi.org/10.1186/s40537-020-00327-4.
Cheng, L., Wu, X. H., & Wang, Y. (2018). Artificial flora (AF) optimization
algorithm. Applied Sciences, 8(3), http://dx.doi.org/10.3390/app8030329.
Cherifi, T., & Hamami, L. (2018). A practical implementation of unconditional
security for the IEC 60780 − 5 − 101 SCADA protocol. International Journal
of Critical Infrastructure Protection, 20, 6884. http://dx.doi.org/10.1016/j.ijcip.
2017.12.001.
Gao, H., Zhang, Y., Chen, Z., Xu, S., Hong, D., & Zhang, B. (2023). A multi-
depth and multi-branch network for hyperspectral target detection based
on band selection. IEEE Transactions on Geoscience and Remote Sensing, 61, 1.
http://dx.doi.org/10.1109/tgrs.2023.3258061.
Hassan Malik, S. P., Alam, Muhammad Mahtab, Kuusik, Alar, & Moullec, Yan-
nick Le (2020). Narrowband internet of things (NB-IoT) for industrial
automation. In Wirel. autom. as an enabler next ind. revolut (pp. 6587).
He, K., Zhang, X., Ren, S., & Sun, J. (2016a). Deep residual learning for image
recognition. In Proc. IEEE comput. soc. conf. comput. vis. pattern recognit., Vol.
2016-Decem (pp. 770778). http://dx.doi.org/10.1109/CVPR.2016.90.
He, K., Zhang, X., Ren, S., & Sun, J. (2016b). Identity mappings in deep residual
networks. In LNCS: vol. 9908, Lect. notes comput. sci. (including subser. lect.
notes artif. intell. lect. notes bioinformatics) (pp. 630645). http://dx.doi.org/
10.1007/978-3-319-46493-0_38.
Hoffmann Souza, M. L., da Costa, C. A., de Oliveira Ramos, G., & da Rosa Righi, R.
(2021). A feature identification method to explain anomalies in condition
monitoring. Computers in Industry, 133, http://dx.doi.org/10.1016/j.compind.
2021.103528.
Jasperneite, J., Sauter, T., & Wollschlaeger, M. (2020). Why we need automation
models. IEEE Industrial Electronics Magazine, 14(1), 2940.
Jmila, M. I. K., & Houda (2022). Adversarial machine learning for network
intrusion detection: A comparative study. Computer Networks, 214(109073).
Karim, S. H., Fazle, Majumdar, Somshubra, & Darabi, Houshang (2019). Mul-
tivariate LSTM-FCNs for time series classification. Neural Networks, 116,
237245.
Khan, R. U., Zhang, X., Alazab, M., & Kumar, R. (2019). An improved convolutional
neural network model for intrusion detection in networks. In Proc. - 2019
cybersecurity cyberforensics conf. CCC 2019, No. Ccc (pp. 7477). http://dx.doi.
org/10.1109/CCC.2019.000-6.
Khoei, T. T., Aissou, G., Hu, W. C., & Kaabouch, N. (2021). Ensemble learning
methods for anomaly intrusion detection system in smart grid. In IEEE int.
conf. electro inf. technol., Vol. 2021-May (pp. 129135). http://dx.doi.org/10.
1109/EIT51626.2021.9491891.
Kirubakaran, S. S. (2020). Study of security mechanisms to create a secure
cloud in a virtual environment with the support of cloud service providers.
Journal of Trends in Computer Science and Smart Technology, 2(3), 148154.
http://dx.doi.org/10.36548/jtcsst.2020.3.004.
Kumar, D., & S, D. S. (2020). Enhancing security mechanisms for healthcare
informatics using ubiquitous cloud. Journal of Ubiquitous Computing and
Communication Technologies, 2(1), 1928. http://dx.doi.org/10.36548/jucct.
2020.1.003.
Lee, J. M., & Hong, S. (2020). Keeping host sanity for security of the SCADA
systems. IEEE Access, 8, 6295462968. http://dx.doi.org/10.1109/ACCESS.2020.
2983179.
Liu, Q., & Wang, B. (2022). Neural extraction of multiscale essential structure
for network dismantling. Neural Networks, 154, 99108. http://dx.doi.org/10.
1016/j.neunet.2022.07.015.
Lopez Perez, R., Adamsky, F., Soua, R., & Engel, T. (2018). Machine learning for
reliable network attack detection in SCADA systems. In Proc. - 17th IEEE
int. conf. trust. secur. priv. comput. commun. 12th IEEE int. conf. big data sci.
eng. trust. 2018 (pp. 633638). http://dx.doi.org/10.1109/TrustCom/BigDataSE.
2018.00094.
Maglaras, L. A., & Jiang, J. (2014). Intrusion detection in SCADA systems using
machine learning techniques. In Proc. 2014 sci. inf. conf. SAI 2014 (pp.
626631). http://dx.doi.org/10.1109/SAI.2014.6918252.
Mokhtari, S., Abbaspour, A., Yen, K. K., & Sargolzaei, A. (2021). A machine
learning approach for anomaly detection in industrial control systems
based on measurement data. Electron, 10(4), 113. http://dx.doi.org/10.3390/
electronics10040407.
Montalban, J. O. N., Iradier, E., & Member, G. S. (2020). NOMA-based 802. In 11n
for industrial automation, Vol. 8.
Ozdag, M. (2018). Adversarial attacks and defenses against deep neural net-
works: A survey. Procedia Computer Science, 140, 152161. http://dx.doi.org/
10.1016/j.procs.2018.10.315.
P, A., Hong, J. C. D., Gao, L., Yao, J., & Zhang, B. (2020). Graph convolutional net-
works for hyperspectral image classification. IEEE Transactions on Geoscience
and Remote Sensing, 59(9), 59665978. http://dx.doi.org/10.1109/TGRS.2020.
3015157.
Pliatsios, D., Sarigiannidis, P., Lagkas, T., & Sarigiannidis, A. G. (2020). A survey
on SCADA systems: Secure protocols, incidents, threats and tactics. IEEE
Communications Surveys and Tutorials, 22(3), 19421976. http://dx.doi.org/
10.1109/COMST.2020.2987688.
Rousopoulou, V., et al. (2022). Cognitive analytics platform with AI solutions
for anomaly detection. Computers in Industry, 134, Article 103555. http:
//dx.doi.org/10.1016/j.compind.2021.103555.
Samdarshi, R., Sinha, N., & Tripathi, P. (2016). A triple layer intrusion detection
system for SCADA security of electric utility. In 12th IEEE int. conf. electron.
energy, environ. commun. comput. control (E3-C3), INDICON 2015 (pp. 15).
http://dx.doi.org/10.1109/INDICON.2015.7443439.
Sarker, I. H. (2022). AI-based modeling: Techniques, applications and research
issues towards automation, intelligent and smart systems. SN Computer
Science, 3(2), 120. http://dx.doi.org/10.1007/s42979-022-01043-x.
Selvarajan, S., Shaik, M., Ameerjohn, S., & Kannan, S. (2020). Mining of intrusion
attack in SCADA network using clustering and genetically seeded flora-
based optimal classification algorithm. IET Information Security, 14(1), 111.
http://dx.doi.org/10.1049/iet-ifs.2019.0011.
Singh, V. K., Ebrahem, H., & Govindarasu, M. (2019). Security evaluation of two
intrusion detection systems in smart grid SCADA environment. In 2018 north
am. power symp. NAPS 2018. http://dx.doi.org/10.1109/NAPS.2018.8600548.
Singh, P., Garg, S., Kumar, V., & Saquib, Z. (2015). A testbed for SCADA cyber
security and intrusion detection. In 2015 int. conf. cyber secur. smart cities,
ind. control syst. commun. SSIC 2015 - proc (pp. 16). http://dx.doi.org/10.
1109/SSIC.2015.7245683.
Smith, A., & Fressoli, M. (2021). Post-automation. Futures, 132(June), Article
102778. http://dx.doi.org/10.1016/j.futures.2021.102778.
331
Acta Wasaensia 161
S.Y. Diaba et al. Neural Networks 165 (2023) 321332
Teixeira, M. A., Salman, T., Zolanvari, M., Jain, R., Meskin, N., & Samaka, M. (2018).
SCADA system testbed for cybersecurity research using machine learning
approach. Future Internet, 10(8), http://dx.doi.org/10.3390/fi10080076.
V, D. S. (2020). Automatic spotting of sceptical activity with visualization using
elastic cluster for network traffic in educational campus. Journal of Ubiquitous
Computing and Communication Technologies, 2(2), 8897. http://dx.doi.org/10.
36548/jucct.2020.2.004.
Wang, W., Harrou, F., Bouyeddou, B., Senouci, S. M., & Sun, Y. (2022). A stacked
deep learning approach to cyber-attacks detection in industrial systems:
application to power system and gas pipeline systems. Cluster Computing,
25(1), 561578. http://dx.doi.org/10.1007/s10586-021-03426-w.
Wu, X., Hong, D., & Chanussot, J. (2022). Convolutional neural networks for mul-
timodal remote sensing data classification. IEEE Transactions on Geoscience
and Remote Sensing, 60, http://dx.doi.org/10.1109/TGRS.2021.3124913.
Wu, X., Hong, D., & Chanussot, J. (2023). UIU-net: U-net in U-net for infrared
small object detection. IEEE Transactions on Image Processing, 32, 364376.
http://dx.doi.org/10.1109/TIP.2022.3228497.
Yang, H. F., & Chen, Y. P. P. (2019). Representation learning with extreme learn-
ing machines and empirical mode decomposition for wind speed forecasting
methods. Artificial Intelligence, 277, Article 103176. http://dx.doi.org/10.1016/
j.artint.2019.103176.
Yang, H., Cheng, L., & Chuah, M. C. (2019). Deep-learning-based network
intrusion detection for SCADA systems. In 2019 IEEE conf. commun. netw.
secur. CNS 2019. http://dx.doi.org/10.1109/CNS.2019.8802785.
Yang, Y., McLaughlin, K., Sezer, S., Yuan, Y. B., & Huang, W. (2014). Stateful
intrusion detection for IEC 60870 − 5 − 104 SCADA security. In IEEE power
energy soc. gen. meet., Vol. 2014-Octob (pp. 59). http://dx.doi.org/10.1109/
PESGM.2014.6939218, no. October.
332
162 Acta Wasaensia 
 
 
    
Publication V 
Acta Wasaensia 163
Risk Accessment of Machine Learning Algorithms
on Manipulated Dataset in Power Systems
1st Sayawu Yakubu Diaba
School of Technology and Innovations
University of Vaasa
Vaasa, Finland
sdiaba@uwasa.fi
2nd Miadreza Shafie-khah
School of Technology and Innovations
University of Vaasa
Vaasa, Finland
miadreza.shafiekhah@uwasa.fi
3rd Mike Mekkanen
School of Technology and Innovations
University of Vaasa
Vaasa, Finland
mike.mekkanen@uwasa.fi
4th Tero Vartiainen
School of Technology and Innovations
University of Vaasa
Vaasa, Finland
Tero.Vartiainen@uwasa.fi
5th Mohammed Elmusrati
School of Technology and Innovations
University of Vaasa
Vaasa, Finland
moel@uwasa.fi
Abstract—The emergence of the communication infrastructure
in power systems has increased the variety and sophistication of
network assaults. Intrusion Detection Systems’ (IDS) importance
has increased in relation to network security. IDS, however, is no
longer secure when confronted with adversarial examples, and
attackers can boost assault success rates by tricking the IDS.
As a result, resilience must be increased. This paper assesses
the Decision Tree, Logistic regression, Support Vector Machines
(SVM), Naı¨ve Bayes, K-Nearest Neighbours (KNN), and Ensem-
ble’s effectiveness. Using the WUSTL-IIoT-2021 dataset and CIC-
IDS2017 dataset, we train the algorithms on the unmanipulated
dataset and then train the algorithms on the manipulated dataset.
Per the simulation results, the accuracy and prediction speed
drop on the manipulated dataset while the training time rises.
Index Terms—Communication infrastructure, Intrusion detec-
tion systems, Network security, Power systems
I. INTRODUCTION
The growing interconnectedness of smart systems has led
to increased concerns regarding their security vulnerabilities
[1]. These systems are extensively utilized in several areas,
including intelligent industries, smart cities, healthcare, and,
many more, making security a critical issue [2]. Security
measures such as authentication, authorization, encryption,
Intrusion Detection Systems (IDS), and Intrusion Prevention
Systems (IPS), have been employed to mitigate these vulnera-
bilities. Despite these efforts, these systems remain vulnerable
to cyber-attacks [3].
The energy sector is not an exception and thus, the sector
faces a wide range of cyber-attacks, including physical, inter-
nal, external, and cyber threats. Cyber threats, in particular,
can emerge from any location and pose significant challenges
for the industry. Although it is impossible to eliminate these
threats [4], mitigation strategies can help reduce their im-
pact. However, implementing these strategies often requires
significant financial resources and effort. In addition, cyber-
attack mitigations can affect negatively and create psycho-
logical consequences that can be detrimental to the industry,
leading to a decline in performance and potentially affecting
national economies. Therefore, it is crucial to prioritize and
implement effective cyber-attack mitigation strategies while
also considering their potential economic and psychological
impacts [5].
Utilizing the power of information technology, smart grid
technology uses a two-way communication system to intel-
ligently distribute energy [6]. This enables the integration
of green technologies to satisfy environmental requirements.
However, the use of communication technology also leaves
the system open to many security risks. Even though many
survey studies have addressed these problems and suggested
solutions, most of them categorize attacks according to how
they affect confidentiality, integrity, and availability [7]. As
the prevalence of cyber-attacks continues to rise in the smart
grid [8], the need for reliable and accurate IDS becomes
increasingly critical [9].
The authors of [10] classified attack scenarios into control-
based and measurement-based attacks. The control-based at-
tack includes altering or fabricating control signals sent to
the targeted power system assets. It can directly result in
frequency and transient voltage instability, line overloading,
load reduction, and cascading failures. The measurement-
based attack seeks to compromise measurements in order to
conceal or falsely represent the system’s current state, impair
observability, and ultimately deceive operators or control sys-
tems.
It’s critical to comprehend the possible repercussions [11]
of data manipulation on the accuracy of IDS algorithms. Any
system that relies on these algorithms to detect malicious
traffic may be at risk if the data has been manipulated. As
such, it is crucial to identify and address the challenges asso-
ciated with data manipulation in IDS algorithms to improve
their reliability and effectiveness. In this paper, we aim to
explore the impact of data manipulation on the performance
of machine learning algorithms. We will evaluate the strengths
20
23
 I
nt
er
na
ti
on
al
 C
on
fe
re
nc
e 
on
 F
ut
ur
e 
E
ne
rg
y 
S
ol
ut
io
ns
 (
F
E
S
) 
| 9
79
-8
-3
50
3-
32
30
-8
/2
3/
$3
1.
00
 ©
20
23
 I
E
E
E
 | 
D
O
I:
 1
0.
11
09
/F
E
S
57
66
9.
20
23
.1
01
82
75
1
Authorized licensed use limited to: Vaasan Yliopisto. Downloaded on July 20,2023 at 21:32:41 UTC from IEEE Xplore.  Restrictions apply. 
164 Acta Wasaensia 
and weaknesses of various approaches used to mitigate these
effects. By examining the limitations and vulnerabilities of
these algorithms in the face of manipulated data, we can
develop more robust and effective systems to safeguard against
cyber-attacks [11].
The remaining sections of the paper are structured as fol-
lows. The system model is presented in Section II, and the data
description is summarized in Section III. The experimental
analysis is presented in Section IV of the paper, and the
conclusions are presented in Section V.
II. SYSTEM MODEL
The modeling approach used in this paper is borrowed
from Linnartz et. al. as shown in Fig.1. The figure depicts
a simplified schematic summary of the crucial components of
the Cyber-Physical Systems (CPS), including the manipulation
strategy. Information and Communication Technology (ICT) is
used to link the power system’s assets (physical system) to the
central control system (cyber system).
Fig. 1. Overview of the cyber-physical system. .
Supervisory control and data acquisition (SCADA) systems
are used by Transmission System Operators (TSOs) and Dis-
tribution System Operators (DSOs) to continuously monitor
and control the transmission and distribution assets of the
power system (closed-loop control). Standardized protocols
like IEC 61850, IEC 60870-5-104 or DNP3 are used for
communication between the central management system and
the assets. Typically, the information is transmitted through
private channels, but neither of these protocols supports se-
curity features like authentication or integrity protection [12].
Thus, the communication channels are open to online threats.
For this experiment, the following assumptions are made.
• The attacker gains access and can interact with the assets
according to the rules of the communication protocols. As
a result, the attacker can alter control signals, create new
ones, or stop all contact with the central control center.
• The attacker gained entry to the DSO’s communication
network. Due to earlier extensive reconnaissance opera-
tions, the attacker also has adequate knowledge of the
control system, the power system, and its resources. He
can then use the communication system to implement
manipulation strategies by using denial of information
attacks to transmit control signals to every connected
DER.
• The attacker can attack and make data manipulations
without the DSO noticing.
III. DATA DESCRIPTION
Our proposed model’s simulation evaluation stage utilizes
the WUSTL-IIoT-2021 dataset [13] and the WUSTL-IIoT-
2018 dataset. The WUSTL-IIoT-2021 dataset contains network
data from an Industrial Internet of Things (IIoT) system.
The dataset was developed using an IIoT testbed to support
cybersecurity research, with the aim of emulating real-world
industrial systems as accurately as possible. This testbed also
allows for the simulation of actual cyber-attacks, providing a
realistic environment for evaluating the effectiveness of the
models. The normal and abnormal attacks are included in the
WUSTL-IIOT-2018 dataset for ICS (SCADA) Cybersecurity
research which closely matches true real-world data. It was
constructed using a SCADA system testbed, which enabled
the execution of an authentic cyber-attack [14].
A. Data pre-processing
The data preprocessing along with the data-cleaning process
is depicted in Fig. 2. In the data pre-processing stage, columns
with headings ‘StartTime’, ‘LastTime’, ‘SrcAddr’, ‘DstAddr’,
‘sIpId’, ‘dIpId,’ are removed from the data because they
are specific to certain types of attacks. Including them in
the training data would result in the algorithms being too
specialized and not being able to accurately generalize to new,
unseen data.
Fig. 2. The experimental model design.
Therefore, they were excluded to ensure the algorithms’
ability to make accurate predictions on a wider range of input
data. In the normalization and digitization stage, with respect
to the target column, we represented normal with 0 and attack
with 1. This column is manipulated by editing some 0’s to 1
Authorized licensed use limited to: Vaasan Yliopisto. Downloaded on July 20,2023 at 21:32:41 UTC from IEEE Xplore.  Restrictions apply. 
Acta Wasaensia 165
and some 1’s to 0. The dataset is then split into a 70:30 ratio
and used for the experiment.
However, Pearson’s Correlation Coefficient (PCC) analysis
is carried out on the dataset to perceive if it includes correlated
features. This method was required because it guarantees the
elimination of over-fitting. The PCC is mathematically given
as
PCC =
∑
(xi + xj)(yi + yj)∑√
(xi + xj)2(yi + yj)2
(1)
The symbol xi represents the content of the variable in
the dataset while xj is referring to the average value of that
variable. Similarly, yi represents the values of the sample yj
represents the average value of that variable in the sample.
Correlation coefficients are statistical measures that describe
the strength and direction of the relationship between two
variables. A correlation matrix is typically square, with each
variable listed on the table’s rows and columns. The diagonal
elements of the matrix are always 1 since each variable is
perfectly correlated with itself, and the off-diagonal elements
represent the correlation coefficients between the correspond-
ing pairs of variables. A high positive correlation coefficient
between two variables indicates that they tend to increase or
decrease together, while a high negative correlation coefficient
indicates that they tend to move in opposite directions. Cor-
relation matrices score between -1 and +1 are often used to
analyze the relationships between variables and identify data
patterns and trends [15].
Fig. 3. The correlation matrix for the wustle-2021 dataset.
The effectiveness of the algorithm’s categorization is as-
sessed using a confusion matrix [16]. A confusion matrix is a
method to evaluate how well a classifier algorithm is perform-
ing. It is composed of four fundamental parameters, including
True Positive (TP), True Negative (TN), False Positive (FP),
and False Negative (FN) [17].
Fig. 4. The correlation matrix for the wustle-2018 dataset.
Fig. 5. The confusion matrix.
TP refers to when the algorithm correctly predicts a positive
class, while TN indicates when the algorithm correctly predicts
a negative class. FP happens when the algorithm predicts a
positive class, but the actual class is negative, while FN refers
to when the algorithm predicts a negative class, but the actual
class is positive. Several performance metrics can be calculated
based on these parameters, including accuracy, F1-score, pre-
cision, and recall. Accuracy measures the overall correctness
of the algorithm’s predictions. The algorithm’s capability to
correctly predict positive classes is measured by precision. The
algorithm’s capacity to recognize positive classes is measured
by the recall, also known as sensitivity. Finally, the F1-score is
a harmonic mean of precision and recall, providing a combined
metric that balances both measures. The formulas are given as
Authorized licensed use limited to: Vaasan Yliopisto. Downloaded on July 20,2023 at 21:32:41 UTC from IEEE Xplore.  Restrictions apply. 
166 Acta Wasaensia 
follows [18], [19]
Accuracy =
TP + TN
TP + TN + FP + FN
(2)
Precision =
TP
TP + FP
(3)
Recall =
TP
TP + FN
(4)
F − score = 2(Presicion ∗Recall)
Precision+Recall
(5)
IV. EXPERIMENTAL ANALYSIS
In this section, we will present the experimental analysis
of the impact of data manipulation on the performance of
IDS algorithms. We conducted a series of experiments to
evaluate the accuracy, prediction speed, and training time
of these algorithms when the data they analyze has been
manipulated. We will present our findings and analyze the
strengths and weaknesses of these algorithms in the context
of manipulated data. Evaluating the performances of the
algorithms, we selected the best-performing algorithm in each
category. Performance is evaluated in terms of the likelihood
of a successful detection [20]. The performance of these
algorithms in terms of accuracy, considering the manipulated
and unmanipulated WUSTL-IIoT-2018 dataset is presented in
Fig. 6.
Fig. 6. Performance of the best-performing algorithms on the WUSTL-IIoT-
2018 dataset.
The accuracy of the Fine Tree, Linear Discriminant, and
Linear SVM all slightly declined with the manipulated data,
as shown by the figure in Fig. 6. The Fine KNN showed little
degradation while the Gaussian Naive Bayes experienced a
significant decline. Surprisingly, the Booted tree’s accuracy
rose from 96.1% with the unmanipulated data, to 98% with
the manipulated data.
Fig. 7. Performance of the best-performing algorithms on the WUSTL-IIoT-
2021 dataset.
The performance analysis of the best-performing algorithms
is shown in Fig. 7. The Coarse Tree, Linear Discriminant,
Quadratic SVM, and Bagged Tree classifiers experienced a
moderate decrease in their performance when tested on manip-
ulated data. On the other hand, the Fine KNN classifier showed
a significant drop in accuracy, while the Kernel Naı¨ve Bayes
performed poorly on both unmanipulated data and manipulated
data.
Fig. 8. Prediction Speed and Training Time Comparison on the WUSTL-
IIoT-2018 dataset.
The simulation results comparing prediction speed and
training time on manipulated versus unmanipulated data are
shown in Table 1. The Fine Tree, Linear SVM, and Booted
Tree exhibited a decrease in prediction speed on the manipu-
lated data, resulting in an increase in training time. Conversely,
the Linear Discriminant and Gaussian Naı¨ve Bayes classifiers
demonstrated an increase in prediction speed accompanied by
a reduction in training time on the manipulated data. The Fine
KNN classifier, on the other hand, demonstrated an increase
in both prediction speed and a decrease in training time on
the manipulated data.
Authorized licensed use limited to: Vaasan Yliopisto. Downloaded on July 20,2023 at 21:32:41 UTC from IEEE Xplore.  Restrictions apply. 
Acta Wasaensia 167
Fig. 9. Prediction Speed and Training Time Comparison on the WUSTL-
IIoT-2021 dataset.
The simulation results comparing prediction speed and
training time on manipulated versus unmanipulated data
(WUSTL-IIoT-2021) are presented in Table II. On the manipu-
lated data, the Coarse Tree, Kernel Naı¨ve Bayes, and Quadratic
SVM exhibited a decrease in prediction speed, leading to an
increase in training time. Conversely, the Linear Discriminant
demonstrated an improvement in prediction speed and a de-
crease in training time on the manipulated data. In contrast,
the Bagged Tree showed a reduction in prediction speed while
maintaining a consistent training time across both datasets.
V. CONCLUSION
This paper exposes the potential risk of power systems to
cyber-attacks where a malicious actor could alter the data.
In intrusion detection systems, which are commonly used to
defend networks against adversarial attacks, machine learning
or deep learning techniques are extensively used due to their
high accuracy and quick detection rates. The effectiveness of
the top machine learning algorithms on manipulated datasets is
examined in this paper. The experimental assessment made use
of the WUSTL-IIoT-2021 dataset as well as the WUSTL-IIoT-
2018 dataset. We evaluate the performance of several machine
learning algorithms on both unmanipulated and manipulated
datasets. According to experimental findings, except for the
Boosted Tree algorithm, all algorithms’ accuracy dropped,
prediction speed decreased, and training time increased.
REFERENCES
[1] K. Kumar, and V. Bhatnagar, “Machine Learning Algorithms Per-
formance Evaluation for Intrusion Detection,” Journal of Information
Technology Management 13, no. 1 (2021): 42-61.
[2] X. Fu, N. Zhou, L. Jiao, H. Li, and J. Zhang, “The robust deep
learning–based schemes for intrusion detection in the internet of things
environments,” Annals of Telecommunications 76, no. 5-6 (2021): 273-
285.
[3] E. Degirmenci, I. Ozcelik, and A. Yazici, “Effects of Un targeted Adver-
sarial Attacks on Deep Learning Methods,” In 2022 15th International
Conference on Information Security and Cryptography (ISCTURKEY),
pp. 8-12. IEEE, 2022.
[4] J. A. Abraham, and V. R. Bindu, “Intrusion Detection and Prevention
in Networks Using Machine Learning and Deep Learning Approaches:
A Review,” In 2021 International Conference on Advancements in
Electrical, Electronics, Communication, Computing and Automation
(ICAECA), pp. 1-4. IEEE, 2021.
[5] S. K. Venkatachary, J. Prasad, and R. Samikannu, “Economic impacts
of cyber security in energy sector: A review.” International Journal of
Energy Economics and Policy 7, no. 5 (2017): 250.
[6] P. A. Oyewole and D. Jayaweera, “Power System Security With Cyber-
Physical Power System Operation,” in IEEE Access, vol. 8, pp. 179970-
179982, 2020, doi: 10.1109/ACCESS.2020.3028222.
[7] D. G. Kyrollos, K. Greenwood, J. Harrold, and J. R. Green, “Detection
of false alarms in the NICU using pressure sensitive mat.,” In 2021 IEEE
Sensors Applications Symposium (SAS), pp. 1-5. IEEE, 2021.
[8] B. Li, B. Zhang, and D. S. Kirschen, “Cyber-Physical Attack Leveraging
Subsynchronous Resonance.” arXiv preprint arXiv:2207.04149 (2022).
[9] V. D. A. Kumar, “An Effective Comparative Analysis of Data Prepro-
cessing Techniques in Network Intrusion Detection System Using Deep
Neural Networks,” Smart Intelligent Computing and Communication
Technology 38 (2021): 14.
[10] H. He, and Jun Yan, “Cyber-physical attacks and defences in the smart
grid: a survey,” IET Cyber-Physical Systems: Theory and Applications
1, no. 1 (2016): 13-27.
[11] H. Wang, J. Ruan, B.Zhou, C. Li, Q. Wu, M. Q. Raza, and G.
Cao, “Dynamic data injection attack detection of cyber physical power
systems with uncertainties,” IEEE Transactions on Industrial informatics
15, no. 10 (2019): 5505-5518.
[12] P. Linnartz, A. Winkens, and A. Ulbig, “Assessing the impact of cyber
attacks manipulating distributed energy resources on power system
operation,” arXiv preprint arXiv:2207.07968 (2022).
[13] M. Zolanvari, M. A. Teixeira, L. Gupta, K. M. Khan, and R. Jain,
“Machine learning-based network vulnerability analysis of Industrial
Internet of Things,” in IEEE Internet of Things Journal 6 (2019), pp.
6822-6834.
[14] M. A. Teixeira, T. Salman, M. Zolanvari, R. Jain, N. Meskin, M. Samaka,
“SCADA System Testbed for Cybersecurity Research Using Machine
Learning Approach,” Future Internet 2018, 10, 76,
[15] L. A. C. Ahakonye, C. I. Nwakanma, J. -M. Lee and D. -S. Kim,
“Efficient Classification of Enciphered SCADA Network Traffic in Smart
Factory Using Decision Tree Algorithm,” in IEEE Access, vol. 9, pp.
154892-154901, 2021, doi: 10.1109/ACCESS.2021.3127560.
[16] Y. Chen, C. Fan, and K. Chang, “Manufacturing intelligence for reducing
false alarm of defect classification by integrating similarity matching
approach in CMOS image sensor manufacturing,” Computers and In-
dustrial Engineering 99 (2016): 465-473.
[17] Q. Zhang, X. Chen, Z. Fang, and S. Xia. “False arrhythmia alarm
reduction in the intensive care unit using data fusion and machine
learning,” In 2016 IEEE-EMBS International Conference on Biomedical
and Health Informatics (BHI), pp. 232-235. IEEE, 2016.
[18] M. Qian, J. Luo, Y. Ge, C. Sun, X. Ge, and W. Huang, “Semantic-based
false alarm detection approach via machine learning,” In 2021 IEEE 21st
International Conference on Software Quality, Reliability and Security
Companion (QRS-C), pp. 60-66. IEEE, 2021.
[19] B. Taji, A. D. C Chan, and S. Shirmohammadi, “False alarm reduction in
atrial fibrillation detection using deep belief networks,” IEEE Transac-
tions on Instrumentation and Measurement 67, no. 5 (2017): 1124-1131.
[20] T. Diskin, U. Okun, and A. Wiesel, “Learning to Detect with Constant
False Alarm Rate,” In 2022 IEEE 23rd International Workshop on Signal
Processing Advances in Wireless Communication (SPAWC), pp. 1-5.
IEEE, 2022.
Authorized licensed use limited to: Vaasan Yliopisto. Downloaded on July 20,2023 at 21:32:41 UTC from IEEE Xplore.  Restrictions apply.