Juha-Matti Toivainen 
A. I. Utilization in the Construction Business 
A review on present state and potential for Elenia Oy 
 
 
 
 
 
 
 
  
Vaasa 2023 
School of Technology and Innovations  
Master’s thesis in Smart Energy 
Master of science in Technology 
UNIVERSITY OF VAASA 
School of Technology and Innovations 
Author: Juha-Matti Toivainen 
Title of the thesis: A. I. Utilization in the Construction Business : A review on 
present state and potential for Elenia Oy 
Degree: Master of Science in Technology (Diplomi-insinööri) 
Major: Smart Energy, Master of science in technology 
Supervisors: Petri Välisuo, Hannu Laaksonen 
Instructor: Joonas Tutti 
Year of graduation: 2023 Number of pages: 71 
ABSTRACT (FINNISH VERSION, TRANSLATED): 
This thesis investigated applications of artificial intelligence related to construction business operations. Currently, business operations focus on the safety of operational activities. In project business, managing projects and portfolios together with safety management is considerably important. Lack of knowledge is rarely the root cause of undesired deviations. More often, deviations in processes result from irregularity in following instructions and rules. AI-based tools, such as machine learning, make it possible to improve the efficiency of tasks related to safety and project management. The thesis includes a general overview of artificial intelligence and a review of current approaches to utilizing AI in construction business operations. In addition, the thesis formulates proposals for the next steps in utilizing AI in Elenia's construction business. The first part gives an overview of artificial intelligence. The second and third parts examine current applications of AI. The second part examines applications related to the safety of construction work. In the third part, the corresponding review focuses on the operating environment of project and portfolio management. The most common way to utilize AI is to determine and identify the relationships between factors related to the risks of the operating environment. Different operating environments have different risks, and the probability of these risks occurring should be reduced. How machine learning models are built depends on the use case, so there are many ways to utilize machine learning. In Elenia Oy's operations, projects and their management play a central role in enabling the company's mission: Electrifying life. Electricity grids require continuous maintenance and consistent development. Part of this development is the renewal of components that have reached the end of their technical service life, for example in Elenia's Säävarma projects. To promote occupational safety, Elenia has signed a Safety Manifesto together with its partners. Its central theme is to enable everyone working on Elenia's work sites to return home safely and in good health. The central approach of the thesis was to search broadly for different ways to utilize AI in developing safety and project objectives. 
 
  
KEYWORDS: Project management, Safety management, Safety, Artificial intelligence, Machine learning, ML, AI, Portfolio management, Construction, Construction business, Risk management, DSO, Distribution Network 
UNIVERSITY OF VAASA 
School of technology and innovations  
Author:    Juha-Matti Toivainen 
Title of the Thesis: A. I. Utilization in the Construction Business : A review on 
present state and potential for Elenia Oy 
Degree:    Master of Science in Technology 
Programme:   Master’s Programme in Smart Energy 
Supervisors:   Petri Välisuo, Hannu Laaksonen 
Instructor:   Joonas Tutti 
Year:    2023 Number of pages: 71 
ABSTRACT: 
The thesis examines the present applications of artificial intelligence (AI) in the construction business domain. Nowadays, businesses focus on the safety of the operating environment. In a project-based business, managing projects and portfolios together with safety management is highly important. Lack of knowledge is rarely the root cause of undesired deviations. More often, deviations in processes are related to irregular compliance with instructions and rules. AI-based tools, such as machine learning, can improve the efficiency of safety and project management tasks. The thesis provides a general view of artificial intelligence and a review of present approaches to AI utilization in the construction domain. The thesis also suggests next steps for the utilization of AI in Elenia's construction business. The first section of the thesis gives an overall view of artificial intelligence. The second and third sections review present utilization approaches. The second section examines utilization in the construction site safety domain, and the third section examines the project management domain. The most common way to utilize AI was to exploit existing data for risk prediction and relationship detection. The risks differ depending on the examined domain. Thus, building a machine learning model is use-case specific, and different models can be used in various ways to achieve the benefits of machine learning. In Elenia Oy's activities, project management has a key role in achieving the company's mission: Electrifying life. Electric grids demand continuous maintenance and consistent development. One part of this development is the replacement of components that have reached the end of their technical lifecycle. Such replacement can be executed, for example, in Elenia's Säävarma projects. To develop occupational safety, Elenia and its partners have committed to a safety manifesto. Its key theme is to enable everyone working in Elenia's work field to return home in good health. The key approach of the thesis was to search broadly for different ways to utilize AI for the development of safety and project objectives. 
 
 
 
 
 
 
 
 
 
 
 
KEYWORDS: Project management, Safety management, Safety, Artificial intelligence, Machine learning, ML, AI, Portfolio management, Construction, Construction business, Risk management, DSO, Distribution Network 
Contents 
 
1 Introduction 
2 What is Artificial Intelligence 
3 Machine Learning 
3.1 Artificial Neural Network 
3.2 Deep Learning 
3.3 Support Vector Machine 
3.4 Natural Language Processing 
3.5 Convolutional Neural Networks 
3.6 Recurrent neural networks 
3.7 Decision trees 
3.8 Optimization 
3.8.1 Performance metrics of a model 
3.8.2 Cross Validation 
3.8.3 Genetic Algorithms 
3.9 Coding the machine learning models 
3.9.1 Python 
3.9.2 Julia 
4 Literature review on AI utilization in construction domain 
4.1 AI utilization in safety domain 
4.2 AI utilization in project and portfolio management 
5 Process for AI utilization 
6 Conclusion 
6.1 Data process and model 
6.2 Safety management – Data utilization for safety development 
6.3 Project management – risk and activity management 
6.4 Summary 
References 
LIST OF FIGURES 
Figure 1. Illustration of artificial intelligence's concept (Abioye et al., 2021). 
Figure 2. Overview of the machine learning techniques and algorithms (Ajayi et al., 2020). 
Figure 3. Illustration of a multi-layer network classification of data (LeCun et al., 2015). 
Figure 4. Illustration of four different kernels used in SVM (Pedregosa et al., 2011). 
Figure 5. Illustration of the k-fold cross-validation principle (Pedregosa et al., 2011). 
Figure 6. Example of the GA-generated random population and explanation of the datapoints in the next generation (MathWorks, n.d.). 
Figure 7. Evolution of the datapoints during the iteration of the genetic algorithm (MathWorks, n.d.). 
Figure 8. Comparison of the coding languages with regard to the user's learning rate and code execution (Stropoli et al., 2021). 
Figure 9. The process of the prediction model's development with explanation of the process's phases (Zhang et al., 2019). 
Figure 10. The early risk warning system's process by Lin et al. (2021). 
Figure 11. Risk warning system's principle by Lin et al. (2021). 
Figure 12. Different layers of the vision-based system (Arashpour et al., 2022). 
Figure 13. The excavation risk warning system's data workflow (Arashpour et al., 2022). 
Figure 14. Example of utilization process of the predictive model (Koc et al., 2022). 
Figure 15. Process of the data preprocessing and ML model building for the safety hazard prediction model (Oyedele et al., 2021). 
Figure 16. Illustration of the deep learning model's principle (Oyedele et al., 2021). 
Figure 17. Interactions and relationships between the safety hazard features and predictions (Oyedele et al., 2021). 
Figure 18. Illustration of the multi-stage DNN model (Ajayi et al., 2020). 
Figure 19. Relationship between MAE and number of layers as a function of neurons (Ajayi et al., 2020). 
Figure 20. Different requirements for the smart and safety application layer for the construction site smart application (Kochovski and Stankovski, 2021). 
Figure 21. Example of the supervised machine learning process with process-phase-related tasks (Sattari et al., 2022). 
Figure 22. Delay risk identification process for predictive model building (Yaseen et al., 2020). 
Figure 23. The cost management prediction models' process activities (Chen and He, 2012). 
Figure 24. System architecture module illustration with data process of the artificial intelligence-based decision support tool (Choi et al., 2021). 
Figure 25. Modules and functions of the AI decision support application (Choi et al., 2021). 
Figure 26. The model's development process (Choi et al., 2021). 
Figure 27. The AI process and tasks divided into work positions (Heo et al., 2021). 
Figure 28. Informative attributes of the project bidding data (Chou et al., 2015). 
Figure 29. Model's prediction development information flow (Chou et al., 2015). 
  
 
Abbreviations 
 
AI Artificial intelligence 
ANN Artificial neural network 
BIM Building information model 
CEM Construction engineering and management 
CNN Convolutional neural networks 
DNN Deep neural network 
EPC Engineering, procurement, and construction 
GA Genetic algorithm 
GDP Gross domestic product 
HSEQ Health, safety, environment, and quality 
IoT Internet of things 
MAE Mean absolute error 
MAPE Mean absolute percentage error 
ML Machine learning 
NLP Natural language processing 
NLTK Natural language toolkit 
NSE Nash-Sutcliffe efficiency 
R² Coefficient of determination 
RF Random forest 
RMSE Root mean squared error 
RNN Recurrent neural network 
SVM Support vector machine 
 
 
  
1 Introduction 
 
What if data could describe what happened, why it happened, and what will happen next? Could one use this kind of ability to develop an enterprise's, or an individual's, actions to achieve better results for the desired objectives? 
 
Today, enterprises have comprehensive opportunities to use data for development purposes, because most management work is done through systems that collect and store data frequently. The existing data often reflects the priorities of the enterprise. Because the data often contains key performance indicators, it can explain the history of operative actions. The outcomes are also often determined by these key attributes. The data is, therefore, the backbone of utilizing artificial intelligence with machine learning applications. 
 
Machine learning models can be used for various analysis purposes: descriptive, diagnostic, predictive, and prescriptive analysis. Descriptive analysis describes what has happened, and diagnostic analysis answers the question of why it happened. Furthermore, predictive analysis aims to describe what will happen in the future. Because predictive analysis cannot answer what must be changed to achieve better results, there is a demand for prescriptive analysis. Prescriptive analysis aims to describe which parts of the process should be changed to achieve improvement. 
 
The thesis aims to summarize the present state of AI utilization in construction-related business. After compiling the present state of AI utilization in the construction business domain as a theory base, the thesis concentrates on turning this knowledge into solutions that the enterprise needs to develop project portfolio management and safety at construction sites. Because Elenia's operating domain concentrates on project management, the literature review focuses on project and safety management. The latter is an integral part of working life: no matter what an enterprise's operational domain is, the development of safety management, and therefore of the overall safety of operations, is essential for every enterprise. The use cases in the literature review focus on machine learning solutions. Because the objective of the thesis was to establish the present utilizations of AI in construction-related project and safety management, the structure and content of the thesis relate mainly to machine learning applications. Furthermore, it is noteworthy that machine learning is not mandatory for utilizing AI. The intelligence of a machine may also be beneficial without a learning dimension. For example, chatbot solutions can utilize AI without machine learning: a chatbot can operate with predetermined rules that it follows. 
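A few lines of Python can sketch the rule-based chatbot mentioned above. This is a minimal illustration, not a description of any particular product. The keywords and answers in `RULES` are hypothetical examples, and the matching logic is the simplest possible keyword lookup.

```python
# A minimal rule-based chatbot: no machine learning, only predetermined rules.
# The rules below are hypothetical examples for illustration.
RULES = [
    ({"helmet", "ppe"}, "Personal protective equipment is mandatory on site."),
    ({"permit", "excavation"}, "Check that a valid work permit exists before excavating."),
]
DEFAULT_ANSWER = "Sorry, I do not have a rule for that question."


def answer(question: str) -> str:
    """Return the reply of the first rule whose keywords appear in the question."""
    words = set(question.lower().replace("?", " ").split())
    for keywords, reply in RULES:
        if keywords & words:  # any shared keyword triggers the rule
            return reply
    return DEFAULT_ANSWER


if __name__ == "__main__":
    print(answer("Do I need a helmet?"))
```

Every answer is fixed in the rule table, so the bot never improves from experience. That learning dimension is what machine learning would add.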
 
AI-based systems often need a large quantity of data to work properly. This data is often referred to as big data. The structure and management of big data could be the subject of a thesis in itself, so big data and its sub-subjects are discussed only briefly in this thesis. AI systems use data for learning, and after learning, AI can serve us in business-related tasks. In this way, AI solutions also free the scarce time of experts for more essential tasks. 
2 What is Artificial Intelligence 
 
Artificial intelligence is a broad concept that can be divided into multiple subtopics, see Figure 1. In one sentence, artificial intelligence can be described, for example, as a machine being able to perform intelligent actions. There are also many other ways to describe AI, because the concept is overwhelmingly wide and complex. Nevertheless, according to Russell & Norvig (2014), definitions of AI can be divided into human-like and rational acts. The former aims to mimic human behavior in thinking and acting, whilst the latter is based on predetermined rules for making rational decisions. 
 
In Roles of artificial intelligence in construction engineering and management: A critical review and future trends, Pan & Zhang (2021) argue that artificial intelligence handles complex and dynamic decisions with better accuracy than older systems. The paper examined publications on artificial intelligence adoption in construction engineering and management (CEM). The authors concluded that the number of papers related to AI and its present trends is growing exponentially. They also stated that AI acts as a backbone for future digitalization processes. Because the construction industry includes project-based businesses, AI solutions are tightly connected to projects. 
 
There are multiple ways for a machine to mimic advantageous behavior. It is noteworthy that human behavior is not considered a goal in itself, because human action is quite often inconsistent. Rather, the logical actions of humans are generally the target of artificial intelligence. Computer vision enables a machine to see, speech recognition enables it to hear, and machine learning models give it the ability to process and use the information collected. 
 
Figure 1. Illustration of artificial intelligence's concept (Abioye et al., 2021) 
 
Figure 1 illustrates the concept of AI and its subfields. The left side of the figure describes the types of AI. Abioye et al. (2021) explain that artificial narrow intelligence is a type of AI that operates in a narrow, predetermined domain, for example, repetitive sales prediction. Artificial general intelligence refers to human-like behavior by the machine. In this form, AI has general learning abilities and the ability to solve problems without predetermined rules, in the same way that humans learn by examining prevailing circumstances. Artificial super intelligence is the form of AI in which the machine's abilities surpass human capabilities in multiple domains. The components of AI illustrate the actions that one aims to achieve with machine intelligence. The subfields of AI describe the methods and modes of action used to achieve them. 
 
In Artificial intelligence: 101 things you must know today about our future, the author compares data to oil at the beginning of the industrial era and argues that data is even more valuable. According to the author, oil benefited a handful of companies, whereas data benefits a much larger number. The author also describes artificial intelligence as the fourth industrial revolution. 
 
 
3 Machine Learning 
The objective of machine learning is to teach a machine to act in an advantageous manner. Often, this means exploring data and generating predictions from the gathered information. Predictions are made with a model that has learned to predict variables from a training dataset. The data is divided into training, validation, and testing datasets. The training set teaches the model to make predictions, the validation set tunes the model for better accuracy, and the test set tests the model. The test set must always consist of data the model has not seen. This section discusses machine learning tools used in construction-business-related applications. 
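A common way to produce the three datasets is to split the data twice. The sketch below assumes scikit-learn and uses synthetic, hypothetical data. The 60/20/20 proportions are an illustrative choice, not a rule from the sources.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 1,000 samples with 5 features and a binary label.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# First hold out 20 % as the test set, which the model must not see during training.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Split the remaining 80 % into training (60 % of all) and validation (20 % of all).
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```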
 
 
Figure 2. Overview of the machine learning techniques and algorithms (Ajayi et al., 
2020). 
 
Ajayi et al. (2020) gathered the main machine learning techniques into a table, see Figure 2. The table may be useful when determining the need for a machine learning model. The authors illustrate the applications from the perspective of the safety domain. 
 
 
There are four main orientations of machine learning: supervised machine learning, unsupervised machine learning, reinforcement machine learning, and deep learning (Abioye et al., 2021). These disciplines differ in the degree of human interaction in the learning process. In supervised machine learning, the user's interaction is greatest, whilst in deep learning the machine makes decisions fairly autonomously. Thus, the user's interaction is lowest in deep learning. In unsupervised learning, the machine learns more independently from the data. The main task of unsupervised learning is to generate relevant information from unstructured data (Pan & Zhang, 2021). 
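The difference between supervised and unsupervised learning shows in what the model is given. This sketch assumes scikit-learn and synthetic data. Logistic regression (supervised) is trained with the labels, while k-means clustering (unsupervised) sees only the inputs. Both algorithms are examples chosen for brevity, not methods named in the sources.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=1)
# Two hypothetical groups of observations around different centres.
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)), rng.normal(3.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)  # labels, used only in the supervised case

# Supervised: the model learns from the inputs *and* the labels provided by a human.
clf = LogisticRegression().fit(X, y)

# Unsupervised: the model sees only the inputs and finds structure (clusters) itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("supervised accuracy:", clf.score(X, y))
print("cluster sizes:", np.bincount(km.labels_))
```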
 
3.1 Artificial Neural Network 
An artificial neural network (ANN) is designed to mimic the operating principle of the human brain. An ANN has multiple nodes that forward and receive information on a non-linear basis. The system contains visible and hidden layers of nodes (Lin et al., 2021). Lin et al. (2021) also describe that an ANN can make predictions without being taught specific relationships between the data's attributes, because the information flows from different input nodes to multiple hidden nodes. Because the model consists of neurons through which information flows in various directions, the network can generalize learned information. Thus, the prediction depends less on the effect of any single attribute. 
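A single forward pass illustrates how information flows from input nodes through hidden nodes. This NumPy sketch uses a hypothetical network size (three inputs, four hidden nodes, one output), random weights, and a ReLU activation as the non-linearity. None of these choices come from Lin et al. (2021).

```python
import numpy as np


def relu(z):
    """Non-linear activation: negative values become zero."""
    return np.maximum(0.0, z)


def forward(x, W1, b1, W2, b2):
    """One forward pass: input -> hidden layer (non-linear) -> output."""
    hidden = relu(x @ W1 + b1)  # every input node feeds every hidden node
    return hidden @ W2 + b2     # hidden nodes are combined into the output


# Hypothetical weights for a network with 3 inputs, 4 hidden nodes and 1 output.
rng = np.random.default_rng(seed=0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

x = np.array([[0.2, -1.0, 0.5]])  # one sample with three attributes
print(forward(x, W1, b1, W2, b2).shape)  # (1, 1)
```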
 
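Koc et al. (2022) explain that an ANN relies on two computations made in the network: feed forward and back propagation. In the feed-forward pass, the input data is propagated through the network with initially random weights to produce a prediction. In back propagation, the prediction error is propagated backwards through the network to find better, and eventually optimal, weights for the different parts of the neural network. 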
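A tiny network trained on the XOR problem shows both computations. This is an illustrative NumPy implementation under stated assumptions: sigmoid activations, mean squared error, and plain gradient descent with a learning rate of 0.5. It is not the setup used by Koc et al. (2022).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# XOR: a small problem that a single linear model cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(seed=0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))  # random initial weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
learning_rate = 0.5
losses = []

for epoch in range(5000):
    # Feed forward: propagate the inputs through the network.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((y_hat - y) ** 2)))

    # Back propagation: chain rule from the output error back to each weight.
    d_out = 2 * (y_hat - y) / len(X) * y_hat * (1 - y_hat)
    d_hidden = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent update of the weights and biases.
    W2 -= learning_rate * h.T @ d_out
    b2 -= learning_rate * d_out.sum(axis=0, keepdims=True)
    W1 -= learning_rate * X.T @ d_hidden
    b1 -= learning_rate * d_hidden.sum(axis=0, keepdims=True)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The loss should fall over the iterations. Whether the network reaches a perfect XOR solution depends on the random initial weights.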
 
3.2 Deep Learning 
In the article Deep Learning, LeCun et al. (2015) explained that conventional machine learning models are limited in their ability to process natural data in raw form. Deep learning brought progress in this respect. For example, in computer-vision-based recognition tasks, the first, second, and third layers detect different parts of the picture. In other words, different layers detect different features from the data, which allows the model's representation of the data to develop layer by layer. Because multiple layers can individually learn different parts of the picture and later combine this knowledge, the learning becomes deeper than in single-layer models. 
 
 
Figure 3. Illustration of a multi-layer network classification of data (LeCun et al., 2015). 
 
LeCun et al. (2015) illustrated, see Figure 3, how a multi-layer neural network can distort the data space to make the data linearly separable. The red and blue lines represent two categories of data. At the left side of the figure, the boundary between the red and blue data is non-linear, and the hidden layer transforms the regular grid so that the categories become linearly separable. The authors also discuss deep-learning-related subjects such as backpropagation and feedforward architectures. Backpropagation is the procedure by which a deep learning model computes how the prediction error changes with respect to the weights of each layer, working backwards from the output. These weights determine the prediction that the model produces. In feedforward architectures, information moves forward from one layer to the next, and each layer computes a weighted sum of its inputs followed by a non-linear function. The model adjusts the weights to achieve the best prediction accuracy (LeCun et al., 2015). 
 
 
3.3 Support Vector Machine 
Koc et al. (2022) describe the support vector machine (SVM) as a multi-purpose model due to its universal structural learning process. The SVM generates a hyperplane and optimizes its position with respect to the data points. Because the hyperplane represents the optimal linear separation of the data points, errors arise when the data is non-linear. This requires the use of so-called "kernel tricks". 
 
 
Figure 4. Illustration of four different kernels used in SVM (Pedregosa et al., 2011). 
 
Lei (2016) explains that applying a kernel function maps the sample data into a high-dimensional space where non-linear classification becomes possible. Mapping the data points into a higher-dimensional space gives the SVM a finer resolution when determining the support vectors. Because the decision boundary then follows the data at a higher resolution, it is not linear in the original space, see Figure 4. Note that the figure shows two linear and two non-linear kernels. A kernel function can also make a linear decision boundary possible when the data points are moved from a two-dimensional space to a three-dimensional one: the non-linearity can disappear when a third axis is added. 
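Two concentric rings, which no straight line can separate, demonstrate the effect of a kernel. The sketch assumes scikit-learn's `SVC`. It compares a linear kernel with the RBF kernel and then adds a third axis z = x1² + x2² explicitly. This mirrors the two-to-three-dimensional mapping described above.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by a straight line in two dimensions.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)  # kernel trick: implicit mapping

# The same idea made explicit: add a third axis z = x1^2 + x2^2.
# In three dimensions a flat plane separates the inner ring from the outer one.
X3 = np.column_stack([X, (X ** 2).sum(axis=1)])
lifted_acc = SVC(kernel="linear").fit(X3, y).score(X3, y)

print(f"linear 2-D: {linear_acc:.2f}, RBF: {rbf_acc:.2f}, linear 3-D: {lifted_acc:.2f}")
```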
 
3.4 Natural Language Processing 
Natural language processing (NLP) is used to identify descriptive keywords or phrases in written text. Cheng et al. (2020) explain that the Natural Language Toolkit (NLTK) is the most popular library for NLP. The toolkit provides multiple components for text processing, such as tokenization, stemming, and parsing. With these, a user can make natural written language more suitable for a machine learning (ML) model to use for prediction. 
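Tokenization and stemming with NLTK can look as follows. The sentence is a hypothetical example. The sketch uses `wordpunct_tokenize`, a regular-expression tokenizer that, unlike `word_tokenize`, does not require downloading the Punkt model. It also uses the Porter stemmer.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# A hypothetical safety observation written in free text.
report = "Workers reported falling objects near the excavation."

tokens = wordpunct_tokenize(report.lower())  # split text into word and punctuation tokens
words = [t for t in tokens if t.isalpha()]   # drop punctuation tokens such as "."
stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in words]     # reduce each word to its stem

print(stems)
```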
 
In Natural Language Processing and Computational Linguistics: A Practical Guide to Text Analysis with Python, Gensim, SpaCy, and Keras, Srinivasa-Desikan (2018) explains that Gensim is a Python library dedicated to text processing, particularly to vectorizing text, although its abilities extend further. Vectorizing text is beneficial for the ML prediction process: once a word is vectorized, it can be represented in mathematical form. A vector has a magnitude and a direction, so words can be represented in multiple dimensions. The author explains that Gensim is memory independent, meaning that it can process corpora larger than the available memory, and that it offers implementations of multiple semantic models. Its smooth operation within Python's ecosystem also makes it easy to combine with multiple vectorizing tools and algorithms. 
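The idea that a text becomes a vector with magnitude and direction can be shown without Gensim. The following plain-Python sketch does not use Gensim's API. It builds simple bag-of-words count vectors for hypothetical texts and compares their directions with cosine similarity, one common way to measure how close two text vectors are.

```python
import math
from collections import Counter


def vectorize(text: str, vocabulary: list[str]) -> list[int]:
    """Bag-of-words vector: how many times each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(a: list[int], b: list[int]) -> float:
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical short texts and a vocabulary built from them.
docs = ["cable damaged during excavation", "excavation damaged a cable", "helmet missing"]
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [vectorize(doc, vocabulary) for doc in docs]

print(cosine_similarity(vectors[0], vectors[1]))  # 0.75: three shared words
print(cosine_similarity(vectors[0], vectors[2]))  # 0.0: no shared words
```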
 
3.5 Convolutional Neural Networks 
In Deep Learning, LeCun, Bengio and Hinton (2015) explain that the principle of convolutional neural networks (CNN) is to detect local combinations of features, while a pooling layer merges semantically similar features into one. The architecture of a CNN divides the features into many layers, and the units of these layers are organized into feature maps. Each unit in a feature map is connected to local patches in the feature maps of the previous layer through a set of weights called a filter bank. The model feeds the result of this local weighted sum forward to the next layer. 
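A single convolution-and-pooling step can be written out directly. This NumPy sketch assumes one hand-written 3×3 vertical-edge filter standing in for a learned filter bank, a ReLU non-linearity, and 2×2 max pooling as the pooling operation. The explicit loops favour readability over speed.

```python
import numpy as np


def convolve2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Local weighted sum over one patch; the same weights are shared everywhere.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the strongest response in each block."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


# Hypothetical 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)  # one filter of a filter bank

feature_map = np.maximum(0.0, convolve2d(image, vertical_edge))  # ReLU non-linearity
pooled = max_pool(feature_map)
print(feature_map.shape, pooled.shape)  # (4, 4) (2, 2)
```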
 
3.6 Recurrent neural networks 
LeCun, Bengio and Hinton (2015) explained that recurrent neural networks (RNN) are powerful tools when the input data is sequential. The network processes the input sequence one element at a time and keeps, in its hidden units, a state vector that contains information about the history of the past elements of the sequence. The authors note that training RNNs can be problematic because the backpropagated gradients either grow or shrink at each time step. Over many time steps, the gradients therefore typically explode or vanish. 
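NumPy can sketch both the state vector and the gradient problem. The first part runs a hypothetical recurrent layer over a short random sequence. The second part reduces backpropagation through time to a single scalar recurrent weight w, an assumption made for illustration. Repeated multiplication by w is what makes gradients vanish when |w| < 1 and explode when |w| > 1.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
W_x = rng.normal(scale=0.5, size=(3, 4))  # input-to-hidden weights (hypothetical sizes)
W_h = rng.normal(scale=0.5, size=(4, 4))  # hidden-to-hidden (recurrent) weights

sequence = rng.normal(size=(10, 3))  # ten time steps, three features each
h = np.zeros(4)  # state vector: summarizes the history seen so far
for x_t in sequence:
    h = np.tanh(x_t @ W_x + h @ W_h)  # new state depends on the input and the previous state

# Why training is hard: backpropagating through T steps multiplies the gradient
# by a recurrent factor at every step (a scalar w here, for simplicity).
T = 20
for w in (0.5, 1.5):
    print(f"w = {w}: gradient factor after {T} steps = {w ** T:.2e}")
```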
 
3.7 Decision trees 
Pedregosa et al. (2011) explain that decision trees are used for classification and regression problems. The algorithm learns simple decision rules from the data during the learning process, and the decision nodes form the tree structure of the model. The authors note that decision trees tend to overfit if the data has many features. Thus, it is important to reduce the probability of overfitting with a suitable method, such as limiting the maximum depth of the tree. Limiting the tree depth reduces the number of successive decision levels, so the model cannot exploit every detail of the information in the data's features. As a result, the model predicts unseen data better. 
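scikit-learn's `DecisionTreeClassifier` and its `max_depth` parameter can be used to check the effect of limiting tree depth. The data below is synthetic and hypothetical: 30 features, of which only four are informative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data with many features, only a few of which are informative.
X, y = make_classification(n_samples=500, n_features=30, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

trees = {}
for depth in (None, 3):  # None = grow until the leaves are pure
    trees[depth] = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    tree = trees[depth]
    print(f"max_depth={depth}: depth={tree.get_depth()}, "
          f"train={tree.score(X_train, y_train):.2f}, test={tree.score(X_test, y_test):.2f}")
```

The unlimited tree fits the training set perfectly, which is a symptom of overfitting. Whether the shallow tree scores better on the test set depends on the data, so the comparison should be made on held-out data rather than assumed.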
 
Pedregosa et al. (2011) explain that because a decision tree's predictions are neither continuous nor smooth, the variance of an individual tree's predictions can be quite high. In a random forest, the weight of each individual tree's prediction is smaller, so the variance of the prediction is lower. Lin et al. (2021) explain that random forest (RF) is suitable for nonlinear regression and classification machine learning problems. RF is therefore popular in the risk prediction ML domain due to its ability to handle nonlinear relationships between variables. A random forest works in a similar manner to a decision tree, and the structure of the decision tree is the seed of the random forest. In a random forest, the algorithm produces multiple trees that are used as an ensemble for making the prediction. 
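The averaging in a random forest can be verified directly with scikit-learn. This sketch uses synthetic, hypothetical regression data. It compares the forest's prediction with the mean of its individual trees' predictions and trains a single fully grown tree for comparison.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

single_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# The forest prediction is the average of its trees' predictions,
# so each single tree carries a weight of only 1/100.
tree_predictions = np.stack([t.predict(X_test) for t in forest.estimators_])
averaged = tree_predictions.mean(axis=0)
print("forest equals tree average:", np.allclose(forest.predict(X_test), averaged))

print("single tree R^2:", round(single_tree.score(X_test, y_test), 3))
print("random forest R^2:", round(forest.score(X_test, y_test), 3))
```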
 
3.8 Optimization 
Pan & Zhang (2021) discuss the optimization of a project. In the project management domain, optimization can be considered a decision support system for achieving the best end result for a project. 
 
In the machine learning domain, the purpose of optimization is to decrease the prediction error, which improves the model's ability to generate precise predictions. When building a prediction model, one should divide the data into training, validation, and testing datasets. With this simple procedure, one can evaluate and validate the model's performance on unseen data. The model's performance is measured multiple times during the process, see section 3.8.1 Performance metrics of a model. 
 
In Machine Learning: An Applied Econometric Approach, Mullainathan & Spiess (2017) explain that the training data may overrate the performance of a model, because some algorithms tend to overfit. Overfitting is a situation where the model learns the attributes of the training data too well. As a result, the model can perform significantly better on the training set than on the validation or testing dataset, where the data attributes deviate from the learned ones. The authors explain that part of the solution is regularization, which measures the complexity of the model and directs the optimization towards simpler models. For example, in a regression-tree-based model, one can perform regularization by comparing tree depth and performance. Thus, one should not simply select the model that performs best on the training data, because regularization requires the model complexity, such as tree depth, to be taken into account as well. 
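Regularization can be written as a penalized objective. The general form below follows the spirit of Mullainathan & Spiess (2017). The symbols L, R, and λ are introduced here for illustration:

$$\hat{f} = \arg\min_{f \in \mathcal{F}} \; \frac{1}{n}\sum_{i=1}^{n} L\big(f(x_i), y_i\big) \; + \; \lambda \, R(f)$$

Here L is the prediction loss on the n training samples, R(f) measures the complexity of model f (for a regression tree, for example, its depth), and λ ≥ 0 sets how strongly simpler models are preferred. With λ = 0 the optimization simply chooses the best in-sample fit, which is exactly the overfitting risk described above. In practice, the level of λ is typically selected with cross-validation.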
 
3.8.1 Performance metrics of a model 
A model's accuracy must be validated with an objective practice, and it is advisable to measure a model's performance with multiple indicators. The evaluation metrics measure the error between the prediction and the actual value. In classification, the metric measures the model's ability to classify data inputs into the correct categories. In a regression problem, the evaluation is based on the numerical difference between the actual data point and the predicted one. 
 
Oyedele et al. (2021) describe the main classification metrics concisely and accurately: 
“Precision is a fraction of correct predictions for a specific class, while recall is the 
model’s ability to classify relevant cases. F-1 score defines the harmonic mean (or 
a weighted average) of precision and recall, and it reaches its best value at one and 
its worst at zero.” 
 
Koc et al. (2022) used five different indicators for a regression-based machine learning problem: mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), the coefficient of determination R², and Nash-Sutcliffe efficiency (NSE). The authors explained that for the first three indicators, smaller absolute values represent higher accuracy. The coefficient of determination reaches 1 for a perfect prediction, while values near zero represent unacceptable prediction accuracy. For NSE, values from 0.5 to 1 indicate acceptable prediction accuracy, where 1 is a perfect prediction. 
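The five indicators can be computed with NumPy as below. The source does not give the formulas, so two assumptions are made. R² is computed as the squared Pearson correlation between observed and predicted values, and NSE as one minus the ratio of the squared error to the variance of the observations. If R² is instead defined as 1 − SS_res/SS_tot, it coincides with NSE. The input values are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, MAPE (%), R^2 (squared Pearson correlation) and NSE."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    error = y_true - y_pred
    mae = np.mean(np.abs(error))
    rmse = np.sqrt(np.mean(error ** 2))
    mape = 100 * np.mean(np.abs(error / y_true))  # undefined if any y_true is 0
    r2 = np.corrcoef(y_true, y_pred)[0, 1] ** 2
    nse = 1 - np.sum(error ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2, "NSE": nse}


metrics = regression_metrics([3, 5, 7, 9], [2.5, 5, 7.5, 9])  # hypothetical values
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```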
 
Oyedele et al. (2021) discuss Cohen's kappa as a tool for analysing model performance. Cohen's kappa is suitable for evaluating performance in classification problems. The authors note that Cohen's kappa is well suited to measuring multi-class and imbalanced problems. Cohen's kappa is defined in equation (1). 
 
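$$\kappa = \frac{t - y}{1 - y} \tag{1}$$

where t is the observed agreement between the predicted and actual classes (the proportion of correct predictions) and y is the agreement expected by chance. A value of 1 represents perfect agreement, while 0 indicates agreement no better than chance.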
 
The authors also used several other evaluation metrics: accuracy, precision, recall and F1 score. These metrics are defined in equations (2) to (5) as follows:
 
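$$\mathit{Accuracy} = \frac{\text{True Positives }(TP) + \text{True Negatives }(TN)}{TP + TN + FP + FN} \tag{2}$$

$$\mathit{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\mathit{Recall} = \frac{TP}{TP + FN} \tag{4}$$

$$F1 = 2 \cdot \frac{\mathit{Precision} \cdot \mathit{Recall}}{\mathit{Precision} + \mathit{Recall}} \tag{5}$$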
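Equations (1) to (5) can be computed from lists of actual and predicted class labels. In this minimal Python sketch, the accident-cause labels are hypothetical, and accuracy, precision, recall and F1 are computed for one class at a time (one-vs-rest), which is an assumption; multi-class studies often average such per-class values. Cohen’s Kappa uses all classes, with the chance agreement y estimated from the label frequencies of the actual and predicted lists.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP and FN with one class treated as 'positive'."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn

def binary_metrics(actual, predicted, positive):
    tp, tn, fp, fn = confusion_counts(actual, predicted, positive)
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # equation (2)
    precision = tp / (tp + fp) if tp + fp else 0.0      # equation (3)
    recall = tp / (tp + fn) if tp + fn else 0.0         # equation (4)
    f1 = (2 * precision * recall / (precision + recall)  # equation (5)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

def cohen_kappa(actual, predicted):
    """Equation (1): t = observed agreement, y = agreement expected by chance."""
    n = len(actual)
    t = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_freq, predicted_freq = Counter(actual), Counter(predicted)
    y = sum(actual_freq[c] * predicted_freq[c] for c in actual_freq) / n ** 2
    return (t - y) / (1 - y)  # undefined if y == 1 (a single class everywhere)

actual = ["fall", "fall", "electrocution", "fall", "other", "electrocution"]
predicted = ["fall", "other", "electrocution", "fall", "other", "fall"]
print(binary_metrics(actual, predicted, "fall"))
print(cohen_kappa(actual, predicted))
```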
 
3.8.2 Cross Validation 
According to Mullainathan & Spiess (2017), cross-validation is a tool for selecting the correct level of regularization. In cross-validation, the training data is split into equally sized sections, often called folds. Thus, instead of one large training data set, one has, for example, ten smaller ones. Figure 5 shows how the data is divided into folds and splits. The model is tuned by using a different fold as test data in each split.
 
Cross-validation helps prevent overfitting by separating the data into folds. The model is evaluated against a held-out fold multiple times during the learning process, so its performance is tested in every split. The feedback from these multiple evaluations is then used to tune the model’s parameters in the right direction.
 
 
Figure 5. Illustration of the k-fold cross-validation principle (Pedregosa et al., 2011).
 
In other words, the model is better able to capture the attributes and relationships that matter for prediction accuracy on unseen data. A further advantage of the repeated evaluations is data efficiency: a held-out test set can in principle be used only once, because after the model has been evaluated on it, the data is no longer unseen and the test cannot be repeated on the same data set without bias. Since the model must be trained to make accurate predictions on unknown data, cross-validation uses the available data more efficiently than a single fixed division into training and test sets.
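The k-fold procedure in Figure 5 can be sketched in plain Python. For brevity, this sketch swaps the tree-depth example for a k-nearest-neighbours regressor, whose number of neighbours plays a similar regularization role: few neighbours give a flexible model that may overfit, many give a smoother one. The data are synthetic and the function names are hypothetical; libraries such as scikit-learn provide tested implementations of the same idea.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k folds of (nearly) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def knn_predict(train_x, train_y, x, n_neighbors):
    """Predict y for x as the mean target of the n_neighbors closest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cross_validated_mae(xs, ys, n_neighbors, k=5):
    """Average MAE over k splits; each fold is used exactly once as the test set."""
    fold_errors = []
    for test_fold in k_fold_indices(len(xs), k):
        test_set = set(test_fold)
        train = [i for i in range(len(xs)) if i not in test_set]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        errors = [abs(ys[i] - knn_predict(train_x, train_y, xs[i], n_neighbors))
                  for i in test_fold]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

# Synthetic data: a linear trend with noise
rng = random.Random(1)
xs = [float(x) for x in range(40)]
ys = [0.5 * x + rng.gauss(0, 2) for x in xs]
scores = {n: cross_validated_mae(xs, ys, n) for n in (1, 3, 5, 10, 20)}
best = min(scores, key=scores.get)
print(scores, "-> selected n_neighbors =", best)
```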
 
 
3.8.3 Genetic Algorithms 
There are multiple ways to optimize a model, and different methods suit different problems. For example, the gradient descent algorithm calculates the gradient of the objective function and iteratively steps towards its minimum. Because the gradient cannot be calculated for every problem, other algorithms are needed. Some algorithms operate on a linear basis and cannot be used effectively on non-linear problems, and the shape of the objective function can also limit an algorithm’s ability to find its minima and maxima.
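Gradient descent can be illustrated on a one-dimensional function whose gradient is known analytically. This is a minimal sketch with an assumed learning rate and step count; it also shows that the method needs a gradient in the first place, which is the limitation that motivates gradient-free methods such as genetic algorithms.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a local minimum."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# f(x) = (x - 3)^2 has gradient f'(x) = 2(x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0
```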
 
 
Figure 6. Example of a random population generated by a GA, with an explanation of how the data points contribute to the next generation (MathWorks, n.d.).
 
Genetic algorithms (GA) are one way to optimize a model. In this thesis’s domains, safety and project management, GA is a popular way to optimize models, and it is therefore introduced in this subsection. Skorpil and Oujezsky (2020) explain that a genetic algorithm is an optimization algorithm that mimics natural evolution. The authors note that GA suits many problems but is unsuitable for some because of the time required for fitness evaluation: a single execution of the fitness function can take a long time.
 
In GA terminology, each individual data point in the population is a candidate solution whose features are referred to as genes. During the process, GA selects the data points that perform best on the problem being solved, and these are used as parents for the next iteration’s population. GA can generate the next population in multiple ways; these procedures are called genetic operators, and they define how the next iteration’s population is formed. For example, a single-point crossover operator chooses a random point that divides the parents’ genes into segments, after which segments from the two parents are combined into a child. Other common operators are selection and mutation. In selection, data points with suitable features are chosen for the next population. In mutation, the genes of individuals are randomly altered to maintain variability in the population.
 
First, the algorithm generates a random population, illustrated in Figure 6. The data points are marked with different patterns that indicate how each contributes to the next generation. Elite data points are passed unchanged to the next generation, while crossover children are formed by combining two parents and mutation children are formed by randomly changing the attributes of a single parent. New generations are produced until a stopping criterion, such as the maximum number of generations or the maximum time, is reached (MathWorks, n.d.).
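The generational loop described above can be sketched in a few dozen lines of Python. This is not the MathWorks implementation: it assumes bit-string individuals, tournament selection of parents, a single-point crossover producing one child per pair, bit-flip mutation and a maximum number of generations as the only stopping criterion. The fitness function in the usage example, counting the ones in a bit string, is a standard toy problem chosen purely for illustration.

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=30, n_elite=2,
                      mutation_rate=0.05, max_generations=60, seed=0):
    """Maximise `fitness` over bit strings with elitism, crossover and mutation."""
    rng = random.Random(seed)
    # 1. Random initial population
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(max_generations):  # stopping criterion: maximum generations
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # 2. Elite individuals pass unchanged to the next generation
        next_gen = [ind[:] for ind in ranked[:n_elite]]
        while len(next_gen) < pop_size:
            # 3. Selection: each parent is the fittest of three random individuals
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # 4. Single-point crossover: head of one parent, tail of the other
            cut = rng.randint(1, n_genes - 1)
            child = p1[:cut] + p2[cut:]
            # 5. Mutation: flip each gene with a small probability
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, history

best, history = genetic_algorithm(sum, n_genes=20)
print(sum(best), history[:5])
```

Because the elite individuals are copied unchanged, the best fitness recorded in history never decreases from one generation to the next.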
 
 
Figure 7. Evolution of the data points over the iterations of the genetic algorithm (MathWorks, n.d.).
 
Over the iterations, the population gradually gathers towards the centre of the diagram (see Figure 7). This convergence results from the evolutionary nature of the algorithm, in which fitter individuals are favoured in each generation.
3.9 Coding the machine learning models 
Machine learning models can be coded in various ways, and each programming language has its strengths and weaknesses. The developers of Julia explained their reasons for creating a new language as follows:
“We want a language that's open source, with a liberal license. We want the 
speed of C with the dynamism of Ruby. We want a language that's homoiconic, 
with true macros like Lisp, but with obvious, familiar mathematical notation like 
Matlab. We want something as usable for general programming as Python, as 
easy for statistics as R, as natural for string processing as Perl, as powerful for 
linear algebra as Matlab, as good at gluing programs together as the shell. 
Something that is dirt simple to learn, yet keeps the most serious hackers happy. 
We want it interactive and we want it compiled.” Bezanson et al. (2012).
 
3.9.1 Python 
Python is an open-source, object-oriented programming language that is widely used in machine learning. Python is generally considered intuitive and relatively easy to learn, and machine learning courses often recommend it. Python is attractive for machine learning because it offers a wide selection of ready-made libraries, and a needed library can be installed with a single command.
 
In Learn Python programming: A beginner's guide to learning the fundamentals of Python language to write efficient, high-quality code, Romano (2018) explains that programming often requires representing real-world entities and their relationships in code, and these are commonly represented as objects. Therefore, Python’s object-oriented nature is advantageous for practical solutions.
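As a small illustration of representing real-world entities as objects, the following Python sketch models a construction site, its workers and their safety observations with dataclasses. All class names, fields and the severity scale are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    role: str

@dataclass
class SafetyObservation:
    description: str
    severity: int            # hypothetical scale: 1 (minor) to 5 (critical)
    reported_by: Worker      # the observation is linked to a Worker object

@dataclass
class ConstructionSite:
    name: str
    observations: list = field(default_factory=list)

    def report(self, observation):
        """Attach a new safety observation to this site."""
        self.observations.append(observation)

    def serious_observations(self, threshold=4):
        """Return observations at or above the severity threshold."""
        return [o for o in self.observations if o.severity >= threshold]

site = ConstructionSite("Substation A")
foreman = Worker("A. Example", "foreman")
site.report(SafetyObservation("Open trench without barrier", 4, foreman))
site.report(SafetyObservation("Missing glove", 2, foreman))
print([o.description for o in site.serious_observations()])  # ['Open trench without barrier']
```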
 
Python, among many other languages, is used to handle the attributes and format of data. The amount of data can be massive, and unfortunately not all data is in the same format. An enterprise that wants to build a machine learning model usually must process the data in multiple ways; for example, data in text format may need to be converted into numerical variables. The purpose of data processing is to clarify the structure of the data and make it easier to understand. For example, missing values in a data set can be handled relatively easily: thousands of values can be filled with a few lines of code, either by replacing missing values with the average of the values or by deleting rows that have certain missing values.
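The two missing-value strategies and the text-to-number conversion mentioned above can be sketched with the pandas library. The column names and values are hypothetical; fillna, dropna and get_dummies are standard pandas functions, and one-hot encoding is only one of several ways to turn a text column into numerical variables.

```python
import pandas as pd

# Hypothetical incident data with missing values and a text-valued column
df = pd.DataFrame({
    "wind_speed_ms": [4.0, None, 8.0, 6.0],
    "task_type": ["excavation", "cabling", None, "excavation"],
    "injury_reported": [0, 1, 0, 1],
})

# Option 1: fill missing numeric values with the column mean
filled = df.copy()
filled["wind_speed_ms"] = filled["wind_speed_ms"].fillna(filled["wind_speed_ms"].mean())

# Option 2: drop the rows where a required column is missing
dropped = df.dropna(subset=["task_type"])

# Convert the text column into numeric indicator (one-hot) columns
encoded = pd.get_dummies(dropped, columns=["task_type"], dtype=int)
print(filled)
print(encoded)
```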
 
3.9.2 Julia 
Julia is an open-source programming language. In Julia Data Science, Storopoli et al. (2021) note that Julia is fast and easy to learn. The authors claim that Julia code is easier to read during debugging than Python and R code. The authors also note that Julia interoperates more readily with other languages and open-source packages, which reduces the need for separate interfaces and makes coding more effective. In addition, the authors highlight Julia’s built-in project and package management.
 
 
Figure 8. Comparison of programming languages with respect to ease of learning and code execution speed (Storopoli et al., 2021).
 
In Figure 8, the authors classify five open-source languages into quadrants. Languages on the right side of the horizontal axis are harder to learn and write, while those on the left side are easier. The vertical axis divides the languages by code execution speed.
4 Literature review on AI utilization in construction domain 
Safety management is a cornerstone of any construction-related project. Everyone has the right to work safely and to return home from work unharmed. Elenia has committed to the Safety Manifesto, whose idea is to bring Elenia and its main partners together to take responsibility and proactive actions to raise safety levels in Elenia’s operating domain. Elenia’s main contractor partners have signed the Safety Manifesto as a commitment to its goals.
 
Project and portfolio management are also key elements for a business to function correctly. Managing projects and portfolios means managing the resources that the enterprise controls. Because resources are often limited, effective management supports the resource objectives, and an effective use of scarce resources is therefore beneficial for the enterprise. It is noteworthy that, from a project’s perspective, resources relate to the project’s objectives and tasks and do not comprise only the workforce. Therefore, managing all resources, such as cash flow, helps a project achieve its objectives and may determine which objectives can be achieved.
 
4.1 AI utilization in safety domain 
Pan & Zhang (2021), citing the McKinsey Global Institute (2017), note that the construction business accounts for approximately 13-15% of the world’s gross domestic product (GDP), while the construction domain accounts for 30-40% of fatal accidents (Zhou et al., 2015). Therefore, Health, Safety, Environment and Quality (HSEQ) management should be a top priority for the industry, for an individual enterprise and for an individual worker. The worker should be at the centre of the actions for developing construction site safety. Because construction projects are labour-intensive and relatively unique, considerable change is challenging to achieve in the near future. Artificial intelligence systems are adopted first at the abstract, expert level and later in physical activities, because implementing non-physical systems is generally more straightforward. For example, AI developed for the project management process serves all project resources. In contrast, applying AI to physical tasks requires a task-specific system, and producing such systems for different tasks is costly and time-consuming compared to non-physical systems.
 
Natural language processing can be used to process safety reports at relatively low or no cost, and processing the reports can yield extensive knowledge about safety incidents. Baker et al. (2020) used attributes and natural language processing (NLP) to predict, among other factors, injury severity and incident type. The attributes describe key elements and conditions of the safety incidents. Notably, these keywords, or attributes, are not outcomes of the safety deviation but the preceding actions or circumstances. The authors did not use machine learning to extract the keywords; instead, the keywords were selected manually. Such manual processing could be boosted by using data processing methods from the machine learning domain.
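Because Baker et al. (2020) defined their attributes manually, detecting those attributes in report narratives can be partly automated with simple rules. The Python sketch below turns a free-text report into a binary attribute vector using regular expressions. The attribute names and keyword patterns are hypothetical and are not the attribute set used by Baker et al. (2020).

```python
import re

# Hypothetical attribute dictionary: attribute name -> keyword patterns
ATTRIBUTES = {
    "ladder": [r"\bladders?\b"],
    "working_at_height": [r"\broof\b", r"\bscaffold\w*", r"\bat height\b"],
    "hand_tool": [r"\bhammer\b", r"\bwrench\b", r"\bknife\b"],
    "energized_line": [r"\blive (?:wire|line|cable)\b", r"\benergi[sz]ed\b"],
}

def extract_attributes(report_text):
    """Return 1/0 for each attribute depending on whether the report mentions it."""
    text = report_text.lower()
    return {
        name: int(any(re.search(pattern, text) for pattern in patterns))
        for name, patterns in ATTRIBUTES.items()
    }

print(extract_attributes("Worker slipped from the ladder while carrying a wrench."))
```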
 
Baker et al. (2020) also constructed machine learning models that predicted safety outcomes, combining NLP with machine learning modelling. As described above, the attributes were extracted so that they represented the circumstances before the safety occurrence. The authors built multiple models, evaluated their performance, and stacked the models to achieve the best performance. The models can be used to identify correlations between safety issues and, for example, certain tools, and the system can also identify inverse correlations (Baker, Hallowell & Tixier, 2020). Thus, the model can be used for diagnostic inference.
 
Zhang et al. (2019) examined different machine learning models for classifying the causes of accidents. The authors built their model by using sequential quadratic programming to optimize the weights of the five models represented in Figure 9. In the data preprocessing phase, the accident report data is structured and cleaned for later use, and the authors executed multiple natural language processing steps on the accident data. In the model building phase, the authors built five models for predicting the causes of accidents. In the model tuning phase, the sequential quadratic programming algorithm optimizes the weights of the models.
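The idea of weighting several classifiers can be sketched as a weighted soft vote over each model’s class probabilities. Zhang et al. (2019) used sequential quadratic programming to find the weights; this sketch plainly swaps in a coarse grid search over weights that sum to one and uses accuracy as the objective, both of which are simplifying assumptions. For brevity it combines two models rather than five, and their probabilities are invented for illustration.

```python
import itertools

def weighted_vote(prob_lists, weights):
    """Combine per-model class-probability dicts into one predicted label per sample."""
    predictions = []
    for s in range(len(prob_lists[0])):
        combined = {}
        for model_probs, w in zip(prob_lists, weights):
            for label, p in model_probs[s].items():
                combined[label] = combined.get(label, 0.0) + w * p
        predictions.append(max(combined, key=combined.get))
    return predictions

def search_weights(prob_lists, actual, step=0.25):
    """Grid search over non-negative weights summing to 1 (a stand-in for SQP)."""
    grid = [i * step for i in range(int(round(1 / step)) + 1)]
    best_w, best_acc = None, -1.0
    for w in itertools.product(grid, repeat=len(prob_lists)):
        if abs(sum(w) - 1) > 1e-9:
            continue
        preds = weighted_vote(prob_lists, w)
        acc = sum(p == a for p, a in zip(preds, actual)) / len(actual)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

F, E, C = "falls", "electrocution", "collapse of object"
actual = [F, E, F, C]
model_a = [{F: .7, E: .2, C: .1}, {F: .6, E: .3, C: .1}, {F: .8, E: .1, C: .1}, {F: .5, E: .1, C: .4}]
model_b = [{F: .3, E: .4, C: .3}, {F: .1, E: .8, C: .1}, {F: .4, E: .5, C: .1}, {F: .1, E: .2, C: .7}]
print(search_weights([model_a, model_b], actual))
```

Because each single-model weighting (for example weights 1 and 0) is part of the grid, the searched ensemble is never less accurate on the tuning data than the best individual model; on real data the weights should be checked on data not used for the search.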
 
Figure 9. The development process of the prediction model, with an explanation of the process phases (Zhang et al., 2019).
 
The optimized classifiers outperformed the non-optimized ones, and the F1 score of the optimized model increased significantly: the non-optimized models scored F1 values between 0,44 and 0,58, while the optimized model reached 0,68. These results describe the average F1 score over the different accident causes. The authors’ model predicted 11 different accident causes, for example “collapse of object”, “Falls” and “electrocution”. It is noteworthy that the optimized model could not predict all causes with an acceptable F1 score. For example, the cause “struck by falling object” was predicted with an F1 score significantly below 0,5, meaning that the model performed poorly for this cause; for comparison, random guessing on a balanced binary problem yields an F1 score of about 0,5. The authors explain that the reason is imprecise language in the accident reports: natural language offers many ways to describe similar causes. The authors also note that in many cases even a human was not able to extract the correct cause of the accident.
 
Liang and Liu (2021) examined a safety system with risk warning and indication control. The authors discussed integrating Building Information Modelling (BIM), the Internet of Things (IoT) and a safety risk warning system. BIM is used to share information in the building construction process with all main participants of the project, and the authors explain that BIM can be understood as a platform built on top of the conventional design plan.
 
Liang and Liu (2021) also discuss that risk is present throughout the building construction process. A risk must be measured by its seriousness and its probability of occurrence. In the present work field, the variability of risks is quite high: safety culture has developed to the point where proactive actions prevent recurring incidents, and typical risks are reduced through risk mitigation processes, which lowers the occurrence of typical risks. As a result, the reported incidents and risks are highly variable. The authors aimed to capture three main elements in their early risk warning system. The system had to be able to identify relevant and deviating factors in the construction domain and thereby help project personnel mitigate the probability of a risk with the correct actions. The authors approached the system design from a quality-based perspective, focusing on creating a measurable and repeatable model grounded in safety science. An objective approach reduces subjectivity in the safety development process, and by reducing subjectivity through measurement, among other actions, an enterprise can increase quality.
 
In the article Hazard Analysis: A deep learning and text mining framework for accident prevention, Zhong et al. (2020) discuss a similar system that evaluates hazard reports automatically by combining multiple machine learning models with text processing. The authors argue that analysing repeated behaviour collected from the reports can improve safety. Because the reports are written in unstructured natural language, conventional systems are limited in their ability to describe the data efficiently. In the article, automation concentrated on exploiting a deep learning-based system. By using deep learning-based models, an enterprise can significantly reduce the workload of its personnel.
  
Zhong et al. (2020) also analysed the hazard records with a text mining model to connect relations between keywords in free-text descriptions. The authors aimed to visualize related causes, common factors and circumstances of the hazards that occurred. The model’s input was gathered from a mobile reporting application used at various Wuhan Metro construction sites. The paper concentrated on text processing and visualization, but in one section the authors also used machine learning models to recognize relevant keywords, testing a support vector machine and a convolutional neural network for the task. The model achieved an average F1 score of 0,71, although the variance of the F1 score was quite high.
 
Goh and Ubeynarayana (2017) examined six models for recognizing the causes of safety hazards and the factors leading to them. The authors describe a text mining process that uses tokenization. In natural language processing, tokenization refers to breaking natural language text into tokens. The authors used uni-gram and bi-gram tokens, which consist of one and two words, respectively. A token is a flexible unit: it can be a word, two words or some other specified collection of words. Converting text into numerical values based on tokens is called vectorizing the text, and it makes the data usable for the machine. The authors state that their method is not fully automated and propose modifications to the reports to improve prediction accuracy. The authors explained that misclassification and over-focusing on unimportant factors challenge the model’s accuracy. To achieve better accuracy, the authors suggest that the reporter manually link the cause to a pre-set label, for example “electrocution” or “traffic”.
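Uni-gram and bi-gram tokenization followed by count vectorization can be sketched in plain Python. The example reports are hypothetical, and the simple regular-expression tokenizer is an assumption; Goh and Ubeynarayana (2017) may have used different preprocessing, such as stop-word removal or term weighting.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, n):
    """Return contiguous n-word tokens, e.g. n=2 gives bi-grams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def vectorize(documents, n_values=(1, 2)):
    """Build a shared vocabulary and one count vector per document."""
    counts = [Counter(g for n in n_values for g in ngrams(tokenize(d), n))
              for d in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors

reports = ["Worker fell from ladder", "Ladder slipped on wet floor"]
vocabulary, vectors = vectorize(reports)
print(vocabulary)
print(vectors)
```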
 
Pan & Zhang (2021) discussed the ability of AI in the risk mitigation process. An artificial intelligence system can predict risks and the interrelations of work phases, thereby helping project personnel adopt the correct actions at the right time. In the risk management process, AI can support an expert by reducing subjectivity and indefiniteness in safety management, or in management overall. Because it can handle multiple information sources at the same time, an AI system can produce insights and notifications for project personnel.
 
In Text mining-based construction site accident classification using hybrid supervised machine learning, Cheng et al. (2020) discuss that accident report narratives explain the causes of accidents. AI solutions are in active use at present, but their learning ability is limited and their error rate is quite high. Neural networks and recurrent neural networks improve on these abilities compared with more traditional models, such as the decision tree, k-nearest neighbours, linear regression and the support vector machine. The authors also discuss the gated recurrent unit, a newer model for sequential data forecasting.
 
Lin et al. (2021) describe an early warning system for excavation works. The authors approached risk assessment and management by combining fuzzy set theory with machine learning models, using big data together with sensors to achieve a proactive risk system. The authors identified the main risk factors of the work domain and linked this knowledge to the work phases in which the risks occurred. The system uses the analytic hierarchy process (AHP) and the TOPSIS method, both of which are designed for multi-criteria decision-making. The paper’s early warning system starts from overall risk management, which determines the objectives of the risk management process. The system itself consists of two main parts (see Figure 10). In the first part, data is gathered in multiple ways, such as remote sensing, sensors and geological information systems. In the second part, the data is processed with a random forest (RF) algorithm, which analyses the gathered information and produces the excavation status as an output. The outputs are categorized into three risk categories, and proactive actions are chosen according to the predicted risk level.
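TOPSIS ranks alternatives by their relative closeness to an ideal solution and their distance from an anti-ideal one. The Python sketch below follows the standard steps: vector normalization, weighting, ideal and anti-ideal points, Euclidean distances and a closeness score. The work phases, criterion values and weights are hypothetical; in Lin et al. (2021) the weights would come from the AHP step, which is not shown here.

```python
import math

def topsis(matrix, weights, benefit):
    """Return the relative closeness (0..1) of each alternative to the ideal solution.

    matrix:  rows are alternatives, columns are criteria
    weights: importance of each criterion (summing to 1)
    benefit: True if a larger value is preferred for that criterion
    """
    n_criteria = len(weights)
    # 1. Vector-normalise each criterion column, then apply the weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)]
                for row in matrix]
    columns = list(zip(*weighted))
    # 2. Ideal (best) and anti-ideal (worst) value of each criterion
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # 3. Euclidean distances and relative closeness (1 = ideal, 0 = anti-ideal)
    scores = []
    for row in weighted:
        d_best, d_worst = math.dist(row, ideal), math.dist(row, anti)
        scores.append(d_worst / (d_best + d_worst))  # assumes alternatives differ
    return scores

# Hypothetical work phases scored on [probability, severity, exposure hours]
phases = ["excavation", "cable laying", "site survey"]
scores = topsis([[0.6, 9, 8], [0.3, 5, 4], [0.1, 2, 3]], [0.4, 0.4, 0.2], [True, True, True])
print(sorted(zip(scores, phases), reverse=True))
```

Here every criterion is treated as "larger means more critical", so the closeness score reads as a risk priority: a phase that is highest on every criterion scores exactly 1, and one that is lowest on every criterion scores exactly 0.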
 
 
Figure 10. The early risk warning system’s process by Lin et al. (2021). 
 
Lin et al. (2021) used a metro subway station as the case building site. The authors gathered and processed relevant data for the risk prediction system; in the case study, the data was collected with sensors buried near the excavation site. The authors sought a system that could predict the risk level in pre-defined classes. Their goal was to create a system that indicates risk levels so that construction site personnel can take the relevant actions to mitigate the risks. The authors claim that the system’s strengths are its ability to predict the actual risk level and the non-subjectivity of the ML model.
 
Figure 11. Risk warning system’s principle by Lin et al. (2021). 
 
In Figure 11, the authors illustrate the principle of the early risk warning system. The system detects the ground surface settlement state from the site’s sensors. The risk evaluation is done with random forest, Bayesian network and support vector machine algorithms, and proactive actions are taken to lower the risk levels of the excavation site. Multiple algorithms are used to decrease the variance of the risk level prediction compared to the actual state of the risk. Because a single model could, for example, forecast a low-risk state when the actual state is high-risk, it is logical to reduce the variance of the prediction by using multiple models.
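One simple way to combine the three algorithms’ risk classifications is a majority vote. This sketch is an assumption rather than the method of Lin et al. (2021): the risk level names and the example predictions are hypothetical, and ties are broken towards the higher risk level so that disagreement between models errs on the side of caution.

```python
from collections import Counter

RISK_LEVELS = ["low", "medium", "high"]  # hypothetical names for three risk categories

def combine_risk_predictions(*model_predictions):
    """Majority vote per time step; ties are resolved towards the higher risk level."""
    combined = []
    for votes in zip(*model_predictions):
        counts = Counter(votes)
        top = max(counts.values())
        tied = [level for level, c in counts.items() if c == top]
        combined.append(max(tied, key=RISK_LEVELS.index))
    return combined

random_forest = ["low", "medium", "high", "low"]
bayesian_network = ["low", "high", "high", "medium"]
svm = ["medium", "medium", "high", "high"]
print(combine_risk_predictions(random_forest, bayesian_network, svm))
```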
 
In Computer vision for anatomical analysis of equipment in civil infrastructure projects: Theorizing the development of regression-based deep neural networks, Arashpour et al. (2022) examined machine vision combined with a deep neural network-based system to improve heavy equipment safety, among other benefits. The authors claim that the deep neural network-based machine learning model is accurate in the excavation domain. The vision-based system uses multiple layers to determine, for example, the excavator’s position, as illustrated in Figure 12.
 
 
Figure 12. Different layers of the vision-based system (Arashpour et al., 2022). 
 
The system’s machine vision focuses on identifying key points of the excavator, such as the cabin, boom, arm and bucket, and the system also aimed to identify the angles at these key points. Large machinery such as an excavator has a long reach and requires a lot of space around its working position, which makes it a potential source of safety hazards. In Figure 13, the authors illustrate the data flow: the system preprocesses the images with channel shuffle, depthwise separable convolution and compound scaling, after which the image features are easier for the algorithm to detect. The preprocessed images are then processed by the neural network algorithm, which produces the position of the excavator as the result of the process.
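Once a vision model has located key points such as the boom root, the boom-arm joint and the bucket pin, the angle at a joint follows from plane geometry. The Python sketch below is a hypothetical post-processing step, not the method of Arashpour et al. (2022), and the pixel coordinates are invented.

```python
import math

def joint_angle(a, b, c):
    """Interior angle in degrees at key point b between segments b->a and b->c."""
    angle_ba = math.atan2(a[1] - b[1], a[0] - b[0])
    angle_bc = math.atan2(c[1] - b[1], c[0] - b[0])
    angle = abs(math.degrees(angle_ba - angle_bc)) % 360
    return 360 - angle if angle > 180 else angle  # always report the smaller angle

# Hypothetical key points in image pixel coordinates
boom_root, boom_arm_joint, bucket_pin = (120, 300), (260, 180), (330, 290)
print(round(joint_angle(boom_root, boom_arm_joint, bucket_pin), 1))
```

The result is the same whether the image y axis points up or down, because mirroring the points does not change the size of the angle between the two segments.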
 
The authors explain that the deep neural network’s multiple layers are beneficial for the overall efficiency of the system. With the help of machine vision, the excavator operator can control the key points of the machine more accurately, leading to more precise movements. The authors note that for complex objectives the system needs to be trained with a benchmark image data set.
 
 
Figure 13. The excavation risk warning system’s data workflow (Arashpour et al., 2022). 
 
Koc et al. (2022) examined a database of nearly 400 000 accidents and built a hybrid model combining a wavelet transform with machine learning. The authors claim that the present literature lacks time series-based approaches to accident prediction. The wavelet transform decomposes a time series into different bands, where some bands represent the details of the data and others approximations of it at different wavelengths. The study focused on three prediction horizons: 1 day, 7 days and 30 days ahead. The authors explained that the time series-based prediction is most accurate 1 day and 7 days ahead; one explanation could be a day-of-week effect, as certain days are more dangerous than others.
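The decomposition into approximation and detail bands can be illustrated with the Haar wavelet, the simplest discrete wavelet. The wavelet family and number of levels used by Koc et al. (2022) are not stated here, so Haar and two levels are assumptions, and the daily accident counts are hypothetical. In a hybrid model, the resulting bands would then be used as inputs for a machine learning model.

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Reconstruct the signal from one level of Haar coefficients."""
    s = 1 / math.sqrt(2)
    signal = []
    for a, d in zip(approx, detail):
        signal += [s * (a + d), s * (a - d)]
    return signal

def haar_decompose(signal, levels):
    """Multi-level decomposition: repeatedly split the approximation band."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

daily_accidents = [12, 14, 9, 11, 20, 18, 10, 12]  # hypothetical 8-day series
approx, details = haar_decompose(daily_accidents, levels=2)
print(approx, details)
```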
 
 
Figure 14. Example of utilization process of the predictive model (Koc et al., 2022). 
 
The authors describe the model’s utilization process in Figure 14. Safety actions should be timed to the periods for which the model predicts a higher number of safety hazards. When proactive safety actions are targeted and correctly timed, the frequency of hazards should decrease, and if the enterprise repeats this process over time, the safety hazard rate can be expected to keep decreasing.
 
In Deep learning and Boosted trees for injuries prediction in power infrastructure projects, Oyedele et al. (2021) argue that conventional machine learning techniques are not optimal for modelling the causes of injuries.
 
 
Figure 15. Process of the data preprocessing and ML model building for safety hazard 
prediction model (Oyedele et al., 2021). 
 
The authors rely on deep learning and boosted trees because these models require less manual feature engineering than conventional models such as SVM and ANN. Figure 15 illustrates the model development process of Oyedele et al. (2021). The data is first preprocessed into a structured form, after which machine learning models are developed and their performance is measured on a test data set. After evaluation, the main model is tuned, and its outputs are used for safety hazard prediction. The authors reported a high prediction probability of 0,967 and a Cohen’s Kappa of 0,964. According to the results of their sensitivity analysis, deep neural network (DNN) based modelling has good generalization ability and is good at identifying connections between complex labels. The DNN’s neurons receive highly varied inputs, and during training the prediction error is used to adjust the weights of the neurons in the model.
 
Oyedele et al. (2021) used a DNN for the model. Predictors such as site conditions or tool types are used to predict the injured body part (see Figure 16). The authors also explained that the model makes it possible to predict the likelihood of injuries for different combinations of working conditions. For example, they discussed how certain location and task indexes produced different probabilities for head, ankle and eye injuries. Some tasks and locations have a higher probability of multiple injuries, and awareness and proactive actions could be raised for these tasks and conditions.
 
Figure 16. Illustration of the deep learning model’s principle (Oyedele et al., 2021). 
 
The authors explained that task repetitiveness is linked to the injury rate of linemen: routine tasks seem to be more dangerous than rare ones. The paper discusses the local interpretable model-agnostic explanations (LIME) method, which explains the DL models’ predictions locally, for example for hand injuries. In practice, the method describes how different predictors and labels affect the predictions the model produces. For example, the equipment used in a task has a stronger effect on hand-related incidents than whether the line is energized. The authors suggest that electrification may cause more serious injuries, while hand tools rarely injure body parts other than the hands.
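The core idea of LIME can be sketched as follows: perturb one instance, query the black-box model, weight the perturbed samples by their proximity to the instance and fit a weighted linear model whose coefficients describe the local effect of each feature. This Python sketch with NumPy is a simplified illustration, not the lime package; the Gaussian perturbations, kernel width and the two-feature injury model are all assumptions.

```python
import numpy as np

def local_surrogate(predict, instance, n_samples=500, scale=0.5, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear model around `instance`; return one coefficient per feature."""
    rng = np.random.default_rng(seed)
    instance = np.asarray(instance, dtype=float)
    # 1. Perturb the instance with Gaussian noise
    samples = instance + rng.normal(0.0, scale, size=(n_samples, instance.size))
    # 2. Query the black-box model for each perturbed sample
    targets = np.array([predict(s) for s in samples])
    # 3. Weight samples by proximity with an exponential kernel
    distances = np.linalg.norm(samples - instance, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # 4. Weighted least squares with an intercept column
    design = np.column_stack([np.ones(n_samples), samples - instance])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], targets * sqrt_w, rcond=None)
    return coef[1:]  # drop the intercept

# Hypothetical black-box model: injury risk from [equipment_score, line_energized]
def black_box(x):
    return 1 / (1 + np.exp(-(1.5 * x[0] + 0.3 * x[1] - 1.0)))

print(local_surrogate(black_box, [1.0, 0.0]))
```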
 
 
Figure 17. Interactions and relationships between the safety hazard features and predictions (Oyedele et al., 2021).
 
Oyedele et al. (2021) explained that some project attributes affect the incident rate more than others. For example, site characteristics, equipment and task type have the greatest effect on incident occurrence, as can be seen in Figure 17, and this was confirmed by all the models the authors reviewed. The models also recognized other strong attributes in the data, such as season and project duration. The authors described the relationship between predictors as the interaction strength of a predictor. The LOC predictor, which describes site characteristics such as terrain, ground conditions, wind conditions and site logistics, had the strongest interconnection with the other attributes. In the prediction domain, the effect of site conditions depends on the incident being predicted; for example, windy conditions are considered more dangerous to the eyes than to the ankles. This is intuitive on its own, but combined with a machine learning model such a relationship may carry more weight than a single observation.
 
In Deep Learning Models for Health and Safety Risk Prediction in Power Infrastructure Projects, Ajayi et al. (2020) built six deep learning models together with text mining practices. The models are illustrated in Figure 18: in the first stage, a DNN model produces the input data for the five other models, and in the second stage the predictions are generated. The paper approached the machine learning safety domain from a practical perspective. The authors benchmarked their models against existing models and developed a user interface to raise awareness of safety issues. The models predicted relationships between different variables, mainly using regression. The authors used the area under the curve, mean absolute error, kappa coefficient, sensitivity and coefficient of determination as performance metrics. They also examined different combinations of layers and neurons against the mean absolute error of the model (see Figure 19). This is a logical approach for constructing an optimal model structure: one should focus on the structures where the error declines most steeply.
 
The authors explain that the DNN approach is divided into global and local views of the data. The global view seeks interactions between the predictors, while the local view seeks to explain an individual predictor’s effect on the outcome.
 
Figure 18. Illustration of the multi-stage DNN model (Ajayi et al., 2020). 
 
The variables were determined with a text mining approach from 17 972 health and safety incident cases, which decreased to 16 900 during the set-up process. The variables extracted from the reports included, for example, the condition of the personal protective equipment kit, the project type and duration, and the weather conditions.
 
According to the data used by Ajayi et al. (2020), the most frequently injured body parts are fingers and backs, with a cumulative proportion of 34% of all injuries. Hand, ankle, and knee injuries had a cumulative proportion of 27%. Notably, 45% of the injuries occurred during excavation.
 
 
Figure 19. Relationship between MAE and number of the layers as function of neurons 
(Ajayi et al., 2020). 
 
In Building applications for smart and safe construction with the DECENTER Fog Computing and Brokerage Platform, Kochovcki and Stankovski (2021) discussed combining artificial intelligence, the internet of things, and blockchain technology. The authors built smart applications to make construction sites smarter and safer. They focused on four scenarios: notifications for site managers, surveillance of site vehicles, management of resources, assets, and waste, and observation of working conditions. The authors used the DECENTER fog computing platform for the task; the platform is designed to run the microservices needed by the smart and safe applications.
 
 
Figure 20. Different requirements for the smart and safety application layer for the 
construction site smart application (Kochovcki and Stankovski, 2021). 
 
The authors explain that end-users, such as construction engineers and managers, gave positive feedback after the scenarios were deployed on the construction site. They note that the technical usability of the system supported the transfer of information from the site to the application, and therefore to the users. In Figure 20, the authors present the technical requirements and principles of the smart application. The requirements are separated into functional and non-functional requirements, together with system-level requirements. Functional requirements relate to the system's ability to achieve the desired outcomes, non-functional requirements set boundaries on how the outcomes should be achieved, and system requirements relate to the technical capabilities of the system. The authors also note that the system's ability to access different AI methods is an important feature; it enables the system to detect, for example, issues with the wearing of personal protective equipment.
 
Sattari et al. (2022) examined an AI-based decision-making system that takes assets into consideration. The authors focused on process safety management together with asset management, and the asset management elements were divided into two groups: assets and human resources.
 
 
Figure 21. Example of the supervised machine learning process with process phase re-
lated tasks (Sattari et al., 2022). 
 
Figure 21 describes the machine learning process of the paper. In the manual classification phase, part of the data is randomly selected and classified according to the elements of the asset management process. After classification, the data is split for the machine learning phase. The incident data is converted into numerical form, a classification model is built, and the data is classified to help develop the predictive model in the next phase. The authors selected a random sample of 764 incidents from a total of 7643. The authors describe the classification task as a numerical problem: once the data is represented numerically, the plotted data can be divided with a decision boundary. The scikit-learn TfidfVectorizer was used to vectorize the incident reports, and a linear support vector classifier was used to draw the decision boundaries, which the model then uses in the classification process. The authors searched the transformed data for dependencies and causes, and based on the results, created a clear procedure and practical recommendations for each asset and operative class.
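The sampling step can be sketched with the standard library. This is an illustrative sketch, not the authors' code: the function names, the 10% fraction, the 80/20 split, and the fixed seeds are assumptions. With a fraction of 0.1, a pool of 7643 incidents yields 764 records for manual classification, matching the sample size reported above.

```python
import random

def split_for_labelling(incidents, fraction=0.1, seed=42):
    """Draw a random subset for manual classification; return (to_label, remainder)."""
    rng = random.Random(seed)
    k = round(len(incidents) * fraction)
    chosen = set(rng.sample(range(len(incidents)), k))
    to_label = [incidents[i] for i in sorted(chosen)]
    remainder = [incidents[i] for i in range(len(incidents)) if i not in chosen]
    return to_label, remainder

def split_train_test(labelled, test_fraction=0.2, seed=42):
    """Shuffle the manually labelled examples and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = labelled[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical incident identifiers standing in for the 7643 incident reports.
incident_ids = list(range(7643))
to_label, rest = split_for_labelling(incident_ids)
train, test = split_train_test(to_label)
print(len(to_label), len(train), len(test))
```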
 
According to Pedregosa et al. (2011), TfidfVectorizer is a scikit-learn tool that converts the terms occurring in documents, here the incident reports, into numerical values. TF-IDF is an abbreviation of term frequency-inverse document frequency. The principle of TF-IDF vectorization is to give more weight to rarely occurring words, because rare words tend to carry more information than common ones.
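The weighting can be reproduced by hand. The sketch below mirrors what scikit-learn's TfidfVectorizer does by default (smoothed idf, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization of each document vector), but it tokenizes on whitespace only, which is a simplification of scikit-learn's token pattern. The report texts are hypothetical.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {term: weight} dict per document (smoothed idf, L2-normalized)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency: rare terms receive larger weights.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {term: count * idf[term] for term, count in tf.items()}
        # L2 normalization so that long and short reports are comparable.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors

# Hypothetical incident report snippets.
reports = [
    "worker slipped on wet ladder",
    "worker cut finger with knife",
    "worker slipped on ice",
]
vectors = tfidf(reports)
print(sorted(vectors[0].items(), key=lambda item: -item[1]))
```

Because "worker" occurs in every report, it receives the lowest weight, while "ladder" and "wet", which occur only once, receive the highest.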
 
4.2 AI utilization in project and portfolio management 
 
Pan & Zhang (2021) discussed how construction engineering and management benefit from artificial intelligence in many ways. AI can process multiple data sources, which is beneficial for decision making. AI can recognize patterns through machine learning modelling, even in very large datasets. AI is therefore a powerful tool for project and portfolio management. As discussed in the safety section, AI systems have a diverse range of uses, and relatively high prediction rates can be achieved with machine learning modelling.
 
In Big Data in the construction industry: A review of present status, opportunities, and future trends, Bilal et al. (2016) discussed big data utilization in the construction domain. The paper reviewed multiple aspects of the big data domain, such as data mining, data warehousing, machine learning, and big data analytics. The authors discussed different practices that can be used in big data applications. Document classification is used, for example, to assign documents to the correct class, and document analysis is used to examine their content.
 
In Deep learning and Boosted trees for injuries prediction in power infrastructure projects, Oyedele et al. (2021) used project features, such as employee experience, project duration, and project season, for prediction. Similar features can be used to predict project and portfolio success. An enterprise can analyze its historical data and derive knowledge for future use. Such practice can lead to a more mature management process that is based on knowledge from past projects.
 
Mirnezami et al. (2020) concentrated on project cash flow management together with critical chain management and a multi-criteria decision-making process. The authors divided the data into intervals that represent optimistic and pessimistic scenarios, in order to provide project managers with knowledge from the data. The data indicated, for example, the most uncertain period of the project. An enterprise can use this information to allocate resources to the more uncertain periods of the project lifecycle, which seems a rational approach in a multi-project environment. Because managers' time is limited, it is rational to focus effort on the most uncertain phase of the project.
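A much simplified illustration of the idea, not the interval-based method of Mirnezami et al. (2020): if each project period has an optimistic and a pessimistic cash flow estimate, the width of the interval indicates how uncertain the period is, and periods can be ranked by it. The period names and euro amounts below are hypothetical.

```python
def rank_periods_by_uncertainty(optimistic, pessimistic):
    """Rank project periods by the width of their optimistic-pessimistic cash flow interval."""
    widths = {period: abs(optimistic[period] - pessimistic[period]) for period in optimistic}
    return sorted(widths.items(), key=lambda item: item[1], reverse=True)

# Hypothetical cash flow scenarios per project period (euros).
optimistic = {"design": 100_000, "construction": 400_000, "commissioning": 80_000}
pessimistic = {"design": 130_000, "construction": 550_000, "commissioning": 90_000}
ranking = rank_periods_by_uncertainty(optimistic, pessimistic)
print(ranking)
```

A manager with limited time would start from the top of this ranking, here the construction period.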
 
In Prediction of risk delay in construction projects using a hybrid artificial intelligence model, Yaseen et al. (2020) developed a random forest classification model combined with a genetic algorithm. The goal of the model was to predict project delay problems in the construction business. The model achieved an accuracy of 91,67%, a kappa of 87%, and a classification error of 8,33%. The authors explain that the genetic algorithm generates random decisions and introduces mutations into the decision pool. As the process is repeated, the decisions improve in each round.
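The select-recombine-mutate loop described above can be sketched as a small genetic algorithm. This is a generic textbook variant written for illustration: the population size, mutation rate and scale, and the toy objective are assumptions, and in Yaseen et al. (2020) the algorithm tuned the parameters of a random forest rather than a simple function. Keeping the fitter half of each generation (elitism) guarantees that the best solution never gets worse between rounds.

```python
import random

def genetic_search(fitness, bounds, pop_size=20, generations=50, mutation_rate=0.2, seed=0):
    """Maximize `fitness` over real-valued parameter vectors constrained to `bounds`."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: each gene is inherited from one of the two parents at random.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: occasional random perturbation, clipped to the allowed range.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = min(hi, max(lo, child[i] + rng.gauss(0, (hi - lo) * 0.05)))
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Toy objective with its maximum at (3, -1), standing in for model accuracy.
def objective(x):
    return -((x[0] - 3) ** 2 + (x[1] + 1) ** 2)

best = genetic_search(objective, [(-10, 10), (-10, 10)])
print(best, objective(best))
```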
 
 
Figure 22. Delay risk identification process for predictive model building (Yaseen et al., 2020).
 
Yaseen et al. (2020) identified the most common reasons for schedule delays through a literature survey and expert meetings, see Figure 22. The dataset was built based on the surveys and meetings. The model was fed with project features, and the RF classifier was set to predict the schedule-related risks of the projects. The genetic algorithm was used to tune the parameters to achieve acceptable results. The reasons were categorized into seven groups, for example material-related and owner-related reasons. The authors divided the reasons into sub-reasons and examined data from 40 projects. The data was divided into risk levels together with a probability of occurrence. The authors also divided the delay reasons into classes by their impact relative to the original schedule, and the goal of the model was to predict the delay category.
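Turning a delay into a target class for the classifier can be sketched as follows. The class names and the 5% and 15% thresholds are hypothetical; the paper's actual category boundaries are not given in this summary.

```python
def delay_category(planned_days, actual_days, thresholds=(0.05, 0.15)):
    """Classify a delay by its size relative to the original schedule (hypothetical thresholds)."""
    ratio = max(0, actual_days - planned_days) / planned_days
    if ratio == 0:
        return "no delay"
    if ratio <= thresholds[0]:
        return "minor"
    if ratio <= thresholds[1]:
        return "moderate"
    return "major"

# Hypothetical projects: (planned duration, actual duration) in days.
for planned, actual in [(100, 100), (100, 104), (100, 110), (100, 130)]:
    print(planned, actual, delay_category(planned, actual))
```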
 
Chen and He (2012) researched a cost management system that used data mining for classification, cost analysis, and cost forecasting. The authors explained that an essential part of the data mining process is to form insights from the data through machine learning. Data collection, training, testing, and application derive knowledge for future decisions. In the project classification phase, the authors used a decision tree model to identify projects suitable for the cost analysis phase. The purpose of the cost analysis was to clarify the cost structure of the projects. The authors explain that the analysis is a popularization and extension of principal component analysis. The analysis contained multiple horizontal and vertical layers, covering the overall cost analysis together with a geographic analysis.
 
 
Figure 23. The cost management prediction models’ process activities (Chen and He, 
2012). 
 
Chen and He divided the projects by building type; for example, cabling projects and power distribution projects were separated, and these categories had sub-categories for more accurate analysis. Figure 23 describes the actions applied to the data during the modelling process. After preprocessing the data, the authors performed factor and scenario analysis. After forecasting the cost levels, the authors performed a sensitivity analysis to verify the results before applying the derived knowledge further.
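A sensitivity analysis can take many forms, and the exact method of Chen and He (2012) is not detailed here. The sketch below shows the simplest common variant, one-at-a-time analysis: each input is raised by 10% on its own and the relative change in the forecast is recorded. The cost model and its inputs are hypothetical.

```python
def one_at_a_time_sensitivity(model, base_inputs, change=0.10):
    """Relative change in model output when each input alone is raised by `change`."""
    base_output = model(base_inputs)
    effects = {}
    for name, value in base_inputs.items():
        perturbed = dict(base_inputs, **{name: value * (1 + change)})
        effects[name] = (model(perturbed) - base_output) / base_output
    return effects

# Hypothetical cabling project cost model: length times unit cost plus fixed overhead.
def cabling_cost(inputs):
    return inputs["length_km"] * inputs["unit_cost"] + inputs["overhead"]

base = {"length_km": 10, "unit_cost": 20_000, "overhead": 50_000}
effects = one_at_a_time_sensitivity(cabling_cost, base)
print(effects)
```

Inputs with large effects, here cable length and unit cost, are the ones whose forecasts deserve the most scrutiny before the results are used.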
 
In Engineering Machine-Learning Automation Platform (EMAP): A Big-Data-Driven AI Tool for Contractors' Sustainable Management Solutions for Plant Projects, Choi, Lee, and Kim (2021) aimed to predict the risks of plant projects with machine learning algorithms and to create a decision support system. The system consists of five modules: invitation to bid analysis, design cost estimation, design error checking, change order forecasting, and equipment predictive maintenance, see Figure 24. The information is gathered from the data of existing plant projects, collected from enterprise resource planning systems together with commercial and public project data. The data were cleaned and preprocessed, and then further processed on the machine learning platform. The authors used both regression and random forest together with natural language processing. The authors note that existing solutions, even when assisted by a machine learning model, demanded manual work from data analysts and experts.
 
 
Figure 24. System architecture module illustration with data process of the artificial in-
telligence-based decision support tool (Choi et al., 2021). 
 
The authors note that the data must be preprocessed. They used the spaCy library for text tokenization, lemmatization, part-of-speech (POS) tagging, and dependency parsing. The model's process included a risk detection part that used spaCy's phrase matcher with fixed rules. The model can detect the desired keywords and phrases to help prevent future risks from materializing. The authors built several submodules for different purposes, such as the direct detection of clauses in a contract.
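Rule-based phrase matching can be illustrated without spaCy. The sketch below is a standard-library stand-in for spaCy's PhraseMatcher: it lowercases and tokenizes the text and looks for exact token sequences. It performs no lemmatization or part-of-speech tagging, and the rule phrases and their impact levels are hypothetical examples rather than the keywords used by Choi et al. (2021).

```python
import re

def find_risk_phrases(text, rules):
    """Return (phrase, impact) pairs for rule phrases found in the text (case-insensitive)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    hits = []
    for phrase, impact in rules.items():
        pattern = phrase.lower().split()
        n = len(pattern)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == pattern:
                hits.append((phrase, impact))
                break  # report each rule phrase at most once per document
    return hits

# Hypothetical risk phrases with impact levels (strong, moderate, weak).
rules = {"liquidated damages": "strong", "force majeure": "moderate", "site visit": "weak"}
contract = ("The Contractor shall pay liquidated damages for each day of delay. "
            "Liquidated damages are capped. Force majeure events are excluded.")
print(find_risk_phrases(contract, rules))
```

Reporting each phrase at most once per document is one simple way to avoid the duplicate detections that lower precision.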
 
 
Figure 25. Modules and functions of the AI decision support application (Choi et al., 2021).
 
The authors examined various EPC projects to define suitable keywords and determine their relationships to risks. The risks were classified by their impact as strong, moderate, or weak.
 
In Figure 25, the authors describe the purpose of each module together with its functions and the algorithms used in it. For the plant project bidding phase, the algorithms focus on natural language processing, whereas the other modules focus mostly on numerical processing. This can also be seen in Figure 26, where the paper's process is divided into two segments. The authors note that the model's ability to achieve high precision and high recall simultaneously is limited. For example, the largest contrast in the model's predictions was a recall of 99,3% and a precision of 54%. The authors explain that the low precision in the risk detection submodule is caused by duplicate risk detections.
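The effect of duplicate detections on precision can be shown with a small calculation. In the hypothetical counts below, a detector finds 99 of 100 real risks but reports each one twice; if the duplicates are counted as false positives, recall stays at 0,99 while precision drops to 0,5. For the reported recall of 99,3% and precision of 54%, the harmonic mean gives F1 = 2 · 0,54 · 0,993 / (0,54 + 0,993) ≈ 0,70.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 score from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def f1_from_rates(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 99 risks found, each reported twice, 1 risk missed.
p, r, f = precision_recall_f1(tp=99, fp=99, fn=1)
print(round(p, 3), round(r, 3), round(f, 3))
print(round(f1_from_rates(0.54, 0.993), 2))
```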
 
Choi et al. (2021) produced a decision-making system using data from the selected projects and applied machine learning to the different submodules of the system. The system collects risks from documents, such as bidding documents. It predicts design costs together with the quality of the design in terms of errors and schedule. The system is also set up for the predictive maintenance of parts and equipment. The authors note that the system was produced with Python. The system's prediction performance, measured as F1 score, was 70%, 86,8%, 87,6%, and 88,4%. The authors note that integrating the system into a cloud-based platform benefits the enterprise by making the applications easily available on site. The system can detect risks throughout the project's life cycle; thus, the risk level is relatively lower during project execution. The authors' intention was to produce a system that requires little or no machine learning experience to use. Automated risk analysis also requires fewer work hours, so project management can spend more time managing the projects. Because the process is automated, risk management is performed on every project at the same level. This level could be considered the minimum level of risk management.
 
 
Figure 26. The model’s development process (Choi et al., 2021). 
5 Process for AI utilization 
 
Building AI-based systems demands resources from the enterprise. In Challenges of data refining process during the artificial intelligence development projects in the architecture, engineering and construction industry, Heo et al. (2021) discussed the human resources that an AI development project needs. The authors examined the need for human resources from qualitative and quantitative perspectives, together with the tasks related to the project.
 
 
Figure 27. The AI process and tasks divided into work positions (Heo et al., 2021). 
 
Heo et al. (2021) note that AI development projects should have a continuous data modification process. The first modification starts the refinement process, in which the data evolves into a better form. The authors divide the data into two categories: image-based data and time series data. They describe that data utilization demands the collection of raw data, data modification, segmentation, and labeling. After data modification, a machine learning model is trained with the data and model tuning begins. Quality control is performed during the process to achieve optimal results. The process is also illustrated in Figure 27, together with the work positions involved in the workflow and their proportions of working time. The paper concluded that the participation of project management from the beginning of the project is beneficial for the later phases.
 
Heo et al. (2021) presented a work index for evaluating the resources needed for an AI project, see equation (6). Notably, in addition to the amount of data, one should concentrate on its quality. The complexity of the raw data, the results demanded from data refinement, and the performance of the machine learning models all depend on data quality.
 
$$\text{Work index} = \frac{\text{Total amount of data}}{\text{Degree of input manpower} \cdot \text{Work hours}} \tag{6}$$
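Equation (6) can be evaluated directly. In this sketch, "degree of input manpower" is assumed to be the number of people working on data refinement and the data amount is counted in labelled items; both interpretations and the example figures are hypothetical, since the units are not defined here.

```python
def work_index(total_data_amount, manpower_degree, work_hours):
    """Work index of equation (6): data amount per unit of manpower-weighted work time."""
    return total_data_amount / (manpower_degree * work_hours)

# Hypothetical project: 12 000 labelled images, 3 people, 400 work hours each.
print(work_index(12_000, 3, 400))
```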
 
The authors carried out a case study in which they observed the results when a refinement manager was used to manage the research and development team. This allowed the AI expert to focus on the model and data refinement, while the manager performed quality control on the process and the data.
 
In Optimized artificial intelligence models for predicting project award price, Chou et al. (2015) aimed to forecast project prices with AI modelling. Project prices change quite rapidly because construction costs change. The authors used multiple approaches: multiple regression analysis, artificial neural networks, and case-based reasoning. They examined the bid materials of nearly 100 bridge projects. The details of the data used are shown in Figure 28.
 
 
Figure 28. Informative attributes of the project bidding data (Chou et al., 2015). 
 
The authors evaluated the models' performance with the mean absolute percentage error (MAPE). The genetic algorithm combined with the artificial neural network model achieved an average MAPE of 7,526%. The genetic algorithm is a search algorithm that uses a stochastic approach; it sets the weights used in the model's calculation, and the weights are tuned during the process.
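MAPE averages the absolute prediction errors relative to the actual values, so an error of 10 000 euros weighs more on a small project than on a large one. A minimal sketch follows; the bid prices are hypothetical, and the actual values must be non-zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical award prices (thousand euros) and model predictions.
actual_prices = [100, 200]
predicted_prices = [110, 190]
print(mape(actual_prices, predicted_prices))  # errors of 10% and 5% give about 7.5
```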
 
 
Figure 29. Model’s prediction development information flow (Chou et al., 2015). 
 
The authors conclude that numerical data points tend to correlate more strongly with the project award amount than categorical values, although categorical values can represent regional differences between project bids. Figure 29 shows the model's development process. At the beginning, the configuration is set together with the parameters. The parameters affect the predicted project prices and are tuned by the genetic algorithm. Once an acceptable level of error has been reached, the model's performance is verified and further optimized.
 
 
6 Conclusion 
Sections 2 and 3 reviewed present applications. Section 2 covered the safety domain, and Section 3 covered the project and portfolio management domain.
 
Section 2 reviewed different approaches for utilizing AI for safety-related purposes. Practically all papers aimed to identify the causes of safety incidents in order to generate knowledge for future development. Some authors used the information to develop an end product, such as a risk warning system or a smart application. According to the review, it is common to use multiple predictive models. Considering the work done in the data preprocessing phase, it is logical to build new models on top of the previous ones. Natural language processing, together with classification- and regression-based models, was used to predict incident causes and other factors that affected the occurrence of incidents.
 
Baker et al. (2020) focused on the key elements of the site conditions at the time of the incident. Similarly, Oyedele et al. (2021) examined relationships between incidents and site conditions, and also investigated the relationships between incidents and the work phase and tools used at the time. Zhang et al. (2019) classified incident causes with a combination of five models tuned by the SQP algorithm. Liang and Liu (2021) sought to identify three main elements affecting the occurrence of incidents and also aimed to formulate repeatable conditions for the system. This was intended to improve the quality of the predictive model: when the input conditions are kept similar, the model performs better. Goh and Ubeynarayana (2017) used preset labels in the incident cause prediction process. Pan and Zhang (2021) focused on identifying the risks of the different work phases related to the incidents; identifying these relationships helps project personnel time proactive actions correctly. Cheng et al. (2020) used natural language processing to uncover incident causes from accident report narratives. Ajayi et al. (2020) focused on the relationship between the incident cause and the injured body part. The only machine vision-based paper reviewed was by Arashpour et al. (2022); its purpose was to demonstrate how a machine can determine the position of an excavator. Kochovcki and Stankovski (2021) built a smart application for construction sites; in the safety domain, the application's purpose was to generate notifications for site personnel, for example by detecting PPE use through site video monitoring. A risk warning system application was examined by Lin et al. (2021). Koc et al. (2022) examined the occurrence of incidents as a time series, focusing on proactive actions based on incidents predicted from the time series. Sattari et al. (2022) focused on safety in the process and asset management domain; their main emphasis was on finding relationships between different incidents and different assets and processes.
 
Section 3 focused on the project management domain. Bilal et al. (2016) mainly reviewed the use of classification models in the construction industry, such as classifying a document into the correct class. Mirnezami et al. (2020) examined ways to generate scenarios of project outcomes from project-related data, discussing project cash flow, critical chain management, and a supporting multi-criteria decision-making system. The attributes that Oyedele et al. (2021) used to predict safety incident relationships can also be used for project management purposes. From a project management view, it is also reasonable to seek the attributes that directly affect the project outcome. Yaseen et al. (2020) examined the most common reasons for project delays, gathered from expert meetings, and used them to model the schedule risk levels and occurrence probabilities of the projects. Chen and He (2012) used a decision tree to find projects suitable for the cost analysis phase. The authors also noted that exploring the data is valuable in itself because it can give valuable insights, and modelling requires processing the data anyway. Choi et al. (2021) used data from enterprise resource planning systems together with existing project data to reduce project risks; the authors produced a system with multiple modules that focus on separate project phases, such as design review.
 
 
6.1 Data process and model 
 
The data should be organized in a way that supports the later phases of data exploration. One can also collect data from multiple sources and merge it, which could increase knowledge of the attributes that the business has generated over time. One should explore the data and process the attributes. For example, one can pivot the data table into the desired format, handle missing values, and calculate averages to reduce the variance of the attributes. One can aggregate lower-level data to a higher level, such as municipality-level information to the regional level. One can also normalize the data: normalization rescales the values to a defined range, such as 0 to 1, so that attributes measured on different scales become comparable.
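Aggregation and normalization can be sketched with the standard library. The record fields ("municipality", "region", "incidents") and values are hypothetical; in practice a library such as pandas would typically be used for the same steps.

```python
from collections import defaultdict

def aggregate(records, key, value):
    """Sum a numeric field of lower-level records up to a higher level, e.g. region."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record[value]
    return dict(totals)

def min_max_normalize(values, low=0.0, high=1.0):
    """Rescale values linearly to the range [low, high]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [low for _ in values]
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]

# Hypothetical municipality-level incident counts.
records = [
    {"municipality": "A", "region": "North", "incidents": 3},
    {"municipality": "B", "region": "North", "incidents": 2},
    {"municipality": "C", "region": "South", "incidents": 4},
]
by_region = aggregate(records, "region", "incidents")
print(by_region)
print(min_max_normalize([2, 4, 6]))
```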
 
The complexity and structure of the data determine the procedure applied to it, and the problem formulation also affects the required procedure. The machine learning domain is strongly connected to statistical problem solving, and Python with its machine learning libraries is a commonly used tool. Python's popularity has driven the creation of useful resources around the platform, which help users solve individual problems with relatively little experience. Notably, data processing and model building are time consuming despite the relative simplicity of coding in Python or a similar environment.
 
Data processing is problem dependent and closely tied to the structure of the data being processed. A precise description of the data processing activities is therefore difficult to give. Nevertheless, data preprocessing is an important task to execute properly. Missing values can be handled, for example, by filling them with the average value of the attribute or by deleting the records with missing values. The data can also be encoded from verbal to numerical form, which machine learning algorithms handle better. For example, categorical string data from a safety report can be encoded in numerical form so that the algorithms can process it. Observations that differ significantly from the others can also be detected and cleaned; this is called outlier detection. Outliers are problematic for statistical calculations, such as the mean, because they distort the results.
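The three preprocessing steps described above can be sketched as follows: mean imputation of missing values, one-hot encoding of a categorical attribute, and outlier detection with the interquartile range (IQR) rule. The IQR rule with k = 1,5 is one common convention chosen here as an assumption, and the example values are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

def one_hot(categories):
    """Encode categorical strings as one-hot vectors; columns follow sorted category names."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

print(impute_mean([1, None, 3]))
print(one_hot(["ladder", "knife", "ladder"]))
print(iqr_outliers([10, 11, 12, 13, 12, 11, 10, 95]))
```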
 
As Figure 27 shows, the AI learning process places more weight on the data processing phases than on the actual learning phase: 80% of the estimated working hours are spent before the learning phase. My experience supports this view; I would estimate that 90% of the time is spent on data processing. As mentioned earlier, this is a good opportunity to generate insights into the business domain by going through the data related to business operations.
 
6.2 Safety management – Data utilization for safety development 
According to the review, there are various ways to utilize AI in the construction business domain. A logical approach is to start by utilizing the data that already exists in the enterprise's databases. This gives a better understanding of the data's capabilities, strengths, and weaknesses. The enterprise can then modify the data collection process towards the required data attributes and formats.
 
After general data cleaning and preprocessing, text mining could be applied to the descriptive text part of the safety incident data. Other attributes, such as the day of the week, time, project stage, and other numerical or categorical values, could be visualized. One can seek relationships between operative attributes and project life-cycle attributes. This could provide knowledge for proactive actions and could also improve risk management in the safety and project management domains.
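A cross-tabulation is a simple first step before visualization. The sketch counts incidents for each combination of two attributes, here weekday and project stage, and formats the counts as a tab-separated table. The field names and records are hypothetical.

```python
from collections import Counter

def cross_tabulate(incidents, row, column):
    """Count incidents for each (row value, column value) pair, e.g. weekday x project stage."""
    return Counter((incident[row], incident[column]) for incident in incidents)

def format_table(counts):
    """Render the counts as a tab-separated table; missing combinations show as 0."""
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    lines = ["\t".join([""] + cols)]
    for r in rows:
        lines.append("\t".join([r] + [str(counts[(r, c)]) for c in cols]))
    return "\n".join(lines)

# Hypothetical incident records.
incidents = [
    {"weekday": "Mon", "stage": "construction"},
    {"weekday": "Mon", "stage": "construction"},
    {"weekday": "Fri", "stage": "design"},
]
counts = cross_tabulate(incidents, "weekday", "stage")
print(format_table(counts))
```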
 
The relationships between project stage, project season, tools, and incident time were examined in the safety section. One can also seek asset-related relationships between incidents and other attributes. In the data preprocessing and visualization phase, one can form insights from the data, and these insights can be highly valuable for the enterprise.
 
After the data is processed, examined, and visualized, one should build a model to generate, for example, predictions from the data. Preprocessed data can also be used diagnostically: a diagnostic approach can explain the data and determine proactive actions to prevent similar incidents. The choice of model depends on the problem and on the attributes of the dataset under examination, and it is difficult to identify one best model for all purposes. According to the review, using multiple models is a common approach. One can also choose the best-performing optimized model. Because data processing is the most time-consuming phase, it is logical to test multiple models on the same processed data. When multiple models are tested, one can select the best model in terms of accuracy for the chosen objective and then optimize it further.
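Selecting among several candidate models on the same validation data can be sketched as below. The two "models" are deliberately trivial hypothetical rules (a majority-class baseline and a wind-speed threshold) so that the example stays self-contained; in practice they would be trained classifiers, and accuracy would be measured on held-out data.

```python
def accuracy(model, examples):
    """Share of (features, label) examples for which the model predicts the label."""
    return sum(1 for features, label in examples if model(features) == label) / len(examples)

def select_best(models, validation):
    """Score every candidate model and return the name of the most accurate one."""
    scores = {name: accuracy(model, validation) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical candidate models.
def majority_baseline(features):
    return "none"

def wind_rule(features):
    return "incident" if features["wind_ms"] > 15 else "none"

# Hypothetical validation data: (features, observed outcome).
validation = [
    ({"wind_ms": 20}, "incident"),
    ({"wind_ms": 5}, "none"),
    ({"wind_ms": 18}, "incident"),
    ({"wind_ms": 3}, "none"),
]
best, scores = select_best({"majority": majority_baseline, "wind_rule": wind_rule}, validation)
print(best, scores)
```

A baseline such as the majority rule is worth keeping among the candidates, since a model that does not beat it adds no value.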
 
 
6.3 Project management – risk and activity management 
The main goal of project personnel, such as the project manager, is to achieve the planned cost, schedule, and scope. These goals are related to, for example, processes, interactions between stakeholders, resource management, and leadership.
 
Data preprocessing is valuable for different operative domains. Classifying documents and their content could be beneficial for the operation of the built networks. Document quality also benefits later project management, because projects are often connected to each other in scope but not in schedule. If the content of the documents is structured or semi-structured, a machine should be able to assess the quality of the content. Notably, this also works in the safety document domain: one can determine the relationship between incidents and the safety documents related to the work.
 
The design and maintenance of the cash flow is often an important part of project management. The cash flow often follows the project's milestones. Because milestones usually connect the most important points of the project to practical work or deliveries, it is logical to manage the cash flow on a regular basis. If the cash flow is managed regularly, the project's cost, scope, and schedule become more predictable.
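Regular cash flow follow-up at milestones can be sketched as a comparison of cumulative planned and actual amounts, flagging milestones where the relative deviation exceeds a tolerance. The milestone names, amounts, and the 10% tolerance are hypothetical, and planned amounts are assumed positive.

```python
def cash_flow_deviations(planned, actual, tolerance=0.10):
    """Flag milestones where cumulative actual cash flow deviates from plan by more than `tolerance`."""
    flagged = []
    cum_plan = cum_actual = 0.0
    for milestone, plan_amount in planned:  # planned: ordered (milestone, amount) pairs
        cum_plan += plan_amount
        cum_actual += actual.get(milestone, 0.0)
        deviation = (cum_actual - cum_plan) / cum_plan
        if abs(deviation) > tolerance:
            flagged.append((milestone, round(deviation, 3)))
    return flagged

# Hypothetical project cash flow in euros.
planned = [("design", 50_000), ("materials", 200_000), ("installation", 150_000)]
actual = {"design": 52_000, "materials": 260_000, "installation": 120_000}
print(cash_flow_deviations(planned, actual))
```

Here only the materials milestone is flagged (24,8% over plan); by the installation milestone, the cumulative deviation has fallen back within the tolerance.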
 
One can also determine the most common reasons for different project deviations. When a machine learning model can detect deviations, the associated risks can be addressed. Project activities, such as design, can be measured from the design output but also from numerical attributes. In distribution networks, the design output affects the scope and therefore the budget of the project. Measurement at every point of the project lifecycle, such as bidding, project start, and design, could improve quality in terms of managing deviations.
 
 
6.4 Summary 
The next practical steps relate to the data available at present. The data could be used to generate new insights during data processing and from, for example, the outputs of predictive models. The focus should be on the safety and project management domains, because these are key elements of Elenia's construction domain. As explained in sections 6.2 and 6.3, Elenia should utilize its existing data for artificial intelligence. The enterprise's operating domain and management require predictions for project and safety management. The predictions are also used for decision making and are therefore essential for the enterprise's operation. Considering the outcome of the literature review and the domain in which Elenia operates, the machine learning approach is a logical choice.
 
The literature review on AI utilization in the construction domain gives an optimistic outlook for future development in the enterprise. The review provided practical use cases for further development. Because elaborating an existing solution demands less experimentation, and therefore fewer resources, than producing a new one, it is logical to approach AI utilization through the examples in the literature review. As an outcome of the review, there is no silver bullet for producing machine learning systems that work in every operational domain: a model's ability to achieve an objective depends on the problem and the data structure, so cases should be considered one by one. Furthermore, the review's use cases exemplify how AI and machine learning can be used in Elenia's construction domain.
 
It is noteworthy that even if machine learning models seem superior at the tasks they are built for, their ability to generalize to unrelated tasks is unknown. In other words, a model may not work for problems other than the one it was built for. The data structure and attributes also affect how one can utilize the data and AI, and significant resources may be needed for data preprocessing. Most of the review's use cases focused on supporting decision making with a machine learning solution; therefore, it is logical to develop decision support systems. As mentioned, the dependency on the problem and the data demands resources from the enterprise, so it is important to identify the tasks where the potential of machine learning is greatest. According to the review, the probability of benefiting from AI utilization is greater when the dataset is large and there is a reasonable number of attributes that correlate with the outcome. Machine learning solutions may not work properly when the data's structure or size is unsuitable. The solutions are also often built for a specific need, so they do not perform well if the outcome is not clearly defined.
 
It is easy to see the opportunities that AI and machine learning offer, for example, in the safety management domain. Because a machine learning model can describe which elements are related to different outcomes, one can use this information to decide what proactive actions to take. For example, a machine learning model can simultaneously collect information from the enterprise’s resource management program, the Finnish Meteorological Institute’s data and safety incident data. The model could also take into account, for example, the usage and updates of safety material, and formulate a risk level for incidents by using all of that information. Project personnel, such as the project manager and the safety supervisor, can use the detected risk levels to prioritize their time and proactive actions toward projects with a higher risk level. A strength of machine learning models is their ability to collect and process enormous amounts of data. The analysis process is also continual and consistent. Thus, the model’s assessments are of consistent quality.
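The risk-level idea described above can be illustrated with a small logistic scoring sketch. It assumes hypothetical features from the kinds of sources mentioned: resource management, weather, incident data, and a flag for updated safety material. The field names, weights, bias and level cut-offs are all invented. A real system would learn the weights from Elenia’s historical data, for example by fitting a logistic regression. The output is a ranked list that a project manager or safety supervisor could use to decide where to spend time first.

```python
import math
from dataclasses import dataclass


@dataclass
class ProjectSnapshot:
    """Hypothetical per-project features gathered from several sources."""
    name: str
    overtime_hours: float          # e.g. from the resource management program
    forecast_wind_ms: float        # e.g. from a weather forecast service
    incidents_last_90d: int        # from safety incident records
    safety_material_updated: bool  # whether safety instructions were reviewed


# Invented weights; in practice they would be learned from historical data.
WEIGHTS = {
    "overtime_hours": 0.04,
    "forecast_wind_ms": 0.15,
    "incidents_last_90d": 0.6,
    "safety_material_updated": -0.8,
}
BIAS = -3.0


def incident_probability(p: ProjectSnapshot) -> float:
    """Logistic model: combine the features into a score in (0, 1)."""
    z = (BIAS
         + WEIGHTS["overtime_hours"] * p.overtime_hours
         + WEIGHTS["forecast_wind_ms"] * p.forecast_wind_ms
         + WEIGHTS["incidents_last_90d"] * p.incidents_last_90d
         + WEIGHTS["safety_material_updated"] * float(p.safety_material_updated))
    return 1.0 / (1.0 + math.exp(-z))


def risk_level(prob: float) -> str:
    """Map the score to a coarse level; the cut-offs are arbitrary."""
    if prob >= 0.5:
        return "high"
    if prob >= 0.2:
        return "medium"
    return "low"


def prioritize(projects):
    """Return (name, level, probability) tuples from highest to lowest risk."""
    scored = [(p.name, incident_probability(p)) for p in projects]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, risk_level(prob), prob) for name, prob in scored]


projects = [
    ProjectSnapshot("Line renovation A", 30, 14.0, 2, False),
    ProjectSnapshot("Cabling B", 5, 4.0, 0, True),
    ProjectSnapshot("Substation C", 12, 8.0, 2, True),
]
for name, level, prob in prioritize(projects):
    print(f"{name:20s} {level:6s} {prob:.2f}")
```

A logistic form is used here only because its weights are easy to read as "this factor raises or lowers the risk". Any of the model families discussed in the review could take its place behind the same `prioritize` step.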
 
As the next step, Elenia should take advantage of its existing data. The enterprise can process the data to formulate deep insights into its key processes. After the data is examined, one can define the key elements and actions for achieving the objectives. One can then define, with the assistance of machine learning, the relationships between the key elements and the outcome of a project. It is noteworthy that the outcome includes the safety measurements of projects. The enterprise should develop the process parts that affect the key processes. Over time, the development of key elements could be used to build a decision support system with multiple modules, such as project plan and safety modules. The system could take advantage of AI through multiple approaches. Previously built models for the safety and project management domain could serve as modules in the system. Machine vision and sensor-based warning systems are also interesting for the enterprise’s operating domain. For example, worksite safety could be increased if machine vision were used to warn of undesirable movement of an excavator near high-voltage overhead lines. However, these systems demand equipment that may be expensive to obtain for work sites. The equipment must also be connected to an artificial intelligence-based system in real time under demanding conditions, which raises the technical requirements for the system. Nevertheless, most of the challenges related to safety and project management can be solved by integrating the available information efficiently into the processes. It is therefore logical to start development from knowledge and data utilization. The applications can support the enterprise’s personnel in following the determined processes and instructions punctually.
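The excavator warning example reduces, after detection, to a geometric check. The sketch below covers only that last step. It assumes that a hypothetical machine vision or sensor system already reports the boom tip as a 3D point in metres in a site coordinate frame. It also models the conductor as a straight segment between two poles, ignoring sag. The `limit_m` and `warn_margin_m` values are placeholders. Real safety distances must come from the applicable electrical safety regulations, not from this example.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, site coordinate frame


def distance_point_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest Euclidean distance from point p to the segment a-b."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    ab_len2 = sum(c * c for c in ab)
    if ab_len2 == 0.0:  # degenerate segment: both ends at the same point
        return math.dist(p, a)
    # Project p onto the line and clamp to [0, 1] so the closest point stays on the segment.
    t = max(0.0, min(1.0, sum(ap[i] * ab[i] for i in range(3)) / ab_len2))
    closest = tuple(a[i] + t * ab[i] for i in range(3))
    return math.dist(p, closest)


def proximity_status(boom_tip: Point, line_start: Point, line_end: Point,
                     limit_m: float, warn_margin_m: float) -> str:
    """Return 'stop' inside the limit, 'warn' inside the margin, otherwise 'ok'."""
    d = distance_point_to_segment(boom_tip, line_start, line_end)
    if d <= limit_m:
        return "stop"
    if d <= limit_m + warn_margin_m:
        return "warn"
    return "ok"


# Hypothetical conductor span at 10 m height; positions and distances are illustrative.
line_a, line_b = (0.0, 0.0, 10.0), (50.0, 0.0, 10.0)
for tip in [(20.0, 2.0, 8.0), (20.0, 0.0, 5.5), (20.0, 10.0, 4.0)]:
    print(tip, proximity_status(tip, line_a, line_b, limit_m=3.0, warn_margin_m=2.0))
```

The two-stage output (warn, then stop) reflects the decision support idea in the text: the system informs the operator early rather than only reacting at the limit. The demanding parts noted above, namely reliable detection and a real-time connection on site, are outside this sketch.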
 
The focus of this thesis was a literature review on the selected domains for utilizing AI. The safety, project management and portfolio management domains were examined to collect information on the opportunities that AI offers for Elenia’s operational domain. Ideally, this thesis acts as one of the first cornerstones for significant utilization of AI in Elenia’s construction business. Because the thesis included only the literature review, practical implementation was outside its scope. In future work, the technical implementation of machine learning models could be examined. As mentioned, this thesis focused on the utilization of AI through machine learning applications. Thus, a review of other AI applications could be beneficial for the enterprise’s process development.
 
 
References 
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M. & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of machine learning research, 12, 2825-2830. https://doi.org/10.5555/1953048.2078195

Baker, H., Hallowell, M. R. & Tixier, A. J. (2020). AI-based prediction of independent construction safety outcomes from universal attributes. Automation in construction, 118, 103146. https://doi.org/10.1016/j.autcon.2020.103146

Storopoli, J., Huijzer, R. & Alonso, L. (2021). Julia Data Science. ISBN: 9798489859165. https://juliadatascience.io

Bezanson, J., Karpinski, S., Shah, V. B. & Edelman, A. (2012). Why We Created Julia. Retrieved 20 September. https://julialang.org/blog/2012/02/why-we-created-julia/

Romano, F. (2018). Learn Python programming: A beginner's guide to learning the fundamentals of Python language to write efficient, high-quality code. Packt Publishing.
 
Skorpil, V., Oujezsky, V., Cika, P. & Tuleja, M. (2019). Parallel Processing of Genetic Algorithms in Python Language. https://doi.org/10.1109/PIERS-Spring46901.2019.9017332

Mezher, M. A. (2022). PGFLibPy: An Open-Source Parallel Python Toolbox for Genetic Folding Algorithm. Journal of advanced computational intelligence and intelligent informatics, 26(2), 169-177. https://doi.org/10.20965/jaciii.2022.p0169

Zhang, F., Fleyeh, H., Wang, X. & Lu, M. (2019). Construction site accident analysis using text mining and natural language processing techniques. Automation in construction, 99, 238-248. https://doi.org/10.1016/j.autcon.2018.12.016
 
Goh, Y. M. & Ubeynarayana, C. (2017). Construction accident narrative classification: An evaluation of text mining techniques. Accident analysis and prevention, 108, 122-130. https://doi.org/10.1016/j.aap.2017.08.026

Pan, Y. & Zhang, L. (2021). Roles of artificial intelligence in construction engineering and management: A critical review and future trends. Automation in construction, 122, 103517. https://doi.org/10.1016/j.autcon.2020.103517

Zhou, Z., Goh, Y. M. & Li, Q. (2015). Overview and analysis of safety management studies in the construction industry. Safety science, 72, 337-350. https://doi.org/10.1016/j.ssci.2014.10.006

Rouhiainen, L. & Estra, C. (2018). Artificial intelligence: 101 things you must know today about our future. [Lasse Rouhiainen].

Cheng, M., Kusoemo, D. & Gosno, R. A. (2020). Text mining-based construction site accident classification using hybrid supervised machine learning. Automation in construction, 118, 103265. https://doi.org/10.1016/j.autcon.2020.103265

Arashpour, M., Kamat, V., Heidarpour, A., Hosseini, M. R. & Gill, P. (2022). Computer vision for anatomical analysis of equipment in civil infrastructure projects: Theorizing the development of regression-based deep neural networks. Automation in construction, 137, 104193. https://doi.org/10.1016/j.autcon.2022.104193

Koc, K., Ekmekcioğlu, Ö. & Gurgun, A. P. (2022). Accident prediction in construction using hybrid wavelet-machine learning. Automation in construction, 133, 103987. https://doi.org/10.1016/j.autcon.2021.103987

Oyedele, A., Ajayi, A., Oyedele, L. O., Delgado, J. M. D., Akanbi, L., Akinade, O., . . . Bilal, M. (2021). Deep learning and Boosted trees for injuries prediction in power infrastructure projects. Applied soft computing, 110, 107587. https://doi.org/10.1016/j.asoc.2021.107587
 
Ajayi, A., Oyedele, L., Owolabi, H., Akinade, O., Bilal, M., Davila Delgado, J. M. & Akanbi, L. (2020). Deep Learning Models for Health and Safety Risk Prediction in Power Infrastructure Projects. Risk analysis, 40(10), 2019-2039. https://doi.org/10.1111/risa.13425

Bilal, M., Oyedele, L. O., Qadir, J., Munir, K., Ajayi, S. O., Akinade, O. O., . . . Pasha, M. (2016). Big Data in the construction industry: A review of present status, opportunities, and future trends. Advanced engineering informatics, 30(3), 500-521. https://doi.org/10.1016/j.aei.2016.07.001

LeCun, Y., Bengio, Y. & Hinton, G. (2015). Deep learning. Nature (London), 521(7553), 436-444. https://doi.org/10.1038/nature14539

Kochovski, P. & Stankovski, V. (2021). Building applications for smart and safe construction with the DECENTER Fog Computing and Brokerage Platform. Automation in construction, 124, 103562. https://doi.org/10.1016/j.autcon.2021.103562

Sattari, F., Lefsrud, L., Kurian, D. & Macciotta, R. (2022). A theoretical framework for data-driven artificial intelligence decision making for enhancing the asset integrity management system in the oil & gas sector. Journal of loss prevention in the process industries, 74, 104648. https://doi.org/10.1016/j.jlp.2021.104648

Mirnezami, S. A., Mousavi, S. M. & Mohagheghi, V. (2020). An innovative interval type-2 fuzzy approach for multi-scenario multi-project cash flow evaluation considering TODIM and critical chain with an application to energy sector. Neural computing & applications, 33(7), 2263-2284. https://doi.org/10.1007/s00521-020-05095-z
 
Heo, S., Han, S., Shin, Y. & Na, S. (2021). Challenges of data refining process during the artificial intelligence development projects in the architecture, engineering and construction industry. Applied sciences, 11(22), 10919. https://doi.org/10.3390/app112210919
 
Chou, J., Lin, C., Pham, A. & Shao, J. (2015). Optimized artificial intelligence models for predicting project award price. Automation in construction, 54, 106-115. https://doi.org/10.1016/j.autcon.2015.02.006

Yaseen, Z. M., Ali, Z. H., Salih, S. Q. & Al-Ansari, N. (2020). Prediction of risk delay in construction projects using a hybrid artificial intelligence model. Sustainability (Basel, Switzerland), 12(4), 1514. https://doi.org/10.3390/su12041514

Chen, S. & He, J. (2012). Research on cost management system of distribution network construction projects based on data mining. https://doi.org/10.1109/CICED.2012.6508454

Choi, S., Lee, E. & Kim, J. (2021). The Engineering Machine-Learning Automation Platform (EMAP): A Big-Data-Driven AI Tool for Contractors' Sustainable Management Solutions for Plant Projects. Sustainability (Basel, Switzerland), 13(18), 10384. https://doi.org/10.3390/su131810384

Russell, S. J. & Norvig, P. (2014). Artificial intelligence: A modern approach (3rd ed., Pearson New International ed.). Pearson.

Mullainathan, S. & Spiess, J. (2017). Machine Learning: An Applied Econometric Approach. The Journal of economic perspectives, 31(2), 87-106. https://doi.org/10.1257/jep.31.2.87

MathWorks. (n.d.). How the Genetic Algorithm works. Retrieved September 18, 2022, from https://www.mathworks.com/help/gads/how-the-genetic-algorithm-works.html

Srinivasa-Desikan, B. (2018). Natural Language Processing and Computational Linguistics: A Practical Guide to Text Analysis with Python, Gensim, SpaCy, and Keras.

Lin, S., Shen, S., Zhou, A. & Xu, Y. (2021). Risk assessment and management of excavation system based on fuzzy set theory and machine learning methods. Automation in construction, 122, 103490. https://doi.org/10.1016/j.autcon.2020.103490