Jari Isohanni

Recognition of Subtle Colour Differences
A Comparative Study of Machine Learning and Colour Difference Metrics

ACTA WASAENSIA 557

ISBN 978-952-395-198-3 (print), 978-952-395-199-0 (online)
ISSN 0355-2667 (Acta Wasaensia 557, print), 2323-9123 (Acta Wasaensia 557, online)
URN http://urn.fi/URN:ISBN:978-952-395-199-0
PunaMusta Oy, Joensuu, 2025.

ACADEMIC DISSERTATION
To be presented, with the permission of the Board of the School of Technology and Innovations of the University of Vaasa, for public examination on the 3rd of September, 2025, at noon.
Dissertation of the School of Technology and Innovations at the University of Vaasa in the field of computer science.

Author: Jari Isohanni, https://orcid.org/0000-0002-7154-2515
Supervisor(s): Associate Professor Jani Boutellier, University of Vaasa, School of Technology and Innovations, Computer Science; University Lecturer Birgitta Martinkauppi, University of Vaasa, School of Technology and Innovations, Energy Technology
Custos: Associate Professor Jani Boutellier, University of Vaasa, School of Technology and Innovations, Computer Science
Reviewers: Professor of Biomedical Engineering Tapio Seppänen, University of Oulu, Faculty of Information Technology and Electrical Engineering; Professor of Electrical Engineering Mattias O’Nils, Mid Sweden University, Department of Computer and Electrical Engineering (DET)
Opponent: Associate Professor Miguel Bordallo Lopez, University of Oulu, Center for Machine Vision and Signal Analysis (CMVS)

TIIVISTELMÄ

Väitöskirjassa tutkitaan hienovaraisten värisävyerojen tunnistamista eri menetelmillä. Hienovaraiset värisävyerot ovat tärkeitä eri sovelluksissa, kuten terveydenhuollossa, maataloudessa ja elintarviketeollisuudessa. Värisävyerojen avulla voidaan luokitella ja tunnistaa kuvista kiinnostavia piirteitä. Tässä väitöskirjassa keskitytään hienovaraisten värisävyerojen tunnistamiseen painotuotteista, ja yksi tutkimuksen sovelluskohteista ovat ns. funktionaaliset musteet. Funktionaalisten musteiden avulla voidaan valmistaa edullisia ei-elektronisia indikaattoreita, jotka ilmoittavat esimerkiksi tuotteen lämpötilan tai kosteuden värinmuutoksen avulla.

Väitöskirjassa käytetään värisävyerojen laskenta-algoritmeja sekä valvomatonta (unsupervised) ja valvottua (supervised) koneoppimista hienovaraisten värisävyerojen tunnistamiseen. Värisävyerojen laskennassa käytetään uusimpia ja tarkoitukseen parhaiten soveltuvia matemaattisia kaavoja. Valvomattomissa menetelmissä hyödynnetään erilaisia ryhmittelymenetelmiä (clustering). Valvottujen menetelmien osalta tutkitaan yleisimpiä neuroverkkorakenteita (convolutional neural network). Väitöskirjan kokeellisissa osissa pyritään löytämään ne menetelmät, joilla hienovaraiset sävyerot voidaan tunnistaa parhaiten.

Värisävyerojen laskenta-algoritmit pystyvät tunnistamaan riittävän suuria värieroja todellisissa käyttötapauksissa. Ryhmittelymenetelmät toimivat tarkemmin kuin laskenta-algoritmit ja mahdollistavat myös pienempien värisävyerojen havaitsemisen. Parhaat tulokset saavutettiin konvoluutioneuroverkoilla, joista ResNet-34 osoittautui testeissä tarkimmaksi. Tätä arkkitehtuuria muokattiin edelleen käyttötarkoitukseen sopivammaksi. Sopivin neuroverkko saatiin, kun viimeinen yhdistävä kerros muutettiin keskiarvolaskennasta maksimilaskentaan. Tässä arkkitehtuurissa käytettiin myös gradienttien keskittämistä osana oppimisprosessin takaisinkytkentää.
Eri menetelmiin vaikuttavat merkittävästi kuvien häiriöt ja kuvien laatu. Kuvien esikäsittelyllä on tärkeä rooli, kun väitöskirjassa esiteltyjä menetelmiä otetaan käyttöön eri sovelluksissa. Käyttötarkoituksesta riippuen ryhmittelymenetelmät voivat tarjota paremman hyöty–panossuhteen kuin monimutkaisemmat neuroverkot. Tämä johtuu siitä, että ryhmittelymenetelmät eivät vaadi aineiston keruuta tai neuroverkon kouluttamista. Nämä menetelmät eivät myöskään pyri oppimaan painotuotteen paperin ominaisuuksia.

Avainsanat: tekoäly, koneoppiminen, väriero

ABSTRACT

The dissertation investigates the identification of subtle colour differences using different methods. Subtle colour differences are important in applications such as healthcare, agriculture, and the food industry. Colour differences can be used to classify and identify features of interest in images. This dissertation focuses on identifying subtle colour differences in prints, and one application of its results is functional inks. Functional inks can be used to create inexpensive non-electronic indicators that, through a colour change, tell the product’s temperature or humidity, for example.

In this dissertation, colour difference calculation algorithms as well as unsupervised and supervised machine learning methods are used to identify subtle colour differences. The colour difference algorithms use the newest formulas that are best suited to the task. For unsupervised methods, different clustering algorithms were used. For supervised learning, the most common convolutional neural network architectures were explored. In the various experiments of the dissertation, the aim was to find the methods that are best suited to identifying subtle colour differences.

Colour difference algorithms can identify colour differences that are large enough in real-life use cases. Unsupervised methods work better than colour difference algorithms and can be used to identify smaller differences. The best results were achieved with convolutional neural networks, of which ResNet-34 proved to be the most accurate in the tests. This architecture was further modified to better suit the use case. The best architecture turned out to be the version in which the final pooling layer was changed from average pooling to max pooling. In this architecture, gradient centralisation was also used as part of the backpropagation in the learning process.

The different methods are significantly affected by disturbances in the images and by image quality. Image preprocessing plays an important role when the methods presented in the dissertation are put into practice. Depending on the use case, unsupervised clustering methods can offer a better effort-to-outcome ratio than more complex supervised neural networks, because they do not require the prior collection of labelled datasets or the training of neural networks. Moreover, these methods do not attempt to learn the characteristics of the printed substrate, such as the paper.

Keywords: artificial intelligence, machine learning, colour difference

ACKNOWLEDGEMENTS

First of all, I would like to thank Kimmo Svinhufvud for his encouraging words during the ”Scientific Writing” course at the Kokkola University Consortium Chydenius — the course that sparked the beginning of this journey. It is hard to believe how quickly time has passed since 2020, back when everything sounded so simple; it was not.
Another person who had a major influence on starting this process was the former Rector/CEO of Centria University of Applied Sciences, Kari Ristimäki; thank you for nudging me over the start line. I am also grateful to our current rectorate, Tapio Huttula, Jennie Elfving, and Marko Forsell, for supporting employees towards new degrees. Warm thanks to my fellow students — and hopefully future doctors — Annika and Johanna, for sharing their insights and knowledge.

This dissertation would not have been possible without the ideas and innovative thinking of Sture Udd and UPC Konsultointi Oy. The visionary development of functional inks provided a real-world use case that anchored this thesis in practical relevance. I also extend my sincere thanks to VTT Technical Research Centre of Finland and Liisa Hakola for the sample materials, which were essential in carrying out the experiments. Deep respect goes to Kai Hermsen and Lorna Goulden — I cannot express how grateful I am that you kept the Twinds Foundation going strong while I focused on my studies.

The first article of this thesis was written under the supervision of Professor Johan Lilius from Åbo Akademi. I am thankful for the support I received during those early and important steps of the doctoral process. The initial phases of this work were financially supported by the Central Ostrobothnia Regional Fund of the Finnish Cultural Foundation — thank you for your trust.

The main supervisory effort was led by Associate Professor Jani Boutellier and University Lecturer Birgitta Martinkauppi from the University of Vaasa. Thank you both for your continuous guidance, thoughtful feedback, and unwavering support throughout this journey. I would not have been connected with the University of Vaasa without Professor Heidi Kuusniemi and her presentation in the Digitalisation Committee of the Ostrobothnia Chamber of Commerce — thank you for that connection.

I would also like to express my sincere gratitude to the two pre-examiners: Professor Tapio Seppänen from the University of Oulu and Professor Mattias O’Nils from Mid Sweden University. Your accurate comments and suggestions significantly improved the quality of this thesis in its final stage. Dear Associate Professor Miguel Bordallo Lopez, thank you for being my opponent; your respectful and insightful feedback contributed meaningfully to the academic depth of this dissertation.

To my parents, Harri and Ulla, as well as my grandmother Kaisu, aunt Elisa and uncle Kauko: thank you for your continuous support and genuine curiosity about the progress of this work. And finally, most importantly, my heartfelt thanks go to Tanja, Aaron, and Leevi. You brought balance and perspective to this journey. Your everyday presence gave me strength to finish this thesis and reminded me of what truly matters. I hope that one day I’ll have the honour of attending your own dissertations.

CONTENTS

Figures XI
Tables XII
1 INTRODUCTION 1
1.1 Related past research 5
1.2 Objectives and research methods 7
1.3 Contributions 8
1.4 Structure of the dissertation 9
2 DIGITAL IMAGES AND COLOURS 10
2.1 Pre-sensor 11
2.2 Sensor 13
2.3 Post-processing 15
2.4 Presentation of colours 17
2.5 Measurement of colour differences 24
2.6 Colour related applications 26
3 MACHINE LEARNING 28
3.1 Unsupervised learning 29
3.2 Supervised learning 42
3.3 Neural networks 48
4 METHODS 62
4.1 Dataset for the study 62
4.2 Article I - Mathematical methods 65
4.3 Article II - Unsupervised learning 65
4.4 Articles III & IV - Convolutional neural networks 66
5 RESULTS 68
5.1 Article I - Mathematical methods 68
5.2 Article II - Unsupervised learning 70
5.3 Article III - Convolutional neural networks 73
5.4 Article IV - Optimised ResNet 75
6 DISCUSSION 78
6.1 Mathematical methods in colour comparison 79
6.2 Clustering colours with unsupervised learning 80
6.3 Neural network based colour classification 83
6.4 Comparison of the methods 86
6.5 Limitations and future research 87
7 CONCLUSIONS 88
Bibliography 90

Figures

1 Thermochromic indicator in a beer bottle (photo: UpCode Ltd) 2
2 Example of the colour change in functional inks 2
3 Example of the dataset used 3
4 Examples of the different environments, paper and printer noise 5
5 Electromagnetic wavelengths (reproduced from Tooms (2015)) 10
6 A traditional stack of lens elements in the holder tube (reproduced from: Bloss (2009)) 12
7 5 × 5 Bayer CFA pattern (reproduced from: Bayer (U.S. Patent 3971065A, Jul. 1976)) 13
8 Example of an imaging pipeline (reproduced from: Nakamura (2017)) 15
9 CMYK colour space 19
10 RGB colour space 20
11 LAB colour space (reproduced from: Belasco, Edwards, Munoz, Rayo, and Buono (2020)) 21
12 Various colour depths presented on red channel 22
13 Different colour steps 22
14 Different colour space gamuts (reproduced from: Palus (1998)) 23
15 Example of clustering, left image source data, right image expected clustering (reproduced from: Jain (2010)) 30
16 Different clustering methods (reproduced from: Han, Kamber, and Pei (2012a)) 31
17 Euclidean and Manhattan distance 33
18 Illustration of different linkages (reproduced from: Jeon, Yoo, Lee, and Yoon (2017)) 35
19 The process of supervised learning (reproduced from: Pramoditha (n.d.)) 42
20 Example of confusion matrix for multi-classification 48
21 Example of LeNet CNN-architecture (reproduced from: J. Gu et al. (2018)) 51
22 Multilayer Perceptron architecture 52
23 Cross-validation process 59
24 Transfer learning process 60
25 Samples of images used in Article I 63
26 One sample of the DS1 and DS2 image, and extraction of colour areas 64
27 Samples of DS1 with varying colour intensity 65
28 The process used in Article II 66
29 Failed images in experiment two 72
30 Final architectures A and B 77
31 40% colour intensity change 80
32 Example of the challenge of clustering colours 81
33 20% colour intensity change 82
34 Examples of small colour differences 83

Tables

1 Distribution of samples in Article I 62
2 Distribution of samples in DS1 63
3 Distribution of samples in DS2 64
4 Results from Article I, second experiment, CIEDE2000(1, 1, 1) algorithm 69
5 Results from Article I, second experiment, CIEDE2000(2.76, 1.58, 1) algorithm 69
6 Results from Article I, second experiment, CIEDE2000(2, 1, 1) algorithm 69
7 Results of the clustering process algorithm with colour difference >= 20% 71
8 Results of the second experiment, colour difference <= 10% 72
9 Results of the first experiment (difference >= 20%) in Article III 74
10 Results of the second experiment, colour difference <= 10% 74
11 Results of the first experiment (without K-Fold cross-validation) in Article IV, colour difference <= 10% 76
12 Final architectures, K-Fold cross-validation accuracy 77
13 Results of the different MLPs 85

LIST OF PUBLICATIONS

The dissertation is based on the following four refereed articles:

(I) Jari Isohanni, Use of Functional Ink in a Smart Tag for Fast-Moving Consumer Goods Industry, Journal of Packaging Technology and Research, 6, 187–198, (2022). https://doi.org/10.1007/s41783-022-00137-4 © 2022, The Author. Published by Springer Nature. CC BY.

(II) Jari Isohanni, Recognising small colour changes with unsupervised learning, comparison of methods, Advances in Computational Intelligence, 4, 6, (2024). https://doi.org/10.1007/s43674-024-00073-7 © 2024, The Author. Published by Springer Nature. CC BY.
(III) Jari Isohanni, Using convolutional neural networks to classify subtle colour differences (in peer review at ELCVIA, Electronic Letters on Computer Vision and Image Analysis).

(IV) Jari Isohanni, Customised ResNet architecture for subtle colour classification, International Journal of Computers and Applications, (2025). https://doi.org/10.1080/1206212X.2025.2465727 © 2025, The Author. Published by Taylor & Francis. CC BY.

All of the articles are reprinted with the permission of the copyright owners.

AUTHOR’S CONTRIBUTION

Publications I–IV: The author has contributed to the articles of this dissertation, having independently conceived, conducted, and analysed the research, as well as having authored the manuscripts in their entirety. This work represents the author’s original contributions to the scientific literature, reflecting the author’s experience and dedication to advancing knowledge in the field.

Abbreviations

ADC Analog-to-digital converter
Adam Adaptive Moment Estimation
Adagrad Adaptive Gradient Algorithm
AI Artificial intelligence
AMI Adjusted Mutual Index
ANN Artificial Neural Network
ARI Adjusted Rand Index
AUC Area under the curve
BERT Bidirectional Encoder Representations from Transformers
BIRCH Balanced Iterative Reducing and Clustering using Hierarchies
CCD Charge-coupled device
CFA Colour Filter Array
CFM Colour Filter Mosaic
CIE Commission Internationale de l'Éclairage (International Commission on Illumination)
CMOS Complementary metal-oxide-semiconductor
CMYK Cyan magenta yellow black colour space
CNN Convolutional Neural Network
CO2 Carbon dioxide
DBI Davies-Bouldin Index
DBSCAN Density-based spatial clustering of applications with noise
DCT Discrete Cosine Transform
DI Dunn’s index
DSC Digital Still Camera
EMCCD Electron Multiplying CCD
EU European Union
FMI Fowlkes-Mallows Index
FPN Fixed pattern noise
GAN Generative adversarial network
GD Gradient Descent
GMM Gaussian mixture model
HVS Human Vision System
ICA Independent Component Analysis
ICC International Color Consortium
IoT Internet of Things
IR Infrared
ISO International Organization for Standardization
JPEG Joint Photographic Experts Group
LAB LAB colour space
LED Light emitting diode
LSTM Long short-term memory network
LOF Local Outlier Factor
LOOCV Leave-One-Out Cross Validation
nm Nanometre
MOSFET Metal-oxide-semiconductor field-effect transistor
MI Mutual Index
ML Machine learning
MLP Multilayer perceptron
NAG Nesterov Accelerated Gradient
NLP Natural Language Processing
NN Neural Network
MAE Mean absolute error
MSE Mean squared error
MVCOR Correlation Coefficients for Multivariate Data
OPTICS Ordering Points To Identify the Clustering Structure
PCA Principal Component Analysis
PCB Printed Circuit Board
PRNU Photo-response nonuniformity noise
QR code Quick Response code
RGB Red green blue colour space
ReLU Rectified Linear Unit
RI Rand Index
RMSE Root Mean Squared Error
RMSprop Root Mean Square Propagation
RMSSTD Root-mean-square standard deviation
RNN Recurrent neural network
ROC Receiver operating characteristic curve
RS R-squared
sCMOS Scientific CMOS
SDD Slope Difference Distribution
SGD Stochastic Gradient Descent
SLR camera Single-lens reflex camera
SVM Support Vector Machine
tSNE t-Distributed Stochastic Neighbour Embedding
UV Ultraviolet
VGG Visual Geometry Group
WB White balance
WCSS Within-Cluster Sum of Squares

1 INTRODUCTION

The recognition of colours, their classification, and separation is useful for many industries, including healthcare, the food industry, and manufacturing.
This research was originally sparked by the fast-moving consumer goods (FMCG) and food industry sectors. Especially in these industries, consumers and end users are increasingly interested in the life cycle of the products they use, and the exchange of information between consumers and producers has a high value (Golan et al., 2004; Isohanni, 2022). Consumers and industry are also more aware of CO2 emissions and, especially in the food industry, of the safety of food and its sources. In response, an increasing amount of data needs to be collected and used. IoT devices and printed electronics aim to create data from various life-cycle events, but the prices of these devices are too high for certain low-cost items. Also, integration into most consumer items is not possible, as many items have a short life span and a low unit price. In such cases, printed indicators with functional inks might play an important role. Functional inks work by interpreting their status through colour change, but reading these colour values with mobile devices is currently not reliable or accurate, especially when subtle changes are considered.

Colour recognition in controlled environments, or with clearly separate colours, is a trivial task in computer vision. Colour recognition can be done either by using colour difference algorithms or by machine learning. When colours become closer to each other, the task becomes more challenging, and achieving the same accuracy as human vision is not certain. This applies especially in real-life use cases, and these subtle colour changes can be meaningful, as shown later in the dissertation. The word ’subtle’ is not a scientific term and must be defined in more detail when used. In this dissertation, colours are considered to have a subtle difference if their Delta-E (ΔE) is below 10. Delta-E is a metric that quantifies the difference between two colours in a colour space, typically LAB (Luo, Cui, & Rigg, 2001). Delta-E is widely used in industries such as printing, textiles, healthcare, and digital imaging to measure colour accuracy.

An example of the role of colour change is presented in Figure 1, where the thermochromic ink works as an indicator. In this example, the indicator tells the consumer whether the beer is at the correct drinking temperature by interpreting the temperature of the bottle. If the bottle is too warm (more than 7 °C), the green bar at the bottom of the Datamatrix becomes transparent.

Figure 1. Thermochromic indicator in a beer bottle (photo: UpCode Ltd).

Functional inks work through chemical reactions that change the optical properties of the ink, i.e. the colour. The chemical reactions are influenced by current environmental conditions such as temperature, humidity, and some gases. A functional ink appears in two states: the state that applies when condition A is valid, and the state that applies in condition B. For example, the functional ink can appear transparent when the temperature is below 60 °C, and red when the temperature is above 60 °C (Harvey, 2007). A good example of a functional ink is the thermochromic ink used in the TagItSmart Horizon 2020 project (Figure 2). This ink changes colour if the temperature of the ink is above or below 7 °C. In this demonstration, the ink changes colour reversibly, but the change can also be irreversible (Bilgin & Backhaus, 2017).
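To make the ΔE threshold defined above concrete, the minimal sketch below computes the original CIE76 Delta-E, which is simply the Euclidean distance between two colours in LAB space. The LAB values are invented for illustration; CIEDE2000, used later in this dissertation, refines this basic distance with lightness, chroma, and hue corrections, but the thresholding logic is the same.

```python
import math

def delta_e_cie76(lab1, lab2):
    """CIE76 Delta-E: Euclidean distance between two CIELAB colours."""
    dL = lab1[0] - lab2[0]
    da = lab1[1] - lab2[1]
    db = lab1[2] - lab2[2]
    return math.sqrt(dL * dL + da * da + db * db)

# Two hypothetical printed colours measured in LAB space (illustrative values).
reference = (52.0, 18.0, -6.0)
sample = (50.5, 16.5, -4.0)

if delta_e_cie76(reference, sample) < 10:
    print("subtle difference (Delta-E below 10)")
```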
Functional inks have the advantage of being non-electronic, which helps their recycling, and they are also suited for high-speed printing processes (conventional, non-conventional, and hybrid), which makes them suitable for many products. These properties are useful when consumers are the main user group, as functional inks do not require consumers to take any extra action. The indicator colour can be read either by visual inspection or by a mobile application (Isohanni, 2022).

Figure 2. Example of the colour change in functional inks.

Mobile devices are now used by almost all consumers, which makes them attractive for use in different applications. If functional ink is used in consumer products, it would enable consumers or other end users to read indicator data using smartphones. This requires accurate colour recognition so that the consumer or end user can trust the data provided by the indicator. In recent history, we have seen many use cases in which mobile devices have been used for colour recognition. In their research, Angelico et al. (2021) developed an app to recognise a limited range of infant stool colours with mobile devices, and the accuracy of their solution was almost 100%. Yang et al. (2021) proposed a solution that can estimate the organic matter content of the soil under ideal lighting and soil preparation conditions, but faced the challenge of sharing the parameters between devices. Yulita, Amri, and Hidayat (2023) trained a DenseNet model for the detection of tomato leaf disease using a mobile application for image capture and achieved 95.7% accuracy. Terensan, Salgadoe, Kottearachchi, and Weerasena (2024) showed the power of smartphones and developed an app that can identify blast and brown spot diseases using K-means clustering; their app worked with an accuracy of 84.3% and could identify diseases without the knowledge of experts.

As shown in the previous research mentioned above, mobile devices have great potential in both colour recognition and the recognition of colour differences. The characteristics of the devices and cameras must be taken into account when solutions are developed, and appropriate methods must be used to ensure the accuracy and reliability of the solution. These past applications also show that colour recognition can be done in many different ways. However, the previously mentioned and other research have focused on domains of colour recognition other than printed ones, and the earlier research is examined more carefully in Section 1.1.

Printed colours depend on the printer (and its settings), ink, paper, and printing conditions, and their combination defines how the final colour appears (Mangin & Silvy, 1997). If multiple printers are given the same command ’print 100% blue’, the results will vary between printers, and even the same printer produces different results over time. In this research, the colours are recognised from printed samples (Figure 3). In the samples used, the colours are embedded inside a QR code for two reasons. Firstly, the areas within the QR code can be easily extracted for further processing. In addition, QR codes simulate the usage of functional inks as part of the item label and, in terms of the presented research, make it more efficient to incorporate the results of this dissertation into production (Isohanni, 2022).

Figure 3. Example of the dataset used.
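Because the colour areas sit inside a QR code, they can be located and cropped automatically before any colour analysis. The sketch below illustrates this idea using OpenCV's QR code detector; sampling the geometric centre of the detected code is purely illustrative, and the actual positions of the ink areas in the tags used in the articles differ per design.

```python
import cv2
import numpy as np

def mean_colour_inside_qr(image_path, patch=10):
    """Detect a QR code and return the mean BGR colour of a patch at its centre.

    The centre patch is an illustrative stand-in for the functional-ink areas.
    """
    img = cv2.imread(image_path)
    ok, points = cv2.QRCodeDetector().detect(img)
    if not ok or points is None:
        return None
    corners = points.reshape(-1, 2)            # four (x, y) corners of the code
    cx, cy = corners.mean(axis=0).astype(int)  # geometric centre of the code
    region = img[cy - patch:cy + patch, cx - patch:cx + patch]
    return region.reshape(-1, 3).mean(axis=0)  # mean B, G, R over the patch
```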
The presented use case is prone to noise that comes from the structure of the paper, the printing quality, and the imaging pipeline. In optimal conditions, the colours printed on the paper substrate would be constant throughout the surface. However, when the ink intensity becomes lower, the structure of the paper becomes more visible, leading to more noise. Another noise source is the quality of the print: depending on the printing method used, the pattern of the print can produce noise and make the colour appear uneven (Figure 4). The print pattern can be so unique that it can be used to identify the printer (Ali et al., 2004). These sources of noise can be partly managed by image processing methods, and this dissertation also looks at whether the methods used can manage noise. The last source of noise is the imaging pipeline, where electronic signals are converted into the final image (see Section 2.3). This source of noise has only a limited influence on the presented use case, as other sources of noise are more meaningful, and the noise of the imaging pipeline is reduced by applying blur to images.

In the use case presented in this dissertation, uneven and varying conditions and ambient light influence the colour information. The effect of uneven ambient illumination is mitigated by the limited spatial extent of the area from which colour information is acquired, thereby minimising the influence of lighting non-uniformities on the accuracy of colour measurement. However, dynamic or fluctuating conditions, such as changes in lighting, background, full shadows and paper colour, represent a more substantial challenge, as they can compromise the stability and reproducibility of the captured data (Figure 4), and in real life, dust, dirt, and tampered colours make the challenge even more complex. Ambient light conditions cannot be strictly controlled, as consumers use mobile applications in many different environments. To tackle this common issue, image auto-levelling can be used to make the presented solution more generally applicable. In image auto-levelling, the brightness and contrast of a digital image are automatically adjusted to enhance its overall appearance. The goal is to enhance the visual quality by distributing the intensity values (brightness) more evenly across the available range, often from the darkest black to the brightest white. Image auto-levelling adjusts the pixel values so that the darkest and brightest regions are stretched to fit the full dynamic range of the image. The references for the darkest and brightest regions are known from the source data, which contains regions with no colour and regions that are totally black (Limare, Lisani, Morel, Petro, & Sbert, 2011). A more advanced solution would be to add white-balance or similar corrections to the image, which could be helpful when adapting the solution to different ambient light temperature conditions. However, given the characteristics of the imaging scene in this use case, the close range to the object, and the low number of colours present, such algorithms are not suitable here.

This dissertation offers a novel contribution to the fields of computer vision and machine learning by providing a comparative analysis of various methods for subtle colour difference recognition. Additionally, the dissertation compiles a collection of approaches suitable for diverse use cases in the recognition of colour differences.
The dissertation also supports the development of functional inks and their use cases by providing new methods that can accurately recognise colour or its change. Indirectly, the dissertation also provides new information for the development of QR codes in which colours are embedded for a specific reason. With this information, more use case-specific smart tags can be developed with functional inks, and the information read from the smart tags will be more trustworthy and accurate.

Figure 4. Examples of the different environments, paper and printer noise.

1.1 Related past research

This dissertation is structured in such a way that the colour difference algorithm CIEDE2000, unsupervised learning, and supervised learning are explored. The methods are then compared, and the relevant past research related to each of the methods is described in this section. In previous research, many approaches for colour recognition have been used. Notably, some of the earlier research was conducted long ago, in the 1970s and 1980s, before artificial intelligence emerged during the 2010s.

The mathematical methods in colour difference calculation are widely used and developed. The most recent version of the industrial standard is the so-called CIEDE2000 (CIE00) by CIE (International Commission on Illumination), which calculates the difference between two colours, building on the previous models CIE76 and CIE94. The formula is particularly designed to address perceived colour differences across the colour space. CIEDE2000 has been used in many studies where the colour difference is matched against human perception or in general colour comparison. Examples of such research are rice colour recognition by Nguyen, Vo, and Ha (2022) and mortar colour difference recognition by López, Guzmán, and Di Sarli (2016). Previous research by other researchers (e.g. Ghinea et al. (2011, 2010); Mangine, Jakes, and Noel (2005); Pecho, Ghinea, Alessandretti, Pérez, and Della Bona (2016)) shows that mathematical methods are useful when the differences in colour are large or when the environment can be controlled. Using such an environment, the authors showed that mathematical methods can be used in small colour difference recognition. The work done in the context of dental ceramics (e.g. Ghinea et al. (2011, 2010); Pecho, Ghinea, Alessandretti, Pérez, and Della Bona (2016)) shows that the CIEDE2000 algorithm can be used when there are small differences in the colours. In all of these previous studies, mathematical algorithms have been used in a very controlled environment, and they have focused on capturing colour information from surfaces that are quite solid in colour. In this dissertation, mathematical methods are used to calculate the difference between two colours (Article I).

Unsupervised learning (Article II) has mostly been used for colour segmentation when it comes to colour recognition. Colour segmentation aims to identify and isolate distinct regions within an image based on their colour attributes. This process typically involves algorithms that analyse the pixel values of an image, grouping similar colours into clusters to create meaningful segments that can be further processed or analysed. Such clustering can be used for colour recognition if pixels with the same colour belong to one cluster. In the past, unsupervised learning, mainly clustering, has been used, for example, in agriculture (e.g.
Abdalla, Cen, El-manawy, and He (2019); Al-Shakarji, Kassim, and Palaniappan (2017); Di Gennaro, Toscano, Cinat, Berton, and Matese (2019); Fuentes-Peñailillo, Ortega-Farias, Rivera, Bardeen, and Moreno (2018)) and healthcare (e.g. Khan et al. (2021); Rundo et al. (2020)). The mentioned past research articles show that unsupervised clustering has its use cases in colour image segmentation. However, past research has also shown that unsupervised learning has its challenges when colour differences between segments are small or when segments do not have clear edges. The big difference from the previously mentioned mathematical methods is that unsupervised methods have been used in environments that are not so strictly controlled. However, even though these past studies were working with colours, they do not directly focus on the recognition of colour differences in printed sources.

Past studies on supervised learning are loosely related to the approach presented in this study (Articles III & IV). However, they have explored the use of colours in artificial neural networks in different contexts or as features of broader object recognition tasks (Anandhakrishnan & Jaisakthi, 2022; Arsenovic, Karanovic, Sladojevic, Anderla, & Stefanovic, 2019; Lai & Westland, 2020; J. Wu et al., 2019). These research articles are examples of the use of supervised learning in the context of colour recognition. Although they use colour only as one feature, they state that using many features is advantageous for colour recognition, and they give directions for future research. Many other past studies have demonstrated the power of convolutional neural networks (CNNs) (e.g. Apriyanti, Spreeuwers, Lucas, and Veldhuis (2021); Atha and Jahanshahi (2018); Boulent, Foucher, Théau, and St-Charles (2019); Büyükarıkan and Ülker (2022); Engilberge, Collins, and Süsstrunk (2017); Przybyło and Jabłoński (2019); Q. Zhang et al. (2018)) in use cases that are related but different, as they do not cover subtle colour differences in printed sources. Of the mentioned research, Atha and Jahanshahi (2018) found that CNNs, especially ZFNet, can classify items based on small differences. The work by Büyükarıkan and Ülker (2022) also proved that various CNNs can estimate the illumination of images obtained in varying light colours, and this study is quite closely related to the approach presented in this dissertation. However, Büyükarıkan and Ülker had a special setup for image capturing and a larger surface area for colour. Colour identification does not always need a deep network with a large number of parameters, as Zhang et al. showed: they were able to classify eight different colours with their custom lightweight CNN. Apriyanti et al. (2021) were able to use different CNNs to classify flower colours into five categories. As can be seen, past research has shown that CNNs are well suited for colour recognition or similar tasks. Research has further shown that the ability of CNNs to automatically learn hierarchical features (such as intricate patterns and variations) from images is useful for the recognition of colour differences.

As a summary of the past research, there are many studies that relate to this dissertation as they have looked into colour or colour difference recognition. What makes this dissertation different is the use of printed colours as a dataset.
Such datasets, using printed colours and collected in a non-controlled environment, have not been collected in the past. Another new contribution is the recognition of subtle colour differences in printed sources with mathematical, unsupervised, and supervised methods; such comparisons have not been performed in the past.

1.2 Objectives and research methods

Previous scientific research has focused on use cases where the coloured sample areas occupy a relatively large portion of the object (e.g., cars or fruits with a dominant surface colour), or scenarios with clear chromatic deviations. Many studies have relied on high-quality cameras designed for surveillance or photography, such as SLR cameras. In prior neural network research, colour has typically been treated as only one of many item features.

The main research problem addressed in this dissertation is the reliable detection and classification of subtle colour differences with machine learning and colour difference metrics. The objective is to investigate methods for recognising colours and their intensity from printed sources, particularly in cases where the colour sample areas are small and no accurate colour reference is available. The small colour area is not typically influenced by shadows or uneven ambient light, so the research focuses purely on colour difference recognition.

The study aims to evaluate and compare different technical approaches, including a non-learning-based colour difference algorithm, unsupervised clustering methods, and supervised neural network models, for their applicability in a real-world use case where images are captured with mobile devices. The research focuses on adapting and extending existing methodologies to a context where colours are embedded within a printed QR code marker, simulating a colour-changing indicator.

1.3 Contributions

1.3.1 Article I

This article demonstrates that embedding functional ink within traditional markers does not negatively impact the decoding performance of smart tags. The article explores the effectiveness of the CIEDE2000 colour-difference algorithm in detecting indicator states, with a specific focus on its performance with different parameter combinations. The CIEDE2000 parameters account for differences in lightness (L*), chroma (C*), and hue (H*), and incorporate corrections for chroma compression and hue interactions to align with human visual perception. Adjustable weighting factors for these parameters allow the algorithm to be tailored for specific applications, ensuring a more accurate assessment of colour. Among the parameter weights tested, the combination CIEDE2000(2.76, 1.58, 1) demonstrated the best performance, particularly in scenarios involving low-intensity functional ink. The article highlights the need for future studies to address absolute colour-value detection and improve colour recognition accuracy, especially for low-intensity colours in functional inks.

1.3.2 Article II

Article II evaluates the effectiveness of prevalent unsupervised learning techniques in detecting and differentiating printed colours on paper, with a specific focus on CMYK ink levels. The article shows that unsupervised clustering methods can reliably identify colours within QR codes when the CMYK saturation difference is 20% or higher in at least one CMYK channel. The article highlights the performance of K-means in recognising colours even when the CMYK saturation difference is 10%, although its accuracy is notably reduced for the yellow and magenta channels. The results suggest that a minimum saturation difference of 20% in one CMYK channel is essential for reliable colour detection using unsupervised learning techniques. Using only a 10% difference is possible, but this might lead to incorrect predictions. The article highlights the need for further research or alternative unsupervised methods to
The article highlights the performance of K-means in recognising colours even when CMYK saturation is equal to 10%, al- though its accuracy is notably reduced for the yellow and magenta channels. The results suggest that a minimum saturation of 20% in one CMYK channel is essential for reliable colour detection using unsupervised learning techniques. Using only a 10% difference is possible, but this might lead to incorrect predictions. The arti- cle highlights the need for further research or alternative unsupervised methods to Acta Wasaensia 9 handle low ink densities below 5%. 1.3.3 Article III Article III applies standard CNN architectures to identify printed colours from mo- bile phone captured images, where images are pre-processed and stored in datasets with varying CMYK colour intensities. Although most CNN models performed well with colour differences of 10% or more. The best models were DenseNet (77% accuracy) and ResNet (95% accuracy) when very subtle (under 10%) colour differ- ences were used. ResNet’s residual connections and skip connections contributed to its strong performance, although it is prone to overfitting. This study sets a base- line for CNN colour classification, but more research is required on fine-tuning, optimisation, and preprocessing methods to enhance performance. 1.3.4 Article IV Article IV introduces a customised ResNet-34 architecture designed for the accu- rate recognition of subtle colour differences when the colour difference is 10% or less. By modifying the standard ResNet-34 based on previous research, the model achieved an accuracy of 98% in colour classification, validated through a five-fold cross-validation. The colour dataset undergoes colour correction, image pre-processing, and data augmentation prior to training. The proposed model high- lights key modifications, such as adding max pooling after residual block operations or replacing the final average pooling layer with max pooling. These adjustments, combined with gradient centralisation, improve the model’s accuracy in detecting variations in colour intensity, demonstrating the effectiveness of custom ResNet ar- chitectures for specialised image recognition tasks. 1.4 Structure of the dissertation This dissertation is structured as follows, Sections 2 and 3 cover the various tech- nologies to an extent necessary for the rest of the dissertation. Section 4 covers the datasets and methods used in Articles I-IV. Section 5 discusses the results of each article individually and as a whole. Section 6 presents a discussion of the research results, and Section 7 summarises the dissertation as conclusions. 10 Acta Wasaensia 2 DIGITAL IMAGES AND COLOURS The colours we see are electromagnetic waves of different wavelengths (Figure 5) and visible radiation. Depending on the wavelength of the wave, it can be observed in different colours. The human eye recognises colours based on light that is re- flected from an object or sent by a source into photoreceptor cells by light sources. The electric signals from the cones of photoreceptor cells are then sent to the brain to process and form vision. The spectrum that the human eye can recognise starts with violet at wavelengths below 400 nm. This lower limit of the spectrum cannot be exactly defined, but it starts around 380-400 nm. Human vision is most reactive at approximately 555 nm (yellowish green). And then starts to fade away, finally ending at far reds, somewhere around 700-780 nm. 
The human eye has three types of colour photoreceptor cells, which react independently to green, blue, and red light. Each colour has a specific wavelength range; however, the exact wavelength range for each colour cannot be strictly defined. (Tooms, 2015)

Figure 5. Electromagnetic wavelengths (reproduced from Tooms (2015)).

Violet has the shortest wavelength in the visible spectrum (around 380–450 nm). These high-energy wavelengths lie at the edge of what the human eye can perceive. The blue wavelengths are about 450–495 nm, and these colours are often perceived as cool. Green spans 495–570 nm. Green wavelengths are in the middle of the visible spectrum; this range is where human vision is most sensitive. The yellow wavelengths are between 570 and 590 nm. Orange ranges from 590 to 620 nm, and the orange wavelengths are longer and are often perceived as warm. Red has the longest wavelengths visible to the human eye, 620–700 nm, at the lower-energy end of the visible spectrum. (Tooms, 2015)

The digital camera works in principle in the same way as the human eye. It captures electromagnetic waves as analogue signals and converts them into electronic signals. The formation of digital images can be considered to occur in three phases: what happens before the actual sensor, what happens in the sensor, and the final processing of the sensor data. The purpose of the camera system as a whole is to transform the visible light reflected from the object(s) into a digital file. All cameras differ in how much data storage and component space they use; however, in principle they still work in the same way and produce digital images. For example, high-quality single-lens reflex (SLR) cameras and cell phone cameras have very different amounts of physical space available. In general, however, all cameras are still built from the same general components/modules. (Hirsch, 2022; Nakamura, 2017) Digital cameras have developed during the last few decades into systems that can capture very fine-detailed information and are equipped with the capacity to capture images that have 10^6 ... 10^8 pixels and around 10^2 ... 10^3 intensity levels present in several colour channels (Pak, Reichel, & Burke, 2022).

2.1 Pre-sensor

Before electromagnetic waves arrive at the actual camera sensor and the final image is formed, some phases affect how the final image looks and how colours are represented. The following subsections describe the main parts of the image-forming process.

2.1.1 Lens

As Robert Hirsch explains in his book “Light and Lens: Thinking about Photography in the Digital Age” (Hirsch, 2022), the main purpose of the lens is to collect light rays coming from a subject in front of the camera and project them as images onto a sensor. The lens allows the camera to capture light in a very short time to form the final image.

The lens plays a crucial role in determining the field of view and its depth (Hirsch, 2022). In mobile devices or other devices with limited physical space available, camera lenses are usually stacked in a holder tube (Figure 6). All lenses serve a specific purpose in forming the final image. The holder tube can range from a few millimetres down to a tenth of a millimetre. The small space of a mobile phone camera does not allow as fine-detailed control of lens-related parameters as in larger camera systems (Skandarajah, Reber, Switz, & Fletcher, 2014). The lenses used on mobile phones have different purposes.
Normal lenses function without reduction or magnification to create two-dimensional images. These lenses attempt to replicate human vision as closely as possible, including the field of view. Wide-angle lenses (with short focal length) extend the field of view to wider scenes, providing a greater depth of field. Telephoto lenses are the opposite of wide-angle lenses; they have a smaller depth of field and allow taking photographs from greater distances. The fourth lens type that can be found in mobile phones is the macro lens, which allows sharp and clear close-up photography. (Hirsch, 2022)

Figure 6. A traditional stack of lens elements in the holder tube (reproduced from: Bloss (2009)).

Mobile phones have very limited space with regard to the overall camera structure. This also means that mobile phones cannot be equipped with moving lenses to the same extent as traditional cameras. Moving lenses allow cameras to adapt to different distances, environments, and user needs. To overcome this restriction, manufacturers of cell phone cameras have added multiple cameras with different lenses for different purposes to their product range. One of the mobile phones with multiple cameras is the Nokia Pureview, which has five 12-megapixel cameras (Nokia N9 product page, n.d.). In today’s popular mobile phones, users can easily find two or three cameras with different lenses for photography. For special imaging with mobile devices, dedicated equipment for hyperspectral, thermal, or ultraviolet imaging has been developed.

2.1.2 Filters

Before visible light reaches the sensor within the camera, it passes through camera-specific filters. These are called either a colour filter array (CFA) or a colour filter mosaic (CFM). Camera sensors (see Section 2.2) feature a grid of millions of individual light-sensing elements (pixels); these are monochromatic sensors, and the filters restrict each of them to a specific wavelength range. The neighbouring pixels record different wavelengths, forming the basis for subsequent colour reconstruction. The CFA or CFM is integrated into each pixel sensor. The filters are responsible for selectively transmitting specific wavelengths of light, enabling each pixel to discern a particular colour channel of red, green, or blue if a common Bayer pattern is used. (Nakamura, 2017)

Figure 7. 5 × 5 Bayer CFA pattern (reproduced from: Bayer (U.S. Patent 3971065A, Jul. 1976)).

The CFA (or CFM) can be constructed in various ways, leading to different approaches to the final image reconstruction. One of the most common CFA designs is the Bayer pattern (Figure 7), which consists of a 2 × 2 grid where each pixel is assigned a red, green, or blue filter. In the Bayer pattern, the green filters make up 50% of the total. This pattern leverages the human eye’s increased sensitivity to green light. However, designs other than the Bayer pattern are also used. (Bayer, U.S. Patent 3971065A, Jul. 1976) Before the CFA (or CFM), there is usually at least an infrared (IR) filter, which prevents harmful wavelengths from entering the image sensor and negatively affecting image quality (Nakamura, 2017; Wilkes et al., 2016).

2.2 Sensor

A digital sensor array converts the light it receives into digital information (pixels). There are different versions of the sensor array (EMCCD, sCMOS, CMOS, and CCD), each of them working in a slightly different way and having its own characteristics. The most popular are CMOS and CCD, which can be found in cell phone cameras.
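Before the sensor types are described in detail, the filter mosaic of Section 2.1.2 can be illustrated with a short sketch. Under the assumption of an RGGB Bayer layout, it shows how a full-colour image would be subsampled so that each sensing element keeps only one channel; this is exactly the sampling that demosaicing (Section 2.3) later has to undo. Real sensors also add noise and may use other layouts.

```python
import numpy as np

def to_bayer_rggb(rgb):
    """Simulate an RGGB Bayer colour filter array.

    rgb: H x W x 3 float array. Returns an H x W single-channel mosaic in which
    each pixel keeps only the channel its filter would pass.
    """
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]   # red at even rows, even columns
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]   # green
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]   # green (50% of all filters)
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]   # blue at odd rows, odd columns
    return mosaic
```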
Charge-coupled device (CCD) sensors comprise an array of light-sensitive photodiodes integrated into a semiconductor substrate, typically silicon. Upon photon absorption, each photodiode generates electron-hole pairs, initiating the accumulation of photogenerated charge within the depletion region of the semiconductor. The CCD orchestrates charge transfer through the lattice-like structure of the device. This is facilitated by a sequence of potential wells formed by electrodes placed on the semiconductor surface. At an operational level, the CCD sensor has an arrangement of clocked voltages to shuttle charge packets across the pixel array. This process is known as charge transfer, and it ensures that spatial information is preserved while also mitigating noise and distortion. (Barbe, 1975; Nakamura, 2017) CCD sensors have some advantages over CMOS: they are less prone to noise, provide higher image quality, and are more sensitive to light (Cabello et al., 2007; Magnan, 2003).

Complementary Metal-Oxide-Semiconductor (CMOS) sensors have photodetectors and active pixel readout circuitry embedded within a monolithic silicon substrate. CMOS sensors harness the principles of metal-oxide-semiconductor field-effect transistors (MOSFETs) to facilitate pixel-level signal amplification and readout. Each pixel on a CMOS sensor comprises a photodiode, responsible for photon-to-charge conversion, and associated transistor circuitry, including amplifiers and analog-to-digital converters (ADCs), enabling on-chip signal processing and digitisation. CMOS sensors can be passive or active; in passive pixel arrays, charge amplifiers are located at the bottom of each column of pixels, and each pixel has only one transistor. Active pixel arrays implement an amplifier on every pixel. (Magnan, 2003; Nakamura, 2017) The main advantages of CMOS sensors are low power consumption, low cost, and high speed. The disadvantages are low light sensitivity, low charge capacity, poorer pixel uniformity, and noise. (Bigas, Cabruja, Forest, & Salvi, 2006; Cabello et al., 2007; Magnan, 2003)

Both types of sensors are highly reliable and suitable for use in mobile devices. In both sensors, noise is still generated, which affects image quality when closely observed (Naveed, Ehsan, McDonald-Maier, & Ur Rehman, 2019). In the context of the dissertation, both can capture 8-bit or higher colour depths. However, CMOS has the advantage of having all of the electronic components in one circuit, making the integration of CMOS sensors more feasible in smaller spaces. CMOS-based cameras also have lower power dissipation, which is an advantage in mobile devices. (Litwiller, 2001)

2.3 Post-processing

The final phase to produce a digital image is digital signal processing (Figure 8), which occurs in the image processor. This part is also called post-processing or the imaging pipeline. The purpose of the pipeline is to create a digital image from the raw data that the sensor produces. The aim of signal processing is to make the final image appear as natural as possible. In consumer use, post-processing might optimise images for viewing purposes, and the results can be somewhat different from the actual imaging scene. Depending on the camera module manufacturer, the purpose of the camera, and the phases that occur before the digital imaging pipeline, various algorithms are used when the final image is produced. In addition, the order of operations used varies.
An example of a possible imaging pipeline is presented in Figure 8. (Nakamura, 2017; Ramanath, Snyder, Yoo, & Drew, 2005)

Figure 8. Example of an imaging pipeline (reproduced from: Nakamura (2017)).

The black-level adjustment ensures that the darkest parts of the image (where the sensor does not capture light) are set to zero. Even when the sensor does not capture any light (total darkness), factors such as electronic offset and thermal noise can produce non-zero values for the sensing element. This is called dark-current noise. Incorrectly addressing the dark-current noise results in a whitening of the shadows and a reduced overall image contrast. (Schöberl, Senel, Fößel, Bloss, & Kaup, 2009; Zhou & Glotzbach, 2007)

White balance (WB) is performed by an algorithm to remove the colour cast caused by scene illumination. The objective of white balance is to make white objects appear as white in the image. The colour of the cast depends on the light source used; for example, fluorescent and incandescent lights produce differently coloured casts. (Ramanath et al., 2005; Zhou & Glotzbach, 2007) Automatic white balancing (the default option in smartphones) does its best to estimate true white in the imaging scene and then adjusts the colour balance accordingly (Zhou & Glotzbach, 2007).

Demosaicing and colour interpolation are used to reconstruct the three-channel (R, G, B) colour image. The demosaicing step receives information from the Bayer filter of the camera. The Bayer filter (or any other filter) used has an interleaved pattern of different colour filters, which results in a sensor output that also has an interleaved pattern. In practice, this means that not all of the colour pixels of the final image receive a direct signal relating to them. Demosaicing uses algorithms such as bilinear interpolation, adaptive colour-plane interpolation, and frequency-domain methods to form the final image, where each pixel has a colour value. (Gunturk, Glotzbach, Altunbasak, Schafer, & Mersereau, 2005; Ramanath et al., 2005)

There are many sources that contribute to the noise during the imaging process. The black-level adjustment previously mentioned is intended to minimise so-called fixed pattern noise (FPN). Other noise that appears in the raw sensor data is called photo-response non-uniformity noise (PRNU). Some PRNU occurs because colour pixels have different sensitivities to light, which is initially caused by the non-homogeneity of silicon wafers and imperfections during the sensor manufacturing process. Environmental and scene-related variables also contribute to PRNU, for example the light refraction of dust particles and settings that relate to camera behaviour (such as zoom use). Some of the noise in the camera sensor is so permanent and unique that it has been used to identify individual cameras (Lukas, Fridrich, & Goljan, 2006). An imaging pipeline typically contains an algorithm that reduces noise in the early phases of the imaging process.

Colour correction can be done at multiple different phases and with various algorithms. The final objective of colour correction is to adapt the camera to the current scene illumination and make colours as natural as possible. This part is also called the colour constancy of the imaging pipeline. (Nakamura, 2017; Zhou & Glotzbach, 2007) In consumer use, imaging pipeline optimisations might aim to make the image both engaging and visually comfortable.
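The white-balance step described above can be made concrete with a common textbook baseline, the grey-world assumption: the scene is assumed to be neutral grey on average, so each channel is rescaled until the channel means agree. This is only an illustrative sketch, not the proprietary algorithm of any particular phone or camera pipeline.

```python
import numpy as np

def grey_world_white_balance(rgb):
    """Grey-world white balance: scale each channel so its mean matches the overall mean.

    rgb: H x W x 3 float array with values in [0, 1].
    """
    means = rgb.reshape(-1, 3).mean(axis=0)        # per-channel mean
    gain = means.mean() / np.maximum(means, 1e-6)  # pull every channel mean to the grey level
    return np.clip(rgb * gain, 0.0, 1.0)
```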
In addition to colour correction, which attempts to make colours as naturally as possible, other processing can also be done for images. Each image captured by a digital camera contains some degree of blur, at least in some parts of the image. To make the image sharper, especially with regard to the edges of objects, edge detection is used to sharpen edges. Edges are an important part of an image because they are the locations where objects start or end. Detecting the edges of objects is also a way to recognise objects (Zhou & Glotzbach, 2007). Edge detection first looks for edges that should be sharpened and then applies transformation to make edges appear more natural. Enhancing edges too much, on the other hand, makes images appear unnatural. (Tang, Astola, & Neuvo, 1994) Other algorithms and methods are also used, most of them optimised for a certain camera model or even individual camera and conditions (Jiang, Tian, Farrell, & Wandell, 2017; Ramanath et al., 2005). Basically, it can be considered that two cameras that take images exactly at the same time and take images from the same scene still produce slightly different images. So, even if you capture a scene twice in a very short time frame, both images may vary slightly although this difference is hardly noticeable. Acta Wasaensia 17 The final phase of the imaging pipeline is the storage of the image. In advanced cameras, image data can be stored without compression, as a so-called RAW image. However, in mobile devices and consumer use, this is rarely possible or feasible. If an image is stored without compression, it consumes more storage space and is slower to process and share. Typically, images are stored in JPEG format on mobile devices (Nakamura, 2017). The compression of images comes with a cost, as some fine details, as well as colour and tonal detail, can be lost during the compression. The compression also forces the image to 8 bits per channel, even if a deeper colour depth was used earlier in the imaging pipeline. JPEG compression is a lossy image compression method that reduces file size by discarding some of the image data. The algorithm for compressing the JPEG image uses 8x8 pixel blocks, and a dis- crete cosine transform (DCT) is applied to each block to represent the pixel data as frequency components (Rao & Yip, 2018). The higher frequency components, which typically represent less visually significant details, are quantised and often discarded to achieve compression. Finally, the remaining data are encoded using entropy-coding techniques such as Huffman coding, resulting in a smaller file size. (Pennebaker & Mitchell, 1992) 2.4 Presentation of colours This subsection provides an overview of the most commonly used colour spaces and those related to this dissertation. The presentation of colours depends on the medium used. Roughly colours are presented either in digital or non-digital format. Digital mediums like displays, cameras, etc. use different colour spaces (mathematical models) to present colours (X. Wang & Zhang, 2010). The colour space defines the colours a specific medium is able to show. The most commonly used colour spaces in digital imaging are vari- ous RGB colour spaces. In these colour spaces, red, green, and blue are represented by individual electronic signals. In a non-electronic medium like printing, colours are typically represented in the CMYK colour space. In CMYK, cyan, magenta, yellow, and black are mixed to achieve the desired colour. 
When working with dif- ferent media, a conversion between colour spaces is required. In addition, there are many colour spaces that are used for different purposes; some of them are used for special devices, and some of them are made for mathematical purposes (Fan, Li, Guo, Xie, & Zhang, 2021). Different mediums like displays or printers can display different ranges of colours and gamut. In the optimal world, all media would have the same colour space, but medias are somehow different and even two identical displays have a slightly different colour range that they can interpret. Another major difference comes from the way mediums work, where displays emit light that results in a colour, and paper 18 Acta Wasaensia reflects wavelengths representing different colours. A colour space specifies what colours the current medium can interpret, but it also looks at standardising colours between mediums (Berns, 2019). Colour spaces can be broadly categorised into device-dependent and device-independent colour spaces. Both of these space types serve different purposes. If colours are presented in a device-dependent colour space (most RGB, CMYK colour spaces) colour space and then another device is used to present them, the colours may vary. In addition, the colours captured by different imaging devices vary, as the devices cannot capture the same range of colours. The device-dependent colour space presents the colour range that a specific device can display. Device-dependent colour spaces are primarily used in tasks where the output is intended for a partic- ular device type, such as graphic design for screens or preparing files for specific printer models. (Anderson, Motta, Chandrasekar, & Stokes, 1996; Nakamura, 2017; X. Wang & Zhang, 2010) Device-independent colour spaces are designed to represent colours consistently, regardless of the device used to capture, display, or print them (Green & MacDon- ald, 2011). Examples of such colour spaces are CIE L*a*b* (LAB) and CIE XYZ. The LAB colour space represents colours based on human vision, and CIE XYZ serves as the foundation for many other colour spaces (a reference space) (Nixon, Outlaw, & Leung, 2020). Device-independent colour spaces are used in scenarios where accurate colour matching is essential, such as in professional photography, printing, and colour quality control (Ford & Roberts, 1998; Green & MacDonald, 2011). Device-dependent colour spaces are essential for the specific requirements of in- dividual devices, whereas device-independent colour spaces provide accurate and consistent colour representation across different devices and media. 2.4.1 CMYK When colours are printed on paper or painted on the walls of a house, a subtractive colour space is used. Subtractive colour space works by subtracting (absorbing) certain wavelengths of light and reflecting other wavelengths. If the object (paper or label) has a white background colour, the CMYK colour printed on it reduces specific wavelengths of light that would otherwise be reflected. In the printing in- dustry, the most common subtractive colour space is CMYK, which is used in large industrial printers and home and office printers. In the CMYK colour scheme (Fig- ure 9), cyan (C), magenta (M), yellow (Y), and black (K) all have their own source of colour. All C, M and Y colours that are used are complements of additive primary colours (Figure 10). 
Mixing CMYK colours in their full intensity would produce a black colour, but for technical and cost reasons, black (K) is used as its own colour Acta Wasaensia 19 in printing. (Anilkumar, KK and Manoj, VJ and Sagi, TM, 2018) Figure 9. CMYK colour space. The CMYK colour can be defined in many ways. For example, yellow can be defined as CMYK(0.0, 0.0, 1.0, 0.0), and in this presentation, each colour can vary in the range 0.0 ... 1.0. Here, 0.0 is no intensity and 1.0 is full intensity. Another way is to use 0%C, 0%M, 100%Y, and 0%K, in which the intensity of each colour varies between 0% and 100%. The percentage sign can sometimes be left out which leads to a presentation CMYK = 0, 0.0, 100, 0. Some features of CMYK colour and its printing process have been defined in the ISO 12647-2:2013 (2013) and ISO 2846-1:2017 (2017) standards. 2.4.2 RGB The red, green and blue (RGB) colour space (Figure 10) is commonly used in digital devices. This colour space is additive, and the colours are created by combining red (R), green (G), and blue (B) light at various intensities (Palus, 1998). The colour space is built around each primary colour, and when primary colours are combined, different colours can be created. When all three colours are combined at full intensity, the result is white colour. If all three colours are set to zero (no signal), the result is black. (Nakamura, 2017) In the RGB colour space, each channel (R,G,B) is represented by a numerical value. These values typically range from 0 to 255 in 8-bit systems. The value range can also be 0.0 ... 1.0, where 0.0 is no signal (Ford & Roberts, 1998). The RGB colour space can allow for more than 16 million possible colour combinations. RGB colour is commonly described by the text RGB: 255,0,0, which represents the full red colour. (Fan et al., 2021) 20 Acta Wasaensia Figure 10. RGB colour space. The ISO standard that closely relates to digital cameras is ISO 17321-1:2012 ”Graphic technology and photography — Color characterization of digital still cameras (DSCs).” ISO 17321 works as a quality assurance tool for digital camera manufacturers and helps to achieve consistent colours between devices, as well as providing a way to benchmark different devices. (SO 17321-1:2012, 2012) 2.4.3 LAB/CIELab The LAB colour space (Figure 11) is a perceptually uniform colour space. The LAB colour space was developed by the International Commission on Illumina- tion (CIE). The purpose of the LAB colour space is to represent all perceivable colours perceived by the human eye. The LAB colour space is designed to be device-independent, ensuring consistent colour representation across different de- vices. The LAB can accurately describe colours and their relationships. The per- ceptual uniformity of LAB makes it valuable for applications that require precise colour measurements and comparisons. (Berns, 2019) LAB consists of three axes: L*, a*, and b*. L* defines lightness, ranging from 0.0 (black) to 100.0 (white). a* defines the green-red component, which ranges from -128 to +127. Negative a* values indicate a greenish hue, while positive values indicate a reddish hue. b* defines the blue-yellow component, ranging from -128 to +127. Negative b* values indicate a blueish hue, whereas positive values indicate a yellowish hue. In both a* and b*, zero means that there is no tint. 
(Fan et al., 2021) LAB colour space has a special relation to the human vision system because LAB is perceptually uniform, and the change in its component values equals a change in visual perception (Berns, 2019). Acta Wasaensia 21 Figure 11. LAB colour space (reproduced from: Belasco et al. (2020)). 2.4.4 Colour depth The colour depth / bit depth defines the number of bits used to represent the colour of a single pixel in the digital image. The colour depth is one single feature that is important when the accuracy of colour recognition is considered. The more depth an image has, the more different colours can be captured. The imaging pipeline (starting from the sensor) must be able to capture enough colours, and the final display device must be able to interpret them. The depth of the colour is determined by the number of bits assigned to each pixel, and more bits per pixel allows for a greater variety of colours. If an image has only 1 bit per pixel, it can present two different colours, typically black and white. In RGB (see 2.4.2) each channel has an individual depth of colour. Most commonly in consumer devices, colour depth is 8-bits per pixel, which allows for 256 different intensities of colours to be captured on each channel. In total, 8 bits per channel on three channels (R, G, B) gives a total of 24-bits, and means 16.7 million pos- sible colours. 24-bit image is considered to be true colour. (Doolittle, Doolittle, Winkelman, & Weinberg, 1997) 48-bit and higher, deep or high dynamic range, images are used for applications requiring extreme colour accuracy and detail. These images are used in professional photo editing, medical imaging, and scientific visualisations. If an image has 16 bits per channel, a total of 281 trillion possible colours can be presented. High colour 22 Acta Wasaensia depth enables more accurate representation of subtle colour variations, which can be critical for detecting fine details in images. The following image (Figure 12) shows various depths of colour. In the top-most colour bar there are 2 bits (4 intensities) and the most bottom bar has 8 bits, 256 intensities on the red colour channel. Figure 12. Various colour depths presented on red channel. Later in this dissertation, subtle colour differences are experimented with. In the experiments, the colour differences are split into different intervals of 5%, 10% and 20%. These differences are related to colour depth, and if intervals are in 5% steps there can be 20 different intensities, 10% steps have 10 different intensities and 20% steps have 5 different intensities (presented in Figure 13). Figure 13. Different colour steps. 2.4.5 Colour conversions As seen in the previous subsections, different colour spaces are used for different purposes. When colours from different mediums and different colour spaces are processed in another colour space, a colour space conversion is needed (Z. Su, Yang, Acta Wasaensia 23 Li, Jing, & Zhang, 2022). Converting an image from a scanner (RGB) to a printer (CMYK) may involve first transforming the image into a device-independent colour space (e.g., LAB) to ensure accurate colour mapping. In addition, when printed colour (CMYK) is captured into a digital image, it is first stored in an RGB colour space. The common way to process colours is to either use them in the source colour space or convert colours into a device-independent reference colour space like LAB. The role of colour space is a topic that has recently been addressed in many studies (e.g. 
Moreira, Magalhães, Pinho, dos Santos, and Cunha (2022); Nugroho, Goratama, and Frannita (2021); Zhbanova (2020)). Each digital colour space that represents the colours that a physical device can interpret has its own colour area (Figure 14), its gamut; therefore, there must be a colour space that is large enough to act as a transformation colour space. For this purpose, it is possible to use LAB colour spaces (see 2.4.3) (Berns, 2019). The role of transformation between colour spaces is illustrated in Figure 14, where the CIEXYZ (CIE1931xy) reference colour space is significantly larger than the device-dependent colour space (sRGB). (Palus, 1998)

Figure 14. Different colour space gamuts (reproduced from: Palus (1998)).

Colour conversion between different colour spaces involves mathematical transformations that map values from the source to the target colour space. This process usually begins with an understanding of the properties of both the source (e.g. RGB) and target (e.g. LAB) colour spaces (C.-H. Lee, Lee, Ahn, & Ha, 2001). Converting a device-dependent colour space to a device-independent colour space first requires transforming RGB values to CIE XYZ values using a linear conversion matrix, followed by conversion from XYZ to LAB using a non-linear transformation based on reference white points (Plataniotis, 2001). When transforming from a device-independent colour space to a device-dependent one, the process is reversed. The conversion process includes gamut mapping to handle colours outside the target space range (Morovic & Luo, 2001). Accurate colour conversion maintains the consistency and fidelity of the colours across different devices.

2.5 Measurement of colour differences

Measurement of colour differences in digital images involves quantifying similarities or differences between two colours. The process is usually performed in a colour space like LAB because of its perceptual uniformity, ensuring that measured differences correlate well with human visual perception. (Nakamura, 2017)

The measurement of colour can be done either as an absolute difference or as a relative difference. Absolute colour difference refers to the measured colour difference between two colours in a standardised colour space, such as LAB, without considering the context or environment. Relative colour difference considers the colour difference in the context of surrounding factors, such as lighting, surface reflectance, or viewing conditions.

The mathematical formula (CIEDE2000) for calculating the ΔE00 colour difference is as follows (Luo et al., 2001):

$$\Delta E_{00} = \sqrt{\left(\frac{\Delta L'}{k_L S_L}\right)^2 + \left(\frac{\Delta C'}{k_C S_C}\right)^2 + \left(\frac{\Delta H'}{k_H S_H}\right)^2 + R_T \left(\frac{\Delta C'}{k_C S_C}\right)\left(\frac{\Delta H'}{k_H S_H}\right)}$$

where the terms are defined as follows:

$\Delta L' = L_2 - L_1, \qquad \bar{L} = \dfrac{L_1 + L_2}{2}$

$C_1 = \sqrt{a_1^2 + b_1^2}, \qquad C_2 = \sqrt{a_2^2 + b_2^2}, \qquad \bar{C} = \dfrac{C_1 + C_2}{2}$

$a'_1 = a_1 + \dfrac{a_1}{2}\left(1 - \sqrt{\dfrac{\bar{C}^7}{\bar{C}^7 + 25^7}}\right), \qquad a'_2 = a_2 + \dfrac{a_2}{2}\left(1 - \sqrt{\dfrac{\bar{C}^7}{\bar{C}^7 + 25^7}}\right)$

$C'_1 = \sqrt{a_1'^2 + b_1^2}, \qquad C'_2 = \sqrt{a_2'^2 + b_2^2}, \qquad \bar{C}' = \dfrac{C'_1 + C'_2}{2}, \qquad \Delta C' = C'_2 - C'_1$

$\Delta a' = a'_2 - a'_1, \qquad \Delta b = b_2 - b_1, \qquad \Delta H' = \sqrt{\Delta a'^2 + \Delta b^2 - \Delta C'^2}$

$h_1 = \operatorname{atan2}(b_1, a'_1), \qquad h_2 = \operatorname{atan2}(b_2, a'_2), \qquad \bar{H}' = \dfrac{h_1 + h_2}{2}$

$T = 1 - 0.17\cos(\bar{H}' - 30^\circ) + 0.24\cos(2\bar{H}') + 0.32\cos(3\bar{H}' + 6^\circ) - 0.20\cos(4\bar{H}' - 63^\circ)$

$R_T = -2\sqrt{\dfrac{\bar{C}'^7}{\bar{C}'^7 + 25^7}}\,\sin\!\left(60^\circ \exp\!\left(-\left(\dfrac{\bar{H}' - 275^\circ}{25}\right)^2\right)\right)$

$S_L = 1 + \dfrac{0.015(\bar{L} - 50)^2}{\sqrt{20 + (\bar{L} - 50)^2}}, \qquad S_C = 1 + 0.045\,\bar{C}', \qquad S_H = 1 + 0.015\,\bar{C}'\,T$

$k_L, k_C, k_H$ = parametric factors

The parametric factors in the CIEDE2000 algorithm are scaling coefficients that allow the user to adjust the weighting of the lightness, chroma, and hue components in the colour difference calculation. These factors are particularly useful for adapting the formula to specific viewing conditions, applications, or industries where the perception of colour differences may vary. Parametric factors in the equation are usually set to 1.0, but for different purposes other values can be used, and in this way the formula can give different weights to differences in chroma or hue (see del Mar Perez et al. (2011); He, Xiao, Pointer, Melgosa, and Bressler (2022); Isohanni (2022); Pereira, Carvalho, Coelho, and Côrte-Real (2019)).

The ΔE00 value indicates how different two colours are. If ΔE00 < 0.5, the difference is practically invisible. If ΔE00 < 2.0, the colours have only a slight difference that is not noticeable to the human eye. The actual ΔE00 threshold the human eye can recognise differs between individuals (Mokrzycki & Tatol, 2011).

The CIEDE2000 algorithm has been widely used, and research in different fields has demonstrated its capabilities to some extent. Examples of these studies include ”Chroma-dependence of CIEDE2000 acceptability thresholds for dentistry” by Tejada-Casado et al. (2024), ”Colorimetric Evaluation of a Reintegration via Spectral Imaging—Case Study: Nasrid Tiling Panel from the Alhambra of Granada (Spain)” by Martínez-Domingo, López-Baldomero, Tejada-Casado, Melgosa, and Collado-Montero (2024), ”New Insights into Wine Color Analysis: A Comparison of Analytical Methods to Sensory Perception for Red and White Varietal Wines” by Hensel, Scheiermann, Fahrer, and Durner (2023) and ”Color difference of yarn-dyed fabrics woven from warp and weft yarns in different colour depths” by X. Wang et al. (2024). These are only a few examples of the use of the CIEDE2000 algorithm. Some past research has also established threshold values for ΔE00; for example, in the research by Xu et al. the algorithm was not applicable if ΔE00 was smaller than 2.0 (B. Xu, Zhang, Kang, Wang, & Li, 2012).

The use of the CIEDE2000 algorithm is justified if the colours compared are clearly different, or if the threshold between colours is large enough. The threshold value depends on the use case and needs to be examined case by case.

The challenge of measuring colour differences comes from aspects that have been mentioned earlier in this chapter. Measurements can be made to work if the devices used are standardised and the environment is controllable. However, if the context of use has variation (e.g. different devices, environments, and so on), it may be difficult to achieve satisfactory results.

2.6 Colour related applications

”The desire for quantitative measurements of colour exists across many fields” (Nixon et al., 2020). The most recent research in the field of computer vision during the years 2022–2024, where colours have been a critical feature, has found many applications that benefit from accurate colour recognition or the recognition of colour change.

Healthcare research has adopted the usage of colour differences in different cases, for example when matching or finding the correct tooth colour, finding tumours or other diseases, and also in skin and tongue colour classification.
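Many of these applications ultimately reduce to the kind of ΔE00 comparison described in Section 2.5. A minimal sketch of such a comparison is given below; it assumes the scikit-image library, the sRGB values are invented for illustration, and the printed thresholds simply echo the rough guidance given above.

```python
import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000

# Two subtly different sRGB colours (values in 0...1); both are made-up examples.
colour_a = np.array([[[0.80, 0.10, 0.10]]])   # reference print colour
colour_b = np.array([[[0.78, 0.11, 0.10]]])   # slightly faded variant

# Convert to LAB, since CIEDE2000 is defined on L*, a*, b* coordinates.
lab_a = rgb2lab(colour_a)
lab_b = rgb2lab(colour_b)

# kL, kC and kH are the parametric factors; 1.0 is the default weighting.
delta_e = deltaE_ciede2000(lab_a, lab_b, kL=1, kC=1, kH=1)[0, 0]

if delta_e < 0.5:
    verdict = "practically invisible difference"
elif delta_e < 2.0:
    verdict = "slight difference, hard to notice by eye"
else:
    verdict = "clearly measurable difference"
print(f"Delta E00 = {delta_e:.2f}: {verdict}")
```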
Accurate recog- nition of colours in healthcare makes disease detection more automated and helps healthcare professionals to make informed decisions (see Amakdouf et al. (2021); Balaji et al. (2020); Kumar, Singh, and Sachan (2024); Maiti, Chatterjee, and San- tosh (2021); Ni, Yan, and Jiang (2022)). Another industry that uses colour recognition and research is food and agricul- ture. Many use cases (see Adiwijaya, Romadhon, Putra, and Kuswanto (2022); de Brito Silva and Flores (2021); Keivani, Mazloum, Sedaghatfar, and Tavakoli (2020); M.-K. Lee, Golzarian, and Kim (2021); Nalhiati, Borges, Speranc¸a, and Pereira (2023); Shrivastava and Pradhan (2021); X. Su et al. (2023); D. Wang, Wang, Chen, Wu, and Zhang (2023)) in which colour plays an important role relate to food quality. In addition, plant and soil analyses have used colour classification. Colour recognition in food and agriculture can help in sorting items and automat- ically discarding non-valid items during the production process. Machine-based colour sorting is also suitable for high-speed production lines. Accurate recognition of colours can also help make life easier for people with dis- abilities. Othman et al. (2020) and Cho, Jeong, Kim, and Lee (2020) investigated devices that can recognise different colours. In these two studies, colours that can be easily recognised were used. Particularly, colour recognition via smartphone can help people with disabilities in the retail, commuting, and home environments. Recently, the development of so-called functional inks, which can react to environ- mental values like temperature and humidity, has led to a need to recognize colours accurately. One use case for such functional inks is presented in Article I of this dissertation (Isohanni, 2022). Originally, the usage of FMCG functional inks and their application in the food industry has been sparked by various other research projects, and for example, the EU Horizon 2020 funded project TagItSmart, devel- oped various interesting use cases for functional inks (Gligoric et al., 2019). 28 Acta Wasaensia 3 MACHINE LEARNING In this section, the most relevant machine learning techniques applicable to this dis- sertation are presented. Machine learning (ML) is a subset of artificial intelligence (AI), and is a crucial part of almost any technical development that humankind is pursuing. The development of applications like ChatGPT has brought artificial in- telligence into our daily discussions. Although high-level applications have gained a lot of attention, many things are happening ‘under the hood’, and artificial intelli- gence is an integral part of many ongoing developments. Machine learning focusses on the development of algorithms and statistical models that enable computers to learn, make predictions, or the decisions taken based on the provided data. In traditional programming, where explicit instructions are provided by the programmer for each task, machine learning systems identify patterns and rules directly from the data. Machine learning systems also attempt to improve their performance based on new data over time, without the need for writing new code. (Samuel, 1959, 1967) Advances in the development of machine learning systems during the last decade have been driven largely by the increasing availability of large datasets, new re- search findings, and improvements in computational power. 
Some remarkable re- cent events, such as the success of AlexNet in the ImageNet competition, Google’s DeepMind achieving human-level performance in certain games, the development of generative adversarial networks (GANs), the introduction of pre-trained models such as BERT, and the recent rise of Large Language Models (such as ChatGPT), to mention just a few. But also, more datasets have become available for the devel- opment of machine learning algorithms as the sharing of datasets has become more common. However, data collection has also become more feasible with the devel- opment of intelligent devices and applications, like the Internet of Things (IoT). The primary input to machine learning algorithms is a dataset, with or without pre- defined labels or outcomes. The data can be numerical, categorical, text, or image, and is often high-dimensional. Individual elements of the dataset have unique fea- tures and individual measurable properties or characteristics. These features play a crucial role in determining the relationships and patterns that the algorithms will un- cover. Consequently, the quality and relevance of features can significantly impact machine learning performance. (Goodfellow, Bengio, & Courville, 2016) A dataset is typically organised in a structured format such as a table, where each row represents a unique observation or record, and each column represents a fea- ture or variable of interest. Datasets can vary in size and complexity, ranging from small, simple collections to large, high-dimensional datasets with millions of records and features. A dataset can also be a collection of files, such as images, or other data. Examples of open-source dataset can be found, for example, from Acta Wasaensia 29 https://www.kaggle.com/datasets. 3.1 Unsupervised learning In the unsupervised learning a set of X1; X2; :::Xp features from n observations is given to the algorithm; however, there is no associated response variable Y (Bishop & Nasrabadi, 2006). Unsupervised learning uncovers hidden patterns, structures, or relationships within the data (Valkenborg, Rousseau, Geubbelmans, & Burzykowski, 2023). A common example of unsupervised learning is the rec- ommendations we see on various online shopping sites. In these sites, unsupervised learning is used to find a group of items that the items that we have in our cart match the most (James, Witten, Hastie, Tibshirani, & Taylor, 2023). 3.1.1 Algorithms Different algorithms are at the core of unsupervised learning, and these algorithms work with datasets and individual item features. The nature of unsupervised learn- ing is that the dataset does not contain correct answers (like in supervised learning). Algorithms that solve problems in an unsupervised manner can be roughly classified into clustering techniques, dimensionality reduction, and anomaly detection algo- rithms. Some of these algorithms are slightly overlapping; for example, clustering can also be used for anomaly detection. Clustering methods (Figure 15) are used for exploratory data analysis to investigate the underlying structure of the data. Clustering groups objects with similar features. 
Govender and Sivakumar have provided a well-described definition for clustering; ”objects in a cluster are more similar than objects in different clusters.” (Govender & Sivakumar, 2020) Jain recognised three main purposes for clustering (Jain, 2010): • Exploring, solves the underlying structure and this way provides insights into the data, generates hypotheses, detects anomalies, and identifies salient fea- tures • Natural classification, finds the degree of similarity among items • Compression, organises the data and summarises it Clustering analysis can be broadly categorised into hierarchical and non-hierarchical algorithms (Figure 16). Hierarchical clustering groups similar datapoints into clus- 30 Acta Wasaensia Figure 15. Example of clustering, left image source data, right image expected clustering (reproduced from: Jain (2010)). ters based on distance or similarity. The hierarchy is built on a tree-like structure, where each node represents a cluster of datapoints. If the clustering process is ag- glomerative, the process begins with each datapoint being its own cluster. Then, the algorithm iteratively merges the closest pairs of clusters until all points are grouped into a single cluster. In a divisive approach, all points are initially placed in one large cluster, and then the algorithm recursively splits the data into smaller clusters until a predefined number of clusters are reached. Hierarchical clustering is use- ful because of its ability to visualise the nested relationships between datapoints, allowing for the determination of an appropriate number of clusters by cutting the tree at the desired level. (Jain, Murty, & Flynn, 1999) Non-hierarchical clustering or partitional clustering is an unsupervised learning method that divides a dataset into distinct groups or clusters without creating a hi- erarchical structure. Non-hierarchical clustering directly assigns datapoints to clus- ters based on optimisation criteria. These clustering methods are generally faster and more scalable than hierarchical clustering. (Han et al., 2012a; Jain et al., 1999) In non-hierarchical clustering algorithms, datapoints can belong only to one cluster if hard clustering is used. In soft clustering, datapoints can belong, to some degree, to multiple clusters (Bishop & Nasrabadi, 2006). Non-hierarchical clustering can be divided into four subcategories (Figure 16): partitioning, density-based, grid-based, and others. The partitioning methods divide the data into a predetermined number of clusters. A typical partitioning process is to first initialise as many centroids as clusters, then each datapoint is assigned to the most suitable cluster, and the centroids of the clusters are updated until convergence. (Jain & Dubes, 1988) Density-based methods identify clusters based on their density, connectivity, and boundaries. In density-based clustering, a cluster can grow in any direction if Acta Wasaensia 31 distance metrics and other parameters are satisfied. In practice, this means that density-based clustering can be used to adjust clusters to arbitrary shapes. Grid- based methods partition data space into a finite number of cells that form a grid structure. Partitioning of the data is performed on the basis of the algorithm param- eters present. Other methods include model-based clustering and other methods that do not fit previous categories. In model-based methods, it is assumed that the data is a mixture of underlying probability distributions (mathematical models). 
(Han et al., 2012a)

When different clustering methods are compared computationally, hierarchical clustering has complexity O(n²), partitioning methods O(n), grid-based methods O(n), and density-based methods O(n log n). Different clustering methods vary in terms of computational cost; however, these values can be used as a general rule of thumb when a suitable algorithm is selected. (Ezugwu et al., 2022)

Figure 16. Different clustering methods (reproduced from: Han et al. (2012a)).

In clustering, two essential metrics are used (discussed in more detail in Section 3.1.2): a distance measure to quantify similarity or dissimilarity between subjects, and an additional distance measure to quantify the difference between clusters or between a cluster and a subject (linkage). The clustering algorithm is then responsible for maximising the similarity within a cluster and the dissimilarity between clusters. (Jain & Dubes, 1988; Valkenborg et al., 2023)

Unsupervised clustering is useful when natural groupings or patterns are analysed from the data, but also when the dataset is large (Jain et al., 1999). Some challenges related to unsupervised clustering were identified in 1988 by Jain and Dubes. To address these challenges, clustering users must consider, for example, which features they want to use, whether the data have outliers, and how many clusters there are (Jain & Dubes, 1988). Depending on the domain knowledge, the answers to these questions, the objectives, and the data, the most suitable algorithm is selected. Some of the most commonly used clustering algorithms are discussed and used in Article II of this dissertation to identify which works best in the detection of subtle colour differences.

3.1.2 Distance Metrics

Distance metrics in unsupervised learning (non-hierarchical clustering) are mathematical measures used to quantify the similarity or dissimilarity between datapoints. The metric is the most important factor in determining the structure and formation of clusters; for example, the K-means algorithm uses distance metrics to minimise the within-cluster variance. (Jain & Dubes, 1988)

Over the years, multiple methods for calculating the distances between two or multiple points have been developed. One of the most commonly used methods is the Euclidean distance, which measures the straight-line distance between two points in a feature space. The Euclidean distance d(x, y) is the shortest possible distance between two points x and y. As a very simple distance measurement function, the Euclidean distance does not take into account the correlation between variables. The Euclidean distance is suitable for continuous data in which the magnitude of differences is meaningful, such as physical distances or measurements. (Mimmack, Mason, & Galpin, 2001) The Euclidean distance can be calculated using the following equation (Jain & Dubes, 1988):

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Here, d is the distance between the datapoints x and y, n is the dimension of the data (in a three-dimensional system n = 3), and $x_i$ and $y_i$ are the coordinates of the points in the ith dimension.

The Manhattan distance, also known as the L1 norm or the city block distance, calculates the distance between two points by summing the absolute differences of their coordinates. The result of the Manhattan distance formula is the distance travelled along a grid-like path between the two points.
The Manhattan distance can be calculated using the formula (Jain & Dubes, 1988; Minkowski, 1910):

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

Here, d is the distance between the datapoints x and y, n is the dimension of the data (in a three-dimensional system n = 3), and $x_i$ and $y_i$ are the coordinates of the points in the ith dimension.

Figure 17. Euclidean and Manhattan distance.

The Hamming distance is used as a dissimilarity metric between categorical datapoints. The Hamming distance is obtained by counting the number of positions in which the corresponding symbols differ. The formula for the Hamming distance is (Hamming, 1950; Jain & Dubes, 1988):

$$d(x, y) = \sum_{i=1}^{n} \mathbb{1}(x_i \neq y_i)$$

Here, d is the Hamming distance between the datapoints x and y, n is the dimension of the data (in a three-dimensional system n = 3), and $\mathbb{1}(x_i \neq y_i)$ is an indicator function that equals 1 if $x_i \neq y_i$ and 0 if $x_i = y_i$.

The Minkowski distance is the generalisation of the Euclidean and Manhattan distances. In the Minkowski distance formula, the parameter p defines the order of the norm. The value of p can be adjusted to form groups that best capture the underlying structure of the data. (Minkowski, 1910)

$$d(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{\frac{1}{p}}$$

Here, d is the Minkowski distance between the datapoints x and y, and n is the dimension of the data (in a three-dimensional system n = 3). When p = 1, the Minkowski distance is equivalent to the Manhattan distance. When p = 2, the Minkowski distance is equivalent to the Euclidean distance. As p approaches infinity, the Minkowski distance approaches the Chebyshev distance $d(x, y) = \max_i(|x_i - y_i|)$, which is the highest absolute value among all coordinate differences between the points. (Fu & Yang, 2021)

The Mahalanobis distance is an effective distance metric when data features have different scales and correlations; thus, it is often used in anomaly detection and classification tasks. The Mahalanobis distance is also more suitable if the clusters have an ellipsoidal structure or if the datapoints depend on each other. The formula for the Mahalanobis distance is (Mahalanobis, 1936):

$$d_M(x, \mu) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}$$

Here, x is a vector representing the point for which the distance is calculated (an n-dimensional vector). $\mu$ is the mean vector of the distribution, representing the centroid or mean values of each dimension in a multivariate space, and is also a vector of dimension n. S is the covariance matrix of the distribution, an n × n matrix that describes the variance and covariance between each pair of dimensions. $S^{-1}$ is the inverse of the covariance matrix, and $(x - \mu)^T$ is the transpose of the difference vector, turning the column vector into a row vector for the matrix multiplication. (Mahalanobis, 1936)

The right distance metric is crucial in clustering tasks because it defines the similarity or dissimilarity between datapoints. The distance algorithm directly affects the performance and accuracy of the clustering. (Arora, Khatter, & Tushir, 2019)

Different distance metrics capture different aspects of the data. For example, the Euclidean distance is sensitive to differences in magnitude and is suitable for continuous numerical data, whereas the Manhattan distance is more appropriate for data that follow a grid-like structure (R. Su, Guo, Wu, Jin, & Zeng, 2024; Ultsch & Lötsch, 2022). The Euclidean distance tends to form spherical clusters, whereas the Manhattan distance may form clusters with different shapes (Hastie, Tibshirani, Friedman, & Friedman, 2009).
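The sketch below computes the metrics discussed above for two made-up LAB colour points; SciPy is assumed, and the small sample used to estimate the covariance matrix for the Mahalanobis distance is likewise invented for illustration.

```python
import numpy as np
from scipy.spatial import distance

# Two nearby colours expressed as LAB coordinates (L*, a*, b*); illustrative values.
x = np.array([52.0, 18.0, -6.0])
y = np.array([50.0, 21.0, -4.0])

print("Euclidean:", distance.euclidean(x, y))        # straight-line distance
print("Manhattan:", distance.cityblock(x, y))         # sum of absolute differences
print("Chebyshev:", distance.chebyshev(x, y))         # limit of Minkowski as p -> infinity
print("Minkowski (p=3):", distance.minkowski(x, y, p=3))

# Mahalanobis needs the inverse covariance of the data; here it is
# estimated from a small, made-up sample of LAB measurements.
sample = np.array([[51.0, 19.0, -5.0],
                   [49.5, 20.5, -4.5],
                   [52.5, 17.5, -6.5],
                   [50.5, 18.5, -5.5]])
inv_cov = np.linalg.inv(np.cov(sample, rowvar=False))
print("Mahalanobis:", distance.mahalanobis(x, y, inv_cov))
```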
When working with data that have different scales, it is useful to normalise the data or use a metric like the Mahalanobis distance. When datapoints are in a high-dimensional space, some of the mentioned distance metrics, or other distance metrics, may lead to results where all points tend to appear equally distant from each other. In these situations, metrics such as cosine similarity (Han, Kamber, & Pei, 2012b) can be used, because it measures the angle between vectors, making it more robust to the dimensionality problem.

The appropriate distance metric improves the algorithm's ability to identify meaningful patterns and relationships in the data, resulting in more accurate, interpretable, and relevant results. Choosing an inappropriate metric can result in misleading or suboptimal clustering results. (Arunachalam & Kumar, 2018; Han et al., 2012a)

3.1.3 Linkage

The linkage methods (Figure 18) in unsupervised clustering, particularly hierarchical clustering, determine the distance between clusters that is used to form larger clusters from smaller ones or to divide larger groups into smaller ones. Linkage methods influence the shape and composition of clusters. The linkage method is selected on the basis of the specific requirements of the analysis and the nature of the data. Various methods to calculate linkage exist, with the following being the most popular.

Figure 18. Illustration of different linkages (reproduced from: Jeon et al. (2017)).

Single linkage (minimum linkage or nearest-neighbour method) measures the distance between two clusters A and B as the smallest distance between any point in cluster A and any point in cluster B. A single-linkage method can result in long chain-like clusters and is sensitive to noise and outliers. The single-linkage distance d(A, B) between clusters A and B is defined as follows: (Gere, 2023; Johnson, 1967)

$$d(A, B) = \min_{x \in A,\; y \in B} d(x, y)$$

Here d(A, B) is the single-linkage distance between the clusters, x and y are the individual datapoints in clusters A and B, respectively, and d(x, y) is the distance between the datapoints.

Complete linkage (maximum linkage or furthest-neighbour method) defines the distance between two clusters as the maximum distance between any single pair of points in the two clusters. Complete linkage produces compact clusters but is overly sensitive to outliers. (Gere, 2023; Johnson, 1967)

The complete-linkage distance d(A, B) between the clusters A and B is defined as follows:

$$d(A, B) = \max_{x \in A,\; y \in B} d(x, y)$$

Here d(A, B) is the complete-linkage distance between the clusters, x and y are the datapoints in clusters A and B, respectively, and d(x, y) is the distance between the datapoints.

Average linkage defines the distance between two clusters as the average distance between all pairs of points in the two clusters. This linkage falls between single and complete linkage and is a compromise that is less sensitive to noise and outliers. (Gere, 2023; Sokal & Michener, 1958)

The average-linkage distance d(A, B) between clusters A and B is defined as follows:

$$d(A, B) = \frac{1}{|A||B|} \sum_{x \in A} \sum_{y \in B} d(x, y)$$

Here d(A, B) is the average-linkage distance between the clusters, x and y are the datapoints in clusters A and B, respectively, and d(x, y) is the distance between the datapoints.
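To illustrate how these linkage criteria behave, the following sketch builds a hierarchy over a few made-up LAB colour samples and cuts it into three clusters; SciPy is assumed, and the colour values are illustrative rather than the data used in Article II. Comparing the printed labels across methods gives a quick feel for how chaining (single linkage) and compactness (complete, Ward) show up in practice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# A handful of LAB colour measurements: two tight colour groups plus one outlier.
colours = np.array([
    [52.0, 18.0, -6.0], [51.5, 18.5, -5.5], [52.5, 17.5, -6.5],   # colour A
    [70.0, -5.0, 40.0], [69.5, -4.5, 39.0], [70.5, -5.5, 41.0],   # colour B
    [30.0, 60.0, 20.0],                                            # outlier
])

# Build the hierarchy with different linkage criteria and cut it into 3 clusters.
for method in ("single", "complete", "average", "centroid", "ward"):
    Z = linkage(colours, method=method)              # pairwise cluster distances
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree at 3 clusters
    print(f"{method:>8}: {labels}")
```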
Single linkage has a tendency to link clusters together in cases where datapoints in cluster A are close to datapoints of cluster B that might be further away from cluster B's centre than its other datapoints. Complete and average linkage do not have this tendency and create more compact clusters with approximately equal diameters. However, they may not merge clusters that are close together if there are outliers. (Hastie et al., 2009)

Centroid linkage defines the distance between two clusters as the distance between the centroids (geometric centres) of the clusters. Centroid linkage can result in clusters that minimise the within-cluster variance. However, it is not always monotonic, meaning that it can sometimes produce non-intuitive clusterings. (Gere, 2023; Lance & Williams, 1967)

The centroid-linkage distance d(A, B) between clusters A and B is defined as the distance between their centroids:

$$d(A, B) = d(a, b)$$

where the centroids a and b are calculated as follows:

$$a = \frac{1}{|A|} \sum_{x \in A} x, \qquad b = \frac{1}{|B|} \sum_{y \in B} y$$

Ward's method seeks to minimise the total within-cluster variance. At each step, the pair of clusters that leads to the smallest increase in the total within-cluster variance is merged. Ward's method tends to produce clusters of roughly equal size and is effective at minimising the overall variance within clusters. (Gere, 2023; Ward Jr, 1963)

The Ward linkage distance d(A, B) between clusters A and B is defined as follows:

$$d(A, B) = \frac{|A||B|}{|A| + |B|} \, \lVert a - b \rVert^2$$

where the centroids a and b are given by:

$$a = \frac{1}{|A|} \sum_{x \in A} x, \qquad b = \frac{1}{|B|} \sum_{y \in B} y$$

The purpose of the Ward method is not to optimise the clusters as such, but to find the clusters that are most homogeneous (having the most similar datapoints). The Ward method is computationally intensive and assumes spherical cluster shapes (Ward Jr, 1963). The different linkage methods are used in Article II and are evaluated as part of the results presented in Section 5.

3.1.4 Challenges of unsupervised clustering

Unsupervised clustering faces several common challenges. One of the biggest challenges is the determination of the number of clusters. In many use cases, it is not known beforehand how many clusters should be found (Jain & Dubes, 1988). In some use cases, it is possible to specify the number of clusters, which makes the evaluation of the unsupervised learning outcome more feasible. If the number of clusters is not known beforehand, it is possible to use methods like the elbow method (Thorndike, 1953), silhouette analysis (Rousseeuw, 1987), or the gap statistic (Tibshirani, Walther, & Hastie, 2001). These methods typically require human analysis and may not be conclusive. In this dissertation, the determination of the number of clusters is not a challenge, as it is known that there should be three clusters. This knowledge can be used to evaluate the clustering results, and it can also be provided as a parameter to the clustering algorithms.

If the data is very high-dimensional, it may be difficult to form clusters out of it. In practice, this is related to the distance calculation across dimensions. If the dimensions have different scales, distance measurement becomes a challenge. The choice of the distance metric greatly influences the clustering results. Although the Euclidean distance is common, it may not be appropriate for high-dimensional data.
In some cases, dimensionality reduction techniques such as PCA (Principal Compo- nent Analysis) (Hotelling, 1933; Pearson, 1901) or t-SNE (t-Distributed Stochastic Neighbour Embedding) (Hinton & Roweis, 2002) can address this challenge (Stein- bach, Ert¨ oz, & Kumar, 2004). This dissertation uses LAB colour data as its source data for clustering algorithms, and while the complexity of the data is not very high, but more important is to consider how three dimensions are related to correct distance metrics. Unsupervised clustering algorithms are computationally intensive. This is a chal- lenge if a large dataset is used. Some clustering algorithms, such as K-means or C-means, can work with large datasets. However, complex algorithms such as hi- erarchical clustering or DBSCAN may struggle with large datasets. (Hastie et al., 2009; D. Xu & Tian, 2015) Many clustering algorithms, such as K-means, assume that the clusters are spheri- cal and of equal size. However, real-world data often contain clusters of arbitrary shapes and varying sizes. The size and shape of the cluster may not be known beforehand; thus, selecting a suitable algorithm may not be a clear choice. Density- based methods such as DBSCAN and GMM can handle arbitrary shapes; however, they have their limitations. (Han et al., 2012a) The nature of the use case presented in this dissertation leads to spherical shapes. However, these shapes are not perfect in the form of a ball, rather being stretched towards each other as the colour changes to another. One of the most typical challenges in clustering is noise and outliers. Depending on the algorithm used, these may have a large impact on the final clustering result. Some algorithms (such as DBSCAN) are designed to be robust against such anoma- lies, and in some cases preprocessing of the data is necessary. (Hastie et al., 2009; Hodge & Austin, 2004) Noise is also a challenge in this dissertation, as when a change from one colour to another occurs, there are pixel values between these two colours. Also, as seen in previous sections, noise has multiple sources. Some algorithms, such as the K-means algorithm, are sensitive to initial conditions and can converge to different solutions based on the starting points. Multiple runs with different initialisation setups or using more sophisticated initialisation tech- niques like K-means++, can help mitigate this problem. (James et al., 2023; Sinaga Acta Wasaensia 39 & Yang, 2020) Depending on the dataset used, it is often necessary to scale or normalise the values prior to clustering, especially when using distance-based methods. In normalisation, values are scaled to a specific range, for example 0.0 ... 1.0. Inappropriate scaling can lead to misleading results because some features with large ranges may dom- inate the distance calculations. (Han et al., 2012a) Some normalisation methods commonly used are min-max normalisation, z-score normalisation, and normali- sation by decimal scaling. Normalisation also speeds up data processing because calculations with smaller numbers are faster. Addressing these challenges often requires a combination of algorithmic adjust- ments, visual data inspection, preprocessing steps, and domain-specific knowledge. 3.1.5 Model Evaluation and Validation Evaluating clustering results, and their quality, is inherently challenging due to the lack of ground-truth in unsupervised learning. External validation can only be used to evaluate the quality of the clustering results if there is a ground-truth. 
External validation can be performed by comparing the results with an external set of labels or ground-truth data. Ground-truth data represent the true cluster memberships. External validation helps to assess how well the clustering algorithm has performed in relation to known classes or groups. (J. Wu, Chen, Xiong, & Xie, 2009) As such, external validation is suitable for selecting the best clustering algorithm for a given dataset (Liu et al., 2013), and can be performed using some of the following metrics, which compare the clustering results to the ground-truth labels.

• The Rand Index (RI) measures the similarity between two data clusterings, which can be the results of two different algorithms or of an algorithm and the ground truth. The Adjusted Rand Index (ARI) adjusts the Rand Index for chance agreement between clusters, providing a more accurate measure of clustering quality. The Adjusted Rand Index ranges between -1.0 and 1.0, where 1.0 indicates that the clusterings are in perfect agreement and -1.0 indicates that the clusters are totally different. A value of 0 indicates random agreement. (Rand, 1971)

• Mutual Information (MI) measures the amount of information shared between clustering assignments and ground-truth labels. The adjusted mutual information (AMI) version of MI adjusts the mutual information for chance and offers a normalised measure. (Kreer, 1957; Shannon, 1948)

• Purity measures the extent to which clusters contain a single class, with higher purity indicating better clustering performance. (Rendón et al., 2011)

• The Fowlkes-Mallows Index (FMI) evaluates clustering quality as the geometric mean of precision and recall. (Fowlkes & Mallows, 1983)

In addition, external validation metrics that depend on the use case can be used, as shown in Section 5 and Article II, where the clustering results were evaluated against the ground-truth by using the CIEDE2000 colour difference calculation between the ground-truth and the clustering result. The Delta-E difference was then used to determine whether the clustering was performed correctly (success) or incorrectly (failure). The success rate for the entire dataset was calculated using the standard formula: success rate = correctly clustered images / total images.

Internal validation measures can be used to choose the best clustering algorithm if no external information (ground-truth) is available. In contrast to external validation, internal validation can also be used when choosing the optimal number of clusters. Internal validation of the clustering results has two main criteria: 1) compactness, i.e. how close the datapoints within a cluster are to each other, and 2) separation, i.e. how separated a cluster is from other clusters. Sometimes density-based measures are also used. The following methods are common for internal clustering validation. (Liu et al., 2013)

• The Davies-Bouldin Index (DBI) measures the average similarity ratio of each cluster to its most similar cluster. DBI uses the ratio of within-cluster scatter to between-cluster scatter. A smaller DBI indicates better clustering. (Davies & Bouldin, 1979)

• The silhouette score measures how similar an object is to its own cluster compared to other clusters. The silhouette score ranges from -1.0 to 1.0, where higher scores indicate that objects are well matched to their own cluster and poorly matched to neighbouring clusters. (Rousseeuw, 1987)

• Within-Cluster Sum of Squares (WCSS) quantifies the total variance within each cluster.
WCSS is calculated by summing the squared differences be- tween each datapoint and the centroid of its respective cluster. WCSS reflects the compactness of the clusters and a lower WCSS indicates a more compact cluster. (Hartigan & Wong, 1979) • Calinski-Harabaz Index (the Variance Ratio Criterion), measures the quality of clustering by assessing the ratio of the sum of between-cluster dispersion to within-cluster dispersion. Higher values indicate more dense and well- separated clusters.(Cali´ nski & Harabasz, 1974) • Dunn’s index (DI) measures the ratio of the minimum inter-cluster distance to the maximum intra-cluster distance. Higher values indicate better clustering, which indicates well-separated clusters with small within-cluster variance. (Dunn, 1973) Acta Wasaensia 41 • R-squared (RS), the ratio of the sum of squares between clusters and the total sum of squares of the entire dataset. RS is the degree of difference between clusters. (Sharma, 1995) • Root-mean-square standard deviation (RMSSTD) is calculated by taking the square root of the pooled sample variance of all attributes. RMSSTD provides information about the homogeneity of clusters. (Sharma, 1995) The mentioned metrics are just examples of ones that could be used; there are many others, and research of new methods is an interesting topic in the context of unsu- pervised clustering. If cluster analysis is performed by a human expert, then tools such as scatter plots, dendrograms, and heathmaps can be used to visualise the result of the clustering algorithm. 3.1.6 Unsupervised clustering in computer vision As can be seen from the previous subsection, an unsupervised cluster can be im- plemented in different ways depending on the given dataset, the use case, and expected results. Unsupervised clustering is suitable for solving many machine learning problems, and is commonly used as an initial machine learning technique for evaluation due to its applicability and easy on-boarding. Unsupervised cluster- ing has several use cases across various domains. These use cases are driven by the unsupervised clustering ability to identify patterns and group similar datapoints without pre-labelled examples. In the context of computer vision, unsupervised clustering can be used at least for image segmentation / compression and object recognition. Clustering algorithms can partition an image into segments or regions with similar characteristics, which aids in object detection, image analysis, and feature extraction, where the identi- fication of distinct regions is crucial. Genc¸tav, Aksoy, and ¨ Onder (2012) showed that unsupervised methods can segment and classify images of cervical cells. An- other similar study by J. Wu et al. (2017) showed that unsupervised clustering can be used to reveal breast cancer subtypes. These studies demonstrate that the use of unsupervised clustering is an option when images need to be classified based on certain features. Bilius and Pentiuc used K-means and ISODATA clustering methods in their research to classify materials in the field, and their dataset was based on hyperspectral imagining. The approach of Bilius and Pentiuc (2020) helps in agriculture, mineralogy, and other industries when planning land use. 
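A minimal sketch of this kind of colour-based segmentation is shown below: the pixels of a small synthetic two-colour image are clustered into colour groups with K-means. The image, the number of clusters, and the use of scikit-learn are illustrative assumptions, not the setup used in the articles of this dissertation.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 64x64 RGB image: two coloured regions plus mild sensor-like noise.
rng = np.random.default_rng(1)
image = np.zeros((64, 64, 3), dtype=np.float32)
image[:, :32] = [0.8, 0.2, 0.2]                 # reddish left half
image[:, 32:] = [0.2, 0.6, 0.3]                 # greenish right half
image += rng.normal(0.0, 0.02, image.shape)     # small amount of noise

# Treat every pixel as a datapoint and cluster the pixels by colour.
pixels = image.reshape(-1, 3)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)

# The label map is the segmentation: pixels in the same cluster share a colour group.
segmentation = kmeans.labels_.reshape(64, 64)
print("Pixels per segment:", np.bincount(kmeans.labels_))
```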
Sch¨ afer, Heiskanen, Heikinheimo, and Pellikka (2016) also showed that unsupervised clus- tering is useful for image segmentation when mapping the diversity of tree species in a tropical mountain forest 42 Acta Wasaensia With unsupervised clustering, it is possible to obtain clusters which share some feature (like colour) from images. Depending on the use case conditions, their properties and/or relations can be used for decision making. 3.2 Supervised learning Supervised learning (Figure 19) differs from unsupervised learning, as the data in the supervised learning is labelled. With labelled data, supervised learning looks to discover patterns related to the data. Thus, the goals of supervised learning mod- els are predetermined and supervised learning attempts to create a model between input and output. This is done in a training process, where the goal is to learn a function that can accurately predict the output labels of new, unseen data. (Bishop & Nasrabadi, 2006) Figure 19. The process of supervised learning (reproduced from: Pramoditha (n.d.)). Supervised learning consists of labelled data (dataset), a model, and a learning al- gorithm. The dataset is a collection of input-output pairs (X; Y ), where X is the input feature vector and Y is the corresponding output label (Bishop & Nasrabadi, 2006). X can be an image or some other object, and Y can be a label that best describes X . The dataset is usually divided into training, testing, and validation datasets. The training dataset is used to train the best possible model, the validation dataset is used in each iteration to validate the training results, and the test dataset is used to assess the model performance on unseen data. (Szeliski, 2022) Model is a mathematical function f(X; ), where X represents the input feature vector and  represents the model parameters. Function f maps input X to the predicted output Acta Wasaensia 43 ^ Y . (Deisenroth, Faisal, & Ong, 2020) The choice of model depends on the nature of the problem and the data; the model can be linear regression, logistic regression, decision tree, random forest, support vector or some other model (Shalev-Shwartz & Ben-David, 2014). In this dissertation, neural networks are used as a model for supervised learning. The learning algorithm is used to find the optimal parameters  for the model that minimises the error between the predicted and actual results (Deisenroth et al., 2020). In the supervised learning process, the learning algorithm adjusts the model pa- rameters by minimising a loss function. The loss function measures the difference between the predicted and actual outputs. As a model and learning algorithm, the loss function can also be selected. Common loss functions include the mean squared error for regression and the cross-entropy loss for classification. The adjustment of the model parameters is performed in an iterative process, where the parameters are updated in each iteration. At the end of each iteration, the model is validated with data that are not used for training. Once the training process has gone through a certain number of iterations or some other goal has been reached, the model is ready and can be tested. During the evaluation of the testing model with unseen data, common metrics such as accuracy, precision, recall, F1-score, Mean Squared Error (MSE), and confusion matrix can be used to measure the performance of the model. 
3.2.1 Loss functions

The loss function is used as a cost or objective function during the training and validation processes. The loss function is a mathematical formula that calculates the discrepancy between the predicted outputs of the model and the actual target values. The loss function provides a single scalar value that encapsulates how well the predictions of the model align with the true values. The training process looks to minimise the loss function, where a smaller loss function value indicates a more accurate model. (Bishop & Nasrabadi, 2006; Hastie et al., 2009)

In mathematical terms, the loss function L(y, ŷ) is a function of the predicted output ŷ and the true target value y (Hastie et al., 2009). The gradient of the loss function is used during backpropagation to update the model parameters (Hecht-Nielsen, 1992). The selection of the loss function affects how the model is trained and optimised.

The mean squared error (MSE) is used in regression tasks to measure the average squared difference between the predicted values ŷ and the true values y (Hastie et al., 2009):

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

where n is the number of datapoints. MSE penalises larger errors more severely due to the squaring term; this means that outliers have a significant impact on MSE. RMSE (Root Mean Squared Error) is a version of the MSE where the square root of the MSE is taken, providing a value that directly relates to the scale of the data.

The mean absolute error (MAE), also used for regression tasks, calculates the average of the absolute differences between predicted and actual values:

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|

MAE uses the absolute difference; thus, it is less sensitive to outliers compared to MSE. MAE can also be a better choice than MSE if there are anomalies in the errors. (Hastie et al., 2009)

Huber Loss attempts to combine the advantages of MSE and MAE, and is useful in the presence of outliers while maintaining the smoothness of MSE (Huber, 1992):

\text{Huber Loss} = \begin{cases} \frac{1}{2}(y_i - \hat{y}_i)^2 & \text{for } |y_i - \hat{y}_i| \leq \delta \\ \delta\,|y_i - \hat{y}_i| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}

where δ is a threshold parameter.

Hinge Loss is used for binary classification problems, particularly with Support Vector Machines (SVMs). The hinge loss is designed to maximise the margin between the decision boundary and the datapoints (Rennie & Srebro, 2005):

\text{Hinge Loss} = \max(0, 1 - y_i \cdot \hat{y}_i)

where y_i is the true label and ŷ_i is the predicted value. Hinge Loss penalises not only incorrect classifications but also predictions that lie too close to the decision boundary.

Cross-Entropy Loss (Log Loss) is commonly used in classification tasks where the model outputs probabilities. Log Loss measures the difference between the true probability distribution and the predicted distribution. For binary classification, the cross-entropy loss is calculated as (Hastie et al., 2009; Rubinstein & Kroese, 2004):

\text{Cross-Entropy Loss} = -\sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]

In multi-class classification, the Cross-Entropy Loss generalises to:

\text{Cross-Entropy Loss} = -\sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})

where C is the number of classes, y_{i,c} is an indicator of whether class c is the correct label for instance i, and ŷ_{i,c} is the predicted probability of class c.
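The definitions above translate directly into code. The following sketch (plain Python/NumPy; the example values are invented for illustration) evaluates the most common of these loss functions:

```python
import numpy as np

def mse(y, y_hat):
    # Mean squared error: average of squared differences
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    # Mean absolute error: average of absolute differences
    return np.mean(np.abs(y - y_hat))

def huber(y, y_hat, delta=1.0):
    # Quadratic for small residuals, linear for large ones
    r = np.abs(y - y_hat)
    return np.mean(np.where(r <= delta, 0.5 * r ** 2, delta * r - 0.5 * delta ** 2))

def binary_cross_entropy(y, p, eps=1e-12):
    # y contains 0/1 labels, p contains predicted probabilities
    p = np.clip(p, eps, 1 - eps)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_prob = np.array([0.1, 0.8, 0.6, 0.3])
print(mse(y_true, y_prob), mae(y_true, y_prob),
      huber(y_true, y_prob), binary_cross_entropy(y_true, y_prob))
```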
Kullback-Leibler Divergence (information divergence or relative entropy) can be used to measure how one probability distribution diverges from a second, expected distribution. Kullback-Leibler Divergence is often used in variational auto-encoders and generative models. Kullback-Leibler Divergence is calculated as follows (Kullback & Leibler, 1951):

\mathrm{KL}(P \parallel Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}

where P and Q are the true and predicted probability distributions, respectively. Kullback-Leibler Divergence is not symmetric, meaning that it is not a true distance metric. However, it quantifies the discrepancy between distributions. (Kullback & Leibler, 1951)

The selection of the loss function is a crucial step in the training process. The selection depends on the given problem, and a custom loss function may be required (Barton, Alakkari, O'Dwyer, Ward, & Hennelly, 2021; Brophy, Hennelly, De Vos, Boylan, & Ward, 2022).

3.2.2 Evaluation metrics

Like the loss function, evaluation metrics are an important part of the neural network training process. Evaluation metrics are used to assess the performance and effectiveness of trained models. Evaluation metrics are mathematical formulas that provide the developer of a neural network with quantitative measures, and with these measures it is possible to evaluate how well the model generalises to unseen data. As with the loss function, the selection of evaluation metrics depends on the nature of the problem. Common evaluation metrics include accuracy, precision, recall, F1-score, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). The loss functions MSE, MAE, and cross-entropy loss can be used as evaluation metrics in regression tasks. In addition, a confusion matrix is used to evaluate classification tasks. (Han et al., 2012b)

Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions:

\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}

While accuracy is straightforward and easy to interpret, it may not be suitable for imbalanced datasets where certain classes are under-represented (Q. Gu, Zhu, & Cai, 2009). If the dataset has such a property, precision and recall provide more reliable metrics. Precision measures the proportion of true positive predictions out of all positive predictions made by the model, and recall (or sensitivity) measures the proportion of true positives out of all actual positives (Japkowicz & Shah, 2011):

\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}

\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}

The F1-score is the harmonic mean of precision and recall, providing a single metric that balances both (Japkowicz & Shah, 2011):

\text{F1-score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}

The Receiver Operating Characteristic (ROC) curve is not a formula but a graphical representation of the true positive rate versus the false positive rate across different threshold values. The Area Under the Curve (AUC) quantifies the overall performance of the model (Japkowicz & Shah, 2011):

\mathrm{AUC} = \int_{0}^{1} \mathrm{ROC}(t)\, dt

An AUC value of 1 indicates perfect classification, and an AUC value of 0.5 suggests random guessing. The receiver operating characteristic curve and the AUC are useful for evaluating models in binary classification tasks with varying thresholds. (X. Zhang, Li, Feng, & Liu, 2015)
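To illustrate these definitions, the short sketch below (plain Python/NumPy; the predictions are invented for illustration) derives accuracy, precision, recall, and the F1-score from binary predictions:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # actual labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # model predictions

tp = np.sum((y_pred == 1) & (y_true == 1))     # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))     # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))     # false negatives

accuracy = np.mean(y_pred == y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```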
A confusion matrix is used in classification tasks to evaluate model performance. The confusion matrix provides a detailed breakdown of the model predictions compared to the actual results, helping to identify not only the general accuracy but also where the model is making errors. This helps in the development of models and in the development of pre-processing and data acquisition. (Han et al., 2012b; Japkowicz & Shah, 2011)

A confusion matrix is typically presented as a structured square table with rows and columns representing the different classes of the classification task. For a binary classification problem (labelled as Positive and Negative), the matrix is a 2×2 table:

                    Predicted Positive      Predicted Negative
Actual Positive     True Positive (TP)      False Negative (FN)
Actual Negative     False Positive (FP)     True Negative (TN)

True Positive (TP) denotes the number of instances correctly predicted as positive. True Negative (TN) is the number of instances correctly predicted as negative. False Positive (FP) denotes the number of instances incorrectly predicted as positive. False Negative (FN) represents the number of instances incorrectly predicted as negative.

A confusion matrix for multi-class classification problems extends the concept of a binary confusion matrix to handle multiple classes. In a multi-class scenario, the confusion matrix is a square matrix where both rows and columns represent the different classes. For a classification problem with n classes, the confusion matrix is an n × n matrix. Each element of the matrix at position (i, j) represents the number of instances where the true class is i and the predicted class is j. The diagonal elements represent the number of correct predictions for each class, and the off-diagonal elements represent misclassifications.

In Figure 20, 13 images of label 1 were classified as label 1, and no images of label 1 were classified as other labels. For label 2, some images have been labelled as label 3. Furthermore, all images of label 3 are correctly labelled. Therefore, the model may require some improvement.

A confusion matrix can be used to obtain insight into the performance of a classification model beyond scalar metrics; for example, from the confusion matrix it can be seen which classes are confused and where the model performs well or poorly.

Figure 20. Example of confusion matrix for multi-classification.
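A minimal way to assemble such a matrix (sketched here in Python/NumPy with invented labels) is to count, for every (true class, predicted class) pair, how many instances fall into that cell:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # cm[i, j] = number of instances with true class i predicted as class j
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 2, 2, 2]
print(confusion_matrix(y_true, y_pred, n_classes=3))
# Diagonal entries are correct predictions; off-diagonal entries are misclassifications.
```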
3.3 Neural networks

Neural networks (artificial neural networks, ANNs, or neural nets, NNs) are models whose structure and function mimic biological neural networks (Hristev, 1998). Neural networks can be trained with unsupervised or supervised learning methods, supervised methods being the most common approach. Neural networks perform a wide range of tasks in various domains.

In classification tasks, neural networks assign input data to one of several predefined categories: image classification can identify objects in images, text classification can identify language nuances, and speech recognition produces text (Bishop & Nasrabadi, 2006). Regression tasks predict continuous values based on input data and can be used to predict quantities such as prices or weather conditions (Bishop & Nasrabadi, 2006). In image processing tasks, neural networks, particularly convolutional neural networks (CNNs), are suitable for processing grid-like data. The tasks suitable for CNNs include object detection, image segmentation and analysis, as well as image generation (Goodfellow et al., 2016). In Natural Language Processing (NLP) tasks, neural networks process and generate human language. The NLP task can be translation, text summarisation, or text generation. Another language-related task is speech recognition, where CNNs are used to convert spoken language into text. (Abdel-Hamid et al., 2014) As in unsupervised learning, neural networks are also suitable for anomaly detection. Artificial Neural Networks (ANNs) can identify unusual patterns or outliers in data, which is useful for fraud detection, network security, and industrial maintenance (H. Wang et al., 2021). In reinforcement learning tasks, neural networks are used to train agents that learn to make decisions by interacting with the environment. This is suitable for developing game-playing artificial intelligence, robotics, and autonomous driving. (Souchleris, Sidiropoulos, & Papakostas, 2023) In time series analysis, neural networks, especially recurrent neural networks (RNNs) and long short-term memory networks (LSTMs), are used to analyse and forecast time series data (Goodfellow et al., 2016). This is closely related to regression tasks and can be used, for example, for sales forecasting. One of the latest developments in ANNs is generative models, such as generative adversarial networks (GANs). GANs can be used to create new data samples that resemble a given dataset; they can generate realistic images, music, and text. (Szeliski, 2022)

Neural networks are composed of individual units, called nodes or artificial neurons, which are connected to each other through edges. Depending on the architecture of the neural network (the ANN model), there can be varying numbers of nodes across different layers of the network. These layers and the number of nodes in them define the width and depth of the model. The nodes in each layer receive input signals from the nodes in the previous layer, process the signal, and then pass the processed signal to the nodes in the next layer. The signal received by each node is a weighted sum of the inputs from the connected nodes, along with a bias term. The node then applies a non-linear activation function to this weighted sum to produce an output. The input to the neural network consists of the characteristic values relevant to the task, such as images, audio, or text documents. The final output of the network provides a solution to the problem, such as classifying an image or predicting an outcome based on the input features. (Hristev, 1998)

The layers of an ANN begin from the input layer and end at the output layer. These layers are called visible layers. The layers between the input and output layers perform different transformations on their inputs and are referred to as hidden layers. An ANN is considered deep if it has at least two layers between the input and output layers. The ANN layers can therefore be broadly categorised into three types: input, hidden, and output layers. (Haykin, 2009; Hristev, 1998)

The input layer receives the raw input data. The neurons in the input layer do not modify the data or perform any computation; they simply pass the data to the subsequent layers. The number of nodes correlates with the size of the input. After the input layer, the hidden layers perform transformations and extract features from the input data. Depending on the model used and the complexity of the task, different numbers of layers and neurons per layer are used. (Hristev, 1998) In addition, the types of hidden layers vary; some common types of layers include (Goodfellow et al., 2016):

• Dense (Fully Connected) Layers connect to every neuron in both the preceding and subsequent layers. These layers compute a weighted sum of the inputs, followed by the application of an activation function to introduce non-linearity.
• Convolutional Layers apply learnable convolutional filters (kernels) to the input data in order to extract spatial features such as edges, textures, and patterns. These layers are particularly effective for image and spatial data processing.

• Recurrent Layers are designed for processing sequential data. They maintain a form of memory by passing information across time steps, making them well suited for tasks such as time series analysis and natural language processing.

• Pooling Layers reduce the spatial dimensions of the feature maps, typically by computing the maximum or average value within a sliding window (e.g., max pooling or average pooling). This operation lowers the computational burden and helps to mitigate overfitting by reducing spatial redundancy.

• Dropout Layers randomly deactivate a fraction of the input neurons during each training iteration. This regularisation technique prevents co-adaptation of neurons and encourages the network to learn more robust and generalisable features.

The output layer is the final layer of the neural network. The number of neurons in the output layer corresponds to the number of classes in a classification task or to the number of output values in a regression task. The output layer frequently uses activation functions such as softmax for classification and linear activation for regression. (Goodfellow et al., 2016; Hristev, 1998)

3.3.1 Convolutional Neural Networks (CNN)

A Convolutional Neural Network (CNN) is a specialised artificial neural network designed for processing and analysing grid-like data, particularly images. CNNs are designed to reduce the number of parameters and the computational complexity, which is important when images and other high-dimensional grid-like data are processed. (Goodfellow et al., 2016)

The core component of a CNN (Figure 21) is the convolutional layer, which works as follows. The convolutional layers apply convolutional filters (kernels) to local regions of the input data. These filters are passed across the input to produce feature maps. Feature maps capture local patterns, such as edges, colour information, textures, and shapes. Filters are specialised to recognise a specific type of feature, and during the learning process the network learns the optimal filters for the given task. (Goodfellow et al., 2016)

Figure 21. Example of LeNet CNN architecture (reproduced from: J. Gu et al. (2018)).

CNNs use pooling layers to down-sample feature maps. Downsampling reduces the spatial dimensions and the number of parameters in the network, thereby reducing the complexity and computational cost of the network. Downsampling also helps reduce overfitting and improves the ability of the network to generalise. Common pooling operations include max pooling, which takes the maximum value in a window, and average pooling, which computes the average value. (LeCun et al., 1989; LeCun, Bottou, Bengio, & Haffner, 1998)

One feature of a CNN is the use of non-linear activation functions, such as the Rectified Linear Unit (ReLU). ReLU introduces non-linearity into the model, which is required when patterns and relationships within grid-like data are recognised. ReLU helps to accelerate the convergence of training by mitigating the vanishing gradient problem, which often occurs with other activation functions such as the sigmoid or tanh. (Goodfellow et al., 2016; Nair & Hinton, 2010)

In the final stages of a CNN, fully connected (dense) layers are used to integrate the high-level features extracted by the convolutional layers. The fully connected layers perform classification or regression tasks based on the learnt representations. (Goodfellow et al., 2016)
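The structure described above can be sketched compactly, for example with the Keras API (an illustrative toy network; the input size, filter counts, and the five output classes are assumptions made for the example and not the architecture used in this dissertation):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A small LeNet-style CNN for 256x256 RGB inputs and 5 output classes
model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),
    layers.Conv2D(16, kernel_size=3, activation="relu"),   # convolution + ReLU
    layers.MaxPooling2D(pool_size=2),                       # down-sampling
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                    # fully connected layer
    layers.Dense(5, activation="softmax"),                   # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

The sketch follows the order discussed in the text: convolution and ReLU extract local features, pooling reduces spatial dimensions, and dense layers at the end perform the classification.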
3.3.2 Multilayer Perceptrons (MLPs)

Multilayer Perceptrons (MLPs) (Figure 22) are a class of feedforward artificial neural networks composed of multiple layers of nodes, where each node (or neuron) is connected to every node in the subsequent layer. The fundamental structure of an MLP includes an input layer, one or more hidden layers, and an output layer. Each connection between nodes is associated with a weight that is adjusted during training to minimise the prediction error. (Fiesler & Beale, 2020; Goodfellow et al., 2016)

Figure 22. Multilayer Perceptron architecture.

An MLP operates by performing a series of linear transformations followed by non-linear activation functions (Fiesler & Beale, 2020):

y = f(W x + b)

where x is the input vector, W is the weight matrix, b is the bias vector, and f is a non-linear activation function such as the sigmoid, the hyperbolic tangent (tanh), or the more commonly used Rectified Linear Unit (ReLU). The non-linearity enables MLPs to learn complex, non-linear mappings between inputs and outputs. (Bishop, 1995; Fiesler & Beale, 2020)

MLPs are trained using supervised learning, typically via backpropagation in conjunction with gradient descent optimisation algorithms such as stochastic gradient descent (SGD), Adam, or RMSprop. During training, the weights are iteratively updated to minimise a chosen loss function, such as the mean squared error for regression tasks or the cross-entropy loss for classification. (Fiesler & Beale, 2020)

The primary distinction from CNNs lies in how these models process input data. MLPs operate on flattened input vectors, thereby disregarding any inherent spatial structure. CNNs are also more parameter-efficient than MLPs. While MLPs are fully connected, resulting in a large number of parameters that increases rapidly with input size, CNNs leverage weight sharing through local receptive fields, significantly reducing the number of learnable parameters. This efficiency allows CNNs to scale better and generalise more effectively in tasks involving high-dimensional input. While MLPs offer simplicity and computational efficiency for non-spatial data, CNNs are inherently better equipped for tasks involving image data due to their ability to model local dependencies and spatial hierarchies. (Botalb, Moinuddin, Al-Saggaf, & Ali, 2018; Haykin, 2009)

3.3.3 Forward and backward propagation

Forward propagation involves passing input data through the network. The process begins with the input layer, where the initial data x is input to the network. The layers after the input process the data using a set of weights W and biases b, applying an activation function f to introduce non-linearity. (Bishop & Nasrabadi, 2006; Goodfellow et al., 2016)

For a layer l, the input a^{(l-1)} from the previous layer is transformed as follows (Bishop & Nasrabadi, 2006; Goodfellow et al., 2016):

z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}

where z^{(l)} is the weighted sum of the inputs. The activation function f is then applied to z^{(l)} to produce the output a^{(l)}:

a^{(l)} = f(z^{(l)})

The process is repeated layer by layer, propagating the activation forward through the network until it reaches the output layer.
For example, in a neural network with L layers, the output a^{(L)} is computed after processing through all intermediate layers. (Bishop & Nasrabadi, 2006; Goodfellow et al., 2016) If the network is working on a classification problem, the final result is commonly passed through a softmax function, which produces a probability distribution over the classes. (Bishop & Nasrabadi, 2006; Goodfellow et al., 2016)

During training, forward propagation provides the predicted output required to calculate the loss function, which quantifies the error between the predicted and actual outputs. This error is used in backpropagation to adjust the network weights and biases in order to minimise the loss function and improve model performance. (Hristev, 1998; LeCun, Touresky, Hinton, & Sejnowski, 1988)

Backward propagation works in the opposite direction to forward propagation and enables the model to learn from the data by adjusting the weights and biases to minimise the loss function. In the iterative optimisation process, backward propagation follows forward propagation. After forward propagation generates the output, the loss function L measures the difference between the predicted output and the actual target. Backpropagation then updates the network parameters to reduce this loss. (Hristev, 1998; LeCun et al., 1988)

Backward propagation begins by computing the gradient of the loss function with respect to the output of the network. Using the chain rule of calculus, these gradients are propagated backward through the network layers to calculate the gradients of the loss function with respect to the weights and biases. For a given layer l, the gradient of the loss L with respect to the weights W^{(l)} and biases b^{(l)} is computed as follows (Hristev, 1998; Rumelhart, Hinton, & Williams, 1986):

\frac{\partial L}{\partial W^{(l)}} = \delta^{(l)} (a^{(l-1)})^{T}

\frac{\partial L}{\partial b^{(l)}} = \delta^{(l)}

where δ^{(l)} represents the error term of layer l, and a^{(l-1)} is the activation of the previous layer. The error term δ^{(l)} is computed recursively using:

\delta^{(l)} = \left( (W^{(l+1)})^{T} \delta^{(l+1)} \right) \odot f'(z^{(l)})

where ⊙ denotes the element-wise Hadamard product, and f'(z^{(l)}) is the derivative of the activation function applied to the input z^{(l)}. Once the gradients are computed for all layers, the weights and biases are updated using an optimisation algorithm such as gradient descent as follows:

W^{(l)} = W^{(l)} - \eta \frac{\partial L}{\partial W^{(l)}}

b^{(l)} = b^{(l)} - \eta \frac{\partial L}{\partial b^{(l)}}

where η is the learning rate, which controls the size of the steps taken to minimise the loss. Backpropagation is repeated for many iterations over the training dataset until the model converges to a solution with minimal loss. This process allows the neural network to learn the optimal parameters that best fit the data, thus improving its performance for the given task. (Goodfellow et al., 2016)
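To make these steps concrete, the following NumPy sketch (a toy two-layer network on an XOR-style dataset; the layer sizes, learning rate, and data are illustrative assumptions) performs the forward pass, computes the error terms with the chain rule, and applies the gradient-descent updates by hand:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny network: 2 inputs -> 4 hidden units -> 1 output
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(4, 2)), np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))
eta = 1.0  # learning rate

for epoch in range(10000):
    for x, y in zip(X, Y):
        a0 = x.reshape(-1, 1)
        # Forward propagation: z = W a + b, a = f(z)
        z1 = W1 @ a0 + b1; a1 = sigmoid(z1)
        z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
        # Backward propagation for a squared-error loss L = 0.5 * (a2 - y)^2
        delta2 = (a2 - y) * a2 * (1 - a2)          # error term of the output layer
        delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # propagated via the chain rule
        # Gradient-descent parameter updates
        W2 -= eta * delta2 @ a1.T; b2 -= eta * delta2
        W1 -= eta * delta1 @ a0.T; b1 -= eta * delta1

# After training, predictions should approach [0, 1, 1, 0]
print(np.round(sigmoid(W2 @ sigmoid(W1 @ X.T + b1) + b2), 2))
```

In practice these gradients are produced by the automatic differentiation of a deep learning framework; the sketch only unrolls what such a framework computes internally.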
3.3.4 Optimisation algorithms

Optimisation algorithms enable neural networks to learn from data by adjusting their weights and biases to minimise prediction errors. Learning algorithms use optimisation algorithms to iteratively update the network parameters to reduce the loss function. Previous research has developed several learning algorithms suitable for different problems. (Goodfellow et al., 2016)

One of the most used optimisation algorithms is Gradient Descent (GD), which works by iteratively adjusting the model parameters in the direction of the steepest descent of the loss function, based on the gradient (partial derivatives) of the loss with respect to the parameters. The update rule for the parameters θ at each iteration t is computed as follows:

\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)

where η is the learning rate, a hyperparameter that controls the step size, and ∇L(θ_t) is the gradient of the loss function L at θ_t. (Cauchy et al., 1847) The correct learning rate is crucial for optimal parameters to be found: too large a learning rate might make the algorithm overshoot the minimum or cause divergence, while too small a learning rate makes the training process slow, and the process might not reach the best parameters within the given training epochs. In GD and other methods that do not use an adaptive learning rate, the learning rate can be adjusted by a certain constant, such as 0.1, after every fixed number of training epochs. (Takase, Oyama, & Kurihara, 2018)

GD variants have been developed to address the challenges of GD in optimising convergence speed and computational efficiency. These include Stochastic Gradient Descent (SGD) and Momentum. SGD updates the model weights using a single training example at a time rather than the entire dataset. This mechanism introduces randomness, which helps escape local minima and explore the parameter space more effectively. The update rule for SGD is (Goodfellow et al., 2016):

w := w - \eta \nabla L_i(w)

where the other variables are the same as in GD and L_i(w) is the loss for a single training example. SGD differs from GD in that it approximates the gradient using only a single datapoint. This saves time compared to GD; however, it can cause the weights to oscillate around the minimum rather than converge smoothly. (Goodfellow et al., 2016)

Momentum has been developed as an enhancement of SGD. The aim of Momentum is to accelerate the gradient vectors in the right direction so that learning converges faster and oscillation is reduced. The approach used in Momentum is to add a fraction of the previous update to the current update, maintaining a velocity vector that accumulates gradients over time. The momentum update is calculated using the following equations (Polyak, 1964; Rumelhart et al., 1986):

v := \gamma v + \eta \nabla L(w)

w := w - v

where v is the velocity and γ is the momentum term. The momentum term is typically set between 0 and 1.

The previously mentioned optimisation algorithms are examples of traditional (non-adaptive) optimisers, which use a fixed or globally adjusted learning rate for all parameters throughout the training process. Instead of traditional algorithms, versions in which individual parameters are updated dynamically can be used, with Adagrad, Adam, RMSprop, and AdaDelta as examples.

Adagrad (Adaptive Gradient Algorithm) adapts the learning rate for each parameter by scaling it inversely proportional to the square root of the sum of all historical squared gradients. Adagrad can perform larger updates for infrequent parameters and update frequent parameters with smaller values. In Adagrad, the learning rate continuously decays, which can stop the learning process when training runs for too long. The Adagrad update rule is as follows (Duchi, Hazan, & Singer, 2011):

w := w - \frac{\eta}{\sqrt{G_{ii} + \epsilon}} \nabla L(w)

where G_{ii} is the sum of the squares of the past gradients of the parameter in question and ε is a small constant that prevents division by zero.
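The non-adaptive and adaptive update rules above can be compared directly on a toy problem. The following sketch (NumPy; the quadratic loss, step counts, and hyperparameter values are illustrative assumptions) applies SGD with momentum and Adagrad to the same starting point:

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss L(w) = 0.5 * ||w||^2, i.e. grad = w
    return w

eta, gamma, eps = 0.1, 0.9, 1e-8

# SGD with momentum: the velocity accumulates a fraction of previous updates
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(100):
    v = gamma * v + eta * grad(w)
    w = w - v

# Adagrad: the per-parameter step shrinks with the accumulated squared gradients
w2 = np.array([5.0, -3.0])
G = np.zeros_like(w2)
for _ in range(100):
    g = grad(w2)
    G += g ** 2
    w2 = w2 - eta / np.sqrt(G + eps) * g

print(w, w2)  # both should have moved towards the minimum at [0, 0]
```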
As Adagrad aggressively decreases the learning rate, AdaDelta was developed to allow the algorithm to continue learning even after many updates. AdaDelta uses a moving window for gradient updates, and this window restricts the accumulation of past gradients to a fixed size. When AdaDelta is used, there is no need to set an initial learning rate, because AdaDelta adapts the learning rates based on historical gradient information (Zeiler, 2012):

G_t := \rho G_{t-1} + (1 - \rho)(\nabla L(w))^{2}

w := w - \frac{\sqrt{E[\Delta w^{2}]_{t-1} + \epsilon}}{\sqrt{G_t + \epsilon}} \nabla L(w)

where ρ is a decay constant, G_t is a running average of the squared gradients, and E[Δw²]_{t-1} is a running average of the squared parameter updates.

RMSprop (Root Mean Square Propagation) addresses Adagrad's decaying learning rate problem by maintaining a moving average of squared gradients to normalise the gradient. RMSprop tries to maintain an effective learning rate throughout training. The RMSprop update rule is calculated as (Tieleman & Hinton, 2012):

G_t := \gamma G_{t-1} + (1 - \gamma)(\nabla L(w))^{2}

w := w - \frac{\eta}{\sqrt{G_t + \epsilon}} \nabla L(w)

where γ is the decay rate. The decay rate is typically set to 0.9. (Zou, Shen, Jie, Zhang, & Liu, 2019)

The Adam (Adaptive Moment Estimation) optimisation algorithm combines the advantages of RMSprop and Momentum. Adam computes adaptive learning rates for each parameter using estimates of the first and second moments of the gradients. The Adam update rules are as follows (Kingma & Ba, 2014):

m_t := \beta_1 m_{t-1} + (1 - \beta_1)\nabla L(w)

v_t := \beta_2 v_{t-1} + (1 - \beta_2)(\nabla L(w))^{2}

\hat{m}_t := \frac{m_t}{1 - \beta_1^{t}}

\hat{v}_t := \frac{v_t}{1 - \beta_2^{t}}

w := w - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}

where β1 and β2 are the decay rates for the moment estimates, typically set to 0.9 and 0.999, respectively. (Zou et al., 2019)
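The Adam update can be written out directly. The sketch below (NumPy, on the same illustrative quadratic loss as above, using the commonly quoted default hyperparameters) mirrors the equations step by step:

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss L(w) = 0.5 * ||w||^2
    return w

w = np.array([5.0, -3.0])
eta, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
m = np.zeros_like(w)   # first-moment estimate
v = np.zeros_like(w)   # second-moment estimate

for t in range(1, 501):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction of the moment estimates
    v_hat = v / (1 - beta2 ** t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)

print(w)  # should end up close to the minimum at [0, 0]
```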
Comparison and research of optimisation algorithms is a common topic in neural network research. This ongoing development has led to various other optimisation algorithms, as well as results that guide the choice of the most suitable learning algorithm for the neural network and problem at hand.

3.3.5 Over- and underfitting

The training process of a neural network can face two major problems, overfitting and underfitting. In overfitting, the model learns the details and noise in the training data. The core of the overfitting problem lies in the excessive use of neurons in the network. In overfitting, the model captures not only the underlying patterns but also the random fluctuations and outliers in the training data. (Goodfellow et al., 2016) This negatively impacts the performance of the model on unseen data and prevents the model from generalising on a given task. The reasons for overfitting vary. The most common reasons include a small or biased dataset, noise in training samples, high variance in model predictions, too complex a model, and not stopping the training procedure before convergence. (Bejani & Ghatee, 2021)

In overfitting, the model performs exceptionally well on the training data, which can be observed as low error rates. However, during the validation phase the model fails to generalise. This leads to high error rates and a significant gap between training and validation performance metrics, such as accuracy, precision, recall, and loss values. (Deisenroth et al., 2020; Goodfellow et al., 2016)

There are many ways to control overfitting; Bejani and Ghatee proposed three control mechanisms: passive, active, and semi-active. Passive methods look for the best possible neural network model and hyper-parameter optimisation techniques (Bejani & Ghatee, 2021). The common L1 and L2 regularisation are considered passive control methods. L1 adds a penalty proportional to the absolute value of the coefficients to the loss function, and L2 adds a penalty proportional to the square of the coefficients to the loss function (Tibshirani, 1996). Active control methods introduce noise into the learning model or the training algorithm so that the model cannot memorise the connection between the data and the output (Bejani & Ghatee, 2021). The methods in this control scheme include dropout (Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov, 2014), data augmentation (Szeliski, 2022), normalisation (Rafiq, Bugmann, & Easterbrook, 2001), gradient centralisation (Yong, Huang, Hua, & Zhang, 2020), and well-chosen activation functions. Semi-active methods modify the network, for example by pruning (Sietsma & Dow, 1988) or by building the network, as described in Bejani and Ghatee (2021).

One technique not mentioned in the Bejani and Ghatee categorisation is early stopping, which can be considered a form of passive regularisation. Early stopping halts the training process when the performance of the model on a validation set ceases to improve, thereby preventing overfitting. If training is stopped too early, the model may be underfitted, that is, too simplistic to capture the underlying patterns in the data. Conversely, if training continues too long, the model risks overfitting to the training data, reducing its ability to generalise to unseen samples. (Goodfellow et al., 2016; Prechelt, 2002)

Underfitting typically occurs when the model is too simple or lacks sufficient capacity (i.e., has too few parameters) to represent the complexity of the problem. In such cases, the model fails to perform well both on the training data and on new, unseen data. (Deisenroth et al., 2020; Goodfellow et al., 2016)

An underfitted model has high bias and low variance. High bias indicates that the model has made strong assumptions about the data and could not adapt to the nuances of the dataset. Low accuracy and/or high error rates in training and validation indicate underfitting. (Goodfellow et al., 2016; Hastie et al., 2009)

Strategies to reduce underfitting include increasing model complexity and feature engineering. Increasing the complexity of the model can be achieved by incorporating additional layers and neurons in a neural network or by using more complex models (Höge, Wöhling, & Nowak, 2018). In feature engineering, more informative features are created from existing data, providing the model with more inputs (Beaugnon & Chifflier, 2018; Emmert-Streib & Dehmer, 2019).

3.3.6 Cross-validation

Cross-validation is a process that can be used to obtain more reliable information about a model's performance, especially when the available data is limited. Cross-validation also makes it possible to maximise the use of the dataset. Compared to processes without cross-validation, cross-validation provides a more robust estimate of model accuracy, variance, and generalisation error. Cross-validation involves partitioning the dataset into multiple subsets, or folds, allowing the model to be trained and tested on different combinations of these folds. Cross-validation not only provides a more accurate estimate of model performance but also mitigates issues such as overfitting. (Craven & Wahba, 1978; Deisenroth et al., 2020)

K-fold cross-validation (Figure 23) is a commonly used cross-validation technique. K-fold cross-validation divides the dataset into k equal-sized folds. The model is then trained k times, each time using k − 1 folds for training and the remaining fold for testing. Each datapoint belongs to the same fold throughout, but a different fold is held out for testing in each iteration. The overall performance is obtained by averaging the performance metrics over the k iterations. (Deisenroth et al., 2020; Hastie et al., 2009) The value of k is not fixed, but some common values include 5 and 10 (Wong & Yeh, 2019). If 5 folds are used, then 80% of the data is used for training in each iteration.

Figure 23. Cross-validation process.
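A sketch of this procedure with scikit-learn (an illustrative model and dataset; k = 5 as discussed above):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, Y = load_digits(return_X_y=True)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kfold.split(X):
    # Train on k-1 folds, test on the held-out fold
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], Y[train_idx])
    scores.append(accuracy_score(Y[test_idx], model.predict(X[test_idx])))

# The overall performance is the average over the k iterations
print("Mean accuracy over 5 folds:", np.mean(scores))
```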
Stratified cross-validation is a variation of K-fold cross-validation where the folds are created so as to preserve the proportion of each class in each fold. Stratified cross-validation ensures that each fold is representative of the overall class distribution. Stratified cross-validation is especially suitable for unbalanced datasets, where skewed performance metrics may otherwise appear. (Han et al., 2012b)

Leave-One-Out Cross-Validation (LOOCV) is a special case of K-fold cross-validation where k equals the number of items in the dataset. Each fold contains exactly one item for testing, and the remaining k − 1 datapoints are used for training. LOOCV is useful for small datasets; however, it can be computationally expensive. LOOCV trains the model on all datapoints except one and tests on the single remaining item. As with standard K-fold cross-validation, the performance metrics are averaged over all iterations. (Han et al., 2012b; James et al., 2023)

3.3.7 Transfer learning

Recent developments in neural networks and machine learning have led to models that are well trained but task specific. Transfer learning (Figure 24) can be used to fine-tune these models for another task. Transfer learning leverages the knowledge gained from previously solved problems and adapts it to solve new but related tasks. Transfer learning can be used to speed up the training process. However, it is also useful if only a limited amount of data is available for the new task. In transfer learning, the weights of the pre-trained model are fine-tuned with the new dataset. (Han et al., 2012a; Yosinski, Clune, Bengio, & Lipson, 2014)

Figure 24. Transfer learning process.

Transfer learning is a popular research topic in machine learning, which has led to different categorisations of transfer learning as well as different approaches. One way is to categorise transfer learning into inductive, transductive, and unsupervised transfer learning. In inductive transfer learning, the target task differs from the source task. The domain of the task can be similar; however, it does not need to be. In inductive transfer learning, learning a new domain can be treated as multi-task learning. (Niu, Liu, Wang, & Song, 2020)
Multitask learning is an inductive transfer method that enhances generalisation by leveraging the domain knowledge embedded in the training data of related tasks as an inductive bias. This is achieved by learning multiple tasks using common models. In multitask learning, the knowledge gained from one task can help improve the learning of other tasks. (Caruana, 1997)

Multitask learning uses hard or soft parameter sharing. In hard parameter-sharing networks, the shared part is followed by a task-specific section. In soft parameter sharing, each network has its own set of parameters, but the network architecture is the same. Multitask learning is also considered either homogeneous or heterogeneous. In homogeneous learning, each task corresponds to a single output, and in heterogeneous learning, each task corresponds to a unique set of output labels. This dissertation uses multitask learning, where hard parameter sharing is used to solve homogeneous transfer learning. (Sun, Panda, Feris, & Saenko, 2020)

A typical case of multitask learning is when the model has been trained with a certain (image) dataset, and this model is then modified to solve the problem of another dataset. This can be achieved by keeping the lower-level representations of the model shared and training only the higher-level layers. The transfer learning process is typically performed in the following steps: 1) selection of the pre-trained model, 2) configuration of the pre-trained model, and 3) training of the model for the target domain. The configuration can be performed by freezing the pre-trained layers, removing the last (task-specific) layer, and adding a new layer that fits the new task (Zhong & Ban, 2022). During transfer learning, the general feature extraction layers can have their weights "frozen", meaning that they are not updated during training on the new task. This helps to preserve the pre-trained knowledge and often leads to faster and more effective learning on the new task. (Goodfellow et al., 2016) These early layers capture fundamental patterns, such as edges or textures, which are applicable across various visual tasks.
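These steps can be sketched, for example, with a Keras pre-trained backbone (ResNet-50 is used here purely because keras.applications ships it; the input size and the new five-class head are illustrative assumptions, not the exact configuration used in the dissertation):

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# 1) Select a pre-trained model and drop its task-specific top layer
base = ResNet50(weights="imagenet", include_top=False, input_shape=(256, 256, 3))

# 2) Configure: freeze the general feature-extraction layers...
base.trainable = False

# ...and add a new head that fits the new task (here: 5 classes)
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),
])

# 3) Train only the new layers on the target-domain data
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=15)  # hypothetical target-domain dataset
```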
4 METHODS

4.1 Dataset for the study

This thesis uses various datasets for Articles I - IV; some datasets are used in multiple articles. To support the creation of a solid and replicable dataset, all materials were printed using a conventional LaserJet office printer on A4-sized plain paper. While not formally standardised, the printing process was kept consistent across all samples to minimise variability in print quality and ensure comparability. The materials were captured with an iPhone 7, an iPhone 11 Pro, and a Nokia TA-1032, using their built-in camera applications. All of the camera applications were set to use automatic settings for white balance and other parameters. The purpose was to mimic a consumer taking a photo.

In the first article, colour recognition was performed on a dataset containing 1260 images of a QR code with a colour embedded inside it as a bar. The colours used were magenta, cyan, yellow, black, red, green, and purple. For each colour, five different intensity levels (100%, 80%, 60%, 40%, 20%) were used. Article I also uses some QR codes that were printed with functional inks in black (C = 0.0, M = 0.0, Y = 0.0, K = 1.0), magenta (C = 0.0, M = 1.0, Y = 0.0, K = 0.0) and green (C = 0.4, M = 0.0, Y = 0.4, K = 0.0). The distribution of samples is shown in Table 1, and each class had at least 12 images that were used for the calculations.

Table 1. Distribution of samples in Article I.

Intensity   Cyan   Purple   Green   Black   Magenta   Red   Yellow
0%                                     50
20%           19       19      76      19        83    19       13
40%           19       19      83      18        64    19       13
60%           19       19      58      15        71    19       13
80%           19       16      63      19        63    20       13
100%          19       16     129      20        85    19       12
Total         95      105     409      91       366    96       80

For Article I, the images were captured both in a controlled environment and in everyday situations (Figure 25). The controlled environment was a 25 cm × 20 cm × 40 cm white box. Inside the box was a moving sledge for distance control and adjustable LED lights for ambient lighting control. In the controlled environment, two levels of ambient light, 400 lx (office ambient light) and 15 lx (dark environment), were used. The images captured in everyday situations were taken in normal home, office, and retail environments. These environments had a varying (warm or cold) colour of light, with an ambient light level over 200 lx. The light level was not measured exactly, due to the multiple locations used, but it was sufficient for completing general tasks.

Figure 25. Samples of images used in Article I.

The dataset used in Article I had challenges that were fixed for dataset 1 (DS1), used in Article II. The challenges were related to the bar which was used to embed colour inside the QR code. For DS1, the different colours (black, white, and colour) were clearly separated from each other, and each was given its own square area. In this way, the extraction of the data was more feasible. The final DS1 (Isohanni, 2023) used 25 different modified QR codes with different colours and intensities. The QR codes had three colour zones in them: black, white, and colour. The colour zone contained the colour that was sought to be recognised correctly. The black zone was printed with pure black (100K / CMYK(0.0, 0.0, 0.0, 1.0)), the white zone had no colour (paper white), and the third zone was printed with a specific colour. In total, DS1 had 722 images, distributed as shown in Table 2.

Table 2. Distribution of samples in DS1.

Intensity   Cyan   Green   Black   Magenta   Yellow
0%                            20
20%           25      25      26        33       22
40%           27      34      18        31       34
60%           33      25      20        42       26
80%           27      31      25        32       23
100%          29      40      21        29       24
Total        141     155     130       167      129

In the example in Figure 26, the colour area has a saturation of 20% in magenta (20M / CMYK(0.0, 0.2, 0.0, 0.0)). The dataset was created by printing colour areas with different ink saturations (20%, 40%, 60%, 80% and 100%). Figure 27 shows an example of QR codes with colour saturation 20% ... 100% in the magenta channel (M). The other colours and combinations of colours were C (cyan), M (magenta), Y (yellow), K (black), and CY (green).

Figure 26. One sample of the DS1 and DS2 images, and the extraction of colour areas.

This dataset was further developed into dataset 2 (DS2) (Isohanni, 2024a) by including additional saturation levels (10% and 5%). This was done so that the algorithms and models could be tested with very subtle differences. DS2 had a total of 561 images, distributed as shown in Table 3.

Table 3. Distribution of samples in DS2.

Intensity   Cyan   Green   Black   Magenta   Yellow
0%                            52
5%            45      55      50        47       50
10%           53      55      49        51       54
Total         98     110     151        98      104

The images in both datasets were acquired using two consumer-grade mobile devices (distinct iPhone models) under typical ambient lighting conditions, that is, conditions commonly found in residential and office settings. The datasets cover a range of variations in colour temperature, illumination levels, imaging distances, and device-specific camera settings. This variability was intentionally preserved to simulate the real-world scenarios encountered in everyday contexts.

The dataset used was relatively small, so for supervised learning, data augmentation was used to extend the dataset with the following options: rotation of 90 degrees, 0.2 width shift, 0.2 height shift, 0.2 shear range, 0.2 zoom range, horizontal flip, and vertical flip. In Articles III and IV, DS1 and DS2 were split into training (80%) and validation (20%) datasets, and in the last experiments of Article IV, K-fold cross-validation (with five folds) was also used.

Figure 27. Samples of DS1 with varying colour intensity.
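The augmentation options listed above map naturally onto, for example, Keras' ImageDataGenerator (shown here as an illustrative sketch; the dissertation does not specify the exact implementation used, and the directory path is hypothetical):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation options corresponding to those described in the text
augmenter = ImageDataGenerator(
    rotation_range=90,        # rotation of up to 90 degrees
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
)

# Typical usage: stream augmented batches from an image folder during training
# train_iter = augmenter.flow_from_directory("ds1/train", target_size=(256, 256), batch_size=32)
```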
4.2 Article I - Mathematical methods

Article I explores the embedding of functional ink inside a QR code and how accurately the colour of the indicator can be read using mathematical methods. For this dissertation, the most important part of Article I is colour recognition. Article I uses two different colour areas, white and colour; the mean colour value of each area is calculated, and the values are then compared. The colour difference is calculated using the CIEDE2000 algorithm with different parametric values for K_L, K_C, and K_H. These parameter settings were CIEDE2000(1, 1, 1), CIEDE2000(2.76, 1.58, 1), and CIEDE2000(2, 1, 1).

4.3 Article II - Unsupervised learning

Article II uses the DS1 and DS2 datasets and focuses on using common unsupervised learning methods to recognise colours. The process used in this article is shown in Figure 28. The process takes an RGB JPEG image as input and outputs the LAB values of three cluster centres (white, black, and colour). After clustering, the cluster closest (smallest Delta-E) to CIELAB(1.0, 0.0, 0.0) is labelled as "white". The cluster closest to CIELAB(0.0, 0.0, 0.0) gets the label "black". The final cluster is labelled "colour". If only two clusters, or more than three major clusters, are found, the result of the clustering process is considered to have failed. Finally, the Delta-E between the white and colour clusters is calculated. This value indicates how different or similar the colours are on a scale from 0 to 100. Clusters were considered to represent different colours if their Delta-E was > 2.0. Article II compares the K-means, Fuzzy C-Means, DBSCAN, Mean shift, Hierarchical clustering, Spectral clustering, Gaussian Mixture Model (GMM), BIRCH, and OPTICS algorithms.

Figure 28. The process used in Article II.
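The colour-difference computation underlying both Article I and this labelling step can be expressed, for instance, with scikit-image, whose deltaE_ciede2000 function accepts the same kL, kC, and kH parametric weights discussed above (the sample CIELAB values below are invented purely for illustration):

```python
import numpy as np
from skimage.color import deltaE_ciede2000

# Mean CIELAB colours of the white reference area and the colour area (illustrative values)
lab_white = np.array([95.0, 0.5, 1.0])
lab_colour = np.array([93.0, 4.0, -2.0])

# CIEDE2000 with default weights (1, 1, 1) and with the parametric variant (2.76, 1.58, 1)
de_default = deltaE_ciede2000(lab_white, lab_colour)
de_param = deltaE_ciede2000(lab_white, lab_colour, kL=2.76, kC=1.58, kH=1)

# A difference above 2.0 is treated as two distinguishable colours
print(de_default, de_param, de_default > 2.0)
```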
4.4 Articles III & IV - Convolutional neural networks

Articles III and IV use the DS1 and DS2 datasets. The images are processed partly through the same process as described in Article II: image auto-levelling and extraction of the colour areas. The difference from Article II is that the images are slightly blurred with Gaussian blur to reduce noise that might impact the training process. After this, a difference image is calculated and the image is resized to 256×256. This difference image represents the difference between paper-white and the colour, and it is labelled with a label that defines the colour.

In Article III, the dataset is divided into 80% training data and 20% validation data. Article IV used K-fold cross-validation for the final evaluation of the CNN model. These articles use data augmentation to expand the training set. Data augmentation was performed with the rotation, width shift, height shift, shear, and zoom options. Articles III and IV also used transfer learning to transfer knowledge from DS1 to DS2. First, the neural network is trained with DS1. Then, the top layers of the network are changed to match DS2, and the network is trained again. For DS1, the models were trained for 30 epochs and for DS2 for 15 epochs.

Articles III and IV used standard versions of each architecture. Optimisations regarding hyper-parameters, such as momentum, learning rate, and batch size, were not used, and exploring them with different architectures was left for further research. All CNN models were compiled using the cross-entropy loss function and the Adam optimiser. The CNNs used in Article III were AlexNet, ZFNet, VGG, ResNet, GoogLeNet, DenseNet, and EfficientNet.

Article IV takes the same general approach as Article III. In this article, the standard versions of the ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152 architectures are trained with DS1 and DS2. The architectures were compared with two different approaches, the first being the standard architecture and the second using gradient centralisation (GC). GC is an optimisation technique that normalises gradients during training (Yong et al., 2020). Normalisation of the gradients can help the training process to focus on the relevant signals rather than being dominated by noise or extreme gradients. Finally, a custom ResNet-34 was built by experimenting with different modifications to the standard architecture. These modifications included adding a dropout layer to the fully connected layers, adding batch normalisation at the end of the residual block, and using different pooling methods.

5 RESULTS

5.1 Article I - Mathematical methods

In Article I, the first experiments discuss how the indicator embedded inside a QR code affects the decoding distance. The results show that the change in decoding distance is only small. This is related to the Reed-Solomon error correction, which reconstructs lost or unreadable data. These results give insight into how the embedding should be done but do not directly contribute to the main theme of the dissertation.

The second experiment is more interesting, as it experiments with the CIEDE2000 algorithm using three different parametric (K_L, K_C and K_H) values: (1, 1, 1), (2.76, 1.58, 1) and (2, 1, 1). These parametric values were selected based on the findings of past research in the field.

The best-performing algorithm in the study was CIEDE2000 with the parameter configuration (2.76, 1.58, 1). This version of the algorithm demonstrated the highest and most consistent accuracy in recognising sensor states, particularly in situations where the colour intensity of the functional ink was low. Compared to the other tested configurations, CIEDE2000(2.76, 1.58, 1) yielded broader and more distinct value ranges, which improved the clarity of sensor state differentiation. Furthermore, in the third experiment, involving real-world printed markers with actual functional inks, this parameter setup maintained strong performance by producing stable and interpretable sensor values. The configuration (2.76, 1.58, 1) was determined to be the most accurate and robust option, of the ones used, for detecting the functional ink state in smart tags.

The results of the second experiment in Article I are shown in Tables 4, 5 and 6. In these tables, each row corresponds to a specific CMYK colour used in the simulated sensor area, including both primary colours (such as cyan, magenta, yellow, and black) and secondary or mixed colours (such as red, green, and purple). The first row ("No sensor") provides the baseline colour difference range for a non-active sensor area. The columns represent different intensity levels of the sensor ink, ranging from 20% to 100%. For each combination of colour and intensity, the tables show the range of colour difference values calculated by the algorithm. These values reflect how strongly the sensor area differs from the reference (non-sensor) area, with higher values indicating a more detectable change.

The results in Article I show that the CIEDE2000 algorithm, with any of the parametric settings mentioned, can reliably recognise the indicator state when the indicator intensity is 40% or higher. When the indicator intensity decreases to 20% or below, incorrect results are occasionally produced and the reliability decreases. These incorrect results appeared especially with the yellow ink.
Table 4. Results from Article I, second experiment, CIEDE2000(1, 1, 1) algorithm.

CMYK          20%              40%              60%              80%              100%
No sensor     [0.00 ... 0.05]
(0, 0, 0, 1)  [0.16 ... 0.27]  [0.31 ... 0.60]  [0.58 ... 0.87]  [0.82 ... 0.95]  [0.60 ... 1.00]
(1, 0, 0, 0)  [0.07 ... 0.17]  [0.17 ... 0.36]  [0.28 ... 0.36]  [0.37 ... 0.49]  [0.47 ... 0.61]
(0, 1, 0, 0)  [0.06 ... 0.24]  [0.16 ... 0.31]  [0.29 ... 0.46]  [0.29 ... 0.61]  [0.43 ... 0.80]
(0, 0, 1, 0)  [0.04 ... 0.12]  [0.08 ... 0.17]  [0.08 ... 0.16]  [0.12 ... 0.14]  [0.14 ... 0.23]
(1, 1, 0, 0)  [0.15 ... 0.28]  [0.41 ... 0.51]  [0.64 ... 0.69]  [0.81 ... 0.88]  [0.75 ... 0.95]
(0, 1, 1, 0)  [0.08 ... 0.18]  [0.17 ... 0.31]  [0.22 ... 0.47]  [0.35 ... 0.61]  [0.44 ... 0.61]
(1, 0, 1, 0)  [0.10 ... 0.25]  [0.22 ... 0.31]  [0.31 ... 0.55]  [0.39 ... 0.42]  [0.53 ... 0.75]

Table 5. Results from Article I, second experiment, CIEDE2000(2.76, 1.58, 1) algorithm.

CMYK          20%              40%              60%              80%              100%
No sensor     [0.00 ... 0.05]
(0, 0, 0, 1)  [0.16 ... 0.27]  [0.31 ... 0.60]  [0.58 ... 0.87]  [0.82 ... 0.95]  [0.60 ... 1.00]
(1, 0, 0, 0)  [0.08 ... 0.16]  [0.19 ... 0.36]  [0.30 ... 0.37]  [0.41 ... 0.50]  [0.50 ... 0.63]
(0, 1, 0, 0)  [0.10 ... 0.24]  [0.23 ... 0.34]  [0.36 ... 0.49]  [0.36 ... 0.63]  [0.48 ... 0.81]
(0, 0, 1, 0)  [0.11 ... 0.15]  [0.16 ... 0.21]  [0.21 ... 0.39]  [0.30 ... 0.33]  [0.27 ... 0.32]
(1, 1, 0, 0)  [0.19 ... 0.29]  [0.28 ... 0.38]  [0.65 ... 0.72]  [0.82 ... 0.90]  [0.75 ... 0.97]
(0, 1, 1, 0)  [0.09 ... 0.18]  [0.17 ... 0.31]  [0.23 ... 0.47]  [0.36 ... 0.61]  [0.46 ... 0.63]
(1, 0, 1, 0)  [0.14 ... 0.29]  [0.28 ... 0.38]  [0.36 ... 0.66]  [0.45 ... 0.53]  [0.56 ... 0.76]

Table 6. Results from Article I, second experiment, CIEDE2000(2, 1, 1) algorithm.

CMYK          20%              40%              60%              80%              100%
No sensor     [0.00 ... 0.05]
(0, 0, 0, 1)  [0.16 ... 0.27]  [0.31 ... 0.60]  [0.58 ... 0.87]  [0.82 ... 0.95]  [0.60 ... 1.00]
(1, 0, 0, 0)  [0.07 ... 0.16]  [0.18 ... 0.36]  [0.30 ... 0.37]  [0.39 ... 0.49]  [0.48 ... 0.62]
(0, 1, 0, 0)  [0.08 ... 0.24]  [0.20 ... 0.32]  [0.33 ... 0.48]  [0.33 ... 0.62]  [0.45 ... 0.80]
(0, 0, 1, 0)  [0.08 ... 0.13]  [0.13 ... 0.19]  [0.15 ... 0.29]  [0.23 ... 0.25]  [0.22 ... 0.26]
(1, 1, 0, 0)  [0.17 ... 0.29]  [0.43 ... 0.52]  [0.64 ... 0.70]  [0.81 ... 0.89]  [0.75 ... 0.95]
(0, 1, 1, 0)  [0.09 ... 0.18]  [0.18 ... 0.31]  [0.23 ... 0.48]  [0.36 ... 0.61]  [0.46 ... 0.63]
(1, 0, 1, 0)  [0.12 ... 0.27]  [0.25 ... 0.34]  [0.34 ... 0.60]  [0.42 ... 0.47]  [0.54 ... 0.75]

The results also demonstrate that the algorithms yield values that vary widely, especially with higher intensities. This means that a threshold can be set for recognising whether colours are different, but accurate recognition of colours is not feasible with the CIEDE2000 algorithm when printed sources are used in different environments. In such environments, camera accuracy, ambient light, print quality, and paper quality have a significant impact, so only a limited number of colour shades can be recognised accurately.

5.2 Article II - Unsupervised learning

Article II compares different unsupervised clustering methods. In these experiments, the clustering process was considered successful if it could recognise three different clusters and the clusters were formed correctly. The clusters were considered correctly formed if the Delta-E value between the white and colour clusters was equal to or smaller than 2.0 compared to the ground truth. The ground truth was obtained from the images before the datapoints were combined and sent to the unsupervised algorithm.
The Delta-E value 2.0 is considered to be the smallest colour difference that an inexperienced human observer can notice (Mokrzycki & Tatol, 2011). In Article II, the success rate was calculated over the images in the dataset:

\text{success rate} = \frac{\text{correctly clustered images}}{\text{total images}}

The first experiment explored various unsupervised clustering methods. Feasible parameters, or combinations of them, were used for each method. The results are shown in Table 7, where the runtime presented is the average runtime per image.

Table 7. Results of the clustering process with colour difference ≥ 20%.

Method (parameters)                                              Success rate   Runtime (s)
K-means (algorithm = elkan)                                      98.1%          0.052
K-means (algorithm = full)                                       98.1%          0.051
C-means (m = 1.0)                                                96.1%          0.034
C-means (m = 2.0)                                                96.6%          0.027
C-means (m = 3.0)                                                97.1%          0.039
C-means (m = 4.0)                                                97.6%          0.051
C-means (m = 5.0)                                                97.6%          0.050
C-means (m = 6.0)                                                97.6%          0.068
DBSCAN (eps = 2.5, min samples = 25%)                            19.5%          0.172
DBSCAN (eps = 5.0, min samples = 25%)                            70.1%          0.176
DBSCAN (eps = 10.0, min samples = 25%)                           78.1%          0.191
DBSCAN (eps = 10.0, min samples = 16.7%)                         81.6%          0.177
DBSCAN (eps = 10.0, min samples = 33.3%)                         25.2%          0.167
DBSCAN (eps = 15.0, min samples = 25%)                           70.1%          0.181
GMM (covariance type = full)                                     99.0%          0.085
GMM (covariance type = tied)                                     97.6%          0.071
GMM (covariance type = diag)                                     99.0%          0.071
BIRCH (threshold = 1.0, branching factor = 50)                   78.5%          0.285
BIRCH (threshold = 0.5, branching factor = 50)                   81.9%          0.344
BIRCH (threshold = 0.25, branching factor = 50)                  85.3%          0.351
BIRCH (threshold = 0.25, branching factor = 100)                 84.0%          0.354
BIRCH (threshold = 0.25, branching factor = 25)                  83.5%          0.410
BIRCH (threshold = 0.1, branching factor = 50)                   85.0%          0.380
Hierarchical (affinity = euclidean, linkage = ward)              97.7%          0.537
Hierarchical (affinity = euclidean, linkage = complete)          82.9%          0.414
Hierarchical (affinity = euclidean, linkage = average)           92.7%          0.468
Hierarchical (affinity = euclidean, linkage = single)            84.2%          0.417
Spectral (affinity = radial basis function)                      98.1%          2.870
Spectral (affinity = nearest neighbour)                          -              -
Mean shift (quantile = 0.5)                                      5.2%           18.30
Mean shift (quantile = 0.75)                                     0%             17.50
Mean shift (quantile = 0.4)                                      5.0%           15.60
OPTICS (eps = 2.5, min samples = 25%)                            17.7%          5.120
OPTICS (eps = 5.0, min samples = 25%)                            70.0%          24.12
OPTICS (eps = 10.0, min samples = 25%)                           17.7%          5.12

The results of the experiment show that multiple clustering methods are suitable if the colour difference to the white paper is sufficiently large (equal to or greater than 20%). K-means, with K-means++ initialisation, performed well overall, particularly benefiting from its hard clustering nature, although it struggled with low-density ink colours and outliers. C-means, while generally effective, faced difficulties with close clusters and low-density colours, requiring high fuzziness parameters for better results. DBSCAN frequently failed due to incorrectly classifying datapoints as noise, especially for mid-intensity colours. GMM performed similarly to K-means but with fewer issues, and although it struggled with noisy data, it was still the best algorithm for larger differences. BIRCH performed better than expected for non-hierarchical data, but was sensitive to outliers and data ordering. Hierarchical clustering was effective, particularly with Ward linkage, but struggled with specific low-density colours.
Some common observations are that, as clusters move closer to each other, it becomes very challenging for unsupervised learning to identify which cluster a datapoint belongs to. Most algorithms also suffer from noise and outliers; noise is common in the presented use case and arises from many sources, including transitions between colours.

Article II also presents another experiment in which the intensity difference with respect to paper-white was 0%, 5%, or 10% in the specified CMYK channels. For this experiment, the best-performing algorithms were chosen: K-means, C-means (m = 3.0), GMM (covariance = full), hierarchical clustering (affinity = Euclidean, linkage = Ward) and spectral clustering (affinity = radial basis function). The results of this experiment are shown in Table 8.

Table 8. Results of the second experiment, colour difference <= 10%.

Method (parameters)                                               10% intensity    5% intensity
C-means (m = 3.0)                                                 94.8%            83.3%
K-means                                                           96.3%            89.3%
GMM (covariance = full)                                           92.5%            84.5%
Hierarchical clustering (affinity = euclidean, linkage = ward)    93.4%            84.5%
Spectral clustering (affinity = radial basis function)            97.8%            76.2%

From the results, it can be seen that the success rate of all algorithms was lower than in the first experiment. The algorithms performed quite well when the intensity of the colour was 10%; however, with a small intensity of 5%, only K-means achieved an almost 90% success rate. The challenges come mainly from the magenta and yellow colours. This is shown in Figure 29, where images a) and b) have a colour intensity of 10% and c) and d) have a colour intensity of 5%.

Figure 29. Failed images in experiment two.

The results of Article II show that the unsupervised clustering methods K-means, C-means, GMM, hierarchical clustering, and spectral clustering can be used to recognise colour differences in printed CMYK colours. These methods are feasible especially when the colour has an intensity difference of 20% or more. The best option for these use cases is GMM, which achieved a 99.0% success rate in the presented experiment; spectral clustering and K-means also reached success rates greater than 98%. All of the best algorithms also performed well when the ink levels were 10%, but when the difference dropped to only 5%, none of the algorithms achieved a success rate higher than 90%. With very low colour intensity differences (5%), the best algorithm was K-means, followed by GMM and hierarchical clustering. Each of the top three algorithms takes a very different approach to clustering: K-means and hierarchical clustering are hard clustering methods, whereas GMM is soft, and while GMM and hierarchical clustering can form elliptical or arbitrary clusters, K-means assumes spherical ones.

The results demonstrate that the unsupervised clustering methods are very sensitive to outliers and noise when they are used for colour data clustering. The noise comes from the change of colour; for example, when black changes to white, there are grey colours in the data. Noisy datapoints weaken the clustering results by drifting centroids away from their actual locations. Using techniques to filter out noise and outliers would be beneficial; however, the challenge lies in recognising the noise, because noisy datapoints lie close to actual datapoints. With the best approaches, it is noticeable that clustering of yellow and magenta does not work as well as clustering of green and blue.
This is interesting, as red and yellow lie on the lower end of the colour spectrum as well as on the positive A and B axes of the LAB colour space.

5.3 Article III - Convolutional neural networks

Article III used standard CNN architectures to classify images based on their colour difference to paper-white. The same approach as described in Article II was used: first a dataset with larger colour intensity differences was used, and then another dataset with smaller differences. The results of the training process for the first dataset are shown in Table 9, which contains the accuracy and the training time per epoch for each architecture.

Table 9. Results of the first experiment (difference >= 20%) in Article III.

Architecture      avg. train time / epoch (s.)    Accuracy
DenseNet          690 s.                          1.00
ResNet            626 s.                          0.98
VGG-16            570 s.                          0.97
AlexNet           110 s.                          0.93
EfficientNetB1    359 s.                          0.90
ZFNet             340 s.                          0.88
GoogLeNet         1639 s.                         0.87

The results demonstrate that most CNNs outperform the unsupervised algorithms. Deeper networks perform better than shallow ones, except AlexNet, which can be considered shallow with only eight layers. One thing to note is that AlexNet has the fastest training speed. The results also suggest that the problem may not require complex or deep networks, as the data used is not very complex. With the best-performing architecture, DenseNet, which reuses features through dense connections between layers, all images were classified correctly. The other architectures made mistakes in some images. ResNet and VGG-16 sometimes confused different intensities of the same colour (VGG-16 in a total of 24 images and ResNet in 32 images). AlexNet, on the other hand, confused some colours with other colours (in 88 images).

Article III also included another experiment with smaller intensity differences, equal to or below 10%. In this experiment, the four best CNN architectures were used. Transfer learning was also used in Article III to speed up the training process and to transfer knowledge from DS1 to the DS2 training process: the models of the first experiment were used to obtain the initial weights and biases for DS2 training. The results are shown in Table 10.

Table 10. Results of the second experiment, colour difference <= 10%.

CNN         avg. train time / epoch (s.)    Accuracy
VGG-16      690 s.                          0.34
AlexNet     13 s.                           0.47
DenseNet    201 s.                          0.77
ResNet      725 s.                          0.95

With smaller colour differences, it becomes a challenge for supervised learning to classify images correctly. When the models are compared, AlexNet and ResNet kept learning throughout the training process, whereas VGG-16 and DenseNet stopped learning at a certain epoch.

According to the confusion matrices and further investigation of the per-class performance of the models, most of the models could classify the items correctly if no colour was present (class = 0) or when there was at least 10% intensity of some colour. However, when there was only 5% intensity, different classes were confused. Based on the results presented, the best accuracy was achieved with the ResNet architecture, which exhibited the best overall performance when both experiments are considered. The ResNet architecture can accurately classify the other colours, with the exception that low-intensity yellow and magenta are sometimes confused. In addition, in some cases ResNet cannot differentiate 5% yellow from paper-white, which is also quite challenging for the human eye.
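A minimal sketch of the transfer-learning step described above, assuming a PyTorch/torchvision setup, is shown below. The checkpoint file name and the number of classes are placeholders, and the snippet illustrates the idea rather than reproducing the training code of Article III.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 29                    # placeholder: one class per colour/intensity combination
DS1_CHECKPOINT = "resnet_ds1.pth"   # hypothetical checkpoint trained on the >= 20% dataset

# Build a ResNet and replace the classifier head for the colour classes.
model = models.resnet34(weights=None)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Transfer learning: initialise DS2 training from the DS1 weights instead of
# starting from random initialisation.
state = torch.load(DS1_CHECKPOINT, map_location="cpu")
model.load_state_dict(state)

# Fine-tune on DS2 (the <= 10% colour-difference dataset) with a small learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```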
All of the selected CNNs were effective in recognising colour differences, especially those with large intensity differences (>= 20%). When the colour difference decreased to only 5% or 10% (see Figure 34 further below) in some CMYK channels, most architectures struggled to classify images correctly.

The results demonstrate that different CNN architectures can be used for colour-difference recognition. Deeper CNNs can typically identify more complex features; however, they are more difficult to train. Wider networks can capture more fine-grained features, but they experience difficulties when handling high-level features (see Tan and Le (2019)). In this use case, it is interesting that the depth of the network and the number of parameters are directly correlated with the accuracy of the network when the colour difference is large.

The best architectures (i.e. ResNet and DenseNet) connect layers not only to the immediately preceding and following layers. They do this in different ways: DenseNet connects each layer to the other layers separately in a feed-forward manner (Zhu & Qiu, 2021), whereas ResNet allows data to flow from earlier layers directly to subsequent layers (W. Zhang, Li, & Ding, 2019). These connectivity types appear to significantly affect performance when the model attempts to classify items based on small differences. As with unsupervised learning, the most challenging colours for the neural networks were yellow and magenta.

5.4 Article IV - Optimised ResNet

Article IV follows the same structure as Article III: first, different versions of ResNet (ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-153) are evaluated together, and then the best of these architectures is selected for modification to achieve more accurate results. All architectures were tested with and without gradient centralisation. The results of the first experiment are shown in Table 11.

Table 11. Results of the first experiment (without K-Fold cross-validation) in Article IV, colour difference <= 10%.

Architecture    Centralised gradient used    avg. train time / epoch (s.)    Accuracy
ResNet-18       Yes                          323 s.                          0.934
ResNet-18       No                           316 s.                          0.952
ResNet-34       Yes                          236 s.                          0.969
ResNet-34       No                           233 s.                          0.966
ResNet-50       Yes                          785 s.                          0.958
ResNet-50       No                           781 s.                          0.901
ResNet-101      Yes                          1414 s.                         0.941
ResNet-101      No                           1350 s.                         0.948
ResNet-153      Yes                          2079 s.                         0.682
ResNet-153      No                           1094 s.                         0.935

The results demonstrate high general accuracy for all architectures. With GC, most architectures achieve higher accuracy, but with the very deep ResNet-101 and ResNet-153, GC weakens accuracy. This might be an indication that GC accelerates the vanishing gradient problem if the network is very deep. Based on the findings presented in Article IV, most architectures are prone to overfitting on the target dataset. One possible reason for overfitting is the small differences between images or the small dataset. ResNet-34 with gradient centralisation demonstrated the best accuracy, 96.9%.
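Gradient centralisation itself is a small modification of the optimisation step: before the weight update, each convolutional or fully connected weight gradient is centred to zero mean. The sketch below shows the standard formulation applied on top of an ordinary PyTorch training loop; it is an illustration of the technique, not the exact implementation used in Article IV.

```python
import torch

def centralize_gradients(model):
    """Gradient centralisation: subtract from each weight gradient its mean over
    all dimensions except the output-channel dimension. Applied only to tensors
    with more than one dimension (conv and fully connected weights)."""
    for p in model.parameters():
        if p.grad is not None and p.grad.dim() > 1:
            dims = tuple(range(1, p.grad.dim()))
            p.grad.data -= p.grad.data.mean(dim=dims, keepdim=True)

# Usage inside a standard training loop (sketch):
# loss.backward()
# centralize_gradients(model)   # centre the gradients before updating weights
# optimizer.step()
# optimizer.zero_grad()
```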
Article IV also presents the modified ResNet-34 architecture, where the modifications are based on the findings of past research. The modifications were first tested without K-Fold cross-validation, and later the final architecture's accuracy was tested with K-Fold cross-validation. Adding extra dropout layers can mitigate overfitting: the first dropout was added after the first convolutional layer, and this approach achieved an accuracy of 98.9%. Then, dropout was added after each residual group, which led to an accuracy of 94.34%. The results of these modifications demonstrate that dropout in the correct location in the architecture can improve model performance on the target dataset.

Another way to make the network more resistant to the degradation of the ratio of between-class distance to within-class distance is to include batch normalisation. In the proposed ResNet-34, batch normalisation was added at the end of the residual block; with this approach, the accuracy of the model was 1.0% lower than that of the standard ResNet-34 implementation.

Article IV also modified the residual block of ResNet by adding a max-pooling layer after each convolutional operation. This change led to an accuracy of 95.1%. Maximum pooling was also tested after both convolutional operations had been performed, achieving an accuracy of 99.7%. Then, as the final change, the average pooling layer was changed to global max pooling, which achieved an accuracy of 99.6%.

Some of the previously mentioned changes improved the accuracy of ResNet-34. However, when all or some of them were used together, the accuracy did not improve. This led to the conclusion that the best options for improving the accuracy of ResNet-34 are:
• changing average pooling to maximum pooling before the fully connected layer (Architecture A);
• using maximum pooling at the end of the residual block (Architecture B).

Architectures A and B (Figure 30) showed continuous progress during the training phase, and even with early stopping enabled, training ran for the full 30 epochs. Architecture A encountered some challenges, with low-intensity green (10% CY) and yellow (10% Y) images being confused with very low-intensity yellow images (5% Y). Architecture B, on the other hand, confused different green images (5% CY and 10% CY) with each other and low-intensity yellow with paper-white.

Figure 30. Final architectures A and B.

The final results of the K-Fold cross-validation, using 5 folds and 30 epochs, are shown in Table 12.

Table 12. Final architectures, K-Fold cross-validation accuracy.

Fold     Architecture A    ResNet-34    Architecture B
0        97.66             95.31        97.85
1        99.51             98.93        99.22
2        96.58             97.66        97.17
3        96.29             98.73        98.54
4        100.00            96.58        98.63
Final    98.00             97.44        98.28
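As a concrete illustration of the Architecture A idea, the sketch below shows how the final pooling change could be expressed on top of a standard torchvision ResNet-34. It is a minimal sketch under the assumption of a torchvision backbone; the number of classes is a placeholder, and Architecture B would additionally require editing the residual block implementation rather than a single attribute.

```python
import torch.nn as nn
from torchvision import models

def build_architecture_a(num_classes: int) -> nn.Module:
    """Sketch of 'Architecture A': a standard ResNet-34 in which the global
    average pooling before the fully connected layer is replaced by global
    max pooling."""
    model = models.resnet34(weights=None)
    model.avgpool = nn.AdaptiveMaxPool2d((1, 1))   # average pooling -> max pooling
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

net = build_architecture_a(num_classes=29)   # placeholder class count
```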
6 DISCUSSION

This research has focused on exploring the possibilities of different methods (colour difference algorithms, unsupervised learning, and supervised learning) in subtle colour difference recognition. The recognition of colours, or their differences, is a common challenge in computer vision research and is commonly needed in the agriculture, healthcare, civil engineering, and printing domains. Mostly, colour recognition is used to differentiate various colours, but as seen in Section 1.1, small differences in colour are important in the identification of plant disease, the quality control of print, and in medical imaging.

The use case that inspired this research was the recent development of functional inks. Functional inks have long been used to create colour-changing effects on products such as cans, mugs, and stickers. As these inks have developed and their price has come down, new use cases have been found, including consumer-related applications such as printed freshness and temperature indicators. In addition, industry uses functional inks for the detection of gases or humidity. Common to all these use cases is that functional inks indicate their state through a colour change.

The main research question that this research focused on was which of the presented methods is the most accurate when subtle colour differences are observed in printed sources. The printed sources used have various characteristics that make the recognition of colours a challenge; these include paper quality, the ink and printer used, and the ambient light conditions present. The use case presented in this research has focused on the consumer perspective, where it is difficult to control the usage environment or the devices used. Although modern mobile devices have capable cameras integrated into them, imaging quality varies greatly between devices. Past work has shown the potential of smartphone cameras in colour detection; for example, Fan et al. presented digital image colorimetry on a smartphone (Fan et al., 2021). As smartphones are not typically used in calibrated or controlled environments, solutions developed for them need the ability to adapt to various situations regarding camera accuracy, ambient light, print, and paper quality (Bagherinia & Manduchi, 2011; Kim, Song, & Kang, 2018). In addition, shadows and uneven light conditions play a role in colour recognition, although these are not crucial if the surface area of the colour is small. To overcome these challenges, many previous studies have used devices that make the environment more controllable (e.g. Rateni, Dario, and Cavallo (2017)). In this research, the focus was on using devices in everyday situations without external aids.

The general findings of the research show that the different methods are suitable for different purposes. To some extent, each of them is a powerful tool for colour recognition. This means that if the limits of a method are known, it can be deployed in solving suitable use cases.

6.1 Mathematical methods in colour comparison

The recognition or matching of colours through mathematical methods works well in digital and controlled environments, for example when the colour interpretations of two displays are compared, or in the colour calibration of displays and cameras. Mathematical methods are also an option in use cases where the capturing environment can be controlled. But in real-life scenarios, mathematical methods have more limited usability due to the uncontrolled environment and the variation between captured images.

The colour difference algorithm is a powerful tool, as it has evolved over the years and become more and more accurate for various purposes. Furthermore, its challenges, such as discontinuity and applicability, are well known (Luo et al., 2001). CIEDE2000 can be easily integrated into software solutions and does not require special libraries.

CIEDE2000 has become a standard in dental colour-matching use cases, where the colours of dental ceramics are observed in very controlled environments (Yerliyurt & Sarıkaya, 2022). CIEDE2000 is also a good choice if the colour difference is measured against how humans perceive colours. The smallest difference that humans can recognise is typically Delta-E 2.0 or greater, but the acceptable difference depends on the use case (Farrag, Bakry, & Aly, 2022; Štruncová et al., 2020; Yılmaz, Tutus, & Sönmez, 2022). In Article I, Delta-E 2.0 was used as the smallest difference perceivable by humans.

The most important part of using the CIEDE2000 algorithm is choosing the parametric values KL, KC, and KH.
These parameters adjust the weighting of the lightness (KL), chroma (KC), and hue (KH) components of the colour difference, respectively, based on specific viewing conditions or application needs. A common approach is to set all of these parameters to 1.0, although other values, such as KL = 2, KC = 1, KH = 1, and KL = 2.76, KC = 1.58, KH = 1, have been used in the past (e.g. del Mar Perez et al. (2011); He et al. (2022); Isohanni (2022); Mangine et al. (2005); Pereira et al. (2019)). Article I used the previously mentioned parametric values in its experiments. The results show that, for the recognition of colour differences, putting more weight on lightness and chroma provides better results. This seems reasonable, as the actual colour is not important, but rather the change in colour. When these parametric values are used, the change from white to any colour other than yellow can be recognised more accurately. Using larger KL and KC values has also been considered as a solution by del Mar Perez et al. (2011) and Pecho, Ghinea, Alessandretti, Pérez, and Della Bona (2016). In another study, Pérez et al. (2022) also showed that human vision is more sensitive to changes in lightness and chroma than to changes in hue.

When varying conditions are considered, the smallest colour difference that can be reliably recognised with the CIEDE2000 algorithm is a 40% change in the intensity of one or more CMYK channels (Figure 31). With this change between colours, the CIEDE2000 difference is consistently over the threshold. If conditions are more controlled in terms of printing, ambient light, and the devices used, or if the use case does not require 100% accuracy at all times, then smaller differences (20%) can be recognised.

Figure 31. 40% colour intensity change.

If the change is observed with the CIEDE2000 algorithm, the solution should be transferable to varying conditions and devices, provided the difference between colours is large enough. This is because the absolute difference between colours is measured. However, the result depends on how well the colour information can be captured from the source, whether the colour information is noise-free, and how well the pre-processing works.

The CIEDE2000 algorithm can quantify how "far apart" two colours are in perceptual terms. However, colour recognition requires identifying or naming a colour (e.g. "red" or "blue"); if CIEDE2000 is used for such purposes, reference colours are needed.

6.2 Clustering colours with unsupervised learning

Unsupervised clustering is a good tool when large datasets need to be processed. With unsupervised clustering, it is possible to gain insight into the data when labels or relationships between datapoints are unknown. In computer vision and image processing, unsupervised clustering has use cases such as data compression, image classification, and segmentation. As with mathematical methods, unsupervised clustering is quite easy to implement as part of software solutions, which makes it feasible for many use cases.

Figure 32. Example of the challenge of clustering colours.

When unsupervised clustering is used for colour difference recognition, the challenges can be seen in Figure 32, where the datapoints are scattered around the plot. Some dense regions (red circle = black, green circle = white, and yellow circle = CMYK(0.0, 0.6, 0.0, 0.0)) are highlighted in the figure.
The black and white colours lie at different ends of the L-axis, and the colour marked with the yellow circle is roughly in the middle of the L-axis, with a negative A value and a positive B value. As the differences between these clusters become smaller, the algorithm cannot be sure which cluster a datapoint belongs to. From the figure, it is also possible to observe the existence of outliers: in the illustrated case, outliers lie between the black and white colours, but also between white and yellow and between black and yellow.

The research done in Article II shows that multiple different clustering approaches outperform mathematical methods in subtle colour-difference recognition. The algorithms K-means, C-means, GMM, hierarchical clustering, and spectral clustering are all able to accurately recognise colours when the difference in the CMYK intensity of the colour is 20% or greater.

In Article II, unsupervised clustering was used to identify whether a colour differs from paper-white and what its actual colour is. For example, GMM was able to correctly identify the difference with 99.0% accuracy when the intensity of the colour was equal to or greater than 20%. When the difference between colours dropped to 10%, the popular K-means algorithm and spectral clustering with affinity = rbf showed the best performance. The smallest difference, only 5%, was too challenging for all of the algorithms used. The Delta-E difference between paper-white and a colour at 10% CMYK intensity ranges from 8.8 to 17.0 depending on the colour, which is still a clearly visible difference for human perception.

The process in which colours are first clustered with unsupervised learning methods and then the difference between cluster centres is calculated with the CIEDE2000 algorithm seems to work quite well. K-means has been used by other researchers in the past to classify colours or segment images (e.g. Abdulateef, Ahmed, and Salman (2020); Saifullah (2020); Trivedi, Shukla, and Pandey (2022); T. Wu, Gu, Shao, Zhou, and Li (2021)). Conservatively, as the accuracy of the methods (K-means, C-means, GMM, hierarchical and spectral clustering) is close to 100% in this range, unsupervised clustering can be considered usable when the difference in colour intensity is 20% or more (Figure 33).

Figure 33. 20% colour intensity change.

As the best of these methods, K-means is very dependent on the initialisation of the cluster centres. The standard K-means algorithm randomly selects the initial centroids, which can sometimes result in poor clustering results or slow convergence. The K-means++ initialisation method addresses this by carefully choosing the initial centroids and spreading them out as much as possible. K-means++ has outperformed standard K-means in colour-related applications (e.g. Biswas, Umbaugh, Marino, and Sackman (2021)). One option would be the manual initialisation of the cluster centres, as in this use case the black and white colour values are known. This could lead to better clustering results, as shown by Basar et al. (2020). As with the mathematical methods, more advanced pre-processing might also help if it can eliminate noise and outliers. Clustering algorithms that are resistant to outliers could also be experimented with, such as the slope difference distribution (SDD) approach of Z. Wang (2020) or Multi-View Clustering with Outlier Removal (MVCOR) by Chen, Wang, Hu, and Zheng (2020).
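The manual initialisation mentioned above is straightforward to express with scikit-learn: K-means accepts an explicit array of starting centroids. The sketch below is a minimal illustration in which the LAB reference values for black, paper-white, and the printed colour are hypothetical placeholders rather than measured values.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical LAB reference values for the known black and white regions of the
# printed tag, plus a rough initial guess for the unknown printed colour.
init_centres = np.array([
    [ 5.0,  0.0,  0.0],   # black reference
    [95.0,  0.0,  2.0],   # paper-white reference
    [60.0, 10.0, 30.0],   # initial guess for the printed colour
])

# Passing an explicit array as `init` makes K-means start from the known
# reference colours instead of random or K-means++ initialisation.
kmeans = KMeans(n_clusters=3, init=init_centres, n_init=1)
# labels = kmeans.fit_predict(lab_pixels)   # lab_pixels: (n_pixels, 3) CIELAB data
```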
With unsupervised learning, transferring the results to different contexts, such as different devices and printers, should be feasible. As the unsupervised clustering only groups colours and the final measurement is done with the CIEDE2000 algorithm, the result of the recognition depends mostly on how accurate the clustering is in the context in which it is used. If the clustering has reference or known values for the black and white colours, then calculating the difference to the third cluster provides an absolute colour difference value. As with the mathematical methods, the noise and the paper used might again weaken the transferability of the results.

6.3 Neural network based colour classification

Unsupervised learning methods have already demonstrated the ability to recognise subtle colour differences with reasonable accuracy (Isohanni, 2024b). However, due to their inherent limitations in capturing non-linear relationships and contextual cues, more accurate solutions have been sought through supervised learning methods (Isohanni, 2025). In this work, the supervised approach involved classifying images into predefined categories, where each image represents a colour deviation from the paper-white average and is annotated accordingly.

Some supervised models were able to classify colours and recognise subtle differences more effectively than their unsupervised counterparts. As shown in Figure 34, the subtle colour changes become barely perceptible when the CMYK intensity varies by only 5%. As colour saturation decreases, the texture and structure of the paper surface become more pronounced, introducing additional noise into the image. It has been shown, for example by De and Pedersen, that even a partial loss of colour information makes the classification task more complex for CNNs (De & Pedersen, 2021).

Figure 34. Examples of small colour differences.

The increase in image noise highlights the importance of appropriate preprocessing. Improper preprocessing techniques can inadvertently degrade colour information or amplify noise artifacts (Maharana, Mondal, & Nemade, 2022). These noise artifacts become more significant as colour differences become smaller.

Among the tested models, only ResNet achieved a classification accuracy exceeding 90% when recognising very small colour differences. The ResNet architecture has also been explored by De and Pedersen, who, by optimising its hyperparameters, were able to make ResNet more robust against the loss of colour information (De & Pedersen, 2021). ResNet architectures have also recently shown their robustness in recognising small differences, as seen for example in Chen and Luo (2023); Gao, Yi, Liu, and Tan (2025); S. Wang, Wang, Yang, Li, and Fan (2022).

The core innovation of the ResNet architecture lies in its use of residual connections, which mitigate the vanishing gradient problem, a common issue in deeper convolutional neural networks. These skip connections allow gradients to flow through identity mappings, which seems particularly advantageous in tasks where small colour differences must be preserved and propagated through the network. In Article III, ResNet-34 achieved the best baseline performance among standard CNN architectures for subtle colour recognition. The further tuning of the ResNet-34 architecture in Article IV proved that a custom architecture results in better accuracy.
The customised version, in which Gradient Centralization (GC) and max-pooling were used, resulted in the best accuracy even with very small colour differences (5%). This is a good result, as the Delta-E difference between these colours can be as low as 4.4. In this research, ResNet-34 was used, but as architectures evolve, other solutions might also become feasible. Gradient Centralization has previously been used successfully in medical imaging, for example by L. Zhang, Xia, Yang, Zhang, and Wang (2024) and Khatri and Kwon (2024). Max-pooling, on the other hand, can be used to identify small differences, as seen in Zheng et al. (2021) and Ashtiani et al. (2023).

Articles III and IV introduced a difference-image-based approach, where the input emphasises the chromatic deviation from a reference white point. This approach not only improves class separability but also enhances the method's adaptability to other use cases with uniform backgrounds and subtle chromatic shifts. The solution remains sensitive to variations in paper structure and imaging conditions, which are features that CNNs may learn unless explicitly regularised.

The application of convolutional neural networks in complex colour classification tasks has recently demonstrated promising results. For instance, Wei et al. investigated the performance of the CNN architectures VGG-16 and ResNet-34 in the classification of tea leaves, where colour was one of the distinguishing features despite the subtle differences in hue between samples (Wei et al., 2022). Both models achieved classification accuracies exceeding 90%, with VGG-16 outperforming ResNet-34 in overall performance. However, it is important to note that their classification approach relied not only on colour but also on additional morphological and textural features. Their findings also highlighted that incorporating all image channels contributed to improved model accuracy, underlining the value of comprehensive image data in CNN-based classification.

Similarly, Katkus, Maciulevičius, and Lipnickas (2023) explored the use of a CNN model for classifying amber gemstones based on subtle colour variations. Their custom CNN approach achieved high accuracy, demonstrating that CNNs are capable of distinguishing fine colour differences when appropriately trained and configured. The work of Katkus et al. aligns with the findings presented in this study, reinforcing the conclusion that CNNs are a powerful tool for colour classification. However, achieving high performance in such tasks typically requires tailored network architectures, careful preprocessing, and data augmentation strategies to mitigate the effects of noise and enhance generalisation.

One interesting option could be to use Multilayer Perceptrons (MLPs) instead of convolutional neural networks. MLPs, by design, require the input image to be flattened into a one-dimensional vector, which removes all spatial relationships between pixels (Guo et al., 2022). This can be a limitation in image-based tasks where spatial patterns such as edges, textures, or shapes carry meaningful information, especially when the images are not consistently aligned in terms of position, rotation, or scale within the training set, or between the training and test sets. If the images contain more than just colour values, this loss of spatial context may degrade the performance of MLPs. In the presented use case, however, MLPs can still serve as a useful baseline for comparison with CNNs and can help evaluate the significance of spatial information in the context of colour classification. In a lightweight experiment, six different MLPs were trained on both datasets using ReLU activation, the Adam optimiser, and 50 training iterations. High-resolution images lead to high-dimensional input vectors when flattened for MLPs, which significantly increases the number of model parameters and makes the network more prone to overfitting (Liu, Starzyk, & Zhu, 2008). Therefore, in this experiment, the images in DS1 and DS2 were scaled to 64x64.

Table 13. Results of the different MLPs.

MLP (layers)               Accuracy DS1    Accuracy DS2
(512, 256, 128, 64)        0.87            0.88
(512, 256, 128, 64, 32)    0.86            0.87
(256, 128, 64)             0.76            0.87
(256, 128, 64, 32)         0.90            0.88
(128, 64)                  0.72            0.85
(128, 64, 32)              0.80            0.86

The results of this comparison (Table 13) show that MLP networks almost reach the same level as the best CNNs (DenseNet, ResNet and VGG-16). The computational cost of MLPs is lower, so training can be done faster, and they are also easier to implement due to their simpler architecture. When larger colour differences are classified, MLPs can offer a good starting point.
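A minimal sketch of such a flattened-input MLP baseline, using scikit-learn, is shown below. The data arrays are random placeholders standing in for the resized 64x64 images, and the layer configuration mirrors one row of Table 13; the sketch is illustrative rather than the experiment's actual code.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: images are assumed to be resized to 64x64 RGB and flattened,
# which discards all spatial structure before they reach the MLP.
X = np.random.rand(500, 64 * 64 * 3)     # stand-in for the flattened image vectors
y = np.random.randint(0, 8, size=500)    # stand-in for the colour-class labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# One of the layer configurations listed in Table 13, trained with ReLU + Adam.
mlp = MLPClassifier(hidden_layer_sizes=(256, 128, 64, 32),
                    activation="relu", solver="adam", max_iter=50)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```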
6.4 Comparison of the methods

The approaches presented in Articles I-IV use different methods for colour recognition. Each of these methods has its own use cases and is suitable for colour difference recognition. The challenge of the use case presented in this dissertation comes from the small colour area, the subtle differences between colours, and the noise generated by the printing process, the paper used, and the imaging pipeline. In addition, since the use case was aimed at consumers, the usage environment cannot be controlled.

Based on the findings of the research, two main paths for integrating its results can be identified: a) an unsupervised path and b) a supervised path. If simpler and easier-to-implement solutions are sought, unsupervised clustering can be used to cluster colours into a specified number of clusters, after which the difference between the colours is calculated with the CIEDE2000 algorithm. With this approach, differences of 20% or more in CMYK intensity can be reliably recognised. The results of unsupervised learning will be better if reference values for white and black are available during clustering, and this solution can also be run directly on mobile devices without the need for data collection.

With supervised learning, it is possible to recognise very subtle colour differences (CMYK differences of 10% or even 5%), but the solution becomes more use-case specific. The same model might not work if different papers and printers are used, and supervised learning also requires a lot of data collection, especially if more generalised solutions are sought. This could mean that a process to continuously train the model would need to be developed.

When printed sources are used and the colour difference between paper-white and a colour is observed with machine learning, unsupervised methods are useful if the difference in one or more CMYK channels is equal to or greater than 20%. Supervised learning methods, CNNs and MLPs, can go lower and accurately recognise a 10% difference, or even 5% if the results do not always need to be highly accurate.
There are differences between individual colours: while the most challenging colour to identify is yellow for all of the methods, the other colours work roughly equally well in the presented use case.

In the tests conducted for the dissertation, both unsupervised and supervised methods were able to handle the recognition of colours that lie on the negative side of the A or B axis of the LAB colour space better than those on the positive side. This might mean that the imaging pipelines used give more weight to blue and green shades than to red and yellow shades.

6.5 Limitations and future research

Although consumer camera devices have developed over the years, the development has focused mainly on everyday photography, such as taking selfies and vacation photos. Some devices have special lenses for close-range photography, but these are not common. The presented research uses close-range photography to capture the colour data, and this approach is prone to noise that is produced in different phases of the imaging pipeline and that comes from different sources, such as the paper structure, which appears easily in close-range images. All of this has implications for the generalisation of the solution. Deviations related to camera devices can be overcome with image processing solutions that maintain colour difference information, such as smart blurring or similar techniques. But using different papers or printers can lead to situations where any generalisation of the solution is challenging. This dissertation has focused on using laser-printed sources with standard office paper. Different printing technologies, such as flexo, gravure and inkjet, result in different print qualities. In addition, the paper that is used has an impact on the colour appearance. These limitations restrict the use of the proposed approach to specific datasets, where the images come from the same printer and the same paper. However, these challenges can be overcome with larger datasets when supervised learning is used.

Both training and testing were conducted using images captured on the same hardware platform. While this ensures consistency in image acquisition, it also limits the assessment of the model's generalisation across different devices. The imaging scenes used are relatively small and the colour areas are consistent, so variations between camera devices may have a limited impact on the results. Still, future work could explicitly evaluate the robustness of the approach by testing it on data captured with different devices exhibiting varying imaging characteristics.

More research is needed, especially in the preprocessing phase of the images. More advanced methods could adapt the images so that the differences between environments, papers, and printers would have a less significant impact on the colour recognition, and thus make the solution more generic for various use cases. Also, this research focused only on small colour areas, so shadows and uneven light might influence how the proposed approach works with larger areas.

Since the proposed solution can be deployed as a backend service, with results transmitted to mobile terminals or user-facing applications, inference latency is not critical at this stage. Algorithms like CIEDE2000 can be run in real time on mobile terminals, as can some unsupervised and supervised methods. Future research may explore how to optimise the presented models for reduced inference time, particularly for deployment in low-power or real-time environments.
Machine learning is currently developing so rapidly that other architectures and approaches might become more suitable for solving the problem in the future.

7 CONCLUSIONS

In this research, colour classification and recognition were studied using various methods. The objective was to determine how colour, and particularly subtle colour differences, could be classified using colour difference algorithms and unsupervised and supervised methods. This objective was closely related to functional inks, which can serve as indicators if their colour or colour difference can be accurately identified.

The dissertation began with colour difference algorithms and demonstrated that such algorithms can identify colour differences in printed sources, but only if the colour difference is equal to or greater than 20% in one or more CMYK channels, where 0% denotes no printed colour and 100% full colour intensity. Although colour difference algorithms cannot be used for precise real-life use cases, they can indicate whether the colour difference exceeds a certain threshold.

Unsupervised methods can classify colours when the colour difference in some CMYK channel is equal to or less than 20% but greater than 10%. One of the best unsupervised algorithms is K-means with K-means++ initialisation, a hard clustering method that can achieve good accuracy even with a colour intensity difference as low as 10%, if 100% accuracy is not required. Other unsupervised methods, such as C-means, GMM, and hierarchical clustering, can classify the colour when the colour difference is over 10%.

In supervised learning, CNN architectures were shown to be more accurate than the unsupervised algorithms. Architectures like AlexNet, DenseNet, and ResNet were able to correctly classify the colour, with ResNet standing out as the most accurate. ResNet addresses the problem of vanishing gradients by introducing "skip connections" or "residual connections", allowing the network to bypass one or more layers by adding the input of a layer directly to the output of a subsequent layer, forming a residual block. These residual connections enable the model to retain critical features even as the network deepens.

The research modified the best CNN candidate, ResNet, and its ResNet-34 architecture to achieve more accurate classification results. When ResNet-34 was used with Gradient Centralisation and the last average pooling layer was changed to a max-pooling layer, the proposed architecture reached 98.28% accuracy on the very subtle colour differences dataset. This result was validated using K-Fold cross-validation, making it the best solution among all the experiments.

This dissertation shows that the classification of colour and the recognition of subtle colour differences are possible with colour difference algorithms and with unsupervised and supervised learning. Choosing the optimal method is use-case dependent. If the colour differences are small, CNNs are the best option. Unsupervised learning can be taken into use quite easily without a training process and can recognise differences when they are at least 20% in CMYK intensity. If the difference is larger, colour difference algorithms can be used even under varying conditions.

BIBLIOGRAPHY

Abdalla, A., Cen, H., El-manawy, A., & He, Y. (2019). Infield oilseed rape images segmentation via improved unsupervised learning models combined with supreme color features.
Computers and Electronics in Agriculture, 162, 1057–1068. Abdel-Hamid, O., Mohamed, A.-r., Jiang, H., Deng, L., Penn, G., & Yu, D. (2014). Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10), 1533-1545. Abdulateef, S. K., Ahmed, S. R. A., & Salman, M. D. (2020). A novel food image segmentation based on homogeneity test of K-means clustering. In IOP Conference Series: Materials Science and Engineering (Vol. 928, p. 032059). Adiwijaya, N. O., Romadhon, H. I., Putra, J. A., & Kuswanto, D. P. (2022). The quality of coffee bean classification system based on color by using k-nearest neigh- bor method. In Journal of Physics: Conference Series (Vol. 2157, p. 012034). Ali, G. N., Mikkilineni, A. K., Delp, E. J., Allebach, J. P., Chiang, P.-J., & Chiu, G. T. (2004). Application of principal components analysis and gaussian mixture models to printer identification. In NIP & Digital Fabrication Conference (Vol. 20, pp. 301–305). Al-Shakarji, N. M., Kassim, Y. M., & Palaniappan, K. (2017). Unsupervised learning method for plant and leaf segmentation. In 2017 IEEE applied imagery pattern recognition workshop (AIPR) (pp. 1–4). Amakdouf, H., Zouhri, A., El Mallahi, M., Tahiri, A., Chenouni, D., & Qjidaa, H. (2021). Artificial intelligent classification of biomedical color image using quaternion discrete radial tchebichef moments. Multimedia Tools and Applications, 80, 3173–3192. Anandhakrishnan, T., & Jaisakthi, S. (2022). Deep convolutional neural networks for image based tomato leaf disease detection. Sustainable Chemistry and Phar- macy, 30, 100793. Anderson, M., Motta, R., Chandrasekar, S., & Stokes, M. (1996). Proposal for a standard default color space for the internet—srgb. In Color and imaging confer- ence (Vol. 4, pp. 238–245). Angelico, R., Liccardo, D., Paoletti, M., Pietrobattista, A., Basso, M. S., Mosca, A., . . . others (2021). A novel mobile phone application for infant stool color recogni- Acta Wasaensia 91 tion: An easy and effective tool to identify acholic stools in newborns. Journal of Medical Screening, 28(3), 230–237. Anilkumar, KK and Manoj, VJ and Sagi, TM. (2018). Colour based image segmen- tation for automated detection of leukaemia: a comparison between CIELAB and CMYK colour spaces. In 2018 International Conference on Circuits and Systems in Digital Enterprise Technology (ICCSDET) (pp. 1–6). Apriyanti, D. H., Spreeuwers, L. J., Lucas, P. J., & Veldhuis, R. N. (2021). Au- tomated color detection in orchids using color labels and deep learning. PloS one, 16(10), e0259036. Arora, J., Khatter, K., & Tushir, M. (2019). Fuzzy c-means clustering strategies: A review of distance measures. Software Engineering: Proceedings of CSI 2015, 153–162. Arsenovic, M., Karanovic, M., Sladojevic, S., Anderla, A., & Stefanovic, D. (2019). Solving current limitations of deep learning based approaches for plant disease de- tection. Symmetry, 11(7), 939. Arunachalam, D., & Kumar, N. (2018). Benefit-based consumer segmentation and performance evaluation of clustering approaches: An evidence of data-driven decision-making. Expert Systems with Applications, 111, 11–34. Ashtiani, F., On, M. B., Sanchez-Jacome, D., Perez-Lopez, D., Yoo, S. B., & Blanco-Redondo, A. (2023). Photonic max-pooling for deep neural networks using a programmable photonic platform. In Optical fiber communication conference (pp. M1J–6). Atha, D. J., & Jahanshahi, M. R. (2018). 
Evaluation of deep learning approaches based on convolutional neural networks for corrosion detection. Structural Health Monitoring, 17(5), 1110-1128. Bagherinia, H., & Manduchi, R. (2011). A theory of color barcodes. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops) (p. 806-813). Balaji, V., Suganthi, S., Rajadevi, R., Kumar, V. K., Balaji, B. S., & Pandiyan, S. (2020). Skin disease detection and segmentation using dynamic graph cut algorithm and classification through naive bayes classifier. Measurement, 163, 107922. Barbe, D. F. (1975). Imaging devices using the charge-coupled concept. Proceed- ings of the IEEE, 63(1), 38–67. 92 Acta Wasaensia Barton, S., Alakkari, S., O’Dwyer, K., Ward, T., & Hennelly, B. (2021). Convolu- tion network with custom loss function for the denoising of low snr raman spectra. Sensors, 21(14), 4623. Basar, S., Ali, M., Ochoa-Ruiz, G., Zareei, M., Waheed, A., & Adnan, A. (2020). Unsupervised color image segmentation: A case of rgb histogram based k-means clustering initialization. Plos one, 15(10), e0240015. Bayer, B. E. (U.S. Patent 3971065A, Jul. 1976). Color imaging array. Beaugnon, A., & Chifflier, P. (2018). Machine learning for computer security detec- tion systems: practical feedback and solutions. Proceedings of the 2018 Intelligence Artificielle et Cybers´ ecurite´/Artificial Intelligence and Cybersecurity (C&ESAR), Rennes, France, 19–21. Bejani, M. M., & Ghatee, M. (2021). A systematic review on overfitting control in shallow and deep neural networks. Artificial Intelligence Review, 54(8), 6391– 6438. Belasco, R., Edwards, T., Munoz, A., Rayo, V., & Buono, M. J. (2020). The effect of hydration on urine color objectively evaluated in CIE L* a* b* color space. Frontiers in Nutrition, 7, 576974. Berns, R. S. (2019). Billmeyer and saltzman’s principles of color technology. John Wiley & Sons. Bigas, M., Cabruja, E., Forest, J., & Salvi, J. (2006). Review of cmos image sensors. Microelectronics journal, 37(5), 433–451. Bilgin, M., & Backhaus, J. (2017). Intelligent codes by controlled responsiveness to external stimuli. In Printing future days 2017 7th international scientific conference on print and media technology (pp. 85–90). Bilius, L. B., & Pentiuc, S. G. (2020). Unsupervised clustering for hyperspectral images. Symmetry, 12(2), 277. Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford university press. Bishop, C. M., & Nasrabadi, N. M. (2006). Pattern recognition and machine learning (Vol. 4) (No. 4). Springer. Biswas, H., Umbaugh, S. E., Marino, D., & Sackman, J. (2021). Comparison of K-means and K-means++ for image compression with thermographic images. In Acta Wasaensia 93 Thermosense: Thermal Infrared Applications XLIII (Vol. 11743, pp. 209–214). Bloss, R. (2009). Making better “eyes” for cameras, mobile (cell) phones and cars. Assembly Automation, 29(1), 14–18. Botalb, A., Moinuddin, M., Al-Saggaf, U., & Ali, S. S. (2018). Contrasting convo- lutional neural network (cnn) with multi-layer perceptron (mlp) for big data anal- ysis. In 2018 international conference on intelligent and advanced system (icias) (pp. 1–5). Boulent, J., Foucher, S., Th´ eau, J., & St-Charles, P.-L. (2019). Convolutional neural networks for the automatic identification of plant diseases. Frontiers in plant science, 10, 941. Brophy, E., Hennelly, B., De Vos, M., Boylan, G., & Ward, T. (2022). Improved electrode motion artefact denoising in ECG using convolutional neural networks and a custom loss function. 
IEEE Access, 10, 54891–54898. B¨ uy¨ ukarıkan, B., & ¨ Ulker, E. (2022). Using convolutional neural network models illumination estimation according to light colors. Optik, 271, 170058. Cabello, J., Bailey, A., Kitchen, I., Prydderch, M., Clark, A., Turchetta, R., & Wells, K. (2007). Digital autoradiography using room temperature ccd and cmos imaging technology. Physics in medicine & biology, 52(16), 4993. Cali´ nski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Com- munications in Statistics-theory and Methods, 3(1), 1–27. Caruana, R. (1997). Multitask learning. Machine learning, 28, 41–75. Cauchy, A., et al. (1847). Me´thode g´ en´ erale pour la re´solution des systemes d’´ equations simultan´ ees. Comp. Rend. Sci. Paris, 25(1847), 536–538. Chen, C., & Luo, D. (2023). Enhanced resnet network for food image security recognition. In Third international conference on optics and image processing (icoip 2023) (Vol. 12747, pp. 490–493). Chen, C., Wang, Y., Hu, W., & Zheng, Z. (2020). Robust multi-view k-means clustering with outlier removal. Knowledge-Based Systems, 210, 106518. Cho, J. D., Jeong, J., Kim, J. H., & Lee, H. (2020). Sound coding color to improve artwork appreciation by people with visual impairments. Electronics, 9(11), 1981. Craven, P., & Wahba, G. (1978). Smoothing noisy data with spline functions: 94 Acta Wasaensia estimating the correct degree of smoothing by the method of generalized cross- validation. Numerische mathematik, 31(4), 377–403. Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE transactions on pattern analysis and machine intelligence(2), 224–227. De, K., & Pedersen, M. (2021). Impact of colour on robustness of deep neural networks. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 21–30). de Brito Silva, G. V., & Flores, F. C. (2021). Rot corn grain classication by color and texture analysis. IEEE Latin America Transactions, 20(2), 208–214. Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for machine learning. Cambridge University Press. del Mar Perez, M., Ghinea, R., Herrera, L. J., Ionescu, A. M., Pomares, H., Pulgar, R., & Paravina, R. D. (2011). Dental ceramics: a ciede2000 acceptability thresholds for lightness, chroma and hue differences. Journal of Dentistry, 39, e37–e44. Di Gennaro, S. F., Toscano, P., Cinat, P., Berton, A., & Matese, A. (2019). A low-cost and unsupervised image recognition methodology for yield estimation in a vineyard. Frontiers in plant science, 10, 559. Doolittle, M. H., Doolittle, K. W., Winkelman, Z., & Weinberg, D. S. (1997, Jan- uary). Color images in telepathology: how many colors do we need? Human Pathology, 28(1), 36–41. Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12(7). Dunn, J. C. (1973). A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. Taylor & Francis. Emmert-Streib, F., & Dehmer, M. (2019). Evaluation of regression models: Model assessment, model selection and generalization error. Machine learning and knowl- edge extraction, 1(1), 521–551. Engilberge, M., Collins, E., & S¨ usstrunk, S. (2017). Color representation in deep neural networks. In 2017 IEEE International Conference on Image Processing (ICIP) (pp. 2786–2790). Ezugwu, A. E., Ikotun, A. M., Oyelade, O. O., Abualigah, L., Agushaka, J. O., Eke, C. I., & Akinyelu, A. A. (2022). 
A comprehensive survey of clustering algo- Acta Wasaensia 95 rithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Engineering Applications of Artificial Intelligence, 110, 104743. Fan, Y., Li, J., Guo, Y., Xie, L., & Zhang, G. (2021). Digital image colorimetry on smartphone for chemical analysis: A review. Measurement, 171, 108829. Farrag, K. M., Bakry, S. I., & Aly, Y. M. (2022). Effect of yellow anodization of titanium on the shade of lithium disilicate ceramic with different thicknesses. The Journal of Prosthetic Dentistry, 128(4), 793–e1. Fiesler, E., & Beale, R. (2020). Handbook of neural computation. CRC Press. Ford, A., & Roberts, A. (1998). Colour space conversions (Tech. Rep.). Westmin- ster University, London. Fowlkes, E. B., & Mallows, C. L. (1983). A method for comparing two hierarchical clusterings. Journal of the American statistical association, 78(383), 553–569. Fu, C., & Yang, J. (2021). Granular classification for imbalanced datasets: a minkowski distance-based method. Algorithms, 14(2), 54. Fuentes-Pen˜ailillo, F., Ortega-Farias, S., Rivera, M., Bardeen, M., & Moreno, M. (2018). Using clustering algorithms to segment UAV-based RGB images. In 2018 IEEE international conference on automation/XXIII congress of the Chilean asso- ciation of automatic control (ICA-ACCA) (pp. 1–5). Gao, X., Yi, J., Liu, L., & Tan, L. (2025). A generic image steganography recog- nition scheme with big data matching and an improved ResNet50 deep learning network. Electronics, 14(8), 1610. Genc¸tav, A., Aksoy, S., & ¨ Onder, S. (2012). Unsupervised segmentation and clas- sification of cervical cell images. Pattern recognition, 45(12), 4151–4168. Gere, A. (2023). Recommendations for validating hierarchical clustering in con- sumer sensory projects. Current Research in Food Science, 6, 100522. Ghinea, R., Herrera, L., Ionescu, A., Pomares, H., Pulgar, R., Paravina, R., et al. (2011). Dental ceramics: a ciede2000 acceptability thresholds for lightness, chroma and hue differences. Journal of Dentistry, 39, e37–44. Ghinea, R., Pe´rez, M. M., Herrera, L. J., Rivas, M. J., Yebra, A., & Paravina, R. D. (2010). Color difference thresholds in dental ceramics. Journal of dentistry, 38, e57–e64. 96 Acta Wasaensia Gligoric, N., Krco, S., Hakola, L., Vehmas, K., De, S., Moessner, K., . . . Van Kra- nenburg, R. (2019). Smarttags: Iot product passport for circular economy based on printed sensors and unique item-level identifiers. Sensors, 19(3), 586. Golan, E. H., Krissoff, B., Kuchler, F., Calvin, L., Nelson, K. E., & Price, G. K. (2004). Traceability in the us food supply: economic theory and industry studies. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press. Govender, P., & Sivakumar, V. (2020). Application of k-means and hierarchical clustering techniques for analysis of air pollution: A review (1980–2019). Atmo- spheric pollution research, 11(1), 40–56. Graphic technology and photography — colour characterisation of digital still cam- eras (dscs) (Vol. 2; Standard). (2012, November). Geneva, CH: International Orga- nization for Standardization. Graphic technology — colour and transparency of printing ink sets for four-colour printing (Vol. 3; Standard). (2017, August). Geneva, CH: International Organiza- tion for Standardization. Graphic technology — process control for the production of half-tone colour sepa- rations, proof and production prints (Vol. 3; Standard). (2013, December). 
Geneva, CH: International Organization for Standardization. Green, P., & MacDonald, L. (2011). Colour engineering: achieving device inde- pendent colour. John Wiley & Sons. Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., . . . others (2018). Recent advances in convolutional neural networks. Pattern recognition, 77, 354– 377. Gu, Q., Zhu, L., & Cai, Z. (2009). Evaluation measures of the classification per- formance of imbalanced data sets. In Computational Intelligence and Intelligent Systems: 4th International Symposium, ISICA 2009, Huangshi, China, October 23- 25, 2009. Proceedings 4 (pp. 461–471). Gunturk, B. K., Glotzbach, J., Altunbasak, Y., Schafer, R. W., & Mersereau, R. M. (2005). Demosaicking: color filter array interpolation. IEEE Signal processing magazine, 22(1), 44–54. Guo, J., Tang, Y., Han, K., Chen, X., Wu, H., Xu, C., . . . Wang, Y. (2022). Hire- mlp: Vision mlp via hierarchical rearrangement. In Proceedings of the ieee/cvf conference on computer vision and pattern recognition (pp. 826–836). Acta Wasaensia 97 Hamming, R. W. (1950). Error detecting and error correcting codes. The Bell system technical journal, 29(2), 147–160. Han, J., Kamber, M., & Pei, J. (2012a). 10 - cluster analysis: Basic con- cepts and methods. In J. Han, M. Kamber, & J. Pei (Eds.), Data mining (third edition) (Third Edition ed., p. 443-495). Boston: Morgan Kaufmann. https://doi.org/https://doi.org/10.1016/B978-0-12-381479-1.00010-1 Han, J., Kamber, M., & Pei, J. (2012b). 2 - getting to know your data. In J. Han, M. Kamber, & J. Pei (Eds.), Data Mining (Third Edition) (Third Edition ed., p. 39- 82). Boston: Morgan Kaufmann. https://doi.org/https://doi.org/10.1016/B978-0- 12-381479-1.00002-2 Hartigan, J. A., & Wong, M. A. (1979). Algorithm as 136: A k-means clustering algorithm. Journal of the royal statistical society. series c (applied statistics), 28(1), 100–108. Harvey, J. (2007, 11). Mechanical engineers’ handbook: Materials and mechanical design, volume 1, third edition. In (p. 1423 - 1436). John Wiley & Sons. Hastie, T., Tibshirani, R., Friedman, J. H., & Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction (Vol. 2). Springer. Haykin, S. (2009). Neural networks and learning machines, 3/e. Pearson Education India. He, R., Xiao, K., Pointer, M., Melgosa, M., & Bressler, Y. (2022). Optimizing parametric factors in cielab and ciede2000 color-difference formulas for 3d-printed spherical objects. Materials, 15(12), 4055. Hecht-Nielsen, R. (1992). Theory of the backpropagation neural network. In Neural networks for perception (pp. 65–93). Elsevier. Hensel, M., Scheiermann, M., Fahrer, J., & Durner, D. (2023). New insights into wine color analysis: A comparison of analytical methods to sensory perception for red and white varietal wines. Journal of Agricultural and Food Chemistry, 72(4), 2008–2017. Hinton, G. E., & Roweis, S. (2002). Stochastic neighbor embedding. Advances in neural information processing systems, 15. Hirsch, R. (2022). Light and lens: Thinking about photography in the digital age. Routledge. 98 Acta Wasaensia Hodge, V., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial intelligence review, 22, 85–126. H¨ oge, M., W¨ ohling, T., & Nowak, W. (2018). A primer for model selection: The decisive role of model complexity. Water Resources Research, 54(3), 1688–1715. Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of educational psychology, 24(6), 417. 
Hristev, R. (1998). The ANN book.
Huber, P. J. (1992). Robust estimation of a location parameter. In Breakthroughs in statistics: Methodology and distribution (pp. 492–518). Springer.
Isohanni, J. (2022). Use of functional ink in a smart tag for fast-moving consumer goods industry. Journal of Packaging Technology and Research, 6(3), 187–198.
Isohanni, J. (2023, March). Qr-code dataset, with colour embed inside. Zenodo. https://doi.org/10.5281/zenodo.7749912
Isohanni, J. (2024a, April). Qr-codes with colour embed inside. Zenodo. https://doi.org/10.5281/zenodo.11079897
Isohanni, J. (2024b). Recognising small colour changes with unsupervised learning, comparison of methods. Advances in Computational Intelligence, 4(2), 6.
Isohanni, J. (2025). Customised resnet architecture for subtle color classification. International Journal of Computers and Applications, 47(4), 341–355.
Jain, A. K. (2010). Data clustering: 50 years beyond k-means. Pattern recognition letters, 31(8), 651–666.
Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Prentice-Hall, Inc.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM computing surveys (CSUR), 31(3), 264–323.
James, G., Witten, D., Hastie, T., Tibshirani, R., & Taylor, J. (2023). Statistical learning. Springer.
Japkowicz, N., & Shah, M. (2011). Evaluating learning algorithms: a classification perspective. Cambridge University Press.
Jeon, Y., Yoo, J., Lee, J., & Yoon, S. (2017). Nc-link: A new linkage method for efficient hierarchical clustering of large-scale data. IEEE Access, 5, 5594–5608.
Jiang, H., Tian, Q., Farrell, J., & Wandell, B. A. (2017). Learning the image processing pipeline. IEEE Transactions on Image Processing, 26(10), 5032–5042.
Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32(3), 241–254.
Katkus, D., Maciulevičius, P., & Lipnickas, A. (2023). Amber gemstones colour classification by cnn. In 2023 IEEE 12th international conference on intelligent data acquisition and advanced computing systems: Technology and applications (IDAACS) (Vol. 1, pp. 531–536).
Keivani, M., Mazloum, J., Sedaghatfar, E., & Tavakoli, M. B. (2020). Automated analysis of leaf shape, texture, and color features for plant classification. Traitement du Signal, 37(1), 17–28.
Khan, A. R., Khan, S., Harouni, M., Abbasi, R., Iqbal, S., & Mehmood, Z. (2021). Brain tumor segmentation using k-means clustering and deep learning with synthetic data augmentation for classification. Microscopy Research and Technique, 84(7), 1389–1399.
Khatri, U., & Kwon, G.-R. (2024). Diagnosis of alzheimer's disease via optimized lightweight convolution-attention and structural mri. Computers in Biology and Medicine, 171, 108116.
Kim, M., Song, K., & Kang, M. (2018, 04). No-reference contrast measurement for color images based on visual stimulus. IEEE Access, PP, 1–1.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kreer, J. (1957). A question of terminology. IRE Transactions on Information Theory, 3(3), 208–208.
Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The annals of mathematical statistics, 22(1), 79–86.
Kumar, A., Singh, C., & Sachan, M. K. (2024). A novel cross correlation-based color texture descriptor for the classification of breast cancer histopathology images. Biomedical Signal Processing and Control, 93, 106157.
Lai, P., & Westland, S. (2020). Machine learning for colour palette extraction from fashion runway images. International Journal of Fashion Design, Technology and Education, 13(3), 334–340.
Lance, G. N., & Williams, W. T. (1967). A general theory of classificatory sorting strategies: 1. hierarchical systems. The computer journal, 9(4), 373–380.
LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., & Jackel, L. (1989). Handwritten digit recognition with a back-propagation network. Advances in neural information processing systems, 2.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
LeCun, Y., Touresky, D., Hinton, G., & Sejnowski, T. (1988). A theoretical framework for back-propagation. In Proceedings of the 1988 connectionist models summer school (Vol. 1, pp. 21–28).
Lee, C.-H., Lee, E.-J., Ahn, S.-C., & Ha, Y.-H. (2001). Color space conversion via gamut-based color samples of printer. Journal of Imaging Science and Technology, 45(5), 427–435.
Lee, M.-K., Golzarian, M. R., & Kim, I. (2021). A new color index for vegetation segmentation and classification. Precision Agriculture, 22, 179–204.
Limare, N., Lisani, J.-L., Morel, J.-M., Petro, A. B., & Sbert, C. (2011). Simplest color balance. Image Processing On Line, 1, 297–315.
Litwiller, D. (2001). Ccd vs. cmos. Photonics spectra, 35(1), 154–158.
Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J., & Wu, S. (2013). Understanding and enhancement of internal clustering validation measures. IEEE transactions on cybernetics, 43(3), 982–994.
Liu, Y., Starzyk, J. A., & Zhu, Z. (2008). Optimized approximation algorithm in neural networks without overfitting. IEEE transactions on neural networks, 19(6), 983–995.
López, A., Guzmán, G. A., & Di Sarli, A. R. (2016). Color stability in mortars and concretes. Part 1: Study on architectural mortars. Construction and Building Materials, 120, 617–622. https://doi.org/10.1016/j.conbuildmat.2016.05.133
Lukas, J., Fridrich, J., & Goljan, M. (2006). Digital camera identification from sensor pattern noise. IEEE Transactions on Information Forensics and Security, 1(2), 205–214.
Luo, M. R., Cui, G., & Rigg, B. (2001). The development of the CIE 2000 colour-difference formula: CIEDE2000. Color Research & Application, 26(5), 340–350.
Magnan, P. (2003). Detection of visible photons in ccd and cmos: A comparative view. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 504(1-3), 199–212.
Mahalanobis, P. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India, 2(1), 49–55.
Maharana, K., Mondal, S., & Nemade, B. (2022). A review: Data pre-processing and data augmentation techniques. Global Transitions Proceedings, 3(1), 91–99.
Maiti, A., Chatterjee, B., & Santosh, K. (2021). Skin cancer classification through quantized color features and generative adversarial network. International Journal of Ambient Computing and Intelligence (IJACI), 12(3), 75–97.
Mangin, P., & Silvy, J. (1997). Fundamental studies of linting: Understanding ink-press-paper interactions non-linearity. In Taga (pp. 884–905).
Mangine, H., Jakes, K., & Noel, C. (2005). A preliminary comparison of cie color differences to textile color acceptability using average observers. Color Research & Application, 30(4), 288–294.
Martínez-Domingo, M. Á., López-Baldomero, A. B., Tejada-Casado, M., Melgosa, M., & Collado-Montero, F. J. (2024). Colorimetric evaluation of a reintegration via spectral imaging—case study: Nasrid tiling panel from the alhambra of granada (spain). Sensors, 24(12), 3872.
Mimmack, G. M., Mason, S. J., & Galpin, J. S. (2001). Choice of distance matrices in cluster analysis: Defining regions. Journal of climate, 14(12), 2790–2797.
Minkowski, H. (1910). Geometrie der zahlen (Vol. 1). BG Teubner.
Mokrzycki, W. S., & Tatol, M. (2011, April). Colour difference delta-e - a survey. MG&V, 20(4), 383–411.
Moreira, G., Magalhães, S. A., Pinho, T., dos Santos, F. N., & Cunha, M. (2022). Benchmark of deep learning and a proposed hsv colour space models for the detection and classification of greenhouse tomato. Agronomy, 12(2), 356.
Morovic, J., & Luo, M. R. (2001). The fundamentals of gamut mapping: A survey. Journal of Imaging Science and Technology, 45(3), 283–290.
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 807–814).
Nakamura, J. (2017). Image sensors and signal processing for digital still cameras. CRC press.
Nalhiati, G., Borges, G. G., Sperança, M. A., & Pereira, F. M. V. (2023). Color classification for red alcohol vinegar to control the quality of the end-product. Food Analytical Methods, 16(7), 1283–1290.
Naveed, K., Ehsan, S., McDonald-Maier, K. D., & Ur Rehman, N. (2019). A multiscale denoising framework using detection theory with application to images from cmos/ccd sensors. Sensors, 19(1), 206.
Nguyen, C.-N., Vo, V.-T., & Ha, N. C. (2022). Developing a computer vision system for real-time color measurement–a case study with color characterization of roasted rice. Journal of Food Engineering, 316, 110821.
Ni, J., Yan, Z., & Jiang, J. (2022). Tonguecaps: An improved capsule network model for multi-classification of tongue color. Diagnostics, 12(3), 653.
Niu, S., Liu, Y., Wang, J., & Song, H. (2020). A decade survey of transfer learning (2010–2020). IEEE Transactions on Artificial Intelligence, 1(2), 151–166.
Nixon, M., Outlaw, F., & Leung, T. S. (2020). Accurate device-independent colorimetric measurements using smartphones. PLoS One, 15(3), e0230561.
Nokia n9 product page. (n.d.). https://www.hmd.com/en int/nokia-9-pureview?sku=11AOPLW1A08. (Accessed: 2024-05-05)
Nugroho, H. A., Goratama, R. D., & Frannita, E. L. (2021). Face recognition in four types of colour space: a performance analysis. In IOP Conference Series: Materials Science and Engineering (Vol. 1088, p. 012010).
Othman, N., Zain, M. Z. M., Ishak, I. S., Bakar, A. R. A., Ab Wahid, M., & Mohamad, M. (2020). A colour recognition device for the visually disabled people. Indonesian Journal of Electrical Engineering and Computer Science, 17(3), 1322–1329.
Pak, A., Reichel, S., & Burke, J. (2022). Machine-learning-inspired workflow for camera calibration. Sensors, 22(18), 6804.
Palus, H. (1998). Representations of colour images in different colour spaces. In The colour image processing handbook (pp. 67–90). Springer.
Pearson, K. (1901). Liii. on lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin philosophical magazine and journal of science, 2(11), 559–572.
Pecho, O. E., Ghinea, R., Alessandretti, R., Pérez, M. M., & Della Bona, A. (2016). Visual and instrumental shade matching using CIELAB and CIEDE2000 color difference formulas. Dental Materials, 32(1), 82–92. https://doi.org/10.1016/j.dental.2015.10.015
Pennebaker, W. B., & Mitchell, J. L. (1992). Jpeg: Still image data compression standard. Springer Science & Business Media.
Pereira, A., Carvalho, P., Coelho, G., & Côrte-Real, L. (2019). Efficient CIEDE2000-based color similarity decision for computer vision. IEEE Transactions on Circuits and Systems for Video Technology, 30(7), 2141–2154.
Pérez, M. M., Carrillo-Perez, F., Tejada-Casado, M., Ruiz-López, J., Benavides-Reyes, C., & Herrera, L. J. (2022). CIEDE2000 lightness, chroma and hue human gingiva thresholds. Journal of Dentistry, 124, 104213.
Plataniotis, K. N. (2001). Color image processing and applications. Measurement Science and Technology, 12(2), 222–222.
Polyak, B. T. (1964). Some methods of speeding up the convergence of iteration methods. Ussr computational mathematics and mathematical physics, 4(5), 1–17.
Pramoditha, R. (n.d.). Overview of a neural network's learning process. https://medium.com/data-science-365/overview-of-a-neural-networks-learning-process-61690a502fa. (Accessed: 2024-07-01)
Prechelt, L. (2002). Early stopping-but when? In Neural networks: Tricks of the trade (pp. 55–69). Springer.
Przybyło, J., & Jabłoński, M. (2019). Using deep convolutional neural network for oak acorn viability recognition based on color images of their sections. Computers and Electronics in Agriculture, 156, 490–499.
Rafiq, M., Bugmann, G., & Easterbrook, D. (2001). Neural network design for engineering applications. Computers & Structures, 79(17), 1541–1552. https://doi.org/10.1016/S0045-7949(01)00039-6
Ramanath, R., Snyder, W. E., Yoo, Y., & Drew, M. S. (2005). Color image processing pipeline. IEEE Signal processing magazine, 22(1), 34–43.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association, 66(336), 846–850.
Rao, K. R., & Yip, P. C. (2018). The transform and data compression handbook. CRC press.
Rateni, G., Dario, P., & Cavallo, F. (2017). Smartphone-based food diagnostic technologies: A review. Sensors, 17(6), 1453.
Rendón, E., Abundez, I. M., Gutierrez, C., Zagal, S. D., Arizmendi, A., Quiroz, E. M., & Arzate, H. E. (2011). A comparison of internal and external cluster validation indexes. In Proceedings of the 2011 American Conference, San Francisco, CA, USA (Vol. 29, pp. 1–10).
Rennie, J. D., & Srebro, N. (2005). Loss functions for preference levels: Regression with discrete ordered labels. In Proceedings of the IJCAI multidisciplinary workshop on advances in preference handling (Vol. 1).
Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics, 20, 53–65.
Rubinstein, R. Y., & Kroese, D. P. (2004). The cross-entropy method: a unified approach to combinatorial optimization, monte-carlo simulation, and machine learning (Vol. 133). Springer.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
Rundo, L., Beer, L., Ursprung, S., Martin-Gonzalez, P., Markowetz, F., Brenton, J. D., . . . Woitek, R. (2020). Tissue-specific and interpretable sub-segmentation of whole tumour burden on CT images by unsupervised fuzzy clustering. Computers in biology and medicine, 120, 103751.
Saifullah, S. (2020). Segmentation for embryonated egg images detection using the k-means algorithm in image processing. In 2020 Fifth International Conference on Informatics and Computing (ICIC) (pp. 1–7).
Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of research and development, 3(3), 210–229.
Samuel, A. L. (1967). Some studies in machine learning using the game of checkers. ii—recent progress. IBM Journal of research and development, 11(6), 601–617.
Schäfer, E., Heiskanen, J., Heikinheimo, V., & Pellikka, P. (2016). Mapping tree species diversity of a tropical montane forest by unsupervised clustering of airborne imaging spectroscopy data. Ecological indicators, 64, 49–58.
Schöberl, M., Senel, C., Fößel, S., Bloss, H., & Kaup, A. (2009). Non-linear dark current fixed pattern noise compensation for variable frame rate moving picture cameras. In 2009 17th European Signal Processing Conference (pp. 268–272).
Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge university press.
Shannon, C. E. (1948). A mathematical theory of communication. The Bell system technical journal, 27(3), 379–423.
Sharma, S. (1995). Applied multivariate techniques. John Wiley & Sons, Inc.
Shrivastava, V. K., & Pradhan, M. K. (2021). Rice plant disease classification using color features: a machine learning paradigm. Journal of Plant Pathology, 103(1), 17–26.
Sietsma, & Dow. (1988). Neural net pruning-why and how. In IEEE 1988 international conference on neural networks (pp. 325–333).
Sinaga, K. P., & Yang, M.-S. (2020). Unsupervised k-means clustering algorithm. IEEE access, 8, 80716–80727.
Skandarajah, A., Reber, C. D., Switz, N. A., & Fletcher, D. A. (2014). Quantitative imaging with a mobile phone microscope. PloS one, 9(5), e96906.
Sokal, R., & Michener, C. (1958). A statistical method for evaluating systematic relationships. University of Kansas.
Souchleris, K., Sidiropoulos, G. K., & Papakostas, G. A. (2023). Reinforcement learning in game industry—review, prospects and challenges. Applied Sciences, 13(4), 2443.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1), 1929–1958.
Steinbach, M., Ertöz, L., & Kumar, V. (2004). The challenges of clustering high dimensional data. In New directions in statistical physics: econophysics, bioinformatics, and pattern recognition (pp. 273–309). Springer.
Štruncová, M., Toma, S. H., Araki, K., Bresciani, E., Rodrigues, F. P., Medeiros, I. S., & Dutra-Correa, M. (2020). Silver nanoparticles added to a commercial adhesive primer: Colour change and resin colour stability with ageing. International Journal of Adhesion and Adhesives, 102, 102694.
Su, R., Guo, Y., Wu, C., Jin, Q., & Zeng, T. (2024). Kernel correlation–dissimilarity for Multiple Kernel k-Means clustering. Pattern Recognition, 150, 110307.
Su, X., Yue, X., Kong, M., Xie, Z., Yan, J., Ma, W., . . . Liu, M. (2023). Leaf color classification and expression analysis of photosynthesis-related genes in inbred lines of chinese cabbage displaying minor variations in dark-green leaves. Plants, 12(11), 2124.
Su, Z., Yang, J., Li, P., Jing, J., & Zhang, H. (2022). A precise method of color space conversion in the digital printing process based on pso-dbn. Textile Research Journal, 92(9-10), 1673–1681.
Sun, X., Panda, R., Feris, R., & Saenko, K. (2020). Adashare: Learning what to share for efficient deep multi-task learning. Advances in Neural Information Processing Systems, 33, 8728–8740.
Szeliski, R. (2022). Computer vision: algorithms and applications. Springer Nature.
Takase, T., Oyama, S., & Kurihara, M. (2018). Effective neural network training with adaptive learning rate based on training loss. Neural Networks, 101, 68–78.
Tan, M., & Le, Q. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105–6114).
Tang, K., Astola, J., & Neuvo, Y. (1994). Multichannel edge enhancement in color image processing. IEEE Transactions on circuits and systems for video technology, 4(5), 468–479.
Tejada-Casado, M., Pérez, M. M., Della Bona, A., Lübbe, H., Ghinea, R., & Herrera, L. J. (2024). Chroma-dependence of CIEDE2000 acceptability thresholds for dentistry. Journal of Esthetic and Restorative Dentistry, 36(3), 469–476.
Terensan, S., Salgadoe, A. S. A., Kottearachchi, N. S., & Weerasena, O. J. (2024). Proximally sensed rgb images and colour indices for distinguishing rice blast and brown spot diseases by k-means clustering: Towards a mobile application solution. Smart Agricultural Technology, 100532.
Thorndike, R. L. (1953). Who belongs in the family? Psychometrika, 18(4), 267–276.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1), 267–288.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411–423.
Tieleman, T., & Hinton, G. (2012). Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2), 26–31.
Tooms, M. S. (2015). Colour reproduction in electronic imaging systems: photography, television, cinematography. John Wiley & Sons.
Trivedi, V. K., Shukla, P. K., & Pandey, A. (2022). Automatic segmentation of plant leaves disease using min-max hue histogram and k-mean clustering. Multimedia Tools and Applications, 81(14), 20201–20228.
Ultsch, A., & Lötsch, J. (2022). Euclidean distance-optimized data transformation for cluster analysis in biomedical data (edotrans). BMC bioinformatics, 23(1), 233.
Valkenborg, D., Rousseau, A.-J., Geubbelmans, M., & Burzykowski, T. (2023). Unsupervised learning. American Journal of Orthodontics and Dentofacial Orthopedics, 163(6), 877–882.
Wang, D., Wang, X., Chen, Y., Wu, Y., & Zhang, X. (2023). Strawberry ripeness classification method in facility environment based on red color ratio of fruit rind. Computers and Electronics in Agriculture, 214, 108313.
Wang, H., Yu, W., You, J., Ma, R., Wang, W., & Li, B. (2021). A unified framework for anomaly detection of satellite images based on well-designed features and an artificial neural network. Remote Sensing, 13(8), 1506.
Wang, S., Wang, K., Yang, T., Li, Y., & Fan, D. (2022). Improved 3D-ResNet sign language recognition algorithm with enhanced hand features. Scientific Reports, 12(1), 17812.
Wang, X., & Zhang, D. (2010). An optimized tongue image color correction scheme. IEEE Transactions on information technology in biomedicine, 14(6), 1355–1364.
Wang, X., Zhang, J., Jiang, Y., Du, J., Miao, D., & Xu, C. (2024). Color difference of yarn-dyed fabrics woven from warp and weft yarns in different color depths. Pigment & Resin Technology, 53(1), 28–35.
Wang, Z. (2020). Robust segmentation of the colour image by fusing the SDD clustering results from different colour spaces. IET Image Processing, 14(13), 3273–3281.
Ward Jr, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301), 236–244.
Wei, K., Chen, B., Li, Z., Chen, D., Liu, G., Lin, H., & Zhang, B. (2022). Classification of tea leaves based on fluorescence imaging and convolutional neural networks. Sensors, 22(20), 7764.
Wilkes, T. C., McGonigle, A. J., Pering, T. D., Taggart, A. J., White, B. S., Bryant, R. G., & Willmott, J. R. (2016). Ultraviolet imaging with low cost smartphone sensors: development and application of a raspberry pi-based uv camera. Sensors, 16(10), 1649.
Wong, T.-T., & Yeh, P.-Y. (2019). Reliable accuracy estimates from k-fold cross validation. IEEE Transactions on Knowledge and Data Engineering, 32(8), 1586–1594.
Wu, D., & Sun, D.-W. (2013). Colour measurements by computer vision for food quality control–a review. Trends in food science & technology, 29(1), 5–20.
Wu, J., Chen, J., Xiong, H., & Xie, M. (2009). External validation measures for k-means clustering: A data distribution perspective. Expert Systems with Applications, 36(3), 6050–6061.
Wu, J., Cui, Y., Sun, X., Cao, G., Li, B., Ikeda, D. M., . . . Li, R. (2017). Unsupervised clustering of quantitative image phenotypes reveals breast cancer subtypes with distinct prognoses and molecular pathways. Clinical Cancer Research, 23(13), 3334–3342.
Wu, J., Zhang, B., Zhou, J., Xiong, Y., Gu, B., & Yang, X. (2019). Automatic recognition of ripening tomatoes by combining multi-feature fusion with a bi-layer classification strategy for harvesting robots. Sensors, 19(3).
Wu, T., Gu, X., Shao, J., Zhou, R., & Li, Z. (2021). Colour image segmentation based on a convex k-means approach. IET Image Processing, 15(8), 1596–1606.
Xu, B., Zhang, B., Kang, Y., Wang, Y., & Li, Q. (2012). Applicability of CIELAB/CIEDE2000 formula in visual color assessments of metal ceramic restorations. Journal of Dentistry, 40, e3–e9.
Xu, D., & Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of data science, 2, 165–193.
Yang, J., Shen, F., Wang, T., Luo, M., Li, N., & Que, S. (2021). Effect of smart phone cameras on color-based prediction of soil organic matter content. Geoderma, 402, 115365.
Yerliyurt, K., & Sarıkaya, I. (2022). Color stability of hybrid ceramics exposed to beverages in different combinations. BMC Oral Health, 22(1), 180.
Yılmaz, U., Tutus, A., & Sönmez, S. (2022). Effects of using recycled paper in inkjet printing system on colour difference. Pigment & Resin Technology, 51(3), 336–343.
Yong, H., Huang, J., Hua, X., & Zhang, L. (2020). Gradient centralization: A new optimization technique for deep neural networks. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16 (pp. 635–652).
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? Advances in neural information processing systems, 27.
Yulita, I. N., Amri, N. A., & Hidayat, A. (2023). Mobile application for tomato plant leaf disease detection using a dense convolutional network architecture. Computation, 11(2), 20.
Zeiler, M. D. (2012). Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701.
Zhang, L., Xia, R., Yang, B., Zhang, J., & Wang, J. (2024). MSFNet-2SE: A multi-scale fusion convolutional network for Alzheimer's disease classification on magnetic resonance images. International Journal of Imaging Systems and Technology, 34(4), e23112. https://doi.org/10.1002/ima.23112
Zhang, Q., Zhuo, L., Li, J., Zhang, J., Zhang, H., & Li, X. (2018). Vehicle color recognition using multiple-layer feature representations of lightweight convolutional neural network. Signal Processing, 147, 146–153.
Zhang, W., Li, X., & Ding, Q. (2019). Deep residual learning-based fault diagnosis method for rotating machinery. ISA transactions, 95, 295–305.
Zhang, X., Li, X., Feng, Y., & Liu, Z. (2015). The use of ROC and AUC in the validation of objective image fusion evaluation metrics. Signal processing, 115, 38–48.
Zhbanova, V. L. (2020). Evaluation and selection of colour spaces for digital systems. Light & Engineering, 28(6).
Zheng, Y., Iwana, B. K., Malik, M. I., Ahmed, S., Ohyama, W., & Uchida, S. (2021). Learning the micro deformations by max-pooling for offline signature verification. Pattern Recognition, 118, 108008.
Zhong, X., & Ban, H. (2022). Pre-trained network-based transfer learning: A small-sample machine learning approach to nuclear power plant classification problem. Annals of Nuclear Energy, 175, 109201.
Zhou, J., & Glotzbach, J. (2007). Image pipeline tuning for digital cameras. In 2007 IEEE International Symposium on Consumer Electronics (pp. 1–4).
Zhu, D., & Qiu, D. (2021). Residual dense network for medical magnetic resonance images super-resolution. Computer Methods and Programs in Biomedicine, 209, 106330.
Zou, F., Shen, L., Jie, Z., Zhang, W., & Liu, W. (2019). A sufficient condition for convergences of adam and rmsprop. In Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition (pp. 11127–11135).

Journal of Packaging Technology and Research (2022) 6:187–198
https://doi.org/10.1007/s41783-022-00137-4
RESEARCH ARTICLE
Use of Functional Ink in a Smart Tag for Fast-Moving Consumer Goods Industry
Jari Isohanni, jari.isohanni@abo.fi
Faculty of Science and Engineering, Åbo Akademi University, Tuomiokirkontori 3, 20500 Turku, Finland
Received: 10 October 2021 / Accepted: 21 June 2022 / Published online: 25 July 2022
© The Author(s) 2022

Abstract
In the fast-moving consumer goods (FMCG) industry, current labelling solutions have challenges to meet the track & trace requirements. Currently, FMCG items use mainly paper-based self-adhesive labels with traditional barcodes. These labels are low priced and technically easy to produce and deploy. The shift towards advanced solutions, like radio frequency identification (RFID) or near field communication (NFC) tags, still does not offer a good enough cost/benefit ratio. These advanced solutions have a high unit price or require costly changes in production lines. Still, the industry recognizes the possibilities of smart tags. Recent research has shown that functional inks can operate as cheap sensors. However, more research is needed to take functional inks into the operational FMCG environment. This paper presents one technical solution for an FMCG smart tag. The proposed smart tag builds on traditional QR-Code and Datamatrix markers, printed with standard inks. However, it also has functional ink embedded inside the marker as a sensor.
This research examines how the embedding impacts the overall decoding performance of the smart tag, and whether the CIEDE2000 color difference algorithm can calculate the state of the sensor. Three different parameter combinations, CIEDE2000(1, 1, 1), CIEDE2000(2, 1, 1), and CIEDE2000(2.76, 1.58, 1), are compared for accuracy. Experiments show that the proposed approach does not negatively affect the decoding performance, and that a color comparison can detect sensor states, especially when the functional ink has high enough color intensity. Between the different parameters, CIEDE2000(2.76, 1.58, 1) performed best, especially in the low-intensity test. However, future research needs to address absolute color value detection and the accuracy of color recognition, especially if the color has low intensity.

Keywords: Functional ink · Smart tag · Fast-moving consumer goods · Color difference · Machine vision

Introduction
From the day mobile devices have had cameras, they have been used to decode markers. The most popular markers today are QR-Code and Datamatrix. These markers can be found in many items, starting from fast-moving consumer goods (FMCG) and going all the way to industrial and medical usage. According to Tiwari (2016), the first one-dimensional (1D) marker was invented in the 1960s. Since then, markers have evolved from 1D barcodes to multi-color two-dimensional (2D) markers (Fig. 1) [56]. Marker development has followed related technologies, like cameras and machine vision. Currently, there are multiple marker types, some more general, some more use-case specific.

This research extends standard QR-Code and Datamatrix markers to smart tags. A smart tag's primary function is to provide information to its user regarding the status of an item [38]. Printed non-electronic smart tags function by communicating through the physical senses (human vision) or machine vision.

Functional inks are used to extend standard QR-Code and Datamatrix markers. These inks form the part of the marker which acts as an active sensor by reacting to environmental variables like humidity, temperature, or light. Functional inks are suitable for the normal label printing process of markers. The chemical structure of the functional ink determines how and on which environmental variables the ink reacts. In summary, this research looks into the following research questions:
• How to embed functional ink inside the QR-Code or Datamatrix?
• How does the embedding impact the decoding performance of QR-Code or Datamatrix?
• Can CIEDE2000 color difference algorithms accurately recognize the state of the functional ink?

The rest of the paper is organized as follows. Section "Related Work" goes through related work. Section "Materials and Methods" defines what smart tags are, how label printing works, introduces use-cases that relate to smart tags, and looks into the proposed technical approach. Section "Experiments" discusses details of the experiments, and in Section "Results and Discussion", the results of the experiments are discussed. Finally, Section "Conclusions" concludes the work, followed by future research topics in Section "Future Work".

Related Work
Relevant past research work has focused on exploring the possibilities of intelligent packaging in the FMCG industry.
As part of intelligent packaging, smart tags have been seen as a possibility to track items and their status. For example, this research has been done in the context of food items by Yam [60], Mohebi and Marquez [34], Realini and Marcos [43], Kalpana et al. [21], and Fuertes et al. [8].

Related research has developed new advanced markers. These markers have one difference from traditional markers: where traditional markers use black and white values to present binary values, advanced markers use a limited spectrum of colors. Parikh and Jancke proposed an approach to recognize multiple colors in 2D color barcodes [36]. Bulan and Sharma proposed using dot orientation, with colors, to encode data into a high-capacity barcode [4]. Grillo et al. proposed a new barcode that uses colors to include more data [13]. Bulan and Sharma proposed a barcode that uses the cyan, magenta, and yellow colorant separations to enable a high-capacity barcode [5]. Subpratatsavee and Kuacharoen proposed a new barcode that reduces the physical space needed for the barcode; in their research, they present the design and the implementation of a high-capacity two-dimensional barcode [50]. Taveerad and Vongpradhip proposed a color QR-Code to hold more data through a novel encoding concept [54]. Common to all these approaches is that, in theory, black & white marker data can only be translated into 1s and 0s when decoded, whereas colors, or shades, can be translated into as many values as can be identified [41]. Most of these past studies relate to using colors to encode more data into barcodes, or to changing the data that a barcode contains through colors. John and Raahemifar provided an overview of color barcodes. They also proposed a binarization and grouping algorithm to encode data to form a color barcode [20]. Wasule and Metkar took into account the intensity variation which occurs while decoding colored barcodes and proposed an approach that increases the capacity of barcodes beyond threefold; in their work, they used quantization of grey levels [59]. Ramalho et al. used the concept of super-modules to encode more data into the QR-Code but also to make the QR-Code more secure [42].

Functional inks, and mainly their properties, have been studied during recent years. Gao et al., in their research, focused on time-temperature indicators (TTIs) data modelling [9]. Li and Chen used two different functional inks in their proposal of a new Dynamic and Sensitive Barcode (D&S), which can react to environmental state changes [27]. Chen et al. used a vice-versa approach and made the paper reactive to environmental changes, and they also used smart devices to decode color information [6]. Kulčar et al. focused their research on the properties of functional inks regarding state changes [24].

One of the most relevant studies on functional inks and smart tags is by Gligoric et al., in which the authors defined possible approaches for smart tags [12]. Quite similar work was done by Hakola and Vehmas [14]. However, these studies mainly focus on use-cases and properties of functional inks, and only briefly define technical frameworks for ink-based smart tags.

This research uses the CIEDE2000 (KL, KC, KH) algorithm for color difference calculations. CIEDE2000 has been used in many studies where color difference is matched against human perception or in general color comparison. This work has been conducted, for example, by Taoa et al. [52],
where the authors used the algorithm to recognize leaf color differences, by Nguyen et al. [35] in the context of rice color recognition, and by López et al. [30] for mortar color differences. Usage of the KL = 1, KC = 1, and KH = 1 parametric values is the most common approach when using CIEDE2000 [11, 31, 37]. Past research has also achieved good results with the KL = 2, KC = 1, and KH = 1 parametric values [10, 32]. The third set of parametric values used in this research is KL = 2.76, KC = 1.58, and KH = 1, which has been used in the past, especially with digital and printed images [29].
Fig. 1 Evolution of barcodes, adapted from [56]

Past research approaches smart tags from different points of view and rarely considers the actual FMCG label printing process. Previous color difference research has focused on color difference calculations, especially in use-cases where the color comes from a physical object or the colors have a different hue. The contributions of this research work are:
(a) A technical approach to extend standard QR-Code and Datamatrix markers to smart tags, without significant impact on marker decoding performance.
(b) Verifying the suitability of embedding functional inks into QR-Code and Datamatrix markers.
(c) A comparison of the CIEDE2000 algorithm's parameters with printed colors.

Materials and Methods

Labels
Smart tags have been defined in past research in various ways. They might refer to radio-enabled electronic devices, e.g., [62], radio frequency identification (RFID), or near field communication (NFC) tags, e.g., [1]. Smart tags have also been considered as printed electronics, e.g., [45]. In this research, smart tags are standard two-dimensional markers with added intelligence from functional inks. Smart tags are printed on self-adhesive labels and finally glued to FMCG products. This paper does not take into account labelling/printing that is done directly on items, like laser marking.

According to Kirwan (2012), self-adhesive labels were invented in the mid-1930s. Back then, labels were used to apply prices and decoration to store items. Currently, the FMCG industry uses labels to add value to a product item during its life-cycle. The latest development of labels has made them smart, smart-active, or smart-intelligent (Fig. 2). Smart labels have various usages like tracking products, monitoring their temperature, and indicating food freshness [23]. Smart tags have the functionality to react to environmental changes and identify individual items, rather than presenting only static content. This categorizes smart tags as customized labels [14]. Smart tags defined in such a way are part of intelligent packaging, an emerging technology in the FMCG industry. Intelligent packaging uses smart tags as the communication function between the package and the user. Smart tags and their information help in the decision-making related to the item, to achieve added value like enhanced food quality, user experience, and safety [60].

Label Printing Process
Labels with smart tags are either (a) pre-printed in print houses or (b) printed just before they are applied (in-house) [49]. Depending on the chosen label printing facility, pre-print or in-house, multiple technologies can be used to print labels:
• Rotary and semi-rotary letterpress.
• Flexographic printing.
• UV-flexographic printing.
• Screen printing.
• Offset printing.
• Rotogravure printing.
• Thermal printing.
• Laser and inkjet printing [23].

All listed technologies are suitable for the pre-printing of labels. For labels printed in the production line, thermal, laser, and inkjet printing are plausible. These non-conventional technologies can also print customized information that varies per label. It is also possible to mix conventional and non-conventional ways of printing; this is called hybrid printing [23].
Fig. 2 Categories of labels [23]

Electronic/non-electronic labels can occupy as much as 50% of the package costs. However, in most low-priced products, like food, the price of the package should not exceed 10% of the total costs of the product [58].

Functional Inks
Functional inks can report exposure to environmental influences by switching between two states of optical properties. The state of the ink and its absolute value depend on the current properties of the used ink. Functional inks can ultimately appear either in an active (1) or non-active (0) binary state. The switch between states is driven by physical influences. For example, water/humidity (hydrochromism), temperature (thermochromism), and the intensity of light (photochromism) can change the state of the functional ink [15].

Depending on the environmental influence and the properties of the ink, the change between the two binary states occurs in a specified time. During this period, functional ink can also have values between 0 and 1. The optically visible change between states can either be a change between colors or a change in color intensity. Also, depending on the chemical compound of the ink, the change can be reversible or irreversible [3].

According to Zabala et al., the change in functional ink can represent accumulated exposure (total), or whether the ink has exceeded its activation point [61]. Therefore, functional inks can track continuous exposure during the whole life-cycle, or exposure to an environment lower/higher than the set threshold.

The functional inks in this research are compatible with the high-speed printing process (conventional, non-conventional, and hybrid), and, to meet labelling cost requirements, the price of the functional ink is close to the price range of standard color inks. The advantage of a smart tag with functional ink is that there are no electronics in it. This makes the manufacturing technically less complex than RFID/NFC-based smart tags. The advantage is also in recycling, as some regulations define RFID/NFC-based tags as electronics. Using functional ink raises the possibility of designing innovative products for markets not yet addressed by electronic tags [12]. Functional inks are suitable for most printing methods presented in the previous chapter, including flexographic, gravure, screen, and inkjet [28].

When smart tags are developed for the needs of the food industry, they should not pose a threat to items inside packages or when in direct contact with food items. There are three main ways in which food safety can be compromised by the ink used in labels/packages [46]:
• Migration: components of ink pass through the substrate.
• Invisible set-off: transfer of components of ink from the printed side to the food-facing side, for example when packages are stacked.
• Gas-phase transfer of ink components via the air in the packaging to food.

Some functional inks are suitable for short- or long-term direct food contact (DFC).
These inks meet regulations set by authorities like EuPIA (European Printing Ink Association) and the U.S. Food and Drug Administration (FDA). However, some functional inks can only be used in non-food contact. Past research has also developed options for this, like impermeable barriers/papers, which can be used when printing smart tags [61]. As this paper only focuses on proposing a general approach, research of these options is left for future work.

Use-Cases for Smart Tags in FMCG Industry
Fast-moving consumer goods are perishable packaged products purchased and consumed by all members of society. FMCG cover items like food and beverages; often, these goods are also considered as Consumer Packaged Goods (CPG). One of the main purposes of the product packaging is to communicate important information about the item(s) in the package to the user. The user gets this information through labels, which might relate to the ingredients/material of the product. However, the label can also contain other information, like nutritional facts. Packages, and their labels, aim to meet customer experience expectations. However, some of their functionality is also responsible for fostering a circular economy. Part of this is cradle-to-grave tracking and the actions that relate to it. The typical life-cycle (cradle-to-grave) of a product is illustrated in Fig. 3.
Fig. 3 Product life-cycle and its relation to intelligent packaging [60]

As shown in Fig. 3, intelligent packaging links to all phases of the product life-cycle. In this context, smart tags should deliver various information to the user. This information contains either data about the current status of the product or information that relates to the product's historical status. Depending on the life-cycle phase, the user might be interested in lifetime, temperature history, freshness status, package condition, authenticity, or something else [23].

Smart tag-related research has recognized suitable use-cases which directly or indirectly contribute to meeting the needs of consumers, producers, and manufacturers [7]. These use-cases are classified as follows:
• Manufacturing (raw material, product manufacturing, packaging).
• Distribution & consumption.
• Recycling [38].

The listed use-cases are related to item identification and providing information about the item to the current stakeholder or user. When the packaged item has a smart tag on its label, the item has intelligent packaging. An intelligent package is "a packaging that contains an external or internal indicator to provide information about aspects of the history of the package and/or the quality" [44]. Some use-cases are more suitable for intelligent packaging than others. As printed smart tags belong to the same category as RFID tags, most use-cases can be derived from RFID/NFC use-cases [12]. Printed smart tags lack a radio interface, so the decoder must have a line of sight and be at a relatively close distance. However, printed smart tags support a visual interface that low-cost RFID/NFC tags do not currently have. Generally, the costs of the smart tags are part of the packaging costs.

Technical Approach for the Smart Tag
Smart tags developed for the FMCG industry must be able to (a) avoid any false negatives (samples that seem safe but are dangerous) and (b) have as few false positives (samples that seem unsafe but are healthy) as possible.
If a functional marker produces a false negative, it indicates that the item is safe although the reality is the opposite. False positives are not as critical, but they weaken trust in the system: a false positive indicates that the item is unsafe even when it is not. The price of the marker and how the user can decode it are also meaningful factors [39].

Intelligent packaging use-cases depend on two features. The first is a unique identifier, which provides a way to trace an item's history and link life-cycle events of the item to databases or other services [38]. The second feature is a use-case-specific sensor printed with functional ink. Therefore, markers must be printed with two different inks, one that is static (unique identifier) and one that can react to environmental changes (sensor) [51].

The following three methods can be used when extending markers with functional ink [51]:
(a) Functional ink placed outside of the marker, a solution used by many existing smart tags. This increases the required physical print area and might make decoding of the functional ink state complex.
(b) Functional ink that changes the marker's data; this solution is more advanced and requires specific and matching functional inks and compatible marker contents.
(c) Functional ink placed inside of the marker, the solution presented in this paper.

In the proposed approach, functional ink is placed inside the marker without disturbing reserved cells, like the timing pattern. This does not exceed the marker's error correction capability. However, due to ink misplacement or dye growth, the approach uses a functional ink intensity satisfying the requirements for symbols set by the GS1 standard (bar code symbol print quality test specification – two-dimensional symbols) [16]. In these requirements, the contrast between the symbol's black and white cells should not be lower than 40%. Placing functional ink inside the marker also affects the modulation criterion, which compares local contrast to global contrast. As the functional ink area disturbs the global threshold, it might affect the probability of incorrect cell color identification. In such a case, the base marker's capability to recover from erroneous/destroyed data is used while decoding. When using QR-Code or Datamatrix as the base marker, around 30% of destroyed data can be recovered [17, 18].

The proposed approach has the following advantages: (a) it does not occupy more physical space than adding a standard QR-Code, (b) the QR-Code can work as a call-to-action, and (c) it is technically suitable for the current label printing processes. Technically, the approach might be suitable for use-cases with more than one functional ink (sensor), but more research is needed.

Figure 4 shows the proposed approach in a QR-Code.
Fig. 4 Approach to embed functional ink into QR-Code
In this approach, the sensor area is defined by the rectangle
Theoretically, the sensor could occupy more area within QR-Code. However, this might affect the decode performance of the QR-Code. Especially performance could be lower in challenging environments if more errors occur in other parts of the QR-Code [26]. Using only one row for the sensor might affect sensor data decoding, especially in small QR-Codes. This is because the QR-Code generator algorithm optimizes the ratio between white and black cells, but might sometimes generate rows where black/white cells have the majority [17]. Functional ink inside the marker is straightforward to locate. When decoding a standard QR-Code, an algorithm finds coordinates of each cell. As the algorithm knows the location of each cell, it can also define the coordinates of the sensor area. In the proposed approach, standard QR-Code scanner applications can decode contents of the QR-Code, but cannot decode sensor area information. Decoding of the sensor value happens with a special algorithm. This algorithm calculates the difference between white and black cells’ color values within the sensor area. Then, the difference is compared to the maximum difference, x = 8 y = n − 3 w = n − 9 h = 2, calculated in the same way but from the reference area. The color difference is calculated by CIEDE2000(KL, KC, KH) formula [31]. The performance of different parameters is compared later in the experiments section. Figure  5 shows the process chart of the algorithm. The sensor is considered active if its value is over 10% of the reference maximum difference. Figure 6 shows the reference area (REF) and sensor area (S1). These areas work as sources for the calculation of the white-black or sensor-black difference. The location of the reference area is two rows above the sensor area because of possible functional ink spreading or misplacement. Placing the reference area further from the sensor are might alter it to ambient lighting changes like shadows, different to ones that the sensor has. Even though in Fig. 4, it shows sensor in the QR-Code marker, the approach should apply to other markers with the error correction capability, like Datamatrix. As such, the addition of the sensor affects the general performance of the marker, discussed in the next chapter. Depending on how much data are recoverable from damaged markers (Datama- trix or QR-Code), height of the sensor area can vary in dif- ferent sized markers. In this research, sensor area is always two rows in height. Experiments This section focuses on the practical experiments with the proposed approach. The focus of the first experiment is to validate how sensor embedding impacts the decoding per- formance of the marker. The focus of the second experiment is to test the calculation of the sensor value in simulated use- cases. Finally, the third test experiments with three real-life markers using actual functional ink. Past research has shown that the size of the marker should be larger than 10×10mm and have at least 4 pixels (px) per cell [40, 53]. Based on this information, for the first two experiments, three different markers in size 20 mm × 20 mm were printed with an office laser-jet (Canon ImageRun- ner C5535i). Printing was done in 300dpi to standard office A4-paper (Canon Black Label Plus 80 g/m2). Fig. 5 Sensor data decoding process of the proposed approach Fig. 
Markers had different data to represent different item-level tracing options:
(a) EAN/UPC-13 data [47], 13 digits, illustrated as a 25×25 sized QR-Code
(b) EPC data, 26 digits [55], illustrated as a 29×29 sized QR-Code
(c) UUID data [57], 36 digits, illustrated as a 37×37 sized QR-Code.
The different sizes are used to experiment with how the proposed approach performs with a different number of pixels per cell.

The third experiment uses real-life markers generated with the proposed approach. These markers were printed with the following functional inks:
(1) LCR Hallcrest cold-activated thermochromic ink, clear to green CMYK (22,7,17,0) at 7 °C, 38 cm³/m², 1 printed layer [25]. Printed using flexographic printing.
(2) Datamatrix with SICPA heat-activated thermochromic ink, clear to magenta CMYK (0,100,0,0) at 26 °C, 38 cm³/m², 1 printed layer [48]. Printed using screen printing and flexographic printing.
(3) Datamatrix with Sunlase TDS sulfuric acid reactive ink, clear to black CMYK (0,0,0,100), 38 cm³/m², 1 printed layer. Printed using screen printing.
The third experiment also differs in that it uses Datamatrix codes, where the functional ink is embedded in the same way as in the QR-Codes used previously.

QR-Code Integrity Experiment
The first test experimented with two versions of each marker: (a) a version where the sensor area was transparent (sensor off), in Fig. 7 top row, and (b) a version where the sensor was totally black (sensor on), in Fig. 7 bottom row. The latter markers simulate a situation where the sensor destroys all white cells within the specified data region, and error correction recovers these data. This experiment looks to validate that adding a sensor does not prevent the decoding of the marker.

In this experiment, an iPhone 7 and an iPhone 11 Pro were used, with a standard camera application, to decode QR-Code markers from different distances. A controlled environment, a 25 cm × 20 cm × 40 cm white box, was used for the test. The test environment had a moving sledge for distance control and adjustable LED lights for ambient lighting control. The decoding distance was measured by moving the sledge to the furthest distance where decoding of the marker happens continuously. In this experiment, the decoding distance was tested at two ambient light levels, 400 lx (office lighting) (Table 1) and 15 lx (dark environment) (Table 2).

Sensor Value Calculation Experiment
The second experiment focuses on testing whether the algorithm defined in the previous section can identify sensor states when different sensor colors are used. The second experiment uses the same marker sizes and contents as the first experiment, but this time with different sensor colors and intensities. Four pure colors, magenta (C = 0.0, M = 1.0, Y = 0.0, K = 0.0), cyan (C = 1.0, M = 0.0, Y = 0.0, K = 0.0), yellow (C = 0.0, M = 0.0, Y = 1.0, K = 0.0) and black (C = 0.0, M = 0.0, Y = 0.0, K = 1.0), and three mixed colors, red (C = 1.0, M = 1.0, Y = 0.0, K = 0.0), green (C = 1.0, M = 0.0, Y = 1.0, K = 0.0) and purple (C = 1.0, M = 1.0, Y = 0.0, K = 0.0), were used. For each color, three marker sizes were printed at different intensity levels (100%, 80%, 60%, 40%, and 20%). Markers were captured to images with an iPhone 7, a Nokia TA-1032, and an iPhone 11 Pro from two distances, 10 cm and 20 cm, at the same ambient light levels as in experiment one. Aside from capturing markers in a controlled environment, images were captured in random contexts, including daily environments like fridges, office tables, and store shelves.
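The simulated sensor colours above are specified as CMYK values at several intensity levels. As a rough way to produce comparable digital test patches (not the colour-managed print pipeline used for the actual labels, which is not described in code here), a naive CMYK-to-sRGB conversion such as the sketch below can be used; the function name and the ICC-free conversion are illustrative assumptions.

```python
# Naive CMYK-to-sRGB conversion for generating simulated sensor patches.
# This ignores printer ICC profiles and paper colour, so it only
# approximates the printed test colours described above.
def cmyk_to_rgb(c, m, y, k, intensity=1.0):
    """Scale the CMYK channels by `intensity` (0-1) and convert to sRGB in [0, 1]."""
    c, m, y, k = (ch * intensity for ch in (c, m, y, k))
    r = (1.0 - c) * (1.0 - k)
    g = (1.0 - m) * (1.0 - k)
    b = (1.0 - y) * (1.0 - k)
    return r, g, b


# Example: magenta at the five intensity levels used in the experiment.
for level in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(level, tuple(round(v, 3) for v in cmyk_to_rgb(0.0, 1.0, 0.0, 0.0, level)))
```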
Aside from capturing markers in a controlled environment, images were captured in random contexts, including daily environments like fridges, office tables, and store shelves. Fig. 7 Markers for the first experiment Table 1 Experiment results of the 400lx experiment Size No sensor (cm) Sensor (cm) Differ- ence (%) 25 × 25 26.3 24.7 6 29 × 29 24.7 23.7 4 37 × 37 24.0 24.0 0 Table 2 Experiment results of the 15lx experiment Size No sensor (cm) Sensor(cm) Difference (%) 25 × 25 25.5 23.8 7% 29 × 29 24.0 23.0 4% 37 × 37 23.5 22.6 4% Acta Wasaensia 117 194 Journal of Packaging Technology and Research (2022) 6:187–198 1 3 This totalled 1260 images for the second experiment, and examples of photos used in this experiment are shown in Fig. 8. In the second experiment, QR-Codes were decoded with a custom Python application [19]. This application used algorithm shown in Fig. 5 to recognize sensor state. In the experiment, three CIEDE formulas with different para- metric values for KL, KC, and KH were used to calculate color difference. These formulas were CIEDE2000(1,1,1), CIEDE2000(2.76,1.58,1), and CIEDE2000(2,1,1). Tables 3, 4, and 5 show the results of the second experi- ment. Results are separated into three tables depending on parametric values. The first column shows sensor color, and the following columns show the lowest and highest value of the sensor. These results can be used to estimate the usability and accuracy of the algorithm and its parameters. The third and final test experimented with smart tags printed with real functional inks. These smart tags were based on Datamatrix and photographed in a real-life envi- ronment with iPhone 7, Nokia TA-1032, and iPhone 11 Pro. Fig. 8 Examples of photos used for the second experiment Table 3 Results from the second experiment, CIEDE2000 (1,1,1) algorithm CMYK color 20% 40% 60% 80% 100% No sensor [0,00 ... 0,05] (0,0,0,1) [0,16 ... 0,27] [0,31 ... 0,60] [0,58 ... 0,87] [0,82 ... 0,95] [0,60 ... 1,00] (1,0,0,0) [0,07 ... 0,17] [0,17 ... 0,36] [0,28 ... 0,36] [0,37 ... 0,49] [0,47 ... 0,61] (0,1,0,0) [0,06 ... 0,24] [0,16 ... 0,31] [0,29 ... 0,46] [0,29 ... 0,61] [0,43 ... 0,80] (0,0,1,0) [0,04 ... 0,12] [0,08 ... 0,17] [0,08 ... 0,16] [0,12 ... 0,14] [0,14 ... 0,23] (1,1,0,0) [0,15 ... 0,28] [0,41 ... 0,51] [0,64 ... 0,69] [0,81 ... 0,88] [0,75 ... 0,95] (0,1,1,0) [0,08 ... 0,18] [0,17 ... 0,31] [0,22 ... 0,47] [0,35 ... 0,61] [0,44 ... 0,61] (1,0,1,0) [0,10 ... 0,25] [0,22 ... 0,31] [0,31 ... 0,55] [0,39 ... 0,42] [0,53 ... 0,75] Table 4 Results from the second experiment, CIEDE2000 (2.76,1.58,1) algorithm CMYK color 20% 40% 60% 80% 100% No sensor [0,00 ... 0,05] (0,0,0,1) [0,16 ... 0,27] [0,31 ... 0,60] [0,58 ... 0,87] [0,82 ... 0,95] [0,60 ... 1,00] (1,0,0,0) [0,08 ... 0,16] [0,19 ... 0,36] [0,30 ... 0,37] [0,41 ... 0,50] [0,50 ... 0,63] (0,1,0,0) [0,10 ... 0,24] [0,23 ... 0,34] [0,36 ... 0,49] [0,36 ... 0,63] [0,48 ... 0,81] (0,0,1,0) [0,11 ... 0,15] [0,16 ... 0,21] [0,21 ... 0,39] [0,30 ... 0,33] [0,27 ... 0,32] (1,1,0,0) [0,19 ... 0,29] [0,28 ... 0,38] [0,65 ... 0,72] [0,82 ... 0,90] [0,75 ... 0,97] (0,1,1,0) [0,09 ... 0,18] [0,17 ... 0,31] [0,23 ... 0,47] [0,36 ... 0,61] [0,46 ... 0,63] (1,0,1,0) [0,14 ... 0,29] [0,28 ... 0,38] [0,36 ... 0,66] [0,45 ... 0,53] [0,56 ... 0,76] Table 5 Results from the second experiment, CIEDE2000 (2,1,1) algorithm CMYK color 20% 40% 60% 90% 100% No sensor [0,00 ... 0,05] (0,0,0,1) [0,16 ... 0,27] [0,31 ... 0,60] [0,58 ... 0,87] [0,82 ... 0,95] [0,60 ... 1,00] (1,0,0,0) [0,07 ... 0,16] [0,18 ... 
0,36] [0,30 ... 0,37] [0,39 ... 0,49] [0,48 ... 0,62] (0,1,0,0) [0,08 ... 0,24] [0,20 ... 0,32] [0,33 ... 0,48] [0,33 ... 0,62] [0,45 ... 0,80] (0,0,1,0) [0,08 ... 0,13] [0,13 ... 0,19] [0,15 ... 0,29] [0,23 ... 0,25] [0,22 ... 0,26] (1,1,0,0) [0,17 ... 0,29] [0,43 ... 0,52] [0,64 ... 0,70] [0,81 ... 0,89] [0,75 ... 0,95] (0,1,1,0) [0,09 ... 0,18] [0,18 ... 0,31] [0,23 ... 0,48] [0,36 ... 0,61] [0,46 ... 0,63] (1,0,1,0) [0,12 ... 0,27] [0,25 ... 0,34] [0,34 ... 0,60] [0,42 ... 0,47] [0,54 ... 0,75] 118 Acta Wasaensia 195 Journal of Packaging Technology and Research (2022) 6:187–198 1 3 First before exposing them to a reactive environment. After exposure, markers were photographed again. Fifty images were captured for each marker and sensor state, totalling 300 images. Examples of photos are seen in Fig. 9. The same Python application as in experiment 2 was used to decode markers. Table 6 shows the results of the third experiment. Columns show different CIEDE200 parameters, and values range from decoding. On the rows are different marker states Results and Discussion Based on the results from the first experiment, adding a sen- sor inside the marker reduces the decoding distance only slightly. Distance is decreased more in a dark environment, but the absolute change in the decoding distance is small. In everyday use, this does not have a significant impact. Mark- ers with more data are affected less as the sensor occupies a relatively smaller area from the data area. In the second experiment, simulated sensors with dif- ferent intensities [0%, 20%, 40%, 60%, 80%, and 100%] were decoded with the proposed approach. The second experiment included sensor value calculations with the CIEDE2000 algorithm. Three different parametric (KL, KC, and KH) values were tested, (1,1,1),(2.76,1.58,1) and (2,1,1). Results show that when sensor intensity is 40% or more, all parametric combinations can recognize the state of the sensor. When intensity is lower (20%), calculations are not working accurately, and incorrect states are some- times recognized. Especially, algorithm has challenges with the yellow ink. With higher intensities, the algorithm gives values over 0.1. This can be considered as a thresh- old value. When the sensor has a value over 0.1, color exists, and the sensor is ON. Values vary highly, especially with higher intensities. Therefore, the algorithm cannot be used for accurate color value identification. The third experiment showed that the proposed approach works well with actual functional inks. How- ever, the algorithm provides sometimes lower values for the sensor as in experiment two. This might be from the label background color differences or calibration of the ink intensity. In the third experiment, all CIEDE2000 parame- ter combinations provided values that ranged quite widely. In practice, this means that the approach is suitable for sensor state recognition. However, it fails when used to recognize the absolute value of the sensor. Past research has shown that it is possible to detect colors with smartphone cameras [6]. However, due to restrictions like camera accuracy, ambient light, print, and the paper quality, limited amount of color shades can be recognized [2, 22]. In the past, multiple approaches for color identi- fication in barcodes have been proposed, for example, by John and Raahemifar [20], Bagherinia and Manduchi [2]. In these approaches, different colors represent different data. 
The same restrictions apply when functional ink operates as a sensor, and only a limited number of shades can be recognized. The most reliable way to use functional ink is to use a high enough color intensity. It is challenging to recognize the actual shade of the functional ink; however, recognition of the functional ink's state, ON or OFF, is quite straightforward. The proposed approach cannot recognize shades accurately, as there is variation in the results. One possible approach to overcome this problem might be calibration. In traditional calibration, color charts are used for camera calibration [2]. However, in the case of smart tags, calibration has only a small amount of data at its disposal: the label's background color (paper color), the black cells of the marker, and possible color reference points. Reference points can be used together with calibration. With reference points, it is possible to compare the state of the functional ink against specified reference values [6]. However, if reference points are used, they must be printed with a color as close as possible to the functional ink's value in a specific state.

Actual color shade recognition with functional ink is a challenging topic, as the shade depends on multiple factors. Ink intensity is the most important factor, but the direction of the last state change also has an impact; in other words, the color is slightly different if it has been reached during a negative or a positive change. The change between states also proceeds at different speeds in different environments and colors. In some cases, when changing from color to transparent, some color can remain [24].

Fig. 9 Examples of photos used for the third experiment

Table 6 Results from the third experiment
Marker | (1,1,1) | (2.76,1.58,1) | (2,1,1)
Sensor off | [0,01 ... 0,03] | [0,01 ... 0,03] | [0,01 ... 0,03]
100% black | [0,65 ... 0,72] | [0,65 ... 0,72] | [0,65 ... 0,72]
100% magenta | [0,36 ... 0,39] | [0,52 ... 0,54] | [0,45 ... 0,46]
40% green | [0,12 ... 0,31] | [0,15 ... 0,31] | [0,13 ... 0,31]

Conclusions
To support data-driven applications in the FMCG sector, more data need to be collected. Without this data collection, it is not possible to implement optimal decision-making processes. Currently, the obstacle is not cloud services or data processing, but how we collect data from low-cost items and track & trace them during their life cycle. One approach is to integrate low-cost smart tags with functional ink into items. The functional ink is able to react to environmental changes and work as a sensor. Such markers can be implemented in various ways, and this paper presented a low-cost way to embed functional ink inside a standard marker.

With this approach, the marker has two features: a unique identifier and a sensor. The unique identifier is used to track the item, and the sensor can react to environmental changes. Different color and ink combinations can react to different environmental variables like temperature, time, and humidity. Information from the sensor can be decoded by observing the color changes. One approach to recognize the color and state of the sensor is to use a simple color comparison. The CIEDE2000 algorithm with (2.76,1.58,1) parametric values fits this purpose, and it works reliably when the sensor's color intensity is 40% or higher. At lower intensities, the proposed approach sometimes provides incorrect results.
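To make the decision rule above concrete, the following is a minimal, illustrative sketch of deriving the ON/OFF state from the mean colors of the sensor and reference areas. It is not the published decoder [19]; the helper name is hypothetical, scikit-image is assumed to be available, and it is assumed that the tabulated 0–1 sensor values correspond to the CIEDE2000 difference divided by 100.

```python
import numpy as np
from skimage import color

def sensor_state(sensor_rgb_mean, reference_rgb_mean,
                 kL=2.76, kC=1.58, kH=1.0, threshold=0.1):
    """Hedged sketch: decide ON/OFF from two mean RGB colors (values in 0..1).

    The CIEDE2000 difference is divided by 100 (an assumption about how the
    tabulated 0..1 values were obtained) and compared against the 0.1
    threshold discussed above.
    """
    # Convert the 1x1 "images" to CIELAB before computing the difference.
    lab_sensor = color.rgb2lab(np.asarray(sensor_rgb_mean, float).reshape(1, 1, 3))
    lab_ref = color.rgb2lab(np.asarray(reference_rgb_mean, float).reshape(1, 1, 3))
    delta_e = color.deltaE_ciede2000(lab_ref, lab_sensor, kL=kL, kC=kC, kH=kH)[0, 0]
    value = delta_e / 100.0
    return ("ON" if value > threshold else "OFF"), value

# Example: a clearly magenta sensor area against paper white.
print(sensor_state((0.9, 0.2, 0.6), (0.95, 0.95, 0.95)))
```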
Yet, more work is needed in the field of color recognition. If color can be recognized accurately, it would allow more use-cases around functional inks and make functional inks usable for use in challenging environments. Future Work More research is needed on the computer vision side, espe- cially on how to decode sensor color and its value better accuracy. Also wrapping and challenging conditions might impact proposed approach. Some options, which might achieve better results, could be advanced mathematical calculations, usage of reference colors, or using calibration methods. Usage of machine learning (ML) or artificial (AI) intelligence could also be one option. However, this might make the solution more complex, unless ML/AI solution can be run in a decoding device, rather than in the cloud. When ML/AI is considered, it could be possible to build a rich database containing information about data to color conversion or artificial intelligence models can be trained and used [33]. Funding Open access funding provided by Abo Akademi University (ABO). This work was supported by Finnish Cultural Foundation’s Central Ostrobothnia Regional Fund (Grant Number 25211242). Declarations Competing interests The authors have not disclosed any competing interests. Open Access This article is licensed under a Creative Commons Attri- bution 4.0 International License, which permits use, sharing, adapta- tion, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. References 1. Abad E, Palacio F, Nuin M, De Zarate AG, Juarros A, Gómez JM, Marco S (2009) Rfid smart tag for traceability and cold chain monitoring of foods: demonstration in an intercontinental fresh fish logistic chain. J Food Eng 93(4):394–399 2. Bagherinia H, Manduchi R (2011) A theory of color barcodes. In: 2011 IEEE International Conference on Computer Vision Work- shops (ICCV Workshops), pp 806–813. https://doi.org/10.1109/ ICCVW.2011.6130335 3. Bilgin M, Backhaus J (2017) Intelligent codes by controlled responsiveness to external stimuli. In: Printing future days 2017 7th international scientific conference on print and media technol- ogy, pp 85–90 4. Bulan O, Monga V, Sharma G (2009) High capacity color bar- codes using dot orientation and color separability. In: Delp EJ III, Dittmann J, Memon ND, Wong PW (eds) Media forensics and security, vol 7254. International Society for Optics and Photonics, SPIE, Bellingham, pp 397–403 5. Bulan O, Sharma G (2011) High capacity color barcodes: per channel data encoding via orientation modulation in elliptical dot arrays. IEEE Trans Image Process 20(5):1337–1350 6. Chen Y, Fu G, Zilberman Y, Ruan W, Ameri SK, Zhang YS, Miller E, Sonkusale SR (2017) Low cost smart phone diagnos- tics for food using paper-based colorimetric sensor arrays. Food Control 82:227–232 7. 
Farizal, Abhirama P (2019) Feasibility analysis of implementing logistic information system with internet of things technology on a fmcg company. In: 2019 IEEE 6th international conference on engineering technologies and applied sciences (ICETAS), pp 1–6. https://doi.org/10.1109/ICETAS48360.2019.9117563 8. Fuertes G, Soto I, Carrasco R, Vargas M, Sabattin J, Lagos C (2016) Intelligent packaging systems: sensors and nanosensors to monitor food quality and safety. J Sens 2016:4046061 9. Gao T, Tian Y, Zhu Z, Sun D (2020) Modelling, responses and applications of time-temperature indicators (TTIs) in monitor- ing fresh food quality. Trends Food Sci Technol 99:311–322 10. Ghinea R, Herrera L, Ionescu A, Pomares H, Pulgar R, Para- vina R et al (2011) Dental ceramics: a ciede2000 acceptability thresholds for lightness, chroma and hue differences. J Dent 39:e37-44 11. Ghinea R, Pérez MM, Herrera LJ, Rivas MJ, Yebra A, Paravina RD (2010) Color difference thresholds in dental ceramics. J Dent 38:e57–e64 12. Gligoric N, Krco S, Hakola L, Vehmas K, De S, Moessner K, Jansson K, Polenz I, van Kranenburg R (2019) Smarttags: Iot product passport for circular economy based on printed sensors and unique item-level identifiers. Sensors 19(3):586 13. Grillo A, Lentini A, Querini M, Italiano GF (2010) High capac- ity colored two dimensional codes. In: Proceedings of the 120 Acta Wasaensia 197 Journal of Packaging Technology and Research (2022) 6:187–198 1 3 international multiconference on computer science and informa- tion technology, pp 709–716. https://doi.org/10.1109/IMCSIT. 2010.5679869 14. Hakola L, Vehmas K (2018) Functional ink formulation for indi- vidualized smart tags. In: NIP & digital fabrication conference, number 1. society for imaging science and technology, pp 211– 214. https://doi.org/10.2352/ISSN.2169-4451.2018.34.211 15. Harvey J (2007) Mechanical engineers’ handbook: materials and mechanical design, volume 1, vol 1. Wiley, Hoboken, pp 1423–1436 16. International Organization for Standardization (2011) Automatic identification and data capture techniques—bar code symbol print quality test specification - two-dimensional symbols. Standard, International Organization for Standardization, Geneva 17. International Organization for Standardization (2015) Automatic identification and data capture techniques—qr-code bar code symbology specification. Standard, International Organization for Standardization, Geneva 18. International Organization for Standardization (2018) Information technology—international symbology specification—data matrix. Technical report 19. Isohanni J (2021) Qr-code sensor decoder. https://github.com/jarii sohanni/QR-code-sensor-decoder. Accessed 5 Aug 2021 20. John RA, Raahemifar K (2015) Designing a 2d color barcode. In 2015 IEEE 28th Canadian Conference on electrical and com- puter engineering (CCECE), pp 297–301. https://doi.org/10.1109/ CCECE.2015.7129203 21. Kalpana S, Priyadarshini SR, Leena MM, Moses JA, Anandhara- makrishnan C (2019) Intelligent packaging: trends and applica- tions in food systems. Trends Food Sci Technol 93:145–157 22. Kim M, Song K, Kang M (2018) No-reference contrast measure- ment for color images based on visual stimulus. IEEE Access 6:1 23. Kirwan M (2012) Handbook of paper and paperboard packaging technology, 2nd edn. Wiley, Hoboken 24. Kulčar R, Friškovec M, Knešaurek N, Sušin B, Klanjšek Gunde M (2009) Colour changes of uv-curable thermochromic inks. 
In: Proceedings of the 36th international research conference of iari- gai advanced in printing and media technology, pp 429–434 25. Hallcrest LCR (2021) Thermochromic free flowing powder techni- cal data. Whitepaper, LCR Hallcrest 26. Li L, Qiu J, Lu J, Chang C (2016) An aesthetic qr-code solution based on error correction mechanism. In Journal of Systems and Software. 116:85–94. https://doi.org/10.1016/j.jss.2015.07.009   27. Li Z, Chen W (2014) D&S Barcode: A Dynamic and Sensitive Barcode for Intelligent Environment Monitoring. In: 2014 Interna- tional Conference on Intelligent Environments, pp 47–51. https:// doi.org/10.1109/IE.2014.14 28. Lindqvist U, Eiroma K, Hakola L, Jussila S, Kaljunen T, Moilanen P, Rusko E, Siivonen T, Välkkynen P (2008) Technical innova- tions and business from printed functionality. Number 2436 in VTT Tiedotteita—Meddelanden—Research Notes. VTT Techni- cal Research Centre of Finland, Finland. Project code: 4312 29. Liu H, Huang M, Liu Y, Wu B, Xu Y (2012) Color difference evaluation and calculation for digital and printed images. In: NIP & Digital Fabrication Conference 2012(1):140–143 30. López A, Guzmán GA, Di Sarli AR (2016) Color stability in mor- tars and concretes. Part 1: study on architectural mortars. Constr Build Mater 120:617–622 31. Luo MR, Cui G, Rigg B (2001) The development of the cie 2000 colour-difference formula: Ciede 2000. Color Res Appl 26(5):340–350 32. Mangine H, Jakes K, Noel C (2005) A preliminary comparison of cie color differences to textile color acceptability using average observers. Color Res Appl 30(4):288–294 33. Mercan ÖB, Kılıç V, Şen M (2021) Machine learning-based color- imetric determination of glucose in artificial saliva with different reagents using a smartphone coupled ˜PAD. Sens Actuators B Chem 329:129037 34. Mohebi E, Marquez L (2015) Intelligent packaging in meat industry: an overview of existing solutions. J Food Sci Technol 52(7):3947–3964 35. Nguyen C, Vo V, Cong Ha N (2022) Developing a computer vision system for real-time color measurement—A case study with color characterization of roasted rice. J Food Eng 316:110821 36. Parikh D, Jancke G (2008) Localization and segmentation of a 2d high capacity color barcode. In: 2008 IEEE workshop on applica- tions of computer vision, pp 1–6. https://doi.org/10.1109/WACV. 2008.4544033 37. Pecho OE, Ghinea R, Alessandretti R, Pérez M, Bona AD (2016) Visual and instrumental shade matching using CIELAB and CIEDE2000 color difference formulas. Dent Mater 32(1):82–92 38. Plimmer J (2013) Augmenting and securing the consumer brand experience through smart and intelligent packaging for food, bev- erages and other fast-moving consumer goods, In: Trends in pack- aging of food, beverages and other fast-moving consumer goods (FMCG), pp 35–57. https://doi.org/10.1533/9780857098979.35 39. Poyatos-Racionero E, Ros-Lis JV, Vivancos J, Martínez-Máñez R (2018) Recent advances on intelligent packaging as tools to reduce food waste. J Clean Prod 172:3398–3409 40. Qian J, Du X, Zhang B, Fan B, Yang X (2017) Optimization of QR-Code readability in movement state using response surface methodology for implementing continuous chain traceability. Comput Electron Agric 139:56–64 41. Querini M, Italiano G (2014) Reliability and data density in high capacity color barcodes. Comput Sci Inf Syst 11:1595–1615 42. Ramalho J, Correia S, Fu L, Dias L, Adão P, Mateus P, Ferreira R, André PS (2020) Super modules-based active qr-codes for smart trackability and iot: a responsive-banknotes case study. 
npj Flex Electron 4(1):1–9 43. Realini CE, Marcos B (2014) Active and intelligent packaging systems for a modern society. Meat Sci 98(3):404–419 44. Robertson G (2016) Food packaging: principles and practice, 3rd edn. CRC Press, Boca Roton 45. Salmerón JF, Rivadeneyra A, Agudo-Acemel M, Capitán-Vallvey LF, Banqueri J, Carvajal MA, Palma AJ (2014) Printed single- chip uhf passive radio frequency identification tags with sensing capability. Sens Actuators A Phys 220:281–289 46. Schmid P, Welle F (2020) Chemical migration from beverage packaging materials—a review. Beverages 6(2):37–55 47. Schmidt L, Mitton N, Simplot-Ryl D (2009) Towards unified tag data translation for the internet of things. In: 2009 1st Interna- tional Conference on wireless communication, vehicular technol- ogy, information theory and aerospace electronic systems tech- nology, pp 332–335. https://doi.org/10.1109/WIRELESSVITAE. 2009.5172469 48. SICPA Securink Corp (2016) Thermochromic inks. Whitepaper, SICPA Securink Corp 49. Stanislav B, Igor M, Kristijan G (2015) Packaging printing today. In: Faculty of Graphic Arts, University of Zagreb, Croatia Packaging Printing Today, acta graphical, vol 26(4). pp 27–33, SSN: 1848-3828 50. Subpratatsavee P, Kuacharoen P (2012) An implementation of a high capacity 2D barcode. In: Papasratorn B, Charoenkitkarn N, Lavangnananda K, Chutimaskul W, Vanijja V (eds) Advances in information technology. Springer Berlin Heidelberg, Berlin, pp 159–169 51. TagItSmart (2017) Initial enablers for smarttags. Technical report 52. Tao M, Ma X, Huang X, Liu C, Deng R, Liang K, Qi L (2020) Smartphone-based detection of leaf color levels in rice plants. Comput Electron Agric 173:105431 53. Tarjan L, Šenk I, Tegeltija S, Stankovski S, Ostojic S (2014) A readability analysis for QR-Code application in a traceability sys- tem. Comput Electron Agric 109:1–11 Acta Wasaensia 121 198 Journal of Packaging Technology and Research (2022) 6:187–198 1 3 54. Taveerad N, Vongpradhip S (2015) Development of color qr-code for increasing capacity. In: 2015 11th International Conference on signal-image technology internet-based systems (SITIS), pp 645–648. https://doi.org/10.1109/SITIS.2015.42 55. Thiesse F, Michahelles F (2006) An overview of epc technology. Sens Rev 26(2):101–105 56. Tiwari S (2016) An Introduction to QR-Code Technology. 2016 International Conference on Information Technology (ICIT) 322(10):39–44 57. Triebel D, Reichert W, Bosert S, Feulner M, Okach DO, Slimani A, Rambold G (2018) A generic workflow for effective sampling of environmental vouchers with UUID assignment and image pro- cessing. Database J Biol Databases Curation 2018 58. Đurđević S, Novaković D, Zeljković Ž, Avramović D (2016) Using augmented reality technology for controlling state of smart packaging products. Preliminary Report, pp 427–437 59. Wasule S, Metkar S (2017) Improvement in two-dimensional bar- code. Sādhanā 42(7):1025–1035 60. Yam KL (2012) 8—Intelligent packaging to enhance food safety and quality. In: Yam KL, Lee DS (eds) Emerging food packag- ing technologies. Woodhead Publishing Series in Food Science, Technology and Nutrition. Woodhead Publishing, Sawston, pp 137–152 61. Zabala S, Castán J, Martínez C (2015) Development of a time- temperature indicator (TTI) label by rotary printing technologies. Food Control 50:57–64 62. Zhu D, Beeby SP, Tudor MJ, Harris NR (2011) A credit card sized self powered smart sensor node. 
Sens Actuators A Phys 169(2):317–325 Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. 122 Acta Wasaensia Advances in Computational Intelligence (2024) 4:6 https://doi.org/10.1007/s43674-024-00073-7 ORIG INAL ART ICLE Recognising small colour changes with unsupervised learning, comparison of methods Jari Isohanni1 Received: 11 April 2023 / Revised: 16 March 2024 / Accepted: 19 March 2024 / Published online: 16 April 2024 © The Author(s) 2024 Abstract Colour differentiation is crucial in machine learning and computer vision. It is often used when identifying items and objects based on distinct colours. While common colours like blue, red, green, and yellow are easily distinguishable, some applications require recognising subtle colour variations. Such demands arise in sectors like agriculture, printing, healthcare, and packaging. This research employs prevalent unsupervised learning techniques to detect printed colours on paper, focusing on CMYK ink (saturation) levels necessary for recognition against a white background. The aim is to assess whether unsupervised clustering can identify colours within QR-Codes. One use-case for this research is usage of functional inks, ones that change colour based on environmental factors. Within QR-Codes they serve as low-cost IoT sensors. Results of this research indicate that K-means, C-means, Gaussian Mixture Model (GMM), Hierarchical clustering, and Spectral clustering perform well in recognising colour differences when CMYK saturation is 20% or higher in at least one channel. K-means stands out when saturation drops below 10%, although its accuracy diminishes significantly, especially for yellow or magenta channels. A saturation of at least 10% in one CMYK channel is needed for reliable colour detection using unsupervised learning. To handle ink densities below 5%, further research or alternative unsupervised methods may be necessary. Keyword Machine vision, Colour difference, Printed colours, Unsupervised learning 1 Introduction The human visual system (HVS) can separate colours, even under challenging ambient light conditions. A healthy human eye can recognise approximately 100 colour shades in each of its three different types of cone cells. In total HSV can roughly recognise around one million different colours Hurl- bert and Ling (2012). Computer vision (CV) research develops solutions that are as accurate as or more accurate than HSV. Colour imag- ing research, as part of computer vision research, focuses on colour inspection, sorting, detection, and matching. This research addresses colour inspection/matching, particularly in colour differentiation. All colours we see are combinations of hue, saturation, and brightness values. In digital imaging, light reflected from an object is captured as a digital representation using a digital Jari Isohanni x2603813@student.uwasa.fi ; jari.isohanni@gmail.com 1 Digital Economy, University of Vaasa, Wolffintie 34, 65200 Vaasa, Finland camera. The entire process is complex and involves electron- ics, signal processing, and algorithms. This process is unique to each device and depends on the imaging conditions. There- fore, the resulting digital images always vary slightly, more in non-controlled environments and less in controlled environ- ments. The same colour can appear as many different digital representations, or the same digital presentation can appear as two separate colours. 
Computer vision research has identified many solutions for colour differentiation. These approaches are based on mathematical algorithms or artificial intelligence (AI). Math- ematical algorithms work well in conditions where colours are clearly different, their location is known, and the data is high quality (Isohanni 2022). Artificial intelligence (super- vised and unsupervised) is primarily used to recognise objects of different colours and patterns. However, the current solutions face challenges when deal- ing with small colour differences in unknown locations. Previous studies have used unsupervised learning to seg- ment/cluster colours. Although the reported performance of algorithms has improved, past research has found out that unsupervised colour segmentation doesn’t work in all use- 123 Acta Wasaensia 123 6 Page 2 of 13 Advances in Computational Intelligence (2024) 4 :6 cases (Xu et al. 2018). Improving the quality of the clustering process or finding the best clustering method can have an impact for example on healthcare (Vishnuvarthanan et al. 2016), smart city (Mao and Li 2019) and agriculture (Abdalla et al. 2019) applications. Challenges unsupervised methods face are a) some algorithms require the setting of clusters before running the algorithm, and b) some algorithms are sensitive to the initial cluster centre guess and might stick in the local optima during the process (Abdalla et al. 2019). In this study, the recognition of small colour differences with unsupervised learning was investigated by using printed inks. One direct use-case of this research is the colour recognition of functional inks. The development of novel printing methods and inks has enabled the labelling and packaging industry to create innovative labelling methods. Some of these innovations use functional inks. Functional inks change their colour depending on environmental values and it is important to detect this change reliably (Isohanni 2022). This research focuses on recognising colour differ- ences using unsupervised clustering methods and compares different methods, their accuracy, and running time. Contri- butions of the research are: • Comparison of unsupervised learning methods in colour difference recognition • Approach to detect small colour changes in printed colours with unsupervised learning This research is structured as follows: Sect. 2. contains rel- evant previous research done in the past. Section 3. defines the methods and materials used in this study. The results of this study are presented in Sect. 4. Finally, Sect. 5. discusses conclusions and future research needs. 2 Related work Colour recognition has its role in object detection, object recognition, image segmentation and many other applica- tions. Most research done around colour recognition focuses on high-level use cases, for example, colour recognition is used in animal/plant recognition (Koubaroulis et al. 2002; Jhawar 2016), in dental applications (Bretzner et al. 2002; Bar-Haim et al. 2009; Riri et al. 2016; Kang and Ji 2010), in face / skin recognition (Yang et al. 2010), in robotics (Rabie 2017; Bazeille et al. 2012) and in intelligent traffic (Gao et al. 2006; Gong et al. 2010; De la Escalera et al. 2003; Zhu and Liu 2006). However there many other use-cases where colour recognition is useful. Colour recognition can be done by mathematical algo- rithms, which is the most dominant approach, but during the last decade, artificial intelligence has been applied success- fully in many use-cases. 
In the artificial intelligence context, both supervised and unsupervised learning have been proved as possible approaches. Unsupervised learning is usually used when the clustering of colours is done, or when domi- nant colours are looked at from the source image (Du et al. 2004; Kuo et al. 2005; Bo et al. 2013; Basar et al. 2020). Supervised learning has been found more suitable in higher level use-cases like object colour recognition (Zhang et al. 2019; Aarathi and Abraham 2017; Feng et al. 2019). As seen from Table 1, most relevant past research around unsupervised learning has focused on agriculture and health- care use-case. Banic et al. used unsupervised colour clustering for image colour calibration, they researched a custom cluster- ing approach and finally achieved results where the median angular error was almost always below 2° (Banic and Lon- caric 2018). Gerke and Xiao studied the usage of two classification strategies a supervised method (Random Trees) and unsu- pervised approach. They also used graph-cuts for energy optimization. Their results achieved 97.74% accuracy in the context of recognition of urban objects. however, methods had challenges with shadows (Gerke and Xiao 2014). Dresp-Langley and Wandeto used in their research quanti- zation error from Self-Organising Map (SOM). With using of the quantization error their purpose was to recognise increase of amounts of red or green pixels (Dresp and Wandeto 2020). Results of their research were good but only colour amounts where used, not intensity. Yavuz and Köse achieved good results in colour clus- tering in the blood vessel extraction use-case. Even with small colour differences. Authors used combination of K- means and Fuzzy C-means. They also used postprocessing to remove falsely segmented isolated regions (Yavuz and Köse 2017). Abdallaa et al. used subsequent combination of various unsupervised learning methods (GMM, SOM, FCM and K- Means). They proposed methods to overcome illumination and weather condition challenges in segmentation of infield oilseed rape images. They achieved segmentation accuracy of 96% even in challenging conditions (Abdalla et al. 2019). Basar et al. developed a novel approach to overcome chal- lenges in the initialisation of the clustering algorithm. Their research focuses on the challenge of defining number of clus- ters and the initial central points of clusters. Their results improved segmentation quality and reduced the classifica- tion error (Basar et al. 2020). Wang et al. achieved good results, even with small colour contrast changes, when they used first-order colour moments, second-order colour moments, and colour histogram peaks. Their objective was to extract feature vectors from the image. And to realise data dimension reduction. Use-case was to classify solid wood panels with K-means Wang et al. (2021). 
Table 1 Related past research
Study title | Authors | Colour spaces | Method
Unsupervised learning for colour constancy (Banic and Loncaric 2018) | Banic, Koscevic, Loncaric | RGB | Colour Tiger
Fusion of airborne laser scanning point clouds and images for supervised and unsupervised scene classification (Gerke and Xiao 2014) | Gerke, Xiao | RGB | Markov random field formulation
Unsupervised classification of cell imaging data using the quantization error in a self-organising map (Dresp and Wandeto 2020) | Dresp-Langley, Wandeto | RGB | Self-organising map
Blood vessel extraction in colour retinal fundus images with enhancement filtering and unsupervised classification (Yavuz and Köse 2017) | Yavuz, Köse | RGB | K-means, Fuzzy C-means
Infield oilseed rape images segmentation via improved unsupervised learning models combined with supreme colour features (Abdalla et al. 2019) | Abdalla, Cen, El-manawy, He | Multiple | Gaussian mixture model (GMM), self-organising map (SOM), fuzzy c-means (FCM), and k-means algorithms
Unsupervised colour image segmentation: a case of RGB histogram based K-means clustering initialisation (Basar et al. 2020) | Basar, Ali, Ochoa-Ruiz, Zareei, Waheed, Adnan | RGB | K-Means
Colour classification and texture recognition system of solid wood panels (Wang et al. 2021) | Wang, Zhuang, Liu, Ding, Tang | RGB, HSV, LAB | K-means

Related work shows that unsupervised learning can be used to classify colours. However, there are not many studies that have focused on the recognition of small colour differences, especially in printed colours. Some studies have also clearly pointed out that unsupervised learning approaches have challenges when the foreground and background objects have only a slight colour difference.

3 Materials and methods
The dataset analysed in this research is available in the Zenodo repository (Isohanni 2023). This study used 25 different modified QR-Codes as the original dataset. The QR-Codes had three colour zones in them: black, white, and colour (Fig. 1). The black zone (1) was printed with pure black (100K / CMYK(1.0,1.0,1.0,1.0)), the white zone (2) had no colour (paper white, 0K / CMYK(0.0,0.0,0.0,0.0)), and the third zone (3) was printed with some colour. All zones had equal size.

Fig. 1 QR-code sample

In the example in Fig. 1, a colour area with (20M / CMYK(0.0,0.2,0.0,0.0)) is presented, which means that the colour has 20% saturation in the magenta channel. The dataset was created by printing colour areas with different ink saturations (20%, 40%, 60%, 80% and 100%) and with different colours. Examples of QR-Codes with colour saturation 20–100% in the magenta (M) channel are illustrated in Fig. 2. The other colours or colour combinations used were C (cyan), Y (yellow), K (black) and the combination CY (green). Further experiments were also made with ink saturations of 10% and 5%, using the unsupervised learning methods that performed best in the first experiments.

The QR-Codes were printed at a size of 20 mm × 20 mm. The printer used in this research was a standard office laser printer (Canon ImageRunner C5535i), and the paper used was standard office A4 paper (Canon Black Label Plus 80 g/m²). Printing was done at 300 dpi. All QR-Codes were captured into an image dataset that contained 25–30 images per QR-Code. Different environments were used to capture the QR-Codes: some of the images were captured at a normal office ambient light level of around 500 lux and a colour temperature of around 5000 K.
Fig. 2 Different intensity samples
Fig. 3 Samples from different environments

The higher ambient light environment results in more separable colours, but also in more small details (noise) in the images. Some of the images were captured in a home environment with around 250 lux and a 3000 K colour temperature. In a darker environment, digital cameras might not be able to capture colour information properly (Zamir et al. 2021). Examples of the differences between these two environments are shown in Fig. 3. Image a) was captured in higher ambient light, which results in more detail and clear transitions between colours. Image b) is from lower and warmer ambient light; in these images it can also be seen that the camera focus makes the image blurry and the colours are not as clearly separable, although the image has less noise. All images were taken with an iPhone 11 Pro using the standard camera application from a distance of around 30 cm and stored as JPGs in RGB format with 8 bits per channel. The captured images were resized to 1200×1600 resolution in Photoshop before processing; no other processing was done. After resizing, the QR-Code occupied an area of around 400×400 px in the image, and each zone was roughly 50×50 px.

3.1 Process as a whole
The process used in this research is shown in Fig. 4. The process starts from the RGB JPEG image and results in the CIELAB values of three cluster centres (white, black, colour) (Fig. 5) and the Delta-E between the white and colour clusters.

Fig. 4 Flowchart of the process used

Delta-E is a measurement that ranges between 0 and 100; it quantifies the difference between two colours and can be used to determine whether two colours are different (Luo et al. 2001). The formula to calculate Delta-E is

\Delta E_{00} = \sqrt{\left(\frac{\Delta L'}{k_L S_L}\right)^2 + \left(\frac{\Delta C'}{k_C S_C}\right)^2 + \left(\frac{\Delta H'}{k_H S_H}\right)^2 + R \cdot \frac{\Delta C'}{k_C S_C} \cdot \frac{\Delta H'}{k_H S_H}},

where \Delta E_{00} is the Delta-E 2000 colour difference, \Delta L' the difference in lightness, \Delta C' the difference in chroma, \Delta H' the difference in hue, k_L, k_C, k_H the weighting factors, S_L, S_C, S_H the adjustment factors based on standard deviations, and R the rotation function.

Fig. 5 Extraction of analysis area

The CIELAB colour system represents the quantitative relationship of colours on three axes: lightness (L) and the chromaticity coordinates (a, b). After reading the input image, auto-levelling of the colours was performed. Auto-levelling uses the following equation:

I' = \frac{I - I_{\min}}{I_{\max} - I_{\min}} \times 255,

where I_{\max} is the brightest white value in the image and I_{\min} is the darkest black value in the image. Auto-levelling performs a histogram equalisation to achieve a more uniform distribution of values in the range [0, 255] in all (R, G, B) channels (Kao et al. 2006). For the value 0, the mean value of the black area of the image (Fig. 1, area 1) was used, and for the value 255, the mean value of the white area (Fig. 1, area 2) was used. The other values of the image are then stretched, and as seen in Fig. 5 this makes the differences between the colours in the image clearer. Histogram equalisation improves the contrast of an image but might lead to over-enhancement. Other image enhancement methods could also be used, but this is out of the scope of this research and is discussed in the conclusions. The result of auto-levelling is shown in Fig. 5, where the leftmost image is the original image and the one on the right is the image after auto-levelling.
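As an illustration of the auto-levelling equation above, the following hedged NumPy sketch stretches an RGB image channel-wise, using the mean of the black area as I_min and the mean of the white area as I_max. The function name and the way the zones are passed in are assumptions, not the study's exact implementation.

```python
import numpy as np

def auto_level(image, black_area, white_area):
    """Sketch of the auto-levelling step: I' = (I - Imin) / (Imax - Imin) * 255.

    `image`, `black_area` and `white_area` are uint8 RGB arrays; the black and
    white areas correspond to zones 1 and 2 of the QR-Code (Fig. 1).
    """
    img = image.astype(np.float64)
    # Per-channel anchors taken from the mean of the black and white zones.
    i_min = black_area.reshape(-1, 3).mean(axis=0)
    i_max = white_area.reshape(-1, 3).mean(axis=0)
    stretched = (img - i_min) / np.maximum(i_max - i_min, 1e-6) * 255.0
    return np.clip(stretched, 0, 255).astype(np.uint8)
```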
The three squares seen at the bottom-right of the image are the extracted colour areas after auto-levelling. The data from these areas is grouped into one dataframe. The dataframe uses the CIELAB colour format, so each row of the frame contains one pixel's L, A and B values. The dataframe was then processed by the unsupervised learning methods. One visual example of the dataframe is presented in Fig. 6, where the dataframe's LAB colour values are plotted as a density chart. The density of points is shown in different colours, dark blue being the least dense and red the densest. The circles in the figure are added for illustrative purposes; they show the different colour areas (red = black, green = white, yellow = CMYK(0.0, 0.6, 0.0, 0.0)). This is also the result unsupervised learning is expected to achieve. The red and green clusters should stay in roughly the same location in different images, but the yellow cluster moves to different locations in the 3D plane. From this figure it can also be seen that a lot of noise is present; the noise comes from the digital image and the imaging environment.

Clustering of the colours was done with different approaches:
• Centroid-based algorithms organise the data in non-hierarchical clusters. These algorithms use a distance measure, such as the Euclidean distance, between points to determine whether points belong to the same cluster. Usually, centroid-based algorithms run iterations and update the cluster centres in each iteration. Centroid-based algorithms are efficient and fast; however, they are sensitive to initial cluster centres and outliers (Gonzalez 1985; Hartigan and Wong 1979).
• Connectivity-driven clustering, often referred to as hierarchical clustering, operates under the assumption that points tend to have stronger connections with nearby points than with those that are farther apart. Algorithms based on connectivity use the distances between points to create clusters. The goal is to minimise the maximum distance required to link these points together (Reddy 2021).
• Density-based clustering defines clusters as dense regions of space. Between dense regions there are regions where the data density is lower; a low-density region can also be empty. Density-based clustering algorithms are good at finding arbitrarily shaped clusters, but they have difficulty with varying densities and high-dimensional data (Kriegel et al. 2011).
• Distribution-based clustering assumes that the data comes from a specified number of distributions. Each of these distributions has its own mean and variance (or covariance); the distributions can, for example, be Gaussian. A point's probability of belonging to a distribution decreases as its distance from the distribution centre increases (Xu et al. 1998).

In this research, K-Means, Fuzzy C-Means, DBSCAN, MeanShift, Hierarchical clustering, Spectral clustering, Gaussian Mixture Model (GMM), BIRCH and OPTICS from scikit-learn (Pedregosa et al. 2011) were used as unsupervised learning methods. All of these methods were used to cluster the dataframe's 3D points (L, A, B) into clusters; if a method had the option to cluster the data points into a specified number of clusters, it was set to three. The methods used are explained in more detail in the following sub-chapters. The objective of each method was to find the cluster centre, or the average value of the points belonging to the same cluster. After clustering, the clusters were labelled: the cluster closest to CIELAB(1.0, 0.0, 0.0) was labelled "white" and the cluster closest to CIELAB(0.0, 0.0, 0.0) was labelled "black".
Fig. 6 Illustration of LAB colour data point intensities

Finally, the one remaining cluster was labelled "colour". If only two clusters, or more than three major clusters, were found, the result of the clustering process was considered a failure. Finally, the CIEDE2000 Delta-E between "white" and "colour" was calculated. This value was then compared to the ground-truth Delta-E calculated from the mean values of the white and colour areas. The result of the comparison was stored in a .CSV file together with other image information for the later analysis discussed in the results section.

3.2 K-means
The K-Means clustering algorithm belongs to the category of partition-based clustering algorithms. K-means uses an iterative process to partition n observations into k clusters, and it minimises the sum of the squared Euclidean distances of each point to its cluster centroid. K-Means works by first choosing k points as the initial cluster centres, which can be done in multiple ways. The algorithm then calculates the distance between each cluster centre and each point, and individual points are assigned to the closest cluster centre. After this, the mean of all the data points in each cluster is calculated and used as the new centroid for that cluster, and all points are again assigned to the nearest centroid. This process is repeated until the centroids do not change or until a predetermined number of iterations has been reached. K-Means is so-called hard clustering, where a point can belong to only one cluster (Lloyd 1982; MacQueen et al. 1967). This research uses a standard implementation of K-means, where the algorithm is given a fixed number of clusters before running. The "lloyd/full" (Lloyd 1982) and "elkan" (Elkan 2003) algorithms were experimented with; the difference between them is that "full" is an expectation-maximisation (EM) algorithm, whereas "elkan" uses the triangle inequality. K-means minimises the total intra-cluster variance, for which the following squared error function is used:

J = \sum_{i=1}^{n} \sum_{j=1}^{k} w_{ij} \, \lVert x_i - c_j \rVert^2,

where J is the distortion measure, n is the number of data points, k is the number of clusters, x_i represents a data point, c_j represents a cluster centroid, and w_{ij} is a binary indicator variable indicating whether data point x_i is assigned to cluster j.
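The hard-clustering and labelling steps described above can be sketched with scikit-learn as follows. The helper name, the use of the usual 0–100 L scale for the white/black reference points, and the K-means settings are illustrative assumptions rather than the study's exact code.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_label(lab_points, white_ref=(100.0, 0.0, 0.0), black_ref=(0.0, 0.0, 0.0)):
    """Sketch: cluster (L, a, b) rows into three groups and label them.

    `lab_points` is an (N, 3) array of pixel values from the three extracted
    zones; clusters are labelled by proximity to the white and black references.
    """
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(lab_points)
    centres = km.cluster_centers_
    white_idx = int(np.argmin(np.linalg.norm(centres - np.array(white_ref), axis=1)))
    black_idx = int(np.argmin(np.linalg.norm(centres - np.array(black_ref), axis=1)))
    if white_idx == black_idx:
        raise ValueError("clustering failed: white and black map to the same cluster")
    colour_idx = ({0, 1, 2} - {white_idx, black_idx}).pop()
    return {"white": centres[white_idx],
            "black": centres[black_idx],
            "colour": centres[colour_idx]}
```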
3.3 Fuzzy C-means
Fuzzy C-Means (FCM) is a so-called soft clustering method. In FCM, points can belong to two or more clusters; each point belongs to every cluster to a certain degree. Points located near the centroid of a cluster have a high degree of belonging to that cluster, and a point located far from the centre has a low degree (Bezdek et al. 1984). Fuzzy C-means starts with an initial guess for the cluster centres for a predefined number of clusters. FCM then assigns every data point a membership grade for each cluster. In the same way as K-means, C-means works iteratively and moves the cluster centres to the right locations. FCM's iteration is based on minimising an objective function that represents the distance from any given data point to a cluster centre, weighted by that data point's membership grade (Bezdek et al. 1984). With C-means, fuzziness parameters from 2.0 to 5.0 were experimented with. The fuzziness parameter is a key parameter of the FCM algorithm: larger fuzziness parameters blur the clusters, and eventually all points will belong to all clusters (Zhou et al. 2014). In C-means, the following membership-degree objective function is used:

J = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m} \, \lVert x_i - c_j \rVert^2,

where J is the objective function, n is the number of data points, k is the number of clusters, x_i represents a data point, c_j represents a cluster centroid, u_{ij} is the fuzzy membership value of data point x_i in cluster j, and m is the fuzziness parameter (a positive constant).

3.4 DBSCAN
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is an algorithm designed for clustering data points in the presence of noise. When provided with a collection of points, DBSCAN organises them into clusters by assessing the density of their arrangement. Essentially, points that are closely packed together form dense regions and are grouped accordingly; however, if a point is situated significantly apart from its neighbouring points, DBSCAN identifies it as noise or an outlier (Ester et al. 1996). DBSCAN works based on two parameters:
• ε (eps): two points are considered neighbours if the distance between them is smaller than ε.
• minPts: the minimum number of points required to form a dense region (Ester et al. 1996).
DBSCAN is very sensitive to these parameters. minPts is easier to decide, as it can be determined from the total pixel count and the cluster count. However, ε is more complex, and some past research has looked into ways to determine the optimal ε value (Giri and Biswas 2020). DBSCAN does not have a single objective function like K-means or C-means, but it can be described with the following algorithm:
1. Identify core points based on the density criterion.
2. Connect core points to form clusters using density-reachability.
3. Assign border points to clusters if they are density-reachable from a core point.
4. Identify noise points that are neither core points nor density-reachable from core points.
In this research, ε = 3.0, 5.0, 1.0 is used and minPts = (T/n) × 0.8, where T is the total number of pixels and n is the cluster count. As a result, each cluster must have at least 80% of the pixels it would have if the total pixel count were divided evenly into n clusters. After running the DBSCAN algorithm, each density group's average colour value is considered to represent the whole group.

3.5 MeanShift
MeanShift is an unsupervised learning algorithm that works in iterations. On each iteration, the algorithm shifts points in the direction where the region has the highest density of data points, and it updates the candidates for centroids to be the mean of the points within a given region. This region is defined by the bandwidth, which is the only parameter given to the MeanShift algorithm. After the update, MeanShift filters out near-duplicates so that a final set of centroids remains (Wu and Yang 2007). MeanShift's iterative optimisation, which results in a vector representing the direction in which the density increases the most at the location of a data point, can be expressed as follows:

m(x) = \frac{\sum_{i=1}^{n} K(x - x_i)\, x_i}{\sum_{i=1}^{n} K(x - x_i)} - x,

where K(\cdot) is a kernel function, often a Gaussian kernel, m(x) is the MeanShift vector for a data point x, and the x_i are the other data points in the dataset. This research uses a bandwidth that is the median of all pairwise distances. Calculating the bandwidth is slow, as it takes at least quadratic time in the point count.
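To make the parameter choices in Sects. 3.4 and 3.5 concrete, the following is a hedged sketch (assumed helper names, not the study's code) of how minPts can be derived as 80% of an even three-way pixel split and how the MeanShift bandwidth can be set to the median pairwise distance.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import DBSCAN, MeanShift

def dbscan_lab(lab_points, eps=5.0, n_clusters=3, share=0.8):
    """Sketch: DBSCAN on (L, a, b) points with minPts = (T / n) * 0.8."""
    min_samples = int(len(lab_points) / n_clusters * share)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(lab_points)
    # Each density group is represented by its average colour value; -1 is noise.
    return {label: lab_points[labels == label].mean(axis=0)
            for label in set(labels) - {-1}}

def meanshift_lab(lab_points):
    """Sketch: MeanShift with the bandwidth set to the median pairwise distance."""
    bandwidth = float(np.median(pdist(lab_points)))  # at least quadratic in N
    return MeanShift(bandwidth=bandwidth).fit(lab_points).cluster_centers_
```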
3.6 Hierarchical clustering
Hierarchical clustering algorithms build nested clusters by merging or splitting them successively. The final hierarchy of a dataset is represented as a tree: the root of the hierarchy tree gathers all the samples together, and the leaves are clusters with only one sample. Hierarchical clustering depends on the so-called linkage function, which defines the distance between any two subsets. Linkage functions developed in the past include single linkage, average linkage, complete linkage, and Ward linkage (Nielsen 2016). This research uses AgglomerativeClustering, a version of hierarchical clustering that uses a bottom-up approach (Zhao and Qi 2010). The clustering algorithm starts from the situation where each point is in its own cluster (leaf). The algorithm then merges clusters using Ward's linkage criterion. Ward linkage analyses the variance of the clusters and minimises the sum of squared differences within all clusters (Miyamoto et al. 2015). The variance, also known as the sum of squares, is calculated based on the squared Euclidean distance between the data points and the centroid of the cluster:

D(X, Y) = \frac{N_X N_Y}{N_X + N_Y} \, \lVert C_X - C_Y \rVert^2,

where D(X, Y) is the distance between clusters X and Y, N_X and N_Y are the numbers of elements in clusters X and Y respectively, and \lVert C_X - C_Y \rVert^2 is the squared Euclidean distance between the centroids of clusters X and Y.

3.7 Spectral clustering
Spectral clustering is very useful when the shape of a cluster is non-convex, because it focuses on connectivity rather than the compactness of the cluster. This can be the case, for example, when a cluster has the shape of an arch or when clusters are nested circles. Spectral clustering measures the given data points by calculating their pairwise similarities with a chosen similarity function. The similarity function is symmetric and non-negative; this research uses the Euclidean distance. This results in a similarity matrix, which is used in an unnormalised or a normalised spectral clustering (Ng et al. 2001). Euclidean distance as a similarity function is expressed as

\mathrm{Similarity}(x_i, x_j) = \frac{1}{1 + \lVert x_i - x_j \rVert},

where x_i and x_j are data points.

3.8 OPTICS
OPTICS (Ordering Points To Identify the Clustering Structure) was developed to address DBSCAN's weakness when the data has varying density. OPTICS does this by linearly ordering the dataset points, so that points which are spatially closest become neighbours in a density-based representation called the reachability plot. In this plot, every point has a reachability distance, which defines how easily a point can be reached from other points. Clusters are then formed based on the reachability distances (Ankerst et al. 1999). The reachability distance is calculated with the following function:

\mathrm{rdist}(p, q) = \max(\mathrm{dist}(p, q), \text{core-distance}(q)),

where p and q are data points, dist(p, q) is the Euclidean distance between points p and q, and the core-distance is the radius within which a certain density threshold ε or MinPts is satisfied. OPTICS in this research uses the same parameters as DBSCAN.

Table 2 Results of the K-means algorithm
Method (parameters) | Success rate (%) | Runtime (s)
K-means (algorithm = elkan) | 98.1 | 0.052
K-means (algorithm = full) | 98.1 | 0.051
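As a concrete illustration of the hierarchical (Ward linkage) and spectral (RBF affinity) configurations described in Sects. 3.6 and 3.7, the following is a minimal scikit-learn sketch with the cluster count fixed to three, as in the rest of this study; the helper name is an assumption, not the study's code.

```python
from sklearn.cluster import AgglomerativeClustering, SpectralClustering

def cluster_lab(lab_points, method="ward"):
    """Sketch of the hierarchical (Ward) and spectral (RBF) setups used above."""
    if method == "ward":
        model = AgglomerativeClustering(n_clusters=3, linkage="ward")
    elif method == "spectral":
        model = SpectralClustering(n_clusters=3, affinity="rbf", random_state=0)
    else:
        raise ValueError(f"unknown method: {method}")
    return model.fit_predict(lab_points)
```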
3.9 GMM
The GMM (Gaussian mixture model) is a finite mixture probability distribution model. GMM assumes that all data points are generated from a mixture of a finite number of Gaussian distributions whose parameters are unknown. Each Gaussian distribution has a mean and a covariance that define its parameters, so the whole GMM is built of mean vectors (μ) and covariance matrices (Σ). GMM uses an iterative expectation-maximisation method to estimate these parameters for the distributions (Rasmussen 2000). A Gaussian mixture model is represented by the following probability density function:

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k).

3.10 BIRCH
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is a hierarchical clustering algorithm. BIRCH incrementally builds a tree-like data structure, the Clustering Feature (CF) tree. The tree summarises the information about the dataset and is built top-down: the algorithm recursively splits the data into subclusters. The algorithm does not use traditional distance-based split criteria like other clustering algorithms; the split conditions are based on factors like the number of points in a subcluster or the sum of squared feature values. The actual split logic is more intricate due to the CF tree structure and the desire to maintain balance in the tree. The BIRCH algorithm has two main parameters: the maximum number of subclusters that can be generated from a single cluster and the threshold distance. These parameters determine the size and depth of the Clustering Feature tree (Zhang et al. 1997).

4 Results
The results of this research were obtained by running the process described in the previous chapter. Clustering was performed with a 2.3 GHz Quad-Core Intel Core i5 processor. The clustering process was considered successful if it was able to recognise three different clusters and the clusters were formed correctly. Correct formation of the clusters was defined as follows: the Delta-E value between white and colour had to be equal to or smaller than 2.0 when compared to the ground-truth white–colour Delta-E (calculated in process step 4, Fig. 4). The value 2.0 for Delta-E was selected because past research has identified it as the smallest colour difference an inexperienced human observer can notice (Han et al. 2022). This can be expressed as the following equation:

\mathrm{result} = \begin{cases} \text{success}, & \text{if } \lvert \Delta E_{00} - \Delta E_{00,\mathrm{ground\ truth}} \rvert \le 2.0 \\ \text{failed}, & \text{otherwise} \end{cases}

The success rate for each method was calculated using the standard formula:

\text{success rate} = \frac{\text{correctly clustered images}}{\text{total images}}.

The results of running all the unsupervised clustering methods on the whole 620-image dataset are presented in the following tables. In these tables, the success rate describes how large a share of the images was successfully clustered into three clusters in which white and colour had a Delta-E equal to or smaller than 2.0 when compared to the ground truth.
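The success criterion above can be written out as a short check. The helpers below are a hedged sketch assuming scikit-image's CIEDE2000 implementation and Lab inputs; they are not the study's own evaluation script.

```python
import numpy as np
from skimage.color import deltaE_ciede2000

def is_success(pred_white, pred_colour, gt_white, gt_colour, tolerance=2.0):
    """Sketch: compare the predicted white-colour Delta-E with the ground truth."""
    def delta_e(lab1, lab2):
        return float(deltaE_ciede2000(np.asarray(lab1, float).reshape(1, 1, 3),
                                      np.asarray(lab2, float).reshape(1, 1, 3))[0, 0])
    return abs(delta_e(pred_white, pred_colour) - delta_e(gt_white, gt_colour)) <= tolerance

def success_rate(results):
    """`results` is an iterable of booleans from is_success over the image set."""
    results = list(results)
    return sum(results) / len(results)
```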
In the method column, the method and its possible parameters are described; some methods were tested with multiple parameters so that the best parameters and their combination could be found. The runtime is the average runtime per image.

K-Means (Table 2) achieved a very high success rate. Its challenges occurred when the density of the ink colour was low (20%), especially in the M and C CMYK components. Being a hard clustering method has its advantages when it comes to the recognition of the colour cluster, because each data point must belong to only one cluster. Problems arise from data points that are outliers and pull the centres away from their ground truth. Two different algorithms, "full" and "elkan", were experimented with; there seems to be no difference between the two.

Table 3 Results of the C-means algorithm
Method (parameters) | Success rate (%) | Runtime (s)
C-means (m = 1.0) | 96.1 | 0.034
C-means (m = 2.0) | 96.6 | 0.027
C-means (m = 3.0) | 97.1 | 0.039
C-means (m = 4.0) | 97.6 | 0.051
C-means (m = 5.0) | 97.6 | 0.050
C-means (m = 6.0) | 97.6 | 0.068

C-Means (Table 3) seems to fail when the colour to be identified has only a K CMYK component, or when the ink density is low (20%); otherwise, it works well. Close clusters seem to make it harder for the C-means algorithm to differentiate clusters. This comes from C-means being a soft clustering method: in close clusters, data points have some grade of belonging to clusters other than their main cluster. Outliers also have a negative impact when C-means is used. For the best results, the fuzziness parameter must be 4.0 or larger, although the execution time increases when the fuzziness parameter is large.

Table 4 Results of the DBSCAN algorithm
Method (parameters) | Success rate (%) | Runtime (s)
DBSCAN (eps = 2.5, min_samples = 25%) | 19.5 | 0.172
DBSCAN (eps = 5.0, min_samples = 25%) | 70.1 | 0.176
DBSCAN (eps = 10.0, min_samples = 25%) | 78.1 | 0.191
DBSCAN (eps = 10.0, min_samples = 16.7%) | 81.6 | 0.177
DBSCAN (eps = 10.0, min_samples = 33.3%) | 25.2 | 0.167
DBSCAN (eps = 15.0, min_samples = 25%) | 70.1 | 0.181

DBSCAN (Table 4) usually recognises three clusters, but sometimes only two, which makes the whole clustering process fail. In these cases, DBSCAN mistakenly identifies a large number of data points as noise, even though they belong to a cluster. As seen from the table, DBSCAN was experimented with using different eps values. DBSCAN fails throughout the dataset, but especially when the intensity of the colour is between 40 and 80%.

Table 5 Results of the GMM algorithm
Method (parameters) | Success rate (%) | Runtime (s)
GMM (covariance_type = full) | 99.0 | 0.085
GMM (covariance_type = tied) | 97.6 | 0.071
GMM (covariance_type = diag) | 99.0 | 0.071

GMM (Table 5) has very good performance; failures happen in the same images as with K-means, but in fewer test cases. The challenge with GMM is that it does not explicitly model outliers, and noisy data points can influence the cluster parameters. Different covariance matrices, which define the Gaussian distributions, were tested; the best options were diagonal and full. Full covariance gives the components the possibility to adopt any position and shape individually. With the diagonal covariance, the contour axes align with the coordinate axes, but the eccentricities of the components may still differ.

BIRCH (Table 6) manages to solve the clustering better than expected, as the dataset is not generally considered hierarchical. As with other clustering algorithms, BIRCH is also vulnerable to outliers. BIRCH also depends on the ordering of the data points, which was not considered in this research. With the BIRCH method, two different parameters and their values were tested: the threshold value defines the radius used to merge samples into subclusters, and the branching factor defines the maximum number of subclusters in each node.
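Returning briefly to the GMM covariance options compared in Table 5, a minimal scikit-learn sketch of that configuration (a hypothetical helper, not the study's code) is:

```python
from sklearn.mixture import GaussianMixture

def gmm_centres(lab_points, covariance_type="full"):
    """Sketch: fit a 3-component GMM on (L, a, b) points.

    'full', 'tied' and 'diag' are the covariance types compared in Table 5;
    the fitted means play the role of the cluster centres.
    """
    gmm = GaussianMixture(n_components=3, covariance_type=covariance_type,
                          random_state=0).fit(lab_points)
    return gmm.means_
```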
Table 6 Results of the BIRCH algorithm
Method (parameters)                                Success rate (%)   Runtime (s)
BIRCH (threshold = 1.0, branching_factor = 50)     78.5               0.285
BIRCH (threshold = 0.5, branching_factor = 50)     81.9               0.344
BIRCH (threshold = 0.25, branching_factor = 50)    85.3               0.351
BIRCH (threshold = 0.25, branching_factor = 100)   84.0               0.354
BIRCH (threshold = 0.25, branching_factor = 25)    83.5               0.41
BIRCH (threshold = 0.1, branching_factor = 50)     85.0               0.38

Even with the best parameters, BIRCH fails throughout the dataset, especially when the colour is green or its intensity is 20%.

Table 7 Results of the hierarchical clustering algorithm
Method (parameters)                                       Success rate (%)   Runtime (s)
Hierarchical (affinity = euclidean, linkage = ward)       97.7               0.537
Hierarchical (affinity = euclidean, linkage = complete)   82.9               0.414
Hierarchical (affinity = euclidean, linkage = average)    92.7               0.468
Hierarchical (affinity = euclidean, linkage = single)     84.2               0.417

Hierarchical clustering (Table 7) algorithms are sensitive to noise and outliers, as these can affect the linkage and structure of the clusters; nevertheless, hierarchical clustering works well here. Distances between data points in the LAB colour space can easily be calculated with the Euclidean distance, which is then used to allocate points into clusters. In this research, the hierarchical clustering affinity matrix was constructed using the Euclidean distance, and four different linkage options were tested. Ward linkage, which minimises the variance of the clusters being merged, performed best. The most problematic cases for hierarchical clustering are the 20% M and 20% C images, and it also fails sometimes with the green colour.

Table 8 Results of the spectral clustering algorithm
Method (parameters)                       Success rate (%)   Runtime (s)
Spectral (affinity = rbf)                 98.1               2.87
Spectral (affinity = nearest_neighbour)   –                  –

The nature of the problem does not especially call for spectral clustering (Table 8), yet spectral clustering achieves a rather high success rate. Spectral clustering does not explicitly handle noisy data points or outliers; outliers can affect the affinity matrix and potentially lead to the formation of unwanted clusters. When the affinity matrix is computed using nearest neighbours, the algorithm fails to run, whereas it runs when the affinity matrix is constructed using a radial basis function (RBF) kernel.

Table 9 Results of the Meanshift algorithm
Method (parameters)           Success rate (%)   Runtime (s)
Meanshift (quantile = 0.5)    5.2                18.3
Meanshift (quantile = 0.75)   0                  17.5
Meanshift (quantile = 0.4)    5.0                15.6

The Meanshift algorithm (Table 9) is very slow on the given problem. A larger problem was that Meanshift was not able to cluster the data points correctly. The problems of Meanshift may come from an incorrect bandwidth value or from noisy data, but even if Meanshift worked correctly it would still be very slow for this problem. Different quantile values were also tested, but Meanshift was still not able to recognise the clusters.

As OPTICS (Table 10) is based on the same core idea as DBSCAN, it was expected to work in quite a similar way. However, OPTICS achieved only a 70% success rate with its best parameters and was also slower than DBSCAN. Still, the hierarchical approach of OPTICS seems to work better than DBSCAN. One additional algorithm, Affinity Propagation, was also tested, but it failed to perform on the given dataset: all runs ended with no clusters being detected because affinity propagation did not converge.

Fig. 7 shows some of the images for which clustering failed with all of the best algorithms.
In image a) the colour is CMYK(0, 0, 0, 0.6) gray, in b) and c) it is (1.0, 1.0, 0, 0) green, and in d) it is (0, 0, 0.2, 0) magenta. In image a) the colour is dark, and clustering results in two main clusters instead of three. In b) and c) the ambient lighting is challenging, and it is too hard for the clustering algorithms to recognise the green colour. Finally, in image d) there is a lot of noise and the print quality is low, which leads to incorrect results.

Fig. 7 Failed images in experiment one

Table 10 Results of the OPTICS algorithm
Method (parameters)                      Success rate (%)   Runtime (s)
OPTICS (eps = 2.5, min_samples = 25%)    17.7               5.12
OPTICS (eps = 5.0, min_samples = 25%)    70.0               24.12
OPTICS (eps = 10.0, min_samples = 25%)   17.7               5.12

Finally, a second experiment was done with a smaller dataset, using the algorithms that had the best performance: K-means, C-means (m = 3.0), GMM (covariance = full), hierarchical clustering (affinity = euclidean, linkage = ward) and spectral clustering (affinity = rbf). This dataset contained colours with ink densities of 0%, 5% and 10%, and the images were captured in the same environments as the images used in the first experiment. In total the dataset contained 230 images. The results of this experiment are shown in Table 11.

Table 11 Results of the second experiment
Method (parameters)                                              10% intensity   5% intensity
C-means (m = 3.0)                                                94.8%           83.3%
K-means                                                          96.3%           89.3%
GMM (covariance = full)                                          92.5%           84.5%
Hierarchical clustering (affinity = euclidean, linkage = ward)   93.4%           84.5%
Spectral clustering (affinity = rbf)                             97.8%           76.2%

In the second experiment the success rate of all algorithms dropped, but all of them still performed well, with success rates over 90%, when the colour intensity was 10%. When the intensity dropped to 5%, only K-means achieved close to a 90% success rate. The challenges were mostly with the magenta and yellow colours; some of the images that failed are shown in Fig. 8. In this figure, images a) and b) have an intensity of 10%, and c) and d) have an intensity of 5%.

Fig. 8 Failed images in experiment two

5 Conclusion and discussion

The results show that the unsupervised clustering methods K-means, C-means, GMM, hierarchical clustering and spectral clustering can be used to recognise colour differences in printed CMYK colours, especially when the difference in ink density on at least one CMYK channel is 20% or more. In these cases GMM achieves a 99.0% success rate, followed by spectral clustering and K-means. These results show that if a high success rate is wanted, using ink densities in 20% intervals is a good way to go. The best parameter options for the methods are K-means with k-means++ initialisation, C-means with fuzziness parameter 3.0, GMM with covariance type full, hierarchical clustering with Euclidean affinity and Ward's linkage, and spectral clustering with RBF affinity. When the ink levels drop to 10%, all algorithms still have a success rate over 90%, but when the ink levels drop to only 5%, none of the algorithms achieves over 90%.

The best algorithm based on the results of both experiments is K-means, which achieves better results than the quite similar C-means, especially in the second, low-ink-density experiment. This is because the fuzziness of the C-means algorithm clusters some data points incorrectly when cluster centres are close to each other. The hard clustering that K-means uses seems to work well at lower ink densities.
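To make the initialisation point concrete, the sketch below shows both the k-means++ initialisation discussed next and the manually seeded variant raised later as future work, assuming scikit-learn on CIELAB pixel data; the seed values and function names are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_plus_plus(lab_pixels: np.ndarray) -> np.ndarray:
    # k-means++ spreads the initial centroids apart before the Lloyd iterations.
    return KMeans(n_clusters=3, init="k-means++", n_init=10).fit_predict(lab_pixels)

def kmeans_seeded(lab_pixels: np.ndarray) -> np.ndarray:
    # Hypothetical variant: seed two centroids near paper white and black in LAB
    # (roughly L* = 95 and L* = 5) and start the third at the mean of the pixels.
    seeds = np.vstack([[95.0, 0.0, 0.0],
                       [5.0, 0.0, 0.0],
                       lab_pixels.mean(axis=0)])
    return KMeans(n_clusters=3, init=seeds, n_init=1).fit_predict(lab_pixels)
```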
The K-means algorithm used here relies on the k-means++ initialisation method: k-means++ assigns the first centroid randomly and then selects the remaining centroids with a preference for points that are far, in squared distance, from the centroids chosen so far. This initialisation seems to work well. Incorrect clustering also happens with hierarchical and spectral clustering; some data points are linked into the wrong cluster because they are close outliers or noise points of other clusters. In this use case the data points form dense regions of spherical or elliptical shape, which plays to the advantage of K-means and GMM, but K-means seems to manage outliers better than GMM. While spectral clustering also works well, it has challenges when the ink density is low and then tries to form two clusters instead of three.

To get better results from unsupervised clustering, methods such as filtering out outlier data points could be used. This would need to be done in a way that does not discard data points that belong to a cluster, as the number of data points per image is limited. Using preprocessing techniques such as noise filtering and colour enhancement might also improve the results; the problem is that in this use case very little data is available for colour enhancement. It might also be possible to assign the initial locations of one or two centroids (black and white) manually, which might lead to better results but is left for future research.

Acknowledgements This work was supported by the Finnish Cultural Foundation's Central Ostrobothnia Regional Fund (Grant Number 25211242).

Funding Open Access funding provided by University of Vaasa.

Data availability The dataset used in this manuscript is available as a Zenodo repository: 10.5281/zenodo.7749912.

Declarations

Conflict of interest There are no conflicts of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Aarathi KS, Abraham A (2017) Vehicle color recognition using deep learning for hazy images. In: 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), pp 335–339
Abdalla A, Cen H, El-manawy A et al (2019) Infield oilseed rape images segmentation via improved unsupervised learning models combined with supreme color features. Comput Electron Agric 162:1057–1068. https://doi.org/10.1016/j.compag.2019.05.051
Ankerst M, Breunig MM, Kriegel HP et al (1999) Optics: ordering points to identify the clustering structure. ACM Sigmod Record 28(2):49–60
Banic N, Loncaric S (2018) Unsupervised learning for color constancy. pp 181–188
Bar-Haim Y, Saidel T, Yovel G (2009) The role of skin colour in face recognition.
Perception 38(1):145–148 Basar S, Ali M, Ochoa-Ruiz G et al (2020) Unsupervised color image segmentation: a case of rgb histogram based k-means clustering initialization. PLoS ONE 15(10):e0240015 Bazeille S, Quidu I, Jaulin L (2012) Color-based underwater object recognition using water light attenuation. Intell Serv Robot 5(2):109–118 Bezdek JC, Ehrlich R, Full W (1984) Fcm: The fuzzy c-means clustering algorithm. Comput Geosci 10(2–3):191–203 Bo L, Ren X, Fox D (2013) Unsupervised feature learning for rgb-d based object recognition. In: Experimental robotics, Springer, pp 387–402 Bretzner L, Laptev I, Lindeberg T (2002) Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering. In: Proceedings of fifth IEEE international conference on automatic face gesture recognition, IEEE, pp 423–428 De la Escalera A, Armingol JM, Mata M (2003) Traffic sign recognition and analysis for intelligent vehicles. Image Vis Comput 21(3):247– 258 Dresp B, Wandeto JM (2020) Unsupervised classification of cell imag- ing data using the quantization error in a self-organizing map. In: on Science AC, ASCE E (eds) 22nd International Conference on Artificial Intelligence ICAI 2020, American Council on Sci- ence and Education, Las Vegas, United States, CSCI 2020 Book of Abstracts, https://hal.archives-ouvertes.fr/hal-02913378 Du EY, Chang CI, Thouin PD (2004) Unsupervised approach to color video thresholding. Opt Eng 43(2):282–289 Elkan C (2003) Using the triangle inequality to accelerate k-means. In: Proceedings of the 20th international conference on Machine Learning (ICML-03), pp 147–153 Ester M, Kriegel HP, Sander J, et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: kdd, pp 226–231 Feng L, Jiang D, Zhang A, et al (2019) Color recognition for rubik’s cube robot. In: 2019 IEEE International Conference on Smart Internet of Things (SmartIoT), IEEE, pp 269–274 Gao XW, Podladchikova L, Shaposhnikov D et al (2006) Recognition of traffic signs based on their colour and shape features extracted 123 134 Acta Wasaensia Advances in Computational Intelligence (2024) 4 :6 Page 13 of 13 6 using human vision models. J Vis Commun Image Represent 17(4):675–685 Gerke M, Xiao E (2014) Fusion of airborne laserscanning point clouds and images for supervised and unsupervised scene classification. ISPRS J Photogr Remote Sens 87:78–92. https://doi.org/10.1016/ j.isprsjprs.2013.10.011 Giri K, Biswas TK (2020) Determining optimal epsilon (eps) on dbscan using empty circles. In: International Conference on Artificial Intelligence and Sustainable Engineering: Select Proceedings of AISE 2020, Vol 1, Springer Nature, p 265 Gong J, Jiang Y, Xiong G, et al (2010) The recognition and tracking of traffic lights based on color segmentation and camshift for intel- ligent vehicles. In: 2010 IEEE Intelligent Vehicles Symposium, IEEE, pp 431–435 Gonzalez TF (1985) Clustering to minimize the maximum intercluster distance. Theor Comput Sci 38:293–306 Han A, Kim J, Ahn J (2022) Color trend analysis using machine learning with fashion collection images. Clothing Textiles Res J 40(4):308– 324 Hartigan JA, Wong MA (1979) Algorithm as 136: A k-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100–108 Hurlbert A, Ling Y (2012) Understanding colour perception and pref- erence. In: Colour design. Elsevier, p 129–157 Isohanni J (2022) Use of functional ink in a smart tag for fast-moving consumer goods industry. 
J Pack Technol Res 6(3):187–198 Isohanni J (2023) Qr-code dataset, with colour embed inside Jhawar J (2016) Orange sorting by applying pattern recognition on colour image. Proc Comput Sci 78:691–697 Kang J, Ji Z (2010) Dental plaque quantification using mean-shift-based image segmentation. In: 2010 International Symposium on Com- puter, Communication, Control and Automation (3CA), IEEE, pp 470–473 Kao WC, Wang SH, Che WH, et al (2006) Designing image processing pipeline for color imaging systems. In: 2006 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE Koubaroulis D, Matas J, Kittler J, et al (2002) Evaluating colour-based object recognition algorithms using the soil-47 database. In: Asian Conference on Computer Vision Kriegel HP, Kröger P, Sander J et al (2011) Density-based clustering. Wiley Interdiscip Rev Data Min Knowl Discov 1(3):231–240 Kuo CFJ, Shih CY, Kao CY et al (2005) Color and pattern analysis of printed fabric by an unsupervised clustering method. Textile Res J 75(1):9–12 Lloyd S (1982) Least squares quantization in pcm. IEEE Trans Inf The- ory 28(2):129–137. https://doi.org/10.1109/TIT.1982.1056489 Luo MR, Cui G, Rigg B (2001) The development of the cie 2000 colour- difference formula: Ciede 2000. Color Res Appl 26(5):340–350. https://doi.org/10.1002/col.1049 MacQueen J, et al (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Oakland, pp 281–297 Mao B, Li B (2019) Building façade semantic segmentation based on k-means classification and graph analysis. Arab J Geosci 12(7):1–9 Miyamoto S, Abe R, Endo Y, et al (2015) Ward method of hierarchi- cal clustering for non-Euclidean similarity measures. In: 2015 7th International Conference of Soft Computing and Pattern Recogni- tion (SoCPaR), IEEE, pp 60–63 Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 14 Nielsen F (2016) Hierarchical clustering. Springer International Pub- lishing, Cham, pp 195–211 Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830 Rabie T (2017) Training-less color object recognition for autonomous robotics. Inf Sci 418:218–241 Rasmussen C (2000) The infinite gaussian mixture model. Adv Neural Inf Process Syst Reddy EK (2021) Clustering techniques in data mining: A comparative analysis. Research issues on datamining, pp 95–101 Riri H, Elmoutaouakkil A, Beni-Hssane A et al (2016) Classification and recognition of dental images using a decisional tree. In: 2016 13th International Conference on Computer Graphics. Imaging and Visualization (CGiV), IEEE, pp 390–393 Vishnuvarthanan G, Rajasekaran MP, Subbaraj P et al (2016) An unsu- pervised learning method with a clustering approach for tumor identification and tissue segmentation in magnetic resonance brain images. Appl Soft Comput 38:190–212 Wang Z, Zhuang Z, Liu Y et al (2021) Color classification and tex- ture recognition system of solid wood panels. Forests 12(9):1154. https://doi.org/10.3390/f12091154 Wu KL, Yang MS (2007) Mean shift-based clustering. Pattern Recogn 40(11):3035–3052 Xu G, Li X, Lei B et al (2018) Unsupervised color image segmentation with color-alone feature using region growing pulse coupled neural network. Neurocomputing 306:1–16 Xu X, Ester M, Kriegel HP, et al (1998) A distribution-based clustering algorithm for mining in large spatial databases. 
In: Proceedings 14th International Conference on Data Engineering, IEEE, pp 324–331
Yang G, Li H, Zhang L, et al (2010) Research on a skin color detection algorithm based on self-adaptive skin color model. In: 2010 International Conference on Communications and Intelligence Information Security, IEEE, pp 266–270
Yavuz Z, Köse C (2017) Blood vessel extraction in color retinal fundus images with enhancement filtering and unsupervised classification. J Healthc Eng 2017:4897258. https://doi.org/10.1155/2017/4897258
Zamir SW, Arora A, Khan S, et al (2021) Learning digital camera pipeline for extreme low-light imaging
Zhang S, Huang W, Zhang C (2019) Three-channel convolutional neural networks for vegetable leaf disease recognition. Cogn Syst Res 53:31–41
Zhang T, Ramakrishnan R, Livny M (1997) Birch: a new data clustering algorithm and its applications. Data Min Knowl Discov 1:141–182
Zhao H, Qi Z (2010) Hierarchical agglomerative clustering with ordering constraints. In: 2010 Third International Conference on Knowledge Discovery and Data Mining, IEEE, pp 195–199
Zhou K, Fu C, Yang S (2014) Fuzziness parameter selection in fuzzy c-means: the perspective of cluster validation. Sci China Inf Sci 57:1–8
Zhu S, Liu L (2006) Traffic sign recognition based on color standardization. In: 2006 IEEE International Conference on Information Acquisition, IEEE, pp 951–955

Using convolutional neural networks to classify subtle colour differences

Jari Isohanni
University of Vaasa, Digital Economy, Wolffintie 34, Vaasa, 65200, Finland.
Contributing authors: x2603813@student.uwasa.fi

Abstract
Convolutional neural networks (CNNs) have proven to have high accuracy in image classification tasks, and in recent years their image classification performance has improved significantly. This research compares several of the most popular CNNs on a colour classification task. The problem presented here is similar to, but distinct from, the general image classification task: image classification involves a large amount of high-level feature information, while colour classification involves only low-level information. This research applies standard versions of each architecture, trains the models and evaluates their accuracy. The purpose of the CNN is to identify which colour has been printed on the paper. Mobile-phone-captured images are first preprocessed by applying autolevelling, and then the difference between the colour and paper-white images is calculated. These difference images are stored as a dataset. The colours used in this research are CMYK colours; in the first dataset their intensity varies in steps of 20%, and the second dataset uses smaller differences of 10% and 5%. Most architectures achieved high accuracy when the colour difference was 10% or greater, and DenseNet was able to correctly classify 100% of the images. When the colour difference decreased to less than 10%, the accuracy of most models decreased significantly. ResNet, as the best architecture, achieved 95% accuracy and only had problems with very low-intensity magenta and yellow colours. The achieved accuracy is very high, as the differences in the images are very subtle. The residual connections of ResNet, which help the model learn in an incremental way, and its skip connections, which facilitate the reuse of features, allow ResNet to outperform the other architectures. However, ResNet, like many other architectures, is prone to overfitting.
This research provides a baseline for using CNNs in colour classification, but more research is needed, especially regarding the performance of the CNNs, for example via ablation studies. Different fine-tuning and optimization methods, as well as more advanced preprocessing methods, should also be considered in future research.

Keywords: machine vision, colour difference, printed colours, convolutional neural networks

1 Introduction

Human eyes are sensitive to subtle color differences; for instance, we can distinguish whether a wall is painted in berry or navy hues. It is commonly acknowledged that women perceive colors differently than men, a phenomenon that has been widely studied, for example by Bimler et al (2004). Colour differences can carry significant information. In industries such as food and cosmetics, slight color variations can indicate quality, freshness, or changes that might raise serious safety concerns.

Color recognition is also a common topic in computer vision research and development. Typically, color recognition is part of a larger solution in which computer vision is employed to classify, identify or recognize items; colour is then one feature that can be used in the separation process. The applications of computer vision and color recognition span various fields, including agriculture (Jhawar (2016), Lamb and Chuah (2018)), object recognition (Albani et al (2017), Nalinipriya et al (2018)), environmental protection Zhao et al (2020), medical imaging Zhu et al (2017) and predictive maintenance Ahmed and Nandi (2021). Color recognition is also an integral part of the printing industry Luo and Zhang (2003). With the development of functional inks, colour recognition has played a role in printing nonelectronic sensors Hakola et al (2021). These sensors can be applied to fast-moving consumer goods or other product labels, where the sensors react to environmental changes through color alteration Isohanni (2022). The detection of this color change via computer vision enables these sensors to function as cost-efficient IoT sensors.

This research focuses on experimenting with some of the most popular convolutional neural network (CNN) models in color classification. The results provide knowledge to researchers and developers interested in employing color classification in their projects. Such a comparative analysis has not been conducted previously, as color classification is typically studied as part of a larger recognition task. Notably, there has been very little research on using CNNs to recognize colors in printed sources. However, CNNs have proven to be powerful tools for various recognition tasks, leading to improved product quality and efficiency in many applications.

In this research, popular convolutional neural networks (CNNs) are used to recognize color differences in printed sources. The research question this research aims to answer is "How can CNNs be used to recognize small color differences in printed sources, especially when a low-cost smartphone is used as a camera?" The key contributions of this work can be summarized as an approach to recognizing small color changes in printed sources.
This helps when creating approaches that can be applied across various industries for quality control, safety, and precision.

This research is structured as follows: Section 2 starts with relevant past research. Section 3 contains an overview of the system. Section 4 provides an overview of the convolutional neural networks used and their comparison based on past research. Section 5 describes the experiments used, and Section 6 shows the results of the experiments. Finally, Section 7 discusses the conclusions of the research.

2 Past research

Color recognition, especially the recognition of slight color differences, is not a commonly studied topic. Past research such as Lai and Westland (2020), Senthilkumar (2010), Wu et al (2019), Vidal-Calleja et al (2014), Arsenovic et al (2019) and Anandhakrishnan and Jaisakthi (2022) relates loosely to the approach presented in this research; however, those works examined using colors in artificial neural networks in different contexts or as part of wider object recognition. The development of convolutional neural networks (CNNs) has also enabled their use in closely related research (Apriyanti et al (2021), Büyükarıkan and Ülker (2022), Engilberge et al (2017), Atha and Jahanshahi (2018a), Boulent et al (2019), Tiwari (2018), Muhammad et al (2018), Zhang et al (2018), Kagaya et al (2014), Przybyło and Jabłoński (2019)). This research focuses on using CNNs to identify slight color differences and takes input from past research when designing an architecture proposed for slight color difference recognition.

Closely related past research has developed CNN architectures that perform well in problems related to this research. Past research shows that CNNs are well suited for color recognition tasks, owing to the ability of a CNN to automatically learn hierarchical features from images. CNNs can capture intricate patterns and variations in color, making them effective at discerning subtle differences. The capability of a CNN depends on its architecture. One of the main architectural design questions is how wide and/or deep the architecture should be. A wide neural network is characterized by having a substantial number of neurons within each of its layers; a network is considered "wider" when there is a greater number of neurons in a given layer. Wider networks are employed for tasks demanding extensive feature engineering, such as image classification and natural language processing. A deep neural network is characterized by having numerous layers; the greater the number of layers, the deeper the network. Deep networks have applications in tasks that necessitate a high degree of abstraction, such as speech recognition and predictive analytics.

Some of the closely related research has focused on the inspection of items or structures via computer vision and the use of CNNs to identify defects. These use cases have come from agriculture and civil engineering. Detection of defects is crucial, for example, in the prediction of structural failures or the correct time for pest control.

Soukup and Huber-Mörk proposed a CNN architecture that had two convolutional and pooling layers and a final fully connected layer. They were able to recognize defects in steel rails but noted that future research should use deeper CNNs.
The setting of Soukup and Huber-Mörk's research is similar to the one here, as defects in rails are small and the differences between non-defect and defect images are small. However, their research used a special setup for image capture. Soukup and Huber-Mörk (2014)

Fuentes et al. researched the use of various CNNs to identify defects in tomato plants. They used the Faster Region-based Convolutional Neural Network (Faster R-CNN), the Region-based Fully Convolutional Network (R-FCN), and the Single Shot Multibox Detector (SSD) together with deep feature extractors such as VGG net and Residual Network (ResNet). The authors state that in their case plain networks perform better than deeper networks. Their research has quite closely the same setup as this research: changes in the tomatoes were slight, and they did not use any special image capture setup. Fuentes et al (2017)

Zhang et al. proposed the automatic detection of road cracks and did not use any special image capture setup, only normal smartphones. Their problem is quite close to the one in this research, as cracks have low contrast with the surrounding pavement. Their approach used four convolutional layers, five max-pooling layers, and two fully connected layers, and with it they achieved almost 90% precision. Zhang et al (2019a)

Mohanty et al. used the common CNN architectures AlexNet and GoogLeNet to identify plant diseases. Their research dataset was collected using normal smartphones, and GoogLeNet consistently outperformed AlexNet. Additionally, their results show that using color images instead of grayscale images yields better results. Mohanty et al (2016)

Chen and Jahanshahi proposed a deep learning framework based on a convolutional neural network (CNN) and a naive Bayes data fusion scheme, called NB-CNN, to analyse individual video frames for crack detection. Their use case also involves very small and difficult-to-detect changes in metallic surfaces. Their approach had four convolutional and pooling layers with two fully connected layers, and with it 99% of the defects were recognized successfully. Chen and Jahanshahi (2017)

Kumar et al. used CNNs to identify defects in sewer pipes. The defects they looked for did not differ significantly from pipes in decent condition. Their approach had two convolutional layers and two fully connected layers and achieved 86% accuracy. Kumar et al (2018)

Past research has shown that various CNN architectures can be used to identify slight differences in images. Some of them are very deep, some are standard architectures, and some have been specially developed for certain use cases. As shown in Table 1, past research has used both existing CNNs and custom CNNs. However, there has been very little research on the use of CNNs to recognize colors in printed sources.
Table 1 Related past research (article and CNN architectures used)
- Evaluation of deep learning approaches based on convolutional neural networks for corrosion detection, Atha and Jahanshahi (2018b): VGG, GoogLeNet, ResNet, AlexNet, ZFNet
- Solving Current Limitations of Deep Learning Based Approaches for Plant Disease Detection, Arsenovic et al (2019): AlexNet, VGG, Inception version 3, DenseNet, ResNet152
- Using convolutional neural network models illumination estimation according to light colors, Büyükarıkan and Ülker (2022): VGG
- Automated color detection in orchids using color labels and deep learning, Apriyanti et al (2021): VGG16, Inception, Resnet50, Xception, Nasnet
- Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks, Cha et al (2017): Custom
- Convolutional Neural Networks for the Automatic Identification of Plant Diseases, Boulent et al (2019): ResNet-101, DenseNet-121, VGG-16, Inception-V3, SqueezeNet
- Color representation in deep neural networks, Engilberge et al (2017): VGG-19, AlexNet
- Rock images classification by using deep convolution neural network, Cheng and Guo (2017): Custom
- Deep Convolutional Neural Networks for image based tomato leaf disease detection, Anandhakrishnan and Jaisakthi (2022): Custom
- Three-channel convolutional neural networks for vegetable leaf disease recognition, Zhang et al (2019a): Custom
- Early fire detection using convolutional neural networks during surveillance for effective disaster management, Muhammad et al (2018): Custom AlexNet
- Using Deep Convolutional Neural Network for oak acorn viability recognition based on color images of their sections, Przybyło and Jabłoński (2019): CNN-F-16, CNN-F-13, CNN-F-9
- Vehicle color recognition using Multiple-Layer Feature Representations of lightweight convolutional neural network, Zhang et al (2018): Custom
- Color for object recognition: Hue and chroma sensitivity in the deep features of convolutional neural networks, Flachot and Gegenfurtner (2021): AlexNet, VGG-16 and VGG-19

3 System overview

The process as a whole is described in this section and its subsections. Figure 1 shows the process of creating the final dataset used in the research. Before the dataset is split into training and validation sets, image autolevelling, extraction of the colour areas, and calculation of the difference images are performed. The difference image is used in this research because the objective is to determine whether CNNs can correctly identify colour differences from paper white. The difference image is also feasible to use because the two areas being compared can be made equal in size. Another option would be to merge two colour images together; the merged image could then be used as a multichannel input to the convolutional neural network. However, this could lead to decision-making where individual pixel differences might play an overly important role. The use of multichannel images is left for future research.

Fig. 1 Dataset creation process

3.1 Colour dataset

This research uses two datasets. The first dataset is an open dataset that contains QR codes with different colour areas (Isohanni (2023)). The second dataset is a custom dataset created for this research with the same approach as the first dataset.
The different coloured areas in the QR codes are a) paper white, b) 100K black, and c) colour. Paper white and black are used for autolevelling, and paper white and colour are used to form the final dataset. In the first dataset, the colours vary across all CMYK colours, combinations, and saturations; the ink saturation in these images varies between 20% and 100% in steps of 20%. The second dataset, which was captured only for this research, extends the first dataset with colours of lower ink saturations of 0%, 5%, and 10%. Examples of the colours in the first dataset are shown in Figure 2.

Fig. 2 Colour examples, dataset 1.

Images in both datasets were captured with two standard mobile devices (different versions of the iPhone) in normal living/office ambient light environments. In practice, this means that the dataset contains images with different colour temperatures and ambient light, captured from different distances and with different camera settings. The purpose of the dataset is to reflect the various situations that might arise in everyday use. The different colours are grouped into folders, where the folder names define the colour combination and saturation of the coloured areas.

As can be seen from dataset one (Figure 2), all colours are quite clear. However, it can also be seen that the samples were printed with a laserjet, as stripes typical of laser printing appear when the camera autofocus is used in a specific setting. In Figure 3, the colour differences are no longer clear. Camera settings, such as autobalancing and adaptation to the capture environment, easily destroy some of the small colour difference information while improving the overall image. The yellow colour information (5Y and 10Y) is sometimes almost destroyed. This can be expected to make the classification of the images very challenging.

Fig. 3 Colour examples, dataset 2.

3.2 Preprocessing

Preprocessing of the images in the dataset is performed by identifying the darkest and whitest areas, marked with red and green rectangles in the image (Figure 4), and the colour area (Figure 4, yellow rectangle). These areas work as the input for auto-contrast. Auto-contrast is a mapping function that stretches the pixel intensities in the image to cover the entire range; in the case of RGB images, this range is [0 ... 255] on each channel. The darkest areas are mapped to the minimum intensity value (e.g. 0), and the whitest areas are mapped to the maximum intensity value (e.g. 255). Taylor (2003) Output image b) in Figure 4 is expected to have improved contrast and colour balance. The auto-contrast adjustment is calculated with the following equation:

I' = \frac{I - I_{min}}{I_{max} - I_{min}} \times 255

where I_max is the brightest value in the image and I_min is the darkest value in the image.
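A minimal sketch of the auto-contrast mapping defined above, written with NumPy under two stated assumptions: the stretch is applied per RGB channel, and I_min and I_max are taken from the channel extremes of the whole image rather than from the black and paper-white reference areas described in the paper; the function name is illustrative.

```python
import numpy as np

def auto_contrast(image: np.ndarray) -> np.ndarray:
    """Stretch pixel intensities to [0, 255] per channel:
    I' = (I - I_min) / (I_max - I_min) * 255."""
    img = image.astype(np.float32)
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        channel = img[..., c]
        i_min, i_max = channel.min(), channel.max()
        if i_max > i_min:
            out[..., c] = (channel - i_min) / (i_max - i_min) * 255.0
        else:
            out[..., c] = channel          # flat channel: nothing to stretch
    return np.clip(out, 0, 255).astype(np.uint8)
```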
Preprocessing of the whole image is concluded with Gaussian blurring of the image to eliminate noise. Gaussian blurring is performed by applying a Gaussian filter to the image; this is a convolution operation involving a Gaussian kernel. In this research, the kernel used is 5x5 pixels in size. Blurring makes edges and fine details less pronounced; such edges appear especially in laser-printed images. Chauhan (2018)

After blurring, two areas are extracted from the image: the white area (Figure 4, red rectangle) and the colour area (Figure 4, green rectangle). These colour areas are resized to 256 x 256 px so that their size is close to the input size used by CNNs.

Fig. 4 Different colour areas of the QR code and results of the auto-levelling.

Then, as the last step, the difference image between white and colour is calculated pixel by pixel. The formula for calculating the difference image is

D(x, y) = |I_1(x, y) - I_2(x, y)|

where D(x, y) represents the difference value for the pixel at position (x, y) in the difference image, I_1(x, y) is the colour value of the pixel at position (x, y) in the first RGB image, and I_2(x, y) is the colour value of the pixel at position (x, y) in the second RGB image. The absolute value is used to ensure that the difference is positive, as the actual difference could be negative if I_1(x, y) is smaller than I_2(x, y). Taking the absolute value ensures that the final image reflects the magnitude of the difference, not its direction. The difference image is stored as an RGB image in a folder, where the folder name represents the label of the image.

3.3 Transfer learning

As this research has two different datasets that are quite similar but whose classes represent different colours, the CNN is first trained with the larger dataset; the learned knowledge is then transferred to a second model, which is trained with the smaller dataset.

Transfer learning (Figure 5) is a pivotal technique that leverages preexisting knowledge to enhance the performance of deep learning models. Transfer learning involves reusing neural network architectures and their learned representations from one task and adapting them for a new, related task. This process significantly reduces the need for extensive labelled data and training resources. Pan and Yang (2009)

Computer vision tasks, such as image classification and object detection, often demand large datasets and significant computational power for training deep neural networks from scratch. Transfer learning mitigates these challenges by allowing training to start from pretrained models, which learn features from larger, usually closely related datasets. Some past studies that have successfully used transfer learning in computer vision-related tasks include those by Ravishankar et al (2016), Wu et al (2018), Varde et al (2023) and Kentsch et al (2020).

The key to successful transfer learning lies in choosing the appropriate pretrained model and understanding how to adapt it effectively. Challenges regarding the use of transfer learning are related to negative transfer Brodzicki et al (2020).
Negative transfer can occur, for example, if the source and target domains are too dissimilar, if the source data are biased or unrepresentative of the target data, or if the representation learned in the source task is not suitable for the target task. Li et al (2022)

Fig. 5 Example of transfer learning, adapted from Pan and Yang (2009)

4 Used CNN architectures

Convolutional neural networks (CNNs) are a class of deep learning models designed for processing structured grid data. All CNNs are built from the same elements: convolutional layers, activation functions, pooling operations and flattening.

The core building block of a CNN is the convolutional layer. Convolutional operations involve sliding a set of filters (kernels) over the input data, extracting local features, and creating feature maps. These layers capture hierarchical features through the network. Alzubaidi et al (2021)

Different activation functions have been developed as part of CNN research. Currently, nonlinear activations, such as rectified linear units (ReLUs), are the most efficient. They are applied after convolutional operations to introduce nonlinearity and enable the model to learn complex patterns. Alzubaidi et al (2021)

Pooling layers (e.g., max pooling) downsample the spatial dimensions of the feature maps, reducing computational complexity and providing a form of translation invariance. Pooling layers help retain the most important information while discarding less relevant details. Alzubaidi et al (2021)

After the convolutional and pooling layers, the spatial information is flattened into a vector, and fully connected layers are employed. These layers connect every neuron to every neuron in the subsequent layer, enabling the model to learn high-level abstractions. Alzubaidi et al (2021)

The following subsections provide an overview of the CNN architectures that are most important in relation to this research.

4.1 AlexNet and ZFNet

AlexNet (Figure 6) is a large, deep convolutional neural network. It has 60 million parameters and 500,000 neurons. The architecture of AlexNet consists of five convolutional layers, some of which are followed by max-pooling layers, and finally the architecture ends with a 1000-way softmax layer. AlexNet uses a regularization method to reduce overfitting. ZFNet is an improved version of AlexNet: it adjusts the filter sizes, using smaller filters (7x7 for the first layer and 3x3 for subsequent layers), and uses a smaller stride in the first layer. These modifications aim to capture finer details in the images. Krizhevsky et al (2017); Zeiler and Fergus (2014)

Fig. 6 AlexNet architecture Han et al (2017).

4.2 VGG

VGG (Visual Geometry Group), presented in Figure 7, has 16-19 layers. VGG is a convolutional neural network (CNN) architecture designed for image classification tasks; it was developed by researchers at the University of Oxford's Visual Geometry Group. VGGs come in different versions with varying depths: for example, VGG16 has 16 layers and VGG19 has 19 layers, the number representing how deep the network is. More layers can capture more complex features but require more computation. Simonyan and Zisserman (2014) One of the key features of VGG is its simplicity and depth.
Unlike some other CNN architectures, VGG uses a straightforward structure.

Fig. 7 VGG architecture Paul et al (2020).

4.3 ResNet

ResNet (Residual Network, Figure 8) is an architecture of up to 152 layers that has been developed to maximize network depth while limiting complexity. ResNet introduces "residual blocks," which allow information to bypass one or more layers, creating shortcut connections. These connections help gradients flow more effectively during training, enabling the training of extremely deep networks. He et al (2016). Being a very deep network, ResNet outperforms many networks, but it can be complex to train and can use considerable amounts of memory.

Fig. 8 ResNet architecture He et al (2016)

4.4 GoogLeNet

GoogLeNet (Figure 9) is a deep convolutional neural network (CNN) architecture with 22 layers. It is known for its efficiency and performance in image recognition tasks. The key innovation in GoogLeNet is the use of "Inception modules," which are blocks of layers that perform multiple types of convolutions simultaneously, allowing the network to capture diverse features at different scales. This architecture helps reduce computational complexity while maintaining high accuracy. Szegedy et al (2015) The strength of GoogLeNet is that it increases depth and width without sacrificing computational efficiency.

Fig. 9 Simplified GoogLeNet Kishore and Singh (2015)

4.5 DenseNet

DenseNet (Figure 10), or Densely Connected Convolutional Network, is a deep learning architecture that maximizes information flow between layers. Unlike traditional networks, DenseNet connects each layer to every other layer in a dense manner. This dense connectivity enables efficient feature reuse and gradient propagation, resulting in highly efficient and accurate models with fewer parameters. DenseNet has been used, for example, in comparison experiments evaluating different training strategies. Zhu and Newsam (2017)

Fig. 10 DenseNet architecture Kim and Jang (2023)

4.6 EfficientNet

The EfficientNet (Figure 11) models are designed to achieve state-of-the-art accuracy while being computationally efficient, making them suitable for a wide range of applications, including devices with limited resources. EfficientNetB1, which is used in this research, is one of the smallest models in the EfficientNet series, balancing model size and performance. The "B1" indicates its relative scale within the EfficientNet family; larger variants, denoted B2, B3, etc., have more parameters and are computationally more demanding but may offer improved accuracy. EfficientNet architectures are characterized by a compound scaling approach, where the model's depth, width, and resolution are scaled together to find an optimal balance. This scaling strategy allows EfficientNet models to achieve better performance than traditional architectures with similar computational costs. Tan and Le (2019)

Fig. 11 EfficientNet architecture Ahmed and Sabab (2022)

4.7 Comparison of the algorithms

A summary of the architectures used, their layers, and the number of trainable parameters is shown in Table 2.
Table 2 Summary of the CNN architectures used
Architecture     Year   Depth       Layers    Trainable Param.
AlexNet          2012   Deep        8         ~60 million
VGG-16           2014   Very Deep   16-19     ~138 million
GoogLeNet        2014   Deep        22        ~6.7 million
ResNet           2015   Very Deep   50-152    ~25.6 million
ZFNet            2013   Deep        8         ~62 million
DenseNet         2017   Very Deep   121-169   ~28.9 million
EfficientNetB1   2019   Very Deep   ~66       ~7 million

Based on past research, several popular CNN architectures have been compared to each other. These results offer some indications and guidelines for this research.

Naseer et al. demonstrated that AlexNet outperforms LeNet, VGG16, ResNet-50, and Inception-V1 in the detection of lung cancer (LUNA16 dataset). The lung cancer dataset contains CT scan images with different shapes and colors that show whether the patient has lung cancer. Additionally, Naseer et al. showed that on their dataset AlexNet works best when the SGD optimizer is used. Naseer et al (2022)

Neris et al. compared the ResNet50, AlexNet, VGG19, MobileNet, and DenseNet architectures on different datasets, namely the MASATI dataset and an airplanes dataset. The MobileNet and AlexNet architectures achieved the best performances on the MASATI dataset, while on the airplane dataset ResNet was the best architecture. Neris et al (2021)

Saikia et al. tested the performance of VGG16, VGG19, ResNet-50, and GoogLeNet-V3 on FNAC cell sample images. Their approach used only one channel, red, and thresholding methods to eliminate noise in the images. In their research, a fine-tuned GoogLeNet performed best. Saikia et al (2019)

Ahad et al. compared different CNNs for rice disease classification. Their use case is quite close to the one in this research: rice leaves are coloured mostly green, but diseases show up as differently coloured areas on the leaves, for example brown or yellow. Ahad et al. used the DenseNet121, Inceptionv3, MobileNetV2, ResNet101, ResNet152V, and Xception CNNs. Most of the models had over 95% accuracy. Ahad et al (2023)

The methods used by Maeda-Gutiérrez et al. were quite similar to those used by Ahad et al., as they focused on tomato leaves. They used AlexNet, GoogleNet, InceptionV3, ResNet18, and ResNet50. All the CNNs used had an accuracy of more than 98%, and GoogleNet was the best. Maeda-Gutiérrez et al (2020)

The performance of 21 different CNN architectures for detecting COVID-19 was evaluated by Breve. In his research, he used a dataset that contained chest X-ray images; in some of the classification cases, the difference between positive and negative classifications was small. He showed that DenseNet169 outperformed the other CNNs. Breve (2022)

Bressem et al. also used chest radiographs as their dataset. They used the ResNet, DenseNet, VGG, SqueezeNet, Inception v4, and AlexNet architectures to classify images. Interestingly, shallow networks can compete with deeper and more complex CNNs: specifically, they showed that the eight-layer AlexNet can achieve the same results as the 121-layer DenseNet. Bressem et al (2020)

Based on the results of past research, it can be expected that the CNN architectures will perform quite equally in this use case. A deep architecture might not perform better than a shallow one, but training the deep architecture requires more time and memory.
Past research has shown that AlexNet, GoogLeNet, and DenseNet perform well in classification tasks that are closely related to the datasets used in this research.

5 Experiments

The experiments were first run with all the architectures mentioned in the previous sections. For training, 30 epochs were used, and the total training time was measured when the training process was run with an Apple M2 processor with 8 cores and a total of 16 GB of RAM. Only the standard version of each architecture was used, and optimization of hyperparameters such as momentum, learning rate, and batch size was not performed. The purpose of the experiments was to determine which of the architectures provides the best baseline for further development. Only the required top layers were changed to match the number of labels.

Data augmentation was used to extend the training set; the parameters for the augmentation were as follows (a sketch of this training setup is given at the end of this section):

- Rotation of each image by 90 degrees
- Width shift = 0.2: the image is shifted by a fraction of 0.2 in the horizontal direction
- Height shift = 0.2: the image is shifted by a fraction of 0.2 in the vertical direction
- Shear range = 0.2: shear is a transformation where the image is skewed, the left edge of the image moving up and the right edge down
- Zoom range = 0.2: the image is randomly zoomed in or out by up to 20%
- Horizontal flip: the image is flipped along its horizontal axis
- Vertical flip: the image is flipped along its vertical axis

In the first experiment, each CNN was trained with 80% of the images in the dataset, and 20% of the images were used for validation: in total, 5051 images were used for training and 1249 for validation. In the second experiment, the split between the training and validation sets was kept the same, with 1488 images in the training set and 372 in the validation set.

All CNN models were compiled with the cross-entropy loss function and the Adam optimizer; other compilation options were excluded from the scope of this research. Cross-entropy is commonly used as a loss function in classification tasks. Its goal is to minimize the cross-entropy between the predicted probabilities and the true labels. In practice, this means that during the experiments cross-entropy yields different values depending on how many of the images are classified into the correct class Shore and Johnson (1981). In total, dataset one had 25 classes and dataset two had 11 classes.

Experiments were run for both datasets: a) the dataset with larger colour differences and b) the dataset with smaller differences. For the latter dataset, transfer learning was used: the model from the first training process was used as a pretrained model containing the weights and biases learned from the first dataset. The model was customized by setting its trainable property to false, which prevents the weights in the nontrainable layers from being updated. Then, the top layers and fully connected layers were changed to match the classes of the second dataset.

The performance of each model was evaluated with the following metrics.

- Accuracy, the ratio of correctly predicted instances to the total instances:

\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}

Accuracy was calculated for the whole architecture and used as the main metric when the architectures were compared. Additionally, a confusion matrix was created for a detailed breakdown of correct and incorrect predictions for each class. From the confusion matrix, further metrics were derived:

- Precision (positive predictive value), the ratio of correctly predicted positive observations to the total predicted positives:

\text{Precision} = \frac{TP}{TP + FP}

- Recall (sensitivity or true positive rate), the ratio of correctly predicted positive observations to all actual positives:

\text{Recall} = \frac{TP}{TP + FN}

- F1-score, the weighted average of precision and recall, providing a balance between the two metrics:

F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}

where TP is true positives, FP is false positives, FN is false negatives and TN is true negatives. Precision, recall and F1-score were mostly used to analyze performance in more detail and to find out where each architecture makes mistakes or works well.
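The experimental setup described in this section (augmentation, 80/20 split, cross-entropy with Adam, and the frozen-base transfer-learning variant) can be sketched roughly as follows. This is a minimal illustration assuming Keras/TensorFlow, whose ImageDataGenerator uses the same augmentation parameter names as the list above; the base architecture, directory name, image size and class count are placeholders rather than the authors' exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation parameters as listed in the text (rotation, shifts, shear, zoom, flips).
datagen = ImageDataGenerator(
    rotation_range=90,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    validation_split=0.2,          # 80/20 train/validation split
)

# Placeholder base network; any of the compared architectures could be used here.
base = tf.keras.applications.ResNet50(include_top=False, weights=None,
                                      input_shape=(256, 256, 3), pooling="avg")

def build_classifier(num_classes, freeze_base=False):
    # In the transfer-learning run the pretrained layers are frozen and only
    # the new top layer is trained on the second dataset.
    base.trainable = not freeze_base
    model = models.Sequential([base, layers.Dense(num_classes, activation="softmax")])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_classifier(num_classes=25)          # dataset one: 25 classes
model.fit(datagen.flow_from_directory("dataset1", target_size=(256, 256),
                                      subset="training"),
          epochs=30)
```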
6 Results

This section shows the results of the experiments performed for both datasets.

6.1 First dataset

Experiment 1 was conducted on the dataset with large colour differences, and all architectures were used. The following table shows the total accuracy of each architecture; the accuracy is the final validation accuracy achieved after 30 epochs of training.

Table 3 Results of the first experiment
Architecture     avg. train time / epoch (s)   Accuracy
DenseNet         690                           1.00
ResNet           626                           0.98
VGG-16           570                           0.97
AlexNet          110                           0.93
EfficientNetB1   359                           0.90
ZFNet            340                           0.88
GoogLeNet        1639                          0.87

All the architectures achieve high accuracy, which means that all of them are suitable for colour difference recognition on the first dataset, where the colour differences are larger. The results show that very deep networks perform better than deep networks; most of the best networks have a large number of layers, except for AlexNet, which has only eight layers. AlexNet is also the fastest to train per epoch. Based on the experiment, DenseNet, ResNet, VGG-16 and AlexNet seem to be able to classify colours well, as they reach accuracies of 93% or higher. These architectures each take their own approach to building a CNN, but as seen from the following figures, they reach accuracies at or close to 100% during the training process. The four best architectures respond well to training, as shown in Figure 12, which shows the evolution of their training and validation accuracy.

Fig. 12 Training and validation results of the best architectures

Several conclusions can be drawn from the confusion matrices. The best architecture, DenseNet, classifies all the images correctly. The other architectures that achieved high accuracy make the following confusions: ResNet confuses shades of magenta and cyan in some cases, VGG-16 sometimes has problems when classifying yellow shades, and AlexNet mostly recognizes all classes well, although some dark shades of gray are mixed and magenta is sometimes recognized as black. Some examples of these confusions are shown in Figure 13.
The first experiment yielded promising results: the selected CNNs could be used in the given use case to recognize colour differences. These differences are quite significant in dataset one; therefore, further experiments were performed with dataset two, where the differences are smaller. In the first dataset, there seems to be no indication of overfitting of the models.

6.2 Second dataset

For the second experiment, the four best CNN architectures were used; these CNNs obtained their weights and biases from the first dataset through transfer learning. In the second experiment, the same training and validation process was used. Table 4 summarizes the time used per epoch and the final accuracy of the models.

Table 4 Results of the second experiment
CNN        avg. train time / epoch (s)   Accuracy
VGG-16     690                           0.34
AlexNet    13                            0.47
DenseNet   201                           0.77
ResNet     725                           0.95

The evolution of the model training process is shown in Figure 14.

Fig. 14 Training and validation results of the second experiment

AlexNet benefits from the use of more epochs; its training accuracy improves over time, and the training loss decreases, although it seems to settle in the final epochs. Overall, the performance of AlexNet decreases significantly on the second dataset. Its training and validation accuracies follow each other, so the model learns throughout the process, which also supports the use of more epochs. The training accuracy of VGG-16 improves throughout the training process but not substantially (from about 0.2 to 0.3), while its training loss decreases only slowly; the model does not learn well enough. Even with more epochs, VGG-16 will not reach as high an accuracy as the better models. Additionally, its validation loss does not seem to follow the training loss. DenseNet is the second most accurate of all the architectures in the second experiment. The model learns from the data throughout the process, and its training accuracy improves. However, although the validation accuracy increases, it behaves unpredictably. Additionally, the model validation loss does not decrease after the first epochs. Apart from ResNet, DenseNet might be the most suitable model for the presented use case. ResNet achieves high accuracy in the first epochs, which shows that transfer learning works well. The accuracy of the model does not improve much after ten epochs, and the training loss seems to stabilize before ten epochs. This indicates that ResNet is prone to overfitting; a solution for this could be to implement early stopping.

According to the confusion matrix and further investigation of the per-class performance of the models (Tables 5 and 6), most of the models can classify items correctly if there is no color present (class = 0) or when there is at least 10% intensity of some color. However, when there is only 5% intensity, different classes are confused.
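Before turning to the per-class results in Tables 5 and 6, the following sketch illustrates the transfer-learning setup described in Section 5: the model trained on the first dataset is frozen and only a new classification head for the 11 classes of the second dataset is trained. The saved-model path and the layer from which the features are taken are assumptions for illustration.

```python
# Hedged sketch of the transfer-learning step: reuse the weights learned on
# dataset one and train only a new head for the 11 classes of dataset two.
import tensorflow as tf

pretrained = tf.keras.models.load_model("models/dataset_one_model.keras")  # assumed path

# Freeze everything learned on the first dataset.
pretrained.trainable = False

# Drop the old classification head and attach a new one for 11 classes.
features = pretrained.layers[-2].output      # assumed: last hidden layer before the old head
new_head = tf.keras.layers.Dense(11, activation="softmax", name="dataset_two_head")(features)
model = tf.keras.Model(pretrained.input, new_head)

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_gen_two, validation_data=val_gen_two, epochs=30)
```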
Table 5 AlexNet and VGG-16 results from the second experiment
         AlexNet                      VGG-16
Class    precision  recall  f1-score  precision  recall  f1-score
0        0.00       0.00    0.00      0.48       0.90    0.63
10C      1.00       1.00    1.00      0.19       0.75    0.30
10CY     1.00       0.81    0.89      0.04       0.02    0.02
10K      1.00       1.00    1.00      0.71       0.05    0.10
10M      1.00       0.50    0.67      0.00       0.00    0.00
10Y      0.00       0.00    0.00      0.30       0.42    0.35
5C       0.20       1.00    0.33      0.00       0.00    0.00
5CY      0.00       0.00    0.00      1.00       0.60    0.75
5K       0.60       0.67    0.63      1.00       1.00    1.00
5M       0.00       0.00    0.00      1.00       0.04    0.08
5Y       0.00       0.00    0.00      0.00       0.00    0.00

Table 6 DenseNet and ResNet results from the second experiment
         DenseNet                     ResNet
Class    precision  recall  f1-score  precision  recall  f1-score
0        0.33       0.50    0.40      1.00       0.89    0.94
10C      1.00       0.44    0.62      1.00       1.00    1.00
10CY     1.00       1.00    1.00      1.00       1.00    1.00
10K      1.00       1.00    1.00      1.00       1.00    1.00
10M      1.00       1.00    1.00      0.71       1.00    0.83
10Y      1.00       1.00    1.00      1.00       1.00    1.00
5C       1.00       0.67    0.80      1.00       1.00    1.00
5CY      0.69       1.00    0.82      1.00       1.00    1.00
5K       1.00       0.89    0.94      1.00       1.00    1.00
5M       0.38       0.33    0.35      0.97       0.46    0.62
5Y       0.50       0.67    0.57      0.86       1.00    0.92

Based on the results presented, the best accuracy can be achieved with the ResNet architecture, which had the best performance in both experiments. ResNet can accurately classify all colors other than very low-intensity yellow and magenta. The mistakes that ResNet makes are that 5% magenta images are classified as 10% magenta, and vice versa. Additionally, 5% yellow, which is quite challenging for the human eye to recognize, is classified as paper white.

7 Discussion

All the selected CNNs proved that they can be used for the recognition of color differences, especially those with large intensity differences (>10%). When the color difference decreases to only 5% or 10% (Figure 15) in some CMYK channels, most architectures struggle to classify images correctly.

Fig. 15 Examples of small colour differences

The results show that different CNN architectures can be used for color difference recognition. Even the worst-performing architecture, GoogLeNet, achieved an accuracy of 88% in the first experiment. Deeper CNNs can usually identify more complex features, but they are more difficult to train. Wider networks can capture more fine-grained features, but they have difficulties with high-level features (Tan and Le 2019).

In this use case, it is interesting that the depth of the network and the number of parameters are directly correlated with the accuracy of the network when the colour difference is large. All the very deep networks manage experiment one well, except for EfficientNetB1. When the color differences decrease, the background noise from the paper and printer also becomes more significant. Another issue is that when images go through autoleveling, which should improve the colors of the image and make the model more generalized for different ambient light conditions, colors close to paper-white might lose some important color information. In this experiment, only one of the models achieved an accuracy greater than 90% when the color difference was as small as 5%. The best of the architectures was ResNet, with 95% accuracy. Most of the architectures were able to classify images correctly if the difference was 10%, but when the difference decreased to 5%, all architectures had problems.
This research did not use K-fold cross-validation for validating the performance of the models; using K-fold cross-validation would help to increase the reliability of the results obtained (Wong and Yeh 2019). The two best architectures, ResNet and DenseNet, both use an approach that connects layers not only to the previous and following layers. DenseNet uses dense connections, where the architecture connects each layer to all other layers separately in a feed-forward manner (Zhu and Qiu 2021), whereas ResNet allows data to flow from earlier layers directly to subsequent layers (Zhang et al 2019b). These connectivity types seem to make a large difference in performance when the model is trying to classify items based on small differences.

Similar research performed in the past could have shown improved results if CNNs had been used. In agriculture, Kumar et al. used the KNN algorithm to identify three colours (yellow, blue and green) in their research. They stated that KNN was the best machine learning method for colour classification (Kumar et al 2022). In addition to artificial neural network (ANN) classifiers and support vector machine (SVM) classifiers, KNN was also used in research by Kurtulmus et al., who reported problems with fruit colors that were close to each other and could not reach an accuracy greater than 90% (Kurtulmus et al 2014). Aznan et al. also used ANNs to classify rice seeds; however, with 40 hidden neurons, their models reached an accuracy of only approximately 70% (Aznan et al 2017). The results presented by Kumar et al., Kurtulmus et al. and Aznan et al. not only show that various approaches can reach high accuracy in colour classification but also that none of them use convolutional neural networks. Based on the results presented in this research, convolutional neural networks might be used to achieve better results in agriculture and food production when items are classified based on their colors.

In the context of clothes sorting, Furferi and Governi presented a method using a self-organizing feature map (SOFM) and a feed-forward backpropagation artificial neural network (FFBP ANN)-based approach to classify clothes into 85 classes. Their research is quite old but already showed only a 5% mean error; they had problems mainly with beige and brown colours, which the ResNet approach presented in this research might overcome (Furferi and Governi 2008).

8 Conclusion

This research focuses on solving the following question: can convolutional neural networks be used to recognize small color differences in printed sources? The source data for the project contained various images, which store the difference information between paper white and the source color. The colors used varied among the different CMYK colors and intensities. The smallest intensity difference used was 5%. The nature of the problem is prone to overfitting, as the images that were used contain only the difference between color and paper-white, not actual high-level features; noise and small changes in the images easily make the used models nongeneric. This makes it challenging for CNN models to learn from the data. When the CNNs are compared in this research, making them deeper or wider does not automatically guarantee better results. The results of the research show that the ResNet architecture can be used to identify small color differences.
The difference between ResNet and the other architectures is that ResNet implements residual connections. These connections enable the network to learn residual functions rather than directly learning the desired underlying mapping. In practice, this means that each layer can focus on learning the incremental changes needed to transform the input. The skip connections in ResNets facilitate the reuse of features from earlier layers in the network. This is advantageous for recognizing small differences because it allows the network to preserve and leverage low-level features that may be relevant for distinguishing between similar classes or categories in the data. As the best architecture, ResNet achieved 95% accuracy in the small-difference experiment, and when the colour difference was larger (over 10%), ResNet's accuracy was close to 100%.

Further research should be conducted with different ResNet modifications and versions, as well as with different training parameters (such as different optimizers). However, more research on improving the source data is also needed. This applies especially to low-intensity images, where the color information is destroyed by noise, preprocessing, or camera settings; such research would also make the solution more generic. Future research could address this problem through an ablation study and look into a more detailed explanation of why the different architectures work (Sheikholeslami 2019).

Acknowledgements This work was supported by the Finnish Cultural Foundation's Central Ostrobothnia Regional Fund (Grant Number 25211242).

Data availability One of the datasets used in this manuscript is available as a Zenodo repository: 10.5281/zenodo.7749912.

Conflict of interest There are no conflicts of interest.

References

Ahad MT, Li Y, Song B, et al (2023) Comparison of cnn-based deep learning architectures for rice diseases classification. Artificial Intelligence in Agriculture 9:22–35

Ahmed HO, Nandi AK (2021) Connected components-based colour image representations of vibrations for a two-stage fault diagnosis of roller bearings using convolutional neural networks. Chinese Journal of Mechanical Engineering 34:1–21

Ahmed T, Sabab NHN (2022) Classification and understanding of cloud structures via satellite images with efficientunet. SN Computer Science 3:1–11

Albani D, Youssef A, Suriani V, et al (2017) A deep learning approach for object recognition with nao soccer robots. In: Behnke S, Sheh R, Sarıel S, et al (eds) RoboCup 2016: Robot World Cup XX. Springer International Publishing, Cham, pp 392–403

Alzubaidi L, Zhang J, Humaidi AJ, et al (2021) Review of deep learning: Concepts, cnn architectures, challenges, applications, future directions. Journal of Big Data 8:1–74

Anandhakrishnan T, Jaisakthi S (2022) Deep convolutional neural networks for image based tomato leaf disease detection. Sustainable Chemistry and Pharmacy 30:100,793

Apriyanti DH, Spreeuwers LJ, Lucas PJ, et al (2021) Automated color detection in orchids using color labels and deep learning. PloS one 16(10):e0259,036

Arsenovic M, Karanovic M, Sladojevic S, et al (2019) Solving current limitations of deep learning based approaches for plant disease detection. Symmetry 11(7):939

Atha DJ, Jahanshahi MR (2018a) Evaluation of deep learning approaches based on convolutional neural networks for corrosion detection. Structural Health Monitoring 17(5):1110–1128.
Acta Wasaensia 159 Springer Nature 2021 LATEX template Using convolutional neural networks to classify subtle colour differences 25 https://doi.org/10.1177/1475921717737051 Atha DJ, Jahanshahi MR (2018b) Evaluation of deep learning approaches based on convolutional neural networks for corrosion detection. Structural Health Monitoring 17(5):1110–1128 Aznan A, Ruslan R, Rukunudin I, et al (2017) Rice seed varieties identifica- tion based on extracted colour features using image processing and artificial neural network (ann). Int J Adv Sci Eng Inf Technol 7(6):2220–2225 Bimler DL, Kirkland J, Jameson KA (2004) Quantifying variations in per- sonal color spaces: Are there sex differences in color vision? Color Research & Application: Endorsed by Inter-Society Color Council, The Colour Group (Great Britain), Canadian Society for Color, Color Science Association of Japan, Dutch Society for the Study of Color, The Swedish Colour Cen- tre Foundation, Colour Society of Australia, Centre Français de la Couleur 29(2):128–134 Boulent J, Foucher S, Théau J, et al (2019) Convolutional neural networks for the automatic identification of plant diseases. Frontiers in plant science 10:941 Bressem KK, Adams LC, Erxleben C, et al (2020) Comparing different deep learning architectures for classification of chest radiographs. Scientific reports 10(1):13,590 Breve FA (2022) Covid-19 detection on chest x-ray images: A compari- son of cnn architectures and ensembles. Expert systems with applications 204:117,549 Brodzicki A, Piekarski M, Kucharski D, et al (2020) Transfer learning methods as a new approach in computer vision tasks with small datasets. Foundations of Computing and Decision Sciences 45(3):179–193 Büyükarıkan B, Ülker E (2022) Using convolutional neural network models illumination estimation according to light colors. Optik 271:170,058. https: //doi.org/https://doi.org/10.1016/j.ijleo.2022.170058 Cha YJ, Choi W, Büyüköztürk O (2017) Deep learning-based crack damage detection using convolutional neural networks. Computer-Aided Civil and Infrastructure Engineering 32(5):361–378 Chauhan MS (2018) Optimizing gaussian blur filter using cuda parallel frame- work. Information Technology Department, College of Applied Sciences Ibri, Sulatanate of Oman 160 Acta Wasaensia Springer Nature 2021 LATEX template 26 Using convolutional neural networks to classify subtle colour differences Chen FC, Jahanshahi MR (2017) Nb-cnn: Deep learning-based crack detec- tion using convolutional neural network and naïve bayes data fusion. IEEE Transactions on Industrial Electronics 65(5):4392–4400 Cheng G, Guo W (2017) Rock images classification by using deep convolution neural network. In: Journal of Physics: Conference Series, IOP Publishing, p 012089 Engilberge M, Collins E, Süsstrunk S (2017) Color representation in deep neu- ral networks. In: 2017 IEEE International Conference on Image Processing (ICIP), IEEE, pp 2786–2790 Flachot A, Gegenfurtner KR (2021) Color for object recognition: Hue and chroma sensitivity in the deep features of convolutional neural networks. Vision Research 182:89–100 Fuentes A, Yoon S, Kim SC, et al (2017) A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition. Sensors 17(9). https://doi.org/10.3390/s17092022 Furferi R, Governi L (2008) The recycling of wool clothes: an artificial neural network colour classification tool. 
The International Journal of Advanced Manufacturing Technology 37:722–731 Hakola L, Vehmas K, Smolander M (2021) Functional inks and indicators for smart tag based intelligent packaging applications. Journal of Applied Packaging Research 13(2):3 Han X, Zhong Y, Cao L, et al (2017) Pre-trained alexnet architecture with pyramid pooling and supervision for high spatial resolution remote sensing image scene classification. Remote Sensing 9(8). https://doi.org/10.3390/ rs9080848 He K, Zhang X, Ren S, et al (2016) Deep residual learning for image recogni- tion. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778 Isohanni J (2022) Use of functional ink in a smart tag for fast-moving consumer goods industry. Journal of Packaging Technology and Research 6(3):187–198 Isohanni J (2023) Qr-code dataset, with colour embed inside. https://doi.org/ 10.5281/zenodo.7749912 Jhawar J (2016) Orange sorting by applying pattern recognition on colour image. Procedia Computer Science 78:691–697. https://doi.org/https://doi. org/10.1016/j.procs.2016.02.118, 1st International Conference on Informa- tion Security & Privacy 2015 Acta Wasaensia 161 Springer Nature 2021 LATEX template Using convolutional neural networks to classify subtle colour differences 27 Kagaya H, Aizawa K, Ogawa M (2014) Food detection and recognition using convolutional neural network. In: Proceedings of the 22nd ACM Interna- tional Conference on Multimedia. Association for Computing Machinery, New York, NY, USA, MM ’14, p 1085–1088, https://doi.org/10.1145/ 2647868.2654970 Kentsch S, Lopez Caceres ML, Serrano D, et al (2020) Computer vision and deep learning techniques for the analysis of drone-acquired forest images, a transfer learning study. Remote Sensing 12(8). https://doi.org/10.3390/ rs12081287 Kim GI, Jang B (2023) Petroleum price prediction with cnn-lstm and cnn-gru using skip-connection. Mathematics 11(3):547 Kishore A, Singh S (2015) Natural language image descriptor. In: 2015 IEEE Recent Advances in Intelligent Computational Systems (RAICS), IEEE, pp 110–115 Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/ 10.1145/3065386 Kumar NS, Maheswari SU, Pramila P, et al (2022) Colour based object classifi- cation using knn algorithm for industrial applications. In: 2022 International Conference on Automation, Computing and Renewable Systems (ICACRS), IEEE, pp 1110–1115 Kumar SS, Abraham DM, Jahanshahi MR, et al (2018) Automated defect classification in sewer closed circuit television inspections using deep convo- lutional neural networks. Automation in Construction 91:273–283 Kurtulmus F, Lee WS, Vardar A (2014) Immature peach detection in colour images acquired in natural illumination conditions using statistical classifiers and neural network. Precision agriculture 15:57–79 Lai P, Westland S (2020) Machine learning for colour palette extraction from fashion runway images. International Journal of Fashion Design, Technol- ogy and Education 13(3):334–340. https://doi.org/10.1080/17543266.2020. 1799080 Lamb N, Chuah MC (2018) A strawberry detection system using convolutional neural networks. In: 2018 IEEE International Conference on Big Data (Big Data), pp 2515–2520, https://doi.org/10.1109/BigData.2018.8622466 Li J, Sun T, Lin Q, et al (2022) Reducing negative transfer learning via clustering for dynamic multiobjective optimization. 
IEEE Transactions on Evolutionary Computation 26(5):1102–1116 162 Acta Wasaensia Springer Nature 2021 LATEX template 28 Using convolutional neural networks to classify subtle colour differences Luo J, Zhang D (2003) Automatic colour printing inspection by image pro- cessing. Journal of Materials Processing Technology - J MATER PROCESS TECHNOL 139:373–378. https://doi.org/10.1016/S0924-0136(03)00534-X Maeda-Gutiérrez V, Galván-Tejada CE, Zanella-Calzada LA, et al (2020) Comparison of convolutional neural network architectures for classification of tomato plant diseases. Applied Sciences 10(4):1245 Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image- based plant disease detection. Frontiers in plant science 7:1419 Muhammad K, Ahmad J, Baik SW (2018) Early fire detection using convo- lutional neural networks during surveillance for effective disaster manage- ment. Neurocomputing 288:30–42. https://doi.org/https://doi.org/10.1016/ j.neucom.2017.04.083, learning System in Real-time Machine Vision Nalinipriya G, Baluswarny B, Patan R, et al (2018) To detect and recog- nize object from videos for computer vision by parallel approach using deep learning. In: 2018 International Conference on Advances in Computing and Communication Engineering (ICACCE), pp 336–341, https://doi.org/10. 1109/ICACCE.2018.8441718 Naseer I, Akram S, Masood T, et al (2022) Performance analysis of state-of- the-art cnn architectures for luna16. Sensors 22(12):4426 Neris R, Guerra R, López S, et al (2021) Performance evaluation of state- of-the-art cnn architectures for the on-board processing of remotely sensed images. In: 2021 XXXVI Conference on Design of Circuits and Integrated Systems (DCIS), IEEE, pp 1–6 Pan SJ, Yang Q (2009) A survey on transfer learning. IEEE Transactions on knowledge and data engineering 22(10):1345–1359 Paul R, Hassan MSu, Moros EG, et al (2020) Deep feature stability analy- sis using ct images of a physical phantom across scanner manufacturers, cartridges, pixel sizes, and slice thickness. Tomography 6(2):250–260 Przybyło J, Jabłoński M (2019) Using deep convolutional neural network for oak acorn viability recognition based on color images of their sections. Computers and electronics in agriculture 156:490–499 Przybyło J, Jabłoński M (2019) Using deep convolutional neural network for oak acorn viability recognition based on color images of their sections. Com- puters and Electronics in Agriculture 156:490–499. https://doi.org/https: //doi.org/10.1016/j.compag.2018.12.001 Acta Wasaensia 163 Springer Nature 2021 LATEX template Using convolutional neural networks to classify subtle colour differences 29 Ravishankar H, Sudhakar P, Venkataramani R, et al (2016) Understanding the mechanisms of deep transfer learning for medical images. In: Deep Learning and Data Labeling for Medical Applications: First International Workshop, LABELS 2016, and Second International Workshop, DLMIA 2016, Held in Conjunction with MICCAI 2016, Athens, Greece, October 21, 2016, Proceedings 1, Springer, pp 188–196 Saikia AR, Bora K, Mahanta LB, et al (2019) Comparative assessment of cnn architectures for classification of breast fnac images. Tissue and Cell 57:8–14 Senthilkumar M (2010) 5 - use of artificial neural networks (anns) in colour measurement. In: Gulrajani M (ed) Colour Measurement. 
Woodhead Pub- lishing Series in Textiles, Woodhead Publishing, p 125–146, https://doi.org/ https://doi.org/10.1533/9780857090195.1.125 Sheikholeslami S (2019) Ablation programming for machine learning Shore J, Johnson R (1981) Properties of cross-entropy minimization. IEEE Transactions on Information Theory 27(4):472–482 Simonyan K, Zisserman A (2014) Very deep convolutional networks for large- scale image recognition. arXiv preprint arXiv:14091556 Soukup D, Huber-Mörk R (2014) Convolutional neural networks for steel surface defect detection from photometric stereo images. In: International Symposium on Visual Computing, Springer, pp 668–677 Szegedy C, Liu W, Jia Y, et al (2015) Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1–9, https://doi.org/10.1109/CVPR.2015.7298594 Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning, PMLR, pp 6105–6114 Taylor GA (2003) Improving image contrast. American Journal of Roentgenol- ogy 180(2):329–331 Tiwari S (2018) An analysis in tissue classification for colorectal cancer his- tology using convolution neural network and colour models. International Journal of Information System Modeling and Design (IJISMD) 9(4):1–19 Varde AS, Karthikeyan D, Wang W (2023) Facilitating covid recognition from x-rays with computer vision models and transfer learning. Multimedia Tools and Applications pp 1–32 164 Acta Wasaensia Springer Nature 2021 LATEX template 30 Using convolutional neural networks to classify subtle colour differences Vidal-Calleja T, Miró JV, Martín F, et al (2014) Automatic detection and verification of pipeline construction features with multi-modal data. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp 3116–3122, https://doi.org/10.1109/IROS.2014.6942993 Wong TT, Yeh PY (2019) Reliable accuracy estimates from k-fold cross vali- dation. IEEE Transactions on Knowledge and Data Engineering 32(8):1586– 1594 Wu J, Zhang B, Zhou J, et al (2019) Automatic recognition of ripening toma- toes by combining multi-feature fusion with a bi-layer classification strategy for harvesting robots. Sensors 19(3). https://doi.org/10.3390/s19030612 Wu Y, Qin X, Pan Y, et al (2018) Convolution neural network based trans- fer learning for classification of flowers. In: 2018 IEEE 3rd international conference on signal and image processing (ICSIP), IEEE, pp 562–566 Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13, Springer, pp 818–833 Zhang Q, Zhuo L, Li J, et al (2018) Vehicle color recognition using multiple- layer feature representations of lightweight convolutional neural network. Signal Processing 147:146–153 Zhang S, Huang W, Zhang C (2019a) Three-channel convolutional neural net- works for vegetable leaf disease recognition. Cognitive Systems Research 53:31–41 Zhang W, Li X, Ding Q (2019b) Deep residual learning-based fault diagnosis method for rotating machinery. ISA transactions 95:295–305 Zhao Y, Shen Q, Wang Q, et al (2020) Recognition of water colour anomaly by using hue angle and sentinel 2 image. Remote Sensing 12(4). https://doi. org/10.3390/rs12040716 Zhu C, Zou B, Zhao R, et al (2017) Retinal vessel segmentation in colour fundus images using extreme learning machine. Computerized Medical Imaging and Graphics 55:68–77. 
https://doi.org/https://doi.org/10.1016/ j.compmedimag.2016.05.004, special Issue on Ophthalmic Medical Image Analysis Zhu D, Qiu D (2021) Residual dense network for medical magnetic resonance images super-resolution. Computer Methods and Programs in Biomedicine 209:106,330 Acta Wasaensia 165 Springer Nature 2021 LATEX template Using convolutional neural networks to classify subtle colour differences 31 Zhu Y, Newsam S (2017) Densenet for dense flow. In: 2017 IEEE international conference on image processing (ICIP), IEEE, pp 790–794 166 Acta Wasaensia INTERNATIONAL JOURNAL OF COMPUTERS AND APPLICATIONS 2025, VOL. 47, NO. 4, 341–355 https://doi.org/10.1080/1206212X.2025.2465727 Customised ResNet architecture for subtle color classification Jari Isohanni Digital Economy, University of Vaasa, Vaasa, Finland ABSTRACT This study addresses the challenge of recognizing subtle color differences, a problem critical to applications in fields such as healthcare, food production, and civil engineering. Specially research focusses on printed colors. The research evaluates multiple ResNet architectures, including ResNet-18, ResNet-34, and ResNet-50, to identify the most effective model for this task. Modifications to the ResNet-34 architecture are proposed, such as replacing average pooling with global max pooling and introducing max pooling layers within residual blocks, to enhance feature extraction and classification accuracy. The models were validated using a K-fold cross-validation, which confirms the effectiveness of the proposed approaches. The findings demonstrate the potential of these modifications to achieve high classification accuracy, showcasing their adaptability to real world scenarios. However, limitations such as the use of a specific dataset and the type of printer highlight the need for further research to generalize the approach across diverse datasets and conditions. ARTICLE HISTORY Received 24 May 2024 Accepted 5 February 2025 KEYWORDS Machine vision; color difference; printed colors; convolutional neural networks (CNN); ResNet; max pooling 1. Introduction ResNet (Residual neural network) by He et al. is a deep learn- ing architecture for image classification and computer vision tasks. Unlike traditional convolutional neural networks (CNNs), ResNet uses skip connections. These connections allow the neural net- work to learn efficiently through residual mappings. Before ResNet was published, deep networks had mechanisms which attempted to directly approximate the underlying mapping. ResNet architecture presented learning by residuals, the difference between the input and output of a given layer [1–3]. ResNet architecture emerged as an advance in deep learning, addressing critical challenges associated with training very deep neu- ral networks. By introducing skip connections, ResNet mitigates the vanishing gradient problem, enabling effective training of networks with hundreds or even thousands of layers. This not only improves the flow of gradients during backpropagation but also enhances the reuse of features across layers, leading to better representation learn- ing [4, 5]. ResNet’s ability to capture fine-grained details and subtle variations in data makes it a good choice for numerous computer vision applications, including image classification, object detection, and feature extraction. During the year 2024 over 41 000 articles mentioning ResNet were published, according to Google Scholar search. ResNet architecture adds shortcuts (skip connections) every two layers. 
This enables the layers of residual networks to learn from residual mappings, this learning helps in representing identity map- pings, and finally prevents networks from degradation as depth increases. The novelty of adding shortcuts every two layers is cru- cial, as previous research has shown that using skip connections on every or every third layer does not achieve the same performance [6]. The innovation of ResNet has sparked research around its approach, and there have been interest in modifying, extending, and adapting the ResNet architecture. Targ et al. presented an approach that adds convolutional layers and data paths to each layer [7]. Wide Residual Networks is a version of ResNet invented by Zagoruyko and CONTACT Jari Isohanni x2603813@student.uwasa.fi, jari.isohanni@gmail.com Digital Economy, University of Vaasa, Wolffintie 34, Vaasa 65200, Finland Komodakis in their research where they showed the importance of residual blocks. Wide residual networks can outperform deep net- works in some use cases, and their computational cost is lower [8]. Li and He explained how identity shortcut connections solve gradi- ent fading problems and proposed adjustable shortcut connections. The authors stated that identity mappings are not reasonable to be adopted for all layer parameters [9]. HS-ResNet, an approach pro- posed by Yuam et al., implements a plug-and-play block that can be added to existing networks. HS-ResNet implements hierarchical split and concatenate connections within one single residual block [10]. Targ et al. came up in their research with ResNet in ResNet (RiR), RiR has a deep dual-stream architecture that generalizes ResNet and optimizes ResNet performance [10]. Stable ResNet by Hayou et al. addressed the problem of an exploding gradient which can occur if ResNet becomes very deep. Their approach looks to stabilize the gradient [11]. Another version, CO-ResNet was proposed by Bharati et al. Their model was optimized to detect COVID-19 in X-ray images. The authors mostly focused on hyperparameter tuning [12]. The popularity and good performance of the ResNet architec- ture have resulted in not only modified approaches, but also different depth variations of the ResNet. These variations are named based on how deep they are; for example, ResNet-18 has 18 neural network lay- ers. Different versions of ResNet are, for example, ResNet-18, ResNet- 34, ResNet-50, ResNet-101, ResNet-110, ResNet-152, ResNet-164, and ResNet-1202. These variations and their performance are evalu- ated in the first experiments of this research. ResNet and its different versions have proven that the ResNet capabilities are useful in image classification tasks in many different use cases [13–19]. In one recent research, ResNet was used to clas- sify different printed colors, in this use case ResNet achieved a high accuracy of 95% even when the CMYK color intensity was only 5% in some of the color channels. The recognition of small color differences is not a commonly studied topic, but it is closely related to the development of func- tional inks. These inks are used in the packaging industry and can © 2025 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. 
The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent. Acta Wasaensia 167 342 J. ISOHANNI be used as sensors for environmental values such as temperature or humidity. This is possible as functional inks change color according to their exposure to the variables [20]. In this research, ResNet variations are explored to see which of them can best classify subtle color differences. The best ResNet variation is selected for further fine-tuning to find out if it can be modified for optimized accuracy. Fine-tuning includes changing the architecture based on findings of the past research. This article makes contributions to the optimization of ResNet architectures for addressing the challenge of detecting subtle color differences in printed sources, a task with significant implications in various applications. The study examines multiple ResNet mod- els, including ResNet-18, ResNet-34, and ResNet-50, to identify a baseline architecture suitable for fine-grained color classification. Based on this foundation, modifications are proposed to the ResNet- 34 architecture, focusing on adjustments to pooling strategies and feature extraction within residual blocks. The approach presented highlights the adaptability of ResNet architectures to subtle color classification tasks and underscores the potential of targeted archi- tectural modifications to improve their performance. The findings can be used in various applications in sectors such as civil engi- neering, food production, and healthcare care, where precise color differentiation is needed. This research is structured as follows. Section 2. contains an overview of the past research. Section 3. describes the methods and materials used, including the dataset, ResNet architecture, and other methods related to this research Section 4. describes the experi- ments used and their results. Section 5. discusses the results and provides information for future research. Section 6. goes trough the conclusions of the research. 2. Related past research ResNet has been used to solve various problems in the past; some of the previous research uses different ResNet depth variations and applies them to specific use cases. Some past research adapts and modifies ResNet in various ways. This section summarizes some of the most relevant work done around ResNet in the past. Sarwinda et al. used ResNet in the detection of colorectal cancer, experimenting with ResNet-50 and ResNet-18. All of their images were pre-processed with the contrast enhancement CLAHE method before making the dataset. In this research ResNet-18 reached an accuracy of 85% and ResNet-50 88% [14]. Subrataio et al. tuned the ResNet-101 hyperparameters to create a CO-ResNet model. This model was used to classify COVID-19 and pneumonia X-ray images. Their detection rate was 98.74% in cases of COVID-19, 92.08% and 91.32% in cases of normal and pneumonia. They also tested ResNet-152 which did not reach 90% accuracy [12]. Almoosawi and Khudeyer based their approach on ResNet-34 when they researched the identification of diabetic retinopathy. In their image dataset, the differences are quite small as in the research presented in this research. The F1-score of their solution was 93.2%. The F1-score is a combination of precision and recall, providing a balance between the two [21]. Yu et al. research recognition of early-stage breast cancer using the ResNet-50 architecture. 
In this research, SCDA (Scaling and Contrast Limited Adaptive Histogram Equalisation Data Augmentation) was used as a data augmentation tool. The model and approach used in this research reached 95.74% [16]. Gao et al. used ResNet-34 architecture and transfer learning as they had only a small amount of samples. ResNet-34 classified some classes with accuracy 100%, and even in the most challenging class, the leaf knot, the accuracy was 97%. The Leaf knot classifica- tion is closely related to the small difference classification problem presented in this research [17]. Hu et al. combined two ResNet-50 and one ResNet-34 architec- tures into a multidimensional feature compensation residual neural network. Each dimension was responsible for certain classifications related to crop diseases. As the last layer, authors had a compensa- tion layer which used a compensation algorithm to determine final recognition result. Their approach achieved 85 % accuracy, which was better than other approaches with their dataset [22]. Al-Haija and Adebanjo worked on their research with the breast cancer dataset. With the ResNet-50 architecture and transfer learn- ing, they achieved very high 99% accuracy [23]. ResNet-50 has shown its accuracy in disease identification in the past. Potato plant leaf disease identification was done by Shaheed et al. In this research ResNet reached 99.12% accuracy in only under 30 epochs. Zhang et al. performed another study that used ResNet- 50 in disease identification. The researchers also used the coordinate attention module (CA) and the weight-adaptive multiscale feature fusion (WAMFF) and were able to achieve accuracy of 98.32 %. In addition, Li and Rai researched the identification of diseases in apples; however, they proved that shallow ResNet-18 outperformed ResNet-34 [24–26]. The summary of the previous research is shown in the following table. Research title Authors Findings Sarwinda et al. Deep learning in image classification using residual network (ResNet) variants for detection of colorectal cancer Comparison of the ResNet-18 and ResNet-50 Subrataio et al. CO-ResNet: Optimized ResNet model for COVID-19 diagnosis from X-ray images Hyperparameter tuning to increase performance of the ResNet-101 Almoosawi and Khudeyer ResNet-34/DR: a residual convolutional neural network for the diagnosis of diabetic retinopathy Using of preprocessing to increase performance of the ResNet Yu et al. ResNet-SCDA-50 for breast abnormality classification Using contrast enhancement to improve ResNet performance in abnormality classification Gao et al. A Transfer Residual Neural Network Based on ResNet-34 for Detection of Wood Knot Defects ResNet was used to identify differences in wood, some of the changes were quite subtle Hu et al. MDFC–ResNet: An Agricultural IoT System to Accurately Recognize Crop Diseases Using of ResNet to identify small differences in images Al-Haija and Adebanjo Breast Cancer Diagnosis in Histopathological Images Using ResNet-50 Convolutional Neural Network Identifying features from images Zhang et al. 
Classification and Identification of Apple Leaf Diseases and Insect Pests Based on Improved ResNet-50 Model Improving ResNet-50 with coordinate attention (CA) module and weight-adaptive multi-scale feature fusion Shaheed et al EfficientRMT-Net – An Efficient ResNet-50 and Vision Transformers Approach for Classifying Potato Plant Leaf Diseases Integration of Vision Transformer (ViT) and ResNet-50 architectures Li and Rai Apple leaf disease identification and classification using resnet models Comparison of ResNet-18 and ResNet-34 in lead disease identification 168 Acta Wasaensia INTERNATIONAL JOURNAL OF COMPUTERS AND APPLICATIONS 343 Previously ResNet has been used in various use cases, and the mentioned research are closely related to the use case presented in this research. The similarity comes from the problem of recognizing subtle differences in images. Previously, ResNet and related archi- tectures have been shown to perform well when small differences in images are observed or located.Most research in the past has used the standard version of the ResNet, the most popular being 50-layer and 34-layer versions. These previous researches have mainly focused on improving the performance of ResNet via pre-processing or other methods that do not directly modify the architecture of ResNet. But there are also examples of modifications that make the architecture work better in specific use cases. In addition, there is some research showing that increasing the depth of the ResNet architecture does not correlate with accuracy. As ResNet achieves high accuracy in multiple of the presented cases, choosing the best optimizer or hyperparam- eter tuning has not been a very popular topic in relation to ResNet. Another lesson that can be learnt from the previous research is that using enough correct data augmentation helps when the generic and more accurate model is the objective. The main gap in previous research is that ResNet (or any other CNN architecture) has not been used to classify subtle color differ- ences. This research explores how ResNet can be used in this use-case and if better architecture can be developed with small changes. This article proposes two modified ResNet-34 architectures tailored to the subtle color difference classification in printed sources. The first modification replaces the average pooling layer with global maxi- mum pooling before the fully connected layer, enhancing feature extraction by focusing on the most significant features across the input space. The second modification adds a maximum pooling layer after each convolutional operation within residual blocks, improv- ing the network’s ability to capture subtle differences by emphasiz- ing local maxima. These adjustments are supported by additional techniques such as gradient centralization and the use of K-fold cross-validation for robust performance evaluation. These proposed approaches overcome identified gaps by improving model accu- racy, reducing overfitting, and tailoring ResNet-34 for subtle color classification. 3. Methods and materials 3.1. The dataset The dataset [27] used in this research contains images that have been acquired using normal smartphones in various real-life envi- ronments. The images contain a special QR-Code as a carrier of color information (Figure 1). Before any operation, the source images are run through an auto-leveling process. This process is a tech- nique commonly used in image processing to automatically adjust the contrast of the image. 
Auto-levelling aims to spread out the brightness levels across the entire dynamic range available. The purpose of image auto-levelling is to enhance the presentation of colors by automatically adjusting the brightness, contrast, and color balance of an image to achieve a more natural and evenly distributed tonal range. The auto-levelling process starts with the identification of the minimum and maximum intensity values within the image. After determining the minimum and maximum intensity values, each pixel's intensity in the image is mapped to a new value based on a linear transformation. Pixels with intensities below the minimum value are assigned 0 (black), while pixels with intensities above the maximum value are assigned 255 (white). The intensities in between are linearly scaled to cover the entire dynamic range. The mapping process is done for all RGB channels [28]. An example of auto-levelling can be seen in Figure 1, where image 1 is before and image 2 is after auto-levelling. After the auto-levelling, the image goes through Gaussian blurring. Blurring reduces noise in an image by averaging the intensity values of neighboring pixels, thereby smoothing out abrupt variations caused by random noise or, as in the use case presented, printer patterns. Noise typically appears as sharp, high-frequency intensity fluctuations that stand out from the underlying signal or pattern in the image. Blurring applies a low-pass filter, which suppresses these high-frequency components while retaining the overall structure of the image.

The actual color information used is placed at two specific locations on the QR-Code. These locations and their corner points are calculated as part of the QR-Code decoding process. The locations and their corner points are used to correct area skew and orientation, resulting in square color areas. These color areas are then extracted into separate images for pre-processing and forming of the final dataset. The extraction process extracts 80% of the color area to ensure that only the needed area is left. In this way, the possible distortion and skew that is left will not impact the process. Each of the source images originally contains two different areas (Figure 1). The first area is a rectangle with paper-white (green rectangle), i.e. no color printed, and the second one has some color printed on it (yellow rectangle). The printed color is the one the CNN model is expected to classify. In the process, an average RGB value for the paper-white area is first calculated with the following formula:

C̄(c) = (1/M) Σ_{x=1..W} Σ_{y=1..H} I_c(x, y),  c ∈ {R, G, B}   (1)

where I is the source image and I_c(x, y) is the integer value of channel c at pixel location (x, y). W and H are the width and height of the image and M is the total number of pixels in the image. To form the final image, used as the dataset image, the average color of the white area is subtracted from the color area. This creates a difference image which contains information about how much the color area differs from the no-color area. The difference image is calculated with the following formula:

I_difference(r, g, b) = |I_original(r, g, b) − C̄(r, g, b)|   (2)

where I_difference is the final image and I_original is the color area image (Figure 1(b)). The constant C̄ is the average RGB color value of the white area (Figure 1(a)). In Figure 2, the process of forming the difference image is shown with a few examples.
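As a rough illustration of the preprocessing pipeline described above (auto-levelling, Gaussian blurring, and the difference-image computation of Equations (1) and (2)), the following sketch uses OpenCV and NumPy; the kernel size, file name, and the way the two regions are passed in are assumptions, not the exact parameters of the original pipeline.

```python
# Hedged sketch of the preprocessing described in Section 3.1.
# Kernel size and the cropping of the two regions are illustrative assumptions.
import cv2
import numpy as np

def auto_level(img: np.ndarray) -> np.ndarray:
    """Linearly stretch each colour channel to the full 0..255 range."""
    out = np.empty_like(img)
    for c in range(3):
        channel = img[:, :, c].astype(np.float32)
        lo, hi = channel.min(), channel.max()
        if hi > lo:
            channel = (channel - lo) * 255.0 / (hi - lo)
        out[:, :, c] = np.clip(channel, 0, 255).astype(np.uint8)
    return out

def difference_image(color_area: np.ndarray, white_area: np.ndarray) -> np.ndarray:
    """Equations (1) and (2): subtract the mean paper-white colour per channel."""
    c_mean = white_area.reshape(-1, 3).mean(axis=0)          # C̄(c), Eq. (1)
    diff = np.abs(color_area.astype(np.float32) - c_mean)    # Eq. (2)
    return np.clip(diff, 0, 255).astype(np.uint8)

img = cv2.imread("sample_qr_photo.png")          # assumed example file
img = auto_level(img)
img = cv2.GaussianBlur(img, (5, 5), 0)           # suppress printer patterns / noise

# color_area and white_area would be cropped from the decoded QR-code locations:
# difference = difference_image(color_area, white_area)
```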
In Figure 2 row (a) contains sample images of average paper-white color, row (b) contains the actual color area, and row (c) is the final dataset image. The columns in the figure represent differ- ent colors printed on paper 10% yellow, 10% cyan, 10% magenta, 5% yellow, 5% cyan, 5% magenta. In the final difference images (Figure 2 row (c)), it can be seen that the differences between the samples are very small. This makes it difficult for the convolutional neural network (CNN) to classify images correctly. The whole process of making the dataset is illustrated in Figure 3. The dataset is augmented with the following options, data aug- mentation is one option to enlarge the dataset and can provide better accuracy [29]. Images are rotated in angles, 90, 180, and 270, then images are flipped vertically and horizontally, and the same rotations are done for flipped version of images. Also, the 0.2 zoom option is used to make more images using data augmentation. In total, the dataset contains 7855 RGB images. Each image is stored with 24-bit depth, which means 8 bits per each channel. The images are split into train (80%) and validation (20%) sets so that the train set contains 6284 images belonging to 11 classes and the validation set contains 1571 images belonging to 11 classes (Table 1). Acta Wasaensia 169 344 J. ISOHANNI Figure 1. The process of color correction and area extraction. Figure 2. Samples of images used (best viewed online). Table 1. Images per each class. Color Images Colour Images 10% K 769 5% K 542 10% M 770 5% M 638 10% Y 770 5% Y 650 10% CY 818 5% CY 506 10% C 830 5% C 626 0% K 926 3.2. Convolutional neural networks A Convolutional Neural Network (CNN) is a specialized type of artificial neural network specifically designed for processing and analyzing grid-structured data, such as images. Its architecture is optimized to reduce the number of parameters and computational overhead of the neural network, making it particularly well suited for the efficient handling of high-dimensional datasets [30]. The convolutional layer is the fundamental building block of a CNN (Figure 4). It applies convolutional filters (kernels) to localized regions of the input data, generating feature maps that capture essen- tial patterns such as edges, color information, textures, and shapes. Each filter is specialized to detect a specific type of feature, and dur- ing training, the network optimizes these filters to best suit the task at hand [30]. The pooling layers in CNNs are used to down-sample feature maps, reducing their spatial dimensions and the number of param- eters in the network. This reduction decreases the computational complexity and helps prevent overfitting, thereby enhancing the net- work’s ability to generalize to new data. Common pooling methods include max pooling, which selects the maximum value within a specified window, and average pooling, which computes the average value, both contributing to feature extraction while simplifying the model [31, 32]. 170 Acta Wasaensia INTERNATIONAL JOURNAL OF COMPUTERS AND APPLICATIONS 345 Figure 3. The process of color correction and area extraction. A key feature of CNNs is the use of non-linear activation func- tions, such as the Rectified Linear Unit (ReLU). ReLU introduces non-linearity into the model, enabling it to capture complex patterns and relationships within grid-structured data. 
Furthermore, ReLU helps accelerate training convergence by alleviating the vanishing gradient problem, which is commonly encountered with activation functions such as sigmoid or tanh [30, 33]. In the final stages of a CNN, fully connected (dense) layers are used to consolidate the high-level features extracted by the convolu- tional layers. These layers are responsible for performing tasks such as classification or regression, using the learnt representations to generate the final output [30]. 3.3. ResNet architecture ResNet, short for Residual Network, was introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun from Microsoft Research in their paper titled ‘Deep Residual Learning for Image Recognition’, which was presented at the IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR) in 2016 [1]. The ResNet architecture was the first convolutional neural network architecture to use residual learning. Residual learning addresses the vanishing gradient problem, which usually occurs when neural networks become deep. The vanishing gradient issues are closely tied to backpropagation (backward propagation of errors). In neural networks, backpropagation is a crucial element of the train- ing process. In the backpropagation, the network learns to adjust its weights and biases to minimize the difference between its predicted output and the actual target output [34]. In backpropagation, the input data is moved through the network to produce predictions (forward pass). In this process, the difference between the predictions and the actual output (loss) is computed. During the backward pass, gradients of the loss concerning network parameters are calculated using the chain rule. These gradients are then used to guide the adjustment of weights and biases, the purpose is to minimize the loss through iterative updates. As the process is repeated for multiple epochs, it allows the neural network to learn and improve its predictive capabilities [35]. The vanishing gradient problem occurs in deep neural networks during backpropagation, as gradients are exponentially diminished in the process of backward passing through layers. In particular, the vanishing gradient problems occur in networks with many layers. Another context where vanishing gradient easily occurs are activa- tion functions with saturating gradients, such as sigmoid or hyper- bolic tangent functions. In the vanishing gradient problem gradients are approaching zero when moving closer to early layers, so these layers receive only minimal updates. As this happens, the learning process decreases, which eventually leads to slow convergence and poor performance in the training of deep networks [36, 37]. The vanishing gradient problem can easily occur in low-level features, where the difference on pixel level is small. DenseNet archi- tecture or its modifications have been successfully used in the past to overcome problems with the low-level features [38–40]. The innovation presented in ResNet was skip connections (Figure 5), also known as shortcut connections, which skip one or more layers by adding the input of a layer to its output. This allows the network to learn residual functions instead of directly learning the underlying mapping functions. 
Instead of purely learning a mapping from input to output, each layer of a residual network is tasked with learning the residual function, the difference between the desired output and the input to that layer.With this approach ResNet enables the training of much deeper networks (hundreds of layers) while maintaining or improving performance [1]. Mathematically, this can be represented as follows: Output = Input + Residual (3) When residuals are used, the network can focus on learning small incremental adjustments to the input rather than learning to gener- ate the entire output from scratch. Residual connections enable the gradient to flow more easily during training, mitigating the problem of vanishing gradient. In practice (Figure 5), a residual block typi- cally consists of two or more convolutional layers followed by a skip connection that adds the input to the output of the convolutional layers. Recently ResNet has been used successfully, for example, in solving different problems. For example, Hossain et al. pro- posed a weighted ensemble deep transfer learning framework with ResNet152 to identify Alzheimer disease from MRI images. Hassan et al. used ResNet-50 to classify images in the Medical MNIST dataset [41]. Senapati et al. also used ResNet-50 to classify food images, Wu et al. for chicken gender identification and Lin & Wu for dia- betic retinopathy detection. Madhukar et al. incorporated ResNet in cancer image segmentation [42–46]. ResNet has also been used in many other recent researches. Usually, it achieves high classifi- cation accuracy in cases where small differences play an important role. The development of the ResNet architecture has also led to differ- ent variations, the most popular way being to change the amount of layers. For example, ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152 have been used in previous research. One of the most popular architecture of these is ResNet-50. But also some variations have been researched, Wide ResNet increases the width of ResNet by increasing the number of chan- nels in each convolutional layer. This can improve performance, but requires more computational resources [47]. ResNeXt introduced a new block structure that increased model capacity by aggregating multiple paths of information flow within each block [48].ResNetv2 introduced improvements to the original ResNet architecture, such as using preactivation residual units and employing a bottleneck structure in all layers [2]. Acta Wasaensia 171 346 J. ISOHANNI Figure 4. Simplified illustration of convolutional neural network. Figure 5. Skip connection [1]. 3.4. Cross-validation In this research, the dataset used is quite small, with only 7855 images. For this reason, K-Fold cross-validation is used. In the K- Fold Cross-Validation, the training dataset is divided into k subsets of approximately equal size. These subsets are often called ‘folds’, so cross-validation is K-fold. With the used dataset size, the size of each subset is around 1560 images, when five folds are used. The advan- tage of using K-fold Cross-Validation is that model will be evaluated multiple times during the process. The final model when the K-fold cross-value process is used is generally less biased than if only one training / validation dataset is used [49, 50]. The process of the K-fold Cross-Validation (Figure 6) is follow- ing, before starting the process images are placed into corresponding folders, where folder represent image labels. 
3.4. Cross-validation

In this research, the dataset used is quite small, with only 7855 images. For this reason, K-fold cross-validation is used. In K-fold cross-validation, the training dataset is divided into k subsets of approximately equal size; these subsets are often called 'folds', which gives the method its name. With the dataset used here, the size of each subset is around 1560 images when five folds are used. The advantage of using K-fold cross-validation is that the model is evaluated multiple times during the process, so the final model is generally less biased than if only a single training/validation split were used [49, 50].

The process of K-fold cross-validation (Figure 6) is the following; before starting, the images are placed into folders that correspond to their labels:

(1) The dataset is shuffled randomly.
(2) The number of folds (k) is chosen and the dataset is split into k groups.
(3) For each group: the group is taken as the hold-out (test) data set, the remaining groups are used as the training data set, the model is fitted on the training data and evaluated on the test set, and the evaluation score is kept while the model is discarded.
(4) After each group has been processed, the model is summarized using the sample of model evaluation scores [51].

During the process, it is important that the images remain in the same group; only the group that is held out changes. This means that each image is used to train the model k − 1 times. The advantage of K-fold cross-validation is that by training the model k times using different combinations of training and validation sets, it is possible to obtain more reliable performance metrics than with a single train-test split. When optimizing hyperparameters, K-fold cross-validation also helps to choose the best hyperparameter values without overfitting the model [52].

Figure 6. Illustration of K-fold cross-validation.

However, choosing the value of k is important; typically values such as 2, 5, or 10 are used [49]. A larger k uses more computational resources and could lead to overfitting, so the value of k should be kept as low as possible. In their research, Yadav and Shukla have proposed that for a small dataset a k value of 5–6 provides the best results, and Wong et al. showed that cross-validation should be repeated twice for such a dataset [49, 51].
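The procedure described above can be sketched with scikit-learn's KFold as follows; build_model, images, and labels are hypothetical placeholders, and the model returned by build_model is assumed to be compiled with an accuracy metric:

import numpy as np
from sklearn.model_selection import KFold

def run_kfold(images, labels, build_model, k=5, epochs=30):
    """Evaluate a model with K-fold cross-validation (steps 1-4 above)."""
    kfold = KFold(n_splits=k, shuffle=True, random_state=42)      # steps 1-2: shuffle and split
    scores = []
    for train_idx, test_idx in kfold.split(images):               # step 3: iterate over folds
        model = build_model()                                     # fresh model for every fold
        model.fit(images[train_idx], labels[train_idx], epochs=epochs, verbose=0)
        _, acc = model.evaluate(images[test_idx], labels[test_idx], verbose=0)
        scores.append(acc)                                        # keep the score, discard the model
    return float(np.mean(scores)), scores                         # step 4: summarize over folds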
3.5. Gradient centralization

Gradient centralization (GC), a method proposed by Yong et al., can be used instead of batch normalization (BN) and weight standardization (WS). Whereas BN and WS operate on activations or weights, GC operates directly on the gradients and centralizes the gradient vectors to have zero mean. Typically, using gradient centralization can lead to better generalization performance [53].

Batch normalization, the most widely used of these internal normalization methods, has the disadvantage of being an independent layer which processes data even after training, and it is usually applied to a relatively large batch [54]. GC works directly on top of the gradient and centralizes the gradient vector so that it has zero mean. GC is calculated by taking the mean of each column of the gradient matrix/tensor and then shifting each column so that its mean becomes zero [55].

In the research conducted by Agarwal et al., GC was able to achieve higher accuracy for DenseNet models, a close relative of ResNet, than training without GC. That research also showed that the Adam optimizer was the best option for training the neural network [56].

Figure 7. Illustration of dropout [60].

3.6. Dropout

Dropout can be used in convolutional neural networks to prevent the model from overfitting, and it has also been shown that using dropout in the early phase of training can help avoid underfitting [57]. In dropout, some hidden unit nodes are stochastically set to zero, which means that the nodes are removed together with all of their forward and backward connections (Figure 7). This lets more gradient information flow through the non-linear activation functions [58].

During the training process, the nodes are dropped with a dropout probability p [59, 60]. The probability p can vary depending on the location of the dropout in the architecture: it is usually different when the dropout layer is placed close to the input than when it is placed closer to the output. Dropout has especially been proven to work in connection with fully connected layers [61]. During the training process, the probability can be seen as one hyperparameter which can be adjusted to find the optimal model. The dropout rate is usually smaller (p < 0.2) in the input layer and larger (0.5 < p < 0.8) when used internally or close to a fully connected layer [62]. Some research has also proposed a more controlled way to use dropout [63].

Dropout layers can easily be added to any architecture, but this has some consequences. When dropout is added, it removes some units, which reduces the capacity of the network; this can be compensated for by increasing the number of units, for example by multiplying it by the inverse of the retention probability. As dropout introduces noise to the gradient, it has been shown that the learning rate and momentum need to be increased; however, this depends on the optimizer used. If such a modification is made, max-norm regularization might be needed [62].

When a dropout layer is added to an architecture, it also becomes possible to fine-tune the related hyperparameters. Depending on the optimizer used, the learning rate, weight decay, and momentum parameters can be tuned, and this is usually even required to get the model to work in an optimized way. These hyperparameters are especially important if the standard stochastic gradient descent (SGD) optimizer is used instead of adaptive optimizers [62]. Many different dropout variants have been considered in past research, for example [64–68]. In this research, the standard version, where nodes are randomly dropped, is used.

3.7. Max and average pooling

Pooling layers are used in convolutional neural networks for downsampling. Currently, the most common pooling layers are max and average pooling, although other pooling methods have also been invented. The purpose of downsampling is to reduce the spatial resolution of the feature maps in order to extract semantic information [69]. Pooling layers have two main objectives: (1) they reduce the number of parameters, which makes training the networks less expensive, and (2) they help avoid possible overfitting of the network [70].

Pooling can be done based on local regions, such as 3 × 3, 5 × 5, or 7 × 7 areas. Another option is global pooling, where each feature map is pooled across all spatial locations, resulting in a global representation of the features that summarizes the entire input volume [70]. When the values are calculated in the pooling layer (Figure 8), either maximum or average pooling is used. Maximum pooling takes the greatest value within a given region (k × k) and places it in the corresponding position of the downsampled feature map, whereas average pooling calculates the average value within a given region (k × k) and uses this value in the downsampled feature map [70].

Figure 8. (a) Max pooling and (b) average pooling.

Whether to use maximum or average pooling depends on the problem. In most cases, convolutional neural networks (CNNs) aim to recognize significant values in images, which would make maximum pooling the preferable option, but average pooling is better at preserving localization [71].

4. Experiments and results

The experiments were run in a Python environment on a laptop with an Apple M2 processor, which had 8 cores and a total of 16 GB of RAM. The environment used Python version 3.9.6 with TensorFlow and Keras 2.15.0. The architectures used were built manually on the basis of the ResNet architecture documentation and the examples available. The models were trained with the Adam optimizer using its default settings, as the purpose of this research was not to focus on optimizing individual models via hyperparameters. For each training process, 30 epochs were used.
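A minimal sketch of how gradient centralization (Section 3.5) can be combined with the Adam optimizer in a custom TensorFlow training step is shown below; it illustrates the general technique [53], not necessarily the exact implementation used in the experiments:

import tensorflow as tf

def centralize(grad):
    """Gradient centralization: subtract the mean over all axes except the last
    (output-channel) axis; applied only to weight tensors, not to biases."""
    if grad is None or len(grad.shape) < 2:
        return grad
    axes = list(range(len(grad.shape) - 1))
    return grad - tf.reduce_mean(grad, axis=axes, keepdims=True)

optimizer = tf.keras.optimizers.Adam()                 # default settings, as in the experiments
loss_fn = tf.keras.losses.CategoricalCrossentropy()

@tf.function
def train_step(model, x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    grads = [centralize(g) for g in grads]             # center the gradients before the update
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss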
4.1. Evaluation metrics

The purpose of the experiments was to find out which approach achieves the best accuracy. Accuracy is an effective performance metric when applied to datasets with balanced class distributions. A balanced dataset ensures that each class is equally represented, minimizing the risk that performance metrics are disproportionately influenced by the prevalence of a particular class. In these scenarios, accuracy offers a clear and straightforward measure of the proportion of correctly classified instances out of the total number of instances, capturing the overall predictive capacity of the model. Accuracy is an appropriate metric when the application context assigns equal importance to all classes in the dataset; in classification problems where the costs or implications of misclassifications are uniform across all classes, accuracy provides a meaningful and holistic evaluation of the model's performance. Accuracy is calculated with the following equation:

Accuracy = Number of Correct Predictions / Total Number of Predictions    (4)

In the experiments, the accuracy was calculated for the entire dataset.

In addition, a confusion matrix was created for a detailed breakdown of the correct and incorrect predictions for each class in the classification problem. A confusion matrix is an effective tool for evaluating the performance of classification models: it provides a detailed comparison of predicted versus actual outcomes, enabling the assessment of overall accuracy and the identification of specific error patterns. This analysis can be used to improve the model architecture, optimize preprocessing methods, and improve data acquisition strategies [72, 73].

A confusion matrix is typically presented as a structured square table with rows and columns representing the different classes in the classification task. For a binary classification problem (labelled Positive and Negative), the matrix is a 2 × 2 table:

                    Predicted Positive     Predicted Negative
  Actual Positive   True Positive (TP)     False Negative (FN)
  Actual Negative   False Positive (FP)    True Negative (TN)

True Positive (TP) refers to instances correctly identified as positive, while True Negative (TN) corresponds to instances accurately classified as negative. False Positive (FP) represents instances incorrectly predicted as positive, and False Negative (FN) denotes instances incorrectly predicted as negative.

In multiclass classification problems, the confusion matrix extends the binary version into a square matrix, with rows and columns representing the various classes. This structure provides a detailed view of the performance of the model, showcasing accurate classifications and misclassifications across all categories. For a classification problem with n classes, the confusion matrix is an n × n matrix. Each element at position (i, j) represents the number of instances where the true class is i and the predicted class is j. The diagonal elements represent the number of correct predictions for each class, and the off-diagonal elements represent misclassifications.
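Both metrics are readily available in scikit-learn; the following sketch uses hypothetical predictions for a three-class problem:

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels for a three-class problem (labels 1-3).
y_true = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3])
y_pred = np.array([1, 1, 2, 3, 2, 3, 3, 3, 3])

print("Accuracy:", accuracy_score(y_true, y_pred))          # correct / total, Equation (4)
print(confusion_matrix(y_true, y_pred, labels=[1, 2, 3]))   # rows: true class, columns: predicted class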
In Figure 9, 13 images of label 1 were classified as label 1, and no images of that label were classified as other labels. For label 2, some images have been labelled as label 3, whereas all images of label 3 are correctly labelled. Therefore, the model may require some improvement. A confusion matrix offers a more nuanced understanding of a classification model's performance than scalar metrics such as accuracy: it identifies the specific classes that the model tends to confuse and highlights strengths and weaknesses in its predictions. This detailed analysis provides critical information for refining the model design, optimizing the training process, and addressing specific areas for improvement.

4.2. Standard ResNet architectures

As the purpose of the experiment was to find the best model for the use case, fine-tuning of the hyperparameters and choosing the best optimizer were left for future research. All models were compiled with categorical cross-entropy loss and the Adam optimizer (Table 2).

Figure 9. Example of confusion matrix for multi-classification.

Table 2. Results of the first cycle.

  Architecture   Centralized gradient used   Avg. train time/epoch (s)   Accuracy
  ResNet-18      Y                           323                         0.934
  ResNet-18      N                           316                         0.952
  ResNet-34      Y                           236                         0.969
  ResNet-34      N                           233                         0.966
  ResNet-50      Y                           785                         0.958
  ResNet-50      N                           781                         0.901
  ResNet-101     Y                           1414                        0.941
  ResNet-101     N                           1350                        0.948
  ResNet-153     Y                           2079                        0.682
  ResNet-153     N                           1094                        0.935

From the experiments carried out with the standard implementations of ResNet (Table 2), it can be seen that all implementations perform well. Using gradient centralization improves the results for ResNet-34 and ResNet-50, but when the architecture is deep, centralizing the gradients decreases accuracy. With the ResNet-153 architecture, centralizing the gradients reduces the accuracy significantly compared to training without it. This might be an indication that when the differences in the data are small, the vanishing gradient problem becomes an issue when gradient centralization is used.

Most of the ResNet architectures were prone to overfitting in the given problem, especially architectures deeper than ResNet-34. Overfitting of the model signals that training goes well but the model fails to generalize to the validation set. In practice, this means that the model learns the training data too well, capturing noise and specifics of the training set that do not generalize to new, unseen data [74]. The small dataset can also be an issue, as it contains only 7855 images. To extend the dataset, more images could be collected or different augmentation options could be used, for example as sketched below [75].
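As one possible sketch of such augmentation (not the pipeline used in this article), Keras preprocessing layers can generate geometric variations on the fly; colour-altering transforms are deliberately avoided here because they could destroy the subtle colour differences being classified:

import tensorflow as tf
from tensorflow.keras import layers

# Geometric augmentations only: brightness, hue, or saturation changes would
# alter exactly the subtle colour information the classifier must learn.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.1),
    layers.RandomTranslation(0.05, 0.05),
])

# Applied on the fly during training, e.g. inside a tf.data pipeline:
# dataset = dataset.map(lambda x, y: (augment(x, training=True), y))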
Of all the architectures tested, ResNet-34 with gradient centralization had the best accuracy, reaching 96.9%, and it was therefore selected for fine-tuning of the architecture. In Figure 10, it can be seen that the training accuracy of ResNet-34 improves throughout the epochs and that the training loss decreases during the training process. After 10 epochs, the validation accuracy starts to behave unexpectedly and drops, while the training accuracy seems to settle. At the end of the training process, the training loss seems to increase. Possible reasons for this include overfitting, the small dataset, an inappropriate learning rate, or simply that the model has already reached its optimal performance. To overcome these issues and build better models, ResNet-34 was modified.

Figure 10. Results of the train and validation process of ResNet-34.

If the model reaches its peak performance, it might be wise to implement early stopping for the training process; this prevents overfitting and saves computation time when further training does not yield significant improvements [76].

4.3. Proposed architecture

The base ResNet-34 model was able to reach an accuracy of 96.90%. This is already very high, but with some modifications it might be possible to reach an even higher accuracy. Several approaches were first experimented with individually, and finally all of them were combined.

One of the approaches used in past research to improve CNNs is adding a dropout layer to the fully connected layers. With dropout, it is possible to reduce overfitting and regularize neural networks [60], and proper usage of dropout layers can increase the accuracy of neural networks [77]. The traditional approach is to add dropout after a convolutional layer; however, there are different variations of using dropout [78]. In this research, two variations of dropout in the ResNet-34 architecture were used: in the first, dropout was added after the first convolutional layer, which pools the feature maps down; in the second, dropout was added after each residual group. The accuracy of these architectures was 98.90% and 94.34%, respectively. This shows that applying dropout at the correct location in the architecture can make the model perform more accurately.

Adding batch normalization can also make the network more resistant to degradation of the ratio of the between-class distance to the within-class distance; with batch normalization, it is possible to preserve the between-class angle [79]. When batch normalization was added at the end of the residual block, the accuracy of the model dropped by 1.0% from the standard ResNet-34 implementation.

Another option to improve the accuracy of a CNN is to use different pooling layers. Pooling layers play a crucial role in reducing the spatial dimensions of the feature maps, controlling overfitting, providing translation invariance, and facilitating feature learning and abstraction in convolutional neural networks [80]. Local pooling, which pools data from smaller regions, or global pooling, which works on the whole feature map, can be used to improve accuracy and the sensitivity of feature translation [70]. Using different pooling methods can increase the accuracy of a CNN architecture, as shown, for example, in the research by Momeny et al. [81].

In the residual block of ResNet, a maximum pooling layer was first added after each convolutional operation, which led to an accuracy of 95.10%. When max pooling was added only after both convolutional operations of the residual block had been performed, accuracy reached as high as 99.70%.
As a final modification to the standard ResNet-34 architecture, the average pooling just before the fully connected layer was changed to global max pooling, which gave an accuracy of 99.60%. When all of the best options from the above experiments were finally combined (dropout after the input layer, maximum pooling at the end of the residual block, and the average pooling changed to global maximum pooling), accuracy dropped to 40.52%. This shows that even if the optimizations work independently, they do not necessarily work together. In addition, some combinations were tested to see whether they could reach a higher accuracy than the individual modifications: using global maximum pooling instead of average pooling together with dropout performed with 99.20% accuracy, and using global maximum pooling instead of average pooling together with a maximum pooling layer at the end of the residual block reached 99.50% accuracy.

For the initial experimentation, without cross-validation, the training and validation accuracy and loss, together with the confusion matrices, are shown in Figure 11. For both architectures it can be seen that the models learn well throughout the epochs, and when 30 epochs are used, early stopping does not stop the training process. The architectures reach their maximum accuracy at the end of the process (architecture (a) 99.70% and architecture (b) 99.60%), and in both cases the training loss seems to stabilize at the end of the training process.

Figure 11. Results of the process.

The mistakes that version (a) makes are that low-level green (10% CY) and yellow (10% Y) images are confused with very low yellow (5% Y) images. With this approach, it was possible to reach an accuracy of 99.7%. Version (b) of the architecture almost reaches the same accuracy (99.6%), but there is more confusion between classes: different green colors are confused (5% CY and 10% CY), as well as different yellow classes, and very light yellow is sometimes confused with paper white.

Based on the experiments, the two best options to improve the ResNet-34 architecture were:

• (a) using maximum pooling at the end of the residual block;
• (b) changing the average pooling before the fully connected layer to global max pooling.

These architectures are presented in Figure 12.

Figure 12. Final architectures used.
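A minimal Keras sketch of modification (b) is shown below; features is assumed to be the output of the last residual stage, and the dropout rate is an illustrative choice rather than a value taken from the article:

import tensorflow as tf
from tensorflow.keras import layers

def classifier_head(features, num_classes):
    """Modification (b): global max pooling instead of global average pooling
    before the fully connected classification layer."""
    x = layers.GlobalMaxPooling2D()(features)   # standard ResNet uses GlobalAveragePooling2D()
    x = layers.Dropout(0.5)(x)                  # optional regularization (assumed rate)
    return layers.Dense(num_classes, activation="softmax")(x)

The change is local to the classification head, so the rest of the ResNet-34 backbone and its pretraining or initialization remain untouched.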
4.4. K-fold cross-validation

The accuracy of both proposed architectures was finally verified with K-fold cross-validation using five folds, with 30 epochs of training for each fold. The same process was performed for the standard ResNet-34 architecture in order to obtain a baseline accuracy for comparison. Table 3 summarizes these results; the final accuracy in K-fold cross-validation is calculated by averaging the accuracy scores obtained from each fold.

Table 3. K-fold accuracy.

  Fold    Architecture (a)   ResNet-34   Architecture (b)
  0       97.66%             95.31%      97.85%
  1       99.51%             98.93%      99.22%
  2       96.58%             97.66%      97.17%
  3       96.29%             98.73%      98.54%
  4       100.00%            96.58%      98.63%
  Final   98.00%             97.44%      98.28%

The results show that architecture (b), where no other change was made than replacing the average pooling with maximum pooling, is the most accurate of all the versions tested. In Figure 13, the training and validation loss through the epochs can be seen for each fold. The figures show that the model typically reaches its peak accuracy between 20 and 30 epochs; after that, the model starts to show signs of overfitting.

Figure 13. Training process of each fold.

Finally, Figure 14 shows the combined confusion matrix of the K-fold cross-validation. In this figure, the values are normalized so that it can easily be observed how well the model performs. The confusion matrix for K-fold cross-validation shows results similar to those obtained without K-fold cross-validation. The most challenging class to identify is 5% Y, which is confused with white paper in 8% of cases; 10% Y and 5% Y are also sometimes confused. Other classes are rarely confused.

Figure 14. Combined confusion matrix of K-fold cross-validation.

5. Discussion

This article presented two optimized versions of ResNet. In the modified version where a maximum pooling layer was added after the convolutional operations of the residual block, the accuracy was 98.00% in K-fold cross-validation. The second modification was to change the last average pooling layer to a maximum pooling layer; this version of ResNet reached 98.28% accuracy, making it the most accurate version. The results show that ResNet can be used to recognize subtle color differences, and the presented modifications make it even better suited to the use case, as both versions outperformed the standard ResNet-34 architecture.

With the proposed approach, it is possible to implement a more accurate color-based classification system that can help, for example, in civil engineering [82–84], food production [85–87], and healthcare [88, 89]. In these and some other use cases, color and color differences play an important role.

The findings of this research support previous research, such as that by Sukhetha et al. and Goudha et al., showing that the ResNet architecture works well when color is a criterion in the classification task [90, 91]. Singh et al. [92] have also shown in their research that convolutional networks are highly color dependent. This dependency can be seen in the presented research: CNNs can learn to classify color even with shallow networks. Modifications to the standard ResNet architecture are a good way to make ResNet suitable for various color recognition tasks; this is in line with previous research such as Mathew et al. [93], Zhang et al. [94], and Reddy et al. [95].

One limitation of this research is that it uses a relatively small dataset of images printed with a specific color printer. This might have an impact on the results presented; however, the presented approach can be adjusted to different image sources with more training and possibly by adjusting the preprocessing methods for the images. The proposed solution was run in a standard laptop environment; more research would be needed on how to make the system run on mobile devices if color recognition is to be done in real time, for example in a consumer context. The use of extended datasets and different preprocessing algorithms might also make the proposed approach more generalizable.

Deploying the system in a server-client infrastructure involves a client-side application, such as a web or mobile app, for user interaction and image uploads, paired with a server-side back end hosting the trained ResNet-34 model for preprocessing and image classification. The trained model can be deployed for the web with frameworks such as TensorFlow together with the help of Flask. In such an application, continuous training of the model might also be a feasible feature and might lead to a more accurate solution.
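A minimal Flask sketch of such a server-side endpoint is shown below; the model file name, input size, and preprocessing are assumptions for illustration and would have to match the actual training pipeline:

import io
import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("resnet34_colour.keras")   # hypothetical file name

@app.route("/classify", methods=["POST"])
def classify():
    # The client uploads an image; preprocessing must match the training pipeline.
    img = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    img = img.resize((224, 224))                               # assumed input size
    x = np.asarray(img, dtype=np.float32)[None] / 255.0
    probs = model.predict(x)[0]
    return jsonify({"label": int(np.argmax(probs)), "confidence": float(np.max(probs))})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)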
Future research could also focus on tailoring the model for specific industrial applications, such as automating quality assurance in manufacturing or improving diagnostic capabilities in telemedicine. This involves optimizing the system to meet operational constraints, such as processing speed, scalability, and integration with existing workflows. In preprocessing, exploring alternative techniques, such as adaptive histogram equalization or domain-specific adjustments, could further enhance the robustness and reliability of the model under varying lighting and environmental conditions, making it more adaptable to real-world scenarios. In addition, creating diverse datasets with variations in imaging conditions, color ranges, and patterns would help generalize the system to a wider variety of use cases and ensure its effectiveness in different domains.

6. Conclusions

The research highlights the effectiveness of different ResNet architectures for subtle color classification and demonstrates how targeted modifications can enhance model performance. By systematically evaluating different ResNet variants, the research identified ResNet-34 as the most suitable baseline model, with gradient centralization further enhancing its accuracy. Two custom versions of ResNet-34 were proposed, achieving a classification accuracy of up to 98.28% in 5-fold cross-validation through the use of global max pooling instead of global average pooling. These results underscore the flexibility of ResNet architectures and the importance of fine-tuning their components for specific applications. The proposed architectures demonstrate potential for real-world applications in fields such as civil engineering, food production, and healthcare, where precise color differentiation plays a crucial role. Restrictions of the experiment were the small dataset, which was augmented, and the use of a single printer type to print the QR codes that served as the color information carriers. Future work around the topic could explore broader datasets, alternative preprocessing methods, and real-time implementation on mobile platforms to extend the applicability of the proposed approaches.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

This work was supported by the Finnish Cultural Foundation's Central Ostrobothnia Regional Fund (Suomen Kulttuurirahasto) [grant number 2521 1242].

Data availability statement

One of the datasets used in this manuscript is available as a Zenodo repository: https://doi.org/10.5281/zenodo.11079897.

Notes on contributor

Jari Isohanni, MSc (Computer Science), is currently working on his doctoral studies at the University of Vaasa (Digital Economy). His dissertation compares different approaches to the recognition of subtle color differences in printed sources. Jari has been working in the software industry since 2004, currently acting as Director (Education) at Centria University of Applied Sciences.

ORCID

Jari Isohanni http://orcid.org/0000-0002-7154-2515

References

[1] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA; 2016. p. 770–778.
[2] He K, Zhang X, Ren S, et al. Identity mappings in deep residual networks. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer; 2016. p. 630–645.
[3] He K, Zhang X, Ren S, et al.
Delving deep into rectifiers: surpassing human- level performance on imagenet classification. In: Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA; 2015. p. 1026–1034. [4] Yang S, Liu S, Yang C, et al. Re-rank coarse classification with local region enhanced features for fine-grained image recognition. Preprint; 2021. arXiv:210209875. [5] Ailing Q, Ning T. Fine-grained vehicle recognition method based on improved ResNet. In: 2nd International Conference on Information Tech- nology and Computer Application (ITCA) IEEE; 2020. p. 588–592. [6] Li S, Jiao J, Han Y, et al. Demystifying ResNet. Preprint; 2016. arXiv:16110 1186. [7] Targ S, Almeida D, Lyman K. Resnet in ResNet: generalizing residual archi- tectures. Preprint; 2016. arXiv:160308029. [8] Zagoruyko S, Komodakis N. Wide residual networks. Preprint; 2016. arXiv:160507146. [9] Li B, He Y. An improved ResNet based on the adjustable shortcut connec- tions. IEEE Access. 2018;6:18967–18974. doi: 10.1109/ACCESS.2018.281 4605 [10] Yuan P, Lin S, Cui C, et al. HS-ResNet: hierarchical-split block on convolu- tional neural network. Preprint; 2020. arXiv:201007621. Acta Wasaensia 179 354 J. ISOHANNI [11] Hayou S, Clerico E, He B, et al. Stable ResNetA Virtual Conference International Conference on Artificial Intelligence and Statistics; 2021. p. 1324–1332. [12] Bharati S, Podder P, Mondal M, et al. Co-ResNet: optimized ResNet model for COVID-19 diagnosis from X-ray images. Int J Hybrid Intell Syst. 2021;17(1–2):71–85. [13] Wen L, Li X, Gao L. A transfer convolutional neural network for fault diag- nosis based on ResNet-50. Neural Comput Appl. 2020;32:6111–6124. doi: 10.1007/s00521-019-04097-w [14] Sarwinda D, Paradisa RH, Bustamam A, et al. Deep learning in image classification using residual network (ResNet) variants for detec- tion of colorectal cancer. Procedia Comput Sci. 2021;179:423–431. doi: 10.1016/j.procs.2021.01.025 [15] Li B, Lima D. Facial expression recognition via ResNet-50. Int J Cogn Comput Eng. 2021;2:57–64. [16] Yu X, Kang C, Guttery DS, et al. ResNet-SCDA-50 for breast abnormal- ity classification. IEEE/ACM Trans Comput Biol Bioinform. 2020;18(1):94– 102. doi: 10.1109/TCBB.8857 [17] Gao M, Qi D, Mu H, et al. A transfer residual neural network based on ResNet-34 for detection of wood knot defects. Forests. 2021;12(2):212. doi: 10.3390/f12020212 [18] Hammad M, Pławiak P, Wang K, et al. Resnet-attention model for human authentication using ECG signals. Expert Syst. 2021;38(6):e12547. doi: 10.1111/exsy.v38.6 [19] Han C, Shi L. ML–ResNet: a novel network to detect and locate myocar- dial infarction using 12 leads ECG. Comput Methods Programs Biomed. 2020;185:105138. doi: 10.1016/j.cmpb.2019.105138 [20] Isohanni J. Use of functional ink in a smart tag for fast-moving con- sumer goods industry. J Packag Technol Res. 2022;6(3):187–198. doi: 10.1007/s41783-022-00137-4 [21] Almoosawi NM, Khudeyer RS. ResNet-34/DR: a residual convolutional neural network for the diagnosis of diabetic retinopathy. Informatica. 2021;45(7):115–124. [22] Hu WJ, Fan J, Du YX, et al. MDFC–ResNet: an agricultural IoT system to accurately recognize crop diseases. IEEE Access. 2020;8:115287–115298. doi: 10.1109/Access.6287639 [23] Al-Haija QA, Adebanjo A. Breast cancer diagnosis in histopathological images using ResNet-50 convolutional neural network. In: IEEE Inter- national IOT, Electronics and Mechatronics Conference (IEMTRONICS), Vancouver, BC, Canada. IEEE; 2020. p. 1–7. [24] Zhang X, Li H, Sun S, et al. 
Classification and identification of apple leaf dis- eases and insect pests based on improved ResNet-50 model. Horticulturae. 2023;9(9):1046. doi: 10.3390/horticulturae9091046 [25] Shaheed K, Qureshi I, Abbas F, et al. EfficientRMT-Net – an efficient ResNet-50 and vision transformers approach for classifying potato plant leaf diseases. Sensors. 2023;23(23):9516. doi: 10.3390/s23239516 [26] Li X, Rai L. Apple leaf disease identification and classification using ResNet models. In: 3rd International Conference on Electronic Information and Communication Technology (ICEICT); Shenzhen, China. IEEE; 2020. p. 738–742. [27] Isohanni J. QR-codes with colour embed inside. Zenodo. 2024. doi:10.5281/zen- odo.11079897. [28] Basuki A, Ramadijanti N. Improving auto level method for enhance- ment of underwater images. In: Manado International Conference on Knowledge Creation and Intelligent Computing (KCIC); 2016; p. 120–125. doi:10.1109/KCIC.2016.7883635. [29] Zhang G, Lin L, Wang J. Lung nodule classification in CT images using 3D DenseNet. J Phy Conf Series 2021;1827:012155. [30] Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge, USA: MIT Press; 2016. [31] LeCun Y, Boser B, Denker J, et al. Handwritten digit recognition with a back- propagation network. Adv Neural Inf Process Syst. 1989;2:396–404. [32] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to doc- ument recognition. Proc IEEE. 1998;86(11):2278–2324. doi: 10.1109/5.72 6791 [33] Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10); 2010; p. 807–814. [34] Goh GB, Hodas NO, Vishnu A. Deep learning for computational chemistry. J Comput Chem. 2017;38(16):1291–1307. doi: 10.1002/jcc.v38.16 [35] Zweiri YH, Whidborne JF, Seneviratne LD. A three-term backpropagation algorithm. Neurocomputing. 2003;50:305–318. doi: 10.1016/S0925-2312(02) 00569-6 [36] Wang X, Qin Y, Wang Y, et al. ReLTanh: an activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis. Neurocomputing. 2019;363:88–98. doi: 10.1016/j.neucom.2019.07.017 [37] Rehmer A, Kroll A. On the vanishing and exploding gradient problem in gated recurrent units. IFAC-PapersOnLine. 2020;53(2):1243–1248. doi: 10.1016/j.ifacol.2020.12.1342 [38] Tong T, Li G, Liu X, et al. Image super-resolution using dense skip con- nections Venice. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 4799–4807. [39] Ooi YK, Ibrahim H, Mahyuddin MN. Enhanced dense space attention network for super-resolution construction from single input image. IEEE Access. 2021;9:126837–126855. doi: 10.1109/ACCESS.2021.3111983 [40] Haider A, Arsalan M, Choi J, et al. Robust segmentation of underwater fish based on multi-level feature accumulation. Front Mar Sci. 2022;9:1010565. doi: 10.3389/fmars.2022.1010565 [41] Hassan E, Hossain MS, Saber A, et al. A quantum convolutional net- work and ResNet (50)-based classification architecture for the MNIST medical dataset. Biomed Signal Process Control. 2024;87:105560. doi: 10.1016/j.bspc.2023.105560 [42] Md Rabiul Hasan ABMAH, Ullah SMA. Ensemble ResDenseNet: Alzhei mer’s disease staging from brain MRI using deep weighted ensemble transfer learning. Int J Comput Appl. 2024;46(7):539–554. doi: 10.1080/ 1206212X.2024.2380648 . [43] Senapati B, Talburt JR, Naeem AB, et al. Transfer learning based mod- els for food detection using ResNet-50. 
In: Romeoville IEEE International Conference on Electro Information Technology (EIT); 2023. p. 224–229. [44] Wu D, Ying Y, Zhou M, et al. Improved ResNet-50 deep learning algorithm for identifying chicken gender. Comput Electron Agric. 2023;205:107622. doi: 10.1016/j.compag.2023.107622 [45] Lin CL, Wu KC. Development of revised ResNet-50 for diabetic retinopathy detection. BMC Bioinform. 2023;24(1):1–18. doi: 10.1186/s12859-022-05 124-9 [46] Madhukar BN, Bharathi SH, Polnaya AM. Multi-scale convolution based breast cancer image segmentation with attention mechanism in conjunction with war search optimization. Int J Comput Appl. 2023;45(5):353–366. doi: 10.1080/1206212X.2023.2212945 . [47] Zagoruyko S, Komodakis N. Wide residual networks; 2017. Available from: https://arxiv.org/abs/1605.07146 [cs.CV]. [48] Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. p. 1492–1500. [49] Wong TT, Yeh PY. Reliable accuracy estimates from k-fold cross validation. IEEE Trans Knowl Data Eng. 2019;32(8):1586–1594. doi: 10.1109/TKDE.69 [50] James G, Witten D, Hastie T, et al. An introduction to statistical learning. Vol. 112. New York, USA: Springer; 2013. [51] Yadav S, Shukla S. Analysis of k-fold cross-validation over hold-out valida- tion on colossal datasets for quality classification. In: IEEE 6th International Conference on Advanced Computing (IACC); 2016. p. 78–83. [52] Lyu Z, Yu Y, Samali B, et al. Back-propagation neural network optimized by k-fold cross-validation for prediction of torsional strength of reinforced concrete beam. Materials. 2022;15(4):1477. doi: 10.3390/ma15041477 [53] Yong H, Huang J, Hua X, et al. Gradient centralization: a new optimization technique for deep neural networks. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. Springer; 2020. p. 635–652. [54] Fuhl W, Kasneci E. Weight and gradient centralization in deep neural networks. Preprint; 2020. arXiv:201000866. [55] Lu F, Niu R, Zhang Z, et al. A generative adversarial network-based fault detection approach for photovoltaic panel. Appl Sci. 2022;12(4):1789. doi: 10.3390/app12041789 [56] Agarwal V, Lohani M, Bist AS. Comparative analysis of deep learning models for various optimizer embedded with gradient centralization. Int J Intell Syst Appl Eng. 2024;12(15s):445–454. [57] Liu Z, Xu Z, Jin J, et al. Dropout reduces underfitting. In: International Conference on Machine Learning. PMLR; 2023. p. 22233–22248. [58] Hahn S, Choi H. Understanding dropout as an optimization trick. Neuro- computing. 2020;398:64–70. doi: 10.1016/j.neucom.2020.02.067 [59] Wu H, Gu X. Towards dropout training for convolutional neural networks. Neural Netw. 2015;71:1–10. doi: 10.1016/j.neunet.2015.07.007 [60] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929–1958. [61] Wu H, Gu X. Max-pooling dropout for regularization of convolutional neu- ral networks. In: Neural Information Processing: 22nd International Confer- ence, ICONIP 2015, Istanbul, Turkey, November 9–12, 2015, Proceedings, Part I 22. Springer; 2015. p. 46–54. [62] Garbin C, Zhu X, Marques O. Dropout vs. batch normalization: an empirical study of their impact to deep learning. Multimed Tools Appl. 2020;79(19):12777–12815. doi: 10.1007/s11042-019-08453-9 [63] Ko B, Kim HG, Oh KJ, et al. 
Controlled dropout: a different approach to using dropout on deep neural network. In: 2017 IEEE International 180 Acta Wasaensia INTERNATIONAL JOURNAL OF COMPUTERS AND APPLICATIONS 355 Conference on Big Data and Smart Computing (BigComp). IEEE; 2017. p. 358–362. [64] Skourt BA, El Hassani A, Majda A. Mixed-pooling-dropout for convolu- tional neural network regularization. J King Saud Univ Comput Inf Sci. 2022;34(8):4756–4762. [65] Khan SH, Hayat M, Porikli F. Regularization of deep neural networks with spectral dropout. Neural Netw. 2019;110:82–90. doi: 10.1016/j.neunet.2018. 09.009 [66] Shirke V, Walika R, Tambade L. Drop: a simple way to prevent neural network by overfitting. Int J Res Eng Sci Manag. 2018;1:2581–5782. [67] Hou W, Wang W, Liu RZ, et al. Cropout: a general mechanism for reducing overfitting on convolutional neural networks. In: 2019 International Joint Conference on Neural Networks (IJCNN). IEEE; 2019. p. 1–8. [68] Poernomo A, Kang DK. Biased dropout and crossmap dropout: learning towards effective dropout regularization in convolutional neural network. Neural Netw. 2018;104:60–67. doi: 10.1016/j.neunet.2018.03.016 [69] Bieder F, Sandkühler R, Cattin PC. Comparison of methods generalizing max-and average-pooling. Preprint; 2021. arXiv:210301746. [70] Zafar A, Aamir M, Mohd Nawi N, et al. A comparison of pooling meth- ods for convolutional neural networks. Appl Sci. 2022;12(17):8643. doi: 10.3390/app12178643 [71] Chollet F. Deep learning with python. Shelter Island: Simon and Schuster; 2021. [72] Han J, Kamber M, Pei J. 2 – getting to know your data. In: Han J, Kamber M, Pei J, editors. The Morgan Kaufmann series in data manage- ment systems. Boston: Morgan Kaufmann; 2012. p. 39–82. doi: 10.1016/ B978-0-12-381479-1.00002-2 [73] Japkowicz N, Shah M. Evaluating learning algorithms: a classification per- spective. Cambridge: Cambridge University Press; 2011. [74] Ying X. An overview of overfitting and its solutions. In: Journal of Physics: Conference Series. Vol. 1168. IOP Publishing; 2019. p. 022022. [75] Brigato L, Mougiakakou S. No data augmentation? Alternative regular- izations for effective training on small datasets. In: Proceedings of the IEEE/CVF International Conference on Computer Vision; 2023. p. 139–148. [76] Prechelt L. Early stopping-but when? In: Neural networks: tricks of the trade. Heidelberg: Springer; 2002. p. 55–69. [77] Ba J, Frey B. Adaptive dropout for training deep neural networks. Adv Neural Inf Process Syst. 2013;26. [78] Cai S, Shu Y, Chen G, et al. Effective and efficient dropout for deep convolu- tional neural networks. Preprint; 2019. arXiv:190403392. [79] Furusho Y, Ikeda K. Resnet and batch-normalization improve data separa- bility. In: Asian Conference on Machine Learning. PMLR; 2019. p. 94–108. [80] Murray N, Perronnin F. Generalized max pooling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2014. p. 2473–2480. [81] Momeny M, Jahanbakhshi A, Jafarnezhad K, et al. Accurate classification of cherry fruit using deep CNN based on hybrid pooling approach. Postharvest Biol Technol. 2020;166:111204. doi: 10.1016/j.postharvbio.2020.111204 [82] Lehmann MK, Nguyen U, Allan M, et al. Colour classification of 1486 lakes across a wide range of optical water types. Remote Sens. 2018;10(8):1273. doi: 10.3390/rs10081273 [83] Hu H. Research on colour recognition sorting method of waste plastic bottles based on computer perspective. 
In: 2022 4th International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM). IEEE; 2022. p. 914–918. [84] Petus C, Waterhouse J, Tracey D, et al. Using optical water-type classifica- tion in data-poor water quality assessment: a case study in the torres strait. Remote Sens. 2022;14(9):2212. doi: 10.3390/rs14092212 [85] Wei X, Bohrer B, Uttaro B, et al. Centre pork chop colour classification using image analysis on the ventral surface of the loin. Can J Anim Sci. 2023:123–126. [86] Pegalajar M, Ruiz L, Criado-Ramón D. Munsell soil colour classification using smartphones through a neuro-based multiclass solution. AgriEngi- neering. 2023;5(1):355–368. doi: 10.3390/agriengineering5010023 [87] Reyes JF, Contreras E, Correa C, et al. Image analysis of real-time classi- fication of cherry fruit from colour features. J Agric Eng. 2021;52(4). doi: 10.4081/jae.2021.1160 [88] Göksel Duru D, Alobaidi M. Classification of brain electrophysio- logical changes in response to colour stimuli. Phys Eng Sci Med. 2021;44(3):727–743. doi: 10.1007/s13246-021-01021-2 [89] van Minderhout HM, Joosse MV, Grootendorst DC, et al. Eye colour and skin pigmentation as significant factors for refractive outcome and residual accommodation in hypermetropic children: a randomized clini- cal trial using cyclopentolate 1% and tropicamide 1%. Acta Ophthalmol. 2022;100(4):454–461. doi: 10.1111/aos.v100.4 [90] Sukhetha P, Hemalatha N, Sukumar R. Classification of fruits and vegetables using ResNet model. agriRxiv; 2021. 20210317,450. [91] Gouda N, Amudha J. Skin cancer classification using ResNet. In: 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA); 2020. p. 536–541. doi: 10.1109/ICCCA49541 [92] Singh A, Bay A, Mirabile A. Assessing the importance of colours for CNNs in object recognition. Preprint; 2020. arXiv:201206917. [93] Mathew MB, Surya Manjunathan G, Gokul B, et al. Banana ripeness iden- tification and classification using hybrid models with ResNet-50, VGG-16 and machine learning techniques. In: Machine Intelligence Techniques for Data Analysis and Signal Processing: Proceedings of the 4th International Conference MISP 2022. Vol. 1. Springer; 2023. p. 259–273. [94] Zhang C, Xia K, Feng H, et al. Tree species classification using deep learning and RGB optical images obtained by an unmanned aerial vehicle. J For Res. 2021;32(5):1879–1888. doi: 10.1007/s11676-020-01245-0 [95] Reddy SR, Varma GS, Davuluri RL. Resnet-based modified red deer opti- mization with DLCNN classifier for plant disease identification and classi- fication. Comput Electr Eng. 2023;105:108492. doi: 10.1016/j.compeleceng. 2022.108492 Acta Wasaensia 181