696 IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, VOL. 32, 2024

Dual-3DM3-AD: Mixed Transformer Based Semantic Segmentation and Triplet Pre-Processing for Early Multi-Class Alzheimer’s Diagnosis
Arfat Ahmad Khan, Rakesh Kumar Mahendran, Kumar Perumal, Member, IEEE,
and Muhammad Faheem

Abstract— Alzheimer’s Disease (AD) is a widespread,
chronic, irreversible, and degenerative condition, and its
early detection during the prodromal stage is of utmost
importance. Typically, AD studies rely on single data
modalities, such as MRI or PET, for making predictions.
Nevertheless, combining metabolic and structural data can
offer a comprehensive perspective on AD staging analysis.
To this end, this paper introduces an innovative
multi-modal fusion-based approach named Dual-3DM3-AD.
This model is proposed for an accurate and early
Alzheimer’s diagnosis by considering both MRI and PET
image scans. Initially, we pre-process both images in terms
of noise reduction, skull stripping and 3D image conversion
using Quaternion Non-local Means Denoising Algorithm
(QNLM), Morphology function and Block Divider Model
(BDM), respectively, which enhances the image quality.
Furthermore, we adopt a Mixed Transformer with Furthered
U-Net to perform semantic segmentation and minimize
complexity. The Dual-3DM3-AD model consists of a
multi-scale feature extraction module for extracting
appropriate features from both segmented images. The
extracted features are then aggregated using Densely
Connected Feature Aggregator Module (DCFAM) to utilize
both features. Finally, a multi-head attention mechanism
is adopted for feature dimensionality reduction, and then
the softmax layer is applied for multi-class Alzheimer’s
diagnosis. The proposed Dual-3DM3-AD model is compared
with several baseline approaches using several
performance metrics. The results show that the proposed
work achieves 98% accuracy, 97.8% sensitivity,
97.5% specificity, 98.2% F-measure, and better ROC
curves, outperforming existing models in multi-class
Alzheimer’s diagnosis.

Manuscript received 7 September 2023; revised 17 December 2023;
accepted 18 January 2024. Date of publication 23 January 2024;
date of current version 8 February 2024. (Corresponding author:
Muhammad Faheem.)

Arfat Ahmad Khan is with the Department of Computer Science, Col-
lege of Computing, Khon Kaen University, Khon Kaen 40002, Thailand
(e-mail: arfatkhan@kku.ac.th).

Rakesh Kumar Mahendran and Kumar Perumal are with the Depart-
ment of Computer Science and Engineering, Rajalakshmi Engineering
College, Chennai 602105, India (e-mail: rakeshkumarmahendran@
gmail.com; kumar@rajalakshmi.edu.in).

Muhammad Faheem is with the Department of Computing, School of
Technology and Innovations, University of Vaasa, 65200 Vaasa, Finland
(e-mail: muhammad.faheem@uwasa.fi).

Digital Object Identifier 10.1109/TNSRE.2024.3357723

Index Terms— Alzheimer’s diagnosis, multi-modalities,
MRI, PET, semantic segmentation, mixed transformer,
multi-scale feature extraction.

I. INTRODUCTION

ALZHEIMER’S disease, an inexorable and serious neurological
problem, causes brain shrinkage and ranks among
the most prevalent causes of mortality in the elderly
population [1], [2], [3]. It progressively erodes memory and cognitive
faculties, eventually rendering even the simplest tasks insur-
mountable, disrupting daily life [4]. The primary culprit behind
the disease is the accumulation of abnormal proteins in and
around brain cells [5]. Amyloid protein aggregates to form
plaques around the brain, while tau protein forms tangles
within. Diagnosing Alzheimer’s disease can be challenging,
especially in older individuals [6], [7]. Magnetic Resonance
Imaging (MRI) therefore helps medical professionals detect
this illness. Image analysis stands out as a prominent
method for diagnosing Alzheimer’s disease, as modern
medical imaging equipment yields a plethora of data about
the patient under examination. T1-weighted structural MRI
scans and 18F 2-Fluoro-2-deoxy-D-Glucose Positron Emission
Tomography (FDG-PET) offer spatial insights into atrophy and
hypometabolism, respectively [8], [9], [10], [11].

The pathophysiological processes behind Alzheimer’s dis-
ease inflict damage upon brain tissues and disrupt their normal
metabolic functions [12]. FDG-PET can pinpoint areas with
impaired functions by visualizing metabolic irregularities. The
regional hypoperfusion/hypometabolism, particularly in bipari-
etal and bitemporal distributions, strongly correlates with the
clinical detection of the disease [13], [14]. PET scans are
capable of identifying diseases even before the emergence
of discernible symptoms or warning signals by scrutinizing
biological functions through metabolic processes [15]. Sim-
ilarly, MRI scans can gauge variations in the volume of
recognizable brain regions, allowing the observation of the
gradual brain atrophy caused by AD-related neurodegener-
ation [16]. This atrophy is attributed to losses in dendrites
and neurons. The atrophy measurements from MRIs can be
employed to estimate cumulative neuronal damage, as there

© 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License.
For more information, see https://creativecommons.org/licenses/by/4.0/

https://orcid.org/0000-0003-0918-8874
https://orcid.org/0000-0002-5059-7269
https://orcid.org/0000-0003-4282-5476
https://orcid.org/0000-0003-4628-4486


KHAN et al.: DUAL-3DM3-AD: MIXED TRANSFORMER 697

exists a robust correlation between atrophy and cognitive
decline [17], [18], [19].

Detecting Alzheimer’s disease from MRI images involves
several key stages, such as pre-processing, feature
extraction, segmentation, and classification. In the initial
pre-processing stage, MRI images undergo essential adjustments
to address their susceptibility to noise and the presence of
non-brain tissues (such as the skin, scalp, dura, muscles, fat,
and eyes) [20], [31], [32]. It is worth noting that some
previous studies omit skull stripping and overlook noise
reduction (including salt-and-pepper noise, Gaussian noise, and
Rician noise), ultimately compromising their classification
accuracy. To enhance the classification accuracy and
computational efficiency, segmentation follows pre-processing.
Segmentation is a crucial process that involves distinguishing
the cerebrospinal fluid, white matter, and gray matter, yield-
ing essential information for subsequent categorization [33].
Interestingly, some prior research neglects segmentation alto-
gether, while many rely on automated image analysis tools
like Statistical Parametric Mapping (SPM), FreeSurfer, and
FSL-FAST4 [34]. However, the use of such automated tools
can substantially increase the computation time, potentially
impacting the efficiency of the segmentation process [35].
It is important to highlight that automated methods for
estimating volume can yield inaccurate results without proper
validation. Automated tools often rely on intensity compar-
isons with atlases to guide the segmentation process, which can
introduce potential errors and complexities in the analysis [36].

The prevailing approach in current research employs
deep learning-based methods for feature extraction and
classification. However, these algorithms typically extract
only individual features or are trained on small datasets,
which is insufficient for accurate classification. Existing
studies draw upon a repertoire of techniques, including
Machine Learning (ML), neural networks, and Deep Learning
(DL) [37]. ML methods such as K-nearest neighbours, decision
trees, SVM, and random forests are frequently utilized.
However, their training complexity tends to increase due to
the generation of many trees during feature extraction, and
these methods do not handle extensive datasets well.
On the other hand, deep learning, which relies on
neural networks for classification and the extraction of fea-
tures, encompasses various models like convolutional neural
networks, multilayer perceptrons, and radial basis functions.
Deep learning surpasses the shortcomings of conventional ML
methods. However, this approach often involves numerous
hidden layers, substantial convergence weights, and extended
computation times, leading to the heightened complexity and
a potential reduction in classification accuracies [38], [39],
[40]. To address these challenges, researchers have turned to
Mixed Transformer-based semantic segmentation to overcome
the hurdles faced by automated tools during the segmentation
process. Additionally, multi-scale feature extraction with an
effective Dual-3DM3-AD architecture has been employed to
mitigate the issues arising from high complexity and elevated
false positive rates encountered during feature extraction.

Research Contribution: The diagnosis of Alzheimer’s disease
faces several notable drawbacks, particularly in the
context of neuroimaging and image analysis. Alzheimer’s,
a relentless and debilitating neurological condition, is marked
by significant challenges in its diagnosis. MRI and PET scans
have become integral tools for identifying the disease, but
they are not without limitations. One significant drawback is
the high cost and resource-intensive nature of these imaging
techniques, making them less accessible for many patients
and healthcare facilities. Furthermore, these methods primarily
provide structural or metabolic insights into the brain, often
lacking the ability to diagnose the disease in its early stages,
when structural changes may not yet be apparent. Additionally,
the process of image analysis, which involves pre-processing,
segmentation, and classification, is susceptible to errors and
variations. Although automated tools are convenient, they
can compromise accuracy and introduce complexities. The
prevailing neural network, machine learning, and deep
learning methods perform well. However, they
often demand substantial computational resources, resulting in
increased complexity and potentially reduced diagnostic
accuracy. These challenges highlight the need for ongoing
research and the development of more accessible and precise
diagnostic methods for Alzheimer’s disease. Hence,
we focus on an accurate and early Alzheimer’s diagnosis using
multiple modalities. To achieve this, we make several
novel contributions, explained as follows:

• This paper introduces a novel approach that combines
multiple data modalities, specifically MRI and PET
scans, to enhance Alzheimer’s Disease (AD) diagnosis.
This fusion-based approach offers a holistic perspective
on AD staging analysis.

• The research incorporates advanced preprocessing tech-
niques, including noise reduction, skull stripping, and
3D image conversion, achieved through the QNLM,
Morphology function, and BDM. These processes sig-
nificantly enhance the quality of the image data, ensuring
more reliable analysis.

• To reduce complexity and improve the accuracy of the
analysis, the study employs a Mixed-transformer with
Furthered U-Net architecture for semantic segmentation.
This step aids in identifying and isolating relevant
regions within the images.

• Dual-3DM3-AD model includes a multi-scale feature
extraction module, which extracts pertinent features
from both segmented images. This module ensures that
the critical information from images is effectively cap-
tured. The extracted features are then aggregated using
the DCFAM. This aggregation process maximizes the
utilization of information from both MRI and PET scans,
further enhancing the accuracy of the diagnosis. The
multi-head attention mechanism then reduces the feature
dimensionality, which streamlines the data while
retaining essential information.

II. LITERATURE SURVEY
The prevalence of big data analytics and the enhanced
computational power offered by GPU clusters have firmly
established Deep Learning (DL) as a prevalent and influen-
tial technique, extending its reach into numerous domains.
Presently, it has become common to leverage DL models


698 IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, VOL. 32, 2024

for various recognition applications in the realm of medical
image analysis. In recent times, researchers have increasingly
turned to MRI and PET modalities to embrace DL for the
development of Alzheimer’s Disease (AD) diagnosis models.
Remarkably, a high-resolution T1-weighted MRI scan pos-
sesses the capability to identify atrophies in distinct brain
regions by providing critical structural insights into the brain.
Wei et al. [21] explore the application of Bi-directional
Empirical Model Decomposition (BEMD) for the automated
detection of Alzheimer’s disease. BEMD, a signal processing
technique, is employed for feature extraction from medical
data. This approach leverages BEMD’s potential in revealing
hidden patterns in multi-modal data sources to enhance the
early diagnosis of Alzheimer’s disease. The novelty of this
work lies in its innovative application of BEMD for the auto-
mated Alzheimer’s disease detection, potentially improving
diagnostic accuracy. Zaina et al. [22] introduce a novel feature
extraction method called Exemplar Pyramid for Alzheimer’s
disease classification. The study focuses on extracting dis-
criminative features from neuroimaging data, particularly MRI
scans, to aid in the accurate detection of Alzheimer’s dis-
ease. The innovation lies in its novel approach of utilizing
exemplar pyramid feature extraction, which enhances the
accuracy and effectiveness of Alzheimer’s disease classifica-
tion. Basheera et al. [23] present a classification method for
Alzheimer’s disease based on Convolutional Neural Networks
(CNNs), and the enhanced Independent Component Analysis
(ICA) is applied to segmented gray matter in MRI images.
By combining deep learning and feature extraction from MRI
scans, this study aims to advance the accuracy and efficiency
of Alzheimer’s disease detection. The paper’s contribution
lies in introducing a novel Alzheimer’s disease classification
method that combines CNN and hybrid enhanced ICA seg-
mentation, improving the accuracy of diagnosis using MRI
data. Murugan et al. [24] propose a deep learning model for
the early diagnosis of Alzheimer’s disease and dementia using
MR images. This research leverages the power of deep neural
networks to automatically extract relevant features and classify
patients based on neuroimaging data. The aim of this work is
the development of a deep learning model for early and accu-
rate diagnosis of Alzheimer’s disease and dementia, potentially
advancing early intervention and treatment. Febietti et al. [25]
delve into early detection by utilizing cortical and hippocampal
Local Field Potentials (LFPs) and ensemble machine learn-
ing models. By incorporating electrophysiological data, this
study explores an alternative approach to Alzheimer’s disease
detection. The contribution of this work is the development of
an ensemble machine learning approach for early Alzheimer’s
disease detection using neural signals, potentially advancing
early diagnosis and intervention.

Dwivedi et al. [26] focus on the development of a multi-modal
fusion-based deep learning network for the effective
diagnosis of Alzheimer’s disease. It addresses the importance
of integrating data from various sources, such as neuroimag-
ing, genomics, and clinical assessments, to enhance diagnostic
accuracy. Yu et al. [27] explore the application of high-order
pooling and Generative Adversarial Networks (GANs) for
assessing Alzheimer’s disease. The research introduces
innovative techniques for feature extraction and data representation
by tensorizing GANs. The approach aims to improve the
accuracy and efficiency of Alzheimer’s disease assessment
using advanced data manipulation. The contribution of this
paper is the innovative integration of high-order pooling and
GAN techniques to enhance the assessment of Alzheimer’s
disease, potentially improving diagnostic accuracy and early
detection. Song et al. [28] delve into the application of the
Random Forest algorithm for diagnostic classification and
biomarker identification in Alzheimer’s disease. It emphasizes
the importance of interpretable machine learning methods in
uncovering relevant biomarkers for diagnosis. Bron et al. [29]
investigate the generalizability of machine learning models
for Alzheimer’s disease diagnosis across different cohorts.
It addresses the challenge of model transferability by exam-
ining the performance of deep learning and conventional
machine learning models on diverse datasets. The effectiveness
of this research is demonstrated through its robust ability
to generalize and accurately diagnose Alzheimer’s disease
across multiple cohorts, showcasing its potential for broad
clinical application. Etmanani et al. [30] introduce a 3D
deep learning model for predicting the diagnosis of various
neurodegenerative disorders, including dementia with Lewy
bodies, Alzheimer’s disease, and mild cognitive impairment.
The use of brain 18F-FDG PET scans and deep learning
techniques underscores the potential of non-invasive imaging
in early diagnosis and differentiation of these conditions.
The effectiveness of this work is evidenced by its accurate
prediction of various neurodegenerative conditions through
the analysis of 3D PET scans, providing valuable diagnostic
support.

III. DUAL-3DM3-AD FRAMEWORK

In this study, we primarily concentrate on the detection of
Alzheimer’s disease with the help of mathematical modelling.
With the help of pre-processing, extracting features, and seg-
menting, the suggested approach increases the classification
accuracy. We use the Alzheimer’s Disease Neuroimaging Ini-
tiative (ADNI) database’s T1-weighted MRI and PET images.
The three phases of the proposed work are as follows:

A. Data Acquisition
In this research, we utilize neuroimaging data acquired from the
Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset
(https://www.kaggle.com/datasets/madhucharan/alzheimersdisease5classdatasetadni).
The main aim of the ADNI team is the neuropsychological
assessment of the progression from MCI to early AD, and of AD
itself, supported by research on combinations of several
biomarkers, utilizing cerebrospinal fluid (CSF) data, MRI, and
PET. The cases are chosen from the ADNI cohort according to the
requirements of our experiment, each having both a screening
visit and a subsequent visit. The subjects’ ages range from 55 to
89 years, and the cohort contains both females and males. We
chose 100 normal, 100 MCI and AD cases. For every case, the
18F-FDG-PET and T1-weighted MRI images are used in this
research. The PET images were acquired on SIEMENS scanners
with a slice thickness of 2.4 mm, using the radiopharmaceutical
18F-FDG, and each scan consists of 63 slices. The MRI images
were acquired on 1.5 T scanners with a slice thickness of 1.2 mm
and 160 slices, where the size of each slice of the 3D images is
192 × 192.

B. Data Pre-Processing
The pre-processing approach is largely determined by the
subsequent processing algorithms and the image format, and is
defined as follows:

1) Noise Reduction: Initially, the noise present in both MRI
and PET scans is removed to enhance image quality. To do so,
we utilize the Quaternion Non-local Means Denoising
Algorithm (QNLM). As the QNLM denoising technique leverages
the inherent high-degree self-similarities within images
for noise suppression, the choice of a similarity metric among
image patches plays a pivotal role in the algorithm’s noise
reduction effectiveness. We have introduced a novel approach
by replacing the traditional Euclidean distance with the QNLM
technique as a metric for evaluating similarities between image
patches. Image content always contains a certain amount of
repetition in the form of self-similar structures, whereas the
noise is randomly distributed. Hence, QNLM exploits these
self-similar structures to suppress the noise, which lifts the
denoising process from the pixel level to the patch level. The
noisy MRI image is modeled as Y = X + N, where X is the
noise-free image and N is the noise, and the image X̂ denoised
by QNLM is mathematically expressed as:

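$$\hat{X}(\rho) = \frac{\sum_{q \in \delta_\rho} \varpi(\rho, q)\, Y(q)}{\sum_{q \in \delta_\rho} \varpi(\rho, q)} \tag{1}$$

where $\delta_\rho$ denotes the search window centered at
$\rho$, and the weight $\varpi(\rho, q)$ is defined as:

$$\varpi(\rho, q) = \exp\left(-\frac{d(\rho, q)/\alpha_n^{2}}{h^{2}}\right) \tag{2}$$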

Here, d(ρ, q) indicates the Euclidean distance between the two
image patches centered at ρ and q in δρ. Likewise, the
PET image scans are denoised to improve image quality.
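As an illustration of Eqs. (1) and (2), the following minimal Python sketch applies the weighted patch average to a single-channel image. It is not the quaternion formulation used in this work: the function name `nlm_denoise`, the patch and search radii, and the default values of h and α_n are assumptions chosen only for illustration.

```python
import math


def nlm_denoise(img, patch_radius=1, search_radius=2, h=0.5, alpha_n=1.0):
    """Non-local means on a 2D grayscale image (list of lists of floats).

    Single-channel sketch of Eqs. (1)-(2); the quaternion patch
    representation of QNLM is NOT modelled here.
    """
    rows, cols = len(img), len(img[0])

    def pixel(r, c):
        # Clamp coordinates so patches near the border stay inside the image.
        return img[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    def patch_distance(r1, c1, r2, c2):
        # Euclidean distance d(rho, q) between the two patches.
        total = 0.0
        for dr in range(-patch_radius, patch_radius + 1):
            for dc in range(-patch_radius, patch_radius + 1):
                diff = pixel(r1 + dr, c1 + dc) - pixel(r2 + dr, c2 + dc)
                total += diff * diff
        return math.sqrt(total)

    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            num = den = 0.0
            # Search window delta_rho centred at rho = (r, c).
            for qr in range(max(r - search_radius, 0), min(r + search_radius, rows - 1) + 1):
                for qc in range(max(c - search_radius, 0), min(c + search_radius, cols - 1) + 1):
                    d = patch_distance(r, c, qr, qc)
                    w = math.exp(-(d / alpha_n ** 2) / h ** 2)  # Eq. (2)
                    num += w * img[qr][qc]
                    den += w
            out[r][c] = num / den  # Eq. (1)
    return out
```

Because every output pixel is a convex combination of input pixels, a constant image is returned unchanged, and smaller values of h make the filter more selective.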

2) Skull Stripping: Following denoising, skull stripping is
performed using morphological operations. Skull stripping is a
preprocessing step in Alzheimer’s disease diagnosis based on
brain imaging techniques such as MRI and PET.
It removes non-brain tissues, including the
skull, scalp, and other extraneous structures, from the acquired
images. This step is crucial because it isolates the brain
region of interest and reduces the noise and interference caused
by surrounding tissues. With the non-brain elements stripped
away, subsequent image analysis and feature extraction become
more accurate and can focus on the structural and metabolic
changes of the brain associated with Alzheimer’s disease. Skull
stripping thus enhances the overall image quality and aids the
reliable and precise detection of Alzheimer’s-related
abnormalities. For this purpose, the proposed technique combines
the erosion and dilation operators: it applies global
thresholding followed by morphological functions. The threshold
value is chosen according to prior knowledge of the intensity
distribution of brain scans.
Initially, the image (I) is read, and the RGB image is converted
to a grayscale image (I1). The grayscale scan is eroded (I2)
by a disk-shaped structuring element (x) of size 4, followed by
dilation (I3) of the resulting image using the same
structuring element (x). The resulting image is then binarized
(I4) by thresholding. The binary image is converted to uint8
format (I5) and subtracted (I6) from the grayscale image,
leaving the skull portion alone. Subtracting the image (I7)
from the grayscale image removes the skull portion and yields
the brain region. The erosion and dilation operators are
written as:

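$$E(f) = f \ominus x = \left\{ \gamma \mid (x)_{\gamma} \cap f^{\complement} = \emptyset \right\} \tag{3}$$

$$D(f) = f \oplus x = \left\{ \gamma \mid (x)_{\gamma} \cap f \neq \emptyset \right\} \tag{4}$$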
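The erode, dilate, threshold, and subtract sequence described above can be sketched in dependency-free Python as follows. This is a simplified illustration and not the authors' implementation: the helper names (`disk`, `erode`, `dilate`, `strip_skull`), the default threshold of 128, and the use of a mask multiplication in place of the two explicit subtractions are assumptions.

```python
def disk(radius):
    """Offsets (dr, dc) of a disk-shaped structuring element."""
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]


def _rank_filter(img, offsets, pick):
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Only neighbours that fall inside the image contribute.
            vals = [img[r + dr][c + dc] for dr, dc in offsets
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            out[r][c] = pick(vals)
    return out


def erode(img, offsets):
    """Grayscale erosion: local minimum over the structuring element."""
    return _rank_filter(img, offsets, min)


def dilate(img, offsets):
    """Grayscale dilation: local maximum over the structuring element."""
    return _rank_filter(img, offsets, max)


def strip_skull(gray, radius=4, threshold=128):
    """Keep grayscale values only where the opened image exceeds the threshold."""
    se = disk(radius)
    opened = dilate(erode(gray, se), se)  # erosion, then dilation
    mask = [[1 if v >= threshold else 0 for v in row] for row in opened]  # binarization
    # Masking is equivalent to subtracting the non-brain part from the grayscale image.
    return [[g * m for g, m in zip(g_row, m_row)] for g_row, m_row in zip(gray, mask)]
```

Thin bright structures such as a skull ring that is narrower than the structuring element do not survive the erosion, so they are absent from the mask, while a compact bright region such as the brain is restored by the dilation.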

3) 3D Image Conversion: Since 3D images can be navigated
from multiple perspectives, the skull-stripped two-dimensional
(2D) MRI scans are transformed into three-dimensional (3D)
images. This transformation is driven by the inherent limitation
of 2D images, which provide a flat, single-perspective view,
whereas 3D images enable navigation from multiple angles,
offering richer and more diverse viewpoints. To obtain these
3D images, a Block Divider Model (BDM) is employed, which
segments the 2D images into blocks and thereby significantly
reduces the time required to obtain precise depth details.
The process begins with the creation of a depth map through
node and link formation. During the conversion from 2D
to 3D images, the depth gradient hypothesis assigns depth
values to the individual blocks. This hypothesis encompasses
depth gradients and validates accuracy within the detected area,
culminating in the generation of depth maps. Furthermore, the
identification of shifts in the scene allows the examination of
linear scene perception, facilitated by the Hough Transform
Line Detection Algorithm (HTLDA). The mathematical
formulation of the depth gradient hypothesis is as follows:

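$$\mathrm{Dep}(D) = 128 + 255 \left\{ \sum_{\mathrm{pixel}(a,b)} \left[ W_{lr} + W_{td}\, \frac{b - \frac{\mathrm{height}}{2}}{\mathrm{height}} \right] \right\} \Big/ \mathrm{pixelnum}(D) \tag{5}$$

where $|W_{lr}| + |W_{td}| = 1$.

$$\mathrm{Dep}(a_i) = \frac{1}{P(a_i)} \sum_{a_j \in \Omega(a_i)} e^{-0.5\left[\frac{|a_j - a_i|}{\gamma_a^{2}} + \frac{|\nu(a_j) - \nu(a_i)|^{2}}{\gamma_\varsigma^{2}}\right]}\, \mathrm{Dep}(a_j) \tag{6}$$

$$P(a_i) = \sum_{a_j \in \Omega(a_i)} e^{-0.5\left[\frac{|a_j - a_i|}{\gamma_a^{2}} + \frac{|\nu(a_j) - \nu(a_i)|^{2}}{\gamma_\varsigma^{2}}\right]} \tag{7}$$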

A higher depth value indicates that the pixel is closer to the
observer. Here, the intensity values are scaled from 0 (black) to
255 (white), with intermediate shades of gray representing
different signal strengths in the image. Equation (5) expresses
that the depth value of a block group is given by its center of
gravity, so that pixels belonging to the same group share the
same depth value. The values |wlr| and |wtd| are tuned to
control the depth gradient horizontally and vertically,
respectively. Once the depth map is generated by
grouping regions into blocks, it may exhibit blocky artifacts.


To address this issue, the cross-bilateral filter is employed
to smoothly refine the depth map while preserving object
boundaries. Afterward, the depth map is further improved
through pixel value adjustments and hole filling using the
QNLM filter, resulting in the creation of 3D representations.
The preprocessing of the depth image primarily involves
applying a smoothing filter. However, this filter, combined
with the transition of sharp horizontal features, can create
significant holes. To mitigate this problem, the QNLM filter
is utilized to reduce the occurrence of large holes.
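The edge-preserving refinement of the blocky depth map can be illustrated with a one-dimensional cross-bilateral filter in the spirit of Eqs. (6) and (7). This is a sketch under stated assumptions: the function name `cross_bilateral_1d`, the window radius, and the two bandwidths `gamma_a` and `gamma_s` are hypothetical, and the guide signal plays the role of the intensity ν.

```python
import math


def cross_bilateral_1d(depth, guide, radius=2, gamma_a=2.0, gamma_s=10.0):
    """Smooth `depth` while respecting edges found in `guide`.

    depth: list of block depth values Dep(a_j)
    guide: list of intensities nu(a_j) taken from the original image
    """
    n = len(depth)
    out = []
    for i in range(n):
        num = norm = 0.0
        # Neighbourhood Omega(a_i): positions within `radius` of i.
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            spatial = abs(j - i) / gamma_a ** 2
            intensity = (guide[j] - guide[i]) ** 2 / gamma_s ** 2
            w = math.exp(-0.5 * (spatial + intensity))
            num += w * depth[j]
            norm += w  # normalisation term P(a_i), Eq. (7)
        out.append(num / norm)  # filtered depth, Eq. (6)
    return out
```

When the guide contains an edge at the same place as a depth step, the cross-edge weights become negligible and the step is preserved; with a flat guide the same step is blurred.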

We then execute 3D image warping, which repositions pixels
according to their depth values. The formulation of 3D image
warping is as follows:

$$e_l = e_m + \left(\frac{d_{gx}}{2} \cdot \frac{f}{Z}\right) \tag{8}$$

$$e_r = e_m - \left(\frac{d_{gx}}{2} \cdot \frac{f}{Z}\right) \tag{9}$$

where $e_l$, $e_r$, and $e_m$ are the horizontal positions in the
left, right, and intermediate views, respectively. The depth
value of the current pixel is represented by Z. The eye
separation distance and the focal length are represented by
$d_{gx}$ and $f$, respectively. Moreover, we use QNLM to fill
holes when generating the 3D image.
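A short Python sketch of the warping in Eqs. (8) and (9) is given below. The function name `warp_positions` and the numeric values in the usage lines are illustrative assumptions, not parameters reported in this work.

```python
def warp_positions(e_m, depth_z, eye_distance, focal_length):
    """Return (e_l, e_r), the left and right horizontal positions.

    e_m: horizontal position in the intermediate view
    depth_z: depth value Z of the current pixel (must be non-zero)
    eye_distance: d_gx; focal_length: f
    """
    shift = (eye_distance / 2.0) * (focal_length / depth_z)
    return e_m + shift, e_m - shift  # Eq. (8), Eq. (9)


# Hypothetical values: a pixel at column 100 with depth 50.
left, right = warp_positions(100.0, 50.0, eye_distance=6.0, focal_length=25.0)
print(left, right)
```

The shift is inversely proportional to Z, so pixels with a small Z are displaced more between the two views than pixels with a large Z.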

C. Transformer Based Semantic Segmentation
Following the pre-processing, both pre-processed images are
utilized for segmentation. Here, transformer-based semantic
segmentation is executed to acquire pixel-level information
effectively. To this end, the Mixed Transformer is used to obtain
features, including cortical thickness, colour, texture, and
boundary details, from the images. The densely connected feature
aggregator model is then employed to collect the features
from the multiple modalities and segment the ROI, as described
in detail below:

1) Mixed Transformer: The core architecture of the net-
work is based on an encoder-decoder framework, with the
incorporation of skip connections during the decoding phase
to retain essential low-level features. Notably, in an effort
to optimize computational resources, we selectively apply
Multi-Head Transformer Modules (MTMs) exclusively to the
deeper layers with reduced spatial dimensions. For the upper
layers, we maintain the use of conventional convolutional
operations. This distinction is deliberate, as the initial lay-
ers contain higher-resolution features, and our focus is on
capturing local relationships within them. Furthermore, the
utilization of convolutional operations in the upper layers
enables us to introduce structural priors into the model,
a valuable feature particularly when working with relatively
small medical image datasets. It is worth noting that a 2-stride
convolutional/deconvolutional kernel is uniformly employed
across all Transformer modules to facilitate channel expansion,
compression, and down/up sampling. MT comprises Local
Global Gaussian-Self Attention (LGG-SA) and Dense Allied
Feature Accumulation (DAFA). LGG-SA is constructed to
model long-range and short-range dependencies at diverse
granularities. It is designed to replace the encoder of the
traditional transformer in order to minimize time
complexity and provide better performance. The LGG-SA
modules are detailed as follows:

Fig. 1. LGG architecture.

a) Local-global self-attention: Initially, self-attention (SA)
extracts the relationships among all entities of the MRI and
PET image inputs individually. To identify the target, SA uses
three matrices: key (K), query (Q), and value (V). These three
matrices are defined as linear transforms of the input X. In
addition, we introduce Local-Global Self-Attention (LGSA), as
shown in Fig. 1, to enhance the significance of the
correlations. Here, local SA evaluates the self-affinities
inside every window. Next, the tokens inside every window are
aggregated into global tokens. The aggregation can be
implemented with max pooling, strided convolution, and other
techniques, among which Lightweight Dynamic Convolution (LDC)
performs effectively. After the whole feature map has been
down-sampled in this way, global SA is executed at minimal
cost. For $X \in \mathbb{R}^{H \times W \times C}$ and a window
size fixed to P, the entire process is mathematically
expressed as:

$$O_{loc} = \mathrm{LSA}(X) \tag{10}$$

$$O_{glo} = \mathrm{GSA}(\mathrm{LDC}(O_{loc})) \tag{11}$$

$$O = \mathrm{Concat}(O_{loc}, \mathrm{Upsample}(O_{glo})) \tag{12}$$

where $O$ indicates the output, LSA is local self-attention, and
GSA is the corresponding global self-attention.
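The three steps of Eqs. (10) to (12) can be sketched on a one-dimensional token sequence as shown below. The sketch makes several simplifying assumptions that are not part of this work: identity query/key/value projections, mean pooling in place of LDC as the aggregation step, non-overlapping 1D windows, and up-sampling by repetition. All function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


def local_global_attention(tokens, window):
    # Eq. (10): local self-attention inside each non-overlapping window.
    windows = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    local = [self_attention(w) for w in windows]
    # Aggregation to one global token per window (mean pooling stands in for LDC).
    pooled = [[sum(t[d] for t in w) / len(w) for d in range(len(w[0]))] for w in local]
    # Eq. (11): global self-attention over the down-sampled tokens.
    global_tokens = self_attention(pooled)
    # Eq. (12): up-sample by repetition and concatenate along the channel axis.
    out = []
    for w, g in zip(local, global_tokens):
        out.extend(t + g for t in w)
    return out
```

Global attention here runs over one token per window instead of over all tokens, which is where the reduction in cost comes from; each output token carries its local features followed by the global features of its window.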

b) Gaussian-weighted axial attention: In contrast to Local
Self-Attention (LSA), which uses the default SA, we design
Gaussian-Weighted Axial Attention (GWAA), which strengthens
each query’s perception of its neighbourhood through a learnable
Gaussian matrix while keeping the time complexity low thanks to
axial attention. Let $Q \in \mathbb{R}^{\frac{H}{P} \times \frac{W}{P}}$
denote the queries obtained from the aggregation step. For a
query $q_{i,j}$ in Q, we define $D_{i,j}$ as the Euclidean
distance between $q_{i,j}$ and its corresponding $K_{i,j}$
and $V_{i,j}$, where $K_{i,j}$ and $V_{i,j}$ are the matrices
computed from the tokens on the i-th row and j-th column after
aggregation. Denoting the similarity between q and K by
$\mathrm{sim}(q, K)$ and the Gaussian weight by
$e^{-D_{i,j}^{2}/(2\phi^{2})}$, the final output at position
(i, j) can be written as:

$$O_{i,j} = e^{-\frac{D_{i,j}^{2}}{2\phi^{2}}}\, \mathrm{softmax}\left(\mathrm{sim}(q_{i,j}, K_{i,j})\right) V_{i,j} \tag{13}$$

Since the variance ϕ needs to be learnable, the
aforementioned equation can also be written as:

$$O_{i,j} = \mathrm{softmax}\left(-\frac{1}{2\phi^{2}} D_{i,j}^{2} + \mathrm{sim}(q_{i,j}, K_{i,j})\right) V_{i,j} \tag{14}$$


Fig. 2. Representation of multi-modalities segmented image.

Here, we use ω to denote the coefficient in front of
$D_{i,j}^{2}$; the term $\omega D_{i,j}^{2}$ then acts as a
relative position bias, which emphasizes the position
information in the MT. It improves the model performance by
explicitly providing relative relationships, in contrast to the
usual absolute positional embedding. Finally, EA is introduced
to address the inability to exploit correlations among
different images.
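The role of the position bias in Eq. (14) can be illustrated with the following Python sketch, which computes the output for one query by attending along its row and column. The function name, the use of a dot product as the similarity, and the identity query/key/value projections are assumptions made for illustration only.

```python
import math


def gaussian_axial_attention(features, i, j, phi=1.0):
    """Output at (i, j) following Eq. (14), attending along row i and column j.

    features: 2D grid (list of rows) of equal-length feature vectors.
    phi: standard deviation of the Gaussian (learnable in the model, fixed here).
    """
    rows, cols = len(features), len(features[0])
    q = features[i][j]
    # Axial neighbourhood: every position sharing the row or the column.
    positions = [(i, c) for c in range(cols)] + [(r, j) for r in range(rows) if r != i]
    omega = -1.0 / (2.0 * phi ** 2)  # coefficient in front of D^2
    logits = []
    for r, c in positions:
        dist_sq = (r - i) ** 2 + (c - j) ** 2
        sim = sum(a * b for a, b in zip(q, features[r][c]))  # dot-product similarity
        logits.append(omega * dist_sq + sim)
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    dim = len(q)
    return [sum(w * features[r][c][d] for w, (r, c) in zip(weights, positions)) / total
            for d in range(dim)]
```

A small ϕ concentrates the attention on the query's immediate neighbourhood, whereas a large ϕ lets the similarity term dominate, which is how the bias term encodes relative position.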

2) Semantic Segmentation Using Furthered U-Net: After
extracting features with the Mixed Transformer, we employ
the Furthered U-Net (FU-Net) algorithm to segment white
matter, grey matter, and cerebrospinal fluid. This segmentation
effectively delineates the affected areas, as depicted
in Figure 2. In contrast to the traditional U-Net approach,
our work incorporates Batch Normalization (BN) to enhance
training stability and mitigate vanishing gradients. This
optimization enhances the segmentation performance and
aids model convergence. The batch normalization transform is
computed as follows:
$$\Lambda = \frac{\psi}{\sqrt{\mathrm{Var}[s] + \varepsilon}} \cdot x + \left(\xi - \frac{\psi\,\zeta[x]}{\sqrt{\mathrm{Var}[s] + \varepsilon}}\right) \tag{15}$$

In the equation above, $x$ represents the input features and
$\Lambda$ denotes the standardized feature with values close to
zero. The parameters $\psi$ and $\xi$ are trainable parameters
that are updated during training. Subsequently, the cross-entropy
loss function is used in the training phase. The Adam optimizer
is utilized for optimization, and the parameter updates within
the algorithm can be expressed as follows:

$$s = s_0 - \frac{y\,w}{\sqrt{v}} \tag{16}$$

$$m = b_1 \times m_0 + (1 - b_1)\, f'(q_0) \tag{17}$$

$$v = b_2 \times v_0 + (1 - b_2) \left[f'(q_0)\right]^{2} \tag{18}$$

where $b_1$ and $b_2$ denote the decay rates, $y$ is the learning
rate, $v_0$ and $v$ are the old and new values of the second-moment
estimate, and $m_0$ and $m$ are the old and new values of the
first-moment estimate of the gradient $f'(q_0)$. Moreover, the
algorithm adapts the learning rate at each iteration to ensure
parameter stability and high computational efficiency.
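
The batch normalization form of Eq. (15) and the update rules of Eqs. (16)-(18) can be sketched as follows. This is a minimal illustration under stated assumptions: the decay rates 0.9 and 0.999 and the small constant `eps` are common Adam defaults rather than values given in the paper, the numerator of Eq. (16) is taken to be the first-moment estimate, bias correction is omitted as in the equations, and all function names are hypothetical.

```python
import math


def batch_norm_folded(x, mean, var, psi, xi, eps=1e-5):
    """Scale-and-shift form of batch normalization, as in Eq. (15)."""
    scale = psi / math.sqrt(var + eps)
    return scale * x + (xi - scale * mean)


def batch_norm_direct(x, mean, var, psi, xi, eps=1e-5):
    """Textbook form: normalize first, then apply scale psi and shift xi."""
    return psi * (x - mean) / math.sqrt(var + eps) + xi


def adam_step(param, grad, m0, v0, lr=8e-5, b1=0.9, b2=0.999, eps=1e-8):
    """One parameter update following Eqs. (16)-(18), without bias correction."""
    m = b1 * m0 + (1.0 - b1) * grad        # Eq. (17): first-moment estimate
    v = b2 * v0 + (1.0 - b2) * grad ** 2   # Eq. (18): second-moment estimate
    # Eq. (16); assumption: the numerator is the first-moment estimate m.
    new_param = param - lr * m / (math.sqrt(v) + eps)
    return new_param, m, v
```

The two batch normalization functions return the same value, which shows that Eq. (15) is the usual normalization rewritten as a single multiply-add per feature.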

D. Multi-Modality-Based Alzheimer’s Diagnosis
Once the segmentation is completed, the segmented image is
fed into the proposed Dual-3DM3-AD model. There, the
appropriate features are extracted at multiple scales, and the
dimensionality is reduced by using the multi-head attention
mechanism, as elaborated below:

1) Multi-Scale Feature Extraction: We utilize two parallel
ResNet-51 blocks as encoders to extract feature maps from the
segmented MRI and PET 3D images separately. To form the encoder
input, we convert the MRI and PET images to three channels by
repeating their single-channel information. The encoder consists
of a convolution, Rectified Linear Unit (ReLU), batch
normalization and max pooling (CRBM) stage followed by an
alternating sequence of ResNet blocks (RB) and Evolution
Downsampling Blocks (EDB). The MRI features $F_{MRI}$, such as
textural, statistical, structural, edge, blob, color and contour
features, are extracted using the multi-scale feature extraction
model. Likewise, the PET features $F_{PET}$ are extracted after
every ResNet block. From the encoders, we extract the $F_{MRI}$
and $F_{PET}$ features at 1/4, 1/8, 1/16 and 1/32 of the original
image size. After that, the multi-scale features are combined by
element-wise addition.
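
The bookkeeping described above can be sketched in a few lines. The 128 × 128 × 128 input size used in the usage note is a hypothetical example, floor division is an assumption about how the encoder rounds odd sizes, and the helper names are illustrative only.

```python
def replicate_channels(volume, channels=3):
    """Stack the same single-channel volume `channels` times (channel-first)."""
    return [volume] * channels


def feature_map_shapes(input_shape, factors=(4, 8, 16, 32)):
    """Spatial shape of the feature map at each 1/factor scale (floor division)."""
    return {f: tuple(dim // f for dim in input_shape) for f in factors}


def elementwise_add(a, b):
    """Element-wise sum of two equally sized flat feature lists."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same size")
    return [x + y for x, y in zip(a, b)]
```

For a hypothetical 128 × 128 × 128 volume, `feature_map_shapes((128, 128, 128))` gives 32, 16, 8 and 4 voxels per side at the four scales, and features of the same scale can then be merged with `elementwise_add`.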

2) Densely Allied Feature Accumulation: To aggregate the
features from MRI and PET, we adopt the DAFA module for feature
representation. Specifically, we introduce Collective Spatial
Attention (CSA) and Collective Channel Attention (C2A) to improve
the spatial-wise and channel-wise representation of semantic
features. The main purpose of CSA and C2A is to refine the
multi-scale features at diverse scales. More specifically, both
CSA and C2A comprise convolutional filters and query, value and
key functions, which provide appropriate weights so that
individual features are accumulated precisely. Additionally, the
features from the two modalities are combined through large-field
downsample and upsample connections to enhance the multi-scale
representation. The DAFA accumulates the MRI and PET features
$F_{MRI}$ and $F_{PET}$.

a) Upsample connections: The upsampling connections
$U_i^j(\cdot)$ aim to pass information from one layer to another
while maintaining or even enhancing spatial resolution. Here,
both MRI and PET pass feature information to enhance the spatial
resolution by integrating upsampling operations.

b) Downsample connection: The downsample connection links the
MRI and PET features for fusion, and it can be expressed as:

$$D_i^j(\cdot) = f\left(f_\mu(\cdot) + f_\tau\left(f_\theta(\cdot)\right)\right) \tag{19}$$

where $(\cdot)$ denotes the input vector and $f$ is the ReLU
activation function. $f_\mu$ and $f_\tau$ are 3 × 3 convolution
layers with a stride of 2, and $f_\theta$ is a 3 × 3 convolution
layer with a stride of 1. Every convolution layer includes batch
normalization. $i$ and $j$ denote the input and output channels,
respectively.

c) Collective spatial attention: Following the linear attention
mechanism, we use the CSA to model the long-range dependencies
of the spatial dimension, and it can mathematically be defined
as:

$$\mathrm{CSA}(\cdot) = \frac{\sum_{n} V(\cdot)_{c,n} + \left(\frac{Q(\cdot)}{\|Q(\cdot)\|_2}\right)\left(\frac{K(\cdot)}{\|Q(\cdot)\|_2}\right)^{T} V(\cdot)}{N + \left(\frac{Q(\cdot)}{\|Q(\cdot)\|_2}\right)\sum_{n}\left(\frac{Q(\cdot)}{\|Q(\cdot)\|_2}\right)^{T}_{c,n}} \tag{20}$$

where $Q(\cdot)$, $K(\cdot)$ and $V(\cdot)$ indicate the
convolutional functions that compute the query matrix
$Q \in \mathbb{R}^{N \times D_Y}$, key matrix
$K \in \mathbb{R}^{N \times D_Y}$ and value matrix
$V \in \mathbb{R}^{N \times D_Y}$, and $N$ denotes the number of
pixels of the input feature maps. $n$ and $c$ index the flattened
spatial dimension and the channel dimension, respectively.

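d) Collective channel attention: Likewise, CCA is modelled to
extract the long-range dependencies along the channel dimension,
which can be defined as:

$$\mathrm{CCA}(\cdot) = \frac{\sum_{c} R(\cdot)_{c,n} + \left(R(\cdot)_{c,n}\left(\frac{K(\cdot)}{\|Q(\cdot)\|_2}\right)^{T}\right)\frac{Q(\cdot)}{\|Q(\cdot)\|_2}}{N + \left(\frac{R(\cdot)}{\|R(\cdot)\|_2}\right)^{T}\sum_{c}\left(\frac{R(\cdot)}{\|R(\cdot)\|_2}\right)^{T}_{c,n}} \tag{21}$$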

where $R(\cdot)$ denotes the reshape function that flattens the
spatial dimension. In summary, the primary difference lies in
what these attention mechanisms focus on: spatial attention
deals with the spatial positions within the data, while channel
attention deals with the feature channels or dimensions. They
can be used in combination to enhance the representation and
performance of the proposed model, depending on the nature of
the classification task.
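
To illustrate why a linear attention of this kind avoids the quadratic cost of pairwise comparisons, the following minimal sketch implements the common form in which the similarity between positions is $1 + \hat{q}\cdot\hat{k}$ for L2-normalized rows. It assumes that the query, key and value matrices are already given (rather than produced by convolutions) and that each of the query and key rows is normalized by its own norm, which may differ in detail from the attention defined above; the function names are hypothetical.

```python
import math


def l2_normalize_rows(matrix):
    """Divide every row by its L2 norm (rows of zeros are left unchanged)."""
    out = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        out.append([x / norm for x in row])
    return out


def linear_attention(Q, K, V):
    """Attention with similarity 1 + q_hat . k_hat, computed in O(N) per output."""
    n, d, c = len(V), len(Q[0]), len(V[0])
    Qn, Kn = l2_normalize_rows(Q), l2_normalize_rows(K)
    v_sum = [sum(V[j][ch] for j in range(n)) for ch in range(c)]   # sum_n V
    k_sum = [sum(Kn[j][a] for j in range(n)) for a in range(d)]    # sum_n K_hat
    ktv = [[sum(Kn[j][a] * V[j][ch] for j in range(n)) for ch in range(c)]
           for a in range(d)]                                       # K_hat^T V
    out = []
    for q in Qn:
        denom = n + sum(q[a] * k_sum[a] for a in range(d))
        out.append([(v_sum[ch] + sum(q[a] * ktv[a][ch] for a in range(d))) / denom
                    for ch in range(c)])
    return out


def quadratic_reference(Q, K, V):
    """Same attention computed pairwise, used only to check the linear form."""
    Qn, Kn = l2_normalize_rows(Q), l2_normalize_rows(K)
    out = []
    for q in Qn:
        w = [1.0 + sum(a * b for a, b in zip(q, k)) for k in Kn]
        total = sum(w)
        out.append([sum(wj * v[ch] for wj, v in zip(w, V)) / total
                    for ch in range(len(V[0]))])
    return out
```

Both functions return the same result; the linear version only needs the two sums and the product $\hat{K}^{T}V$ once, independently of the number of query positions.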

e) Feature accumulation: Finally, the features obtained from
MRI and PET, AF1 and AF2, are fused according to the following
equation:

$$AF = F_{MRI} + F_{PET} + U \tag{22}$$

Here, $AF$ is the feature accumulation factor, and $F_{MRI}$ and
$F_{PET}$ are the features obtained from MRI and PET,
respectively. $U$ denotes the upsampling function, which uses
bilinear interpolation with a scale factor of 2 for spatial
enhancement.

3) Multi-Head Attention Mechanism: The multi-head attention
mechanism applies several linear transformations to the input
feature matrix and determines the attention representations of
the image across these transformations; therefore, we acquire
more comprehensive Alzheimer’s information. This mechanism is
fundamentally an integration of several self-attention schemes
over key (K), query (Q) and value (V). The core of the scheme is
Scaled Dot-product Attention (SDA), which is expressed as:

$$\mathrm{SDA}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V \tag{23}$$

where $d_k$ denotes the dimension of the key vectors.

The concept of multi-head attention is to use different
parameters $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$ to apply linear
transformations to the $Q$, $K$, $V$ matrices and to pass the
results of these transformations to the SDA. The result is
denoted $\mathrm{head}_i$, which can be formulated as:

$$\mathrm{head}_i = \mathrm{SDA}\left(QW_i^{Q}, KW_i^{K}, VW_i^{V}\right) \tag{24}$$

TABLE I
HARDWARE PARAMETERS

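Next, we concatenate the results $\mathrm{head}_1$ to
$\mathrm{head}_h$ to create a matrix, and multiply it by the
parameter $W$ to obtain the final linear transformation:

$$\mathrm{Head} = \mathrm{Multihead}(Q, K, V) \tag{25}$$
$$\phantom{\mathrm{Head}} = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W \tag{26}$$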
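
The multi-head computation described above can be sketched with plain nested lists. The scaling by the square root of the key dimension follows the standard scaled dot-product formulation, which is an assumption about the scaling term here, and the helper names and projection matrices are illustrative rather than the authors' implementation.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(matrix):
    """Row-wise, numerically stable softmax."""
    out = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sda(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    k_transposed = [list(col) for col in zip(*K)]
    scores = matmul(Q, k_transposed)
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), V)


def multi_head(Q, K, V, heads, w_out):
    """Run one SDA per (W_q, W_k, W_v) triple, concatenate, then project."""
    outputs = [sda(matmul(Q, wq), matmul(K, wk), matmul(V, wv))
               for wq, wk, wv in heads]
    concat = [sum((head[i] for head in outputs), []) for i in range(len(Q))]
    return matmul(concat, w_out)
```

Each head sees its own projection of the same input, and the final matrix `w_out` maps the concatenated heads back to the desired output width.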

4) Output Layer of Alzheimer’s Diagnosis: Average pooling is
applied to the Head output matrix of the multi-head attention
layer to obtain the feature vector $F_{MP}^{avg}$. We pass
$F_{MP}^{avg}$ through a fully connected layer to the final
softmax classifier to obtain the final Alzheimer’s diagnosis as:

$$y = \mathrm{softmax}\left(w_m F_{MP}^{avg} + b_m\right) \tag{27}$$

Here, $w_m$ is the weight matrix and $b_m$ is the bias. We use
the backpropagation technique to optimize our proposed model, and
the cross-entropy loss is expressed as:

$$\mathrm{loss} = -\sum_{i=1}^{D}\sum_{j=1}^{C} \hat{y}_i^{j} \ln y_i^{j} + \lambda\|\theta\|^{2} \tag{28}$$

where $D$ is the training data size, $C$ is the number of data
classes, $y$ is the predicted class, $\hat{y}$ is the actual
class and $\lambda\|\theta\|^{2}$ is the regularization term.
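
The output layer and its loss can be sketched as follows. The sketch assumes the conventional negative sign of the cross-entropy, a squared L2 norm for the penalty, and one-hot targets; the function names and the toy numbers in the usage note are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, bias):
    """Fully connected layer followed by softmax (one weight row per class)."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def regularized_cross_entropy(predicted, actual, params, lam):
    """Cross-entropy summed over samples and classes plus lam * ||params||^2."""
    data_term = -sum(t * math.log(p)
                     for probs, target in zip(predicted, actual)
                     for p, t in zip(probs, target) if t > 0)
    penalty = lam * sum(w * w for w in params)  # assumption: squared L2 norm
    return data_term + penalty
```

For a two-class toy case with prediction `[0.5, 0.5]` and one-hot target `[1, 0]`, the data term equals ln 2, and a perfect prediction leaves only the penalty term.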

IV. EXPERIMENTAL RESULTS

In this section, we demonstrate the effectiveness of the
proposed Dual-3DM3-AD model in terms of Alzheimer’s detection.
This section is divided into sub-sections covering the
simulation setup, comparison analysis and research summary.

A. Simulation Setup
The entire model execution and evaluation are implemented
in MATLAB 2020A. We split the dataset in a 90:10 ratio and
adopted 10-fold cross-validation. To diagnose Alzheimer’s using
MRI and PET scans, the Dual-3DM3-AD model is utilized as a
classifier. For a fair analysis, we set the mini-batch size to
32, the number of epochs to 100 and the learning rate to
0.00008. Table I shows the hardware parameters.
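
A 10-fold cross-validation naturally yields a 90:10 split in every round, which the following minimal sketch makes explicit. The sample count, the seed and the function name are hypothetical, and the sketch is written in Python for illustration rather than in the MATLAB environment used for the experiments.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With 200 hypothetical samples, each of the ten rounds trains on 180 samples and tests on the remaining 20, and every sample is used for testing exactly once.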

B. Experiments
The performance of the proposed Dual-3DM3-AD model is compared
with the existing approaches with respect to sensitivity,
accuracy, confusion matrix, specificity, and ROC curve. We
performed three classification tasks: Cognitive Normal (CN) vs
AD, AD vs Mild Cognitive Impairment (MCI), and CN vs MCI.
Accuracy gives the proportion of correct results, which can be
true negatives or true positives. Sensitivity shows how
effectively the model recognizes the diseased condition.
Specificity shows how effectively the model recognizes the CN
condition. ROC curves and confusion matrices provide a visual
characterization of the predictive analysis.
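
The metrics named above follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts in the usage note are made-up illustrative numbers, not results from the paper, and the function name is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }
```

For example, `binary_metrics(tp=45, fp=5, tn=45, fn=5)` gives 0.9 for all four metrics, since the illustrative matrix is symmetric.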



Fig. 3. Overall architecture of the proposed Dual-3DM3-AD model.

C. Comparative Analysis
This subsection compares the proposed model with existing
works. The primary intention of this paper is to perform
segmentation and Alzheimer’s diagnosis effectively.

1) Comparison With Diverse Modalities: For the comparative
analysis between MRI, PET and fused information, the
Dual-3DM3-AD model is applied to each of those modalities.
Figs. 4(a)-(c) and 5(a)-(c) show the confusion matrices and ROC
curves of the CN vs AD classification obtained from the
different modalities. In Fig. 5, class-1 denotes CN and class-2
denotes AD. As shown, classification with the fused data yields
the ROC curve closest to the top-left corner, which indicates
the usefulness of the fused data.

Table II shows the comparative analysis in terms of
performance metrics, and it indicates that the fusion-based
classification is more accurate than PET or MRI alone. MRI and
PET data individually yield lower performance, which is
explained by the inability of a single modality to capture
metabolic and structural changes simultaneously, whereas the
multi-modality fused data captures both kinds of brain
information. In pre-processing, noise removal and skull
stripping remove the noise and unwanted tissues, thereby
reducing the computation cost. Moreover, the multi-head
attention mechanism reduces complexity. Consequently, testing
the Dual-3DM3-AD model takes 2 minutes on a machine with one
GPU, which reflects the algorithm’s space complexity and running
time.

2) Comparison With Diverse State-of-Art Approaches: The
proposed Dual-3DM3-AD model is compared with several
state-of-the-art approaches to demonstrate its efficacy for AD
classification. EPEE [22], Novel-CNN [23], DEMNET [24],
EMLM [25], RELS-TSVM [26] and THS-GAN are the approaches used
for comparison.

Fig. 4. Confusion matrix for proposed model (a) MRI, (b) PET and
(c) Fused Data.

The performance metrics of the Dual-3DM3-AD model and the
state-of-the-art approaches are compared in Table III. The
baseline approaches are defined as follows:

[i] EPEE: A deep learning-based approach using EPEE is
proposed for Alzheimer’s diagnosis using MRI images, which
performs better.

[ii] Novel-CNN: Early diagnosis and classification of
Alzheimer’s is proposed by designing a neural network-based
novel CNN using T2-weighted MRI scans.

[iii] DEMNET: A DL model is proposed for diagnosing dementia
and classifying Alzheimer’s while handling an unbalanced
dataset.



TABLE II
PERFORMANCE ANALYSIS OF PROPOSED MODEL FOR ALZHEIMER DIAGNOSIS WITH DIVERSE MODALITIES

Fig. 5. ROC curve for proposed model (a) MRI, (b) PET and (c) Fused
data.

[iv] EMLM: Early detection of Alzheimer’s based on
hippocampal and cortical local field potentials is proposed by
adopting the EMLM model.

[v] RELS-TSVM: A DL-based Alzheimer’s detection is
implemented by utilizing multi-modality data to obtain accurate
results.

[vi] THS-GAN: An MRI-based classification model, THS-GAN, is
proposed for the identification of multi-class Alzheimer’s
disease.

The proposed Dual-3DM3-AD model exhibits superior performance,
with 98% accuracy, 97.8% sensitivity, 97.5% specificity and
98.2% F-measure for CN vs AD diagnosis. Figs. 6-9 compare the
proposed and existing works in terms of accuracy, sensitivity,
specificity, and F-measure. The proposed Dual-3DM3-AD model
displays better convergence characteristics and convincing
accuracy. The ROC curve of Dual-3DM3-AD is nearer to the
top-left corner, indicating better performance than the other
existing approaches. Hence, the multi-modal fusion-based
Dual-3DM3-AD model proves to be a better automatic
classification method.

3) Comparison With Diverse Machine Learning Approaches:
We compare the proposed Dual-3DM3-AD model with various
machine learning approaches. BEMD [21], RF [27] and SVM [28]
are utilized as classifiers for Alzheimer’s diagnosis. The
comparison of the Dual-3DM3-AD performance metrics with the
existing classifiers in terms of accuracy, sensitivity,
specificity and F-measure is given in Table IV. The RF

Fig. 6. Analysis of accuracy.

Fig. 7. Analysis of sensitivity.

Fig. 8. Analysis of specificity.

model performed better than SVM and NB as an Alzheimer’s
classification model on all performance metrics. The proposed
work also achieves higher accuracy than the other models. The
machine learning approaches attain lower accuracy because they
struggle to handle large datasets and are insufficient at
extracting appropriate features.



TABLE III
COMPARISON ANALYSIS OF PROPOSED MODEL FOR ALZHEIMER DIAGNOSIS WITH BASELINE APPROACHES

TABLE IV
COMPARISON ANALYSIS OF PROPOSED MODEL FOR ALZHEIMER DIAGNOSIS WITH ML APPROACHES

Fig. 9. Analysis of F-measure.

D. Evaluation of Proposed Dual-3DM3-AD Model

To validate the Multi-level Capsule Network and Dual
Vision Transformer based Attention Mechanism-Dual-Atten
proposed framework, we conduct ablation tests. For that, we
have utilized the SWLD-20K, Cresci-2017 and Cresci-2015
datasets to assess and understand the influence of every layer
and component of our proposed Dual-3DM3-AD model. The
introduction of a multi-modal fusion-based approach is
promising and indicates an effort to address the complex nature
of AD diagnosis. Combining MRI and PET scans is a sound
approach. The use of sophisticated techniques, such as QNLM,
the Morphology function, and BDM, for image preprocessing is a
positive aspect. These techniques can significantly enhance
image quality, which is crucial for accurate diagnosis. The
adoption of the Mixed-transformer with Furthered U-Net for
semantic segmentation is a good choice, as it helps in
identifying and isolating relevant regions within the images,
which is critical for extracting meaningful features. The
incorporation of a multi-scale feature extraction module and
the DCFAM demonstrates a commitment to leveraging insights from
both scans effectively. The use of a multi-head attention
mechanism for feature dimensionality reduction is a suitable
choice, as it can help manage the complexity of the data and
concentrate on the desired features. The application of a
softmax layer for multi-class Alzheimer’s diagnosis is
important for classifying the disease into different stages.
This is a valuable contribution, as it provides clinicians with
more detailed information.

To demonstrate its effectiveness, the proposed model has
been compared to existing methods and benchmarked against
them to establish its superiority. In conclusion, while the
proposed work appears promising and comprehensive, its true
effectiveness can only be determined through rigorous testing
and validation on real-world data, and consideration of its
practicality and ethical implications. In this experiment, the
ADNI and radiopharmaceutical 18F-FDG dataset is distributed
into training, validation and testing as 90%, 10%, and 15%,
respectively. This is because we adopted a large-scale dataset,
where 10% of the data is adequate for the test set or the
validation set. Besides, using a large amount of data for
training allows the deep neural network to be trained
sufficiently.

We also compare the proposed multi-modal approach with the
single-modal approach in terms of accuracy, specificity,
sensitivity, and F-measure. For the multi-modal approach, the
results we achieved are depicted in Figs. 6-9, whereas for the
single-modal MRI and PET scenarios, the results obtained with
MRI are higher than those with PET. Table V lists the symbols
used.

V. DISCUSSION

The effectiveness of the proposed Dual-3DM3-AD model
for Alzheimer’s diagnosis was rigorously evaluated, and the
results demonstrated its potential for accurate and early detec-
tion of the disease using both MRI and PET image scans.
In the initial stages of the study, extensive preprocessing
techniques, including noise reduction, skull stripping,
and 3D image conversion, were applied using state-of-the-
art algorithms such as the QNLM, Morphology function,
and BDM. These steps significantly enhanced the quality of
the input images, ensuring that the subsequent analysis was
performed on clean and accurate data. The model architecture



TABLE V
SYMBOL DEFINITION

itself was designed for optimal performance. The integration
of a Mixed-transformer with Furthered U-Net for semantic
segmentation effectively minimized complexity, allowing for
the extraction of meaningful features from both MRI and
PET scans. The multi-scale feature extraction module played
a crucial role in capturing relevant information from the
segmented images. The model further benefited from the
DCFAM, which efficiently aggregated the extracted features,
enabling the utilization of both modalities. The multi-head
attention mechanism was employed for feature dimension-
ality reduction, enhancing the model’s ability to distinguish
key patterns associated with Alzheimer’s disease. Our model
addresses both underfitting and overfitting issues as follows:

Complexity Reduction With Mixed-Transformer and Furthered
U-Net: The use of a Mixed-transformer and Furthered
U-Net suggests an effort to create a model with increased rep-
resentational capacity. This can help capture complex patterns
in the data. By combining different transformer architectures
and enhancing the U-Net, the model may be better equipped
to handle intricate relationships within the images.

Dual-3DM3-AD Model: The Dual-3DM3-AD model is
described as having a multi-scale feature extraction module.
Multi-scale features can capture information at different levels
of granularity, which may assist in handling both finer details
and more global context in the images.

Feature Aggregation With Densely Connected Feature
Aggregator Module (DCFAM): The DCFAM module is men-
tioned as a feature aggregator. Aggregating features from
different scales or sources can help in capturing a compre-
hensive representation of the input data. Densely connected
architectures often encourage feature reuse, which can be
beneficial for learning informative representations.

Multi-Head Attention Mechanism for Dimensionality Reduc-
tion: The use of a multi-head attention mechanism is stated
for feature dimensionality reduction. Attention mechanisms
allow the model to focus on relevant parts of the input.
In this context, reducing dimensionality may aid in preventing
overfitting by promoting more efficient use of information.

Softmax Layer for Multi-Class Alzheimer’s Diagnosis: The
application of a softmax layer for multi-class Alzheimer’s
diagnosis indicates the usage of a common activation function

for classification tasks. This is crucial for preventing underfit-
ting or overfitting in the final classification layer.

VI. CHALLENGES AND LIMITATIONS OF PROPOSED
WORK

The proposed Dual-3DM3-AD model for Alzheimer’s diagnosis
presents several limitations for its practical implementation
in real clinical environments. Firstly, the model’s
reliance on high-quality and diverse MRI and PET datasets
may pose challenges in real-world settings, where data
availability can be limited. Additionally, the computational
demands of the model, including preprocessing and complex
neural network architectures, may strain the resources of
healthcare facilities. The lack of model interpretability hinders
the understanding of how diagnoses are arrived at, potentially
impacting trust among healthcare professionals. Variations in
imaging standards and equipment in clinical settings must be
addressed for the model to perform consistently.

VII. CONCLUSION AND FUTURE WORK

Insufficient consideration of training/testing data and
ineffective segmentation are among the major reasons for low
Alzheimer’s diagnosis accuracy, which remains a crucial
concern. To alleviate these issues, we presented a promising
avenue for a more comprehensive understanding of AD staging.
This paper introduced an innovative approach to address this
challenge. We proposed the Dual-3DM3-AD model, designed for
accurate and early Alzheimer’s diagnosis, by leveraging both
MRI and PET image scans. Our methodology involved a series of
preprocessing steps, including noise reduction, skull
stripping, and 3D image conversion, performed using the QNLM,
Morphology function, and BDM, respectively, to enhance the
image quality.

Subsequently, we employed a Mixed-transformer with
Furthered U-Net architecture for semantic segmentation,
effectively reducing complexity. The Dual-3DM3-AD model
incorporated a multi-scale feature extraction module to extract
pertinent features from the segmented images. These extracted
features were then aggregated using the densely connected
feature aggregator module to make the most of both information
sources. Furthermore, we employed a multi-head attention
mechanism to reduce feature dimensionality, followed by the
application of a softmax layer for multi-class Alzheimer’s
diagnosis. Our proposed Dual-3DM3-AD model was implemented in
MATLAB 2020A and rigorously compared with several baseline
approaches by using a range of performance metrics, including
accuracy, sensitivity, specificity, F-measure, and ROC curve
analysis. Remarkably, our work surpassed existing models in
multi-class Alzheimer’s diagnosis, underscoring its potential
as a valuable tool in the early detection of this debilitating
disease. In future work, we plan to propose an Explainable
Artificial Intelligence (EAI) approach with a computation
reduction technique to better explain the classification
results, with the aim of further reducing computational
complexity and including a feedback system.

Funding Statement: This research is supported by
the Academy of Finland under project no. WP3-Profi6
(2708102611).



ACKNOWLEDGMENT

The authors would like to thank their affiliated universities
for supporting this research.

REFERENCES

[1] Z. Wang, J. Song, Y. Wang, and W. Liu, “Alzheimer’s disease
classification detection based on brain electrical signal graph
structure,” in Proc. 3rd Int. Conf. Frontiers Electron., Inf. Comput.
Technol. (ICFEICT), May 2023, pp. 294–300.

[2] K. N. McFarland and P. Chakrabarty, “Microglia in Alzheimer’s disease:
A key player in the transition between homeostasis and pathogenesis,”
Neurotherapeutics, vol. 19, no. 1, pp. 186–208, Jan. 2022.

[3] R. Lathe and D. S. Clair, “Programmed ageing: Decline of stem
cell renewal, immunosenescence, and Alzheimer’s disease,” Biol. Rev.,
vol. 98, no. 4, pp. 1424–1458, Aug. 2023.

[4] P. Gruener, “Alzheimer’s disease in American fiction,” in Beyond the
Great Forgetting, J. B. Metzler, Ed. Berlin, Germany: Springer, 2022,
doi: 10.1007/978-3-662-66029-4_5.

[5] G. Plascencia-Villa and G. Perry, “Status and future directions of clinical
trials in Alzheimer’s disease,” Int. Rev. Neurobiol., vol. 154, pp. 3–50,
Jul. 2020.

[6] Y. Zhang, H. Chen, R. Li, K. Sterling, and W. Song, “Amyloid β-based
therapy for Alzheimer’s disease: Challenges, successes and future,”
Signal Transduction Targeted Therapy, vol. 8, no. 1, p. 248, Jun. 2023.

[7] M. Mather, “Noradrenaline in the aging brain: Promoting cognitive
reserve or accelerating Alzheimer’s disease?” Seminars Cell Develop.
Biol., vol. 116, pp. 108–124, Aug. 2021.

[8] M. F. Ahmad, S. Akbar, S. A. E. Hassan, A. Rehman, and N. Ayesha,
“Deep learning approach to diagnose Alzheimer’s disease through
magnetic resonance images,” in Proc. Int. Conf. Innov. Comput. (ICIC),
Nov. 2021, pp. 1–6.

[9] M. B. T. Noor, N. Z. Zenia, M. S. Kaiser, S. A. Mamun, and
M. Mahmud, “Application of deep learning in detecting neurological
disorders from magnetic resonance images: A survey on the detection
of Alzheimer’s disease, Parkinson’s disease and schizophrenia,” Brain
Informat., vol. 7, no. 1, pp. 1–21, Dec. 2020.

[10] S. Iqbal, A. N. Qureshi, J. Li, and T. Mahmood, “On the analyses
of medical images using traditional machine learning techniques and
convolutional neural networks,” Arch. Comput. Methods Eng., vol. 30,
no. 5, pp. 3173–3233, Jun. 2023.

[11] E. Guedj et al., “EANM procedure guidelines for brain PET imaging
using [18F]FDG, version 3,” Eur. J. Nucl. Med. Mol. Imag., vol. 49,
no. 2, pp. 632–651, Jan. 2022.

[12] B. R. Price, L. A. Johnson, and C. M. Norris, “Reactive astrocytes: The
Nexus of pathological and clinical hallmarks of Alzheimer’s disease,”
Ageing Res. Rev., vol. 68, Jul. 2021, Art. no. 101335.

[13] J. Hong et al., “Image-level trajectory inference of tau pathology using
variational autoencoder for flortaucipir PET,” Eur. J. Nucl. Med. Mol.
Imag., vol. 49, no. 9, pp. 3061–3072, Jul. 2022.

[14] M. Solnik et al., “Imaging of uveal melanoma—Current standard and
methods in development,” Cancers, vol. 14, no. 13, p. 3147, Jun. 2022.

[15] H. Pleş et al., “Migraine: Advances in the pathogenesis and treatment,”
Neurol. Int., vol. 15, no. 3, pp. 1052–1105, Aug. 2023.

[16] V. B. Gupta et al., “Retinal changes in Alzheimer’s disease—Integrated
prospects of imaging, functional and molecular advances,” Prog. Retinal
Eye Res., vol. 82, May 2021, Art. no. 100899.

[17] S. Hashimoto et al., “Neuronal glutathione loss leads to neurodegener-
ation involving gasdermin activation,” Sci. Rep., vol. 13, no. 1, pp. 1–9,
Jan. 2023.

[18] B. J. Matchett, L. T. Grinberg, P. Theofilas, and M. E. Murray, “The
mechanistic link between selective vulnerability of the locus coeruleus
and neurodegeneration in Alzheimer’s disease,” Acta Neuropathologica,
vol. 141, no. 5, pp. 631–650, May 2021.

[19] Y. Blinkouskaya and J. Weickenmeier, “Brain shape changes associated
with cerebral atrophy in healthy aging and Alzheimer’s disease,”
Frontiers Mech. Eng., vol. 7, pp. 1–17, Jul. 2021.

[20] V. Sathiyamoorthi, A. K. Ilavarasi, K. Murugeswari, S. T. Ahmed,
B. A. Devi, and M. Kalipindi, “A deep convolutional neural network
based computer aided diagnosis system for the prediction of
Alzheimer’s disease in MRI images,” Measurement, vol. 171, Feb. 2021,
Art. no. 108838.

[21] J. E. W. Koh et al., “Automated detection of Alzheimer’s disease using
bi-directional empirical model decomposition,” Pattern Recognit. Lett.,
vol. 135, pp. 106–113, Jul. 2020.

[22] H. S. Zaina, S. B. Belhaouari, T. Stanko, and V. Gorovoy, “An
exemplar pyramid feature extraction based Alzheimer disease
classification method,” IEEE Access, vol. 10, pp. 66511–66521, 2022.

[23] S. Basheera and M. S. S. Ram, “A novel CNN based Alzheimer’s
disease classification using hybrid enhanced ICA segmented gray
matter of MRI,” Computerized Med. Imag. Graph., vol. 81, Apr. 2020,
Art. no. 101713.

[24] S. Murugan et al., “DEMNET: A deep learning model for early diagnosis
of Alzheimer diseases and dementia from MR images,” IEEE Access,
vol. 9, pp. 90319–90329, 2021.

[25] M. Fabietti et al., “Early detection of Alzheimer’s disease from cortical
and hippocampal local field potentials using an ensembled machine
learning model,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 31,
pp. 2839–2848, 2023.

[26] S. Dwivedi, T. Goel, M. Tanveer, R. Murugan, and R. Sharma,
“Multimodal fusion-based deep learning network for effective diagnosis
of Alzheimer’s disease,” IEEE Multimedia Mag., vol. 29, no. 2,
pp. 45–55, Apr. 2022.

[27] W. Yu, B. Lei, M. K. Ng, A. C. Cheung, Y. Shen, and S. Wang,
“Tensorizing GAN with high-order pooling for Alzheimer’s disease
assessment,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 9,
pp. 4945–4959, Sep. 2022.

[28] M. Song, H. Jung, S. Lee, D. Kim, and M. Ahn, “Diagnostic
classification and biomarker identification of Alzheimer’s disease
with random forest algorithm,” Brain Sci., vol. 11, no. 4, p. 453,
Apr. 2021.

[29] E. E. Bron et al., “Cross-cohort generalizability of deep and conventional
machine learning for MRI-based diagnosis and prediction of Alzheimer’s
disease,” NeuroImage: Clin., vol. 31, 2021, Art. no. 102712.

[30] K. Etminani et al., “A 3D deep learning model to predict the diagnosis
of dementia with lewy bodies, Alzheimer’s disease, and mild cognitive
impairment using brain 18F-FDG PET,” Eur. J. Nucl. Med. Mol. Imag.,
vol. 49, no. 2, pp. 563–584, Jan. 2022.

[31] C. S. Martinez, M. B. Cuadra, and J. Jorge, “BigBrain-MR: A new
digital phantom with anatomically-realistic magnetic resonance
properties at 100-µm resolution for magnetic resonance methods
development,” NeuroImage, vol. 273, Jun. 2023, Art. no. 120074.

[32] H. Kalantar-Hormozi et al., “A cross-sectional and longitudinal study of
human brain development: The integration of cortical thickness, surface
area, gyrification index, and cortical curvature into a unified analytical
framework,” NeuroImage, vol. 268, Mar. 2023, Art. no. 119885.

[33] A. Irimia, “Cross-sectional volumes and trajectories of the human
brain, gray matter, white matter and cerebrospinal fluid in 9473
typically aging adults,” Neuroinformatics, vol. 19, no. 2,
pp. 347–366, Apr. 2021.

[34] N. Gharaibeh, A. A. Abu-Ein, O. M. Al-hazaimeh, K. M. O. Nahar,
W. A. Abu-Ain, and M. M. Al-Nawashi, “Swin transformer-based
segmentation and multi-scale feature pyramid fusion module for
Alzheimer’s disease with machine learning,” Int. J. Online Biomed. Eng.
(iJOE), vol. 19, no. 4, pp. 22–50, Apr. 2023.

[35] M. Liu et al., “A multi-model deep convolutional neural network for
automatic hippocampus segmentation and classification in Alzheimer’s
disease,” NeuroImage, vol. 208, Mar. 2020, Art. no. 116459.

[36] C. L. Saratxaga et al., “MRI deep learning-based solution for
Alzheimer’s disease prediction,” J. Personalized Med., vol. 11, no. 9,
p. 902, 2021.

[37] R. A. Hazarika, A. K. Maji, S. N. Sur, B. S. Paul, and D. Kandar,
“A survey on classification algorithms of brain images in Alzheimer’s
disease based on feature extraction techniques,” IEEE Access, vol. 9,
pp. 58503–58536, 2021.

[38] T. Wang and L. Cao, “Deep learning based diagnosis of Alzheimer’s
disease using structural magnetic resonance imaging: A survey,” in Proc.
3rd Int. Conf. Appl. Mach. Learn. (ICAML), Jul. 2021, pp. 408–412.

[39] J. Neelaveni and M. S. G. Devasana, “Alzheimer disease prediction
using machine learning algorithms,” in Proc. 6th Int. Conf. Adv. Comput.
Commun. Syst. (ICACCS), Mar. 2020, pp. 101–104.

[40] A. Puente-Castro, E. Fernandez-Blanco, A. Pazos, and C. R. Munteanu,
“Automatic assessment of Alzheimer’s disease diagnosis based on
deep learning techniques,” Comput. Biol. Med., vol. 120, May 2020,
Art. no. 103764.
