VSMI2‐PANet : Versatile Scale-Malleable Image Integration and Patch Wise Attention Network With Transformer for Lung Tumour Segmentation Using Multi-Modal Imaging Techniques

© 2025 The Author(s). CAAI Transactions on Intelligence Technology published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology and Chongqing University of Technology. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Lung cancer (LC) is a major cancer which accounts for higher mortality rates worldwide. Doctors utilise many imaging modalities for identifying lung tumours and their severity in earlier stages. Nowadays, machine learning (ML) and deep learning (DL) methodologies are utilised for the robust detection and prediction of lung tumours. Recently, multi modal imaging emerged as a robust technique for lung tumour detection by combining various imaging features. To cope with that, we propose a novel multi modal imaging technique named versatile scale malleable image integration and patch wise attention network (VSMI2 - PANet) which adopts three imaging modalities named computed tomography (CT), magnetic resonance imaging (MRI) and single photon emission computed tomography (SPECT). The designed model accepts input from CT and MRI images and passes it to the VSMI2 module that is composed of three sub-modules named image cropping module, scale malleable convolution layer (SMCL) and PANet module. CT and MRI images are subjected to image cropping module in a parallel manner to crop the meaningful image patches and provide them to the SMCL module. The SMCL module is composed of adaptive convolutional layers that investigate those patches in a parallel manner by preserving the spatial information. The output from the SMCL is then fused and provided to the PANet module. The PANet module examines the fused patches by analysing its height, width and channels of the image patch. As a result, it provides an output as high-resolution spatial attention maps indicating the location of suspicious tumours. The high-resolution spatial attention maps are then provided as an input to the backbone module which uses light wave transformer (LWT) for segmenting the lung tumours into three classes, such as normal, benign and malignant. In addition, the LWT also accepts SPECT image as input for capturing the variations precisely to segment the lung tumours. The performance of the proposed model is validated using several performance metrics, such as accuracy, precision, recall, F1-score and AUC curve, and the results show that the proposed work outperforms the existing approaches.

URI

https://doi.org/10.1049/cit2.70039

ISSN

2468-2322
2468-6557

Kausijulkaisu

CAAI Transactions on Intelligence Technology

OKM-julkaisutyyppi

A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä

deep learning computational intelligence computer vision data fusion feature extraction image segmentation

Tietueen kaikki tiedot

VSMI2‐PANet : Versatile Scale-Malleable Image Integration and Patch Wise Attention Network With Transformer for Lung Tumour Segmentation Using Multi-Modal Imaging Techniques

Toimittaja(t)

Pysyvä osoite

Kuvaus

URI

Emojulkaisu

ISBN

ISSN

Aihealue

Kausijulkaisu

OKM-julkaisutyyppi

VSMI2‐PANet : Versatile Scale-Malleable Image Integration and Patch Wise Attention Network With Transformer for Lung Tumour Segmentation Using Multi-Modal Imaging Techniques

Toimittaja(t)

Pysyvä osoite

Kuvaus

URI

Emojulkaisu

ISBN

ISSN

Aihealue

Kausijulkaisu

OKM-julkaisutyyppi

Avainsanat