Application of artificial intelligence for overall survival risk stratification in oropharyngeal carcinoma: A validation of ProgTOOL
Alabi, Rasheed Omobolaji; Sjöblom, Anni; Carpén, Timo; Elmusrati, Mohammed; Leivo, Ilmo; Almangush, Alhadi; Mäkitie, Antti A. (2023-04-23)
Elsevier
Permanent address of the publication:
https://urn.fi/URN:NBN:fi-fe2023042438325
Description
peer-reviewed
© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Abstract
Background
In recent years, there has been a surge in machine learning-based models for diagnosis and prognostication of outcomes in oncology. However, there are concerns relating to the reproducibility of these models and their generalizability to separate patient cohorts (i.e., external validation).
Objectives
This study primarily provides a validation of a recently introduced and publicly available machine learning (ML) web-based prognostic tool (ProgTOOL) for overall survival risk stratification in oropharyngeal squamous cell carcinoma (OPSCC). Additionally, we reviewed the published studies that have utilized ML for outcome prognostication in OPSCC to examine how many of these models were externally validated; the type of external validation, the characteristics of the external dataset, and the diagnostic performance on the internal validation (IV) and external validation (EV) datasets were extracted and compared.
Methods
We used a total of 163 OPSCC patients from the Helsinki University Hospital to externally validate ProgTOOL for generalizability. In addition, the PubMed, OvidMedline, Scopus, and Web of Science databases were systematically searched according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
Results
The ProgTOOL achieved a balanced accuracy of 86.5%, a Matthews correlation coefficient of 0.78, a net benefit of 0.7, and a Brier score of 0.06 for overall survival stratification of OPSCC patients as either low-chance or high-chance. In addition, of the 31 studies found to have used ML for prognostication of outcomes in OPSCC, only seven (22.6%) reported some form of EV. Three studies (42.9%) each used temporal EV or geographical EV, while only one study (14.3%) used expert validation as a form of EV. Most of the studies reported a reduction in performance when externally validated.
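The metrics reported above can be computed from a binary confusion matrix and predicted probabilities. The following sketch shows one common way to compute them in plain Python; the counts and probabilities in the example are purely illustrative, not the validation cohort's actual data, and the net-benefit threshold is an assumed value (decision-curve analysis evaluates a range of thresholds).

```python
import math

def balanced_accuracy(tp, fp, fn, tn):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient: +1 perfect, 0 random, -1 inverse."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at a given risk threshold."""
    return (tp - fp * threshold / (1 - threshold)) / n

# Illustrative confusion matrix (hypothetical counts, not study data)
tp, fp, fn, tn = 40, 5, 8, 110
n = tp + fp + fn + tn
print(balanced_accuracy(tp, fp, fn, tn))
print(matthews_corrcoef(tp, fp, fn, tn))
print(brier_score([0.9, 0.2, 0.1, 0.8], [1, 0, 0, 1]))
print(net_benefit(tp, fp, n, threshold=0.2))
```

Note that balanced accuracy, rather than raw accuracy, is the appropriate headline metric here because survival outcomes are typically imbalanced between the low-chance and high-chance groups.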
Conclusion
The performance of the model in this validation study indicates that it may generalize, bringing recommendation of the model for clinical evaluation closer to reality. However, the number of externally validated ML-based models for OPSCC remains relatively small. This significantly limits the transfer of these models to clinical evaluation and, subsequently, reduces the likelihood of their use in daily clinical practice. As a gold standard, we recommend geographical EV and validation studies to reveal biases and overfitting in these models. These recommendations are poised to facilitate the implementation of such models in clinical practice.
Collections
- Articles [2851]