Identification of muscle-invasion status in bladder cancer patients using natural language processing and machine learning.

Authors

Ruixin Yang

Durham Veterans Affairs Health Care System, Durham, NC

Ruixin Yang, Di Zhu, Lauren Howard, Amanda M. De Hoedt, Zachary William Abraham Klaassen, Stephen J. Freedland, Stephen B. Williams

Organizations

Durham Veterans Affairs Health Care System, Durham, NC; Veterans Affairs Health Care System, Durham, NC; Durham VA Medical Center/Duke University, Durham, NC; Section of Urology, Durham Veterans Affairs Health Care System, Durham, NC; Division of Urology, Medical College of Georgia at Augusta University, Georgia Cancer Center, Augusta, GA; Cedars-Sinai Medical Center, Los Angeles, CA and Durham VA Medical Center, Durham, NC; Department of Surgery, Division of Urology, The University of Texas Medical Branch, Galveston, TX

Research Funding

Other Government Agency

Background: Mortality from bladder cancer (BC) increases exponentially once it invades the muscle. At the population level, accurately identifying these patients is challenging. Methods: We developed and validated a natural language processing (NLP) model to automatically identify patients with muscle-invasive BC (MIBC) and thereby aid population-based BC research. All patients with a CPT code for transurethral resection of bladder tumor (TURBT) (N = 76,060) were selected from the Department of Veterans Affairs (VA) Corporate Data Warehouse database. A sample of 600 patients (with 2,337 full-text notes) who had TURBT and confirmed pathology results was selected for NLP model development (500 patients) and validation (100 patients). Each case was classified as muscle-invasive, non-muscle-invasive, unknown, or no cancer, and the classification was confirmed by detailed chart review of pathology notes. NLP performance was assessed by calculating the sensitivity, positive predictive value (PPV), and overall accuracy at the individual note and patient levels. Results: In the validation cohort, the NLP model had overall accuracy of 88% and 92% at the note and patient levels, respectively. Specifically, PPV and specificity for predicting muscle invasion at the note level were 83% and 70%, respectively. The model classified non-muscle-invasive BC (NMIBC) with 98% sensitivity at both the note and patient levels. Although the sensitivity for MIBC was 70% at the note level, it was 86% at the patient level. When the model was applied to 71,200 patients VA-wide, it classified 13,642 (19%) as having MIBC and 47,595 (66%) as having NMIBC. The NLP model was able to identify invasion status for 96% of TURBT patients at the population level. Inherent limitations include a relatively small training set given the size of the VA population. Conclusions: This NLP model for identifying muscle invasion at the population level had high accuracy. The NLP model may be a practical and accurate tool for efficiently identifying BC invasion status and may aid population-based BC research in the VA.
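
The performance measures named in the abstract (sensitivity, PPV, and overall accuracy at the note and patient levels) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how note-level predictions were combined into a patient-level label, so the "most severe label wins" rule below is an assumption. The function names, label strings, and toy data are likewise hypothetical, and the toy numbers are not study results.

```python
from collections import defaultdict

# Assumed severity order for collapsing a patient's notes into one label.
# The abstract does not describe the aggregation rule; this is illustrative only.
PRIORITY = ["MIBC", "NMIBC", "no cancer", "unknown"]


def binary_metrics(truth, pred, positive="MIBC"):
    """Sensitivity and PPV for one class, plus overall (all-class) accuracy."""
    pairs = list(zip(truth, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "ppv": tp / (tp + fp) if tp + fp else None,
        "accuracy": correct / len(pairs) if pairs else None,
    }


def patient_level(note_labels):
    """Collapse (patient_id, label) pairs to one label per patient (assumed rule)."""
    by_patient = defaultdict(list)
    for patient_id, label in note_labels:
        by_patient[patient_id].append(label)
    return {pid: min(labels, key=PRIORITY.index) for pid, labels in by_patient.items()}


# Hypothetical toy data: (patient_id, chart-review label, model label) per note.
notes = [
    ("p1", "MIBC", "NMIBC"),   # missed at the note level
    ("p1", "MIBC", "MIBC"),    # caught in another note from the same patient
    ("p2", "NMIBC", "NMIBC"),
    ("p3", "NMIBC", "MIBC"),   # false positive
    ("p4", "MIBC", "MIBC"),
]

note_metrics = binary_metrics([t for _, t, _ in notes], [p for _, _, p in notes])

truth_by_patient = patient_level((pid, t) for pid, t, _ in notes)
pred_by_patient = patient_level((pid, p) for pid, _, p in notes)
patient_ids = sorted(truth_by_patient)
patient_metrics = binary_metrics(
    [truth_by_patient[pid] for pid in patient_ids],
    [pred_by_patient[pid] for pid in patient_ids],
)

print("note level:", note_metrics)
print("patient level:", patient_metrics)
```

In this toy example, patient-level sensitivity for MIBC is higher than note-level sensitivity because a patient missed in one note is still caught in another. Under the assumed aggregation rule, that is one plausible reason for the direction of the difference reported in the abstract (70% at the note level versus 86% at the patient level).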

Disclaimer

This material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org

Abstract Details

Meeting

2022 ASCO Genitourinary Cancers Symposium

Session Type

Poster Session

Session Title

Poster Session B: Urothelial Carcinoma

Track

Urothelial Carcinoma

Sub Track

Diagnostics and Imaging

Citation

J Clin Oncol 40, 2022 (suppl 6; abstr 447)

DOI

10.1200/JCO.2022.40.6_suppl.447

Abstract #

447

Poster Bd #

Online Only
