Detection of immune-related adverse events among hospitalized patients using large language models.

Authors

Virginia H. Sun

Harvard Medical School, Boston, MA

Virginia H. Sun, Julius C. Heemelaar, Ibrahim Hadzic, Vineet K. Raghu, Chia-Yun Wu, Leyre Zubiri, Giselle Alexandra Suero-Abreu, Azin Ghamari, Jessica Wu, Alexandra-Chloé Villani, Jor Sam Ho, Megan J. Mooradian, Meghan E. Sise, Daniel A. Zlotoff, Steven Michael Blum, Michael L. Dougan, Ryan J. Sullivan, Tomas G. Neilan, Kerry Lynn Reynolds, Molly Fisher Thomas

Organizations

Harvard Medical School, Boston, MA; Massachusetts General Hospital, Boston, MA; Brigham and Women's Hospital, Boston, MA; Massachusetts General Hospital Cancer Center, Boston, MA; Oregon Health and Science University, Portland, OR

Research Funding

No funding sources reported

Background: Immune checkpoint inhibitor (ICI)-induced colitis, hepatitis, and pneumonitis are common immune-related adverse events (irAEs); however, the true incidence of these irAEs remains incompletely understood. Chart review is the gold standard for their detection but is time-consuming and cannot be implemented in large cohorts. ICD codes are limited in sensitivity and specificity. Large language models (LLMs) offer a scalable way to answer queries over human-generated text, but no data exist on their use for identifying irAEs. We therefore investigated the application of an LLM to identify ICI colitis, hepatitis, and pneumonitis among hospitalized patients, comparing its performance to manual chart review and ICD codes.

Methods: Hospital admissions of patients on ICI therapy from February 5th, 2011, to November 3rd, 2021, were manually reviewed by a multidisciplinary immunotoxicity team using established published definitions for the presence of ICI colitis, hepatitis, and pneumonitis. Standard ICD codes and an LLM pipeline with retrieval-augmented generation (RAG) were used to detect irAEs. Performance was measured via sensitivity, specificity, and model runtime. The LLM was validated with a second dataset of inpatients with ICI colitis, hepatitis, and pneumonitis admitted from November 4th, 2021, to September 5th, 2023.

Results: Among 5,677 hospitalized patients on ICI therapy in the initial cohort, 132 cases were adjudicated as ICI colitis, 57 as ICI hepatitis, and 47 as ICI pneumonitis. The LLM was more sensitive than ICD codes in detecting all three irAEs (mean sensitivity 94.2% vs. 71.8%); the difference was statistically significant for ICI hepatitis (p<0.001) and pneumonitis (p=0.006). Specificities were similar (92.5% vs. 91.1%, Table 1). The LLM approach was also efficient, spending an average of 9.42 s per chart, compared to an estimated 15 minutes per chart for individual chart review. On the validation dataset, the mean sensitivity and specificity of the LLM for adjudicated ICI colitis (n=20), hepatitis (n=24), and pneumonitis (n=6) were 96.9% and 93.2%, respectively.

Conclusions: LLMs serve as a useful tool for the detection of ICI colitis, hepatitis, and pneumonitis, significantly outperforming ICD codes in accuracy and manual chart review in efficiency.
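The performance measures named in the Methods can be illustrated with a minimal Python sketch. The adjudicated case counts and per-chart times come from the abstract. The true-positive counts are an assumption: the abstract does not report them, and the integers below were back-calculated as the values that reproduce the published sensitivities. The function and variable names are illustrative only and are not part of the authors' pipeline.

```python
# Adjudicated case counts reported in the Results (initial cohort).
ADJUDICATED_CASES = {"colitis": 132, "hepatitis": 57, "pneumonitis": 47}

# ASSUMPTION: true-positive counts are NOT reported in the abstract. These
# integers are back-calculated so that they reproduce the published
# sensitivities after rounding to one decimal place.
LLM_TRUE_POSITIVES = {"colitis": 121, "hepatitis": 53, "pneumonitis": 46}
ICD_TRUE_POSITIVES = {"colitis": 119, "hepatitis": 29, "pneumonitis": 35}


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Percentage of adjudicated cases that the method flagged."""
    return 100.0 * true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Percentage of non-cases that the method correctly left unflagged."""
    return 100.0 * true_neg / (true_neg + false_pos)


def sensitivities(true_positives: dict) -> dict:
    """Per-irAE sensitivity in percent, rounded to one decimal place."""
    return {
        irae: round(sensitivity(tp, ADJUDICATED_CASES[irae] - tp), 1)
        for irae, tp in true_positives.items()
    }


llm_sensitivity = sensitivities(LLM_TRUE_POSITIVES)
icd_sensitivity = sensitivities(ICD_TRUE_POSITIVES)

# Per-chart time: 9.42 s for the LLM versus an estimated 15 minutes manually.
LLM_SECONDS_PER_CHART = 9.42
MANUAL_SECONDS_PER_CHART = 15 * 60
speedup = MANUAL_SECONDS_PER_CHART / LLM_SECONDS_PER_CHART

print("LLM sensitivity (%):", llm_sensitivity)
print("ICD sensitivity (%):", icd_sensitivity)
print(f"Manual review takes roughly {speedup:.0f}x longer per chart")
```

Specificity cannot be back-calculated the same way, because the abstract does not state how many admissions were adjudicated as negative for each irAE; `specificity` is included only to show the definition.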

Table 1. Comparison of ICD codes and a large language model (LLM) in detecting irAEs among hospitalized patients from February 5th, 2011, to November 3rd, 2021. All values are percentages.

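| irAE | ICD Sensitivity (%) | ICD Specificity (%) | LLM Sensitivity (%) | LLM Specificity (%) |
|---|---|---|---|---|
| Colitis | 90.2 | 89.2 | 91.7 | 90.0 |
| Hepatitis | 50.9 | 95.2 | 93.0 | 93.0 |
| Pneumonitis | 74.5 | 88.8 | 97.9 | 94.6 |
| Average (SD) | 71.8 (19.8) | 91.1 (3.6) | 94.2 (3.3) | 92.5 (2.3) |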
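The Average (SD) row can be checked against the three per-irAE rows with a short Python sketch. It assumes the SD is the sample standard deviation (n - 1 denominator) across the three irAEs; the abstract does not say so, but that choice reproduces the reported SDs. Because the inputs here are the rounded table entries, the ICD sensitivity mean comes out at about 71.87 rather than the reported 71.8, which would be consistent with the authors averaging unrounded values (also an assumption).

```python
from statistics import mean, stdev

# Per-irAE values (colitis, hepatitis, pneumonitis) as printed in the table.
TABLE_1 = {
    "ICD sensitivity": [90.2, 50.9, 74.5],
    "ICD specificity": [89.2, 95.2, 88.8],
    "LLM sensitivity": [91.7, 93.0, 97.9],
    "LLM specificity": [90.0, 93.0, 94.6],
}

# Average (SD) row as printed in the table.
REPORTED = {
    "ICD sensitivity": (71.8, 19.8),
    "ICD specificity": (91.1, 3.6),
    "LLM sensitivity": (94.2, 3.3),
    "LLM specificity": (92.5, 2.3),
}


def summarize(values):
    """Return (mean, sample standard deviation) of the per-irAE values."""
    # ASSUMPTION: sample SD (n - 1), not population SD.
    return mean(values), stdev(values)


summary = {name: summarize(values) for name, values in TABLE_1.items()}

for name, (avg, sd) in summary.items():
    reported_avg, reported_sd = REPORTED[name]
    print(
        f"{name}: computed {avg:.2f} ({sd:.2f}), "
        f"reported {reported_avg} ({reported_sd})"
    )
```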

Disclaimer

This material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org

Abstract Details

Meeting

2024 ASCO Annual Meeting

Session Type

Poster Session

Session Title

Developmental Therapeutics—Immunotherapy

Track

Developmental Therapeutics—Immunotherapy

Sub Track

Other Checkpoint Inhibitors (Non-PD1/PDL1, Monotherapy, or Combination)

Citation

J Clin Oncol 42, 2024 (suppl 16; abstr 2638)

DOI

10.1200/JCO.2024.42.16_suppl.2638

Abstract #

2638

Poster Bd #

117
