Detection of immune-related adverse events among hospitalized patients using large language models.

Authors

Virginia H. Sun

Harvard Medical School, Boston, MA

Virginia H. Sun, Julius C. Heemelaar, Ibrahim Hadzic, Vineet K. Raghu, Chia-Yun Wu, Leyre Zubiri, Giselle Alexandra Suero-Abreu, Azin Ghamari, Jessica Wu, Alexandra-Chloé Villani, Jor Sam Ho, Megan J. Mooradian, Meghan E. Sise, Daniel A. Zlotoff, Steven Michael Blum, Michael L. Dougan, Ryan J. Sullivan, Tomas G. Neilan, Kerry Lynn Reynolds, Molly Fisher Thomas

Organizations

Harvard Medical School, Boston, MA; Massachusetts General Hospital, Boston, MA; Brigham and Women's Hospital, Boston, MA; Massachusetts General Hospital Cancer Center, Boston, MA; Oregon Health and Science University, Portland, OR

Research Funding

No funding sources reported

Background: Immune checkpoint inhibitor (ICI)-induced colitis, hepatitis, and pneumonitis are common immune-related adverse events (irAEs); however, the true incidence of these irAEs remains incompletely understood. Chart review is the gold standard for their detection but is time-consuming and cannot be implemented in large cohorts. ICD codes are limited in sensitivity and specificity. Large language models (LLMs) are a scalable method of answering queries from human-generated text, though there are no data on the use of LLMs for the identification of irAEs. Therefore, we investigated the application of an LLM to identify ICI colitis, hepatitis, and pneumonitis among hospitalized patients, comparing its performance to manual chart review and ICD codes.

Methods: Hospital admissions of patients on ICI therapy from February 5th, 2011, to November 3rd, 2021, were manually reviewed by a multidisciplinary immunotoxicity team using established published definitions for the presence of ICI colitis, hepatitis, and pneumonitis. Standard ICD codes and an LLM pipeline with retrieval-augmented generation (RAG) were used to detect irAEs. Performance was measured via sensitivity, specificity, and model runtime. The LLM was validated with a second dataset of inpatients with ICI colitis, hepatitis, and pneumonitis admitted from November 4th, 2021, to September 5th, 2023.

Results: Among 5,677 hospitalized patients on ICI therapy in the initial cohort, there were 132 cases adjudicated with ICI colitis, 57 with ICI hepatitis, and 47 with ICI pneumonitis. The LLM was more sensitive than ICD codes in detecting all three irAEs (mean sensitivity 94.2% vs. 71.8%), achieving significance for ICI hepatitis (p<0.001) and pneumonitis (p=0.006), while having similar specificity (mean 92.5% vs. 91.1%, Table 1). The LLM approach was also efficient, spending an average of 9.42 s per chart, compared to an estimated 15 minutes per chart for individual chart review. The mean sensitivity and specificity of the LLM on the validation dataset for adjudicated ICI colitis (n=20), hepatitis (n=24), and pneumonitis (n=6) were 96.9% and 93.2%, respectively.

Conclusions: LLMs serve as a useful tool for the detection of ICI colitis, hepatitis, and pneumonitis, significantly outperforming ICD codes in accuracy and manual chart review in efficiency.
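The two accuracy measures named in Methods can be illustrated with a minimal sketch. It assumes that each admission carries a binary reference label from chart review and a binary flag from the method under test (ICD codes or the LLM). The function name and the toy labels below are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `predicted` against `reference`.

    `reference` stands in for the chart-review adjudication (gold standard);
    `predicted` stands in for the ICD-code or LLM flag. Both are hypothetical
    inputs used only for illustration.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # true positives
    fn = sum(1 for r, p in pairs if r and not p)      # false negatives
    tn = sum(1 for r, p in pairs if not r and not p)  # true negatives
    fp = sum(1 for r, p in pairs if not r and p)      # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")

    sensitivity = tp / (tp + fn)  # share of true cases that were flagged
    specificity = tn / (tn + fp)  # share of non-cases that were not flagged
    return sensitivity, specificity


# Hypothetical toy data: 4 adjudicated cases and 6 non-cases.
chart_review = [True, True, True, True, False, False, False, False, False, False]
llm_flag = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(chart_review, llm_flag)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this toy example one of the four cases is missed and one of the six non-cases is flagged, so sensitivity is 3/4 and specificity is 5/6.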

Comparison of ICD codes and large language model (LLM) in detecting irAEs among hospitalized patients from February 5th, 2011, to November 3rd, 2021.
irAE	ICD Sensitivity (%)	ICD Specificity (%)	LLM Sensitivity (%)	LLM Specificity (%)
Colitis	90.2	89.2	91.7	90.0
Hepatitis	50.9	95.2	93.0	93.0
Pneumonitis	74.5	88.8	97.9	94.6
Average (SD)	71.8 (19.8)	91.1 (3.6)	94.2 (3.3)	92.5 (2.3)
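The "Average (SD)" row can be recomputed from the three per-irAE values in each column. The sketch below assumes the SD is the sample standard deviation (n - 1 denominator), which the abstract does not state. The dictionary keys are illustrative names only.

```python
from statistics import mean, stdev

# Per-irAE percentages from the table, in the order colitis, hepatitis, pneumonitis.
table = {
    "ICD sensitivity": [90.2, 50.9, 74.5],
    "ICD specificity": [89.2, 95.2, 88.8],
    "LLM sensitivity": [91.7, 93.0, 97.9],
    "LLM specificity": [90.0, 93.0, 94.6],
}

# statistics.stdev uses the sample (n - 1) formula; this is an assumption
# about how the published SD was calculated.
summary = {name: (mean(values), stdev(values)) for name, values in table.items()}

for name, (avg, sd) in summary.items():
    print(f"{name}: {avg:.1f} ({sd:.1f})")
```

Under that assumption the recomputed values agree with the published row to within 0.1 percentage points. Any residual difference would be expected if the published averages were taken from unrounded per-irAE values.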

Abstract Details

Meeting

2024 ASCO Annual Meeting

Session Type

Poster Session

Session Title

Developmental Therapeutics—Immunotherapy

Track

Developmental Therapeutics—Immunotherapy

Sub Track

Other Checkpoint Inhibitors (Non-PD1/PDL1, Monotherapy, or Combination)

Citation

J Clin Oncol 42, 2024 (suppl 16; abstr 2638)

DOI

10.1200/JCO.2024.42.16_suppl.2638

Abstract #

2638

Poster Bd #

117
