Memorial Sloan Kettering Cancer Center, New York, NY
Tony Hung, Gilad Kuperman, Eric Jeffrey Sherman, Alan Loh Ho, Winston Wong, Anuja Kriplani, Lara Dunn, James Vincent Fetten, Loren S. Michel, Shrujal S. Baxi, Chunhua Weng, David G. Pfister, Jun J. Mao
Background: Chatbots based on large language models (LLMs) have demonstrated the ability to answer oncology examination questions; however, LLMs used for medical decision support have not yet demonstrated performance suitable for oncology practice. We evaluated the performance of a trained LLM, GPT-4, in recommending appropriate clinical trials for a head and neck (HN) cancer population.

Methods: In 2022, we developed an artificial intelligence-powered clinical trial management mobile app, LookUpTrials, which demonstrated promising user engagement among oncologists. Using the LookUpTrials database, we applied direct preference optimization to train GPT-4 as an in-app assistant for LookUpTrials. From November 7 to December 19, 2023, we collected consecutive new patient cases and their respective clinical trial recommendations from oncologists in the HN medical oncology service at Memorial Sloan Kettering Cancer Center. Cases were categorized by diagnosis, cancer stage, treatment setting, and physician recommendation on clinical trials. The trained GPT-4 was prompted using a semi-structured template: “Given patient with a <diagnosis>, <cancer stage>, <treatment setting>, what are possible clinical trials?” Physician recommendations were compared with the trained GPT-4 responses. We analyzed the performance of GPT-4 by its response precision (positive predictive value), recall (sensitivity), and F1 score (harmonic mean of precision and recall).

Results: We analyzed 178 patient cases; mean age was 65.6 years (SD 13.9), and patients were primarily male (75%) with local/locally advanced (68%) HN (61%), thyroid (16%), skin (9%), or salivary (8%) cancers. The majority were treated in the definitive setting with combined-modality therapy (42%), and a modest proportion were treated on clinical trials (10%). Overall, the trained GPT-4 achieved moderate performance in matching physician clinical trial recommendations, with 63% precision and 100% recall (F1 score 0.77), narrowing a total list of 56 HN clinical trials to a range of 0-4 relevant trials per patient case (mean 1, SD 1.2). Comparatively, the performance of our trained GPT-4 exceeded the historical performance of untrained LLMs in providing oncology treatment recommendations by 4- to 20-fold (F1 score 0.04-0.19).

Conclusions: This proof-of-concept study demonstrated that a trained LLM can achieve moderate performance in matching physician clinical trial recommendations in HN oncology. Our results suggest the potential of embedding trained LLMs into the oncology workflow to aid clinical trial search and accelerate clinical trial accrual. Future research is needed to optimize the precision of the trained LLM and to assess whether trained LLMs may offer a scalable solution to enhance the diversity and equity of clinical trial participation.
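The abstract itself contains no code; the following is a minimal, hypothetical Python sketch of the two mechanics described in Methods: building the semi-structured prompt from a categorized case, and scoring per-case agreement between model-suggested and physician-recommended trials as precision, recall, and F1. All class, function, and trial names here are illustrative assumptions, not the study's actual implementation.

```python
# Hypothetical sketch: prompt construction and per-case precision/recall/F1
# scoring for trial recommendations. Names are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Case:
    diagnosis: str
    stage: str
    setting: str

def build_prompt(case: Case) -> str:
    # Mirrors the semi-structured template described in Methods.
    return (f"Given patient with a {case.diagnosis}, {case.stage}, "
            f"{case.setting}, what are possible clinical trials?")

def precision_recall_f1(suggested: set[str], recommended: set[str]):
    """Agreement between model-suggested and physician-recommended trials."""
    true_pos = len(suggested & recommended)
    precision = true_pos / len(suggested) if suggested else 0.0
    recall = true_pos / len(recommended) if recommended else 1.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: the model suggests two trials; the physician recommended one of them.
p, r, f1 = precision_recall_f1({"NCT-A", "NCT-B"}, {"NCT-A"})
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")  # precision=0.50 recall=1.00 F1=0.67
```

The F1 score is used because it balances the two failure modes relevant here: suggesting irrelevant trials (low precision) and missing trials the physician would have recommended (low recall).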