IMO Health, Rosemont, IL
Kyeryoung Lee, Hunki Paek, Liang-Chin Huang, Surabhi Datta, Augustine Annan, Nneka Ofoegbu, Mitchell K Higashi, C. Beau Hilton, Sam Rubinstein, Andrew Cowan, Mary Kwok, Jeremy Lyle Warner, Hua Xu, Xiaoyan Wang
Background: Conference abstracts are pivotal sources for sharing initial clinical trial findings and influencing clinical decisions. Yet concerns persist regarding the impact of unpublished conference abstracts on ultimate conclusions and the potential for inconsistency in result reporting. We aim to assess the feasibility of large-scale analysis of the consistency between initial conference results and subsequent reporting in published articles, focusing on treatment efficacy and safety in oncology clinical trials, using a large language model (LLM) pipeline. Methods: We collected clinical trial abstracts (2012-2023) from the American Society of Clinical Oncology (ASCO) conference and PubMed, encompassing both solid and hematopoietic cancer treatments. Using a GPT-4-based LLM, we extracted study details, treatment safety, and efficacy outcomes. Performance was evaluated against manually annotated gold standards comprising 100 multiple myeloma, 25 leukemia, 25 lymphoma, 30 breast cancer, and 35 lung cancer studies. To assess the consistency between outcome values reported in earlier conference abstracts and in final published articles, we conducted a two-proportion Z-test. The test factored in cohort size and outcome values at each time point for selected efficacy outcomes, with p-values exceeding 0.05 suggesting a consistent pattern. Results: Our LLM pipeline achieved high performance, with precision, recall (sensitivity), and F1 scores in the ranges of 0.958-0.986, 0.944-0.969, and 0.951-0.976, respectively, across diverse cancer types. While challenges arose in comparing outcomes between initial and final reporting in phase 1 dose-escalation studies due to variations in reported dosage groups, consistency prevailed when focusing on the recommended phase 2 dose (RP2D) cohort in phase 1/2 and phase 2 studies.
As part of the feasibility test, we analyzed outcomes from conference abstracts and final published data (reported 1-2 years apart) for the most common efficacy and safety measures in multiple myeloma studies. Results showed consistency, with the following p-values: Overall Response Rate (0.618-1), Complete Response (0.072-0.844), Very Good Partial Response (0.525), Minimal Residual Disease Negative (0.074), Neutropenia (0.212), Thrombocytopenia (0.372-0.422), Cytokine Release Syndrome (0.113-1), and Neurotoxicity (0.308-1). Conclusions: Our LLM pipeline enables large-scale dataset analysis and facilitates effective outcome comparison among diverse sources and time points. The analysis of frequently appearing treatment outcomes showed no significant differences between earlier and final time points in the fixed-dosage studies across therapies, despite variations in cohort sizes and follow-up times.
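The consistency check described in the Methods can be illustrated with a minimal sketch of a two-sided two-proportion Z-test. The abstract does not state which variant was used, so the pooled standard error below is an assumption, as are the function name `two_proportion_z_test` and the example counts, which are hypothetical and not taken from the study.

```python
import math


def two_proportion_z_test(events_1, n_1, events_2, n_2):
    """Two-sided two-proportion Z-test with a pooled standard error.

    events_1 / n_1: responders and cohort size at the earlier time point
    events_2 / n_2: responders and cohort size at the later time point
    Returns (z, p_value).
    """
    if n_1 <= 0 or n_2 <= 0:
        raise ValueError("cohort sizes must be positive")

    p_1 = events_1 / n_1
    p_2 = events_2 / n_2

    # Pooled proportion under the null hypothesis that both rates are equal
    # (assumption: the abstract does not specify pooled vs. unpooled).
    pooled = (events_1 + events_2) / (n_1 + n_2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_1 + 1.0 / n_2))

    if std_err == 0.0:
        # Both cohorts have a rate of exactly 0% or 100%: no detectable difference.
        return 0.0, 1.0

    z = (p_1 - p_2) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 30/40 responders in a conference abstract,
    # 62/85 responders in the later publication.
    z, p = two_proportion_z_test(30, 40, 62, 85)
    verdict = "consistent" if p > 0.05 else "inconsistent"
    print(f"z = {z:.3f}, p = {p:.3f} -> {verdict} at the 0.05 threshold")
```

Under this sketch, a p-value above 0.05 is read the same way as in the abstract: the two reported rates are not significantly different given the cohort sizes. For very small cohorts the normal approximation is weak, and an exact test may be a better fit.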
Disclaimer
This material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org