Memorial Sloan Kettering Cancer Center, New York, NY
Alexander Grigorenko, Paul May, Nicholas Kastango, Howard T. Thaler, Isaac Wagner, Aryeh Caroline, Mark G. Kris, Marjorie Glass Zauderer
Background: The electronic medical record (EMR) is a tremendous research resource, but its use for exploring hypotheses has been limited by an inability to identify patient cohorts reliably and efficiently. We have created a system that uses advanced computational techniques and Memorial Sloan Kettering’s (MSK) EMR data to overcome these challenges. To validate our system, we sought to replicate a phase III clinical trial comparing cisplatin/pemetrexed (CP) to cisplatin/gemcitabine (CG) in patients with lung adenocarcinomas (Scagliotti, J Clin Oncol 2008).

Methods: We created a system that identifies a patient cohort by extracting structured cancer and outcomes data from the EMR, algorithmically identifying chemotherapy regimens, and using natural language processing to extract functional status and smoking status from physician notes. Using the earlier clinical trial’s eligibility criteria, we identified a patient cohort and analyzed survival on an intent-to-treat basis. Our analysis relied on MSK’s extensive EMR data warehouse, which contains data on the care of more than one million patients since 1989.

Results: Our system successfully extracted structured data and accurately categorized treatment regimens (F-measure = 0.985), functional status (F-measure = 0.998), and smoking status (F-measure = 0.993). A total of 281 patients were automatically identified as eligible. The median overall survival (OS) of patients with lung adenocarcinomas receiving CP and CG was 14.7 and 12.6 months, respectively, with a hazard ratio (HR) of 0.69 (95% CI: 0.52 - 0.90) favoring CP. These results are similar to those of the prospective trial (Table).

Conclusions: Our system replicated the results of a prospective clinical trial. Highly accurate computational tools that extract structured and textual data from the EMR are feasible and can help address pending clinical research questions. Future steps will focus on expanding data extraction capabilities so that a broader range of hypotheses can be tested with EMR data.
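The F-measures reported in the Results summarize how well each extraction step agrees with a reference labeling. The abstract does not state which variant was used or the underlying counts, so the following is only a minimal sketch assuming the common balanced form (F1, the harmonic mean of precision and recall); the function name and the counts in the example are hypothetical.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """F-measure from confusion counts (beta=1.0 gives the balanced F1).

    Assumption: the abstract does not say which F-measure variant was
    used; F1 is shown here only as the most common choice.
    """
    predicted = true_pos + false_pos   # items the system labeled positive
    actual = true_pos + false_neg      # items that are truly positive
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, chosen only to illustrate the calculation;
# they are not taken from the study.
example_score = f_measure(true_pos=197, false_pos=3, false_neg=3)
print(round(example_score, 3))
```

With these illustrative counts, precision and recall are both 197/200, so the balanced F-measure equals the same value.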
Metric | Virtual Trial | Clinical Trial |
---|---|---|
OS (months; CP, CG; % difference) | 14.7, 12.6; 15.4% | 12.6, 10.9; 14.5% |
HR (95% CI) | 0.69 (0.52 - 0.90) | 0.81 (0.70 - 0.94) |
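
The abstract does not define how the "% difference" in the OS row was calculated. As a hedged check, the reported values appear consistent with a symmetric percent difference, in which the gap between the two medians is divided by their mean. The sketch below assumes that formula; the function name is hypothetical.

```python
def symmetric_percent_difference(a, b):
    """Absolute difference between a and b as a percentage of their mean.

    Assumption: this formula is inferred from the table values and is
    not stated in the abstract.
    """
    return abs(a - b) / ((a + b) / 2) * 100


# Median OS in months (CP, CG) as given in the table.
virtual_trial = round(symmetric_percent_difference(14.7, 12.6), 1)
clinical_trial = round(symmetric_percent_difference(12.6, 10.9), 1)
print(virtual_trial, clinical_trial)  # 15.4 14.5
```

Dividing the gap by either arm alone would instead give roughly 16.7% or 14.3% for the virtual trial, so neither of those conventions matches the table.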
Disclaimer
The material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org.