Extracting longitudinal anticancer treatments at scale using deep natural language processing and temporal reasoning.

Authors

Meng Ma

Sema4, Stamford, CT

Meng Ma, Kyeryoung Lee, Yun Mai, Christopher Gilman, Zongzhi Liu, Mingwei Zhang, Minghao Li, Arielle Redfern, Tommy Mullaney, Tony Prentice, Paul McDonagh, Qi Pan, Rong Chen, Eric Schadt, Xiaoyan Wang

Organizations

Sema4, Stamford, CT, Icahn School of Medicine at Mount Sinai, New York, NY

Research Funding

No funding received

Background: Accurate longitudinal cancer treatment data are vital for establishing primary endpoints such as outcome and for investigating adverse events. However, many longitudinal therapeutic regimens are not well captured in structured electronic health records (EHRs). Recognizing them in unstructured data such as clinical notes is therefore critical to obtaining an accurate description of the real-world patient treatment journey. Here, we demonstrate a scalable approach to extracting high-quality longitudinal cancer treatments from lung cancer patients' clinical notes using a natural language processing (NLP) pipeline based on bidirectional long short-term memory (BiLSTM) and conditional random fields (CRF).

Methods: The lung cancer (LC) cohort of 4,698 patients was curated from the Mount Sinai Healthcare system (2003-2020). Two domain experts developed a structured framework of entities and semantics that captured treatment and its temporality. The framework included therapy type (chemotherapy, targeted therapy, immunotherapy, etc.), status (on, off, hold, planned, etc.), and temporal reasoning entities and relations (admin_date, duration, etc.). We pre-annotated 149 FDA-approved cancer drugs and longitudinal timelines of treatment on the training corpus. An NLP pipeline was implemented with BiLSTM-CRF-based deep learning models, which were trained and then applied to the clinical notes of the LC cohort. A postprocessor was developed to post-coordinate and refine the output. We performed both cross-evaluation and independent evaluation to assess pipeline performance.

Results: We applied the NLP pipeline to the 853,755 clinical notes and identified 1,155 distinct entities for 194 generic cancer drugs, including 74 chemotherapy drugs, 21 immunotherapy drugs, and 99 targeted therapy drugs. From the clinical notes, we identified chemotherapy, immunotherapy, or targeted therapy data for 3,509 patients in the LC cohort. Compared to only 2,395 patients with cancer treatments in the structured EHR, this pipeline identified cancer treatments from notes for an additional 2,303 patients who did not have any available cancer treatment data in the structured EHR. Our evaluation schema indicates that the longitudinal cancer drug recognition pipeline delivers strong performance (named entity recognition for drug and temporal entities: F1 = 95%; drug-temporal relation recognition: F1 = 90%).

Conclusions: We developed a high-performance BiLSTM-CRF-based NLP pipeline to recognize longitudinal cancer treatments. The pipeline recovers and encodes twice as many patients with cancer treatments compared with the structured EHR. Our study indicates that deep NLP with temporal reasoning could substantially accelerate the extraction of treatment profiles at scale. The pipeline is adjustable and can be applied across different cancers.
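As an illustration of how an entity-level F1 score such as the one reported above is commonly computed, the following minimal Python sketch decodes BIO-style tag sequences into entity spans and scores predictions against a gold annotation by exact span match. It is not the authors' evaluation code: the abstract does not state the tagging scheme or the matching criterion, so the BIO encoding, the exact-match rule, the label names (`DRUG`, `ADMIN_DATE`, `STATUS`), and the helper names `bio_to_spans` and `prf1` are all assumptions made for this example.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (hypothetical scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))  # close the span that was open
        if tag.startswith(("B-", "I-")):
            # Lenient choice: a stray I- tag also opens a new span.
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy sentence (invented): "Started carboplatin on 3/2/2019 ; pembrolizumab planned"
gold_tags = ["O", "B-DRUG", "O", "B-ADMIN_DATE", "O", "B-DRUG", "B-STATUS"]
pred_tags = ["O", "B-DRUG", "O", "B-ADMIN_DATE", "O", "O", "B-STATUS"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
precision, recall, f1 = prf1(gold_spans, pred_spans)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the prediction misses one of the four gold entities, so recall drops while precision stays perfect. Relation-level F1 (for example, linking a drug to its administration date) can be scored the same way by treating each (drug span, temporal span, relation type) triple as the unit being matched.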

Disclaimer

This material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org

Abstract Details

Meeting

2021 ASCO Annual Meeting

Session Type

Publication Only

Session Title

Publication Only: Health Services Research and Quality Improvement

Track

Quality Care/Health Services Research

Sub Track

Real-World Data/Outcomes

Citation

J Clin Oncol 39, 2021 (suppl 15; abstr e18747)

DOI

10.1200/JCO.2021.39.15_suppl.e18747

Abstract #

e18747
