Automated abstraction of real-world clinical outcome in lung cancer: A natural language processing and artificial intelligence approach from electronic health records.

Authors

Meng Ma

Sema4, Stamford, CT

Meng Ma, Arielle Redfern, Xiang Zhou, Dan Li, Ying Ru, Kyeryoung Lee, Christopher Gilman, Zongzhi Liu, Scott Jones, Yun Mai, Matthew Deitz, Yunrou Gong, Tommy Mullaney, Tony Prentice, Rong Chen, Eric Schadt, Xiaoyan Wang

Organizations

Sema4, Stamford, CT; Icahn School of Medicine at Mount Sinai, New York, NY

Research Funding

No funding received

Background: Real-world evidence generated from electronic health records (EHRs) plays an increasing role in health care decisions and is recognized as essential for assessing cancer outcomes in real-world settings. Automatically abstracting outcomes from clinical notes is becoming a fundamental challenge in medical informatics. In this study, we aimed to develop a system that automatically abstracts outcomes (Progression, Response, Stable Disease) from the notes of patients with lung cancer.

Methods: A lung cancer cohort (n = 5,003) was obtained from the Mount Sinai Data Warehouse. The patients' progress, pathology, and radiology notes were used. We integrated several Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques into a system that automatically abstracts outcomes. The corresponding images, biopsies, and lines of treatment (LOTs) were abstracted as attributes of the outcomes. The system includes four information models:
1. Customized NLP annotator model: preprocessor, section detector, sentence splitter, named entity recognition, and relation detector. Conditional random field (CRF) and long short-term memory (LSTM) methods were applied to recognize entities and relations.
2. Clinical Outcome container model: biopsy evidence extractor, lines of treatment detector, image evidence extractor, clinical outcome event recognizer, date detector, and temporal reasoning. Domain-specific rules were crafted to automatically infer outcomes.
3. Document Summarizer.
4. Longitudinal Outcome Summarizer.

Results: To evaluate the abstracted outcomes, we curated a subset (n = 792) of the patient cohort for which LOTs were available. About 61% of the outcomes identified were supported by radiologic images (time window = ±14 days) or biopsy pathology results (time window = ±100 days). In 91% (720/792) of patients, Progression was abstracted within a time window of 90 days prior to first-line treatment. In addition, 72% of the Progression events identified were accompanied by a downstream event (e.g., treatment change or death). We randomly selected 250 outcomes for manual curation, and 197 were assessed to be correct (precision = 79%). Moreover, the automated abstraction system improved human abstractor efficiency, reducing curation time per patient by 90%.

Conclusions: We have demonstrated the feasibility and effectiveness of NLP and AI approaches for abstracting outcomes from lung cancer EHR data. The approach promises to automatically abstract outcomes and other clinical entities from notes across all cancers.
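
The evidence windows and the precision figure reported in the Results can be illustrated with a short Python sketch. It is not the authors' implementation: the names Evidence, is_supported, and precision are hypothetical, the windows are assumed to be symmetric and inclusive of their endpoints, and only the window sizes (±14 days for imaging, ±100 days for biopsy) and the counts 197 and 250 come from the abstract.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable

# Window sizes are taken from the Results; treating them as symmetric and
# inclusive of the boundary day is an assumption made for this sketch.
EVIDENCE_WINDOW_DAYS = {"imaging": 14, "biopsy": 100}


@dataclass(frozen=True)
class Evidence:
    """A dated piece of supporting evidence (hypothetical structure)."""
    kind: str  # "imaging" or "biopsy"
    on: date


def is_supported(outcome_date: date, evidence: Iterable[Evidence]) -> bool:
    """Return True if any evidence item falls within its window of the outcome date."""
    for item in evidence:
        window = EVIDENCE_WINDOW_DAYS.get(item.kind)
        if window is None:
            continue  # unknown evidence types never count as support
        if abs((item.on - outcome_date).days) <= window:
            return True
    return False


def precision(correct: int, reviewed: int) -> float:
    """Fraction of manually reviewed outcomes that were judged correct."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return correct / reviewed


if __name__ == "__main__":
    outcome = date(2019, 3, 1)
    scan = Evidence("imaging", date(2019, 3, 10))   # 9 days after the outcome
    biopsy = Evidence("biopsy", date(2019, 5, 1))   # 61 days after the outcome
    print(is_supported(outcome, [scan]))
    print(is_supported(outcome, [biopsy]))
    print(f"{precision(197, 250):.1%}")
```

With the counts from the manual review, precision(197, 250) evaluates to 0.788, which rounds to the 79% reported above.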

Disclaimer

This material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org

Abstract Details

Meeting

2020 ASCO Virtual Scientific Program

Session Type

Publication Only

Session Title

Publication Only: Care Delivery and Regulatory Policy

Track

Care Delivery and Quality Care

Sub Track

Clinical Informatics/Advanced Algorithms/Machine Learning

Citation

J Clin Oncol 38: 2020 (suppl; abstr e14062)

DOI

10.1200/JCO.2020.38.15_suppl.e14062

Abstract #

e14062
