Natural language processing-optimized case selection for real-world evidence studies.

Authors

Jacob Koskimaki

CancerLinQ, Alexandria, VA

Jacob Koskimaki, Jenny Hu, Yiduo Zhang, Jose Mena, Nehanda Jones, Elizabeth Lipschultz, Vivek Prabhakar Vaidya, Gabriel Altay, Vance Andrei Erese, Krishna Kumar Swaminathan, Emma Mendonca, Tarun Dutt, Kuldeep Singh, Tian King, Vinay Phani Santosh Lakkimsetty, Hussein Al-Olimat, Brittany Manning, George Anthony Komatsoulis, Simon Chu, Jeff Ottens

Organizations

CancerLinQ, Alexandria, VA; AstraZeneca Pharmaceuticals LP, Gaithersburg, MD; AstraZeneca Pharmaceuticals LP, Wilmington, DE; CancerLinQ, LLC, Alexandria, VA; ConcertAI, Bengaluru, India; Tempus Labs, Inc., Chicago, IL; ConcertAI, Bangalore, India; Concert AI, Bangalore, India; Tempus Labs, Chicago, IL; American Society of Clinical Oncology’s (ASCO) CancerLinQ, Alexandria, VA

Research Funding

Pharmaceutical/Biotech Company

Background: Much of the information describing a patient’s cancer treatment remains in unstructured text in electronic health records (EHRs) and is not recorded in discrete data fields. Complete and accurate data are essential for quality-of-care improvement and for research studies on de-identified patient records. Accessing this high-value content often requires extensive manual curation.

Methods: AstraZeneca, CancerLinQ, ConcertAI, and Tempus developed a natural language processing (NLP)-assisted process to improve clinical cohort selection for targeted curation efforts. Development of the hybrid machine-learning models included text classification, named entity recognition, relation extraction, and false-positive removal. A subset of nearly 60,000 lung cancer cases was drawn from the CancerLinQ database, which comprises data from multiple source EHR systems. NLP models extracted EGFR status, stage, histology, radiation therapy, surgical resection, and oral medications. Based on the results, cases were selected for additional manual curation, in which curators confirmed the findings of the NLP-processed data.

Results: NLP methods improved cohort identification. With NLP-assisted selection, the proportion of cases successfully returned ranged from 75.2% to 96.5% across cohorts, well above that obtained with more general case selection criteria based on limited structured data. For all cohorts combined, 84.2% of the cases sent out for curation after NLP-assisted selection were returned with curated content (Table). Each cohort contained a range of NLP-derived elements for curators to further review. In comparison, more general case selection criteria yielded 3,878 cases returned out of 41,186 lung cancer cases sent for curation, a success rate of only 9.6%.

Conclusions: NLP-driven case selection for six distinct, complex lung cancer cohorts resulted in an order-of-magnitude improvement in eligibility over candidate selection using structured EHR data alone. This study demonstrates that NLP-assisted approaches can significantly improve the efficiency of curating unstructured health data.
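The case-selection step described in the Methods can be pictured as a rule check over NLP-derived elements. The following is a minimal sketch and not the authors' implementation: the `ExtractedCase` fields and the `candidate_cohorts` function are hypothetical names introduced for illustration, and the rules only restate the cohort descriptions given in the abstract's table.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ExtractedCase:
    """Hypothetical container for the NLP-derived elements of one case."""
    histology: str                       # "NSCLC" or "SCLC"
    squamous: Optional[bool] = None      # None when histology subtype is unknown
    stage: Optional[str] = None          # "I", "II", "III" or "IV"
    egfr: str = "unknown"                # "positive", "wild_type" or "unknown"
    complete_resection: bool = False
    unresectable: bool = False
    chest_radiation_gy: float = 0.0      # total curative dose to the chest
    medications: Set[str] = field(default_factory=set)
    first_line: Optional[str] = None     # first-line treatment, if identified


def candidate_cohorts(case: ExtractedCase) -> List[str]:
    """Return every cohort label whose description the case appears to meet."""
    cohorts: List[str] = []
    if case.histology == "NSCLC":
        resected_early = case.stage in {"I", "II", "III"} and case.complete_resection
        if resected_early and case.egfr == "positive":
            cohorts.append("1A")
        elif resected_early and case.squamous is False:
            # EGFR is wild type or unknown here, since "positive" was handled above.
            cohorts.append("1B")
        if case.stage == "III" and case.unresectable and case.chest_radiation_gy >= 50:
            cohorts.append("2A" if "Imfinzi" in case.medications else "2B")
        if case.first_line == "Tagrisso":
            cohorts.append("4")
    elif case.histology == "SCLC":
        if case.medications & {"Imfinzi", "Tecentriq"}:
            cohorts.append("3")
    return cohorts


# Example: a resected stage II, EGFR-positive NSCLC case.
example = ExtractedCase(histology="NSCLC", stage="II", egfr="positive",
                        complete_resection=True)
print(candidate_cohorts(example))  # ['1A']
```

In the workflow the abstract describes, a match of this kind would only nominate a case for manual curation; curators still confirm each NLP-derived element.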

NLP-assisted cohort selection for the six pre-specified lung cancer cohorts.
Cohort	Cohort Description	Number of cases available from NLP-assisted identification methods	Number of cases sent to Tempus and ConcertAI for curation	Number of cases returned to CancerLinQ with curated content	Percent of successfully curated cases
1A	NSCLC, stage I, II, III, EGFR+, complete resection	408	408	341	83.6%
1B	NSCLC, non-squamous, stage I, II, III, EGFR wild type/unknown, complete resection	4313	1500	1285	85.7%
2A	NSCLC, stage III, unresectable, curative radiation to the chest total dose >= 50 Gy, did receive Imfinzi	852	620	466	75.2%
2B	NSCLC, stage III, unresectable, curative radiation to the chest total dose >= 50 Gy, did not receive Imfinzi	3050	750	724	96.5%
3	SCLC, received Imfinzi or Tecentriq	559	500	402	80.4%
4	NSCLC, received Tagrisso as first line treatment	971	812	647	79.7%
Total:		10153	4590	3865	84.2%
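The percentages in the table follow directly from the sent and returned counts. As a quick arithmetic check, the sketch below recomputes them and compares the combined rate with the structured-data baseline; the variable and function names are illustrative only. Computed from the counts as printed, the baseline comes out near 9.4%, slightly under the 9.6% quoted in the abstract, so the sketch simply reports what the counts yield.

```python
# (cases sent for curation, cases returned with curated content) per cohort,
# copied from the table above.
cohorts = {
    "1A": (408, 341),
    "1B": (1500, 1285),
    "2A": (620, 466),
    "2B": (750, 724),
    "3": (500, 402),
    "4": (812, 647),
}


def success_rate(sent: int, returned: int) -> float:
    """Percent of sent cases that came back with curated content."""
    return 100.0 * returned / sent


# Per-cohort rates, rounded to one decimal place as in the table.
rates = {name: round(success_rate(s, r), 1) for name, (s, r) in cohorts.items()}

# Combined rate across all six cohorts.
total_sent = sum(s for s, _ in cohorts.values())
total_returned = sum(r for _, r in cohorts.values())
overall = success_rate(total_sent, total_returned)

# Baseline from the abstract: general selection on limited structured data.
baseline = success_rate(41186, 3878)
fold_improvement = overall / baseline

for name, rate in rates.items():
    print(f"Cohort {name}: {rate}%")
print(f"All cohorts: {overall:.1f}% ({total_returned} of {total_sent})")
print(f"Baseline: {baseline:.1f}%, improvement: {fold_improvement:.1f}x")
```

On these counts the combined rate is about nine times the baseline, consistent with the order-of-magnitude improvement stated in the Conclusions.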

Abstract Details

Meeting

2022 ASCO Annual Meeting

Session Type

Poster Session

Session Title

Care Delivery and Regulatory Policy

Track

Care Delivery and Quality Care

Sub Track

Clinical Informatics/Advanced Algorithms/Machine Learning

Citation

J Clin Oncol 40, 2022 (suppl 16; abstr 1556)

DOI

10.1200/JCO.2022.40.16_suppl.1556

Abstract #

1556

Poster Bd #

149
