Development and validation of natural language processing (NLP) algorithm for detection of distant versus local breast cancer recurrence and metastatic site.

Authors

Yasmin Karimi, Douglas W. Blayney, Allison W. Kurian, Daniel Rubin, Imon Banerjee

Organizations

Division of Medical Oncology, Stanford School of Medicine, Stanford, CA; Stanford University, Stanford, CA; Stanford School of Medicine, Stanford, CA; Stanford University, School of Medicine, Stanford, CA; Emory University Hospital, Atlanta, GA

Research Funding

No funding received

Background: Electronic health records (EHRs) are used for retrospective analysis of cancer outcomes, but the sites and timing of recurrence are not captured in structured EHR data. Novel computerized methods are needed to use unstructured longitudinal EHR data in large-scale studies.

Methods: We previously developed a neural network-based NLP algorithm that distinguishes no recurrence from metastatic recurrence by analyzing physician notes, pathology reports, and radiology reports in Stanford’s breast cancer database, Oncoshare (Cohort A). To validate this algorithm for local vs. distant recurrence, we identified a distinct Oncoshare cohort (Cohort B). Cases were manually curated for longitudinal development of local or distant recurrence and for metastatic sites. A two-sided t-test was used to compare mean probabilities between local and distant recurrence cases. Next, we combined the cases in Cohorts A and B to train and validate a novel NLP classifier that identifies metastatic site. The combined cohort was randomly divided into training and validation sets. Sensitivity and specificity were calculated for the NLP algorithm’s detection of metastatic sites, with manual curation as the reference.

Results: In Cohort B, 350 metastatic cases were identified. Mean probability was 0.43 for local and 0.79 for distant recurrence cases, a statistically significant difference (p<0.01). In Cohorts A and B combined, 632 metastatic cases were used for determination of sites. Sensitivity and specificity were highest for detection of peritoneal metastasis, followed by liver, lung, skin, bone, and central nervous system (table).

Conclusions: This NLP algorithm is a scalable tool that uses unstructured EHR data to capture breast cancer recurrence, distinguishing local from distant recurrence and identifying metastatic site. This method may facilitate analysis of large datasets and correlation of outcomes with metastatic site.
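The comparison of mean probabilities described in Methods can be illustrated with a minimal sketch. The abstract does not say which t-test variant was used, so the unequal-variance (Welch) form below is an assumption, and the function name `welch_t` and the per-case probabilities are hypothetical values invented for illustration, not study data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each group mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical algorithm output probabilities per case (illustration only).
local_probs = [0.35, 0.40, 0.45, 0.50]
distant_probs = [0.70, 0.75, 0.80, 0.85]

t_stat, dof = welch_t(distant_probs, local_probs)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A two-sided p-value would then come from the t distribution with `dof` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(distant_probs, local_probs, equal_var=False)` performs the same test and reports the p-value directly.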

Table. Sensitivity and specificity of the NLP classifier for extracting recurrence sites.
	Bone	Liver	Lung	Lymph Nodes	CNS	Peritoneum	Skin
N (cases)	252	98	94	101	37	15	16
Sensitivity	0.84	0.97	0.93	0.82	0.9	0.94	0.97
Specificity	0.77	0.77	0.6	0.6	0.5	1.0	0.5
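As a rough sketch of how per-site figures like those in the table are typically computed, the snippet below compares predicted labels for one metastatic site against manual curation, treated as the reference. The function name `sensitivity_specificity` and the label lists are hypothetical; the abstract does not describe the authors' evaluation code.

```python
def sensitivity_specificity(truth, predicted):
    """Compare predicted site labels with manual curation (the reference)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # site present, detected
    fn = sum(1 for t, p in pairs if t and not p)      # site present, missed
    tn = sum(1 for t, p in pairs if not t and not p)  # site absent, not flagged
    fp = sum(1 for t, p in pairs if not t and p)      # site absent, flagged
    # Guard against a group with no cases to avoid division by zero.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical labels for one site; True = metastasis at that site.
curated = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, True, True]

sens, spec = sensitivity_specificity(curated, predicted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Running the function once per site, with each case labeled for that site only, would produce one sensitivity and specificity pair per column.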

Abstract Details

Meeting

2020 ASCO Virtual Scientific Program

Session Type

Poster Session

Session Title

Care Delivery and Regulatory Policy

Track

Care Delivery and Quality Care

Sub Track

Clinical Informatics/Advanced Algorithms/Machine Learning

Citation

J Clin Oncol 38: 2020 (suppl; abstr 2043)

DOI

10.1200/JCO.2020.38.15_suppl.2043

Abstract #

2043
