Development and validation of natural language processing (NLP) algorithm for detection of distant versus local breast cancer recurrence and metastatic site.

Authors

Yasmin Karimi, Douglas W. Blayney, Allison W. Kurian, Daniel Rubin, Imon Banerjee

Organizations

Division of Medical Oncology, Stanford School of Medicine, Stanford, CA; Stanford University, Stanford, CA; Stanford School of Medicine, Stanford, CA; Stanford University, School of Medicine, Stanford, CA; Emory University Hospital, Atlanta, GA

Research Funding

No funding received

Background: Electronic health records (EHR) are used for retrospective cancer outcomes analysis, but sites and timing of recurrence are not captured in structured EHR data. Novel computerized methods are needed to use unstructured longitudinal EHR data for large-scale studies.

Methods: We previously developed a neural network-based NLP algorithm that distinguishes no recurrence from metastatic recurrence by analyzing physician notes, pathology reports, and radiology reports in Stanford’s breast cancer database, Oncoshare (Cohort A). To validate this algorithm for local vs. distant recurrence, we identified a distinct Oncoshare cohort (Cohort B). Cases were manually curated for longitudinal development of local or distant recurrence and for metastatic sites. A two-sided t-test was used to compare the mean probabilities output by the algorithm between local and distant recurrence cases. Next, we combined the cases in Cohorts A and B to train and validate a novel NLP classifier that identifies metastatic site. The combined cohort was randomly divided into training and validation sets. Sensitivity and specificity of the NLP algorithm for detecting metastatic sites were calculated against manual curation.

Results: In Cohort B, 350 metastatic cases were identified. The mean probability was 0.43 for local recurrence and 0.79 for distant recurrence, a statistically significant difference (p<0.01). In Cohorts A and B combined, 632 metastatic cases were used to determine metastatic sites. Sensitivity and specificity were highest for detection of peritoneal metastasis, followed by liver, lung, skin, bone and central nervous system (table).

Conclusions: This NLP algorithm is a scalable tool that uses unstructured EHR data to capture breast cancer recurrence, distinguishing local from distant recurrence and identifying metastatic site. This method may facilitate analysis of large datasets and correlation of outcomes with metastatic site.
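The comparison of mean probabilities described in the Methods can be illustrated with a minimal sketch. It assumes the unequal-variance (Welch) form of the t-test; the abstract does not state which variant was used. The function name `welch_t` and the sample values are hypothetical and are not data from the study. The sketch returns only the test statistic and degrees of freedom; a two-sided p-value additionally requires the t distribution, for example via `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # Sample variance (n - 1 denominator) of each group, divided by group size.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical classifier probabilities, NOT values from the study.
local_probs = [0.35, 0.48, 0.41, 0.52, 0.39]
distant_probs = [0.81, 0.74, 0.88, 0.69, 0.83]

t_stat, dof = welch_t(distant_probs, local_probs)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A positive statistic here means the first group (distant recurrence in this toy example) has the higher mean probability.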

Sensitivity & specificity of extracting recurrence sites.

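| Metric | Bone | Liver | Lung | Lymph Nodes | CNS | Peritoneum | Skin |
|---|---|---|---|---|---|---|---|
| N (cases) | 252 | 98 | 94 | 101 | 37 | 15 | 16 |
| Sensitivity | 0.84 | 0.97 | 0.93 | 0.82 | 0.9 | 0.94 | 0.97 |
| Specificity | 0.77 | 0.77 | 0.6 | 0.6 | 0.5 | 1.0 | 0.5 |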
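Per-site sensitivity and specificity of the kind reported in the table can be computed by comparing predicted sites with manually curated sites, case by case. The following is a minimal sketch under stated assumptions: each case is represented as a set of site names, a case may have several sites, and the function name `site_metrics` and the toy cases are hypothetical, not the authors' implementation or data.

```python
def site_metrics(predicted, curated, site):
    """Sensitivity and specificity for one metastatic site.

    predicted, curated: lists of sets of site names, one set per case,
    in the same case order. Returns None for a metric whose denominator
    is zero (no positive or no negative cases for that site).
    """
    tp = fp = tn = fn = 0
    for pred, truth in zip(predicted, curated):
        if site in truth:
            if site in pred:
                tp += 1  # site present and detected
            else:
                fn += 1  # site present but missed
        else:
            if site in pred:
                fp += 1  # site absent but reported
            else:
                tn += 1  # site absent and not reported
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical toy cases, NOT data from the study.
curated = [{"bone"}, {"bone", "liver"}, {"liver"}, {"lung"}]
predicted = [{"bone"}, {"liver"}, {"bone", "liver"}, {"lung"}]

for site in ("bone", "liver", "lung"):
    print(site, site_metrics(predicted, curated, site))
```

Treating each site as its own binary problem keeps the metrics interpretable when a single case has metastases at more than one site.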

Abstract Details

Meeting

2020 ASCO Virtual Scientific Program

Session Type

Poster Session

Session Title

Care Delivery and Regulatory Policy

Track

Care Delivery and Quality Care

Sub Track

Clinical Informatics/Advanced Algorithms/Machine Learning

Citation

J Clin Oncol 38: 2020 (suppl; abstr 2043)

DOI

10.1200/JCO.2020.38.15_suppl.2043

Abstract #

2043

Poster Bd #

35
