Division of Medical Oncology, Stanford School of Medicine, Stanford, CA
Yasmin Karimi, Douglas W. Blayney, Allison W. Kurian, Daniel Rubin, Imon Banerjee
Background: Electronic health records (EHR) are used for retrospective analysis of cancer outcomes, but sites and timing of recurrence are not captured in structured EHR data. Novel computerized methods are needed to use unstructured longitudinal EHR data in large-scale studies.

Methods: We previously developed a neural network-based natural language processing (NLP) algorithm that distinguishes no recurrence from metastatic recurrence by analyzing physician notes, pathology reports and radiology reports in Stanford’s breast cancer database, Oncoshare (Cohort A). To validate this algorithm for local vs. distant recurrence, we identified a distinct Oncoshare cohort (Cohort B). Cases were manually curated for longitudinal development of local or distant recurrence and for metastatic sites. A two-sided t-test was used to compare mean probabilities between local and distant recurrence cases. Next, we combined the cases in Cohorts A and B to train and validate a novel NLP classifier that identifies metastatic site. The combined cohort was randomly divided into training and validation sets. Sensitivity and specificity were calculated for the NLP algorithm’s detection of metastatic sites, with manual curation as the reference.

Results: In Cohort B, 350 metastatic cases were identified. The mean probability was 0.43 for local recurrence and 0.79 for distant recurrence, a significant difference (p<0.01). In the combined Cohorts A and B, 632 metastatic cases were used to determine metastatic sites. Sensitivity and specificity were highest for detection of peritoneal metastasis, followed by liver, lung, skin, bone and central nervous system (CNS) (table).

Conclusions: This NLP algorithm is a scalable tool that uses unstructured EHR data to capture breast cancer recurrence, distinguishing local from distant recurrence and identifying metastatic site. This method may facilitate analysis of large datasets and correlation of outcomes with metastatic site.
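The two-sided t-test named in Methods can be illustrated with a minimal sketch. The abstract does not state which t-test variant was used, so the Welch (unequal variance) form here is an assumption, and the function name `welch_t` and the probability values are hypothetical and not taken from the study. The sketch computes only the test statistic and degrees of freedom with the Python standard library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Each group's contribution to the squared standard error
    # (sample variance divided by group size).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical model probabilities, for illustration only (not study data).
local_probs = [0.2, 0.4, 0.5, 0.6]
distant_probs = [0.7, 0.8, 0.9, 0.8]

t_stat, df = welch_t(local_probs, distant_probs)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The two-sided p-value follows from the t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(local_probs, distant_probs, equal_var=False)` performs the same Welch test and returns the p-value directly.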
Metric | Bone | Liver | Lung | Lymph Nodes | CNS | Peritoneum | Skin |
---|---|---|---|---|---|---|---|
N (cases) | 252 | 98 | 94 | 101 | 37 | 15 | 16 |
Sensitivity | 0.84 | 0.97 | 0.93 | 0.82 | 0.9 | 0.94 | 0.97 |
Specificity | 0.77 | 0.77 | 0.6 | 0.6 | 0.5 | 1.0 | 0.5 |
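Per-site sensitivity and specificity against manual curation, as reported in the table, could be computed along the following lines. This is a sketch under stated assumptions: each case is represented as a set of metastatic sites, the function name `sensitivity_specificity` and the example cases are hypothetical, and none of the data comes from the study.

```python
def sensitivity_specificity(curated, predicted, site):
    """Compare predicted sites with manually curated sites for one site.

    curated and predicted are parallel lists holding one set of site
    names per case. Returns (sensitivity, specificity); a value is None
    when its denominator is zero.
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(curated, predicted):
        in_truth = site in truth
        in_pred = site in pred
        if in_truth and in_pred:
            tp += 1  # site curated and detected
        elif in_truth:
            fn += 1  # site curated but missed
        elif in_pred:
            fp += 1  # site detected but not curated
        else:
            tn += 1  # site absent in both
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical cases, for illustration only (not study data).
curated = [{"bone"}, {"bone", "liver"}, {"lung"}, set(), {"liver"}]
predicted = [{"bone"}, {"liver"}, {"lung", "bone"}, set(), {"liver"}]

for site in ("bone", "liver"):
    print(site, sensitivity_specificity(curated, predicted, site))
```

Treating each site as its own binary problem lets a single case contribute to several sites, which matches a table that reports one sensitivity and one specificity per site.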