University of California, Irvine, Irvine, CA
Lu He, Matthew Moldenhauer, Kai Zheng, Helen Ma
Background: Free-text clinical narratives contain rich patient information that is labor-intensive to extract through manual chart review. We developed an NLP pipeline to automatically extract performance status (PS), staging, and diagnosis from clinical narratives of Veterans Affairs (VA) patients with lymphoid malignancies (LM).

Methods: A rule-based NLP algorithm was developed and iteratively refined on a development corpus of 287 notes independently annotated by two clinicians. On the development corpus, the F1-score was 95.8 for PS (precision 98.6, recall 93.2), 92.7 for staging (precision 94.0, recall 81.6), and 67 for diagnosis (precision 80.2, recall 57.9). The pipeline was then externally validated on an evaluation corpus of 97 notes from another group of 100 veterans with T-cell LM.

Results: In the 97 notes, primary diagnosis was the most routinely documented variable, with 2.76 mentions per note. Staging was the most sparsely documented, with only 34 mentions (11 patients with large granular lymphocytic leukemia were not staged). The pipeline performed relatively well in extracting PS and staging (F1-scores of 0.74 and 0.72, respectively) and achieved high precision in extracting diagnosis (0.93). However, recall for diagnosis was poor (0.44), likely due to the complexity and inconsistency of how LM diagnoses are documented. Frequency of documentation and performance on the external validation set are shown in the Table.

Conclusions: The pipeline showed promising performance on the external validation set, demonstrating the feasibility of using NLP to extract information from the notes of patients with LM for clinical research. Recall was generally lower than precision, indicating that the pipeline may miss clinical information that should be captured. False positives arose from entities easily confused with the clinical entities of interest, such as nutritional status versus performance status.
Future work includes capturing more lexical variations and documentation indicators, as well as contextual information, such as the note sections in which each element is likely to be documented. In addition, because a diagnosis may appear as a primary diagnosis, a secondary diagnosis, or part of a differential, we are building an NLP-based classifier to distinguish among these types. We will also use outputs from the rule-based NLP pipeline as labels to fine-tune transformer-based, weakly supervised models to further enhance performance.
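The abstract does not publish the pipeline's rules, but a minimal sketch of the rule-based extraction approach it describes might look like the following. The pattern set, entity names, and `extract_entities` helper are hypothetical illustrations, not the VA pipeline's actual rules:

```python
import re

# Hypothetical patterns for two of the target variables; the real pipeline
# would cover far more lexical variants (a limitation the abstract notes).
PATTERNS = {
    "performance_status": re.compile(
        r"\b(?:ECOG|performance status)\s*(?:of|:|=|is)?\s*([0-4])\b",
        re.IGNORECASE,
    ),
    "staging": re.compile(
        r"\bstage\s+(I{1,3}V?|[1-4])([AB])?\b",
        re.IGNORECASE,
    ),
}

def extract_entities(note: str) -> dict:
    """Return every pattern match found in one clinical note."""
    return {
        name: [m.group(0) for m in pattern.finditer(note)]
        for name, pattern in PATTERNS.items()
    }
```

A narrow pattern set like this illustrates why such pipelines tend toward high precision but lower recall: matches are rarely wrong, but undocumented phrasings are simply missed.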
Variable | Frequency | Precision¹ | Recall² | F1-score³ |
---|---|---|---|---|
Performance status | 49 | 73 | 76 | 74 |
Staging | 34 | 79 | 66 | 72 |
Diagnosis: Primary | 268 | – | – | – |
Diagnosis: Secondary | 45 | – | – | – |
Diagnosis: Differential | 65 | – | – | – |
Diagnosis: Combined | 378 | 93 | 44 | 60 |
¹Precision (P) = true positives (TP)/(TP + false positives [FP]); ²Recall (R) = TP/(TP + false negatives [FN]); ³F1 = 2 × P × R/(P + R).
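The F1 definition in footnote 3 can be checked directly against the table's rows (values in %, rounded to integers as in the table):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, per footnote 3."""
    return 2 * precision * recall / (precision + recall)

# Performance status: P=73, R=76 -> F1 rounds to 74
# Staging:            P=79, R=66 -> F1 rounds to 72
# Diagnosis combined: P=93, R=44 -> F1 rounds to 60
```

All three reported F1-scores are consistent with the precision and recall columns.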