Syapse, San Francisco, CA
Colden Johanson, Hu T. Huang, Danny Idryo, Ronda Broome, Matthew J. Rioth, Rayna K. Matsuno
Background: Identifying patients with myelodysplastic syndrome (MDS) from structured electronic health record (EHR) data is challenging. Current claims-based algorithms that incorporate diagnosis codes, clinical labs, and procedures have not been validated against an expert reference standard. We investigated a machine learning-based approach to identify erythropoiesis-stimulating agent (ESA)-treated, lower-risk (LR)-MDS patients from structured EHR data.
Methods: A team of clinicians and epidemiologists identified a sample of 1,549 patients from the Syapse Learning Health Network (SLHN) as potential ESA-treated LR-MDS patients, based on diagnosis and medication data from multiple health systems’ EHRs and cancer registries. Of these, 404 (25%) were confirmed as ESA-treated LR-MDS patients through a review of patient records by certified tumor registrars (CTRs). The sample was divided into training and validation sets at a ratio of 80/20, stratified by the outcome. The predictive variables for the models were age, sex, diagnosis codes corresponding to MDS and chronic kidney disease, medications (ESA, luspatercept, lenalidomide), clinical lab tests (hemoglobin, absolute neutrophil count, platelet count, blast percentage), and evidence of bone marrow biopsy. Gradient boosting machines with a nested cross-validation scheme were used to build the optimal model on the training set. Model acceptance was evaluated based on precision and recall on the validation set. The optimal model was then applied to the remaining unscreened SLHN patient population.
Results: The optimal model identified an additional cohort of 157 patients based on the predicted likelihood. Among these, 69 (44%) were CTR-confirmed ESA-treated LR-MDS patients, all of whom had been missed by the initial expert-determined selection criteria, as shown in the table.
Conclusions: The application of machine learning methods increased the rate of ESA-treated MDS patient identification even after the expert-determined population was depleted. This suggests that applying machine learning models to EHR data may improve the efficiency of MDS patient identification and screening efforts for research, quality improvement, and clinical care.
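As an illustration of two steps named in the Methods, the sketch below shows one way to produce an 80/20 split stratified by outcome and to compute precision and recall on held-out labels. It uses only the Python standard library. The function names `stratified_split` and `precision_recall` are hypothetical, the labels are synthetic placeholders sized to the counts reported above, and the study's gradient boosting model and nested cross-validation are not reproduced here.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, valid_idx) with roughly the same class ratio in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, valid = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # Take the validation share separately within each outcome class.
        n_valid = round(len(idx) * test_fraction)
        valid.extend(idx[:n_valid])
        train.extend(idx[n_valid:])
    return sorted(train), sorted(valid)


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, where 1 marks a confirmed patient."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic labels only: 404 confirmed out of 1,549 screened, as in the abstract.
labels = [1] * 404 + [0] * (1549 - 404)
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))
```

Splitting within each outcome class keeps the share of confirmed patients similar in the training and validation sets, which matters when only about a quarter of the sample is positive.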
| Criteria | | SLHN patients screened | ESA-treated MDS |
|---|---|---|---|
| Expert-determined selection criteria | Two MDS diagnosis* dates ≥ 90 days apart + evidence of ESA treatment§† | 239 | 122 (51%) |
| | MDS registry evidence by ICD-O-3 histology‡ | 1,229 | 270 (22%) |
| | Patients with suspected MDS based on manual review of clinician notes | 41 | 1 (2%) |
| | Other small sample attempts | 40 | 6 (15%) |
| | Total | 1,549 | 404 (25%) |
| Machine learning model | | 157 | 69 (44%) |
*MDS ICD-10 codes: C94.6 and D46. ‡MDS ICD-O-3 Histology: 9980, 9981, 9982, 9983, 9984, 9985, 9986, 9987, 9988, 9989, 9993. §ESA treatment: darbepoetin alfa or epoetin alfa. †The first code fell within the study window of 2016-01-01 to 2019-06-30.
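The first expert-determined criterion in the table can be read as a simple rule over structured data. The sketch below is one possible reading, not the authors' implementation. It assumes that "two MDS diagnosis dates ≥ 90 days apart" means the earliest and latest MDS-coded dates differ by at least 90 days, that "the first code" refers to the earliest MDS diagnosis code, and that ICD-10 codes match by prefix. The function name `meets_first_criterion` and the input shapes are hypothetical.

```python
from datetime import date

MDS_ICD10_PREFIXES = ("C94.6", "D46")              # from the table footnote
ESA_DRUGS = {"darbepoetin alfa", "epoetin alfa"}   # from the table footnote
WINDOW_START, WINDOW_END = date(2016, 1, 1), date(2019, 6, 30)


def meets_first_criterion(diagnoses, medications):
    """diagnoses: iterable of (icd10_code, date); medications: iterable of drug names."""
    mds_dates = sorted(
        d for code, d in diagnoses
        if code.upper().startswith(MDS_ICD10_PREFIXES)
    )
    if len(mds_dates) < 2:
        return False
    first, last = mds_dates[0], mds_dates[-1]
    apart_90 = (last - first).days >= 90                # assumption: earliest vs. latest date
    in_window = WINDOW_START <= first <= WINDOW_END     # assumption: earliest MDS code
    has_esa = any(m.strip().lower() in ESA_DRUGS for m in medications)
    return apart_90 and in_window and has_esa


# Hypothetical patient record, for illustration only.
example_dx = [("D46.9", date(2017, 1, 10)), ("D46.9", date(2017, 5, 1))]
example_rx = ["Epoetin alfa"]
print(meets_first_criterion(example_dx, example_rx))
```

A rule like this is transparent and easy to audit, but per the table it confirmed only about half of the patients it flagged.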