Machine learning application to find patients with lower-risk myelodysplastic syndrome from real-world data.

Authors


Colden Johanson

Syapse, San Francisco, CA

Colden Johanson, Hu T. Huang, Danny Idryo, Ronda Broome, Matthew J. Rioth, Rayna K. Matsuno

Organizations

Syapse, San Francisco, CA; Vanderbilt-Ingram Cancer Center, Nashville, TN

Research Funding

Pharmaceutical/Biotech Company

Background: Identifying patients with myelodysplastic syndrome (MDS) from structured data in electronic health records (EHRs) is challenging. Existing claims-based algorithms that combine diagnosis codes, clinical laboratory results, and procedures have not been validated against an expert reference standard. We investigated a machine learning approach to identify erythropoiesis-stimulating agent (ESA)-treated, lower-risk MDS (LR-MDS) patients from structured EHR data.
Methods: A team of clinicians and epidemiologists identified a sample of 1,549 patients in the Syapse Learning Health Network (SLHN) as potential ESA-treated LR-MDS patients, based on diagnosis and medication data from the EHRs and cancer registries of multiple health systems. Of these, 404 (25%) were confirmed as ESA-treated LR-MDS patients through review of patient records by certified tumor registrars (CTRs). The sample was split 80/20 into training and validation sets, stratified by outcome. The predictive variables were age; sex; diagnosis codes for MDS and chronic kidney disease; medications (ESA, luspatercept, lenalidomide); clinical laboratory tests (hemoglobin, absolute neutrophil count, platelet count, blast percentage); and evidence of bone marrow biopsy. Gradient boosting machines were trained on the training set using nested cross-validation to select the optimal model. Model acceptance was judged by precision and recall on the validation set. The optimal model was then applied to the remaining, unscreened SLHN patient population.
Results: Based on predicted likelihood, the optimal model identified an additional cohort of 157 patients. Of these, 69 (44%) were confirmed by CTRs as ESA-treated LR-MDS patients, all of whom had been missed by the initial expert-determined selection criteria (see table).
Conclusions: Applying machine learning methods increased the rate of ESA-treated MDS patient identification even after the expert-determined population had been exhausted. This suggests that machine learning models applied to EHR data may improve the efficiency of MDS patient identification and screening for research, quality improvement, and clinical care.
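The Methods describe a stratified 80/20 train/validation split, nested cross-validation on the training set, and model acceptance by precision and recall. The following is a minimal plain-Python sketch of only the index bookkeeping and metrics for those steps; it is not the authors' code. The function names (`stratified_split`, `stratified_kfold`, `nested_cv_splits`, `precision_recall`), the fold counts (5 outer, 3 inner), and the random seed are illustrative assumptions. The gradient boosting model that would be fit inside the loops is deliberately left out, because the abstract does not name a library or hyperparameters. The labels are synthetic placeholders that reproduce only the abstract's counts (404 confirmed of 1,549).

```python
import random
from collections import defaultdict


def _indices_by_class(labels):
    """Group record indices by outcome label (e.g. 1 = CTR-confirmed, 0 = not)."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    return by_class


def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices into train/validation sets, preserving each class's share."""
    rng = random.Random(seed)
    train, valid = [], []
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        n_valid = round(len(idx) * test_frac)
        valid.extend(idx[:n_valid])
        train.extend(idx[n_valid:])
    return sorted(train), sorted(valid)


def stratified_kfold(labels, k=5, seed=0):
    """Assign indices to k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for idx in _indices_by_class(labels).values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return [sorted(f) for f in folds]


def nested_cv_splits(labels, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_folds) for nested cross-validation.

    Inner folds partition outer_train only, so hyperparameter tuning never
    sees the outer test fold used to estimate performance.
    """
    for outer_test in stratified_kfold(labels, outer_k, seed):
        held_out = set(outer_test)
        outer_train = [i for i in range(len(labels)) if i not in held_out]
        inner = stratified_kfold([labels[i] for i in outer_train], inner_k, seed)
        # Map positions within outer_train back to indices into `labels`.
        inner_folds = [[outer_train[p] for p in fold] for fold in inner]
        yield outer_train, outer_test, inner_folds


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Synthetic placeholder labels reproducing only the abstract's counts.
    labels = [1] * 404 + [0] * (1549 - 404)
    train, valid = stratified_split(labels)
    print(len(train), len(valid), sum(labels[i] for i in valid))
```

Nested cross-validation would be run on the training labels only, e.g. `nested_cv_splits([labels[i] for i in train])`, whose indices then refer to positions within `train`. In a full pipeline each inner fold would score candidate hyperparameter settings, the best setting would be refit on the outer training indices and scored on the outer test fold, and the final configuration would be checked once against the held-out validation set with `precision_recall`.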

Criteria | SLHN patients screened | ESA-treated MDS, n (%)
Expert-determined selection criteria:
  Two MDS diagnosis* dates ≥ 90 days apart + evidence of ESA treatment§† | 239 | 122 (51%)
  MDS registry evidence by ICD-O-3 histology | 1,229 | 270 (22%)
  Patients with suspected MDS based on manual review of clinician notes | 41 | 1 (2%)
  Other small sample attempts | 40 | 6 (15%)
  Total | 1,549 | 404 (25%)
Machine learning model | 157 | 69 (44%)

* MDS ICD-10 codes: C94.6 and D46. MDS ICD-O-3 histology codes: 9980, 9981, 9982, 9983, 9984, 9985, 9986, 9987, 9988, 9989, 9993.
§ ESA treatment: darbepoetin alfa or epoetin alfa.
The first code fell within the study window of 2016-01-01 to 2019-06-30.
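The first expert-determined criterion in the table is a simple rule, which can be sketched as a per-patient check. The ICD-10 codes, ESA drug names, 90-day gap, and study window come from the table footnotes. Everything else is an assumption, since the abstract does not specify it: the record layout (lists of (ICD-10 code, date) diagnoses and (drug name, date) medications), the hypothetical function name `meets_expert_criteria`, prefix matching of codes (so D46 covers its subcodes), reading "the first code" as the earliest MDS diagnosis date, and accepting ESA evidence at any date.

```python
from datetime import date, timedelta

# From the table footnotes: MDS ICD-10 codes, ESA drugs, and the study window.
MDS_ICD10_PREFIXES = ("C94.6", "D46")
ESA_DRUGS = {"darbepoetin alfa", "epoetin alfa"}
STUDY_START, STUDY_END = date(2016, 1, 1), date(2019, 6, 30)
MIN_GAP = timedelta(days=90)


def is_mds_code(code):
    """Assumption: prefix match, so "D46" also covers subcodes such as D46.9."""
    return code.strip().upper().startswith(MDS_ICD10_PREFIXES)


def meets_expert_criteria(diagnoses, medications):
    """Hypothetical check of 'two MDS diagnosis dates >= 90 days apart + ESA'.

    diagnoses:   list of (icd10_code, date) tuples (assumed record layout)
    medications: list of (drug_name, date) tuples (assumed record layout)
    """
    mds_dates = sorted(d for code, d in diagnoses if is_mds_code(code))
    if len(mds_dates) < 2:
        return False
    first, last = mds_dates[0], mds_dates[-1]
    # Assumption: "the first code" means the earliest MDS diagnosis date.
    if not STUDY_START <= first <= STUDY_END:
        return False
    # Some pair of dates is >= 90 days apart iff the earliest and latest are.
    if last - first < MIN_GAP:
        return False
    # Assumption: ESA evidence at any date counts; the abstract does not say.
    return any(drug.strip().lower() in ESA_DRUGS for drug, _ in medications)


if __name__ == "__main__":
    dx = [("D46.9", date(2017, 1, 5)), ("D46.9", date(2017, 5, 1))]
    rx = [("Epoetin alfa", date(2017, 2, 1))]
    print(meets_expert_criteria(dx, rx))  # True
```

A rule like this is transparent but brittle: any patient whose codes or medication records fall just outside its thresholds is missed, which is the gap the learned model in the abstract targets.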

Disclaimer

This material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org

Abstract Details

Meeting

2022 ASCO Annual Meeting

Session Type

Poster Session

Session Title

Care Delivery and Regulatory Policy

Track

Care Delivery and Quality Care

Sub Track

Clinical Informatics/Advanced Algorithms/Machine Learning

Citation

J Clin Oncol 40, 2022 (suppl 16; abstr 1555)

DOI

10.1200/JCO.2022.40.16_suppl.1555

Abstract #

1555

Poster Bd #

148
