Machine learning prediction of patient-reported outcomes associated with systemic therapy from medical records.

Authors

Frank Po-Yen Lin

Garvan Institute of Medical Research, Sydney, Australia

Frank Po-Yen Lin, Anthony M. Joshua, Richard J. Epstein

Organizations

Garvan Institute of Medical Research, Sydney, Australia, Kinghorn Cancer Centre, Darlinghurst, Australia

Research Funding

No funding received

Background: Measures of patient-reported outcome (PRO) are important metrics in cancer care. However, PROs are poorly captured in routine practice, and data collection may place an undue burden on patients. It remains unclear whether electronic medical records (EMR) contain sufficient information to dynamically monitor PROs in patients undergoing intensive systemic therapy.

Methods: Over a 6-week period, consecutive patients attending an outpatient chemotherapy unit completed two questionnaires (EORTC QLQ-C30 and NCI PRO-CTCAE) at each visit. The corresponding EMR text, authored by oncologists and allied health providers, was extracted for up to 4 weeks before the questionnaire date. A random forest algorithm was used to predict each dichotomized PRO class using a bag-of-n-grams representation extracted from the text. Classification performance was assessed by 8-fold cross-validation, using the area under the receiver operating characteristic curve (AUC) as the metric. This study is part of the ePERSIST (electronic phenotyping, event retrieval, and case summarisation in solid tumours) project, which explores methods for extracting structured variables from EMR to support oncology research.

Results: A total of 115 patients (157 visits to the chemotherapy unit) completed up to 6 sets of questionnaires each. Overall, the median score for QL2, the quality-of-life (QOL) scale of QLQ-C30, was 67 (IQR 50 to 83). Machine learning predicted a low QL2 score with a mean AUC of 0.710 (95% CI 0.630 to 0.791). At the Bonferroni threshold of 0.00025, machine learning significantly predicted several components of QLQ-C30: PF2 (mean AUC 0.722, 95% CI 0.633 to 0.812), AP (0.725, 0.630 to 0.819), CO (0.686, 0.607 to 0.764), FI (0.677, 0.598 to 0.755), PA (0.696, 0.603 to 0.787), FA (0.686, 0.584 to 0.787), and RF2 (0.717, 0.595 to 0.839). Conversely, EMR text significantly predicted only 4 of 13 PRO-CTCAE categories: cutaneous (0.745, 0.659 to 0.832), oral (0.734, 0.630 to 0.838), cardiovascular (0.675, 0.593 to 0.758), and gastrointestinal (GI; 0.742, 0.625 to 0.860). At the item level, 15 of 79 (19%) PRO-CTCAE items related to adverse events were significantly predicted, including cutaneous (radiation burns, hypohidrosis, rash), oral (anorexia, dysphagia, dysgeusia), and GI symptoms (constipation, bloating), as well as others (anxiety, hoarseness, pain, erectile dysfunction). An exploratory analysis revealed that specific keywords linked to the PRO items were present in only one (11%) QLQ-C30 item and 4 (5%) PRO-CTCAE items.

Conclusions: Machine learning applied to clinical text predicts QOL and certain patient-reported adverse events with moderate accuracy in individuals undergoing intensive systemic treatment, suggesting potential utility in syndromic monitoring where PROs are not formally assessed. However, because many items cannot be accurately extracted, obtaining information directly from patients through questionnaires may still be required because of limited documentation by clinicians.
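The Methods describe bag-of-n-grams features and AUC as the evaluation metric. As an illustration only, the following minimal Python sketch shows how word n-grams can be counted from a clinical note and how an AUC can be computed from predicted scores. It is not the authors' implementation: the function names `bag_of_ngrams` and `roc_auc`, the tokenization rule, the maximum n-gram length of 2, and the example note, labels, and scores are all assumptions made up for this sketch, and the random forest classifier and 8-fold cross-validation steps are omitted.

```python
import re
from collections import Counter


def bag_of_ngrams(text, max_n=2):
    """Count every word n-gram of length 1..max_n in a note.

    Tokenization (lowercased alphanumeric runs) and max_n=2 are
    assumptions for illustration; the abstract does not specify them.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Invented example data, not taken from the study.
note = "Patient reports mild nausea. No rash."
features = bag_of_ngrams(note)

labels = [1, 1, 0, 0, 1, 0]              # hypothetical dichotomized PRO class
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]  # hypothetical classifier scores
auc = roc_auc(labels, scores)
```

In a fuller pipeline, the n-gram counts for each visit would form one row of a feature matrix passed to a classifier, and the AUC would be computed on the held-out fold of each cross-validation split and then averaged.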

Disclaimer

The material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org

Abstract Details

Meeting

2023 ASCO Annual Meeting

Session Type

Publication Only

Session Title

Publication Only: Care Delivery and Regulatory Policy

Track

Care Delivery and Quality Care

Sub Track

Clinical Informatics/Advanced Algorithms/Machine Learning

Citation

J Clin Oncol 41, 2023 (suppl 16; abstr e13564)

DOI

10.1200/JCO.2023.41.16_suppl.e13564

Abstract #

e13564
