Validating the use of machine-learning cancer staging algorithms for Medicare cost analyses.

Authors

Rebecca Smith

Milliman, Inc., New York, NY

Rebecca Smith, Lesley-Ann Miller-Wilson, Gebra Cuyun Carter, Ifrah Fayyaz, Andreyah Pope, Bruce Pyenson

Organizations

Milliman, Inc., New York, NY; Exact Sciences Corporation, Madison, WI

Research Funding

Pharmaceutical/Biotech Company
Exact Sciences Corporation

Background: Administrative claims provide valuable real-world insight into the care of cancer patients; however, claims data lack cancer stage information. This limitation constrains research on the value of early diagnosis and treatment as well as on the costs and savings associated with increased cancer screening. In prior work, our team used the SEER-Medicare data to develop machine learning (ML) algorithms to stage non-small cell lung (NSCLC), colon (CC), and rectal (RC) cancer patients using clinical flags derived from claims data. These algorithms were 69% (RC), 78% (NSCLC), and 83% (CC) accurate at matching incident cancer patients with their SEER-recorded AJCC stage (SEER-stage) at diagnosis. This work sought to test whether these ML algorithms are sufficiently accurate for use in claim cost analyses.

Methods: Incident NSCLC, CC, and RC patients were identified using 2016-2017 SEER-Medicare data and assigned a cancer stage (ML-stage) using a claims-based predictive multinomial logistic regression model (R Statistical Software v4.1.2, R Core Team 2021; nnet package, Venables and Ripley 2002). Patients' cumulative medical and pharmacy costs were summarized for the 12 months starting with their index month. The Medicare index month was set equal to the month of the patient's first claim with a cancer diagnosis. The SEER index month was set equal to the diagnosis month associated with the patient's incident tumor record in SEER. Patients with each cancer type were then grouped two ways: by ML-stage and by SEER-stage. Median patient costs were compared between the ML- and SEER-stage groups for each cancer type, and differences were tested for statistical significance using the Wilcoxon rank-sum test.

Results: For NSCLC and CC, raw differences in median 12-month claim costs between the ML- and SEER-stage cohorts were small (1%-4%). Cost differences for RC were larger (7%-17%). ML and SEER costs were not significantly different (p > 0.05) between later-stage cohorts (NSCLC stages 3 and 4, CC stages 2C/3 and 4, and RC stage 4); however, early-stage groups were always significantly different (p < 0.05).

Conclusions: Although costs were not statistically equivalent across all stage groups, the similarity of ML and SEER costs across higher-stage cohorts and the small raw differences in median costs for each NSCLC and CC group suggest that ML algorithms with higher accuracy may be used to develop costs from administrative data for stage shift modeling and cost tradeoff analyses.
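The cost comparison described in the Methods can be sketched in a few lines. The example below is an illustration only: the authors worked in R, whereas this sketch uses standard-library Python, and the function names, the `(year, month)` claim layout, and the toy cost values are all hypothetical. The rank-sum test uses the normal approximation with a tie correction and no continuity correction, which may differ from the exact settings used in the study.

```python
from math import sqrt
from statistics import NormalDist, median


def months_between(index_ym, claim_ym):
    """Whole calendar months from the index month to the claim month.

    Both arguments are (year, month) tuples (a hypothetical layout).
    """
    return (claim_ym[0] - index_ym[0]) * 12 + (claim_ym[1] - index_ym[1])


def twelve_month_cost(claims, index_ym):
    """Sum paid amounts over the 12 months starting with the index month.

    `claims` is an iterable of ((year, month), paid_amount) pairs.
    """
    return sum(paid for ym, paid in claims
               if 0 <= months_between(index_ym, ym) < 12)


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for x, p-value). Ties receive midranks and the
    variance is tie-corrected; no continuity correction is applied.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # average of ranks i+1 .. j
        t = j - i                          # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


if __name__ == "__main__":
    # Invented 12-month costs for one stage group under each grouping.
    ml_costs = [41000.0, 46500.0, 52000.0, 39000.0, 61000.0]
    seer_costs = [43000.0, 48000.0, 55500.0, 40500.0, 66000.0]
    u, p = rank_sum_test(ml_costs, seer_costs)
    print(median(ml_costs) - median(seer_costs), u, p)
```

In the study's design the same patients appear in both groupings, with costs anchored to the Medicare index month for the ML-stage cohorts and to the SEER index month for the SEER-stage cohorts, so `twelve_month_cost` would be called once per patient per index definition before the groups are compared.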

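| Cancer | Stage | Sample Size | Median Cost (ML) | Median Cost (SEER) | Δ (ML-SEER) | p-value |
|---|---|---|---|---|---|---|
| NSCLC | 0/1/2 | 4,904 | $46,188 | $48,024 | -$1,836 | 0.005 |
| NSCLC | 3 | 2,700 | $70,975 | $72,345 | -$1,370 | 0.433 |
| NSCLC | 4 | 5,835 | $61,177 | $63,392 | -$2,215 | 0.053 |
| CC | 0/1/2A/2B | 3,638 | $37,020 | $38,230 | -$1,209 | 0.013 |
| CC | 2C/3 | 2,125 | $56,722 | $58,838 | -$2,116 | 0.071 |
| CC | 4 | 1,336 | $67,624 | $70,323 | -$2,698 | 0.237 |
| RC | 0/1/2A/2B | 650 | $43,895 | $53,685 | -$9,790 | 0.000 |
| RC | 2C/3 | 517 | $66,976 | $78,292 | -$11,316 | 0.000 |
| RC | 4 | 247 | $65,879 | $70,723 | -$4,845 | 0.115 |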
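For readers who want to relate the dollar differences in the table to relative differences, the following sketch recomputes them from the published medians and flags rows with p < 0.05. The abstract does not state which denominator was used for its percentage ranges, so dividing by the SEER median is an assumption here, and the computed values may not match the reported ranges exactly. The function and field names are hypothetical.

```python
# Rows transcribed from the results table:
# (cancer, stage, median ML cost, median SEER cost, p-value).
ROWS = [
    ("NSCLC", "0/1/2", 46188, 48024, 0.005),
    ("NSCLC", "3", 70975, 72345, 0.433),
    ("NSCLC", "4", 61177, 63392, 0.053),
    ("CC", "0/1/2A/2B", 37020, 38230, 0.013),
    ("CC", "2C/3", 56722, 58838, 0.071),
    ("CC", "4", 67624, 70323, 0.237),
    ("RC", "0/1/2A/2B", 43895, 53685, 0.000),
    ("RC", "2C/3", 66976, 78292, 0.000),
    ("RC", "4", 65879, 70723, 0.115),
]


def relative_difference(ml_cost, seer_cost):
    """Signed (ML - SEER) difference as a fraction of the SEER median.

    The SEER denominator is an assumption, not stated in the abstract.
    """
    return (ml_cost - seer_cost) / seer_cost


def summarize(rows, alpha=0.05):
    """Return one dict per row with the percent difference and a significance flag."""
    summary = []
    for cancer, stage, ml, seer, p in rows:
        summary.append({
            "cancer": cancer,
            "stage": stage,
            "pct_diff": 100 * relative_difference(ml, seer),
            "significant": p < alpha,
        })
    return summary


if __name__ == "__main__":
    for row in summarize(ROWS):
        flag = "p < 0.05" if row["significant"] else "n.s."
        print(f'{row["cancer"]:6} {row["stage"]:10} {row["pct_diff"]:6.1f}%  {flag}')
```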

Abstract Details

Meeting

2023 ASCO Annual Meeting

Session Type

Publication Only

Session Title

Publication Only: Care Delivery and Regulatory Policy

Track

Care Delivery and Quality Care

Sub Track

Clinical Informatics/Advanced Algorithms/Machine Learning

Citation

J Clin Oncol 41, 2023 (suppl 16; abstr e13548)

DOI

10.1200/JCO.2023.41.16_suppl.e13548

Abstract #

e13548
