Sequential learning for pan-tumor detection of metastatic disease progression.

Authors

Foad H. Green, Matthew J. Rioth, Joshua Loving

Organizations

Syapse, San Francisco, CA

Research Funding

No funding received

Background: Detecting disease progression in patients with early-stage cancer is a challenge for real-world data derived from electronic health records (EHR). Temporal patterns common to patients’ data can be used to detect patients with metastatic progression. A deep learning approach on longitudinal patient records of cancer lab testing, visit diagnosis codes, and cancer treatments was used to identify metastatic status.

Methods: ICD-10-CM diagnosis codes, clinical lab values specific to cancer testing, and antineoplastic treatments were evaluated from longitudinal data of 128,614 patients who did not progress and 5,884 who developed metastasis, across 47 tumor types in the Syapse Learning Health Network. Patients who were metastatic at diagnosis were excluded. Data were included beginning at the time of primary cancer diagnosis and ended, for all surviving patients, at an established administrative cutoff date. In the metastatic group, visits after the progression event were included but censored if they contained diagnosis codes specifically for secondary malignant neoplasms. A binary classifier was developed using a multi-headed attention transformer to determine metastatic status. Each patient's history was sequentially aggregated into visit-level embeddings across the longitudinal record. Categorical layers were used for antineoplastics and diagnosis codes. Linear layers were used to embed lab values, as well as visit intervals from the primary cancer date and the administrative cutoff date. All embeddings were used as feature inputs for the classifier. To account for the smaller population of patients who progressed to metastatic status, the non-metastatic population was randomly sampled before training to establish equally weighted labels, with an 80:10:10 split into training, validation, and testing sets.

Results: This classification approach achieved 0.86 PR-AUC, 0.79 ROC-AUC, and 0.75 F1. This performance is comparable to published models trained for single tumor cohort prediction. With our censoring constraints, these model results are robust in the absence of routine signals of metastasis in the EHR, such as staging reports and diagnosis coding for distant metastases.

Conclusions: This method generalizes accurate classification of metastatic progression from patient visit history across multiple cancer types. This work is immediately useful for real-world evidence data analysis, complementing the patients with metastases already captured in hospital registries. Using sequential deep learning with EHR data classes, this approach may be used to forecast metastatic progression for early intervention.
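
The two data-preparation steps described in the Methods, censoring visits that carry secondary-malignancy diagnosis codes and down-sampling the non-metastatic group before an 80:10:10 split, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the visit dictionary layout (a `dx_codes` list per visit), the choice of ICD-10-CM prefixes C77, C78, C79, and C7B as the "secondary malignant neoplasm" codes, and the decision to split each class separately so that every subset stays balanced are all assumptions made for illustration.

```python
import random

# Assumption: ICD-10-CM categories treated as secondary malignant neoplasms.
# The abstract does not list the exact codes used for censoring.
SECONDARY_PREFIXES = ("C77", "C78", "C79", "C7B")


def censor_visits(visits):
    """Drop every visit that carries at least one secondary-malignancy code.

    `visits` is a list of dicts with a "dx_codes" list (hypothetical layout).
    """
    return [
        visit
        for visit in visits
        if not any(
            code.upper().startswith(SECONDARY_PREFIXES)
            for code in visit["dx_codes"]
        )
    ]


def balanced_split(metastatic_ids, non_metastatic_ids, seed=0):
    """Down-sample the larger class, then split each class 80:10:10.

    Returns a dict of (patient_id, label) lists, with label 1 = metastatic.
    Splitting per class keeps labels equally weighted in every subset.
    """
    rng = random.Random(seed)
    n = min(len(metastatic_ids), len(non_metastatic_ids))
    # rng.sample both down-samples and shuffles each class.
    sampled = {
        1: rng.sample(list(metastatic_ids), n),
        0: rng.sample(list(non_metastatic_ids), n),
    }
    n_train = n * 80 // 100
    n_val = n * 10 // 100
    splits = {"train": [], "val": [], "test": []}
    for label, ids in sampled.items():
        splits["train"] += [(pid, label) for pid in ids[:n_train]]
        splits["val"] += [(pid, label) for pid in ids[n_train:n_train + n_val]]
        splits["test"] += [(pid, label) for pid in ids[n_train + n_val:]]
    for subset in splits.values():
        rng.shuffle(subset)  # mix the two classes within each subset
    return splits
```

With the cohort sizes reported above, this kind of down-sampling would keep all 5,884 metastatic patients and an equal-sized random draw from the 128,614 non-metastatic patients. Whether the original work balanced each subset separately or only the pooled data is not stated in the abstract.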

Abstract Details

Meeting

2023 ASCO Annual Meeting

Session Type

Publication Only

Session Title

Publication Only: Care Delivery and Regulatory Policy

Track

Care Delivery and Quality Care

Sub Track

Clinical Informatics/Advanced Algorithms/Machine Learning

Citation

J Clin Oncol 41, 2023 (suppl 16; abstr e13591)

DOI

10.1200/JCO.2023.41.16_suppl.e13591

Abstract #

e13591
