mICKEY: Memory-efficient deep learning for personalized biomarker discovery and cancer origin prediction from DNA methylation data.

Authors

Chanati Jantrachotechatchawan

Siriraj Hospital, Mahidol University, Bangkok Noi, Bangkok, Thailand

Chanati Jantrachotechatchawan, Pakanan Tussanapirom, Kasidech Aewsrisakul, Natthawadee Leephatarakit, Kobchai Duangrattanalert

Organizations

Siriraj Hospital, Mahidol University, Bangkok Noi, Bangkok, Thailand; Triam Udom Suksa School, Bangkok, Thailand; Hatyaiwittayalai School, Songkhla, Thailand; University Technology Center, Chulalongkorn University, Bangkok, Thailand

Research Funding

Other
Partly funded by NSTDA for the science project competition YSC 2023

Background: Carcinoma of unknown primary (CUP) accounts for up to 5% of all cancer cases, and both identifying the primary site and treating the disease successfully remain difficult. DNA methylation abnormalities at 5'-cytosine-phosphate-guanine-3' (CpG) motifs across the genome are associated with carcinogenesis, which enables their use as cancer biomarkers for early diagnosis and tissue-of-origin prediction. This study proposes a combined approach for 1) CpG site selection, 2) cancer origin prediction and biomarker identification, and 3) development of an open-source application for user-friendly data analysis.
Methods: To emulate the state-like progression of cancer and to accommodate missing values, methylation beta values were discretized, and the discretized values and missing values were tokenized. Feature selection using L1 regularization (lambda = 0.0003) reduced an initial set of 312,792 CpG sites to 152. Three independent sets were selected for each of the three self-attention types: vanilla scaled dot-product, dense synthesizer, and factorized dense synthesizer. Self-attention moves biomarker discovery toward personalization because it generates a unique attention map for each input sample, highlighting the features essential for primary site prediction. A permutation test further evaluated the contribution of the selected CpG sites, confirming biomarker identification for each primary site. An open-source application was developed to predict cancer origin from methylation beta values; its user-friendly interface displays the predicted primary sites and their corresponding percentages.
Results: The independent sets of selected CpG sites perform comparably to the initial set and are robust (most metrics well above 97% in precision, recall, and F1) across clinical groups (sample type: normal, primary, metastatic, recurrent; AJCC pathologic stage) and demographic groups (age, gender, race). The model identifies important biomarkers previously reported in the literature, including GATA4 and HOXD.
Conclusions: This study offers valuable insights into feature selection, primary site prediction, and biomarker discovery using DNA methylation data, with potential practical applications for healthcare facilities and in personalized cancer treatment.
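
The tokenization and per-sample attention map described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the number of bins (10), the use of one extra token id for missing values, the toy embedding table, and the function names `tokenize_beta` and `attention_map` are all assumptions made for illustration, and only the vanilla scaled dot-product variant is shown.

```python
import math

N_BINS = 10  # assumed number of discretization bins (not stated in the abstract)

def tokenize_beta(values, n_bins=N_BINS):
    """Map beta values in [0, 1] to integer tokens; None/NaN gets token id n_bins."""
    tokens = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            tokens.append(n_bins)  # assumed dedicated "missing" token
        else:
            # Clamp so that beta == 1.0 falls into the last bin.
            tokens.append(min(int(v * n_bins), n_bins - 1))
    return tokens

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_map(queries, keys):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d)), row by row."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]

# One sample with four CpG sites, the third of which is missing.
tokens = tokenize_beta([0.02, 0.55, None, 1.0])
# Toy 2-dimensional embedding per token id (hypothetical values, illustration only).
embed = {t: [math.cos(t), math.sin(t)] for t in range(N_BINS + 1)}
x = [embed[t] for t in tokens]
weights = attention_map(x, x)  # one row of attention weights per CpG position
```

Because the weights are computed from the sample's own tokens, each sample yields its own attention map, which is the property the abstract relies on for personalized biomarker discovery. In a trained model the queries and keys would come from learned projections rather than this fixed toy embedding.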

Disclaimer

This material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org

Abstract Details

Meeting

2023 ASCO Breakthrough

Session Type

Poster Session

Session Title

Poster Session B

Track

Gastrointestinal Cancer, Gynecologic Cancer, Head and Neck Cancer, Quality of Care, Genetics/Genomics/Multiomics, Healthcare Equity and Access to Care, Healthtech Innovations, Models of Care and Care Delivery, Population Health, Viral-Mediated Malignancies

Sub Track

Artificial Intelligence/Deep Learning

Citation

JCO Global Oncology 9, 2023 (suppl 1; abstr 109)

DOI

10.1200/GO.2023.9.Supplement_1.109

Abstract #

109

Poster Bd #

G4
