Siriraj Hospital, Mahidol University, Bangkok Noi, Bangkok, Thailand
Chanati Jantrachotechatchawan, Pakanan Tussanapirom, Kasidech Aewsrisakul, Natthawadee Leephatarakit, Kobchai Duangrattanalert
Background: Carcinoma of unknown primary (CUP) accounts for up to 5% of all cancer cases and makes it difficult both to identify the primary cancer site and to treat patients successfully. DNA methylation abnormalities at 5'-cytosine-phosphate-guanine-3' (CpG) motifs across the genome are associated with carcinogenesis, enabling their use as cancer biomarkers for early diagnosis and tissue-of-origin prediction. This study proposes a combined approach for 1) selecting CpG sites, 2) predicting cancer origin and identifying biomarkers, and 3) developing an open-source application for user-friendly data analysis.
Methods: To emulate the state-like progression of cancer and accommodate missing values, discretized methylation beta values and missing values were tokenized. Feature selection using L1 regularization with lambda = 0.0003 resulted in 152 CpG sites out of an initial set of 312,792. Three independent sets were selected for each of the three self-attention types: vanilla scaled dot-product, dense synthesizer, and factorized dense synthesizer. Self-attention moves biomarker discovery toward personalization because it generates a unique attention map for each input sample, highlighting the features essential for primary site prediction. A permutation test further evaluated the contribution of the selected CpG sites, confirming biomarker identification for each primary site. An open-source application was developed to predict cancer origin from methylation beta values, providing a user-friendly interface that displays predicted primary sites and their corresponding percentages.
Results: The independent sets of selected CpG sites perform comparably to the initial set and are robust (most precision, recall, and F1 values well above 97%) across clinical groups (sample type: normal, primary, metastatic, recurrent; and AJCC pathologic stage) and demographic groups (age, gender, race). The model identifies important biomarkers previously reported in the literature, namely GATA4 and HOXD.
Conclusions: This study offers valuable insights into feature selection, primary site prediction, and biomarker discovery using DNA methylation data, with potential practical applications for healthcare facilities and in personalized cancer treatment.
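Illustrative sketch: the tokenization and vanilla scaled dot-product self-attention steps described in the Methods could look roughly like the following. This is not the authors' implementation. The number of bins, the equal-width binning, the embedding size, and the random weights are all assumptions made for illustration; the abstract does not state them.

```python
import numpy as np

N_BINS = 10              # assumed number of equal-width bins (not given in the abstract)
MISSING_TOKEN = N_BINS   # extra token id reserved for missing values


def tokenize_beta(beta):
    """Map beta values in [0, 1] to integer tokens; NaN becomes MISSING_TOKEN."""
    beta = np.asarray(beta, dtype=float)
    missing = np.isnan(beta)
    # Fill NaN with a placeholder so the binning arithmetic stays well-defined.
    filled = np.where(missing, 0.0, beta)
    # A beta of exactly 1.0 is clamped into the last bin.
    tokens = np.minimum((filled * N_BINS).astype(int), N_BINS - 1)
    tokens[missing] = MISSING_TOKEN
    return tokens


def scaled_dot_product_attention(x, w_q, w_k, w_v):
    """Single-head self-attention; returns the output and the attention map."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row maximum for a numerically stable softmax.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Hypothetical, randomly initialized parameters (a trained model would learn these).
rng = np.random.default_rng(0)
d_model = 8
embedding = rng.normal(size=(N_BINS + 1, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

# One sample with five CpG sites, one of which is missing.
sample = [0.02, 0.97, np.nan, 0.5, 1.0]
tokens = tokenize_beta(sample)
output, attention_map = scaled_dot_product_attention(embedding[tokens], w_q, w_k, w_v)
```

Each row of `attention_map` sums to 1 and is specific to the input sample, which is the property the Methods rely on for per-sample biomarker highlighting.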
Disclaimer
This material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org