Department of Oncology, Clinica Universidad de Navarra, Pamplona, Spain
Jose Luis Perez-Gracia, Elisabet Guruceaga, Maria Pilar Andueza, Marimar Ocon, Nicolas de Villalonga Zornoza, Jafait Junior Fodop Sokoudjou, Gorka Alkorta-Aranburu, Carlos Camps, Eloisa Jantus-Lewintre, Maria Navamuel Andueza, Miguel F. Sanmamed, Alfonso Gurpide, Ignacio Melero Bermejo, Mohamed Elgendy, Juan Alberto Perez Valencia, Mikel Hernaez, Ruben Pio Oses, Luis M. Montuenga, Idoia Ochoa, Ana Patiño-García
Background: Tobacco is the main risk factor for developing lung cancer. Yet while some heavy smokers develop lung cancer at a young age, others never develop it, even at an advanced age. This suggests remarkable variability in individual susceptibility to the carcinogenic effects of tobacco. We characterized the germline profile of subjects presenting these extreme phenotypes with Whole Exome Sequencing (WES) and Machine Learning (ML).
Methods: We sequenced germline DNA from heavy smokers who either developed lung adenocarcinoma at an early age (extreme cases) or did not develop it at an advanced age (extreme controls). The discovery and validation cohorts included 50 and 66 extreme cases and 50 and 83 extreme controls, respectively, selected from databases including > 6,000 subjects. We selected individual coding variants and variant-rich genes showing a significantly different distribution between extreme cases and controls. We trained ML models (Logistic Regression, Random Forest, and Support Vector Machine Classifier (SVC)) on the discovery cohort to classify subjects into their respective phenotypes and tested them on the validation cohort.
Results: Mean age for extreme cases and controls in both cohorts was 50.2 and 78.4 years, respectively. Mean tobacco consumption was 38.1 and 59.1 pack-years, respectively. We validated 16 significant individual variants. The most significant variants were in ADAMTS7 (2 variants) in cases and TMEM191B (1) in controls. We validated 33 genes enriched with significant variants. The genes harboring the most variants were HLA-A (4 variants) and ADAMTS7 (2) in cases, and PLIN4 (2) in controls (Table). We trained several ML models on the discovery cohort using as input the 16 significant individual variants and the number of variants in the 33 enriched genes. When tested on the validation cohort, the best model (SVC), using the 16 variants as input, achieved an accuracy of 72% and an AUC-ROC of 87.4%, confirming the association of these variants with the phenotypes. The validated genes included oncogenes and tumor suppressors, as well as genes involved in DNA repair, maintenance of genomic stability, HLA-mediated antigen presentation, and regulation of proliferation, migration, apoptosis, and inflammatory pathways.
Conclusions: Individuals presenting phenotypes of extremely high and low risk of developing tobacco-induced lung adenocarcinoma have different germline profiles. Our strategy may help to identify high-risk subjects and to develop new therapeutic approaches.
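The Results report two different metrics for the best classifier, accuracy and AUC-ROC, and the two need not agree. As a minimal sketch of how each is computed for a binary case/control classifier on a held-out cohort, the Python below uses only the standard library. The labels and scores are invented for illustration and are not the study's data; the function names and the 0.5 decision threshold are likewise assumptions, not details from the abstract.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of subjects whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random control.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier output: 1 = extreme case, 0 = extreme control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.45, 0.4, 0.6, 0.3, 0.2, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC-ROC  = {auc_roc(labels, scores):.3f}")
```

Accuracy depends on a single decision threshold, whereas AUC-ROC summarizes how well the scores rank cases above controls across all thresholds, which is one reason a model can show a higher AUC-ROC than accuracy.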
Gene | Discovery: N. variants | Discovery: p | Validation: N. variants | Validation: p | Function
---|---|---|---|---|---
Genes (cases) | | | | |
HLA-A | 4 | 3.5E-07 | 4 | 1.7E-06 | Antigen presentation
ADAMTS7 | 2 | 0.02 | 2 | 3.1E-05 | Metalloproteinase
SPINK5 | 2 | 0.002 | 1 | 0.044 | Tumor suppressor
REXO4 | 2 | 0.0002 | 1 | 0.046 | DNA repair
Genes (controls) | | | | |
PLIN4 | 2 | 0.0006 | 2 | 0.002 | Lipid droplet metabolism
ZNF214 | 2 | 6.7E-05 | 1 | 0.03 | Transcriptional regulation
KRT18 | 1 | 0.03 | 2 | 0.003 | Ras-activated oncogene
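The p values in the table compare the distribution of variants between extreme cases and extreme controls. The abstract does not state which statistical test was used. As one common choice for comparing carrier counts between two groups, a two-sided Fisher's exact test can be sketched as follows. This is an assumption for illustration only, and the carrier counts in the example are hypothetical, not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    a = cases carrying the variant,    b = cases not carrying it
    c = controls carrying the variant, d = controls not carrying it
    """
    row1, row2 = a + b, c + d          # group sizes (cases, controls)
    col1 = a + c                       # total number of carriers
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the cases,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (small tolerance guards against rounding error).
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 12 of 50 cases and 2 of 50 controls carry a variant.
p_value = fisher_exact_two_sided(12, 38, 2, 48)
print(f"p = {p_value:.4g}")
```

An exact test is a natural fit for cohorts of this size because it remains valid when some cells of the table hold very few carriers.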
Disclaimer
This material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org