Memorial Sloan Kettering Cancer Center, New York, NY
Niamh M. Keegan, Samantha E Vasselman, Ethan Barnett, Barbara Nweji, Emily Carbone, Alexander Blum, Michael J. Morris, Dana E. Rathkopf, Susan F. Slovin, Daniel Costin Danila, Karen A. Autio, Howard I. Scher, Philip W. Kantoff, Wassim Abida, Konrad H. Stopsack
Background: Routine clinical data from the electronic medical record are indispensable for retrospective and prospective observational studies and clinical trials, yet their reproducibility is often not assessed. We sought to develop a prostate cancer-specific database with a defined source hierarchy for clinical annotations and to evaluate data reproducibility.
Methods: At a comprehensive cancer center, we designed and implemented a clinical database for men with prostate cancer who had clinical-grade paired tumor–normal sequencing. For these patients, we performed team-based retrospective clinical data annotation from the electronic medical record, using a prostate cancer-specific data dictionary. We developed an open-source R package for data processing. We then evaluated the completeness of data elements, the reproducibility of team-based annotation using blinded repeat annotation by a medical oncologist as the reference, and the impact of measurement error on bias in survival analyses.
Results: Data elements on demographics, diagnosis and staging, disease state at the time of procuring a genomically characterized sample, and clinical outcomes were piloted and then abstracted for 2,261 patients and their 2,631 genomically profiled samples. Completeness of data elements was generally high, ranging from 55% to 99% for elements of clinical TNM staging, self-reported race, biopsy Gleason score, and presence of variant histologies, both for the team-based annotation and for the repeat annotation. In a comparison of the team-based annotation with the repeat annotation (100 patients/samples), reproducibility of annotations was high to very high. For 7 binary data elements, both sensitivity and specificity of the team-based annotation reached or exceeded 90%. The T stage, metastasis date, and presence and date of castration resistance had lower reproducibility. The impact of measurement error on estimates for strong prognostic factors was modest.
Conclusions: With a prostate cancer-specific data dictionary and quality control measures, manual team-based annotations can be scalable and reproducible. The data dictionary and the R package for reproducible data processing (https://stopsack.github.io/prostateredcap) are freely available to help increase data quality in clinical prostate cancer research.
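The reproducibility metrics reported for binary data elements can be illustrated with a minimal sketch. It treats the blinded repeat annotation as the reference and computes the sensitivity and specificity of the team-based annotation. The function name and the example values are hypothetical, chosen only for illustration; this is not the study's code, its data, or part of the prostateredcap R package.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    team: Sequence[bool], reference: Sequence[bool]
) -> Tuple[float, float]:
    """Compare a binary team-based annotation against a reference annotation.

    Hypothetical helper for illustration only; not taken from the study.
    """
    if len(team) != len(reference):
        raise ValueError("Both annotations must cover the same patients.")

    pairs = list(zip(team, reference))
    true_pos = sum(1 for t, r in pairs if t and r)
    false_neg = sum(1 for t, r in pairs if not t and r)
    true_neg = sum(1 for t, r in pairs if not t and not r)
    false_pos = sum(1 for t, r in pairs if t and not r)

    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("Reference must contain both positive and negative cases.")

    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Made-up annotations for 10 patients (e.g. "variant histology present").
reference = [True] * 4 + [False] * 6
team = [True, True, True, False] + [False] * 5 + [True]

sens, spec = sensitivity_specificity(team, reference)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this made-up example the team-based annotation misses one of four reference-positive patients and flags one of six reference-negative patients, so both metrics fall below the 90% threshold mentioned in the results.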