Clinical annotations for prostate cancer research: Defining data elements, creating a reproducible analytical pipeline, and assessing data quality.

Authors

Niamh M. Keegan

Memorial Sloan Kettering Cancer Center, New York, NY

Niamh M. Keegan, Samantha E Vasselman, Ethan Barnett, Barbara Nweji, Emily Carbone, Alexander Blum, Michael J. Morris, Dana E. Rathkopf, Susan F. Slovin, Daniel Costin Danila, Karen A. Autio, Howard I. Scher, Philip W. Kantoff, Wassim Abida, Konrad H. Stopsack

Organizations

Memorial Sloan Kettering Cancer Center, New York, NY; Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY; Division of Solid Tumor Oncology, Memorial Sloan Kettering Cancer Center, New York, NY; Memorial Sloan Kettering Cancer Center and Weill Cornell Medicine, New York, NY; Memorial Sloan Kettering Cancer Center, NY, NY; Convergent Therapeutics, Inc., Cambridge, MA

Research Funding

U.S. National Institutes of Health

Background: Routine clinical data from the electronic medical record are indispensable for retrospective and prospective observational studies and clinical trials. Their reproducibility is often not assessed. We sought to develop a prostate cancer-specific database with a defined source hierarchy for clinical annotations and to evaluate data reproducibility.
Methods: At a comprehensive cancer center, we designed and implemented a clinical database for men with prostate cancer and clinical-grade paired tumor–normal sequencing. For these patients, we performed team-based retrospective clinical data annotation from the electronic medical record, using a prostate cancer-specific data dictionary. We developed an open-source R package for data processing. We then evaluated completeness of data elements, reproducibility of team-based annotation using blinded repeat annotation by a medical oncologist as the reference, and the impact of measurement error on bias in survival analyses.
Results: Data elements on demographics, diagnosis and staging, disease state at the time of procuring a genomically characterized sample, and clinical outcomes were piloted and then abstracted for 2,261 patients and their 2,631 genomically profiled samples. Completeness of data elements was generally high, ranging from 55% to 99% for elements of clinical TNM staging, self-reported race, biopsy Gleason score, and presence of variant histologies, both for the team-based annotation and the repeat annotation. Comparing team-based annotation to the repeat annotation (100 patients/samples), reproducibility of annotations was high to very high. For 7 binary data elements, both sensitivity and specificity of the team-based annotation reached or exceeded 90%. The T stage, metastasis date, and presence and date of castration resistance had lower reproducibility. Impact of measurement error on estimates for strong prognostic factors was modest.
Conclusions: With a prostate cancer-specific data dictionary and quality control measures, manual team-based annotations can be scalable and reproducible. The data dictionary and the R package for reproducible data processing (https://stopsack.github.io/prostateredcap) are freely available to help increase data quality in clinical prostate cancer research.
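
The completeness and sensitivity/specificity measures reported in the Results can be illustrated with a minimal Python sketch. This is not the authors' R package or their analysis code: the function names `completeness` and `sensitivity_specificity` are hypothetical, the ten-record toy data are invented purely for illustration, and treating `None` as a missing annotation is an assumption.

```python
from typing import Optional, Sequence, Tuple


def completeness(values: Sequence[Optional[bool]]) -> float:
    """Proportion of records with a non-missing (non-None) annotation."""
    return sum(v is not None for v in values) / len(values)


def sensitivity_specificity(
    team: Sequence[Optional[bool]],
    reference: Sequence[Optional[bool]],
) -> Tuple[float, float]:
    """Compare a binary team annotation against a reference annotation.

    Pairs with a missing value on either side are skipped (an assumption).
    """
    tp = fn = tn = fp = 0
    for t, r in zip(team, reference):
        if t is None or r is None:
            continue
        if r and t:
            tp += 1
        elif r and not t:
            fn += 1
        elif not r and not t:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative record")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: one binary element for ten patients.
reference = [True, True, True, True, False, False, False, False, False, False]
team = [True, True, True, False, False, False, False, False, True, None]

print("completeness (team):", completeness(team))
print("sensitivity, specificity:", sensitivity_specificity(team, reference))
```

In this sketch the repeat annotation plays the role of the reference standard, mirroring the design described in the Methods; how missing values should be handled in practice is a choice the abstract does not specify.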

Abstract Details

Meeting

2022 ASCO Genitourinary Cancers Symposium

Session Type

Poster Session

Session Title

Poster Session A: Prostate Cancer

Track

Prostate Cancer - Advanced, Prostate Cancer - Localized

Sub Track

Quality of Care/Quality Improvement and Real-World Evidence

Citation

J Clin Oncol 40, 2022 (suppl 6; abstr 64)

DOI

10.1200/JCO.2022.40.6_suppl.064

Poster Bd #

Online Only
