FAST National University of Computer and Emerging Sciences, Islamabad, Pakistan
Amna Basharat, Muddassar Farooq
Background: Cohort selection for specialized clinical trials is a cardinal pillar of evidence-based medicine; however, it is also the most difficult, complex, time-consuming, and expensive step. Determining the efficacy of a new treatment (or intervention) requires finding eligible patients who meet the inclusion and exclusion criteria. In specialized scenarios, complex criteria may even require researchers to perform time-consuming manual reviews and analyses of electronic health records (EHRs) to shortlist qualified patients. The major contribution of this research is a set of novel semantic models for building a cohort for clinical research by enabling semantic search over EHRs represented as knowledge graphs (KGs).
Methods: We present the design of a novel cohort retrieval system that satisfies the inclusion and exclusion criteria of an oncology clinical research study. We construct KGs to interconnect different data sources stored in a data lake, and develop semantic models that enable semantic search over the processed data. We designed and constructed an oncology knowledge graph that enables semantics-driven cohort selection. The cohort retrieval system uses a semantics-driven dynamic query engine to generate and execute cohort selection queries on heterogeneous EHR data.
Results: We obtained real-world data for 21,000 oncology patients and constructed knowledge graphs for five cancer types: colon (C18), lung (C34), breast (C50), prostate (C61), and multiple myeloma (C90). The cohort-building scenarios represent a mix of criterion types and combinations, including both inclusion and exclusion criteria and involving conjunctions, disjunctions, and numeric ranges. Our cohort builder is evaluated against ten well-known key competency scenarios. A team of experts validated the cohort builder's results for these competency scenarios by directly querying the graph. Our extensive evaluations demonstrate that the cohort builder retrieves, with 100% accuracy, the patients who match the criteria specified in all ten competencies. The average time to build a cohort across all graphs for these competencies is less than 10 seconds, compared with days when patients are searched manually in EHR systems.
Conclusions: Our query engine is not tightly coupled to the architecture of our data lake; rather, its architecture is flexible and can be easily integrated with other enterprise data lakes or EMR systems. In the future, we plan to scale the extent of inclusion and exclusion criteria to provide interoperability with existing clinical trial knowledge. We also aim to empirically compare the efficiency of cohort selection queries over a knowledge graph with that of classical database query approaches.
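As an illustration only, the kind of cohort selection described above (inclusion and exclusion criteria combined through conjunctions, disjunctions, and numeric ranges) can be sketched in Python over a toy set of triples. This is not the authors' query engine or semantic model: the predicate names (`hasDiagnosis`, `age`, `receivedTherapy`), the patient identifiers, the data values, and every function name are hypothetical. Only the ICD-10 codes are taken from the abstract.

```python
from typing import Callable, List, Tuple

# A toy knowledge graph as (subject, predicate, object) triples.
# All identifiers, predicates, and values are hypothetical illustration data.
Triple = Tuple[str, str, object]

TRIPLES: List[Triple] = [
    ("p1", "hasDiagnosis", "C50"), ("p1", "age", 54),
    ("p1", "receivedTherapy", "chemotherapy"),
    ("p2", "hasDiagnosis", "C34"), ("p2", "age", 67),
    ("p3", "hasDiagnosis", "C50"), ("p3", "age", 71),
    ("p4", "hasDiagnosis", "C18"), ("p4", "age", 48),
    ("p4", "receivedTherapy", "radiotherapy"),
    ("p5", "hasDiagnosis", "C18"), ("p5", "age", 62),
]

# A criterion is a predicate over a patient identifier.
Criterion = Callable[[str], bool]


def values(patient: str, predicate: str) -> list:
    """Return every object linked to the patient by the given predicate."""
    return [o for s, p, o in TRIPLES if s == patient and p == predicate]


def has_value(predicate: str, value: object) -> Criterion:
    """Criterion: the patient has the exact value for the predicate."""
    return lambda patient: value in values(patient, predicate)


def in_range(predicate: str, low: float, high: float) -> Criterion:
    """Criterion: some value of the predicate lies in [low, high]."""
    return lambda patient: any(low <= v <= high for v in values(patient, predicate))


def all_of(*criteria: Criterion) -> Criterion:
    """Conjunction of criteria."""
    return lambda patient: all(c(patient) for c in criteria)


def any_of(*criteria: Criterion) -> Criterion:
    """Disjunction of criteria."""
    return lambda patient: any(c(patient) for c in criteria)


def build_cohort(inclusion: Criterion, exclusion: Criterion) -> List[str]:
    """Return patients who satisfy the inclusion and not the exclusion criterion."""
    patients = sorted({s for s, _, _ in TRIPLES})
    return [p for p in patients if inclusion(p) and not exclusion(p)]


# Inclusion: (breast OR colon cancer) AND age between 40 and 70.
inclusion = all_of(
    any_of(has_value("hasDiagnosis", "C50"), has_value("hasDiagnosis", "C18")),
    in_range("age", 40, 70),
)
# Exclusion: prior radiotherapy.
exclusion = has_value("receivedTherapy", "radiotherapy")

cohort = build_cohort(inclusion, exclusion)
print(cohort)  # ['p1', 'p5']
```

In a setting like the one the abstract describes, such criteria would presumably be translated into queries executed against the knowledge graph rather than evaluated in memory; composing criteria as small combinable predicates is just one convenient way to show how conjunctions, disjunctions, ranges, and exclusions interact.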
Disclaimer
This material on this page is ©2024 American Society of Clinical Oncology, all rights reserved. Licensing available upon request. For more information, please contact licensing@asco.org