A smart clinical cohort selection system for clinical oncology research.

Authors

Amna Basharat

FAST National University of Computer and Emerging Sciences, Islamabad, Pakistan

Amna Basharat, Muddassar Farooq

Organizations

FAST National University of Computer and Emerging Sciences, Islamabad, Pakistan; CureMD Inc, New York, NY

Research Funding

Other

CureMD Inc

Background: Cohort selection for specialized clinical trials is a pillar of evidence-based medicine; however, it is the most difficult, complex, time-consuming, and expensive step. Determining the efficacy of a new treatment (or intervention) requires finding eligible patients who meet the inclusion and exclusion criteria. In specialized scenarios, complex criteria may even require researchers to perform time-consuming manual reviews and analyses of electronic health records (EHRs) to shortlist qualified patients. The major contribution of this research is a set of novel semantic models for building a clinical research cohort by enabling semantic search over EHRs represented as knowledge graphs (KGs).

Methods: We present the design of a novel cohort retrieval system that identifies patients satisfying the inclusion and exclusion criteria of an oncology clinical research study. We construct KGs to interconnect different data sources stored in a data lake, and develop semantic models that enable semantic search over the processed data. The resulting oncology knowledge graph supports semantics-driven cohort selection, and the retrieval system uses a semantics-driven dynamic query engine to generate and execute cohort selection queries on heterogeneous EHR data.

Results: We obtained real-world data for 21,000 oncology patients and constructed knowledge graphs for five cancer types: colon (C18), lung (C34), breast (C50), prostate (C61), and multiple myeloma (C90). The cohort-building scenarios are designed to represent a mix of criterion types and combinations, including both inclusion and exclusion criteria and involving conjunctions, disjunctions, and numeric ranges. Our cohort builder is evaluated against ten well-known key competency scenarios. A team of experts validated the cohort builder's results for these competency scenarios by directly querying the graph. Our extensive evaluations demonstrate that the cohort builder retrieves, with 100% accuracy, the patients who match the criteria specified in all ten competencies. The average time to build a cohort across all graphs for these competencies is less than 10 seconds, compared with days when patients are searched manually in EHR systems.

Conclusions: Our query engine is not tightly coupled to the architecture of our data lake; rather, its architecture is flexible and can be easily integrated with other enterprise data lakes or EMR systems. In the future, we plan to scale the extent of inclusion and exclusion criteria to provide interoperability with existing clinical trial knowledge. We also aim to empirically compare the efficiency of cohort selection queries on a knowledge graph against classical database query approaches.
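
The criterion types named in the Results (inclusion and exclusion criteria combined through conjunctions, disjunctions, and numeric ranges) can be illustrated with a minimal sketch. This is not the authors' query engine, which the abstract does not describe in detail. It assumes a toy in-memory graph of (subject, predicate, object) triples, and every identifier here (the patient IDs, the predicates hasDiagnosis and ageAtDiagnosis, and the helpers has, in_range, all_of, any_of, and build_cohort) is hypothetical. Only the ICD-10 codes C18, C50, C34, and C90 come from the abstract.

```python
from typing import Callable

# A triple is (subject, predicate, object); the object may be a code or a number.
Triple = tuple[str, str, object]

# Toy knowledge graph. All patients and values are invented for illustration.
TRIPLES: set[Triple] = {
    ("p1", "hasDiagnosis", "C50"),
    ("p1", "ageAtDiagnosis", 54),
    ("p2", "hasDiagnosis", "C50"),
    ("p2", "ageAtDiagnosis", 71),
    ("p3", "hasDiagnosis", "C34"),
    ("p3", "ageAtDiagnosis", 60),
    ("p4", "hasDiagnosis", "C18"),
    ("p4", "hasDiagnosis", "C90"),
    ("p4", "ageAtDiagnosis", 45),
}

# A criterion maps a patient identifier to True (matches) or False.
Criterion = Callable[[str], bool]


def objects(subject: str, predicate: str) -> set:
    """Return every object linked to `subject` through `predicate`."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def has(predicate: str, value: object) -> Criterion:
    """Criterion: the patient has an edge `predicate` pointing to `value`."""
    return lambda patient: value in objects(patient, predicate)


def in_range(predicate: str, low: float, high: float) -> Criterion:
    """Criterion: some numeric value of `predicate` lies in [low, high]."""
    return lambda patient: any(low <= v <= high for v in objects(patient, predicate))


def all_of(*criteria: Criterion) -> Criterion:
    """Conjunction: every sub-criterion must hold."""
    return lambda patient: all(c(patient) for c in criteria)


def any_of(*criteria: Criterion) -> Criterion:
    """Disjunction: at least one sub-criterion must hold."""
    return lambda patient: any(c(patient) for c in criteria)


def build_cohort(inclusion: Criterion, exclusion: Criterion) -> list[str]:
    """Return sorted patients that satisfy inclusion and fail exclusion."""
    patients = {s for s, _, _ in TRIPLES}
    return sorted(p for p in patients if inclusion(p) and not exclusion(p))


# Include: (breast OR colon cancer) AND age at diagnosis between 40 and 65.
inclusion = all_of(
    any_of(has("hasDiagnosis", "C50"), has("hasDiagnosis", "C18")),
    in_range("ageAtDiagnosis", 40, 65),
)
# Exclude: any patient who also has a multiple myeloma diagnosis.
exclusion = has("hasDiagnosis", "C90")

cohort = build_cohort(inclusion, exclusion)
print(cohort)
```

On this toy data only p1 is selected: p2 falls outside the age range, p3 has neither included diagnosis, and p4 meets the inclusion criteria but is removed by the exclusion criterion. A production system would more plausibly translate such criteria into graph queries than scan triples in memory; the sketch only shows how the criterion types compose.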

Abstract Details

Meeting

2023 ASCO Annual Meeting

Session Type

Publication Only

Session Title

Publication Only: Care Delivery and Regulatory Policy

Track

Care Delivery and Quality Care

Sub Track

Clinical Research Design

Citation

J Clin Oncol 41, 2023 (suppl 16; abstr e13613)

DOI

10.1200/JCO.2023.41.16_suppl.e13613

Abstract #

e13613
