Usage of open source software for “Real World Data Analysis” in pharmaceutical companies & healthcare institutions
APRIL 20, 2016
Kees van Bochove, CEO & Founder, The Hyve
Agenda
1. Introduction: Secondary use of healthcare data for research
2. Overview of OMOP Data Model & Mapping Process
3. Overview of OHDSI data analytics tools
1.
INTRODUCTION
SECONDARY USE OF HEALTHCARE DATA FOR RESEARCH
The Hyve
• Professional support for open source bioinformatics and translational research software, such as tranSMART, cBioPortal, i2b2, Galaxy, ADAM and OHDSI
Mission: Enable pre-competitive collaboration in life science R&D by leveraging open source software
Core values: Share, Reuse, Specialize
Office locations: Utrecht, Netherlands; Cambridge, MA, United States
Services: Software development, data science services, consultancy, hosting / SLAs
Fast-growing: Started in 2012, 35 people by now
Interdisciplinary team: software engineers, data scientists, project managers & staff; expertise in bioinformatics, medical informatics, software engineering, biostatistics etc.
What Ewan Birney has to say about it … (GA4GH Leiden 2015)
Current EU “Patient Journey” is expensive and slow
Time To Market: 11 – 18 years
[Figure: drug development pipeline, from discovery to reimbursement]
• Drug Discovery / Pre-Clinical Research (Closed & Open Innovation) and Pre-Clinical Testing: 5,000 – 10,000 compounds narrowed down to 250 compounds; 3 – 6 years
• Clinical Trials, Phase 1 / Phase 2 / Phase 3: 5 therapies; 6 – 7 years; number of patients/subjects per phase: 20-100 / 100-500 / 1000-5000
• Regulatory Review (EMA Filing, EMA Approval for Sale, HTA Approval, Negotiation for Reimbursement in 27 member states): 1 therapy; 2 – 5 years. New therapies don't reach patients until the end of this stage.
• PhV Monitoring (Phase 4)
Total Cost: $2 – $4 Billion USD (Drug Development: $2.6B; Phase 4: $0.6B)
Sources: Drug Discovery and Development: Understanding the R&D Process, www.innovation.org;
CBO, Research and Development in the Pharmaceutical Industry, 2006;
Forbes, Matthew Herper, “The Truly Staggering Cost Of Inventing New Drugs”, February 10, 2012
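The compound counts in this pipeline can be read as a funnel. A rough calculation (not stated on the slide, and assuming each count refers to the same group of compounds moving through the stages) gives the implied success rates from discovery to pre-clinical testing, from pre-clinical testing to clinical trials, and from clinical trials to approval, and finally the overall chance that a discovered compound reaches approval:

```latex
\[
\frac{250}{10{,}000} \text{ to } \frac{250}{5{,}000} = 2.5\% \text{ to } 5\%, \qquad
\frac{5}{250} = 2\%, \qquad
\frac{1}{5} = 20\%
\]
\[
\frac{1}{10{,}000} \text{ to } \frac{1}{5{,}000} = 0.01\% \text{ to } 0.02\%
\]
```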
Secondary use of health data to enrich research
Source: “The value of healthcare data for secondary uses in clinical research and development”, Gary K. Mallow, Merck, HIMSS 2012
[Figure: The “burning platform” for life sciences. Number of patient experiences / records (log scale, 1,000 to 1 million) over years 1 – 9. Up to product launch (R&D): Pharma-owned, highly controlled clinical trials data. After launch (Phase IV): clinical practice, patients, payers and providers own the data.]
Challenge: Today, pharma doesn't have ready access to this data, yet the insights for safety, comparative effectiveness research (CER) and other areas lie within this clinical domain, which includes medical records, pharmacy, labs, claims, radiology, etc.
Clinical Trials vs Observational Studies
Aspect | Clinical Trials | Observational Studies
Study design | Controlled (hypothesis driven) | “Real world” data
Sample size | Small (10,000 is large) | Large (millions of people)
Endpoints | Efficacy, safety | Effectiveness, economic value
Statistics | Inferential statistics (e.g. ANOVA) | Epidemiological modeling
Cost | Expensive | Relatively inexpensive
Perspective | Study population | Society in general
EMIF vision
To become the trusted European hub for health care data intelligence, enabling new insights into diseases and treatments
Discover
Assess
Reuse
Data available through the EMIF consortium
• Large variety in “types” of data
• Data is available from more than 53 million subjects from seven EU countries, including:
  - Primary care data sets
  - Hospital data
  - Administrative data, regional record-linkage systems
  - Registries and cohorts (broad and disease specific)
  - Biobanks
• >25,000 subjects in Alzheimer's disease (AD) cohorts
• >90,000 subjects in metabolic cohorts
[Figure: Available data sources in EMIF. Approximate total (cumulative) number of subjects on a log scale (100 to 100,000,000), reaching >40 million across the data sources MAAS, SDR, EGCUT, PEDIANET, SCTS, IMASIS, HSD, AUH, IPCI, ARS, SIDIAP, PHARMO and THIN]
EMIF-Platform
[Figure: EMIF-Available Data Sources; examples. Map of example data sources with subject counts of 1K, 2K, 52K, 400K, 475K, 1M, 1.6M, 2.3M, 2.8M, 3.6M, 6M, 10M and 12M. Status Jan 2016.]
OMOP & OHDSI - Overview
• OMOP: Common Data Model for observational healthcare data: persons, drugs, procedures, devices, conditions etc.
• OHDSI: Large-scale analytics tools for observational data; an open source community developing, among others:
  - Tools to support the ETL / mapping process into OMOP (White Rabbit etc.)
  - Tools to perform analytics, e.g. Achilles for data profiling and Calypso for feasibility assessment
www.omop.org
www.ohdsi.org
2.
OMOP MODEL & DATA MAPPING PROCESS
OMOP Common Data Model v5.0
• OMOP = Observational Medical Outcomes Partnership
• CDM = Common Data Model
• SQL tables
OMOP-CDM Clinical data tables
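To make the “SQL tables” idea concrete, here is a minimal sketch that creates a heavily simplified subset of three CDM v5 clinical data tables (PERSON, CONDITION_OCCURRENCE, DRUG_EXPOSURE) in an in-memory SQLite database. The column names follow CDM v5 naming, but most columns, the vocabulary tables and the official DDL are left out; the function name `create_cdm_subset` is hypothetical.

```python
import sqlite3

# Simplified subset of three OMOP CDM v5 clinical data tables.
# Only a few columns per table are included; the full specification
# defines many more columns, data types and constraints.
DDL = """
CREATE TABLE person (
    person_id            INTEGER PRIMARY KEY,
    gender_concept_id    INTEGER NOT NULL,
    year_of_birth        INTEGER NOT NULL,
    race_concept_id      INTEGER NOT NULL,
    ethnicity_concept_id INTEGER NOT NULL,
    gender_source_value  TEXT
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id   INTEGER PRIMARY KEY,
    person_id                 INTEGER NOT NULL REFERENCES person(person_id),
    condition_concept_id      INTEGER NOT NULL,
    condition_start_date      TEXT NOT NULL,
    condition_type_concept_id INTEGER NOT NULL,
    condition_source_value    TEXT
);
CREATE TABLE drug_exposure (
    drug_exposure_id         INTEGER PRIMARY KEY,
    person_id                INTEGER NOT NULL REFERENCES person(person_id),
    drug_concept_id          INTEGER NOT NULL,
    drug_exposure_start_date TEXT NOT NULL,
    drug_type_concept_id     INTEGER NOT NULL,
    drug_source_value        TEXT
);
"""


def create_cdm_subset():
    """Create an in-memory SQLite database holding the simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


conn = create_cdm_subset()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['condition_occurrence', 'drug_exposure', 'person']
```

Each clinical table points back to PERSON via `person_id` and stores both a standard `*_concept_id` and the original `*_source_value`, which is what the mapping process has to fill in.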
Mapping the source data to OMOP CDM
Process: ETL design, ETL implementation, ETL verification
Tools used:
• White Rabbit: source data inventory
• Rabbit in a Hat: map source tables to the CDM structure (syntactic mapping)
• Usagi: map source terms to CDM ontologies, i.e. vocabularies (semantic mapping)
• Achilles: review database profiles; review data quality assessment (Achilles Heel)
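As one illustration of what ETL verification can check, the sketch below computes how many records received a standard concept. It relies on the OMOP convention that a source value without a mapping is stored with concept id 0; the function name, the record layout (plain dicts) and the concept ids 1001/1002 are hypothetical, and this is not how Achilles itself is implemented.

```python
from collections import Counter


def mapping_coverage(records, concept_field, source_field, top_n=5):
    """Return (mapped fraction, most frequent unmapped source values).

    Records whose concept field is 0 (OMOP's "no matching concept") count
    as unmapped; their source values are candidates for further mapping.
    """
    total = len(records)
    if total == 0:
        return 0.0, []
    unmapped = Counter(r[source_field] for r in records if r[concept_field] == 0)
    mapped_fraction = (total - sum(unmapped.values())) / total
    return mapped_fraction, unmapped.most_common(top_n)


# Hypothetical condition records after the ETL step.
records = [
    {"condition_source_value": "A01", "condition_concept_id": 1001},
    {"condition_source_value": "B02", "condition_concept_id": 1002},
    {"condition_source_value": "X99", "condition_concept_id": 0},
    {"condition_source_value": "X99", "condition_concept_id": 0},
]
print(mapping_coverage(records, "condition_concept_id", "condition_source_value"))
# (0.5, [('X99', 2)])
```

Listing the most frequent unmapped values first helps prioritise which source terms to send back through the vocabulary mapping.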
Output from White Rabbit
• Tab “Overview”: fields for each table
• Tab “Medication”: per table, the values in each field and their frequencies (the field shown is the medication name)
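The scan report described above can be approximated in a few lines: one list of fields per table (the “Overview” tab) and, per table, the frequency of each value in each field. This is a hypothetical sketch over in-memory rows, not White Rabbit's own code; the table and field names are made up.

```python
from collections import Counter


def scan_tables(tables):
    """Summarise tables given as {table_name: [row_dict, ...]}.

    Returns (overview, frequencies):
      overview    -> {table: sorted list of field names}       ("Overview" tab)
      frequencies -> {table: {field: Counter(value -> count)}} (one tab per table)
    """
    overview, frequencies = {}, {}
    for table, rows in tables.items():
        field_counts = {}
        for row in rows:
            for field, value in row.items():
                field_counts.setdefault(field, Counter())[value] += 1
        overview[table] = sorted(field_counts)
        frequencies[table] = field_counts
    return overview, frequencies


# Hypothetical source table with a medication name field.
tables = {
    "medication": [
        {"patient_id": 1, "medication_name": "metformin"},
        {"patient_id": 2, "medication_name": "metformin"},
        {"patient_id": 2, "medication_name": "simvastatin"},
    ]
}
overview, freqs = scan_tables(tables)
print(overview)  # {'medication': ['medication_name', 'patient_id']}
print(freqs["medication"]["medication_name"].most_common())
# [('metformin', 2), ('simvastatin', 1)]
```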
Mapping of tables to CDM
• All coded items (gender, race, etc.) need to be mapped
• Medication, diagnosis and procedure values need to be mapped to the appropriate ontology (RxNorm, ICD-9, etc.)
Map terms to target vocabularies
NHANES gender code | NHANES gender description | Equivalent OMOP SOURCE_CODE | OMOP SOURCE_CODE_DESCRIPTION | OMOP TARGET_CONCEPT_ID
. | missing | U | UNKNOWN | 8551
1 | Male | M | MALE | 8507
2 | Female | F | FEMALE | 8532
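The NHANES gender table is effectively a lookup from source codes to OMOP concept ids. A minimal sketch of applying it during the ETL, assuming NHANES gender codes arrive as strings and using OMOP's concept id 0 for codes not in the table; `map_gender` and `to_person_row` are hypothetical helper names, and the output covers only a few PERSON columns.

```python
# Lookup built from the NHANES gender mapping table.
NHANES_GENDER_TO_CONCEPT = {
    ".": 8551,  # missing -> UNKNOWN
    "1": 8507,  # Male    -> MALE
    "2": 8532,  # Female  -> FEMALE
}


def map_gender(source_code):
    """Return the gender concept id for an NHANES code, or 0 if unmapped."""
    return NHANES_GENDER_TO_CONCEPT.get(str(source_code).strip(), 0)


def to_person_row(person_id, gender_code, year_of_birth):
    """Build a partial PERSON row, keeping the original code as source value."""
    return {
        "person_id": person_id,
        "gender_concept_id": map_gender(gender_code),
        "year_of_birth": year_of_birth,
        "gender_source_value": str(gender_code),
    }


print(to_person_row(1, "2", 1971))
# {'person_id': 1, 'gender_concept_id': 8532, 'year_of_birth': 1971, 'gender_source_value': '2'}
```

Keeping the raw code in `gender_source_value` next to the mapped concept id preserves traceability back to the source data.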
Overview of ontologies used in OMOP
SNOMED-CT
READ
ICD-9-CM
RxNorm
ICD-9-Procedures
CPT-4
HCPCS
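In the OMOP vocabularies, source codes (for example ICD-9-CM) are linked to standard concepts (for example SNOMED-CT) through “Maps to” rows in the CONCEPT_RELATIONSHIP table. The sketch below reproduces that lookup on a toy, heavily simplified copy of the CONCEPT and CONCEPT_RELATIONSHIP tables in SQLite; the concept ids 900001/900002 and the function names are made up, and only a handful of the real columns are included.

```python
import sqlite3


def build_demo_vocabulary():
    """Toy vocabulary: one ICD-9-CM source concept mapped to one SNOMED concept."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY, concept_name TEXT,
        vocabulary_id TEXT, concept_code TEXT, standard_concept TEXT);
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT);
    """)
    conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?, ?)", [
        # Hypothetical concept ids; standard_concept 'S' marks the standard concept.
        (900001, "Diabetes mellitus type II (source)", "ICD9CM", "250.00", None),
        (900002, "Type 2 diabetes mellitus", "SNOMED", "44054006", "S"),
    ])
    conn.execute("INSERT INTO concept_relationship VALUES (900001, 900002, 'Maps to')")
    return conn


def to_standard(conn, vocabulary_id, code):
    """Follow the 'Maps to' relationship from a source code to a standard concept."""
    row = conn.execute("""
        SELECT std.concept_id, std.concept_name
        FROM concept AS src
        JOIN concept_relationship AS cr
          ON cr.concept_id_1 = src.concept_id AND cr.relationship_id = 'Maps to'
        JOIN concept AS std
          ON std.concept_id = cr.concept_id_2 AND std.standard_concept = 'S'
        WHERE src.vocabulary_id = ? AND src.concept_code = ?""",
        (vocabulary_id, code)).fetchone()
    return row if row else (0, None)  # 0 = no matching concept


conn = build_demo_vocabulary()
print(to_standard(conn, "ICD9CM", "250.00"))  # (900002, 'Type 2 diabetes mellitus')
```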
3.
OHDSI – ANALYTICS TOOLS
Open Source in Precision Medicine
[Figure: logos of open source tools grouped by category: study design, biobanking, scientific compute, data visualisation, workflows & storage, datawarehousing, imaging, clinical / healthcare]
Tools on GitHub
Active Open Source Community!
ACHILLES: Database overview
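A database overview of this kind boils down to a few aggregate queries over the CDM. The sketch below computes the person count, persons per gender concept and the year-of-birth range on a toy PERSON table in SQLite; it illustrates the idea only, is not one of Achilles' actual analyses, and `database_overview` is a hypothetical name.

```python
import sqlite3


def database_overview(conn):
    """Return a few summary statistics over a (simplified) CDM PERSON table."""
    n_persons = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    by_gender = dict(conn.execute(
        "SELECT gender_concept_id, COUNT(*) FROM person GROUP BY gender_concept_id"))
    min_yob, max_yob = conn.execute(
        "SELECT MIN(year_of_birth), MAX(year_of_birth) FROM person").fetchone()
    return {
        "persons": n_persons,
        "persons_by_gender_concept": by_gender,
        "year_of_birth_range": (min_yob, max_yob),
    }


# Toy data: 8507 = MALE, 8532 = FEMALE gender concepts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, "
             "gender_concept_id INTEGER, year_of_birth INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, 8507, 1950), (2, 8532, 1972), (3, 8532, 1988)])
print(database_overview(conn))
```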
ACHILLES: Achilles Heel Report
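Achilles Heel flags implausible or suspicious values in the profiled data. The sketch below applies a few rules of that flavour (year of birth in the future, condition before birth, condition records for unknown persons, unmapped concepts) to in-memory records; the rule set, the severities and the function name are illustrative assumptions, not the actual Achilles Heel rules.

```python
def heel_style_checks(persons, conditions, current_year):
    """Return a list of (severity, message) tuples for simple plausibility rules."""
    issues = []
    yob = {p["person_id"]: p["year_of_birth"] for p in persons}

    future = [pid for pid, year in yob.items() if year > current_year]
    if future:
        issues.append(("ERROR", f"{len(future)} person(s) with year of birth in the future"))

    # Compare only the year part of the ISO date with the year of birth.
    before_birth = [
        c for c in conditions
        if c["person_id"] in yob
        and int(c["condition_start_date"][:4]) < yob[c["person_id"]]
    ]
    if before_birth:
        issues.append(("ERROR", f"{len(before_birth)} condition(s) starting before year of birth"))

    orphans = [c for c in conditions if c["person_id"] not in yob]
    if orphans:
        issues.append(("ERROR", f"{len(orphans)} condition(s) for unknown persons"))

    unmapped = [c for c in conditions if c["condition_concept_id"] == 0]
    if unmapped:
        issues.append(("WARNING", f"{len(unmapped)} condition(s) with concept id 0 (unmapped)"))
    return issues


persons = [{"person_id": 1, "year_of_birth": 1960}, {"person_id": 2, "year_of_birth": 2090}]
conditions = [
    {"person_id": 1, "condition_start_date": "1955-03-01", "condition_concept_id": 1001},
    {"person_id": 3, "condition_start_date": "2001-01-01", "condition_concept_id": 1002},
    {"person_id": 1, "condition_start_date": "2010-05-05", "condition_concept_id": 0},
]
for severity, message in heel_style_checks(persons, conditions, current_year=2016):
    print(severity, message)
```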
ACHILLES: Conditions Overview
HERACLES: Cohort Characterization
Slide from P. Ryan, Janssen
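Cohort characterization summarises what is recorded for the members of a cohort. A minimal sketch, assuming conditions are available as simple dict records and measuring, for each condition concept, the share of cohort members with at least one record; the function name and the concept ids are hypothetical, and this is not Heracles' implementation.

```python
from collections import defaultdict


def characterize_cohort(cohort_ids, conditions, top_n=5):
    """Return [(condition_concept_id, prevalence)] sorted by prevalence, highest first.

    Prevalence = share of cohort members with at least one record of the concept.
    """
    cohort = set(cohort_ids)
    if not cohort:
        return []
    persons_per_concept = defaultdict(set)
    for c in conditions:
        if c["person_id"] in cohort:
            persons_per_concept[c["condition_concept_id"]].add(c["person_id"])
    prevalence = {concept: len(pids) / len(cohort)
                  for concept, pids in persons_per_concept.items()}
    return sorted(prevalence.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


conditions = [
    {"person_id": 1, "condition_concept_id": 1001},
    {"person_id": 1, "condition_concept_id": 1001},  # repeat record, counted once
    {"person_id": 2, "condition_concept_id": 1001},
    {"person_id": 3, "condition_concept_id": 1002},
    {"person_id": 5, "condition_concept_id": 1001},  # not in the cohort
]
print(characterize_cohort([1, 2, 3, 4], conditions))  # [(1001, 0.5), (1002, 0.25)]
```

Counting distinct persons rather than records keeps patients with many repeat diagnoses from inflating the prevalence.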
CALYPSO: Query Definition
CALYPSO: Query Results
Slide from P. Ryan, Janssen
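Feasibility assessment typically means counting how many persons remain after each inclusion criterion is applied. The sketch below shows that attrition logic on toy records; the criteria, the concept id 1001 and the reference year 2016 are hypothetical, and it is not Calypso's implementation.

```python
def feasibility_attrition(persons, criteria):
    """Apply (name, predicate) criteria in order; return [(step, persons remaining)]."""
    remaining = list(persons)
    steps = [("All persons", len(remaining))]
    for name, predicate in criteria:
        remaining = [p for p in remaining if predicate(p)]
        steps.append((name, len(remaining)))
    return steps


persons = [
    {"person_id": 1, "year_of_birth": 1950, "conditions": {1001}},
    {"person_id": 2, "year_of_birth": 1990, "conditions": {1001}},
    {"person_id": 3, "year_of_birth": 1960, "conditions": {1002}},
    {"person_id": 4, "year_of_birth": 1945, "conditions": {1001, 1002}},
]
criteria = [
    ("Has condition 1001", lambda p: 1001 in p["conditions"]),
    ("Aged 40-75 in 2016", lambda p: 40 <= 2016 - p["year_of_birth"] <= 75),
]
for step, count in feasibility_attrition(persons, criteria):
    print(f"{step}: {count}")
# All persons: 4
# Has condition 1001: 3
# Aged 40-75 in 2016: 2
```

Applying the criteria in sequence makes it visible which criterion removes the most candidates, which is usually the first question in a feasibility discussion.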
Re-use of healthcare data
Prof. Johan van der Lei, Erasmus MC University Medical Center, Co-coordinator EMIF-Platform:
“We need to learn from experience and find ways to unite the large volumes of data in Europe. At the end of the day, we are in this for better health care.”