Chief Data & Analytics Week, Europe - Duccio Medini PhD, GSK Vaccines
TRANSCRIPT
Chief Data and Analytics Officer Week - Amsterdam, September 2017
Data Science for Vaccines
DUCCIO MEDINI, PhD
Head Data Science & Clinical Systems
GSK Vaccines
What we do…
Pharmaceuticals
We develop and make medicines to treat a range of conditions from respiratory diseases to HIV/AIDS
Vaccines
We research and make vaccines for children and adults that protect against infectious diseases
Consumer Healthcare
We make a range of consumer healthcare products in four categories: Total Wellness, Skin Health, Oral Care and Nutrition.
4bn packs of medicines
800m doses of vaccines
18bn packs of consumer health-care products
…to help people do more, feel better, live longer
The most impactful intervention for human health after clean water
Plotkin SL, Plotkin SA. A short history of vaccination. In: Plotkin SA, Orenstein WA, eds. Vaccines, 4th edn. Philadelphia: WB Saunders; 2004: 1-15.
Bulletin of the World Health Organization: http://www.who.int/bulletin/volumes/86/2/07-040089/en/
What are Vaccines?
A perturbation of two interacting complex systems
Medini, D. et al. Nature Rev Microbiology, 2008
Vaccine Research & Development: a long journey from discovery to saving people's lives
[Figure: R&D timeline. Stages shown: Medical Need, Discovery, Clinical R&D, Vaccination, Technical R&D, Manufacturing. Durations shown: 2+ yrs, 10-15 yrs, 2+ yrs.]
350 people, operations highly outsourced using Full Service Providers, 10% in predictive analytics: internal resources and academic collaborations
A Data team in the business
Information Management Systems
Data Standards
Data Management
Exploratory Data Analytics
Data & Systems Innovation
Business Excellence
Example 1: Data collection and management in Clinical studies
Taking the challenge: redefining the architecture of clinical data
One Interface per User – Connected in real time with One Centralized Data Directory
From System-centric to User-centric
Centralized Data Directory
- Each Study DB is a (partial) instance of a universal semantic data model, orchestrated by a centralized data directory
- Each Interface has live (cached) access to the whole data directory
[Diagram: several Study DBs connected to the Centralized Data Directory]
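The directory idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entity types, attribute names, and the `StudyDB` and `DataDirectory` classes are hypothetical and not taken from the talk, and caching of the live access is left out.

```python
from dataclasses import dataclass, field

# Hypothetical universal semantic model: entity type -> allowed attributes.
UNIVERSAL_MODEL = {
    "Subject": {"subject_id", "birth_year", "site"},
    "Sample": {"sample_id", "subject_id", "shipped_on"},
    "Query": {"query_id", "subject_id", "text", "status"},
}


@dataclass
class StudyDB:
    """A study database that instantiates only part of the universal model."""
    name: str
    entity_types: set
    records: dict = field(default_factory=dict)  # entity type -> list of rows

    def add(self, entity_type, row):
        # Reject entity types this study does not instantiate.
        if entity_type not in self.entity_types:
            raise ValueError(f"{self.name} does not instantiate {entity_type}")
        # Reject attributes that the universal model does not define.
        unknown = set(row) - UNIVERSAL_MODEL[entity_type]
        if unknown:
            raise ValueError(f"attributes outside the model: {sorted(unknown)}")
        self.records.setdefault(entity_type, []).append(row)


class DataDirectory:
    """Central index that lets one interface search across all study DBs."""

    def __init__(self):
        self.studies = {}

    def register(self, study):
        unknown = study.entity_types - set(UNIVERSAL_MODEL)
        if unknown:
            raise ValueError(f"entity types outside the model: {sorted(unknown)}")
        self.studies[study.name] = study

    def find(self, entity_type, **criteria):
        # Return (study name, row) pairs matching all criteria.
        hits = []
        for study in self.studies.values():
            for row in study.records.get(entity_type, []):
                if all(row.get(k) == v for k, v in criteria.items()):
                    hits.append((study.name, row))
        return hits


directory = DataDirectory()
study_a = StudyDB("study_A", {"Subject", "Sample"})
study_b = StudyDB("study_B", {"Subject", "Query"})
directory.register(study_a)
directory.register(study_b)
study_a.add("Subject", {"subject_id": "S-001", "site": "Amsterdam"})
study_b.add("Subject", {"subject_id": "S-001", "site": "Amsterdam"})
study_b.add("Query", {"query_id": "Q-1", "subject_id": "S-001", "status": "open"})
hits = directory.find("Subject", subject_id="S-001")
```

Because every study validates against the same model, a single `find` call can serve any user interface without knowing which study holds the data.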
Investigator: receive vaccine supplies; record sample shipments; record subject data; answer queries; file documents
Data Manager: design screens; view subject data; raise queries; file documents
Participant: complete e-diary cards; wear devices; use social media
Monitor: view subject data; raise queries; record SDV; file documents
Other users: Authorities, Safety, Supply, Lab, Statistician
• As human beings, we naturally describe semantic relationships between elements
- Relational databases are not good at describing the relationships between entities
- NoSQL databases only store sets of disconnected documents
- The data model underlying a graph database is designed precisely to represent relationships
• Graphs also communicate the questions we want to ask of our domain
Semantics adds clarity to the structure of data
A cohesive and robust picture of the data is fundamental
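To make the graph point concrete, here is a minimal sketch of relationships stored as first-class edges and a question expressed as a path. The `Graph` class, the node identifiers, and the relation names (`ENROLLED_IN`, `TESTS`, `PROVIDED`) are hypothetical and chosen for illustration; a real deployment would use a graph database rather than an in-memory dictionary.

```python
from collections import defaultdict


class Graph:
    """Tiny labelled, directed graph: node -> list of (relation, node)."""

    def __init__(self):
        self.edges = defaultdict(list)

    def relate(self, source, relation, target):
        self.edges[source].append((relation, target))

    def follow(self, start, *relations):
        # Walk the given chain of relations and return the nodes reached.
        frontier = {start}
        for relation in relations:
            frontier = {
                target
                for node in frontier
                for rel, target in self.edges.get(node, [])
                if rel == relation
            }
        return frontier


g = Graph()
g.relate("subject:001", "ENROLLED_IN", "study:A")
g.relate("subject:001", "PROVIDED", "sample:S1")
g.relate("study:A", "TESTS", "vaccine:X")

# "Which vaccines is this subject's study testing?" reads as a path.
vaccines = g.follow("subject:001", "ENROLLED_IN", "TESTS")
```

The query is simply the sequence of relationships to traverse, which is the sense in which a graph also communicates the questions asked of the domain.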
Example 2: Systems Vaccinology at the translational interface
Feed-forward interaction: clinical trials and discovery science
Adapted from Pulendran B, et al. Immunity 2010
A peculiar kind of big data: huge p – tiny n
Subjects in study 100,000 → 100; data-points per subject 1,000 → 10,000,000
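A small simulation shows why "huge p, tiny n" data is risky. With only a handful of subjects, purely random features start to correlate strongly with a purely random outcome as the number of features grows. The sketch below is illustrative only: the function names, the sample sizes, and the Gaussian noise are assumptions, scaled far below the numbers on the slide so it runs quickly.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def best_spurious_correlation(n_subjects, feature_counts, seed=0):
    """Strongest |correlation| between random features and a random outcome.

    Features are generated one after another, so the result for a smaller
    feature count is always based on a subset of the larger one.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    best = 0.0
    results = {}
    for p in range(1, max(feature_counts) + 1):
        feature = [rng.gauss(0, 1) for _ in range(n_subjects)]
        best = max(best, abs(pearson(feature, outcome)))
        if p in feature_counts:
            results[p] = best
    return results


# No feature carries any signal, yet the best one looks better as p grows.
spurious = best_spurious_correlation(n_subjects=10, feature_counts=[10, 100, 1000])
for p, r in spurious.items():
    print(f"p={p:5d}  strongest |r| = {r:.2f}")
```

Any marker selected this way would fail on new subjects, which is why such datasets call for mechanistic understanding and independent validation rather than feature hunting alone.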
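[Figure: feed-forward cycle between clinical research and discovery-based research. Labels shown: Clinical research; Study: Systems Vaccinology, exploratory clinical trial; Exploratory endpoints (immunogenicity, safety); Hypothesis generation; Discovery-based research; Study: animal, in vitro models; Test the hypothesis.]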
Clinical Research: dissecting the mode of action
Predictive Biomarkers – Multi-layer integrative analysis – Causal inference
[Figure: layers of the immune response to a compound over time. Labels shown: Compound (LHD153, Alum); Innate immunity (mRNA gene regulation, circulating cytokines, myeloid & lymphoid cells); Adaptive immunity (T and B cells, B-cell Ab repertoire, IgG antibody subclasses); Primary endpoint: hSBA (correlate of protection). Time scales shown: 1-8 days, 1-4 weeks, 1-6 months.]
Example 3: Impact Assessment through Stochastic Simulations
24 hours from symptoms to death. Treatment often fails: prevention is key.
Serogroup B critical. First vaccine approved by EMA in 2013 after 17 years of R&D
Invasive meningococcal disease is devastating
*Annual estimated burden. †Does not include non-groupable isolates.
1. Cohn AC, et al. Clin Infect Dis. 2010;50:184-191.
2. Ministério da Saúde (SVS/MS). Departamento de Vigilância Epidemiológica.
3. Health Protection Agency. http://www.hpa.org.uk
4. Parent du Châtelet I, et al. Bulletin épidémiologique hebdomadaire. 2010;31-32:340-343.
5. PHAC. Canada Comm Dis Rep. 2009;36:1-40.
6. Australian Meningococcal Surveillance Programme. CDI. 2009;33:259-267.
7. World Health Organization. Meningococcal meningitis factsheet. 2010.
[Figure: incidence per 100,000 by country, split into Serogroup B and Serogroups A, C, W, Y, and Other. Countries shown: United States* (1), Brazil† (2), United Kingdom (3), France (4), Canada (5), Australia (6).]
50% fatality if untreated (7)
Computational Epidemiology
Integrate biological knowledge in a stochastic model of disease spread and vaccination
[Diagram: model compartments. Not Vaccinated (S, C); Vaccinated, immune (S, C); Vaccinated, not immune (S, C); Disease cases, Not Vaccinated; Disease cases, Vaccinated.]
** Trotter CL, et al. Am J Epidemiol. 2005; 6: Christensen H, et al. Vaccine, 2013
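The compartment structure on this slide can be illustrated with a toy stochastic simulation. This is not the calibrated model cited above: it assumes that S means susceptible and C means carrier, that immune vaccinees can still carry but never develop disease, and that disease can occur only at the moment of acquiring carriage. All parameter names and values (`beta`, `recovery`, `p_disease`, and so on) are hypothetical.

```python
import random


def simulate(pop=1000, coverage=0.5, ve=0.7, beta=0.3, recovery=0.1,
             p_disease=0.01, init_carriers=50, steps=100, seed=1):
    """Count disease cases by vaccination status in a toy S/C model."""
    rng = random.Random(seed)
    people = []
    for i in range(pop):
        vaccinated = rng.random() < coverage
        # A vaccinated person becomes immune with probability ve (assumption).
        immune = vaccinated and rng.random() < ve
        people.append({"vaccinated": vaccinated, "immune": immune,
                       "carrier": i < init_carriers})

    cases = {"not_vaccinated": 0, "vaccinated": 0}
    for _ in range(steps):
        # Force of infection is proportional to current carriage prevalence.
        prevalence = sum(p["carrier"] for p in people) / pop
        for p in people:
            if p["carrier"]:
                if rng.random() < recovery:
                    p["carrier"] = False       # C -> S
            elif rng.random() < beta * prevalence:
                p["carrier"] = True            # S -> C
                # Only non-immune individuals can progress to disease.
                if not p["immune"] and rng.random() < p_disease:
                    key = "vaccinated" if p["vaccinated"] else "not_vaccinated"
                    cases[key] += 1
    return cases


# One stochastic run; a different seed gives a different epidemic.
example_cases = simulate(seed=1)
```

Running the function many times with different seeds gives a distribution of case counts for each scenario, which is the raw material for the kind of impact assessment described in this example.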
[Slide image: excerpt from an article at www.thelancet.com/infection, Vol 10, December 2010, p. 856]
In a sensitivity analysis, we excluded non-European countries (as defined by the UN) because some of these countries have seen a recent increase in non-serogroup B or C carriage and disease (eg, an increase in serogroup Y in America).24
Role of the funding source
There was no funding source for this study. All authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Results
We identified 110 relevant articles (figure 1); 37 were in languages other than English. Nine publications that were initially excluded because of their titles17,25–32 were found to contain age-specific carriage data when reviewed for another purpose, and were subsequently included. Three articles were available as abstracts only.33–35 The selected publications2–4,7,8,15,17–19,25–125 reported carriage studies from 28 countries (webappendix pp 1–33). 18 studies were longitudinal (three in military recruits), five combined longitudinal, 69 cross sectional, ten serial cross sectional, seven a combination of cross sectional and longitudinal, and for one paper the study type was unclear.
Longitudinal studies that reported carriage by age in non-military populations have been undertaken in 15 countries. The number of repeated measures of carriage prevalence within a given population ranged from two to 13 and the frequency ranged from monthly to annual swabs (webappendix pp 10–33). Repeated measures on the same population showed small differences over time in some studies but large differences in others. For example, Rønne and colleagues62 reported a carriage prevalence of 20·4% in people aged 16–20 years old in a school in Denmark in November 1983 and 19·8% in the following March, whereas Fraser and colleagues36 reported a carriage prevalence of 25·7% in a cohort of boys aged 15–16 years old attending a naval school in April 1972 and 75·8% 9 months later. However, most of the other longitudinal studies reported a difference of 10% or less over the course of the study.
89 studies reported age-specific carriage data from cross-sectional studies, serial cross-sectional studies, or first swab results from a longitudinal study in military recruits, 16 of which reported the use of random sampling. Few papers reported carriage in older age groups; only 48 estimates of carriage prevalence were reported for individuals 25 years old and over compared with 341 in those under 25 years old (webappendix pp 10–33). The number of age groups investigated in each study varied greatly from a single age or age group (reported in 28 papers) to 19 different ages within the same population: Bogaert and colleagues116 reported carriage by single year of age in people aged 1–19 years old and Caugant and colleagues74 reported carriage in 5-year age bands from 0 years to 94 years, although there were small numbers of people in the upper age groups. Carriage estimates vary greatly even within age bands. The greatest variation was seen for 20–29-year olds, with reported prevalence ranging from 2·6% to 60·7%.
Data from 82 papers, comprising 143 114 individual swabs, were available for the quantitative data synthesis.
Table 1: Fixed and random effect estimates from the main analysis
Parameter                                        Value
Fixed effect parameter estimates: when plated
  Plated immediately*                            OR 1·00
  Other                                          OR 0·46 (95% CI 0·31–0·68; p=0·0001)
Random effect parameter estimates
  Country                                        Variance 0·10 (95% CI 0·003–3·05)
  Study                                          Variance 0·77 (95% CI 0·48–1·23)
OR=odds ratio. *Baseline category.
Figure 2: Estimates of meningococcal carriage by age when swabs were plated immediately after collection
Circles are the datapoints included, with the larger circles representing a larger sample size. The largest circles represent the results of the serial cross-sectional studies in teenagers aged 15–19 years old in the UK, before and after the introduction of the meningococcal serogroup C vaccine.105,110,113 (A) 95% bias-corrected CIs. (B) With individual country predictions.
[Figure 2, panels A and B: prevalence (%) against age (years). Legend, panel A: observed carriage prevalences, fitted data, range of 95% CI. Legend, panel B: observed carriage prevalences, fitted data, country predictions.]
Mathematical model** calibrated to realistically replicate the relevant features.
* UK Gov. web site; Mossong J, et al. PLoS Med. 2008; Christensen H, et al. Lancet Infect Dis. 2010; Caugant, D. et al. Vaccine 2009; PHE web site
Rich domain field data and knowledge*
Fancy is not enough: need long lasting, rock-solid evidence to improve policy-making using data insights
Validate the model retrospectively on a real case study
* Trotter CL, et al. Lancet. 2004; Campbell H, et al. Clin Vaccine Immunol. 2010; Maiden MC, et al. Lancet. 2002; Maiden MC, et al. J Infect Dis. 2008
* Real MenC cases reported by Public Health England (PHE)
† Synthetic MenC cases produced by running the model in a predictive way, using MCML’s best estimates of VE as inputs
[Figure: PHE* cases compared with MCML† cases. Labels shown: 2 yrs; 4 yrs (1); 10 yrs (2); 2 surveys (3); 3 surveys (4).]
Stochastic simulation of infectious disease transmission and vaccination increases the power of impact assessment, allowing quicker decisions for policy-makers*
Monte Carlo Maximum Likelihood: up to 10-fold power increase
* L. Argante, M. Tizzoni, D. Medini, BMC Medicine 2016
[Figure: Direct Effectiveness: 95% CI around 70% VE, MCML compared with the observational method]
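The general idea behind Monte Carlo Maximum Likelihood can be sketched as follows: for each candidate vaccine effectiveness (VE), run a stochastic simulator many times and estimate how likely the observed case count is, then keep the candidate with the highest estimate. The code below is a toy under stated assumptions and not the published method: the binomial-style simulator, the grid search, the add-one smoothing, and all parameter values are hypothetical.

```python
import random


def simulate_cases(ve, n_vaccinated, attack_rate, rng):
    """Toy simulator: each vaccinee falls ill with risk attack_rate * (1 - ve)."""
    risk = attack_rate * (1.0 - ve)
    return sum(rng.random() < risk for _ in range(n_vaccinated))


def mcml_estimate(observed_cases, n_vaccinated, attack_rate, grid,
                  n_sims=200, seed=0):
    """Pick the VE on the grid whose simulations best reproduce the data."""
    rng = random.Random(seed)
    likelihood = {}
    for ve in grid:
        # Monte Carlo estimate of P(observed_cases | ve): the share of
        # simulations that hit the observed count exactly.
        hits = sum(
            simulate_cases(ve, n_vaccinated, attack_rate, rng) == observed_cases
            for _ in range(n_sims)
        )
        # Add-one smoothing keeps the estimate above zero (assumption).
        likelihood[ve] = (hits + 1) / (n_sims + 1)
    best_ve = max(likelihood, key=likelihood.get)
    return best_ve, likelihood


grid = [i / 10 for i in range(11)]  # candidate VE values 0.0, 0.1, ..., 1.0
best_ve, likelihood = mcml_estimate(
    observed_cases=15, n_vaccinated=500, attack_rate=0.1, grid=grid
)
print(f"VE with the highest simulated likelihood: {best_ve}")
```

In a realistic setting the simulator would be the full transmission model, so each likelihood evaluation is far more expensive and the search over VE needs a more efficient strategy than a fixed grid.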
Data insights in a regulated environment:
• Data quality by design with high validation standards
• Get the regulator onboard, or don’t even try
In R&D, “small-big” data is powerful but dangerous:
• Focus on understanding, or you will die overfitting
• Domain-field vertical knowledge is critical
Key ingredients for a holistic data excellence infrastructure:
• Systems focused on the user
• Data Governance & “advanced stewardship”
• Passion for knowledge: the data scientist… is a scientist!
Hints