Chief Data & Analytics Week, Europe - Duccio Medini PhD, GSK Vaccines

Chief Data and Analytics Officer Week - Amsterdam, September 2017 Data Science for Vaccines DUCCIO MEDINI, PhD Head Data Science & Clinical Systems GSK Vaccines

Upload: corinium-coriniumglobal

Post on 22-Jan-2018


Chief Data and Analytics Officer Week - Amsterdam, September 2017

Data Science for Vaccines
DUCCIO MEDINI, PhD
Head Data Science & Clinical Systems
GSK Vaccines

What we do…

Pharmaceuticals

We develop and make medicines to treat a range of conditions from respiratory diseases to HIV/AIDS

Vaccines

We research and make vaccines for children and adults that protect against infectious diseases

Consumer Healthcare

We make a range of consumer healthcare products in four categories: Total Wellness, Skin Health, Oral Care and Nutrition.

4bn packs of medicines

800m doses of vaccines

18bn packs of consumer healthcare products

…to help people do more, feel better, live longer

The most impactful intervention for human health after clean water

Plotkin SL, Plotkin SA. A short history of vaccination. In: Plotkin SA, Orenstein WA, eds. Vaccines, 4th edn. Philadelphia: WB Saunders; 2004: 1-15.
Bulletin of the World Health Organization: http://www.who.int/bulletin/volumes/86/2/07-040089/en/

What are Vaccines?

A perturbation of two interacting complex systems

Medini, D. et al. Nature Rev Microbiology, 2008

What are Vaccines?

Vaccine Research & Development: a long journey from discovery to saving people's lives

Medical Need

Discovery Clinical R&D Vaccination

Technical R&D Manufacturing

2+ yrs 10-15 yrs 2+ yrs

350 people, operations highly outsourced using Full Service Providers, 10% in predictive analytics: internal resources and academic collaborations

A Data team in the business

Information Management Systems

Data Standards

Data Management

Exploratory Data Analytics

Data & Systems Innovation

Business Excellence

Example 1: Data collection and management in Clinical studies

Taking the challenge: redefining the architecture of clinical data

One Interface per User – Connected in real time with One Centralized Data Directory

From System-centric to User-centric

Centralized Data Directory
- Each Study DB is a (partial) instance of a universal semantic data model, orchestrated by a centralized data directory
- Each Interface has live (cached) access to the whole data directory

Study DB, Study DB, Study DB, ……

Investigator
- Receive vaccine supplies
- Record sample shipments
- Record subject data
- Answer queries
- File documents

Data Manager
- Design screens
- View subject data
- Raise queries
- File documents

Participant
- Completes e-diary cards
- Wears devices
- Uses social media

Monitor
- View subject data
- Raise queries
- Record SDV
- File documents

Other users: Authorities, Safety, Supply, Lab, Statistician
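The directory idea on this slide can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration (the `UNIVERSAL_MODEL` entity types, the `StudyDB` and `DataDirectory` classes, the study names); the talk does not describe GSK's actual implementation. The sketch only shows the two stated properties: each study database uses a subset of one shared model, and any interface can query across all studies through one directory.

```python
from dataclasses import dataclass, field

# Hypothetical universal semantic model: entity type -> allowed attributes.
UNIVERSAL_MODEL = {
    "Subject": {"subject_id", "age"},
    "Sample": {"sample_id", "subject_id", "shipped"},
    "Query": {"query_id", "subject_id", "text"},
}


@dataclass
class StudyDB:
    """A study database that instantiates only part of the universal model."""
    name: str
    entity_types: set
    records: dict = field(default_factory=dict)

    def add(self, entity_type, **attrs):
        if entity_type not in self.entity_types:
            raise ValueError(f"{self.name} does not use {entity_type}")
        unknown = set(attrs) - UNIVERSAL_MODEL[entity_type]
        if unknown:
            raise ValueError(f"attributes outside the model: {sorted(unknown)}")
        self.records.setdefault(entity_type, []).append(attrs)


class DataDirectory:
    """Central registry that every user interface queries."""

    def __init__(self):
        self.studies = {}

    def register(self, study):
        unknown = study.entity_types - set(UNIVERSAL_MODEL)
        if unknown:
            raise ValueError(f"entity types outside the model: {sorted(unknown)}")
        self.studies[study.name] = study

    def find(self, entity_type, **filters):
        """Return (study name, record) pairs matching the filters, across all studies."""
        hits = []
        for study in self.studies.values():
            for record in study.records.get(entity_type, []):
                if all(record.get(k) == v for k, v in filters.items()):
                    hits.append((study.name, record))
        return hits


directory = DataDirectory()
study_a = StudyDB("study_a", {"Subject", "Sample"})
study_b = StudyDB("study_b", {"Subject", "Query"})
directory.register(study_a)
directory.register(study_b)
study_a.add("Subject", subject_id="001", age=34)
study_b.add("Subject", subject_id="101", age=34)
study_b.add("Query", query_id="Q1", subject_id="101", text="Confirm visit date")
```

A monitor-style interface could then call `directory.find("Subject", age=34)` and receive matching subjects from both studies without knowing which study database holds them.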

• As human beings, we naturally describe semantic relationships between elements
- Relational databases are not good at describing the relationships between entities
- Document-oriented NoSQL databases store sets of disconnected documents
- The data model underlying a graph database is designed precisely to represent relationships

• Graphs also communicate the questions we want to ask of our domain

Semantics adds clarity to the structure of data
A cohesive and robust picture of the data is fundamental
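To make the point about graphs concrete, here is a small sketch of a property-graph style structure in plain Python. The node identifiers and relation names (`TAKEN_FROM`, `ENROLLED_IN`) are hypothetical and chosen only for illustration; a real deployment would use a graph database rather than this in-memory class.

```python
from collections import defaultdict


class Graph:
    """Tiny directed, labelled graph: edges are (relation, target) pairs."""

    def __init__(self):
        self.out = defaultdict(list)

    def add(self, source, relation, target):
        self.out[source].append((relation, target))

    def neighbours(self, node, relation):
        # .get avoids creating empty entries for unknown nodes
        return [t for r, t in self.out.get(node, []) if r == relation]


def studies_for_sample(graph, sample):
    """Follow sample -> subject -> study, i.e. the question 'which study is this sample from?'"""
    return [
        study
        for subject in graph.neighbours(sample, "TAKEN_FROM")
        for study in graph.neighbours(subject, "ENROLLED_IN")
    ]


g = Graph()
g.add("sample:S1", "TAKEN_FROM", "subject:001")
g.add("sample:S2", "TAKEN_FROM", "subject:002")
g.add("subject:001", "ENROLLED_IN", "study:A")
g.add("subject:002", "ENROLLED_IN", "study:B")

result = studies_for_sample(g, "sample:S1")
```

The traversal in `studies_for_sample` reads almost like the question itself, which is the sense in which a graph model communicates the questions asked of the domain.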

Example 2: Systems Vaccinology at the translational interface

Feed-forward interaction: clinical trials and discovery science

Adapted from Pulendran B, et al. Immunity 2010

A peculiar kind of big data: huge p – tiny n
Subjects in study 100,000 → 100; data points per subject 1,000 → 10,000,000

[Figure: cycle linking Clinical research (exploratory clinical trial; exploratory endpoints: immunogenicity, safety), Systems Vaccinology studies (hypothesis generation) and Discovery-based research (animal and in vitro models; test the hypothesis)]
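Why "huge p, tiny n" data is risky can be shown with a short simulation. This is an illustrative sketch, not an analysis from the talk: the features and the outcome are pure random noise, and the sample sizes are arbitrary assumptions. With few subjects and many features, some feature will correlate strongly with the outcome by chance alone.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_spurious_correlation(n_subjects, n_features, seed=0):
    """Largest |correlation| between a noise outcome and any of n_features noise features."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_subjects)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


# Tiny n, huge p: 10 subjects, 5000 unrelated measurements each.
tiny_n = max_spurious_correlation(n_subjects=10, n_features=5000)
# Large n, modest p: 1000 subjects, 200 unrelated measurements each.
large_n = max_spurious_correlation(n_subjects=1000, n_features=200)
print(f"tiny n: {tiny_n:.2f}   large n: {large_n:.2f}")
```

In the tiny-n setting the best noise feature looks like a strong "biomarker" even though nothing is related to anything, which is why selecting features on such data without independent validation invites overfitting.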

Clinical Research: dissecting the mode of action
Predictive Biomarkers – Multi-layer integrative analysis – Causal inference

[Figure: multi-layer readouts across time scales (1-8 days, 1-4 weeks, 1-6 months). Compound: LHD153 with Alum, levels 0, 12.5, 25, 50, 100. Innate immunity: mRNA gene regulation, circulating cytokines, myeloid & lymphoid cells. Adaptive immunity: T and B cells, B-cell Ab repertoire, IgG antibody subclasses. Primary endpoint: hSBA (corr. protection), axis 2 to 512.]

Example 3: Impact Assessment through Stochastic Simulations

24 hours from symptoms to death. Treatment often fails: prevention is key.
Serogroup B critical. First vaccine approved by EMA in 2013 after 17 years of R&D

Invasive meningococcal disease is devastating

[Figure: bar chart of incidence per 100,000 by country (United States*1, Brazil†2, United Kingdom3, France4, Canada5, Australia6), split into Serogroup B versus Serogroups A, C, W, Y, and Other]

50% fatality if untreated7

*Annual estimated burden. †Does not include non-groupable isolates.
1. Cohn AC, et al. Clin Infect Dis. 2010;50:184-191.
2. Ministério da Saúde (SVS/MS). Departamento de Vigilância Epidemiológica.
3. Health Protection Agency. http://www.hpa.org.uk
4. Parent du Châtelet I, et al. Bulletin épidémiologique hebdomadaire. 2010;31-32:340-343.
5. PHAC. Canada Comm Dis Rep. 2009;36:1-40.
6. Australian Meningococcal Surveillance Programme. CDI. 2009;33:259-267.
7. World Health Organization. Meningococcal meningitis factsheet. 2010.

Integrate biological knowledge in a stochastic model of disease spread and vaccination
Computational Epidemiology

[Figure: model compartments S and C within each of three groups (Not Vaccinated; Vaccinated, immune; Vaccinated, not immune), with disease cases counted separately for Not Vaccinated and Vaccinated]

** Trotter CL, et al. Am J Epidemiol. 2005; 6: Christensen H, et al. Vaccine, 2013
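A stochastic model of this general shape can be sketched in a few lines. This is a deliberately simplified illustration under stated assumptions, not the calibrated model cited on the slide: S and C are read here as susceptible and carrier, immune individuals are assumed fully protected against acquiring carriage, disease is assumed to occur with a fixed probability at acquisition, and every parameter value is invented.

```python
import random


def simulate(n=1000, coverage=0.5, take=0.8, beta=0.3, gamma=0.1,
             p_disease=0.01, init_carriers=50, steps=100, seed=0):
    """Discrete-time stochastic carriage model with three groups:
    not vaccinated; vaccinated and immune; vaccinated but not immune."""
    rng = random.Random(seed)
    people = []
    for i in range(n):
        vaccinated = rng.random() < coverage
        immune = vaccinated and rng.random() < take  # vaccine "takes"
        carrier = i < init_carriers and not immune   # seed initial carriers
        people.append({"vaccinated": vaccinated, "immune": immune, "carrier": carrier})

    cases = {"vaccinated": 0, "not_vaccinated": 0}
    for _ in range(steps):
        prevalence = sum(p["carrier"] for p in people) / n
        for p in people:
            if p["carrier"]:
                # Carriers clear carriage with probability gamma per step.
                if rng.random() < gamma:
                    p["carrier"] = False
            elif not p["immune"] and rng.random() < beta * prevalence:
                # Non-immune susceptibles acquire carriage from current carriers.
                p["carrier"] = True
                if rng.random() < p_disease:
                    key = "vaccinated" if p["vaccinated"] else "not_vaccinated"
                    cases[key] += 1

    carriers = sum(p["carrier"] for p in people)
    return {"carriers": carriers, "susceptible": n - carriers, "cases": cases}


baseline = simulate(coverage=0.0, seed=1)
with_vaccine = simulate(coverage=0.8, seed=1)
```

Running the simulation many times with different seeds gives a distribution of case counts for each scenario, which is what makes the approach usable for impact assessment rather than a single deterministic forecast.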

[Figure: page 856 of a Lancet Infectious Diseases article (www.thelancet.com/infection, Vol 10, December 2010), showing Table 1 (fixed and random effect estimates from the main analysis) and Figure 2 (estimates of meningococcal carriage by age when swabs were plated immediately after collection; prevalence (%) versus age (years))]

Mathematical model** calibrated to realistically replicate the relevant features.

* UK Gov. web site; Mossong J, et al. PLoS Med. 2008; Christensen H, et al. Lancet Infect Dis. 2010; Caugant, D. et al. Vaccine 2009; PHE web site

Reach domain field data and knowledge*

Fancy is not enough: we need long-lasting, rock-solid evidence to improve policy-making using data insights

Validate the model retrospectively on a real case study

* Trotter CL, et al. Lancet. 2004; Campbell H, et al. Clin Vaccine Immunol. 2010; Maiden MC, et al. Lancet. 2002; Maiden MC, et al. J Infect Dis. 2008

* Real MenC cases reported by Public Health England (PHE)
† Synthetic MenC cases produced by running the model in a predictive way, using MCML’s best estimates of VE as inputs

[Figure: PHE* cases versus MCML† cases; panels labelled 2 yrs, 4 yrs1, 10 yrs2, 2 surveys3, 3 surveys4]

Stochastic simulation of infectious disease transmission and vaccination increases power of impact assessment allowing quicker decisions for policy-makers*

Monte Carlo Maximum Likelihood: up to 10-fold power increase

* L. Argante, M. Tizzoni, D. Medini, BMC Medicine 2016

[Figure: Direct Effectiveness, 95% CI around 70% VE, MCML versus observational method]
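The general idea of estimating vaccine effectiveness (VE) by maximizing a likelihood that is itself approximated by simulation can be shown with a toy example. This is not the MCML procedure of the cited paper: the model (a single binomial attack rate), the candidate grid, the observed count and all parameter values are assumptions chosen only to show the mechanics.

```python
import random


def simulate_cases(n, attack_rate, rng):
    """Number of cases among n people, each infected independently."""
    return sum(rng.random() < attack_rate for _ in range(n))


def mc_likelihood(observed, n_vaccinated, attack_rate, ve, n_sims, rng):
    """Monte Carlo estimate of P(observed cases | VE): the share of
    simulated outbreaks that reproduce the observed count (add-one smoothed)."""
    hits = sum(
        simulate_cases(n_vaccinated, attack_rate * (1 - ve), rng) == observed
        for _ in range(n_sims)
    )
    return (hits + 1) / (n_sims + 1)


def estimate_ve(observed, n_vaccinated, attack_rate, grid, n_sims=2000, seed=0):
    """Pick the VE on the grid with the highest simulated likelihood."""
    rng = random.Random(seed)
    likelihoods = {
        ve: mc_likelihood(observed, n_vaccinated, attack_rate, ve, n_sims, rng)
        for ve in grid
    }
    return max(likelihoods, key=likelihoods.get), likelihoods


# Hypothetical data: 9 cases among 100 vaccinated people, when the attack
# rate in unvaccinated people is assumed to be 30%.
grid = [i / 10 for i in range(10)]
ve_hat, likelihoods = estimate_ve(observed=9, n_vaccinated=100,
                                  attack_rate=0.3, grid=grid)
print(f"estimated VE: {ve_hat:.1f}")
```

In this toy case the likelihood could be computed exactly; the simulation-based version matters when the model is a full transmission simulation whose likelihood has no closed form.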

Data insights in a regulated environment:
• Data quality by design with high validation standards
• Get the regulator onboard, or don’t even try

In R&D “small-big” data is powerful but dangerous:
• Focus on understanding, or you will die overfitting
• Domain-field vertical knowledge is critical

Key ingredients for a holistic data excellence infrastructure:
• Systems focused on the user
• Data Governance & “advanced stewardship”
• Passion for knowledge: the data scientist… is a scientist!

Hints