Artificial Intelligence and the Future of Early Drug Discovery

Adam Corrigan

IMED Biotech Unit ¦ Discovery Sciences ¦ Discovery Biology

Reasons why drugs fail

Lessons Learned from AZ Drug Pipeline

Implementation of the 5R framework is improving success rates

Candidate Drug nomination to Phase III completion has improved from 4% (2005-2010) to 19% (2012-2016)

Drugs are still failing as a result of lack of efficacy & safety

Focus on deeper scientific understanding of disease biology

Requires models that emulate human physiology to predict drug safety and efficacy

Morgan P et al. (2018) Nature Reviews Drug Discovery

The 5R Framework

Primary Reasons for Project Failure

Introduction to Quantitative Biology (QuBi)

Bioinformatics

Image Analytics

Statistics

Quantitative Biology

Claus Bendtsen

Phenotypes

Screening

Microenvironments

Morphology

Single Cell

Artificial Intelligence

Outline

Efficient generation of training data

• Phenotype classification of time lapse imaging

• Reducing the number of experiments required

Artificial Intelligence in Cellular Screening

• Semantic segmentation for label-free imaging

• Textural feature classification of complex cell models

• Unsupervised hit/phenotype discovery

Outlook

• Challenges

• Horizon scanning

Efficient Generation of Training Data

Time Lapse Analysis of Mitotic Phenotypes

• Understanding when in the cell cycle defects appear sheds light on the mechanism of action

• Time lapse imaging allows cell fate to be linked to the point at which the defect originates

Building training data more efficiently

IMED Biotech Unit ¦ Discovery Sciences ¦ Quantitative Biology

Cell tracking is automated, but identification of subtle phenotypes requires time-consuming manual annotation

Choose the most informative cells to expand the training set through manual classification

Normal

Abnormal

Mitotic

Based on the initial training set, calculate the confidence of classification

Cells with high classification probabilities would add very little information to the model
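The selection step described above can be sketched as least-confidence sampling, a common form of active learning. This is a minimal illustration under assumptions, not the method used in the talk: the function name `least_confident_indices`, the probability values, and the query size are all hypothetical, and the class order (normal, abnormal, mitotic) is taken from the slide labels.

```python
import numpy as np

def least_confident_indices(probabilities, n_queries):
    """Return indices of the n_queries samples whose top class probability is lowest."""
    probabilities = np.asarray(probabilities, dtype=float)
    # Confidence of a sample = probability of its most likely class.
    top_probability = probabilities.max(axis=1)
    # Stable sort so that ties keep their original order.
    order = np.argsort(top_probability, kind="stable")
    return order[:n_queries]

# Hypothetical classifier output for five tracked cells over the
# classes (normal, abnormal, mitotic); values are made up.
cell_probabilities = np.array([
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.05, 0.90, 0.05],
    [0.50, 0.45, 0.05],
    [0.10, 0.10, 0.80],
])

# Cells to send for manual annotation in the next round.
to_annotate = least_confident_indices(cell_probabilities, n_queries=2)
print(to_annotate)
```

Cells that the current model already classifies with high probability are never selected, so annotation effort goes to the cells most likely to change the model.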

Active Machine Learning for training complex models – generating the right training data effectively

DMSO Control: control cell with normal cell division

Treatment: treated cell with elongated and abnormal mitosis, resulting in catastrophe (death)

Becoming CHEAPER with AI

By only conducting experiments when needed

Extract features (signature descriptor)

Build/Apply ML model (Mondrian TCP on SVM)

[Figure: per-class conformal p-values; both axes extracted as "p-value (class 🙂)", with regions labelled "Don’t synthesize" and "Don’t test"]
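The Mondrian (label-conditional) conformal step can be sketched as follows. This is a simplified illustration, not the pipeline from the slide: it uses a held-out calibration set (the inductive variant) rather than the transductive procedure (TCP) named on the slide, and it takes nonconformity scores as given instead of deriving them from an SVM. The class names, scores, and significance level are hypothetical.

```python
import numpy as np

def mondrian_p_values(cal_scores, cal_labels, test_scores, classes):
    """Label-conditional conformal p-values.

    cal_scores[i]  : nonconformity of calibration example i under its true label
    cal_labels[i]  : true label of calibration example i
    test_scores[c] : nonconformity of the new compound if it were labelled c
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels)
    p_values = {}
    for c in classes:
        # Mondrian condition: compare only against calibration examples of class c.
        same_class = cal_scores[cal_labels == c]
        at_least_as_strange = np.sum(same_class >= test_scores[c])
        p_values[c] = float((at_least_as_strange + 1) / (len(same_class) + 1))
    return p_values

def prediction_set(p_values, significance):
    """Classes that cannot be rejected at the given significance level."""
    return {c for c, p in p_values.items() if p > significance}

# Hypothetical calibration data: four examples per class.
cal_scores = [0.1, 0.2, 0.3, 0.9, 0.2, 0.4, 0.6, 0.8]
cal_labels = ["active"] * 4 + ["inactive"] * 4
# Hypothetical nonconformity of one new compound under each candidate label.
test_scores = {"active": 0.95, "inactive": 0.3}

p_values = mondrian_p_values(cal_scores, cal_labels, test_scores, ["active", "inactive"])
confident_set = prediction_set(p_values, significance=0.25)
print(p_values, confident_set)
```

When the prediction set contains a single class at the chosen significance level, the prediction may be confident enough to skip the experiment, which is one reading of the "Don't synthesize" and "Don't test" regions.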

Machine Learning in Cellular Screening

Image-based Transcriptomics - MERFISH

>> detailed understanding of primary disease and improved model characterisation

1) Understand disease biology

2) Drive model selection

3) Develop better reagents

> TI, TV

Villani et al., 2017, Science

Identification of new dendritic cell populations

Identify relevant cell population

Cell lineage and state classification

Characterise disease

http://www.sciencemag.org/cgi/doi/10.1126/science.aaa6090

Moffitt et al., PNAS, 2016, 113 (39) 11046

http://science.sciencemag.org/content/356/6335/eaah4573

MERFISH workflow

hyb1, hyb2

Pre Post

~0.5 px localisation accuracy

smFISH spot detection

Round-to-round registration

Barcode assignment

Downstream analysis

PCA
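The barcode assignment step in the workflow above can be sketched as nearest-codeword decoding by Hamming distance. This is an illustration under assumptions: the three-gene, 8-bit codebook is hypothetical (chosen so that every pair of codewords differs in at least 4 bits, which allows single-bit error correction), and the on/off calls per hybridisation round are made up.

```python
import numpy as np

def assign_barcodes(observed_bits, codebook_bits, max_distance=1):
    """Assign each spot to its nearest codeword, or -1 if none is close enough."""
    observed_bits = np.asarray(observed_bits, dtype=int)
    codebook_bits = np.asarray(codebook_bits, dtype=int)
    # Hamming distance between every spot and every codeword.
    distances = (observed_bits[:, None, :] != codebook_bits[None, :, :]).sum(axis=2)
    best = distances.argmin(axis=1)
    best_distance = distances[np.arange(len(observed_bits)), best]
    # Reject ambiguous spots that are equally close to several codewords.
    is_unique = (distances == best_distance[:, None]).sum(axis=1) == 1
    return np.where((best_distance <= max_distance) & is_unique, best, -1)

# Hypothetical codebook: rows are genes, columns are hybridisation rounds.
gene_names = ["GENE_A", "GENE_B", "GENE_C"]
codebook = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 0, 0],
])

# Hypothetical per-round detections for three registered spots.
spots = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],  # exact match to GENE_A
    [0, 0, 0, 0, 1, 0, 1, 1],  # GENE_B with one dropped bit
    [1, 0, 1, 0, 1, 0, 1, 0],  # equally far from every codeword
])

assigned = assign_barcodes(spots, codebook)
print([gene_names[i] if i >= 0 else "unassigned" for i in assigned])
```

Decoding of this kind relies on the preceding registration step: a spot's bits are only meaningful if the same physical location is read in every round.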

Label-free Semantic Segmentation for Assay Multiplexing – more biology at lower cost

[Figure panels: FKBP5, Marker 1, Marker 2, Marker 3, Marker 4]

The problem:

Using nuclear and cell markers limits colours available to investigate biology

U-net Deep Neural Network

Label-free phase contrast image

Multiple endpoints

Subpopulations

Segmentation

No nuclear stain

No cell marker

Cell Division Cycle – more information from fewer labels

Transfer learning – repurpose pretrained network to predict new output at single cell level

[Figure: G1 and G2 populations, fluorescent DNA stain compared with label-free measurement]

Greater sensitivity through machine-driven assay development

Positive Control

Negative Control

ATPB Phalloidin DAPI

ATPB: DAPI

Ki67 DAPI

CC3: DAPI

Textural & Morphological Features

• Microphysiological Organ-Chips have potential for better translation to humans

• Systems require validation – methods from 2D not directly applicable

• For more subtle phenotypes, manual design of endpoint is difficult
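As an illustration of what a textural feature can look like, the sketch below computes two statistics from a grey-level co-occurrence matrix (GLCM). The slides do not say which textural features were used, so the choice of GLCM contrast and homogeneity is an assumption, and the function name `glcm_features` and the toy images are hypothetical.

```python
import numpy as np

def glcm_features(image, levels, offset=(0, 1)):
    """Contrast and homogeneity from a grey-level co-occurrence matrix.

    image  : 2D integer array with values in [0, levels)
    offset : (row, column) displacement to the neighbour; non-negative values only
    """
    image = np.asarray(image)
    dr, dc = offset
    rows, cols = image.shape
    # Pair each pixel with its neighbour at the given offset.
    reference = image[:rows - dr, :cols - dc]
    neighbour = image[dr:, dc:]
    glcm = np.zeros((levels, levels), dtype=float)
    # Count how often grey level i occurs next to grey level j.
    np.add.at(glcm, (reference.ravel(), neighbour.ravel()), 1)
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    contrast = float(np.sum(glcm * (i - j) ** 2))
    homogeneity = float(np.sum(glcm / (1.0 + (i - j) ** 2)))
    return {"contrast": contrast, "homogeneity": homogeneity}

# Two toy 4x4 images: a flat patch and a checkerboard.
flat_patch = np.zeros((4, 4), dtype=int)
checkerboard = np.indices((4, 4)).sum(axis=0) % 2

flat_features = glcm_features(flat_patch, levels=2)
checker_features = glcm_features(checkerboard, levels=2)
print(flat_features, checker_features)
```

Feature vectors of this kind, computed per image or per cell, could then be passed to a classifier trained on the positive and negative controls.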

Train

Automated Microscopy

Test compound

Automatic Outlier Detection

Outlier ≠ Artefact (could also be uncommon phenotype)

Rapid manual annotation of latent feature space

Deep Variational AutoEncoder (VAE) – unsupervised encoding of image appearance into spatial dimensions.

Points close together = similar images

Automatically identify odd images as low density regions in feature space

Dense – many similar images

Sparse – rare image appearance
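Finding low density regions in a latent space can be sketched with a nearest-neighbour distance score. This is a minimal illustration, assuming the latent coordinates have already been produced by an encoder; the VAE itself is not shown, the latent points below are randomly generated stand-ins, and the function name `knn_sparsity` and the choice of k are hypothetical.

```python
import numpy as np

def knn_sparsity(latent, k=5):
    """Mean distance to the k nearest neighbours; larger means lower local density."""
    latent = np.asarray(latent, dtype=float)
    diffs = latent[:, None, :] - latent[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    # After sorting, column 0 is the distance of each point to itself (zero).
    nearest = np.sort(distances, axis=1)[:, 1:k + 1]
    return nearest.mean(axis=1)

# Stand-in latent coordinates: one dense cluster plus one isolated point.
rng = np.random.default_rng(0)
dense_cluster = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
rare_image = np.array([[8.0, 8.0]])
latent = np.vstack([dense_cluster, rare_image])

sparsity = knn_sparsity(latent, k=5)
most_isolated = int(np.argmax(sparsity))
print(most_isolated)
```

The score only ranks images by how unusual they are; as the slide notes, a high-ranking image may be an artefact or an uncommon phenotype, so it still needs review.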

Automatic Assay Development

Clusters are most likely different phenotypes

Rapid annotation of training data

Training can come from image appearance, or from prior information (e.g. clinical toxicity)

Deep Variational AutoEncoder (VAE) – encoding image appearance into spatial dimensions.

Points close together = similar images

Dense – many similar images

Sparse – rare image appearance

Looking to the Future

Challenges and Areas of Research

Dealing with limited training data

• Transfer learning

• Data Augmentation

• Unsupervised pre-training

• Multi-task Learning
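Of the approaches listed for limited training data, data augmentation is the simplest to show in a few lines. The sketch below generates the eight flips and quarter-turn rotations of an image. It assumes that the images have no preferred orientation, which may not hold for every assay, and the function name `dihedral_augment` is hypothetical.

```python
import numpy as np

def dihedral_augment(image):
    """Return the 8 flip/rotation variants of a 2D image."""
    image = np.asarray(image)
    variants = []
    for flipped in (image, np.fliplr(image)):
        for quarter_turns in range(4):
            variants.append(np.rot90(flipped, k=quarter_turns))
    return variants

# Toy 3x3 "image" with distinct pixel values so every variant differs.
example = np.arange(9).reshape(3, 3)
augmented = dihedral_augment(example)
print(len(augmented))
```

Each annotated image then contributes eight training examples with the same label, at no extra annotation cost.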

Learning to integrate multiple experiments/data types

• Literature mining

• Feature Encoding

• Metadata standards

Continuous Technology Development

• Sequence to sequence encoding

• Adversarial Networks

Isola et al., CVPR’17

Vinyals et al., 2015

Hierarchy of AI Applications

Image Segmentation

• Train machine to extract more information from fewer labels

• Single cell phenotypes

[Figure: t-SNE plot, axes tsne-x and tsne-y]

Image Classification

• Phenotypic screening

• Artefact Detection

• Translation prediction

Dataset annotation

• Image features

• Compound information

• Cell line metadata

Learn to pick out high level correlations

Specific problem solving

Machine Intelligence

Confidentiality Notice

This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA, UK, T: +44(0)203 749 5000, www.astrazeneca.com
