Artificial Intelligence and the Future of Early Drug Discovery
Adam Corrigan
IMED Biotech Unit ¦ Discovery Sciences ¦ Discovery Biology 2
Reasons why drugs fail
Lessons Learned from AZ Drug Pipeline
Implementation of the 5R framework is improving success rates
Candidate Drug nomination to Phase III completion has improved from 4% (2005-2010) to 19% (2012-2016)
Drugs are still failing as a result of lack of efficacy & safety
Focus on deeper scientific understanding of disease biology
Requires models that emulate human physiology to predict drug safety and efficacy
Morgan P et al. (2018) Nature Reviews Drug Discovery
The 5R Framework
Primary Reasons for Project Failure
Introduction to Quantitative Biology (QuBi)
Bioinformatics
Image Analytics
Statistics
Quantitative Biology
Claus Bendtsen
Phenotypes
Screening
Microenvironments Morphology
Single Cell
Artificial Intelligence
Outline
Efficient generation of training data
• Phenotype classification of time lapse imaging
• Reducing the number of experiments required
Artificial Intelligence in Cellular Screening
• Semantic segmentation for label-free imaging
• Textural feature classification of complex cell models
• Unsupervised hit/phenotype discovery
Outlook
• Challenges
• Horizon scanning
Efficient Generation of Training Data
Time Lapse Analysis of Mitotic Phenotypes
• Understanding when in the cell cycle defects appear sheds light on the mechanism of action
• Time lapse imaging allows cell fate to be linked to the point at which the defect originates
Building training data more efficiently
IMED Biotech Unit ¦ Discovery Sciences ¦ Quantitative Biology
Automated cell tracking, but identification of subtle phenotypes requires time-consuming manual annotation
Choose the most informative cells to expand training set through manual classification
Classes: Normal, Abnormal, Mitotic
Based on initial training set calculate the confidence of classification
Cells with high classification probabilities would add very little information to the model
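The selection step above can be sketched as uncertainty sampling: rank cells by how unsure the current classifier is and send only the most ambiguous ones for manual annotation. This is a minimal sketch, not the method used in the talk; the entropy criterion, the function names and the example probabilities are all assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_for_annotation(predictions, budget):
    """Return ids of the `budget` cells whose predictions are least certain.

    `predictions` maps a cell id to its predicted probabilities for the
    classes (Normal, Abnormal, Mitotic), in that order.
    """
    ranked = sorted(predictions, key=lambda cid: entropy(predictions[cid]), reverse=True)
    return ranked[:budget]

# Hypothetical classifier outputs for four tracked cells (illustrative only).
predictions = {
    "cell_a": [0.98, 0.01, 0.01],  # confident: adds little information
    "cell_b": [0.40, 0.35, 0.25],  # ambiguous: worth annotating
    "cell_c": [0.05, 0.90, 0.05],
    "cell_d": [0.50, 0.48, 0.02],
}
to_annotate = select_for_annotation(predictions, budget=2)
print(to_annotate)
```

After the selected cells are labelled by hand, the model would be retrained and the ranking repeated until the annotation budget is spent.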
Active Machine Learning for training complex models – generating the right training data effectively
DMSO Control: Control cell with normal cell division
Treatment: Treated cell with elongated and abnormal mitosis, resulting in catastrophe (death)
Becoming CHEAPER with AI
By only conducting experiments when needed
Extract features (signature descriptor)
Build/Apply ML model (Mondrian TCP on SVM)
Figure axes: p-value (class 🙂); p-value (class 🙂)
Figure regions: Don’t synthesize; Don’t test
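The per-class p-values on this slide can be illustrated with a class-conditional (Mondrian) conformal predictor. The sketch below assumes that TCP stands for transductive conformal prediction, and it deliberately simplifies: instead of retraining an SVM for every candidate label, it compares a precomputed nonconformity score against a fixed calibration set (the inductive variant). The scores, class names and significance level are hypothetical.

```python
def mondrian_p_value(calibration, label, test_score):
    """Class-conditional conformal p-value for assigning `label` to a new compound.

    `calibration` maps each class label to the nonconformity scores of
    calibration compounds known to belong to that class. Only scores from
    the same class are compared (the Mondrian condition).
    """
    scores = calibration[label]
    at_least_as_strange = sum(1 for s in scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(scores) + 1)

def prediction_set(calibration, test_scores, significance):
    """Labels that cannot be rejected at the chosen significance level."""
    return {
        label
        for label, score in test_scores.items()
        if mondrian_p_value(calibration, label, score) > significance
    }

# Hypothetical nonconformity scores (higher = less typical of the class),
# e.g. derived from the distance to an SVM decision boundary.
calibration = {
    "active": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "inactive": [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.7],
}
# Scores of one untested compound under each candidate label.
test_scores = {"active": 0.95, "inactive": 0.25}

labels = prediction_set(calibration, test_scores, significance=0.2)
# If only "inactive" survives, the experiment could be skipped.
skip_experiment = labels == {"inactive"}
print(labels, skip_experiment)
```

A compound whose prediction set contains both labels, or neither, is one the model cannot decide on at that significance level, so it is a candidate for an actual experiment.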
Machine Learning in Cellular Screening
Image-based Transcriptomics - MERFISH
>> detailed understanding of primary disease and improved model characterisation
1) Understand disease biology
2) Drive model selection
3) Develop better reagents
> TI, TV
Villani et al., 2017, Science
Identification of new dendritic cell populations
Identify relevant cell population
Cell lineage and state classification
Characterise disease
http://www.sciencemag.org/cgi/doi/10.1126/science.aaa6090
Moffitt et al., PNAS, 2016, 113 (39) 11046
MERFISH workflow
hyb1, hyb2
Pre Post
~0.5 px localisation accuracy
smFISH spot detection
Round-to-round registration
Barcode assignment
Downstream analysis
PCA
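The barcode assignment step can be sketched as nearest-codeword decoding: each detected spot yields an on/off bit per imaging round, and the resulting bit string is matched to the closest barcode in the codebook, tolerating a single bit error. This is a simplified illustration only; the 8-bit codebook, the gene names and the one-bit-per-round reading are assumptions, not the codebook used in the work presented.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(observed, codebook, max_distance=1):
    """Return the gene whose barcode is uniquely closest to `observed`.

    Returns None when no barcode lies within `max_distance` bit flips, or
    when the closest match is tied between genes.
    """
    distances = sorted((hamming(observed, code), gene) for gene, code in codebook.items())
    best_distance, best_gene = distances[0]
    if best_distance > max_distance:
        return None
    if len(distances) > 1 and distances[1][0] == best_distance:
        return None
    return best_gene

# Hypothetical 8-bit codebook; every pair of barcodes differs in 4 bits.
codebook = {
    "GENE_A": "11110000",
    "GENE_B": "11001100",
    "GENE_C": "10101010",
}
# "1" means a spot was detected at this location in that imaging round.
print(assign_barcode("11110000", codebook))  # exact match
print(assign_barcode("11100000", codebook))  # one dropped bit, still correctable
print(assign_barcode("00001111", codebook))  # too far from every barcode
```

Keeping a minimum Hamming distance of 4 between barcodes is what allows a single missed or spurious spot to be corrected rather than misassigned.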
Label-free Semantic Segmentation for Assay Multiplexing –
more biology at lower cost
Markers: FKBP5, Marker 1, Marker 2, Marker 3, Marker 4
The problem:
Using nuclear and cell markers limits colours available to investigate biology
U-net Deep Neural Network
Label-free phase contrast image
Multiple endpoints
Subpopulations
Segmentation
No nuclear stain
No cell marker
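One way to check a label-free segmentation is to compare the predicted mask with a reference mask derived from a conventional nuclear or cell stain. The slide does not say which metric was used; the Dice overlap below is an assumption chosen for illustration, and the two small masks are made up.

```python
def dice_score(predicted, reference):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    overlap = predicted_total = reference_total = 0
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            overlap += p * r
            predicted_total += p
            reference_total += r
    if predicted_total + reference_total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * overlap / (predicted_total + reference_total)

# Hypothetical masks: network output vs. stain-derived reference.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
reference = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
score = dice_score(predicted, reference)
print(round(score, 3))
```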
Cell Division Cycle – more information from fewer labels
Transfer learning – repurpose pretrained network to predict new output at single cell level
Figure: G1 and G2 populations, Fluorescent DNA Stain vs. Label free measurement
Greater sensitivity through machine-driven assay development
Positive Control
Negative Control
ATPB Phalloidin DAPI
ATPB: DAPI
Ki67 DAPI
CC3: DAPI
Textural & Morphological Features
• Microphysiological Organ-Chips have potential for better translation to humans
• Systems require validation – methods from 2D not directly applicable
• For more subtle phenotypes, manual design of endpoint is difficult
Train
Automated Microscopy
Test compound
Automatic Outlier Detection
Outlier ≠ Artefact (could also be uncommon phenotype)
Rapid manual annotation of latent feature space
Deep Variational AutoEncoder (VAE) – unsupervised encoding of image appearance into spatial dimensions.
Points close together = similar images
Automatically identify odd images as low density regions in feature space
Dense – many similar images
Sparse – rare image appearance
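The idea of flagging images that sit in low density regions can be sketched with a nearest-neighbour distance: a point whose k-th nearest neighbour is far away lies in a sparse part of the latent space. This is a minimal sketch that assumes the VAE encoding has already been computed; the density estimate, the threshold rule (`factor` times the median) and the coordinates are illustrative assumptions, not the approach described in the talk.

```python
import math

def knn_distance(point, others, k):
    """Distance from `point` to its k-th nearest neighbour among `others`."""
    distances = sorted(math.dist(point, other) for other in others)
    return distances[k - 1]

def flag_sparse_points(latent, k=2, factor=3.0):
    """Indices of points whose k-NN distance exceeds `factor` x the median."""
    scores = []
    for i, point in enumerate(latent):
        others = latent[:i] + latent[i + 1:]
        scores.append(knn_distance(point, others, k))
    median = sorted(scores)[len(scores) // 2]
    return [i for i, score in enumerate(scores) if score > factor * median]

# Hypothetical 2-D latent coordinates: a dense cluster plus one isolated image.
latent = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
    (3.0, 3.0),
]
outliers = flag_sparse_points(latent)
print(outliers)
```

Flagged images would still need a manual look, since a sparse point may be an artefact or an uncommon phenotype.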
Automatic Assay Development
Clusters are most likely different phenotypes
Rapid annotation of training data
Training can come from image appearance, or from prior information (e.g. clinical toxicity)
Deep Variational AutoEncoder (VAE) – encoding image appearance into spatial dimensions.
Points close together = similar images
Dense – many similar images
Sparse – rare image appearance
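Rapid annotation from clusters can be sketched as: group the latent points, have an expert label one image per cluster, then propagate that label to the other members. The slide does not name a clustering method; plain k-means with a deterministic farthest-point initialisation is swapped in here as an assumption, and the coordinates and phenotype names are made up.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def kmeans(points, k, iterations=20):
    """Plain k-means with farthest-point initialisation; returns cluster ids."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return [nearest(p, centroids) for p in points]

def propagate_labels(cluster_ids, annotations):
    """Give every image the label annotated for one member of its cluster."""
    cluster_label = {cluster_ids[i]: label for i, label in annotations.items()}
    return [cluster_label.get(c) for c in cluster_ids]

# Hypothetical 2-D latent coordinates forming two groups.
latent = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
cluster_ids = kmeans(latent, k=2)
# An expert annotates image 0 and image 3 only.
labels = propagate_labels(cluster_ids, {0: "phenotype A", 3: "phenotype B"})
print(labels)
```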
Looking to the Future
Challenges and Areas of Research
Dealing with limited training data: Transfer learning, Data Augmentation, Unsupervised pre-training, Multi-task Learning
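Of the approaches listed, data augmentation is the simplest to illustrate. The sketch below generates the eight rotated and mirrored variants of an image, on the assumption that orientation carries no meaning in a microscopy field of view, so each variant keeps the original label. The function names and the tiny 2x2 image are hypothetical.

```python
def rotate90(image):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def flip_horizontal(image):
    """Mirror a 2-D image left to right."""
    return [row[::-1] for row in image]

def dihedral_augment(image):
    """Return the 8 rotated/mirrored variants of `image`, original first."""
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants

# Tiny stand-in for a single-cell image crop.
image = [
    [1, 2],
    [3, 4],
]
augmented = dihedral_augment(image)
print(len(augmented))
```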
Learning to integrate multiple experiments/data types: Literature mining, Feature Encoding, Metadata standards
Continuous Technology Development: Sequence to sequence encoding (Vinyals et al., 2015), Adversarial Networks (Isola et al., CVPR’17)
Hierarchy of AI Applications
Image Segmentation
• Train machine to extract more information from fewer labels
• Single cell phenotypes
Figure: t-SNE plot (axes: tsne-x, tsne-y)
Image Classification
• Phenotypic screening
• Artefact Detection
• Translation prediction
Dataset annotation
• Image features
• Compound information
• Cell line metadata
Learn to pick out high level correlations
Specific problem solving
Machine Intelligence