How and Why Novartis is Exploiting GRID Technology: HPC and Semantic Web
Prof. Manuel C. Peitsch, PhD
Global Head of Systems Biology
HPTS Asilomar / M. Peitsch / September, 2005
Mechanism-based Drug Discovery
Understanding Disease
Pathways elucidation
Target validation
Clinical PoC
New drug candidates (to be tested in PoC studies)
Reduce project life cycle
Increase PoS after D3 (Lead optimisation)
The Challenges of Drug Discovery
Systems Biology: Combination of *Omics & Mathematical Modelling
Japan: Oncology, Diabetes, Cardiovascular
Austria: Autoimmunity
Great Britain: Respiratory, Gastrointestinal
Switzerland: Muscular and Bone, Nervous system, Oncology, Transplantation, Ophthalmology, Genome and Proteome Sciences, Discovery Technologies, Discovery Chemistry, Protease Platform, GPCR
United States: Diabetes, Infectious diseases, Cardiovascular, Oncology, Discovery Technologies, Discovery Chemistry, Animal Models, Pathways, Genome and Proteome Sciences
Organizational complexity
Data and Information complexity
Raw data from instruments
Literature
Molecular Structure
[Figure: annotated MS/MS spectrum of the 48-residue peptide SSLLEKGLDGAKKAVGGLGKLGKDAVEDLESVGKGAVHDVKDVLDSVL, with labelled b- and y-ion fragments. x-axis: Mass (m/z), 1500 to 5000; y-axis: % Intensity]
Genomics and Proteomics
The Vision
Computational life science and HPC
GRIDs
People Networks
Data, Information and Knowledge GRID
Knowledge Space / Semantic Web
Enable and transform the Drug Discovery process through:
- Comprehensive and reliable Data and Information
- Seamless information integration for easy navigation
- Turning Data into Knowledge using in silico science
- Simulate biomolecular processes using in silico science
- E-Collaboration and v-communities
Computational Aspects in Drug Discovery
Target finding
Target validation
Lead finding
Lead optimisation
Bioinformatics Lab
Macromolecular Structure & Function Lab
Computational Chemistry Lab
Signal Transduction Networks
[Figure: time-course plots. Panel labels: control, drug, cyto, nuc; x-axis: time]
Mathematical Representation
Omics experiments
Human data
SNP, DNA Sample, Sequencing
DB, DB
SAP, Translate & Map/Align
Model & Map
DB, DB
Disease association, Validated Targets
Virtual Drug Discovery, In Silico Docking
In Silico “Chemogenomics”, Virtual Library Design, Predictive MedChem
Tox PK/PK ADME modelling
Functional and Structural insights
DB, Kinases, NR, Proteases
Structures & Modelling templates
Proteins
Compounds
QSAR
In Silico Drug Discovery
3D-Crunch
In Silico Drug Discovery Pipeline: Can it be done?
Productive Automated Protein modelling email server
Productive Automated Protein modelling Web server
Genome scale Automated Protein modelling
SETI@Home
1990 1995 2000 2005
Protein Model Structure database
SETI@Home recognised as a leading new concept (ComputerWorld Award)
SWISS-MODEL and 3D-Crunch recognised as a leading new concept (ComputerWorld Award)
GeneCrunch
GeneCrunch recognised as a leading new concept (ComputerWorld Award)
First PC-GRID at Novartis
Docking in production at Novartis
Automated ToxCheck and other CIx tools
Full Transcriptome Modelling at Novartis
First automated pipelines
UD recognised for visionary use of information technology in the category of Medicine (ComputerWorld Award)
In Silico Drug Discovery and Chemogenomics pipeline
Novartis’ HPC Grid Strategy
Linux Clusters Shared Servers
PC GRID
External Collaborations
Job submission layer
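The diagram places one job submission layer in front of the Linux clusters, shared servers, PC GRID and external collaborations. A minimal sketch of that idea follows. The backend names and the "first available backend" routing rule are assumptions made for illustration, not a description of the Novartis implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """One compute resource behind the submission layer (names are illustrative)."""
    name: str
    available: bool = True
    jobs: list = field(default_factory=list)

    def run(self, job: str) -> str:
        # Record the job and report where it was placed.
        self.jobs.append(job)
        return f"{job} -> {self.name}"


class SubmissionLayer:
    """Single entry point that routes a job to the first available backend."""

    def __init__(self, backends):
        self.backends = list(backends)

    def submit(self, job: str) -> str:
        for backend in self.backends:
            if backend.available:
                return backend.run(job)
        raise RuntimeError("no backend available")


layer = SubmissionLayer([
    Backend("linux-cluster"),
    Backend("shared-server"),
    Backend("pc-grid"),
])
first = layer.submit("docking-batch-1")
layer.backends[0].available = False  # simulate an outage of the cluster
second = layer.submit("docking-batch-2")
```

Because callers only see `submit`, the same job can land on a different platform when one is down, which is the property the later "no service downtime" point relies on.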
Influencing Biomolecular Processes
Target
Drug
Target = enzyme, receptor, nucleic acid, …
Ligand = substrate, hormone, other messenger, …
Target
ACTIVE
Ligand
INACTIVE
PC Grid Success Story: Protein Kinase CK2 Inhibition
Target finding:
Protein Kinase CK2 has roles in cell growth, proliferation and survival.
Protein Kinase CK2 has a possible role in cancer, and its overexpression has been associated with lymphoma.
Target validation:
To elucidate the different functions and roles of CK2 and confirm it as a drug target for oncology, one needs a potent and selective inhibitor.
Approach:
The problem was addressed by in silico screening (docking).
Virtual Screening by in silico Docking
> 400,000 Compounds
Docking process and selection of possible hits
< 10 Compounds
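The funnel from more than 400,000 compounds down to fewer than 10 can be sketched as "score everything, filter, keep the best few". The scoring function below is a deterministic stand-in, not a docking program, and the cutoff and hit count are illustrative assumptions.

```python
import heapq


def dock_score(compound_id: int) -> float:
    """Stand-in for a docking score (lower is better); NOT a real docking code."""
    pseudo = (compound_id * 2654435761) % 2**32 / 2**32  # value in [0, 1)
    return -12.0 * pseudo


def select_hits(compound_ids, max_hits=10, score_cutoff=-11.0):
    """Keep the best-scoring compounds that pass the cutoff, at most max_hits."""
    scored = ((dock_score(c), c) for c in compound_ids)
    passing = (pair for pair in scored if pair[0] <= score_cutoff)
    # nsmallest returns (score, compound_id) pairs sorted from best to worst.
    return heapq.nsmallest(max_hits, passing)


hits = select_hits(range(400_000))
```

Each compound is scored independently of the others, which is why this kind of workload splits naturally across many desktop PCs.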
Important results
Conclusion: We have identified a 7-substituted indoloquinazoline compound as a novel inhibitor of protein kinase CK2 by virtual screening of 400,000 compounds, of which a dozen were selected for actual testing in a biochemical assay. The compound inhibits the enzymatic activity of CK2 with an IC50 value of 80 nM, making it the most potent inhibitor of this enzyme ever reported. Its high potency, associated with high selectivity, provides a valuable tool for the study of the biological function of CK2.
“The reported work clearly shows that large database docking in conjunction with appropriate scoring and filtering processes can be useful in medicinal chemistry. This approach has reached a maturation stage where it can start contributing to the lead finding process. At the time of this study, nearly one month was necessary to complete such a docking experiment in our laboratory settings. The Grid computing architecture recently developed by United Devices allows us to now perform the same task in less than five working days using the power of hundreds of desktop PC’s. High-throughput docking has therefore acquired the status of a routine screening technique.”
Major benefits of GRID computing
Optimization of resource utilization:
- HPC platform usage is maximized and technology expertise is shared.
- Response to additional performance requirements is easier and faster.
- No service downtime, because the same job can run on many platforms across different sites.
Enable cross-business-unit collaboration and synergies:
- Single efficient access path to Data and Compute resources.
- Tools are easily exchanged between scientists/programs.
Favor “out of the box” thinking:
- Apply HPC to areas which one would not even have considered a year ago.
- This has created fertile ground for new paradigms in Drug Discovery, leading to Business Process transformation.
Performance of the PC-GRID (today)
Computing Power:
Theoretical >5 TeraFLOPS harvested from 3000 PCs in all geographical locations.
Acceleration of the in silico Docking process versus 1 standard 2002 PC (start of project): ~4000x
Financial:
Immediate savings in excess of $2 million.
No need for an additional data centre to support this computing power.
Optimal use of existing hardware (associates’ PCs).
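A quick back-of-the-envelope check of the figures above. The three input numbers come from the slide; the "one PC-year" job is a purely illustrative assumption.

```python
total_flops = 5e12   # ">5 TeraFLOPS" theoretical, from the slide
pc_count = 3000      # contributing PCs, from the slide
speedup = 4000       # docking acceleration vs. one 2002 PC, from the slide

# Average theoretical contribution of a single PC, in GFLOPS.
per_pc_gflops = total_flops / pc_count / 1e9

# Illustrative only: a docking run needing one year on a single 2002 PC.
single_pc_hours = 365 * 24
grid_hours = single_pc_hours / speedup

print(f"average per PC: {per_pc_gflops:.2f} GFLOPS")
print(f"one PC-year of docking on the grid: {grid_hours:.2f} hours")
```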
Building a GRID: Management focus
You need a champion!
Do not punctuate every sentence with the GRID word and avoid the Hype!
Demonstrate value through pilots: Think “Iterative Improvement”. The conceptual layers are there, prototypes are emerging, improvement and optimization are essential, maturity will follow.
Leadership, transcendence, entrepreneurship and tenacity are the essence of transformation!
Concepts are easy to draw on a napkin over beer!
But new and great things are hard to achieve!
Use external goodwill to create internal acceptance!
Peru
Community projects help with acceptance
Building a GRID: User base
You need a clearly defined and communicated HP Computing strategy.
Address unmet computational needs.
Apply HPC to areas which one would not even have considered two years ago. This has created fertile ground for new paradigms in Drug Discovery, leading to Business Process transformation.
Are all problems “GRIDable”?
Further applications:
- Sequence identification in proteomics from LC-MS/MS data
- Text Mining and semantic Web infrastructure
Building a GRID: Software
The Software licensing models will have to evolve
Do not stop because of software licensing issues.
Show success with freeware and home grown algorithms.
Demonstrate business value and cost leadership.
Opportunity to develop your own code?
Unification of HPC applications environment:
Ensure that applications can run on maximum number of systems.
Introduce HPC software management:
Influence licensing models. The classical models do not fit the GRID and HPC paradigm.
Building a GRID: PC owners
Education and awareness.
Ensure that the HelpDesk is well trained and gives the right answers.
Ensure that PC owners know about the REAL impacts, including network.
The PCs are company and not personal assets!
The strategy to use them when they are idle is not a user decision but a company decision.
Address power saving policies in a transparent manner.
Knowledge Space - Vision
The “Knowledge Space Portal” is a Drug Discovery oriented implementation of the Semantic Web. Through a single customizable interface it:
• Federates heterogeneous data resources and provides precise organization of the content
• Provides quick and intuitive access to information
• Provides data extraction, analysis and exploration tools
• Allows data integration, data exchange and interoperability of applications
• Provides mechanisms for data capture and annotation
• Provides knowledge sharing and collaborative tools
Basic principles behind the Knowledge Space
The Knowledge Space consists of:
The collection of all types of data and information within the scope of interest defined by a particular business. There is no conceptual difference between internal and external data/information.
The Meta Data and the Knowledge Map which describe the collection in terms of content and location.
The Text Mining platform which allows the identification of entities (using vocabularies) and the concepts they belong to using ontologies.
The Ultralinker, which associates identified entities and concepts with specific contextual rules.
A user interface.
What is an Ultralink?
The Ultralink is an “intelligent” context-sensitive Hyperlink created at run time by the Ultralinker.
The Ultralink is generally a menu of links instead of a single link.
This menu offers only sensible actions/options:
No dead ends due to a verification process ensuring that the link has a target.
The Ultralink provides direct interaction of any type of entity (gene name, compound name, mode of action, disease name, company name, etc.) with an appropriate set of tools and resources as defined by the rules encoded in the Ultralinker.
The Ultralink functionality allows the selection of any portion of text in the Web browser and sends it as input to the Ultralinker for analysis and menu creation.
The Ultralink allows easy navigation across the information domains contained in the Knowledge Space.
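The "no dead ends" property can be sketched as a verification step applied before the menu is built. The resource names and index contents below are hypothetical; the sketch only illustrates filtering out links that have no target.

```python
# Hypothetical index of which resources actually hold a record for an entity.
RESOURCE_INDEX = {
    "sequence_db": {"CK2", "ACE"},
    "structure_db": {"CK2"},
    "screening_hits": set(),
}


def has_target(resource: str, entity: str) -> bool:
    """Verification step: does the link point at an existing record?"""
    return entity in RESOURCE_INDEX.get(resource, set())


def build_menu(entity: str, candidate_resources):
    """Return only the menu entries whose link has a target (no dead ends)."""
    return [r for r in candidate_resources if has_target(r, entity)]


menu = build_menu("ACE", ["sequence_db", "structure_db", "screening_hits"])
```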
How the Ultralinker works
The Ultralinker is a Web service which analyses any information it receives (such as a complete web page) for recognisable entities using text mining and pattern recognition methods.
Each recognised item is mapped onto the ontologies and the Knowledge Map.
The Expert System will define what can be done with the identified entities e.g.
If a gene name is recognised then Ultralinks are created to:
get its sequence and perform sequence similarity searches;
query genetic disorder databases and map it onto the chromosome;
produce a 3D structure by comparative modelling;
look for hits from High Throughput Screening;
etc…
Automated predefined processes can thus be activated by a single click (Ultraaction or work-flow).
The Ultralinker will create a menu that will be sent to the User interface.
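The flow described above (recognise entities, map them to concepts, let rules decide which actions to offer) can be sketched in a few lines. The vocabulary, the tokenizer and the disease rule are assumptions made for illustration; the gene actions paraphrase the list on the slide. The real Ultralinker is a Web service and is not reproduced here.

```python
import re

# Toy vocabulary mapping entity names to concepts (hypothetical content).
VOCABULARY = {"CK2": "gene", "lymphoma": "disease"}

# Rules: which actions make sense for each concept.
RULES = {
    "gene": [
        "get sequence and run similarity search",
        "query genetic disorder databases",
        "build 3D model by comparative modelling",
        "look for HTS hits",
    ],
    "disease": ["search literature"],  # hypothetical rule, not from the slide
}


def ultralink(text: str) -> dict:
    """Recognise known entities in text and attach a menu of actions to each."""
    menus = {}
    for token in re.findall(r"[A-Za-z0-9]+", text):
        concept = VOCABULARY.get(token)
        if concept is not None:
            menus[token] = RULES.get(concept, [])
    return menus


menus = ultralink("Over expression of CK2 has been associated with lymphoma.")
```

Keeping the rules in a table separate from the recognition step means new actions can be added for a concept without touching the text-mining code.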
Ultralinker
Semantic Search
Text Mining
Analytics
What constitutes the Knowledge Space
Internet
Other Research Documentation
Chemistry
Biology
Literature
Comp. Inf.
Bioinformatics
Meta Data K map
Thesauri
Ontologies
Rules
Defined workflows
Knowledge Space Search Modes
Text Structure Concepts
Knowledge Space: Text search
ACE modulator; ACE-related carboxypeptidase modulator; Acrosin modulator; Aggrecanase modulator; Alpha 1 protease modulator; Alpha 1 proteinase inhibitor; Alpha 1 proteinase modulator; Aminopeptidase modulator; Amyloid protease modulator; Antitrypsin modulator; Aspartic protease modulator; Atriopeptidase modulator; Calpain inhibitor; Calpain modulator; Carboxypeptidase modulator; Caspase modulator; Cathepsin B modulator; Cathepsin D modulator; Cathepsin F modulator; Cathepsin G modulator; Cathepsin K modulator; Cathepsin L modulator; Cathepsin modulator; Cathepsin S modulator; Cathepsin V modulator; Cathepsin X modulator
Cholecystokinin modulator; Chymase modulator; Chymotrypsin modulator; Clipsin modulator; Collagenase modulator; Complement cascade modulator; Complement factor modulator; Cysteine protease modulator; Dipeptidase modulator; Elastase modulator; Endopeptidase modulator; Endothelin converting enzyme modulator; Factor IX modulator; Factor VII modulator; Factor X modulator; Factor XII modulator; Gelatinase modulator; Interleukin 1 converting enzyme modulator; Kallikrein modulator; Metalloprotease 1 modulator; Metalloprotease 11 modulator; Metalloprotease 12 modulator; Metalloprotease 13 modulator; Metalloprotease 2 modulator; Metalloprotease 3 modulator
Metalloprotease 4 modulator; Metalloprotease 7 modulator; Metalloprotease 8 modulator; Metalloprotease 9 modulator; Metalloprotease modulator; NAALADase modulator; Pepsin modulator; Peptidase modulator; Plasmepsin modulator; Plasmin modulator; Protease inhibitor; Protease stimulant; Proteasome inhibitor; Proteasome modulator; Renin modulator; Secretase modulator; Serine protease modulator; Thrombin modulator; Thrombokinase modulator; Trypsin modulator; Tryptase modulator; Ubiquitin-specific protease inhibitor; Ubiquitin-specific protease modulator; Ubiquitin-specific protease stimulant; Urokinase modulator; Viral protease modulator
Antiviral: CMV protease inhibitor; CMV protease modulator; Hepatitis C protease inhibitor; Hepatitis C protease modulator; Herpes simplex virus protease inhibitor; Herpes simplex virus protease modulator; HIV protease inhibitor; HIV protease modulator; HIV-1 protease inhibitor; HIV-1 protease modulator; HIV-2 protease inhibitor; HIV-2 protease modulator; NS3 protease inhibitor; NS3 protease modulator; Picornavirus protease inhibitor; Picornavirus protease modulator
Expansion: EMTREE + Novartis proprietary dictionary expansion for protease modulators + respective synonyms
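Dictionary expansion of this kind can be sketched as a walk over narrower terms and synonyms. The tiny thesaurus fragment below is hypothetical (only some term names are borrowed from the list above); it does not reproduce EMTREE or the Novartis dictionary.

```python
# Hypothetical thesaurus fragment: concept -> narrower terms.
NARROWER = {
    "protease modulator": [
        "protease inhibitor",
        "cathepsin modulator",
        "viral protease modulator",
    ],
    "viral protease modulator": [
        "HIV protease inhibitor",
        "NS3 protease inhibitor",
    ],
}
# Hypothetical synonym table.
SYNONYMS = {"protease inhibitor": ["proteinase inhibitor"]}


def expand(term: str) -> list:
    """Return the term plus all narrower terms and synonyms, without duplicates."""
    seen = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(NARROWER.get(current, []))
        stack.extend(SYNONYMS.get(current, []))
    return seen


terms = expand("protease modulator")
# Combine the expanded terms into one OR query for a text search engine.
query = " OR ".join(f'"{t}"' for t in terms)
```

A single user query for "protease modulator" thus reaches documents that only mention a specific narrower term or a synonym.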
Analysis tools
Display-Navigation-Ultralink
Protease modulator in Literature DB (Medline-Embase)
Sort capabilities
Easy navigation in record titles
Search report: Number of Docs, Key-words extracted
Ranking value and access to document
Document view
Take advantage of the full-text article provided by PubMed
Analysis Tools
Data Analysis: Protease modulators in CI DBs, July 2004 - ADIS & Pharmaprojects
Univariate - Companies
Univariate - MOA
Univariate - Diseases conditioned by Companies
Clustering Diseases - MOAs
Graph Navigator: Protease modulators in CI DBs, July 2004 - ADIS & Pharmaprojects
Clustering
Chemistry, Chemoinformatics and Structural Biology