Drug Discovery Grid -- A real grid application
Zhang Wenju, Shen Jianhua
Shanghai Institute of Materia Medica, CAS
Shanghai Jiaotong University
Jiangnan Institute of Computing
The University of Hong Kong
Agenda
1. DDGrid Introduction
2. DDGrid Architecture
3. DDGrid Application
4. DDGrid Demo
Background
Large-scale High-throughput Virtual Screening
in Silico: The computational analysis of chemical databases to identify compounds appropriate for a given biological receptor.
in Vitro: Identification of new compounds showing some activity against a target biological receptor, and the progressive optimization of these leads to yield a compound with improved potency and physicochemical properties in vitro.
in Vivo: Eventually, improved efficacy, pharmacokinetics, and toxicological profiles in vivo.
Process of Drug Discovery and Design
[Figure: drug discovery pipeline]
Random Screening (10,000 ~ 20,000 compounds) → Leads and Opt. → Drug Candidate → Pre-clinic → Clinic (phase I, II, III) → Market
Stage durations shown: 2-3 years, 3-4 years, 2-3 years, 2-3 years
Time: 10-12 years
Money: several billion dollars
Computer-Aided Drug Design
DDGrid overview
◆ The Drug Discovery Grid (DDGrid) project aims to build a collaboration platform for drug discovery using state-of-the-art grid computing technology.
◆ The project targets large-scale, computation- and data-intensive scientific applications in medicinal chemistry and molecular biology, using grid middleware developed by our team.
◆ Chemical databases of over one million compounds, with 3-D structures and physicochemical properties, are provided to identify potential drug candidates. Users can also build and maintain their own customized ligand databases and share them on the grid platform.
DDGrid Architecture
[Figure: Users connect through the Internet to the Global Server, which connects through the Internet to multiple Slave Servers.]
DDGrid Architecture
[Figure: architecture diagram with the user terminals highlighted.]
User terminals: resource monitoring, job submission and monitoring, input and parameter setup, and result viewing and download through the Web Portal.
DDGrid Architecture
[Figure: architecture diagram with the Global Server highlighted.]
Global Server functions:
•User interface
•Resource management
•Job submission and monitoring
•Key and certificate management
•Result analysis
•Global scheduling
•Visualization
•Distributed CDB
DDGrid Architecture
[Figure: architecture diagram with the Slave Servers highlighted.]
Slave Server functions:
•Local job management
•Local resource management
•Local CDB management
•Data encryption and decryption
•Local result assimilation
DDGrid Workflow
User → Global Server: Job Submit
Global Server → User: ID and Result Return
Global Server (Monitoring, Work Pool, Resource Manag., Assimilation of Results)
  ↓ Job Dispatch    ↑ Return of Result, New job request
Slave Server (Local Resource Manag., Monitoring, Local Work Pool, Assimilation of Results)
  ↓ Job Dispatch    ↑ Return of Result, New job request
Computational Client (Docking)
Message format: xml
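The two-level work pool in this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names WorkPool, dock, and run_job are hypothetical, the "docking" step is a stand-in that returns the length of the ligand name, and everything runs in one process rather than across sites.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkPool:
    """FIFO pool of pending work units plus collected results (hypothetical)."""
    pending: deque = field(default_factory=deque)
    results: dict = field(default_factory=dict)

    def dispatch(self):
        # Hand out the next unit, or None when the pool is empty.
        return self.pending.popleft() if self.pending else None

    def assimilate(self, unit_id, result):
        # Record a returned result under its work-unit id.
        self.results[unit_id] = result


def dock(ligand):
    # Stand-in for a docking run on a computational client.
    return len(ligand)


def run_job(ligands, n_slaves=2):
    global_pool = WorkPool(deque(enumerate(ligands)))
    slaves = [WorkPool() for _ in range(n_slaves)]
    while global_pool.pending:
        for slave in slaves:
            # The slave server sends a "new job request" to the global server.
            unit = global_pool.dispatch()
            if unit is None:
                break
            slave.pending.append(unit)
            # A computational client pulls the unit from the slave and docks it.
            unit_id, ligand = slave.dispatch()
            slave.assimilate(unit_id, dock(ligand))
    # Each slave returns its assimilated results to the global server.
    for slave in slaves:
        for unit_id, result in slave.results.items():
            global_pool.assimilate(unit_id, result)
    return global_pool.results
```

Calling run_job(["a", "bb", "ccc"]) spreads three units over two slave pools and returns one result per unit id at the global level.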
DDGrid security
1. PKI-based security
2. All participating sites must hold a certificate issued by our CA
3. All deployed databases and results are encrypted
4. All message passing is SSL/TLS-enabled
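Points 2 and 4 amount to mutual TLS against a private CA. A minimal sketch with Python's standard ssl module follows; it is not DDGrid's actual implementation, and the function name make_server_context and its file-path parameters are assumptions made for illustration.

```python
import ssl


def make_server_context(ca_file=None, cert_file=None, key_file=None):
    """Build a server-side TLS context that demands a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Refuse peers that do not present a certificate signed by a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # Trust only the grid's own CA (the path is supplied by the caller).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # This site's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

With the CA file set to the grid's own CA certificate, only sites holding a certificate issued by that CA can complete the handshake.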
DDGrid Web Portal
Test Case 1
Virtual screening of 20,000 compounds
Involved Sites:
  Shanghai Inst. of M. M. (SIMM)    Alpha Cluster (32 CPU)
  Beijing Mol. Ltd.                 Sunway Cluster (224 CPU)
  The Univ. of Hong Kong            Gideon Cluster (16 CPU)
  Shanghai SuperComp. Centre        Dawning 4000A
  Dalian Univ. of Tech.             Dawning 4000A
  London e-Science Centre           Mars Cluster
Time consumed:
  5946 sec (approx. 99 min)
Data Sets (CDB): Specs
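The figures for this test case give a rough aggregate throughput, which the short calculation below works out. CPU counts are listed for only three of the six sites, so the sketch reports those separately and does not attempt a per-CPU rate; the variable names are illustrative.

```python
compounds = 20_000
elapsed_s = 5946

minutes = elapsed_s / 60            # wall-clock time in minutes
throughput = compounds / elapsed_s  # compounds screened per second, all sites

# CPU counts stated on the slide (SIMM, Beijing Mol. Ltd., HKU only).
known_cpus = 32 + 224 + 16

print(f"{minutes:.1f} min, {throughput:.2f} compounds/s, "
      f"{known_cpus} CPUs listed for three of six sites")
```

This comes to about 3.4 compounds per second across the whole grid.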
Job scheduling
Visualisation of Docking Result
DDGrid message passing
<scheduler_request>
  <authenticator>3333</authenticator>
  <hostid>102</hostid>
  <rpc_seqno>2401</rpc_seqno>
  <platform_name>i686-pc-linux-gnu</platform_name>
  <core_client_major_version>2</core_client_major_version>
  <core_client_minor_version>19</core_client_minor_version>
  <idle_ncpu>16</idle_ncpu>
  <project_disk_usage>5315768.000000</project_disk_usage>
  <total_disk_usage>68417940.000000</total_disk_usage>
  <code_sign_key> … </code_sign_key>
  <projects>
    <project>
      <master_url>http://www.ddgrid.ac.cn/ddg/</master_url>
      <resource_share>100.000000</resource_share>
    </project>
  </projects>
  <result> … </result>
  …
  <host_info> … </host_info>
</scheduler_request>
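As a rough illustration of how a receiver might read such a request, the sketch below parses a trimmed copy of the message with Python's standard xml.etree.ElementTree. The function name summarize_request and the keys of the returned dictionary are hypothetical, and the elided elements are left out of the sample.

```python
import xml.etree.ElementTree as ET

REQUEST = """<scheduler_request>
  <authenticator>3333</authenticator>
  <hostid>102</hostid>
  <rpc_seqno>2401</rpc_seqno>
  <platform_name>i686-pc-linux-gnu</platform_name>
  <core_client_major_version>2</core_client_major_version>
  <core_client_minor_version>19</core_client_minor_version>
  <idle_ncpu>16</idle_ncpu>
  <projects>
    <project>
      <master_url>http://www.ddgrid.ac.cn/ddg/</master_url>
      <resource_share>100.000000</resource_share>
    </project>
  </projects>
</scheduler_request>"""


def summarize_request(xml_text):
    """Extract the fields a scheduler would need to pick work for a host."""
    root = ET.fromstring(xml_text)
    major = root.findtext("core_client_major_version")
    minor = root.findtext("core_client_minor_version")
    return {
        "host_id": int(root.findtext("hostid")),
        "platform": root.findtext("platform_name"),
        "client_version": f"{major}.{minor}",
        "idle_cpus": int(root.findtext("idle_ncpu")),
        "master_urls": [p.findtext("master_url") for p in root.iter("project")],
    }
```

The platform name and client version are the same values that appear in the executable file name gridapp_2.19_i686-pc-linux-gnu used elsewhere in the messages.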
DDGrid message passing
<scheduler_reply>
  <message priority="low">No work available</message>
  <project_name>Ddg</project_name>
  <user_name>sss</user_name>
  <code_sign_key> … </code_sign_key>
  …
  <workunit> … </workunit>
  <preferences>
    <low_water_days>1.2</low_water_days>
    <high_water_days>2.5</high_water_days>
    <disk_max_used_gb>0.4</disk_max_used_gb>
    <disk_max_used_pct>50</disk_max_used_pct>
    <disk_min_free_gb>0.4</disk_min_free_gb>
    …
  </preferences>
  …
</scheduler_reply>
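A client receiving this reply needs the message, its priority, and the numeric preferences. The following sketch shows one way to pull them out with xml.etree.ElementTree; parse_reply and its result keys are hypothetical names, and the sample omits the elided elements (including the workunit).

```python
import xml.etree.ElementTree as ET

REPLY = """<scheduler_reply>
  <message priority="low">No work available</message>
  <project_name>Ddg</project_name>
  <user_name>sss</user_name>
  <preferences>
    <low_water_days>1.2</low_water_days>
    <high_water_days>2.5</high_water_days>
    <disk_max_used_gb>0.4</disk_max_used_gb>
    <disk_max_used_pct>50</disk_max_used_pct>
    <disk_min_free_gb>0.4</disk_min_free_gb>
  </preferences>
</scheduler_reply>"""


def parse_reply(xml_text):
    """Return the message, its priority, and preferences as floats."""
    root = ET.fromstring(xml_text)
    msg = root.find("message")
    # Every child of <preferences> in the sample holds a plain number.
    prefs = {child.tag: float(child.text) for child in root.find("preferences")}
    return {
        "message": msg.text,
        "priority": msg.get("priority"),
        "has_work": root.find("workunit") is not None,
        "preferences": prefs,
    }
```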
DDGrid message passing
<workunit>
  <file_info>
    <number>0</number>
  </file_info>
  <file_info>
    <number>1</number>
  </file_info>
  <file_info>
    <number>2</number>
  </file_info>
  …
  <file_ref>
    <file_number>0</file_number>
    <open_name>tabfile</open_name>
  </file_ref>
  <file_ref>
    <file_number>1</file_number>
    <open_name>infile</open_name>
  </file_ref>
  <file_ref>
    <file_number>2</file_number>
    <open_name>sphfile</open_name>
  </file_ref>
  <command_line>-business</command_line>
</workunit>
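In this work unit, each file_ref points at a numbered file_info and gives the name under which the application opens it. The sketch below resolves that mapping and rejects references to undeclared files; the function name input_files is hypothetical, and the validation rule is an assumption about how a client might check the message, not something the slide states.

```python
import xml.etree.ElementTree as ET

WORKUNIT = """<workunit>
  <file_info><number>0</number></file_info>
  <file_info><number>1</number></file_info>
  <file_info><number>2</number></file_info>
  <file_ref><file_number>0</file_number><open_name>tabfile</open_name></file_ref>
  <file_ref><file_number>1</file_number><open_name>infile</open_name></file_ref>
  <file_ref><file_number>2</file_number><open_name>sphfile</open_name></file_ref>
  <command_line>-business</command_line>
</workunit>"""


def input_files(xml_text):
    """Map each referenced file number to its open name; return the command line too."""
    root = ET.fromstring(xml_text)
    declared = {int(fi.findtext("number")) for fi in root.findall("file_info")}
    names = {}
    for ref in root.findall("file_ref"):
        number = int(ref.findtext("file_number"))
        if number not in declared:
            raise ValueError(f"file_ref {number} has no matching file_info")
        names[number] = ref.findtext("open_name")
    return names, root.findtext("command_line")
```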
DDGrid message passing
…
<project>
  <scheduler_url>http://www.ddgrid.ac.cn/ddg_cgi/cgi</scheduler_url>
  <master_url>http://www.ddgrid.ac.cn/ddg/</master_url>
  <project_name>Ddg</project_name>
</project>
<app>
  <name>gridapp</name>
</app>
<file_info>
  <name>gridapp/gridapp_2.19_i686-pc-linux-gnu</name>
  <nbytes>260754.000000</nbytes>
  <max_nbytes>0.000000</max_nbytes>
  <executable/>
  <signature_required/>
  <file_signature> … </file_signature>
  <url>http://www.ddgrid.ac.cn/ddg/download/gridapp_2.19_i686-pc-linux-gnu</url>
</file_info>
<file_info> … </file_info>
…
DDGrid Resources
Computational and Data Resources Integration
Resources aggregated
  SIMM Sunway 32A Cluster
  Beijing Molecule Inc. Sunway 256P Cluster
  HKU Gideon 300 Cluster
  SSC Dawning 4000A
  LeSC Mars Cluster (Test only)
  Singapore Poly-tech Univ.
  Dalian Univ. of Technology
  Shanghai Jiaotong Univ.
Heterogeneous resources
  OS: IRIX, Digital Unix, Linux (IA32, x86_64)
  CPU: R12000, Alpha, Pentium, AMD
DDGrid Resources
DDGrid Apps.
1. Docking pre-process software: Combimark
2. Docking software
   1) Dock (UCSF)
   2) gsDock (SIMM)
3. CDB build and maintain S/W: Combilib
4. AutoDock
5. AutoGrid
6. Visualisation
7. Security-related tools
[Figure: flowchart of a screening run. Steps: start, InputFile, Preprocess, Dock, Drug-like, Analysis, Experiment, end. Databases: Fixed CDB, New CDB (CDBGen., CDBPara.)]
DDGrid Resources
Chemical Databases (CDB)
Each ligand record in a chemical database holds the 3D structural information of a compound. A CDB can contain on the order of tens of thousands of compounds, and its size can range from tens of megabytes to gigabytes and even terabytes.
1. Static databases: purchased from commercial chemical companies.
   Available Chemical Directory (ACD)
   Chinese natural product database (CNPD)
   SPECS database
   chemical ADME/T database, etc.
2. Dynamic databases: built by users themselves and deployed automatically.
Deployed commercial CDB (approx. 700,000)
Name of Database    Description
Specs               Provides about 230,000 compounds
CMC-3D              Provides 3D models and important biochemical properties (including drug class, logP, and pKa values) for over 8,400 pharmaceutical compounds
ACD-3D              Provides 200,000 commercially available 3D compounds
NCI-3D              213,000 compounds with 2D information from the National Cancer Institute
CNPD                Collects 12,000 Chinese natural products with chemical structures
TCMD                9127 compounds and 3922 herbs
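The "approx. 700,000" heading can be checked against the per-database figures in the table. The sketch below simply adds them up; the counts for Specs ("about") and CMC-3D ("over") are approximate in the source, so the sum is only a rough figure, and the dictionary name is illustrative.

```python
# Compound counts as given in the table (Specs and CMC-3D are approximate).
deployed = {
    "Specs": 230_000,
    "CMC-3D": 8_400,
    "ACD-3D": 200_000,
    "NCI-3D": 213_000,
    "CNPD": 12_000,
    "TCMD": 9_127,
}

total = sum(deployed.values())
print(f"{total:,} compounds across {len(deployed)} databases")
```

The sum is 672,527, which is consistent with the rounded figure of about 700,000.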
Vendor            Num. of Mol.    Vendor                       Num. of Mol.
ACB-Eurochem      98603           Maybridge                    53042
Ambinter          533866          Nanosyn                      68317
Asinex            293385          National Cancer Institute    223536
ChemBridge        562624          Otava                        181195
ChemDiv           361859          Peakdale                     9632
ComGenex          38590           Pharmeks                     116355
Enamine           533111          PubChem                      164031
IBScreen          452728          Ryan Scientific              64205
InterChim         288882          Sigma-Aldrich                49022
KeyOrganics       22294           Specs                        307550
Life Chemicals    44762           TimTec                       127173
approx. 3,300,000 compounds
CDB example : CNPD-China Natural Products Database
CDB example : CNPD
CNPD: The first and only comprehensive source of chemical, structural and bibliographic data on all known natural products in China.
CNPD serves as an information source for chemical, physical, and biological properties and for literature, which makes it useful to scientists in the pharmaceutical industry.
CNPD can be searched in flexible ways: structure, sub-structure, name, molecular formula, molecular weight, CAS registry number, category, etc.
CNPD: Traditional Chinese Medicine (TCM) applications are pre-indexed in CNPD to provide hints for lead compound discovery.
CDB example : CNPD
CDB example : TCMD
TCMD-Traditional Chinese Medicine Database
TCMD is a bibliographical database of approximately 20,000 records with abstracts of TCM articles. Relevant articles are selected from among 150-200 journals from Mainland China, Taiwan, and Hong Kong (most of them are Chinese); English abstracts are written for the selected articles and other pertinent information is translated into English.
CDB example : TCMD
DDGrid applications in reality
SIMM carried out anti-SARS and anti-diabetes drug research using the DDGrid
1. Anti-SARS drug research
2. Anti-diabetes drug research
Research on Anti-SARS medicine
Virtual screening of the Comprehensive Medicinal Chemistry-3D (CMC-3D) database, which contains 7,900 compounds, found that cinanserin has a distinct anti-SARS effect.
Department of Virology, Bernhard-Nocht-Institute for Tropical Medicine, Germany
Research Department, Cantonal Hospital St Gallen, Switzerland
“Basically your inhibitor turned out to be the best compound we have tested so far!”
Applied for domestic patent 03129071.x and PCT patent pi034248
Found an anti-diabetes lead better than Rosiglitazone by targeting PPAR, through virtual screening, optimization design, synthesis, and biological and pharmacological testing.
CADD process
[Figure: chart of compound counts per CADD stage on a logarithmic axis (10 to 10M). Data labels: 2,400,000; 10,000; 500; 300; 142; 76; 800,000; 200,000; 138; 14]
Research on anti-diabetes medicine
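[Figure: screening funnel; stage order inferred from the order of the labels]
Virtual screening: 2.4 m → 10 t → 500 → 142 → 76 (protein testing)
Composite design: 400 t → virtual screening → 85 → 48 (synthesis) → 8 (cell testing) → 4 (animal testing) → 1 (comprehensive evaluation)
Protein testing: 48 with KD<1 M; 22 with KD<0.1M; KD<100M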
Research on anti-diabetes medicine
Manual screening
New anti-diabetes drug
Current Progress
1. Applied for patent 200410016460.X and a PCT patent
2. Safety testing and pre-clinical research
What does the DDGrid provide ?
1. Drug Design Collaboration Platform
   Large-scale virtual screening platform
   Sharing large CDB
2. Computational Resources Sharing
   SIMM/SSC/HKU/Mol. Ltd/SJTU/DUT
3. Data Resources Sharing
   Pre-deployed commercial CDB (ACD/CNPD …)
   Sharing self-made CDB
4. Medicinal chemistry text and structure search
5. Customization and Extension
Collaboration
Selected Users of DDGrid
DDGrid Demo
http://www.ddgrid.ac.cn
Demo
Thank you !
Q&A