![Page 1: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/1.jpg)
EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
Gergely Sipos
MTA SZTAKI Laboratory of Parallel and Distributed Systems
www.lpds.sztaki.hu
Life sciences applications on the EGEE Grid
![Page 2: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/2.jpg)
The EGEE Project
• Aim of EGEE: “to establish a seamless European Grid infrastructure for the support of the European Research Area (ERA)”
• EGEE
  – 1 April 2004 to 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids
• EGEE-II
  – 1 April 2006 to 30 April 2008
  – Expanded consortium
• EGEE-III
  – 1 May 2008 to 30 April 2010
  – Transition to sustainable model
![Page 3: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/3.jpg)
Life sciences cluster in EGEE
Life sciences is one of the strategic communities for EGEE.
• Life sciences cluster in EGEE:
  – To increase the impact of EGEE on this community
  – To drive the development of the EGEE services
  – To develop domain-specific, high-level services
  – Main topics: drug discovery, medical imaging, bioinformatics
![Page 4: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/4.jpg)
Biomed Virtual Organization
Size of the infrastructure today:
• > 250 sites in 48 countries
• > 68 000 CPU cores
• ~ 20 PB disk + tape MSS
• > 150 000 jobs/day
• > 9000 registered users
Out of which, Biomed VO:
• > 100 sites in 30 countries
• ~ 17 000 CPUs
• > 150 registered users
![Page 5: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/5.jpg)
Life sciences applications
[Figure: layered architecture of life sciences applications on EGEE]
• Applications level: Applications; Domain-specific services
• Production grid infrastructure level: EGEE middleware services; Communication layer (GEANT, Internet...); Resources
![Page 6: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/6.jpg)
Application example 1: WISDOM
[Figure: WISDOM in the layered architecture]
• Applications level: WISDOM; AMGA metadata catalog; DIANE grid job scheduler; GAP user interface module
• Production grid infrastructure level: Biomed Virtual Organization, EGEE middleware services; Communication layer (GEANT, Internet...); Resources
![Page 7: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/7.jpg)
WISDOM In silico Drug Discovery
• WISDOM: http://wisdom.healthgrid.org/
• Goal: find new drugs for neglected and emerging diseases
  – Neglected diseases lack R&D
  – Emerging diseases require very rapid response time
• Need for an optimized environment
  – To achieve production in a limited time
  – To optimize performance
• Method: grid-enabled virtual docking
  – Cheaper than in vitro tests
  – Faster than in vitro tests
![Page 8: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/8.jpg)
High throughput virtual docking
[Figure: drug discovery pipeline, from compound libraries to a drug]
• Chemical compounds: Chembridge (500,000), Drug like (500,000)
• Targets (enzymes): Plasmepsin II (1lee, 1lf2, 1lf3), Plasmepsin IV (1ls5)
• Millions of chemical compounds available in laboratories
• High Throughput Screening: 1-10$/compound, nearly impossible
• Molecular docking (FlexX, Autodock): ~80 CPU years, 1 TB data
• Computational data challenge: ~6 weeks on ~1000/1600 computers
• Hits screening using assays performed on living cells
• Leads, then clinical testing, then drug
• Tools and data: chemical compounds from ZINC; molecular docking with FlexX and Autodock; target structures from PDB; grid infrastructure from EGEE
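The idea behind high throughput virtual docking, every compound docked against every target structure, can be sketched in a few lines. This is only an illustration and not the WISDOM implementation: the target codes are taken from the slide, while the compound names, the batch size, and the helper functions `docking_tasks` and `batches` are hypothetical.

```python
from itertools import islice, product

# Target structure codes as listed on the slide.
targets = ["1lee", "1lf2", "1lf3", "1ls5"]
# Placeholder compound names (assumption): a real run would read a compound library.
compounds = [f"compound_{i:04d}" for i in range(10)]


def docking_tasks(compounds, targets):
    """Yield every (compound, target) pair; each pair is one docking simulation."""
    return product(compounds, targets)


def batches(tasks, size):
    """Group tasks into fixed-size batches, one batch per grid job."""
    tasks = iter(tasks)
    while batch := list(islice(tasks, size)):
        yield batch


# Hypothetical batch size of 8 dockings per job.
jobs = list(batches(docking_tasks(compounds, targets), size=8))
print(len(jobs), "jobs,", sum(len(job) for job in jobs), "dockings")
```

Because the simulations are independent of each other, the batches can run on different grid sites at the same time, which is what makes the problem a good fit for a grid.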
![Page 9: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/9.jpg)
Computing model & workflow
[Figure: computing model and workflow]
• Simulation jobs run on the EGEE Grid
• Simulation results are stored on the EGEE Grid
![Page 10: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/10.jpg)
Efficiency
| Metric | Value |
|---|---|
| Estimated duration on 1 CPU | 88.3 years |
| Duration on EGEE | 6 weeks |
| Cumulative number of Grid jobs | 54,000 |
| Maximum number of concurrent CPUs used | 2,000 |
| Approximated throughput | 2 sec/docking |
• Second data challenge for avian flu drug analysis
  – 8 targets against 300,000 compounds (2,400,000 simulations)
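The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes that the efficiency figures and the simulation count describe the same data challenge, and it converts years to days with 365.25 days per year; both are assumptions, not statements from the slide.

```python
# Figures taken from the slide.
cpu_years_single = 88.3      # estimated duration on 1 CPU
grid_weeks = 6               # duration on EGEE
max_concurrent_cpus = 2000   # peak, not average, concurrency
targets = 8
compounds = 300_000

# Assumption: 365.25 days per year.
single_cpu_days = cpu_years_single * 365.25
grid_days = grid_weeks * 7

simulations = targets * compounds
speedup = single_cpu_days / grid_days
# Speedup relative to the peak CPU count, so a conservative efficiency figure.
efficiency_vs_peak = speedup / max_concurrent_cpus
seconds_per_docking = grid_days * 86_400 / simulations

print(f"simulations:         {simulations:,}")
print(f"speedup:             {speedup:.0f}x")
print(f"efficiency vs peak:  {efficiency_vs_peak:.0%}")
print(f"elapsed s/docking:   {seconds_per_docking:.2f}")
```

Under these assumptions the elapsed time per docking comes out a little under the "2 sec/docking" quoted on the slide, which is consistent with that figure being an approximation.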
![Page 11: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/11.jpg)
Statistics of deployment
• First Data Challenge: 1 July to 15 August 2005
  – Target: malaria
  – 80 CPU years
  – 1 TB of data produced
  – 1700 CPUs used in parallel
  – First large-scale docking on a world-wide e-infrastructure
• Second Data Challenge: 15 April to 30 June 2006
  – Target: avian flu
  – 100 CPU years
  – 800 GB of data produced
  – 1700 CPUs used in parallel
  – Infrastructure was configured in 45 days
• Third Data Challenge: 1 October to 15 December 2006
  – Target: malaria
  – 400 CPU years
  – 1.6 TB of data produced
  – Up to 5000 CPUs used in parallel
  – Very high docking throughput: > 100,000 compounds per hour
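The CPU years and the dates above give a rough idea of how many CPUs were busy on average during each data challenge, which can be compared with the peak numbers on the slide. This is a back-of-the-envelope sketch: it assumes the jobs were spread over the whole stated period and uses 365.25 days per year, and the function name `average_cpus` is illustrative only.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption used to convert CPU years to CPU days

# (start, end, CPU years) as stated on the slide.
challenges = {
    "first (malaria)": (date(2005, 7, 1), date(2005, 8, 15), 80),
    "second (avian flu)": (date(2006, 4, 15), date(2006, 6, 30), 100),
    "third (malaria)": (date(2006, 10, 1), date(2006, 12, 15), 400),
}


def average_cpus(start, end, cpu_years):
    """Average number of busy CPUs if the work is spread evenly over the period."""
    elapsed_days = (end - start).days
    return cpu_years * DAYS_PER_YEAR / elapsed_days


for name, (start, end, cpu_years) in challenges.items():
    print(f"{name}: ~{average_cpus(start, end, cpu_years):.0f} CPUs busy on average")
```

The averages are well below the quoted peaks, as expected when the number of running jobs ramps up and down over the course of a challenge.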
![Page 12: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/12.jpg)
Application example 2: Bronze standard
[Figure: Bronze standard in the layered architecture]
• Applications level: Bronze standard workflow; MOTEUR workflow manager
• Production grid infrastructure level: Biomed Virtual Organization, EGEE middleware services; Communication layer (GEANT, Internet...); Resources
![Page 13: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/13.jpg)
Scientific challenge
• Medical image registration is the process by which two images acquired independently are registered into a common frame.
[Figure: an image pair shown unregistered and registered; labels O1, O2, T]
• Registration accuracy is critical for many image analysis procedures
• Bronze Standard is a statistical procedure to estimate the performance of registration algorithms
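The slide does not spell out the statistical procedure. As a deliberately simplified illustration of the general idea, one can treat the average result of several algorithms as a reference when no ground truth exists, and score each algorithm by its deviation from that reference. Everything in the sketch below is an assumption made for illustration: the one-dimensional translation values, the algorithm names, and the use of a plain mean as the reference. It is not the published Bronze Standard method.

```python
from math import sqrt
from statistics import mean

# Hypothetical 1-D translation estimates (mm): one value per algorithm and image pair.
estimates = {
    "pair_1": {"alg_a": 2.0, "alg_b": 2.2, "alg_c": 1.8},
    "pair_2": {"alg_a": -1.0, "alg_b": -0.7, "alg_c": -1.3},
}


def consensus(per_algorithm):
    """Reference value for one image pair: the mean over all algorithms."""
    return mean(per_algorithm.values())


def rms_deviation(estimates, algorithm):
    """Root-mean-square deviation of one algorithm from the per-pair consensus."""
    squared = [
        (per_algorithm[algorithm] - consensus(per_algorithm)) ** 2
        for per_algorithm in estimates.values()
    ]
    return sqrt(mean(squared))


for algorithm in ("alg_a", "alg_b", "alg_c"):
    print(f"{algorithm}: RMS deviation {rms_deviation(estimates, algorithm):.3f} mm")
```

The reference only becomes meaningful with many image pairs and several algorithms, so every algorithm has to be run on every pair, which explains why the procedure needs many grid jobs.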
![Page 14: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/14.jpg)
Implementation on EGEE
[Figure: Bronze standard workflow graph]
• Node labels: A, B, Params, MethodToTest, GetFromEGEE, FormatConv, CrestLines, PFMatchICP, PFRegister, Yasmina, Baladin, MultiTransfoTest, Accuracy Translation, Accuracy Rotation, WriteResults
• Scale: ~100 image pairs, ~800 EGEE jobs
![Page 15: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/15.jpg)
Application example 3: Bioinformatics Grid Portal
[Figure: Bioinformatics Grid Portal in the layered architecture]
• Applications level: Bioinformatics Grid Portal
• Production grid infrastructure level: Biomed Virtual Organization, EGEE middleware services; Communication layer (GEANT, Internet...); Resources
![Page 16: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/16.jpg)
GPSA: Bioinformatics Grid Portal
• Scientific objectives
  – Protein sequence analysis
  – Analyse data from high-throughput biology: genome projects, structural biology, …
• Tools
  – Web interface: NPS@
  – Protein databases are stored on grid storage as flat files: SWISS-PROT, SP-TrEMBL, NRL_3D, PATTINPROT, …
  – Legacy bioinformatics applications: FASTA, BLAST, PSI-BLAST, SSEARCH, …
• Contact
  – http://npsa-pbil.ibcp.fr/
  – [email protected]
![Page 17: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/17.jpg)
How to get involved with EGEE
• More information on EGEE:
  – http://www.eu-egee.org
  – Life Sciences cluster: http://technical.eu-egee.org/index.php?id=258
  – Coordinator of life sciences cluster: Vincent BRETON ([email protected])
• To get your own application ported to EGEE:
  – Support team: http://www.lpds.sztaki.hu/gasuc
• To get access to the Biomed Virtual Organization:
  – Obtain a certificate from NIIF CA: http://www.ca.niif.hu/
  – Register to the Virtual Organization: https://voms.cnaf.infn.it:8443/voms/bio/webui/request/user/create
  – Access the grid from P-GRADE Portal, Bioinformatics Grid Portal, etc.
• EGEE User Forum, Catania, Italy, 2-6 March 2009:
  – http://indico.cern.ch/conferenceDisplay.py?confId=40435
![Page 18: Gergely Sipos MTA SZTAKI Laboratory of Parallel and Distributed Systems lpds.sztaki.hu](https://reader036.vdocuments.mx/reader036/viewer/2022062804/56814a7d550346895db791b9/html5/thumbnails/18.jpg)
www.eu-egee.org
www.lpds.sztaki.hu
Gergely Sipos