Shared Cyberinfrastructure for Global Medical Research
Garuda: The National Grid Computing Initiative - The shared cyberinfrastructure for data and compute intensive research
Subrata Chattopadhyay
CDAC Knowledge Park, [email protected]
www.garudaindia.in
Outline
• Introduction on Garuda
• NKN – highlights
• Tools and Services - GarudaWare
• Major Applications
• Collaborations
• Q & A
Global Access to Resources Using Distributed Architecture
Garuda on MPLS based NKN
[Figure: Garuda network topology over the MPLS-based NKN. Legend: H = Head Node, G = Gateway. Sites shown: C-DAC Bangalore, partners with resources, partners without resources. Elements shown: compute nodes, storage, telescope, GridFS access terminal, access terminals, local users on LAN, Internet access, MPLS access.]
NKN – National Knowledge Network
• High Capacity, Highly Scalable Backbone
• Provides Quality of Service (QoS) and Security
• Wide Geographical Coverage
• Common Standard Platform
• Bandwidth from Many NLDs
• Highly Reliable & Available by Design
• Test beds (for various implementations)
• Dedicated and Owned
Garuda High Level System Components
[Figure: Garuda high-level system components, layered over NKN. Resources: CDAC resource centers, research organizations, educational institutions, computing centers, non-research organizations (computing resources and virtual organizations). Middleware: WSRF + GT4 + other services + Cloud S/W (Nimbus/VMware), Grid security and high-performance Grid networking, Data Grid, resource enabler & monitoring, federated information server, job scheduler, resource management, virtualization support. User environments: access portal, CLI, visualization, Grid PSE, workflows, programming/development environments, hand held devices, cloud interface. Top layer: GARUDA-enabled applications.]
GARUDA Middleware components
• Utility tools
– RAT
– Compiler Service
– GridFTP GUI
– GARUDA Information Registry
• Access Methods
– Access Portal
– Problem Solving Environments
– Workflows
– Visualization gateways
– Hand held device
– Cloud Interface
• Management, Monitoring & Accounting
– Paryaveekshanam
– GARUDA Accounting
– MDS4
• Security Framework
– IGCA Certificates
– VOMS
– MyProxy
– Login Service
• Resource Mgmt & Scheduling
– Resource Reservation
– QoS
– GridWay Meta-scheduler
– Torque, LoadLeveler
– Globus 4.x (WS Components)
• Data Management
– SRB
– GSRM
– GridFTP
– Indian Grid Certification Authority (IGCA), located at C-DAC, Knowledge Park, Bangalore, India.
– IGCA is an accredited member of APGridPMA.
– Issues X.509 certificates to support a secure Grid environment (for GARUDA, Indian institutes that do grid research, and foreign institutes that collaborate with GARUDA).
– http://ca.garudaindia.in
GARUDA Short-Lived Certificate
GARUDA SLCS gives grid users instant access to the GARUDA grid for a trial period of 30 days.
Highlights:
• Hassle-free registration
• Access in less than 5 minutes
• Service over the Internet
Features:
• GARUDA Job submission portal
• GARUDA Compiler Service
Website: http://labs.garudaindia.in
GARUDA Resources
CDAC Resource :
Fourteen of the partner institutions also contribute resources, including satellite terminals.
Total computing power is more than 5500 CPUs, equivalent to 65 TF.
Storage space: 220 TB
GARUDA Resources – cont...
Institution                                    Location   Resources
Space Application Centre                       Ahmedabad  VSAT Terminal - 2 Nos.
Indian Institute of Science                    Bangalore  64 cpu; POWER5; Linux
Raman Research Institute                       Bangalore  32 cpu; Opteron; Linux
Institute of Mathematical Sciences             Chennai    24 cpu; Opteron cluster (Cray XD1)
Madras Institute of Technology                 Chennai    16 cpu; P4; Linux
Indian Institute of Technology                 Delhi      32 cpu; Opteron; Linux
Jawaharlal Nehru University                    Delhi      32+16+16 cpu; Opteron, Opteron, Itanium; Linux
Institute of Genomics and Integrative Biology  Delhi      48 cpu; Xeon; Linux
Indian Institute of Technology                 Guwahati   128 cpu; Opteron; Linux
University of Hyderabad                        Hyderabad  32 way SMP; POWER4, AIX
Indian Institute of Technology                 Kharagpur  16+16 cpu; Power PC2, Xeon; AIX, Linux
Physical Research Laboratory                   Ahmedabad  320 cpu; 64-bit AMD
CDAC                                           Bangalore  64 cpu Power 5; 320 cpu Xeon Linux
CDAC                                           Hyderabad  320 cpu Xeon Linux
CDAC                                           Chennai    320 cpu Xeon Linux
CDAC                                           Pune       32 cpu Xeon Linux: 4068 CPU Linux
GARUDA Operations & Management
• Looks after deployment of middleware and network
• Operates from CDAC KP Bangalore
• Operation Centre with high-resolution, scalable display wall
• Conducts regular Monday meetings among administrators to maintain Garuda health
GARUDA Partners
• Motivation
– To collaborate on research and engineering of technologies, architectures, standards and applications
– To contribute to the aggregation of GARUDA resources
• Participation
– 36 research & academic institutions in 17 cities
– 8 centres of C-DAC
– Total of 45 institutions
– Over 20 additional labs with LOE
Virtual User Community (VOMS)
Group Name         Description
Bioinformatics     Application of statistics and computer science to molecular biology
ClimateModelling   Deals with the dynamics of the climate system
OSDD               Community dedicated to developing drugs for tropical infectious diseases such as malaria and tuberculosis
GeoPhysis          Study of the physics of the Earth and its environment in space
CAE                Use of computer software to solve engineering problems
IndianHeritage     Focused on technology products for preserving & processing heritage texts
HealthInformatics  Focused on utilizing compute power for health informatics
MaterialScience    Interdisciplinary field applying the properties of matter to science and engineering
Euindia            The vision of a worldwide Grid for research by both Europe and India
ToolsDeveloper     Forum to communicate and collaborate on developing Garuda tools
GarudaAdmin        Meant for administrators from resource providers & Garuda Operation team members
Applications on GARUDA
OSDD Chemo-informatics
[Figure: OSDD chemo-informatics pipeline. Data sources: PubChem, ChEMBL, DrugBank, experimental assays. Stages: curated molecule datasets, cheminformatics models, data mining and analysis, HT virtual screening. Community of about 400.]
Role of Garuda Grid in OSDD
[Figure: The project team reaches the OSDD-Garuda interface over Internet/NKN, runs Galaxy and Weka workflows on Garuda, and receives results back over NKN.]
Customized Galaxy Framework on GARUDA for OSDD: Chemo-informatics
• Integrated with the Grid authentication mechanism - Indian Grid Certification Authority (IGCA)
• Integrated with the GridWay Meta-scheduler - job scheduling and management
• Integrated the tools OSDD requires - Weka (for data mining) and AutoDock (virtual screening)
• Added support for uploading multiple input files as a single tar file
• Data libraries of the OSDD community are uploaded and shared by all users
• Integrated with PostgreSQL
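A minimal sketch of preparing a multi-file upload as one tar archive, using only the Python standard library. The file names and the `bundle_inputs` helper are hypothetical and chosen for illustration; the slides do not describe how the portal itself builds or unpacks the archive.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_inputs(paths, archive_path):
    """Pack several input files into one uncompressed tar archive.

    Files are stored under their base names so the archive unpacks flat.
    """
    with tarfile.open(str(archive_path), "w") as tar:
        for p in paths:
            tar.add(str(p), arcname=Path(p).name)
    return archive_path


# Demo with placeholder files (hypothetical names, not real OSDD data).
with tempfile.TemporaryDirectory() as tmp:
    inputs = []
    for name in ("ligand_1.pdbqt", "ligand_2.pdbqt"):
        path = Path(tmp) / name
        path.write_text("placeholder\n")
        inputs.append(path)
    archive = bundle_inputs(inputs, Path(tmp) / "inputs.tar")
    with tarfile.open(str(archive)) as tar:
        names = sorted(tar.getnames())
```

The resulting `inputs.tar` would then be uploaded as a single file in place of the individual inputs.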
Bioinformatics: Protein Structure Prediction on Grid
• The Genetic Algorithm (GA) for Protein Structure Prediction (PSP), an in-house code, has been Grid-enabled
• PSP jobs run concurrently by splitting the protein molecule into multiple overlapping parts
• Uses a Divide-and-Construct approach for
– Reduction in complexity
– Possibility of concurrency
– Handling larger protein molecules
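The splitting step can be sketched as follows. This is an illustration under stated assumptions, not the in-house PSP code: the function name `split_overlapping` and the even spacing of window starts are choices made here, and the demo sequence is a synthetic placeholder rather than a real protein.

```python
def split_overlapping(sequence, n_parts, part_len):
    """Split `sequence` into `n_parts` windows of `part_len` residues each.

    Window starts are spread evenly from 0 to len(sequence) - part_len, so
    neighbouring windows overlap whenever n_parts * part_len > len(sequence).
    Returns a list of (start_index, subsequence) pairs.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    if part_len > len(sequence):
        raise ValueError("part_len exceeds sequence length")
    last_start = len(sequence) - part_len
    if n_parts == 1:
        starts = [0]
    else:
        # Integer arithmetic keeps the starts deterministic.
        starts = [i * last_start // (n_parts - 1) for i in range(n_parts)]
    return [(s, sequence[s:s + part_len]) for s in starts]


# Synthetic 219-residue placeholder (NOT a real protein sequence).
demo_sequence = ("ACDEFGHIKLMNPQRSTVWY" * 11)[:219]
parts = split_overlapping(demo_sequence, n_parts=9, part_len=30)
starts = [start for start, _ in parts]
```

With 219 residues, 9 parts and 30 residues per part, consecutive windows start 23 or 24 residues apart, so each pair of neighbours shares 6 or 7 residues.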
Flow of PSP
• Dividing the sequence into parts
• Mapping each part onto a grid resource and running GA
• Constructing the molecule by combining parts and running GA on the combined sequence
[Figure: Input protein sequence → Protein Sequence Parts 1, 2, 3 → Grid Resources 1, 2, 3 → Torsion Angles of Parts 1, 2, 3 → Combined GA output for full molecule]
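The map-and-combine flow can be sketched with a local thread pool standing in for grid resources. Everything here is an assumption made for illustration: `run_ga_on_part` is a hypothetical placeholder that returns dummy torsion angles, and averaging the angles of residues covered by more than one part is only one possible way to seed the final GA run; the slides do not say how the parts are merged.

```python
from concurrent.futures import ThreadPoolExecutor


def run_ga_on_part(part):
    """Hypothetical stand-in for a GA run on one grid resource.

    Returns the part's start index and one (phi, psi) pair per residue.
    The angles are dummy zeros; a real run would optimise them.
    """
    start, residues = part
    return start, [(0.0, 0.0)] * len(residues)


def combine(results, total_len):
    """Merge per-part angles into one list for the full molecule.

    Residues covered by several parts get the average of their angles
    (an assumption); residues covered by no part are left as None.
    """
    sums = [[0.0, 0.0] for _ in range(total_len)]
    counts = [0] * total_len
    for start, angles in results:
        for offset, (phi, psi) in enumerate(angles):
            sums[start + offset][0] += phi
            sums[start + offset][1] += psi
            counts[start + offset] += 1
    return [(s[0] / c, s[1] / c) if c else None for s, c in zip(sums, counts)]


def predict(parts, total_len, max_workers=3):
    """Run every part concurrently, then build the combined starting point."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_ga_on_part, parts))
    return combine(results, total_len)


# Three overlapping toy parts covering residues 0-13.
toy_parts = [(0, "ACDEFG"), (4, "FGHIKL"), (8, "KLMNPQ")]
toy_angles = predict(toy_parts, total_len=14)

# Overlap handling on hand-made results: residue 2 is covered twice.
merged = combine([(0, [(10.0, 20.0)] * 3), (2, [(30.0, 40.0)] * 2)], 4)
```

The combined angles would then serve as input to the final GA run on the full sequence, which this sketch does not model.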
Performance of GA based PSP on Garuda
• Dataset: 1TUP, a tumor suppressor protein with 219 amino acids
• The molecule is split into 9 parts of 30 amino acids each
• GA on the full molecule took 76 hours, whereas distributed GA on Garuda took only 3 hours
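The reported figures can be put into numbers with a short calculation. The speedup follows directly from the two run times; the per-resource efficiency additionally assumes that each of the 9 parts ran on its own grid resource, which the slides do not state.

```python
serial_hours = 76       # GA on the full molecule (from the slide)
grid_hours = 3          # distributed GA on Garuda (from the slide)
n_parts = 9             # parts the molecule was split into
part_len = 30           # amino acids per part
sequence_len = 219      # amino acids in 1TUP

speedup = serial_hours / grid_hours    # about 25.3x
# Assumes one grid resource per part (not stated in the slides).
efficiency = speedup / n_parts         # about 2.8, i.e. superlinear
# Residues counted more than once because the parts overlap.
overlap_total = n_parts * part_len - sequence_len

print(f"speedup: {speedup:.1f}x, efficiency: {efficiency:.2f}, "
      f"overlap: {overlap_total} residues")
```

An efficiency above 1 is consistent with the "reduction in complexity" point: each GA run searches a 30-residue chain rather than the full 219 residues, so the gain comes from smaller subproblems as well as from concurrency.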
Data/Memory Intensive Applications on Garuda
Computationally Intensive Applications on Garuda
caBIG - Garuda
• Exploring possibilities in collaboration
• Interoperability of the “grid” technologies
– Make the software components talk to each other
– Follow the same data standards for publication
– Common tool base for researchers
• Leverage HPC capabilities for applications
Other Areas of Collaboration
• Indian Cancer Grid
• Protein folding analysis (using caGrid workflow, transport and security technology)
• caTissue – implement in the Software as a Service (SaaS) model
• Building a regional biobanking system—based on caTissue—at the Tata Memorial Centre & Hospital in Mumbai
Translational Cancer Prevention & Biomarkers Workshop 2011 @ Bangalore
Overall goal
• Facilitate meeting the priorities of NCI and ICMR towards the discovery and application of carcinogenesis biology in cancer prevention.
• Discuss development of a personalized approach to cancer prevention and control by linking cancer biology to population diversity.
• Consider development and validation of cost-effective biomarkers capable of early detection of cancers through global scientific, population and technological resources.
• Improve and share population databases from India and the United States to compare cancer biology, incidence, mortality, natural history, geographic and population diversity.
• Create understanding between India and Western nations to develop collaborative studies.
• Further details at http://canbio.in/overview.htm
Innovative Startup
Highlights
• Founded in 2002 by two Yale-trained physicians.
• Teleradiology services to hospitals around the globe.
• Teleradiology services include interpretation of all non-invasive imaging studies, namely CT, MRI, ultrasound, nuclear medicine studies and digitized X-rays.
• Emergency reports are provided within thirty minutes.
• Joint research partnerships with major technology vendors such as GE, to explore new techniques in 3D imaging analysis.
• Further details at http://www.telradsol.com/
Collaborative Class Room
Supported Features:
• Interface to Access Grid
• GSRM-based data storage for maintaining course repositories
• Indexing of course material based on keywords
Website: http://ccr.garudaindia.in
Interoperability with International Grids
• Integrating technological components of Garuda and EGI
– gLite and Globus
– Customizing the GridWay meta-scheduler
– To run real-life applications across both infrastructures
• Collaboration between caBIG and Garuda
– Interoperation of technological services among these grids
– Cancer research application portability
– Contribution to standards for using distributed computing in health care
Thank you!