The Gridbus Toolkit for Building and Deploying eScience Applications on Utility Grids
Rajkumar Buyya
Fellow of Grid Computing
Grid Computing and Distributed Systems (GRIDS) Lab.
Dept. of Computer Science and Software Engineering
The University of Melbourne, Australia
www.gridbus.org
2
Outline
Introduction to eScience and Challenges
Introduction to the Gridbus Project
An Overview of Gridbus Components
Grid Service Broker
  Architecture, Design, and Implementation
  Scheduling Algorithms
  BioGrid Demo or Performance Evaluation
A Case Study in High Energy Physics
  Economy-based Scheduling in Data Grids
Summary
3
Prominent Grid Drivers: Emerging eScience and eBusiness Apps
Next-generation experiments, simulations, sensors, satellites, and even people and businesses are creating a flood of data. These applications involve numerous experts and resources from multiple organizations in synthesis, modeling, simulation, analysis, and interpretation.
Life Sciences: Digital Biology
Finance: Portfolio analysis
~PBytes/sec
Newswire & data mining: Natural language engineering
Astronomy
Internet & Ecommerce
High Energy Physics
Brain Activity Analysis
Quantum Chemistry
4
E-Science Elements
Distributed instruments
Distributed computation
Distributed data
Peers sharing ideas and collaborative interpretation of data/results
E-Scientist
Remote Visualization
Data & Compute Service
5
Grids have Emerged as Scalable Cyberinfrastructure for e-Science Applications
[Figure: an Application submits work to a Grid Resource Broker, which consults the Grid Information Service to locate Grid resources R1, R2, … RN, including a database.]
6
Types of Services Modern Grids Offer
Computational Services (Computational Grid): CPU cycles, e.g., SETI@Home, NASA IPG, TeraGrid, I-Grid, …
Data Services (Data Grid): data replication, management, and secure access, e.g., LHC Grid, Napster
Application Services (ASP Grid): access to remote software/libraries and license management, e.g., NetSolve
Information Services (Information Grid): extraction and presentation of data with meaning
Knowledge Services (Knowledge Grid): the way knowledge is acquired and managed, e.g., data mining
Utility Computing Services (Utility Grid): towards market-based Grid computing, leasing and delivering Grid services as ICT utilities
7
Grid Challenges
Security
Resource Allocation & Scheduling
Data locality
Network Management
System Management
Resource Discovery
Uniform Access
Computational Economy
Application Construction
8
Some Grid Initiatives Worldwide
Australia: Nimrod-G, Gridbus, DISCWorld, GrangeNet, APACGrid, ARC eResearch?
Brazil: OurGrid, EasyGrid, LNCC-Grid, and many others
China: ChinaGrid (education), CNGrid (applications)
Europe: UK eScience, EU Grids, and many more
India: I-Grid
Japan: NAREGI
Korea: N*Grid
Singapore: NGP
USA: Globus, NASA IPG, AccessGrid, TeraGrid, Cyberinfrastructure, and many more
Industry initiatives: IBM On Demand Computing, HP Adaptive Computing, Sun N1, Microsoft .NET, Oracle 10g, Infosys Business Grid, StorageTek Grid, and many more
Public forums: Global Grid Forum, Australian Grid Forum
Conferences: CCGrid, Grid, P2P, HPDC
http://www.gridcomputing.com
[Map labels: 1.3 billion (3 yrs); 1 billion (5 yrs); 450 million (5 yrs); 486 million (5 yrs); 1.3 billion (Rs); 27 million; 2? billion; 120 million (5 yrs)]
9
The Gridbus Project @ Melbourne: Enable Leasing of ICT Services on Demand
[Figure labels: WWG (World Wide Grid!), On Demand Utility Computing, Gridbus, Distributed Data]
10
The Gridbus Project: http://www.gridbus.org
A multi-institutional “Open Source” R&D project with a focus on:
Architecture, Specification, and Open Source Reference Implementation
Service-Oriented Grid, Utility Computing, and Distributed Data and Computation Economy
Scaling from Desktops, Clusters, Cluster Federations, and Enterprise Grids to Global Grids
Components:
Grid Market Directory and Web Services
Grid Bank: Accounting and Transaction Management
Visual Tools for Creation of Distributed Applications
Workflow Composition and Deployment Services
Data Grid Brokering and Grid Economy Services
Data Replication Strategies
GridSim Toolkit: enhanced to support Data Grids, reservation, etc.
Libra: Economic Cluster Scheduler
Coupling of Clusters and Computational Economy
Alchemi: Harnessing .NET/Windows-based Resources
WWG: Global Data Intensive Grid Testbed
Application Enabler Projects:
High-Energy Physics, Astronomy, Brain Activity Analysis (Osaka U.), Natural Language Processing, Portfolio Analysis (Spain), BioGrid (WEHI, via APACGrid), SensorGrid (NICTA), Medical Imaging (HFI)
Supported by:
11
Grid Economy: Methodology for Sustained Resource Sharing and Managing Supply-and-Demand for Resources
12
New challenges of Grid Economy
Grid Service Providers (GSPs):
How do I decide service pricing models?
How do I specify them?
How do I translate them into resource allocations?
How do I enforce them?
How do I advertise & attract consumers?
How do I do accounting and handle payments?
…
Grid Service Consumers (GSCs):
How do I decide expenses?
How do I express QoS requirements?
How do I trade between timeframe & cost?
How do I map jobs to resources to meet my QoS needs?
…
They need mechanisms and technologies for value expression, value translation, and value enforcement.
14
GRACE: A Reference Service-Oriented Grid Architecture for Computational Economies
[Figure: a Grid Consumer (Applications, Programming Environments) works through a Grid Resource Broker (Grid Explorer, Schedule Advisor, Trade Manager, Job Control Agent, Deployment Agent), which uses Grid Middleware Services (Information Service, Health Monitor, Job Exec, Info?, Secure, Trading, QoS, Storage, Sign-on, Grid Bank, Data Catalogue), Grid Market Services, and Grid Service Providers. Each provider node (Grid Node 1 … Grid Node N) runs a Trade Server with Pricing Algorithms, Accounting, Resource Allocation, Resource Reservation, and Misc. services over resources R1, R2, … Rm.]
15
Gridbus and Complementary Grid Technologies – realizing GRACE
[Figure: layered stack of Grid Applications (Portals, Science, Commerce, Engineering, Collaboratories, …), User-Level Middleware (Grid Tools), Core Grid Middleware, Grid Fabric Software, and Grid Fabric Hardware. Components shown: Grid Brokers (Gridbus Data Broker, Nimrod-G), X-Parameter Sweep Lang., Workflow, Workflow Engine, Grid Market Directory, GridBank, Grid Exchange & Federation, Grid Storage Economy, Grid Economy, Gridscape, GridSim, ExcelGrid, Libra, Alchemi, Globus, Unicore, NorduGrid, XGrid, Condor, SGE, PBS, Tomcat, MPI, JVM, .NET, PDB, CDB; operating systems AIX, Solaris, Windows, Linux, IRIX, OSF1, and Mac; and the Worldwide Grid.]
16
Gridbus Technologies
Application Construction Tools: Visual Parametric Modeller (VPM)
Grid Economy Services:
  Grid Market Directory: a registry for publication of GSPs and their services (VO/VE)
  Grid Bank: a Grid accounting service
  Grid Trading Services
Data Grid Service Broker: QoS-based scheduling of distributed data-oriented apps on global Grids
Grid Workflow Management System
Gridscape: interactive Grid testbed portal generator
G-monitor: Grid application execution management portal
GridSim: a Grid simulation toolkit
Libra: economy-based cluster scheduling
17
Alchemi: .NET-based Enterprise Grid Platform & Web Services
[Figure: Alchemi Users, the Alchemi Manager, and Alchemi Worker Agents communicating over the Internet through Web Services]
• SETI@Home-like model
• General purpose
• Dedicated/non-dedicated workers
• Role-based security
• .NET and Web Services
• C# implementation
• GridThread and Job programming models
• Easy to set up and use
18
On Demand Assembly of Services: Putting Them All Together
[Figure: numbered steps connecting a Visual Application Composer and its application code, the Grid Resource Broker, the Grid Info Service, the Grid Market Directory, an ASP Catalogue, a Data Catalogue, a Data Replicator (GDMP) fed by data sources (instruments/distributed sources), the Gridbus GridBank accounting service, and Grid Service Providers (e.g., UofM, VPAC, IBM, CERN) offering data, processing elements (PEs), or CPUs through Grid Services (Globus or Alchemi), Grid Trade Servers (GTS), and cluster schedulers. The broker explores data, submits jobs, and returns results with cost information, followed by a bill.]
20
A Market-Oriented Grid Environment
[Figure: a user asks the Resource Broker to “Solve this in 5 hrs for $20”. Grid Service Providers (GSPs), each running a Grid Trade Server (GTS), ask the Grid Market Directory (GMD) to “register me as GSP”. The broker asks the GMD “Give me list of GSPs & price?”, consults the Grid Info. Service, asks each candidate GTS “service available?”, and then selects GSPs. A Grid Bank Service is also part of the environment.]
21
Grid Market Infrastructure
Grids need to provide an infrastructure that supports:
(a) the creation of one or more GMP registries;
(b) registration of contributors as GSPs, along with the resources/application services they wish to provide;
(c) publication of GSPs, with their service prices, in one or more GMPs; and
(d) discovery, by Grid resource brokers, of resources/services and their attributes (e.g., access price and usage constraints) that meet user QoS requirements.
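A toy, in-memory model of capabilities (a) to (d) may help make them concrete. This is a minimal sketch under stated assumptions: the class `GridMarketDirectory`, its `register`/`publish`/`discover` methods, the service name `belle-analysis`, and a single price field in G$/CPU-sec are hypothetical illustrations, not the actual Gridbus GMD interface.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceOffer:
    gsp: str                  # Grid Service Provider offering the service
    service: str              # service name (hypothetical labels in this sketch)
    price: float              # assumed unit: G$ per CPU-second
    constraints: dict = field(default_factory=dict)  # e.g. usage constraints


class GridMarketDirectory:
    """Hypothetical in-memory registry illustrating capabilities (a) to (d)."""

    def __init__(self):                      # (a) create a registry
        self.providers = set()
        self.offers = []

    def register(self, gsp):                 # (b) a contributor registers as a GSP
        self.providers.add(gsp)

    def publish(self, gsp, service, price, **constraints):  # (c) publish a priced service
        if gsp not in self.providers:
            raise KeyError(f"{gsp} is not a registered GSP")
        self.offers.append(ServiceOffer(gsp, service, price, constraints))

    def discover(self, service, max_price=None):  # (d) broker discovery under a price cap
        hits = [o for o in self.offers
                if o.service == service and (max_price is None or o.price <= max_price)]
        return sorted(hits, key=lambda o: o.price)  # cheapest offers first


if __name__ == "__main__":
    gmd = GridMarketDirectory()
    for host, price in [("fleagle.ph.unimelb.edu.au", 2), ("belle.anu.edu.au", 4)]:
        gmd.register(host)
        gmd.publish(host, "belle-analysis", price)
    print([o.gsp for o in gmd.discover("belle-analysis", max_price=3)])
```

Returning offers sorted by price lets a cost-optimising broker simply take them in order; a richer directory would also filter on the usage constraints stored with each offer.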
22
Grid Bank: Grid Transactions Authorization, Accounting, & Payment Infrastructure
[Figure: the Grid Service Consumer (GSC) runs applications through the Grid Resource Broker (GRB), whose GridBank Payment Module obtains a GridCheque from the GridBank Server. The broker and the Grid Service Provider's (GSP) Grid Trade Server establish service costs and a usage agreement; the broker deploys the Grid Agent and submits jobs, and the GSP's Grid Resource Meter records resource usage on resources R1 to R4. The GridBank Charging Module presents the GridCheque plus resource usage to the GridBank Server, which charges the GSC account.]
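The cheque-based payment flow in this figure can be sketched as a small ledger. This is a minimal sketch under assumptions: the `GridBank` class and its method names are hypothetical, a charge is assumed to be the sum of metered usage times the agreed rate, and a GridCheque is modeled simply as a spending cap payable to one provider.

```python
class GridBank:
    """Hypothetical toy ledger: accounts, GridCheques, and usage-based charging."""

    def __init__(self):
        self.balances = {}
        self.cheques = {}
        self._next_id = 1

    def open_account(self, owner, deposit=0.0):
        self.balances[owner] = float(deposit)

    def issue_cheque(self, consumer, provider, limit):
        """GSC obtains a GridCheque payable to one GSP, capped at `limit` G$."""
        if self.balances.get(consumer, 0.0) < limit:
            raise ValueError("insufficient funds for this cheque")
        cheque_id = self._next_id
        self._next_id += 1
        self.cheques[cheque_id] = (consumer, provider, limit)
        return cheque_id

    def redeem(self, cheque_id, provider, usage, rates):
        """GSP presents the cheque plus metered usage; the GSC account is charged."""
        consumer, payee, limit = self.cheques[cheque_id]
        if provider != payee:
            raise ValueError("cheque is not payable to this provider")
        charge = sum(amount * rates[item] for item, amount in usage.items())
        if charge > limit:
            raise ValueError("charge exceeds the cheque limit")
        del self.cheques[cheque_id]           # a cheque can be redeemed only once
        self.balances[consumer] -= charge
        self.balances[payee] = self.balances.get(payee, 0.0) + charge
        return charge
```

For example, a consumer with G$ 100 who issues a G$ 50 cheque and then uses 10 CPU-seconds at G$ 2 each is charged G$ 20. A fuller design would also reserve the cheque amount when it is issued; this sketch only checks the balance at that point.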
Grid Applications: Composition and Deployment – A Broker Perspective
Nimrod-G Broker: A Grid Broker for Computational Grids
Gridbus Broker: A Grid Service Broker for Data Grids
24
Grid Applications and Parametric Computing
Bioinformatics: Drug Design / Protein Modelling
Sensitivity experiments on smog formation
Natural Language Engineering
Ecological Modelling: Control Strategies for Cattle Tick
Electronic CAD: Field Programmable Gate Arrays
Computer Graphics: Ray Tracing
High Energy Physics: Searching for Rare Events
Finance: Investment Risk Analysis
VLSI Design: SPICE Simulations
Aerospace: Wing Design
Network Simulation
Automobile: Crash Simulation
Data Mining
Civil Engineering: Building Design
Astrophysics
25
Thesis
Build a task farming application (parameter sweep or bag of tasks) and execute it on the Grid within “T” hours or earlier, at a cost not exceeding $M.
Three options/solutions, from manual to automated:
Using pure Globus commands
Build your own distributed app & scheduler
Use the Gridbus Resource Broker to compose and schedule
The Gridbus Grid Service Broker for Data Grid Applications
Builds on the Nimrod-G Computational Grid Broker and Computational Economy [Buyya, Abramson, Giddy, Monash University, 1999-2001] and extends its notion to Data and Service Grids.
27
A resource broker for scheduling task farming data Grid applications with static or dynamic parameter sweeps on global Grids.
It uses the computational economy paradigm for optimal selection of computational and data services depending on their quality, cost, and availability, and on users’ QoS requirements (deadline, budget, & T/C optimisation).
Key Features:
A single window to manage & control experiments
Programmable Task Farming Engine
Resource Discovery and Resource Trading
Optimal Data Source Discovery
Scheduling & Predictions
Generic Dispatcher & Grid Agents
Transportation of data & sharing of results
Accounting
Grid Service Broker (GSB)
28
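Gridbus Broker at a Glance
[Figure: the Gridbus Broker on a home node/portal, with a MyProxy credential repository and a data catalog, dispatches work through Gridbus agents to Globus resources (via the job manager), Unicore resources (via the gateway), and Alchemi; jobs run with fork() or batch() submission to PBS, Condor, SGE, or Alchemi, and data stores are accessed through GridFTP or SRB.]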
29
Gridbus Broker Architecture
[Figure: Gridbus clients submit bag-of-tasks applications with the application, deadline (T), budget ($), and optimisation preference (Opt) to the Gridbus Farming Engine. The engine works with the Schedule Advisor (Data Grid Scheduler), Trading Manager, Record Keeper, Grid Explorer, and Grid Dispatcher. The Grid Explorer (GE) queries the Grid Info Server (GIS) and NWS; the Trading Manager (TM) negotiates with Trade Servers (TS); the Dispatcher reaches Globus-, Alchemi-, and Unicore-enabled nodes through the Grid Middleware. A Data Catalog points to Data Nodes. RM: Local Resource Manager, TS: Trade Server.]
30
Gridbus Services for eScience applications
Application Development Environment:
XML-based language for composition of task farming (legacy) applications as parameter sweep applications
Task farming APIs for new applications
Web APIs (e.g., Portlets) for Grid portal development
Workflow interface and Gridbus-enabled workflow engine
Resource Allocation and Scheduling:
Dynamic discovery of optimal computational and data nodes that meet user QoS requirements
Hides low-level Grid middleware interfaces (Globus, Alchemi, Unicore, NorduGrid, XGrid, etc.)
31
Gridbus Broker: XML file
<!-- X sweeps from 1 to 10 in steps of 1 -->
<parameter name="X" type="integer">
  <domain>
    <range>
      <value from="1" to="10"/>
      <interval type="step">1</interval>
    </range>
  </domain>
</parameter>
<!-- Y takes the single value 1 -->
<parameter name="Y" type="integer">
  <domain>
    <single>
      <value>1</value>
    </single>
  </domain>
</parameter>
<task>
  <type>main</type>
  <!-- stage the platform-specific executable to the remote node -->
  <copy>
    <source location="local" file="calc.$OS"/>
    <destination location="node" file="calc"/>
  </copy>
  <execute location="node">
    <command>./calc $X $Y</command>
  </execute>
  <!-- bring each job's output back under a per-job name -->
  <copy>
    <source location="node" file="output"/>
    <destination location="local" file="output.$jobname"/>
  </copy>
</task>
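The broker turns these declarations into jobs by taking the cross product of all parameter domains, so this file describes 10 jobs (10 values of X times 1 value of Y). A minimal sketch of that expansion follows; the `j1`, `j2`, and so on job names and the `$name` substitution via Python's `string.Template` are assumptions for illustration, and the broker's actual naming and substitution rules may differ.

```python
import itertools
from string import Template

# Parameter domains mirroring the XML: X is a range 1..10 in steps of 1, Y is a single value 1.
DOMAINS = {
    "X": range(1, 10 + 1, 1),
    "Y": [1],
}


def expand(domains, command, output):
    """Create one job per combination of parameter values (a cross product)."""
    names = list(domains)
    jobs = []
    for i, values in enumerate(itertools.product(*(domains[n] for n in names)), start=1):
        binding = {n: str(v) for n, v in zip(names, values)}
        binding["jobname"] = f"j{i}"          # hypothetical job-naming scheme
        jobs.append({
            "jobname": binding["jobname"],
            "command": Template(command).substitute(binding),
            "output": Template(output).substitute(binding),
        })
    return jobs


if __name__ == "__main__":
    for job in expand(DOMAINS, "./calc $X $Y", "output.$jobname")[:3]:
        print(job)
```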
32
Portal-based Access to Grid Broker for Launching and Steering Applications
[Figure: portal-based access to the Grid Broker, which runs applications on the World-Wide Grid]
34
Excel Plugin to Access Gridbus Services
[Figure: Excel with the ExcelGrid Add-In, the ExcelGrid Runner, ExcelGridJob, and ExcelGrid Middleware, connected to the Gridbus Broker, which runs the work on an Enterprise Grid]
35
Adaptive Scheduling Steps
Discover Resources
Establish Rates
Compose & Schedule
Distribute Jobs
Evaluate & Reschedule
Meet requirements? Remaining Jobs, Deadline, & Budget?
Discover More Resources
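These steps form a loop that the broker repeats until the work is done or a constraint runs out. A minimal sketch of that loop follows, assuming four hypothetical callbacks (`discover`, `quote`, `plan`, `run_round`) that stand in for resource discovery, rate establishment, scheduling, and dispatch; the toy policy and executor are illustrative stubs, not Gridbus code.

```python
def adaptive_schedule(jobs, deadline, budget, discover, quote, plan, run_round):
    """Cycle through the adaptive scheduling steps until every job finishes or
    the deadline or budget is used up. The four callbacks are hypothetical hooks."""
    remaining = list(jobs)
    clock = spent = 0.0
    rounds = 0
    while remaining and clock < deadline and spent < budget:
        resources = discover()                              # discover (more) resources
        rates = {r: quote(r) for r in resources}            # establish rates
        assignment = plan(remaining, rates,                 # compose & schedule within
                          deadline - clock, budget - spent) # the time and money left
        done, elapsed, cost = run_round(assignment)         # distribute jobs, observe
        clock += elapsed
        spent += cost
        remaining = [j for j in remaining if j not in done] # evaluate & reschedule
        rounds += 1
    return remaining, clock, spent, rounds


TOY_RATES = {"r1": 2.0, "r2": 4.0}                          # toy per-job prices (G$)


def toy_plan(remaining, rates, time_left, budget_left):
    # Toy policy: each resource takes up to two jobs per round, round-robin.
    names = list(rates)
    batch = remaining[:2 * len(names)]
    return {job: names[i % len(names)] for i, job in enumerate(batch)}


def toy_run_round(assignment):
    # Toy executor: every dispatched job succeeds and a round lasts 10 minutes.
    cost = sum(TOY_RATES[r] for r in assignment.values())
    return set(assignment), 10.0, cost


if __name__ == "__main__":
    print(adaptive_schedule(range(10), 60, 100, lambda: list(TOY_RATES),
                            TOY_RATES.get, toy_plan, toy_run_round))
```

Re-running discovery and pricing on every round is what makes the schedule adaptive: resources that appear, disappear, or change price between rounds are picked up the next time `plan` is called.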
36
Deadline (D) and Budget (B) Constrained Scheduling Algorithms
Algorithm | Execution Time (D) | Execution Cost (B) | Compute Grid | Data Grid
Cost Opt | Limited by D | Minimize | Yes | Yes
Cost-Time Opt | Minimize if possible | Minimize | Yes | -
Time Opt | Minimize | Limited by B | Yes | Yes
Conservative-Time Opt | Minimize | Limited by B; jobs have a guaranteed minimum budget | Yes | -
37
Sample Applications of Gridbus Broker
Molecular Docking - WEHI Drug Discovery
Brain Activity Analysis – Osaka University Neuroscience studies
Natural Language Engineering – Melbourne NLP Indexing of newswire data
High Energy Physics – School of Physics/Melbourne Belle experiment data analysis
Finance - Portfolio Analysis – U. Comp. Madrid/Spain Investment risk analysis
Astronomy Australian Virtual Observatory
Spreadsheet Processing Microsoft Excel
39
Case Study: High Energy Physics
What is High Energy Physics (HEP)?
Study of the fundamental constituents of matter and forces.
High Energy Physics: using high energies enables the probing of smaller distances/structures and the study of early-universe-like environments.
Particle Physics: quanta of matter/forces and their properties.
The Belle Experiment
KEK B-Factory, Japan
Investigating the fundamental violation of symmetry in nature (Charge Parity), which may help explain the universal matter-antimatter imbalance.
Collaboration: 400 people, 50 institutes
Hundreds of TB of data currently
40
Case Study: Event Simulation and Analysis
B0->D*+D*-Ks
• Simulation and analysis package: Belle Analysis Software Framework (BASF)
• Experiment in 2 parts: generation of simulated data, and analysis of the distributed data
• Only the analysis is discussed here
41
Australian Belle Data Grid Testbed
[Figure: the Grid Service Broker sends analysis requests and receives analysis results. A replica catalog, an NWS name server, and a certificate authority support a virtual organization whose sites are connected over AARNET: GRIDS Lab, University of Melbourne; Dept. of Physics, University of Sydney; ANU, Canberra; and Dept. of Computer Science, University of Adelaide (each a dual Intel Xeon 2.8 GHz, 2 GB RAM node); and Dept. of Physics, University of Melbourne (Intel Pentium 2.0 GHz, 512 MB RAM). Each site runs an NWS sensor, GridFTP, GRIS, and a Globus gatekeeper.]
42
Case Study: Input File for Analysis
parameter jobf Gridfile lfn:/users/winton/fsimddks/fsimdata*.mdst;
task main
    copy runme.grid2 node:runme.grid2
    node:execute ./runme.grid2 $jobf $jobname
endtask
• Dynamic parameter defined to describe an input data file
• Logical file name pointing to the location in the replica catalog that contains a mapping to where the physical files are present.
100 data files (30MB each) were equally distributed among the five nodes
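Because `jobf` is a `Gridfile` parameter, its values are not listed in the plan: they come from whichever logical files in the replica catalog match the pattern. A minimal sketch of that binding step follows. The in-memory catalog, the invented file names (`fsimdata000.mdst` and so on), and the job naming are assumptions; a real broker would query the replica catalog service instead.

```python
import fnmatch
from collections import Counter

DATA_HOSTS = [
    "belle.cs.mu.oz.au", "fleagle.ph.unimelb.edu.au", "belle.cs.adelaide.edu.au",
    "belle.anu.edu.au", "belle.physics.usyd.edu.au",
]

# Hypothetical replica catalog: logical file name -> hosts holding a physical copy.
# The file names are invented; 100 files are spread evenly over the five data hosts.
REPLICA_CATALOG = {
    f"lfn:/users/winton/fsimddks/fsimdata{i:03d}.mdst": [DATA_HOSTS[i % len(DATA_HOSTS)]]
    for i in range(100)
}


def expand_gridfile(pattern, catalog):
    """Bind the dynamic parameter: one job per logical file matching the pattern."""
    matches = sorted(lfn for lfn in catalog if fnmatch.fnmatchcase(lfn, pattern))
    return [{"jobname": f"j{n}", "jobf": lfn, "replicas": catalog[lfn]}
            for n, lfn in enumerate(matches, start=1)]


if __name__ == "__main__":
    jobs = expand_gridfile("lfn:/users/winton/fsimddks/fsimdata*.mdst", REPLICA_CATALOG)
    print(len(jobs), Counter(job["replicas"][0] for job in jobs))
```

Each resulting job still carries its list of replica hosts, which is what lets the scheduler choose a data source per job later.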
43
Resources Used and their Service Price
Organization | Node details | Role | Cost (G$/CPU-sec)
CS, UniMelb | belle.cs.mu.oz.au: 4 CPU, 2 GB RAM, 40 GB HD, Linux | Broker host, data host, NWS server | N.A. (not used as a compute resource)
Physics, UniMelb | fleagle.ph.unimelb.edu.au: 1 CPU, 512 MB RAM, 40 GB HD, Linux | Replica catalog host, data host, compute resource, NWS sensor | 2
CS, University of Adelaide | belle.cs.adelaide.edu.au: 4 CPU (only 1 available), 2 GB RAM, 40 GB HD, Linux | Data host, NWS sensor | N.A. (not used as a compute resource)
ANU, Canberra | belle.anu.edu.au: 4 CPU, 2 GB RAM, 40 GB HD, Linux | Data host, compute resource, NWS sensor | 4
Dept of Physics, USyd | belle.physics.usyd.edu.au: 4 CPU (only 1 available), 2 GB RAM, 40 GB HD, Linux | Data host, compute resource, NWS sensor | 4
VPAC, Melbourne | brecca-2.vpac.org: 180-node cluster (only head node used), Linux | Compute resource, NWS sensor | 6
44
Network Cost (in Grid $/Currency!)
Network costs between the data hosts and the compute resources (in G$ per MB)
Data node \ Compute node | ANU | UniMelb Physics | Sydney Physics | VPAC
ANU | 0 | 34.0 | 31.0 | 38.0
Adelaide CS | 34.0 | 36.0 | 31.0 | 33.0
UniMelb Physics | 40.0 | 0 | 32.0 | 39.0
UniMelb CS | 36.0 | 30.0 | 33.0 | 37.0
Sydney Physics | 35.0 | 33.0 | 0 | 37.0
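A data-aware scheduler has to weigh these network prices against each compute node's CPU price. The sketch below combines the two tables into a per-job cost; the additive cost model (CPU-seconds times CPU price plus megabytes times network price) and the `cpu_seconds` figure are assumptions, since the slides do not state how the broker combines the two charges. It shows why the cheapest data path is not always the cheapest placement: for a 30 MB file on Adelaide CS, Sydney Physics has the lowest network cost, but once 100 CPU-seconds at each node's price are added, UniMelb Physics is cheaper.

```python
# Network cost in G$ per MB, from the table above: rows are data hosts, columns compute nodes.
NET_COST = {
    "ANU":             {"ANU": 0.0,  "UniMelb Physics": 34.0, "Sydney Physics": 31.0, "VPAC": 38.0},
    "Adelaide CS":     {"ANU": 34.0, "UniMelb Physics": 36.0, "Sydney Physics": 31.0, "VPAC": 33.0},
    "UniMelb Physics": {"ANU": 40.0, "UniMelb Physics": 0.0,  "Sydney Physics": 32.0, "VPAC": 39.0},
    "UniMelb CS":      {"ANU": 36.0, "UniMelb Physics": 30.0, "Sydney Physics": 33.0, "VPAC": 37.0},
    "Sydney Physics":  {"ANU": 35.0, "UniMelb Physics": 33.0, "Sydney Physics": 0.0,  "VPAC": 37.0},
}

# CPU prices in G$/CPU-sec for the four compute nodes, from the resource table.
CPU_PRICE = {"ANU": 4.0, "UniMelb Physics": 2.0, "Sydney Physics": 4.0, "VPAC": 6.0}


def job_cost(data_host, compute_node, size_mb, cpu_seconds):
    """Assumed cost model: CPU time x CPU price + data size x per-MB network price."""
    return (cpu_seconds * CPU_PRICE[compute_node]
            + size_mb * NET_COST[data_host][compute_node])


def cheapest_node(data_host, size_mb, cpu_seconds):
    """Return (compute node, cost) minimising the assumed total job cost."""
    return min(((node, job_cost(data_host, node, size_mb, cpu_seconds)) for node in CPU_PRICE),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    print(cheapest_node("Adelaide CS", 30, 0))     # network cost only
    print(cheapest_node("Adelaide CS", 30, 100))   # network plus 100 CPU-seconds
```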
45
Deploying Application Scenario
A data grid scenario with 100 jobs, each accessing ~30 MB of remote data
Deadline: 3 hrs. Budget: G$ 60K
Scheduling optimisation scenarios: Minimise Time; Minimise Cost
Results:
Summary of evaluation results
Scheduling strategy | Total time taken (mins.) | Compute cost (G$) | Data cost (G$) | Total cost (G$)
Cost Minimization | 71.07 | 26865 | 7560 | 34425
Time Minimization | 48.5 | 50938 | 7452 | 58390
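The reported figures can be checked against the scenario's constraints with a few lines of arithmetic. The sketch below uses only the numbers from the table: compute and data costs add up to each total, both strategies finish within the 3-hour deadline and the G$ 60K budget, and time minimization takes roughly a third less time for roughly 70% more cost.

```python
DEADLINE_MIN = 3 * 60     # 3 hrs
BUDGET_G = 60_000         # G$ 60K

RESULTS = {
    # strategy: (total time in mins, compute cost G$, data cost G$, total cost G$)
    "Cost Minimization": (71.07, 26865, 7560, 34425),
    "Time Minimization": (48.5, 50938, 7452, 58390),
}

for name, (minutes, compute, data, total) in RESULTS.items():
    assert compute + data == total, name                         # totals add up
    assert minutes <= DEADLINE_MIN and total <= BUDGET_G, name   # deadline and budget met

cost_t, *_, cost_total = RESULTS["Cost Minimization"]
time_t, *_, time_total = RESULTS["Time Minimization"]
print(f"time saved by time minimization: {1 - time_t / cost_t:.1%}")
print(f"extra cost of time minimization: {time_total / cost_total - 1:.1%}")
```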
46
Time Minimization in Data Grids
[Figure: number of jobs completed (y-axis) vs. time in mins. (x-axis), plotted for fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, and brecca-2.vpac.org]
47
Results : Cost Minimization in Data Grids
[Figure: number of jobs completed (y-axis) vs. time in mins. (x-axis), plotted for fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, and brecca-2.vpac.org]
48
Observation: total jobs executed on each resource under each strategy
Organization | Node details | Cost (G$/CPU-sec) | Jobs executed (Time Minimization) | Jobs executed (Cost Minimization)
CS, UniMelb | belle.cs.mu.oz.au: 4 CPU, 2 GB RAM, 40 GB HD, Linux | N.A. (not used as a compute resource) | -- | --
Physics, UniMelb | fleagle.ph.unimelb.edu.au: 1 CPU, 512 MB RAM, 40 GB HD, Linux | 2 | 3 | 94
CS, University of Adelaide | belle.cs.adelaide.edu.au: 4 CPU (only 1 available), 2 GB RAM, 40 GB HD, Linux | N.A. (not used as a compute resource) | -- | --
ANU, Canberra | belle.anu.edu.au: 4 CPU, 2 GB RAM, 40 GB HD, Linux | 4 | 2 | 2
Dept of Physics, USyd | belle.physics.usyd.edu.au: 4 CPU (only 1 available), 2 GB RAM, 40 GB HD, Linux | 4 | 72 | 2
VPAC, Melbourne | brecca-2.vpac.org: 180-node cluster (only head node used), Linux | 6 | 23 | 2
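The observation table accounts for all 100 jobs under each strategy, and a job-weighted mean of the CPU prices summarises where the work went: 2.16 G$/CPU-sec under cost minimization versus 4.40 under time minimization. The sketch below only re-derives these figures from the table; weighting by job count rather than by CPU-seconds consumed is a simplification, so the ratio need not match the reported compute costs.

```python
PRICE = {  # G$/CPU-sec, from the table above
    "fleagle.ph.unimelb.edu.au": 2,
    "belle.anu.edu.au": 4,
    "belle.physics.usyd.edu.au": 4,
    "brecca-2.vpac.org": 6,
}

JOBS = {  # total jobs executed per resource under each strategy
    "Time Minimization": {"fleagle.ph.unimelb.edu.au": 3, "belle.anu.edu.au": 2,
                          "belle.physics.usyd.edu.au": 72, "brecca-2.vpac.org": 23},
    "Cost Minimization": {"fleagle.ph.unimelb.edu.au": 94, "belle.anu.edu.au": 2,
                          "belle.physics.usyd.edu.au": 2, "brecca-2.vpac.org": 2},
}


def mean_price(counts):
    """Job-weighted mean CPU price; ignores how long each job actually ran."""
    return sum(PRICE[host] * n for host, n in counts.items()) / sum(counts.values())


for strategy, counts in JOBS.items():
    assert sum(counts.values()) == 100, strategy   # every job is accounted for
    print(f"{strategy}: mean price {mean_price(counts):.2f} G$/CPU-sec")
```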
49
Grid Workflow Management System and Broker Services
[Figure: a Workflow Planner, application composition tools, and scientific portals submit a workflow description & QoS to the Workflow Submission Handler. The Workflow Language Parser extracts tasks, parameters, and dependencies; the Workflow Scheduler and Workflow Enactment Engine, backed by a database, use resource discovery (GMD, Replica Catalog, MDS Info Service), a dispatcher (Gridbus Broker, Globus, Web services, HTTP), and data movement (GridFTP data transfer).]
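Once the parser has extracted tasks and their dependencies, the scheduler needs an execution order that respects them. A minimal sketch using Python's standard `graphlib` module (Python 3.9+) groups tasks into stages whose members can be dispatched together; the task names are hypothetical, and the Gridbus workflow engine's own scheduling algorithm is not described on this slide.

```python
from graphlib import TopologicalSorter


def stages(dependencies):
    """Group tasks into successive stages; every task in a stage has all of its
    prerequisites in earlier stages, so a stage's tasks can be dispatched together."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                      # raises CycleError if dependencies loop
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


# Hypothetical workflow: each task maps to the set of tasks it depends on.
WORKFLOW = {
    "analyse_a": {"stage_in"},
    "analyse_b": {"stage_in"},
    "merge": {"analyse_a", "analyse_b"},
}

if __name__ == "__main__":
    print(stages(WORKFLOW))
```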
50
The GridSim Toolkit: A Java-based Tool for Grid Scheduling Simulations
[Figure: layered architecture. Distributed resources (PCs, workstations, clusters, SMPs, …) sit under a virtual machine (Java, cJVM, RMI) and a basic discrete event simulation infrastructure (SimJava, Distributed SimJava). The GridSim Toolkit provides application modeling, information services, resource allocation, statistics, job management, network, resource entities, and resource modeling and simulation with time- and space-shared schedulers (single CPU, SMPs, clusters, reservation, load pattern). Above it sit the simulation of Grid resource brokers or schedulers ("Add your own policy for resource allocation") and a Visual Modeler for application, user, resource, and Grid scenario configuration, input, and results output.]
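GridSim models resources with time- and space-shared schedulers on top of a discrete-event engine. The sketch below is a heavily simplified, hypothetical Python analogue of the space-shared case only; GridSim itself is a Java toolkit built on SimJava, and none of these names are its API. A priority queue of timestamped events drives job arrivals and completions on a resource with a fixed number of processing elements (PEs), served first come, first served.

```python
import heapq
from collections import deque

ARRIVAL, COMPLETION = 0, 1


def simulate_space_shared(jobs, num_pe, mips):
    """Toy discrete-event simulation of one space-shared resource with `num_pe`
    processing elements of `mips` MIPS each, serving jobs first come, first served.
    `jobs` is a list of (job_id, arrival_time, length_in_MI); returns finish times."""
    events = [(arrival, ARRIVAL, jid, length) for jid, arrival, length in jobs]
    heapq.heapify(events)                 # future event list ordered by time
    free_pe, waiting, finish = num_pe, deque(), {}
    while events:
        now, kind, jid, length = heapq.heappop(events)
        if kind == COMPLETION:
            finish[jid] = now
            free_pe += 1                  # a finished job releases its PE
        else:
            waiting.append((jid, length))
        while free_pe and waiting:        # start queued jobs on idle PEs
            qid, qlen = waiting.popleft()
            free_pe -= 1
            heapq.heappush(events, (now + qlen / mips, COMPLETION, qid, qlen))
    return finish
```

With two 10-MIPS PEs and three jobs arriving at time 0 (100, 100, and 50 MI), the first two run immediately and finish at time 10, and the third waits for a free PE and finishes at 15. A time-shared variant would instead divide each PE's capacity among all running jobs.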
52
Summary and Conclusion
Introduced requirements for an eScience application
Demonstrated suitability of Grid computing as Cyberinfrastructure for eScience and eBusiness.
Grids exploit synergies that result from cooperation of autonomous entities:
Resource sharing, dynamic provisioning, and aggregation at global level.
Grids allow users to dynamically lease Grid services at runtime based on their quality, cost, and availability, and on users’ QoS requirements.
Delivering ICT services as computing utilities.
Grids offer enormous opportunities for realizing eScience and eBusiness at the global level.