Manchester Computing
Supercomputing, Visualization & eScience
W T Hewitt
Monday, April 10, 2023
UCISA Meeting
Edinburgh
What is e-Science & What is the Grid?
Supercomputing, Visualization & e-Science2 escigriducisa/03
Agenda
What is Grid & eScience?
The Global Programme
The UK eScience Programme
Impacts
What is e-Science & the Grid?
Why Grids?
Large-scale science and engineering are done through
– the interaction of people,
– heterogeneous computing resources, information systems, and instruments,
– all of which are geographically and organizationally dispersed.
The overall motivation for “Grids” is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.
From Bill Johnston 27 July 01
The Grid…
"…is the web on steroids."
"…is Napster for Scientists" [of data grids]
"…is the solution to all your problems."
"…is evil." [a system manager, of Globus]
"…is distributed computing re-badged."
"…is distributed computing across multiple administrative domains"
– Dave Snelling, senior architect of UNICORE
[…provides] "Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources"
– From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”
"…enables communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of central location, central control, omniscience, existing trust relationships."
CERN: Large Hadron Collider (LHC)
Raw Data: 1 Petabyte / sec
Filtered: 100 Mbyte / sec = 1 Petabyte / year = 1 Million CD ROMs
CMS Detector
Why Grids?
A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour;
A biologist combines a range of diverse and distributed resources (databases, tools, instruments) to answer complex questions;
1,000 physicists worldwide pool resources for petaop analyses of petabytes of data
Civil engineers collaborate to design, execute, & analyze shake table experiments
From Steve Tuecke 12 Oct. 01
Why Grids? (contd.)
Climate scientists visualize, annotate, & analyze terabyte simulation datasets
An emergency response team couples real time data, weather model, population data
A multidisciplinary analysis in aerospace couples code and data in four companies
A home user invokes architectural design functions at an application service provider
From Steve Tuecke 12 Oct. 01
Broader Context
“Grid Computing” has much in common with major industrial thrusts
– Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing…
Sharing issues not adequately addressed by existing technologies
– Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q”
– High performance: unique demands of advanced & high-performance systems
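The “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q” requirement can be pictured as a request that must pass several independent policy checks. The sketch below is only an illustration under assumed names: the `Request` fields, the two policy functions and their rule tables are hypothetical and do not correspond to any particular Grid middleware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: str
    program: str
    site: str
    dataset: str


def community_policy(req: Request) -> bool:
    """Hypothetical policy P: which programs the community may run at which site."""
    allowed = {("site-Y", "program-X")}
    return (req.site, req.program) in allowed


def data_policy(req: Request) -> bool:
    """Hypothetical policy Q: which users may read which dataset."""
    readers = {"dataset-Z": {"alice", "bob"}}
    return req.user in readers.get(req.dataset, set())


def authorize(req: Request) -> bool:
    """A request proceeds only if every applicable policy agrees."""
    return all(policy(req) for policy in (community_policy, data_policy))


ok = authorize(Request("alice", "program-X", "site-Y", "dataset-Z"))
denied = authorize(Request("mallory", "program-X", "site-Y", "dataset-Z"))
```

The point of the sketch is that the site owner and the data owner each keep their own rule, and neither has to know about the other's.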
What is the Grid?
“Grid computing [is] distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation...we review the "Grid problem", which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations.”
From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" by Foster, Kesselman and Tuecke
New Book
What is the Grid?
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
On-demand, ubiquitous access to computing, data, and all kinds of services
New capabilities constructed dynamically and transparently from distributed services
No central location, No central control, No existing trust relationships, Little predetermination
Uniformity Pooling Resources
e-Science and the Grid
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
‘e-Science will change the dynamic of the way science is undertaken.’
John Taylor,
Director General of Research Councils,
Office of Science and Technology
Why GRID?
VERY VERY IMPORTANT
The GRID is one way to realise the e-Science vision.
WE ARE TRYING TO DO E-SCIENCE!
Grid Middleware
Diverse global services
Grid services
Local OS
Common principles
Single sign-on
– Often implying Public Key Infrastructure (PKI)
Standard protocols and services
Respect for autonomy of resource owner
Layered architectures
Higher-level infrastructures hiding heterogeneity of lower levels
Interoperability is paramount
Grid Middleware
Middleware
– Globus
– UNICORE
– Legion and Avaki
Scheduling
– Sun Grid Engine
– Load Sharing Facility (LSF), from Platform Computing
– OpenPBS and PBS(Pro), from Veridian
– Maui scheduler
– Condor (could also go under middleware)
Data
– Storage Resource Broker (SRB)
– Replica Management
– OGSA-DAI
Web services (WSDL, SOAP, UDDI)
– IBM WebSphere
– Microsoft .NET
– Sun Open Net Environment (Sun ONE)
PC Grids
– Peer-to-Peer computing
Data-oriented Grids
Data-oriented middleware
Wide-area distributed file systems (e.g. AFS)
Storage Resource Broker (SRB)
– UCSD and SDSC
– Provide transparent access to data storage
– Centralised architecture
– Motivated by experiences of HPC users, not database users
– Little enthusiasm from UK e-Science programme
OGSA-DAI
– Database Access and Integration
– Strategic contribution of UK e-Science programme
– Universities of Edinburgh, Manchester, Newcastle; IBM, Oracle
– Alpha release January 2003
Globus Replica Management software
– Next up!
Data Grids for High Energy Physics
[Diagram: tiered data grid for LHC computing]
Tier labels: Tier 0, Tier 1, Tier 2, Tier 4
Sites: Online System; Offline Processor Farm ~20 TIPS; CERN Computer Centre; FermiLab ~4 TIPS; France Regional Centre; Italy Regional Centre; Germany Regional Centre; Caltech ~1 TIPS; Tier2 Centres ~1 TIPS each; Institutes ~0.25 TIPS with a physics data cache; Physicist workstations
Link rates: ~PBytes/sec; ~100 MBytes/sec; ~622 Mbits/sec or Air Freight (deprecated); ~622 Mbits/sec; ~1 MBytes/sec
There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second
Each triggered event is ~1 MByte in size
Physicists work on analysis “channels”.
Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server
1 TIPS is approximately 25,000 SpecInt95 equivalents
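The rates quoted on this slide and on the earlier LHC slide can be cross-checked with a few lines of arithmetic. This is a rough sketch only: it assumes decimal units (1 MByte = 10^6 bytes, 1 Petabyte = 10^15 bytes) and treats every “~” figure as exact.

```python
BUNCH_CROSSING_NS = 25        # one bunch crossing every 25 ns
TRIGGERS_PER_SEC = 100        # events that survive the trigger
EVENT_SIZE_BYTES = 1_000_000  # ~1 MByte per event (decimal units assumed)
PETABYTE = 10**15

# Crossings per second, and how many crossings there are per kept event.
crossings_per_sec = 1_000_000_000 // BUNCH_CROSSING_NS
crossings_per_trigger = crossings_per_sec // TRIGGERS_PER_SEC

# Filtered data rate leaving the trigger.
filtered_bytes_per_sec = TRIGGERS_PER_SEC * EVENT_SIZE_BYTES

# Running time needed to accumulate the quoted 1 Petabyte per year.
seconds_for_one_pb = PETABYTE // filtered_bytes_per_sec
days_for_one_pb = seconds_for_one_pb / 86_400

print(crossings_per_sec, crossings_per_trigger)
print(filtered_bytes_per_sec, seconds_for_one_pb, round(days_for_one_pb, 1))
```

Under these assumptions the trigger rate and event size reproduce the ~100 MBytes/sec figure, and 1 Petabyte per year corresponds to roughly 10^7 seconds of data taking rather than a full calendar year.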
Data Intensive Issues Include …
Harness [potentially large numbers of] data, storage, network resources located in distinct administrative domains
Respect local and global policies governing what can be used for what
Schedule resources efficiently, again subject to local and global constraints
Achieve high performance, with respect to both speed and reliability
Catalog software and virtual data
Desired Data Grid Functionality
High-speed, reliable access to remote data
Automated discovery of “best” copy of data
Manage replication to improve performance
Co-schedule compute, storage, network
“Transparency” wrt delivered performance
Enforce access control on data
Allow representation of “global” resource allocation policies
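One way to picture “automated discovery of the best copy” is a catalogue of replicas ranked by estimated transfer time. The sketch below is a toy model under stated assumptions: the `Replica` fields, the site names, and the cost formula (latency plus size divided by bandwidth) are all hypothetical and are not taken from any replica management product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str
    bandwidth_mbit_s: float  # assumed estimate of bandwidth to the client
    latency_ms: float        # assumed estimate of round-trip latency
    available: bool = True


def estimated_transfer_seconds(replica: Replica, size_mbyte: float) -> float:
    """Crude cost model: fixed latency plus size over bandwidth."""
    return replica.latency_ms / 1000.0 + (size_mbyte * 8.0) / replica.bandwidth_mbit_s


def best_replica(replicas, size_mbyte: float) -> Replica:
    """Pick the available replica with the lowest estimated transfer time."""
    candidates = [r for r in replicas if r.available]
    if not candidates:
        raise LookupError("no replica is currently available")
    return min(candidates, key=lambda r: estimated_transfer_seconds(r, size_mbyte))


catalog = [
    Replica("site-a", bandwidth_mbit_s=622.0, latency_ms=150.0),
    Replica("site-b", bandwidth_mbit_s=100.0, latency_ms=5.0),
    Replica("site-c", bandwidth_mbit_s=1000.0, latency_ms=20.0, available=False),
]
small = best_replica(catalog, size_mbyte=1.0)
large = best_replica(catalog, size_mbyte=1000.0)
```

With these made-up numbers the nearby low-latency copy wins for a small file and the high-bandwidth copy wins for a large one, which is why the “best” copy cannot be fixed in advance.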
Grid Standards
Grid Standards Bodies:
– IETF: Home of the Network Infrastructure Standards
– W3C: Home of the Web
– GGF: Home of the Grid
GGF Defines the Open Grid Services Architecture
– OGSI is the Infrastructure part of OGSA
– OGSI Public comment draft submitted 14 February 2003
Key OGSA Areas of Standards Development
– Job management interfaces
– Resources & Discovery
– Security
– Grid Economy and Brokering
What is OGSA?
“Web Services with Attitude!”
Also known as
"Open Grid Services Architecture"
Aside: What are Web Services?
Loosely Coupled Distributed Computing
– Think Java RMI or C remote procedure call
Text Based Serialization
– XML: “Human Readable” serialization of objects
IBM and Microsoft lead
– Web Services Description Language (WSDL)
– W3C Standardization
Three Parts
– Messages (SOAP)
– Definition (WSDL)
– Discovery (UDDI)
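To make the “Messages (SOAP)” part concrete, the sketch below builds and re-parses a minimal SOAP 1.1 envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the application namespace `urn:example:jobs`, the `submitJob` operation and its parameters are hypothetical and stand in for whatever a real WSDL definition would declare.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:jobs"  # hypothetical application namespace


def build_request(operation: str, params: dict) -> bytes:
    """Serialize a remote call as text: Envelope > Body > operation > parameters."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(message: bytes):
    """Recover the operation name and its parameters from the envelope."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]
    operation = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_request("submitJob", {"executable": "/bin/hostname", "count": 4})
operation, params = parse_request(message)
```

Note that everything arrives as text after the round trip (the count comes back as the string "4"), which is the price of the human-readable serialization mentioned above.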
Web Services in Action
[Diagram labels]
UDDI: Publish / WSDL / Search
Client (Java/C/Browser): https/SOAP
WS Platform: InterStage, WebSphere, J2EE, GLUE, SunOne, .NET
Legacy Enterprise Application, Database ...: Any protocol
Enter Grid Services
Experiences of Grid computing (and business process integration) suggest similar extensions to Web Services
State
– Service Data Model
Persistence and Naming
– Two Level Naming (GSH, GSR)
– Allows dynamic migration and QoS adaptation
Lifetime Management
– Self healing and ‘soft’ garbage collection.
Standard PortTypes
– Guarantee of minimal level of service
– Beyond P2P is Federation through Mediation
Explicit Semantics
– Grid Services specify semantics on top of Web Service syntax.
– PortType Inheritance
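Two of the extensions listed above, two-level naming and soft-state lifetime management, can be illustrated with a toy registry. This is a conceptual sketch only, not the OGSI interface: the `ServiceRegistry` class, its method names, and the handle format are hypothetical. A stable handle (playing the role of a GSH) maps to a replaceable reference (playing the role of a GSR), and an instance that is not kept alive is eventually collected.

```python
import itertools


class ServiceRegistry:
    """Toy registry: stable handles map to replaceable references with expiry times."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._references = {}  # handle -> current network reference
        self._expires_at = {}  # handle -> termination time in seconds

    def create(self, reference: str, now: float, lifetime: float) -> str:
        handle = f"gsh:example/{next(self._ids)}"  # hypothetical handle format
        self._references[handle] = reference
        self._expires_at[handle] = now + lifetime
        return handle

    def resolve(self, handle: str) -> str:
        """Second naming level: look up where the service lives right now."""
        return self._references[handle]

    def migrate(self, handle: str, new_reference: str) -> None:
        """The service moves; clients keep using the same handle."""
        self._references[handle] = new_reference

    def keep_alive(self, handle: str, now: float, lifetime: float) -> None:
        """Clients that still care about an instance extend its lifetime."""
        self._expires_at[handle] = now + lifetime

    def sweep(self, now: float) -> list:
        """'Soft' garbage collection: drop instances nobody kept alive."""
        expired = [h for h, t in self._expires_at.items() if t <= now]
        for handle in expired:
            del self._references[handle]
            del self._expires_at[handle]
        return expired


registry = ServiceRegistry()
a = registry.create("https://host1.example/svc", now=0.0, lifetime=60.0)
b = registry.create("https://host2.example/svc", now=0.0, lifetime=60.0)
registry.migrate(a, "https://host3.example/svc")
registry.keep_alive(a, now=50.0, lifetime=60.0)
expired = registry.sweep(now=70.0)
```

Time is passed in explicitly so the behaviour is easy to follow: instance `a` survives because it was refreshed, while `b` is reclaimed without anyone having to destroy it explicitly.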
If one GRID is good then Many GRIDS must be better
US Grid Projects
NASA Information Power Grid
DOE Science Grid
NSF National Virtual Observatory
NSF GriPhyN
DOE Particle Physics Data Grid
NSF DTF TeraGrid
DOE ASCI DISCOM Grid
DOE Earth Systems Grid
DOE FusionGrid
NEESGrid
NIH BIRN
NSF iVDGL
National Grid Projects
Japan – Grid Data Farm, ITBL
Netherlands – VLAM, DutchGrid
Germany – UNICORE, Grid proposal
France – Grid funding approved
Italy – INFN Grid
Eire – Grid-Ireland
Poland – PIONIER Grid
Switzerland – Grid proposal
Hungary – DemoGrid, Grid proposal
ApGrid – AsiaPacific Grid proposal
EU Grid Projects
DataGrid (CERN, ..)
EuroGrid (Unicore)
DataTag (TTT…)
Astrophysical Virtual Observatory
GRIP (Globus/Unicore)
GRIA (Industrial applications)
GridLab (Cactus Toolkit)
CrossGrid (Infrastructure Components)
EGSO (Solar Physics)
COG (Semantic Grid)
UK e-Science Programme
[Organisation chart]
£80m Collaborative projects
E-Science Steering Committee
DG Research Councils
Director
– Director’s Management Role
– Director’s Awareness and Co-ordination Role
Generic Challenges: EPSRC (£15m), DTI (£15m)
Industrial Collaboration (£40m)
Academic Application Support Programme: Research Councils (£74m), DTI (£5m)
– PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
Grid TAG
From Tony Hey 27 July 01
Key Elements
Development of Generic Grid Middleware
Network of Grid Core Programme e-Science Centres
– National Centre http://www.nesc.ac.uk
– Regional Centres http://www.esnw.ac.uk/
Grid IRC Grand Challenge Project
Support for e-Science Pilots
Short term funding for e-Science demonstrators
Grid Network Team
Grid Engineering Team
Grid Support Centre
Task Forces
– Database led by Norman Paton
– Architecture led by Malcolm Atkinson
International Involvement
Adapted from Tony Hey 27 July 01
National & Regional Centres
[Map labels] Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, DL, RAL, Hinxton
Centres donate equipment to make a Grid
e-Science Demonstrators
Dynamic Brain Atlas
Biodiversity
Chemical Structures
Mouse Genes
Robotic Astronomy
Collaborative Visualisation
Climateprediction.com
Medical Imaging/VR
Grid Middleware R&D
£16M funding available for industrial collaborative projects
£11M allocated to Centres projects plus £5M for ‘Open Call’ projects
Set up Task Forces
– Database Task Force
– Architecture Task Force
– Security Task Force
Grid Network Team
Expert group to identify end-to-end network bottlenecks and other network issues
– e.g. problems with multicast for Access Grid
Identify e-Science project requirements
Funding £0.5M traffic engineering/QoS project with PPARC, UKERNA and CISCO
– investigating MPLS using SuperJANET network
Funding DataGrid extension project investigating bandwidth scheduling with PPARC
Proposal for ‘UKLight’ lambda connection to Chicago and Amsterdam
UK e-Science Pilot Projects
GRIDPP (PPARC)
ASTROGRID (PPARC)
Comb-e-Chem (EPSRC)
DAME (EPSRC)
DiscoveryNet (EPSRC)
GEODISE (EPSRC)
myGrid (EPSRC)
RealityGrid (EPSRC)
Climateprediction.com (NERC)
Oceanographic Grid (NERC)
Molecular Environmental Grid (NERC)
NERC DataGrid (+ OST-CP)
Biomolecular Grid (BBSRC)
Proteome Annotation Pipeline (BBSRC)
High-Throughput Structural Biology (BBSRC)
Global Biodiversity (BBSRC)
RASMOL
e-Science Centres of Excellence
Birmingham/Warwick – Modelling
Bristol – Media
UCL – Networking
White Rose Grid – Leeds, York, Sheffield
Lancaster – Social Science
Leicester – Astronomy
Reading – Environment
UK e-Science Grid
[Map labels] Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Soton, London, Belfast, DL, RL, Hinxton
UK e-Science Funding
First Phase: 2001–2004
Application Projects
– £74M
– All areas of science and engineering
Core Programme
– £15M + £20M (DTI)
– Collaborative industrial projects
Second Phase: 2003–2006
Application Projects
– £96M
– All areas of science and engineering
Core Programme
– £16M
– Core Grid Middleware
– DTI follow-on?
EPSRC: Computer Science for e-Science
– £9M, 18 projects so far
ESRC: National e-Social Science Centre + 3 hubs
– ~£6M
PPARC
MRC
BBSRC
Core Programme: Phase 2
UK e-Science Grid/Centres and e-Science Institute
Grid Operation Centre and Network Monitoring
Core Middleware engineering
National Data Curation Centre
e-Science Exemplars/New Opportunities
Outreach and International involvement
Other Activities
Security Task Force
– Joint fund key security projects with EPSRC & JCSR and coordinated effort with NSF NMI Internet2 projects
– JCSR £2M call in preparation
UK Digital Curation Centre
– £3M, Core e-Science + JCSR
JCSR
– £3M per annum
SR2004 – e-Science Infrastructure
Persistent UK e-Science Research Grid
Grid Operations Centre
UK Open Middleware Infrastructure Institute
National e-Science Institute
UK Digital Curation Centre
AccessGrid Support Service
e-Science/Grid collaboratories
Legal Service
International Standards Activity
Conclusions
Today’s Grid
A Single System Image
Transparent wide-area access to large data banks
Transparent wide-area access to applications on heterogeneous platforms
Transparent wide-area access to processing resources
Security, certification, single sign-on authentication, AAA
– Grid Security Infrastructure
Data access, Transfer & Replication
– GridFTP, Giggle
Computational resource discovery, allocation and process creation
– GRAM, Unicore, Condor-G
Reality Checks!!
The Technology is Ready
– Not true: it’s emerging
• Building middleware, Advancing Standards, Developing Dependability
• Building demonstrators.
• The computational grid is in advance of the data intensive middleware
• Integration and curation are probably the obstacles
• But!! It doesn’t have to be all there to be useful.
We know how we will use grid services
– No: disruptive technology
• Lower the barriers of entry.
Grid Evolution
1st Generation Grid
– Computationally intensive, file access/transfer
– Bag of various heterogeneous protocols & toolkits
– Recognises internet, Ignores Web
– Academic teams
2nd Generation Grid
– Data intensive -> knowledge intensive
– Services-based architecture
– Recognises Web and Web services
– Global Grid Forum
– Industry participation
We are here!
Impacts
It's all about interoperability, really.
Web & Grid Services are creating a new marketplace for components
If you're concerned with systems integration or internet delivery of services, embrace Web Services technologies now. You'll be ready for Grid Services when they're ready for you.
– If you're a developer, get Web Services on your CV
– If you're an IT manager, collect Web Service expertise through hiring or training
Software license models must adapt
I don't want to share!
Do I need a grid?
In conclusion
The GRID is not, and will not be, free
– must pay for resources
What have we to show for £250M?
Acknowledgements
Carole Goble
Stephen Pickles
Paul Jeffreys
University of Manchester
Academic collaborators
Industrial collaborators
Funding Agencies: DTI, EPSRC, NERC, ESRC, PPARC
World Leading Supercomputing Service, Support and Research
Bringing Science and Supercomputers Together
www.man.ac.uk/[email protected]
SVE @ Manchester Computing