DataGrid is a project funded by the European Union
ICT KennisCongres 2003
Grids – Achtergronden en praktijk in het EU Data Grid (Grids: Background and Practice in the EU DataGrid)
David Groep
http://www.dutchgrid.nl/
http://www.eu-datagrid.org/
http://www.edg.org/
DutchGrid
ICT KennisCongres 2003 – Grids: Achtergronden en praktijk– n° 3
Grid – a vision
The Grid: networked data processing centres, with “middleware” software as the “glue” between resources.
Researchers perform their activities regardless of geographical location, interact with colleagues, and share and access data
Scientific instruments and experiments provide huge amounts of data
next: beyond distributed computing
Beyond distributed computing
A grid integrates resources that
are not owned or administered by one single organisation
speak a common, open protocol … that is generic
work as a coordinated, transparent system
And … can be used by many people from multiple organisations
that work together in one Virtual Organisation
Checklist items based on: Ian Foster, “What is the Grid?”, July 2002
next: virtual organisations
Virtual Organisations
A VO is a temporary alliance of stakeholders:
Users
Service providers
Information providers
A set of individuals or organisations, not under single hierarchical control, temporarily joining forces to solve a particular problem at hand, bringing to the collaboration a subset of their resources, sharing those at their discretion and each under their own conditions.
Viewgraph: Foster, Kesselman, Tuecke, the Globus Project
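The definition above can be read as a small data model: members keep ownership of their resources and contribute only the subset they choose. A minimal sketch in Python, assuming hypothetical names (`Member`, `VirtualOrganisation`, `share`, `pool`, and the example sites and resources); none of this is an API of Globus or the EU DataGrid.

```python
from dataclasses import dataclass, field

@dataclass
class Member:
    """An individual or organisation taking part in a VO (hypothetical model)."""
    name: str
    resources: set                                # everything the member owns
    shared: set = field(default_factory=set)      # subset offered to the VO

    def share(self, resource: str) -> None:
        # A member can only contribute what it owns, at its own discretion.
        if resource not in self.resources:
            raise ValueError(f"{self.name} does not own {resource}")
        self.shared.add(resource)

@dataclass
class VirtualOrganisation:
    name: str
    members: list = field(default_factory=list)

    def pool(self) -> set:
        # The VO sees only what members chose to share; there is no central owner.
        return set().union(*(m.shared for m in self.members))

# Illustrative members and resources (invented for this example).
site_a = Member("site-a", {"cpu-farm", "disk-pool"})
lab_b = Member("lab-b", {"ozone-archive", "workstation"})
site_a.share("cpu-farm")
lab_b.share("ozone-archive")

vo = VirtualOrganisation("earth-observation", [site_a, lab_b])
print(sorted(vo.pool()))
```

The unshared resources (`disk-pool`, `workstation`) never appear in the pool, which mirrors "sharing those at their discretion and each under their own conditions".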
next: common and open protocols
Common and open protocols
Layered diagram (top to bottom):
Applications
Application Toolkits: DUROC, MPICH-G2, Condor-G, VLAM-G
Grid Services: GRAM, GridFTP, Information, Replica
Grid Security Infrastructure (GSI)
Grid Fabric: Farms, Supers, Desktops, TCP/IP, Apparatus, DBs
• Resources must talk standard protocols …
• … for interoperability of application toolkits
next: protocol standards
Standard protocols
New Grid protocols are based on popular Web Services
Open Grid Services Architecture (OGSA)
service discovery
many different bindings
easily integrated in hosting environments (Java, WebSphere, .NET)
is entirely generic
adds: transient services, stateful services
Global Grid Forum (GGF) promotes the open standards process
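The two additions named above, transient and stateful services, can be illustrated with a factory that creates service instances with a limited lifetime. This is a minimal sketch in Python, not OGSA itself: the class names (`CounterService`, `Factory`), the handle format and the explicit `now` argument are assumptions made to keep the example small and deterministic.

```python
import itertools

class CounterService:
    """A stateful service instance: it remembers a value between calls."""
    def __init__(self, handle: str, expires_at: float):
        self.handle = handle
        self.expires_at = expires_at
        self.value = 0

    def add(self, amount: int) -> int:
        self.value += amount
        return self.value

class Factory:
    """Creates transient instances and forgets them once their lifetime ends."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._instances = {}

    def create(self, now: float, lifetime: float) -> str:
        handle = f"counter-{next(self._ids)}"
        self._instances[handle] = CounterService(handle, now + lifetime)
        return handle

    def lookup(self, handle: str, now: float) -> CounterService:
        instance = self._instances.get(handle)
        if instance is None or now >= instance.expires_at:
            # Expired instances are dropped: the service is transient.
            self._instances.pop(handle, None)
            raise KeyError(f"{handle} does not exist or has expired")
        return instance

factory = Factory()
handle = factory.create(now=0.0, lifetime=60.0)
factory.lookup(handle, now=10.0).add(2)
total = factory.lookup(handle, now=20.0).add(3)   # state survives between calls
print(handle, total)
```

A lookup after the lifetime has passed (for example `now=61.0`) raises `KeyError`, so clients have to create a fresh instance.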
next: access in a coordinated way
Access in a coordinated way
New ‘qualities-of-service’
Transparent crossing of domain boundaries, satisfying constraints of
site autonomy
authenticity, integrity, confidentiality
single sign-on to all services
ways to address services collectively
preferably via portals and visual programming
next: example GOME analysis
Example: GOME analysis
Task: ozone is the component of the atmosphere that protects us from harmful UV radiation. Its concentration varies widely. What is happening?
the EnviSat satellite orbits the earth and measures light absorption in the atmosphere
the absorption is related to the ozone concentration, but needs instrument corrections
ground-based observations give absolute concentrations
linking both datasets can give us the concentration everywhere
terabytes of data come in at several ground stations, and various labs need the final products
The Grid can provide a good solution to this problem
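The idea of linking both datasets can be sketched as a calibration step: fit a correction on the places where satellite and ground-based values overlap, then apply it to satellite values elsewhere. The sketch below assumes a simple linear correction and uses synthetic numbers; the real instrument corrections are not described on the slide and are certainly more involved. The function name `fit_linear` and all values are illustrative.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of ys ~ a * xs + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Synthetic numbers, for illustration only.
satellite_at_stations = [1.0, 2.0, 3.0, 4.0]    # raw satellite signal above ground stations
ground_truth = [105.0, 205.0, 305.0, 405.0]     # absolute values measured on the ground

a, b = fit_linear(satellite_at_stations, ground_truth)

# Apply the fitted correction where no ground station exists.
satellite_elsewhere = [1.5, 2.5]
calibrated = [a * s + b for s in satellite_elsewhere]
print(a, b, calibrated)
```

Each satellite value can be corrected independently once the fit is known, which is one reason this kind of workload splits well over many machines.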
next: GOME analysis on the Grid, domains
Example: Ozone Analysis on the Grid
Diagram labels: (binary data stream), NOPREGO, OPERA, LIDAR database, validation, visualize, resource broker
next: DataGrid overview
A Working Grid: the EU DataGrid
Objective:
build the next generation computing infrastructure providing intensive computation and analysis of shared large-scale databases, from hundreds of terabytes to petabytes, across widely distributed scientific communities
official start in 2001
21 partners
in the Netherlands: NIKHEF, SARA, KNMI
Pilot applications: earth observation, bio-medicine, high-energy physics
aim for production and stability
next: history of grids
Realising the Grid Vision
The Grid was the logical next step at the end of the 1990s:
Harnessing desktop power became commonplace
– 1988: Condor, later: SETI@Home, Entropia, Distributed.NET
Peer-to-peer data access protocols emerged
– 1999: Napster, later: Gnutella, KaZaA, BitTorrent
Network access became extremely fast
– 1997: wide area bandwidth starts to double every 9 months!
1997: Globus starts developing basic middleware
– 1996: middleware by Legion, 2000: Unicore
Massive take-up of the Grid vision in 1999
– led in Europe by the EU DataGrid
– others include: NASA-IPG, CrossGrid, GridLab, PPDG, Alliance, …
next: the EU DataGrid project
Grid Security Infrastructure
Crucial in Grid computing: it gives Single Sign-On
GSI uses a Public Key Infrastructure with proxy-ing and delegation
multiple VOs per user, groups and role support
Example proxy subject: C=IT/O=INFN/L=CNAF/CN=Pinco Palla/CN=proxy
Diagram labels: Authentication, Request, Query, AuthDB, VOMS, VOMS pseudo-cert, connect to providers, Grid Service 1, Grid Service 2, contracts
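The proxy subject shown on this slide illustrates how delegation appears in a name: a GSI proxy of this style carries the user's own subject with an extra `CN=proxy` component appended. A minimal sketch in Python of recovering the user's identity from such a subject string follows. The helper names (`parse_subject`, `end_entity`) are hypothetical, the parsing is naive (it assumes no `/` inside a value), and it handles strings only: real authentication means verifying the certificate chain, which this does not do.

```python
# Name components that mark a delegated proxy in this naming style.
PROXY_MARKERS = {("CN", "proxy"), ("CN", "limited proxy")}

def parse_subject(subject: str) -> list:
    """Split '/C=IT/O=INFN/...' into [('C', 'IT'), ('O', 'INFN'), ...]."""
    parts = [p.strip() for p in subject.split("/") if p.strip()]
    return [tuple(p.split("=", 1)) for p in parts]

def end_entity(subject: str) -> str:
    """Strip trailing proxy components to get the user's own subject."""
    components = parse_subject(subject)
    while components and components[-1] in PROXY_MARKERS:
        components.pop()
    return "/" + "/".join(f"{key}={value}" for key, value in components)

proxy_subject = "/C=IT/O=INFN/L=CNAF/CN=Pinco Palla/CN=proxy"
print(end_entity(proxy_subject))
# → /C=IT/O=INFN/L=CNAF/CN=Pinco Palla
```

Because the loop strips every trailing marker, a proxy of a proxy (two `CN=proxy` components) maps back to the same user, which is what makes single sign-on with further delegation possible.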
next: information services overview
VOMS overview: Luca dell’Agnello and Roberto Cecchini, INFN and EDG WP6
What is needed to get the work done
Fabric information
– what are the resources (computers, disk, tape) available to my VO?
– how do I access these resources (the “contact information”)?
“Physical” meta-data
– when was this dataset written?
– where can I find copies of it ‘close’ to me?
Contextual meta-data or ‘information’
– which datasets contain feature “X”?
– which DNA sequence corresponds to this protein?
Actual storage, processing power, network connectivity
next: spitfire
Spitfire: Access to Databases
based on the common EDG Trust and Authorization Manager
VO and role mapping to database views
Access via
Browser
Web Service
Commands
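The mapping of VO and role to database views can be pictured as a lookup table consulted after authentication. The sketch below is an illustration in Python only; the table contents, the view names and the function `view_for` are invented here and do not reflect Spitfire's actual configuration format.

```python
# Hypothetical mapping: (VO, role) -> database view the caller may query.
VIEW_FOR = {
    ("atlas", "admin"): "runs_full",
    ("atlas", "user"): "runs_readonly",
    ("biomed", "user"): "samples_anonymised",
}

def view_for(vo: str, role: str) -> str:
    """Return the view for an authenticated (VO, role) pair, or refuse access."""
    try:
        return VIEW_FOR[(vo, role)]
    except KeyError:
        raise PermissionError(f"no view for role {role!r} in VO {vo!r}") from None

print(view_for("atlas", "user"))
# → runs_readonly
```

Denying by default (an unknown pair raises `PermissionError`) keeps the database owner in control of what each VO and role can see.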
Screenshots: Gavin McCance, Glasgow University and EDG WP2
next: R-GMA
Grid information: R-GMA
Relational Grid Monitoring Architecture
a Global Grid Forum standard
Implemented by a relational model
used by grid brokers
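The relational model means that monitoring information looks like rows in tables: producers insert rows and consumers, such as brokers, select them. The sketch below uses Python's built-in `sqlite3` as a local stand-in to show the idea. It is not R-GMA: the table name, columns and values are assumptions, and R-GMA distributes producers and consumers over the grid rather than using one local database.

```python
import sqlite3

# In-memory database standing in for the "virtual" relational view.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE compute_element (name TEXT, vo TEXT, free_cpus INTEGER)")

def publish(name: str, vo: str, free_cpus: int) -> None:
    """Producer side: a site publishes one row of status information."""
    db.execute("INSERT INTO compute_element VALUES (?, ?, ?)", (name, vo, free_cpus))

# Illustrative rows (invented for this example).
publish("ce-a", "atlas", 12)
publish("ce-b", "atlas", 0)
publish("ce-c", "biomed", 40)

# Consumer side: a broker asks which elements can run a job for the "atlas" VO.
rows = db.execute(
    "SELECT name, free_cpus FROM compute_element "
    "WHERE vo = ? AND free_cpus > 0 ORDER BY free_cpus DESC",
    ("atlas",),
).fetchall()
print(rows)
# → [('ce-a', 12)]
```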
next: RLS and RMC
Screenshots: R-GMA Browser, Steve Fisher et al., RAL and EDG WP3
Replica Location Service
Search on file attributes (date, name, …)
Find replicas on (close) Storage Elements
Diagram labels: SE1 (SARA), SE2 (CERN), cache (UvA), DAS-2 CE, CE (CERN)
ATLAS Replica Service
higgs1.dat, ...
  sara:atlas/data/higgs1.dat
  cern:lhc/atlas/higgses/1.dat
higgs2.dat, ...
  cern:lhc/atlas/higgses/2.dat
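The replica lookup on this slide amounts to a map from a logical file name to its physical copies, followed by a choice of the copy closest to where the job runs. A minimal sketch in Python: the catalogue entries are copied from the slide, while the `DISTANCE` table, the compute element names and the function `closest_replica` are hypothetical.

```python
# Logical file name -> physical replicas ("site:path"), as listed on the slide.
REPLICA_CATALOGUE = {
    "higgs1.dat": ["sara:atlas/data/higgs1.dat", "cern:lhc/atlas/higgses/1.dat"],
    "higgs2.dat": ["cern:lhc/atlas/higgses/2.dat"],
}

# Hypothetical network "distance" from a compute element to each storage site.
DISTANCE = {
    "das2-ce": {"sara": 1, "cern": 10},
    "cern-ce": {"sara": 10, "cern": 1},
}

def closest_replica(logical_name: str, compute_element: str) -> str:
    """Pick the replica whose storage site is nearest to the compute element."""
    def cost(replica: str) -> int:
        site = replica.split(":", 1)[0]
        return DISTANCE[compute_element][site]
    return min(REPLICA_CATALOGUE[logical_name], key=cost)

print(closest_replica("higgs1.dat", "das2-ce"))
# → sara:atlas/data/higgs1.dat
print(closest_replica("higgs1.dat", "cern-ce"))
# → cern:lhc/atlas/higgses/1.dat
```

A file with a single replica, such as `higgs2.dat`, is always served from that one site regardless of distance.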
next: CE and RB, brokering and LCAS
Compute Brokering: reliable execution
Users can delegate all job actions to the Resource Broker … and go away
Reliable scheduling of jobs over the entire grid (as seen from the R-GMA information system)
Users are roaming, and can retrieve their results anywhere, anytime
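Scheduling over the whole grid can be sketched as matchmaking: filter the compute elements known to the information system by what the job needs, then rank the candidates. The Python sketch below is an assumption-laden illustration, not the EDG Resource Broker: the record fields, the ranking rule (most free CPUs wins) and all names are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComputeElement:
    name: str
    free_cpus: int
    supported_vos: frozenset

@dataclass(frozen=True)
class Job:
    owner_vo: str
    cpus_needed: int

def match(job: Job, elements: list):
    """Return the best compute element for the job, or None if nothing fits."""
    candidates = [
        ce for ce in elements
        if job.owner_vo in ce.supported_vos and ce.free_cpus >= job.cpus_needed
    ]
    if not candidates:
        return None
    # Hypothetical ranking: prefer the element with the most free CPUs.
    return max(candidates, key=lambda ce: ce.free_cpus)

# Illustrative snapshot of the information system (invented values).
elements = [
    ComputeElement("ce-a", 12, frozenset({"atlas"})),
    ComputeElement("ce-b", 40, frozenset({"biomed"})),
    ComputeElement("ce-c", 30, frozenset({"atlas", "biomed"})),
]

chosen = match(Job("atlas", 8), elements)
print(chosen.name if chosen else "no match")
```

When no element fits, `match` returns `None`; a real broker would queue the job and retry, which is what allows the user to "go away".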
next: EDG test bed overview
Current EU DataGrid Facilities
Map labels: CERN, Lyon, RAL, NIKHEF, CNAF, Tokyo, Taipei, BNL (legend: core site; EDG and LCG sites)
~1000 CPUs
~100 Tbyte storage
several key databases
~60 sites, ~600 users in ~7 VOs
next: using EDG, VisualJob
Using the DataGrid for Real
next: Portals
Screenshots: Krista Joosten and David Groep, NIKHEF
Portals
next: conclusions and outlook
Screenshots: ICES/KIS and WTCW: VLAM-G; INFN-GRID and EDG: Genius; NPACI: Rocks
What more is there to see and do?
The current Grids are only the beginning!
portals will get more users on the Grid
more functionality, better resilience, strong reliability
joining the Grid will be as simple as joining a file-sharing network
EGEE: a pan-European Grid Infrastructure being created today
The EU DataGrid project web www.edg.org
DutchGrid Platform www.dutchgrid.nl
For other grid projects, see www.gridstart.org and www.enterthegrid.com