TRANSCRIPT
The Open Science Grid: Challenges and Opportunities for the next 5 years
Frank Würthwein, OSG Executive Director
UCSD/SDSC
September 10th, 2015
OSG since Inception
(Plot: monthly OSG usage, 2005 to 2015, growing to roughly 80 Million hours/month by 2015. Accounting was not available at inception.)
On the Path to one Billion hours
4
Over the last 12 months, 200 Million jobs consumed 860 Million hours of computing, involving 850 Million data transfers to move 163 Petabytes.
This aggregate was accomplished by federating 127 clusters that contributed anywhere from 1 hour to 100 Million hours each.
http://display.grid.iu.edu
88 Million Core hours in the past 30 days
It is all about Sharing
• Clusters at Universities & National Labs are shared. Sharing policy is locally controlled (local autonomy). All owners want to share to maximize the benefit to all (common goal).
• A researcher uses a single interface to access local and remote resources:
  … they own
  … others are willing to share
  … they have an allocation on
  … they buy from a commercial (cloud) provider
OSG focuses on making this technically possible for High Throughput Computing applications
It must be open
• Operate a shared Production Infrastructure: collaborate with partners that want to share their hardware => Open Facility
• Advance a shared Software Infrastructure: collaborate with partners that want to share their software => Open Software Stack
• Disseminate knowledge across Researchers, IT professionals & Software developers: collaborate with partners who want to share their ideas and experiences => Open Ecosystem
OSG Hours 2015 by Science Domain
(Pie chart of hours by domain: ATLAS, CMS, other physics, life sciences, other sciences)
Science other than Physics makes up ~20% of the OSG hours
Enabling dHTC for Physics
• LHC experiments: ATLAS, CMS, ALICE
• Other HENP experiments: Mu2e, Nova, Belle, Argoneut, CDF, D0, CDMS, COUPP, DarkSide, glueX, ILC, LAr1, MiniBoone, MicroBoone, Minerva, Minos, Star, sPHENIX, LBNE/Dune, XENON, LZ, …
• Other physics experiments: IceCube, DES, LSST, PolarBear, …
• Individual theoretical physics groups, mostly from Astro, Nuclear, and Particle physics, but also some Biophysics, Condensed Matter Physics, …
Communities with >1M hours last year
Submit Locally and Run Globally
Mu2e submits work transparently to FNAL and 17 other clusters on OSG.
Mu2e consumed 18M hours on OSG outside FNAL, plus 6M hours at FNAL, from May to August 2015.
(Plot: Mu2e use of OSG outside FNAL)
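In practice, "submit locally and run globally" looks to the researcher like an ordinary local batch submission. The sketch below uses the HTCondor Python bindings; the executable, attributes, and job count are hypothetical placeholders rather than Mu2e's actual configuration, and whether a given job stays local or glides out onto OSG is decided by the overlay, not by the submit description.

```python
# Minimal sketch (hypothetical job, not Mu2e's real workflow): submit a batch of
# independent jobs to the local HTCondor schedd; the glideinWMS overlay then decides
# whether each job runs on the local cluster or on a remote OSG site.
import htcondor

job = htcondor.Submit({
    "executable": "simulate_events.sh",      # placeholder user script
    "arguments": "$(Process)",               # one task index per queued job
    "output": "logs/job_$(Process).out",
    "error": "logs/job_$(Process).err",
    "log": "logs/cluster.log",
    "request_cpus": "1",
    "request_memory": "2GB",
    "+ProjectName": '"Mu2e"',                # accounting tag; attribute use is illustrative
})

schedd = htcondor.Schedd()                   # the researcher's local submit node
with schedd.transaction() as txn:            # submission idiom of the 2015-era bindings
    job.queue(txn, count=1000)               # 1000 jobs, placed locally or globally
```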
More than 70% of Cycles from outside DOE labs
(Pie chart of Mu2e hours by site: Fermigrid, CMS T1, ATLAS T1, CMS T2s, ATLAS T2s, and 9 other clusters)
Mu2e benefited dramatically from resources outside Fermilab
e.g. Syracuse University contributed as much as Fermigrid
Bo Jayatilaka (FNAL) et al. responsible for expanding the resource pool.
dHTC Accelerates Science
• classic HTC for boundless needs: 0.5 – 2.5M hours/week, continuously for eight months
• elastic short-term HTC scale-out: a one-time ~1M hours over 2 days in one month
(Examples: V. Pande, Chemistry, Stanford; Krieger, Neuroscience, Pittsburgh)
While the throughput needs of individual scientists may vary dramatically, the dHTC services provided by OSG can address them in all cases.
Supporting Individual Researchers
Researcher Institution Science Domain Hours Access
Don Krieger Pittsburgh Neuroscience 46M OSG-XD
Nicolas Roys Wisconsin Economics 10M GLOW
Steffen Bass Duke Nuclear Theory 7M OSG
Martin Purschke BNL Nuclear Experiment 6M OSG
P.S. Radziszowski Rochester IT Computer Science 4M OSG
Barry Van Veen Wisconsin Neuroscience 2M GLOW
David Minh Illinois IT Chemistry 2M OSG
Jinbu Xu Toyota TI Bioinformatics 2M OSG
Two fragments of these stories: Krieger uses MEG functional brain images to understand brain trauma in humans; one ~40-minute MEG recording requires ~360k core-hours to analyze.
Roys is an Asst. Prof. in Economics at UW-Madison who overflows from GLOW into OSG to study such things as “Origin and Causes of Economic Growth” or “The Causal Effect of Parents’ Education on Children’s Earnings”.
OSG has multiple individual researchers that each successfully consume resources at the 1-50 Million hours/year level!
Existence proof that you don't need the backing of a large HEP experiment to succeed!
OSG Pitch to Campuses
• Workforce Development: we organize workshops for researchers and/or IT professionals; increasingly these are bundled with “software carpentry”
• Elastic scale-out of science onto OSG: Submit Locally and Run Globally
• Sharing with partner institutions: OSG enables cross-institutional sharing under your control
• Sharing nationally
Increasingly we are engaging CIOs at Universities
O(10^6) Dynamic Range
• How to build an integrated dHTC Cyberinfrastructure (CI) that:
  - connects computing from Gflops to Exaflops?
  - supports data science from Gbytes to Exabytes?
  - reaches from small colleges to the largest national labs?
  - is operated by anything from large professional IT teams to a single student in their spare time?
• These imply challenges pertaining to:
  - catering to diversity in human knowledge and skills
  - providing a wide range of solutions that match the wide range of effort available at different organizations to operate & maintain them
Bringing it all together
19
1000’s of independent researchers
100’s of independent IT infrastructures
O(1) Cyberinfrastructure organizations
Break the many-to-many relationship into two many-to-few relationships.
The few must do the heavy lifting to “operate” most of the Cyberinfrastructure: services, software, workforce development.
The CI organization must be open in both directions.
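As a one-line sanity check on why this restructuring matters, the round numbers on this slide imply the following counts of relationships to establish and maintain (a sketch, not a measurement):

```python
# Back-of-envelope count of relationships, using the slide's round numbers.
researchers = 1000          # "1000's of independent researchers"
it_infrastructures = 100    # "100's of independent IT infrastructures"

many_to_many = researchers * it_infrastructures        # every researcher deals with every IT org
two_many_to_few = researchers + it_infrastructures     # everyone deals only with the O(1) CI org

print(many_to_many)      # 100000 pairwise relationships
print(two_many_to_few)   # 1100 relationships, all mediated by the CI organization
```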
Who do you trust?
IT organizations don’t care to deal with 1000’s of strangers, and researchers do not want to deal with 100’s of IT organizations. Neither wants to deal with 10’s of software providers.
Need to support delegated trust relationships:
- IT Orgs trust Virtual Organizations
- Researchers trust Virtual Organizations
- IT Orgs & Researchers trust Software Providers
Trust & Security in OSG
• Operational cyber security: incident response, response to vulnerabilities, training & drills
• Assessment & consulting for software developers to ensure secure & usable services
• Architecture work towards advancement of trust management & security models, e.g. simplify use of certificates for data management, e.g. work with DOE on security models across labs
Mine Altunay (FNAL) et al.
Who gets what and when?
1000’s of independent researchers
100’s of independent IT infrastructures
A single provisioning system (the glideinWMS factory) creates community-specific overlay batch systems.
Break the many-to-many relationship into two many-to-few relationships.
VOs are autonomous to schedule the resources provisioned to them.
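The overlay idea can be sketched in a few lines of Python. This is a conceptual illustration, not glideinWMS code: a pilot provisioned onto some cluster starts a worker that pulls tasks from its VO's own queue, so scheduling within the provisioned slots stays entirely in the VO's hands. The queue and tasks below are hypothetical stand-ins.

```python
# Conceptual sketch of a pilot-based overlay batch system (not actual glideinWMS code).
# A pilot lands in a batch slot at some federated cluster, then pulls and runs work
# from its VO's queue until the queue drains or the slot's wall time runs out.
import queue
import time

def run_pilot(vo_task_queue, wall_time_limit_s):
    """Drain VO tasks from the overlay queue within the slot's wall-time limit."""
    start = time.time()
    while time.time() - start < wall_time_limit_s:
        try:
            task = vo_task_queue.get_nowait()   # the VO decides what runs next
        except queue.Empty:
            break                               # no more VO work: release the slot
        task()                                  # execute one unit of VO work

# Hypothetical usage: the VO fills its own queue; pilots drain it wherever they land.
tasks = queue.Queue()
for i in range(5):
    tasks.put(lambda i=i: print("processed task", i))
run_pilot(tasks, wall_time_limit_s=3600)
```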
Four ways to increase throughput
(Diagram: Campus Access Points, Community/VO Access Points, and OSG hosted Access Points (OSG-XD, OSG-Connect) submit via the OSG glideinWMS Service into the OSG Federation of clusters at Universities, National Labs, and Clouds.)
Big Data beyond the LHC
• While the LHC experiments operate more than a dozen multi-petabyte-scale storage systems at US Universities and National Labs, many in the HTC community still struggle with GB-scale datasets.
• We are addressing this by enabling all of science on OSG to move from GB to TB datasets, by deploying a system of multi-TB caches at OSG sites across the US.
OSG Federation of Caches
Multiple community-specific sources feed a single namespace for all, via the OSG Redirector, into caches at multiple OSG sites; jobs access the “closest” cache.
Robert Illingworth (FNAL) is prototyping this for FIFE customers.
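A minimal sketch of how a job might pull one input file through such a federation, assuming an XRootD-style cache hierarchy; the redirector hostname and the logical file name are placeholders, since the talk gives no concrete endpoints.

```python
# Sketch only: fetch one input file out of the cache federation before processing.
# The redirector hostname and the logical file name are hypothetical placeholders.
import subprocess

REDIRECTOR = "root://redirector.osg.example.org/"     # single-namespace entry point (placeholder)
logical_name = "/store/user/example/input_000.root"   # community-specific logical name (placeholder)

# xrdcp resolves the logical name via the redirector; a nearby cache serves the bytes
# if it already holds the file, and otherwise pulls from the community source and caches it.
subprocess.run(["xrdcp", REDIRECTOR + logical_name, "input_000.root"], check=True)
```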
Aside on Read Strategies
• If the IO subsystem is latency tolerant, then applications might be better off reading from remote storage all the time and not bothering with caching at all.
• If the IO is moderately latency tolerant and does only partial file reads, then reading from a nearby cache may work well.
• If the IO is completely latency intolerant, with lots of reads jumping around in the file, and/or the entire file is always read, then you may want to copy the file into the sandbox before processing begins (a sketch of this decision logic follows below).
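A minimal sketch that simply encodes the decision rule described above; the two boolean inputs are a deliberate simplification of the three cases.

```python
# Sketch: encode the read-strategy heuristic from the bullets above.
def choose_read_strategy(latency_tolerant, partial_reads_only):
    """Pick a data-access strategy for a job's IO pattern (illustrative only)."""
    if latency_tolerant:
        # Fully latency-tolerant IO: read from the remote source every time, skip caching.
        return "remote read, no caching"
    if partial_reads_only:
        # Moderately latency-tolerant, reads only part of the file: a nearby cache works well.
        return "read from a nearby cache"
    # Latency-intolerant, random reads, and/or whole-file reads: stage into the sandbox first.
    return "copy the file into the sandbox before processing"

print(choose_read_strategy(latency_tolerant=False, partial_reads_only=True))
```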
One-size-fits-all seems unlikely in data access.
CMS: Fraction of a file that is read
(Histogram: number of files vs. fraction of each file that is read, with an overflow bin; median at 8.5%.)
For half the files, less than 8.5% of the file is read.
Statistics on 23M files read via the WAN, Jan. 2012 – Feb. 2014; 12 Petabytes read in total.
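The headline number (the median fraction of each file that is read) is straightforward to reproduce from per-file access records; a sketch, with made-up (bytes read, file size) pairs standing in for the actual CMS monitoring data:

```python
# Sketch: compute the median fraction of each file that was actually read.
# `records` stands in for the real per-file WAN-access statistics, which are not included here.
from statistics import median

records = [
    (120_000_000, 2_000_000_000),   # (bytes read, file size) -- made-up example values
    (5_000_000, 1_500_000_000),
    (900_000_000, 1_000_000_000),
]

fractions = [bytes_read / size for bytes_read, size in records]
print("median fraction of a file that is read: {:.1%}".format(median(fractions)))
```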
A detailed study of the IO behavior of different applications using FIFE may be both intellectually satisfying and worth the effort.
There is a lot to be learned from CMS!
US Networking build-out
• NSF made a series of competitive grants to over 100 US universities to aggressively upgrade their campus network capacity within the last few years.
• ESNet is now supporting all WLCG traffic, incl. T3 – T3, T2 – T2, T3 – T2, … Belle is proving that experiments not located at CERN can be members of WLCG.
• NSF is moving up the stack and beyond the campus via the “Pacific Research Platform” award.
Pacific Research Platform
International partners include Amsterdam, Tokyo, and Australia, in addition to LHCOne.
Science includes: ATLAS & CMS, telescopes, galaxy evolution, LIGO, cancer genomics, integrative omics, structural biology, Earth sciences, visualization, CS R&D.
Collaborating with regional Science DMZ
A collaboration of OSG with Calit2, CITRIS, and SDSC: adding OSG Software & Services on top of the Pacific Research Platform as a regional science DMZ.
Understanding the Network
Collaboration of ESNet & OSG to provide a global monitoring infrastructure: establish a complete matrix of perfSONAR network performance measurements across OSG (see the sketch below).
Purpose:
- Debugging network issues
- Long-term performance repository => source of data for CS R&D
- Possibly use the information for scheduling in the future
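"Complete matrix" here simply means every site measures to every other site. A sketch of enumerating that mesh for a hypothetical site list; the measurement call is a placeholder, since the talk does not describe the perfSONAR tooling itself.

```python
# Sketch: enumerate the full mesh of source -> destination network measurements.
# Site names are placeholders; schedule_measurement() stands in for whatever drives
# the perfSONAR throughput/latency tests in the real deployment.
from itertools import permutations

sites = ["SiteA", "SiteB", "SiteC", "SiteD"]   # hypothetical OSG sites

def schedule_measurement(src, dst):
    print("measure throughput and latency: {} -> {}".format(src, dst))

# An N-site mesh needs N*(N-1) ordered pairs (12 for the 4 placeholder sites above).
for src, dst in permutations(sites, 2):
    schedule_measurement(src, dst)
```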
Enabling CS R&D on Distributed Computing
• OSG runs >200 Million jobs a year, and records performance characteristics for most of them.
• OSG is collecting network performance data.
• OSG is starting to collect detailed application IO access data for its Data Federation.
A wealth of data that could be mined for CS R&D.
Example Questions
• What applications benefit from remote IO? And for those that don’t, why not?
• What applications benefit from the newly deployed caches? How close does the cache have to be? How many jobs can read from the same cache simultaneously before that cache is overloaded (a back-of-envelope sketch follows below)? How much of the data in the file is read per file? Are we better off copying the file to be processed into the local sandbox? Is it worth optimizing the IO stack of an application like CMS did?
• What level of cache is needed at NERSC for which types of applications?
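The cache-overload question above invites a back-of-envelope estimate; a sketch with illustrative numbers (neither the cache bandwidth nor the per-job read rate comes from OSG measurements):

```python
# Back-of-envelope sketch: how many concurrent jobs can one cache serve?
# Both numbers below are assumptions for illustration, not OSG measurements.
cache_bandwidth_MB_s = 1250.0   # e.g. a single 10 Gbit/s NIC is roughly 1250 MB/s
per_job_read_MB_s = 5.0         # assumed average streaming read rate per job

max_concurrent_readers = cache_bandwidth_MB_s / per_job_read_MB_s
print("cache saturates at roughly {:.0f} concurrent readers".format(max_concurrent_readers))
```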
In all cases we may want to understand the behavior of applications on the production system.
In the not too far future
• A PI receives “cloud credits” from a funding agency.
• The PI uses those credits to have OSG:
  - scale out the OSG infrastructure into commercial cloud resources to meet her deadline
  - bring her own data via the OSG Cache and an Internet2/ESNet connection to these resources
  - analyze data stored remotely
  - bring results out via the OSG Cache as needed, or via a file transfer service when desired
Strong overlap in goals with HEP Cloud project at FNAL
“Open” HPC Clusters in the US
Name Institution Architecture Start Date
Stampede TACC 100k core Intel Sandy Bridge 2013
Comet SDSC 47k core Intel Haswell 4/2015
Cori 1 NERSC 22k cores Intel Haswell Fall 2015
Cori 2 NERSC 9.3k nodes Intel Knights Landing 2016
Theta ANL 2.5k nodes Intel Knights Landing 2016
Summit Oak Ridge ~3400 nodes IBM Power9 & NVIDIA GPUs 2017
Aurora ANL ~50k nodes Intel XEON Phi Gen 3 2018
What HEP production apps will run on what architecture?
All HEP applications run on Sandy Bridge & Haswell => 170k cores of HPC
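The 170k-core figure follows from the table above: the machines already capable of running all HEP applications on Sandy Bridge and Haswell (Stampede, Comet, Cori 1) sum to roughly that.

```python
# Worked arithmetic behind "=> 170k cores of HPC", using the core counts from the table.
x86_cores = {
    "Stampede (TACC)": 100_000,   # Intel Sandy Bridge
    "Comet (SDSC)": 47_000,       # Intel Haswell
    "Cori 1 (NERSC)": 22_000,     # Intel Haswell
}
print(sum(x86_cores.values()))    # 169000, i.e. roughly 170k cores
```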
Example CMS Simulation
Workflow: LHE → GEN-SIM → DIGI-RECO → Analysis
CPU: O(1)% / 1/4 / 1/4 / 1/2
Size/evt: <1 kByte; 300/40 kByte (AOD/MINIAOD); 1 MByte
GEANT dominates GEN-SIM; tracking dominates DIGI-RECO.
Both CMS SIM & tracking run on Intel x86 today (Sandy Bridge & Haswell from the previous slide); analysis also runs only on Intel x86.
Final Stages of Analysis are IO limited
Workflow: RECO → Analysis (Public) → Analysis (Private)
MINIAOD: 40 kByte/evt
CPU time triples going from 20 to 30 PU.
4-40 TB of “private” data per publication in Run 1.
Event processing rates range from O(1) Hz to O(10^2 – 10^4) Hz; the final stages are heavily IO limited.
Bifurcation of application needs drives new paradigms & architectures
The impact of this dual trend on “shared commodity clusters”, as well as on HTC & HPC in general, is as yet unclear.
OSG is interested in preparing HTC in both directions.
OSG as an XD Service Provider
• Individual researchers can apply for an allocation on an NSF HPC cluster via an allocation process every 3 months.
• OSG offers itself as an HTC “cluster” via this allocation process.
• Any and all joint DOE-NSF experiments can apply for cycles via this process.
• DOE ASCR has a similar allocation process.
Why is there no HTC “facility” in the DOE ASCR allocation process?