TRANSCRIPT
UNIVERSITY OF CALIFORNIA, SAN DIEGO
SAN DIEGO SUPERCOMPUTER CENTER
Fran Berman
September 11, 2006
Dr. Francine Berman
Director, San Diego Supercomputer Center
Professor and High Performance Computing Endowed Chair, UC San Diego
Beyond Branscomb
The Branscomb Committee
• Charge: The Branscomb Committee was to assess the role of HPC for NSF constituent communities. The Committee focused in particular on four challenges:
• Challenge 1: How can NSF remove existing barriers to the evolution of HPC and make it broadly usable?
• Challenge 2: How can NSF provide scalable access to a pyramid of computing resources? What balance of computational resources should NSF anticipate and encourage?
• Challenge 3: How should NSF encourage broad participation in HPC?
• Challenge 4: How can NSF best create the intellectual and management leadership for the future of high performance computing in the U.S.? What role should NSF play with respect to the HPCC program and other agencies?
The Branscomb Report
TITLE: From Desktop to TeraFlop: Exploiting the U.S. Lead in High Performance Computing
AUTHORS: NSF Blue Ribbon Panel on High Performance Computing (Branscomb, Belytschko, Bridenbaugh, Chay, Dozier, Grest, Hays, Honig, Lane, Lester, McCrae, Sethian, Smith, Vernon)
DATE: August 1993
The Branscomb Pyramid
• Major Recommendations from the Branscomb Report
• NSF should make investments at all levels of the Branscomb Pyramid as well as investments in aggregating technologies (today’s cluster and grid computing). NSF should make balanced investments.
• Increase support of HPC-oriented SW, algorithm, and model development
• Coordinate and continue to invest in Centers. Develop allocation committees to facilitate use of resources in the community.
• Develop an OSTP advisory committee representing states, HPC users, NSF Centers, computer manufacturers, computer and computational scientists to facilitate state-federal planning for HPC.
The Branscomb Pyramid, circa 1993
The Branscomb Pyramid, circa 2006
• Leadership Class – 100’s of TFs
• Large-scale campus/commercial resources, Center supercomputers – 10’s of TFs
• Medium-scale campus/commercial clusters – 1’s of TFs
• Small-scale, desktop, home – 10’s of GFs
The Branscomb Pyramid and U.S. Competitiveness
• Leadership Class – Top500 spots 1-10
• Large-scale resources, center supercomputers – spots 11-50
• Medium-scale campus/commercial clusters – spots 51-500
• Small-scale, desktop, home – everyone else
According to the latest Top500 list (June 2006):
• Leadership Class (1-10) – 6 US machines
• 5 machines (spots 1, 3, 4, 6, 9) at federal laboratories (LLNL, NASA Ames, Sandia) and 1 machine (spot 2) at a U.S. corporation
• Large-scale (11-50) – 19 US machines
• 3 machines (23, 26, 28) at U.S. academic institutions (IU, USC, Virginia Tech)
• 2 machines (37, 44) at NSF centers (NCSA, SDSC)
• 5 machines (13, 14, 24, 25, 50) at DOE national laboratories (ORNL, LLNL, LANL, PNNL)
• 4 machines (20, 32, 33, 36) at other federal facilities (ERDC MSRC, Wright-Patterson, ARL, NAVOCEANO)
• 5 machines (19, 21, 31, 39, 41) at US corporations (IBM, Geoscience, COLSA)
• Medium-scale (51-500) – 273 US machines
• 38 are in the academic sector
Who is Computing on the Branscomb Pyramid?
• Leadership Class (1-10)
• DOE users, industry researchers, Japanese academics and researchers, German and French researchers
• Large-scale (11-50) (5 academic)
• Campus researchers, DOE and government users, industry users
• National open academic community at SDSC, NCSA, IU (around 50 TF in aggregate)
• Medium-scale (51-500) (38 academic)
• Campus researchers, federal agency users, industry users
• National open academic community on TeraGrid (not including above -- around 50 TF in aggregate)
• Leadership Class – Top500 spots 1-10 – 100’s of TFs
• Large-scale resources, center supercomputers – spots 11-50 – 10’s of TFs
• Medium-scale campus/commercial clusters – spots 51-500 – 1’s of TFs
• Small-scale, desktop, home – everyone else, 10’s of GFs
More than 15,000,000 students attend college. The number of degrees in Science and Engineering exceeds 500,000. There are ~2,500 accredited institutions of higher education in the U.S.*
* Ballpark numbers
Competitiveness at all Levels
• Leadership Class: currently U.S.-dominating, with Top500 “bragging rights”. Federal support required. Potential for breakthrough “pioneer” computational science discoveries.
• Large- and medium-scale (campus/commercial clusters and center supercomputers): the focus of almost all academic and commercial R&D – the lion’s share of new results and discoveries – yet no coordinated approach to national research infrastructure, and wide variability in coverage, use, service, and support.
• Small-scale, desktop, home: cost-effective, user-supported commercial model; an IT-literate workforce.
Balancing Investments in Branscomb
• If HPC is to become the ubiquitous enabler of science and engineering envisioned in the Branscomb Report (and every report since), we need to re-focus on providing:
• Enough cycles to cover the broad needs of academic researchers and educators on-demand and without high barriers to access
• Usable and scalable software tools with useful documentation
• “You’ve got 1024 processors and you can only smile and wave at them” – HPC user
• Professional-class strategy for SW sharing, standards, development environments
Branscomb Recommendations Revisited
• NSF should make investments at all levels of the Branscomb Pyramid as well as investments in aggregating technologies (today’s cluster and grid computing). NSF should make balanced investments.
• Increase support of HPC-oriented SW, algorithm, and model development.
• Coordinate and continue to invest in Centers. Develop allocation committees to facilitate use of resources in the community.
• Develop an OSTP advisory committee representing states, HPC users, NSF Centers, computer manufacturers, and computer and computational scientists to facilitate state-federal planning for HPC.
Fran’s “No User Left Behind” Initiative
“No User Left Behind” Goal: Sufficient and usable computational resources to support computationally-oriented research and education throughout the U.S. academic community
How (Fran’s 5 step program for computational health)
1. Do market research – what is adequate coverage for the university community? Where are the gaps in coverage in the US?
2. Get creative -- Work with the private sector and universities to develop a program for adequate coverage of computational cycles (we’re doing it with networking to K-12, no reason we can’t do it with computation for 12+)
3. Fund support professionals – every facility should have sys admins and help desk people – they should be part of a national organization which meets to exchange best practices and helps develop standards
4. Raise the bar on SW – private sector should step up and work with academia to improve HPC environments. Professors and grad students cannot provide robust SW tools with adequate documentation and evolutionary support
5. Get serious about data – many HPC applications involve significant data input or output – HPC efforts and data efforts must be coupled
On the Horizon: The Emerging Data Crisis will Increasingly Impact Computational Users
• More academic, professional, public, and private users use their computers to access data than for computation
• Data management, stewardship, and preservation are fundamental for new advances and discovery
• Astronomy: NVO – 100+ TB
• Physics: Projected LHC data – 10 PB/year
• Geosciences: SCEC – 153 TB
• Life Sciences: JCSG/SLAC – 15.7 TB
Today’s Applications Cover the Spectrum
[Figure: applications arranged along two axes – Compute (more FLOPS) and Data (more BYTES) – spanning Home, Lab, Campus, and Desktop Applications; Medium, Large, and Leadership HPC Applications; and Data-oriented Science and Engineering Applications. Examples: Everquest, Quicken, PDB applications, TeraShake, NVO, Molecular Modeling.]
• Large-scale data is required as input, intermediate, and output for many modern HPC applications
• Applications vary with respect to how well they can perform in distributed mode (grid computing)
• The analogue of High Performance Computing (HPC) is High Reliability Data (HRD)
Applying Branscomb to Data: The Data Pyramid
Facilities:
• National scale: national-scale data repositories, archives, and libraries. Maintained by professionals. High capacity, high reliability.
• Regional scale: regional libraries and targeted data centers. Maintained by professionals. Medium capacity, medium-high reliability.
• Local scale: private repositories. Supported by users or their proxies. Low capacity, low-medium reliability.
Target Collections:
• National scale: reference, nationally important, and irreplaceable data collections (PDB, PSID, Shoah, Presidential Libraries, etc.)
• Regional scale: research and project data collections.
• Local scale: personal data collections.
Adapting to a Digital World
• Local scale: emerging commercial opportunities
Data Storage for Rent
• Cheap commercial data storage is moving us from a “Napster model” (data is accessible and free) to an “iTunes model” (data is accessible and inexpensive)
Amazon S3 (Simple Storage Service)
• Storage for Rent:
• Storage is $0.15 per GB per month
• $0.20 per GB data transfer (to and from)
• Write, read, and delete objects of 1 GB-5 GB each (the number of objects is unlimited); access controlled by the user
• For roughly $2.00 per GB, you can store for one year (see the cost sketch after this list):
• Lots of high-resolution family photos
• Multiple videos of your children’s recitals
• Personal documentation equivalent to up to 1,000 novels, etc.
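As a rough illustration of where that $2.00-per-GB-year figure comes from, here is a minimal back-of-envelope sketch using the 2006 list prices quoted above; the assumption of one ingest transfer and twelve months of storage is mine, not Amazon's pricing documentation:

```python
# Back-of-envelope S3 cost per GB-year at the 2006 list prices above.
# Assumptions: one ingest transfer per GB, 12 months of storage.

STORAGE_PER_GB_MONTH = 0.15   # $ per GB per month
TRANSFER_PER_GB = 0.20        # $ per GB transferred (in or out)

def cost_per_gb_year(reads_per_gb: int = 0) -> float:
    """Cost to ingest 1 GB once, store it for a year, and read it back reads_per_gb times."""
    storage = STORAGE_PER_GB_MONTH * 12      # $1.80 for a year of storage
    ingest = TRANSFER_PER_GB                 # $0.20 to upload once
    access = TRANSFER_PER_GB * reads_per_gb  # $0.20 each time the GB is transferred out
    return storage + ingest + access

print(cost_per_gb_year())   # ~$2.00 per GB-year with no accesses
print(cost_per_gb_year(5))  # ~$3.00 per GB-year with an average of 5 accesses
```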
Should we store the NVO with Amazon S3?
The National Virtual Observatory (NVO) is a critical reference collection for the astronomy community of data from the world’s large telescopes and sky surveys.
A Thought Experiment
• What would it cost to store the SDSC NVO collection (100 TB) on Amazon? (The arithmetic is sketched below.)
• 100,000 GB × $2 (1 ingest, no accesses, plus storage for a year) = $200K/year
• 100,000 GB × $3 (1 ingest, an average of 5 accesses per GB stored, plus storage for a year) = $300K/year
• Not clear:
• How many copies Amazon stores
• Whether the format is well-suited for the NVO
• Whether the usage model would make the costs of data transfer, ingest, access, etc. infeasible
• Whether Amazon constitutes a “trusted repository”
• What happens to your data when you stop paying, etc.
• What about the CERN LHC collection (10 PB/year)?
• 10,000,000 GB × $2 (1 ingest, no accesses per item, plus storage for a year) = $20M/year
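Scaling the same per-GB arithmetic up to whole collections reproduces these ballpark figures. Here is a minimal sketch; the collection sizes and per-GB-year rates are the ones quoted on this slide, and everything else (replication, egress patterns, price changes) is deliberately ignored:

```python
# Ballpark annual hosting cost for whole collections at the slide's per-GB-year rates.
GB_PER_TB = 1_000
GB_PER_PB = 1_000_000

collections = {
    "SDSC NVO (100 TB)": 100 * GB_PER_TB,     # 100,000 GB
    "CERN LHC (10 PB/year)": 10 * GB_PER_PB,  # 10,000,000 GB
}

for name, gigabytes in collections.items():
    no_access = gigabytes * 2.00   # 1 ingest + a year of storage
    five_reads = gigabytes * 3.00  # plus an average of 5 accesses per GB
    print(f"{name}: ${no_access:,.0f}/year (no accesses), "
          f"${five_reads:,.0f}/year (5 accesses per GB)")
```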
• National scale: reference and irreplaceable data require long-term preservation and reliable stewardship; there is no real sustainable plan. The most valuable research data is in the most danger.
• Regional scale: universities and libraries can provide greater support, but they need help.
• Local scale: emerging commercial opportunities.
Providing Sustainable and Reliable Data Infrastructure Incurs Real Costs
Entity at risk | Size | What can go wrong | Frequency | Minimum replicas needed to mitigate risk | Administrative support FTEs
File | ~2 MB | Corrupted media, disk failure | 1 year | 2 copies in a single system | System admins
Tape | ~200 GB | + Simultaneous failure of 2 copies | 5 years | 3 homogeneous systems | + Storage admin
System | ~10 TB | + Systemic errors in vendor SW, or malicious user, or operator error that deletes multiple copies | 15 years | 3 independent, heterogeneous systems | + Database admin, + Security admin
Archive | ~1 PB | + Natural disaster, obsolescence of standards | 50-100 years | 3 distributed, heterogeneous systems | + Network admin, + Data grid admin
Less risk means more replicas, more resources, and more people (a sketch encoding this guidance follows below).
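To make that guidance easier to apply, here is a minimal sketch that picks the table's minimum replica configuration based on a holding's size. The tier names, sizes, and replica counts come from the table above; the lookup function and the use of the nominal sizes as selection thresholds are illustrative assumptions:

```python
# Minimal sketch: map a holding's size to the table's minimum replica guidance.
# Tier sizes and replica configurations are taken from the table above; using the
# nominal tier sizes as selection thresholds is an illustrative assumption.

TIERS = [
    # (approximate size in bytes, entity at risk, minimum replica configuration)
    (2e6,  "File (~2 MB)",    "2 copies in a single system"),
    (2e11, "Tape (~200 GB)",  "3 homogeneous systems"),
    (1e13, "System (~10 TB)", "3 independent, heterogeneous systems"),
    (1e15, "Archive (~1 PB)", "3 distributed, heterogeneous systems"),
]

def replica_guidance(size_bytes: float) -> str:
    """Return the smallest tier whose nominal size covers the holding."""
    for tier_size, entity, replicas in TIERS:
        if size_bytes <= tier_size:
            return f"{entity}: {replicas}"
    # Anything larger than ~1 PB is treated as archive-class.
    _, entity, replicas = TIERS[-1]
    return f"{entity}: {replicas}"

print(replica_guidance(5e5))     # a small file
print(replica_guidance(1.5e14))  # a 150 TB collection -> archive-class guidance
```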
Supporting Long-lived Data: What Happens if We Don’t Preserve Our Most Important Reference Collections?
• Life sciences research would have only the resources available in roughly the 1970s – no PDB, no Swiss-Prot, no PubMed, etc.
• New discoveries from climate and other predictive simulation models which utilize longitudinal data would dramatically slow
• iTunes would store only current music; NetFlix would provide only current movies
• Federal, state, and local records would need to remain on paper. Without preservation, digital history is only as old as the current storage media.
Chronopolis: Using the Data Grid to Support Long-Lived Data
SDSC, the UCSD Libraries, NCAR, UMd, and NARA are working together as a consortium on long-term preservation of digital collections.
• Chronopolis provides a comprehensive approach to infrastructure for long-term preservation, integrating:
• Collection ingestion
• Access and services
• Research and development for new functionality and adaptation to evolving technologies
• Business model, data policies, and management issues critical to the success of the infrastructure
Chronopolis – Replication and Distribution
• 3 replicas of valuable collections are considered reasonable mitigation for the risk of data loss
• Chronopolis Consortium will store 3 copies of preservation collections:
• “Bright copy” – Chronopolis site supports ingestion, collection management, user access
• “Dim copy” – Chronopolis site supports remote replica of bright copy and supports user access
• “Dark copy” – Chronopolis site supports reference copy that may be used for disaster recovery but no user access
• Each site may play different roles for different collections, as sketched below
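Here is a minimal sketch of the bright/dim/dark scheme just described. The site names come from the slide; the particular collection-to-role assignments and the checking function are illustrative assumptions, not the consortium's actual configuration:

```python
# Minimal sketch of Chronopolis-style replication: every collection gets one bright,
# one dim, and one dark copy, and each site may play different roles for different
# collections. The assignments below are illustrative only.

REQUIRED_ROLES = {"bright", "dim", "dark"}

assignments = {
    # collection -> {site: role}
    "C1": {"UCSD": "bright", "NCAR": "dim", "UMd": "dark"},
    "C2": {"NCAR": "bright", "UMd": "dim", "UCSD": "dark"},
}

def check_replication(plan):
    """Verify every collection has exactly one bright, one dim, and one dark copy."""
    for collection, site_roles in plan.items():
        roles = set(site_roles.values())
        assert roles == REQUIRED_ROLES, f"{collection} is missing a role: {roles}"
        summary = ", ".join(f"{role} copy at {site}" for site, role in site_roles.items())
        print(f"{collection}: {summary}")

check_replication(assignments)
```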
[Figure: Chronopolis Federation architecture – three Chronopolis sites (UCSD, NCAR, U Md), each holding a mix of bright, dim, and dark copies of collections C1 and C2.]
Creative Business Models Needed to Support Long-lived Data
• Data preservation infrastructure need not be an infinite, increasing mortgage
• Creative solutions are possible: relay funding, consortium support, recharge, use fees, hybrid models, and other support mechanisms can be used to create sustainable business models
• Current competitions are providing a venue for a broader set of players and experts
• Our best and our brightest are becoming lean, mean competition machines – does this really serve the science and engineering community best?
• We’re getting good at circling the wagons and pointing the guns inward – isn’t it time we turned things around?
• What will it take for all of US to take the leadership to better focus CS infrastructure, research, and development efforts?
Beyond Branscomb
Whining
Thank You
www.sdsc.edu