Data-Intensive Computing at NSF
Corporate Alliance
June 18, 2008
Jeannette M. Wing
Assistant Director
Computer and Information Science and Engineering Directorate
Thanks to the NSF team: Dan Atkins, Debbie Crawford, Haym Hirsh, Jim French, Stephen Meacham, …
3Data-Intensive Computing Jeannette M. Wing
How Much Data?
• NOAA has ~1 PB climate data (2007)
• Wayback machine has ~2 PB (2006)
• CERN’s LHC will generate 15 PB a year (2008)
• HP building WalMart a 4 PB data warehouse (2007)
• Google processes 20 PB a day (2008)
• “all words ever spoken by human beings” ~ 5 EB
• Int’l Data Corp predicts 1.8 ZB of digital data by 2011
640K ought to be enough for anybody.
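To put the figures above on a common scale, here is a minimal arithmetic sketch. It assumes decimal (SI) units, which the slide does not specify, and the constant and function names are hypothetical, chosen only for this illustration.

```python
# Assumption: decimal (SI) byte units; the slide does not say which convention it uses.
PB = 10**15
EB = 10**18
ZB = 10**21

SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the slide.
GOOGLE_PER_DAY = 20 * PB   # "Google processes 20 PB a day"
LHC_PER_YEAR = 15 * PB     # "CERN's LHC will generate 15 PB a year"
ALL_WORDS = 5 * EB         # "all words ever spoken by human beings"
IDC_2011 = 18 * 10**20     # 1.8 ZB, written as an integer number of bytes


def rate_gb_per_s(bytes_per_day):
    """Convert a daily volume in bytes into a sustained rate in GB per second."""
    return bytes_per_day / SECONDS_PER_DAY / 10**9


def days_to_process(total_bytes, bytes_per_day):
    """Days needed to work through total_bytes at a fixed daily volume."""
    return total_bytes / bytes_per_day
```

Under these assumptions, rate_gb_per_s(GOOGLE_PER_DAY) comes to roughly 231 GB per second, a year of LHC output corresponds to 0.75 days of processing at that rate, and the 5 EB estimate corresponds to 250 days.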
Convergence in Trends
• Drowning in data
• Data-driven approach in computer science research
  – graphics, animation, language translation, search, …, computational biology
• Cheap storage
  – Seagate Barracuda 1TB hard drive for $195
• Growth in huge data centers
• Open Source “MapReduce” programming model
Divide and Conquer
[Figure: a Master partitions “Work” into pieces w1, w2, w3; three “workers” process them to produce r1, r2, r3; the partial results are combined into “Result”.]
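The partition / worker / combine pattern on this slide can be sketched in a few lines. This is a minimal single-machine illustration, not Hadoop's or Google's MapReduce API: the function names are hypothetical, word counting is an assumed example task, and a thread pool stands in for the cluster's workers.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, n_workers):
    """Split the work list into n_workers contiguous, nearly equal chunks."""
    size, extra = divmod(len(work), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(work[start:end])
        start = end
    return chunks


def worker(chunk):
    """Assumed example task: count the words in this worker's lines (w_i -> r_i)."""
    counts = {}
    for line in chunk:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def combine(partials):
    """Merge the per-worker counts r1, r2, r3 into a single result."""
    total = {}
    for partial in partials:
        for word, count in partial.items():
            total[word] = total.get(word, 0) + count
    return total


def master(work, n_workers=3):
    """Partition the work, run the workers concurrently, then combine."""
    chunks = partition(work, n_workers)
    # Threads stand in for cluster nodes here; this is an assumption of the sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, chunks))
    return combine(partials)
```

Calling master(["a b a", "b c", "a"]) gives each of the three workers one line and merges their counts into one dictionary. Each worker only ever sees its own chunk, which is what lets the same structure scale out across machines.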
Data-Intensive Computing: Sample Research Questions
Science
  – What are the fundamental capabilities and limitations of this paradigm?
  – What new programming abstractions (including models, languages, algorithms) can accentuate these fundamental capabilities?
  – What are meaningful metrics of performance and QoS?
Engineering
  – How can we automatically manage the hardware and software of these systems at scale?
  – How can we provide security and privacy for simultaneous mutually untrusted users, for both processing and data?
  – How can we reduce these systems’ power consumption?
Users
  – What (new) applications can best exploit this computing paradigm?
NSF’s Interest in Data-Intensive Computing
• Broad interest, (potentially) long-term
• CISE
  – Cross-directorate: CCF, CNS, IIS
  – Short-term: CluE
    • To provide the broad academic community access to a large-scale computing cluster and massive data sets
  – Longer-term: Look for cross-cutting theme in FY09 solicitation
• NSF
  – Potentially cross-foundational, e.g., via Cyber-enabled Discovery and Innovation (CDI); CISE, OCI, MPS, ENG, …
  – Why? Scientists are drowning in data!
CluE: Cluster Exploratory
• Google+IBM cluster software and services
  – Same as Academic Computing Cluster provided for six universities (announced last October)
• Seed program by NSF
  – $5M will fund SGERs and regular awards
  – Solicitation released; July 17 proposal deadline
  – Jim French (IIS Program Director)
• Hope: CluE will be a wild success and community interest and demand will be high
Google+IBM Cluster
• Cluster
  – 1600+ processors, terabytes of memory, hundreds of terabytes of storage, internal networking
  – External network connection
• Software
  – Linux
  – Hadoop (written by Yahoo!): Open Source version of Google’s MapReduce, Google File System
  – IBM Tivoli: management, monitoring and dynamic resource provisioning of the cluster
• Services
  – Operations and maintenance, including staff, loading data and programs, energy costs
The Partnership: Roles
• Google and IBM
  – Provide data cluster, user support, scheduling
• NSF
  – Review proposals, identify awardees, funding
• Universities
  – Propose and execute research plans on data cluster
The MOU
• Codify the roles
• Establish restrictions to comply with export law
• Prescribe the need for “usage agreement”
  – Remove NSF from this industry/university process and raise awareness of university sensitivities
The Usage Agreement
• Sets out terms and conditions for use of the hardware/software suite
• Three significant issues
  – Indemnification
    • State universities prevented by constitution or law from signing
    • Private universities will not sign as a matter of policy
  – Export control
    • Barrier to university mission. May prohibit access by some students.
  – Intellectual Property
    • Jury is out on this. Part of 1 on 1 negotiation.
Indemnification Example
• University and Corporation each agree to defend, indemnify and hold harmless the other respective parties for and against any losses damages or claims for damages arising from the wrongful acts or omissions of their respective officers, employees, students or agents (including, without limitation, University Students and University Personnel) in connection with the exercise of their rights and the performance of their obligations under this Agreement, including but not limited to …
Asymmetric: We agree not to sue each other, but University pays the cost of defending Corporation should it be sued based on something a University person did.
Export Control Example
• Specifically, unless authorized by appropriate government license or regulations, you agree not to export, directly or indirectly, any technology, software or commodities provided by Corporation or their direct product (including software developed by you on the Corporate systems) to any of the following countries or to the nationals of any of the following countries, wherever they may be located: Cuba, Iran, Sudan, Syria, and North Korea.
Explicit Country List discriminates against students from those countries who may be enrolled in University.
Academia-Industry-Government Partnership
• Win-win-win for all
• New model for NSF
  – CISE is breaking new ground at NSF (in many ways)
• NSF/CISE welcomes
  – Other corporations to participate in Data-Intensive Computing effort and other efforts in the future
  – This and other new models of A-I-G partnerships
Credits
• Copyrighted material used under Fair Use. If you are the copyright holder and believe your material has been used unfairly, or if you have any suggestions, feedback, or support, please contact: [email protected]
• Except where otherwise indicated, permission is granted to copy, distribute, and/or modify all images in this document under the terms of the GNU Free Documentation license, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation license” (http://commons.wikimedia.org/wiki/Commons:GNU_Free_Documentation_License)