e science as a lens on the world lazowska

106
eScience as a Lens on the World Ed Lazowska Bill & Melinda Gates Chair in Computer Science & Engineering University of Washington WCET 21 st Annual Conference October 2009 http://lazowska.cs.washington.edu/wcet.pdf

Upload: wcet

Post on 25-Jun-2015

253 views

Category:

Documents


0 download

DESCRIPTION

Ed Lazowska's WCET keynote presentation, Thursday morning

TRANSCRIPT

Page 1: E Science As A Lens On The World   Lazowska

eScience as a Lens on the World

Ed Lazowska

Bill & Melinda Gates Chair inComputer Science & Engineering

University of Washington

WCET 21st Annual Conference

October 2009

http://lazowska.cs.washington.edu/wcet.pdf

Page 2: E Science As A Lens On The World   Lazowska

This morning

The nature of eScienceThe advances that enable itScalable computing for everyoneBroadband, and the role of higher educationComputer science & engineering: Changing the worldThe changing nature of our economy, and of educational requirements

Page 3: E Science As A Lens On The World   Lazowska

eScience: Sensor-driven (data-driven) science and engineering

Transforming science (again!)

Jim Gray

Page 4: E Science As A Lens On The World   Lazowska

TheoryExperiment

Observation

Page 5: E Science As A Lens On The World   Lazowska

TheoryExperiment

Observation

Page 6: E Science As A Lens On The World   Lazowska

TheoryExperimentObservation

[John Delaney, University of Washington]

Page 7: E Science As A Lens On The World   Lazowska

TheoryExperiment

ObservationComputational

Science

Page 8: E Science As A Lens On The World   Lazowska

Protein interactions in striated muscles

Tom Daniel lab

Page 9: E Science As A Lens On The World   Lazowska

QCD to study interactions of

nuclei

David Kaplan lab

Page 10: E Science As A Lens On The World   Lazowska

Gas Stars

Dark MatterStudy of dark matter

Tom Quinn lab

Page 11: E Science As A Lens On The World   Lazowska

Protein structure prediction

David Baker lab

Page 12: E Science As A Lens On The World   Lazowska

TheoryExperiment

ObservationComputational

ScienceeScience

Page 13: E Science As A Lens On The World   Lazowska

eScience is driven by data

Massive volumes of data from sensors and networks of sensors

Apache Point telescope, SDSS

15TB of data (15,000,000,000,000 bytes)

Page 14: E Science As A Lens On The World   Lazowska

Large Synoptic Survey Telescope (LSST)

30TB/day,60PB in its 10-year

lifetime

Page 15: E Science As A Lens On The World   Lazowska

Large Hadron Collider

700MB of dataper second,

60TB/day, 20PB/year

Page 16: E Science As A Lens On The World   Lazowska

Illumina Genome Analyzer

~1TB/day

Page 17: E Science As A Lens On The World   Lazowska

Regional Scale Nodes of the NSF Ocean Observatories

Initiative

1000 km of fiber optic cable on the seafloor, connecting

thousands of chemical, physical, and biological

sensors

Page 18: E Science As A Lens On The World   Lazowska

The Web

20+ billion web pages x 20KB = 400+TB

One computer can read 30-35 MB/sec

from disk => 4 months just to read the web

Page 19: E Science As A Lens On The World   Lazowska

Point-of-sale terminals

Page 20: E Science As A Lens On The World   Lazowska

eScience is about the analysis of data

The automated or semi-automated extraction of knowledge from massive volumes of data

There’s simply too much of it to look at

Page 21: E Science As A Lens On The World   Lazowska

The technologies of eScience

Sensors and sensor networksDatabasesData miningMachine learningData visualizationCluster computing at enormous scale

Page 22: E Science As A Lens On The World   Lazowska

eScience will be pervasive

Computational science has been transformational, but it was a niche

As an institution (e.g., a university), you didn’t need to excel in order to be competitive

eScience capabilities must be broadly available in any organization

If not, the organization will simply cease to be competitive

Page 23: E Science As A Lens On The World   Lazowska

Life on Planet Earth

[John Delaney, University of Washington]

Page 24: E Science As A Lens On The World   Lazowska

[John Delaney, University of Washington]

Page 25: E Science As A Lens On The World   Lazowska

[John Delaney, University of Washington]

Page 26: E Science As A Lens On The World   Lazowska

Top faculty across all disciplines understand the coming data tsunami

Page 27: E Science As A Lens On The World   Lazowska

Questions for you …

How does your institution track the IT needs –present and future – of its leading researchers?To what extent are you meeting these needs, and in what critical areas are you falling short?How well does your technology staff understand the institution’s research and disciplinary directions and their IT implications?What potential resources, other than those currently in place, can be used to provide broad-based IT support for eScience and eScholarship?

Page 28: E Science As A Lens On The World   Lazowska

More on the enablement of eScience

Ten quintillion: 10*1018

The number of grains of rice harvested in 2004

Page 29: E Science As A Lens On The World   Lazowska

Ten quintillion: 10*1018

The number of grains of rice harvested in 2004The number of transistors fabricated in 2004

Page 30: E Science As A Lens On The World   Lazowska

William Shockley, Walter Brattain and John Bardeen, Bell Labs, 1947

The transistor

Page 31: E Science As A Lens On The World   Lazowska

The integrated circuit

Jack Kilby, Texas Instruments, and Bob Noyce, Fairchild Semiconductor Corporation, 1958

Page 32: E Science As A Lens On The World   Lazowska

Exponential progress

Gordon Moore, 1965

Page 33: E Science As A Lens On The World   Lazowska
Page 34: E Science As A Lens On The World   Lazowska
Page 35: E Science As A Lens On The World   Lazowska
Page 36: E Science As A Lens On The World   Lazowska
Page 37: E Science As A Lens On The World   Lazowska

Processing power, historically1980: 1 MHz Apple II+, $2,000

1980 also 1 MIPS VAX-11/780, $120,0002006: 2.4 GHz Pentium D, $800

A factor of 6000

Processing power, recentlyAdditional transistors => more cores of the same speed, rather than higher speed2008: Intel Core 2 Quad-Core 2.4 GHz, $800 (4x the capability, same price)2009: Intel Core 2 Quad-Core 2.5 GHz, $183 (same capability, 1/4 the price)

Page 38: E Science As A Lens On The World   Lazowska

Primary memory – same story, same reason (but no multicore fiasco)

1972: 1MB, $1,000,0001982: 1MB, $60,0002005: $400/GB (1MB, $0.40)

4GB vs. 2GB (@400MHz) = $800

($400/GB)

Page 39: E Science As A Lens On The World   Lazowska

2007: $145/GB (1MB, $0.15)

4GB vs. 2GB (@667MHz) = $290

($145/GB)

Page 40: E Science As A Lens On The World   Lazowska

2008: $49/GB (1MB, $0.05)

4GB vs. 3GB (@800MHz) = $49

($49/GB)

Page 41: E Science As A Lens On The World   Lazowska

2009: $16/GB (1MB, $0.015)Remember, it was $400/GB in 2005!

2GB (@667MHz) = $32 ($16/GB)

Page 42: E Science As A Lens On The World   Lazowska

Moore’s Law drives sensors as well as processing and memory

LSST will have a 3.2 Gigapixel camera

Page 43: E Science As A Lens On The World   Lazowska

Disk capacity, 1975-1989doubled every 3+ years25% improvement each yearfactor of 10 every decadeStill exponential, but far less rapid than processor performance

Disk capacity since 1990doubling every 12 months100% improvement each yearfactor of 1000 every decade10x as fast as processor performance

Page 44: E Science As A Lens On The World   Lazowska

Only a few years ago, we purchased disks by the megabyte (and it hurt!)Current cost of 1 GB (a billion bytes) from Dell

2005: $1.002006: $0.502008: $0.25

Purchase increment2005: 40GB2006: 80GB2008: 250GB

Page 45: E Science As A Lens On The World   Lazowska

Optical bandwidth todayDoubling every 9 months150% improvement each yearFactor of 10,000 every decade10x as fast as disk capacity100x as fast as processor performance

Page 46: E Science As A Lens On The World   Lazowska

A connected region – then

Page 47: E Science As A Lens On The World   Lazowska

A connected region – now

Page 48: E Science As A Lens On The World   Lazowska

But eScience is equally enabled by software for scalability and for discovery

It’s likely that Google has several million machinesBut let’s be conservative – 1,000,000 machinesA rack holds 176 CPUs (88 1U dual-processor boards), so that’s about 6,000 racksA rack requires about 50 square feet (given datacenter cooling capabilities), so that’s about 300,000 square feet of machine room space (more than 6 football fields of real estate – although of course Google divides its machines among dozens of datacenters all over the world)A rack requires about 10kw to power, and about the same to cool, so that’s about 120,000 kw of power, or nearly 100,000,000 kwh per month ($10 million at $0.10/kwh)

Equivalent to about 20% of Seattle City Light’s generating capacity

Page 49: E Science As A Lens On The World   Lazowska

Many hundreds of machines are involved in a single Google search request (remember, the web is 400+TB)

There are multiple clusters (of thousands of computers each) all over the worldDNS routes your search to a nearby cluster

Page 50: E Science As A Lens On The World   Lazowska

A cluster consists of Google Web Servers, Index Servers, Doc Servers, and various other servers (ads, spell checking, etc.)These are cheap standalone computers, rack-mounted, connected by commodity networking gear

Page 51: E Science As A Lens On The World   Lazowska

Within the cluster, load-balancing routes your search to a lightly-loaded Google Web Server (GWS), which will coordinate the search and responseThe index is partitioned into “shards.” Each shard indexes a subset of the docs (web pages). Each shard is replicated, and can be searched by multiple computers – “index servers”The GWS routes your search to one index server associated with each shard, through another load-balancerWhen the dust has settled, the result is an ID for every doc satisfying your search, rank-ordered by relevance

Page 52: E Science As A Lens On The World   Lazowska

The docs, too, are partitioned into “shards” – the partitioning is a hash on the doc ID. Each shard contains the full text of a subset of the docs. Each shard can be searched by multiple computers – “doc servers”The GWS sends appropriate doc IDs to one doc server associated with each relevant shardWhen the dust has settled, the result is a URL, a title, and a summary for every relevant doc

Page 53: E Science As A Lens On The World   Lazowska

Meanwhile, the ad server has done its thing, the spell checker has done its thing, etc.The GWS builds an HTTP response to your search and ships it off

Many hundreds of computers have enabled you to search 400+TB of web in ~100 ms.

Page 54: E Science As A Lens On The World   Lazowska

Enormous volumes of dataExtreme parallelismThe cheapest imaginable components

Failures occur all the timeYou couldn’t afford to prevent this in hardware

Software makes itFault-TolerantHighly AvailableRecoverableConsistentScalablePredictableSecure

Page 55: E Science As A Lens On The World   Lazowska

How on earth would you enable mere mortals write hairy applications such as this?

Recognize that many Google applications have the same structure

Apply a “map” operation to each logical record in order to compute a set of intermediate key/value pairsApply a “reduce” operation to all the values that share the same key in order to combine the derived data appropriately

Example: Count the number of occurrences of each word in a large collection of documents

Map: Emit <word, 1> each time you encounter a wordReduce: Sum the values for each word

Page 56: E Science As A Lens On The World   Lazowska

Build a runtime library that handles all the details, accepting a couple of customization functions from the user – a Map function and a Reduce functionThat’s what MapReduce is

Supported by the Google File System and the Chubby lock managerAugmented by the BigTable not-quite-a-database system

Page 57: E Science As A Lens On The World   Lazowska

eScience utilizes the Cloud: Scalable computing for everyone

Page 58: E Science As A Lens On The World   Lazowska

Amazon Elastic Compute Cloud (EC2)

$0.80 per hour for8 cores of 3 GHz 64-bit Intel or AMD7 GB memory1.69 TB scratch storage

Need it 24x7 for a year?$3000

$0.10 per hour for1 core of 1.2 GHz 32-bit Intel or AMD (1/20th the above)1.7 GB memory160 GB scratch storage

Need it 24x7 for a year?$379

Page 59: E Science As A Lens On The World   Lazowska

This includesPurchase + replacementHousingPowerOperationReliabilitySecurityInstantaneous expansion and contraction

1000 computers for a day costs the same as one computer for 1000 days – revolutionary!

Page 60: E Science As A Lens On The World   Lazowska

[Werner Vogels, Amazon.com]

Page 61: E Science As A Lens On The World   Lazowska

[Werner Vogels, Amazon.com]

Page 62: E Science As A Lens On The World   Lazowska

Still running your own email servers?

Page 63: E Science As A Lens On The World   Lazowska

[Werner Vogels, Amazon.com]

Page 64: E Science As A Lens On The World   Lazowska

Broadband, and the role of higher education

Page 65: E Science As A Lens On The World   Lazowska

ARPANET, 1980

Page 66: E Science As A Lens On The World   Lazowska

DARPA Gigabit Testbeds, mid-1990’s

Page 67: E Science As A Lens On The World   Lazowska

Pac BellNAP

SCMSacramento

NCARNational Center for

Atmospheric Research

SDSCSan DiegoSupercomputer Center

HSJHouston

DNJDenver

Ameritech NAPDNGChicago

NCSANational Center forSupercomputingApplications

NORCleveland

CTCCornell Theory

Center

PYMPerryman, MD

Sprint NAP

MFS NAP

PSCPittsburgh

SupercomputerCenter

C A

C

AC

C

RTOLos Angeles

C

AC

C

ASTAtlanta

C

C C AC

C

AC

C

C

C

WORNew York City

Ascend GRF 400

Cisco 7507

FORE ASX-1000

Network Access Point

TableA

C

DS-3

OC-3C

OC-12C

NSF vBNS, 1997

Page 68: E Science As A Lens On The World   Lazowska

UCAID Abilene network, 1998

Page 69: E Science As A Lens On The World   Lazowska
Page 70: E Science As A Lens On The World   Lazowska

ARPANET, 1983

Page 71: E Science As A Lens On The World   Lazowska

NSFNET, 1991

Page 72: E Science As A Lens On The World   Lazowska

DARPA / NGI Testbed, late 1990’s

Page 73: E Science As A Lens On The World   Lazowska

NSF vBNS, 1998

Page 74: E Science As A Lens On The World   Lazowska

Internet2, 1999

Page 75: E Science As A Lens On The World   Lazowska

SunnyvaleDenver

Seattle

LASan Diego

ChicagoPitts

Wash DC

Raleigh

Jacksonville

Atlanta

KC

Portland

Clev

Boise

Ogden

NLR MetaPOPNLR Regen or OADM

NLR Route

NLR NLR –– Optical Infrastructure Optical Infrastructure -- Phase 1Phase 1

National LambdaRail, 2004

Page 76: E Science As A Lens On The World   Lazowska

Who reaches K-12 institutions, CC’s, tribal colleges, libraries, telemedicine sites?

Page 77: E Science As A Lens On The World   Lazowska
Page 78: E Science As A Lens On The World   Lazowska

Who reaches unserved and underserved regions?

Page 79: E Science As A Lens On The World   Lazowska

The broadband stimulus

Page 80: E Science As A Lens On The World   Lazowska

The broadband stimulus

Page 81: E Science As A Lens On The World   Lazowska
Page 82: E Science As A Lens On The World   Lazowska
Page 83: E Science As A Lens On The World   Lazowska

Advances in computing change the way we live, work, learn, and communicateAdvances in computing drive advances in nearly all other fieldsAdvances in computing power our economy

Not just through the growth of the IT industry – through productivity growth across the entire economy

Computer science & engineering:Changing the world

Page 84: E Science As A Lens On The World   Lazowska

84

Page 85: E Science As A Lens On The World   Lazowska

85

Page 86: E Science As A Lens On The World   Lazowska

86

Page 87: E Science As A Lens On The World   Lazowska

87

Page 88: E Science As A Lens On The World   Lazowska

A day without the Internet and all that it enablesA day without diagnostic medical imagingA day during which automobiles lacked electronic ignition, antilock brakes, and electronic stability controlA day without digital media – without wireless telephones, high-definition televisions, MP3 audio, DVD video, computer animation, and videogamesA day during which aircraft could not fly, travelers had to navigate without benefit of GPS, weather forecasters had no models, banks and merchants could not transfer funds electronically, factory automation ceased to function, and the US military lacked technological supremacy

Imagine spending a day without information technology

Page 89: E Science As A Lens On The World   Lazowska

A day without the Internet and all that it enablesA day without diagnostic medical imagingA day during which automobiles lacked electronic ignition, antilock brakes, and electronic stability controlA day without digital media – without wireless telephones, high-definition televisions, MP3 audio, DVD video, computer animation, and videogamesA day during which aircraft could not fly, travelers had to navigate without benefit of GPS, weather forecasters had no models, banks and merchants could not transfer funds electronically, factory automation ceased to function, and the US military lacked technological supremacy

Imagine spending a day without information technology

Page 90: E Science As A Lens On The World   Lazowska

Research has built the foundation

Page 91: E Science As A Lens On The World   Lazowska

The future is full of opportunity

Creating the future of networkingDriving advances in all fields of science and engineeringRevolutionizing transportationPersonalized educationThe Smart GridPredictive, preventive, personalized medicineQuantum computingEmpowerment of the developing worldPersonalized health monitoring => quality of lifeNeuroboticsSynthetic biology

Page 92: E Science As A Lens On The World   Lazowska

Why do students choose the field?

Page 93: E Science As A Lens On The World   Lazowska

Power to change the world

People enter the field for a wide variety of aspirational reasonshttp://www.cs.washington.edu/WhyCSE/

Page 94: E Science As A Lens On The World   Lazowska

Pathways in computer science

People pursue diverse careers following their education in computer sciencehttp://www.cs.washington.edu/WhyCSE/

Page 95: E Science As A Lens On The World   Lazowska

A day in the life

Working in the software industry is creative, interactive, empoweringhttp://www.cs.washington.edu/WhyCSE/

Page 96: E Science As A Lens On The World   Lazowska

Science & Engineering Job Growth, 2006-16(BLS Occupational Employment Projections)

Page 97: E Science As A Lens On The World   Lazowska

Average annual projected growth rates for all 22 major occupational groups in Washington State, 2006 to 2016

Page 98: E Science As A Lens On The World   Lazowska

Occupations are ranked based on the average of two criteria: average annual growth rate for 2006 to 2016 and total number of job openings due to growth and replacement

Page 99: E Science As A Lens On The World   Lazowska

Science & Engineering Job Growth, 2006-16 (BLS Occupational Employment Projections)

23

98

13

9

4

University of Washington academic departments

Page 100: E Science As A Lens On The World   Lazowska

The transformation of our economy, and of educational requirements

Once upon a time, the “content” of the goods we produced was largely physical

Page 101: E Science As A Lens On The World   Lazowska

Then we transitioned to goods whose “content” was a balance of physical and intellectual

Page 102: E Science As A Lens On The World   Lazowska

In today’s knowledge-based economy, the “content”of goods is almost entirely intellectual rather than physical

Page 103: E Science As A Lens On The World   Lazowska

What kind of education is required to produce a good whose content is almost entirely intellectual?

Page 104: E Science As A Lens On The World   Lazowska

HIGH SCHOOLGRADUATION

PEERS: 7NATION: 32

BACHELOR’SPRODUCTION

PEERS: 8NATION: 36

S&E DEGREEPRODUCTION

PEERS: 7NATION: 31

K-12 ACHIEVEMENT:(2007 8TH GRADE NAEP &GRADUATION RATE FROMPOSTSECONDARY.ORG)

HIGHER EDUCATION:(BACHELOR’S PRODUCTION,S&E GRADUATE PARTICIPATION,S&E PhD PRODUCTION)

S&E GRADUATEPARTICIPATION

PEERS: 10NATION: 46

S&E PhDsAWARDED

PEERS: 10NATION: 27

LIFE/PHYS.SCIENTISTS

PEERS: 3NATION: 7

ENGINEERS

PEERS: 2NATION: 3

COMPUTERSPECIALISTS

PEERS: 6NATION: 8

The Nation’s S&E Top 5(Workforce intensity, all S&E occupations, 2006)

1. Virginia2. Massachusetts3. Maryland4. Washington5. Colorado

Education in Washington:  Where do we standamong the states?

Page 105: E Science As A Lens On The World   Lazowska

Washington State is gambling with its future!

How about you?

Page 106: E Science As A Lens On The World   Lazowska

This morning

The nature of eScienceThe advances that enable itScalable computing for everyoneBroadband, and the role of higher education“There’s only so much you can do with asphalt”Don’t gamble with the future of your region and of the kids who grow up there

http://lazowska.cs.washington.edu/wcet.pdf