TRANSCRIPT
1
Berkeley RAD Lab: Technical Approach
Armando Fox, Randy Katz, Michael Jordan, Dave Patterson, Scott Shenker, Ion Stoica
October 2005
2
RAD Lab
The 5-Year Vision: a single person can go from vision to a next-generation IT service ("the Fortune 1 million"). E.g., over a long holiday weekend in 1995, Pierre Omidyar created eBay v1.0.
The Challenges:
• Develop the new service
• Assess: measure, test, and debug the new service in a realistic distributed environment
• Deploy: scale up a new, geographically distributed service
• Operate a service that could quickly scale to millions of users
The Vehicle: an interdisciplinary center creates the core technical competency to demo 10X to 100X improvements
• Researchers are leaders in machine learning, networking, and systems
• Industrial participants: leading companies in HW, systems SW, and online services
• Called "RAD Lab" for Reliable, Adaptable, Distributed systems
3
RAD Lab
The Science: both shorter-term and longer-term solutions
• Develop using primitives: functions (MapReduce), services (Craigslist)
• Assess/debug using deterministic replay and finding new metrics
• Deploy using "Internet-in-a-Box" via FPGAs under failure/slowdown workloads
• Operate using Statistical Learning Theory-friendly, Control Theory-friendly software architectures and visualization tools
[Diagram: a pedestal labeling the acronym — Cap; Dado (the section of a pedestal between cap and base); Base]
Added Value to Industrial Participants:
• Working with leading people and companies from different industries on long-range, pre-competitive technology
• Training of dozens of future leaders of IT in multiple disciplines, and their recruitment by industrial participants
• Working with researchers with a successful track record of rapid transfer of new technology
4
Steps vs. Process
[Diagram: two models of Develop, Assess, Deploy, Operate. Process: supports DADO evolution as a continuous cycle (Develop, Assess, Deploy, Operate) owned by 1 group. Steps: the traditional, static handoff model, with N groups each owning one step.]
5
Create abstractions, primitives, and a toolkit for large-scale systems that make it easy to invent and deploy functions like MapReduce
• For example, Distributed Hash Tables (OpenDHT), rendezvous-based communication (Internet Indirection Infrastructure), weak-semantics tuple spaces (a DHT put/get sketch follows the stack below)
• Already setting the trend for IETF standards
DADO - Develop
Application
Higher Functions (MapReduce)
Middleware (J2EE)
Libraries
Compilers/Debuggers
Operating System
Virtual Machine
Hardware
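As a concrete illustration of the put/get primitive a DHT such as OpenDHT exposes, here is a minimal in-process sketch using consistent hashing. The class and names are ours, invented for illustration, not OpenDHT's actual interface.

```python
# A minimal sketch of the put/get primitive a DHT like OpenDHT exposes,
# here simulated in-process with consistent hashing over a few nodes.
# All names (SimpleDHT, node ids) are illustrative, not a real API.
import hashlib
from bisect import bisect_right

class SimpleDHT:
    def __init__(self, node_ids):
        # Place each node on a hash ring; a key is stored on the first
        # node clockwise from the key's hash (consistent hashing).
        self.ring = sorted((self._hash(n), n) for n in node_ids)
        self.store = {n: {} for n in node_ids}

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def _node_for(self, key: str) -> str:
        h = self._hash(key)
        points = [p for p, _ in self.ring]
        i = bisect_right(points, h) % len(self.ring)  # wrap around the ring
        return self.ring[i][1]

    def put(self, key: str, value):
        self.store[self._node_for(key)][key] = value

    def get(self, key: str):
        return self.store[self._node_for(key)].get(key)

dht = SimpleDHT(["node-a", "node-b", "node-c"])
dht.put("service/frontend", "10.0.0.1:8080")
print(dht.get("service/frontend"))  # -> 10.0.0.1:8080
```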
6
The opportunity of middleware:
• Middleware is becoming the dominant way to deploy commercial networked applications
• Innovate below the abstraction: unmodified or proprietary apps get deployed on improved middleware
• We put instrumentation and recovery support in the middleware: Pinpoint (diagnosis) and Microreboot (fast recovery) added to a J2EE server
• Good news: middleware imposes design constraints on applications that help recovery, e.g., separation of state from app logic (see the sketch below)
DADO - Develop
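To show why separating state from application logic matters for recovery, here is a toy sketch of a microrebootable component: session state lives in an external store, so rebooting just the suspect component loses nothing. This illustrates the idea only; it is not the actual Pinpoint/Microreboot code, and all names are invented.

```python
# Illustrative sketch (not the real Pinpoint/Microreboot code): because
# session state lives in an external store rather than inside the
# component, rebooting just the suspect component loses no user state.

class SessionStore:
    """Stands in for an external state store (e.g., a session database)."""
    def __init__(self):
        self._data = {}
    def save(self, session_id, state): self._data[session_id] = state
    def load(self, session_id): return self._data.get(session_id, {})

class Component:
    def __init__(self, name, store: SessionStore):
        self.name, self.store = name, store
    def handle(self, session_id, item):
        state = self.store.load(session_id)        # state fetched, not cached
        state.setdefault("cart", []).append(item)
        self.store.save(session_id, state)
    def microreboot(self):
        # Discard only this component's in-memory state; this takes
        # milliseconds instead of the minutes a full restart would.
        self.__init__(self.name, self.store)

store = SessionStore()
cart = Component("cart", store)
cart.handle("sess-42", "book")
cart.microreboot()                      # e.g., after diagnosis flags "cart"
cart.handle("sess-42", "lamp")
print(store.load("sess-42")["cart"])    # -> ['book', 'lamp']: state survived
```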
7
First test: how easy is it to build MapReduce? (A sketch of the programming model follows below.)
• Build simple versions of current-generation Internet apps and scale them up: auctions, Craigslist, email, sales, free DB, …
• Test the RAD vision and system in new courses and evolve the system based on feedback; 2007, 2008, 2009: students from CS, SIMS, MBA
• Operate good services from the classes afterwards at a partner's site?
• A future Sergey Brin, Larry Page, Eric Brewer, or Pierre Omidyar in one of these classes?
DADO - Develop
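For reference, the programming model the first test targets fits in a few lines. This is a minimal single-process word-count sketch of the MapReduce pattern, not a distributed implementation; real systems farm the map and reduce tasks out across a cluster.

```python
# A minimal, single-process sketch of the MapReduce pattern (word count).
# Real implementations distribute map and reduce tasks across a cluster;
# this only shows the programming model students would build against.
from collections import defaultdict

def map_fn(doc):                 # map: emit (key, value) pairs
    for word in doc.split():
        yield word.lower(), 1

def reduce_fn(word, counts):     # reduce: combine all values for one key
    return word, sum(counts)

def mapreduce(docs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for doc in docs:             # map + shuffle: group values by key
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

docs = ["the quick brown fox", "the lazy dog", "the fox"]
print(mapreduce(docs, map_fn, reduce_fn))
# -> {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```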
8
"We improve what we can measure"
• Little visibility inside the network; it is usually data poor
• Servers are data rich, but the data is often discarded
Statistical and machine learning to the rescue. It works well when:
• You have lots of raw data
• You have reason to believe the raw data is related to some high-level effect you're interested in
• You don't have a model of what that relationship is
Note: SLT advances enable fast analysis
DADO - Assess
9
Example: Statistical Debugging (a scoring sketch follows the reference below)
• Instrument programs to add predicates (assertions) via the compiler
• Sparsely sample (~1%), recording predicate true/false outcomes and crashes
• Collect the information over the Internet
• Learn a statistical classifier based on successful and failed runs, using predicate (feature) selection plus clustering methods to pinpoint the bugs
• Found bugs in several open-source programs (moss, ccrypt, bc, rhythmbox, exif)
DADO - Assess
To learn more, see "To learn more, see "Scalable Statistical Bug Isolation," B. ," B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan, Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan, PLDIPLDI, , 2005. 2005.
10
DADO - Assess (MOSS results)
• Reinserted 9 known bugs in the MOSS program to see if they could be found
• Statistical debugging points out 7 of 9 (1 never occurred)
Bug thermometer (runs per bug):

Predicate                                        | Line | #1   | #2  | #3  | #4  | #5   | #6  | #7 | #8 | #9
files[filesindex].language > 16                  | 5869 | 0    | 0   | 28  | 54  | 1585 | 0   | 0  | 0  | 68
((*(fi + i)))->this.last_line == 1               | 5442 | 774  | 0   | 17  | 0   | 0    | 0   | 18 | 0  | 2
token_index > 500                                | 4325 | 31   | 0   | 16  | 711 | 0    | 0   | 0  | 0  | 47
(p + passage_index___0)->last_token <= filesbase | 5289 | 28   | 2   | 508 | 0   | 0    | 0   | 1  | 0  | 29
__result___430 == 0 is TRUE                      | 5789 | 16   | 0   | 0   | 9   | 19   | 291 | 0  | 0  | 13
config.match_comment is TRUE                     | 1994 | 791  | 2   | 23  | 1   | 0    | 5   | 11 | 0  | 41
i___0 == yy_last_accepting_state                 | 5300 | 55   | 0   | 21  | 0   | 0    | 3   | 7  | 0  | 769
f < f                                            | 4497 | 3    | 144 | 2   | 2   | 0    | 0   | 0  | 0  | 5
files[fileid].size < token_index                 | 4850 | 31   | 0   | 10  | 633 | 0    | 0   | 0  | 0  | 40
passage_index___0 == 293                         | 5313 | 27   | 3   | 8   | 0   | 0    | 0   | 2  | 0  | 366
((*(fi + i)))->other.last_line == yyleng         | 5444 | 776  | 0   | 16  | 0   | 0    | 0   | 18 | 0  | 1
min_index == 64                                  | 5302 | 24   | 1   | 7   | 0   | 0    | 1   | 1  | 0  | 249
((*(fi + i)))->this.last_line == yy_start        | 5442 | 771  | 0   | 18  | 0   | 0    | 0   | 19 | 0  | 0
(passages + i)->fileid == 52                     | 4576 | 24   | 0   | 477 | 14  | 24   | 0   | 1  | 0  | 14
passage_index___0 == 25                          | 5313 | 60   | 5   | 27  | 0   | 0    | 4   | 10 | 0  | 962
strcmp > 0                                       | 4389 | 0    | 0   | 28  | 54  | 1584 | 0   | 0  | 0  | 68
i > 500                                          | 4865 | 32   | 2   | 18  | 853 | 54   | 0   | 0  | 0  | 53
token_sequence[token_index].val >= 100           | 4322 | 1250 | 3   | 28  | 38  | 0    | 15  | 19 | 0  | 65
i == 50                                          | 5252 | 27   | 0   | 11  | 0   | 0    | 1   | 4  | 0  | 463
passage_index___0 == 19                          | 5313 | 59   | 5   | 28  | 0   | 0    | 4   | 10 | 0  | 958
bytes <= filesbase                               | 4481 | 1    | 0   | 19  | 0   | 0    | 0   | 0  | 0  | 1
11
Distributed debugging is very hard:
• Services are required to be up 24x7, so even rare failures are catastrophic
• Very hard to reproduce in the lab
Provide continuous logging and checkpointing, with reproducible behavior via replaying (a record/replay sketch follows below):
• Leverage RAMP and Iboxes for deterministic replaying (interrupt at clock cycle 100M, …) and "what-if" scenario analysis
• Create an "open source" failure and slowdown repository (e.g., Ebates, Windows Minidump) with sanitized information (plus tools to sanitize) and workloads, so other researchers can help
DADO - Assess
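A toy sketch of the record/replay idea referenced above: log every nondeterministic input with a logical timestamp during the recorded run, then feed the log back verbatim to reproduce the execution. Real deterministic replay (as on RAMP) operates at the machine level; this only shows the principle, and all names are invented.

```python
# A toy sketch of deterministic record/replay: every nondeterministic
# input (here, a "random" request size and a clock read) is logged with
# a logical timestamp during recording, then fed back verbatim during
# replay, so a buggy execution can be reproduced step by step.
import random, time

class Recorder:
    def __init__(self, log=None):
        self.log, self.replaying, self.t = log or [], log is not None, 0

    def nondet(self, produce):
        self.t += 1                         # logical clock, not wall clock
        if self.replaying:
            t, value = self.log[self.t - 1]
            assert t == self.t              # replay must stay in lockstep
            return value                    # replay: return the logged value
        value = produce()
        self.log.append((self.t, value))    # record: log the value
        return value

def service(rec):
    size = rec.nondet(lambda: random.randint(1, 100))
    stamp = rec.nondet(time.time)
    return f"handled {size} bytes at {stamp:.0f}"

rec = Recorder()
original = service(rec)                     # recorded run
replayed = service(Recorder(log=rec.log))   # deterministic replay
assert original == replayed                 # identical execution
```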
12
DADO - Deploy
How can academics experiment with systems of 1000+ nodes?
RAMP (Research Accelerator for Multiple Processors) for parallel HW & SW research:
• A single FPGA holds ~25 CPUs + caches in 2005
• ~$100k = ~4 FPGAs/board, ~4 DIMMs/FPGA, 10-20 boards, plus a low-cost storage server over Ethernet
• 1000 CPUs, 256 MB DRAM/CPU, 20 GB disk storage/CPU (25 CPUs/FPGA x 4 FPGAs/board x 10 boards = 1000 CPUs)
• Pros: free "IP" (opencores.org), large scale, low purchase cost, low operation cost, easy to change, easy to trace, reproducible behavior, real ISA and OS, grows with Moore's Law (2X CPUs and faster clock every 1.5 yrs)
• Cons: slow clock rate (100-200 MHz vs. 2-4 GHz)
13
Why Is RAMP Attractive for Research?
Priorities for a research parallel computer:
1a. Cost of purchase
1b. Cost of ownership (staff to administer it)
1c. Scalability/reality (1000 nodes, "real" SW)
4. Observability (measure, trace everything)
5. Reproducibility (to debug, run experiments)
6. Flexibility (change for different experiments)
7. Credibility (results are believable for tech transfer)
8. Performance
Note: for a commercial parallel computer, performance is priority #1
14
Why Is RAMP Attractive for Research? SMP vs. Cluster vs. Simulator vs. RAMP

Metric              | SMP       | Cluster   | Simulate  | RAMP
Cost (1 CPU)        | F ($40k)  | B ($2k)   | A+ ($0k)  | A ($0.1k)
Cost of ownership   | A         | D         | A         | A
Scalability (1000)  | C         | A         | A         | A
Observability       | D         | C         | A+        | A+
Reproducibility     | B         | D         | A+        | A+
Community           | D         | A         | A         | A
Flexibility         | D         | C         | A+        | A+
Credibility         | A+        | A+        | F         | A
Performance (clock) | A (2 GHz) | A (3 GHz) | F (0 GHz) | C (0.2 GHz)
GPA                 | C         | B-        | B         | A-
15
DADO - Deploy
Re-engineer RAMP to act like a 1000+ node distributed system under realistic failure and slowdown workloads:
• The same HW emulates a data center as well as wide-area systems
• Embed the Emulab and ModelNet emulation testbeds
• Provide synthetic time, checkpoint/restart, clock-cycle-accurate reproducibility, dedicated use of a large system, the ability to trace anything, …
• Researchers should be able to develop in an environment similar to the one that led to innovations like MapReduce
• Failure data collection from PlanetLab et al. => failure and slowdown workloads
16
DADO - Operate
• Idea: when a site misbehaves, users notice and change their behavior; use this as a "failure detector"
• Approach: combine visualization with Statistical Learning Theory (SLT, aka machine learning) analysis so the operator sees anomalies too
• Experiment: does the distribution of hits to various pages match the "historical" distribution? Each minute, compare the hit counts of the top N pages to the hit counts over the last 6 hours using Bayesian networks and a χ² test, on real Ebates data (a χ² sketch follows the reference below)
To learn more, see "Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization," In Proc. 2nd IEEE Int'l Conf. on Autonomic Computing, June 2005, by Peter Bodik, Greg Friedman, Lukas Biewald, Helen Levine (Ebates.com), George Candea, Kayur Patel, Gilman Tolle, Jon Hui, Armando Fox, Michael I. Jordan, David Patterson.
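A minimal sketch of the per-minute test described above, with invented hit counts: compare the current minute's top-page hit distribution against the trailing window's and raise an alarm when the χ² statistic crosses a threshold. The paper's system also uses Bayesian networks and visualization; this shows only the χ² piece, and the threshold and counts are illustrative.

```python
# A sketch of the anomaly test described above: each minute, compare the
# hit counts of the top-N pages against their share of traffic over the
# trailing window using a chi-squared statistic. Counts and the alarm
# threshold here are illustrative, not from the real Ebates data.

def chi_squared(current: dict, historical: dict) -> float:
    total_now = sum(current.values())
    total_hist = sum(historical.values())
    stat = 0.0
    for page, hist_hits in historical.items():
        expected = total_now * hist_hits / total_hist  # expected share now
        observed = current.get(page, 0)
        stat += (observed - expected) ** 2 / expected
    return stat

historical = {"/home": 6000, "/search": 3000, "/checkout": 1000}  # 6 hours
normal_min = {"/home": 58, "/search": 32, "/checkout": 10}
broken_min = {"/home": 70, "/search": 29, "/checkout": 1}  # checkout failing?

THRESHOLD = 9.21  # chi-squared critical value, 2 dof, p = 0.01
for label, counts in [("normal", normal_min), ("broken", broken_min)]:
    stat = chi_squared(counts, historical)
    print(f"{label}: chi2 = {stat:.1f}  anomaly = {stat > THRESHOLD}")
```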
17
Visualize user behavior when it becomes completely different from normal, rather than animating the architecture as tools usually do. Seeing is believing: win trust in SLT by leveraging operator expertise and human visual pattern recognition.
[Figure: hits to the top 40 pages (y-axis) over time in 5-minute intervals (x-axis)]
18
Maintaining quality of service in the presence of DDoS attacks, flash crowds, etc. is critical
• Key observation: network service failures are often attributed to unexpected traffic patterns
• Key approach: identify and protect "good" traffic; discard "bad" traffic
Create "Inspection-and-Action" boxes (Iboxes); a decision-loop sketch follows below:
• Deep multiprotocol packet inspection
• Exploit SLT to discover a model of "normal" traffic, plus anomaly detection
• Mark and annotate packets to add information; prioritize and throttle
• Evolve the network architecture, e.g., to include an annotation layer
DADO - Operate
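To make the identify-and-protect loop concrete, here is an illustrative sketch, not a real Ibox: learn a trivial model of normal per-source request rates, then throttle, rather than drop outright, sources whose rates look anomalous. The "model" here is deliberately the simplest possible stand-in for SLT.

```python
# An illustrative sketch (not a real Ibox) of the "identify and protect
# good traffic" loop: fit a trivial model of normal per-source request
# rates from a training window, then mark each source's traffic as
# normal (forwarded) or anomalous (throttled, not dropped outright).
from statistics import mean, stdev

def fit_normal_model(training_rates):
    # Stand-in for SLT: the simplest possible model, mean + std deviation.
    return mean(training_rates), stdev(training_rates)

def action(rate, model, k=3.0):
    mu, sigma = model
    if rate <= mu + k * sigma:
        return "forward"                 # looks like normal traffic
    return "throttle"                    # anomalous: degrade, don't drop

model = fit_normal_model([120, 95, 130, 110, 105, 125])  # reqs/sec, training
for src, rate in {"10.0.0.5": 118, "10.0.0.9": 4200}.items():
    print(src, rate, "->", action(rate, model))
# 10.0.0.5 is forwarded; 10.0.0.9 (flash crowd or DDoS source) is throttled
```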
19
RAD Lab Opportunity: New Research Model
Chance to partner with the top university in computer systems on the "Next Great Thing":
• The National Academy of Engineering mentions Berkeley in 7 of the 19 $1B+ industries that came from IT research (NAE mentions: Berkeley 7 times, Stanford 5, MIT 5, CMU 3)
• Timesharing (SDS 940), client-server computing (BSD Unix), graphics, entertainment, Internet, LANs, workstations, GUI, VLSI design (Spice) [ECAD $5B?/yr], RISC [$10B?/yr], relational DB (Ingres/Postgres) [RDB $15B?/yr], parallel DB, data mining, parallel computing, RAID [$15B?/yr], portable communication (BWRC), WWW, speech recognition, broadband
• Berkeley is one of the top suppliers of systems students to industry and academia; US News & World Report ranking of CS systems programs: 1 Berkeley, 2 CMU, 2 MIT, 4 Stanford, 5 Washington
• For example, Quanta (Taiwan PC laptop manufacturer) funds MIT CSAIL at $4M/year for 5 years to reinvent the PC, April 2005 ("Tparty")
• The RAID project (4 faculty, 20 grads, 10 undergrads) helped create a $15B industry, but would not be fundable today at DARPA or NSF
20
RAID Alumni 10 years later
• Industry managers: AT&T, HP, IBM, Microsoft, Sun, …
• Founders of startups: Electric Cloud, Panasas, VMware, …
• Professors: CMU, Stanford, Michigan, Arizona, UCSC
21
Founding the RAD Lab; start 12/1:
• $2.5M/yr (half of BWRC's budget): 70% industry, 20% state, 10% federal government
• 25 grad students + 15 undergrads + 6 faculty + 2 staff
• Looking for 3 to 4 founding companies to fund ≈3-5 years at a cost of $0.5M/year, following the SNRC (Stanford Network Research Center) and BWRC (Berkeley Wireless Research Center) models
Feedback on forming the consortium?
• Prefer founding partners' technology in prototypes
• Partners designate employees to act as consultants
• Head start for participants on research results
• IP put in the public domain so partners are not sued
• Press release of founding RAD Lab partners December 1
• Mid-project review after 3 years by founding partners
22
Critical Mass vs. Spreading $ Thinly
It seems safer to spend $50k at 10 universities than $500k at 1 university:
• You still get diversity, a portfolio effect across fields
• N faculty and grad students in N areas
• Improved student-to-student and student-to-professor training across fields
But critical mass on a coherent systems project has a much greater chance of technical success (in our experience):
• E.g., BSD Unix, Ingres, Postgres, RISC, RAID, and NOW all had critical mass
• Less management overhead in the critical mass model: less industry time for interaction and travel, less $ for faculty support
• More students supported with less hassle in the critical mass model
• Much more influence on directions and participants: "$50k is like a cousin; $500k is like a spouse"
• Preference to the partner's technology
23
RAD Lab Model
                                      | Foundation Member   | Affiliate Member
Annual Contribution                   | ≥$500k (5 students) | ≥$50k (1/2 student)
Preference to partner's technology    | √                   |
Partner employees advise design teams | √                   |
Attend two 3-day reviews              | √                   | √
6-month delay on review presentations |                     | √

(Contribution counts towards CITRIS donation) (Contribution matching state $ via MICRO, UC Discovery)
24
• Working with different industries on long-range, pre-competitive technology
• Training of dozens of future leaders of IT, plus their recruitment
• Working with researchers with track records of successful technology transfer
RAD Lab: Interdisciplinary Center for Reliable, Adaptive, Distributed Systems
Develop using primitives to enable functions (MapReduce), services (Craigslist)
Assess using deterministic replay and statistical debugging
Deploy via "Internet-in-a-Box" FPGAs
Operate SLT-friendly, Control Theory-friendly architectures and operator-centric visualization and analysis tools
Capability (Desired): 1 person can invent & run the next-gen IT service
Base Technology: Server Hardware, System Software, Middleware, Networking
25
Backup Slides
26
References
• "Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization," Peter Bodik, Greg Friedman, Lukas Biewald, Helen Levine (Ebates.com), George Candea, Kayur Patel, Gilman Tolle, Jon Hui, Armando Fox, Michael I. Jordan, and David Patterson. In Proc. 2nd IEEE Int'l Conf. on Autonomic Computing, June 2005.
• "Microreboot -- A Technique for Cheap Recovery," George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, and Armando Fox. In Proc. 6th Symp. on Operating Systems Design and Implementation (OSDI), San Francisco, CA, Dec. 2004.
• "Path-Based Failure and Evolution Management," Mike Y. Chen, Anthony Accardi, Emre Kiciman, Jim Lloyd, Dave Patterson, Armando Fox, and Eric Brewer. In Proc. 1st USENIX/ACM Symp. on Networked Systems Design and Implementation (NSDI '04), San Francisco, CA, March 2004.
• "Scalable Statistical Bug Isolation," Ben Liblit, M. Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan. PLDI, 2005.
27
Sustaining an Innovation/Training Engine in the 21st Century
Replicate research centers based primarily on industrial funding to expand the IT market and to train the next generation of IT leaders:
• Berkeley Wireless Research Center (BWRC): 50 grad students, 30 undergrads @ $5M per year
• Stanford Network Research Center (SNRC): 50 grad students @ $5M per year
• MIT Tparty: $4M per year (100% of $ from Quanta)
The model: industry largely funds the center; N companies participate, where N is 5?; an exciting, long-term technical vision, demonstrated by prototype(s)
28
State of Research Funding Today
• Most industry research is shorter term
• DARPA is exiting long-term (exp.) IT research: the '03-'05 IPTO BAAs were 9 AI, 2 classified, 1 SW radio, 1 sensor net, 1 reliability, and all have 12 to 18 month "go/no go" milestones
• Academic-led funding was reduced 50% (so far) from 2001 to 2004; faculty ≈ consultants in consortia led by a defense contractor, with grants ≈ support for 1-2 students (~ NSF funding level)
• NSF is swamped with proposals and conservative: 2000 to 6500 proposals in 5 years
• IT has the lowest acceptance rate at NSF (between 8% and 16%); an "ambitious proposal" draws negative reviews
• Even with NSF funding, a proposal is reduced to stretch NSF $: e.g., got 3 x 1/3 faculty, 6 grad students, 0 staff, 3 years
(To learn more, see www.cra.org/research)
29
RAD Lab Timeline
2005: Launch RAD Lab
2006: Collect workloads, Internet-in-a-Box
2007: SLT/CT distributed architectures, Iboxes, annotation layer, class testing
2008: Development toolkit 1.0, tuple space, class testing; Mid-Project Review
2009: RAD Lab software suite 1.0, class testing
2010: End of Project Party
30
Guide to Visualization
• Multiple interesting & useful predicate metrics
• A graphical representation helps reveal trends
[Figure: "bug thermometer" legend. Bar length is log(number of runs P observed); segments show Context(P), where the program fails whether or not P is true; Increase(P), how much P being true increases the failure probability; S(P), runs that succeed despite P being true; and an error bound.]