TRANSCRIPT
1
Berkeley RAD Lab: Technical Approach
Armando Fox, Randy Katz, Michael Jordan, Dave Patterson, Scott Shenker, Ion Stoica
October 2005
2
RAD Lab
The 5-Year Vision: a single person can go from vision to a next-generation IT service ("the Fortune 1 million"). E.g., over a long holiday weekend in 1995, Pierre Omidyar created eBay v1.0.
The Challenges:
• Develop the new service
• Assess: measure, test, and debug the new service in a realistic distributed environment
• Deploy: scale up a new, geographically distributed service
• Operate a service that could quickly scale to millions of users
The Vehicle: an interdisciplinary center creates the core technical competency to demo 10X to 100X improvements
• Researchers are leaders in machine learning, networking, and systems
• Industrial participants: leading companies in HW, systems SW, and online services
• Called "RAD Lab" for Reliable, Adaptable, Distributed systems
3
RAD Lab
The Science: both shorter-term and longer-term solutions
• Develop using primitives: functions (MapReduce), services (Craigslist)
• Assess/debug using deterministic replay and finding new metrics
• Deploy using "Internet-in-a-Box" via FPGAs under failure/slowdown workloads
• Operate using Statistical Learning Theory-friendly, Control Theory-friendly software architectures and visualization tools
[Diagram: a pedestal labeling the acronym — Cap; Dado (the section of a pedestal between cap and base); Base]
Added Value to Industrial Participants:
• Working with leading people and companies from different industries on long-range, pre-competitive technology
• Training of dozens of future leaders of IT in multiple disciplines, and their recruitment by industrial participants
• Working with researchers with a successful track record of rapid transfer of new technology
4
Steps vs. Process
[Diagram: two models of Develop, Assess, Deploy, Operate. Process: supports DADO evolution as a continuous cycle (Develop, Assess, Deploy, Operate) owned by 1 group. Steps: the traditional, static handoff model, with N groups each owning one step.]
5
Create abstractions, primitives, and a toolkit for large-scale systems that make it easy to invent and deploy functions like MapReduce
• For example, Distributed Hash Tables (OpenDHT), rendezvous-based communication (Internet Indirection Infrastructure), weak-semantics tuple spaces (a DHT put/get sketch follows the stack below)
• Already setting the trend for IETF standards
DADO - Develop
Application
Higher Functions (MapReduce)
Middleware (J2EE)
Libraries
Compilers/Debuggers
Operating System
Virtual Machine
Hardware
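As a concrete illustration of the put/get primitive a DHT such as OpenDHT exposes, here is a minimal in-process sketch using consistent hashing. The class and names are ours, invented for illustration, not OpenDHT's actual interface.

```python
# A minimal sketch of the put/get primitive a DHT like OpenDHT exposes,
# here simulated in-process with consistent hashing over a few nodes.
# All names (SimpleDHT, node ids) are illustrative, not a real API.
import hashlib
from bisect import bisect_right

class SimpleDHT:
    def __init__(self, node_ids):
        # Place each node on a hash ring; a key is stored on the first
        # node clockwise from the key's hash (consistent hashing).
        self.ring = sorted((self._hash(n), n) for n in node_ids)
        self.store = {n: {} for n in node_ids}

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def _node_for(self, key: str) -> str:
        h = self._hash(key)
        points = [p for p, _ in self.ring]
        i = bisect_right(points, h) % len(self.ring)  # wrap around the ring
        return self.ring[i][1]

    def put(self, key: str, value):
        self.store[self._node_for(key)][key] = value

    def get(self, key: str):
        return self.store[self._node_for(key)].get(key)

dht = SimpleDHT(["node-a", "node-b", "node-c"])
dht.put("service/frontend", "10.0.0.1:8080")
print(dht.get("service/frontend"))  # -> 10.0.0.1:8080
```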
6
The opportunity of middleware:
• Middleware is becoming the dominant way to deploy commercial networked applications
• Innovate below the abstraction: unmodified or proprietary apps get deployed on improved middleware
• We put instrumentation and recovery support in the middleware: Pinpoint (diagnosis) and Microreboot (fast recovery) added to a J2EE server
• Good news: middleware imposes design constraints on applications that help recovery, e.g., separation of state from app logic (see the sketch below)
DADO - Develop
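To show why separating state from application logic matters for recovery, here is a toy sketch of a microrebootable component: session state lives in an external store, so rebooting just the suspect component loses nothing. This illustrates the idea only; it is not the actual Pinpoint/Microreboot code, and all names are invented.

```python
# Illustrative sketch (not the real Pinpoint/Microreboot code): because
# session state lives in an external store rather than inside the
# component, rebooting just the suspect component loses no user state.

class SessionStore:
    """Stands in for an external state store (e.g., a session database)."""
    def __init__(self):
        self._data = {}
    def save(self, session_id, state): self._data[session_id] = state
    def load(self, session_id): return self._data.get(session_id, {})

class Component:
    def __init__(self, name, store: SessionStore):
        self.name, self.store = name, store
    def handle(self, session_id, item):
        state = self.store.load(session_id)        # state fetched, not cached
        state.setdefault("cart", []).append(item)
        self.store.save(session_id, state)
    def microreboot(self):
        # Discard only this component's in-memory state; this takes
        # milliseconds instead of the minutes a full restart would.
        self.__init__(self.name, self.store)

store = SessionStore()
cart = Component("cart", store)
cart.handle("sess-42", "book")
cart.microreboot()                      # e.g., after diagnosis flags "cart"
cart.handle("sess-42", "lamp")
print(store.load("sess-42")["cart"])    # -> ['book', 'lamp']: state survived
```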
7
First test: how easy is it to build MapReduce? (A sketch of the programming model follows below.)
• Build simple versions of current-generation Internet apps and scale them up: auctions, Craigslist, email, sales, free DB, …
• Test the RAD vision and system in new courses and evolve the system based on feedback; 2007, 2008, 2009: students from CS, SIMS, MBA
• Operate good services from the classes afterwards at a partner's site?
• A future Sergey Brin, Larry Page, Eric Brewer, or Pierre Omidyar in one of these classes?
DADO - Develop
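For reference, the programming model the first test targets fits in a few lines. This is a minimal single-process word-count sketch of the MapReduce pattern, not a distributed implementation; real systems farm the map and reduce tasks out across a cluster.

```python
# A minimal, single-process sketch of the MapReduce pattern (word count).
# Real implementations distribute map and reduce tasks across a cluster;
# this only shows the programming model students would build against.
from collections import defaultdict

def map_fn(doc):                 # map: emit (key, value) pairs
    for word in doc.split():
        yield word.lower(), 1

def reduce_fn(word, counts):     # reduce: combine all values for one key
    return word, sum(counts)

def mapreduce(docs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for doc in docs:             # map + shuffle: group values by key
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

docs = ["the quick brown fox", "the lazy dog", "the fox"]
print(mapreduce(docs, map_fn, reduce_fn))
# -> {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```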
8
"We improve what we can measure"
• Little visibility inside the network; it is usually data poor
• Servers are data rich, but the data is often discarded
Statistical and machine learning to the rescue. It works well when:
• You have lots of raw data
• You have reason to believe the raw data is related to some high-level effect you're interested in
• You don't have a model of what that relationship is
Note: SLT advances enable fast analysis
DADO - Assess
9
Example: Statistical Debugging (a scoring sketch follows the reference below)
• Instrument programs to add predicates (assertions) via the compiler
• Sparsely sample (~1%), recording predicate true/false outcomes and crashes
• Collect the information over the Internet
• Learn a statistical classifier based on successful and failed runs, using predicate (feature) selection plus clustering methods to pinpoint the bugs
• Found bugs in several open-source programs (moss, ccrypt, bc, rhythmbox, exif)
DADO - Assess
To learn more, see "To learn more, see "Scalable Statistical Bug Isolation," B. ," B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan, Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan, PLDIPLDI, , 2005. 2005.
10
DADO - Assess (MOSS results)
• Reinserted 9 known bugs in the MOSS program to see if they could be found
• Statistical debugging points out 7 of 9 (1 never occurred)
Bug thermometer (runs per bug):

Predicate                                        | Line | #1   | #2  | #3  | #4  | #5   | #6  | #7 | #8 | #9
files[filesindex].language > 16                  | 5869 | 0    | 0   | 28  | 54  | 1585 | 0   | 0  | 0  | 68
((*(fi + i)))->this.last_line == 1               | 5442 | 774  | 0   | 17  | 0   | 0    | 0   | 18 | 0  | 2
token_index > 500                                | 4325 | 31   | 0   | 16  | 711 | 0    | 0   | 0  | 0  | 47
(p + passage_index___0)->last_token <= filesbase | 5289 | 28   | 2   | 508 | 0   | 0    | 0   | 1  | 0  | 29
__result___430 == 0 is TRUE                      | 5789 | 16   | 0   | 0   | 9   | 19   | 291 | 0  | 0  | 13
config.match_comment is TRUE                     | 1994 | 791  | 2   | 23  | 1   | 0    | 5   | 11 | 0  | 41
i___0 == yy_last_accepting_state                 | 5300 | 55   | 0   | 21  | 0   | 0    | 3   | 7  | 0  | 769
f < f                                            | 4497 | 3    | 144 | 2   | 2   | 0    | 0   | 0  | 0  | 5
files[fileid].size < token_index                 | 4850 | 31   | 0   | 10  | 633 | 0    | 0   | 0  | 0  | 40
passage_index___0 == 293                         | 5313 | 27   | 3   | 8   | 0   | 0    | 0   | 2  | 0  | 366
((*(fi + i)))->other.last_line == yyleng         | 5444 | 776  | 0   | 16  | 0   | 0    | 0   | 18 | 0  | 1
min_index == 64                                  | 5302 | 24   | 1   | 7   | 0   | 0    | 1   | 1  | 0  | 249
((*(fi + i)))->this.last_line == yy_start        | 5442 | 771  | 0   | 18  | 0   | 0    | 0   | 19 | 0  | 0
(passages + i)->fileid == 52                     | 4576 | 24   | 0   | 477 | 14  | 24   | 0   | 1  | 0  | 14
passage_index___0 == 25                          | 5313 | 60   | 5   | 27  | 0   | 0    | 4   | 10 | 0  | 962
strcmp > 0                                       | 4389 | 0    | 0   | 28  | 54  | 1584 | 0   | 0  | 0  | 68
i > 500                                          | 4865 | 32   | 2   | 18  | 853 | 54   | 0   | 0  | 0  | 53
token_sequence[token_index].val >= 100           | 4322 | 1250 | 3   | 28  | 38  | 0    | 15  | 19 | 0  | 65
i == 50                                          | 5252 | 27   | 0   | 11  | 0   | 0    | 1   | 4  | 0  | 463
passage_index___0 == 19                          | 5313 | 59   | 5   | 28  | 0   | 0    | 4   | 10 | 0  | 958
bytes <= filesbase                               | 4481 | 1    | 0   | 19  | 0   | 0    | 0   | 0  | 0  | 1
11
Distributed debugging is very hard:
• Services are required to be up 24x7, so even rare failures are catastrophic
• Very hard to reproduce in the lab
Provide continuous logging and checkpointing, with reproducible behavior via replaying (a record/replay sketch follows below):
• Leverage RAMP and Iboxes for deterministic replaying (interrupt at clock cycle 100M, …) and "what-if" scenario analysis
• Create an "open source" failure and slowdown repository (e.g., Ebates, Windows Minidump) with sanitized information (plus tools to sanitize) and workloads, so other researchers can help
DADO - Assess
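A toy sketch of the record/replay idea referenced above: log every nondeterministic input with a logical timestamp during the recorded run, then feed the log back verbatim to reproduce the execution. Real deterministic replay (as on RAMP) operates at the machine level; this only shows the principle, and all names are invented.

```python
# A toy sketch of deterministic record/replay: every nondeterministic
# input (here, a "random" request size and a clock read) is logged with
# a logical timestamp during recording, then fed back verbatim during
# replay, so a buggy execution can be reproduced step by step.
import random, time

class Recorder:
    def __init__(self, log=None):
        self.log, self.replaying, self.t = log or [], log is not None, 0

    def nondet(self, produce):
        self.t += 1                         # logical clock, not wall clock
        if self.replaying:
            t, value = self.log[self.t - 1]
            assert t == self.t              # replay must stay in lockstep
            return value                    # replay: return the logged value
        value = produce()
        self.log.append((self.t, value))    # record: log the value
        return value

def service(rec):
    size = rec.nondet(lambda: random.randint(1, 100))
    stamp = rec.nondet(time.time)
    return f"handled {size} bytes at {stamp:.0f}"

rec = Recorder()
original = service(rec)                     # recorded run
replayed = service(Recorder(log=rec.log))   # deterministic replay
assert original == replayed                 # identical execution
```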
12
DADO - Deploy
How can academics experiment with systems of 1000+ nodes?
RAMP (Research Accelerator for Multiple Processors) for parallel HW & SW research:
• A single FPGA holds ~25 CPUs + caches in 2005
• ~$100k = ~4 FPGAs/board, ~4 DIMMs/FPGA, 10-20 boards, plus a low-cost storage server over Ethernet
• 1000 CPUs, 256 MB DRAM/CPU, 20 GB disk storage/CPU (25 CPUs/FPGA x 4 FPGAs/board x 10 boards = 1000 CPUs)
• Pros: free "IP" (opencores.org), large scale, low purchase cost, low operation cost, easy to change, easy to trace, reproducible behavior, real ISA and OS, grows with Moore's Law (2X CPUs and faster clock every 1.5 yrs)
• Cons: slow clock rate (100-200 MHz vs. 2-4 GHz)
13
Why Is RAMP Attractive for Research?
Priorities for a research parallel computer:
1a. Cost of purchase
1b. Cost of ownership (staff to administer it)
1c. Scalability/reality (1000 nodes, "real" SW)
4. Observability (measure, trace everything)
5. Reproducibility (to debug, run experiments)
6. Flexibility (change for different experiments)
7. Credibility (results are believable for tech transfer)
8. Performance
Note: for a commercial parallel computer, performance is priority #1
14
Why Is RAMP Attractive for Research? SMP vs. Cluster vs. Simulator vs. RAMP

Metric              | SMP       | Cluster   | Simulate  | RAMP
Cost (1 CPU)        | F ($40k)  | B ($2k)   | A+ ($0k)  | A ($0.1k)
Cost of ownership   | A         | D         | A         | A
Scalability (1000)  | C         | A         | A         | A
Observability       | D         | C         | A+        | A+
Reproducibility     | B         | D         | A+        | A+
Community           | D         | A         | A         | A
Flexibility         | D         | C         | A+        | A+
Credibility         | A+        | A+        | F         | A
Performance (clock) | A (2 GHz) | A (3 GHz) | F (0 GHz) | C (0.2 GHz)
GPA                 | C         | B-        | B         | A-
15
DADO - Deploy
Re-engineer RAMP to act like a 1000+ node distributed system under realistic failure and slowdown workloads:
• The same HW emulates a data center as well as wide-area systems
• Embed the Emulab and ModelNet emulation testbeds
• Provide synthetic time, checkpoint/restart, clock-cycle-accurate reproducibility, dedicated use of a large system, the ability to trace anything, …
• Researchers should be able to develop in an environment similar to the one that led to innovations like MapReduce
• Failure data collection from PlanetLab et al. => failure and slowdown workloads
16
DADO - Operate
• Idea: when a site misbehaves, users notice and change their behavior; use this as a "failure detector"
• Approach: combine visualization with Statistical Learning Theory (SLT, aka machine learning) analysis so the operator sees anomalies too
• Experiment: does the distribution of hits to various pages match the "historical" distribution? Each minute, compare the hit counts of the top N pages to the hit counts over the last 6 hours using Bayesian networks and a χ² test, on real Ebates data (a χ² sketch follows the reference below)
To learn more, see "Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization," In Proc. 2nd IEEE Int'l Conf. on Autonomic Computing, June 2005, by Peter Bodik, Greg Friedman, Lukas Biewald, Helen Levine (Ebates.com), George Candea, Kayur Patel, Gilman Tolle, Jon Hui, Armando Fox, Michael I. Jordan, David Patterson.
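A minimal sketch of the per-minute test described above, with invented hit counts: compare the current minute's top-page hit distribution against the trailing window's and raise an alarm when the χ² statistic crosses a threshold. The paper's system also uses Bayesian networks and visualization; this shows only the χ² piece, and the threshold and counts are illustrative.

```python
# A sketch of the anomaly test described above: each minute, compare the
# hit counts of the top-N pages against their share of traffic over the
# trailing window using a chi-squared statistic. Counts and the alarm
# threshold here are illustrative, not from the real Ebates data.

def chi_squared(current: dict, historical: dict) -> float:
    total_now = sum(current.values())
    total_hist = sum(historical.values())
    stat = 0.0
    for page, hist_hits in historical.items():
        expected = total_now * hist_hits / total_hist  # expected share now
        observed = current.get(page, 0)
        stat += (observed - expected) ** 2 / expected
    return stat

historical = {"/home": 6000, "/search": 3000, "/checkout": 1000}  # 6 hours
normal_min = {"/home": 58, "/search": 32, "/checkout": 10}
broken_min = {"/home": 70, "/search": 29, "/checkout": 1}  # checkout failing?

THRESHOLD = 9.21  # chi-squared critical value, 2 dof, p = 0.01
for label, counts in [("normal", normal_min), ("broken", broken_min)]:
    stat = chi_squared(counts, historical)
    print(f"{label}: chi2 = {stat:.1f}  anomaly = {stat > THRESHOLD}")
```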
17
Visualize user behavior when it becomes completely different from normal, rather than animating the architecture as tools usually do. Seeing is believing: win trust in SLT by leveraging operator expertise and human visual pattern recognition.
[Figure: hits to the top 40 pages (y-axis) over time in 5-minute intervals (x-axis)]
18
Maintaining quality of service in the presence of DDoS attacks, flash crowds, etc. is critical
• Key observation: network service failures are often attributed to unexpected traffic patterns
• Key approach: identify and protect "good" traffic; discard "bad" traffic
Create "Inspection-and-Action" boxes (Iboxes); a decision-loop sketch follows below:
• Deep multiprotocol packet inspection
• Exploit SLT to discover a model of "normal" traffic, plus anomaly detection
• Mark and annotate packets to add information; prioritize and throttle
• Evolve the network architecture, e.g., to include an annotation layer
DADO - Operate
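To make the identify-and-protect loop concrete, here is an illustrative sketch, not a real Ibox: learn a trivial model of normal per-source request rates, then throttle, rather than drop outright, sources whose rates look anomalous. The "model" here is deliberately the simplest possible stand-in for SLT.

```python
# An illustrative sketch (not a real Ibox) of the "identify and protect
# good traffic" loop: fit a trivial model of normal per-source request
# rates from a training window, then mark each source's traffic as
# normal (forwarded) or anomalous (throttled, not dropped outright).
from statistics import mean, stdev

def fit_normal_model(training_rates):
    # Stand-in for SLT: the simplest possible model, mean + std deviation.
    return mean(training_rates), stdev(training_rates)

def action(rate, model, k=3.0):
    mu, sigma = model
    if rate <= mu + k * sigma:
        return "forward"                 # looks like normal traffic
    return "throttle"                    # anomalous: degrade, don't drop

model = fit_normal_model([120, 95, 130, 110, 105, 125])  # reqs/sec, training
for src, rate in {"10.0.0.5": 118, "10.0.0.9": 4200}.items():
    print(src, rate, "->", action(rate, model))
# 10.0.0.5 is forwarded; 10.0.0.9 (flash crowd or DDoS source) is throttled
```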
19
RAD Lab Opportunity: New Research Model
Chance to partner with the top university in computer systems on the "Next Great Thing":
• The National Academy of Engineering mentions Berkeley in 7 of the 19 $1B+ industries that came from IT research (NAE mentions: Berkeley 7 times, Stanford 5, MIT 5, CMU 3)
• Timesharing (SDS 940), client-server computing (BSD Unix), graphics, entertainment, Internet, LANs, workstations, GUI, VLSI design (Spice) [ECAD $5B?/yr], RISC [$10B?/yr], relational DB (Ingres/Postgres) [RDB $15B?/yr], parallel DB, data mining, parallel computing, RAID [$15B?/yr], portable communication (BWRC), WWW, speech recognition, broadband
• Berkeley is one of the top suppliers of systems students to industry and academia; US News & World Report ranking of CS systems programs: 1 Berkeley, 2 CMU, 2 MIT, 4 Stanford, 5 Washington
• For example, Quanta (Taiwan PC laptop manufacturer) funds MIT CSAIL at $4M/year for 5 years to reinvent the PC, April 2005 ("Tparty")
• The RAID project (4 faculty, 20 grads, 10 undergrads) helped create a $15B industry, but would not be fundable today at DARPA or NSF
20
RAID Alumni 10 years later
• Industry managers: AT&T, HP, IBM, Microsoft, Sun, …
• Founders of startups: Electric Cloud, Panasas, VMware, …
• Professors: CMU, Stanford, Michigan, Arizona, UCSC
21
Founding the RAD Lab; start 12/1:
• $2.5M/yr (half of BWRC's budget): 70% industry, 20% state, 10% federal government
• 25 grad students + 15 undergrads + 6 faculty + 2 staff
• Looking for 3 to 4 founding companies to fund ≈3-5 years at a cost of $0.5M/year, following the SNRC (Stanford Network Research Center) and BWRC (Berkeley Wireless Research Center) models
Feedback on forming the consortium?
• Prefer founding partners' technology in prototypes
• Partners designate employees to act as consultants
• Head start for participants on research results
• IP put in the public domain so partners are not sued
• Press release of founding RAD Lab partners December 1
• Mid-project review after 3 years by founding partners
22
Critical Mass vs. Spreading $ Thinly
It seems safer to spend $50k at 10 universities than $500k at 1 university:
• You still get diversity, a portfolio effect across fields
• N faculty and grad students in N areas
• Improved student-to-student and student-to-professor training across fields
But critical mass on a coherent systems project has a much greater chance of technical success (in our experience):
• E.g., BSD Unix, Ingres, Postgres, RISC, RAID, and NOW all had critical mass
• Less management overhead in the critical mass model: less industry time for interaction and travel, less $ for faculty support
• More students supported with less hassle in the critical mass model
• Much more influence on directions and participants: "$50k is like a cousin; $500k is like a spouse"
• Preference to the partner's technology
23
RAD Lab Model
                                      | Foundation Member   | Affiliate Member
Annual Contribution                   | ≥$500k (5 students) | ≥$50k (1/2 student)
Preference to partner's technology    | √                   |
Partner employees advise design teams | √                   |
Attend two 3-day reviews              | √                   | √
6-month delay on review presentations |                     | √

(Contribution counts towards CITRIS donation) (Contribution matching state $ via MICRO, UC Discovery)
24
• Working with different industries on long-range, pre-competitive technology
• Training of dozens of future leaders of IT, plus their recruitment
• Working with researchers with track records of successful technology transfer
RAD Lab: Interdisciplinary Center for Reliable, Adaptive, Distributed Systems
Develop using primitives to enable functions (MapReduce), services (Craigslist)
Assess using deterministic replay and statistical debugging
Deploy via "Internet-in-a-Box" FPGAs
Operate SLT-friendly, Control Theory-friendly architectures and operator-centric visualization and analysis tools
Capability (Desired): 1 person can invent & run the next-gen IT service
Base Technology: Server Hardware, System Software, Middleware, Networking
25
Backup Slides
26
References
• "Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization," Peter Bodik, Greg Friedman, Lukas Biewald, Helen Levine (Ebates.com), George Candea, Kayur Patel, Gilman Tolle, Jon Hui, Armando Fox, Michael I. Jordan, and David Patterson. In Proc. 2nd IEEE Int'l Conf. on Autonomic Computing, June 2005.
• "Microreboot -- A Technique for Cheap Recovery," George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, and Armando Fox. In Proc. 6th Symp. on Operating Systems Design and Implementation (OSDI), San Francisco, CA, Dec. 2004.
• "Path-Based Failure and Evolution Management," Mike Y. Chen, Anthony Accardi, Emre Kiciman, Jim Lloyd, Dave Patterson, Armando Fox, and Eric Brewer. In Proc. 1st USENIX/ACM Symp. on Networked Systems Design and Implementation (NSDI '04), San Francisco, CA, March 2004.
• "Scalable Statistical Bug Isolation," Ben Liblit, M. Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan. PLDI, 2005.
27
Sustaining an Innovation/Training Engine in the 21st Century
Replicate research centers based primarily on industrial funding to expand the IT market and to train the next generation of IT leaders:
• Berkeley Wireless Research Center (BWRC): 50 grad students, 30 undergrads @ $5M per year
• Stanford Network Research Center (SNRC): 50 grad students @ $5M per year
• MIT Tparty: $4M per year (100% of $ from Quanta)
The model: industry largely funds the center; N companies participate, where N is 5?; an exciting, long-term technical vision, demonstrated by prototype(s)
28
State of Research Funding Today
• Most industry research is shorter term
• DARPA is exiting long-term (exp.) IT research: the '03-'05 IPTO BAAs were 9 AI, 2 classified, 1 SW radio, 1 sensor net, 1 reliability, and all have 12 to 18 month "go/no go" milestones
• Academic-led funding was reduced 50% (so far) from 2001 to 2004; faculty ≈ consultants in consortia led by a defense contractor, with grants ≈ support for 1-2 students (~ NSF funding level)
• NSF is swamped with proposals and conservative: 2000 to 6500 proposals in 5 years
• IT has the lowest acceptance rate at NSF (between 8% and 16%); an "ambitious proposal" draws negative reviews
• Even with NSF funding, a proposal is reduced to stretch NSF $: e.g., got 3 x 1/3 faculty, 6 grad students, 0 staff, 3 years
(To learn more, see www.cra.org/research)
29
RAD Lab Timeline
2005: Launch RAD Lab
2006: Collect workloads, Internet-in-a-Box
2007: SLT/CT distributed architectures, Iboxes, annotation layer, class testing
2008: Development toolkit 1.0, tuple space, class testing; Mid-Project Review
2009: RAD Lab software suite 1.0, class testing
2010: End of Project Party
30
Guide to Visualization
• Multiple interesting & useful predicate metrics
• A graphical representation helps reveal trends
[Figure: "bug thermometer" legend. Bar length is log(number of runs P observed); segments show Context(P), where the program fails whether or not P is true; Increase(P), how much P being true increases the failure probability; S(P), runs that succeed despite P being true; and an error bound.]