

A Tentative Proposal for ISTORE-2

Winfried W. Wilcke, [email protected], (408) 927-2139, Almaden Research Center

July 18, 2000

Richard C. Booth, [email protected], (408) 927-1879, Almaden Research Center

David A. Patterson, [email protected], (510) 642-6587, University of California, Berkeley

Underlying Beliefs

• Commodity components are quickly winning the server wars

– Gigabit Ethernet will win everything

– x86 Processors

– Linux OS will prosper

• Large servers (100 to 10,000 nodes) will be quite common, and most will be storage-centric

• What matters most:

– Ease of management, density of nodes, and seamless geographical interconnect

Generations of IStore

• IStore = IStore-1: Present UCB Project

• IStore-2: Joint Research Prototype

– ~2000 nodes

– Split between UCB, IBM and others

– Hardware similar to IStore-1

– Focus on real applications and management software

– Operational by year-end 2001

• Follow-on Work

Talk Outline

• Project Goals

• Applications

• Research Topics

• Hardware Architecture

• Development Schedule

• Working Relationships

• Next Steps

Applications & Research Topics

Candidate Applications

• Research Focus

– NOAA Severe Weather Warning (R. Arps, ARC)

– Fast Image Recognition (J. Malik, UCB)

• Commercial Focus

– Scalable E-business server (IGS): a must!

– Deep Searching of the Entire Web: Webfountain (N. Pass)

– (tbd) Large Scale Network Attached Server (J. Palmer)

– (tbd) Speech Recognition Farms for Phone-based Special Web-services

NOAA Severe Weather (Ron Arps)

• Doppler radar enables detection of violent tornadoes and of the wind shear that causes plane crashes

• Doubled warning time for residents in Oklahoma during the '99 class 5 outbreaks

– Goal: 15-minute average warning time in 2004

• Eventually 120 radar sites will be established

• Matches IStore characteristics well

– Needs scalable local storage/processing plus seamless transfer of data on a geographical scale, manageable from one site

Webfountain (Norm Pass)

• Index the entire Web every few weeks

– Google and Northern Light each index only about 25% of it

• 4 TB index today, expected to grow to 200 TB in two years

• 'Miner' technology demonstrated

– Resumes, prices, geospatial, ...

– Prototype running on a 30-node Linux farm
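To make the growth figure concrete, the sketch below works out what "4 TB growing to 200 TB in two years" implies if the growth is read as steady compound growth. That reading is an assumption, because the slide gives only the endpoints. The sketch also estimates how many one-disk bricks would hold 200 TB, assuming the ~32 GB drive quoted in the hardware goals, decimal units, and no replication. `monthly_growth`, `doubling_time_months`, and `bricks_needed` are hypothetical helpers.

```python
import math

# Endpoints from the slide: a 4 TB index expected to reach 200 TB in two years.
START_TB = 4.0
END_TB = 200.0
MONTHS = 24

# Assumption: one ~32 GB disk per brick (IStore-2 drive size), no replication.
DISK_GB = 32


def monthly_growth(start, end, months):
    """Constant compound monthly growth factor that takes `start` to `end`."""
    return (end / start) ** (1.0 / months)


def doubling_time_months(factor):
    """Months needed to double at a constant monthly growth factor."""
    return math.log(2) / math.log(factor)


def bricks_needed(index_tb, disk_gb=DISK_GB):
    """Lower bound on one-disk bricks needed to hold the index (decimal units)."""
    return math.ceil(index_tb * 1000 / disk_gb)


g = monthly_growth(START_TB, END_TB, MONTHS)
print(f"growth per month: {(g - 1) * 100:.1f}%")
print(f"doubling time: {doubling_time_months(g):.1f} months")
print(f"bricks for {END_TB:.0f} TB: {bricks_needed(END_TB)}")
```

Under these assumptions the index grows by about 18% per month, doubling roughly every 4.3 months. It needs at least 6,250 bricks, which is more than the ~2000-node IStore-2 prototype. Any replication would raise that number further.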

Software Model

• Users will see a standard Linux farm (shared-nothing) programming model

– No porting effort for existing Linux farm applications (except dealing with different versions of Linux, of course)

• The system management functions are visible only to system administrators

– Exception: performance monitoring functions useful for tuning applications
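In a shared-nothing model, each node owns its own slice of the data and works on it locally. Applications on such a farm commonly distribute records by hash partitioning. The sketch below assumes that scheme, although the proposal does not specify one. `owner_node`, `partition`, and `local_count` are hypothetical names.

```python
import hashlib
from collections import defaultdict


def owner_node(key: str, num_nodes: int) -> int:
    """Map a record key to the single node that owns it, using a stable hash."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(records, num_nodes):
    """Group (key, value) records by owning node; each group is one node's shard."""
    shards = defaultdict(list)
    for key, value in records:
        shards[owner_node(key, num_nodes)].append((key, value))
    return shards


def local_count(shard):
    """Work done entirely on one node's own data, with no cross-node access."""
    return len(shard)


records = [(f"doc-{i}", i) for i in range(1000)]
shards = partition(records, num_nodes=16)

# Combine step: only the small per-node results would cross the network.
total = sum(local_count(s) for s in shards.values())
print(total)
```

Because every node can compute `owner_node` independently, no shared directory is needed to locate a record.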

Differences from a Linux Farm

• Much higher spatial density of nodes (‘bricks’)

• Single network protocol (Ethernet) for ALL off-node communications

• Design with geographical distribution in mind

• Diagnostic Processors

• Lego-like, standardized building blocks – Regular and relaxed homogeneous

• Monitoring Hardware

• Measurement of relevant environmental parameters

• (New) System Management Language

• AME, SON and RAIN objectives

AME, RAIN and SON

• Three areas of system research to be explored with IStore

• These three areas are largely independent of each other

AME

• Availability

– No single points of failure

– Introspection, failover and fast failure

– Fast repair by swapping identical blocks

• Maintainability

– Homogeneous structure

– System management language

• Extensibility/Scalability

– Shared-nothing architecture
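The availability bullets (introspection, fast failure, and repair by swapping identical blocks) can be illustrated with a heartbeat monitor. This is a minimal sketch under stated assumptions:

- Bricks report heartbeats to one monitor.
- A brick that stays silent for longer than a timeout is treated as failed.
- A failed brick's role is reassigned to an identical spare.

`Monitor`, its methods, and the role names are hypothetical and are not part of the IStore design.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    """Hypothetical heartbeat monitor: a brick silent for > timeout seconds has failed."""
    timeout: float
    last_seen: dict = field(default_factory=dict)   # brick id -> last heartbeat time
    spares: list = field(default_factory=list)      # identical spare bricks
    assignment: dict = field(default_factory=dict)  # logical role -> brick id

    def heartbeat(self, brick: str, now: float) -> None:
        self.last_seen[brick] = now

    def failed(self, now: float) -> list:
        """Bricks whose last heartbeat is older than the timeout."""
        return sorted(b for b, t in self.last_seen.items() if now - t > self.timeout)

    def fail_over(self, now: float) -> dict:
        """Move each role held by a failed brick onto a spare; return role -> new brick."""
        dead = set(self.failed(now))
        moves = {}
        for role, brick in sorted(self.assignment.items()):
            if brick in dead and self.spares:
                spare = self.spares.pop(0)
                self.assignment[role] = spare
                self.last_seen[spare] = now  # start tracking the replacement
                moves[role] = spare
        return moves


m = Monitor(timeout=3.0, spares=["spare-0"], assignment={"shard-A": "b1", "shard-B": "b2"})
m.heartbeat("b1", now=0.0)
m.heartbeat("b2", now=0.0)
m.heartbeat("b1", now=4.0)   # b2 has gone silent
print(m.failed(now=5.0))     # ['b2']
print(m.fail_over(now=5.0))  # {'shard-B': 'spare-0'}
```

A real system would also need to fence the failed brick before failover, so that a slow but still-running brick cannot keep serving its old role. The sketch leaves that out.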

RAIN

• Redundant Array of Inexpensive Network (Switches)

• Issues to be explored

– Optimal topology

– Density/cost of ports, optics vs. copper

– Routing algorithms within a machine

– Need for TCP hardware acceleration

– Performance of Ethernet protocol

– Frame sizes

– Simplified switches
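One RAIN question is whether a topology contains a single switch whose loss cuts bricks off. The sketch below checks for this by deleting each switch in turn and testing reachability with breadth-first search. It assumes a hypothetical layout:

- Each brick uses two ports to reach two different switches. IStore-2's two Gb Ethernet ports per brick would allow this.
- The switches themselves form a ring.
- For simplicity, any node, bricks included, may forward traffic.

```python
from collections import deque


def build_dual_homed(num_bricks, num_switches):
    """Hypothetical topology: brick i attaches to switches i % S and (i + 1) % S,
    and the switches form a ring."""
    adj = {f"b{i}": set() for i in range(num_bricks)}
    adj.update({f"s{j}": set() for j in range(num_switches)})

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for i in range(num_bricks):
        link(f"b{i}", f"s{i % num_switches}")
        link(f"b{i}", f"s{(i + 1) % num_switches}")
    for j in range(num_switches):
        link(f"s{j}", f"s{(j + 1) % num_switches}")
    return adj


def connected_without(adj, removed):
    """BFS: are all remaining nodes mutually reachable once `removed` is gone?"""
    nodes = [n for n in adj if n != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)


def single_points_of_failure(adj):
    """Switches whose loss alone partitions the network (ideally none)."""
    return [n for n in adj if n.startswith("s") and not connected_without(adj, n)]


print(single_points_of_failure(build_dual_homed(8, 4)))  # []
print(single_points_of_failure(build_dual_homed(8, 1)))  # ['s0']
```

The single-switch case shows the failure mode that redundant switches are meant to remove.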

SON

• Storage-Oriented Nodes

• Basic premise: one node = one disk = one processor

– It works in farms, but is it a good general choice?

– Is the loss of flexibility (in the ratio of disks per processor) a good tradeoff for easier management?

Additional Software Research Topics...

• Define AME, RAIN, SON benchmarks

• Server Management Language

• Parallel searching of geographically distributed databases

• Dynamic resource allocation (e.g., firewalls)

• SCSI over TCP/IP (SAN within IStore)

• Storage for mobile users (à la OceanStore)

System Management Language

• Define a high-level, interpretive(?) system management language

– May use facilities of the system OS

• Highly regular IStore is the first target

• Sample verbs

– allocate, protect, share, map, backup, restore, copy, correlate, display, discover, ping, initialize, report, arm, define(node)....
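The proposal leaves the language's syntax open. As an illustration only, the sketch below interprets commands of the form `verb arg ...` against a toy in-memory system. It implements three of the sample verbs: `define`, `ping`, and `report`. The command syntax, `ToySystem`, and `run` are all hypothetical.

```python
import shlex


class ToySystem:
    """Hypothetical in-memory stand-in for an IStore; real verbs would call OS or
    diagnostic-processor facilities instead."""

    def __init__(self):
        self.nodes = {}  # node name -> dict of attributes

    def define(self, name, *attrs):
        # e.g. "define brick7 site=almaden rack=3"
        self.nodes[name] = dict(a.split("=", 1) for a in attrs)
        return f"defined {name}"

    def ping(self, name):
        return f"{name} {'alive' if name in self.nodes else 'unknown'}"

    def report(self, key):
        # One attribute across all defined nodes, e.g. "report site"
        return {n: a.get(key) for n, a in sorted(self.nodes.items())}


VERBS = {"define", "ping", "report"}


def run(system, line):
    """Interpret one command: the first word is the verb, the rest are its arguments."""
    verb, *args = shlex.split(line)
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    return getattr(system, verb)(*args)


system = ToySystem()
print(run(system, "define brick7 site=almaden rack=3"))  # defined brick7
print(run(system, "ping brick7"))                        # brick7 alive
print(run(system, "report site"))                        # {'brick7': 'almaden'}
```

Keeping the verb table explicit makes the language easy to extend one verb at a time, which suits an interpretive design.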

System Management Language (cont.)

• Should easily describe tasks such as:

– Back up all data located in the Philippines to Colorado (a volcano is about to blow)

– Set alarm if any disk is more than 80% full

– Define protected subregions in the system

– Display CPU utilization by time and state

– Discover present routing topology

– Show 3D correlation plot of disk vibration vs. brick temperature vs. actual failure events

– .....
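The task "set alarm if any disk is more than 80% full" maps naturally onto the `arm` verb. The sketch below is a minimal illustration with two assumptions. First, bricks report metric samples as name/value pairs. Second, an alarm is a threshold predicate on one metric. `Alarm`, `arm`, `evaluate`, and the metric names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Alarm:
    name: str
    metric: str
    predicate: Callable[[float], bool]


def arm(alarms: List[Alarm], name: str, metric: str, threshold: float) -> None:
    """Hypothetical 'arm' verb: fire when `metric` exceeds `threshold` on any brick."""
    alarms.append(Alarm(name, metric, lambda v: v > threshold))


def evaluate(alarms: List[Alarm],
             samples: Dict[str, Dict[str, float]]) -> List[Tuple[str, str]]:
    """Return (alarm name, brick) for every brick whose sample trips an alarm."""
    fired = []
    for alarm in alarms:
        for brick, metrics in sorted(samples.items()):
            if alarm.metric in metrics and alarm.predicate(metrics[alarm.metric]):
                fired.append((alarm.name, brick))
    return fired


alarms: List[Alarm] = []
arm(alarms, "disk-full", "disk_used_pct", 80.0)
samples = {
    "b1": {"disk_used_pct": 42.0, "temp_c": 31.5},
    "b2": {"disk_used_pct": 91.0, "temp_c": 35.0},
    "b3": {"temp_c": 29.0},  # no disk reading reported this interval
}
print(evaluate(alarms, samples))  # [('disk-full', 'b2')]
```

Bricks that do not report a metric are skipped rather than treated as alarming. A real design might instead flag missing readings as a separate fault.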

Hardware Architecture, Development Schedule & Working Relationships

IStore Hardware Architecture Goals

• Seamless Scalability

– O(10,000) AME Storage Nodes

– Optimized Storage Brick for Packaging Density

• Geographically Dispersed Nodes

– Gb Ethernet Connections to WAN Routers

• Storage Brick

– Full PME Brick: Processor, Memory, Cache

– Gb Ethernet as the Sole Interconnection Fabric

– Embedded Disk with Tens of GBytes

IStore Hardware Architecture Goals (cont.)

• State-of-the-art Intel Processor Memory Element (PME)

– 650 MHz Pentium III with 100 MHz System Bus

– 256 KB L2 cache

– O(512MB) main memory

• State-of-the-art Interconnect Fabric

– 1 Gb Ethernet Runtime Network

– 10/100 Mb Ethernet Diagnostic Network

• State-of-the-art Disks

– 2.5" ~32 GB drive

IStore Hardware Architecture Goals (cont.)

• Berkeley AME Hardware Management Support

– Diagnostic processor

– Environmental sensors

• TCP/IP Hardware Accelerator

– Class 4: Hardware State Machine

• SCSI over TCP ("iSCSI") Support

• Compatible with Standard Ethernet Switches/Routers

IStore-1: Current Berkeley Design

• 80 nodes

• AME

• 266 MHz Pentium II

• Four 100 Mb Ethernet Ports/brick

• Integrated UPS

IStore-2: Deltas from IStore-1

• Geographically Dispersed Nodes

– O(1000) nodes at Almaden

– O(1000) nodes at Berkeley

• Upgraded Storage Brick

– Pentium III 650 MHz Processor

– Two Gb Ethernet Copper Ports/brick

– One 2.5" ATA disk

• User Supplied UPS Support

• Standard Ethernet Switches

Follow-on Work

• Ethernet Sourced in Memory Controller (North Bridge)

• TCP/IP Hardware Accelerator– Class 4: Hardware State Machine

• SCSI over TCP Support

• Integrated UPS

Why an IStore-2 Prototype Is Interesting

• Storage Bricks

– New ratios for MIPS/bandwidth/storage

– New level of density

• AME Hardware Support

– Seamless scaling

– Self maintaining nodes

• It Exists

IStore-2: Core Design Team

• IBM (full time)

– System Architect: Winfried Wilcke

– Lead Designer: Richard Booth

– 1 Experienced Hardware Designer: tbd

– 3 Designers: tbd

• Berkeley

– 6 Graduate Students

IStore-2: Development Schedule

• Working Model

– 7/00: Agreement in Principle

– 8/00: Working Team Membership

• Design

– 9/00: Architecture Specification version 1.0

– 11/00: Design Workbook version 1.0

• Implementation

– 2Q/01: First 3 Nodes Power-up

– 3Q/01: O(64) nodes available to users

– 4Q/01: O(2000) nodes available to users

IStore-2 Footprint (per 1000 nodes)

• 16 Storage (19") Racks

– 64 storage bricks/rack

– 8 type 1 storage bricks/drawer

– 8 storage drawers/rack

– Ethernet switches in rack

• 8 Global Ethernet Switch (19") Racks

• Requires 600 sq. ft. lab
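The rack counts follow from simple arithmetic. Eight bricks per drawer times eight drawers per rack gives 64 bricks per rack, so 1000 nodes need 16 storage racks (1024 slots). The sketch below reproduces that calculation and scales it to other node counts. The switch-rack ratio (8 per 1000 nodes) is taken from the slide. Raw capacity assumes the ~32 GB drive per brick from the hardware goals, decimal terabytes, and no redundancy. `footprint` is a hypothetical helper.

```python
import math

BRICKS_PER_DRAWER = 8
DRAWERS_PER_RACK = 8
BRICKS_PER_RACK = BRICKS_PER_DRAWER * DRAWERS_PER_RACK  # 64, as on the slide
SWITCH_RACKS_PER_1000_NODES = 8                          # from the slide
DISK_GB = 32  # assumption: ~32 GB 2.5" drive per brick


def footprint(nodes):
    """Storage racks, brick slots, switch racks, and raw disk for `nodes` bricks."""
    racks = math.ceil(nodes / BRICKS_PER_RACK)
    return {
        "storage_racks": racks,
        "brick_slots": racks * BRICKS_PER_RACK,
        "switch_racks": math.ceil(nodes * SWITCH_RACKS_PER_1000_NODES / 1000),
        "raw_capacity_tb": nodes * DISK_GB / 1000,
    }


print(footprint(1000))
print(footprint(2000))
```

For the planned O(2000) nodes, this gives 32 storage racks, 16 switch racks, and about 64 TB of raw disk.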

IStore-2 Platform: Required Resources

• Staffing

– 6 ARC/SSD IBMers

– 6 UCB Graduate Students

• Lab Space

– 600 sq. ft. lab at Almaden

– 600 sq. ft. lab at Berkeley

• Hardware Costs

– $3M (mostly 2001 dollars)

IStore-2: Working Model

• Jointly Authored Architecture Specification

– 1 or 2 Almaden authors

– 1 or 2 Berkeley authors

• Design Workbook

– Each Core Team Member owns a section

• Weekly Half-Day Face-to-Face Working Meetings

– Alternate between Almaden and Berkeley

• Shared Electronic Documentation

• Machine Available Free of Charge to Users from Either Institution

• IP Handled Like Previous IBM/UCB Projects?

• Fabrication (and Possibly Some Design) Vendored Out

Next Steps

• Continue to Seek Feedback on Proposal

• Funding Discussion

– IBM

– Berkeley

• Form IBM Team

• Begin Regular Working Meetings

• Begin Architectural Design