
Slide 1

ISTORE Software Runtime Architecture

Aaron Brown, David Oppenheimer, Kimberly Keeton, Randi Thomas, Jim Beck, John Kubiatowicz, and David Patterson

http://iram.cs.berkeley.edu/istore

1999 Winter IRAM Retreat


Slide 2

ISTORE Runtime Software Architecture

• Runtime system goals for the ISTORE meta-appliance
  (1) Provide mechanisms that allow network service applications to exploit introspection (monitor + adapt)
  (2) Allow appliance designer to tailor runtime system policies and interfaces
• How the goals are achieved
  (1) Introspection: layered local and global runtime system libraries that manipulate and react to monitoring data
  (2) Specialization: runtime system is extensible using domain-specific languages (DSLs)


Slide 3

Roadmap
• Layered software structure
• Example of introspection
• Runtime system extensibility using DSLs
• Conclusion


Slide 4

Layered software structure

[Figure: the layered software stack. Device nodes and front-end node(s) are connected through a switch to the LAN/WAN. Each node runs, bottom to top: HW device (NIC on the front-end), Device Interface & RTOS, Local Runtime, and the Distributed Global Runtime spanning all nodes; device nodes run Parallel Application Worker Code on top, while front-end nodes run Application Front-End Code.]


Slide 5

Device Interface Layer

[Figure: the node diagram showing only the bottom layers: HW device (NIC on the front-end) and Device Interface & RTOS on the device nodes and front-end node(s), connected through a switch to the LAN/WAN.]


Slide 6

Device interface layer
• Microkernel OS modules
• Traditional OS services
  – Networking, mem management, process scheduling, threads, …
• Device-specific monitoring
  – Raw access patterns
  – Utilization statistics
  – Environmental parameters
  – Indications of impending failure
• Self-characterization of performance, functional capabilities
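
To make the shape of this layer concrete, here is a small C++ sketch of what a per-device monitoring and self-characterization interface could look like. It is illustrative only: the names (DeviceHealth, DeviceProfile, DiskMonitor) and fields are assumptions for this sketch, not the ISTORE interfaces.

// Hypothetical sketch of a device-interface monitoring API; not the ISTORE code.
#include <cstdint>
#include <string>
#include <vector>

struct DeviceHealth {              // device-specific monitoring data
    uint64_t ecc_retries = 0;      // rising rates hint at impending failure
    uint64_t media_errors = 0;
    double   utilization = 0.0;    // fraction of the last interval the device was busy
    double   temperature_c = 0.0;  // environmental parameter
};

struct DeviceProfile {             // self-characterization of performance/capabilities
    std::string model;
    uint64_t capacity_bytes = 0;
    double   seq_bandwidth_mb_s = 0.0;
    double   avg_seek_ms = 0.0;
};

class DiskMonitor {
public:
    // A real monitoring module would read driver counters; stubbed here for illustration.
    DeviceHealth sampleHealth() const { return health_; }
    DeviceProfile selfCharacterize() const { return profile_; }
    const std::vector<uint64_t>& recentAccesses() const { return raw_accesses_; }

    // Called from the I/O path to record a raw access (block address).
    void recordAccess(uint64_t block) { raw_accesses_.push_back(block); }

private:
    DeviceHealth health_;
    DeviceProfile profile_;
    std::vector<uint64_t> raw_accesses_;
};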


Slide 7

Local runtime layer

[Figure: the node diagram with the Local Runtime layer added above the Device Interface & RTOS on each node.]


Slide 8

Local runtime layer
• Non-distributed mechanisms needed by network service applications
• Feeds information to global layer or performs local operations on behalf of global layer
• Example mechanisms
  – Application-specific filtering/aggregation of device monitoring data
    » Example: OLTP server vs. DSS server
  – Data layout and naming
    » Example: record-based interface for DB, file-based for web server
  – Device scheduling
    » Example: maximize TPS vs. maximize disk bandwidth utilization
  – Caching
    » Coherence essential vs. coherence unnecessary
    » More efficient caching implementation possible in second case
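
As one way to picture the application-specific filtering/aggregation mechanism, the sketch below defines a filter hook with an OLTP-style variant that reports random-I/O rate and a DSS-style variant that reports sequential bandwidth. The class names (MonitorFilter, OltpFilter, DssFilter) and the summary format are hypothetical, not ISTORE's.

// Hypothetical sketch of local-runtime monitoring filters; names are illustrative.
#include <cstdint>
#include <cstdlib>
#include <vector>

struct MonitorSummary {      // what gets forwarded to the global runtime layer
    double metric;           // meaning depends on the filter (e.g. IOPS or MB/s)
};

class MonitorFilter {        // application-specific aggregation of raw device data
public:
    virtual ~MonitorFilter() = default;
    virtual MonitorSummary summarize(const std::vector<uint64_t>& blocks,
                                     double interval_s) const = 0;
};

class OltpFilter : public MonitorFilter {      // OLTP server: random accesses per second
public:
    MonitorSummary summarize(const std::vector<uint64_t>& blocks,
                             double interval_s) const override {
        size_t random = 0;   // count accesses that are not sequential with the previous one
        for (size_t i = 1; i < blocks.size(); ++i)
            if (std::llabs((long long)blocks[i] - (long long)blocks[i - 1]) > 1) ++random;
        return { interval_s > 0 ? random / interval_s : 0.0 };
    }
};

class DssFilter : public MonitorFilter {       // DSS server: sequential bandwidth (MB/s)
public:
    explicit DssFilter(double block_kb) : block_kb_(block_kb) {}
    MonitorSummary summarize(const std::vector<uint64_t>& blocks,
                             double interval_s) const override {
        double mb = blocks.size() * block_kb_ / 1024.0;
        return { interval_s > 0 ? mb / interval_s : 0.0 };
    }
private:
    double block_kb_;        // bytes moved per recorded block access
};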


Slide 9

Global runtime layer

[Figure: the node diagram with the Distributed Global Runtime layer added, spanning the device nodes and front-end node(s).]


Slide 10

Global runtime layer
• Aggregate, process, react to monitoring data
• Relies on local per-device runtime mechanisms to provide monitoring data, implement control actions
• Provides application interface that hides distributed implementation of runtime services
• Example services
  – High-level services
    » Load balancing: replicate and/or migrate heavily used data objects when a disk becomes over-utilized
    » Availability: replicate data from failed or failing component to restore required redundancy
    » Plug-and-play: integrate new devices into the system
  – Low-level services used to implement high-level global services
    » Distributed directory tracks data and metadata objects
    » Migration, replication, caching
    » Inter-brick communication
    » Distributed transactions
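
A toy sketch of how the high- and low-level pieces might fit together for the load-balancing example: the global layer consults a directory for objects on the over-utilized disk, asks local runtimes to copy them to an under-utilized device, and records the new replicas. All names here (GlobalDirectory, rebalance, pickUnderutilizedDevice, copyObject) are invented for illustration.

// Toy sketch of a global-runtime load-balancing reaction; not the ISTORE interfaces.
#include <map>
#include <string>
#include <vector>

using ObjectId = std::string;
using DeviceId = int;

class GlobalDirectory {                     // distributed directory (here: one in-memory map)
public:
    void addReplica(const ObjectId& o, DeviceId d) { replicas_[o].push_back(d); }
    std::vector<ObjectId> objectsOn(DeviceId d) const {
        std::vector<ObjectId> out;
        for (const auto& [o, devs] : replicas_)
            for (DeviceId dev : devs) if (dev == d) { out.push_back(o); break; }
        return out;
    }
private:
    std::map<ObjectId, std::vector<DeviceId>> replicas_;
};

// Stand-ins for local-runtime mechanisms the global layer would call into (stubs).
DeviceId pickUnderutilizedDevice() { return 1; }
void copyObject(const ObjectId&, DeviceId, DeviceId) {}  // local runtimes would move the bytes

// Global-layer policy: when a disk is over-utilized, replicate its objects elsewhere
// (a real policy would pick only the heavily used ones).
void rebalance(GlobalDirectory& dir, DeviceId overloaded) {
    for (const ObjectId& o : dir.objectsOn(overloaded)) {
        DeviceId target = pickUnderutilizedDevice();
        copyObject(o, overloaded, target);   // data movement done by the local runtimes
        dir.addReplica(o, target);           // record the new replica in the directory
    }
}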


Slide 11

Distributed application worker code

[Figure: the node diagram with Parallel Application Worker Code added above the Distributed Global Runtime on the device nodes.]


Slide 12

Distributed application worker code
• Runs on top of global runtime system
• Written by appliance designer
• Application-specific
  – Database
    » scan, sort, join, aggregate, update record, delete record, ...
  – Transformational web proxy
    » fetch web page (from disk or remote site), apply transformation filter, update user preferences database, ...
• System administration tools implemented at this level
  – Customized runtime system defines administrative interface tailored to application


Slide 13

Application front-end code

[Figure: the complete node diagram, with Application Front-End Code added above the Distributed Global Runtime on the front-end node(s).]


Slide 14

Application front-end code
• Runs on front-end interface bricks
• Accepts requests from LAN/WAN connection
  – Incoming requests made using standard high-level protocols
    » HTTP, NFS, SQL, ODBC, …
• Invokes and coordinates appropriate worker code components that execute on internal blocks
  – Takes into account locality and load balancing
  – Database: front-end performs SQL query optimization, invokes distributed relational operators on data storage devices
  – Transformational proxy: front-end invokes distiller thread on appropriate device brick
    » if data is cached, invoke on disk node
    » otherwise, fetch data from web and invoke on compute node or disk node
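
The transformational-proxy dispatch just described might look roughly like the sketch below: check the directory for a cached copy and run the distiller on the disk node holding it, otherwise fetch the page and run the distiller on a compute or disk node. The helper names (lookupCachedCopy, runDistillerOn, and so on) are placeholders, not the real front-end code.

// Hypothetical front-end dispatch for a transformational web proxy.
#include <optional>
#include <string>

using NodeId = int;

// Stand-ins for global-runtime and worker-code calls; bodies are stubs for illustration.
std::optional<NodeId> lookupCachedCopy(const std::string&) { return std::nullopt; }           // directory lookup
std::string readCachedCopy(NodeId, const std::string& url) { return "<cached " + url + ">"; } // read local copy
std::string fetchFromWeb(const std::string& url) { return "<fetched " + url + ">"; }          // WAN fetch
NodeId pickComputeOrDiskNode() { return 0; }                                                  // locality/load aware
std::string runDistillerOn(NodeId, const std::string& page) { return "distilled:" + page; }   // worker code

std::string handleProxyRequest(const std::string& url) {
    if (auto node = lookupCachedCopy(url)) {
        // Data is cached: invoke the distiller thread on the disk node holding the copy.
        return runDistillerOn(*node, readCachedCopy(*node, url));
    }
    // Not cached: fetch the page from the web, then distill on a compute or disk node.
    return runDistillerOn(pickComputeOrDiskNode(), fetchFromWeb(url));
}

int main() { return handleProxyRequest("http://example.com/").empty() ? 1 : 0; }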


Slide 15

Roadmap
• Layered software structure
• Example of introspection
• Runtime system extensibility using DSLs
• Conclusion


Slide 16

From introspection to adaptation
• Example: slowly-failing data disk in large DB system
  (1) Detect problem
  (2) Repair problem while continuing to handle incoming requests
  (3) Return to normal system operation

[Figure: intelligent HW components + continuous monitoring + extensible, application-tailored runtime system → adaptive, self-maintaining appliance]


Slide 17

Failing disk: detection
• Microkernel monitoring module, continuously monitoring the disk's health, detects an exceptional condition, e.g.
  – ECC failures
  – Media errors
  – Increased rates of ECC retries
• Notifies global fault handling mechanism
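
A minimal sketch of the detection step, assuming a simple threshold policy: the monitoring module compares sampled error counters against limits and, when one is exceeded, invokes the global fault-handling mechanism through a callback. The thresholds and names below are assumptions made for the example.

// Minimal sketch of threshold-based failure detection; thresholds are made up.
#include <cstdint>
#include <functional>

struct DiskHealth {
    uint64_t ecc_failures;
    uint64_t media_errors;
    double   ecc_retry_rate;    // retries per second over the last interval
};

struct Thresholds {
    uint64_t max_ecc_failures   = 1;
    uint64_t max_media_errors   = 1;
    double   max_ecc_retry_rate = 50.0;
};

// Called periodically by the microkernel monitoring module; invokes the global
// fault-handling mechanism when the disk looks like it is failing.
void checkDisk(const DiskHealth& h, const Thresholds& t,
               const std::function<void(const char*)>& notifyGlobalFaultHandler) {
    if (h.ecc_failures > t.max_ecc_failures)
        notifyGlobalFaultHandler("ECC failures");
    else if (h.media_errors > t.max_media_errors)
        notifyGlobalFaultHandler("media errors");
    else if (h.ecc_retry_rate > t.max_ecc_retry_rate)
        notifyGlobalFaultHandler("elevated ECC retry rate");
}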


Slide 18

Failing disk: reaction
• Global fault handling mechanism…
  – Prevents system from sending more work to failed device
    » Modifies global directory to remove entries corresponding to failed component's data
  – Application-specific response to impending failure
    » Transactional system: discard work currently in progress on failing device, reissue to another data replica
    » Non-transactional system w/o coherent replicas: checkpoint computation, restore on another data replica
    » Transformational web proxy: do nothing
  – Instruct disk runtime system to shut disk down
    » Disk device is considered failed


Slide 19

Failing disk: return to normal operation
• Global fault handling mechanism...
  – Rebuilds data redundancy
    » By allocating space for a new replica on a functioning disk and copying data to it from existing replicas
    » Using an application-specific data replication mechanism
      • Where to allocate new replicas, how to copy data, how to lay out data for new replicas, how to update global directory
      • Example in upcoming slide
• Life returns to normal
  – Degree of fault-tolerance has been restored
• Failed component can be replaced during regularly-scheduled maintenance


Slide 20

Roadmap
• Layered software structure
• Example of introspection
• Runtime system extensibility using DSLs
• Conclusion


Slide 21

Runtime system extensibility
• Two ways of looking at system
  – Partitioned on functional/mechanism boundaries
    » Collection of libraries: failure detection, transactions, ...
    » Mechanisms are isolated

[Figure: application code layered over isolated runtime libraries (libfail, librepl, libtrxn, libcache) on top of the OS.]


Slide 22

Runtime system extensibility
• Two ways of looking at system
  – Partitioned on functional/mechanism boundaries
    » Collection of libraries: failure detection, transactions, ...
    » Mechanisms are isolated
  – Partitioned on global system properties
    » This is how the programmer thinks about the system (high-level)
    » e.g. application-specific data availability policy
      • Failure detection (which devices to monitor, …)
      • Replication (used to restore redundancy)
      • Transactions (how to restart work in progress)
      • Caching (how to handle dirty cached objects during failure)

[Figure: the same library stack (application over libfail, librepl, libtrxn, libcache, over the OS).]


Slide 23

Runtime system extensibility
• Two ways of looking at system
  – Partitioned on functional/mechanism boundaries
    » Collection of libraries: failure detection, transactions, ...
    » Mechanisms are isolated
  – Partitioned on global system properties
    » This is how the programmer thinks about the system (high-level)
    » e.g. application-specific data availability policy
      • Failure detection (which devices to monitor, …)
      • Replication (used to restore redundancy)
      • Transactions (how to restart work in progress)
      • Caching (how to handle dirty cached objects during failure)

[Figure: the library stack as above, plus a policy specification fed through a compiler to produce a customized runtime system library.]


Slide 24

Extensibility using DSLs
• DSLs are languages specialized for a particular task
• Each ISTORE DSL
  – Encapsulates high-level semantics of one system behavior
  – Allows declarative specification of
    » Behavior of one aspect of the system (a "policy")
    » Interfaces to coordinated mechanisms that implement the policy
  – Is compiled into an implementation that might coordinate several local and/or global base runtime system mechanisms
    » May be implemented as background and/or foreground tasks
• Analysis tools can potentially infer unspecified emergent system behaviors from the specifications
  – e.g. what impact will a new redundancy policy have on transaction commit time
• Extensions compiled together with local and global base mechanisms form the distributed runtime system


Slide 25

Extensibility using DSLs: Example

Avail::FailureDetected(Device d) {
  Object o;        ObjList objs;
  Transaction t;   TxnList txns;
  Replica x, c, r;

  Directory::MarkDeviceDisabled(d);
  Admin::AlertFailure(d);
  objs = Directory::GetObjects(d);                        // objs stored on failed device
  foreach o (objs) {
    x = Directory::GetReplica(o, d);                      // find o's replica on d
    Directory::DeleteReplica(x);                          // delete from global directory
    txns = Txn::GetActiveTxns(x);
    foreach t (txns) {
      Txn::AbortTxn(t);                                   // abort pending txns for o on d
    }
    c = Directory::GetReplica(o);                         // find still-accessible copy
    r = LoadBalancer::AllocateReplica(o);                 // get space for new replica
    LocalRuntime::CopyObject(c, c->device, r, r->device); // copy it
    Directory::AddReplica(r, r->device);                  // update directory
    foreach t (txns) {
      Txn::IssueTxn(t, r);                                // reissue txns on new replica
    }
  }
}


Slide 26

Extensibility using DSLs (cont.)
• Similar specification written for each extension to base library
• Other examples of extensible system behaviors
  – Transaction response time requirements
  – Prioritizing operations based on type of data processed
  – Resource allocation
  – Backup policy
  – Exported administrative interface


Slide 27

Why use DSLs?
• Possible choices
  – Each appliance designer writes runtime system from scratch
    » Similar to exokernel operating systems
  – All designers use single parameterized runtime system library
    » Similar to tunable kernel parameters in modern OSs
  – Designer writes high-level specification of system behavior
    » DSL compiler automatically translates specification into runtime system extensions that coordinate base mechanisms
    » Advantages include
      • Programmability
      • Performance
      • Reliability, verifiability, safety
      • Artificial diversity


Slide 28

DSL advantages (cont.)
• Programmability
  – High-level specification close to designer's abstraction level
    » Easier to write, reason about, maintain, modify runtime system code
    » Simple enough to allow site-specific customization at installation time
• Performance
  – Aggressive DSL compiler can take advantage of high-level semantics of specification language
  – Base library mechanisms can be highly optimized; optimization complexity hidden from appliance designer
  – Web example: infer that TCP checksums should be stored with web pages


Slide 29

DSL advantages (cont.)
• Reliability
  – Automatically generate code that's easy to forget or get wrong
    » Example: synchronization operations to serialize accesses to distributed data structure
• Verifiability
  – Of input code (DSL specification)
    » More abstract form of semantic checking
    » e.g. DSL supports types natural to behavior being specified => type-checking verifies some semantic constraints
      • e.g. "ensure no unencrypted objects are written to disk"
  – Of output code (coordinated use of base mechanisms)
    » DSL compiler writer satisfied DSL compiler is correct => appliance designer inherits verification effort
• Safety (prevent runtime errors)
  – Whole classes of general programming errors not possible
    » DSLs hide details: runtime memory management, IPC, ...
    » Compiler automatically adds code: synchronization, ...


Slide 30

DSL advantages (cont.)
• Artificial diversity
  – Potentially allow system to continue operation in face of internal bugs or malicious attack
    » Multiple implementations of component run simultaneously on different data replicas
    » Continuously check each other with respect to high-level behavior
    » Non-traditional fault-tolerance, but related to process pairs
  – Potentially usable to enhance performance
    » Select best-performing implementation(s) for future use; periodically reevaluate choice
  – Examples of possible implementation differences
    » Low-level: runtime memory layout, code ordering and layout
    » High-level: system resource usage (recompute vs. use stored data, general space/time/bandwidth tradeoffs)

[Figure: a single specification is fed to the DSL compiler, which produces Implementation 1, Implementation 2, Implementation 3, ...]
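
One concrete (if simplistic) reading of the cross-checking idea: run each compiled implementation on the same request against its replica, compare the high-level results, and take the majority answer, treating a divergent implementation as suspect. The sketch below is only a majority-vote illustration of that reading, not how ISTORE specifies it.

// Bare sketch of cross-checking diverse implementations by majority vote.
#include <functional>
#include <map>
#include <string>
#include <vector>

using Impl = std::function<std::string(const std::string&)>;  // one compiled variant

// Run every implementation on (its replica of) the input and return the majority
// answer; a variant that disagrees would be reported as suspect in a real system.
std::string crossCheckedRun(const std::vector<Impl>& impls, const std::string& input) {
    std::map<std::string, int> votes;
    for (const Impl& impl : impls) ++votes[impl(input)];
    std::string best; int bestCount = -1;
    for (const auto& [result, count] : votes)
        if (count > bestCount) { best = result; bestCount = count; }
    return best;
}

int main() {
    std::vector<Impl> impls = {
        [](const std::string& s) { return s + "!"; },   // implementation 1
        [](const std::string& s) { return s + "!"; },   // implementation 2
        [](const std::string& s) { return s + "?"; },   // divergent implementation 3
    };
    return crossCheckedRun(impls, "result") == "result!" ? 0 : 1;
}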


Slide 31

ISTORE software summary
• ISTORE software architecture provides an extensible runtime environment for distributed network service application code
  – Layered local and global mechanism libraries provide introspection and self-maintenance
  – Mechanisms can be customized using DSL-based specifications of application policy
    » DSL code coordinates base mechanisms to implement application semantics and interfaces
    » DSL-based extension offers significant advantages in programmability, performance, reliability, safety, diversity


Slide 32

ISTORE summary
• Network services are increasing in importance
  – Self-maintaining scalable storage appliances match the needs of these services
• ISTORE provides a flexible architecture for implementing storage-based network service apps
  – Modular, intelligent, fault-tolerant hardware platform is easy to configure, scale, and administer
  – Runtime system allows applications to leverage intelligent hardware, achieve introspection, and provide self-maintenance through
    » Layered runtime software structure
    » DSL-based extensibility that allows easy application-specific customization


Slide 33

Agenda
• Overview of ISTORE: Motivation and Architecture
• Hardware Details and Prototype Plans
• Software Architecture
• Discussion and Feedback


Slide 34

Backup slides


Slide 35

What ISTORE is not
• An extensible operating system
  – Use commodity OS, only add hardware monitoring module
    » MM could just be a device driver => no need for microkernel OS
    » ISTORE could be built on top of an extensible operating system for even greater flexibility
• An attempt to make commodity OS's extensible
  – Extensible runtime system allows designer to customize higher-level operations than OS extensions do
  – Closest to an extensible distributed operating system built on top of a commodity single-node operating system
• A multiple-protection-domain system
  – Assumes non-malicious programmer
  – If user-downloaded code permitted, sandbox must be implemented as part of (trusted) application
  – DSLs specify resource allocation/scheduling policies, appliance designer responsible for ensuring fairness
• A framework for building generic servers


Slide 36

ISTORE boot process
(1) Initially, undifferentiated ISTORE system
(2) On boot, each device block contacts system boot server
(3) Device blocks download customized runtime system and application worker code
  – Front-end blocks also download application front-end code
• Runtime system libraries structured as shared libraries => hot upgrade
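
A rough sketch of the per-block boot sequence, assuming the downloaded runtime and worker code arrive as shared libraries loaded with dlopen (which is what makes a hot upgrade possible: load a new version and switch over). The boot-server call, file paths, and entry-point symbol names are invented for this sketch.

// Hypothetical boot sequence for one device block; paths and names are invented.
#include <dlfcn.h>
#include <cstdio>
#include <string>

// Stand-in for "contact the boot server and download code"; returns the local
// path of the downloaded shared library (stubbed for illustration).
std::string downloadFromBootServer(const std::string& component) {
    return "/tmp/" + component + ".so";
}

int main() {
    // (2) On boot, contact the boot server and (3) fetch the customized runtime
    //     system and application worker code as shared libraries.
    std::string runtime_so = downloadFromBootServer("istore_runtime");
    std::string worker_so  = downloadFromBootServer("app_worker");

    // Shared libraries => they can be reloaded later for a hot upgrade.
    void* runtime = dlopen(runtime_so.c_str(), RTLD_NOW);
    void* worker  = dlopen(worker_so.c_str(), RTLD_NOW);
    if (!runtime || !worker) {
        std::fprintf(stderr, "load failed: %s\n", dlerror());
        return 1;
    }

    // Look up and call the entry points (symbol names assumed for this sketch).
    using EntryFn = void (*)();
    if (auto start = (EntryFn)dlsym(runtime, "istore_runtime_start")) start();
    if (auto run   = (EntryFn)dlsym(worker,  "app_worker_run"))       run();
    return 0;
}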


Slide 37

Example Appliances
• E-commerce
• Web search engine
• Transformational web/PDA proxy
• Election server
• Mail server
• News server
• NFS server
• Database server: OLTP, DSS, mixed OLTP-DSS
• Video server