
Will Computer Systems With Performance Guarantees Ever Go Mainstream?

Keynote. HASE 2014.

Miami, FL, USA. Jan 10, 2014

Juan A. Colmenares, Ph.D.


Presented at the 15th IEEE International Symposium on High Assurance Systems Engineering (HASE 2014)

January 10, 2014 / Miami, Florida, USA

Juan A. Colmenares

Computer Science Laboratory (CSL)

Samsung Research America – Silicon Valley (SRA-SV)

Disclaimer

• No part of this presentation necessarily represents the views and opinions of my current and former employers or my research collaborators.






Introduction

• Performance guarantees are key in mission-critical, cyber-physical systems
  – Considered an ultra-specialized area
• Average performance
  – Today’s common figure of merit for software systems
• For 10+ years, sustained demand for high-quality multimedia applications
  – Multi-party video conferencing and video/audio on demand
  – Typical motivation for (probabilistic) performance guarantees
• Now, Internet-based service providers are starting to show interest in offering predictably responsive interactive services
  – To differentiate themselves from the competition
  – To retain existing customers/users and attract new ones

Introduction

• Developing distributed computing systems with performance guarantees is
  – Harder
  – More expensive
  – More time consuming
• We do it only when it is strictly necessary
• Naturally, we defer such hard problems until there is no other choice but to face them
  – Notable recent example: parallel computing






Questions

• Will current trends force us to develop massively used computer systems with some type of performance guarantees?

• And if so, are we prepared?

Mainstream Applications and Systems: Some Targets

• Currently popular or expected to become popular (used by millions)
• With clear demands for performance guarantees
• Developed and supported by multiple, large teams

http://www.thinkgig.com http://www.isisingenieria.com

Data Centers (Cloud Computing)

Networked Sensors and Actuators

[ Phrase borrowed from Prof. Jan Rabaey, UC Berkeley Swarm Lab ]






Cloud Computing: A Target for Performance Guarantees

• Run in data centers
  – Serving millions of people
• Examples of what to guarantee
  – Their contributions to the service response times experienced by users
  – Throughput for media content delivery

Web Search

www.adobe.com

thinkjudd.com

Media Content Delivery

Networked Sensors and Actuators: A Target for Performance Guarantees

• Some apps are basically control systems, wired and/or wireless
• Deployed in the environment
• Examples of what to guarantee
  – Response times (latencies) of critical actions

• Some apps with strict requirements

Autonomous Cars

From article by Tom Vanderbilt (Feb 2012): http://www.wired.com/magazine/2012/01/ff_autonomouscars/

Amazon Prime Air Rotorcraft

Source: Honda

Robotic Assistants

http://www.amazon.com






Swarm of Devices at the Edge of the Cloud

[ Prof. Jan Rabaey, ASPDAC’08 ]

Infrastructural core

The Cloud Mobile Access & Relay

The Swarm

Swarm of Devices at the Edge of the Cloud

http://www.wired.com

www.popsci.com.au

The Cloud






Smart Homes and Spaces: Swarm of Devices at the Edge of the Cloud

Samsung Smart Home (http://www.samsung.com)

Corning’s A Day Made of Glass (http://www.corning.com)

Smart Cities: Focus of the TerraSwarm Research Center [TS12]

• Meant to handle two cases
  – Normal operation and disasters
• Integrate
  – Fixed infrastructure
    • e.g., environmental monitoring, energy usage, tracking and mapping
  – Mobile assets (automatic vehicles, UAVs, robots)
  – Immersive humans

• Cloud as a companion, but data locality is key for latency

[TS12] Lee et al. The TerraSwarm Research Center (TSRC) (A White Paper). Tech Report No. UCB/EECS-2012-207






Smart Cities: Swarm of Devices at the Edge of the Cloud

• Some initiatives in industry and academia
  – IBM’s The Smarter City
  – Schneider Electric’s Smart Cities Solution
  – TerraSwarm Research Center @ UC Berkeley
  – Center for Urban Science + Progress (CUSP) @ NYU

http://www.schneider-electric.com

Clear Demands for Performance Guarantees May Not Be Enough

• GUIs should provide response-time guarantees to users
  – At least for some meaningful actions
• But I don’t expect major improvement here soon

Desktops
  – Popularity in decline, so no major interest in improving guarantees of desktop GUIs

Mobile Devices (e.g., smartphones and tablets)
  – Popular indeed, but battery life is a much more pressing issue

Wearable Devices (e.g., smart watches and glasses, such as Samsung Gear and Google Glass)
  – On the rise, but similar battery-life issue

GUI hardware acceleration is key in keeping response times low






What Performance Guarantees Are We Talking About?

• We often seek guarantees on:
  – Throughput (e.g., requests per second)
  – Response latency (e.g., service time)
• Other interesting performance metrics
  – Energy and power consumption (e.g., energy/power budget)
  – Time to recovery (e.g., guaranteed maximum recovery time)
• What type of guarantees?
  – Probabilistic with high confidence (mostly)
    • Often easier targets than hard guarantees
    • Leave more room for tradeoffs instead of largely overprovisioning for the infrequent worst cases

Our focus today

Any Performance Guarantees Offered by Public Clouds?

• None so far
  – At least by 3 major cloud providers

• Service Level Agreements (SLAs) are only about availability and accessibility

– e.g., monthly availability > 99.95%; otherwise, you get service credits

• Will competition have any effect?

https://cloud.google.com http://aws.amazon.com http://www.windowsazure.com
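To put the availability example in perspective, the short sketch below converts a monthly availability target into the downtime it permits. The 30-day month is an assumption made here for illustration; each provider defines its own measurement window in its SLA terms.

```python
def allowed_downtime_minutes(availability: float, days_in_month: int = 30) -> float:
    """Minutes of downtime permitted per month at the given availability (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    minutes_in_month = days_in_month * 24 * 60
    return (1.0 - availability) * minutes_in_month


if __name__ == "__main__":
    # 99.95% monthly availability over an assumed 30-day month
    print(f"{allowed_downtime_minutes(0.9995):.1f} minutes")
```

Note that such a target says nothing about response time or throughput while the service is up, which is the point of the slide.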






I/O Bandwidth Provisioning in Amazon Elastic Block Store (EBS)

Signs of Improvement in Public Clouds

• Amazon EBS offers volumes
  – Durable, block-level storage devices
• For a virtual-machine instance, an EBS volume appears as a native block device similar to a hard drive
• Provisioned IOPS volumes
  – Offer consistent performance for I/O-intensive workloads (e.g., databases) in Amazon EC2
  – Designed to deliver within 10% of the specified IOPS rate 99.9% of the time
    • But this is NOT part of any SLA!
  – IOPS rate up to 4000 IOPS per volume
  – Volume sizes from 10 GB to 1 TB
• Possibly inspired by
  – Gulati et al. mClock: Handling throughput variability for hypervisor IO scheduling. OSDI 2010.

Source: http://aws.amazon.com/ebs/piops/

SolidFire’s All-Flash Storage Infrastructure with QoS


• At full scale (100 nodes) able to deliver
  – 3.4 PB of effective capacity
  – 7.5 million IOPS
• Reduced cost
  – Below $3/GB and below $1/IOPS (60 TB to 3.4 PB)
    • Below the cost of traditional performance disk solutions
• Able to guarantee performance to thousands of volumes within a shared storage system
  – Pending patent on QoS capabilities

http://www.solidfire.com

July 2013






Allocated Bandwidth for Streaming Servers in Windows Azure


• Media Services enable creation, management, and distribution of media

– e.g., encoding and on-demand streaming

• Reserved Units (RUs)
  – Dedicated set of resources for media processing tasks
  – Highly recommended for on-demand streaming

• Actually, availability SLA only valid with RUs

• Each RU provides bandwidth up to 200 Mbps for streaming origin servers

  – Bandwidth allocation NOT part of any SLA
  – Availability SLA only applies when using <= 80% of available bandwidth

Source: http://www.windowsazure.com/en-us/support/legal/sla/

Research Efforts: Clear Interest in Improving

• Barker and Shenoy (UMass). Empirical evaluation of latency-sensitive application performance in the cloud. MMSys 2010.
  – Focus on interference of dynamically varying background load on latency-sensitive tasks
  – Careful configurations mitigate, but do not eliminate, interference
• Dean and Barroso (Google). The tail at scale. CACM 2013.
  – Latency tail-tolerant software techniques to build predictable systems out of less predictable parts
• Ferguson et al. (MSR). Jockey: Guaranteed job latency in data parallel clusters. EuroSys 2012.
  – Latency guarantees for parallel data processing jobs using a resource allocation control loop
• Terry et al. (MSR). Consistency-based service level agreements for cloud storage. SOSP 2013.
  – A replicated key-value store that allows applications to declare their consistency and latency priorities via consistency-based SLAs






So Far …

• In cloud computing
  – No public offerings with high-confidence performance guarantees available
    • As far as we can tell
  – How about private offerings?
• In some apps with networked sensors and actuators
  – Clear requirements for performance guarantees due to safety
• Future swarm applications
  – A number will require performance guarantees
  – Also need to interact with the Cloud

Design Principles and Techniques to Build Software Systems with High-Confidence Performance Guarantees

• Already available for system developers to adopt
• Some challenges
  – Performance guarantees considered less important than other requirements
    • Not perceived as a differentiating factor
  – Additional complexity
  – End-to-end properties
    • Multi-layered factors
    • A piece-by-piece game with distributed responsibility
  – Input dependent
  – Cost effectiveness
    • Especially considering the investment in existing systems
  – Trained workforce






Design Principles and Techniques to Build Software Systems with High-Confidence Performance Guarantees

• Next we will discuss
  – Divide-and-conquer design principle
  – Limiting system load
  – Mitigating performance variability

Divide and Conquer

• Systems should be built to enable systematic evaluation (via analysis and/or measurements) of:

– The individual contributions of factors that make up the system’s performance

– The effects of the combination of those factors on the system’s performance

• This design principle is key
  – Systems include multiple components or sub-systems
  – Performance guarantees of interest are usually end-to-end
  – Multiple factors influence the system’s performance
    • e.g., architectural features, algorithmic efficiency, task scheduling, memory management, I/O behavior, thread affinity, cache locality, etc.
• But too many factors can influence performance
  – To make it practical, we should consider the most important ones for the system at hand






Performance Decoupling of System Components: Enabling Divide and Conquer

• Extension of software componentization to performance aspects

– Software components are used to divide the system’s logic in parts of manageable complexity

• The idea is to evaluate the contributions of individual components to the system’s performance

• KV-Cache[UCC13]

  – Hash table coupled with a replacement logic
  – Exploits a software absolute zero-copy approach and aggressive customization to offer high performance
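The pairing of a hash table with a separate replacement logic can be illustrated with a small sketch. The class below is a single-threaded, textbook CLOCK (second-chance) cache written for illustration only; it is not KV-Cache's non-blocking, queue-based implementation, and the class and method names are hypothetical.

```python
class ClockCache:
    """Fixed-capacity key-value cache with CLOCK (second-chance) replacement."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.index = {}   # hash table: key -> slot number
        self.slots = []   # replacement state: [key, value, referenced_bit]
        self.hand = 0     # clock hand: next slot to inspect for eviction

    def get(self, key):
        slot = self.index.get(key)
        if slot is None:
            return None
        self.slots[slot][2] = True  # mark as recently used
        return self.slots[slot][1]

    def put(self, key, value) -> None:
        slot = self.index.get(key)
        if slot is not None:  # existing key: update in place
            self.slots[slot][1] = value
            self.slots[slot][2] = True
            return
        if len(self.slots) < self.capacity:  # a free slot is still available
            self.index[key] = len(self.slots)
            self.slots.append([key, value, False])
            return
        # Sweep: clear reference bits until an unreferenced victim is found.
        while self.slots[self.hand][2]:
            self.slots[self.hand][2] = False
            self.hand = (self.hand + 1) % self.capacity
        del self.index[self.slots[self.hand][0]]
        self.slots[self.hand] = [key, value, False]
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity
```

Keeping the lookup structure (`index`) apart from the replacement state (`slots`, `hand`) is what lets each part be measured and tuned on its own, which is the decoupling idea the slide describes.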

An In-Memory Key-Value Cache: Example of Performance Decoupling and Customization

[Figure: KV-Cache architecture. Labels: Application Layer; Hash Table (with Fine Grain Locks); Non-Blocking Queue-based CLOCK (Replacement Logic), shown twice; Comm & Mem Mgmt Layer (10G NIC Driver + UDP + Mem Pools); Non-Blocking Channels (> Decoupling <)]

[UCC13] Waddington, Colmenares, Kuang and Song. KV-Cache: A scalable high-performance web-object caching for manycore. 6th IEEE/ACM Int’l Conference on Utility and Cloud Computing.

Implemented on Genode/Fiasco.OC µkernel






In-Memory Web-Object Caching: Overview

• Widely used by Internet-based service providers to reduce latency and increase system throughput

– Memcached: a popular example

• www.memcached.org

Typical Side-Cache Deployment

A De Facto Figure of Merit: Capacity [IGCC11, BagLRU]

Maximum throughput (in RPS) the system can sustain with an average round-trip time (RTT) below 1 ms

[BagLRU] Wiggins and Langston. Enhancing the scalability of memcached. Intel Tech. Rep. 2012

(http://software.intel.com/en-us/articles/enhancing-the-scalability-of-memcached-0)

[IGCC11] Berezecki et al. Manycore key-value store. Proc. of the 2011 Int’l Green Computing Conference. 2011.

Experimental Results: KV-Cache vs. Intel’s Bag-LRU Memcached [BagLRU]

[Figure: Latency comparison for one million GET requests at 600K RPS (a slow rate)]

[Figure: Throughput comparison with average round-trip time < 1 ms; annotated "2x"]

[BagLRU] Wiggins and Langston. Enhancing the scalability of memcached. Intel Tech. Rep. 2012






An In-Memory Key-Value Cache: Example of Performance Decoupling and Customization

• We could have a stricter figure of merit (stronger guarantees)

– Maximum throughput (in RPS) the system can sustain with

• A target RTT of 1 ms observed on average, and

• No more than 0.1% of late responses, arriving after the target RTT
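As a rough illustration of the stricter figure of merit, the sketch below checks a batch of round-trip-time samples against both conditions. It reads "observed on average" as "mean RTT at or below the target", which is an interpretation; the function and parameter names are hypothetical.

```python
from statistics import fmean


def meets_figure_of_merit(rtts_ms, target_ms=1.0, max_late_fraction=0.001):
    """True if the mean RTT is within target and few enough responses are late.

    A response counts as late when its RTT exceeds target_ms.
    """
    if not rtts_ms:
        raise ValueError("need at least one RTT sample")
    late = sum(1 for rtt in rtts_ms if rtt > target_ms)
    mean_ok = fmean(rtts_ms) <= target_ms
    tail_ok = late / len(rtts_ms) <= max_late_fraction
    return mean_ok and tail_ok
```

The capacity under this figure of merit would then be the highest request rate for which the check still passes.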

[Figure: Round-trip time distribution at 3 million RPS for a single NIC (3 million GET requests)]

KV-Cache[UCC13] never exceeded the target round-trip time of 1 ms!

[UCC13] Waddington, Colmenares, Kuang and Song. KV-Cache: A scalable high-performance web-object caching for manycore. 6th IEEE/ACM Int’l Conference on Utility and Cloud Computing.

Space-Time Partitioning: Enabling Divide and Conquer

[Figure: partitions laid out over Space and Time axes; the yellow partition grows due to adaptation]

Spatial Partition: Key for performance isolation
• Hard boundaries and controlled communication between partitions

Spatial partitioning is not static and may vary over time
• Partitions can be time multiplexed; resources are gang-scheduled
• Partitioning adapts to system’s needs

• Each partition receives a vector of basic resources
  – A number of hardware threads, memory pages, a portion of cache segments, memory bandwidth, and energy budget
• A partition may also receive
  – Exclusive access to other resources (e.g., a device)
  – Guaranteed fractional services from other partitions
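One way to picture the "vector of basic resources" is as a small record with one field per resource, plus a check of whether a set of partitions can be mapped to the hardware at the same time. This is only a sketch: the field names and units are assumptions for illustration and do not reflect any actual kernel interface.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ResourceVector:
    # Hypothetical fields and units, chosen to mirror the slide's list.
    hw_threads: int = 0
    memory_pages: int = 0
    cache_segments: int = 0
    mem_bandwidth_mbps: float = 0.0
    energy_budget_watts: float = 0.0

    def __add__(self, other):
        return ResourceVector(
            *(getattr(self, f.name) + getattr(other, f.name) for f in fields(self))
        )

    def fits_in(self, total) -> bool:
        return all(getattr(self, f.name) <= getattr(total, f.name) for f in fields(self))


def can_coexist(partitions, machine) -> bool:
    """True if all partitions can hold their resources simultaneously."""
    demand = ResourceVector()
    for partition in partitions:
        demand = demand + partition
    return demand.fits_in(machine)
```

When the check fails, the partitions cannot all be resident at once and some of them would have to be time multiplexed.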







Controlled multiplexing is key

The Cell: Our Partitioning Abstraction

User-level Software Container with Guaranteed Access to Resources

[Figure: Cell A and Cell B over Space and Time axes. Labels: Task, Address Space A, Address Space B, 2nd-level Scheduling, 2nd-level Mem Mgmt]

• Basic properties of cells
  – Full control over the resources it owns when mapped to hardware
  – One or more address spaces (protection domains)
  – Efficient inter-cell communication channels


2nd-level runtime must be adaptive, too






Basis of a Component-based Model with Composable Performance

• Applications = Set of interacting components deployed on different cells
  – Applications split into performance-incompatible and mutually distrusting cells with controlled communication
  – OS Services are independent servers that provide QoS
• Requires fast inter-cell communication
  – Could use hardware acceleration for fast messaging

[Figure: cells connected by channels. Labels: Application Component, Device Drivers, File Service, Real-time Cell, Core Application, Parallel Library, Channel, Storage Device]

Customizable User-Level Runtimes

PULSE: A framework for Preemptive User-Level SchEdulers

• Available preemptive schedulers
  – Round-robin (and pthreads)
  – EDF and Fixed Priority
  – Multiprocessor Constant Bandwidth Server (M-CBS) [ECRTS’04]
  – Juggle: A load balancer for SPMD applications [CLUSTER’12]
• Able to handle cell resizing

[Figure: an Application Cell containing the PULSE Framework and Scheduler X, running on the Tessellation Kernel (Partition Support). Labels: Hardware cores, Timer interrupts]

[ECRTS’04] S. Baruah et al. Executing aperiodic jobs in a multiprocessor constant-bandwidth server implementation. ECRTS'04.

[CLUSTER’12] S. Hofmeyr, J. Colmenares et al. Juggle: Addressing extrinsic load imbalances in SPMD applications on multicore computers. Cluster Computing Journal.






Network Service: An OS Service with QoS Guarantees [DAC’13, JAES’13]

• Supports reservations (i.e., differentiated service classes) and proportional share of bandwidth
  – Using mClock scheduling algorithm [OSDI’10] (on top of PULSE)
• NIC driver is entirely contained in user-space
  – No system calls when transmitting and receiving buffers

[Figure annotation: Avg. throughput = 125.2 KB/s]

[DAC’13] Colmenares, et al. Tessellation: Refactoring the OS around explicit resource containers with continuous adaptation.

[JAES’13] Colmenares, et al. A multi-core operating system with QoS-guarantees for network audio applications.

[OSDI’10] A. Gulati et al. mClock: handling throughput variability for hypervisor IO scheduling.

A Divide and Conquer Approach to Deriving Time Bounds

• Individual functions running in isolation
  – Analytically derived execution-time bounds of functions + execution-time measurements of functions
  – Combined via a hybrid approach (H) into tight execution-time bounds of functions
• Concurrent function executions, resource sharing, and communication activities
  – Analytically derived service-time bounds of functions + service-time measurements of functions
  – Combined via a hybrid approach (H) into tight service-time bounds of functions

[Figure: two plots over time (vertical axis 0.2 to 1.0), each marking the Max. Observed Value (optimistic), the Adopted Bound, and the Analytically Derived Bound (pessimistic)]

[IESS09] Colmenares et al. Experimental evaluation of a hybrid approach for deriving service-time bounds of methods in real-time distributed computing objects. Proc. Int'l Embedded Systems Symposium 2009.






Basic Approaches for Deriving Time Bounds

Static Analysis Approaches
• Hard bound with a practically zero probability of being violated at run time
• Tend to produce excessively loose bounds when applied to modern fully-featured processors

Measurement-based Approaches
• Maximum measured execution-time value + Safety Margin
• Soft bound with a non-negligible probability of being exceeded at run time
• May not cover the worst case


We want a tight time bound in between.

But how to determine the safety margin?






Curve Fitting Technique: Central to the Hybrid Approach

• Combines (1) measurements and (2) loose but analytically-derived hard bounds to produce reasonably safe and tight time bounds

[Figure: fitted curve relating the margin value to α, the probability of the soft bound being exceeded at run time]
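The idea of picking a bound from a fitted curve can be sketched as follows. The code assumes the fitted model is a generalized logistic (Richards) curve with asymptotes 0 and 1; the exact parameterization used in [IESS09] is not given in the slides, so the parameter names (b, m, q, nu) are hypothetical. The fitting step itself is omitted, and capping the result at the analytical bound is likewise an assumption.

```python
import math


def richards_cdf(t, b, m, q=1.0, nu=1.0):
    """Generalized logistic curve: F(t) = (1 + q * exp(-b * (t - m))) ** (-1 / nu)."""
    return (1.0 + q * math.exp(-b * (t - m))) ** (-1.0 / nu)


def soft_bound(alpha, b, m, q=1.0, nu=1.0):
    """Time t at which the fitted curve predicts P(T > t) = alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if b <= 0 or q <= 0 or nu <= 0:
        raise ValueError("b, q and nu must be positive")
    p = 1.0 - alpha
    # Invert F(t) = p:  exp(-b * (t - m)) = (p ** -nu - 1) / q
    return m - math.log((p ** -nu - 1.0) / q) / b


def adopted_bound(alpha, analytical_bound, b, m, q=1.0, nu=1.0):
    """Soft bound from the curve, never looser than the analytical hard bound."""
    return min(soft_bound(alpha, b, m, q, nu), analytical_bound)
```

A smaller α pushes the soft bound further out, so α acts as the knob that trades tightness for safety.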

A Televideo Application

[Figure: Node 1 and Node 2, each running a TVTMO on top of TMOSM and an OS/HW Platform, connected through the Network. The Local User and Remote User exchange Video Streams shown in Display Windows, along with Performance Metric Reports (feedback)]

Network Performance Metrics

• Throughput (at the application level)

• Message loss rate

• End-to-end delay






A Televideo Application

Frame size: 320 x 240

Frame rate: 10 fps

Color depth: 24 bits

CODEC: MPEG-4 (FFmpeg implementation)

Obtaining a Tight Service Bound for a Function via the Hybrid Approach

[Figure: estimated probability (0 to 1.2) vs. time (0 to 60 ms), showing the CDF and the fitted Richards model. Analytically derived bound: 54 ms. Adopted bound: 30 ms]






Limiting System Load

• We can only guarantee performance under certain load limits and conditions (i.e., input)

[Figure: example plot, labeled "Avg." and "100% GET requests"]

On-Line Admission Control: Limiting System Load

• Hey system! Can you guarantee performance X for this job?
• Some possible answers
  – Sure, no problema! [The rare happy case]
  – Yes, but let me put the house in order
    • Possible downgrading and revocation
  – Nope. I am sorry. Bye.
  – Nope, but let’s negotiate a little bit
    • Not with performance X, but with performance Y. Is this OK with you?
• Typical issues
  – Computational cost
  – Reduction in effective system utilization due to pessimism in the analysis
• Some efforts to deal with those issues
  – Nie et al. Capacity-based admission control for mixed periodic and aperiodic real time service processes. SOCA 2011.
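The accept, negotiate, and reject answers above can be sketched with a deliberately simple capacity test. This is an illustration under stated assumptions, not the method of Nie et al.: load is modeled as a single requests-per-second figure, all names are hypothetical, and the "put the house in order" case (downgrading or revoking existing jobs) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class AdmissionController:
    capacity_rps: float  # load the system can serve while keeping its guarantees
    reserved: dict = field(default_factory=dict)  # job id -> reserved requests/second

    def available(self) -> float:
        return self.capacity_rps - sum(self.reserved.values())

    def request(self, job_id: str, wanted_rps: float, minimum_rps: float = 0.0):
        """Return ("accept", rate), ("counter_offer", rate), or ("reject", 0.0)."""
        free = self.available()
        if wanted_rps <= free:
            self.reserved[job_id] = wanted_rps  # the happy case
            return ("accept", wanted_rps)
        if free > 0 and free >= minimum_rps:
            # Not X, but Y: nothing is reserved until the job asks again.
            return ("counter_offer", free)
        return ("reject", 0.0)

    def release(self, job_id: str) -> None:
        self.reserved.pop(job_id, None)
```

A single-number capacity test is cheap to evaluate but pessimistic, which matches the two typical issues listed on the slide.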






Load Regulation and Shaping: Limiting System Load

• Limit request rate or progress rate
  – Maximum number of requests in a given time interval, or maximum inter-arrival rate (MIR)
• Leaky bucket
  – Classic textbook example of traffic shaping
• Handling excess of work
  – Queue requests, and drop if too many
  – Trade off content quality
    • Good-enough in-time content can be better than late content
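For reference, a minimal leaky-bucket sketch is shown below. It uses the bucket as a meter: each admitted request adds one unit, the bucket drains at a fixed rate, and a request that would overflow it is refused. Timestamps are passed in explicitly to keep the example deterministic; the class and parameter names are hypothetical.

```python
class LeakyBucket:
    """Leaky bucket used as a meter: admits a request only if the bucket has room."""

    def __init__(self, leak_rate_per_s: float, capacity: float) -> None:
        if leak_rate_per_s <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.leak_rate = leak_rate_per_s
        self.capacity = capacity
        self.level = 0.0
        self.last_time = None

    def allow(self, now_s: float) -> bool:
        """Return True if a request arriving at time now_s (seconds) is admitted."""
        if self.last_time is not None:
            elapsed = max(0.0, now_s - self.last_time)
            self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last_time = now_s
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0
            return True
        return False  # excess request: the caller may queue it or drop it
```

The capacity sets how large a burst is tolerated, while the leak rate sets the sustained request rate.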

Mitigating Performance Variability

• Computer systems (architectures, networking, and software) are often built favoring average performance over performance predictability

– e.g., multi-level caches and deep pipelines with dynamic dispatch and speculative execution

• Often in practice, building the system from scratch to remove/reduce unpredictability is not economically feasible

– So, to learn to live with it, we need. [Yoda!]

• Common technique: Overprovisioning

However, some are trying to reintroduce timing predictability and repeatability from the ground up for safety-critical systems
• Precision Timed (PRET) Machines @ UC Berkeley [http://chess.eecs.berkeley.edu/pret/]
• Time-Predictable Multi-Core Architecture for Embedded Systems (T-CREST) -- An EU Research Project [http://www.t-crest.org/]






Mitigating Latency Variability: In Data Centers [CACM13]

• Issue the same request to multiple replicas and use the first response you get (hedged requests)
  – Copies of the same request are sent with a short delay between them
  – The client cancels outstanding requests once it gets the response
• Requests are sent to multiple servers, and the servers exchange cross-server status updates (tied requests)
  – e.g., a server sends cancellations to the others once it starts servicing the request
• Can reduce latency with modest load increase
  – If causes of variability do not simultaneously affect the replicas

[CACM13] Dean and Barroso (Google). The tail at scale. Communications of the ACM. 2013.
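A hedged request can be sketched in a few lines with Python's asyncio. This is an illustrative client-side sketch, not code from [CACM13]: each replica is assumed to be an async callable taking the request, and the function name and the default delay are hypothetical.

```python
import asyncio


async def hedged_request(replicas, request, hedge_delay_s=0.01):
    """Send `request` to replicas one at a time, hedge_delay_s apart.

    Returns the first reply to arrive and cancels the copies still in flight.
    """
    if not replicas:
        raise ValueError("need at least one replica")
    pending = set()
    try:
        for send in replicas:
            pending.add(asyncio.create_task(send(request)))
            # Give the copies sent so far a short head start.
            done, pending = await asyncio.wait(
                pending, timeout=hedge_delay_s, return_when=asyncio.FIRST_COMPLETED
            )
            if done:
                return done.pop().result()
        # Every replica has a copy; take whichever finishes first.
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        for task in pending:  # cancel outstanding copies
            task.cancel()
```

Because the second copy is only sent after the delay expires, the extra load stays small when the first replica answers quickly. A failure of the first finished copy is simply propagated here; a fuller version would fall back to the remaining copies.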

Mitigating Latency Variability: In Data Centers [CACM13]

• Latency-induced probation
  – In some situations the system performs better by excluding a particularly slow machine and putting it on probation
    • Slowness is often caused by temporary phenomena
• Interesting point
  – Removal of serving capacity from a live system during periods of high load actually improves latency

[CACM13] Dean and Barroso (Google). The tail at scale. Communications of the ACM. 2013.






Adaptive Resource Allocation: A Complementary Technique

• Systems need to adapt to changes in the workloads (application and request mixes) and resource availability
• A number of efforts in this area:
  – Yang et al. Redline: First class support for interactivity in commodity operating systems. OSDI 2008.
  – Padala et al. Automated control of multiple virtualized resources. EuroSys 2009.
  – Hoffmann et al. SEEC: A general and extensible framework for self-aware computing. Technical Report MIT-CSAIL-TR-2011-046.
  – Sharifi et al. METE: Meeting end-to-end QoS in multicores through system-wide resource management. SIGMETRICS Perform. Eval. Rev., 39(1):13–24, June 2011.

Example Adaptive Control Loop

[Figure: Running System (Data Plane): cells hosting Application1, Application2, Block Service, Network Service, and GUI Service, with QoS-aware Schedulers and Channels between cells. Resource Allocation (Control Plane): Observation and Modeling receives Performance Reports; Partitioning and Distribution issues Resource Assignments]

[DAC13] Colmenares et al. Tessellation: refactoring the OS around explicit resource containers with continuous adaptation. DAC 2013.
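The loop from performance reports to resource assignments can be illustrated with a single proportional control step. This is a toy sketch and not Tessellation's allocation policy; the function name, the gain, and the use of a CPU share as the controlled resource are all assumptions.

```python
def next_cpu_share(current_share, observed_latency_ms, target_latency_ms,
                   gain=0.5, min_share=0.05, max_share=1.0):
    """One control step: grow the share when latency is above target, shrink it when below."""
    if target_latency_ms <= 0:
        raise ValueError("target latency must be positive")
    # Relative error from the latest performance report.
    error = (observed_latency_ms - target_latency_ms) / target_latency_ms
    proposed = current_share * (1.0 + gain * error)
    # Clamp to what the control plane is allowed to assign.
    return min(max_share, max(min_share, proposed))
```

Calling this once per reporting period for each cell gives the basic observe, decide, and assign cycle; a real allocator would also have to arbitrate between cells competing for the same resources.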






Other Complementary Techniques

• Workload characterization
• Load balancing
• Differentiating service classes
• Managing background activities and synchronized disruption
• Software customization
• High-precision global time
  – e.g., Precision Time Protocol (PTP) -- IEEE 1588

Conclusions

• Current trends indicate that distributed software systems with performance guarantees are likely to
  – Become very popular
  – Demand a large number of software developers
• When!?
• Obstacles
  – Other requirements perceived as more urgent
    • Power and energy efficiency
    • Security and privacy
    • High availability
  – Legal hurdles for motivating apps (e.g., autonomous cars)
• Design principles and techniques are available
  – But need to be adapted to the system at hand
• Major challenges
  – Cost effectiveness
  – Trained workforce






THANKS

Questions?
