Bandwidth and latency optimizations
Jinyang Li, w/ Speculator slides from Ed Nightingale


Page 1: Bandwidth and latency optimizations

Bandwidth and latency optimizations

Jinyang Li

w/ speculator slides from Ed Nightingale

Page 2: Bandwidth and latency optimizations

What we’ve learnt so far

• Programming tools• Consistency• Fault tolerance• Security• Today: performance boosting techniques

– Caching– Leases– Group commit– Compression– Speculative execution

Page 3: Bandwidth and latency optimizations

Performance metrics

• Throughput
– Measures the achievable rate of operations (ops/sec)
– Limited by the bottleneck resource
• e.g. a 10Mbps link supports at most ~150 ops/sec of 8KB-block writes
– Increase throughput by using less of the bottleneck resource

• Latency
– Measures the time to complete a single client operation
– Reduce latency by pipelining multiple operations
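The 10Mbps example works out as a quick back-of-the-envelope check (assuming 8KB = 8×1024 bytes):

```python
# Throughput bound from the slide's example: a 10 Mbps link
# moving 8 KB blocks can carry at most ~150 writes per second.
link_bps = 10_000_000           # 10 Mbps
block_bits = 8 * 1024 * 8       # one 8 KB write, in bits
max_ops = link_bps / block_bits
print(round(max_ops))           # ≈153 ops/sec, the slide's "~150"
```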

Page 4: Bandwidth and latency optimizations

Caching (in NFS)

• NFS clients cache file content and directory-name mappings
• Caching saves network bandwidth, improves latency

[Diagram: the first READ fetches data from the server; the client then checks freshness with GETATTR and serves later READs from its cache]

Page 5: Bandwidth and latency optimizations

Leases (not in NFS)

• Leases eliminate the latency of freshness checks, at the cost of keeping extra state at the server

[Diagram: client C1 sends READ fh1; the server replies LEASE fh1 with the data and records fh1: C1; C1 serves later READs locally; when another client WRITEs fh1, the server sends INVAL fh1 to C1 before replying OK]

Page 6: Bandwidth and latency optimizations

Group commit (in NFS)

• Group commit reduces the latency of a sequence of writes

[Diagram: the client issues several WRITEs back-to-back and a single COMMIT at the end]

Page 7: Bandwidth and latency optimizations

Two cool tricks

• Further bandwidth and latency optimization is necessary in the wide area
– Wide-area network challenges:
• Low bandwidth (10~100Mbps)
• High latency (10~100ms)
• Promising solutions:
– Compression (LBFS)
– Speculative execution (Speculator)

Page 8: Bandwidth and latency optimizations

Low Bandwidth File System

• Goal: avoid redundant data transfer between clients and the server

• Why isn’t caching enough?
– A file with duplicate content → duplicate cache blocks
– Two files that share content → duplicate cache blocks
– A file that’s modified → previous cache is useless

Page 9: Bandwidth and latency optimizations

LBFS insights: name by content hash

• Traditional cache naming: (fh#, offset)
• LBFS naming: SHA-1(cached block)
– Same contents have the same name
– Two identical files share cached blocks
– Cached blocks keep the same names despite file changes
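Content-hash naming takes only a few lines to sketch (`block_name` is an illustrative helper, not an LBFS API):

```python
import hashlib

# Name a cached block by the SHA-1 of its contents, as in LBFS:
# identical contents get identical names regardless of which file
# or offset they came from, so the cache entry is shared.
def block_name(block: bytes) -> str:
    return hashlib.sha1(block).hexdigest()

a = b"hello world" * 100     # same content appearing in two files
b = b"hello world" * 100
assert block_name(a) == block_name(b)          # one cache entry
assert block_name(a) != block_name(b"other")   # distinct content
```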

Page 10: Bandwidth and latency optimizations

Naming granularity

• Name each file by its SHA-1 hash
– It’s rare for two files to be exactly identical
– No cache reuse across file modifications
• Cut a file into 8KB blocks, name each [x*8K, (x+1)*8K) range by its hash
– If block boundaries misalign, two almost-identical files could share no common block
– If block boundaries misalign, a new file could share no common block with its old version

[Diagram: a file split into fixed-size blocks, each named SHA-1(8K block)]

Page 11: Bandwidth and latency optimizations

Align boundaries across different files

• Idea: determine boundaries based on the actual content
– If two boundaries have the same 48-byte content, they probably correspond to the same position in a contiguous region of identical content

Page 12: Bandwidth and latency optimizations

Align boundaries across different files

[Diagram: two files with shifted content; content-determined boundaries produce chunks with matching hashes (ab9f..0a, 87e6b..f5) in both files]

Page 13: Bandwidth and latency optimizations

LBFS content-based chunking

• Examine every sliding window of 48 bytes
• Compute a 2-byte Rabin fingerprint f of the 48-byte window
• If the lower 13 bits of f equal a chosen value v, the window position is a breakpoint
• 2 consecutive breakpoints define a “chunk”
• Average chunk size? On random data a breakpoint occurs with probability 2^-13 per position, so the expected chunk size is 2^13 bytes = 8KB
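A minimal content-based chunker, sketched in Python. Two stand-ins are assumed for brevity: the Rabin fingerprint is replaced by a radix-256 hash mod an arbitrary prime Q, and the breakpoint test uses the low 8 bits (~256-byte average chunks) instead of LBFS’s 13 bits (~8KB) so the demo stays small; the breakpoint rule itself is the one on the slide.

```python
import random

WINDOW = 48            # sliding-window size, as in LBFS
MASK = (1 << 8) - 1    # low 8 bits here (LBFS uses 13 -> ~8KB chunks)
TARGET = 0             # breakpoint value v (arbitrary choice)
Q = 1_000_003          # arbitrary prime standing in for Rabin modulus

def fingerprint(window: bytes) -> int:
    # Radix-256 hash of the window mod Q (non-rolling sketch).
    f = 0
    for b in window:
        f = (f * 256 + b) % Q
    return f

def chunks(data: bytes) -> list:
    out, start = [], 0
    for i in range(WINDOW, len(data) + 1):
        if fingerprint(data[i - WINDOW:i]) & MASK == TARGET:
            out.append(data[start:i])   # breakpoint ends a chunk
            start = i
    out.append(data[start:])            # final partial chunk
    return out

random.seed(0)
data = bytes(random.randrange(256) for _ in range(16 * 1024))
c1 = chunks(data)
c2 = chunks(b"some inserted prefix" + data)
# Breakpoints depend only on local content, so the two chunkings
# realign after the first shared breakpoint despite the prefix:
print(len(set(c1) & set(c2)), "chunks shared")
```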

Page 14: Bandwidth and latency optimizations

LBFS chunking

• Two files with the same but misaligned content of x bytes
• How many fingerprints for each x-byte content? How many breakpoints? Are breakpoints aligned?

[Diagram: the shared content yields the same fingerprints f1, f2, f3, f4 in both files, so breakpoints fall at the same content positions]

Page 15: Bandwidth and latency optimizations

Why Rabin fingerprints?

• Why not use the lower 13 bits of every 2-byte sliding window for breakpoints?
– Data is not random, resulting in extremely variable chunk sizes

• A Rabin fingerprint computes a (pseudo)random 2-byte value out of 48 bytes of data

Page 16: Bandwidth and latency optimizations

Rabin fingerprint is fast

• Treat 48-byte data D as a 48-digit radix-256 number

• f47 = fingerprint of D[0..47]
     = ( D[47] + 256·D[46] + … + 256^46·D[1] + 256^47·D[0] ) % q

• f48 = fingerprint of D[1..48]
     = ( (f47 − D[0]·256^47) · 256 + D[48] ) % q

A new fingerprint is computed from the old fingerprint and the newly shifted-in byte
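The O(1) update can be sanity-checked against the from-scratch computation; a small Python sketch (q here is an arbitrary prime standing in for the Rabin polynomial modulus):

```python
Q = 1_000_003   # arbitrary prime, stand-in for the Rabin modulus
W = 48          # window size

def direct(window: bytes) -> int:
    # O(W) reference: the window as a radix-256 number mod Q.
    f = 0
    for b in window:
        f = (f * 256 + b) % Q
    return f

def roll(f_prev: int, out_byte: int, in_byte: int) -> int:
    # O(1) update from the slide: drop the outgoing byte's 256^47
    # term, shift by one radix position, add the incoming byte.
    top = pow(256, W - 1, Q)    # 256^47 mod q, precomputable
    return ((f_prev - out_byte * top) * 256 + in_byte) % Q

data = bytes(range(100))
f = direct(data[0:W])
for i in range(1, 10):          # slide the window nine times
    f = roll(f, data[i - 1], data[i + W - 1])
    assert f == direct(data[i:i + W])   # matches from-scratch result
```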

Page 17: Bandwidth and latency optimizations

LBFS reads

[Message sequence: the file is not in the client cache, so the client sends GETHASH; the server replies with the chunk list (h1, size1), (h2, size2), (h3, size3); the client asks only for the missing chunks h1, h2 via READ(h1, size1) and READ(h2, size2), then reconstructs the file as h1, h2, h3]

• Fetching only missing chunks saves bandwidth by reusing common cached chunks across different files or different versions of the same file
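The read path can be simulated with a toy in-memory client and server (hypothetical names and dicts, not the real RPC interface):

```python
import hashlib

def H(b: bytes) -> str:
    return hashlib.sha1(b).hexdigest()

# Server state: chunk store plus a file described as a hash list.
server_chunks = {H(c): c for c in (b"aaa", b"bbb", b"ccc")}
server_file = [H(b"aaa"), H(b"bbb"), H(b"ccc")]   # GETHASH reply

client_cache = {H(b"ccc"): b"ccc"}   # client already holds h3

fetched = 0
for h in server_file:                # READ only the missing chunks
    if h not in client_cache:
        client_cache[h] = server_chunks[h]
        fetched += 1

data = b"".join(client_cache[h] for h in server_file)
assert data == b"aaabbbccc" and fetched == 2   # h1, h2 transferred
```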

Page 18: Bandwidth and latency optimizations

LBFS writes

[Message sequence: the client sends MKTMPFILE(fd) and the server creates tmp file fd; the client sends CONDWRITE(fd, h1, size1, h2, size2, h3, size3) and the server replies HASHNOTFOUND(h1, h2) for the chunks it is missing; the client uploads them with TMPWRITE(fd, h1) and TMPWRITE(fd, h2), and the server constructs the tmp file from h1, h2, h3; finally COMMITTMP(fd, target_fhandle) copies the tmp file content to the target file]

• Transferring only missing chunks saves bandwidth if different files or different versions of the same file have pieces of identical content
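The write path mirrors the read path: propose hashes, upload only what the server lacks. A toy sketch with illustrative names (not the real RPC signatures):

```python
import hashlib

def H(b: bytes) -> str:
    return hashlib.sha1(b).hexdigest()

server_store = {H(b"ccc"): b"ccc"}    # server already has chunk h3
new_file = [b"aaa", b"bbb", b"ccc"]   # client's file, already chunked

proposed = [H(c) for c in new_file]                        # CONDWRITE
missing = [h for h in proposed if h not in server_store]   # HASHNOTFOUND
for c in new_file:                                         # TMPWRITE
    if H(c) in missing:
        server_store[H(c)] = c

# COMMITTMP: the server assembles the target file from its store.
tmp = b"".join(server_store[h] for h in proposed)
assert tmp == b"aaabbbccc" and len(missing) == 2   # only h1, h2 sent
```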

Page 19: Bandwidth and latency optimizations

LBFS evaluations

• In practice, there is lots of content overlap among different files and different versions of the same file
– Save a Word document
– Recompile after a header change
– Different versions of a software package

• LBFS uses ~1/10 the bandwidth

Page 20: Bandwidth and latency optimizations

Speculative Execution in a Distributed File System

Nightingale et al.

SOSP’05

Page 21: Bandwidth and latency optimizations

How to reduce latency in FS?

• What are potentially “wasteful” latencies?
• Freshness check
– Client issues GETATTR before reading from cache
– Incurs an extra RTT per read
– Why wasteful? Most GETATTRs confirm the cache is fresh
• Commit ordering
– Client waits for the commit of modification X to finish before starting modification Y
– No pipelining of modifications X & Y
– Why wasteful? Most commits succeed!

Page 22: Bandwidth and latency optimizations

Key idea: speculate on RPC responses

• Guarantees without blocking on I/O!

1) Checkpoint the process instead of blocking on the RPC
2) Speculate: continue executing with the predicted response
3) When the real response arrives, check it. Correct? Yes: discard the checkpoint. No: restore the process and re-execute

[Diagram: without speculation the client blocks on every RPC request/response round trip; with Speculator the client checkpoints, speculates past each reply, and overlaps multiple RPCs]
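The checkpoint / speculate / validate cycle can be sketched in user-level Python (illustrative only; Speculator does this in the kernel with real process checkpoints):

```python
import copy

def speculative_call(state, predict, do_rpc, run):
    checkpoint = copy.deepcopy(state)   # 1) checkpoint local state
    run(state, predict())               # 2) speculate on the reply
    actual = do_rpc()                   # real reply arrives later
    if actual != predict():             # 3) validate the speculation
        state.clear()
        state.update(checkpoint)        #    wrong: restore checkpoint
        run(state, actual)              #    and re-execute
    return state                        #    right: checkpoint discarded

# Work that depends on the RPC reply: append it to a log.
run = lambda st, reply: st.setdefault("log", []).append(reply)

ok = speculative_call({}, lambda: "fresh", lambda: "fresh", run)
assert ok["log"] == ["fresh"]           # prediction held, no rollback

bad = speculative_call({}, lambda: "fresh", lambda: "stale", run)
assert bad["log"] == ["stale"]          # mispredicted, rolled back
```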

Page 23: Bandwidth and latency optimizations

Conditions of useful speculation

• Operations are highly predictable

• Checkpoints are cheaper than network I/O
– ~52 µs for a small process

• Computers have resources to spare
– Need memory and CPU cycles for speculation

Page 24: Bandwidth and latency optimizations

Implementing Speculation

[Diagram: on (1) a system call, the kernel (2) creates a speculation: the process is checkpointed and an undo log records pre-speculation state while execution continues]

Page 25: Bandwidth and latency optimizations

Speculation Success

[Diagram: (1) system call, (2) create speculation with checkpoint and undo log; the reply matches, so (3) the speculation commits and the checkpoint and undo log are discarded]

Page 26: Bandwidth and latency optimizations

Speculation Failure

[Diagram: (1) system call, (2) create speculation; the reply differs, so (3) the speculation fails and the process is rolled back via the checkpoint and undo log]

Page 27: Bandwidth and latency optimizations

Ensuring Correctness

• Speculative processes hit barriers when they need to affect external state
– Cannot roll back an external output

• Three ways to ensure correct execution:
– Block
– Buffer
– Propagate speculations (dependencies)

• Need to examine the syscall interface to decide how to handle each syscall

Page 28: Bandwidth and latency optimizations

Handle system calls

• Block calls that externalize state
– Allow read-only calls (e.g. getpid)
– Allow calls that modify only task state (e.g. dup2)
• File system calls: need to dig deeper
– Mark file systems that support Speculator

Examples:
– getpid → call sys_getpid()
– reboot → block until specs are resolved
– mkdir → allow only if the fs supports Speculator

Page 29: Bandwidth and latency optimizations

Output Commits

[Diagram: a process issues (1) sys_stat and (2) sys_mkdir, creating Spec(stat) and Spec(mkdir), each with its own checkpoint and undo-log entry; the outputs “stat worked” and “mkdir worked” are held until (3) the speculations commit]

Page 30: Bandwidth and latency optimizations

Multi-Process Speculation

• Processes often cooperate
– Example: “make” forks children to compile, link, etc.
– Would block if speculation were limited to one task

• Allow kernel objects to have speculative state
– Examples: inodes, signals, pipes, Unix sockets, etc.
– Propagate dependencies among objects
– Objects are rolled back to prior states when specs fail

Page 31: Bandwidth and latency optimizations

Multi-Process Speculation

[Diagram: speculative state propagates across kernel objects: processes pid 8000 and pid 8001 and inode 3456 each take checkpoints as a speculative Chown-1 and Write-1 attach Spec 1 and Spec 2 dependencies to them]

Page 32: Bandwidth and latency optimizations

Multi-Process Speculation

• What’s handled:
– DFS objects, RAMFS, Ext3, pipes & FIFOs
– Unix sockets, signals, fork & exit

• What’s not handled (i.e. blocks):
– System V IPC
– Multi-process write-shared memory

Page 33: Bandwidth and latency optimizations

Example: NFSv3 Linux

[Diagram, Client 1 / Server / Client 2: Client 1 performs Open B (Getattr), Modify B (Write), then Commit, blocking on the server at each step]

Page 34: Bandwidth and latency optimizations

Example: SpecNFS

[Diagram, Client 1 / Server / Client 2: Client 1 speculates on Open B (Getattr) and on Modify B, sending Write+Commit asynchronously; Client 2 likewise speculates on its Open B / Getattr]

Page 35: Bandwidth and latency optimizations

Problem: Mutating Operations

• bar depends on the speculative execution of “cat foo”
• If bar’s state could be speculative, what does Client 2 view in bar?

Client 1: 1. cat foo > bar
Client 2: 2. cat bar

Page 36: Bandwidth and latency optimizations

Solution: Mutating Operations

• Server determines speculation success/failure
– State at the server is never speculative

• Clients tag each operation with the hypothesis its speculation is based on
– The list of speculations the operation depends on

• Server reports failed speculations

• Server performs in-order processing of messages

Page 37: Bandwidth and latency optimizations

Server checks speculation’s status

[Diagram, Client 1 / Server: Client 1 speculatively runs “cat foo > bar” and sends Write+Commit tagged with its hypothesis “foo v=1”; the server checks whether foo indeed has version 1 and fails the speculation if not]
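The server-side hypothesis check can be sketched as a version comparison (names and the dict-based "wire format" are illustrative, not Speculator's actual protocol):

```python
# A mutating operation carries the object versions the client's
# speculation assumed; the server applies it only if those versions
# still hold, otherwise it reports failure and the client rolls back.
server_versions = {"foo": 1, "bar": 3}

applied = []

def write_commit(hypothesis, op):
    # hypothesis: {object: version the client's speculation assumed}
    for obj, version in hypothesis.items():
        if server_versions.get(obj) != version:
            return "fail"       # client restores checkpoint, re-executes
    applied.append(op)          # server state is never speculative
    return "ok"

assert write_commit({"foo": 1}, "bar := cat(foo)") == "ok"
server_versions["foo"] = 2      # another client modified foo
assert write_commit({"foo": 1}, "bar := cat(foo)") == "fail"
assert applied == ["bar := cat(foo)"]   # the failed op was not applied
```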

Page 38: Bandwidth and latency optimizations

Group Commit

• Previously sequential ops become concurrent

• Sync ops must usually be committed to disk

• Speculator makes group commit possible

[Diagram: without Speculator each write is followed by its own commit round trip; with Speculator the writes arrive together and the server commits them as a group]

Page 39: Bandwidth and latency optimizations

Putting it Together: SpecNFS

• Apply Speculator to an existing file system

• Modified NFSv3 in the Linux 2.4 kernel
– Same RPCs issued (but many now asynchronous)
– SpecNFS has the same consistency and safety as NFS
– Getattr, lookup, access speculate if data is in cache
– Create, mkdir, commit, etc. always speculate

Page 40: Bandwidth and latency optimizations

Putting it Together: BlueFS

• Design a new file system for Speculator
– Single-copy semantics
– Synchronous I/O

• Each file, directory, etc. has a version number
– Incremented on each mutating op (e.g. on write)
– Checked prior to all operations
– Many ops speculate and check the version asynchronously

Page 41: Bandwidth and latency optimizations

Apache Benchmark

• SpecNFS up to 14 times faster

[Chart: Apache benchmark build time for NFS, SpecNFS, BlueFS, and ext3; left panel “No delay” (y-axis 0-300 seconds), right panel “30 ms delay” (y-axis 0-4500 seconds)]

Page 42: Bandwidth and latency optimizations

Rollback cost is small

• With all files out of date, SpecNFS is up to 11x faster

[Chart: benchmark time for NFS, SpecNFS, and ext3 with 0%, 10%, 50%, and 100% of files invalid; left panel “No delay” (y-axis 0-140 seconds), right panel “30 ms delay” (y-axis 0-2000 seconds)]

Page 43: Bandwidth and latency optimizations

What we’ve learnt today

• Traditional performance-boosting techniques
– Caching
– Group commit
– Leases

• Two new techniques
– Content-based hashing and chunking
– Speculative execution