venti : a new approach to archival storage

19
CS 443 Advanced OS Fabián E. Bustamante, Spring 2005 Venti : A new approach to archival storage Sean Quinlan & Sean Dorward Bell Labs, Lucent Technology FAST 2002 Presented by: Fabián E. Bustamante

Upload: abner

Post on 11-Jan-2016

55 views

Category:

Documents


0 download

DESCRIPTION

Venti : A new approach to archival storage. Sean Quinlan & Sean Dorward Bell Labs, Lucent Technology FAST 2002 Presented by: Fabián E. Bustamante. Archival Storage. Abundant storage - storing data for long time, even forever Archival system – write-once policy – changes how we see storage - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Venti : A new approach to archival storage

CS 443 Advanced OS

Fabián E. Bustamante, Spring 2005

Venti : A new approach to archival storage

Sean Quinlan & Sean DorwardBell Labs, Lucent Technology

FAST 2002

Presented by: Fabián E. Bustamante

Page 2: Venti : A new approach to archival storage

2

Archival Storage

Abundant storage - storing data for long time, even foreverArchival system – write-once policy – changes how we see storagePrevalent form of archival storage - tape backup– Central service for a number of clients– Restoring data tedious and error prone (requires sys admin or

privileged software)– Infrequent, so problems may go undetected for long– Tradeoff bet/ performance of backup & restore (incremental

backups are more efficient to generate, but not self-contained)Snapshots (Plan 8, Andrew FS, …)– A consistent read-only view of FS at some point in past– Maintains file system permissions – Can be accessed with standard tools – ls, cat, cp, grep, diff– Avoids tradeoff between full vs. incremental backup

• Looks like full backup• Implementation resembles incremental backup – share blocks• Only a small number of versions are kept

Page 3: Venti : A new approach to archival storage

3

Venti

GOAL: “To provide a write-once archival repository that can be shared by multiple client machines and applications”Block level network storage system– Actually a backend storage for client apps– Block level i/f places few restrictions on data

structures/format

Blocks addressed by hash of their contents– Uses SHA-1 algorithm– SHA-1 output is 160 bit (20 byte) fingerprint of data block

Write once policy– Once written cannot be deleted

Multiple writes of same data coalesced– Data sharing – increases effective storage capacity– Makes write operation idempotent

Page 4: Venti : A new approach to archival storage

4

Venti

Multiple clients can share a Venti server– Hash fn gives an universal namespace

Inherent integrity checking of data– Fingerprint computation on retrieval

Caching is simplifiedUses magnetic disks as storage technology– Access time comparable to non-archival data

Hashing function – sha1 is more than enough for now

Exabyte of storage (1018 bytes) in 8 Kbytes blocks

(~1014 blocks - n) with sha1 (b = 160 bits) – p < 10-20

b

nnp

2

1

2

)1(

Number of pairs of blocks Probability that a

given pair will collide

Page 5: Venti : A new approach to archival storage

5

Applications - general

Venti as a building block to construct storage appsData divided into blocksApp needs fingerprint for retrievalOne approach - fingerprints packed into pointer blocks, also written to serverRepeated recursively to get single fingerprint - root of tree

Page 6: Venti : A new approach to archival storage

6

Applications - general

Tree cannot be modified, butUnchanged sections of tree reused More complex data structures– Mixing data & fingerprints

in a block

e.g. structure for storing file system– 3 types of blocks

• Directory – file metadata + root fingerprint

• Pointer • Data

Page 7: Venti : A new approach to archival storage

7

Example app: Vac

Similar to zip & tar – store collection of files & directories as single objectTree of blocks formed for selected filesvac archive file - 45 bytes long– 20 bytes for root fingerprint– 25 bytes fixed header string– any amount of data compressed down to 45 bytes

Unvac – to restore file from archiveDuplicate copies of file coalesced on server– Multiple users vac same data – only 1 copy stored– vac on changed contents

Page 8: Venti : A new approach to archival storage

8

Example app: Physical level backup

Copy the raw disk blocks to Venti– No need to walk file hierarchy– Higher throughput

Duplicate blocks are coalescedFree block – null in pointer treeUser sees full backup of deviceStorage space advs of incremental backup retainedRandom access possible– Directly mounting a backup file system image (w/ OS

support)– Enables lazy (on demand) restore

Page 9: Venti : A new approach to archival storage

9

Example app - Plan 9 FS

Previously– Magnetic disk + write-once optical jukebox– Cache for faster access & accumulated changes bet/

snapshots– Cache smaller than active file system– Cache misses are painful

Plan 9 FS on top of Venti– Venti as storage device (instead of jukebox)– Equalizes access to both active & archival storage– Cache can also be smaller

Page 10: Venti : A new approach to archival storage

10

Implementation

Append-only log of data blocks– RAID array

Separate index maps fingerprints to log location

– Can be regenerated from data log

Index Cache

FS

Block Cache Data

Index

Venti Server

Network

Client

Client

Client

Page 11: Venti : A new approach to archival storage

11

Venti’s data log

Log divided into arenas for

maintenance

Append only section of data blocks (variable

size)For integrity checking

& data recovery

Data compressed (LZ77) – • encoding – if, algo.• esize size after compression

For fast index rebuild & error recovery

Data log and directory grow in opposite directionsWhen arena is filled, mark as sealed, fingerprint compute

Page 12: Venti : A new approach to archival storage

12

Venti’s index

No. of fingerprints >> no. of blocks on a server

Index as disk-resident hash table

Hashing function maps fingerprints to index buckets

Going through index is main performance penalty– Caching (block and index)– Index striped across multiple disks – fingerprint location in index is random

& index is too big to fit in memory– Write buffering (to increase number of concurrent accesses)

Block cache - hit - index lookup & data log bypassed

Index cache - hit - index lookup bypassed

Page 13: Venti : A new approach to archival storage

13

Performance: computing environments

Testbed– Dedicated dual 550Mhz Pentium 3, 2GB mem.– Access through a 100Mbps Ethernet– Data log stored on a 500GB MaxTronic IDE Raid 5– Index stored on 8 Seagate Cheetah 18XL 9GB

SCSI

Micro-benchmark results

Sequential reads

Random reads

Virgin writes Duplicates writes

Uncached 0.9 0.4 3.7 5.6

Index cache 4.2 0.7 - 6.2

Block cache 6.8 - - 6.5

Raw raid 14.8 1.0 12.4 12.4

Page 14: Venti : A new approach to archival storage

14

Storage usage – file servers used

2 plan 9 file servers, bootes and emelie– Spanning 1990 to 2001

522 user accounts, 50-100 active at a time

Numerous development projects hosted

Several large data sets– Astronomical data, satellite imagery, multimedia files

Next figures– Size of active file system– Space consumed on jukebox– Space when using Venti– Ratio – using Venti reduces the cost of retaining snapshots

(so ratio is smaller)

Page 15: Venti : A new approach to archival storage

15

Storage usage

Page 16: Venti : A new approach to archival storage

16

Storage saving

When stored on Venti, size of jukebox data reduced by 3 factors– Elimination of duplicate blocks– Elimination of block fragmentation– Compression of block contents

% reduction of storage Bootes Emelie

Elimination of duplicates 27.8% 31.3%

Elimination of fragments 10.2% 25.4%

Data compression 33.8% 54.1%

Total reduction 59.7% 76.5%

Page 17: Venti : A new approach to archival storage

17

Reliability & recovery

Tools for integrity checking & error recovery– Verifying structure of arena– Checking there is an index entry for every block in data log,

vice versa– Rebuilding index from data log– Copying arena to removable media

Data log on RAID 5 disk array – Protection against single drive failures

Off-site mirrors for extra protection, if that’s not enough …

Stored sealed arenas into removable media

Page 18: Venti : A new approach to archival storage

18

Future Work

Load balancing– Distribute Venti across multiple machines– Replicate server– Use of proxy server to hide it from client

Better access control– Currently just authentication to server– Single root fingerprint gives access to entire file

tree

Use of variable sized blocks as in LBFS

Page 19: Venti : A new approach to archival storage

19

Conclusion

Addressing block by SHA-1 hash of contents – good fit for archival storage

Magnetic disks as storage technology– Large capacity at low price – online storage– Random access– Performance comparable to non-archival data

Write once model– Reduces accidental or malicious data loss– Simplifies administration– Simplifies caching– Allows sharing of data