
Page 1

2012 Storage Developer Conference. © Carnegie Mellon University. All Rights Reserved.

Storage Systems for Shingled Disks

Garth Gibson, Carnegie Mellon University and Panasas Inc

Anand Suresh, Jainam Shah, Xu Zhang, Swapnil Patil, Greg Ganger

Page 2

Kryder’s Law for Magnetic Disks

- Market expects ever more dense disks
- Future is multi-terabit per square inch
- Real challenge is making money at $100/disk when engineering is this hard

G. Gibson, Sept 2012

Page 3

Directions in High Capacity Disks

- Heat-Assisted Magnetic Recording (HAMR)
  - Small bits need high-coercivity media to retain orientation
  - High coercivity can't be changed by normal writing
  - Heated media lowers coercivity
  - Include lasers?
- Bit-Patterned Media (BPM)
  - Small bits retain orientation more easily if bits are kept apart
  - Pattern media so only a single dot is written per bit
  - Tera-dots per sq. inch?

Page 4

Shingled Magnetic Recording (SMR)

Page 5

Page 6

File systems do far too much small random writing

Page 7

File systems do far too much small random writing

Disk becomes tape !!

Page 8

What About Reading?

- Read head is possibly thinner than write head
  - If the target is 2-3X density, maybe not too hard
- Targeting higher density sees lots of crosstalk
  - Signal processing in two dimensions (TDMR)
- One approach to TDMR involves gathering signal from 1-2 adjacent tracks on both sides
  - Means 3 to 5 revs to read a single sector
  - Not likely to be accepted by the marketplace
- Safe plan is to "see" the residual track with only 1 head

Page 9

Geometry Model: Getting a handle on the parameters

Page 10

Shingled writing: need big bands

- Reason for doing it: density
  - Shingling projected at 1.5-2.5X track density
- Can mix shingled and non-shingled
  - so, e.g., separate sequential from random data
  - just lose some of the density gains
- Can break up sets of shingled tracks ("bands")
  - allowing overwrite of individual bands
  - but they need to be big... like 32 to 256 MB

Page 11

Simple Geometry Model

- SMR allows wider write heads, w' > w
- SMR reduces gaps, g, from one per track to one per band (B tracks)
- Residual (readable) track width (r) after overlapping is a key factor
- A fraction of tracks not shingled, f, allows some random sector writing


Page 12

Simple Geometry Model

- SMR allows wider write heads, w' > w
- SMR reduces gaps, g, from one per track to one per band (B tracks)
- Residual (readable) track width (r) after overlapping is a key factor
- A fraction of tracks not shingled, f, allows some random sector writing
- SMR increase in areal density given by a simple model

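The equation graphic for this "simple model" did not survive transcription. The following is a plausible reconstruction from the parameters defined in the bullets above (w, w', g, r, B, f), not the slide's exact formula: a conventional track occupies pitch w + g, while a band of B shingled tracks occupies (B-1) residual tracks of width r plus one final full-width track w' and one gap g.

```latex
% Hedged reconstruction of the slide's "simple model" for SMR density
% gain; symbols as defined in the bullets above.
\[
  \text{conventional pitch} = w + g, \qquad
  \text{per-track pitch in a band} = \frac{(B-1)\,r + w' + g}{B}
\]
\[
  D(B) \;=\; \frac{B\,(w+g)}{(B-1)\,r + w' + g}
  \qquad \text{(track-density gain, fully shingled)}
\]
\[
  D_f(B) \;=\; \frac{w+g}{\,f\,(w+g) \;+\; (1-f)\,\dfrac{(B-1)\,r + w' + g}{B}\,}
  \qquad \text{(fraction } f \text{ of tracks unshingled)}
\]
```

Sanity check against the slide's example numbers: with w=25, g=5, w'=70, r=13, f=0 and large B, D tends to 30/13, roughly 2.3, which sits inside the projected 1.5-2.5X gain; at B=1 the model gives 30/75 < 1, matching the later observation that small B is bad news.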

Page 13

Areal Density Favors Large Bands

E.g. w=25, g=5, w'=70; r=10, 13, 20 nm; f=0%, 1%, 10%

Page 14

Areal Density Favors Large Bands

E.g. w=25, g=5, w'=70; r=10, 13, 20 nm; f=0%, 1%, 10%

- 1% unshingled is affordable; 10% if r < w
- Small B is bad news
- r ~= w needs large B (~100+)
- r < w allows smallish B (~10), but not soon...

Systems should plan for large bands

Page 15

Coping with SMR at the system level

Page 16

Convergence with Flash

Page 17

Transparent STL/FTL approach

- Shingled disks implement "translation"
  - Same types of algorithms as Flash
  - Data will be correct using existing program code
- But not performance transparent
  - Erase block: 100-1000X bigger
  - Read-erase-write: 1000-10000X longer
  - Sure to exceed tolerable latency thresholds
- And not cost transparent
  - Disk margins < flash margins
  - Yet a disk STL needs more resources

Page 18

Non-transparent SMR interface

- Define an interface exposing the key differences
  - Bands, non-shingled regions, trim, ...
- Modify systems software to avoid and minimize read-modify-write
  - Log-structured file systems are 20 years old
  - STL-like technology is not costly in the host
  - Cloud storage writes in 64 MB chunks (HDFS)
  - Flash, PCM, etc. may be available to the host

Page 19

Non-transparent SMR interface

- Standards processes in T13 and T10 exist
- Key idea: a disk attribute says "sequential writing"
- Each band has a write cursor for the next write LBA
- Writes before and reads after the cursor are "bad"
- Software can reset the cursor, mostly to the start of a band
- Software can ask for a map of bands & cursors


Page 20

Experimenting with File Systems for SMR

Page 21

Project Plan

- Demonstrate systems using the SMR interface
  - Mock interface models an SMR device
- Cloud/Big Data is the initial target application space
  - Hadoop / HDFS is the first example
  - Chunks ~= bands
  - HDFS is write-once, so it is easier to pack fragments
- Log-structured Merge Tree / LFS?
- Implement directories and inodes as table entries
  - Logs of changes in tables written as bands

Page 22

Start w/ class project framework

1) App does create(f1.txt)
2) MelangeFS creates "f1.txt" on the SSD
3) ext2 on the SSD returns a handle for "f1.txt" to FUSE
4) FUSE "translates" that handle into another handle, which is returned to the app
5) App uses the returned handle to write to "f1.txt" on the SSD
6) When "f1.txt" grows big, MelangeFS moves it to the HDD, and "f1.txt" on the SSD becomes a symlink to the file on the HDD
7) Because this migration has to be transparent, the app continues to write as before (all writes go to the HDD)

[Figure: the application talks to your FUSE file system (MelangeFS), which stores "f1.txt" on the SSD (ext2) and migrates big files to the HDD (ext2); numbers 1-7 mark the steps above]

Page 23

Experimental Platform Today

[Figure: user-level emulator stack. Hadoop/HDFS runs in user space over FUSE on shingledfs; an SMR model performs file-to-band/block translation; an SMR device emulator presents the T13 interface model over ext3 to the disk, with a shingled partition, an unshingled partition, and an open-file cache (F1, F2, ...). Annotations: open(F2, w) sends metadata ops for F2 to the unshingled partition; writes to open files go to the cache; F2 is written to SMR on close()]

Page 24

Prototype Banded Disk API

- CMU view of API essentials
  - Edi_modesense()
    - Discover band information (number, size)
  - Edi_managebands(OP, band, offset, length)
    - GET: where is next_write_offset?
    - SET: change next_write_offset (mostly to 0)
  - Edi_read(band, offset, length)
    - Offset must be less than next_write_offset
  - Edi_write(band, offset, length)
    - Offset must be next_write_offset (else reject)

Page 25

Hadoop Sort Benchmark

- 6-node Hadoop cluster: write, sort, verify X GBs
- Compare local disk, FUSE-local, FUSE-SMR
- FUSE causes most of the overhead
- No cleaning during tests
- SMR file system can support Big Data apps

Page 26

Ongoing Project Directions

Page 27

Future Work: General Workloads

- Compile Linux 2.6.35 on SMR
- Bigger overheads
  - Especially untar
  - Lots of small files, lots of directory operations, etc.

Page 28

Future Work: Pack Metadata

- Change traditional file systems in the unshingled region
- Use an LSM tree for directories and inodes
  - E.g. LevelDB
  - Most metadata on disk in SSTable blobs
- Initial experiments reduce disk seeks for metadata ops

Page 29

Summary of status

- Experiments: SMR is appropriate for Big Data apps
  - Deployment: embed in HDFS DataNode servers or the local file system
- Implementation greatly simplified by:
  - one file : one band
  - files open for write held in memory until close
  - Hadoop/HDFS being write-once
- Next steps:
  - Cleaning overhead; cluster soon-to-delete data
  - Log-structured Merge Tree to pack metadata

Page 30

Further reading

- www.pdl.cmu.edu technical reports:
  - CMU-PDL-12-105: Big Data experiments
  - CMU-PDL-11-107: Principles of operation
  - CMU-PDL-12-103: TableFS approach

Thanks to our sponsors: Seagate and the PDL Consortium (Actifio, APC, EMC, Emulex, Facebook, Fusion-IO, Google, HP Labs, Hitachi, Huawei, Intel, Microsoft, NEC, NetApp, Oracle, Panasas, Riverbed, Samsung, STEC, Symantec, VMWare, Western Digital)