9/12/2019 cs262a-f19 lecture-05 2 - people

EECS 262a Advanced Topics in Computer Systems

Lecture 5

Flash File Systems

September 12th, 2019

John Kubiatowicz

Electrical Engineering and Computer Sciences

University of California, Berkeley

http://www.eecs.berkeley.edu/~kubitron/cs262

Today’s Papers

• F2FS: A New File System for Flash Storage
  – Changman Lee, Dongho Sim, Joo-Young Hwang, and Sangyeun Cho
  – Appears in Proceedings of the 13th USENIX Conference on File and Storage Technologies (FAST 2015)

• A Fast and Slippery Slope for File Systems
  – Ricardo Santana, Raju Rangaswami, Vasily Tarasov, Dean Hildebrand
  – Appears in Proceedings of the 3rd Workshop on Interactions of NVM/FLASH with Operating Systems and Workloads (INFLOW 2015)

• Thoughts?

Solid State Disks (SSDs)

• 1995 – Replace rotating magnetic media with non-volatile memory (battery backed DRAM)

• 2009 – Use NAND Multi-Level Cell (2 or 3-bit/cell) flash memory
  – Sector (4 KB page) addressable, but stores 4-64 “pages” per memory block
  – Trapped electrons distinguish between 1 and 0

• No moving parts (no rotate/seek motors)
  – Eliminates seek and rotational delay (0.1-0.2ms access time)
  – Very low power and lightweight
  – Limited “write cycles”

• Rapid advances in capacity and cost ever since!

Some “Current” 3.5in SSDs

• Seagate Nytro SSD: 15TB (2017)
  – Dual 12Gb/s interface
  – Seq reads 860MB/s
  – Seq writes 920MB/s
  – Random Reads (IOPS): 102K
  – Random Writes (IOPS): 15K
  – Price (Amazon): $6325 ($0.41/GB)

• Nimbus SSD: 100TB (2019)
  – Dual port: 12Gb/s interface
  – Seq reads/writes: 500MB/s
  – Random Read Ops (IOPS): 100K
  – Unlimited writes for 5 years!
  – Price: ~ $50K? ($0.50/GB)

FLASH Memory

• Like a normal transistor but:
  – Has a floating gate that can hold charge
  – To write: raise or lower wordline high enough to cause charges to tunnel
  – To read: turn on wordline as if normal transistor
    » presence of charge changes threshold and thus measured current

• Two varieties:
  – NAND: denser, must be read and written in blocks
  – NOR: much less dense, fast to read and write

• V-NAND: 3D stacking (Samsung claims 1TB possible in 1 chip)

[Figure: Samsung 2015: 512GB NAND Flash]

SSD Architecture – Reads

• Read 4 KB Page: ~25 µs
  – No seek or rotational latency
  – Transfer time: transfer a 4KB page
    » SATA: 300-600MB/s => ~4×10^3 B / (400×10^6 B/s) => 10 µs
  – Latency = Queuing Time + Controller time + Xfer Time
  – Highest Bandwidth: Sequential OR Random reads

[Figure: SSD architecture. The host connects over SATA to the SSD's Buffer Manager (software queue) and Flash Memory Controller, which has DRAM and drives an array of NAND flash chips]
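
The read-latency breakdown on this slide can be checked with a few lines of Python. This is a sketch, not a measured model: the ~4×10^3-byte page and 400 MB/s link come from the slide, while `controller_us=15.0` and a zero queuing delay are hypothetical values chosen so the total lands near the ~25 µs read time quoted above. `ReadLatencyModel` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class ReadLatencyModel:
    """Latency = Queuing Time + Controller time + Xfer Time, all in microseconds."""

    queue_us: float          # time waiting in the host/SSD queue (assumed)
    controller_us: float     # controller + NAND array time (assumed)
    link_bytes_per_s: float  # interface bandwidth, e.g. SATA at 300-600 MB/s

    def transfer_us(self, nbytes: int) -> float:
        # Bytes divided by bytes/second gives seconds; scale to microseconds.
        return nbytes / self.link_bytes_per_s * 1e6

    def read_us(self, nbytes: int) -> float:
        return self.queue_us + self.controller_us + self.transfer_us(nbytes)


# Slide's approximation: a 4 KB page ~ 4x10^3 bytes over a 400 MB/s SATA link.
# controller_us=15.0 is a hypothetical value, not a measured one.
model = ReadLatencyModel(queue_us=0.0, controller_us=15.0, link_bytes_per_s=400e6)
print(f"transfer: {model.transfer_us(4000):.1f} us")  # transfer: 10.0 us
print(f"total:    {model.read_us(4000):.1f} us")      # total:    25.0 us
```

Raising `link_bytes_per_s` to 600e6 (the top of the SATA range) shrinks only the transfer term, to about 6.7 µs; the controller and queuing terms are unaffected.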

Flash Memory (Con’t)

• Data read and written in page-sized chunks (e.g. 4K)
  – Cannot be addressed at byte level
  – Random access at block level for reads (no locality advantage)
  – Writing of new blocks handled in order (kinda like a log)

• Before writing, must be erased (256K block at a time)
  – Requires free-list management
  – CANNOT write over existing block (Copy-on-Write is normal case)
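
A toy model makes the erase-before-write rule concrete. `NandBlock` is a hypothetical class, not a real driver API: pages are programmed strictly in order, a programmed page cannot be rewritten, and the only way to reclaim space is to erase the whole block, which costs one program/erase cycle.

```python
from typing import List, Optional


class NandBlock:
    """Toy NAND erase block (hypothetical): append-only pages, whole-block erase."""

    def __init__(self, pages_per_block: int = 64) -> None:
        self.pages: List[Optional[bytes]] = [None] * pages_per_block  # None = erased
        self.write_ptr = 0    # next page that may be programmed (log-like order)
        self.erase_count = 0  # program/erase cycles endured by this block

    def program(self, data: bytes) -> int:
        """Write data into the next erased page and return that page's index."""
        if self.write_ptr == len(self.pages):
            raise RuntimeError("block full: erase the whole block before writing")
        idx = self.write_ptr
        self.pages[idx] = data
        self.write_ptr += 1
        return idx

    def read(self, idx: int) -> Optional[bytes]:
        return self.pages[idx]

    def erase(self) -> None:
        """Reset every page at once and record the wear."""
        self.pages = [None] * len(self.pages)
        self.write_ptr = 0
        self.erase_count += 1


blk = NandBlock(pages_per_block=4)
for i in range(4):
    blk.program(f"v{i}".encode())
try:
    blk.program(b"v4")  # no erased page is left, so this must fail
except RuntimeError as err:
    print(err)          # block full: erase the whole block before writing
blk.erase()
print(blk.program(b"v4"), blk.erase_count)  # 0 1
```

Because an update cannot happen in place, changing `v1` means writing the new version into an erased page elsewhere and leaving the old page as garbage until its block is erased: the copy-on-write behavior listed above.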

Flash Details

• Program/Erase (PE) Wear
  – Permanent damage to gate oxide at each flash cell
  – Caused by high program/erase voltages
  – Issues: trapped charges, premature leakage of charge
  – Need to balance how frequently cells are written: “Wear Leveling”

• Flash Translation Layer (FTL)
  – Translates between Logical Block Addresses (at OS level) and Physical Flash Page Addresses
  – Manages the wear and erasure state of blocks and pages
  – Tracks which blocks are garbage but not erased

• Management Process (Firmware)
  – Keep free list full, manage mapping, track wear state of pages
  – Copy good pages out of basically empty blocks before erasure

• Meta-Data per page:
  – ECC for data
  – Wear State
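
The FTL duties listed above (mapping, keeping the free list full, copying good pages out before erasure, tracking wear) fit in a short sketch. `PageMappedFTL` is hypothetical and far simpler than real firmware: it assumes page-level mapping, picks the garbage-collection victim greedily (fewest valid pages), and does wear leveling only by handing out the least-erased free block. It also assumes over-provisioning, i.e. fewer live logical pages than physical pages.

```python
from typing import Dict, List, Optional, Tuple


class PageMappedFTL:
    """Toy page-level Flash Translation Layer (hypothetical, illustrative only)."""

    def __init__(self, num_blocks: int = 8, pages_per_block: int = 4,
                 gc_threshold: int = 2) -> None:
        self.ppb = pages_per_block
        self.gc_threshold = gc_threshold          # minimum free blocks to keep
        self.erase_count = [0] * num_blocks       # wear state per block
        # owner[b][p] = logical page stored there, or None if erased/garbage
        self.owner: List[List[Optional[int]]] = [
            [None] * pages_per_block for _ in range(num_blocks)]
        self.next_page = [0] * num_blocks         # per-block write pointer
        self.free = list(range(num_blocks))       # erased blocks (free list)
        self.mapping: Dict[int, Tuple[int, int]] = {}  # LBA -> (block, page)
        self.data: Dict[Tuple[int, int], bytes] = {}   # physical page contents
        self.active = self._take_free_block()

    def _take_free_block(self) -> int:
        # Wear leveling: hand out the least-erased free block first.
        blk = min(self.free, key=lambda b: self.erase_count[b])
        self.free.remove(blk)
        return blk

    def _append(self, lba: int, payload: bytes) -> None:
        """Out-of-place write: program the next page, invalidate the old copy."""
        if self.next_page[self.active] == self.ppb:
            self.active = self._take_free_block()
        blk, page = self.active, self.next_page[self.active]
        self.next_page[blk] += 1
        old = self.mapping.get(lba)
        if old is not None:
            self.owner[old[0]][old[1]] = None     # old copy is now garbage
            del self.data[old]
        self.owner[blk][page] = lba
        self.data[(blk, page)] = payload
        self.mapping[lba] = (blk, page)

    def _valid_pages(self, blk: int) -> int:
        return sum(lba is not None for lba in self.owner[blk])

    def _collect(self) -> bool:
        """Greedy GC: copy good pages out of the emptiest full block, then erase it."""
        candidates = [b for b in range(len(self.owner))
                      if b != self.active and b not in self.free]
        if not candidates:
            return False
        victim = min(candidates, key=self._valid_pages)
        for page, lba in enumerate(self.owner[victim]):
            if lba is not None:
                self._append(lba, self.data[(victim, page)])  # relocate live data
        self.owner[victim] = [None] * self.ppb    # erase the whole block
        self.next_page[victim] = 0
        self.erase_count[victim] += 1
        self.free.append(victim)
        return True

    def write(self, lba: int, payload: bytes) -> None:
        # Keep the free list topped up before accepting new data.
        while len(self.free) < self.gc_threshold and self._collect():
            pass
        self._append(lba, payload)

    def read(self, lba: int) -> bytes:
        return self.data[self.mapping[lba]]


ftl = PageMappedFTL()
for i in range(100):
    ftl.write(i % 6, f"v{i}".encode())  # 6 logical pages, rewritten often
print(ftl.read(0), ftl.read(5))          # b'v96' b'v95'
print("erases per block:", ftl.erase_count)
```

Greedy victim selection minimizes the number of good pages copied per erase; real FTLs may also weigh block age and wear when choosing what to clean.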

Phase Change Memory (IBM, Samsung, Intel)

• Phase Change Memory (called PRAM or PCM)
  – Chalcogenide material can change from amorphous to crystalline state with application of heat
  – Two states have very different resistive properties
  – Similar to material used in CD-RW process

• Exciting alternative to FLASH
  – Higher speed
  – May be easy to integrate with CMOS processes

Nano-Tube Memory (NANTERO)

• Yet another possibility: Nanotube memory
  – NanoTubes between two electrodes, slight conductivity difference between ones and zeros
  – No wearout!

• Better than DRAM? (Check out optional paper I uploaded)
  – Speed of DRAM, no wearout, non-volatile!
  – Nantero promises 512Gb/die for 8Tb/chip! (with 16 die stacking)

F2FS: A New File System for Flash Storage

• File system used on a bunch of mobile devices
  – Including the Pixel 3 from Google
  – Latest version supports block-encryption for security
  – Has been “mainstream” in Linux for several years now

• Assumes a standard SSD interface (with built-in Flash Translation Layer)

• Statements:
  – Random writes are bad for flash storage
    » Sustained write performance degrades/lifetime reduced
  – Use of log-structure to maintain locality
    » Start with Log-structured file systems/copy-on-write file systems

• Contribution:
  – Design of file system to leverage and optimize the usage of NAND flash solutions
  – Comparison with Ext4, Btrfs, Nilfs2, etc.

Design Considerations:

• Flash-friendly on-disk layout
• Cost-effective index structure
• Multi-head logging
• Adaptive logging
• Fsync acceleration with roll-forward recovery
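
Multi-head and adaptive logging can be sketched as simple routing decisions. The six-log split below (hot/warm/cold for node and data blocks) follows the classification described in the F2FS paper. The string block kinds, `route`, and `logging_mode` are hypothetical names, and the 5% free-section threshold is an assumed default rather than something taken from these slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Block kind -> (log type, temperature). Each pair is one active log (segment).
# The kind strings are hypothetical labels for illustration.
SIX_LOGS: Dict[str, Tuple[str, str]] = {
    "dir_direct_node":  ("node", "hot"),   # direct node blocks of directories
    "file_direct_node": ("node", "warm"),  # direct node blocks of regular files
    "indirect_node":    ("node", "cold"),  # indirect node blocks
    "dentry_block":     ("data", "hot"),   # directory entry blocks
    "user_data":        ("data", "warm"),  # ordinary data written by users
    "cleaned_data":     ("data", "cold"),  # data moved by the cleaner
    "multimedia_data":  ("data", "cold"),  # e.g. media files, rarely rewritten
}


def route(writes: List[Tuple[int, str]]) -> Dict[Tuple[str, str], List[int]]:
    """Group incoming block writes by the log they would be appended to."""
    logs: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for block_id, kind in writes:
        logs[SIX_LOGS[kind]].append(block_id)
    return dict(logs)


def logging_mode(free_sections: int, total_sections: int,
                 k_fraction: float = 0.05) -> str:
    """Adaptive logging: append to clean segments while enough free sections
    remain; otherwise switch to threaded logging (fill holes in dirty segments).
    k_fraction=0.05 is an assumed threshold."""
    return "normal" if free_sections >= k_fraction * total_sections else "threaded"


writes = [(1, "user_data"), (2, "dentry_block"), (3, "file_direct_node"),
          (4, "multimedia_data"), (5, "user_data"), (6, "indirect_node")]
for log, ids in sorted(route(writes).items()):
    print(log, ids)
print(logging_mode(free_sections=200, total_sections=1000))  # normal
print(logging_mode(free_sections=40, total_sections=1000))   # threaded
```

Separating blocks by expected lifetime is meant to make blocks in the same segment become invalid together, so the cleaner finds mostly-empty segments; threaded logging accepts some random writes in exchange for not cleaning when free space is tight.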

Flash-friendly on-disk Layout

LFS Index Structure

F2FS Index Structure

Multi-head Logging

Evaluation

Mobile Benchmark

Server Benchmark

Multi-Head Logging

Cleaning Cost Analysis

Adaptive Logging Performance

Conclusion

Is this a good paper?

• What were the authors’ goals?
• What about the evaluation/metrics?
• Did they convince you that this was a good system/approach?
• Were there any red-flags?
• What mistakes did they make?
• Does the system/approach meet the “Test of Time” challenge?
• How would you review this paper today?

Break

A Fast and Slippery Slope for File Systems

• Use of virtual block device (dm-delay)
  – Delay every I/O request to the underlying storage for a configurable amount of time
  – Use of RAM disk underneath – leaves “at most” 5 from userspace
  – Variable latency (such as from physical disk drives) doesn’t seem to be emulated with this technique

• Evaluate file systems across a spectrum of device speeds (the User Performance Expectation Model, UPEM)
  – Simple analytical model: $\text{Throughput}(l_{dev}) = \frac{C}{l_{dev} + l_{sw}}$
  – Here, we have 3 parameters:
    » $l_{dev}$ is the device latency
    » $l_{sw}$ is the software latency (including system, filesystem, workload)
    » $C$ is a coefficient of proportionality specific to environment
  – Calibrate workloads at 5ms and 10ms device delay
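
Two calibration runs are enough to pin down the model's two unknowns. The sketch below (hypothetical helper names `fit_upem` and `predict`) solves $\text{Throughput}(l_{dev}) = C/(l_{dev}+l_{sw})$ for $C$ and $l_{sw}$ from runs at 5 ms and 10 ms device delay. The "measurements" are synthetic, generated from $C = 1200$ and $l_{sw} = 2$ ms; they are not data from the paper.

```python
def fit_upem(l1: float, t1: float, l2: float, t2: float) -> tuple:
    """Solve Throughput(l) = C / (l + l_sw) for (C, l_sw) from two runs.

    From t1 * (l1 + l_sw) = t2 * (l2 + l_sw):
        l_sw = (t2 * l2 - t1 * l1) / (t1 - t2)
    """
    l_sw = (t2 * l2 - t1 * l1) / (t1 - t2)
    c = t1 * (l1 + l_sw)
    return c, l_sw


def predict(c: float, l_sw: float, l_dev: float) -> float:
    """Expected throughput at device latency l_dev (same units as calibration)."""
    return c / (l_dev + l_sw)


# Synthetic calibration points (latency in ms, throughput in ops/s)
# generated from hypothetical values C=1200, l_sw=2.
c, l_sw = fit_upem(5.0, 1200 / 7, 10.0, 1200 / 12)
for l_dev in (10.0, 1.0, 0.1, 0.01):
    print(f"{l_dev:>5} ms -> {predict(c, l_sw, l_dev):7.1f} ops/s")
```

As $l_{dev}$ shrinks, predicted throughput approaches the ceiling $C/l_{sw}$ (600 ops/s for these synthetic numbers): once device latency is small next to software latency, faster storage buys little, which is how the model expresses performance flattening out at low device latency.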

Validation of UPEM vs Workload/Filesystem

Cross-Filesystem Evaluation

• Observations
  – Almost all file systems improve performance as underlying storage latency decreases
  – Performance flattens out under 1ms
  – EXT4 good across many environments
  – F2FS better at latencies below 1ms

• Nilfs2 has performance anomalies
  – Serious lock contention as device gets faster…

Is this a good paper?

• What were the authors’ goals?
• What about the evaluation/metrics?
• Did they convince you that this was a good system/approach?
• Were there any red-flags?
• What mistakes did they make?
• Does the system/approach meet the “Test of Time” challenge?
• How would you review this paper today?