Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin Chen Big Data Reading Group


Page 1: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Transactional Flash
V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008
Shimin Chen, Big Data Reading Group

Page 2: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Introduction
• SSDs expose the same block-level API as disks
  • A lost opportunity
• Goal: new abstractions that better match the nature of the new medium as well as the needs of file systems and databases

Page 3: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Idea: Transactional Flash (TxFlash)
• An SSD (with new features)
• Addressing: a linear array of pages
• Supports read and write operations
• Supports a simple transactional construct
  • Each transaction consists of a series of write operations
  • Atomicity
  • Isolation
  • Durability

Page 4: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Why is this useful?
• A transaction abstraction is required in many places: file system journals, etc.
• Each application implements its own
  • Complexity
  • Redundant work
  • Reliability of the implementation
• Great if the storage layer provides a transactional API

Page 5: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Previous Work: disk-based
• Copy-on-write + logging
  • Fragmentation: poor read performance
  • Checkpointing and cleaning: cleaning cost
• SSDs mitigate these problems
  • SSDs already do CoW for flash-related reasons
  • Random read accesses are fast

Page 6: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Outline
• Introduction
• The Case for TxFlash
• Commit Protocols
• Implementation
• Evaluation
• Conclusion

Page 7: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

TxFlash Architecture & API
• WriteAtomic(p1…pn)
  • Declares that pages p1…pn belong to one transaction
  • Followed by write(p1)…write(pn)
  • Atomicity, isolation, durability
• Abort
  • Aborts the in-progress transaction
• In-progress transaction
  • The caller must not issue conflicting writes
• Core of TxFlash
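The interface on this slide can be modeled with a small in-memory sketch. Everything here is an assumption made for illustration: the names `TxFlashModel`, `write_atomic`, and `abort` are hypothetical, the model allows only one transaction at a time, and it commits when the last declared page arrives. It is not the device's real command set or commit logic.

```python
class TxFlashModel:
    """Toy in-memory model of a WriteAtomic/Abort interface (illustrative only)."""

    def __init__(self, num_pages):
        self.pages = [b""] * num_pages  # committed contents, one entry per page
        self.pending = None             # page numbers declared by write_atomic
        self.staged = {}                # writes received for the open transaction

    def read(self, page):
        # Reads only ever observe committed data (isolation).
        return self.pages[page]

    def write_atomic(self, page_numbers):
        # Announce the set of pages that the following writes belong to.
        if self.pending is not None:
            raise RuntimeError("a transaction is already in progress")
        self.pending = set(page_numbers)
        self.staged = {}

    def write(self, page, data):
        if self.pending is None or page not in self.pending:
            self.pages[page] = data     # plain, non-transactional write
            return
        self.staged[page] = data
        if set(self.staged) == self.pending:
            # Last declared page arrived: publish all pages together (atomicity).
            for p, d in self.staged.items():
                self.pages[p] = d
            self.pending, self.staged = None, {}

    def abort(self):
        # Drop the in-progress transaction; committed data is untouched.
        self.pending, self.staged = None, {}


dev = TxFlashModel(4)
dev.write_atomic([0, 1])
dev.write(0, b"a")          # not yet visible: page 1 is still missing
dev.write(1, b"b")          # both pages become visible together
```

A caller such as a file system would issue `write_atomic` once per journal commit and then the page writes, which is why the slide describes the interface as backward compatible with plain reads and writes.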

Page 8: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Simple Interface
• WriteAtomic: multi-page writes
• Useful for file systems
• Not full-fledged transactions: no reads within a transaction
• Reduces complexity
• Backward compatible

Page 9: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Flash is good for this purpose
• Copy-on-write: already supported by the FTL
• Fast random reads
• High concurrency: multiple flash chips inside
• New device: a new interface is more likely to be accepted

Page 10: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Outline
• Introduction
• The Case for TxFlash
• Commit Protocols
• Implementation
• Evaluation
• Conclusion

Page 11: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Traditional Commit
• First write to a log:
  • Intention record: (data, page# & version#, transaction ID)
  • …
  • Intention record
  • Commit record
• Transaction is committed == commit record exists
• Intention records are then applied to the original data
• Once the modifications are done, the records can be garbage collected
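The commit rule on this slide can be sketched as a tiny log replay. This is a minimal sketch under stated assumptions: the record classes `Intention` and `Commit` and the function `recover` are hypothetical names, the log is an in-memory list, and versions are carried along but not otherwise used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    tx_id: int
    page: int
    version: int
    data: bytes


@dataclass(frozen=True)
class Commit:
    tx_id: int


def recover(log, pages):
    """Apply intention records of committed transactions to `pages`, in log order."""
    # A transaction counts as committed only if its commit record is in the log.
    committed = {rec.tx_id for rec in log if isinstance(rec, Commit)}
    for rec in log:
        if isinstance(rec, Intention) and rec.tx_id in committed:
            pages[rec.page] = rec.data
    return pages


log = [
    Intention(tx_id=1, page=0, version=1, data=b"A1"),
    Intention(tx_id=1, page=1, version=1, data=b"B1"),
    Commit(tx_id=1),
    Intention(tx_id=2, page=0, version=2, data=b"A2"),  # crash before the commit record
]
pages = recover(log, {})
```

Transaction 2 has no commit record, so its intention is ignored and page 0 keeps the data written by transaction 1.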

Page 12: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Traditional Commit on SSDs
• Optimizations:
  • All writes can be issued in parallel
  • Do not update the original data; just update the remap table
• Problem: commit record
  • Extra latency after the other writes
  • Garbage collection is complicated: must know whether all the updates have completed

Page 13: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

New Proposal (1): Simple Cyclic Commit
• No commit record
• Intention records of the same transaction use next links to form a cycle
  • (data, page# & version#, next page# & version#)
• Transaction is committed == all intention records are written
• Flash page (4KB) + metadata (128B) are co-located
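The cycle of next links can be sketched as follows. This is an illustrative sketch, not the paper's implementation: `Intention`, `make_cycle`, and `cycle_complete` are hypothetical names, flash is modeled as a dictionary keyed by (page, version), and the check assumes no record of the transaction has been garbage collected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    page: int
    version: int
    next_page: int
    next_version: int
    data: bytes


def make_cycle(writes):
    """writes: list of (page, version, data). Link each record to the next, last to first."""
    n = len(writes)
    records = []
    for i, (page, version, data) in enumerate(writes):
        nxt_page, nxt_version, _ = writes[(i + 1) % n]
        records.append(Intention(page, version, nxt_page, nxt_version, data))
    return records


def cycle_complete(start, stored):
    """Follow next links from `start`; True if they lead back to `start`.

    stored: dict mapping (page, version) -> Intention found on flash.
    """
    key = (start.page, start.version)
    cur, seen = start, set()
    while True:
        nxt = (cur.next_page, cur.next_version)
        if nxt == key:
            return True          # every record of the transaction was written
        if nxt not in stored or nxt in seen:
            return False         # a record is missing (or the links are malformed)
        seen.add(nxt)
        cur = stored[nxt]


records = make_cycle([(0, 1, b"a"), (1, 1, b"b"), (2, 1, b"c")])
on_flash = {(r.page, r.version): r for r in records}
```

A single-page transaction links to itself, so it is committed as soon as its one record is written.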

Page 14: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Problem

Page 15: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Solution: Any uncommitted intention on the stable storage must be erased before any new writes are issued to the same or a referenced page

Page 16: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Operations
• Initialization: set version# to 0 and next-link to self
• Transaction
• Garbage collection:
  • Any uncommitted intention
  • A committed page, if a newer version is committed
• Recovery: scan all pages, then look for cycles
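The recovery step can be sketched as a scan followed by a walk along next links. This is a simplified sketch and rests on assumptions: the function name `recover_scc` and the dictionary layout are hypothetical, and the rule for a missing link target (a newer stored version of the linked page implies the transaction committed, an older one implies it did not) is my reading of how the erase-before-write invariant is used, not something stated on the slide.

```python
def recover_scc(stored):
    """Pick the last committed version of each page after a crash.

    stored: {(page, version): (next_page, next_version)} found by scanning flash.
    Returns {page: version}.
    """
    highest = {}
    for page, version in stored:
        highest[page] = max(version, highest.get(page, -1))

    def committed(page, version):
        start = (page, version)
        cur = start
        while True:
            nxt = stored[cur]
            if nxt == start:
                return True              # walked the whole cycle
            top = highest.get(nxt[0], -1)
            if top > nxt[1]:
                return True              # linked page was rewritten later (assumed rule)
            if top < nxt[1]:
                return False             # linked intention was never written
            cur = nxt                    # linked intention is present: keep walking

    result = {}
    for page, top in highest.items():
        if committed(page, top):
            result[page] = top
        else:
            # Assumed from the invariant: older stored versions are committed.
            older = [v for (p, v) in stored if p == page and v < top]
            if older:
                result[page] = max(older)
    return result


flash = {
    ("A", 1): ("B", 1), ("B", 1): ("A", 1),  # first transaction, fully written
    ("A", 2): ("B", 2),                      # second transaction, B2 never written
}
state = recover_scc(flash)
```

In this example A2 is rejected because B2 is missing, so recovery falls back to A1, while B1 is accepted.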

Page 17: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

New Proposal (2): Back Pointer Cyclic Commit
• Another way to deal with ambiguity
• Intention record: (data, page# & version#, next-link, link to last committed version)

Page 18: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

A3 is a straddler of A2

Some complexity in garbage collection and recovery because of this
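The straddler relation can be illustrated with a short check. The definition used here is an assumption, since the slide gives only the example: an intention with version k and back pointer to version i is taken to straddle every stored version j of the same page with i < j < k, and a straddled intention is treated as uncommitted. The function name `straddled` and the dictionary layout are hypothetical.

```python
def straddled(stored):
    """Return the set of (page, version) intentions that some newer intention straddles.

    stored: {(page, version): back_version}, where back_version is the version the
    back pointer names (the last committed version of that page at write time).
    """
    result = set()
    for (page, k), back in stored.items():
        for other_page, j in stored:
            # A newer intention skipped over version j, so j was not committed.
            if other_page == page and back < j < k:
                result.add((page, j))
    return result


flash = {
    ("A", 1): 0,  # A1 points back to the initial version
    ("A", 2): 1,  # A2 points back to A1
    ("A", 3): 1,  # A3 also points back to A1, so it straddles A2
}
uncommitted = straddled(flash)
```

With these back pointers the check reports A2 as straddled, matching the slide's example that A3 is a straddler of A2.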

Page 19: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Protocol Comparison

Page 20: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Outline
• Introduction
• The Case for TxFlash
• Commit Protocols
• Implementation
• Evaluation
• Conclusion

Page 21: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Implementation
• Simulator
  • DiskSim
  • Trace-driven SSD simulator (USENIX'08)
  • Modifications for TxFlash
  • Supports transactions of maximum size 4MB
• Pseudo-device driver for recording traces
• TxExt3:
  • Employs TxFlash for the Ext3 file system
  • Transaction: Ext3 journal commit

Page 22: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Experimental Setup
• TxFlash device:
  • 32GB: 8 x 4GB flash packages
  • 4 I/O operations within every flash package
  • 15% of space reserved for garbage collection
• Workloads on top of Ext3:
  • IOzone: micro benchmark (no sync writes)
  • Linux-build (no sync writes)
  • Maildir (sync writes)
  • TPC-B: simulates 10,000 credit-debit-like operations on the TxExt3 file system (sync writes)
• Synthetic workloads

Page 23: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Cyclic commit vs. Traditional commit

Page 24: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Unlike database logging, transaction sizes are large: no sync writes, and data are included

Page 25: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

• simple cyclic commit has a high cost if there are aborts

Page 26: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…
Page 27: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

TxFlash vs. SSD
• Remove WriteAtomic from traces
• Use SSD simulator
• SSD does not provide any transaction guarantees (so should have better performance)

Page 28: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Space comparison: TxFlash needs 25% more main memory than SSD
• 4+1 MB per 4GB flash: 40 MB for the 32GB TxFlash device
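The memory figures can be checked with a few lines of arithmetic. The reading of "4+1 MB" as 4 MB for a plain SSD plus 1 MB extra for TxFlash per 4GB package is an assumption that happens to be consistent with the stated 25%; the package count of 8 comes from the experimental setup. Variable names are illustrative.

```python
packages = 8                   # 32GB device built from 8 x 4GB flash packages
ssd_mb_per_package = 4         # assumed: memory a plain SSD needs per package
txflash_extra_mb = 1           # assumed: additional memory TxFlash needs per package

ssd_total_mb = packages * ssd_mb_per_package
txflash_total_mb = packages * (ssd_mb_per_package + txflash_extra_mb)
overhead = (txflash_total_mb - ssd_total_mb) / ssd_total_mb

print(ssd_total_mb, txflash_total_mb, f"{overhead:.0%}")
```

Under these assumptions the totals are 32 MB for the SSD and 40 MB for TxFlash, a 25% increase.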

Page 29: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

End-to-end performance
• TxFlash:
  • Run the pseudo-device driver on a real SSD
  • The performance is close to that of TxFlash
• Ext3:
  • Use SSD as journal
• SSD cache is disabled in both cases

Page 30: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…
Page 31: Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin…

Summary
• TxFlash:
  • Adds a transaction interface to the SSD
  • Cyclic commit protocols
• Nice solution for file system journaling