from backups to time travel: a systems perspective on snapshots

From Backups To Time Travel: A Systems Perspective on Snapshots Ross Shaull

Where it comes from A Modular and Efficient Past State System for Berkeley DB. Ross Shaull, Liuba Shrira, Barbara Liskov. USENIX 2014. Retro: A Methodology for Retrospection Everywhere (PhD thesis). Ross Shaull. 2013. Skippy: a new snapshot indexing method for time travel in the storage manager. Ross Shaull, Liuba Shrira, Hao Xu. SIGMOD 2008.

Talk plan Define snapshots and programming model Architecture Compare with other approaches Formal model for deriving snapshot protocols How snapshots are saved and accessed A peek at why snapshots are recoverable Challenges and opportunities in a distributed system Speculation and pronouncement

Snapshots and Retrospection Past states of data can provide insights trend analysis anomaly and intrusion detection Auditing may require past-state retention Saving consistent past states (snapshots) is challenging and not available in all data stores Retrospection = point in time query + programming model

Snapshots are Consistent Global Named Application-declared June 2014 Salary: $200K

Snapshots are Consistent Global Named Application-declared June 2013 Salary: $120K June 2013 Snapshot

Programming Model begin; insert into accounts values(); update accounts set balance=0 where name=Tom; snapshot; commit; select * from accounts where name = Tom as of S;

Backup Separate copy of data Usually stored off-site Think of it as a copy of a snapshot Can also be incremental of course Snapshots and backup are different, an in-situ snapshot does not protect from catastrophic storage loss, a backup does not enable retrospection

Architecture Application Database Interface Transactional storage manager Snapshot layer WAL Page cache MVCC DB Disk Retro Disk as of snap Access methods / indexes Snapshot layer works at the page level, inside the TSM

Protocol extensions Application Database Interface Transactional storage manager Snapshot layer WAL Page cache MVCC DB Disk Retro Disk as of snap Access methods / indexes Retro recovery Retro page cache Snapshot page translation Concurrency for retrospection Efficient COW

Why this design for Retro? Logical-level snapshots require significant modifications to the data store Application Database Interface TSM Snapshot layer DB Disk Retro Disk as of snap Access methods / indexes

Why this design for Retro? Logical-level snapshots require significant modifications to the data store With low-level snapshots, it can be expensive to get consistency Application Database Interface TSM Snapshot layer DB Disk Retro Disk as of snap Access methods / indexes

Why this design for Retro? Logical-level snapshots require significant modifications to the data store With low-level snapshots, its expensive to get consistency Retro is not too high or too low Simple integration and non- disruptive Application Database Interface TSM Snapshot layer DB Disk Retro Disk as of snap Access methods / indexes

An interlude Well look at a snapshot strategy that depends on a system layer below the database What are its strengths and weaknesses? Why implement inside the database? Well also quickly review some database concepts

Atomic file system snapshot Below the level of database transactions Can provide crash consistent snapshots An alternative with limitations Before we talk about limitations, lets look at why it provides crash-consistency

Atomicity and Durability A and D in ACID Modern relational databases typically achieve them with transactions which are logged to a write-ahead log (WAL) Visibility of this object is which letter in ACID? Page cache Journal Data

Atomicity and Durability

Atomic file system snapshot Strengths Simple: compose it from existing building blocks Weaknesses Snapshot consistent but may not be precise Must perform recovery before retrospection Requires file system support

Retro snapshots, copy-on-write T1: update P1, P2, declare S2 T2: update P1 P1 P2 Database Retro P1 P2 P1 P1 P2 P1 S1 S2 Transactions

Retro snapshots Retrospection as of snapshot S is equivalent to a hypothetical current-state query that runs immediately after S was declared T1: update P1, P2, declare S2 T2: update P1 P1 P2 Database Retro P1 P2 P1 P1 P2 P1 S1 S2 Transactions

Overwrite sequence (OWS) OWS(H) is a tagging of history H which page pre-states to save the snapshot pages a retrospective query accesses which pre-states and snapshot declarations to recover A Modular and Efficient Past State System for Berkeley DB. Ross Shaull, Liuba Shrira, Barbara Liskov. USENIX 2014.

Protocol extension: MVCC Concurrent access to current state and snapshots Efficient copying of snapshots Retrospection runs using MVCC and page requests are redirected to snapshot pages that have migrated to Retro disk

Retrospection (querying as of) Snapshot Layer Access methods / indexes Page cache Page name translation Database disk Retro disk Interface as of Look up (DBFile, P)@S1 Get (DBFile, P) Get (RetroFile, X) @S1

Current state queries Snapshot Layer Access methods / indexes Page cache Page name translation Database disk Retro disk Interface Get (DBFile, P)

Protocol extensions: Recovery Like the database, snapshots are written asynchronously (non-disruptiveness) Retro saves pre-states during BDB recovery Snapshot declarations are also logged Identify needed pre-states using Retro metadata during recovery

Protocol extension: Recovery Runtime invariants Snapshots are made durable first (WAS-invariant) Recovery-time extension Recover snapshot metadata first Idempotent we can tell if pre-state was saved already (using snapshot metadata) only recreate pre-states not saved yet

Experimental evaluation Implemented as a set of callbacks About 250 lines of modifications to BDB source Call into about 5000 lines of snapshot layer code Retro is thread-safe Care taken to follow OWS(H) order in face of concurrency

Experimental Results Database and snapshot data are written to one disk, logs to the other Database size is 1 gb Snapshot store on Retro disk can be >100 gb Non-disruptiveness Random update workload with and without Retro With Retro, declare snapshot after every transaction Enforcing invariants for snapshot durability imposes about 4% overhead on throughput

Retrospection: I/O build page table query 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 50/50 80/20 90/10 50/50 80/20 90/10 costintermsofQ skew 100mb 1gb

Now about that future Distributed systems can support snapshots for example HDFS (distributed block store) has COW snapshots Cassandra can do EC backups Many systems must be quiesced first We want inexpensive snapshots with online, in-situ retrospection?

Replicated systems, MVCC If there is correspondence between histories on replicas, then a replica can save snapshots A replica saving snapshots eventually receives a record of all transactions If a distributed system provides snapshot isolation, MVCC can be exploited for retrospection Exploit visibility mechanisms built into system to determine which objects belong to each snapshot

We are here at NuoDB after all TE TE SM SM Distributed durable cache

Lets speculate TE TE SM SM Transaction tier Durability tier

Retrospection Transaction tier Durability tier AS OF query Miss that requires going to an SM

Retrospection Transaction tier Durability tier AS OF query Correct version of object AS OF snapshot Visibility computed with MVCC

Opportunities Adding the capability to store and query snapshots is similar to adding capacity Multiple replicas can improve the performance of saving and accessing snapshots Snapshot workload can be physically separated from current-state workload within the same database

Conclusions Retro is a novel design for adding retrospection Supports powerful programming model Non-disruptive, long-lived snapshots Layered design key to simple design and implementation Protocol extensions make it efficient, see paper Adding snapshots to a distributed system is a natural extension with opportunities to improve coexistence of current and past states

Thank You! Questions? e: [email protected] t:@nuodb www.nuodb.com

from backups to time travel: a systems perspective on snapshots

Software

declare s2

distributed

ross shaull

snapshot pages

berkeley db

update p1

barbara liskov

liuba shrira