DATABASE SYSTEM IMPLEMENTATION
GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ
LECTURE #2: IN-MEMORY DATABASES
TODAY’S AGENDA
Background
In-Memory DBMS Architectures
Early Notable In-Memory DBMSs
LAST CLASS
History of DBMSs
→ In a way though, it really was a history of data models
Data Models
→ Hierarchical data model (tree) (IMS)
→ Network data model (graph) (CODASYL)
→ Relational data model (tables) (System R, INGRES)
Overarching theme about all these systems
→ They were all disk-based DBMSs
BACKGROUND
Much of the history of DBMSs is about dealing with the limitations of hardware.
Hardware was much different when the original DBMSs were designed:
→ Uniprocessor (single-core CPU)
→ RAM was severely limited (few MB).
→ The database had to be stored on disk.
→ Disk is slow. No seriously, I mean really slow.
BACKGROUND
But now DRAM capacities are large enough that most databases can fit in memory.
→ Structured data sets are smaller (e.g., tables with schema).
→ Unstructured or semi-structured data sets are larger (e.g., videos, log files).
So why not just use a "traditional" disk-oriented DBMS with a really large cache?
DISK-ORIENTED DBMS
The primary storage location of the database is on non-volatile storage (e.g., HDD, SSD).
→ The database is stored in a file as a collection of fixed-length blocks called slotted pages on disk.
The system uses an in-memory (volatile) buffer pool to cache blocks fetched from disk.
→ Its job is to manage the movement of those blocks back and forth between disk and memory.
BUFFER POOL
When a query accesses a page, the DBMS checks to see if that page is already in memory:
→ If it’s not, then the DBMS has to retrieve it from disk and copy it into a free frame in the buffer pool.
→ If there are no free frames, then find a page to evict guided by the page replacement policy.
→ If the page being evicted is dirty, then the DBMS has to write it back to disk to ensure the durability (ACID) of data.
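The fetch, evict, and write-back steps above can be sketched in a few lines. This is a toy illustration, not how any particular DBMS implements its buffer pool: the names `BufferPool`, `fetch`, and `mark_dirty` are hypothetical, a dict stands in for the on-disk file, and LRU is assumed as the replacement policy.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool: page_id -> page bytes, with LRU eviction (illustrative only)."""

    def __init__(self, disk, num_frames):
        self.disk = disk                  # dict standing in for the database file
        self.num_frames = num_frames
        self.frames = OrderedDict()       # page_id -> bytearray, least recent first
        self.dirty = set()

    def fetch(self, page_id):
        if page_id in self.frames:        # hit: the page is already in memory
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        if len(self.frames) >= self.num_frames:
            self._evict()                 # no free frame: pick a victim first
        self.frames[page_id] = bytearray(self.disk[page_id])
        return self.frames[page_id]

    def mark_dirty(self, page_id):
        self.dirty.add(page_id)

    def _evict(self):
        victim, data = self.frames.popitem(last=False)   # least recently used
        if victim in self.dirty:          # a dirty page is written back before reuse
            self.disk[victim] = bytes(data)
            self.dirty.discard(victim)


disk = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
pool = BufferPool(disk, num_frames=2)
page0 = pool.fetch(0)
page0[0:1] = b"X"
pool.mark_dirty(0)
pool.fetch(1)
pool.fetch(2)   # evicts page 0, which is dirty and is written back first
```

With two frames, the third fetch forces page 0 out, and its modified contents reach the simulated disk only at that point.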
PAGE REPLACEMENT POLICY
Page replacement policy is a differentiating factor between open-source and commercial DBMSs.
→ What kind of data does it contain?
→ Is the page dirty?
→ How likely is the page to be accessed in the near future?
→ Examples: LRU, LFU, CLOCK, ARC (Adaptive Replacement Cache)
More Information on Page Replacement Policies: Wikipedia
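As one concrete instance of the policies listed above, here is a minimal sketch of CLOCK, which approximates LRU with one reference bit per frame. The class and method names (`ClockReplacer`, `touch`, `victim`) are made up for illustration.

```python
class ClockReplacer:
    """CLOCK approximation of LRU over a fixed number of frames."""

    def __init__(self, num_frames):
        self.ref = [False] * num_frames   # one reference bit per frame
        self.hand = 0

    def touch(self, frame):
        self.ref[frame] = True            # set on every access to the frame

    def victim(self):
        # Sweep the hand: clear set bits (second chance), evict the first clear one.
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % len(self.ref)
            if self.ref[frame]:
                self.ref[frame] = False
            else:
                return frame


clock = ClockReplacer(3)
clock.touch(0)
clock.touch(2)
first = clock.victim()    # frame 1 was never touched, so it goes first
second = clock.victim()   # the sweep cleared frame 0's bit earlier, so it is next
```

Unlike exact LRU, CLOCK does not reorder anything on access; it only flips a bit, which keeps the hot path cheap.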
BUFFER POOL
Once the page is in memory, the DBMS translates any on-disk addresses to their in-memory addresses.
(Page Identifier)   (Page Pointer)
[#100]              [0x5050]
DATA ORGANIZATION
[Figure, animated over several slides: an Index whose entries hold Page Id + Slot #, a Page Table, a Buffer Pool (holding pages such as page6 and page4), and the Database (On-Disk) stored as Slotted Pages (page0, page1, page2).]
BUFFER POOL
Every tuple access has to go through the buffer pool manager regardless of whether that data will always be in memory.
→ Always have to translate a tuple’s record id to its memory location.
→ Worker thread has to pin pages that it needs to make sure that they are not swapped to disk.
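A rough sketch of the two costs named above: translating a record id (page id + slot) to an in-memory location, and pinning the page while it is in use. The names `Frame`, `PageTable`, and `read_tuple` are hypothetical, and the sketch assumes the page is already resident.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    slots: list                 # tuples stored in the page, indexed by slot number
    pin_count: int = 0          # > 0 means the page must not be evicted


@dataclass
class PageTable:
    frames: dict = field(default_factory=dict)   # page_id -> Frame

    def pin(self, page_id):
        frame = self.frames[page_id]   # a real system would fetch from disk on a miss
        frame.pin_count += 1
        return frame

    def unpin(self, page_id):
        self.frames[page_id].pin_count -= 1

    def evictable(self, page_id):
        return self.frames[page_id].pin_count == 0


def read_tuple(table, record_id):
    """Translate (page_id, slot) to the in-memory tuple, pinning the page while reading."""
    page_id, slot = record_id
    frame = table.pin(page_id)
    try:
        return frame.slots[slot]
    finally:
        table.unpin(page_id)


table = PageTable({100: Frame(slots=["alice", "bob"])})
value = read_tuple(table, (100, 1))
```

Every tuple read pays for a page-table lookup plus a pin and an unpin, even when the page never leaves memory.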
CONCURRENCY CONTROL
In a disk-oriented DBMS, the system assumes that a txn could stall at any time when it tries to access data that is not in memory.
Execute other txns at the same time so that if one txn stalls then others can keep running.
→ This is not because the DBMS is trying to use all cores in the CPU. We are still focusing on single-core CPUs.
→ We do this to let the system make forward progress by executing another txn while the current txn is waiting for data to be fetched from disk.
CONCURRENCY CONTROL
Concurrency control policy
→ Responsible for deciding how to interleave the operations of concurrent transactions in such a way that it appears as if they are running one after another.
→ This property is referred to as serializability of transactions.
→ Has to set locks and latches to ensure the highest level of isolation (ACID) between transactions.
→ Locks are stored in a separate data structure (lock table) to avoid being swapped to disk.
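To make the lock table concrete, here is a minimal sketch with only shared (S) and exclusive (X) modes. It is a simplification under stated assumptions: `LockTable`, `acquire`, and `release_all` are hypothetical names, there is no waiting queue or deadlock handling, and a conflicting request simply returns `False`.

```python
from collections import defaultdict

COMPATIBLE = {("S", "S")}     # only shared/shared may coexist on one object


class LockTable:
    """Toy lock manager: object id -> list of (txn, mode) holders."""

    def __init__(self):
        self.holders = defaultdict(list)

    def acquire(self, txn, obj, mode):
        for other, held in self.holders[obj]:
            if other != txn and (held, mode) not in COMPATIBLE:
                return False              # conflict: the caller must wait or abort
        self.holders[obj].append((txn, mode))
        return True

    def release_all(self, txn):
        # Locks are held for the whole transaction and dropped at commit or abort.
        for obj in list(self.holders):
            self.holders[obj] = [h for h in self.holders[obj] if h[0] != txn]


locks = LockTable()
r1 = locks.acquire("T1", "tuple-7", "S")
r2 = locks.acquire("T2", "tuple-7", "S")
w3 = locks.acquire("T3", "tuple-7", "X")   # refused while the shared locks are held
locks.release_all("T1")
locks.release_all("T2")
w3_retry = locks.acquire("T3", "tuple-7", "X")
```

The table lives apart from the tuples it protects, which is what lets a disk-oriented system keep it in memory while data pages come and go.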
LOCKS VS. LATCHES
Locks
→ Protect the database's logical contents (e.g., tuple, table) from other txns.
→ Held for txn duration.
→ Need to be able to roll back changes.
Latches
→ Protect the DBMS's internal physical data structures (e.g., page table) from other threads.
→ Held for operation duration.
→ Do not need to be able to roll back changes.
A SURVEY OF B-TREE LOCKING TECHNIQUES, TODS 2010
LOCKS VS. LATCHES
             Locks                                   Latches
Separate…    User transactions                       Threads
Protect…     Database Contents                       In-Memory Data Structures
During…      Entire Transactions                     Critical Sections
Modes…       Shared, Exclusive, Update, Intention    Read, Write
Deadlock     Detection & Resolution                  Avoidance
…by…         Waits-for, Timeout, Aborts              Coding Discipline
Kept in…     Lock Manager                            Protected Data Structure
Source: Goetz Graefe
LOGGING & RECOVERY
This protocol is adopted by the DBMS to ensure the atomicity and durability properties (ACID)
→ Durability: Changes made by committed transactions must be present in the database after recovering from a power failure.
→ Atomicity: Changes made by uncommitted (in-progress/aborted) transactions must not be present in the database after recovering from a power failure.
LOGGING & RECOVERY
Most DBMSs use STEAL + NO-FORCE buffer pool policies.
→ STEAL: DBMS can flush pages dirtied by uncommitted transactions to disk.
→ NO-FORCE: DBMS is not required to flush all pages dirtied by committed transactions to disk.
→ So all page modifications have to be flushed to the write-ahead log (WAL) before a txn can commit.
LOGGING & RECOVERY
Each log entry contains the before and after images of modified tuples.
→ STEAL: Modifications made by uncommitted transactions that are flushed to disk have to be rolled back.
→ NO-FORCE: Modifications made by committed transactions might not have been flushed to disk.
→ Recording the before and after images in the log is critical to ensuring the atomicity and durability properties.
→ Lots of work to keep track of log sequence numbers (LSNs) all throughout the DBMS.
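The role of the two images can be shown with a deliberately simplified recovery routine: redo uses after images for committed txns (covering NO-FORCE), and undo uses before images for uncommitted ones (covering STEAL). This is a sketch, not a full recovery protocol: the record layout and the `recover` function are assumptions, there are no LSNs or checkpoints, and it assumes concurrent txns never wrote the same key.

```python
def recover(disk, log):
    """Redo committed changes, then undo changes of uncommitted txns.

    Log records are ("UPDATE", txn, key, before, after) or ("COMMIT", txn).
    """
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in log:                       # redo pass, in log order
        if rec[0] == "UPDATE" and rec[1] in committed:
            _, _, key, _, after = rec
            disk[key] = after
    for rec in reversed(log):             # undo pass, newest record first
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _, key, before, _ = rec
            disk[key] = before
    return disk


log = [
    ("UPDATE", "T1", "A", 1, 10),
    ("UPDATE", "T2", "B", 2, 20),
    ("COMMIT", "T1"),
]
# Crash state: T1's page was never flushed (NO-FORCE); T2's dirty page was (STEAL).
disk_after_crash = {"A": 1, "B": 20}
recovered = recover(dict(disk_after_crash), log)
```

After recovery, T1's committed write to A is present and T2's uncommitted write to B is gone.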
DISK-ORIENTED DBMS OVERHEAD
Measured CPU Instructions
→ BUFFER POOL: 34%
→ LATCHING: 14%
→ LOCKING: 16%
→ LOGGING: 12%
→ B-TREE KEYS: 16%
→ REAL WORK: 7%
OLTP THROUGH THE LOOKING GLASS, AND WHAT WE FOUND THERE, SIGMOD, pp. 981-992, 2008.
TAKEAWAYS
Disk-oriented DBMSs do a lot of extra stuff because they are predicated on the assumption that data has to reside on disk
In-memory DBMSs maximize performance by optimizing these protocols and algorithms
IN-MEMORY DBMSS
Assume that the primary storage location of the database is permanently in memory.
Early ideas were proposed in the 1980s, but the approach is now feasible because DRAM prices are low and capacities are high.
BOTTLENECKS
If I/O is no longer the slowest resource, much of the DBMS’s architecture will have to change to account for other bottlenecks:
→ Locking/latching
→ Cache misses
→ Pointer chasing (e.g., virtual function lookup tables)
→ Predicate evaluations
→ Data movement & copying (e.g., multi-socket machine)
→ Networking (between application & DBMS)
STORAGE ACCESS LATENCIES
                 L3        DRAM     SSD           HDD
Read Latency     ~20 ns    60 ns    25,000 ns     10,000,000 ns
Write Latency    ~20 ns    60 ns    300,000 ns    10,000,000 ns
LET’S TALK ABOUT STORAGE & RECOVERY METHODS FOR NON-VOLATILE MEMORY DATABASE SYSTEMS, SIGMOD, pp. 707-722, 2015.
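To put the read latencies in the table in proportion, a few lines of arithmetic help. The dictionary simply copies the read row of the table; the `slowdown` helper is a hypothetical name.

```python
# Read latencies in nanoseconds, copied from the table.
READ_NS = {"L3": 20, "DRAM": 60, "SSD": 25_000, "HDD": 10_000_000}


def slowdown(device, baseline="DRAM"):
    """How many times slower a read is than a read from the baseline device."""
    return READ_NS[device] / READ_NS[baseline]


ssd_vs_dram = slowdown("SSD")   # roughly 417x
hdd_vs_dram = slowdown("HDD")   # roughly 166,667x
```

On these numbers, a single HDD read costs about as much as 167 thousand DRAM reads, which is why a disk-oriented design hides I/O stalls at almost any CPU cost.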
IN-MEMORY DATABASES
Jim Gray’s analogy:
→ Reading from L3 cache: Reading a book on a table
→ Reading from HDD: Flying to Pluto to read that book
Because everything fits in DRAM, we can do more sophisticated things in software.
DATA ORGANIZATION
An in-memory DBMS does not need to store the database in slotted pages but it will still organize tuples in blocks/pages:
→ Direct memory pointers vs. tuple identifiers
→ Separate pools for fixed-length (e.g., date of birth) and variable-length data (e.g., medical notes)
→ Use checksums to detect software errors from trashing the database.
The OS organizes memory in pages too. We will cover this later.
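A small sketch of the split between fixed-length and variable-length pools. Python has no raw pointers, so a byte offset into the variable-length pool stands in for the direct memory address a real system would store. The slot layout and the `Table` class are assumptions made for illustration.

```python
import struct

# Fixed-length slot: 4-byte id, then an 8-byte offset and 4-byte length
# that locate the variable-length part in the separate pool.
SLOT = struct.Struct("<IQI")


class Table:
    def __init__(self):
        self.fixed = bytearray()     # fixed-length data block
        self.varlen = bytearray()    # variable-length data pool

    def insert(self, tuple_id, notes):
        data = notes.encode("utf-8")
        offset = len(self.varlen)
        self.varlen += data
        self.fixed += SLOT.pack(tuple_id, offset, len(data))
        return len(self.fixed) // SLOT.size - 1      # index of the new slot

    def read(self, slot):
        tuple_id, offset, length = SLOT.unpack_from(self.fixed, slot * SLOT.size)
        return tuple_id, self.varlen[offset:offset + length].decode("utf-8")


t = Table()
first = t.insert(1, "short note")
second = t.insert(2, "a much longer free-text medical note")
row = t.read(second)
```

Because every slot in the fixed-length block has the same size, slot `n` is found by multiplication alone, with no slot directory to consult.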
DATA ORGANIZATION
[Figure, animated over several slides: an Index whose entries hold a Memory Address, Fixed-Length Data Blocks, and Variable-Length Data Blocks.]
WHY NOT MMAP?
Memory-map (mmap) a database file into DRAM and let the OS be in charge of swapping data in and out as needed.
Use madvise and msync to give hints to the OS about what data is safe to flush.
Notable mmap DBMSs:
→ MongoDB (pre WiredTiger)
→ MonetDB
→ LMDB
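For readers who have not used mmap, a minimal sketch with Python's standard `mmap` module shows the basic idea: the file is written through ordinary memory stores, and `flush()` asks the OS to write the dirty pages back (on POSIX systems it is implemented with `msync`). The temporary file and the two-page size are arbitrary choices for the example.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# Create a small "database file" made of two zero-filled pages.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * (2 * PAGE))

with mmap.mmap(fd, 2 * PAGE) as db:
    db[PAGE:PAGE + 5] = b"hello"     # a plain memory write into the second page
    db.flush()                        # ask the OS to write dirty pages to the file
os.close(fd)

# Read the file back through normal file I/O to confirm the write landed.
with open(path, "rb") as f:
    f.seek(PAGE)
    on_disk = f.read(5)
os.remove(path)
```

Note what the program never does: it never decides which page is resident or when a page is evicted. That decision belongs entirely to the OS.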
WHY NOT MMAP?
Using mmap gives up fine-grained control on the contents of memory to the OS.
→ Cannot perform non-blocking memory access.
→ The "on-disk" representation has to be the same as the "in-memory" representation.
→ The DBMS has no way of knowing what pages are in memory or not.
→ Various mmap-related syscalls are not portable.
A well-written DBMS always knows best.
CONCURRENCY CONTROL
Observation: The cost of a txn acquiring a lock is the same as accessing data (since the lock data is also in memory).
In-memory DBMS may want to detect conflicts between txns at a different granularity.
→ Fine-grained locking allows for better concurrency but requires more locks.
→ Coarse-grained locking requires fewer locks but limits the amount of concurrency.
CONCURRENCY CONTROL
The DBMS can store locking information about each tuple together with its data.
→ This helps with CPU cache locality.
→ Mutexes are too slow. Need to use CAS instructions.
Disk-oriented DBMSs: Stalling during disk I/O
Memory-oriented DBMSs: New bottleneck is contention caused by txns executing on multiple cores trying to access data at the same time.
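The idea of a lock word stored in the tuple header and updated with compare-and-swap can be sketched as follows. Python does not expose a hardware CAS instruction, so `compare_and_swap` here is an emulation guarded by a `threading.Lock`; it shows only the protocol (swap if the word still holds the expected value), not the performance. All names are hypothetical.

```python
import threading

_cas_guard = threading.Lock()   # stands in for the atomicity the hardware provides


def compare_and_swap(tup, expected, new):
    """Emulated CAS on the tuple's lock word; real systems use a single CPU instruction."""
    with _cas_guard:
        if tup["lock"] == expected:
            tup["lock"] = new
            return True
        return False


def try_lock(tup, txn_id):
    return compare_and_swap(tup, 0, txn_id)    # 0 means unlocked


def unlock(tup, txn_id):
    return compare_and_swap(tup, txn_id, 0)


# The lock word lives in the tuple itself, next to the data it protects.
row = {"lock": 0, "id": 7, "balance": 100}
got_t1 = try_lock(row, txn_id=1)
got_t2 = try_lock(row, txn_id=2)     # fails: txn 1 holds the lock
unlock(row, txn_id=1)
got_t2_retry = try_lock(row, txn_id=2)
```

A failed `try_lock` returns immediately instead of blocking, so the txn can decide for itself whether to retry, wait, or abort.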
INDEXES
Specialized main-memory indexes (e.g., T-Tree) were proposed in the 1980s when cache and memory access speeds were roughly equivalent.
But then caches got faster than main memory:
→ Memory-optimized indexes performed worse than the B+trees because they were not cache aware.
Indexes are usually rebuilt in an in-memory DBMS after restart to avoid logging overhead.
QUERY PROCESSING
SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND B.value > 100
[Figure: query plan with π (A.id, B.value) at the top, ⨝ (A.id=B.id) below it, and inputs A and σ (value>100) over B.]
Tuple-at-a-time
→ Each operator calls next on its child to get the next tuple to process.
Operator-at-a-time
→ Each operator materializes its entire output for its parent operator.
Vector-at-a-time
→ Each operator calls next on its child to get the next chunk of data to process.
QUERY PROCESSING
The best strategy for executing a query plan in a DBMS changes when all of the data is already in memory.
→ Sequential scans are no longer significantly faster than random access.
The traditional tuple-at-a-time iterator model is too slow because of function calls.
→ This problem is more significant in OLAP DBMSs.
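The difference between the tuple-at-a-time iterator model and the vector-at-a-time alternative can be sketched with Python generators. Only a scan and a selection (value > 100) are shown; the operator names and the sample rows are invented for the example.

```python
def scan(rows):
    for row in rows:            # tuple-at-a-time: one next() call per tuple
        yield row


def select(child, pred):
    for row in child:
        if pred(row):
            yield row


def scan_vec(rows, size):
    for i in range(0, len(rows), size):   # vector-at-a-time: one next() per chunk
        yield rows[i:i + size]


def select_vec(child, pred):
    for chunk in child:
        out = [row for row in chunk if pred(row)]
        if out:                 # skip chunks in which no tuple qualifies
            yield out


B = [{"id": i, "value": v} for i, v in enumerate([50, 150, 99, 300, 101])]
pred = lambda row: row["value"] > 100

tuples = list(select(scan(B), pred))
vectors = list(select_vec(scan_vec(B, size=2), pred))
```

Both pipelines return the same tuples, but the vectorized one crosses each operator boundary once per chunk rather than once per tuple, which is where the function-call overhead is saved.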
LOGGING & RECOVERY
The DBMS still needs a WAL on non-volatile storage since the system could halt at any time.
→ Use group commit to batch log entries and flush them together to amortize fsync cost.
→ May be possible to use more lightweight logging schemes (e.g., only store redo information, NO-STEAL).
But since there are no "dirty" pages, there is no need to maintain LSNs all throughout the system.
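Group commit can be sketched as a log writer that buffers records and issues one `fsync` per batch. This is an illustration under simple assumptions: `GroupCommitLog` is a hypothetical class, batching is by record count only (real systems usually also flush on a timer), and the log is a temporary file.

```python
import os
import tempfile


class GroupCommitLog:
    """Buffers log records and makes a whole batch durable with a single fsync."""

    def __init__(self, path, batch_size):
        self.file = open(path, "ab")
        self.batch_size = batch_size
        self.pending = []
        self.fsyncs = 0

    def append(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        self.file.write(b"".join(r + b"\n" for r in self.pending))
        self.file.flush()
        os.fsync(self.file.fileno())     # one fsync covers the whole batch
        self.fsyncs += 1
        self.pending.clear()

    def close(self):
        self.flush()
        self.file.close()


fd, path = tempfile.mkstemp()
os.close(fd)
wal = GroupCommitLog(path, batch_size=4)
for i in range(10):
    wal.append(b"redo txn=%d" % i)
wal.close()
with open(path, "rb") as f:
    lines = f.read().splitlines()
os.remove(path)
```

Ten records cost three fsync calls instead of ten. The trade-off is that a txn whose record is still pending cannot be acknowledged as committed until its batch is flushed.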
LOGGING & RECOVERY
The system also still takes checkpoints to speed up recovery time.
Different methods for checkpointing:
→ Old idea: Maintain a second copy of the database in memory that is updated by replaying the WAL.
→ Switch to a special “copy-on-write” mode and then write a dump of the database to disk.
→ Fork the DBMS process and then have the child process write its contents to disk (leveraging virtual memory).
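The fork-based method can be sketched on a POSIX system as follows. The child process sees a copy-on-write snapshot of the parent's memory as of the fork, so the parent can keep applying updates while the child writes the checkpoint. The dictionary standing in for the database, the JSON file format, and the function name `checkpoint_via_fork` are assumptions for illustration only.

```python
import json
import os
import tempfile


def checkpoint_via_fork(db: dict, path: str) -> int:
    """Fork a child that dumps its (frozen) view of db; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: its address space is a copy-on-write snapshot taken at fork.
        try:
            with open(path, "w") as f:
                json.dump(db, f)
                f.flush()
                os.fsync(f.fileno())
        finally:
            os._exit(0)  # exit immediately without running the parent's cleanup
    return pid


database = {"a": 1, "b": 2}  # hypothetical in-memory table
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

child_pid = checkpoint_via_fork(database, checkpoint_path)

# Parent: continues to process updates; these do not reach the child's copy.
database["a"] = 100
database["c"] = 3

os.waitpid(child_pid, 0)  # wait for the checkpoint to finish

with open(checkpoint_path) as f:
    snapshot = json.load(f)
```

The checkpoint file holds the state as of the fork, while the live database already reflects the later updates.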
LARGER-THAN-MEMORY DATABASES
DRAM is fast, but data is not accessed with the same frequency and in the same manner.
→ Hot Data: OLTP Operations (Tweets posted yesterday)
→ Cold Data: OLAP Queries (Tweets posted last year)
We will study techniques for how to bring back disk-resident data without slowing down the entire system.
NON-VOLATILE MEMORY
Emerging hardware that is able to get almost the same read/write speed as DRAM but with the persistence guarantees of an SSD.
→ Also called storage class memory
→ Examples: Phase-Change Memory, Memristors
It’s not clear how to build a DBMS to operate on this kind of memory.
Again, we’ll cover this topic later.
NOTABLE IN-MEMORY DBMSs
Oracle TimesTen
Dali / DataBlitz
Altibase
P*TIME
SAP HANA
VoltDB / H-Store
Microsoft Hekaton
Harvard Silo
TUM HyPer
MemSQL
IBM DB2 BLU
Apache Geode
P*TIME
Korean in-memory DBMS from the 2000s.
Performance numbers are still impressive.
Lots of interesting features:
→ Uses differential encoding (XOR) for log records.
→ Hybrid storage layouts.
→ Support for larger-than-memory databases.
Sold to SAP in 2005. Now part of HANA.
P*TIME: HIGHLY SCALABLE OLTP DBMS FOR MANAGING UPDATE-INTENSIVE STREAM WORKLOAD
VLDB, pp. 1033-1044, 2004.
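The XOR-based differential encoding mentioned above can be illustrated with a small sketch: the log record stores the XOR of the before and after images, and the same record serves for both redo and undo. The byte strings and the helper name `xor_bytes` are hypothetical, and this is an illustration of the general idea rather than P*TIME's actual record format.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length images."""
    if len(a) != len(b):
        raise ValueError("images must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


v0 = b"balance=0100"   # hypothetical before image
v1 = b"balance=0250"   # after the first update
v2 = b"balance=0975"   # after the second update

diff1 = xor_bytes(v0, v1)   # log record for update 1
diff2 = xor_bytes(v1, v2)   # log record for update 2

redo = xor_bytes(v0, diff1)   # before XOR diff gives the after image
undo = xor_bytes(v1, diff1)   # after XOR diff gives the before image

# XOR is commutative and associative, so the records can be applied
# in either order and still reach the same final image.
in_order = xor_bytes(xor_bytes(v0, diff1), diff2)
reordered = xor_bytes(xor_bytes(v0, diff2), diff1)
```

One consequence worth noting is that a single XOR record carries both redo and undo information, and replay order does not affect the final image.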
TIMESTEN
Originally SmallBase from HP Labs in 1995.
Multi-process, shared memory DBMS.
→ Single-version database using two-phase locking.
→ Dictionary-encoded columnar compression.
Bought by Oracle in 2005.
Can work as a cache in front of Oracle DBMS.
ORACLE TIMESTEN: AN IN-MEMORY DATABASE FOR ENTERPRISE APPLICATIONS
IEEE Data Engineering Bulletin, vol. 36, no. 2, 2013.
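As a rough illustration of dictionary-encoded columnar compression, the sketch below replaces each distinct value in a column with a small integer code and keeps the distinct values in a separate dictionary. The function names and the sample column are assumptions, and this is not TimesTen's actual storage format.

```python
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Return (dictionary, codes) where codes[i] indexes into dictionary."""
    dictionary: List[str] = []
    code_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in column:
        if value not in code_of:
            code_of[value] = len(dictionary)  # assign the next unused code
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


column = ["Atlanta", "Boston", "Atlanta", "Atlanta", "Boston", "Chicago"]
dictionary, codes = dictionary_encode(column)
decoded = dictionary_decode(dictionary, codes)
```

The column is stored as three dictionary entries plus six small integers, and decoding restores it exactly.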
DALI / DATABLITZ
Developed at AT&T Labs in the early 1990s.
Multi-process, shared memory storage manager using memory-mapped files.
Employed additional safety measures to make sure that erroneous writes to memory do not corrupt the database.
→ Meta-data is stored in a non-shared location.
→ A page’s checksum is always tested on a read; if the checksum is invalid, recover page from log.
DALI: A HIGH PERFORMANCE MAIN MEMORY STORAGE MANAGER
VLDB, pp. 48-59, 1994.
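The checksum-on-read safeguard can be sketched as below. The `PageStore` class, the use of CRC32, and the dictionary standing in for the log are all assumptions for illustration; they are not taken from Dali itself. Checksums are kept apart from the page bytes, loosely mirroring the idea of storing meta-data in a separate location.

```python
import zlib


class PageStore:
    """Verifies a page's checksum on every read and repairs it from a log."""

    def __init__(self):
        self.pages = {}       # page_id -> bytearray (the "shared" page memory)
        self.checksums = {}   # page_id -> CRC32, kept apart from the page bytes
        self.log = {}         # hypothetical log: last good image per page
        self.recoveries = 0

    def write(self, page_id: int, data: bytes) -> None:
        self.log[page_id] = bytes(data)
        self.pages[page_id] = bytearray(data)
        self.checksums[page_id] = zlib.crc32(data)

    def read(self, page_id: int) -> bytes:
        page = self.pages[page_id]
        if zlib.crc32(page) != self.checksums[page_id]:
            # Checksum mismatch: treat the page as corrupted and restore it.
            page = bytearray(self.log[page_id])
            self.pages[page_id] = page
            self.recoveries += 1
        return bytes(page)


store = PageStore()
store.write(1, b"tuple data on page 1")

store.pages[1][0] ^= 0xFF   # simulate an erroneous write into page memory

recovered = store.read(1)
second_read = store.read(1)
```

The first read detects the corruption and restores the page; the second read finds a valid checksum and triggers no further recovery.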
PELOTON DBMS
CMU’s in-memory hybrid relational DBMS
→ Latch-free Multi-version concurrency control.
→ Latch-free Bw-Tree Index
→ LLVM-based Execution Engine
→ Tile-based storage manager.
→ Multi-threaded architecture.
→ Write-Ahead Logging + Checkpoints
→ Cascades-style Query Optimizer
→ Zone Maps
→ PL/pgSQL UDFs (preliminary)
Currently supports some of SQL-92.
PARTING THOUGHTS
Disk-oriented DBMSs are a relic of the past.
→ Most databases fit entirely in DRAM on a single machine.
The world has finally become comfortable with in-memory data storage and processing.
Never use mmap for your DBMS.
COURSE LOAD REDUCTION
The frequency of reading reviews is reduced to one review due every two weeks (earlier it was one review due every week).
The final exam (15%) will be a take-home assignment. It will consist of long-form questions on topics discussed throughout the semester, and you will have one week to complete it.
DEVELOPMENT ENVIRONMENT
Install Ubuntu 18.04 LTS Linux OS on your laptop (either natively or in a virtual machine).
You will be using this environment for the programming assignments and the research project.
NEXT CLASS
Storage Models
Reminder: Homework 0 is due on Tuesday Jan 15th. Submit via Gradescope.