DATA ANALYTICS
USING DEEP LEARNING
GT 8803 // FALL 2019 // JOY ARULRAJ
LECTURE #06: DISK-CENTRIC AND IN-MEMORY
DATABASE SYSTEMS
ADMINISTRIVIA
• Project ideas
– List shared on Piazza
– Start looking for team-mates!
– Sign up for discussion slots during office hours
LAST CLASS
• History of DBMSs
– In a way though, it really was a history of data models
• Data Models
– Hierarchical data model (tree) (IMS)
– Network data model (graph) (CODASYL)
– Relational data model (tables) (System R, INGRES)
• Overarching theme about all these systems
– They were all disk-based DBMSs
TODAY’S AGENDA
• Disk-centric DBMSs
• In-Memory DBMSs
DISK-CENTRIC
DBMSs
ANATOMY OF A DATABASE SYSTEM
Process Manager: Connection Manager + Admission Control
Query Processor: Query Parser, Query Optimizer, Query Executor
Transactional Storage Manager: Lock Manager (Concurrency Control), Access Methods (or Indexes), Buffer Pool Manager, Log Manager
Shared Utilities: Memory Manager + Disk Manager, Networking Manager
Source: Anatomy of a Database System
ANATOMY OF A DATABASE SYSTEM
• Process Manager
– Manages client connections
• Query Processor
– Parse, plan and execute queries on top of storage manager
• Transactional Storage Manager
– Knits together buffer management, concurrency control, logging and recovery
• Shared Utilities
– Manage hardware resources across threads
TOPICS
• Implications of availability of large DRAM chips for database systems
– Buffer Management
– Query Processing
– Concurrency Control
– Logging and Recovery
BACKGROUND
• Much of the history of DBMSs is about dealing with the limitations of hardware.
• Hardware was much different when the original DBMSs were designed:
– Uniprocessor (single-core CPU)
– RAM was severely limited (few MB).
– The database had to be stored on disk.
– Disk is slow. No seriously, I mean really slow.
BACKGROUND
• But now DRAM capacities are large enough that most databases can fit in memory.
– Structured data sets are smaller (e.g., tables with numeric data).
– Unstructured data sets are larger (e.g., videos).
• So why not just use a "traditional" disk-oriented DBMS with a really large cache?
DISK-ORIENTED DBMS OVERHEAD
Measured CPU Instructions
BUFFER POOL: 34%
LATCHING: 14%
LOCKING: 16%
LOGGING: 12%
B-TREE KEYS: 16%
REAL WORK: 7%
Source: OLTP Through the Looking Glass, and What We Found There. SIGMOD, pp. 981-992, 2008.
BUFFER MANAGEMENT
• The primary storage location of the database is on non-volatile storage (e.g., SSD).
– The database is stored in a file as a collection of fixed-length blocks called slotted pages on disk.
• The system uses a volatile in-memory buffer pool to cache blocks fetched from disk.
– Its job is to manage the movement of those blocks back and forth between disk and memory.
BUFFER MANAGEMENT
• When a query accesses a page, the DBMS checks to see if that page is already in memory in a buffer pool.
– If it’s not, then the DBMS has to retrieve it from disk and copy it into a free frame in the buffer pool.
– If there are no free frames, then find a page to evict guided by the page replacement policy.
– If the page being evicted is dirty, then the DBMS has to write it back to disk to ensure the durability (ACID) of data.
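The page-access steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular DBMS implements it: the `BufferPool` class, its method names, the dict standing in for the on-disk database, and the choice of LRU as the replacement policy are all assumptions made for the example.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool (hypothetical): page table + LRU eviction + dirty write-back."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk                 # dict page_id -> bytes, stands in for the database file
        self.page_table = OrderedDict()  # page_id -> [data, dirty]; order tracks recency of use
        self.disk_reads = 0
        self.disk_writes = 0

    def fetch(self, page_id):
        if page_id in self.page_table:               # page already in memory
            self.page_table.move_to_end(page_id)     # mark as most recently used
            return self.page_table[page_id]
        if len(self.page_table) >= self.num_frames:  # no free frame left
            self._evict()
        self.disk_reads += 1                         # retrieve the page from "disk"
        frame = [self.disk[page_id], False]
        self.page_table[page_id] = frame
        return frame

    def write(self, page_id, data):
        frame = self.fetch(page_id)
        frame[0] = data
        frame[1] = True                              # page is now dirty

    def _evict(self):
        # LRU: the least recently used page sits at the front of the OrderedDict.
        victim_id, (data, dirty) = self.page_table.popitem(last=False)
        if dirty:                                    # write back before dropping the frame
            self.disk[victim_id] = data
            self.disk_writes += 1

disk = {0: b"p0", 1: b"p1", 2: b"p2"}
pool = BufferPool(num_frames=2, disk=disk)
pool.write(0, b"p0-new")  # miss: load page 0, then dirty it
pool.fetch(1)             # miss: load page 1 into the last free frame
pool.fetch(2)             # miss with no free frame: evict page 0 and write it back
```

With two frames, the third access forces the dirty page 0 out, which is the only point at which the modified data reaches the simulated disk.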
BUFFER MANAGEMENT
• Page replacement policy is a differentiating factor between open-source and commercial DBMSs.
– What kind of data does it contain?
– Is the page dirty?
– How likely is the page to be accessed in the near
future?
– Examples: LRU, LFU, CLOCK, ARC
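As one concrete instance of the policies listed above, here is a small sketch of CLOCK, which approximates LRU with one reference bit per frame. The `ClockReplacer` class and its method names are hypothetical and chosen only for this illustration; real systems also account for pinned and dirty pages.

```python
class ClockReplacer:
    """Toy CLOCK policy (hypothetical): one reference bit per frame and a sweeping hand."""

    def __init__(self, num_frames):
        self.ref = [False] * num_frames  # reference bit for each frame
        self.hand = 0                    # current position of the clock hand

    def touch(self, frame):
        # Called whenever the page in this frame is accessed.
        self.ref[frame] = True

    def pick_victim(self):
        while True:
            current = self.hand
            self.hand = (self.hand + 1) % len(self.ref)
            if self.ref[current]:
                self.ref[current] = False  # recently used: give it a second chance
            else:
                return current             # not used since the last sweep: evict

clock = ClockReplacer(3)
clock.touch(0)
clock.touch(1)
victim = clock.pick_victim()  # frames 0 and 1 get a second chance, frame 2 is chosen
```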
BUFFER MANAGEMENT
• Once the page is in memory, the DBMS translates any on-disk addresses to their in-memory addresses.
(Page Identifier) (Page Pointer)
[#100] [0x5050]
BUFFER MANAGEMENT
[Figure, animated over several slides: Index (Page Id + Slot #), Page Table, Buffer Pool (page6, page4, and a frame holding page2, later page1), Database (On-Disk) with Slotted Pages (page0, page1, page2)]
BUFFER MANAGEMENT
• Every tuple access has to go through the buffer pool manager regardless of whether that data will always be in memory.
– Always have to translate a tuple’s record id to its memory location.
– Worker thread has to pin pages that it needs to make sure that they are not swapped to disk.
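The two overheads above (record id translation and pinning) can be sketched as follows. This is only an illustration under stated assumptions: `RecordId`, `Frame`, `PinningPool`, and `read_tuple` are hypothetical names, every page is assumed to be resident already, and eviction itself is left out.

```python
from collections import namedtuple

# A record id names a tuple by page id and slot number (hypothetical layout).
RecordId = namedtuple("RecordId", ["page_id", "slot"])

class Frame:
    def __init__(self, slots):
        self.slots = slots   # tuples stored in this page
        self.pin_count = 0   # number of workers currently using the page

class PinningPool:
    """Toy pool in which every page is already in memory."""

    def __init__(self, pages):
        self.frames = {pid: Frame(slots) for pid, slots in pages.items()}

    def pin(self, page_id):
        frame = self.frames[page_id]
        frame.pin_count += 1             # a pinned page must not be evicted
        return frame

    def unpin(self, page_id):
        self.frames[page_id].pin_count -= 1

    def evictable(self):
        return [pid for pid, f in self.frames.items() if f.pin_count == 0]

def read_tuple(pool, rid):
    # Even for memory-resident data: translate the record id, pin, read, unpin.
    frame = pool.pin(rid.page_id)
    try:
        return frame.slots[rid.slot]
    finally:
        pool.unpin(rid.page_id)

tuple_pool = PinningPool({100: [("alice", 1), ("bob", 2)], 101: [("carol", 3)]})
row = read_tuple(tuple_pool, RecordId(page_id=100, slot=1))
```

Every read pays for the lookup and the pin/unpin pair, which is the per-access cost the slide refers to.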
BUFFER MANAGEMENT
• Q: What do we gain by managing an in-memory buffer?
– A: Accelerate query processing by storing frequently-accessed pages in fast memory
• Q: Can we “learn” an optimal page replacement policy?
– A: Recent paper from Google on learning memory accesses based on LSTM models.
QUERY PROCESSING
Tuple-at-a-time
→ Each operator calls next on their child to get the next tuple to process.
Operator-at-a-time
→ Each operator materializes their entire output for their parent operator.
Vector-at-a-time
→ Each operator calls next on their child to get the next chunk of data to process.
SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND B.value > 100
[Figure: query plan over A and B with operators σ (value>100), ⨝ (A.id=B.id), π (A.id, B.value)]
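The three processing models can be contrasted on a single operator. The sketch below applies the `value > 100` selection to a small in-memory table in each style; the table contents, function names, and chunk size are assumptions made for the illustration, and the join and projection are omitted to keep it short.

```python
# Hypothetical contents for table B.
B = [{"id": i, "value": v} for i, v in enumerate([50, 150, 300, 20, 101])]

def pred(row):
    return row["value"] > 100

# Tuple-at-a-time: each operator yields one tuple per next() call.
def scan_tuples(table):
    for row in table:
        yield row

def select_tuples(child):
    for row in child:
        if pred(row):
            yield row

# Operator-at-a-time: each operator materializes its entire output.
def scan_all(table):
    return list(table)

def select_all(rows):
    return [row for row in rows if pred(row)]

# Vector-at-a-time: each operator yields a chunk of tuples per next() call.
def scan_vectors(table, size):
    for start in range(0, len(table), size):
        yield table[start:start + size]

def select_vectors(child):
    for chunk in child:
        out = [row for row in chunk if pred(row)]
        if out:
            yield out

ids_tuple = [r["id"] for r in select_tuples(scan_tuples(B))]
ids_operator = [r["id"] for r in select_all(scan_all(B))]
ids_vector = [r["id"] for chunk in select_vectors(scan_vectors(B, 2)) for r in chunk]
```

All three produce the same rows; they differ in how much intermediate data exists at once and in how many calls cross operator boundaries.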
QUERY PROCESSING
• The best strategy for executing a query plan in a disk-centric DBMS
– Sequential scans over a table are much faster than random accesses
• The traditional tuple-at-a-time iterator model works well
– Because output of an operator will not fit in limited memory
CONCURRENCY CONTROL
• In a disk-oriented DBMS, the system assumes that a txn could stall at any time when it tries to access data that is not in memory.
CONCURRENCY CONTROL
• Execute other txns at the same time so that if one txn stalls then others can keep running.
– This is not because the DBMS is trying to use all cores in the CPU (still focusing on single-core CPUs)
– We do this to let the system make forward progress by executing another txn while the current txn is waiting for data to be fetched from disk
CONCURRENCY CONTROL
• Concurrency control policy
– Responsible for deciding how to interleave operations of concurrent transactions in such a way that it appears as if they are running serially
– This property is referred to as serializability of transactions
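One common way to make "appears as if they are running serially" testable is conflict serializability: build a precedence graph between transactions and check that it has no cycle. The slides do not name this specific test, so treat it as an assumption; the schedule encoding and the function name below are also hypothetical.

```python
def is_conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {"R", "W"}, in execution order."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two operations conflict if they come from different txns,
            # touch the same item, and at least one of them is a write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))  # ti must precede tj in any equivalent serial order
    # The graph is acyclic iff we can repeatedly peel off txns with no incoming edges.
    nodes = {txn for txn, _, _ in schedule}
    while nodes:
        sources = {n for n in nodes
                   if not any(dst == n and src in nodes for src, dst in edges)}
        if not sources:
            return False             # every remaining txn is stuck on a cycle
        nodes -= sources
    return True

good = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
good_ok = is_conflict_serializable(good)  # only edge is T1 -> T2
bad_ok = is_conflict_serializable(bad)    # edges T1 -> T2 and T2 -> T1 form a cycle
```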
CONCURRENCY CONTROL
• Concurrency control policy
– DBMS has to set locks and latches to ensure the highest level of isolation (ACID) between transactions
– Locks are stored in a separate data structure (lock table) to avoid being swapped to disk.
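A lock table kept apart from the data pages might look roughly like the sketch below. It assumes shared ("S") and exclusive ("X") lock modes and returns False instead of blocking when a request cannot be granted; the `LockTable` class and its methods are hypothetical, and waiting queues and deadlock handling are left out.

```python
class LockTable:
    """Toy lock table (hypothetical): item -> (mode, set of holding txns)."""

    def __init__(self):
        self.locks = {}

    def acquire(self, txn, item, mode):
        """Request an 'S' or 'X' lock. True if granted, False if the txn would have to wait."""
        held = self.locks.get(item)
        if held is None:                        # nobody holds the item
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":    # shared locks are compatible
            holders.add(txn)
            return True
        if holders == {txn}:                    # sole holder may re-request or upgrade
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[item] = (new_mode, holders)
            return True
        return False                            # conflicting request

    def release_all(self, txn):
        # Called at commit or abort: drop every lock the txn holds.
        for item in list(self.locks):
            _, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

table = LockTable()
r1 = table.acquire("T1", "A", "S")  # granted
r2 = table.acquire("T2", "A", "S")  # granted, shared with T1
r3 = table.acquire("T2", "A", "X")  # refused while T1 still holds a shared lock
table.release_all("T1")
r4 = table.acquire("T2", "A", "X")  # granted, T2 is now the sole holder
```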
LOGGING & RECOVERY
• The logging and recovery protocol helps ensure the atomicity and durability properties (ACID)
– Durability: Changes made by committed transactions must be present in the database after recovering from a power failure.
– Atomicity: Changes made by uncommitted (in-progress/aborted) transactions must not be present in the database after recovering from a power failure.
LOGGING & RECOVERY
• DBMSs use STEAL and NO-FORCE buffer pool management policies.
– STEAL: DBMS can flush pages dirtied by
uncommitted transactions to disk.
– NO-FORCE: DBMS is not required to flush all pages
dirtied by committed transactions to disk.
– So all page modifications have to be flushed to the
write-ahead log (WAL) before a txn can commit
LOGGING & RECOVERY
• Each log entry contains the before and after images of modified tuples.
– STEAL: Modifications made by uncommitted transactions that are flushed to disk have to be rolled back.
– NO-FORCE: Modifications made by committed transactions might not have been flushed to disk, so they have to be redone from the log.
LOGGING & RECOVERY
• Each log entry contains the before and after images of modified tuples.
– Recording the before and after images in the log is
critical to ensuring atomicity and durability
– Lots of work to keep track of log sequence numbers
(LSNs) all throughout the DBMS.
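To see why both images are needed, here is a simplified recovery sketch: redo every logged update using after images, then undo the updates of uncommitted transactions using before images. The log record layout and the `recover` function are assumptions made for this example; it ignores LSNs, checkpoints, and everything else a real recovery protocol tracks.

```python
def recover(disk, log):
    """disk: dict key -> value as found after the crash.
    log: ("UPDATE", txn, key, before, after) and ("COMMIT", txn) records, in order."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

    # Redo pass: reapply all after images, since NO-FORCE means committed
    # changes may never have reached disk.
    for rec in log:
        if rec[0] == "UPDATE":
            _, _, key, _, after = rec
            disk[key] = after

    # Undo pass: restore before images of uncommitted txns in reverse order,
    # since STEAL means their changes may already be on disk.
    for rec in reversed(log):
        if rec[0] == "UPDATE" and rec[1] not in committed:
            _, _, key, before, _ = rec
            disk[key] = before
    return disk

# State at crash time: T1 committed but its change to x was never flushed;
# T2 did not commit but its change to y was flushed.
crashed_disk = {"x": 0, "y": 99}
wal = [
    ("UPDATE", "T1", "x", 0, 1),
    ("COMMIT", "T1"),
    ("UPDATE", "T2", "y", 5, 99),
]
recovered = recover(dict(crashed_disk), wal)
```

After recovery, the committed change to `x` is present (durability) and the uncommitted change to `y` is gone (atomicity).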
LOGGING & RECOVERY
• Q: What would happen if we use a NO-STEAL policy?
– A: Cannot support large transactions that make changes larger than the buffer pool
• Q: What would happen if we use a FORCE policy?
– A: Performance would drop by orders of magnitude since the DBMS would need to randomly write to disk all the time.
TAKEAWAYS
• Disk-oriented DBMSs do a lot of extra stuff
because they are predicated on the
assumption that data has to reside on disk
• In-memory DBMSs maximize performance by
optimizing these protocols and algorithms
IN-MEMORY
DBMSs
IN-MEMORY DBMSs
• Assume that the primary storage location of
the database is permanently in memory.
• Early ideas proposed in the 1980s but it is
now feasible because DRAM prices are low
and capacities are high.
BOTTLENECKS
• If I/O is no longer the slowest resource, much of the DBMS’s architecture will have to change to account for other bottlenecks:
– Locking/latching
– Cache misses
– Predicate evaluations
– Data movement & copying
– Networking (between application & DBMS)
STORAGE ACCESS LATENCIES
| | L3 | DRAM | SSD | HDD |
|---|---|---|---|---|
| Read Latency | ~20 ns | 60 ns | 25,000 ns | 10,000,000 ns |
| Write Latency | ~20 ns | 60 ns | 300,000 ns | 10,000,000 ns |
Source: Let’s Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems. SIGMOD, pp. 707-722, 2015.
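To get a feel for these gaps, the short calculation below rescales the read latencies so that an L3 read takes one unit of time. The numbers are the read latencies quoted above; the variable names and the one-second rescaling are only for illustration.

```python
# Read latencies in nanoseconds, as quoted in the table above.
READ_LATENCY_NS = {"L3": 20, "DRAM": 60, "SSD": 25_000, "HDD": 10_000_000}

# How many L3 reads fit into one read from each device.
relative = {dev: ns / READ_LATENCY_NS["L3"] for dev, ns in READ_LATENCY_NS.items()}

# If an L3 read took one second, an HDD read would take this many days.
hdd_days = relative["HDD"] / 86_400

for dev, factor in relative.items():
    print(f"{dev}: {factor:,.0f}x an L3 read")
print(f"HDD at human scale: about {hdd_days:.1f} days")
```

On that scale a DRAM read takes 3 seconds, an SSD read about 21 minutes, and an HDD read nearly 6 days.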
S T O R A G E A C C E S S L A T E N C I E S
58
Jim Gray’s analogy:
→ Reading from L3 cache: Reading a book on a table
→ Reading from HDD: Flying to Pluto to read that book
Because everything fits in DRAM, we can do
more sophisticated things in software.
![Page 59: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/59.jpg)
GT 8803 // Fall 2019
B U F F E R M A N A G E M E N T
• An in-memory DBMS does not need to store
the database in slotted pages but it will still
organize tuples in blocks:
– Direct memory pointers vs. tuple identifiers
– Separate pools for fixed-length (e.g., numeric data)
and variable-length data (e.g., images)
– Use checksums to detect software errors that
trash the database.
59
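A minimal sketch of the block organization described on this slide, under stated assumptions: the class names, the three-field slot layout, and the use of CRC32 as the checksum are all hypothetical choices for illustration, not the design of any particular system. Fixed-length attributes live in a packed block, the variable-length payload lives in a separate pool, and the fixed-length slot stores a reference to it.

```python
import struct
import zlib


class FixedLengthBlock:
    """Packed block of fixed-size slots: (id: int64, value: float64, var_ref: int64)."""

    SLOT = struct.Struct("<qdq")

    def __init__(self, capacity):
        self.capacity = capacity
        self.count = 0
        self.buf = bytearray(self.SLOT.size * capacity)

    def append(self, tuple_id, value, var_ref):
        if self.count == self.capacity:
            raise MemoryError("block is full")
        slot = self.count
        self.SLOT.pack_into(self.buf, slot * self.SLOT.size, tuple_id, value, var_ref)
        self.count += 1
        return slot  # the slot number stands in for a direct memory pointer

    def read(self, slot):
        return self.SLOT.unpack_from(self.buf, slot * self.SLOT.size)

    def checksum(self):
        # CRC32 over the whole block; a stray write changes the result.
        return zlib.crc32(self.buf)


class VariableLengthPool:
    """Separate pool for variable-length payloads (e.g., images)."""

    def __init__(self):
        self.items = []

    def put(self, payload):
        self.items.append(payload)
        return len(self.items) - 1  # reference stored in the fixed-length slot

    def get(self, ref):
        return self.items[ref]


var_pool = VariableLengthPool()
block = FixedLengthBlock(capacity=4)

ref = var_pool.put(b"\x89PNG image bytes")
slot = block.append(1, 99.5, ref)
saved = block.checksum()

tuple_id, value, var_ref = block.read(slot)
payload = var_pool.get(var_ref)

block.buf[0] ^= 0xFF  # simulate a software bug scribbling over the block
corrupted = block.checksum() != saved
```

Keeping the fixed-length slots the same size means a slot's address can be computed from its number, which is what makes direct pointers practical.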
![Page 60: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/60.jpg)
GT 8803 // Fall 2018
B U F F E R M A N A G E M E N T
60
[Figure: Index, Fixed-Length Data Blocks, Variable-Length Data Blocks]
![Page 61: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/61.jpg)
GT 8803 // Fall 2018
B U F F E R M A N A G E M E N T
61
[Figure: Index, Memory Address, Fixed-Length Data Blocks, Variable-Length Data Blocks]
![Page 64: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/64.jpg)
GT 8803 // Fall 2019
B U F F E R M A N A G E M E N T
• DRAM is fast, but not all data is accessed with
the same frequency or in the same manner.
– Hot Data: OLTP Operations (Tweets posted
yesterday)
– Cold Data: OLAP Queries (Tweets posted last year)
• We will study techniques for bringing
disk-resident data back into memory without
slowing down the entire system.
64
![Page 65: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/65.jpg)
GT 8803 // Fall 2018
Q U E R Y P R O C E S S I N G
65
SELECT A.id, B.value
  FROM A, B
 WHERE A.id = B.id
   AND B.value > 100
[Figure: query plan with scans of A and B, selection σ (value>100), join ⨝ (A.id=B.id), and projection π (A.id, B.value)]
![Page 66: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/66.jpg)
GT 8803 // Fall 2018
Q U E R Y P R O C E S S I N G
66
Tuple-at-a-time
→ Each operator calls next on their child to
get the next tuple to process.
Operator-at-a-time
→ Each operator materializes their entire
output for their parent operator.
Vector-at-a-time
→ Each operator calls next on their child to
get the next chunk of data to process.
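The three processing models can be sketched on the example query as follows. This is an illustration under assumptions, not an actual executor: the tables are made-up sample data, A.id is assumed unique so the join reduces to a membership test against a set, and the join and projection are folded into one operator to keep the sketch short.

```python
# Hypothetical sample data for tables A and B.
A = [{"id": i} for i in range(6)]
B = [{"id": i, "value": v} for i, v in enumerate([50, 150, 99, 101, 300, 100])]
A_IDS = {row["id"] for row in A}  # stand-in for the join's hash table on A.id


# Tuple-at-a-time: each next() on a generator pulls exactly one tuple.
def scan_tuples(table):
    for row in table:
        yield row


def select_tuples(child):
    for row in child:
        if row["value"] > 100:
            yield row


def join_project_tuples(child):
    for row in child:
        if row["id"] in A_IDS:
            yield (row["id"], row["value"])


# Operator-at-a-time: each operator materializes its entire output.
def select_all(rows):
    return [row for row in rows if row["value"] > 100]


def join_project_all(rows):
    return [(row["id"], row["value"]) for row in rows if row["id"] in A_IDS]


# Vector-at-a-time: each next() pulls a chunk of up to `size` tuples.
def scan_vectors(table, size):
    for start in range(0, len(table), size):
        yield table[start:start + size]


def select_vectors(child):
    for batch in child:
        yield [row for row in batch if row["value"] > 100]


def join_project_vectors(child):
    for batch in child:
        yield [(row["id"], row["value"]) for row in batch if row["id"] in A_IDS]


tuple_result = list(join_project_tuples(select_tuples(scan_tuples(B))))
operator_result = join_project_all(select_all(B))
vector_result = [
    out
    for batch in join_project_vectors(select_vectors(scan_vectors(B, 2)))
    for out in batch
]
```

All three produce the same rows; they differ in how many calls cross operator boundaries. The tuple model makes one call per tuple per operator, while the vector model amortizes that call over a chunk.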
![Page 68: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/68.jpg)
GT 8803 // Fall 2019
Q U E R Y P R O C E S S I N G
• The best strategy for executing a query plan
in a DBMS changes when all of the data is
already in memory.
– Sequential scans are no longer significantly faster
than random access.
• The traditional tuple-at-a-time iterator
model is too slow because of function calls.
– This problem is more significant in OLAP DBMSs.
68
![Page 69: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/69.jpg)
GT 8803 // Fall 2019
Q U E R Y P R O C E S S I N G
69
![Page 70: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/70.jpg)
GT 8803 // Fall 2019
Q U E R Y P R O C E S S I N G
• Q: Query processing in in-memory systems:
sequential scans or random accesses?
– A: Sequential scans are no longer significantly
faster than random access.
• Q: Will the traditional tuple-at-a-time iterator
work well now?
– A: No, too slow because of function calls (virtual
table lookups).
70
![Page 73: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/73.jpg)
GT 8803 // Fall 2019
C O N C U R R E N C Y C O N T R O L
• Observation: The cost of a txn acquiring a lock
is the same as accessing data (since the lock
data is also in memory).
• An in-memory DBMS may want to detect
conflicts at a different granularity.
– Fine-grained locking allows for better concurrency
but requires more locks.
– Coarse-grained locking requires fewer locks but limits the amount of concurrency.
73
![Page 74: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/74.jpg)
GT 8803 // Fall 2019
C O N C U R R E N C Y C O N T R O L
• The DBMS can store locking information
about each tuple together with its data.
– This helps with CPU cache locality.
– Mutexes are too slow. Need to use compare-and-swap (CAS) instructions.
74
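A sketch of the idea of keeping the lock word inside the tuple and acquiring it with compare-and-swap. Python has no hardware CAS, so `compare_and_swap` below only models the semantics and is NOT atomic; a real system would use an atomic CPU instruction on the lock word. The class and function names are hypothetical.

```python
class TupleHeader:
    """A tuple whose lock word is stored together with its data."""

    __slots__ = ("lock_word", "data")

    def __init__(self, data):
        self.lock_word = 0  # 0 = unlocked, otherwise the owning txn id
        self.data = data


def compare_and_swap(header, expected, new):
    """Model of CAS: write `new` only if the word still equals `expected`.

    Assumption: shown single-threaded. This Python version is not atomic.
    """
    if header.lock_word == expected:
        header.lock_word = new
        return True
    return False


def try_lock(header, txn_id):
    # Succeeds only if nobody holds the lock (word is 0).
    return compare_and_swap(header, 0, txn_id)


def unlock(header, txn_id):
    # Only the owner can release: the word must still hold its txn id.
    if not compare_and_swap(header, txn_id, 0):
        raise RuntimeError("txn does not hold the lock")


row = TupleHeader({"id": 1, "value": 150})
first = try_lock(row, txn_id=7)    # lock is free, so txn 7 acquires it
second = try_lock(row, txn_id=8)   # txn 8 fails because txn 7 holds it
unlock(row, txn_id=7)
third = try_lock(row, txn_id=8)    # lock is free again
```

Because the lock word sits next to the tuple's data, checking the lock and reading the tuple are likely to touch the same cache line.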
![Page 75: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/75.jpg)
GT 8803 // Fall 2019
C O N C U R R E N C Y C O N T R O L
• Disk-oriented DBMSs
– Stalling during disk I/O
• Memory-oriented DBMSs
– New bottleneck is contention caused by txns
executing on multiple cores trying to access data
at the same time.
75
![Page 76: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/76.jpg)
GT 8803 // Fall 2019
L O G G I N G & R E C O V E R Y
• The DBMS still needs a WAL on disk since the
system could halt at any time.
– Use group commit to batch log entries and flush
them together to amortize fsync cost.
– May be possible to use more lightweight logging
schemes (e.g., only store redo information, NO-STEAL).
– Since there are no "dirty" pages, there is no
need to maintain LSNs throughout the system.
76
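Group commit can be sketched as a log that buffers redo records and issues one fsync per batch. This is a simplified illustration: the `GroupCommitLog` class, the record format, and the fixed batch size are assumptions, and a real DBMS would also flush on a timeout and acknowledge a commit only after its batch is durable.

```python
import os
import tempfile


class GroupCommitLog:
    """Buffers redo records and flushes them with a single fsync per group."""

    def __init__(self, path, group_size):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.group_size = group_size
        self.pending = []
        self.fsync_calls = 0

    def append(self, redo_record):
        self.pending.append(redo_record + b"\n")
        if len(self.pending) >= self.group_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        os.write(self.fd, b"".join(self.pending))  # one write for the batch
        os.fsync(self.fd)                          # one fsync for the batch
        self.fsync_calls += 1
        self.pending.clear()

    def close(self):
        self.flush()  # make any partial group durable before closing
        os.close(self.fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wal.log")
    log = GroupCommitLog(path, group_size=4)
    for txn_id in range(10):
        log.append(f"txn={txn_id} SET value=1".encode())
    log.close()
    with open(path, "rb") as f:
        records = f.read().splitlines()
    fsyncs = log.fsync_calls
```

Ten records with a group size of four need three fsync calls instead of ten, which is the amortization the slide refers to.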
![Page 77: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/77.jpg)
GT 8803 // Fall 2019
L O G G I N G & R E C O V E R Y
• The system also still takes checkpoints to
speed up recovery time.
• Different methods for checkpointing:
– Old idea: Maintain a second copy of the database in
memory that is updated by replaying the WAL.
– Switch to a special “copy-on-write” mode and then
write a dump of the database to disk.
– Fork DBMS process and then have the child process
write its contents to disk (using virtual memory).
77
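The fork-based method can be sketched as below, assuming a Unix-like OS where `os.fork` is available. The `database` dictionary, the JSON dump format, and the function name are hypothetical. The child sees a copy-on-write snapshot of the parent's memory as of the fork, so the parent can keep modifying data while the child writes the checkpoint.

```python
import json
import os
import tempfile

database = {"1": 150, "3": 101}  # hypothetical in-memory table: id -> value


def fork_checkpoint(path):
    """Fork; the child writes its snapshot of `database` to `path`."""
    pid = os.fork()
    if pid == 0:  # child process
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(database, f)
                f.flush()
                os.fsync(f.fileno())
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code
    return pid  # parent: returns immediately and keeps working


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.json")
    child = fork_checkpoint(path)
    database["1"] = 999  # parent continues to run txns after the fork
    _, wait_status = os.waitpid(child, 0)
    with open(path) as f:
        snapshot = json.load(f)
```

The checkpoint holds the values as of the fork, not the parent's later update, because the operating system copies a page only when one of the two processes writes to it.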
![Page 78: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/78.jpg)
GT 8803 // Fall 2019
S U M M A R Y
• Disk-oriented DBMSs are a relic of the past.
– Most structured databases fit entirely in DRAM on a
single machine.
• The world has finally become comfortable
with in-memory data storage and processing.
78
![Page 79: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/79.jpg)
GT 8803 // Fall 2019
A N A T O M Y O F A D A T A B A S E S Y S T E M
Query
Process Manager
– Connection Manager + Admission Control
Query Processor
– Query Parser
– Query Optimizer
– Query Executor
Transactional Storage Manager
– Lock Manager (Concurrency Control)
– Access Methods (or Indexes)
– Buffer Pool Manager
– Log Manager
Shared Utilities
– Memory Manager + Disk Manager
– Networking Manager
Source: Anatomy of a Database System
79
![Page 80: DATA ANALYTICS USING DEEP LEARNINGjarulraj/courses/8803-f19/slides/06-in... · DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ LECTURE #06: DISK- CENTRIC AND](https://reader035.vdocuments.mx/reader035/viewer/2022081406/5f0d40a77e708231d4396ced/html5/thumbnails/80.jpg)
GT 8803 // Fall 2019
N E X T L E C T U R E
• Data Storage
• Assigned Reading
– BlazeIt: Fast Exploratory Video Queries using
Neural Networks
80