bloat and fragmentation in postgresql · •manage available space in tables or indexes •2-level...
TRANSCRIPT
![Page 1: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/1.jpg)
Copyright©2018 NTT Corp. All Rights Reserved.
Bloat and Fragmentation in PostgreSQL
NTT Open Source Center
Masahiko Sawada
PGConf.ASIA 2018
![Page 2: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/2.jpg)
2Copyright©2018 NTT Corp. All Rights Reserved.
• Heap and Index
• Fragmentation and Bloat
• VACUUM 、 HOT(Heap Only Tuple)
• Clustered Tables
• Future
Index
![Page 3: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/3.jpg)
3Copyright©2018 NTT Corp. All Rights Reserved.
• Almost all objects such as tables and indexes that PostgreSQL manages consist of pages
• Page size is 8kB by default• Changing by the configure script
• Block (on disk) = Page (on memory)
Pages
![Page 4: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/4.jpg)
4Copyright©2018 NTT Corp. All Rights Reserved.
Page Layouts (Heaps and Indexes)
Page Header
ItemID ItemIDItemIDItemID
Special Area
TupleTuple
TupleTuple
![Page 5: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/5.jpg)
5Copyright©2018 NTT Corp. All Rights Reserved.
• Combination of block number and offset number• TID = (123, 86)
• Unique with in a table
TID - Tuple ID
=# SELECT ctid, c FROM test ORDER BY c LIMIT 10; ctid | c ------------------+---- (88495,130) | 1 (0,1) | 1 (88495,131) | 2 (44247,179) | 2 (0,2) | 2 (88495,132) | 3 (44247,180) | 3 (0,3) | 3 (88495,133) | 4 (44247,181) | 4(10 rows)
![Page 6: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/6.jpg)
6Copyright©2018 NTT Corp. All Rights Reserved.
• MVCC = Multi Version Concurrency Control• Concurrency control using multiple versions of row
• Reads don’t block writes
• PostgreSQL’s approach:
• INSERT and UPDATE create the new version of the row
• UPDATE and DELETE don’t immediately remove the old version of the row
• Tuple visibility is determined by a ‘snapshot’
• Other DBMS might use UNDO log instead
• Old versions of rows have to be removed eventually• VACUUM
MVCC
![Page 7: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/7.jpg)
7Copyright©2018 NTT Corp. All Rights Reserved.
• Heap Table≒• User created table, materialized view as well as system catalogs use
heap
• Optimized sequential access and accessed by index lookup
• Heaps have one free space map and one visibility map
Heap
![Page 8: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/8.jpg)
8Copyright©2018 NTT Corp. All Rights Reserved.
• Tracks visibility information per blocks
• 1234_vm
• 2 bits / block• all-visible bit : Used for Index Only Scans and vacuums
• all-frozen bit : Used for aggressive vacuums
• Isn’t relevant with bloating
Visibility Map
![Page 9: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/9.jpg)
9Copyright©2018 NTT Corp. All Rights Reserved.
• Manage available space in tables or indexes
• 2-level binary tree• Bottom level store the free space on each page
• Upper levels aggregate information from the lower levels
• Record free space at a granularity of 1/256th of a page
• 1234_fsm
Free Space Map
![Page 10: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/10.jpg)
10Copyright©2018 NTT Corp. All Rights Reserved.
• Fast lookup a row in the table
• Various types of index• B-tree
• Hash
• GIN
• Gist
• SP-Gist
• BRIN
• Bloom
• Indexes eventually points to somewhere in heap
Index
![Page 11: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/11.jpg)
11Copyright©2018 NTT Corp. All Rights Reserved.
• One of the most popular type of index
• PostgreSQL has B+Tree
• One B+tree node is one page
• Leaf nodes have TIDs pointing to the heap
Need new version index tuple when a new version is created
B-tree
![Page 12: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/12.jpg)
12Copyright©2018 NTT Corp. All Rights Reserved.
FRAGMENTATION AND BLOAT
![Page 13: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/13.jpg)
13Copyright©2018 NTT Corp. All Rights Reserved.
• Fragmentation• As pages split to make room to added to a page, there might be
excessive free space left on the pages
• Bloat• Tables or indexes gets bigger than its actual size
• Less utilization efficiency of each pages
Fragmentation and Bloat
![Page 14: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/14.jpg)
14Copyright©2018 NTT Corp. All Rights Reserved.
• Garbage collection
• Doesn’t block DELETE, INSERT and UPDATE (of course and SELECT)
• VACUUM command
• VACUUM FULL is quite different feature
• Batch operation
Vacuum
![Page 15: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/15.jpg)
15Copyright©2018 NTT Corp. All Rights Reserved.
1. Scan heap and collect dead tuples1. Scanning page can skip using its visibility map
2. Recover dead space in all indexes
3. Recover dead space in heap
4. Loop 1 to 3 until the end of table
5. Truncate the tail of table if possible
Vacuum Processing
![Page 16: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/16.jpg)
16Copyright©2018 NTT Corp. All Rights Reserved.
• Always start at the beginning of table
• Always visit all indexes (at multiple times)
Batch Operation
![Page 17: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/17.jpg)
17Copyright©2018 NTT Corp. All Rights Reserved.
• Threshold-based, automatically vacuum execution
• Could be cancelled by a concurrent conflicting operation• e.g. ALTER TABLE 、 TRUNCATE
• Vacuum delay is enabled by default
Auto Vacuum
![Page 18: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/18.jpg)
18Copyright©2018 NTT Corp. All Rights Reserved.
• Cost-based vacuum delay
• Sleep vacuum_cost_delay(10 msec) time whenever the vacuum cost goes above vacuum_cost_limit(200)
• Cost configurations• vacuum_cost_page_hit (1)
• vacuum_cost_page_miss (10)
• vacuum_cost_page_dirty (20)
• By default, vacuum processes 16GB/h at maximum in case where every pages don’t hit
Vacuum Delay
![Page 19: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/19.jpg)
19Copyright©2018 NTT Corp. All Rights Reserved.
• Avoid creating new version of index entries
• Depending on updated column being updated
• Chain heap tuples
• Heap tuple chain is pruned opportunistically
HOT(Heap Only Tuple) Update
![Page 20: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/20.jpg)
20Copyright©2018 NTT Corp. All Rights Reserved.
HOT Update & HOT pruning
Page Header
(0,1) (0,3)(0,2)
TupleTuple
Tuple
(0,1) (0,2) (0,3)
Index
Heap
![Page 21: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/21.jpg)
21Copyright©2018 NTT Corp. All Rights Reserved.
HOT Update & HOT pruning
Page Header
(0,1) (0,3)(0,2)
TupleTuple
Tuple
(0,1) (0,2) (0,3)
Index
Heap
1. Update TID = (0,2) → (0,4)
(0,4)
Tuple
![Page 22: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/22.jpg)
22Copyright©2018 NTT Corp. All Rights Reserved.
HOT Update & HOT pruning
Page Header
(0,1) (0,3)(0,2)
TupleTuple
Tuple
(0,1) (0,2) (0,3)
Index
Heap
2. Update TID = (0,4) → (0,5)
(0,4)
Tuple
(0,5)
Tuple
![Page 23: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/23.jpg)
23Copyright©2018 NTT Corp. All Rights Reserved.
HOT Update & HOT pruning
Page Header
(0,1) (0,3)REDIRECTED
TupleTuple
(0,1) (0,2) (0,3)
Index
Heap
3. VACUUM
UNUSED (0,5)
Tuple
![Page 24: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/24.jpg)
24Copyright©2018 NTT Corp. All Rights Reserved.
• Two conditions to use HOT updating
Enough free space in the same page
No Indexed column is not updated
• Perform HOT-pruning when the page utilization goes above 90%
• IMO, the most important feature to prevent the table bloat
Using HOT Update
![Page 25: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/25.jpg)
25Copyright©2018 NTT Corp. All Rights Reserved.
• Fragmentation• Free space map is not up-to-date
• Shortage of vacuum
• Concurrent long-transaction
Causes of Bloat
![Page 26: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/26.jpg)
26Copyright©2018 NTT Corp. All Rights Reserved.
• Garbage collection speed < Generating garbage speed
• Solution : making vacuums run more faster• Decrease vacuum delays
• Increase maintenance_work_mem• Ideally perform the index vacuum only once
Shortage of Vacuum
![Page 27: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/27.jpg)
27Copyright©2018 NTT Corp. All Rights Reserved.
• “Due to long-running transaction dead spaces are not reclaimed” by a user
• That’s correct in a sense but this is actually inaccurate:
Vacuum doesn’t remove row if there is a concurrent snapshot that can see it, even for a one snapshot!
This also means these row is NOT dead yet
• A row is dead when no one can see it
Vacuum and Snapshot
![Page 28: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/28.jpg)
28Copyright©2018 NTT Corp. All Rights Reserved.
CLUSTERD TABLES
![Page 29: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/29.jpg)
29Copyright©2018 NTT Corp. All Rights Reserved.
• An order of value on a column matches physical block order
• pg_stats.correlation
• CLUSTER command
Clustered Table
![Page 30: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/30.jpg)
30Copyright©2018 NTT Corp. All Rights Reserved.
=# SELECT * FROM tt LIMIT 10; a | b | c -----+--------+-------- 1 | 9988 | 100000 2 | 176 | 99999 3 | 1066 | 99998 4 | 1980 | 99997 5 | 2966 | 99996 6 | 5732 | 99995 7 | 1751 | 99994 8 | 3813 | 99993 9 | 3031 | 99992 10 | 5332 | 99991(10 rows)
=# SELECT attname, correlation FROM pg_stats WHERE tablename = 'tt'; attname | correlation -------------+------------- a | 1 b | 0.00493127 c | -1(3 rows)
Correlations
![Page 31: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/31.jpg)
31Copyright©2018 NTT Corp. All Rights Reserved.
Performance Impact
=# EXPLAIN (buffers on, analyze on) SELECT * FROM tt WHERE a BETWEEN 200 AND 1000 LIMIT 1000; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Limit (cost=0.29..38.07 rows=889 width=12) (actual time=0.063..1.153 rows=801 loops=1) Buffers: shared hit=9 -> Index Scan using tt_a on tt (cost=0.29..38.07 rows=889 width=12) (actual time=0.060..0.942 rows=801 loops=1) Index Cond: ((a >= 200) AND (a <= 1000)) Buffers: shared hit=9 Planning Time: 0.388 ms Execution Time: 1.380 ms(7 rows)
=# EXPLAIN (buffers on, analyze on) SELECT * FROM tt WHERE b BETWEEN 200 AND 1000 LIMIT 1000; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.29..306.76 rows=1000 width=12) (actual time=0.052..3.176 rows=1000 loops=1) Buffers: shared hit=992 -> Index Scan using tt_b on tt (cost=0.29..2435.18 rows=7945 width=12) (actual time=0.048..2.797 rows=1000 loops=1) Index Cond: ((b >= 200) AND (b <= 1000)) Buffers: shared hit=992 Planning Time: 0.749 ms Execution Time: 3.493 ms(7 rows)
![Page 32: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/32.jpg)
32Copyright©2018 NTT Corp. All Rights Reserved.
FUTURE
![Page 33: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/33.jpg)
33Copyright©2018 NTT Corp. All Rights Reserved.
• More faster• Parallelism
• Eager vacuum (retail index deletion)
• Non-batch operation
The Future of Vacuum
![Page 34: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/34.jpg)
34Copyright©2018 NTT Corp. All Rights Reserved.
• Pluggable Storage Engine
• zheap - an UNDO log based new storage engine
Prevent bloating by UPDATE
Good bye Vacuum ...?
![Page 35: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/35.jpg)
35Copyright©2018 NTT Corp. All Rights Reserved.
• PostgreSQL creates multiple versions of rows within the same table
• Has to eventually get rid of the unnecessary row versions
• Vacuum and auto vacuum• Batched garbage collection
• HOT pruning• Opportunistic garbage collection
• Clustered Table
• Future• Parallelism and eager vacuum
• zheap
Conclusion
![Page 36: Bloat and Fragmentation in PostgreSQL · •Manage available space in tables or indexes •2-level binary tree •Bottom level store the free space on each page •Upper levels aggregate](https://reader033.vdocuments.mx/reader033/viewer/2022041414/5e19ebc51c1f6f223e5bec08/html5/thumbnails/36.jpg)
36Copyright©2018 NTT Corp. All Rights Reserved.
THANK YOU!