[pgday.seoul 2017] 3. postgresql wal buffers, clog buffers deep dive - 이근오
TRANSCRIPT
© Copyrights 2001~2017 EXEM CO.,LTD. All Rights Reserved.
PostgreSQL
WAL Buffers, Clog Buffers Deep Dive
Version 9.4, 9.6
EXEM | Research & Contents Team
Memory Architecture
WAL Buffer, XLOG file
CLOG Buffer
INDEX
Memory Architecture
PostgreSQL Architecture
Client Processes: Client Application, Client Interface Library (libpq)
Postgres Instance
  Server Processes: Postmaster (Daemon/Listener), Postgres Server (backend)
  System Memory
    PG Shared Memory: Shared Buffer, WAL Buffer, CLOG Buffer, Lock Space, Other Buffers
    Per Backend Memory: maintenance_work_mem, temp_buffers, work_mem, catalog_cache, optimizer/executor
    OS Cache
  Utility Processes: WAL Receiver, WAL Sender, Archiver, Stats Collector, SysLogger, BGWriter, WAL Writer, Autovacuum Launcher
Storage Manager: Buffer Manager, Disk Manager, Page Manager, Semaphore & Shared Memory, File Manager, Lock Manager
Database Cluster: Sub Directory, Configure File, Lock File
WAL Internal (version 9.6)
WAL (Write-Ahead Log)
Definition: Keeps a record (transaction log) of every change made to the database's data files.
Purpose: If the server stops before a checkpoint has applied the changes to the data files,
the log is read and replayed as-is so that the server recovers safely.
Characteristics: Reduces the number of disk writes and thereby improves performance.
Synchronous write: processing waits until the data has been physically written to disk
(transaction log). Waiting for the I/O slows processing down.
Asynchronous write: the write is only requested and recorded in a buffer, then handled
later; the result is not awaited (table, index, and other data).
Structure of WAL segment files
• Segment file size: 16 MB
• Number of segments the log is split into: 64
(wal_keep_segments)
-> Logical log file size: 64 * 16 MB = 1024 MB = 1 GB
• max_wal_size (1GB, 64 files)
• min_wal_size (80MB, 5 files)
source: http://blog.163.com/li_hx/blog/static/18399141320117984154925
[Figure: one segment file drawn as a row of pages, 2048 pages in total. Every page starts with a Header followed by Record 1 … Record K.]
• Header of the first page: XLogLongPageHeaderData
• Header of every other page: XLogPageHeaderData
• Record: XLogRecord + XLogRecData
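The numbers on this slide follow from simple division. A minimal sketch, assuming the 16 MB segment size and 8 KB WAL page size stated on the slides (the function names are made up for illustration):

```python
# Sizes as stated on the slide; both are compile-time defaults in 9.x.
WAL_PAGE_SIZE = 8 * 1024              # 8 KB per WAL page
SEGMENT_SIZE = 16 * 1024 * 1024       # 16 MB per segment file

def pages_per_segment(segment_size=SEGMENT_SIZE, page_size=WAL_PAGE_SIZE):
    """Number of WAL pages that fit in one segment file."""
    return segment_size // page_size

def segments_for(size_mb, segment_mb=16):
    """Number of segment files that make up size_mb megabytes of WAL."""
    return size_mb // segment_mb

print(pages_per_segment())    # pages in one 16 MB segment
print(segments_for(1024))     # files behind max_wal_size = 1GB
print(segments_for(80))       # files behind min_wal_size = 80MB
```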
Structure of WAL segment names
source: http://www.interdb.jp/pg/pgsql09.html#_9.4.2
File name (24 hex digits) = timelineId (8) + Logical ID (8) + WAL File # (8)
Example: 000000010000000100000000 = 00000001 (timelineId) + 00000001 (Logical ID) + 00000000 (WAL File #)
Transaction log (timelineId=1): address space 0x00000000/00000000 to 0xFFFFFFFF/FFFFFFFF, cut into 16 MB files:
00000001 00000000 00000001
00000001 00000000 00000002
…
00000001 00000000 000000FF
00000001 00000001 00000000
…
00000001 00000001 000000FF
00000001 00000002 00000000
That is: 000000010000000000000001 … 0000000100000000000000FF, then 000000010000000100000000 … 00000001FFFFFFFF000000FF
LSN=0x00000001/00002D3E lies in segment 000000010000000100000000.
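The naming rule can be sketched in a few lines. This is an illustration under the assumptions of 16 MB segments and plain integer division of the LSN (PostgreSQL's own `pg_xlogfile_name` treats an LSN that sits exactly on a segment boundary slightly differently); `parse_lsn` and `wal_file_name` are hypothetical helper names.

```python
SEGMENT_SIZE = 16 * 1024 * 1024                          # 16 MB, as on the slide
SEGMENTS_PER_LOGICAL_ID = 0x100000000 // SEGMENT_SIZE    # 256 files per logical ID

def parse_lsn(text):
    """Turn 'hi/lo' hex notation such as '1/00002D3E' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def wal_file_name(timeline_id, lsn):
    """timelineId (8 hex) + Logical ID (8 hex) + WAL File # (8 hex)."""
    segno = lsn // SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline_id,
        segno // SEGMENTS_PER_LOGICAL_ID,
        segno % SEGMENTS_PER_LOGICAL_ID,
    )

print(wal_file_name(1, parse_lsn("1/00002D3E")))
print(wal_file_name(1, parse_lsn("159/D4000098")))
```

The second call uses the LSN from the pg_xlogdump example later in the deck.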
[Figure: inside segment file 000000010000000100000000, 16 MB]
• The file is a sequence of 8192-byte pages
• The first page starts with XLogLongPageHeaderData, every other page with XLogPageHeaderData
• After the page header come the XLOG records, one after another
• Each XLOG record = Header + Data (XLog Record Data)
[Figure: an 8KB page = page header (XLogLongPageHeaderData or XLogPageHeaderData) + XLog records, each made of an XLog Record Header and XLog record data]

XLogPageHeaderData
Type          Member            Description
uint16        xlp_magic         magic value for correctness checks
uint16        xlp_info          flag bits
TimeLineID    xlp_tli           TimeLineID of first record on page
XLogRecPtr    xlp_pageaddr      XLOG address of this page
uint32        xlp_rem_len       total len of remaining data for record

XLogLongPageHeaderData
Type                  Member            Description
XLogPageHeaderData    std               standard header fields
uint64                xlp_sysid         system identifier from pg_control
uint32                xlp_seg_size      just as a cross-check
uint32                xlp_xlog_blcksz   just as a cross-check
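To make the two header layouts concrete, here is a small sketch that reads them with Python's `struct` module. It assumes a little-endian 64-bit build (so the short header is padded from 20 to 24 bytes) and that `xlp_info` bit 0x0002 marks a long header. The sample page is synthetic: every value in it is made up for the demo and does not come from the slides.

```python
import struct

# "<" = little-endian without implicit padding; the 4x adds the alignment padding by hand.
SHORT_HEADER = struct.Struct("<HHIQI4x")   # xlp_magic, xlp_info, xlp_tli, xlp_pageaddr, xlp_rem_len
LONG_EXTRA = struct.Struct("<QII")         # xlp_sysid, xlp_seg_size, xlp_xlog_blcksz
XLP_LONG_HEADER = 0x0002                   # flag bit in xlp_info

def parse_page_header(page):
    """Return the page header fields as a dict; long-header fields only if present."""
    magic, info, tli, pageaddr, rem_len = SHORT_HEADER.unpack_from(page, 0)
    header = {
        "xlp_magic": magic,
        "xlp_info": info,
        "xlp_tli": tli,
        "xlp_pageaddr": pageaddr,
        "xlp_rem_len": rem_len,
    }
    if info & XLP_LONG_HEADER:
        sysid, seg_size, blcksz = LONG_EXTRA.unpack_from(page, SHORT_HEADER.size)
        header.update(xlp_sysid=sysid, xlp_seg_size=seg_size, xlp_xlog_blcksz=blcksz)
    return header

# Synthetic first page of a segment (illustrative values only).
sample = (SHORT_HEADER.pack(0xD093, XLP_LONG_HEADER, 1, 0x159D4000000, 0)
          + LONG_EXTRA.pack(1234567890, 16 * 1024 * 1024, 8192))
hdr = parse_page_header(sample)
print(hdr)
```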
XLOG record format
header part (general header portion):
  XLogRecord | XLogRecordBlockHeader 1, 2 … N | XLogRecordDataHeader (Short/Long)
  optional after each XLogRecordBlockHeader: XLogRecordBlockImageHeader, XLogRecordBlockCompressHeader
data part (data portion):
  Block data (block data 1, block data 2 … block data N) | main data
Reason for the change: up to version 9.4 there was no common format for XLOG records.
Each resource manager defined its own format,
which made the source code increasingly hard to maintain.
INSERT record that carries a backup block
~ 9.4: XLogRecord | XLog Record data (header, data) | BkpBlock | backup block (pd_lsn … 1 2 … Tuple B, Tuple A)
       sizes shown on the slide (bytes): 32 24
9.5 ~: XLogRecord | XLogRecordBlockHeader | XLogRecordBlockImageHeader | XLogRecordDataHeaderShort | backup block (block data 1: pd_lsn … 1 2 … Tuple B, Tuple A) | main data (xl_heap_insert)
       sizes shown on the slide (bytes): 3 25 24 2
INSERT record without a backup block
~ 9.4: XLogRecord (32) | XLOG record data: xl_heap_insert (24), xl_heap_header (6), Tuple B data
9.5 ~: XLogRecord (24) | XLogRecordBlockHeader (20) | XLogRecordDataHeaderShort (2) | block data 1: xl_heap_header (5), Tuple B data | main data: xl_heap_insert (3)
Checkpoint record
~ 9.4: XLogRecord (32) | XLOG record data: CheckPoint (72)
9.5 ~: XLogRecord (24) | XLogRecordDataHeaderShort (2) | main data: CheckPoint (80)
pg_xlogdump
postgres=# insert into t6 values ('a');
rmgr: Heap len (rec/tot): 3/ 66, tx: 118903881, lsn: 159/D4000098, prev 159/D4000028, desc: INSERT+INIT off 1, blkref #0: rel 1663/13323/22530 blk 0

Item        Description
rmgr        Resource manager:
            0 XLOG, 1 Transaction, 2 Storage, 3 CLOG, 4 Database, 5 Tablespace,
            6 MultiXact, 7 RelMap, 8 Standby, 9 Heap2, 10 Heap, 11 Btree,
            12 Hash, 13 Gin, 14 Gist, 15 Sequence, 16 SPGist
len (rec)   Length of the WAL record, not counting the WAL record header and backup blocks
            -> xl_heap_insert/delete/update
len (tot)   Total length of the WAL record
tx          Transaction ID
lsn         logical ID / WAL segment number + block offset
prev        Position of the immediately preceding WAL record (previous lsn)
desc        • Operation information (insert, delete, update, truncate …)
            • Relation information (tablespace/database/relfilenode)
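The "logical ID / WAL segment number + block offset" reading of an LSN can be checked with a short sketch. It assumes 16 MB segments, so the top 8 bits of the low word are the segment number and the remaining 24 bits are the byte offset inside the segment file; `split_lsn` and `RMGR_NAMES` are illustrative names, with the list copied from the table above.

```python
# Resource manager IDs as listed on the slide (index = rmgr id).
RMGR_NAMES = [
    "XLOG", "Transaction", "Storage", "CLOG", "Database", "Tablespace",
    "MultiXact", "RelMap", "Standby", "Heap2", "Heap", "Btree",
    "Hash", "Gin", "Gist", "Sequence", "SPGist",
]

def split_lsn(text):
    """Split 'hi/lo' into (logical ID, segment number, offset inside the segment)."""
    hi, lo = (int(part, 16) for part in text.split("/"))
    return hi, lo >> 24, lo & 0xFFFFFF

logical_id, segment, offset = split_lsn("159/D4000098")
print("logical ID 0x%X, segment 0x%02X, offset 0x%06X" % (logical_id, segment, offset))
print(RMGR_NAMES[10])
```

The offset 0x000098 is where the record starts in the segment file's hex dump on the next slide.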
postgres=# insert into t6 values ('a');
rmgr: Heap len (rec/tot): 3/66, tx: 118903881, lsn: 159/D4000098,
prev: 159/D4000028, desc: INSERT+INIT off 1, blkref #0: rel 1663/13323/22530
blk 0

WAL Segment File
000090 00 00 00 00 00 00 00 00 42 00 00 00 49 54 16 07 >........B...IT..<
0000a0 28 00 00 d4 59 01 00 00 80 0a 00 00 68 ed fe 57 >(...Y.......h..W<
0000b0 00 60 11 00 7f 06 00 00 0b 34 00 00 02 58 00 00 >.`.......4...X..<
0000c0 00 00 00 00 ff 03 01 00 02 08 18 00 17 61 20 20 >.............a  <
0000d0 20 20 20 20 20 20 20 01 00 00 00 00 00 00 00 00 >       .........<

Page (base/13323/22530)
000000 59 01 00 00 e0 00 00 d4 00 00 00 00 1c 00 d8 1f >Y...............<
000010 00 20 04 20 00 00 00 00 d8 9f 46 00 00 00 00 00 >. . ......F.....<
000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >................<
*
001fd0 00 00 00 00 00 00 00 00 49 54 16 07 00 00 00 00 >........IT......<
001fe0 00 00 00 00 00 00 00 00 01 00 01 00 02 08 18 00 >................<
001ff0 17 61 20 20 20 20 20 20 20 20 20 00 00 00 00 00 >.a         .....<
002000
Record layout (bytes):
XLogRecordHeader (24) | XR Block Header (20) | XR Data Header Short (2) | xl_heap_header (5) + Tuple data | xl_heap_insert (3)

XLog Record Header (24 Bytes)
Member       Size (byte)   Description                        Value
xl_tot_len   4             total len of entire record         42 00 00 00
xl_xid       4             transaction id                     49 54 16 07
xl_prev      8             ptr to previous record in log      28 00 00 d4 59 01 00 00
xl_info      1             flag bits                          80
xl_rmid      1             resource manager for this record   0a
padding      2             .                                  00 00
xl_crc       4             CRC for this record                68 ed fe 57
XLog Record Block Header (20 Bytes)
Member        Size (byte)    Description                                          Value
id            1              block reference ID                                   00
fork_flags    1              fork within the relation and flags                   60
data_length   2              number of payload bytes (not including page image)   11 00
block ref     16 (4/4/4/4)   tablespace / database / relfilenode / block number   7f 06 00 00 0b 34 00 00 02 58 00 00 00 00 00 00
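The two header tables can be cross-checked against the pg_xlogdump line by unpacking the bytes from the slide. A minimal sketch, assuming little-endian byte order; the struct format strings are my own transcription of the field sizes in the tables, not something taken from the PostgreSQL sources.

```python
import struct

# Bytes 0x98..0xc3 of the WAL segment dump shown above.
record = bytes.fromhex(
    "42000000" "49541607" "280000d459010000" "80" "0a" "0000" "68edfe57"   # XLogRecord (24)
    "00" "60" "1100" "7f060000" "0b340000" "02580000" "00000000"          # block header (20)
)

# xl_tot_len, xl_xid, xl_prev, xl_info, xl_rmid, 2 padding bytes, xl_crc
tot_len, xid, prev, info, rmid, crc = struct.unpack_from("<IIQBB2xI", record, 0)

# id, fork_flags, data_length, tablespace, database, relfilenode, block number
block_id, fork_flags, data_length, spc, db, rel, blk = struct.unpack_from("<BBHIIII", record, 24)

def format_lsn(value):
    """Render a 64-bit WAL position in the usual hi/lo hex notation."""
    return "%X/%08X" % (value >> 32, value & 0xFFFFFFFF)

print("tot_len=%d tx=%d prev=%s rmid=%d info=0x%02x" % (tot_len, xid, format_lsn(prev), rmid, info))
print("blkref #%d: rel %d/%d/%d blk %d, data_length=%d" % (block_id, spc, db, rel, blk, data_length))
```

The decoded values line up with `tot 66`, `tx 118903881`, `prev 159/D4000028`, and `rel 1663/13323/22530 blk 0` in the dump output.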
XLog Record Data Header Short (2 Bytes)
Member        Size (byte)   Description                                            Value
id            1             XLR_BLOCK_ID_DATA_SHORT                                ff
data_length   1             number of payload bytes (length of xl_heap_insert)     03

xl_heap_header (5 Bytes)
Member        Size (byte)   Description                                            Value
t_infomask2   2             number of attributes + various flags                   01 00
t_infomask    2             various flag bits                                      02 08
t_hoff        1             size of header incl. bitmap, padding                   18
(the bytes that follow, 00 17 61 20 …, are already tuple data)

xl_heap_insert (3 Bytes)
Member        Size (byte)   Description                                            Value
offnum        2             inserted tuple's offset                                01 00
flags         1             XLH_INSERT_ALL_VISIBLE_CLEARED (1<<0)                  00
                            XLH_INSERT_LAST_IN_MULTI (1<<1)
                            XLH_INSERT_IS_SPECULATIVE (1<<2)
                            XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
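The remaining bytes of the INSERT record can be walked the same way. This sketch assumes the 9.5+ ordering described earlier (data header, then block data, then main data) and takes the block data length of 17 bytes from the block header's `data_length` (11 00). Variable names are illustrative.

```python
import struct

# Bytes 0xc4..0xd9 of the WAL segment dump: everything after the block header.
payload = bytes.fromhex(
    "ff03"                     # XLogRecordDataHeaderShort: id, main data length
    "0100020818"               # xl_heap_header (5 bytes)
    "001761" + "20" * 9 +      # tuple data (12 bytes)
    "010000"                   # xl_heap_insert (3 bytes)
)

BLOCK_DATA_LENGTH = 17         # data_length from the block header (0x0011)

data_id, main_length = payload[0], payload[1]
block_data = payload[2:2 + BLOCK_DATA_LENGTH]
main_data = payload[2 + BLOCK_DATA_LENGTH:]

t_infomask2, t_infomask, t_hoff = struct.unpack_from("<HHB", block_data, 0)
tuple_data = block_data[5:]
offnum, flags = struct.unpack("<HB", main_data)

# Header parts: XLogRecord 24 + block header 20 + data header 2.
total = 24 + 20 + 2 + len(block_data) + len(main_data)

print("t_hoff=%d, tuple bytes=%d, value=%r" % (t_hoff, len(tuple_data), tuple_data[2:].decode("ascii")))
print("offnum=%d flags=0x%02x total=%d" % (offnum, flags, total))
```

The sum of the parts equals the total record length of 66 reported by pg_xlogdump, and the main data length of 3 matches `len (rec)`.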
postgres=# update t6 set id = 'b' where id = 'a';
rmgr: Heap len (rec/tot): 14/177, tx: 118903882, lsn: 159/D4000178,
prev: 159/D4000108, desc: HOT_UPDATE off 1 xmax 118903882 ; new off 2 xmax 0,
blkref #0: rel 1663/13323/22530 blk 0 FPW

WAL Segment File (the record embeds the backup block, i.e. the full-page image)
000170 00 00 00 00 00 00 00 00 b1 00 00 00 4a 54 16 07 >............JT..<
000180 08 01 00 d4 59 01 00 00 40 0a 00 00 1e eb f4 f3 >....Y...@.......<
000190 00 10 00 00 70 00 20 00 01 7f 06 00 00 0b 34 00 >....p. .......4.<
0001a0 00 02 58 00 00 00 00 00 00 ff 0e 59 01 00 00 e0 >..X........Y....<
0001b0 00 00 d4 00 00 00 00 20 00 b0 1f 00 20 04 20 4a >....... .... . J<
0001c0 54 16 07 d8 9f 46 00 b0 9f 46 00 4a 54 16 07 00 >T....F...F.JT...<
0001d0 00 00 00 00 00 00 00 00 00 00 00 02 00 01 80 02 >................<
0001e0 28 18 00 17 62 20 20 20 20 20 20 20 20 20 00 00 >(...b         ..<
0001f0 00 00 00 49 54 16 07 4a 54 16 07 00 00 00 00 00 >...IT..JT.......<
000200 00 00 00 02 00 01 40 02 01 18 00 17 61 20 20 20 >......@.....a   <
000210 20 20 20 20 20 20 00 00 00 00 00 4a 54 16 07 01 >      .....JT...<
000220 00 00 40 00 00 00 00 02 00 00 00 00 00 00 00 00 >..@.............<

Page (base/13323/22530)
000000 59 01 00 00 30 02 00 d4 00 00 00 00 20 00 b0 1f >Y...0....... ...<
000010 00 20 04 20 4a 54 16 07 d8 9f 46 00 b0 9f 46 00 >. . JT....F...F.<
000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >................<
*
001fb0 4a 54 16 07 00 00 00 00 00 00 00 00 00 00 00 00 >JT..............<
001fc0 02 00 01 80 02 28 18 00 17 62 20 20 20 20 20 20 >.....(...b      <
001fd0 20 20 20 00 00 00 00 00 49 54 16 07 4a 54 16 07 >   .....IT..JT..<
001fe0 00 00 00 00 00 00 00 00 02 00 01 40 02 01 18 00 >...........@....<
001ff0 17 61 20 20 20 20 20 20 20 20 20 00 00 00 00 00 >.a         .....<
002000
Record layout (bytes):
XLogRecordHeader (24) | XR Block Header + XR Block Image Header (25) | XR Data Header Short (2) | Backup Block | xl_heap_update (14)

XLog Record Block Header (20 Bytes)
Member        Size (byte)    Description                                          Value
id            1              block reference ID                                   00
fork_flags    1              fork within the relation and flags                   10
data_length   2              number of payload bytes (not including page image)   00 00
block ref     16 (4/4/4/4)   tablespace / database / relfilenode / block number   7f 06 00 00 0b 34 00 00 02 58 00 00 00 00 00 00
In the byte stream the 5-byte image header (70 00 20 00 01) sits between data_length and block ref.

XLog Record Block Image Header (5 Bytes)
Member        Size (byte)    Description                                          Value
length        2              number of page image bytes                           70 00
hole_offset   2              number of bytes before "hole"                        20 00
bimg_info     1              flag bits                                            01
                             0x01 BKPIMAGE_HAS_HOLE
                             0x02 BKPIMAGE_IS_COMPRESSED
XLog Record Data Header Short (2 Bytes)
Member             Size (byte)   Description                                          Value
id                 1             XLR_BLOCK_ID_DATA_SHORT                              ff
data_length        1             number of payload bytes (length of xl_heap_update)   0e

xl_heap_update (14 Bytes)
Member             Size (byte)   Description                                          Value
old_xmax           4             xmax of the old tuple                                4a 54 16 07
old_offnum         2             old tuple's offset                                   01 00
old_infobits_set   1             infomask bits to set on old tuple                    00
flags              1             see the flag list below                              40
new_xmax           4             xmax of the new tuple                                00 00 00 00
new_offnum         2             new tuple's offset                                   02 00

flags:
/* PD_ALL_VISIBLE was cleared */
XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED (1<<0)
/* PD_ALL_VISIBLE was cleared in the 2nd page */
XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED (1<<1)
XLH_UPDATE_CONTAINS_OLD_TUPLE (1<<2)
XLH_UPDATE_CONTAINS_OLD_KEY (1<<3)
XLH_UPDATE_CONTAINS_NEW_TUPLE (1<<4)
XLH_UPDATE_PREFIX_FROM_OLD (1<<5)
XLH_UPDATE_SUFFIX_FROM_OLD (1<<6)
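A sketch of how the UPDATE record with a full-page image adds up to 177 bytes, using the header bytes at 0x178 and the main data at 0x21b from the dump. It assumes little-endian byte order, an 8 KB block size, and the field order seen in the byte stream (block header, image header, relation and block number, data header); the names are illustrative.

```python
import struct

BLCKSZ = 8192   # assumed default block size

# Header part of the record: bytes 0x178..0x1aa of the WAL segment dump.
headers = bytes.fromhex(
    "b1000000" "4a541607" "080100d459010000" "40" "0a" "0000" "1eebf4f3"   # XLogRecord (24)
    "00" "10" "0000"                                                      # id, fork_flags, data_length
    "7000" "2000" "01"                                                    # image header (5)
    "7f060000" "0b340000" "02580000" "00000000"                           # relation + block number
    "ff0e"                                                                # data header short
)
# Main data: bytes 0x21b..0x228 (xl_heap_update).
main_data = bytes.fromhex("4a541607" "0100" "00" "40" "00000000" "0200")

tot_len, xid, prev, info, rmid, crc = struct.unpack_from("<IIQBB2xI", headers, 0)
block_id, fork_flags, data_length = struct.unpack_from("<BBH", headers, 24)
image_length, hole_offset, bimg_info = struct.unpack_from("<HHB", headers, 28)
spc, db, rel, blk = struct.unpack_from("<IIII", headers, 33)
old_xmax, old_offnum, old_infobits, flags, new_xmax, new_offnum = struct.unpack("<IHBBIH", main_data)

hole_length = BLCKSZ - image_length          # bytes of free space left out of the image
total = len(headers) + image_length + len(main_data)

print("image %d bytes, hole at offset %d, hole length %d" % (image_length, hole_offset, hole_length))
print("old off %d xmax %d; new off %d xmax %d; total %d" % (old_offnum, old_xmax, new_offnum, new_xmax, total))
```

The hole starts at offset 32, which is the page's `pd_lower` (20 00) in the page dump, so only the 112 used bytes of the page travel in the record.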
CLOG Buffer
XID Mapping Table
[Figure: a grid of transaction status bits, one row per byte]
• 1 transaction = 2 bits, so 1 Byte holds four XIDs: 000 001 002 003 | 004 005 006 007 | … | 020 021 022 023 | …
• A tuple's t_xmin / t_xmax (for example XID 001, XID 016) selects its entry in the table
• ./pg_clog dir: files 0000, 0001, 0002, 0003 (256K each)
• 4 * 256K = 1M TX per file
#define TRANSACTION_STATUS_IN_PROGRESS 0x00
#define TRANSACTION_STATUS_COMMITTED 0x01
#define TRANSACTION_STATUS_ABORTED 0x02
#define TRANSACTION_STATUS_SUB_COMMITTED 0x03
src/include/access/clog.h
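The mapping from an XID to its two status bits can be sketched as follows. The constants assume an 8 KB block size and 32 pages per segment file (which gives the 256K file size on the slide); the function names are made up, and the page contents in the demo are synthetic.

```python
BLCKSZ = 8192
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 4
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE     # 32768 transactions per page
PAGES_PER_SEGMENT = 32                                 # 32 * 8 KB = 256K per file

# Values from src/include/access/clog.h, as listed above.
STATUS_NAMES = {0x00: "IN_PROGRESS", 0x01: "COMMITTED", 0x02: "ABORTED", 0x03: "SUB_COMMITTED"}

def clog_position(xid):
    """Return (segment file name, page number, byte in page, bit shift) for an XID."""
    page = xid // CLOG_XACTS_PER_PAGE
    byte = (xid % CLOG_XACTS_PER_PAGE) // CLOG_XACTS_PER_BYTE
    shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
    return "%04X" % (page // PAGES_PER_SEGMENT), page, byte, shift

def status_of(page_bytes, xid):
    """Read the 2-bit status of xid from the bytes of its CLOG page."""
    _, _, byte, shift = clog_position(xid)
    return STATUS_NAMES[(page_bytes[byte] >> shift) & 0x03]

# Synthetic page: mark XID 1 as committed (bits 2..3 of byte 0 set to 01).
page = bytearray(BLCKSZ)
page[0] = 0b00000100

print(clog_position(1), status_of(page, 1))
print(clog_position(16))
print(clog_position(1024 * 1024))   # first XID stored in file 0001
```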
ClogCtlData (SlruCtlData)
Members: shared | do_fsync | PagePrecedes (CLOGPagePrecedes) | Dir

shared:
Member               Value
num_slots            4
page_buffer          1c80   3c80   5c80   7c80
page_status          2      2      2      2
page_dirty           0      0      0      0
page_number          0x44   0x41   0x42   0x43
…                    …
latest_page_number   0x44
[Figure: CLOG pages on disk and in memory]
Disk: segment files 0000 (F), 0001 (F); pages 0000(p), 0001(p), … 0032(p), 0033(p) … 0045(p)
Memory: the four buffer slots hold pages 0044, 0041, 0042, 0043
ClogCtlData (SlruCtlData)
Field layout: shared (8 Bytes) | do_fsync (8 Bytes) | PagePrecedes (8 Bytes) | Dir (64 Bytes)
000000e3dcc0 80 8b 4e e1 58 7f 00 00 01 00 00 00 00 00 00 00 >..N.X...........<
000000e3dcd0 24 a0 50 00 00 00 00 00 70 67 5f 63 6c 6f 67 00 >$.P.....pg_clog.<
000000e3dce0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >................<
000000e3dcf0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >................<
000000e3dd00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >................<
000000e3dd10 00 00 00 00 00 00 00 00 >........<
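The 88 bytes of this dump can be decoded with a short sketch. It assumes a little-endian x86-64 process (8-byte pointers, the 1-byte `do_fsync` padded to 8 bytes) and a 64-byte `Dir` buffer, which is what the dump's length implies; the format string is my own reading of the dump, not a definition from the sources.

```python
import struct

# The ClogCtlData dump shown above, 0xe3dcc0..0xe3dd17.
raw = bytes.fromhex(
    "808b4ee1587f0000"      # shared: pointer to the SlruShared area
    "0100000000000000"      # do_fsync plus padding
    "24a0500000000000"      # PagePrecedes: function pointer
    "70675f636c6f6700"      # Dir: "pg_clog" and terminating NUL
    + "00" * 56             # rest of the Dir buffer
)

# pointer, bool + 7 pad bytes, pointer, 64-byte character array
shared, do_fsync, page_precedes, dir_buf = struct.unpack("<Q?7xQ64s", raw)
directory = dir_buf.split(b"\x00", 1)[0].decode("ascii")

print("shared       = 0x%x" % shared)
print("do_fsync     = %s" % do_fsync)
print("PagePrecedes = 0x%x" % page_precedes)
print("Dir          = %s" % directory)
```

The `shared` pointer decodes to the start address of the ClogCtlData.shared dump on the following slide.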
ClogCtlData.shared (SlruShared)
ClogCtlData.shared holds the address of this area (0x7f58e14e8b80):
000000e3dcc0 80 8b 4e e1 58 7f 00 00 01 00 00 00 00 00 00 00 >..N.X...........<

7f58e14e8b80 00 d7 4c e1 58 7f 00 00 04 00 00 00 00 00 00 00 >..L.X...........<
7f58e14e8b90 10 8c 4e e1 58 7f 00 00 30 8c 4e e1 58 7f 00 00 >..N.X...0.N.X...<
7f58e14e8ba0 40 8c 4e e1 58 7f 00 00 48 8c 4e e1 58 7f 00 00 >@.N.X...H.N.X...<
7f58e14e8bb0 58 8c 4e e1 58 7f 00 00 68 8c 4e e1 58 7f 00 00 >X.N.X...h.N.X...<
7f58e14e8bc0 00 04 00 00 02 00 00 00 00 00 00 00 01 00 00 00 >................<
…

Fields, in dump order:
ControlLock (8 Bytes) | num_slots (8 Bytes)
page_buffer (8 Bytes) | page_status (8 Bytes)
page_dirty (8 Bytes) | page_number (8 Bytes)
page_lru_count (8 Bytes) | group_lsn (8 Bytes)
lsn_groups_per_page (4 Bytes) | cur_lru_count (4 Bytes) | latest_page_number (4 Bytes) | lwlock_tranche_id (4 Bytes)
EXEM Research & Contents Team
NAVER http://cafe.naver.com/playexem
ITPUB (Chinese) http://blog.itpub.net/31135309/
Wordpress https://playexem.wordpress.com/
Slideshare http://www.slideshare.net/playexem
Youtube https://www.youtube.com/channel/UC5wKR_-A0eL_Pn_EMzoauJg
Tudou (Chinese) http://www.tudou.com/home/maxgauge/
Training inquiries [email protected]
Thank you