MySQL GTID Implementation, Maintenance, and Best Practices
Brian Cain (Dropbox)
Gillian Gunson (GitHub)
Mark Filipi (SurveyMonkey)
Agenda
❏ Intros
❏ Concepts
  ❏ Replication overview
  ❏ GTID Intro
❏ Implementation
❏ Maintenance
❏ New 5.7 Features and Advanced Concepts
2
About Mark
• Works at SurveyMonkey
• From Kansas
• Formerly of PalominoDB and Garmin and preschool
3
About Gillian
• Senior Infrastructure Engineer at GitHub
• From Vancouver, BC, Canada
• Formerly of Okta, PalominoDB, Oracle, Disney
4
About Brian
• Database Engineer, MySQL SRE at Dropbox
• From Seattle
• Formerly of PalominoDB, Zappos, EMusic, etc
• Also from Kansas
5
Tutorial Setup (Hour 2)
❏ Collect a DigitalOcean droplet host access card from the front
❏ Connect to wireless and ssh to the droplet host (server1)
❏ Confirm you can ssh to server2 and server3 from server1
❏ Confirm replication is running
server1 (master)
server2 (replica)
server3 (replica)
6
Concepts
Traditional replication primer and introduction to GTID
7
Traditional MySQL replication primer
❏ Standard topologies
❏ SHOW MASTER STATUS
❏ SHOW SLAVE STATUS
8
Standard topologies
server1 (master)
server2 (replica)
server3 (replica)

server1 (master)
server2 (relay)
server3 (replica)
9
SHOW MASTER STATUS
markf@db-wfcore03-ro [(none)]> show master status;
+------------------+-----------+--------------+------------------+
| File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+-----------+--------------+------------------+
| mysql-bin.000695 | 264631170 |              |                  |
+------------------+-----------+--------------+------------------+
1 row in set (0.08 sec)
10
SHOW SLAVE STATUS
markf@db-wfcore03-ro [(none)]> show slave status\G
*************************** 1. row ***************************
  Slave_IO_State: Waiting for master to send event
  Master_Host: 10.10.23.17
  Master_User: repl
  Master_Port: 3306
  Connect_Retry: 60
  Master_Log_File: mysql-bin.000751
  Read_Master_Log_Pos: 67329044
  Relay_Log_File: mysqld-relay-bin.000403
  Relay_Log_Pos: 67329189
  Relay_Master_Log_File: mysql-bin.000751
  Slave_IO_Running: Yes
  Slave_SQL_Running: Yes
  Replicate_Do_DB:
  Replicate_Ignore_DB:
  Replicate_Do_Table:
  Replicate_Ignore_Table:
  Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
  Last_Errno: 0
  Last_Error:
  Skip_Counter: 0
  Exec_Master_Log_Pos: 67329044
  Relay_Log_Space: 67329388
  Until_Condition: None
  Until_Log_File:
  Until_Log_Pos: 0
  Master_SSL_Allowed: No
  Master_SSL_CA_File:
  Master_SSL_CA_Path:
  Master_SSL_Cert:
  Master_SSL_Cipher:
  Master_SSL_Key:
  Seconds_Behind_Master: 0
  Master_SSL_Verify_Server_Cert: No
  Last_IO_Errno: 0
  Last_IO_Error:
  Last_SQL_Errno: 0
  Last_SQL_Error:
1 row in set (0.08 sec)
11
SHOW SLAVE STATUS
Master information and running status
  Slave_IO_State: Waiting for master to send event
  Master_Host: 10.10.23.17
  Master_User: repl
  Master_Port: 3306
  Slave_IO_Running: Yes
  Slave_SQL_Running: Yes
  Seconds_Behind_Master: 0
❏ Slave_IO_State: state of the replication IO thread, i.e. whether logs are being pulled from the master
❏ Master_Host: the master host this replica is configured to replicate from
❏ Master_User: MySQL user configured for replication
❏ Master_Port: DB port being connected to (3306 is the default)
❏ Slave_IO_Running / Slave_SQL_Running: the two replication threads, one reading from the master, the other executing SQL on the replica
❏ Seconds_Behind_Master: seconds between the timestamp in the binlog and the time on the replica
17
SHOW SLAVE STATUS
Log file information
  Master_Log_File: mysql-bin.000751
  Read_Master_Log_Pos: 67329044
  Relay_Log_File: mysqld-relay-bin.000403
  Relay_Log_Pos: 67329189
  Relay_Master_Log_File: mysql-bin.000751
  Exec_Master_Log_Pos: 67329044
  Relay_Log_Space: 67329388
❏ Master_Log_File / Read_Master_Log_Pos: binary log file on the master, and the position it has been read to
❏ Relay_Log_File / Relay_Log_Pos: position in the relay log on the replica
❏ Relay_Master_Log_File / Exec_Master_Log_Pos: position in the master's binary log that the SQL thread has executed on the replica
20
SHOW SLAVE STATUS
SSL information
  Master_SSL_Allowed: No
  Master_SSL_CA_File:
  Master_SSL_CA_Path:
  Master_SSL_Cert:
  Master_SSL_Cipher:
  Master_SSL_Key:
21
SHOW SLAVE STATUS
Filtering information
  Until_Condition: None
  Replicate_Do_DB:
  Replicate_Ignore_DB:
  Replicate_Do_Table:
  Replicate_Ignore_Table:
  Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
Filtered replication settings -- Use with caution
22
Defining GTID
- Global Transaction IDentifier
- source_id:transaction_id
  - e200c55b-7832-11e5-9d51-00259082ca78:1
- source_id - normally the server_uuid of the master
- transaction_id - sequential integer (starts at 1) representing the order a transaction was committed on the source
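To make the source_id:transaction_id layout concrete, here is a minimal Python sketch that splits a single GTID into its two parts. The names Gtid and parse_gtid are made up for this illustration and are not part of MySQL or any connector.

```python
import uuid
from typing import NamedTuple


class Gtid(NamedTuple):
    source_id: str       # normally the server_uuid of the originating master
    transaction_id: int  # sequential integer, starting at 1


def parse_gtid(text: str) -> Gtid:
    """Split 'source_id:transaction_id' into its two parts (hypothetical helper)."""
    source, sep, number = text.strip().rpartition(":")
    if not sep:
        raise ValueError(f"not a GTID: {text!r}")
    source_id = str(uuid.UUID(source))  # validates the UUID and lowercases it
    transaction_id = int(number)
    if transaction_id < 1:
        raise ValueError("transaction_id starts at 1")
    return Gtid(source_id, transaction_id)


if __name__ == "__main__":
    print(parse_gtid("e200c55b-7832-11e5-9d51-00259082ca78:1"))
```

The sketch only handles a single GTID; sets with intervals need a separate parser.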
23
Binary Log Contents
Standard replication
# at 2637016
#160307 15:05:42 server id 3031 end_log_pos 2637016 Table_map: `C0070735`.`FormStats` mapped to number 139874072
#160307 15:05:42 server id 3031 end_log_pos 2637088 Update_rows: table id 139874072 flags: STMT_END_F
BINLOG '
RgneVhPXCwAAOAAAANg8KAAABhPVggAAAEACUMwMDcwMzczNQAJm9ybVN0YXRzAAQDDAMDAAA=
RgneVhjXCwAASAAAACA9KAAABhPVggAAAEABP//8AdPAACwPPLvVIAACgBAAAMAAAA8AdPAACwPPLvVRIAACkBAAAMAAAA
'/*!*/;
### UPDATE C0070735.FormStats
### WHERE
###   @1=20231
###   @2=2016-03-07 15:00:00
###   @3=296
###   @4=12
### SET
###   @1=20231
###   @2=2016-03-07 15:00:00
###   @3=297
###   @4=12
# at 2637088
#160307 15:05:42 server id 3031 end_log_pos 2637115 Xid = 2004736153
COMMIT/*!*/;
24
Binary Log Contents
GTID Enabled
#160323 14:37:48 server id 168433453 end_log_pos 41956 CRC32 0xee79822d GTID [commit=yes]
SET @@SESSION.GTID_NEXT= '81b0bb5e-f004-11e5-aaa3-b8ca3a676681:100'/*!*/;
# at 41956
#160323 14:37:48 server id 168433453 end_log_pos 42033 CRC32 0xdac047b0 Query thread_id=4611 exec_time=0 error_code=0
SET TIMESTAMP=1458769068/*!*/;
BEGIN/*!*/;
# at 42033
#160323 14:37:48 server id 168433453 end_log_pos 42094 CRC32 0xd3c70a01 Table_map: `C01840587`.`FormStats` mapped to number 74
# at 42094
#160323 14:37:48 server id 168433453 end_log_pos 42166 CRC32 0xa031417e Update_rows: table id 74 flags: STMT_END_F
BINLOG '
rAzzVhMtFwoKPQAAAG6kAAAAAEoAAAAAAUMwMTg0MDU4NwAJRm9ybVN0YXRzAAQDEgMDAQAAAQrH0w==
rAzzVh8tFwoKSAAAALakAAAAAEoAAA///wBAAAAJmY7uAAAgAAAAIAAADwBAAAAJmY7uAAAwAAAAIAAAB+QTGg
'/*!*/;
### UPDATE `C01840587`.`FormStats`
### WHERE
###   @1=4
###   @2='2016-03-23 14:00:00'
###   @3=2
25
SHOW SLAVE STATUS
NEW GTID Information
Master_UUID: 81b0bb5e-f004-11e5-aaa3-b8ca3a676681
Retrieved_Gtid_Set: 81b0bb5e-f004-11e5-aaa3-b8ca3a676681:1-51
Executed_Gtid_Set: 81b0bb5e-f004-11e5-aaa3-b8ca3a676681:1-51
26
GTID Sets & Related Variables
- Rather than just a single transaction_id, an interval is given
  - A range of transactions - e200c55b-7832-11e5-9d51-00259082ca78:1-1234
  - Two ranges with a gap - e200c55b-7832-11e5-9d51-00259082ca78:1-1234:1236-1240
  - Intervals from the same source are separated by colons; commas separate different sources
- Commonly used variables related to GTID
  - server_uuid
  - enforce_gtid_consistency
  - gtid_mode
  - gtid_next
27
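As an illustration of the set notation, the following Python sketch turns a GTID set string into a dictionary of inclusive intervals per source. It assumes MySQL's textual format (commas between sources, colons between intervals of one source); parse_gtid_set is a hypothetical helper name, not a MySQL function.

```python
def parse_gtid_set(text):
    """Return {source_id: [(start, end), ...]} with inclusive, sorted intervals."""
    result = {}
    # SHOW SLAVE STATUS may wrap a long set over several lines, so drop newlines.
    for entry in text.replace("\n", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        source_id, *ranges = entry.split(":")
        intervals = result.setdefault(source_id.lower(), [])
        for r in ranges:
            start, _, end = r.strip().partition("-")
            # A lone number such as "7" is the interval 7-7.
            intervals.append((int(start), int(end or start)))
        intervals.sort()
    return result


if __name__ == "__main__":
    print(parse_gtid_set("e200c55b-7832-11e5-9d51-00259082ca78:1-1234:1236-1240"))
```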
GTID vs Binlog Position
- Traditional replication coordinates
  - MASTER_LOG_FILE
  - MASTER_LOG_POS
- GTID replication coordinates
  - MASTER_AUTO_POSITION
28
Enabling GTIDs in Oracle MySQL 5.6
- Guarantee the master and replica(s) are in sync by enabling read_only on the master and allowing replication to catch up
- Shut down the master and replica(s)
- Add the following to my.cnf
  - enforce-gtid-consistency
  - gtid-mode = ON
  - skip-slave-start
  - log-slave-updates
  - read-only = 1
- Start the master and replica(s)
- Start replication
  CHANGE MASTER TO MASTER_HOST='server1', MASTER_AUTO_POSITION=1;
  START SLAVE;
- Disable read_only on the master and remove read-only from my.cnf
29
Potential Replication Conflicts
- Traditional replication required unique server_id values in a cluster
- GTID replication requires unique sources (server_uuid)
  - data_dir/auto.cnf contains the server_uuid value
  - when the server starts it will generate a new server_uuid and save it in auto.cnf if not found
- Beware cloning of replicas
  - The auto.cnf file will be copied as well and needs to be removed prior to server start
  - This is similar to the process of changing the server_id in my.cnf
30
Implementation
Enabling GTID and making topology changes
31
Connection setup
❏ See https://goo.gl/NVKBL3 for all the commands being run
  ■ Start in the “Prep work” tab
❏ ssh to your provided host in 3 terminal windows/tabs/panes
  ■ this will be server1
  ■ ssh directly to server2 and server3 in the other windows
❏ Use mysql (no password) to connect to local mysql instance
32
A note about conventions
❏ A lot of this is BAD PRACTICE
  ■ root user, no password
  ■ Hacky, incorrect SQL for fixes
  ■ Slow progress between steps
  ■ Not worrying about errant writes between steps
  ■ Making deliberate mistakes
33
Starting topology
server1 (master)
server2 (replica)
server3 (replica)
34
MySQL instance information
❏ Ubuntu on DigitalOcean droplet
❏ Percona Server 5.6.32-78.0
❏ Important file locations:
  ■ MySQL config file: /etc/mysql/my.cnf
  ■ datadir: /var/lib/mysql/
  ■ Binary logs: /var/lib/mysql/mysql-bin.00000x
  ■ Relay logs: /var/lib/mysql/relay-bin.00000x
❏ Restart mysqld via service mysql start/stop/restart
35
MySQL instance: replication configuration
❏ Regular non-GTID replication
❏ binlog_format=ROW
  ■ mysqlbinlog --no-defaults --base64-output=DECODE-ROWS -vvv [binlog]
❏ Some variables already set:
  ■ skip-slave-start
  ■ log-bin, log-slave-updates
36
Important GTID variables
❏ gtid_mode
  ■ Static variable (requires restart)
  ■ If ON, requires log-bin, log-slave-updates, enforce-gtid-consistency also set
  ■ “Disabled” by gtid_deployment_step
❏ gtid_deployment_step
  ■ Percona-specific dynamic variable
  ■ Used as a temporary setting on replicas
  ■ ON means:
    ● can replicate non-GTID binary log events from master
    ● direct writes won’t have GTIDs
37
Important GTID variables (cont.)
❏ master_auto_position
  ■ Tells server to use GTID replication protocol
    ● Needed for simplified server failover/repointing
  ■ Set in CHANGE MASTER statement instead of master_log_file, master_log_pos
  ■ Shown as Auto_Position in SHOW SLAVE STATUS
  ■ Setting to 1 tells server to only replicate GTID events
38
Important GTID variables (cont.)
❏ Setting both gtid_deployment_step=ON and master_auto_position=1 will result in silently ignored non-GTID events
■ server3 will be used to demonstrate this
39
Prep work: general steps
❏ Use mysqlslap and inserts to generate writes between steps
❏ Edit the [mysqld] section in /etc/mysql/my.cnf on all 3 servers:
enforce-gtid-consistency = 1
gtid-mode = ON
❏ Restart server2 and server3 (service mysql restart)
❏ Set these variables on server2 and server3:
SET GLOBAL gtid_deployment_step=ON;
SET GLOBAL super_read_only=ON;
40
Prep work: results
❏ non-GTID writes to server1 still replicating properly
server1 (master): gtid_mode=OFF, gtid_deployment_step=OFF
server2 (replica): gtid_mode=ON, gtid_deployment_step=ON
server3 (replica): gtid_mode=ON, gtid_deployment_step=ON
41
Enable GTIDs: Topology change
❏ Failover to server2
❏ Intentional breaking/fixing of replication
Before:
server1 (master)
server2 (replica)
server3 (replica)
After:
server1 (replica)
server2 (master)
server3 (replica)
42
Important GTID status info variables
❏ Retrieved_Gtid_Set
  ■ All GTIDs received from the master
  ■ Resets on:
    ● CHANGE MASTER
    ● RESET SLAVE
    ● server restart (if relay-log-recovery is on)
❏ Executed_Gtid_Set
  ■ All GTIDs written to binary log
  ■ Same value seen in:
    ● SHOW MASTER STATUS
    ● SHOW SLAVE STATUS
    ● gtid_executed variable
43
GTID slave status
(root@server2) [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: No
...
Master_UUID: cc83d91e-d0e4-11e5-9faf-02cddc874cbb
...
Retrieved_Gtid_Set: cc83d91e-d0e4-11e5-9faf-02cddc874cbb:1-107
Executed_Gtid_Set: c866b7ac-d0e4-11e5-9faf-020a6fe2a217:1-442,
cc83d91e-d0e4-11e5-9faf-02cddc874cbb:1-107
Auto_Position: 0
44
GTID master status
(root@server2) [(none)]> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000002
Position: 1976131
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: c866b7ac-d0e4-11e5-9faf-020a6fe2a217:1-442,
cc83d91e-d0e4-11e5-9faf-02cddc874cbb:1-107
1 row in set (0.00 sec)
45
Enable GTIDs: general steps
❏ Set server1 to read-only
❏ Point server1 to server2
❏ Repoint server3 to server2 (incorrectly)
❏ Test server2 to server3 replication (broken)
❏ Fix server3 replication
❏ Fix server1 replication
46
Enable GTIDs: breaking things
server1 (replica): gtid_mode=OFF
  ■ Result: error
server2 (master): gtid_mode=ON, gtid_deployment_step=ON
server3 (replica): gtid_mode=ON, gtid_deployment_step=ON, master_auto_position=1
  ■ Result: silently dropped writes
47
Promote server1: Topology change
❏ Failover back to original topology
Before:
server1 (replica)
server2 (master)
server3 (replica)
After:
server1 (master)
server2 (replica)
server3 (replica)
48
Promote server1: general steps
❏ Set server2 to read-only
❏ Repoint server2 to server1
❏ Repoint server3 to server1
❏ Turn off read-only on server1
❏ Turn off replication on server1
49
Server2 relay to server3: Topology change
❏ Repoint server3 to server2
Before:
server1 (master)
server2 (replica)
server3 (replica)
After:
server1 (master)
server2 (relay)
server3 (replica)
50
Server2 relay to server3: non-GTID long way
❏ server3
  ■ STOP SLAVE
❏ server2
  ■ wait until replication is ahead of server3
  ■ FLUSH TABLES; FLUSH TABLES WITH READ LOCK; SHOW MASTER STATUS; SHOW SLAVE STATUS\G UNLOCK TABLES;
❏ server3
  ■ START SLAVE UNTIL [recorded server2 slave file/position]
  ■ wait until SHOW SLAVE STATUS says Slave_SQL_Running: No
  ■ CHANGE MASTER TO [recorded server2 master file/position]
  ■ START SLAVE
51
Server2 relay to server3: with GTIDs and master_auto_position=1
❏ server3
  ■ stop slave;
  ■ CHANGE MASTER TO master_host='server2';
  ■ start slave;
52
Maintenance
53
Maintenance
❏ Determine the currently writing master
❏ GTID set gaps
❏ Finding transactions in the binary logs
❏ Fixing transactions with gtid_next
❏ Faking and skipping transactions
54
Currently writing master
❏ Master_UUID can be misleading as to who is writing
server1 (master)
  server_uuid: bd933998-f2c5-11e5-bc9a-021b71e877a3
  Executed_Gtid_Set: bcc3d83d-f2c5-11e5-bc9a-029d985ea7a3:1-2,
                     bd933998-f2c5-11e5-bc9a-021b71e877a3:1-111
server2 (relay)
  server_uuid: bcc3d83d-f2c5-11e5-bc9a-029d985ea7a3
  Master_UUID: bd933998-f2c5-11e5-bc9a-021b71e877a3
server3 (replica)
  server_uuid: be2ed142-f2c5-11e5-bc9a-0274358cd201
  Master_UUID: bcc3d83d-f2c5-11e5-bc9a-029d985ea7a3
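One way to see who is actually writing, rather than trusting Master_UUID, is to sample Executed_Gtid_Set twice and check which source_id advanced. The Python sketch below illustrates that idea under the assumption that the two strings were copied from SHOW MASTER STATUS or SHOW SLAVE STATUS a few seconds apart; highest_ids and active_sources are hypothetical helper names.

```python
def highest_ids(gtid_set):
    """Map each source_id to the highest transaction_id present in the set."""
    highest = {}
    for entry in gtid_set.replace("\n", "").split(","):
        source_id, *ranges = entry.strip().split(":")
        for r in ranges:
            # "1-111" -> 111, and a lone "5" -> 5
            top = int(r.strip().rpartition("-")[2])
            highest[source_id] = max(top, highest.get(source_id, 0))
    return highest


def active_sources(earlier, later):
    """Return the source_ids whose highest transaction_id grew between samples."""
    before, after = highest_ids(earlier), highest_ids(later)
    return sorted(src for src, top in after.items() if top > before.get(src, 0))


if __name__ == "__main__":
    first = ("bcc3d83d-f2c5-11e5-bc9a-029d985ea7a3:1-2,"
             "bd933998-f2c5-11e5-bc9a-021b71e877a3:1-111")
    second = ("bcc3d83d-f2c5-11e5-bc9a-029d985ea7a3:1-2,"
              "bd933998-f2c5-11e5-bc9a-021b71e877a3:1-150")
    print(active_sources(first, second))
```

A source_id that shows up here and matches a server's own server_uuid is taking direct writes, whatever Master_UUID says downstream. The second sample's value of 150 is invented for the example.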
55
GTID set gaps
❏ Gaps within a GTID set occur when
  ❏ slave_parallel_workers > 1
  ❏ A transaction is missing
Executed_Gtid_Set: bcc3d83d-f2c5-11e5-bc9a-029d985ea7a3:1-2,
                   bd933998-f2c5-11e5-bc9a-021b71e877a3:1-111:113-120
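Spotting a gap by eye gets harder as sets grow. A small Python sketch like the one below lists the missing intervals per source; it assumes the set string was copied from Executed_Gtid_Set, and find_gaps is a hypothetical helper name.

```python
def find_gaps(gtid_set):
    """Return {source_id: [(first_missing, last_missing), ...]} for each gap."""
    gaps = {}
    for entry in gtid_set.replace("\n", "").split(","):
        source_id, *ranges = entry.strip().split(":")
        intervals = sorted(
            (int(lo), int(hi or lo))
            for lo, _, hi in (r.strip().partition("-") for r in ranges)
        )
        # A gap is the space between the end of one interval and the start of the next.
        missing = [
            (prev_hi + 1, next_lo - 1)
            for (_, prev_hi), (next_lo, _) in zip(intervals, intervals[1:])
            if next_lo > prev_hi + 1
        ]
        if missing:
            gaps[source_id] = missing
    return gaps


if __name__ == "__main__":
    executed = ("bcc3d83d-f2c5-11e5-bc9a-029d985ea7a3:1-2,"
                "bd933998-f2c5-11e5-bc9a-021b71e877a3:1-111:113-120")
    print(find_gaps(executed))
```

For the set on this slide the only missing transaction is number 112 from source bd933998-f2c5-11e5-bc9a-021b71e877a3.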
56
Finding transactions
❏ mysqlbinlog
  ❏ --include-gtids
  ❏ --exclude-gtids
❏ Beware transactions without gtid_next (gtid_mode=OFF)
mysqlbinlog --no-defaults -vvv --base64-output=DECODE-ROWS --include-gtids='bd933998-f2c5-11e5-bc9a-021b71e877a3:112' /var/lib/mysql/mysql-bin.000002
SET @@SESSION.GTID_NEXT= 'bd933998-f2c5-11e5-bc9a-021b71e877a3:112'/*!*/;
57
Fixing transactions with gtid_next
❏ Accidental writes on a replica happen
  ❏ Who hasn’t forgotten to set sql_log_bin=0?
❏ Apply DDL to replicas for very large tables then promote
❏ Realign GTID sets to match the recorded direct write to the replica
ALTER TABLE mysqlslap.t1 ADD COLUMN newcol3 varchar(128);
58
Faking and skipping transactions
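❏ What if the change on the replica was already fixed or is irrelevant?
❏ How to skip a transaction
  ❏ sql_slave_skip_counter (traditional replication; it cannot be set when gtid_mode=ON)
  ❏ With GTIDs, commit an empty transaction under the GTID to be skipped, then return gtid_next to AUTOMATIC
set gtid_next='xxx_gtid_xxx'; BEGIN; COMMIT; set gtid_next='AUTOMATIC';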
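The empty-transaction trick is easy to get wrong by hand, so here is a hedged Python sketch that only builds the statement list for a given GTID; it does not connect to MySQL. The function name and the surrounding STOP SLAVE / START SLAVE are assumptions about a typical procedure, not something taken from the slides.

```python
def empty_transaction_statements(gtid):
    """Build the statements that commit an empty transaction under `gtid`."""
    # Refuse anything that could break out of the quoted literal.
    if "'" in gtid or ";" in gtid:
        raise ValueError("unexpected character in GTID")
    return [
        "STOP SLAVE;",
        f"SET gtid_next = '{gtid}';",
        "BEGIN;",
        "COMMIT;",                       # records the GTID as executed with no changes
        "SET gtid_next = 'AUTOMATIC';",  # return the session to normal GTID assignment
        "START SLAVE;",
    ]


if __name__ == "__main__":
    for stmt in empty_transaction_statements("bd933998-f2c5-11e5-bc9a-021b71e877a3:112"):
        print(stmt)
```

Run the printed statements in a single session on the replica, since gtid_next is set per session here.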
59
Advanced Concepts
Things to investigate
60
Advanced Concepts
❏ GTID variables
❏ binlog_gtid_simple_recovery
❏ GTID Set functions
❏ START SLAVE UNTIL …
❏ SHOW SLAVE STATUS NONBLOCKING
61
GTID Variables
❏ gtid_executed
  ❏ Same information as seen in SHOW MASTER/SLAVE STATUS
❏ gtid_purged
  ❏ Subset of gtid_executed that are no longer in the binary logs
62
binlog_gtid_simple_recovery
❏ Controls how binary logs are iterated over when MySQL starts
❏ When set to TRUE
  ❏ gtid_executed is set based on Gtid_log_event in the newest binary log file
  ❏ gtid_purged is set based on Previous_gtids_log_event in the oldest file
❏ When FALSE
  ❏ Both variables are computed by iterating through the binary logs from newest to oldest (gtid_executed) and oldest to newest (gtid_purged)
  ❏ Used when there may be transactions without GTIDs prior to enabling gtid_mode or setting gtid_purged
63
GTID Set functions
❏ GTID_SUBSET(subset,set)
  ❏ Return true if subset is in the set
❏ GTID_SUBTRACT(set,subset)
  ❏ Return what is in set less the subset
  ❏ Know what is still available on the master or difference between binary logs
select gtid_subtract(@@global.gtid_executed,@@global.gtid_purged)
❏ WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(gtid_set,timeout)
  ❏ Wait for a gtid_set to complete or timeout, returning a count of transactions completed
❏ MASTER_POS_WAIT(log_name,log_pos,timeout)
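To show what GTID_SUBSET and GTID_SUBTRACT compute, here is a rough Python model of the two functions. It is a sketch of the semantics only: it expands every interval into individual transaction ids, which is fine for small examples but not for production-sized sets, and the helper names are made up for this illustration.

```python
def expand(gtid_set):
    """Expand a GTID set string into {source_id: set of transaction ids}."""
    members = {}
    for entry in gtid_set.replace("\n", "").split(","):
        source_id, *ranges = entry.strip().split(":")
        for r in ranges:
            lo, _, hi = r.strip().partition("-")
            ids = range(int(lo), int(hi or lo) + 1)
            members.setdefault(source_id, set()).update(ids)
    return members


def gtid_subset(subset, gtid_set):
    """True if every GTID in `subset` is also in `gtid_set`."""
    big = expand(gtid_set)
    return all(ids <= big.get(src, set()) for src, ids in expand(subset).items())


def gtid_subtract(gtid_set, subset):
    """GTIDs in `gtid_set` that are not in `subset`, as {source_id: sorted ids}."""
    small = expand(subset)
    diff = {
        src: sorted(ids - small.get(src, set()))
        for src, ids in expand(gtid_set).items()
    }
    return {src: ids for src, ids in diff.items() if ids}


if __name__ == "__main__":
    executed = "bd933998-f2c5-11e5-bc9a-021b71e877a3:1-10"
    purged = "bd933998-f2c5-11e5-bc9a-021b71e877a3:1-7"
    # Mirrors gtid_subtract(gtid_executed, gtid_purged): what the binary logs still hold.
    print(gtid_subtract(executed, purged))
    print(gtid_subset(purged, executed))
```

The executed and purged values in the usage lines are invented sample data.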
64
START SLAVE UNTIL ...
❏ SQL_BEFORE_GTIDS
❏ SQL_AFTER_GTIDS
❏ SQL_AFTER_MTS_GAPS
  ❏ Reverting slave_parallel_workers to 0
65
SHOW SLAVE STATUS NONBLOCKING
❏ Issuing STOP SLAVE creates a global lock on other SLAVE commands
❏ Lock remains while the last replication event group finishes
❏ rpl_stop_slave_timeout controls how long stop slave will wait (defaults to 1 YEAR)
❏ Issuing SHOW SLAVE STATUS will wait until the STOP SLAVE unlocks
❏ SHOW SLAVE STATUS NONBLOCKING allows bypassing the lock
❏ 5.7 no longer blocks on SHOW SLAVE STATUS
66
New 5.7 Features
Things to look forward to
67
New items or changes in 5.7
❏ Enabling GTID online
❏ mysql.gtid_executed table
❏ Use performance_schema to view MTR details
68
Enabling GTID online
❏ gtid_mode and enforce_gtid_consistency are now dynamic
❏ No longer requires restarts or promotions
❏ enforce_gtid_consistency - WARN -> ON
❏ gtid_mode - OFF <-> OFF_PERMISSIVE <-> ON_PERMISSIVE <-> ON
❏ No longer requires log_bin and log_slave_updates
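In 5.7 gtid_mode can only move one step at a time along the chain above. The following Python sketch works out the intermediate SET GLOBAL statements between two modes; gtid_mode_steps is a hypothetical helper that only builds strings and does not talk to a server.

```python
GTID_MODES = ["OFF", "OFF_PERMISSIVE", "ON_PERMISSIVE", "ON"]


def gtid_mode_steps(current, target):
    """List the SET GLOBAL statements that walk gtid_mode from current to target."""
    i, j = GTID_MODES.index(current), GTID_MODES.index(target)
    step = 1 if j >= i else -1
    # Visit every mode after `current` up to and including `target`.
    return [
        f"SET GLOBAL gtid_mode = {GTID_MODES[k]};"
        for k in range(i + step, j + step, step)
    ]


if __name__ == "__main__":
    for stmt in gtid_mode_steps("OFF", "ON"):
        print(stmt)
```

In practice each step is applied on every server in the topology before moving to the next one, and enforce_gtid_consistency should already be ON. Before the final step to ON, wait until no anonymous transactions remain (the Ongoing_anonymous_transaction_count status variable reaches 0).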
69
mysql.gtid_executed table
❏ Stores the source UUID and start/end intervals for executed transactions (the GTID set)
❏ The table is compressed for contiguous GTID sets depending on log_bin
  ❏ If log_bin is OFF, then executed_gtids_compression_period (named gtid_executed_compression_period from MySQL 5.7.6 on) determines how many transactions are executed before compressing
  ❏ If log_bin is ON, then the table is compressed during each binary log rotation
70
❏ Give us your feedback!
❏ https://www.surveymonkey.com/r/PLAM16GTID
71
See us again
❏ Mark Filipi
  ❏ Using Ansible to Manage MySQL
  ❏ 4 October 5:20 PM - 6:10 PM in Lausanne
❏ Gillian Gunson and Brian Cain
  ❏ At the bar
72
Thank you!
73