How Booking avoids and deals with replication lag (and how you can too) Eric Herman (Principal Developer) [email protected]


Post on 25-May-2020


How Booking avoids and deals with replication lag

(and how you can too)

Eric Herman (Principal Developer) [email protected]


In this talk
1. Booking.com
2. MySQL/MariaDB replication (@Booking.com)
3. Replication lag: what/how/why
4. Bad solutions to cope with lag
5. A Booking.com solution to cope with lag
6. Improving on our current solution
7. Take home
8. Links, questions, and closing


Booking.com at a glance
● Started in 1996; still based in Amsterdam
● Member of the Priceline Group since 2005 (stock: PCLN)
● Amazing growth; continuous scaling challenges
● Online Hotel/Accommodation/Travel Agent (OTA):
  ● Over 1.2 million active properties in 227 countries
  ● Over 1.2 million room nights reserved daily
  ● 40+ languages (website and customer service)
  ● Over 13,000 people working in 187 offices in 70 countries
● We use a lot of MySQL and MariaDB:
  ● Thousands (1000s) of servers, ~90% replicating
  ● >150 masters: ~30 with >50 slaves and ~10 with >100 slaves


Reminder on replication
● Typically one read/write master with one or more read-only slaves
● The master records all its writes in a journal: the binary logs
● Each slave:
  ● Downloads the journal and saves it locally (IO thread): relay logs
  ● Executes the relay logs on the local copy of the database (SQL thread)
  ● May produce binary logs to be itself a master (log-slave-updates)
  ● Intermediate binlogs have different names and positions from the original binlogs
● Replication is:
  ● Asynchronous, thus lag
  ● Slave replay is typically single-threaded, or less parallel than the master
  ● Overall execution may be slower on slaves than on the master


MySQL replication at Booking.com
● Typical Booking.com MySQL replication deployment:

                              +-----+
                              | PM  |
                              +-----+
                                 |
           +---------------------+------ ... -------+
           |                     |                  |
        +-----+               +-----+            +-----+
        | IM1 |               | IM2 |            | IMn |
        +-----+               +-----+            +-----+
           |                     |
   +-------+--- ... ---+         +-------+--- ...
   |       |           |         |       |
+-----+ +-----+     +-----+   +-----+ +-----+
| SA1 | | SA2 |     | SAm |   | SX1 | | SX2 |
+-----+ +-----+     +-----+   +-----+ +-----+


Replication Management
● Lots of home-grown tooling for monitoring, alerting, management
● We are using and contributing to Orchestrator:


Orchestrator’s lag visualization


Extreme lag


Why does lag happen?
● Under which conditions can lag be experienced?
  ● Too many transactions for replication to keep up
    ● capacity problem
    ● fix by scaling (splitting, sharding, parallel replication, …)
  ● Very large and long transactions or DDL (self induced)
    ● fix by a developer in the application
    ● on-line schema change
  ● Overly aggressive “batch” workload on the master
    ● optimize the batch sizes
    ● slow down, consider a “backpressure” mechanism


Lag consequences (1)
● What really are the consequences of lag?
  ● Yes, stale reads on slaves, but this is not necessarily a problem


Lag consequences (2)
● What really are the consequences of lag?
  ● Yes, stale reads on slaves, but this is not necessarily a problem
    ● e.g.: A product description is updated, we see the old value for two minutes
  ● Examples of stale read problems
    ● A user changes their email address but still sees the old one
    ● A hotel changes its inventory but still sees old availability
    ● A user books a hotel but does not see it in their reservations


Bad solution #1 to cope with lag
● Bad solution #1: falling back to reading from the master
  ● If slaves are lagging, maybe we should read from the master
  ● Maybe this looks like an attractive solution to avoid stale reads
  ● But this does not scale
    ● Consider why we are reading from slaves in the first place
    ● May cause a sudden load on the master (in case of lag)
    ● And it might cause an outage on the master (yikes!)
  ● It might be better to fail a read than to fall back to (and kill) the master
  ● Reading from the master might be okay in very specific cases


Bad solution #2 to cope with lag
● Bad solution #2: retry on another slave
  ● When reading from a slave: if lag, then retry on another slave
  ● Scales “better” and is maybe okay-ish
    ● if only ever a few slaves are lagging, never many
  ● But what happens if all slaves are lagging?
    ● Increased load (retries) can slow down replication
    ● This might overload the slaves and cause a good slave to start lagging
    ● In the worst case, this might kill slaves and cause a domino effect
  ● Again: probably better to fail a read than to cause a bigger problem
  ● Truly, I do not know of a special case where this tactic makes sense.
  ● Consider separate “interactive” and “reporting” pools of slaves instead


Some tools already cope with lag
● Percona’s pt-online-schema-change
  ● small chunks
  ● slows down when lag is present
● GitHub’s gh-ost
  ● built-in heartbeat mechanism which it utilizes to examine replication lag
● Many others, of course.


Coping with lag @ Booking.com (1)
● A Booking.com solution: “waypoint”
  ● Place a “waypoint” marker in the replication stream
  ● Waiting for a waypoint is similar to waiting for a slave to catch up
  ● Presence of that marker on the slave tells us the slave is caught up
    ● Maybe there is still lag, but we know we are caught up enough
  ● Creating a waypoint is similar to creating a “read view”


Coping with lag @ Booking.com (2)
● Booking.com waypoint implementation:
  ● Table: db_waypoint (a single waypoint is a row in that table)
  ● API function: commit_wait(timeout) → (err_code, waypoint)
    ● INSERTs a waypoint, then executes the COMMIT
    ● polls – until timeout – for row arrival on a slave
    ● Does this look a bit similar to semi-sync?
  ● API function: waypoint_wait(waypoint, timeout) → err_code
    ● Waits for a waypoint – until timeout – on a slave
    ● This is waiting for a slave to catch up enough
  ● Garbage collection: cleanup job that DELETEs old waypoints
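The two API functions above could be sketched roughly as follows. This is not Booking.com's implementation, only an illustration of the idea under stated assumptions: `master` and `replica` are Python DB-API connections whose driver uses the `%s` parameter style, the `db_waypoint` table has a single hypothetical `id` column, the waypoint id is a random UUID, and the replica connection runs in autocommit mode so that each poll sees newly replicated rows.

```python
import time
import uuid


def waypoint_wait(replica, waypoint, timeout, poll_interval=0.05):
    """Poll `replica` for the waypoint row; return 0 if found, 1 on timeout.

    Assumes the replica connection is in autocommit mode, so every poll
    sees freshly replicated rows rather than an old snapshot.
    """
    deadline = time.monotonic() + timeout
    while True:
        cur = replica.cursor()
        # `id` is a hypothetical column name for this sketch.
        cur.execute("SELECT 1 FROM db_waypoint WHERE id = %s", (waypoint,))
        if cur.fetchone() is not None:
            return 0
        if time.monotonic() >= deadline:
            return 1
        time.sleep(poll_interval)


def commit_wait(master, replica, timeout, poll_interval=0.05):
    """INSERT a waypoint, COMMIT, then wait for the row on `replica`.

    Returns (err_code, waypoint), mirroring the signature on the slide.
    """
    waypoint = uuid.uuid4().hex  # hypothetical id scheme
    cur = master.cursor()
    # The waypoint is written in the same transaction as the caller's
    # pending changes, so its arrival on a replica implies theirs.
    cur.execute("INSERT INTO db_waypoint (id) VALUES (%s)", (waypoint,))
    master.commit()
    return waypoint_wait(replica, waypoint, timeout, poll_interval), waypoint
```

With a timeout of zero, `waypoint_wait` checks the replica exactly once and returns immediately, which matches the "zero timeout" use on the next slide.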


Coping with lag @ Booking.com (3)
● Booking.com waypoint use-cases
  ● Throttling batches
    ● use commit_wait with a high timeout (backpressure)
    ● use “small” transactions (chunks of 100 to 1000 rows)
    ● sometimes add some extra sleep() between chunks
  ● Protect from stale reads after writing
    ● store changed data in session
    ● commit_wait with zero timeout
    ● store the waypoint in web session
    ● and waypoint_wait when reading


Improving Booking.com waypoints
● The waypoint design and implementation still suits us.
● Although, sometimes we have a “fast slave” problem:
  ● Throttling batches on a fast slave is sub-optimal
  ● It would be easy-ish to fix: “find the slowest (or any slow) slave”
  ● But for us, this does not arise very often in practice
● Yet, starting from scratch, we might do things differently:
  ● Inserting, deleting, and purging waypoints could be simplified
  ● And we could get rid of the waypoint table


GTID waypoints (1)
● Global Transaction IDs as waypoint
  ● Get the GTID of the last transaction
    ● last_gtid session variable in MariaDB Server

From https://mariadb.com/kb/en/mariadb/master_gtid_wait/:

MASTER_GTID_WAIT() can also be used in client applications together with the last_gtid session variable. This is useful in a read-scaleout replication setup, where the application writes to a single master but divides the reads out to a number of slaves to distribute the load. In such a setup, there is a risk that an application could first do an update on the master, and then a bit later do a read on a slave, and if the slave is not fast enough, the data read from the slave might not include the update just made, possibly confusing the application and/or the end-user. One way to avoid this is to request the value of last_gtid on the master just after the update. Then before doing the read on the slave, do a MASTER_GTID_WAIT() on the value obtained from the master; this will ensure that the read is not performed until the slave has replicated sufficiently far for the update to have become visible.
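The pattern in the quoted documentation can be sketched as two small helpers for MariaDB Server. The helper names are made up for this example, and the cursors are assumed to be Python DB-API cursors from a driver that uses the `%s` parameter style. `last_gtid` is a session variable, so it has to be read on the same master connection that performed the write.

```python
def last_gtid(master_cursor):
    """Return the GTID of the last transaction committed in this session."""
    master_cursor.execute("SELECT @@last_gtid")
    return master_cursor.fetchone()[0]


def wait_for_gtid(replica_cursor, gtid, timeout_seconds):
    """Block on the replica until `gtid` is applied; True if it arrived in time."""
    replica_cursor.execute(
        "SELECT MASTER_GTID_WAIT(%s, %s)", (gtid, timeout_seconds)
    )
    # MASTER_GTID_WAIT returns 0 on success and -1 on timeout.
    return replica_cursor.fetchone()[0] == 0
```

Compared with the waypoint table, the GTID plays the role of the waypoint and the server does the waiting, so there is no row to insert, poll for, or garbage-collect.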


GTID waypoints (2)
● Global Transaction IDs as waypoint:
  ● Get the GTID of the last transaction
    ● last_gtid session variable in MariaDB Server
    ● gtid_executed global variable in Oracle MySQL (get all executed GTIDs)
    ● the last GTID can also be requested in the OK packet (only Oracle MySQL)
      (session_track_gtids variable and mysql_session_track_get_{first,next} API functions)
  ● Waiting for GTID:
    ● MASTER_GTID_WAIT in MariaDB Server
    ● WAIT_FOR_EXECUTED_GTID_SET in Oracle MySQL
  ● Not portable
    ● replicating from MySQL to MariaDB or vice-versa
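For Oracle MySQL, a comparable sketch uses gtid_executed and WAIT_FOR_EXECUTED_GTID_SET. As before, the helper names are invented for illustration and the cursors are assumed to be DB-API cursors with the `%s` parameter style. Because gtid_executed contains every executed GTID and not only those of the current session, waiting on it may wait for more than the caller's own write.

```python
def executed_gtid_set(master_cursor):
    """Return the master's full executed GTID set as one compact string."""
    master_cursor.execute("SELECT @@GLOBAL.gtid_executed")
    # Drop any whitespace, in case a long set is formatted across lines.
    return "".join(master_cursor.fetchone()[0].split())


def wait_for_gtid_set(replica_cursor, gtid_set, timeout_seconds):
    """Block on the replica until `gtid_set` is applied; True if in time."""
    replica_cursor.execute(
        "SELECT WAIT_FOR_EXECUTED_GTID_SET(%s, %s)", (gtid_set, timeout_seconds)
    )
    # The function returns 0 once the set is applied and 1 on timeout.
    return replica_cursor.fetchone()[0] == 0
```

The two flavors differ in both the variable and the wait function (and in the timeout return value), which is the portability problem the slide points out.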


Binlog waypoints
● Binary log file and position as waypoint:
  ● MASTER_POS_WAIT
  ● However, this breaks when using intermediate masters
  ● But it is OK with Binlog Servers[1]
    ● digress for a moment: what is a Binlog Server?
    ● with a Binlog Server, the binlog file and position act as a GTID
  ● But there is currently no way of getting the file and position after committing

[1]: https://blog.booking.com/abstracting_binlog_servers_and_mysql_master_promotion_wo_reconfiguring_slaves.html
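The waiting side of a binlog waypoint could look like the sketch below. The helper name is hypothetical and the cursor is assumed to be a DB-API cursor with the `%s` parameter style. How the caller obtains `log_file` and `log_pos` for its own transaction is deliberately left open, since that is the missing piece noted above.

```python
def wait_for_binlog_position(replica_cursor, log_file, log_pos, timeout_seconds):
    """Block until the replica has applied events up to (log_file, log_pos).

    Returns True if the position was reached, False on timeout.
    """
    replica_cursor.execute(
        "SELECT MASTER_POS_WAIT(%s, %s, %s)", (log_file, log_pos, timeout_seconds)
    )
    result = replica_cursor.fetchone()[0]
    if result is None:
        # NULL means the wait could not run, e.g. replication is stopped.
        raise RuntimeError("MASTER_POS_WAIT returned NULL; is replication running?")
    # -1 signals a timeout; any other value is the number of events waited for.
    return result >= 0
```

The file name and position are only meaningful on replicas whose relay logs mirror the binlogs where the position was taken, which is why this approach breaks behind intermediate masters.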


Lag info on the master?
● Information about slave lag on the master would be great
  ● Potentially interesting for monitoring and alerting
  ● Enable better solution for throttling
    ● Connecting to the right slave is a challenge
● A plugin exists for something close: semi-sync
  ● Using this to track transaction execution on slaves?
  ● Is this the No-Slave-Left-Behind MariaDB Server patch?


No-Slave-Left-Behind
● No-Slave-Left-Behind MariaDB Server patch[1]
  ● Thanks Jonas Oreland and Google
  ● The semi-sync reply also reports SQL-thread position
  ● Transactions are kept in the master plugin until slaves execute
    ● Slave lag can be approximated?
  ● Maybe this could easily be modified to implement commit_wait
    ● wait until lag is acceptable
    ● without connecting to any slave

[1]: https://jira.mariadb.org/browse/MDEV-8112


Feature requests
● Bug#84747: Expose last transaction GTID in a session variable
● Bug#84748: Request transaction GTID in OK packet on COMMIT (without needing a round-trip)
● MDEV-11956: Get last_gtid in OK packet
● Bug#84779: Expose binlog file and position of last transaction
● MDEV-11970: Expose binlog file and position of last transaction
● MDEV-8112: No Slave Left Behind


What about you?
● Right now you will need to roll your own, like we have
  ● Rolling your own “waypoint” (even today) is not too hard
  ● Use these and other ideas
  ● Adapt the ideas to your environment and needs
  ● Collaborate
  ● Contribute? (GitHub? Or maybe even just a talk like this)
● Add yourself to the feature requests
  ● “Affects Me” on bugs.mysql.com
  ● “Vote for this issue” on jira.mariadb.org
  ● “Does this bug affect you?” on bugs.launchpad.net/percona-server

Together, let’s expand what can be done


Feature requests (again)
● Bug#84747: Expose last transaction GTID in a session variable
● Bug#84748: Request transaction GTID in OK packet on COMMIT (without needing a round-trip)
● MDEV-11956: Get last_gtid in OK packet
● Bug#84779: Expose binlog file and position of last transaction
● MDEV-11970: Expose binlog file and position of last transaction
● MDEV-8112: No Slave Left Behind


Oh, and Booking.com is hiring!
● Almost any role:
  ● MySQL Engineer / DBA
  ● System Administrator
  ● System Engineer
  ● Site Reliability Engineer
  ● Developer
  ● Designer
  ● Technical Team Lead
  ● Product Owner
  ● Data Scientist
  ● And many more…
● https://workingatbooking.com/


Links
● Booking.com:
  ● https://blog.booking.com/
  ● https://workingatbooking.com/
  ● https://secure.booking.com/general.en-gb.html?tmpl=docs/about
● MariaDB Server last_gtid (thanks Kristian Nielsen for implementing this):
  ● https://mariadb.com/kb/en/mariadb/master_gtid_wait/
● MySQL GTIDs in OK packet:
  ● session_track_gtids …
  ● mysql_session_track_get_{first,next} …
● No-Slave-Left-Behind MariaDB Server patch:
  ● https://jira.mariadb.org/browse/MDEV-8112 (thanks Jonas Oreland and Google)


More Links
● Pull request to extend Perl-DBI for reading GTID in OK packet:
  ● https://github.com/perl5-dbi/DBD-mysql/pull/77 (thanks Daniël van Eeden)
● Bug reports/Feature requests:
  ● Bug#84747: Expose last transaction GTID in a session variable.
  ● Bug#84748: Request transaction GTID in OK packet on COMMIT (without needing a round-trip).
  ● Bug#84779: Expose binlog file and position of last transaction.
  ● MDEV-11956: Get last_gtid in OK packet.
  ● MDEV-11970: Expose binlog file and position of last transaction.


Questions


Thank You!

And special thanks with sugar on top to:

Percona

Booking.com

Jean-François Gagné

Eric Herman <[email protected]>