![Page 1: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/1.jpg)
etcd based PostgreSQL HA cluster
TL;DR: github.com/compose/template-etcd-based-postgres-ha
![Page 2: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/2.jpg)
Introduction
Chris Winslett
@winsletts
compose.io
reading the top 5 comments on Imgur since 2012
![Page 3: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/3.jpg)
How we started using PostgreSQL
MongoDB was a primary datastore
launched project to understand financial metrics
required data exploration, which is brutal in MongoDB
![Page 4: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/4.jpg)
Our database product
our platform runs databases
these databases scale automatically as a customer's data size increases
![Page 5: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/5.jpg)
Our database product
could we run PostgreSQL on our platform?
![Page 6: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/6.jpg)
Database operational requirements
• replicated
• highly-available
• no human interaction for failover
• minimize core-engine modifications
• customers use entire deployment
![Page 7: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/7.jpg)
Tools investigated
repmgr with pgpool II
required human interaction for failover
does not use PostgreSQL streaming
pgpool was flakey on failover
![Page 8: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/8.jpg)
Tools investigated
PostgreSQL streaming replication
no automatic failover
![Page 9: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/9.jpg)
Tools investigated
bi-directional replication
i.e. master-master
only runs on one database per cluster
requires a patch on core engine
![Page 10: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/10.jpg)
is automated failover too ambitious with PostgreSQL?
![Page 11: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/11.jpg)
Learned from tools investigation
PostgreSQL should not be the canonical store of its own state
investigated:
serf - not consensus based
consul - runs with consensus
etcd - runs with consensus
![Page 12: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/12.jpg)
Consul
we built the prototype on Consul
using:
locking sessions
health checks
code at: https://github.com/MongoHQ/consul_ha
![Page 13: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/13.jpg)
Consul
code at: https://github.com/MongoHQ/consul_ha
Tight coupling between:
Consul interaction and
HA decision loop
![Page 14: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/14.jpg)
Consul Diagram 1
![Page 15: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/15.jpg)
Final Consul Diagram
![Page 16: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/16.jpg)
Consul Results
amazing
automatically growing and shrinking Consul clusters
health checks to prevent unhealthy secondaries from acquiring locks
![Page 17: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/17.jpg)
Consul
until we ran into massive swap allocation.
40 GB swap allocation.
fine for prototypes, not for production.
![Page 18: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/18.jpg)
Results from Consul
HA PostgreSQL is possible
but we need a tool that uses our resources more wisely.
![Page 19: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/19.jpg)
Switch to etcd
because of what we’d learned from Consul, it took one day to get a working sample on etcd
![Page 20: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/20.jpg)
Modern etcd diagram
[Flowchart with two panels: "Start-up Process" and "Running Loop". The yes/no edges between nodes did not survive extraction; the node labels are listed below.]
Decision nodes: Connect to etcd? · Is data directory empty? · Win race to set initialization key? · Leader owns key? · Do I own leader key? · Is leader key owned? · Acquire leader lock? · Am I following the correct leader? · Am I the healthiest member? · Am I the leader?
Action nodes: Initialize database · Take over lead TTL key · Start PostgreSQL as a leaderless Secondary · pg_basebackup from leader · Start Postgres · Update leader TTL lock · Promote to leader · follow proper leader · wait 5 seconds · Wait 30 seconds
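
The running loop in the diagram can be sketched as a pure decision function. This is only one plausible reading: the edges between the diagram's nodes are not recoverable from the transcript, so the ordering of the checks below is an assumption, and `NodeView` and `next_action` are hypothetical names that do not come from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeView:
    """What one node observes at the start of a loop iteration (hypothetical type)."""
    my_name: str
    postgres_running: bool
    leader_key_owner: Optional[str]  # None when the leader key expired or was never set
    is_healthiest: bool              # e.g. least replication lag among known members
    following: Optional[str]         # the leader this node currently replicates from


def next_action(view: NodeView) -> str:
    """Return the action for this iteration (assumed ordering of the diagram's checks)."""
    if not view.postgres_running:
        return "start postgres"
    if view.leader_key_owner == view.my_name:
        # "Do I own leader key?" -> yes: keep the TTL lock alive.
        return "update leader TTL lock"
    if view.leader_key_owner is None:
        # "Is leader key owned?" -> no: only the healthiest member races for it.
        if view.is_healthiest:
            return "try to acquire leader lock, promote on success"
        return "wait"
    if view.following != view.leader_key_owner:
        # "Am I following the correct leader?" -> no.
        return "follow proper leader"
    return "wait"
```

Keeping the decision free of I/O makes each branch easy to check in isolation; a real loop would gather the `NodeView` from etcd and PostgreSQL, then carry out the returned action.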
![Page 21: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/21.jpg)
etcd features used
consensus
recursive
ttl
prevValue
prevExist
https://coreos.com/docs/distributed-configuration/etcd-api/
![Page 22: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/22.jpg)
etcd: recursive
used to find all members known to a cluster
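
A minimal sketch of how a recursive read might be used to list members, assuming the etcd v2 keys API. The key layout `/service/<scope>/members/<name>`, the base URL, and the sample response values are illustrative assumptions, not taken from the talk.

```python
def members_url(etcd_base: str, scope: str) -> str:
    """URL for a recursive GET of the members directory (assumed key layout)."""
    return f"{etcd_base}/v2/keys/service/{scope}/members?recursive=true"


def leaf_nodes(node: dict) -> dict:
    """Flatten an etcd v2 node tree into {key: value} for non-directory nodes."""
    if not node.get("dir"):
        return {node["key"]: node.get("value")}
    leaves = {}
    for child in node.get("nodes", []):
        leaves.update(leaf_nodes(child))
    return leaves


# Illustrative response in the shape the etcd v2 API returns for a directory.
sample_response = {
    "action": "get",
    "node": {
        "key": "/service/demo/members",
        "dir": True,
        "nodes": [
            {"key": "/service/demo/members/postgres1", "value": "postgres://10.0.0.1:5432"},
            {"key": "/service/demo/members/postgres2", "value": "postgres://10.0.0.2:5432"},
        ],
    },
}

members = leaf_nodes(sample_response["node"])
```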
![Page 23: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/23.jpg)
etcd: ttl
used with our keep alive from a PostgreSQL runner
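
As a sketch of the keep-alive, the runner could rewrite its own key with a TTL on every loop, so the key disappears if the runner stops. The request below follows the etcd v2 keys API (`value` and `ttl` form fields on a `PUT`); the helper name, key path, and 30-second TTL are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def keepalive_request(etcd_base: str, key: str, value: str, ttl_seconds: int) -> Request:
    """Build a PUT that (re)writes `key` and makes it expire after `ttl_seconds`."""
    body = urlencode({"value": value, "ttl": ttl_seconds}).encode("ascii")
    return Request(f"{etcd_base}/v2/keys{key}", data=body, method="PUT")


req = keepalive_request(
    "http://127.0.0.1:2379", "/service/demo/members/postgres1", "postgres1", 30
)
# Sending it would be: urllib.request.urlopen(req)  (not executed in this sketch)
```

If the runner misses enough loops for the TTL to lapse, etcd removes the key and the member drops out of the recursive listing.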
![Page 24: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/24.jpg)
etcd: prevValue
used in conjunction with TTL to ensure the leader remains the leader when updating the TTL
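
A hedged sketch of the leader refresh: the write only succeeds if the key still holds the caller's own name, so a node that has lost the lock cannot silently extend it. `prevValue` is a query parameter in the etcd v2 keys API, and a failed comparison is answered with HTTP 412; the helper names and key path are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def refresh_leader_request(etcd_base: str, leader_key: str, my_name: str, ttl_seconds: int) -> Request:
    """Build a compare-and-swap PUT: renew the TTL only if I am still the recorded leader."""
    query = urlencode({"prevValue": my_name})
    body = urlencode({"value": my_name, "ttl": ttl_seconds}).encode("ascii")
    return Request(f"{etcd_base}/v2/keys{leader_key}?{query}", data=body, method="PUT")


def still_leader(http_status: int) -> bool:
    """Interpret the response: 2xx means the swap applied, 412 means someone else owns the key."""
    return 200 <= http_status < 300


req = refresh_leader_request("http://127.0.0.1:2379", "/service/demo/leader", "postgres1", 30)
```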
![Page 25: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/25.jpg)
etcd: prevExist
used to create a deployment initialization race
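
The initialization race can be sketched as a create-only write: every new node attempts the same `PUT` with `prevExist=false`, and etcd lets exactly one of them create the key. The parameter is part of the etcd v2 keys API; the key path and helper names are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def init_race_request(etcd_base: str, init_key: str, my_name: str) -> Request:
    """Build a PUT that creates `init_key` only if it does not exist yet."""
    query = urlencode({"prevExist": "false"})
    body = urlencode({"value": my_name}).encode("ascii")
    return Request(f"{etcd_base}/v2/keys{init_key}?{query}", data=body, method="PUT")


def won_race(http_status: int) -> bool:
    """A 2xx response means this node created the key; 412 means another node got there first."""
    return 200 <= http_status < 300


req = init_race_request("http://127.0.0.1:2379", "/service/demo/initialize", "postgres1")
```

The winner initializes the database; the losers wait and later take a base backup from the leader.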
![Page 26: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/26.jpg)
Improved with etcd
removed tight coupling in classes:
HA decision process
etcd state interaction
PostgreSQL handler
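
One way to picture the three decoupled pieces is a decision class that depends only on two narrow interfaces. All class and method names below are hypothetical, the health check is omitted for brevity, and an in-memory store stands in for etcd to show that the HA logic does not care which backend holds the state.

```python
from typing import List, Optional, Protocol


class StateStore(Protocol):
    """State interaction (etcd in the talk; any consensus store fits this shape)."""
    def leader(self) -> Optional[str]: ...
    def try_acquire_leader(self, name: str) -> bool: ...
    def refresh_leader(self, name: str) -> bool: ...


class PostgresHandler(Protocol):
    """Everything that touches the local PostgreSQL instance."""
    def promote(self) -> None: ...
    def follow(self, leader: str) -> None: ...


class Ha:
    """HA decision process: knows neither HTTP nor PostgreSQL commands."""
    def __init__(self, name: str, store: StateStore, postgres: PostgresHandler) -> None:
        self.name, self.store, self.postgres = name, store, postgres

    def run_cycle(self) -> str:
        leader = self.store.leader()
        if leader == self.name:
            self.store.refresh_leader(self.name)
            return "leader"
        if leader is None and self.store.try_acquire_leader(self.name):
            self.postgres.promote()
            return "promoted"
        if leader is not None:
            self.postgres.follow(leader)
        return "secondary"


class InMemoryStore:
    """Test double standing in for etcd."""
    def __init__(self) -> None:
        self._leader: Optional[str] = None

    def leader(self) -> Optional[str]:
        return self._leader

    def try_acquire_leader(self, name: str) -> bool:
        if self._leader is None:
            self._leader = name
            return True
        return False

    def refresh_leader(self, name: str) -> bool:
        return self._leader == name


class RecordingPostgres:
    """Test double that records what the HA loop asked PostgreSQL to do."""
    def __init__(self) -> None:
        self.events: List[str] = []

    def promote(self) -> None:
        self.events.append("promote")

    def follow(self, leader: str) -> None:
        self.events.append(f"follow {leader}")
```

With this split, moving from one state store to another means replacing only the `StateStore` implementation.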
![Page 27: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/27.jpg)
Issues with etcd
overly aggressive about consensus
instructions for optimization at https://coreos.com/docs/cluster-management/debugging/etcd-tuning/
![Page 28: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/28.jpg)
Issues with etcd
overly aggressive about consensus
we stopped running etcd alongside PostgreSQL because we wanted expanding PostgreSQL clusters
![Page 29: etcd based PostgreSQL HA Cluster](https://reader035.vdocuments.mx/reader035/viewer/2022062216/55a6879c1a28ab341e8b4610/html5/thumbnails/29.jpg)
Time for live demo?