TRANSCRIPT
Building a High-Availability PostgreSQL Cluster
Presenter: Devon Mizelle, System Administrator
Co-Author: Steven Bambling, System Administrator
ARIN — “critical internet infrastructure”
What is ARIN?
• Regional internet registry for North America and parts of the Caribbean
• Distributes IPv4 & IPv6 addresses and Autonomous System Numbers (Internet number resources) in the region
• Provides authoritative WHOIS services for number resources in the region
ARIN’s Internal Data
Inside our database exist all of the IPv4 and IPv6 networks that we manage, the organizations they belong to, and the contacts at those organizations. Data integrity, and how we store that data, is therefore extremely important.
Requirements
• Multi-member automatic failover
• Prevent a 'tainted' master from coming online
• Needs to be ACID-compliant
Why Not Slony or pgpool-II?
• Slony replaces PostgreSQL's built-in replication – why do this? Why not let PostgreSQL handle it?
• pgpool-II is not ACID-compliant – it doesn't confirm writes to multiple nodes
Our solution
• CMAN / Corosync – Red Hat's open-source solution for cross-node communication
• Pacemaker – Red Hat and Novell's solution for service management and fencing
• Both under active development by ClusterLabs
We were interested in using this stack because of its active development by ClusterLabs.
CMAN/ Corosync
• Provides a messaging framework between nodes
• Handles a heartbeat between nodes – "Are you up and available?" – It does not provide the 'status' of a service; Pacemaker does
• Pacemaker uses Corosync to send messages between nodes
CMAN can do more, but we use it only as a messaging framework.
CMAN / Corosync
Builds a cluster 'ring' using a configuration file. Used by Pacemaker to pass status messages between the nodes. Simply a framework for communication – no heavy lifting in our implementation.
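As a sketch of what that configuration file looks like, here is the general shape of a standalone corosync.conf defining a three-node ring (a CMAN deployment would express the same membership in cluster.conf instead); the cluster name and all addresses below are illustrative assumptions, not ARIN's actual values:

```conf
# /etc/corosync/corosync.conf -- illustrative sketch only
totem {
    version: 2
    cluster_name: pgcluster      # hypothetical cluster name
    transport: udpu              # unicast UDP between the nodes
}

nodelist {
    node {
        ring0_addr: 10.0.0.1     # example address for node1
        nodeid: 1
    }
    node {
        ring0_addr: 10.0.0.2     # example address for node2
        nodeid: 2
    }
    node {
        ring0_addr: 10.0.0.3     # example address for node3
        nodeid: 3
    }
}
```

Corosync only needs this membership list to form the ring; everything about what runs where is left to Pacemaker.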
About Pacemaker
• Developed / maintained by Red Hat and Novell
• Scalable – anywhere from a two-node to a 16-node setup
• Scriptable – resource scripts can be written in any language
• Monitoring – watches for service state changes
• Fencing – disables a box and switches roles when failures occur
• Shareable database between nodes about the status of services / nodes
Pacemaker
[Diagram: a Master node replicating to a Sync slave and an Async slave]
An XML 'database' (known as a CIB – cluster information base) is generated with the status of each resource and passed between nodes. The state of pgSQL is controlled by Pacemaker itself. Pacemaker uses a 'resource script' to interact with pgSQL and can determine the state of the service (Master / Sync / Async).
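A minimal sketch of how such a multi-state pgSQL resource might be defined with the crm shell, using the stock ocf:heartbeat:pgsql resource agent (all paths, node names, addresses, and intervals here are illustrative assumptions, not ARIN's actual configuration):

```shell
# Sketch only: define pgsql as a resource Pacemaker can start,
# monitor, and promote. Parameter values are assumptions.
crm configure primitive pgsql ocf:heartbeat:pgsql \
    params pgctl="/usr/pgsql/bin/pg_ctl" \
           pgdata="/var/lib/pgsql/data" \
           rep_mode="sync" node_list="node1 node2 node3" \
           master_ip="10.0.0.10" \
    op monitor interval="7s" \
    op monitor interval="2s" role="Master"

# Multi-state wrapper: run a copy on every node, promote exactly one
# to Master; the agent reports Sync/Async state for the rest.
crm configure ms msPostgresql pgsql \
    meta master-max="1" clone-max="3" notify="true"
```

The `op monitor` lines are what drive the state reporting described above: the agent's monitor action tells Pacemaker whether each copy is Master, Sync, or Async.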
Other Pacemaker Resources
Pacemaker also handles the following resources besides PostgreSQL:
• Fencing of resources
• IP address colocation
How does it all tie together? From the bottom up…
Pacemaker
[Diagram: Master, Sync, and Async nodes; the replication 'vip' and the client 'vip' both point at the Master, and the App connects through the client 'vip']
All slaves in the cluster point to a replication 'vip'. This interface moves to whichever node is the master – this is called a colocation constraint. Another 'vip', which our application servers connect to, follows the master as well.
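A hedged sketch of what those two vips and their colocation constraints could look like in the crm shell (resource names, addresses, and the msPostgresql name are assumptions for illustration):

```shell
# Sketch only: two IPaddr2 primitives, one per vip. Addresses and
# interface names are illustrative assumptions.
crm configure primitive vip-rep ocf:heartbeat:IPaddr2 \
    params ip="10.0.0.10" nic="eth0" \
    op monitor interval="10s"
crm configure primitive vip-client ocf:heartbeat:IPaddr2 \
    params ip="10.0.0.20" nic="eth0" \
    op monitor interval="10s"

# Colocation constraints: both vips must run on whichever node
# holds the Master role, and move with it on failover.
crm configure colocation vip-rep-with-master \
    inf: vip-rep msPostgresql:Master
crm configure colocation vip-client-with-master \
    inf: vip-client msPostgresql:Master
```

The `inf:` (INFINITY) score is what makes the constraint mandatory: the vips can never run anywhere except with the Master.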
Event Scenario
[Diagram: failover sequence – the Master fails and is fenced; the Sync slave becomes Master and the Async slave becomes the new Sync]
In the event that a node becomes unavailable, CMAN notifies Pacemaker to 'fence' the node, shutting off communication to it via SNMP commands to the switch. The SYNC slave becomes the Master. The ASYNC slave becomes the SYNC slave. Upon manual recovery, the old Master becomes the ASYNC slave. If any resources inside Pacemaker on the master fail their monitoring check, fencing occurs as well. These resources include both the replication and client 'vips'.
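One way SNMP-driven fencing through the switch can be expressed is with a STONITH resource; the sketch below uses fence_ifmib (a common agent that disables a switch port over SNMP) purely as an illustration – the agent choice, community string, switch address, and port mapping are assumptions, not necessarily what this deployment uses:

```shell
# Sketch: fence node1 by shutting its switch port via SNMP.
# All parameter values are illustrative assumptions.
crm configure primitive fence-node1 stonith:fence_ifmib \
    params ipaddr="10.0.0.254" community="private" \
           port="Gi0/1" pcmk_host_list="node1" \
    op monitor interval="60s"

# Never run a node's own fencing device on the node it protects.
crm configure location fence-node1-placement fence-node1 -inf: node1
```

One such device exists per node, so any surviving node can cut off a failed peer before promoting a new master.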
PostgreSQL
• Still in charge of replicating data
• The state of the service, and how it starts, is controlled by Pacemaker
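For context, the PostgreSQL side of this – synchronous streaming replication, which Pacemaker then manages – is driven by a handful of postgresql.conf settings on the master. The values below are a generic sketch for the 9.x-era releases this stack targets, not ARIN's actual configuration:

```conf
# postgresql.conf (master) -- illustrative values only
wal_level = hot_standby          # emit WAL that standbys can replay
max_wal_senders = 5              # allow walsender slots for the slaves
synchronous_standby_names = '*'  # commits confirmed on a sync standby
hot_standby = on                 # standbys may serve read-only queries
```

It is `synchronous_standby_names` that gives the cluster its ACID-compliant behavior across nodes: a commit is not acknowledged until the synchronous standby has confirmed it.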
Layout
[Diagram: a Client connecting to the Master, with two Slaves; each of the three nodes runs cman]
Using Tools to Look Deeper: Introspection…
# crm_mon -i 1 -Arf
We disable quorum within the Pacemaker HA cluster to allow for failure down to a single-node cluster in the event that multiple nodes fail.
• 8 resources configured
• ocf::heartbeat::IPaddr2 is the resource agent used to create the vip – resource scripts can be shell, Ruby, etc.
• Primitive vs. multi-state:
– Primitive – runs on only one node in the cluster (vips, fencing)
– Multi-state resource – runs on multiple nodes (pgsql)
• The vips are colocated. If anything happens to either of them, the entire node fails and the cluster moves to the next master
• There is a specific check interval for each resource
• STONITH is used for fencing
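Disabling quorum as described above comes down to one cluster property; a minimal sketch with the crm shell (property names are standard Pacemaker, the pairing with fencing reflects the setup described in this talk):

```shell
# Keep resources running even without quorum, so the cluster can
# degrade all the way down to a single surviving node.
crm configure property no-quorum-policy="ignore"

# Fencing stays enabled -- it is what makes ignoring quorum safe,
# since a partitioned node is powered off or cut off, not trusted.
crm configure property stonith-enabled="true"
```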
# crm_mon -i 1 -Arf (cont)
• All of the status comes from the pgsql Pacemaker resource script
• receiver-status shows an error because the resource script is written to monitor and check for cascading replication; we don't use cascading and haven't invested cycles in it
• master-postgresql is the 'weight'. Pacemaker uses the weight to determine who should be promoted next in line, which is why the async slave has -INFINITY
• STREAMING
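Those promotion weights are stored as node attributes in the CIB, so they can also be read directly; a hedged example using crm_attribute (the attribute name follows the pgsql agent's master-<resource> convention, and the node names are assumptions):

```shell
# Read the promotion score the pgsql resource agent set on a node.
# -INFINITY on the async slave means it is never promoted directly.
crm_attribute -N node1 -n master-postgresql -G   # current master's score
crm_attribute -N node3 -n master-postgresql -G   # async slave's score
```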
Questions?
Devon Mizelle