high availability - brett thurber - manageiq design summit 2016

High Availability ManageIQ/CloudForms Brett Thurber - Red Hat June 2016

Upload: manageiq

Post on 16-Apr-2017


TRANSCRIPT

Page 1: High Availability - Brett Thurber - ManageIQ Design Summit 2016

High Availability

ManageIQ/CloudForms

Brett Thurber - Red Hat

June 2016

Page 2: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Agenda

Introduction & Acknowledgements

What is HA?

Traditional HA

What’s on the horizon?

pglogical

BDR

Containers & Kubernetes

Summary

Q & A

Page 3: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Introduction & Acknowledgements

Brett Thurber - RHCT, RHCE, RHCDS, RHCA, RHCVA

20+ years of IT experience

Been with Red Hat since 2011

Team lead in Systems Engineering focused on management and integrated solutions

Worked with MIQ/CloudForms since 2013

Authored 11 Reference Architectures

Presented at RH Summit 2015 - Application portability & interoperability with Red Hat Cloud Infrastructure

Contact: [email protected]

Special thanks to:

Gregg Tanzillo, Nick Carboni, Joe Rafaniello

Page 4: High Availability - Brett Thurber - ManageIQ Design Summit 2016

What is HA?

“A system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to "100% operational" or "never failing."” - Source: SearchDataCenter

“A characteristic of a system, which aims to ensure an agreed level of operational performance for a higher than normal period.” - Source: Wikipedia
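To make "an agreed level of operational performance" concrete, an availability target can be converted into a downtime budget. The following is a minimal sketch, not part of the original slides; the function name and the chosen targets are illustrative assumptions.

```python
# Convert an availability target (percent) into an allowed-downtime budget.
# Illustrative helper; names and targets are assumptions, not from the talk.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(availability_percent, period_seconds=SECONDS_PER_YEAR):
    """Return the seconds of downtime permitted within the period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_seconds * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_seconds(target) / 60
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Each additional "nine" cuts the yearly budget by a factor of ten, which is why the deployment patterns that follow add redundancy at both the application and the database tier.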

Page 5: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Traditional HA

Page 6: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Heavy Lift

Highly complex and resource intensive

Shared storage

iSCSI, NFS, fibre channel

Multiple bare metal or VM hosts

Minimum of 2 cluster hosts for pgsql database

2+ MIQ/CFME instances

HAProxy for load balancing

Complex and time intensive deployment

Typical deployment time measured in days

Stretch cluster risks

Expensive, dedicated high speed connection

Supportability

Data consistency

Page 7: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Active/Passive Deployment Pattern: intra-site

MIQ/CFME

haproxy

VIP

MIQ/CFME

pgsql pgsql pacemaker

VIP
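The role of haproxy in front of the MIQ/CFME appliances can be pictured as a health-aware round-robin: requests arriving at the VIP go to the next appliance that passes its health check. The sketch below is a toy model of that idea in Python, not haproxy's actual implementation; the class names and the appliance names `miq-1` and `miq-2` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """One MIQ/CFME appliance behind the load balancer (hypothetical model)."""
    name: str
    healthy: bool = True


class RoundRobinBalancer:
    """Toy health-aware round-robin, loosely modelling a load balancer."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0

    def pick(self):
        # Try each backend at most once, skipping any that failed its health check.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy backends available")


balancer = RoundRobinBalancer([Backend("miq-1"), Backend("miq-2")])
print([balancer.pick().name for _ in range(4)])  # alternates between appliances

balancer.backends[0].healthy = False             # simulate an appliance failure
print([balancer.pick().name for _ in range(3)])  # traffic goes to the survivor
```

The database tier is handled differently in this pattern: pacemaker moves the database VIP to the standby pgsql host, so only one database node is active at a time.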

Page 8: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Active/Passive Deployment Pattern: inter-site

MIQ/CFME

haproxy

VIP

MIQ/CFME

pgsql pgsql

Streaming Replication

Site 1 Site 2

Page 9: High Availability - Brett Thurber - ManageIQ Design Summit 2016

What’s on the horizon?

Page 10: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Interesting possibilities...

Emerging technologies present the possibility of reducing the complexity of HA and PostgreSQL.

pglogical

BDR

Containers & Kubernetes

Page 11: High Availability - Brett Thurber - ManageIQ Design Summit 2016

pglogical

Page 12: High Availability - Brett Thurber - ManageIQ Design Summit 2016

pglogical

What is pglogical?

pglogical offers Logical Replication as a PostgreSQL extension and is a replacement for streaming replication

Introduced in PostgreSQL 9.4 (MIQ Capablanca, CloudForms 4.1)

Less complex solution for database replication

pglogical works on a per-database level, not whole server level like physical streaming replication

One Provider may feed multiple Subscribers without incurring additional disk write overhead

One Subscriber can merge changes from several origins and detect conflicts between changes, with automatic and configurable conflict resolution

Replication across major releases is supported (9.4 and later)
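As a rough illustration of the provider/subscriber model described above, the sketch below generates the SQL that a pglogical setup typically runs on each node. It assumes the `pglogical.create_node`, `pglogical.replication_set_add_all_tables` and `pglogical.create_subscription` functions from the pglogical documentation; the Python helper names, host names and the `vmdb_production` database name are illustrative assumptions, and the statements should be checked against the installed pglogical version before use.

```python
# Build the SQL statements for a one-provider, many-subscriber pglogical setup.
# Helper names, DSNs and node names are hypothetical. Values are interpolated
# without escaping, so this is only suitable for trusted, hand-written input.

def provider_setup_sql(node_name, dsn, schemas=("public",)):
    """Statements to run once on the provider database."""
    schema_array = ", ".join(f"'{schema}'" for schema in schemas)
    return [
        "CREATE EXTENSION IF NOT EXISTS pglogical;",
        f"SELECT pglogical.create_node(node_name := '{node_name}', dsn := '{dsn}');",
        f"SELECT pglogical.replication_set_add_all_tables('default', ARRAY[{schema_array}]);",
    ]


def subscriber_setup_sql(node_name, dsn, subscription_name, provider_dsn):
    """Statements to run on each subscriber database."""
    return [
        "CREATE EXTENSION IF NOT EXISTS pglogical;",
        f"SELECT pglogical.create_node(node_name := '{node_name}', dsn := '{dsn}');",
        "SELECT pglogical.create_subscription("
        f"subscription_name := '{subscription_name}', provider_dsn := '{provider_dsn}');",
    ]


provider_dsn = "host=db-provider dbname=vmdb_production"
for statement in provider_setup_sql("provider1", provider_dsn):
    print(statement)

# The same provider can feed several subscribers.
for index in (1, 2):
    dsn = f"host=db-sub{index} dbname=vmdb_production"
    for statement in subscriber_setup_sql(f"subscriber{index}", dsn, f"sub{index}", provider_dsn):
        print(statement)
```

Because replication is configured per database rather than per server, only the database named in each DSN takes part.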

Page 13: High Availability - Brett Thurber - ManageIQ Design Summit 2016

How would it work?

pgsql pgsql pgsql pgsql

VMDB Database

MIQ/CFME MIQ/CFME

haproxy

VIP

Subscribers Publisher

Page 14: High Availability - Brett Thurber - ManageIQ Design Summit 2016

What about failover?

pgsql pgsql pgsql pgsql

VMDB Database

MIQ/CFME MIQ/CFME

haproxy

VIP

Subscribers Publisher

??? ??? ???

Page 15: High Availability - Brett Thurber - ManageIQ Design Summit 2016

pglogical limitations...

Not suitable for failover

Automatic DDL (data definition language) replication is not supported

Logical decoding doesn't decode catalog changes directly. So the plugin can't just send a CREATE TABLE statement when a new table is added.

If the data being decoded is being applied to another PostgreSQL database then its table definitions must be kept in sync via some means external to the logical decoding plugin itself, such as:

Event triggers using DDL deparse to capture DDL changes as they happen and write them to a table to be replicated and applied on the other end

Doing DDL management via tools that synchronise DDL on all nodes
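The DDL limitation can be illustrated with a toy model: the subscriber receives only row changes, so an insert into a table it has never been told about cannot be applied until the table definition is synchronised by some other means. This is a simplified sketch, not pglogical's actual apply logic; the class, table and column names are hypothetical.

```python
class SubscriberDB:
    """Toy subscriber that applies decoded row changes (illustrative only)."""

    def __init__(self):
        self.tables = {}  # table name -> {"columns": set, "rows": list}

    def create_table(self, name, columns):
        # Stands in for DDL that must reach the subscriber out of band.
        self.tables[name] = {"columns": set(columns), "rows": []}

    def apply_insert(self, table, row):
        # Row changes are the only thing the replication stream carries here.
        if table not in self.tables:
            raise LookupError(f"table {table!r} does not exist on the subscriber")
        unknown = set(row) - self.tables[table]["columns"]
        if unknown:
            raise LookupError(f"unknown columns on the subscriber: {sorted(unknown)}")
        self.tables[table]["rows"].append(dict(row))


subscriber = SubscriberDB()
subscriber.create_table("vms", ["id", "name"])
subscriber.apply_insert("vms", {"id": 1, "name": "web01"})

# A new table was created on the provider only; its rows cannot be applied.
try:
    subscriber.apply_insert("tags", {"id": 1, "label": "prod"})
except LookupError as error:
    print("apply failed:", error)

# After the definition is synchronised externally, the same change applies.
subscriber.create_table("tags", ["id", "label"])
subscriber.apply_insert("tags", {"id": 1, "label": "prod"})
print("rows in tags:", len(subscriber.tables["tags"]["rows"]))
```

For an application such as ManageIQ, whose upgrades run schema migrations, this means every migration has to be coordinated across all nodes.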

Page 16: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Bi-Directional Replication

Page 17: High Availability - Brett Thurber - ManageIQ Design Summit 2016

BDR

What is BDR?

Bi-Directional Replication (BDR) is an asynchronous multi-master replication system for PostgreSQL, specifically designed to allow geographically distributed clusters. It supports up to 48 nodes (and possibly more in future releases). BDR is a low-overhead, low-maintenance technology for distributed databases.

BDR excels in environments where users are distributed across high-latency and/or unreliable network links where conventional tightly-coupled clustering software does not work well

Support for DDL replication and Global DDL locking

Page 18: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Active/Active BDR Deployment Pattern: intra-site

MIQ/CFME

haproxy

VIP

MIQ/CFME

pgsql pgsql BDR

Page 19: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Active/Active BDR Deployment Pattern: inter-site

MIQ/CFME

haproxy

VIP

MIQ/CFME

pgsql pgsql BDR

Site 1 Site 2

Page 20: High Availability - Brett Thurber - ManageIQ Design Summit 2016

BDR limitations...

Still under development; not production ready (requires a modified version of PostgreSQL 9.4)

Asynchronous replication

Changes made on one BDR node are not replicated to other nodes before they are committed locally. As a result, the data is not exactly the same on all nodes at any given time.

Non-shared storage architecture means additional storage space considerations
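The effect of asynchronous replication can be shown with a two-node toy model: a write commits on the local node first and is shipped to the peer later, so the nodes differ for a short window and then converge. This is a simplified sketch that ignores conflict handling; the class, method and key names are hypothetical and do not reflect BDR internals.

```python
from collections import deque


class Node:
    """Toy database node with an outbound replication queue (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = deque()

    def write(self, key, value):
        # The change is committed locally before any peer has seen it.
        self.data[key] = value
        self.outbox.append((key, value))

    def ship_to(self, peer):
        # Replication happens later and drains the queued changes in order.
        while self.outbox:
            key, value = self.outbox.popleft()
            peer.data[key] = value


site1, site2 = Node("site1"), Node("site2")

site1.write("vm-42", "powered_on")
print("in sync before shipping:", site1.data == site2.data)

site1.ship_to(site2)
print("in sync after shipping:", site1.data == site2.data)
```

Each node also holds its own full copy of the data, which is where the additional storage consideration comes from.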

Page 21: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Containers & Kubernetes

Page 22: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Containers?

Docker image for ManageIQ under development

Currently monolithic

Allows for a MIQ container image to be deployed to Atomic Host and other container providers

Service decoupling on the horizon

Utilizing Kubernetes pods allows for:

Service distribution across multiple hosts

Persistent storage to be used for database

Highly available and scalable architecture

Easily upgradeable with quick roll-back capabilities

Self-healing
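Self-healing in Kubernetes comes from controllers that repeatedly compare the desired state with the observed state and correct any difference. The sketch below models one pass of such a reconciliation loop for a fixed replica count. It is a toy illustration, not the Kubernetes API; the function name, pod dictionaries and `miq-N` names are hypothetical.

```python
import itertools

_pod_ids = itertools.count(1)  # source of unique, hypothetical pod names


def reconcile(desired_replicas, pods):
    """Return the pod list after one reconciliation pass.

    Unhealthy pods are dropped, replacements are started until the desired
    count is reached, and surplus pods are removed.
    """
    healthy = [pod for pod in pods if pod["healthy"]]
    while len(healthy) < desired_replicas:
        healthy.append({"name": f"miq-{next(_pod_ids)}", "healthy": True})
    return healthy[:desired_replicas]


pods = reconcile(2, [])           # initial rollout creates two pods
print([pod["name"] for pod in pods])

pods[0]["healthy"] = False        # simulate a crashed pod
pods = reconcile(2, pods)         # the next pass replaces it
print([pod["name"] for pod in pods])
```

Scaling uses the same mechanism: changing the desired count and letting the next pass add or remove pods.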

Page 23: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Possible Container Architecture

Container

Pod

http rails

pgsql

Persistent Storage

Container

Pod

http rails

pgsql

Persistent Storage BDR

Node Proxy

Page 24: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Possible Container Architecture (cont’d)

Container

Pod

http rails

pgsql

Persistent Storage

Container

Pod

http rails

pgsql

Persistent Storage BDR

Node Proxy

Node Proxy

Overlay Network

Page 25: High Availability - Brett Thurber - ManageIQ Design Summit 2016

What about networking?

Kubernetes imposes the following network rules:

All containers can communicate with all other containers without NAT

All nodes can communicate with all containers (and vice-versa) without NAT

The IP that a container sees itself as is the same IP that others see it as

Supported overlay networks

L2 networks and Linux bridging

Flannel

Open vSwitch

Romana

OpenShift SDN

etc...

Page 26: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Summary

Page 27: High Availability - Brett Thurber - ManageIQ Design Summit 2016

In closing….

Traditional HA clustering is complex, expensive, and time consuming to implement, and poses some support limitations

pglogical is a good replacement for streaming replication; however, it lacks some features needed to make it a viable HA solution

BDR bridges the necessary gaps with pglogical to offer a viable HA solution; however, it is still growing in maturity (> PostgreSQL 9.4)

Containers, coupled with Kubernetes, offer compelling use cases, including self-healing, upgrades, scaling and high availability

Page 28: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Q & A

Page 29: High Availability - Brett Thurber - ManageIQ Design Summit 2016

Thank You!