High-Availability using MySQL Fabric

Upload: mats-kindahl

Posted on 23-Jul-2015

Page 1: High-Availability using MySQL Fabric

Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |

High Availability using MySQL Fabric: Managing Farms of Servers

Mats Kindahl

Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

Page 2: High-Availability using MySQL Fabric

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 3: High-Availability using MySQL Fabric

Program Agenda

● Building reliable systems
● MySQL Fabric overview
● Managing redundancy
● Procedure automation and the Executor
● Failure detection and failure handling

Page 4: High-Availability using MySQL Fabric

Program Agenda

● Using Fabric with existing high-availability setups
● Making Fabric highly available
● Thoughts for the future
● Closing remarks

Page 5: High-Availability using MySQL Fabric

Building Reliable Systems

Page 6: High-Availability using MySQL Fabric

Building for reliability

High-availability is an integral part of designing a reliable system.

Page 7: High-Availability using MySQL Fabric

What causes downtime?

● System failures
  ● Hardware faults
  ● Software bugs
  ● Disasters
● Maintenance
● User errors

Page 8: High-Availability using MySQL Fabric

High-availability concepts

● Redundancy
  ● Duplicate critical components
● Monitoring
  ● Detecting failing components
  ● Monitor load
● Procedures
  ● Activate replacements
  ● Distribute load

Page 9: High-Availability using MySQL Fabric

High-availability solutions

● Primary-secondary approach
  ● MySQL Replication
● Shared-nothing clusters
  ● MySQL Cluster
  ● MySQL Group Replication (not GA)
● Tightly coupled clusters
  ● DRBD
  ● WSFC
  ● Solaris Clustering
  ● Oracle Clusterware
  ● Oracle VM High Availability

Page 10: High-Availability using MySQL Fabric

MySQL Fabric Overview

Page 11: High-Availability using MySQL Fabric

What is MySQL Fabric?

An extensible and easy-to-use framework for managing a farm of MySQL servers supporting high-availability and sharding.

Page 12: High-Availability using MySQL Fabric

What does it mean?

● Management system
  ● Manages a MySQL Farm
  ● Distributed framework
● Framework
  ● Procedure execution
  ● State store
  ● Transaction Routing
● Extensible
  ● High-availability groups
  ● Sharding
  ● Cloud support
● Written in Python
● MySQL 5.6 (or later)
● Open Source
  ● You can participate

Page 13: High-Availability using MySQL Fabric

Bird's-eye view

[Diagram: Application, Operator, MySQL Fabric Node, and High-Availability Groups (Shards)]

Page 14: High-Availability using MySQL Fabric

MySQL Fabric Components

● Fabric-aware connectors
  ● Enhanced Connector API
  ● Python, PHP, Java, .NET, C
● MySQL Fabric controller
  ● Manage farm meta-data
  ● Provide status information
  ● Execute procedures
● MySQL servers
  ● Organized in high-availability groups
  ● Handle application data

[Diagram: application with Fabric-aware connectors, MySQL Fabric controller node, high-availability group]

Page 15: High-Availability using MySQL Fabric

MySQL Fabric Controller Architecture

[Diagram (example only): Protocol Servers (XML-RPC, MySQL-RPC, AMQP); Fabric Core (Executor, Model, Persistence); State Store; Extensions (Sharding, Master-Slave); Providers; flows labeled Requests, Events, and Results]

Page 16: High-Availability using MySQL Fabric

Managing Redundancy

Page 17: High-Availability using MySQL Fabric

High-Availability Group Concept

● Group of servers
  ● Hardware redundancy
  ● Data redundancy
● Generic concept
  ● Implementation-independent
  ● Self-managed or externally managed
● Different types
  ● Primary-Backup (Master-Slave)
  ● Shared or Replicated Storage
  ● MySQL Cluster

[Diagram (examples only): a Default group, a DRBD group, and a MySQL Cluster group of ndbd nodes; some of the types shown are not implemented]

Page 18: High-Availability using MySQL Fabric

Creating a high-availability group: Create an empty group

● Create a logical group for the servers
● Empty initially

mysqlfabric group create my_group --description='My Group'

Page 19: High-Availability using MySQL Fabric

Creating a high-availability group: Adding servers to the group

● Add servers to group
● Group will have no master
● All servers are secondaries (!)

mysqlfabric group add my_group server1.example.com
mysqlfabric group add my_group server2.example.com
mysqlfabric group add my_group server3.example.com

Page 20: High-Availability using MySQL Fabric

Creating a high-availability group: Promote a primary

● Promote one secondary to primary
● By default, a secondary is selected at random
● A specific secondary can be selected

mysqlfabric group promote my_group
mysqlfabric group promote my_group --slave_id='server1.example.com'
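The group life cycle on the last three slides (an empty group, servers added as secondaries, then a promotion) can be modelled in a few lines. This is a minimal, hypothetical in-memory sketch, not Fabric's implementation: `ToyGroup` and its methods are illustrative names, and it assumes that promoting while a primary already exists demotes the old primary to a secondary (a switchover).

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional

PRIMARY, SECONDARY = "PRIMARY", "SECONDARY"


@dataclass
class ToyGroup:
    """Hypothetical in-memory model of a high-availability group (not Fabric's code)."""
    group_id: str
    description: str = ""
    servers: Dict[str, str] = field(default_factory=dict)  # address -> status

    def add(self, address: str) -> None:
        # Newly added servers start out as secondaries: the group has no primary yet.
        self.servers[address] = SECONDARY

    def primary(self) -> Optional[str]:
        return next((a for a, s in self.servers.items() if s == PRIMARY), None)

    def promote(self, slave_id: Optional[str] = None, rng=random) -> str:
        candidates = [a for a, s in self.servers.items() if s == SECONDARY]
        if slave_id is None:
            if not candidates:
                raise ValueError("no secondary available to promote")
            slave_id = rng.choice(candidates)  # no server given: pick one at random
        elif slave_id not in candidates:
            raise ValueError(f"{slave_id} is not a secondary of {self.group_id}")
        current = self.primary()
        if current is not None:
            self.servers[current] = SECONDARY  # assumed switchover behavior
        self.servers[slave_id] = PRIMARY
        return slave_id


group = ToyGroup("my_group", description="My Group")
for host in ("server1.example.com", "server2.example.com", "server3.example.com"):
    group.add(host)
print(group.primary())                                # None
print(group.promote(slave_id="server1.example.com"))  # server1.example.com
```

Passing an explicit `random.Random(seed)` as `rng` makes the "random" choice reproducible in tests.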

Page 21: High-Availability using MySQL Fabric

Creating a high-availability group: Enable failure detector

● Enable built-in failure detector
● Monitor servers in group

mysqlfabric group activate my_group

Page 22: High-Availability using MySQL Fabric

Creating a high-availability group

● On primary failure
  ● Mark primary as faulty
  ● Trigger fail-over
● On secondary failure
  ● Mark secondary as faulty

Page 23: High-Availability using MySQL Fabric

Procedure Automation and the Executor

Page 24: High-Availability using MySQL Fabric

Automating management of a farm

● Management procedures
  ● Fail-over
  ● Slave promotion
  ● Shard split
● Triggered on events
  ● Crashing server
  ● Administrative decision
  ● Increasing load
● Resilient execution
  ● Controller node can crash
  ● Recover partially executed procedure

[Diagram: fail-over procedure with the steps Find Candidate, Check Candidate, Disable Read-only, Process Backlog, and Re-direct Slaves; events SERVER_LOST and SLAVE_PROMOTED]
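The fail-over procedure in the diagram can be pictured as a chain of jobs driven by events. The sketch below is hypothetical: `ToyExecutor`, the `group` dictionary, and the "first healthy secondary" candidate policy are assumptions for illustration, not Fabric's code. It shows a `SERVER_LOST` event scheduling the five steps in order, with the last step triggering `SLAVE_PROMOTED`.

```python
from collections import deque


class ToyExecutor:
    """Hypothetical event-driven executor: events schedule jobs, jobs may trigger events."""

    def __init__(self):
        self.handlers = {}    # event name -> list of job functions
        self.queue = deque()  # pending (job, args) pairs, run in FIFO order
        self.log = []         # names of the jobs executed so far

    def on(self, event, job):
        self.handlers.setdefault(event, []).append(job)

    def trigger(self, event, *args):
        for job in self.handlers.get(event, []):
            self.schedule(job, *args)

    def schedule(self, job, *args):
        self.queue.append((job, args))

    def run(self):
        while self.queue:
            job, args = self.queue.popleft()
            self.log.append(job.__name__)
            job(self, *args)


# Each step of the fail-over procedure is one job that schedules the next step.
def on_server_lost(ex, group, server):
    group["faulty"].add(server)
    if server == group["primary"]:
        group["primary"] = None
        ex.schedule(find_candidate, group)

def find_candidate(ex, group):
    healthy = [s for s in group["secondaries"] if s not in group["faulty"]]
    if not healthy:
        raise RuntimeError("no candidate available")
    group["candidate"] = healthy[0]  # toy policy: first healthy secondary
    ex.schedule(check_candidate, group)

def check_candidate(ex, group):
    if group["candidate"] in group["faulty"]:
        raise RuntimeError("candidate became faulty")
    ex.schedule(disable_read_only, group)

def disable_read_only(ex, group):
    group["read_only"][group["candidate"]] = False
    ex.schedule(process_backlog, group)

def process_backlog(ex, group):
    # A real step would wait until the candidate has applied its relay log.
    ex.schedule(redirect_slaves, group)

def redirect_slaves(ex, group):
    # Toy: record the new primary; a real step would also repoint the other slaves.
    group["primary"] = group["candidate"]
    group["secondaries"].remove(group["candidate"])
    ex.trigger("SLAVE_PROMOTED", group)


promoted = []

def record_promotion(ex, group):
    promoted.append(group["primary"])

ex = ToyExecutor()
ex.on("SERVER_LOST", on_server_lost)
ex.on("SLAVE_PROMOTED", record_promotion)
group = {"primary": "s1", "secondaries": ["s2", "s3"], "faulty": set(),
         "read_only": {"s2": True, "s3": True}, "candidate": None}
ex.trigger("SERVER_LOST", group, "s1")
ex.run()
print(group["primary"], ex.log)
```

Splitting the procedure into small jobs is what makes the checkpoint-and-resume behavior described later possible: each job is a natural unit to record and, after a crash, to re-run.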

Page 25: High-Availability using MySQL Fabric

MySQL Fabric executor

● Event-driven executor
  ● Events trigger execution of procedures
  ● Procedures can trigger events themselves
  ● Each step of a procedure is called a job
● Procedures
  ● Written in Python
  ● Interact with servers
  ● Write state changes into backing store
  ● Lock manager for conflict resolution
    – Conservative 2PL
    – Avoid deadlocks

[Diagram: Events, Queue, Backing Store]
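The slide says only that the lock manager uses conservative two-phase locking to avoid deadlocks. As a generic illustration (not Fabric's lock manager), conservative 2PL means a job declares and acquires every lock it needs before it starts, all or nothing, and releases them when it finishes. The class and the lock names `group_a` and `group_b` below are hypothetical.

```python
import threading
from typing import Dict, Iterable


class ConservativeLockManager:
    """Toy conservative (static) 2PL lock manager: all locks are taken up front."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._held: Dict[str, str] = {}  # lock name -> owning job

    def _free(self, names) -> bool:
        return all(name not in self._held for name in names)

    def acquire_all(self, owner: str, names: Iterable[str]) -> None:
        names = list(names)
        with self._cond:
            # Block until *every* requested lock is free, then take them at once.
            # A job never holds some locks while waiting for others, so no
            # wait-for cycle (deadlock) can form.
            self._cond.wait_for(lambda: self._free(names))
            for name in names:
                self._held[name] = owner

    def try_acquire_all(self, owner: str, names: Iterable[str]) -> bool:
        names = list(names)
        with self._cond:
            if not self._free(names):
                return False  # all-or-nothing: take none of them
            for name in names:
                self._held[name] = owner
            return True

    def release_all(self, owner: str) -> None:
        with self._cond:
            for name in [n for n, o in self._held.items() if o == owner]:
                del self._held[name]
            self._cond.notify_all()


def run_job(lm, owner, names, results):
    lm.acquire_all(owner, names)
    try:
        results.append(owner)  # the job's actual work would go here
    finally:
        lm.release_all(owner)


lm = ConservativeLockManager()
results = []
# Two jobs request the same locks in opposite order; with conservative 2PL
# this cannot deadlock.
threads = [
    threading.Thread(target=run_job, args=(lm, "job1", ["group_a", "group_b"], results)),
    threading.Thread(target=run_job, args=(lm, "job2", ["group_b", "group_a"], results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['job1', 'job2']
```

The price of this design is reduced concurrency: a job may wait for locks it will only need late in its execution, in exchange for never deadlocking.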

Page 26: High-Availability using MySQL Fabric

Example: keep high-availability profile

Page 27: High-Availability using MySQL Fabric

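Automating adding a server

● Register job for event
  ● @on_event decorator
  ● Register job with event
● Fetch group of lost server
● Fetch new server from provider
● Add server to group

@on_event(SERVER_LOST)
def _add_server(group_id, server_uuid):
    # Fetch the group that the lost server belonged to
    group = Group.fetch(group_id)
    # Ask the provider for new machines ("parameters" is elided on the slide)
    machines = PROV.create_machines( parameters )
    # Create and register a server object ("address" is elided on the slide)
    server = MySQLServer( server_uuid, address )
    MySQLServer.add(server)
    # Add the server to the group and configure it as a slave
    group.add(server)
    _configure_as_slave(server)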

Page 28: High-Availability using MySQL Fabric

MySQL Fabric execution flow

What about crashes?

● Before starting a job:
  ● Acquire the necessary locks
  ● Checkpoint execution state in backing store
  ● Start a transaction on the backing store
● When executing a job:
  ● Updates to backing store inside transaction
  ● Interact with servers
● After executing a job:
  ● Mark job completed in internal log
  ● Commit transaction on backing store

[Diagram: Events, Queue, Backing Store]
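The checkpoint-then-transaction pattern above can be sketched with Python's built-in `sqlite3` standing in for the MySQL backing store. This is an illustration under stated assumptions, not Fabric's schema: the `checkpoints` and `servers` tables and the helper names are hypothetical, and lock acquisition and server interaction are left out.

```python
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE checkpoints ("
                 "job_id TEXT PRIMARY KEY, job TEXT, done INTEGER NOT NULL)")
    conn.execute("CREATE TABLE servers (address TEXT PRIMARY KEY, status TEXT)")
    conn.commit()


def execute_job(conn, job_id, job, *args):
    # Checkpoint first, in its own committed transaction, so the record that
    # the job started survives a crash while the job is running.
    with conn:
        conn.execute("INSERT INTO checkpoints VALUES (?, ?, 0)", (job_id, job.__name__))
    # The job's state-store updates and the "completed" mark share one
    # transaction: `with conn` commits on success and rolls back on error.
    with conn:
        job(conn, *args)
        conn.execute("UPDATE checkpoints SET done = 1 WHERE job_id = ?", (job_id,))


def set_status(conn, address, status):
    conn.execute("INSERT OR REPLACE INTO servers VALUES (?, ?)", (address, status))


def crashing_job(conn, address):
    set_status(conn, address, "PRIMARY")
    raise RuntimeError("simulated crash in the middle of the job")


conn = sqlite3.connect(":memory:")
create_schema(conn)
execute_job(conn, "j1", set_status, "server1.example.com", "SECONDARY")
try:
    execute_job(conn, "j2", crashing_job, "server2.example.com")
except RuntimeError:
    pass
print(conn.execute("SELECT job_id, done FROM checkpoints ORDER BY job_id").fetchall())
# [('j1', 1), ('j2', 0)]
print(conn.execute("SELECT * FROM servers").fetchall())
# [('server1.example.com', 'SECONDARY')]
```

After the simulated crash, the half-done update from `j2` has been rolled back while its checkpoint remains with `done = 0`, which is exactly what a recovery pass needs to find.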

Page 29: High-Availability using MySQL Fabric

MySQL Fabric executor recovery

● Two types of jobs:
  ● Idempotent: Restart the job
  ● Not idempotent: Execute compensation
● Recovery procedure
  ● Start the executor
  ● Collect unfinished checkpoints
  ● Execute compensation actions (if there are any)
  ● Re-schedule each job in checkpoint

[Diagram: Events, Queue, Backing Store]
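A minimal sketch of the recovery rules above, assuming checkpoints are available as `(job, completed)` pairs. The `Job` record, `recover` function, and example jobs are hypothetical: unfinished non-idempotent jobs get their compensation run first, and every unfinished job is then re-scheduled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Job:
    name: str
    action: Callable[[dict], None]
    idempotent: bool
    compensate: Optional[Callable[[dict], None]] = None


def recover(checkpoints: List[Tuple[Job, bool]], state: dict,
            schedule: Callable[[Job], None]) -> None:
    """Toy recovery pass over (job, completed) checkpoint records."""
    for job, completed in checkpoints:
        if completed:
            continue  # finished jobs need no attention
        if not job.idempotent and job.compensate is not None:
            job.compensate(state)  # undo any partial effect first
        schedule(job)  # then run the job again from the start


def add_server(state):  # not idempotent: running it twice duplicates the entry
    state["servers"].append("server3.example.com")

def remove_server(state):  # compensation for add_server
    if "server3.example.com" in state["servers"]:
        state["servers"].remove("server3.example.com")

def disable_read_only(state):  # idempotent: safe to simply re-run
    state["read_only"] = False


# Simulated crash: add_server had already taken effect but was never marked done.
state = {"servers": ["server1.example.com", "server3.example.com"], "read_only": True}
checkpoints = [
    (Job("disable_read_only", disable_read_only, idempotent=True), False),
    (Job("add_server", add_server, idempotent=False, compensate=remove_server), False),
]
queue: List[Job] = []
recover(checkpoints, state, queue.append)
for job in queue:
    job.action(state)
print(state)
# {'servers': ['server1.example.com', 'server3.example.com'], 'read_only': False}
```

Without the compensation step, re-running `add_server` would have listed `server3.example.com` twice.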

Page 30: High-Availability using MySQL Fabric

Failure detection and failure handling

Page 31: High-Availability using MySQL Fabric

Built-in failure detector

● Group-level detection
  ● The Fabric node pings the servers in the group
  ● Servers need to be managed by Fabric
● On primary failure
  ● Mark primary as faulty
  ● Trigger fail-over of connectors and slaves
● On secondary failure
  ● Mark secondary as faulty

Page 32: High-Availability using MySQL Fabric

Built-in failure detector: Configuration

● Detections
  ● Number of failed pings before a server is marked as faulty
● Detection interval
  ● Interval between server pings, in seconds
● Detection timeout
  ● Timeout for a ping, in seconds

[failure_tracking]
detections = 3
detection_interval = 6
detection_timeout = 1
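A minimal sketch of how these settings could drive a ping-based detector. It assumes that `detections` counts consecutive failed pings and that one ping is sent per `detection_interval`; `ToyDetector` and `load_settings` are hypothetical names, and Fabric's own detector may differ in detail. With the values above, that gives a rough detection time of 3 × 6 = 18 seconds.

```python
import configparser

CONFIG = """
[failure_tracking]
detections = 3
detection_interval = 6
detection_timeout = 1
"""


def load_settings(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["failure_tracking"]
    return (section.getint("detections"),
            section.getfloat("detection_interval"),
            section.getfloat("detection_timeout"))


class ToyDetector:
    """Marks a server faulty after `detections` consecutive failed pings."""

    def __init__(self, detections):
        self.detections = detections
        self.misses = {}      # server -> consecutive failed pings
        self.faulty = set()

    def record(self, server, ping_ok):
        """Record one ping result; return True when the server becomes faulty."""
        if ping_ok:
            self.misses[server] = 0
            return False
        self.misses[server] = self.misses.get(server, 0) + 1
        if self.misses[server] >= self.detections and server not in self.faulty:
            self.faulty.add(server)
            return True  # a real detector would mark it faulty and start fail-over
        return False


detections, interval, timeout = load_settings(CONFIG)
# Rough time to declare a dead server faulty, assuming one ping per interval.
print(detections * interval)  # 18.0
det = ToyDetector(detections)
pings = [True, False, False, True, False, False, False]
events = [det.record("server1.example.com", ok) for ok in pings]
print(events)  # [False, False, False, False, False, False, True]
```

Raising `detections` or `detection_interval` makes the detector more tolerant of brief hiccups at the cost of slower fail-over.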

Page 33: High-Availability using MySQL Fabric

External failure detectors

● External failure detectors
  ● Connectors
  ● Custom failure detectors
● Reporting API
  ● Error: suspected server failure
  ● Failure: server is known to have failed
● Reporting server error
  ● Trigger fail-over if threshold is exceeded
● Reporting server failure
  ● Trigger immediate fail-over

[Diagram: external detectors reporting errors (?) and failures (!) to the MySQL Fabric controller node]

Page 34: High-Availability using MySQL Fabric

External failure detectors: Configuration

● Notifications
  ● Error threshold
● Notification clients
  ● Threshold for number of unique clients
● Notification interval
  ● Notification window

[failure_tracking]
notifications = 300
notification_clients = 50
notification_interval = 60
failover_interval = 0
prune_time = 3600

Page 35: High-Availability using MySQL Fabric

External failure detectors: Configuration

● Failover interval
  ● Minimum interval between failovers
  ● Used to prevent flapping
● Prune time
  ● Size of error log (in seconds) to keep

[failure_tracking]
notifications = 300
notification_clients = 50
notification_interval = 60
failover_interval = 0
prune_time = 3600
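The interplay of these five settings can be sketched as a small state machine. This is a hypothetical model, not Fabric's code: `ErrorTracker` is an illustrative name, and it assumes that a fail-over on reported errors requires *both* at least `notifications` reports and at least `notification_clients` distinct reporters within the last `notification_interval` seconds, that `failover_interval` enforces a minimum gap between fail-overs, and that the error history is cleared after a fail-over.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ErrorTracker:
    """Toy model of the [failure_tracking] thresholds for external reports."""
    notifications: int = 300
    notification_clients: int = 50
    notification_interval: float = 60
    failover_interval: float = 0
    prune_time: float = 3600
    reports: List[Tuple[float, str]] = field(default_factory=list)  # (time, reporter)
    last_failover: float = float("-inf")

    def report_error(self, now: float, reporter: str) -> bool:
        """Suspected failure: fail over only if the thresholds are exceeded."""
        self.reports.append((now, reporter))
        # Keep at most prune_time seconds of error history.
        self.reports = [(t, r) for t, r in self.reports if now - t <= self.prune_time]
        window = [(t, r) for t, r in self.reports if now - t <= self.notification_interval]
        unstable = (len(window) >= self.notifications
                    and len({r for _, r in window}) >= self.notification_clients)
        return unstable and self._may_fail_over(now)

    def report_failure(self, now: float, reporter: str) -> bool:
        """Known failure: fail over immediately."""
        self.last_failover = now
        self.reports.clear()
        return True

    def _may_fail_over(self, now: float) -> bool:
        # failover_interval enforces a minimum gap between fail-overs (anti-flapping).
        if now - self.last_failover < self.failover_interval:
            return False
        self.last_failover = now
        self.reports.clear()
        return True


tracker = ErrorTracker(notifications=3, notification_clients=2,
                       notification_interval=60, failover_interval=30)
print(tracker.report_error(0, "app1"))  # False
print(tracker.report_error(1, "app1"))  # False
print(tracker.report_error(2, "app2"))  # True: 3 reports from 2 clients
```

Requiring several distinct clients guards against a single misbehaving application forcing a fail-over on its own.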

Page 36: High-Availability using MySQL Fabric

Connector as external failure detector

● Error reporting from connector
  ● Depends on connector support
  ● Report suspected failures
● Enabling error reporting
  ● Error reporting off by default
● Avoid a thundering herd
  ● Do not enable error reporting for all connectors!
  ● A failing server will cause all connectors to report failure

import mysql.connector

fabric_config = { …, 'report_errors': True, … }
cnx = mysql.connector.connect( …, fabric=fabric_config, … )
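One hypothetical way to follow the "do not enable error reporting for all connectors" advice is to turn it on for a stable, hash-selected fraction of application instances. The helper names and the 10% fraction below are assumptions for illustration; only the `report_errors` key comes from the slide, and the `host` entry is a placeholder for whatever Fabric connection settings you already pass.

```python
import hashlib


def should_report_errors(instance_id: str, fraction: float) -> bool:
    """Deterministically enable reporting on roughly `fraction` of instances."""
    digest = hashlib.sha256(instance_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # roughly uniform in [0, 1)
    return bucket < fraction


def make_fabric_config(instance_id: str, fraction: float, base: dict) -> dict:
    """Copy `base` and set 'report_errors' only if this instance is selected."""
    config = dict(base)
    config["report_errors"] = should_report_errors(instance_id, fraction)
    return config


enabled = sum(should_report_errors(f"app-{i}", 0.1) for i in range(1000))
print(enabled)
config = make_fabric_config("app-7", 0.1, {"host": "fabric.example.com"})
# config would then be passed as mysql.connector.connect(..., fabric=config, ...)
```

Hashing a stable instance identifier (rather than calling `random`) keeps the choice the same across restarts, so the set of reporting connectors does not churn. With 1000 instances and a fraction of 0.1, `enabled` should come out near 100.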

Page 37: High-Availability using MySQL Fabric

Connector as external failure detector: Error reporting

● Default errors reported
  ● CR_SERVER_LOST
  ● CR_SERVER_GONE_ERROR
  ● CR_CONN_HOST_ERROR
  ● CR_CONNECTION_ERROR
  ● CR_IPSOCK_ERROR
● Extra errors can be added
  ● extra_failure_report

from mysql.connector.fabric import extra_failure_report

extra_failure_report([error1, error2, …, errorn])

Page 38: High-Availability using MySQL Fabric

Connector as external failure detector: Cache invalidation

● Cache invalidation is on by default for
  ● Server lost (CR_SERVER_LOST)
  ● Server read-only (ER_OPTION_PREVENTS_STATEMENT)

from mysql.connector.fabric import RESET_CACHE_ON_ERROR

RESET_CACHE_ON_ERROR.append(error)
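The two error lists on these slides decide two separate things: whether an error is reported to Fabric, and whether the connector's cached routing information is discarded. The sketch below models only that classification with plain sets; it does not use the connector, and `classify` is a hypothetical helper. The numeric values are the standard MySQL error numbers for the named constants; 1205 (ER_LOCK_WAIT_TIMEOUT) is used purely as an example of an "extra" error.

```python
# Standard MySQL client/server error numbers for the constants on the slides.
CR_CONNECTION_ERROR = 2002
CR_CONN_HOST_ERROR = 2003
CR_IPSOCK_ERROR = 2004
CR_SERVER_GONE_ERROR = 2006
CR_SERVER_LOST = 2013
ER_OPTION_PREVENTS_STATEMENT = 1290

DEFAULT_REPORTED = {CR_SERVER_LOST, CR_SERVER_GONE_ERROR, CR_CONN_HOST_ERROR,
                    CR_CONNECTION_ERROR, CR_IPSOCK_ERROR}
DEFAULT_RESET_CACHE = {CR_SERVER_LOST, ER_OPTION_PREVENTS_STATEMENT}


def classify(errno, extra_reported=(), extra_reset=()):
    """Return (report_to_fabric, invalidate_cache) for an error number (toy model)."""
    report = errno in DEFAULT_REPORTED or errno in set(extra_reported)
    reset = errno in DEFAULT_RESET_CACHE or errno in set(extra_reset)
    return report, reset


print(classify(CR_SERVER_LOST))                # (True, True)
print(classify(ER_OPTION_PREVENTS_STATEMENT))  # (False, True)
print(classify(1205, extra_reported=[1205]))   # (True, False)
```

A read-only error (1290) typically means the connector is talking to a server that is no longer the primary, so refreshing the cache is useful even though the server itself has not failed.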

Page 39: High-Availability using MySQL Fabric

Using Fabric with Existing High-availability Setups

Page 40: High-Availability using MySQL Fabric

Using Fabric with Existing Solutions

● Servers already managed
  ● Group-based solutions
  ● Virtual IP-based solutions
● Fabric as lookup server
  ● Connectors can route transactions
  ● Application can retrieve information from Fabric
● Update state store only

Page 41: High-Availability using MySQL Fabric

Example: An existing setup

● DRBD for redundancy
  ● Disk replicated
● Pacemaker for fail-over
  ● Heartbeat detects failure
  ● Resource agent handles fail-over
● Fabric as lookup server
● Fabric for routing transactions

[Diagram: primary node and secondary node, each running Pacemaker, linked by DRBD replication]

Page 42: High-Availability using MySQL Fabric

Example: An existing setup: Create a group

● Create a group
● Add server to group
  ● Fabric should only update state store
● “Promote” the DRBD primary to be primary in group

mysqlfabric group create my_group
mysqlfabric group add my_group server1.example.com --update_only
mysqlfabric group promote my_group --update_only --slave_id=...

Page 43: High-Availability using MySQL Fabric

Example: An existing setup: Update resource agent

● Change resource agent script
  ● On Ubuntu: /usr/lib/ocf/resource.d/heartbeat/mysql
● Update resource agent actions to inform Fabric
  ● Remove old server
  ● Only update the state store

mysqlfabric group demote my_group --update_only
mysqlfabric group remove my_group --update_only 7bcb0804-...

Page 44: High-Availability using MySQL Fabric

Example: An existing setup: Update resource agent

● Change resource agent script
  ● On Ubuntu: /usr/lib/ocf/resource.d/heartbeat/mysql
● Update resource agent actions to inform Fabric
  ● Add standby server
  ● Only update the state store

mysqlfabric group add my_group --update_only standby.example.com
mysqlfabric group promote my_group --update_only --slave_id=8308b0c4-...
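Real OCF resource agents are usually shell scripts, but the sequence of state-store updates a fail-over hook would issue can be sketched in Python. `failover_commands` and `notify_fabric` are hypothetical helpers; the UUIDs are placeholders (the real ones are elided on the slides), and the commands only mirror the `mysqlfabric group` calls shown above. The default dry-run mode prints the commands instead of running them.

```python
import shlex
import subprocess


def failover_commands(group, old_uuid, new_address, new_uuid):
    """Build the update-only mysqlfabric calls a resource agent might issue."""
    return [
        # Remove the old primary from Fabric's view of the group...
        ["mysqlfabric", "group", "demote", group, "--update_only"],
        ["mysqlfabric", "group", "remove", group, "--update_only", old_uuid],
        # ...then record the standby as the group's new primary.
        ["mysqlfabric", "group", "add", group, "--update_only", new_address],
        ["mysqlfabric", "group", "promote", group, "--update_only",
         f"--slave_id={new_uuid}"],
    ]


def notify_fabric(commands, dry_run=True):
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing command


cmds = failover_commands("my_group", "OLD-UUID", "standby.example.com", "NEW-UUID")
notify_fabric(cmds)
```

Because every call uses `--update_only`, Fabric only changes its state store; the actual fail-over remains the job of Pacemaker and DRBD.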

Page 45: High-Availability using MySQL Fabric

Making Fabric highly available

Page 46: High-Availability using MySQL Fabric

Making Fabric highly available

● Standard deployment
  ● Fabric node and state store on same machine
  ● Need to use TCP
    – Socket connection not available yet (Bug#71946)
● Three things can fail:
  ● State store
  ● Fabric node
  ● Machine

Page 47: High-Availability using MySQL Fabric

Making Fabric highly available: Handling state store failure

● If the state store connection is lost:
  ● Fabric retries until the state store becomes available
  ● Ongoing transactions fail
  ● Fabric reports an error if the connection is not recovered “quickly enough”
● Solution: restart the state store
  ● MySQL handles recovery
  ● Fabric reconnects automatically

Page 48: High-Availability using MySQL Fabric

Making Fabric highly available: Handling state store failure

● Connection timeout
  ● Timeout (in seconds) for a connection attempt
● Connection attempts
  ● Number of attempts before reporting the state store as failed
● Connection interval
  ● Delay (in seconds) between connection attempts

[storage]
connection_timeout = 6
connection_attempts = 6
connection_interval = 1
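A minimal sketch of the retry behavior these settings describe. `connect_with_retry` is a hypothetical helper that takes any `connect(timeout)` callable, so it runs without a MySQL server; Fabric's own retry loop may differ. Assuming every attempt runs into the full timeout, the values above give a worst case of 6 × 6 + 5 × 1 = 41 seconds before the state store is reported as failed.

```python
import configparser
import time

STORAGE = """
[storage]
connection_timeout = 6
connection_attempts = 6
connection_interval = 1
"""


def load_storage_settings(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    s = parser["storage"]
    return (s.getfloat("connection_timeout"),
            s.getint("connection_attempts"),
            s.getfloat("connection_interval"))


def connect_with_retry(connect, timeout, attempts, interval, sleep=time.sleep):
    """Call `connect(timeout)` up to `attempts` times, pausing `interval` between tries."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect(timeout)
        except OSError as exc:  # e.g. connection refused or timed out
            last_error = exc
            if attempt < attempts:
                sleep(interval)
    raise ConnectionError(f"state store unavailable after {attempts} attempts") from last_error


timeout, attempts, interval = load_storage_settings(STORAGE)
# Worst case if every attempt runs into the full timeout:
print(attempts * timeout + (attempts - 1) * interval)  # 41.0

calls, pauses = [], []

def flaky_connect(timeout):
    # Simulated state store that is restarting: refuses the first two attempts.
    calls.append(timeout)
    if len(calls) < 3:
        raise ConnectionRefusedError("state store restarting")
    return "connection"

print(connect_with_retry(flaky_connect, timeout, attempts, interval, sleep=pauses.append))
# connection
```

Injecting `sleep` keeps the example fast and testable; in production the default `time.sleep` is used.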

Page 49: High-Availability using MySQL Fabric

Making Fabric highly available: Handling Fabric controller node failure

● If the Fabric node is lost:
  ● Ongoing jobs fail
  ● Execution state checkpointed
● On Fabric node restart:
  ● Execution state recovered
● Solution: restart Fabric node
  ● Detect failure
    – Local ping script
  ● Restart Fabric node
    – init.d script
  ● Neither is distributed with Fabric
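Since neither the ping script nor the restart script ships with Fabric, here is a hypothetical sketch of the "local ping plus restart" idea. `tcp_alive` checks whether the Fabric node accepts TCP connections; the port to check is whatever your Fabric configuration listens on (32274 is assumed in the usage comment, so verify it against your own setup). `watchdog` takes the health check and restart action as callables, so the restart could be an init.d script or any other service manager.

```python
import socket
import time


def tcp_alive(host, port, timeout=1.0):
    """Health check: can we open a TCP connection to the Fabric node?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watchdog(is_alive, restart, checks, interval=5.0, sleep=time.sleep):
    """Run `checks` health checks, calling `restart()` after each failed one."""
    restarts = 0
    for _ in range(checks):
        if not is_alive():
            restart()
            restarts += 1
        sleep(interval)
    return restarts


# Simulated run: the node is down for one check and restarted once.
health = iter([True, False, True, True])
restarted = []
n = watchdog(lambda: next(health), lambda: restarted.append("restart"),
             checks=4, sleep=lambda seconds: None)
print(n)  # 1
# In production, something like:
#   watchdog(lambda: tcp_alive("127.0.0.1", 32274), restart_fabric, checks=10**9)
# where restart_fabric is your own (hypothetical) restart action.
```

Restarting is enough here because, as the slide notes, the executor checkpoints its state and recovers unfinished procedures when the node comes back.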

Page 50: High-Availability using MySQL Fabric

Making Fabric Highly Available: Handling machine failures

● If the machine fails:
  ● State store is lost
  ● Fabric node is lost
  ● Catastrophic failures can prevent machine recovery
● Solution:
  ● Replicate meta-data
  ● Detect machine failure
  ● Activate duplicate deployment

Page 51: High-Availability using MySQL Fabric

Making Fabric Highly Available: Replicate meta-data

● Replicate state store
  ● DRBD
  ● MySQL Cluster
  ● MySQL Replication
● Configure DRBD
  ● Version 8.3 or later
  ● Replicate block device
● Configure MySQL servers
  ● Data directory on replicated device

Page 52: High-Availability using MySQL Fabric

Making Fabric Highly Available: Replicate meta-data

● Active node
  ● MySQL Fabric
  ● MySQL Server
  ● DRBD primary
● Passive node
  ● DRBD secondary
  ● Server and Fabric started on fail-over

Page 53: High-Availability using MySQL Fabric

Making Fabric Highly Available: Detect machine failure & activate replacement

● Detecting machine failure
  ● Corosync
  ● Version 2.0 or later
● Activate replacement
  ● Pacemaker
  ● Version 1.1 or later

Page 54: High-Availability using MySQL Fabric

Making Fabric Highly Available: Detect machine failure & activate replacement

● Configure MySQL Fabric
  ● State store in DRBD volume
● Configure Corosync
  ● Set no-quorum-policy to 'ignore'
    – Prevents the remaining node from shutting down
  ● Turn off STONITH
    – Node will commit suicide

Page 55: High-Availability using MySQL Fabric

Making Fabric Highly Available: Detect machine failure & activate replacement

● Configure Pacemaker
  ● Add MySQL Fabric resource agent
  ● Colocate Fabric, DRBD, and MySQL, and order them
● Avoiding split-brain
  ● Reliably detect network partition
  ● Ping reliable resource
    – Example: Router
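In Pacemaker this idea is normally expressed with a ping resource and a location constraint; the function below is only a hypothetical Python illustration of the decision itself, not Pacemaker configuration. It assumes a two-node active/passive pair: when the peer disappears, a node that can still reach the reliable resource (the router) may take over, while a node that cannot reach it assumes it is the isolated one and stands down.

```python
def decide_role(current: str, peer_reachable: bool, router_reachable: bool) -> str:
    """Toy two-node rule: return 'active' or 'passive' for this node."""
    if peer_reachable:
        return current  # both nodes see each other: keep the current roles
    # Peer unreachable: it either died, or this node is cut off from the network.
    # Reaching the reliable resource (e.g. the router) suggests the peer is the
    # one that is gone, so this node may run Fabric; otherwise it must stand down.
    return "active" if router_reachable else "passive"


for args in [("passive", True, True), ("passive", False, True),
             ("active", False, False)]:
    print(args, "->", decide_role(*args))
# ('passive', True, True) -> passive
# ('passive', False, True) -> active
# ('active', False, False) -> passive
```

This rule alone cannot distinguish a dead peer from a broken link between the two nodes when both can still reach the router, so the reachability checks need to be as reliable as the slide suggests.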

Page 56: High-Availability using MySQL Fabric

Closing Remarks & Ideas for the Future

Page 57: High-Availability using MySQL Fabric

Multi-Node Fabric: Replicated State Machine

● Multiple Fabric nodes
  ● Built-in support
  ● Fail-over
  ● Local read instance
  ● Distributed execution
● Replicated state machine
  ● Coordinate procedure execution
  ● Automatic fail-over
  ● Paxos or Raft-like implementation

Page 58: High-Availability using MySQL Fabric

More Flexibility

● Server providers
  ● Amazon AWS
  ● Kubernetes?
● Built-in high-availability group types
  ● DRBD
  ● MySQL Cluster
  ● Amazon RDS?

Page 59: High-Availability using MySQL Fabric

MySQL Fabric Resources: Useful links

● Download and try
  ● http://dev.mysql.com/downloads/utilities/
● MySQL Fabric Documentation
  ● http://dev.mysql.com/doc/mysql-utilities/1.5/en/fabric.html
● Forum (MySQL Fabric, Sharding, HA, Utilities)
  ● http://forums.mysql.com/list.php?144

Page 60: High-Availability using MySQL Fabric

MySQL Fabric Resources: Blogs

● MySQL High-Availability
  ● http://mysqlhighavailability.com
● Mats Kindahl
  ● http://mysqlmusings.blogspot.com
● Alfranio Correia
  ● http://alfranio-distributed.blogspot.com
● Narayanan Venkateswaran
  ● http://vnwrites.blogspot.com

Page 61: High-Availability using MySQL Fabric

Thank You!