Migrating to XtraDB Cluster
DESCRIPTION
The presentation provides the steps to follow when migrating to XtraDB Cluster. Percona provides an in-depth review of your database and recommends appropriate changes by performing a complete MySQL health check, in which we identify inefficiencies, find problems before they occur, and ensure that your MySQL database is in the best condition.

TRANSCRIPT
Migrating to XtraDB Cluster
Jay Janssen, MySQL Consulting Lead
Percona Live University, Toronto
March 22nd, 2013
Overview of XtraDB Cluster
• Percona Server 5.5 + the Galera (Codership) synchronous replication add-on
• "Cluster of MySQL nodes"
  – have all the data, all the time
  – readable and writeable
• Established cluster:
  – synchronizes new nodes
  – handles node failures
  – handles node resync
  – split-brain protection (quorum)
Company Confidential December 2010
XtraDB Cluster FAQ
• Standard MySQL replication
  – into or out of the cluster
• Write scalable, to a point
  – all writes still hit all nodes
• LAN/WAN architectures
  – write latency is ~1 RTT
• MyISAM is experimental
  – big list of caveats
  – Galera is designed and built for InnoDB
What you really want to know
• Is it production worthy?
  – There are several production users of Galera/PXC
  – You should really evaluate your workload to see if it's a good fit for Galera/PXC
• What are the limitations of using Galera?
  – http://www.codership.com/wiki/doku.php?id=limitations
CONFIGURING XTRADB CLUSTER
Cluster Replication Config
• Configured via wsrep_provider_options
• Can run on a separate network from mysqld
• Default cluster replication port is 4567 (TCP)
  – supports multicast
  – supports SSL
• A starting node needs to know one cluster node's IP
  – you can list all the nodes you know, and it will find one that is a member of the cluster
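As a sketch of how the replication network can be pinned to a dedicated interface and secured with SSL: the option names below are real Galera provider options, but the addresses and certificate paths are placeholder assumptions, not values from this deck.

```
[mysqld]
# bind Galera group communication to the replication network and enable SSL
wsrep_provider_options="gmcast.listen_addr=tcp://192.168.70.2:4567; socket.ssl_cert=/etc/mysql/cert.pem; socket.ssl_key=/etc/mysql/key.pem"
```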
Other intra-cluster communication
• Happens outside of Galera replication (gcomm)
• SST
  – full state transfers
  – a donor is picked from the running cluster and gives a full backup to the joiner node
  – might be blocking (various methods allowed)
  – default: TCP 4444
• IST
  – incremental state transfers
  – default: wsrep port + 1 (TCP 4568)
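Putting the port list together, a firewall configuration sketch for one node (iptables syntax; the source subnet is a placeholder assumption):

```
# MySQL clients
iptables -A INPUT -s 192.168.70.0/24 -p tcp --dport 3306 -j ACCEPT
# Galera group communication (gcomm)
iptables -A INPUT -s 192.168.70.0/24 -p tcp --dport 4567 -j ACCEPT
# SST (full state transfer)
iptables -A INPUT -s 192.168.70.0/24 -p tcp --dport 4444 -j ACCEPT
# IST (incremental state transfer)
iptables -A INPUT -s 192.168.70.0/24 -p tcp --dport 4568 -j ACCEPT
```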
Essential Galera settings
• [mysqld]
  – wsrep_provider = /usr/lib64/libgalera_smm.so
  – wsrep_cluster_name - identify the cluster
  – wsrep_cluster_address - where to find the cluster
  – wsrep_node_address - tell Galera what IP to use for replication/SST/IST
  – wsrep_sst_method - how to synchronize nodes
  – binlog_format = ROW
  – innodb_autoinc_lock_mode = 2
  – innodb_locks_unsafe_for_binlog = 1 - performance
Other Galera Settings
• [mysqld]
  – wsrep_node_name - identify this node
  – wsrep_provider_options - cluster communication options
    • wsrep_provider_options="gcache.size=<gcache size>"
    • http://www.codership.com/wiki/doku.php?id=galera_parameters
  – wsrep_node_incoming_address=<node mysql IP>
  – wsrep_slave_threads - apply writesets in parallel
    • http://www.codership.com/wiki/doku.php?id=mysql_options_0.8
Example configuration

[mysqld]
datadir=/var/lib/mysql
binlog_format=ROW

wsrep_cluster_name=trimethylxanthine
wsrep_cluster_address=gcomm://192.168.70.2,192.168.70.3,192.168.70.4

# Only use this before the cluster is formed
# wsrep_cluster_address=gcomm://

wsrep_node_name=percona1
wsrep_node_address=192.168.70.2
wsrep_provider=/usr/lib64/libgalera_smm.so

wsrep_sst_method=xtrabackup
wsrep_sst_auth=backupuser:password

wsrep_slave_threads=2

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

innodb_buffer_pool_size=128M
innodb_log_file_size=64M
CONVERTING STANDALONE MYSQL TO XTRADB CLUSTER
Method 1 - Single Node
• Migrating a single server:
  – stop MySQL
  – replace the packages
  – add the essential Galera settings
  – start MySQL
• A stateless, peerless node will form its own cluster
  – if an empty cluster address is given (gcomm://)
• That node is the baseline data for the cluster
• Easiest from Percona Server 5.5
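The essential settings added before the restart might look like the fragment below (names and paths reuse this deck's example configuration; the empty gcomm:// is what makes the lone node bootstrap its own cluster):

```
[mysqld]
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_cluster_name=trimethylxanthine
# empty address: this node forms a brand-new cluster by itself
wsrep_cluster_address=gcomm://
wsrep_sst_method=xtrabackup
binlog_format=ROW
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
```

Once other nodes have joined, the address should be changed to list real cluster members so a restart of this node rejoins instead of re-bootstrapping.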
Method 2 - Blanket changeover
• All at once (with downtime):
  – stop all writes, then stop all nodes after replication is synchronized
  – skip-slave-start / RESET SLAVE
  – start the first node - it forms the initial cluster
  – start the others with wsrep_sst_method=skip
• The slaves will join the cluster, skipping SST
• Afterwards, change wsrep_sst_method to something other than skip
Method 3 - No-downtime changeover
• No downtime:
  – form a new cluster from one slave
  – that node replicates from the old master
    • log-slave-updates on this node
  – test it like any other slave
  – move more slave nodes to the cluster
  – cut writes over to the cluster
  – absorb the master into the cluster
    • with a non-skip SST
OPERATIONAL CONSIDERATIONS
Monitoring
• SHOW GLOBAL STATUS LIKE 'wsrep%';
• Cluster integrity - these should be the same across all nodes:
  – wsrep_cluster_conf_id - configuration version
  – wsrep_cluster_size - number of active nodes
  – wsrep_cluster_status - should be Primary
• Node status:
  – wsrep_ready - indicator that the node is healthy
  – wsrep_local_state_comment - status message
  – wsrep_flow_control_paused/sent - replication lag feedback
  – wsrep_local_send_q_avg - possible network bottleneck
• http://www.codership.com/wiki/doku.php?id=monitoring
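These status variables are easy to wire into a lightweight check. A minimal sketch (not the clustercheck script that ships with PXC; the OK/FAIL contract is invented for illustration) that parses the tab-separated output of `mysql -NB -e "SHOW GLOBAL STATUS LIKE 'wsrep%'"`:

```shell
# Reads tab-separated "variable<TAB>value" pairs on stdin and decides
# whether this node is a healthy, usable cluster member.
wsrep_healthy() {
  local name value ready="" status="" comment=""
  while IFS=$'\t' read -r name value; do
    case "$name" in
      wsrep_ready)               ready=$value ;;
      wsrep_cluster_status)      status=$value ;;
      wsrep_local_state_comment) comment=$value ;;
    esac
  done
  # healthy = ready, in the Primary component, and not currently a donor
  if [ "$ready" = "ON" ] && [ "$status" = "Primary" ] \
     && [[ "$comment" != *Donor* ]]; then
    echo "OK"
  else
    echo "FAIL"
  fi
}

# Canned example; in production, pipe in:
#   mysql -NB -e "SHOW GLOBAL STATUS LIKE 'wsrep%'" | wsrep_healthy
printf 'wsrep_ready\tON\nwsrep_cluster_status\tPrimary\nwsrep_local_state_comment\tSynced\n' \
  | wsrep_healthy   # → OK
```

A load balancer health check can call something like this and drop the node from rotation on FAIL.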
Realtime Wsrep status
Maintenance
• Rolling package updates
• Schema changes
  – potential for blocking the whole cluster
  – Galera supports a rolling schema upgrade feature
    • http://www.codership.com/wiki/doku.php?id=rolling_schema_upgrade
    • isolates DDL to individual cluster nodes
    • won't work if replication events become incompatible
  – pt-online-schema-change
• Prefer IST over SST
  – be sure you know when IST will and won't work!
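A rolling schema upgrade run, one node at a time, might look like the sketch below. wsrep_OSU_method is the Galera variable that switches between total-order (TOI) and rolling (RSU) DDL; the table and column names are made up for illustration.

```
-- on each node in turn:
SET SESSION wsrep_OSU_method='RSU';  -- isolate the DDL to this node
ALTER TABLE orders ADD COLUMN note VARCHAR(255);  -- hypothetical change
SET SESSION wsrep_OSU_method='TOI';  -- back to total-order execution
```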
Architecture
• How many nodes should I have?
  – >= 3 nodes for quorum purposes
    • 50% is not a quorum
  – garbd - Galera Arbitrator Daemon
    • contributes as a voting node for quorum
    • does not store data, but does replicate
• What gear should I get?
  – writes are as fast as your slowest node
  – standard MySQL + InnoDB choices
  – garbd could be on a cloud server
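An invocation sketch for garbd, reusing the group name and a node address from this deck's example configuration (flag spellings from the garbd help output):

```
garbd --group trimethylxanthine \
      --address gcomm://192.168.70.2:4567 \
      --daemon
```

With two data nodes plus garbd, the loss of either data node still leaves 2 of 3 votes, so the survivor keeps quorum.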
APPLICATION WORKLOADS
How (Virtually) Synchronous Writes Work
• Source node - pessimistic locking
  – InnoDB transaction locking
• Cluster replication - optimistic locking
  – before the source returns from commit:
    • the writeset replicates to all nodes, and a GTID is chosen
    • the source certifies it
      – PASS: the source applies
      – FAIL: the source gets a deadlock error (local certification failure, LCF)
  – other nodes:
    • receive, certify, apply (or drop)
    • certification is deterministic on all nodes
  – apply can abort open transactions (brute-force abort, BFA)
    • first commit wins!
Why does the Application care?
• Workload dependent!
• Writing to all nodes simultaneously and evenly:
  – increases deadlock errors on data hot spots
• Can be avoided by:
  – writing to only one node at a time
    • all pessimistic locking happens on one node
  – writing each data subset on only a single node
    • e.g., different databases, tables, rows, etc.
    • different nodes can handle writes for different datasets
    • pessimistic locking for that subset happens on only one node
Workloads that work best with Galera
• Multi-node writing
  – low data hotspots
  – auto_increment_offset/increment handling is OK
    • Galera sets them automatically by default
• Small transactions
  – large transactions expose serialization points in replication and certification
• Tables
  – with primary keys
  – InnoDB
  – avoid triggers, foreign keys, etc. - supported, but problematic
APPLICATION CLUSTER HA
Application to Cluster Connects
• For writes:
  – best practice: (any) single node
• For reads:
  – all nodes, load-balanced
    • can be hashed to hit hot caches
    • geo-affinity for WAN setups
  – replication lag is still possible, but minimal; avoidable with wsrep_causal_reads (session|global)
• Be sure to monitor that nodes are functioning members of the cluster!
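The causal-reads safeguard on a read-after-write path, as a sketch (the table and values are hypothetical):

```
-- make this session wait until the node has applied everything
-- committed cluster-wide before serving the read
SET SESSION wsrep_causal_reads = ON;
SELECT balance FROM accounts WHERE id = 42;
```

This trades a little read latency for the guarantee that a read routed to a different node than the preceding write still sees that write.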
Load balancing and Node status
• Health check:
  – TCP 3306
  – SHOW GLOBAL STATUS
    • wsrep_ready = ON
    • wsrep_local_state_comment !~ m/Donor/
  – /usr/bin/clustercheck
• Maintain separate rotations:
  – reads
    • round-robin or least-connected across all available nodes
  – writes
    • a single node, with backups used on failure
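clustercheck is typically exposed to the load balancer as an HTTP check on port 9200 via xinetd; a configuration sketch (the service name mysqlchk is the usual convention, and the only_from subnet is a placeholder):

```
service mysqlchk
{
    disable         = no
    flags           = REUSE
    socket_type     = stream
    port            = 9200
    wait            = no
    user            = nobody
    server          = /usr/bin/clustercheck
    log_on_failure  += USERID
    only_from       = 192.168.70.0/24
    per_source      = UNLIMITED
}
```

This is what the `check port 9200` directives in the HAProxy sample config below this slide are probing.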
Load Balancing Technologies
• glbd - Galera Load Balancer
  – similar to Pen; can utilize multiple cores
  – http://www.codership.com/products/galera-loadbalancer
• HAProxy
  – httpchk to monitor node status
  – http://www.percona.com/doc/percona-xtradb-cluster/haproxy.html
• Watch out for a lot of TIME_WAIT connections!
HAProxy Sample config
# Random reads connection (any node)
listen all *:3306
  server db1 10.2.46.120:3306 check port 9200
  server db2 10.2.46.121:3306 check port 9200
  server db3 10.2.46.122:3306 check port 9200

# Writer connection (first available node)
listen writes *:4306
  server db1 10.2.46.120:3306 track all/db1
  server db2 10.2.46.121:3306 track all/db2 backup
  server db3 10.2.46.122:3306 track all/db3 backup
Resources
• XtraDB Cluster homepage and documentation:
  – http://www.percona.com/software/percona-xtradb-cluster/
• Galera documentation:
  – http://www.codership.com/wiki/doku.php
• PXC tutorial (self-guided or at a conference):
  – https://github.com/jayjanssen/percona-xtradb-cluster-tutorial
• http://www.mysqlperformanceblog.com/category/xtradb-cluster/
THANK YOU
Jay Janssen
@jayjanssen
http://www.percona.com/software/percona-xtradb-cluster