MySQL and Ceph - Percona
TRANSCRIPT
[Slide 1]
MySQL and Ceph 2 August 2016
[Slide 2]
WHOIS
Brent Compton and Kyle Bader Storage Solution Architectures Red Hat
Yves Trudeau Principal Architect Percona
[Slide 3]
AGENDA
MySQL on Ceph
• Ceph Architecture
• MySQL on Ceph RBD
• Sample Benchmark Results
• Hardware Selection Considerations
[Slide 4]
Why MySQL on Ceph
[Slide 5]
• Ceph #1 block storage for OpenStack clouds
• 70% apps on OpenStack use LAMP stack
• MySQL leading open-source RDBMS
• Ceph leading open-source software-defined storage
WHY MYSQL ON CEPH? MARKET DRIVERS
[Slide 6]
• Shared, elastic storage pool on commodity servers
• Dynamic DB placement
• Flexible volume resizing
• Live instance migration
• Backup block pool to object pool
• Read replicas via copy-on-write snapshots
• … commonality with public cloud deployment models
WHY MYSQL ON CEPH? EFFICIENCY DRIVERS
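The "read replicas via copy-on-write snapshots" point above maps directly to the standard `rbd` CLI. A minimal sketch, assuming a pool and image named `mysql/primary-data` (names hypothetical; requires a running Ceph cluster):

```shell
# Snapshot the primary's data volume, protect it, then clone it
# copy-on-write for a new read replica.
rbd snap create mysql/primary-data@replica-base
rbd snap protect mysql/primary-data@replica-base
rbd clone mysql/primary-data@replica-base mysql/replica1-data

# The clone shares unmodified blocks with the snapshot, so the
# replica is available almost instantly and only diverging writes
# consume new space. Map it and mount it for the replica instance.
rbd map mysql/replica1-data
```

A clone can later be detached from its parent with `rbd flatten` if the snapshot needs to be retired.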
[Slide 7]
CEPH ARCHITECTURE
[Slide 8]
ARCHITECTURAL COMPONENTS
RGW: A web services gateway for object storage, compatible with S3 and Swift
LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
RBD: A reliable, fully distributed block device with cloud platform integration
CEPHFS: A distributed file system with POSIX semantics and scale-out metadata

(Access layers: APP | HOST/VM | CLIENT)
[Slide 9]
RADOS COMPONENTS
OSDs
• 10s to 10,000s in a cluster
• Typically one per disk
• Serve stored objects to clients
• Intelligently peer for replication & recovery

Monitors
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number
• Do not serve stored objects to clients
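These components are easy to inspect on a live cluster with the stock `ceph` CLI (output shapes vary by release; requires a running cluster):

```shell
ceph -s            # overall health, monitor quorum, OSD in/up counts
ceph mon stat      # the small, odd-numbered monitor quorum
ceph osd tree      # OSD hierarchy: hosts/racks and per-OSD up/down state
ceph osd df        # per-OSD utilization -- CRUSH should keep this roughly uniform
```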
[Slide 10]
CEPH OSD
[Slide 11]
RADOS CLUSTER
[Slide 12]
WHERE DO OBJECTS LIVE?
??
[Slide 13]
A METADATA SERVER?
[Slide 14]
CALCULATED PLACEMENT
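The idea behind calculated placement: every client runs the same deterministic hash, so any client can compute where an object lives without consulting a metadata server. A toy sketch in Python (illustrative only; it uses crc32 as a stand-in, whereas Ceph's real hash is rjenkins and CRUSH then maps the PG to OSDs):

```python
import zlib

def place(object_name: str, pg_num: int) -> int:
    """Map an object name to a placement group deterministically.

    Any client computes the same PG for the same name, so there is no
    lookup table to consult or keep consistent. (Illustrative only --
    not Ceph's actual hash function or the CRUSH algorithm.)
    """
    return zlib.crc32(object_name.encode()) % pg_num

# Two independent "clients" agree on placement with no coordination:
assert place("mysql-vol.0001", 128) == place("mysql-vol.0001", 128)
```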
[Slide 15]
EVEN BETTER: CRUSH
CLUSTER PLACEMENT GROUPS (PGs)
[Slide 16]
CRUSH IS A QUICK CALCULATION
CLUSTER
[Slide 17]
DYNAMIC DATA PLACEMENT
CRUSH:
• Pseudo-random placement algorithm
  • Fast calculation, no lookup
  • Repeatable, deterministic
• Statistically uniform distribution
• Stable mapping
  • Limited data migration on change
• Rule-based configuration
  • Infrastructure topology aware
  • Adjustable replication
  • Weighting
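Because placement is a calculation, any client can ask where an object will land without touching a lookup service. The stock CLI exposes this directly (pool and object names here are hypothetical; requires a running cluster):

```shell
# Compute -- not look up -- the placement of an object in pool 'rbd'.
# The command prints the placement group id and the acting set of OSDs
# for that object, exactly as every client would calculate it.
ceph osd map rbd my-object
```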
[Slide 18]
DATA IS ORGANIZED INTO POOLS
CLUSTER POOLS (CONTAINING PGs)
POOL A
POOL B
POOL C
POOL D
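Pools are created with the `ceph` CLI. A minimal sketch, with a hypothetical pool name and an illustrative PG count (PG counts should be sized to the cluster; requires a running cluster):

```shell
# Create a replicated pool for MySQL volumes with 128 placement groups.
ceph osd pool create mysql-pool 128 128 replicated
ceph osd pool set mysql-pool size 3   # 3-way replication
ceph osd lspools                      # confirm the pool exists
```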
[Slide 19]
ACCESS METHODS
[Slides 20-21: the architectural components diagram from Slide 8 is shown again in the context of access methods (APP via LIBRADOS/RGW, HOST/VM via RBD, CLIENT via CEPHFS).]
[Slide 22]
STORING VIRTUAL DISKS
RADOS CLUSTER
[Slides 23-24: further build-out of the Storing Virtual Disks diagram over the RADOS cluster.]
[Slide 25]
PERCONA SERVER ON KRBD
RADOS CLUSTER
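A minimal sketch of putting a Percona Server datadir on a kernel RBD (krbd) device, assuming a pool named `mysql-pool` (names and sizes hypothetical; requires a running cluster). On Jewel-era clusters the image may need `--image-feature layering`, since krbd does not support all newer image features:

```shell
rbd create mysql-pool/db01 --size 102400 --image-feature layering  # 100 GiB volume
rbd map mysql-pool/db01      # kernel attaches the device, e.g. /dev/rbd0
mkfs.xfs /dev/rbd0           # XFS is the usual choice under InnoDB
mount -o noatime /dev/rbd0 /var/lib/mysql
chown -R mysql:mysql /var/lib/mysql
```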
[Slide 26]
TUNING MYSQL ON CEPH
[Slide 27]
HEAD-TO-HEAD: MYSQL ON CEPH VS. AWS
[Bar chart: IOPS/GB (Sysbench Write)]

• AWS EBS Provisioned-IOPS: 31
• Ceph on Supermicro FatTwin, 72% capacity: 18
• Ceph on Supermicro MicroCloud, 87% capacity: 18
• Ceph on Supermicro MicroCloud, 14% capacity: 78
[Slide 28]
TUNING FOR HARMONY: OVERVIEW

Tuning MySQL
• Buffer pool > 20%
• Flush each Tx or batch?
• Parallel doublewrite-buffer flush

Tuning Ceph
• RHCS 1.3.2, tcmalloc 2.4
• 128M thread cache
• Co-resident journals
• 2-4 OSDs per SSD
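The MySQL side of this tuning maps to a handful of my.cnf knobs. A sketch with illustrative values (size the buffer pool to your working set; `innodb_flush_log_at_trx_commit = 2` is the "batch" option that trades up to a second of durability for throughput, versus `1` for a flush on every transaction):

```ini
[mysqld]
innodb_buffer_pool_size        = 8G   # > 20% of the dataset, per the guidance above
innodb_flush_log_at_trx_commit = 2    # 1 = flush per Tx; 2/0 = batched, faster
innodb_flush_method            = O_DIRECT
```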
[Slide 29]
TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL BUFFER POOL ON TpmC
[Line chart: TpmC vs. time in seconds (1 data point per minute), y-axis 0 to 1,200,000]
64x MySQL Instances on Ceph cluster: each with 25x TPC-C Warehouses
1% Buffer Pool 5% Buffer Pool 25% Buffer Pool 50% Buffer Pool 75% Buffer Pool
[Slide 30]
TUNING FOR HARMONY: SAMPLE EFFECT OF MYSQL Tx FLUSH ON TpmC
[Line chart: TpmC vs. time in seconds (1 data point per minute), y-axis 0 to 2,500,000]
64x MySQL Instances on Ceph cluster: each with 25x TPC-C Warehouses
Batch Tx flush (1 sec) Per Tx flush
[Slide 31]
TUNING FOR HARMONY: CREATING A SEPARATE POOL TO SERVE IOPS WORKLOADS
Creating multiple pools in the CRUSH map
• Distinct branch in OSD tree
• Edit CRUSH map, add SSD rules
• Create pool, set crush_ruleset to SSD rule
• Add Volume Type to Cinder
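On pre-Luminous releases (as in this 2016 talk, before CRUSH device classes) those steps look roughly like the following; rule and pool names are hypothetical, and the SSD rule itself is added by hand in the decompiled map:

```shell
# 1. Decompile the CRUSH map, add an SSD root and rule, recompile, inject.
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... edit crushmap.txt: add an ssd root and a rule that selects it ...
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new

# 2. Create the pool and point it at the SSD rule (ruleset id from the map).
ceph osd pool create ssd-pool 128 128 replicated
ceph osd pool set ssd-pool crush_ruleset 1

# 3. Expose it to OpenStack as a Cinder volume type.
cinder type-create ceph-ssd
cinder type-key ceph-ssd set volume_backend_name=ceph-ssd
```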
[Slide 32]
TUNING FOR HARMONY: IF YOU MUST USE MAGNETIC MEDIA
Reducing seeks on magnetic pools
• RBD cache is safe
• RAID Controllers with write-back cache
• SSD Journals
• Software caches
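"RBD cache is safe" refers to librbd's client-side cache, which behaves like a disk write-back cache that honors flushes, so a crash-consistent InnoDB stays crash-consistent. It is enabled in the `[client]` section of ceph.conf (sizes illustrative; krbd does not use this cache):

```ini
[client]
rbd cache = true
rbd cache size = 33554432                    # 32 MiB per client
rbd cache writethrough until flush = true    # stay write-through until the guest flushes
```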
[Slide 33]
HARDWARE SELECTION CONSIDERATIONS
[Slide 34]
ARCHITECTURAL CONSIDERATIONS: UNDERSTANDING THE WORKLOAD
Traditional Ceph Workload
• $/GB
• PBs
• Unstructured data
• MB/sec

MySQL Ceph Workload
• $/IOP
• TBs
• Structured data
• IOPS
[Slide 35]
ARCHITECTURAL CONSIDERATIONS: FUNDAMENTALLY DIFFERENT DESIGN
Traditional Ceph Workload
• 50-300+ TB per server
• Magnetic media (HDD)
• Low CPU-core:OSD ratio
• 10GbE -> 40GbE

MySQL Ceph Workload
• < 10 TB per server
• Flash (SSD -> NVMe)
• High CPU-core:OSD ratio
• 10GbE
[Slide 36]
Ceph Test Drive: bit.ly/cephtestdrive
Percona Blog: https://www.percona.com/blog/2016/07/13/using-ceph-mysql/
Author: Yves Trudeau