

Linux Internals For MySQL DBAs

Ryan Lowe Marcos Albe Chris Giard

Daniel Nichter Syam Purnam

Emily Slocombe Le Peter Boros

Linux Kernel

• It’s big (almost 20 million lines of code)

• It’ll take you YEARS to be an expert

• Resources:

• linuxfromscratch.org

• kernel.org

• man pages

• lwn.net

What do we care about?

• Networking

• Storage

• Memory

• Processing


/proc

• Processes and other system information in a hierarchical file-like structure

• Interaction between kernel space and user space

• Exposes kernel knobs and sliders

• Plain text files

• echo 'value' > /proc/you/want/adjusted
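The `echo 'value' > file` idiom above is just a file write, so any language can do the same thing. A minimal Python sketch, assuming the knob is a plain text file; the helper names `read_knob` and `write_knob` are made up for illustration, and `vm.swappiness` is used only as a familiar readable example:

```python
from pathlib import Path


def read_knob(path):
    """Return the current value of a /proc (or /sys) tunable as a stripped string."""
    return Path(path).read_text().strip()


def write_knob(path, value):
    """Write a new value, like `echo value > path` (real knobs need root)."""
    Path(path).write_text(f"{value}\n")


if __name__ == "__main__":
    # Read-only demonstration; nothing is modified here.
    knob = Path("/proc/sys/vm/swappiness")
    if knob.exists():
        print("vm.swappiness =", read_knob(knob))
```

Values written this way do not survive a reboot.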

/proc

• cpuinfo

• meminfo

• interrupts

• diskstats

• fs/

• net/

• sys/
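Because these entries are plain text, they are easy to parse. A small sketch for `meminfo`, assuming the usual `Name:   value kB` layout; the function name `parse_meminfo` and the sample numbers are invented for illustration:

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


# Made-up sample in the /proc/meminfo layout; the numbers are illustrative only.
SAMPLE = """MemTotal:       16384000 kB
MemFree:         1024000 kB
SwapTotal:       2097148 kB
HugePages_Total:       0
"""

if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            text = f.read()
    else:
        text = SAMPLE
    print("MemTotal (kB):", parse_meminfo(text).get("MemTotal"))
```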

ulimits

• Considered part of the security system

• For performance/operations we only care about open files (-n)

• For debugging, you might need to set the core dump size limit to unlimited (-c)
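The two limits mentioned above can be inspected from Python's standard `resource` module. This sketch reports the limits of the Python process itself, not of a running mysqld (for that, `/proc/<pid>/limits` is the place to look); the helper names are hypothetical:

```python
import resource


def describe(limit):
    """Render a limit the way `ulimit` does."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def current_limits():
    """Return (soft, hard) pairs for open files (-n) and core file size (-c)."""
    return {
        "open files (-n)": resource.getrlimit(resource.RLIMIT_NOFILE),
        "core file size (-c)": resource.getrlimit(resource.RLIMIT_CORE),
    }


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={describe(soft)} hard={describe(hard)}")
```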

Storage

Virtual File System (VFS)

• Abstraction layer to allow Linux to handle many filesystems.

• Provides a common interface to make your life easier.

• Introduces the Common File Model

Common File Model

• Superblock

• Inode

• File

• Dentry

Filesystems

• Btrfs

• ext2/3/4

• ReiserFS

• XFS

• ZFS

ext2/3/4

• ext2: Old; no journaling (SSD/Flash, maybe)

• ext3: ext2 + journaling + HTree indexing

• ext4: large volumes, extents, checksummed journal, nanosecond timestamps

XFS

• Standard recommendation for DB workloads

• Highly proficient with parallel IO

• Current (started 1998, but development is active)

AIO

A method for performing IO operations so that the process that issued an IO request is not blocked until the data is available.

Instead, after an IO request is submitted, the process continues to execute its code and can later check the status of the submitted request.
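The submit-then-check pattern described above can be illustrated without kernel AIO. The sketch below does not use Linux native AIO (Python's standard library has no binding for it); it only mimics the same control flow with a thread pool, and the function names are made up:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Blocking read; it runs in a worker thread so the caller is not blocked."""
    with open(path, "rb") as f:
        return f.read()


def submit_read(pool, path):
    """Submit the read and return immediately with a handle (a Future)."""
    return pool.submit(read_file, path)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"page data")
    with ThreadPoolExecutor(max_workers=1) as pool:
        request = submit_read(pool, tmp.name)          # returns without waiting
        print("submitted; done yet?", request.done())  # caller keeps working
        data = request.result()                        # later: collect the result
        print("read", len(data), "bytes")
    os.unlink(tmp.name)
```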

Mount Options

• no[dir]atime

• nobarrier

• discard

%> mount -o remount,rw,new,options,here <mountpoint>
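To confirm which options are actually in effect after a remount, the kernel's view is in `/proc/mounts`. A small parsing sketch; the function name, the device, and the mount point in the sample line are all invented for illustration:

```python
def mount_options(mounts_text, mount_point):
    """Return the option set for mount_point from /proc/mounts-style text, or None."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))  # the last matching entry wins
    return found


# Made-up sample line in the /proc/mounts layout.
SAMPLE = "/dev/sdb1 /var/lib/mysql xfs rw,noatime,nodiratime,nobarrier 0 0\n"

if __name__ == "__main__":
    opts = mount_options(SAMPLE, "/var/lib/mysql")
    print("noatime set:", "noatime" in opts)
```

On a real host, pass the contents of `/proc/mounts` instead of `SAMPLE`.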

IO Schedulers

• noop: FIFO queue w/request merging

• deadline: imposes a deadline on all operations

• cfq: Completely Fair Queueing

• anticipatory: “anticipates” synchronous read operations
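The scheduler in use for a device is shown in `/sys/block/<device>/queue/scheduler`, with the active one in square brackets. A parsing sketch; `sda` is a hypothetical device name and `parse_scheduler` is a made-up helper:

```python
import re
from pathlib import Path


def parse_scheduler(text):
    """Split e.g. 'noop deadline [cfq]' into (available, active)."""
    available = []
    active = None
    for name in text.split():
        match = re.fullmatch(r"\[(.+)\]", name)
        if match:
            active = match.group(1)
            available.append(active)
        else:
            available.append(name)
    return available, active


if __name__ == "__main__":
    path = Path("/sys/block/sda/queue/scheduler")  # substitute your own device
    text = path.read_text() if path.exists() else "noop deadline [cfq]"
    print(parse_scheduler(text))
```

Writing a scheduler name to the same file switches it for that device until the next reboot.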

Storage: Disks

• Rotational latency: The delay waiting for the rotation of the disk to bring the required disk sector under the read-write head.

• Seek time: Time to move the Read/Write Head from current position to the desired track location

• Access time (Response time): How fast we can locate a position of a file

• Transfer time (Throughput): How fast we can get bytes from disk to RAM

IOPS

1 / (average latency in seconds + average seek time in seconds)

IOPS Example

• Model: Western Digital VelociRaptor 2.5" SATA hard drive

• Rotational speed: 10,000 RPM

• Average latency: 3 ms (0.003 seconds)

• Average seek time: (4.2 (r) + 4.7 (w)) / 2 = 4.45 ms (0.00445 seconds)

• Calculated IOPS for this disk: 1/(0.003 + 0.00445) = about 134 IOPS
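The arithmetic for this disk can be reproduced in a few lines. This is a sketch of the rule-of-thumb formula only; the function names are made up, and the average rotational latency is taken as half a revolution. With the unrounded 4.45 ms seek time the estimate comes out at roughly 134 IOPS:

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def estimated_iops(latency_ms, seek_ms):
    """IOPS = 1 / (latency + seek), with both times converted to seconds."""
    return 1.0 / ((latency_ms + seek_ms) / 1000.0)


if __name__ == "__main__":
    latency = rotational_latency_ms(10_000)  # half a revolution at 10,000 RPM
    seek = (4.2 + 4.7) / 2                   # average of read and write seek times
    print(f"{estimated_iops(latency, seek):.0f} IOPS")
```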

RAID

• Hardware RAID w/BBU (or NVRAM)

• RAID 0, 1, 5, 6, 10, 50, 60, etc

RAID Alignment


http://www.mysqlperformanceblog.com/2011/06/09/aligning-io-on-a-hard-disk-raid-the-theory/

InnoDB Flush Method

• fdatasync

• O_DSYNC

• O_DIRECT

• O_DIRECT_NO_FSYNC

Memory

Memory: Swap

InnoDB Buffer Pool Size

• Allocated dynamically, unless you use innodb_buffer_pool_populate

• InnoDB checks on boot if enough RAM is available

• 10% overhead

Huge Pages

• Translation Lookaside Buffer (TLB)

• Default 4 KB page size

• Larger pages = fewer TLB entries needed for the same amount of memory

• Huge Pages: 2MB on commodity HW; 1GB on high-end systems (1TB+)

• Huge Pages are good for huge Buffer Pool
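The effect of page size on the number of mappings is simple division. A sketch with a hypothetical 64 GB buffer pool (the pool size and the function name are assumptions for illustration):

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes, page_bytes):
    """Number of pages (and so address translations) needed to cover a region."""
    return -(-region_bytes // page_bytes)  # ceiling division


if __name__ == "__main__":
    pool = 64 * GIB  # hypothetical buffer pool size
    for label, size in (("4 KB", 4 * KIB), ("2 MB", 2 * MIB), ("1 GB", GIB)):
        print(f"{label} pages: {pages_needed(pool, size):,}")
```

Fewer pages to map means less pressure on the TLB's limited number of entries.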

NUMA

• Each physical die is a NUMA node

• introspection: numactl --hardware / numactl --show

• Memory access: O(1) for local, O(log(N)) for remote
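The output of `numactl --hardware` can be turned into a per-node summary. This sketch assumes the common `node N cpus:` / `node N size: X MB` line layout, which may differ between numactl versions; the sample text and the function name are made up:

```python
import re


def parse_numactl_hardware(text):
    """Return {node_id: {"cpus": [...], "size_mb": int}} from `numactl --hardware` text."""
    nodes = {}
    for line in text.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)", line)
        if match:
            cpus = [int(c) for c in match.group(2).split()]
            nodes.setdefault(int(match.group(1)), {})["cpus"] = cpus
            continue
        match = re.match(r"node (\d+) size: (\d+) MB", line)
        if match:
            nodes.setdefault(int(match.group(1)), {})["size_mb"] = int(match.group(2))
    return nodes


# Made-up two-node sample; the numbers are illustrative only.
SAMPLE = """available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 32768 MB
node 0 free: 1024 MB
node 1 cpus: 4 5 6 7
node 1 size: 32768 MB
node 1 free: 2048 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
"""

if __name__ == "__main__":
    for node, info in parse_numactl_hardware(SAMPLE).items():
        print(node, info)
```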

NUMA

• Disable MySQL NUMA affinity / pinning except:

• When you run multiple mysqld instances on the same system

• AND you have data for each instance separated on different PCIe cards

• AND the PCIe cards are local to different CPU sockets

• Percona implements Jeremy Cole’s suggestions from http://blog.jcole.us/2012/04/16/a-brief-update-on-numa-and-mysql/

malloc

Processing

CPU Time

Interrupts

Context Switching

CPU Governors

• Control CPU speed and power consumption

• # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors

• control through /sys (or /etc/init.d/cpuspeed or cpupower)

• More: http://forum.xda-developers.com/showthread.php?t=1736168

• Most distros use ondemand by default (/etc/defaults/cpupower)

• We care about ondemand, performance, powersave. We usually want performance.
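A quick way to check every core is to read each CPU's `scaling_governor` file under `/sys/devices/system/cpu`. A sketch with made-up helper names; on machines without cpufreq support (many VMs) it simply returns an empty result:

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(cpu_root).glob(pattern)):
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


def not_in_performance(governors):
    """List the CPUs whose governor is anything other than 'performance'."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "performance")


if __name__ == "__main__":
    current = read_governors()
    print("governors:", current)
    print("not in performance mode:", not_in_performance(current))
```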

Networking

Backlog

• Queue for new TCP connections

• MySQL: back_log

• Linux: tcp_max_syn_backlog

• sysctl -w net.ipv4.tcp_max_syn_backlog=4096

• sysctl -w net.core.somaxconn=1024
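The reason `somaxconn` sits next to `back_log` here is that the kernel silently caps the backlog an application passes to `listen()` at `net.core.somaxconn`. A tiny sketch of that interaction; the function name and the example `back_log` value are assumptions:

```python
from pathlib import Path


def effective_listen_backlog(back_log, somaxconn):
    """listen(2) silently truncates the requested backlog to net.core.somaxconn."""
    return min(back_log, somaxconn)


if __name__ == "__main__":
    knob = Path("/proc/sys/net/core/somaxconn")
    somaxconn = int(knob.read_text()) if knob.exists() else 128
    back_log = 2048  # hypothetical MySQL back_log setting
    print("effective backlog:", effective_listen_backlog(back_log, somaxconn))
```

Raising `back_log` alone therefore has no effect beyond the current `somaxconn` value.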

Slow Links

• /proc/sys/net/core/wmem_max = 104857600

• /proc/sys/net/core/rmem_max = 104857600

• /proc/sys/net/ipv4/tcp_wmem = 4096   86400   66060288

• /proc/sys/net/ipv4/tcp_rmem = 8192    86400   66060288

• /proc/sys/net/ipv4/tcp_mem  = 104857600 104857600 104857600

• /proc/sys/net/ipv4/tcp_window_scaling = 1

• /proc/sys/net/ipv4/tcp_sack           = 1

• /proc/sys/net/ipv4/tcp_timestamps     = 1

• /proc/sys/net/ipv4/tcp_no_metrics_save = 0

• /proc/sys/net/ipv4/tcp_moderate_rcvbuf = 1
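The three numbers in `tcp_wmem` and `tcp_rmem` are the minimum, default, and maximum per-socket buffer sizes in bytes. A small sketch that names the fields and converts the maximum to MiB; the class and function names are made up for illustration:

```python
from typing import NamedTuple


class TcpBuffer(NamedTuple):
    min_bytes: int
    default_bytes: int
    max_bytes: int


def parse_tcp_buffer(value):
    """Parse a tcp_rmem/tcp_wmem string: 'min default max', all in bytes."""
    lo, default, hi = (int(x) for x in value.split())
    return TcpBuffer(lo, default, hi)


if __name__ == "__main__":
    buf = parse_tcp_buffer("4096   86400   66060288")
    print(f"max = {buf.max_bytes / 2**20:.0f} MiB")
```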

High QPS

• /proc/sys/net/ipv4/ip_local_port_range = 15000 61000

• /proc/sys/net/ipv4/tcp_max_tw_buckets = 2000000

• /proc/sys/net/ipv4/tcp_tw_reuse = 1

• /proc/sys/net/ipv4/tcp_syncookies = 1

• /proc/sys/net/core/wmem_default = 135168

• /proc/sys/net/core/rmem_default = 135168

• /proc/sys/net/ipv4/tcp_wmem = 4096 86384 104857600

• /proc/sys/net/ipv4/tcp_rmem = 8192 86384 104857600

• /proc/sys/net/ipv4/tcp_mem = 104857600 104857600 104857600

• /proc/sys/net/core/rmem_max = 104857600

• /proc/sys/net/core/wmem_max = 104857600
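Kernel tunables live under `/proc/sys`, and each file maps to a sysctl key by dropping that prefix and replacing slashes with dots. To make a list like the one above persistent, it can be converted into `sysctl.conf`-style lines. A sketch with made-up helper names, using a few of the values from this slide:

```python
def proc_path_to_sysctl_key(path):
    """Map /proc/sys/net/ipv4/tcp_tw_reuse to net.ipv4.tcp_tw_reuse."""
    prefix = "/proc/sys/"
    if not path.startswith(prefix):
        raise ValueError(f"not a /proc/sys path: {path}")
    return path[len(prefix):].replace("/", ".")


def sysctl_conf_lines(settings):
    """Render {path: value} as 'key = value' lines for a sysctl.conf file."""
    return [f"{proc_path_to_sysctl_key(p)} = {v}" for p, v in settings.items()]


if __name__ == "__main__":
    settings = {
        "/proc/sys/net/ipv4/ip_local_port_range": "15000 61000",
        "/proc/sys/net/ipv4/tcp_tw_reuse": "1",
        "/proc/sys/net/core/rmem_max": "104857600",
    }
    print("\n".join(sysctl_conf_lines(settings)))
```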

Questions