filesystem performance from a database perspective

Filesystem Performance From a Database Perspective Mark Wong [email protected] LinuxFest Northwest 2009 April 25 - 26, 2009


DESCRIPTION

How do you choose the right filesystem for your database management system? Administrators have a variety of filesystems to choose from, as well as volume management and hardware or software RAID. This talk will examine how different the performance of filesystems really is, and how to systematically determine which configuration will be best for your application and hardware. This talk will present data generated by a group of volunteers running performance tests for database tuning. We were curious whether the filesystems would really behave the way we expected them to, especially when used in conjunction with RAID or volume management. There is also more to filesystems than how fast we can read from or write to them. Reliability is critical for production environments, and proving that is a key part of evaluating performance. The talk will review and confirm or deny assumptions that many system administrators and developers make about filesystems and databases. Data shared will include baseline throughput determined with exhaustive fio tests.

TRANSCRIPT

Page 1: Filesystem Performance from a Database Perspective

Filesystem Performance From a Database Perspective

Mark Wong [email protected]

LinuxFest Northwest 2009

April 25 - 26, 2009

Page 2: Filesystem Performance from a Database Perspective

How we are here today

Generous equipment donations from HP and hosting from Command Prompt, Inc.

Page 3: Filesystem Performance from a Database Perspective

Setting Expectations

This is:

◮ A guide for how to understand the i/o behavior of your own system.

◮ A model of simple i/o behavior based on PostgreSQL.

◮ Suggestions for testing your own system.

This is not:

◮ A comprehensive Linux filesystem evaluation.

◮ A filesystem tuning guide (mkfs and mount options).

◮ Representative of your system (unless you have the exact same hardware).

◮ A test of reliability or durability (mostly).

Page 4: Filesystem Performance from a Database Perspective

__ __

/ \~~~/ \ . o O ( Know your system! )

,----( oo )

/ \__ __/

/| (\ |(

^ \ /___\ /\ |

|__| |__|-"

Page 5: Filesystem Performance from a Database Perspective

Contents

◮ Testing Criteria

    ◮ What do we want to test.

    ◮ How we test it with fio.

◮ Test Results

Page 6: Filesystem Performance from a Database Perspective

Hardware Details

◮ HP DL380 G5

◮ 2 x Quad Core Intel(R) Xeon(R) E5405

◮ HP 32GB Fully Buffered DIMM PC2-5300 8x4GB DR LP Memory

◮ HP Smart Array P800 Controller w/ 512MB Cache

◮ HP 72GB Hot Plug 2.5 Dual Port SAS 15,000 rpm Hard Drives

__ __

/ \~~~/ \ . o O ( Thank you HP! )

,----( oo )

/ \__ __/

/| (\ |(

^ \ /___\ /\ |

|__| |__|-"

Page 7: Filesystem Performance from a Database Perspective

Test Criteria

PostgreSQL dictates the following (even though some of them are actually configurable):

◮ 8KB block size

◮ No use of posix_fadvise

◮ No use of direct i/o

◮ No use of async i/o

◮ Uses processes, not threads.

Based on the hardware:

◮ 56GB data set - To minimize any caching in the 32GB of system memory. (We wanted to use 64GB, but these 72GB disks are sized in millions of bytes, i.e. not big enough for the 1 disk tests.)

◮ 8 processes performing i/o - One fio process per core, except for the sequential i/o tests, which use a single process because otherwise the i/o won't be sequential.
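The sizing above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original slides; the function name `per_job_size_gb` is made up for illustration. It shows how the 56GB data set splits across the 8 fio processes and how it compares with the 32GB of system memory.

```python
def per_job_size_gb(total_gb: int, jobs: int) -> int:
    """Split a data set evenly across fio jobs; refuse uneven splits."""
    if jobs <= 0:
        raise ValueError("jobs must be positive")
    if total_gb % jobs:
        raise ValueError("data set does not divide evenly across jobs")
    return total_gb // jobs


TOTAL_GB = 56  # data set size from the slide
RAM_GB = 32    # system memory from the hardware slide
JOBS = 8       # one fio process per core

# Each of the 8 processes works on its own 7GB file.
print(per_job_size_gb(TOTAL_GB, JOBS))  # 7

# The data set is 1.75 times the size of RAM, so it cannot be fully cached.
print(TOTAL_GB / RAM_GB)  # 1.75
```

The sequential tests use one process on a single 56GB file, so the total amount of data is the same in both cases.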

Page 8: Filesystem Performance from a Database Perspective

Linux v2.6.28-gentoo Filesystems Tested

◮ ext2

◮ ext3

◮ ext4

◮ jfs

◮ reiserfs

◮ xfs

a8888b. . o O ( What is your favorite filesystem? )

d888888b.

8P"YP"Y88

8|o||o|88

8’ .88

8‘._.’ Y8.

d/ ‘8b.

.dP . Y8b.

d8:’ " ‘::88b.

d8" ‘Y88b

:8P ’ :888

8a. : _a88P

._/"Yaa_ : .| 88P|

\ YP" ‘| 8P ‘.

/ \._____.d| .’

‘--..__)888888P‘._.’

Page 9: Filesystem Performance from a Database Perspective

Hardware RAID Configurations

These are the RAID configurations supported by the HP Smart Array P800 [1] disk controller:

◮ RAID 0

◮ RAID 1+0

◮ RAID 5

◮ RAID 6

__ __

/ \~~~/ \ . o O ( Which RAID would you use? )

,----( oo )

/ \__ __/

/| (\ |(

^ \ /___\ /\ |

|__| |__|-"

[1] Firmware v5.22

Page 10: Filesystem Performance from a Database Perspective

Test Program

Use fio [2] (Flexible I/O) v1.23 by Jens Axboe to test:

◮ Sequential Read

◮ Sequential Write

◮ Random Read

◮ Random Write

◮ Random Read/Write

[2] http://brick.kernel.dk/snaps/

Page 11: Filesystem Performance from a Database Perspective

__ __

/ \~~~/ \ . o O ( fio profiles )

,----( oo )

/ \__ __/

/| (\ |(

^ \ /___\ /\ |

|__| |__|-"

Page 12: Filesystem Performance from a Database Perspective

Sequential Read

[seq-read]

rw=read

size=56g

directory=/test

fadvise_hint=0

blocksize=8k

direct=0

numjobs=1

nrfiles=1

runtime=1h

time_based

exec_prerun=echo 3 > /proc/sys/vm/drop_caches

Page 13: Filesystem Performance from a Database Perspective

Sequential Write

[seq-write]

rw=write

size=56g

directory=/test

fadvise_hint=0

blocksize=8k

direct=0

numjobs=1

nrfiles=1

runtime=1h

time_based

exec_prerun=echo 3 > /proc/sys/vm/drop_caches

Page 14: Filesystem Performance from a Database Perspective

Random Read

[random-read]

rw=randread

size=7g

directory=/test

fadvise_hint=0

blocksize=8k

direct=0

numjobs=8

nrfiles=1

runtime=1h

time_based

exec_prerun=echo 3 > /proc/sys/vm/drop_caches

Page 15: Filesystem Performance from a Database Perspective

Random Write

[random-write]

rw=randwrite

size=7g

directory=/test

fadvise_hint=0

blocksize=8k

direct=0

numjobs=8

nrfiles=1

runtime=1h

exec_prerun=echo 3 > /proc/sys/vm/drop_caches

Page 16: Filesystem Performance from a Database Perspective

Random Read/Write

[read-write]

rw=rw

rwmixread=50

size=7g

directory=/test

fadvise_hint=0

blocksize=8k

direct=0

numjobs=8

nrfiles=1

runtime=1h

time_based

exec_prerun=echo 3 > /proc/sys/vm/drop_caches
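The five profiles differ only in a handful of options, so they can be generated from one template. The sketch below is an illustration rather than the tooling used for the talk; the helper name `fio_job` and the `PROFILES` list are assumptions. It renders each job section with the same options and in the same order as the slides.

```python
def fio_job(name, rw, size, numjobs, rwmixread=None, time_based=True):
    """Render one fio job section using the option order from the slides."""
    lines = [f"[{name}]", f"rw={rw}"]
    if rwmixread is not None:
        lines.append(f"rwmixread={rwmixread}")
    lines += [
        f"size={size}",
        "directory=/test",
        "fadvise_hint=0",
        "blocksize=8k",
        "direct=0",
        f"numjobs={numjobs}",
        "nrfiles=1",
        "runtime=1h",
    ]
    if time_based:
        lines.append("time_based")
    # Drop the page cache before each run so earlier tests do not help.
    lines.append("exec_prerun=echo 3 > /proc/sys/vm/drop_caches")
    return "\n".join(lines) + "\n"


PROFILES = [
    fio_job("seq-read", "read", "56g", 1),
    fio_job("seq-write", "write", "56g", 1),
    fio_job("random-read", "randread", "7g", 8),
    # The random-write slide does not list time_based; mirrored as shown.
    fio_job("random-write", "randwrite", "7g", 8, time_based=False),
    fio_job("read-write", "rw", "7g", 8, rwmixread=50),
]

if __name__ == "__main__":
    print("\n".join(PROFILES))
```

Each rendered section could be saved to its own file and passed to fio as a job file. Note that the sequential profiles use one job on 56g, while the others use eight jobs on 7g each.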

Page 17: Filesystem Performance from a Database Perspective

__ __

/ \~~~/ \ . o O ( Results! )

,----( oo )

/ \__ __/

/| (\ |(

^ \ /___\ /\ |

|__| |__|-"

There is more information than what is in this presentation. Raw data is at: http://207.173.203.223/~markwkm/community10/fio/linux-2.6.28-gentoo/

◮ Raw mpstat, vmstat, iostat, and sar data.

Page 18: Filesystem Performance from a Database Perspective

A few words about statistics...

Not enough time or data to calculate good statistics. My excuses:

◮ 1 hour per test

◮ 5 tests per filesystem

◮ 6 filesystems per RAID configuration (1 filesystem consistently halts testing, requiring manual intervention...)

◮ 7 RAID configurations for this presentation

◮ 210+ hours of continuous testing for this presentation.
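The 210-hour figure follows directly from the counts in the list. A quick check, written here as a small sketch with variable names chosen for illustration:

```python
HOURS_PER_TEST = 1         # runtime=1h in every fio profile
TESTS_PER_FILESYSTEM = 5   # seq read, seq write, random read, random write, mix
FILESYSTEMS = 6            # ext2, ext3, ext4, jfs, reiserfs, xfs
RAID_CONFIGS = 7           # configurations covered in this presentation

total_hours = HOURS_PER_TEST * TESTS_PER_FILESYSTEM * FILESYSTEMS * RAID_CONFIGS

print(total_hours)        # 210
print(total_hours / 24)   # 8.75 days of back-to-back testing
```

This covers only a single pass per combination, which is why there is not enough data for good statistics.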

Page 19: Filesystem Performance from a Database Perspective

Examine the following charts for a feel of how reliable the results may be.

Page 20: Filesystem Performance from a Database Perspective

Random Read with Linux 2.6.28-gentoo ext2 on 1 Disk

Page 21: Filesystem Performance from a Database Perspective

Random Write with Linux 2.6.28-gentoo ext2 on 1 Disk

Page 22: Filesystem Performance from a Database Perspective

Read/Write Mix (Reads) with Linux 2.6.28-gentoo ext2 on 1 Disk

Page 23: Filesystem Performance from a Database Perspective

Read/Write Mix (Writes) with Linux 2.6.28-gentoo ext2 on 1 Disk

Page 24: Filesystem Performance from a Database Perspective

Sequential Read with Linux 2.6.28-gentoo ext2 on 1 Disk

Page 25: Filesystem Performance from a Database Perspective

Sequential Write with Linux 2.6.28-gentoo ext2 on 1 Disk

Page 26: Filesystem Performance from a Database Perspective

A couple notes about the following charts

◮ If you see missing data, it means the test didn't complete successfully.

◮ Charts are not all scaled the same; the focus is on trends within a chart. (But maybe that is not a good choice?)

Page 27: Filesystem Performance from a Database Perspective

Results for all the filesystems when using only a single disk...

Page 28: Filesystem Performance from a Database Perspective
Page 29: Filesystem Performance from a Database Perspective

Results for all the filesystems when striping across four disks...

Page 30: Filesystem Performance from a Database Perspective
Page 31: Filesystem Performance from a Database Perspective

Let’s look at that step by step for RAID 0.

Page 32: Filesystem Performance from a Database Perspective
Page 33: Filesystem Performance from a Database Perspective
Page 34: Filesystem Performance from a Database Perspective
Page 35: Filesystem Performance from a Database Perspective
Page 36: Filesystem Performance from a Database Perspective
Page 37: Filesystem Performance from a Database Perspective
Page 38: Filesystem Performance from a Database Perspective

Trends might not be the same when adding disks to other RAID configurations. (We didn't test it because we moved on to database system testing.)

Page 39: Filesystem Performance from a Database Perspective

What is the best use of disks with respect to capacity?

Page 40: Filesystem Performance from a Database Perspective
Page 41: Filesystem Performance from a Database Perspective
Page 42: Filesystem Performance from a Database Perspective
Page 43: Filesystem Performance from a Database Perspective
Page 44: Filesystem Performance from a Database Perspective
Page 45: Filesystem Performance from a Database Perspective
Page 46: Filesystem Performance from a Database Perspective

Filesystem Readahead

Values are in number of 512-byte sectors, set per device. To get the current readahead size:

$ blockdev --getra /dev/sda

256

To set the readahead size to 8 MB:

$ blockdev --setra 16384 /dev/sda
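Because blockdev works in 512-byte units, it helps to convert between those units and bytes before choosing a value. A minimal sketch of the conversion, with function names made up for illustration:

```python
SECTOR_BYTES = 512  # blockdev --getra / --setra count in 512-byte units


def readahead_bytes(sectors: int) -> int:
    """Convert a blockdev readahead value to bytes."""
    return sectors * SECTOR_BYTES


def sectors_for(nbytes: int) -> int:
    """Convert a desired readahead size in bytes to a --setra value."""
    if nbytes % SECTOR_BYTES:
        raise ValueError("size must be a multiple of 512 bytes")
    return nbytes // SECTOR_BYTES


# The value 256 reported above corresponds to 128 KB.
print(readahead_bytes(256) // 1024)   # 128

# 8 MB of readahead is the 16384 passed to --setra.
print(sectors_for(8 * 1024 * 1024))   # 16384
```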

Page 47: Filesystem Performance from a Database Perspective
Page 48: Filesystem Performance from a Database Perspective

What does this all mean for my database system?

It depends... Let's look at one sample of just using different filesystems.

Page 49: Filesystem Performance from a Database Perspective
Page 50: Filesystem Performance from a Database Perspective

Raw data for comparing filesystems

Initial inspection of results suggests we are close to being bound by processor and storage performance, meaning we don't really know which filesystem performs the best:

◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext2.1500.2/

◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext3.1500.2/

◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext4.1500.2/

◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/jfs.1500.2/

◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/reiserfs.1500.2/

◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/xfs.1500.2/

Page 51: Filesystem Performance from a Database Perspective

Linux I/O Elevator Algorithm

To get the current algorithm (shown in []'s):

$ cat /sys/block/sda/queue/scheduler

noop anticipatory [deadline] cfq
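When checking many devices it can be handy to pick the active scheduler out of that line programmatically. The following sketch assumes the bracketed format shown above; the function names are hypothetical.

```python
import re


def active_scheduler(line: str) -> str:
    """Return the scheduler name enclosed in square brackets."""
    match = re.search(r"\[([^\]]+)\]", line)
    if match is None:
        raise ValueError("no active scheduler marked in: " + repr(line))
    return match.group(1)


def available_schedulers(line: str) -> list:
    """Return every scheduler name listed, active or not."""
    return [token.strip("[]") for token in line.split()]


sample = "noop anticipatory [deadline] cfq"
print(active_scheduler(sample))      # deadline
print(available_schedulers(sample))  # ['noop', 'anticipatory', 'deadline', 'cfq']
```

On a real system the input line would come from reading /sys/block/sda/queue/scheduler, with sda replaced by the device of interest.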

To set a different algorithm:

$ echo cfq > /sys/block/sda/queue/scheduler

Page 52: Filesystem Performance from a Database Perspective

Why use deadline? [3]

◮ Attention! Database servers, especially those using "TCQ" disks should investigate performance with the 'deadline' IO scheduler. Any system with high disk performance requirements should do so, in fact.

◮ Also, users with hardware RAID controllers, doing striping, may find highly variable performance results with using the as-iosched. The as-iosched anticipatory implementation is based on the notion that a disk device has only one physical seeking head. A striped RAID controller actually has a head for each physical device in the logical RAID device.

[3] Linux source code: Documentation/block/as-iosched.txt

Page 53: Filesystem Performance from a Database Perspective

Final Notes

You have to test your own system to see what it can do. Don't take someone else's word for it.

Page 54: Filesystem Performance from a Database Perspective

http://www.slideshare.net/markwkm

Page 55: Filesystem Performance from a Database Perspective

__ __

/ \~~~/ \ . o O ( Thank you! )

,----( oo )

/ \__ __/

/| (\ |(

^ \ /___\ /\ |

|__| |__|-"

Page 56: Filesystem Performance from a Database Perspective

Acknowledgements

Haley Jane Wakenshaw

__ __

/ \~~~/ \

,----( oo )

/ \__ __/

/| (\ |(

^ \ /___\ /\ |

|__| |__|-"

Page 57: Filesystem Performance from a Database Perspective

License

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.