making the most of ssd in oracle11g

78
© 2012 Quest Software Inc. All rights reserved. Making the most of Solid State Disk in Oracle Guy Harrison VP R&D Database Management

Upload: guy-harrison

Post on 28-Jan-2015

108 views

Category:

Technology


1 download

DESCRIPTION

Presentation given at IOUG Collaborate 2013

TRANSCRIPT

Page 1: Making the most of ssd in oracle11g

© 2012 Quest Software Inc. All rights reserved.

Making the most of Solid State Disk in Oracle

Guy HarrisonVP R&D Database Management

Page 2: Making the most of ssd in oracle11g

Making the most of Solid State Disk in Oracle 11g

Guy HarrisonExecutive Director, R&D Business Intelligence Software

Page 3: Making the most of ssd in oracle11g

Pg. 3© 2012 Quest Software Inc. All rights reserved.

Introductions

www.guyharrison.net [email protected]

http://twitter.com/guyharrison

Page 4: Making the most of ssd in oracle11g

Pg. 4© 2012 Quest Software Inc. All rights reserved.

Page 5: Making the most of ssd in oracle11g

Pg. 5© 2012 Quest Software Inc. All rights reserved.

Page 6: Making the most of ssd in oracle11g

Pg. 6© 2012 Quest Software Inc. All rights reserved. 6

Page 7: Making the most of ssd in oracle11g

Pg. 7© 2012 Quest Software Inc. All rights reserved.

Page 8: Making the most of ssd in oracle11g

Pg. 8© 2012 Quest Software Inc. All rights reserved.

Agenda

• Brief History of Magnetic Disk

• Solid State Disk (SSD) technologies

• SSD internals

• Oracle DB flash cache architecture

• Performance comparisons

• SSD in Exadata

• Recommendations and Suggestions

Page 9: Making the most of ssd in oracle11g

Pg. 9© 2012 Quest Software Inc. All rights reserved.

A brief history of disk

Page 10: Making the most of ssd in oracle11g

Pg. 10© 2012 Quest Software Inc. All rights reserved.

5MB HDD circa 1956

Page 11: Making the most of ssd in oracle11g

Pg. 11© 2012 Quest Software Inc. All rights reserved.

1800 RPM

28MB HDD - 1961

Page 12: Making the most of ssd in oracle11g

Pg. 12© 2012 Quest Software Inc. All rights reserved.

The more that things change....

Page 13: Making the most of ssd in oracle11g

Pg. 13© 2012 Quest Software Inc. All rights reserved.

Moore’s law•Transistor density doubles every 18 months

•Exponential growth is observed in most electronic components:−CPU clock speeds−RAM−Hard Disk Drive storage density

•But not in mechanical components−Service time (Seek latency) – limited by actuator

arm speed and disk circumference −Throughput (rotational latency) – limited by

speed of rotation, circumference and data density

Page 14: Making the most of ssd in oracle11g

Pg. 14© 2012 Quest Software Inc. All rights reserved.

Disk trends 2001-2009

IO Rate Disk Capacity IO/Capacity CPU IO/CPU-1,000

-500

0

500

1,000

1,500

2,000

260 1,635

-630

1,013

-390

%ag

e ch

ang

e

Page 15: Making the most of ssd in oracle11g

Pg. 15© 2012 Quest Software Inc. All rights reserved.

Solid State Disk

Page 16: Making the most of ssd in oracle11g

SSD to the rescue?

Magnetic Disk

SSD SATA Flash

SSD PCI flash

SSD DDR-RAM

0 500 1,000 1,500 2,000 2,500 3,000 3,500 4,000 4,500

4,000

80

25

15

Seek time (us)

Page 17: Making the most of ssd in oracle11g

Economics of SSD

Seagate SATA HDD

Seagate SAS HDD

Intel MLC SATA SSD

Intel SLC SATA SSD

FusionIO PCI MLC Duo SSD

FusionIO PCI SLC SSD

0.00 0.50 1.00 1.50 2.00 2.50

0.00 10.00 20.00 30.00 40.00 50.00 60.00

2.38

1.53

0.05

0.05

0.06

0.06

0.09

1.00

6.88

21.88

24.92

53.44

$/GB

$/IOP

$/IOP

$/GB

$/GB

$/IOPS

Page 18: Making the most of ssd in oracle11g

Tiered storage management

Main Memory

DDR SSD

Flash SSD

Fast Disk (SAS, RAID 0+1)

Slow Disk (SATA, RAID 5)

Tape, Flat Files, Hadoop

$/IOP$/

GB

Page 19: Making the most of ssd in oracle11g

12c Automatic Data Placement

Active

• Segment on SSD tablespace

Frequent Acce

ss

• Segment on SAS tablespace

Occassion

al Acce

ss

• OLTP compression on SATA tablespace

Dormant

• Archive Compressed on RAID5 SATA

ALTER TABLE …. ADD ILM POLICY

TIER TO …. Tablespace AFTER 6 months of no access

COMPRESS FOR QUERY LOW AFTER 12 months of no access

Page 20: Making the most of ssd in oracle11g

Pg. 20© 2012 Quest Software Inc. All rights reserved.

SSD technology and internals

Page 21: Making the most of ssd in oracle11g

Flavours of Flash SSD DDR RAM Drive

SATA flash drive

PCI flash drive

SSD storage Server

Page 22: Making the most of ssd in oracle11g

PCI SSD vs SATA SSD

PCI vs SATA− SATA was designed for traditional disk drives with high

latencies− PCI is designed for high speed devices− PCI SSD has latency ~ 1/3rd of SATA

Page 23: Making the most of ssd in oracle11g

23

Page 24: Making the most of ssd in oracle11g

Dell Express Flash

Higher performance, durability, flexibility

Express Flash PCIe-SSD DrivesThe power to do more

Up to 1000x more IOPs than traditional HDDFront loading, hot swappableMaximum read and write lifespanImproves workload processingEnhances virtual environments

Page 25: Making the most of ssd in oracle11g

PCIe SSD Solution Framework

− Frees Up Valuable PCIe Slot Real Estate

− 1X16 Slot Supports 4X4 PCIe SSD Backplane− Modular, scalable

− Storage Device− HDD form factor – fits in 2.5”

carrier− SLC: 175G & 350G

Media Bay & Control Panel S

AS

0

SA

S 1

SA

S 2

SA

S 3

SA

S 4

SA

S 5

SA

S 6

SA

S 7

PC

Ie S

SS

0

PC

Ie S

SS

1

PC

Ie S

SS

2

PC

Ie S

SS

3

4x Drives w/ BP

Extender Card (Adapter FF)

Page 26: Making the most of ssd in oracle11g

Pg. 26© 2012 Quest Software Inc. All rights reserved.

Flash SSD Technology

• Cell: One (SLC) or Two (MLC) bits• Page: Typically 4K • Block: Typically 128-512K

Storage Hierarchy:

• Read and first write require single page IO• Overwriting a page requires an erase & overwrite of the block

Writes:

• 100,000 erase cycles for SLC before failure • 5,000 – 10,000 erase cycles for MLC

Write endurance:

Page 27: Making the most of ssd in oracle11g

Pg. 27© 2012 Quest Software Inc. All rights reserved.

Flash SSD performance

Read (4k page seek)

First insert (4k page write)

Update (256K block erase)

0 200 400 600 800 1000 1200 1400 1600 1800 2000

25

250

2000

Microseconds

Page 28: Making the most of ssd in oracle11g

Flash Disk write degradation

All Blocks empty:Write time=250 us

25% part full:− Write time= ( ¾ * 250 us + 1/4 * 2000 us) = 687 us

75% part full − Write time = ( ¼ * 250 us + ¾ * 2000 us ) = 1562 us

Empty

Partially Full

Page 29: Making the most of ssd in oracle11g

Valid Data Page

Empty Data Page

InValid Data Page

Free Block Pool

Used Block Pool

SSD ControllerInsert

Data Insert

Page 30: Making the most of ssd in oracle11g

Valid Data Page

Empty Data Page

Invalid Data Page

Free Block Pool

Used Block Pool

SSD ControllerUpdate

Data Update

Page 31: Making the most of ssd in oracle11g

Valid Data Page

Empty Data Page

Invalid Data Page

Free Block Pool

Used Block Pool

SSD Controller

Garbage Collection

Page 32: Making the most of ssd in oracle11g

Pg. 32© 2012 Quest Software Inc. All rights reserved.

Page 33: Making the most of ssd in oracle11g

Pg. 33© 2012 Quest Software Inc. All rights reserved.

11g DB flash Cache

Page 34: Making the most of ssd in oracle11g

Pg. 34© 2012 Quest Software Inc. All rights reserved.

Oracle DB flash cache

• Introduced in 11gR2 for OEL and Solaris only

• Secondary cache maintained by the DBWR, but only when idle cycles permit

• Architecture is tolerant of poor flash write performance

Page 35: Making the most of ssd in oracle11g

Buffer cache and Free buffer waits

Database files

Buffer cache

DBWR

Oracle process

Free Buffer Waits

Write dirty blocks to disk

Write to buffer cache

Read from disk

Read from buffer cache

Free buffer waits often occur when reads are much faster than writes....

Page 36: Making the most of ssd in oracle11g

Flash Cache

Database files

Buffer cache

DBWR

Oracle process

Write dirty blocks to disk

Write to buffer cache

Read from disk

Read from buffer cache

Flash Cache

Write clean blocks (time permitting)

Read from flash cache

DB Flash cache architecture is designed to accelerate buffered reads

Page 37: Making the most of ssd in oracle11g

Pg. 37© 2012 Quest Software Inc. All rights reserved.

Configuration• Create filesystem from flash device

• Set DB_FLASH_CACHE_FILE and DB_FLASH_CACHE_SIZE.

• Consider Filesystemio_options=setall

Page 38: Making the most of ssd in oracle11g

Pg. 38© 2012 Quest Software Inc. All rights reserved.

Flash KEEP pool• You can prioritise blocks for important objects using the FLASH_CACHE clause:

Page 39: Making the most of ssd in oracle11g

Pg. 39© 2012 Quest Software Inc. All rights reserved.

Oracle Db flash cache statistics

http://guyharrison.squarespace.com/storage/flash_insert_stats.sql

Page 40: Making the most of ssd in oracle11g

Flash Cache Efficiency

http://guyharrison.squarespace.com/storage/flash_time_savings.sql

Page 41: Making the most of ssd in oracle11g

Flash cache Contents

http://guyharrison.squarespace.com/storage/flashContents.sql

Page 42: Making the most of ssd in oracle11g

Pg. 42© 2012 Quest Software Inc. All rights reserved.

Performance tests

Page 43: Making the most of ssd in oracle11g

Pg. 43© 2012 Quest Software Inc. All rights reserved.

Test systems•Low end system:

−Dell Optiplex dual-core 4GB RAM−2xSeagate 7500RPM Baracuda SATA HDD−Intel X-25E SLC SATA SSD

•Higher end system:−Dell R510 2xquad core, 32 GB RAM−4x300GB 15K RPM,6Gbps Dell SAS HDD−1xFusionIO ioDrive SLC PCI SSD

Page 44: Making the most of ssd in oracle11g

Performance: indexed reads (SATA Flash)

No Flash

Flash cache

Flash tablespace

0 100 200 300 400 500 600

529.7

143.27

48.17

Total

db file IO

flash cache IO

Other

Elapsed (s)

Page 45: Making the most of ssd in oracle11g

Performance: Read/Write (SATA Flash)

No Flash

Flash Cache

Flash tablespace

0 500 1000 1500 2000 2500 3000 3500

3,289

1,693

200

Total

db file IO

write complete

free buffer

flash cache IO

Other

Elapsed time (s)

Page 46: Making the most of ssd in oracle11g

Random reads – PCI flash

SAS disk, no flash cache

SAS disk, flash cache

Table on SSD

0 500 1000 1500 2000 2500

2,211

583

121

Total

Other

DB File IO

Flash cache IO

Elapsed time (s)

Page 47: Making the most of ssd in oracle11g

Updates – PCI Flash

SAS disk, no flash cache

SAS disk, flash cache

Table on SSD

0 1000 2000 3000 4000 5000 6000 7000

6,219

1,934

529

Total

db file IO

log file IO

flash cache

free buffer waits

Other

Elapsed Time (s)

Page 48: Making the most of ssd in oracle11g

Pg. 48© 2012 Quest Software Inc. All rights reserved.

Buffer Cache bottlenecks • Flash cache architecture avoids ‘free buffer waits’ due to flash IO, but write complete waits can still occur on hot blocks.

• Free buffer waits are still likely against the database files, due to high physical read rates created by the flash cache

Page 49: Making the most of ssd in oracle11g

Full table scan – PCI flash

SAS disk, no flash cache

SAS disk, flash cache

Table on SSD

0 50 100 150 200 250 300 350 400 450

418

398

72

Total

Other

DB File IO

Flash Cache IO

Elasped time (s)

Page 50: Making the most of ssd in oracle11g

50

Sorting – what we expect

50Table/Index IO CPU Time Temp Segment IO

PGA Memory available (MB)

Tim

e

Memory Sort

Single PassDisk Sort

Multi-passDisk Sort

Page 51: Making the most of ssd in oracle11g

Disk Sorts – temporary tablespace

51

0501001502002503000

500

1000

1500

2000

2500

3000

3500

4000

SAS based TTS SSD based TTS

Sort Area Size

Ela

pse

d t

ime

(s)

Single PassDisk Sort

Multi-passDisk Sort

Page 52: Making the most of ssd in oracle11g

Redo performance – Fusion IO

SAS based redo log

Flash based redo log

0 50 100 150 200 250 300 350

292.39

291.93

Total

Log IO

Elapsed time (s)

Page 53: Making the most of ssd in oracle11g

53

Concurrent redo workload (x10)

53

SAS based redo log

Flash based redo log

0 500 1,000 1,500 2,000 2,500 3,000 3,500 4,000 4,500

1,605

1,637

397

331

1,944

1,681

CPU

Other

Log File IO

Elapsed time (s)

Page 54: Making the most of ssd in oracle11g

Pg. 54© 2012 Quest Software Inc. All rights reserved.

Write degradation• In theory, high sustained write IO can lead to SSD degradation when GC fails to cope with the block erase/update cycle

• In practice, this is rarely noticeable from Oracle:−Oracle write IO is largely asynchronous (DBWR)−Almost all write activity has at least an equal amount of read activity− Garbage collection and wear levelling algorithms are sophisticated in

decent SSD drives

Page 55: Making the most of ssd in oracle11g

Pg. 55© 2012 Quest Software Inc. All rights reserved.

Page 56: Making the most of ssd in oracle11g

Pg. 56© 2012 Quest Software Inc. All rights reserved.

Page 57: Making the most of ssd in oracle11g

OS level direct cache

Read-intensive, potentially massive

tablespaces

•Temp Tablespace• Hot Segments• Hot Partitions• DB Flash Cache

(limited to the size of the SSD)

Regular Block Device

ioMemory VSL

File System/ Raw Devices/ ASM

directCache

File System/ Raw Devices/ ASM

Caching Block Device

ioMemory VSL

LUN

Page 58: Making the most of ssd in oracle11g

Fusion IO direct cache – Table scans

No cache 1st scan

No cache 2nd scan

direct cache on 1st scan

direct cache on 2nd scan

0 20 40 60 80 100 120 140 160

147

147

147

36

Total

IO

Other

Elapsed time (s)

Page 59: Making the most of ssd in oracle11g

Pg. 59© 2012 Quest Software Inc. All rights reserved.

Exadata

59

Page 60: Making the most of ssd in oracle11g
Page 61: Making the most of ssd in oracle11g

Pg. 61© 2012 Quest Software Inc. All rights reserved.

Exadata flash storage • 4x96GB PCI Flash drives on each storage server (X2)

• Flash can be configured as:− Exadata Smart Flash Cache (ESFC)− Solid State Disk available to ASM disk groups

• ESFC is not the same as the DB flash cache:− Maintained by cellsrv, not DBWR− DOES support full table scans− DOES NOT support smart scans

− Unless CELL_FLASH_CACHE= KEEP,

− Statistics accessed via the cellcli program

• Considerations for cache vs. SSD are similar

Page 62: Making the most of ssd in oracle11g

Exadata: Flash grid disk vs ESFC

100M row table, 200,000 random PK lookups, 1M possible keys

SAS disk no flash cache

SAS disk with flash cache

SSD disks (no flash cache)

0 500 1000 1500 2000 2500 3000

1,240

429

119

CPU

Total

Seconds

Page 63: Making the most of ssd in oracle11g

Exadata: SSD disk for Redo

SAS redo 10 concurrent (avg)

SSD redo 10 concurrent (avg)

SAS redo 1 concurrent

SSD redo 1 concurrent

0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000

352

6,885

236

1,007

Log IO

Other

Total

Elapsed time (s)

Page 64: Making the most of ssd in oracle11g

Exadata: the effect of redo block size

CellCLI: Release 11.2.2.3.2

SAS disk blocksize 512

SAS disk blocksize 4K

SSD disk blocksize 512

SSD disk blocksize 4k

0.00 500.00 1,000.00 1,500.00 2,000.00 2,500.00

236.05

243.79

1,009.67

309.68

Redo IO Time

Db time

Elasped time (s)

Page 65: Making the most of ssd in oracle11g

Pg. 65© 2012 Quest Software Inc. All rights reserved.

Exadata Smart FlashLog

Page 66: Making the most of ssd in oracle11g

Pg. 66© 2012 Quest Software Inc. All rights reserved.

Smart Flash Log

Designed to reduce “outlier” redo log sync waits

Redo is written simultaneously to disk and flash

First write to complete wins

Introduced in Exadata storage software 11.2.2.4 Disk Flash

CellSrv

DB Node

Page 67: Making the most of ssd in oracle11g

Pg. 67© 2012 Quest Software Inc. All rights reserved.

All Redo log writes (16M log writes)

Flash Log

Min Median Mean 99% Max

ON 1.0 650 723 1,656 75,740

OFF 1.0 627 878 4,662 291,800

Page 68: Making the most of ssd in oracle11g

Pg. 68© 2012 Quest Software Inc. All rights reserved.

Redo log outliers

WAIT #47124064145648: nam='log file sync' ela= 710 buffer#=129938 sync scn=1266588258 p3=0 obj#=-1 tim=1347583167579790WAIT #47124064145648: nam='log file sync' ela= 733 buffer#=130039 sync scn=1266588297 p3=0 obj#=-1 tim=1347583167580808WAIT #47124064145648: nam='log file sync' ela= 621 buffer#=130124 sync scn=1266588332 p3=0 obj#=-1 tim=1347583167581695WAIT #47124064145648: nam='log file sync' ela= 507 buffer#=130231 sync scn=1266588371 p3=0 obj#=-1 tim=1347583167582486WAIT #47124064145648: nam='log file sync' ela= 683 buffer#=101549 sync scn=1266588404 p3=0 obj#=-1 tim=1347583167583398WAIT #47124064145648: nam='log file sync' ela= 2084 buffer#=130410 sync scn=1266588442 p3=0 obj#=-1 tim=1347583167585748WAIT #47124064145648: nam='log file sync' ela= 798 buffer#=130535 sync scn=1266588488 p3=0 obj#=-1 tim=1347583167586864WAIT #47124064145648: nam='log file sync' ela= 1043 buffer#=101808 sync scn=1266588527 p3=0 obj#=-1 tim=1347583167588250WAIT #47124064145648: nam='log file sync' ela= 2394 buffer#=130714 sync scn=1266588560 p3=0 obj#=-1 tim=1347583167590888WAIT #47124064145648: nam='log file sync' ela= 932 buffer#=101989 sync scn=1266588598 p3=0 obj#=-1 tim=1347583167592057WAIT #47124064145648: nam='log file sync' ela= 291780 buffer#=102074 sync scn=1266588637 p3=0 obj#=-1 tim=1347583167884090WAIT #47124064145648: nam='log file sync' ela= 671 buffer#=102196 sync scn=1266588697 p3=0 obj#=-1 tim=1347583167885294WAIT #47124064145648: nam='log file sync' ela= 957 buffer#=102294 sync scn=1266588730 p3=0 obj#=-1 tim=1347583167886575WAIT #47124064145648: nam='log file sync' ela= 852 buffer#=120 sync scn=1266588778 p3=0 obj#=-1 tim=1347583167887763WAIT #47124064145648: nam='log file sync' ela= 639 buffer#=214 sync scn=1266588826 p3=0 obj#=-1 tim=1347583167888778WAIT #47124064145648: nam='log file sync' ela= 699 buffer#=300 sync scn=1266588853 p3=0 obj#=-1 tim=1347583167889767WAIT #47124064145648: nam='log file sync' ela= 819 buffer#=102647 sync scn=1266588886 p3=0 obj#=-1 tim=1347583167890829

Page 69: Making the most of ssd in oracle11g

Pg. 69© 2012 Quest Software Inc. All rights reserved.

Top 10,000 waits

Page 70: Making the most of ssd in oracle11g

Pg. 70© 2012 Quest Software Inc. All rights reserved.

Exadata 12c Smart Flash Cache Write-back Database writes go to flash cache

− LRU aging to mag disk − Reads serviced by flash prior to age out− Similar restrictions to flach cache (smart scans, etc)

Performance claims:− On X3 system, 1M random single block write IOPS− On X3 system, 1.5M random single block read IOPS− 500K write IOPS claimed for Exadata V2− Note: random IO writes are less problematic for flash than

sequential writes.

Page 71: Making the most of ssd in oracle11g

Pg. 71© 2012 Quest Software Inc. All rights reserved.

Summary

Page 72: Making the most of ssd in oracle11g

Pg. 72© 2012 Quest Software Inc. All rights reserved.

Recommendations•Don’t wait for SSD to become as cheap as HDD

−Magnetic HDD will always be cheaper per GB, SSD cheaper per IO−Oracle’s Exadata strategy suggests that the era of SSD for Oracle is here

•Consider a mixed or tiered storage strategy−Using DB flash cache, selective SSD tablespaces or partitions−Use SSD where your IO bottleneck is greatest and SSD advantage is

significant−Oracle 12c ILM offers a perfect solution for data tablespaces

•DB flash cache offers an easy way to leverage SSD for OLTP workloads, but has fewer advantages for OLAP or Data Warehouse

Page 73: Making the most of ssd in oracle11g

Pg. 73© 2012 Quest Software Inc. All rights reserved.

How to use SSD•Database flash cache

−If your bottleneck is single block (indexed reads) and you are on OEL or Solaris 11GR2

•Flash tablespace−Optimize read/writes against “hot” segments or partitions

•Flash temp tablespace−If multi-pass disk sorts or hash joins are your bottleneck

• Exadata (esp X3)−Provides most of the advantages without much setup

• 12c ILM−Should provide an ideal solution for DIY tiering

73

Page 74: Making the most of ssd in oracle11g

Pg. 74© 2012 Quest Software Inc. All rights reserved.

References• Latest version of this presentation:

http://www.slideshare.net/gharriso/ssd-and-the-db-flash-cache

• Quest whitepaper:• http://www.quest.com/documents/landing.aspx?id=15423

• Guy’s SSD guide• http://guyharrison.squarespace.com/ssdguide/

Page 75: Making the most of ssd in oracle11g

Pg. 75© 2012 Quest Software Inc. All rights reserved.

Page 76: Making the most of ssd in oracle11g

Pg. 76© 2012 Quest Software Inc. All rights reserved.

Page 77: Making the most of ssd in oracle11g

Pg. 77© 2012 Quest Software Inc. All rights reserved.

Page 78: Making the most of ssd in oracle11g

© 2012 Quest Software Inc. All rights reserved. Pg. 78

Thank You

[email protected]@guyharrison