The Design and Evolution of Live Storage Migration in VMware ESX


Page 1: The Design and Evolution of Live Storage Migration in VMware ESX

© 2010 VMware Inc. All rights reserved

The Design and Evolution of Live Storage Migration in VMware ESX
Ali Mashtizadeh, VMware, Inc.
Emré Celebi, VMware, Inc.
Tal Garfinkel, VMware, Inc.
Min Cai, VMware, Inc.

Page 2: The Design and Evolution of Live Storage Migration in VMware ESX

2

Agenda
• What is live migration?
• Migration architectures
• Lessons

Page 3: The Design and Evolution of Live Storage Migration in VMware ESX

3

What is live migration (vMotion)?

Moves a VM between two physical hosts
No noticeable interruption to the VM (ideally)

Use cases:
• Hardware/software upgrades

• Distributed resource management

• Distributed power management

Page 4: The Design and Evolution of Live Storage Migration in VMware ESX

4

[Diagram: live migration of a Virtual Machine from the Source host to the Destination host]

Disk is placed on a shared volume (100GBs–1TBs)
CPU and device state is copied (MBs)

Memory is copied (GBs)
• Large and it changes often → iteratively copy

Page 5: The Design and Evolution of Live Storage Migration in VMware ESX

5

Live Storage Migration

What is live storage migration?
• Migration of a VM’s virtual disks

Why does this matter?
• VMs can be very large

• Array maintenance means you may migrate all VMs in an array

• Migration times are measured in hours

Page 6: The Design and Evolution of Live Storage Migration in VMware ESX

6

Live Migration and Storage Live Migration – a short history

ESX 2.0 (2003) – Live migration (vMotion)
• Virtual disks must live on shared volumes

ESX 3.0 (2006) – Live storage migration lite (Upgrade vMotion)
• Enabled upgrade of VMFS by migrating the disks

ESX 3.5 (2007) – Live storage migration (Storage vMotion)
• Storage array upgrade and repair

• Manual storage load balancing

• Snapshot based

ESX 4.0 (2009) – Dirty block tracking (DBT)
ESX 5.0 (2011) – IO Mirroring

Page 7: The Design and Evolution of Live Storage Migration in VMware ESX

7

Agenda
• What is live migration?
• Migration architectures
• Lessons

Page 8: The Design and Evolution of Live Storage Migration in VMware ESX

8

Goals

Migration Time
• Minimize total end-to-end migration time

• Predictability of migration time

Guest Penalty
• Minimize performance loss

• Minimize downtime

Atomicity
• Avoid dependence on multiple volumes (for replication fault domains)

Guarantee Convergence
• Ideally we want migrations to always complete successfully

Page 9: The Design and Evolution of Live Storage Migration in VMware ESX

9

Convergence

Migration time vs. downtime

More migration time → more guest performance impact
More downtime → more service unavailability

Factors that affect convergence:
• Block dirty rate

• Available storage network bandwidth

• Workload interactions

Challenges:
• Many workloads interacting on the storage array

• Unpredictable behavior

[Chart: data remaining vs. time, annotated with the migration time and downtime phases]
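To make the convergence trade-off concrete, here is a minimal sketch (all numbers and the helper name are hypothetical, not from the talk) of how the remaining data shrinks only when copy bandwidth exceeds the block dirty rate:

```python
def iterative_copy(disk_size_gb, copy_bw_mbps, dirty_rate_mbps,
                   downtime_threshold_mb=256, max_passes=20):
    """Simulate iterative pre-copy: each pass re-copies what was dirtied
    during the previous pass. Converges only if the copy bandwidth
    exceeds the dirty rate."""
    remaining_mb = disk_size_gb * 1024.0
    for p in range(1, max_passes + 1):
        pass_time_s = remaining_mb / copy_bw_mbps      # time to copy this pass
        remaining_mb = dirty_rate_mbps * pass_time_s   # data dirtied meanwhile
        if remaining_mb <= downtime_threshold_mb:
            return p, remaining_mb                     # last chunk copied during downtime
    return None, remaining_mb                          # did not converge

# Hypothetical numbers: a 32 GB disk, 200 MB/s copy bandwidth, 40 MB/s dirty rate.
print(iterative_copy(32, 200, 40))
# With a dirty rate above the copy bandwidth the remaining data never shrinks.
print(iterative_copy(32, 200, 250))
```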

Page 10: The Design and Evolution of Live Storage Migration in VMware ESX

10

Migration Architectures

Snapshotting

Dirty Block Tracking (DBT)
• Heat Optimization

IO Mirroring

Page 11: The Design and Evolution of Live Storage Migration in VMware ESX

11

Snapshot Architecture – ESX 3.5 U1

[Diagram: ESX host migrating a VM’s VMDKs from the source volume to the destination volume using snapshots]
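A hedged sketch of how a snapshot-based migration could proceed, assuming a snapshot/copy/consolidate flow inferred from the diagram and from the challenges listed on a later slide; every interface below is hypothetical:

```python
def snapshot_migrate(src, dst, vm):
    """Assumed flow of a snapshot-based storage migration, not the exact
    ESX 3.5 implementation: freeze the base disk behind a snapshot, copy
    the now read-only base, then consolidate the snapshot into the
    destination while the VM is briefly paused."""
    snap = vm.take_disk_snapshot()     # new writes land in the snapshot (redo log)
    dst.copy_from(src.base_disk())     # base is read-only now; safe to bulk copy
    vm.pause()                         # downtime window
    dst.apply(snap)                    # consolidate the accumulated writes
    vm.switch_disk_to(dst)
    vm.resume()
```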

Page 12: The Design and Evolution of Live Storage Migration in VMware ESX

12

Synthetic Workload

Synthetic Iometer workload (OLTP):
• 30% write / 70% read

• 100% random

• 8 KB IOs

• Outstanding IOs (OIOs) from 2 to 32

Migration setup:
• Migrated both the 6 GB system disk and the 32 GB data disk

Hardware:
• Dell PowerEdge R710: dual Xeon X5570 2.93 GHz

• Two EMC CX4-120 arrays connected via 8 Gb Fibre Channel

Page 13: The Design and Evolution of Live Storage Migration in VMware ESX

13

Migration Time vs. Varying OIO

[Chart: migration time (s) vs. workload OIO (2–32) for the Snapshot architecture]

Page 14: The Design and Evolution of Live Storage Migration in VMware ESX

14

Downtime vs. Varying OIO

[Chart: downtime (s) vs. workload OIO (2–32) for the Snapshot architecture]

Page 15: The Design and Evolution of Live Storage Migration in VMware ESX

15

Total Penalty vs. Varying OIO

[Chart: effective lost workload time (s) vs. workload OIO (2–32) for the Snapshot architecture]

Page 16: The Design and Evolution of Live Storage Migration in VMware ESX

16

Snapshot Architecture

Benefits
• Simple implementation

• Built on existing and well-tested infrastructure

Challenges
• Suffers from snapshot performance issues

• Disk space: up to 3x the VM size

• Not an atomic switch from source to destination
  • A problem when spanning replication fault domains

• Downtime

• Long migration times

Page 17: The Design and Evolution of Live Storage Migration in VMware ESX

17

Snapshot versus Dirty Block Tracking Intuition

Virtual disk level snapshots have overhead to maintain metadata
Require lots of disk space

We want to operate more like live migration
• Iterative copy

• Block level copy rather than disk level – enables optimizations

We need a mechanism to track writes
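A minimal sketch of the dirty block tracking idea with hypothetical interfaces (not the ESX data path): intercept writes into a dirty set, bulk-copy the disk once, then iteratively re-copy whatever was dirtied:

```python
class DirtyBlockTracker:
    """Toy dirty block tracking (DBT) migration loop. `src` and `dst` are
    assumed to expose read_block(i)/write_block(i, data); this is an
    illustrative sketch only."""

    def __init__(self, num_blocks):
        self.dirty = set()            # blocks written since the last copy pass
        self.num_blocks = num_blocks

    def on_guest_write(self, block_no):
        self.dirty.add(block_no)      # write interception records the block

    def migrate(self, src, dst, downtime_blocks=64, max_iters=16):
        # Pass 1: bulk copy of the whole disk while the VM keeps running.
        for b in range(self.num_blocks):
            dst.write_block(b, src.read_block(b))
        # Iterative passes: re-copy blocks dirtied during the previous pass.
        for _ in range(max_iters):
            if len(self.dirty) <= downtime_blocks:
                break                 # small enough to finish during a short pause
            batch, self.dirty = self.dirty, set()
            for b in batch:
                dst.write_block(b, src.read_block(b))
        # Final pass happens with the VM suspended (the "downtime" copy).
        for b in self.dirty:
            dst.write_block(b, src.read_block(b))
        self.dirty.clear()
```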

Page 18: The Design and Evolution of Live Storage Migration in VMware ESX

18

Dirty Block Tracking (DBT) Architecture – ESX 4.0/4.1

[Diagram: ESX host copying the VM’s VMDK from the source volume to the destination volume while tracking dirtied blocks]

Page 19: The Design and Evolution of Live Storage Migration in VMware ESX

19

Data Mover (DM)

Kernel service
• Provides disk copy operations

• Avoids memory copies (DMAs only)

Operation (default configuration)
• 16 outstanding IOs

• 256 KB IOs
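A rough user-level illustration of the data mover's pipelining, with hypothetical file paths; the real DM is a kernel service that issues DMAs between devices, but keeping 16 outstanding 256 KB IOs in flight looks roughly like this:

```python
import asyncio, os

CHUNK = 256 * 1024        # 256 KB IOs (the DM default)
OUTSTANDING = 16          # 16 outstanding IOs (the DM default)

async def dm_copy(src_path, dst_path):
    """Keep up to OUTSTANDING chunk copies in flight at once.
    Purely illustrative; not the kernel data mover."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT, 0o644)
    size = os.fstat(src).st_size
    sem = asyncio.Semaphore(OUTSTANDING)
    loop = asyncio.get_running_loop()

    async def copy_chunk(offset):
        async with sem:   # caps the number of IOs in flight
            data = await loop.run_in_executor(None, os.pread, src, CHUNK, offset)
            await loop.run_in_executor(None, os.pwrite, dst, data, offset)

    try:
        await asyncio.gather(*(copy_chunk(o) for o in range(0, size, CHUNK)))
    finally:
        os.close(src)
        os.close(dst)

# Hypothetical paths:
# asyncio.run(dm_copy("/vmfs/volumes/src/vm.vmdk", "/vmfs/volumes/dst/vm.vmdk"))
```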

Page 20: The Design and Evolution of Live Storage Migration in VMware ESX

20

Migration Time vs. Varying OIO

[Chart: migration time (s) vs. workload OIO (2–32), Snapshot vs. DBT]

Page 21: The Design and Evolution of Live Storage Migration in VMware ESX

21

Downtime vs. Varying OIO

[Chart: downtime (s) vs. workload OIO (2–32), Snapshot vs. DBT]

Page 22: The Design and Evolution of Live Storage Migration in VMware ESX

22

Total Penalty vs. Varying OIO

[Chart: effective lost workload time (s) vs. workload OIO (2–32), Snapshot vs. DBT]

Page 23: The Design and Evolution of Live Storage Migration in VMware ESX

23

Dirty Block Tracking Architecture

Benefits
• Well-understood architecture, similar to live VM migration

• Eliminated performance issues associated with snapshots

• Enables block level optimizations

• Atomicity

Challenges
• Migrations may not converge (and will not succeed with reasonable downtime)

• Destination slower than source

• Insufficient copy bandwidth

• Convergence logic difficult to tune

• Downtime

Page 24: The Design and Evolution of Live Storage Migration in VMware ESX

24

Block Write Frequency – Exchange Workload

Page 25: The Design and Evolution of Live Storage Migration in VMware ESX

25

Heat Optimization – Introduction

Defer copying data that is frequently written

Detects frequently written blocks:
• File system metadata

• Circular logs

• Application specific data

No significant benefit for:
• Copy-on-write file systems (e.g. ZFS, HAMMER, WAFL)

• Workloads with limited locality (e.g. OLTP)
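One way to picture the heat optimization, sketched with made-up thresholds and interfaces: keep a per-block write counter and defer blocks that cross the threshold to the final passes:

```python
from collections import Counter

class HeatTracker:
    """Toy version of the heat optimization: blocks written more than
    HOT_THRESHOLD times are deferred so they are copied as late as
    possible (ideally only once). The threshold is made up."""
    HOT_THRESHOLD = 4

    def __init__(self):
        self.writes = Counter()

    def on_guest_write(self, block_no):
        self.writes[block_no] += 1

    def partition(self, dirty_blocks):
        """Split a set of dirty blocks into (cold, hot)."""
        hot = {b for b in dirty_blocks if self.writes[b] >= self.HOT_THRESHOLD}
        cold = dirty_blocks - hot
        return cold, hot   # copy cold blocks now, defer hot blocks

# Usage inside an iterative copy pass (sketch):
#   cold, hot = heat.partition(dirty)
#   copy(cold); dirty = hot | newly_dirtied
```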

Page 26: The Design and Evolution of Live Storage Migration in VMware ESX

26

Heat Optimization – Design

[Diagram: heat tracking of write frequency across disk LBAs]

Page 27: The Design and Evolution of Live Storage Migration in VMware ESX

27

DBT versus IO Mirroring Intuition

Live migration intuition – intercepting all memory writes is expensive
• Trapping interferes with the data fast path

• DBT traps only the first write to a page

• Writes are batched to aggregate subsequent writes without trapping

Intercepting all storage writes is cheap
• The IO stack processes all IOs already

IO Mirroring
• Synchronously mirror all writes

• Single pass copy of the bulk of the disk
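A minimal sketch of the mirroring idea with hypothetical disk interfaces: every guest write completes only after both disks acknowledge it, while a single background pass copies the rest (lock-region handling is omitted; see the lock behavior slide in the backup section):

```python
import threading

class MirroredDisk:
    """Toy synchronous IO mirror. `src` and `dst` are assumed to expose
    read_block/write_block. Guest writes go to both disks; one background
    pass copies the rest. Illustrative only."""

    def __init__(self, src, dst, num_blocks):
        self.src, self.dst, self.num_blocks = src, dst, num_blocks
        self.lock = threading.Lock()

    def guest_write(self, block_no, data):
        with self.lock:                            # serialize with the copy pass
            self.src.write_block(block_no, data)
            self.dst.write_block(block_no, data)   # completes only after both

    def guest_read(self, block_no):
        return self.src.read_block(block_no)       # reads served from the source

    def bulk_copy(self):
        for b in range(self.num_blocks):
            with self.lock:
                self.dst.write_block(b, self.src.read_block(b))
        # After one pass the destination is consistent; switch over with
        # near-zero downtime.
```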

Page 28: The Design and Evolution of Live Storage Migration in VMware ESX

28

IO Mirroring – ESX 5.0

[Diagram: ESX host mirroring the VM’s writes to the VMDK on both the source volume and the destination volume]

Page 29: The Design and Evolution of Live Storage Migration in VMware ESX

29

Migration Time vs. Varying OIO

[Chart: migration time (s) vs. workload OIO (2–32), Snapshot vs. DBT vs. Mirror]

Page 30: The Design and Evolution of Live Storage Migration in VMware ESX

30

Downtime vs. Varying OIO

[Chart: downtime (s) vs. workload OIO (2–32), Snapshot vs. DBT vs. Mirror]

Page 31: The Design and Evolution of Live Storage Migration in VMware ESX

31

Total Penalty vs. Varying OIO

[Chart: effective lost workload time (s) vs. workload OIO (2–32), Snapshot vs. DBT vs. Mirror]

Page 32: The Design and Evolution of Live Storage Migration in VMware ESX

32

IO Mirroring

Benefits
• Minimal migration time

• Near-zero downtime

• Atomicity

Challenges
• Complex code to guarantee atomicity of the migration

• Odd guest interactions require code for verification and debugging

Page 33: The Design and Evolution of Live Storage Migration in VMware ESX

33

Throttling the source

Page 34: The Design and Evolution of Live Storage Migration in VMware ESX

34

IO Mirroring to Slow Destination

[Chart: IOPS vs. time (seconds) from start to end of a migration to a slow destination – read IOPS on the source, write IOPS on the source, and write IOPS on the destination]

Page 35: The Design and Evolution of Live Storage Migration in VMware ESX

35

Agenda
• What is live migration?
• Migration architectures
• Lessons

Page 36: The Design and Evolution of Live Storage Migration in VMware ESX

36

Recap

In the beginning: live migration

Snapshot:
• Usually has the worst downtime/penalty

• Whole-disk-level abstraction

• Snapshot overheads due to metadata

• No atomicity

DBT:
• Manageable downtime (except when OIO > 16)

• Enabled block level optimizations

• Difficult to make convergence decisions

• No natural throttling

Page 37: The Design and Evolution of Live Storage Migration in VMware ESX

37

Recap – Cont.

Insight: storage is not memory
• Interposing on all writes is practical and performant

IO Mirroring:
• Near-zero downtime

• Best migration time consistency

• Minimal performance penalty

• No convergence logic necessary

• Natural throttling

Page 38: The Design and Evolution of Live Storage Migration in VMware ESX

38

Future Work

Leverage workload analysis to reduce mirroring overhead
• Defer mirroring regions with potential sequential write IO patterns

• Defer hot blocks

• Read ahead for lazy mirroring

Apply mirroring to WAN migrations
• New optimizations and hybrid architectures

Page 39: The Design and Evolution of Live Storage Migration in VMware ESX

39

Thank You!

Page 40: The Design and Evolution of Live Storage Migration in VMware ESX

40

Backup Slides

Page 41: The Design and Evolution of Live Storage Migration in VMware ESX

41

Exchange Migration with Heat

[Chart: data copied (MB) per iteration (1–10) – hot blocks and cold blocks with the heat optimization vs. baseline without heat]

Page 42: The Design and Evolution of Live Storage Migration in VMware ESX

42

Exchange Workload

Exchange 2010:
• Workload generated by Exchange Load Generator

• 2000 user mailboxes

• Migrated only the 350 GB mailbox disk

Hardware:
• Dell PowerEdge R910: 8-core Nehalem-EX

• EMC CX3-40

• Migrated between two 6-disk RAID-0 volumes

Page 43: The Design and Evolution of Live Storage Migration in VMware ESX

43

Exchange Results

Type                   Migration Time (s)   Downtime (s)
DBT                    2935.5               13.297
Incremental DBT        2638.9                7.557
IO Mirroring           1922.2                0.220
DBT (2x)               Failed                -
Incremental DBT (2x)   Failed                -
IO Mirroring (2x)      1824.3                0.186

Page 44: The Design and Evolution of Live Storage Migration in VMware ESX

44

IO Mirroring Lock Behavior

Moving the lock region:
1. Wait for non-mirrored in-flight read IOs to complete (queue all IOs)

2. Move the lock range

3. Release queued IOs

I – Mirrored region: write IOs mirrored
II – Locked region: IOs deferred until lock release
III – Unlocked region: write IOs to source only
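A toy model of the moving lock range, with assumed data structures (the real implementation fences in-flight IOs inside the kernel IO stack): the copy cursor splits the disk into the three regions above, and advancing it follows the drain/move/release steps:

```python
import threading

class MirrorLockRange:
    """Sketch of the three-region scheme: blocks below `cursor` are
    mirrored, blocks inside [cursor, cursor + width) are locked while the
    data mover copies them, and blocks above are written to the source
    only. Purely illustrative."""

    def __init__(self, width):
        self.cursor, self.width = 0, width
        self.cv = threading.Condition()
        self.inflight = 0                  # IOs currently issued by the guest

    def region(self, block_no):
        if block_no < self.cursor:
            return "mirrored"              # write to source and destination
        if block_no < self.cursor + self.width:
            return "locked"                # defer until the lock moves on
        return "source-only"               # write to the source only

    def issue_io(self, block_no):
        with self.cv:
            while self.region(block_no) == "locked":
                self.cv.wait()             # queued until the lock range moves
            self.inflight += 1
        return self.region(block_no)

    def complete_io(self):
        with self.cv:
            self.inflight -= 1
            self.cv.notify_all()

    def advance(self):
        with self.cv:
            while self.inflight:           # 1. wait for in-flight IOs to drain
                self.cv.wait()
            self.cursor += self.width      # 2. move the lock range forward
            self.cv.notify_all()           # 3. release queued IOs
```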

Page 45: The Design and Evolution of Live Storage Migration in VMware ESX

45

Non-trivial Guest Interactions

Guest IO crossing disk locked regions

Guest buffer cache changing

Overlapped IOs

Page 46: The Design and Evolution of Live Storage Migration in VMware ESX

46

Lock Latency and Data Mover Time

[Charts: DM copy progress (ms) and lock latency (ms) vs. time (s); series include No Contention, Lock Contention, and IO Mirroring Lock 2nd Disk]

Page 47: The Design and Evolution of Live Storage Migration in VMware ESX

47

Source/Destination Valid Inconsistencies

Normal Guest Buffer Cache Behavior

This inconsistency is okay!
• Source and destination are both valid crash-consistent views of the disk

[Timeline: guest OS issues an IO; the source IO and destination IO are issued; the guest OS modifies the buffer cache while the mirrored IOs are in flight]

Page 48: The Design and Evolution of Live Storage Migration in VMware ESX

48

Source/Destination Valid Inconsistencies

Overlapping IOs (Synthetic workloads only)

Seen in Iometer and other synthetic benchmarks
File systems do not generate this pattern

[Diagram: two IOs to overlapping LBAs on the virtual disk can be reordered, so the source disk and the destination disk may apply IO 1 and IO 2 in different orders]

Page 49: The Design and Evolution of Live Storage Migration in VMware ESX

49

Incremental DBT Optimization – ESX 4.1

[Diagram: writes land on disk blocks and mark them dirty; regions of the disk are labeled Copy or Ignore depending on whether the dirtied block must be re-copied]
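One plausible reading of the copy/ignore decision, stated as an assumption rather than confirmed behavior: a write only needs to be tracked as dirty if the data mover has already copied past that block, since writes ahead of the copy cursor will be picked up by the ongoing bulk copy anyway:

```python
class IncrementalDBT:
    """Sketch of the copy/ignore decision. `copy_cursor` is the highest
    block the data mover has already copied. This interpretation is an
    assumption drawn from the Copy/Ignore labels in the diagram."""

    def __init__(self):
        self.copy_cursor = 0     # blocks below copy_cursor are already on the destination
        self.dirty = set()

    def on_guest_write(self, block_no):
        if block_no < self.copy_cursor:
            self.dirty.add(block_no)   # already copied: must be copied again
        # else: ignore -- the ongoing bulk copy will transfer the new data

    def on_block_copied(self, block_no):
        self.copy_cursor = max(self.copy_cursor, block_no + 1)
        self.dirty.discard(block_no)   # this block was just re-copied
```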