
© 2010 VMware Inc. All rights reserved

The Design and Evolution of Live Storage Migration in VMware ESX

Ali Mashtizadeh, VMware, Inc.

Emré Celebi, VMware, Inc.

Tal Garfinkel, VMware, Inc.

Min Cai, VMware, Inc.

2

Agenda

What is live migration?

Migration architectures

Lessons

3

What is live migration (vMotion)?

Moves a VM between two physical hosts

No noticeable interruption to the VM (ideally)

Use cases:

• Hardware/software upgrades

• Distributed resource management

• Distributed power management

4

Live Migration

[Diagram: a virtual machine moves from a source host to a destination host]

Disk is placed on a shared volume (100s of GB to TBs)

CPU and device state is copied (MBs)

Memory is copied (GBs)

• Large and changes often → copy iteratively (see the sketch below)
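To make the iterative copy concrete, here is a minimal, hypothetical C sketch (assumed bandwidth and dirty-rate numbers; not VMware's code): each pass copies only what the guest dirtied during the previous pass, and the VM pauses only for the small remainder.

/* Iterative pre-copy sketch: the remainder shrinks geometrically
 * because the assumed dirty rate is below the copy bandwidth. */
#include <stdio.h>

#define LINK_BW_MBPS     1000.0  /* assumed copy bandwidth, MB/s */
#define DIRTY_RATE_MBPS   200.0  /* assumed guest dirty rate, MB/s */
#define SWITCHOVER_MB      50.0  /* pause the VM when this little remains */

int main(void) {
    double remaining = 16 * 1024.0;  /* 16 GB of guest memory, in MB */
    int pass = 0;

    while (remaining > SWITCHOVER_MB) {
        double copy_time = remaining / LINK_BW_MBPS;  /* this pass's length */
        remaining = DIRTY_RATE_MBPS * copy_time;      /* dirtied meanwhile */
        printf("pass %d: %.1f MB dirtied during copy\n", ++pass, remaining);
    }
    printf("pause VM, copy final %.1f MB, resume on destination\n", remaining);
    return 0;
}

Because the assumed dirty rate is below the copy bandwidth, the remainder shrinks each pass and the loop converges.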

5

Live Storage Migration

What is live storage migration?

• Migration of a VM’s virtual disks

Why does this matter?

• VMs can be very large

• Array maintenance may require migrating every VM off an array

• Migration times are measured in hours

6

Live Migration and Live Storage Migration – a short history

ESX 2.0 (2003) – Live migration (vMotion)

• Virtual disks must live on shared volumes

ESX 3.0 (2006) – Live storage migration lite (Upgrade vMotion)

• Enabled upgrade of VMFS by migrating the disks

ESX 3.5 (2007) – Live storage migration (Storage vMotion)

• Storage array upgrade and repair

• Manual storage load balancing

• Snapshot based

ESX 4.0 (2009) – Dirty block tracking (DBT)

ESX 5.0 (2011) – IO Mirroring

7

Agenda

What is live migration?

Migration architectures

Lessons

8

Goals

Migration Time

• Minimize total end-to-end migration time

• Predictability of migration time

Guest Penalty

• Minimize performance loss

• Minimize downtime

Atomicity

• Avoid dependence on multiple volumes (for replication fault domains)

Guarantee Convergence

• Ideally we want migrations to always complete successfully

9

Convergence

Migration time vs. downtime

More migration time → more guest performance impact

More downtime → more service unavailability

Factors that affect convergence:

• Block dirty rate

• Available storage network bandwidth

• Workload interactions

Challenges:

• Many workloads interacting on storage array

• Unpredictable behavior

[Figure: data remaining vs. time, showing a long copy phase (the migration time) followed by a brief switch-over (the downtime)]
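A back-of-the-envelope convergence check (illustrative C with assumed numbers): if S is the data size, B the copy bandwidth, and R the block dirty rate, the per-pass remainders form a geometric series, so iterative copy converges only when R < B and then takes about S / (B - R) overall.

/* Closed-form estimate of iterative-copy migration time. */
#include <stdio.h>

static void estimate(double size_gb, double bw_mbps, double dirty_mbps) {
    if (dirty_mbps >= bw_mbps) {
        printf("S=%.0f GB, B=%.0f, R=%.0f MB/s: does not converge\n",
               size_gb, bw_mbps, dirty_mbps);
        return;
    }
    /* Sum of the geometric series: total time = S / (B - R). */
    double secs = size_gb * 1024.0 / (bw_mbps - dirty_mbps);
    printf("S=%.0f GB, B=%.0f, R=%.0f MB/s: ~%.0f s total\n",
           size_gb, bw_mbps, dirty_mbps, secs);
}

int main(void) {
    estimate(350.0, 200.0, 50.0);   /* dirty rate well under bandwidth */
    estimate(350.0, 200.0, 210.0);  /* dirty rate exceeds bandwidth */
    return 0;
}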

10

Migration Architectures

Snapshotting

Dirty Block Tracking (DBT)

• Heat Optimization

IO Mirroring

11

Snapshot Architecture – ESX 3.5 U1

[Diagram: ESX Host copying a VM's chain of VMDK snapshots from the source volume to the destination volume]

12

Synthetic Workload

Synthetic Iometer workload (OLTP):

• 30% Write/70% Read

• 100% Random

• 8KB IOs

• Outstanding IOs (OIOs) from 2 to 32

Migration Setup:

• Migrated both the 6 GB System Disk and 32 GB Data Disk

Hardware:

• Dell PowerEdge R710: Dual Xeon X5570 2.93 GHz

• Two EMC CX4-120 arrays connected via 8Gb Fibre Channel

13

Migration Time vs. Varying OIO

[Chart: migration time (s), 0–600, vs. workload OIO (2–32) for the Snapshot architecture]

14

Downtime vs. Varying OIO

[Chart: downtime (s), 0–25, vs. workload OIO (2–32) for the Snapshot architecture]

15

Total Penalty vs. Varying OIO

[Chart: effective lost workload time (s), 0–500, vs. workload OIO (2–32) for the Snapshot architecture]

16

Snapshot Architecture

Benefits

• Simple implementation

• Built on existing and well tested infrastructure

Challenges

• Suffers from snapshot performance issues

• Disk space: up to 3x the VM size (source disk, growing snapshot, and destination copy can coexist)

• Not an atomic switch from source to destination

• A problem when spanning replication fault domains

• Downtime

• Long migration times

17

Snapshot versus Dirty Block Tracking Intuition

Virtual disk level snapshots have overhead to maintain metadata

Requires lots of disk space

We want to operate more like live migration

• Iterative copy

• Block level copy rather than disk level – enables optimizations

We need a mechanism to track writes (see the sketch below)
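A minimal dirty-block-tracking sketch (hypothetical C, not the ESX implementation): the write path sets one bit per block, and each copy iteration harvests the bitmap and re-copies only the blocks dirtied since the previous pass.

/* Dirty block tracking: a bitmap over fixed-size blocks. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE (256 * 1024)   /* track at the copy granularity */
#define NBLOCKS    4096

static uint8_t dirty[NBLOCKS / 8];

/* Called from the write path: mark every block the write touches. */
static void mark_dirty(uint64_t off, uint64_t len) {
    for (uint64_t b = off / BLOCK_SIZE; b <= (off + len - 1) / BLOCK_SIZE; b++)
        dirty[b / 8] |= 1u << (b % 8);
}

/* One copy iteration: harvest the bitmap, then recopy only dirty blocks. */
static unsigned copy_pass(void) {
    uint8_t snap[sizeof dirty];
    memcpy(snap, dirty, sizeof dirty);  /* done atomically in real code */
    memset(dirty, 0, sizeof dirty);
    unsigned copied = 0;
    for (unsigned b = 0; b < NBLOCKS; b++)
        if (snap[b / 8] & (1u << (b % 8)))
            copied++;                   /* a data mover copy would go here */
    return copied;
}

int main(void) {
    mark_dirty(0, 4096);                    /* guest writes during the copy */
    mark_dirty(10ull * BLOCK_SIZE, 8192);
    printf("pass recopied %u dirty blocks\n", copy_pass());
    return 0;
}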

18

Dirty Block Tracking (DBT) Architecture – ESX 4.0/4.1

[Diagram: ESX Host copying the VMDK from the source volume to the destination volume while tracking blocks the guest dirties]

19

Data Mover (DM)

Kernel Service

• Provides disk copy operations

• Avoids memory copy (DMAs only)

Operation (default configuration)

• 16 Outstanding IOs

• 256 KB IOs
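A simplified sketch of such a copy loop with the default parameters (hypothetical, portable C): this version issues the 256 KB transfers one at a time, whereas the kernel service keeps 16 outstanding and DMAs between devices without touching a memory copy.

/* Disk-to-disk copy at the data mover's default IO size. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define DM_IO_SIZE (256 * 1024)  /* default transfer size */
#define DM_OIO     16            /* default outstanding IOs (not pipelined here) */

int main(int argc, char **argv) {
    if (argc != 3) { fprintf(stderr, "usage: dmcopy <src> <dst>\n"); return 1; }
    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (src < 0 || dst < 0) { perror("open"); return 1; }

    char *buf = malloc(DM_IO_SIZE);
    if (!buf) return 1;
    ssize_t n;
    /* The real service would keep up to DM_OIO of these transfers in flight. */
    while ((n = read(src, buf, DM_IO_SIZE)) > 0)
        if (write(dst, buf, n) != n) { perror("write"); return 1; }

    free(buf);
    close(src);
    close(dst);
    return n < 0;
}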

20

Migration Time vs. Varying OIO

[Chart: migration time (s), 0–800, vs. workload OIO (2–32) for Snapshot and DBT]

21

Downtime vs. Varying OIO

[Chart: downtime (s), 0–35, vs. workload OIO (2–32) for Snapshot and DBT]

22

Total Penalty vs. Varying OIO

[Chart: effective lost workload time (s), 0–500, vs. workload OIO (2–32) for Snapshot and DBT]

23

Dirty Block Tracking Architecture

Benefits

• Well-understood architecture, similar to live VM migration

• Eliminated performance issues associated with snapshots

• Enables block level optimizations

• Atomicity

Challenges

• Migrations may not converge (and will not succeed with reasonable downtime)

• Destination slower than source

• Insufficient copy bandwidth

• Convergence logic difficult to tune

• Downtime

24

Block Write Frequency – Exchange Workload

25

Heat Optimization – Introduction

Defer copying data that is frequently written (see the sketch after this list)

Detect frequently written blocks, such as:

• File system metadata

• Circular logs

• Application specific data

No significant benefit for:

• Copy-on-write file systems (e.g., ZFS, HAMMER, WAFL)

• Workloads with limited locality (e.g., OLTP)
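A minimal sketch of the heat idea (hypothetical code; the threshold is an assumed knob): count writes per block and defer any block whose count crosses the cutoff to a final pass, so frequently rewritten blocks are copied only once.

/* Per-block write heat: hot blocks are deferred to the final pass. */
#include <stdio.h>

#define NBLOCKS       1024
#define HOT_THRESHOLD    4   /* assumed cutoff for deferral */

static unsigned heat[NBLOCKS];

static void on_guest_write(unsigned block) { heat[block]++; }

static int should_defer(unsigned block) { return heat[block] >= HOT_THRESHOLD; }

int main(void) {
    for (int i = 0; i < 8; i++)
        on_guest_write(7);            /* e.g., a circular-log block */
    on_guest_write(42);               /* a one-off write */
    printf("block 7:  %s\n", should_defer(7)  ? "defer to final pass" : "copy now");
    printf("block 42: %s\n", should_defer(42) ? "defer to final pass" : "copy now");
    return 0;
}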

26

Heat Optimization – Design

[Diagram: write frequency across disk LBAs]

27

DBT versus IO Mirroring Intuition

Live migration intuition – intercepting all memory writes is expensive

• Trapping interferes with data fast path

• DBT traps only first write to a page

• Writes are batched: after the first trap, subsequent writes to a page are aggregated without further traps

Intercepting all storage writes is cheap

• IO stack processes all IOs already

IO Mirroring

• Synchronously mirror all writes

• Single pass copy of the bulk of the disk
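A sketch of the mirrored write path (hypothetical C with stubbed disk writes): a guest write behind the copy completes only after both disks acknowledge it, a write to the range being copied is deferred, and a write ahead of the copy goes to the source alone.

/* Route a guest write according to the region it falls in. */
#include <stdio.h>

enum region { REGION_MIRRORED, REGION_LOCKED, REGION_UNMIRRORED };

/* Stand-ins for real (asynchronous) disk writes; 0 means success. */
static int write_source(long lba, const void *buf, int len)
    { (void)lba; (void)buf; (void)len; return 0; }
static int write_destination(long lba, const void *buf, int len)
    { (void)lba; (void)buf; (void)len; return 0; }

static int guest_write(enum region r, long lba, const void *buf, int len) {
    switch (r) {
    case REGION_MIRRORED:     /* behind the copy: write both disks */
        if (write_source(lba, buf, len) != 0) return -1;
        return write_destination(lba, buf, len);
    case REGION_LOCKED:       /* being copied right now: defer */
        return 1;             /* caller queues the IO, retries after unlock */
    case REGION_UNMIRRORED:   /* ahead of the copy: bulk pass will get it */
        return write_source(lba, buf, len);
    }
    return -1;
}

int main(void) {
    char buf[512] = {0};
    printf("mirrored write -> %d\n",
           guest_write(REGION_MIRRORED, 0, buf, sizeof buf));
    printf("unmirrored write -> %d\n",
           guest_write(REGION_UNMIRRORED, 4096, buf, sizeof buf));
    return 0;
}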

28

IO Mirroring – ESX 5.0

[Diagram: ESX Host mirroring guest writes to VMDKs on both the source and destination volumes]

29

Migration Time vs. Varying OIO

[Chart: migration time (s), 0–800, vs. workload OIO (2–32) for Snapshot, DBT, and Mirror]

30

Downtime vs. Varying OIO

[Chart: downtime (s), 0–35, vs. workload OIO (2–32) for Snapshot, DBT, and Mirror]

31

Total Penalty vs. Varying OIO

[Chart: effective lost workload time (s), 0–500, vs. workload OIO (2–32) for Snapshot, DBT, and Mirror]

32

IO Mirroring

Benefits

• Minimal migration time

• Near-zero downtime

• Atomicity

Challenges

• Complex code to guarantee atomicity of the migration

• Odd guest interactions require code for verification and debugging

33

Throttling the source

34

IO Mirroring to Slow Destination

[Chart: IOPS (0–3500) vs. time (s) from migration start to end, showing read IOPS at the source, write IOPS at the source, and write IOPS at the destination]

35

Agenda

What is live migration?

Migration architectures

Lessons

36

Recap

In the beginning there was live migration

Snapshot:

• Usually has the worst downtime/penalty

• Whole disk level abstraction

• Snapshot overheads due to metadata

• No atomicity

DBT:

• Manageable downtime (except when OIO > 16)

• Enabled block level optimizations

• Difficult to make convergence decisions

• No natural throttling

37

Recap – Cont.

Insight: storage is not memory

• Interposing on all writes is practical and performant

IO Mirroring:

• Near-zero downtime

• Best migration time consistency

• Minimal performance penalty

• No convergence logic necessary

• Natural throttling

38

Future Work

Leverage workload analysis to reduce mirroring overhead

• Defer mirroring regions with potential sequential write IO patterns

• Defer hot blocks

• Read ahead for lazy mirroring

Apply mirroring to WAN migrations

• New optimizations and hybrid architecture

39

Thank You!

40

Backup Slides

41

Exchange Migration with Heat

[Chart: data copied (MB), 0–500, per iteration (1–10), broken into hot and cold blocks, with a baseline without heat optimization]

42

Exchange Workload

Exchange 2010:

• Workload generated by Exchange Load Generator

• 2000 User mailboxes

• Migrated only the 350 GB mailbox disk

Hardware:

• Dell PowerEdge R910: 8-core Nehalem-EX

• EMC CX3-40

• Migrated between two 6-disk RAID-0 volumes

43

Exchange Results

Type                  Migration Time (s)   Downtime (s)
DBT                   2935.5               13.297
Incremental DBT       2638.9               7.557
IO Mirroring          1922.2               0.220
DBT (2x)              Failed               -
Incremental DBT (2x)  Failed               -
IO Mirroring (2x)     1824.3               0.186

44

IO Mirroring Lock Behavior

Moving the lock region (see the sketch below):

1. Wait for in-flight non-mirrored read IOs to complete (queue all new IOs)

2. Move the lock range

3. Release the queued IOs

Disk regions:

• I – Mirrored region: write IOs are mirrored to both disks

• II – Locked region: IOs are deferred until the lock is released

• III – Unlocked region: write IOs go to the source only
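The three steps above as a hypothetical C sketch (the scheduler hooks are stubs):

/* Advance the lock region behind the data mover. */
#include <stdio.h>

struct lock_region { long start_lba; long end_lba; };

/* Stubs standing in for the real IO scheduler hooks. */
static void quiesce_inflight_ios(void) { puts("drain in-flight reads; queue new IOs"); }
static void release_queued_ios(void)   { puts("release queued IOs"); }

static void advance_lock(struct lock_region *lock, long start, long end) {
    quiesce_inflight_ios();    /* 1. wait for non-mirrored in-flight reads */
    lock->start_lba = start;   /* 2. move the lock range */
    lock->end_lba   = end;
    release_queued_ios();      /* 3. release the queued IOs */
}

int main(void) {
    struct lock_region lock = { 0, 4096 };
    advance_lock(&lock, 4096, 8192);  /* step the lock forward one region */
    return 0;
}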

45

Non-trivial Guest Interactions

Guest IO crossing disk locked regions

Guest buffer cache changing

Overlapped IOs

46

Lock Latency and Data Mover Time

[Chart: two panels vs. time (s), data mover copy time (ms) and lock latency (ms), annotated with no-contention and lock-contention phases, IO mirroring lock contention, and the 2nd disk's migration]

47

Source/Destination Valid Inconsistencies

Normal Guest Buffer Cache Behavior

This inconsistency is okay!

• Source and destination are both valid crash consistent views of the disk

[Timeline: the guest OS issues an IO; the source and destination IOs are issued; the guest OS modifies the buffer cache while they are in flight]

48

Source/Destination Valid Inconsistencies

Overlapping IOs (Synthetic workloads only)

Seen in Iometer and other synthetic benchmarks

File systems do not generate this

[Diagram: two overlapping writes (IO 1, IO 2) to the same LBAs of the virtual disk can reach the source and destination disks in different orders, leaving different final contents on each]

49

Incremental DBT Optimization – ESX 4.1

[Diagram: guest writes to disk blocks — blocks behind the copy cursor are marked dirty and re-copied; blocks ahead of it are ignored, since the bulk copy has not reached them yet]
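A sketch of that write-path decision (hypothetical code): only a write behind the copy cursor needs a dirty bit, because the single bulk pass will still visit every block ahead of the cursor.

/* Incremental DBT: track writes only behind the copy cursor. */
#include <stdint.h>
#include <stdio.h>

#define NBLOCKS 1024

static uint8_t  dirty[NBLOCKS / 8];
static unsigned copy_cursor;   /* next block the bulk copy will read */

static void on_guest_write(unsigned block) {
    if (block < copy_cursor)                   /* already copied: recopy later */
        dirty[block / 8] |= 1u << (block % 8);
    /* else: ignore; the bulk copy has not reached this block yet */
}

int main(void) {
    copy_cursor = 500;
    on_guest_write(100);  /* behind the cursor: marked dirty */
    on_guest_write(900);  /* ahead of the cursor: nothing to do */
    printf("block 100 dirty: %d\n", !!(dirty[100 / 8] & (1u << (100 % 8))));
    printf("block 900 dirty: %d\n", !!(dirty[900 / 8] & (1u << (900 % 8))));
    return 0;
}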