

© 2010 VMware Inc. All rights reserved

The Design and Evolution of Live Storage Migration in VMware ESX

Ali Mashtizadeh, VMware, Inc.
Emré Celebi, VMware, Inc.
Tal Garfinkel, VMware, Inc.
Min Cai, VMware, Inc.


2

Agenda

- What is live migration?
- Migration architectures
- Lessons


3

What is live migration (vMotion)?

- Moves a VM between two physical hosts
- No noticeable interruption to the VM (ideally)
- Use cases:
  • Hardware/software upgrades
  • Distributed resource management
  • Distributed power management


4

Live Migration

[Diagram: a virtual machine moving from the source host to the destination host]

- Disk is placed on a shared volume (100s of GB to TBs)
- CPU and device state is copied (MBs)
- Memory is copied (GBs)
  • Large and changes often → copied iteratively


5

Live Storage Migration

- What is live storage migration?
  • Migration of a VM's virtual disks
- Why does this matter?
  • VMs can be very large
  • Array maintenance means you may migrate every VM on an array
  • Migration times can run to hours


6

Live Migration and Live Storage Migration – a short history

- ESX 2.0 (2003) – Live migration (vMotion)
  • Virtual disks must live on shared volumes
- ESX 3.0 (2006) – Live storage migration lite (Upgrade vMotion)
  • Enabled upgrading VMFS by migrating the disks
- ESX 3.5 (2007) – Live storage migration (Storage vMotion)
  • Storage array upgrade and repair
  • Manual storage load balancing
  • Snapshot based
- ESX 4.0 (2009) – Dirty block tracking (DBT)
- ESX 5.0 (2011) – IO mirroring


7

Agenda

- What is live migration?
- Migration architectures
- Lessons


8

Goals

- Migration time
  • Minimize total end-to-end migration time
  • Predictability of migration time
- Guest penalty
  • Minimize performance loss
  • Minimize downtime
- Atomicity
  • Avoid dependence on multiple volumes (for replication fault domains)
- Guaranteed convergence
  • Ideally, migrations always complete successfully


9

Convergence

- Migration time vs. downtime
  • More migration time → more guest performance impact
  • More downtime → more service unavailability
- Factors that affect convergence:
  • Block dirty rate
  • Available storage network bandwidth
  • Workload interactions
- Challenges:
  • Many workloads interacting on the storage array
  • Unpredictable behavior

[Figure: data remaining vs. time – the iterative copy phase determines migration time, and the final switchover determines downtime]
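The tradeoff above can be made concrete with a small model. The sketch below is only an illustration, not VMware's convergence logic; the function, its parameters, and the example numbers are all made up. It iterates the copy passes implied by a given dirty rate and copy bandwidth and reports when, if ever, the remaining data fits within a downtime budget.

```python
# Illustrative convergence model (not ESX code): each pass copies what is
# left while the guest dirties new data at dirty_rate. The migration
# converges once the data dirtied during a pass can be copied within the
# downtime budget; if the dirty rate approaches the copy bandwidth, it
# never does.

def passes_to_converge(disk_bytes, dirty_rate_bps, copy_bw_bps,
                       downtime_budget_s, max_passes=32):
    remaining = float(disk_bytes)
    for n in range(1, max_passes + 1):
        pass_time = remaining / copy_bw_bps      # time to copy what is left
        remaining = pass_time * dirty_rate_bps   # data dirtied meanwhile
        if remaining / copy_bw_bps <= downtime_budget_s:
            return n                             # final stop-and-copy fits
    return None                                  # does not converge

# Hypothetical example: 32 GB disk, 40 MB/s dirty rate, 200 MB/s copy
# bandwidth, 2 s downtime budget -> converges after a few passes.
print(passes_to_converge(32 << 30, 40 << 20, 200 << 20, 2.0))
```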


10

Migration Architectures

- Snapshotting
- Dirty block tracking (DBT)
  • Heat optimization
- IO mirroring


11

Snapshot Architecture – ESX 3.5 U1

[Diagram: an ESX host with source and destination volumes; the VM's VMDKs and their snapshots during a snapshot-based migration]


12

Synthetic Workload

- Synthetic Iometer workload (OLTP):
  • 30% write / 70% read
  • 100% random
  • 8 KB IOs
  • Outstanding IOs (OIOs) from 2 to 32
- Migration setup:
  • Migrated both the 6 GB system disk and the 32 GB data disk
- Hardware:
  • Dell PowerEdge R710: dual Xeon X5570 2.93 GHz
  • Two EMC CX4-120 arrays connected via 8 Gb Fibre Channel


13

Migration Time vs. Varying OIO

[Chart: migration time (s) vs. workload OIO (2–32) for the Snapshot architecture]


14

Downtime vs. Varying OIO

[Chart: downtime (s) vs. workload OIO (2–32) for the Snapshot architecture]


15

Total Penalty vs. Varying OIO

[Chart: effective lost workload time (s) vs. workload OIO (2–32) for the Snapshot architecture]


16

Snapshot Architecture

- Benefits
  • Simple implementation
  • Built on existing and well-tested infrastructure
- Challenges
  • Suffers from snapshot performance issues
  • Disk space: up to 3x the VM size
  • Not an atomic switch from source to destination
    • A problem when spanning replication fault domains
  • Downtime
  • Long migration times


17

Snapshot versus Dirty Block Tracking Intuition

- Virtual-disk-level snapshots have overhead to maintain metadata
- Requires lots of disk space
- We want to operate more like live migration
  • Iterative copy
  • Block-level copy rather than disk-level – enables optimizations
- We need a mechanism to track writes (see the sketch below)

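A minimal sketch of such a write-tracking mechanism follows: a per-block dirty bitmap set on the write path and drained by an iterative copier. This is only an illustration of the idea; the block size, names, and convergence threshold are assumptions, not the ESX implementation.

```python
# Minimal dirty-block-tracking sketch (illustration only, not ESX code).
# The write path marks blocks dirty; the copier repeatedly drains and
# copies the dirty set until one pass is small enough to finish during
# the brief switchover suspension.

BLOCK_SIZE = 256 * 1024  # assumed copy granularity

class DirtyBlockTracker:
    def __init__(self, disk_bytes):
        nblocks = (disk_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE
        # Start with everything dirty so the first pass is the bulk copy.
        self.dirty = bytearray(b"\x01" * nblocks)

    def on_guest_write(self, offset, length):
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for b in range(first, last + 1):
            self.dirty[b] = 1

    def drain(self):
        """Return and clear the currently dirty block numbers."""
        blocks = [b for b, d in enumerate(self.dirty) if d]
        for b in blocks:
            self.dirty[b] = 0
        return blocks

def iterative_copy(tracker, copy_block, switchover_budget_blocks=64):
    """Copy dirty blocks pass after pass; return the final small dirty
    set, which would be copied while the VM is briefly suspended."""
    while True:
        blocks = tracker.drain()
        if len(blocks) <= switchover_budget_blocks:
            return blocks
        for b in blocks:
            copy_block(b)
```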

18

Dirty Block Tracking (DBT) Architecture – ESX 4.0/4.1

[Diagram: an ESX host copying the VMDK from the source volume to the destination volume while dirtied blocks are tracked]


19

Data Mover (DM)

- Kernel service
  • Provides disk copy operations
  • Avoids memory copies (DMAs only)
- Operation (default configuration)
  • 16 outstanding IOs
  • 256 KB IOs

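As a rough userspace analogue of that default configuration, the sketch below keeps 16 copy requests in flight, each moving a 256 KB extent. It only illustrates the outstanding-IO pattern; the real Data Mover is a kernel service that avoids data copies entirely, and every name here is made up.

```python
# Userspace analogue of the Data Mover defaults: 16 outstanding IOs of
# 256 KB each. Illustration only; the real DM runs in the kernel and
# moves data by DMA rather than through buffers.
import os
from concurrent.futures import ThreadPoolExecutor

IO_SIZE = 256 * 1024
OUTSTANDING = 16

def copy_extent(src_fd, dst_fd, offset, length):
    data = os.pread(src_fd, length, offset)
    os.pwrite(dst_fd, data, offset)

def datamover_copy(src_path, dst_path):
    src_fd = os.open(src_path, os.O_RDONLY)
    dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT, 0o644)
    size = os.fstat(src_fd).st_size
    try:
        # The pool bounds the number of extents in flight at once.
        with ThreadPoolExecutor(max_workers=OUTSTANDING) as pool:
            futures = [pool.submit(copy_extent, src_fd, dst_fd, off,
                                   min(IO_SIZE, size - off))
                       for off in range(0, size, IO_SIZE)]
            for f in futures:
                f.result()  # surface any IO error
    finally:
        os.close(src_fd)
        os.close(dst_fd)
```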

20

Migration Time vs. Varying OIO

[Chart: migration time (s) vs. workload OIO (2–32), Snapshot vs. DBT]


21

Downtime vs. Varying OIO

[Chart: downtime (s) vs. workload OIO (2–32), Snapshot vs. DBT]


22

Total Penalty vs. Varying OIO

[Chart: effective lost workload time (s) vs. workload OIO (2–32), Snapshot vs. DBT]


23

Dirty Block Tracking Architecture

- Benefits
  • Well-understood architecture, similar to live VM migration
  • Eliminates the performance issues associated with snapshots
  • Enables block-level optimizations
  • Atomicity
- Challenges
  • Migrations may not converge (and will not succeed with reasonable downtime)
    • Destination slower than source
    • Insufficient copy bandwidth
  • Convergence logic is difficult to tune
  • Downtime


24

Block Write Frequency – Exchange Workload


25

Heat Optimization – Introduction

- Defer copying data that is frequently written
- Detects frequently written blocks
  • File system metadata
  • Circular logs
  • Application-specific data
- No significant benefit for:
  • Copy-on-write file systems (e.g., ZFS, HAMMER, WAFL)
  • Workloads with limited locality (e.g., OLTP)

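One possible shape for this heuristic is sketched below: count recent writes per block, decay the counts over time, and defer blocks above a threshold to a later pass so they are copied once instead of on every iteration. The threshold, decay, and names are assumptions for illustration, not the ESX policy.

```python
# Heat-tracking sketch (assumed policy, not the ESX heuristic): hot blocks
# are deferred so the final passes copy them once instead of repeatedly.
from collections import Counter

HOT_THRESHOLD = 4   # assumed: decayed writes needed to call a block "hot"
DECAY = 0.5         # assumed: how quickly old heat fades each window

class HeatMap:
    def __init__(self):
        self.heat = Counter()

    def on_write(self, block):
        self.heat[block] += 1

    def end_of_window(self):
        # Age the counters so heat reflects recent behavior.
        for b in list(self.heat):
            self.heat[b] *= DECAY
            if self.heat[b] < 0.5:
                del self.heat[b]

    def partition(self, dirty_blocks):
        """Split a dirty set into blocks to copy now and hot blocks to defer."""
        cold = [b for b in dirty_blocks if self.heat[b] < HOT_THRESHOLD]
        hot = [b for b in dirty_blocks if self.heat[b] >= HOT_THRESHOLD]
        return cold, hot
```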

26

Heat Optimization – Design

[Diagram: write heat tracked across the disk's LBAs]


27

DBT versus IO Mirroring Intuition

- Live migration intuition – intercepting all memory writes is expensive
  • Trapping interferes with the data fast path
  • DBT traps only the first write to a page
  • Writes are batched to aggregate subsequent writes without trapping
- Intercepting all storage writes is cheap
  • The IO stack already processes every IO
- IO Mirroring
  • Synchronously mirror all writes
  • Single-pass copy of the bulk of the disk

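The sketch below illustrates a mirrored write path consistent with the region diagram on the later "IO Mirroring Lock Behavior" slide: writes behind the bulk-copy cursor go synchronously to both disks, writes inside the extent currently being copied are deferred, and writes ahead of it go to the source only. The class and the write(offset, data) disk interface are hypothetical, and IOs that span region boundaries (a real complication noted on a later slide) are not handled here.

```python
# Illustrative mirrored write path (not the ESX code). `src` and `dst`
# are assumed to expose write(offset, data); spanning IOs are ignored.

class MirrorFilter:
    def __init__(self, src, dst):
        self.src, self.dst = src, dst
        self.lock_start = 0   # start of the extent being copied right now
        self.lock_end = 0     # everything below lock_start is already copied
        self.deferred = []    # writes queued until the lock moves on

    def write(self, offset, data):
        end = offset + len(data)
        if end <= self.lock_start:
            # Region I, already copied: mirror synchronously to both disks.
            self.src.write(offset, data)
            self.dst.write(offset, data)
        elif offset >= self.lock_end:
            # Region III, not yet copied: source only; the single-pass
            # bulk copy will pick this data up when it gets there.
            self.src.write(offset, data)
        else:
            # Region II, locked while being copied: defer until release.
            self.deferred.append((offset, data))
```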

28

IO Mirroring – ESX 5.0

[Diagram: an ESX host mirroring the VM's writes to the source and destination VMDKs while the bulk copy proceeds]


29

Migration Time vs. Varying OIO

[Chart: migration time (s) vs. workload OIO (2–32), Snapshot vs. DBT vs. Mirror]


30

Downtime vs. Varying OIO

[Chart: downtime (s) vs. workload OIO (2–32), Snapshot vs. DBT vs. Mirror]


31

Total Penalty vs. Varying OIO

[Chart: effective lost workload time (s) vs. workload OIO (2–32), Snapshot vs. DBT vs. Mirror]


32

IO Mirroring

- Benefits
  • Minimal migration time
  • Near-zero downtime
  • Atomicity
- Challenges
  • Complex code to guarantee atomicity of the migration
  • Odd guest interactions require code for verification and debugging


33

Throttling the source


34

IO Mirroring to Slow Destination

[Chart: IOPS vs. time (s) between the start and end of a migration to a slower destination, showing source read IOPS, source write IOPS, and destination write IOPS]
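The throttling visible here falls out of synchronous mirroring: a mirrored write is acknowledged to the guest only after both disks complete it, so guest write throughput settles to roughly the slower side's rate with no explicit convergence logic. The toy below, using hypothetical disks with fixed latencies, just demonstrates that gating effect.

```python
# Toy demonstration of natural throttling: completion is gated on the
# slower of the two devices, so 100 mirrored writes take roughly 100x
# the slow disk's latency even though the source is much faster.
import asyncio, time

class FakeDisk:
    def __init__(self, latency_s):
        self.latency_s = latency_s
    async def write(self, offset, data):
        await asyncio.sleep(self.latency_s)   # model device service time

async def mirrored_write(src, dst, offset, data):
    await asyncio.gather(src.write(offset, data), dst.write(offset, data))

async def demo():
    fast_src, slow_dst = FakeDisk(0.001), FakeDisk(0.010)
    t0 = time.perf_counter()
    for _ in range(100):
        await mirrored_write(fast_src, slow_dst, 0, b"x" * 4096)
    print(f"100 mirrored writes took {time.perf_counter() - t0:.2f} s")

asyncio.run(demo())
```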


35

Agenda

- What is live migration?
- Migration architectures
- Lessons


36

Recap

- In the beginning there was live migration
- Snapshot:
  • Usually has the worst downtime/penalty
  • Whole-disk-level abstraction
  • Snapshot overheads due to metadata
  • No atomicity
- DBT:
  • Manageable downtime (except when OIO > 16)
  • Enabled block-level optimizations
  • Difficult to make convergence decisions
  • No natural throttling


37

Recap – Cont.

- Insight: storage is not memory
  • Interposing on all writes is practical and performant
- IO Mirroring:
  • Near-zero downtime
  • Best migration-time consistency
  • Minimal performance penalty
  • No convergence logic necessary
  • Natural throttling


38

Future Work

- Leverage workload analysis to reduce mirroring overhead
  • Defer mirroring regions with potential sequential write IO patterns
  • Defer hot blocks
  • Read ahead for lazy mirroring
- Apply mirroring to WAN migrations
  • New optimizations and hybrid architectures


39

Thank You!


40

Backup Slides


41

Exchange Migration with Heat

[Chart: data copied (MB) per iteration (1–10), split into hot and cold blocks, with a baseline without heat]


42

Exchange Workload

- Exchange 2010:
  • Workload generated by Exchange Load Generator
  • 2000 user mailboxes
  • Migrated only the 350 GB mailbox disk
- Hardware:
  • Dell PowerEdge R910: 8-core Nehalem-EX
  • EMC CX3-40
  • Migrated between two 6-disk RAID-0 volumes


43

Exchange Results

Type                   Migration Time (s)   Downtime (s)
DBT                    2935.5               13.297
Incremental DBT        2638.9               7.557
IO Mirroring           1922.2               0.220
DBT (2x)               Failed               -
Incremental DBT (2x)   Failed               -
IO Mirroring (2x)      1824.3               0.186


44

IO Mirroring Lock Behavior

- Moving the lock region:
  1. Wait for non-mirrored inflight read IOs to complete (queue all IOs)
  2. Move the lock range
  3. Release queued IOs

- Disk regions:
  • I – Mirrored region: write IOs mirrored
  • II – Locked region: IOs deferred until lock release
  • III – Unlocked region: write IOs to source only
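A minimal sketch of that three-step move, using a condition variable to quiesce and queue IOs, is below. It is an assumed shape for illustration; the names and the choice to track only inflight reads follow the wording of the steps above, not the actual ESX synchronization code.

```python
# Sketch of the lock-region move (illustration of the three steps above,
# not the ESX synchronization code).
import threading

class LockRegion:
    def __init__(self):
        self.cv = threading.Condition()
        self.start = 0
        self.end = 0
        self.inflight = 0        # non-mirrored read IOs currently in flight
        self.queueing = False    # True while new IOs must wait

    def move(self, new_start, new_end):
        with self.cv:
            self.queueing = True                 # 1. queue all new IOs ...
            while self.inflight > 0:             # ... and wait for inflight
                self.cv.wait()                   #     reads to complete
            self.start, self.end = new_start, new_end   # 2. move the range
            self.queueing = False                # 3. release queued IOs
            self.cv.notify_all()

    def io_begin(self):
        with self.cv:
            while self.queueing:
                self.cv.wait()
            self.inflight += 1

    def io_complete(self):
        with self.cv:
            self.inflight -= 1
            self.cv.notify_all()
```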


45

Non-trivial Guest Interactions

- Guest IOs crossing locked disk regions
- Guest buffer cache changing
- Overlapped IOs


46

Lock Latency and Data Mover Time

[Charts: data mover copy time (ms) and lock latency (ms) over time (s), comparing no contention, lock contention, and lock contention while IO mirroring locks the 2nd disk]


47

Source/Destination Valid Inconsistencies

- Normal guest buffer cache behavior
- This inconsistency is okay!
  • Source and destination are both valid, crash-consistent views of the disk

[Timeline: the guest OS issues an IO, the source and destination IOs are issued, and the guest OS modifies the buffer cache while they are in flight]


48

Source/Destination Valid Inconsistencies

- Overlapping IOs (synthetic workloads only)
- Seen in Iometer and other synthetic benchmarks
- File systems do not generate this pattern

[Diagram: two overlapping IOs to the virtual disk can be reordered, so the source and destination disks may apply IO 1 and IO 2 in different orders over the same LBAs]


49

Incremental DBT Optimization – ESX 4.1

[Diagram: each write to a disk block goes through a "dirty block?" decision and is either copied again or ignored]
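Read together with the copy/ignore diagram, the incremental optimization appears to hinge on the bulk-copy cursor: a write behind the cursor dirties a block that must be recopied, while a write ahead of it can be ignored because the bulk copy has not reached it yet. The sketch below illustrates that decision under this interpretation; the granularity and names are assumptions.

```python
# Sketch of the incremental DBT decision (assumed interpretation of the
# copy/ignore diagram; not the ESX implementation).

BLOCK = 256 * 1024   # assumed tracking granularity

class IncrementalDBT:
    def __init__(self):
        self.copy_cursor = 0   # byte offset the bulk copy has reached
        self.dirty = set()     # blocks that must be copied again

    def on_guest_write(self, offset, length):
        first = offset // BLOCK
        last = (offset + length - 1) // BLOCK
        for b in range(first, last + 1):
            if b * BLOCK < self.copy_cursor:
                self.dirty.add(b)   # already copied once: copy again later
            # else: ignore, the bulk copy will still reach this block
```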