Live migrating a container: pros, cons and gotchas

Pavel Emelyanov, Principal engineer @ Virtuozzo

Upload: docker-inc

Post on 23-Jan-2018

Page 1: Live migrating a container: pros, cons and gotchas

Pavel Emelyanov, Principal engineer @ Virtuozzo

Page 2: Agenda

• Why you might want to live migrate a container

• Why (and how) to avoid live migration

• Why is container live migration so complex

Page 3: Migration in a nutshell

• Save state

• Copy state

• Restore from state

Page 4: Why you might want to live migrate a container

• Spectacular

• Load balancing

• Updating kernel

– Can avoid live migration: checkpoint/restore (C/R) on the same node is enough

• Updating or replacing hardware

Page 5: Why to avoid live migration

Page 6: How to avoid live migration

• Balance network traffic

• Microservices

• Crash-driven updates

• Planned downtime

Page 7: Making live migration live

• State saving, transferring and restoring happen with tasks frozen

• (Big) memory transfer should not be done during that time

• Memory pre-copy

• Memory post-copy

Page 8: Pre-copy

• Track memory changes, copy memory while tasks are running, then repeat

• Pros:

– Safe: once migrated, source node can disappear

• Cons:

– Unpredictable: iterations may take long

– No guarantee of convergence: the “dirty” memory left for the next round may remain large
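
The two cons above can be illustrated with a toy model. This is only a sketch under a stated assumption, not CRIU's algorithm: while one round copies N pages, the still-running tasks dirty N * dirty_percent / 100 pages. The function name and all parameters are hypothetical.

```python
def simulate_precopy(total_pages, dirty_percent, threshold=64, max_rounds=10):
    """Return (rounds_done, pages_left_for_the_freeze).

    Toy model (an assumption): the pages dirtied during a round are
    proportional to the pages copied in that round.
    """
    to_send = total_pages              # the first round copies all memory
    rounds = 0
    while to_send > threshold and rounds < max_rounds:
        rounds += 1
        sent = to_send
        # Pages dirtied while this round was being copied.
        to_send = min(total_pages, sent * dirty_percent // 100)
    return rounds, to_send


if __name__ == "__main__":
    print(simulate_precopy(100_000, 10))    # slow writer: converges
    print(simulate_precopy(100_000, 100))   # fast writer: never converges
```

In this model a task that dirties memory as fast as it is copied leaves the whole working set for the freeze, no matter how many rounds are run, which is why an iteration limit is needed.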

Page 9: Post-copy

• Migrate all but memory, turn on “network swap” on destination

• Pros:

– Predictable: time to migrate can be well estimated

• Cons:

– Unsafe: if the source node dies, so does the container on the destination
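
The “network swap” idea and its failure mode can be sketched as follows. This is a simulation with hypothetical classes (SourceNode, LazyMemory), not the userfaultfd mechanism itself: pages are fetched from the source on first access and cached locally afterwards.

```python
class SourceNode:
    """Holds the pages left behind on the source (hypothetical helper)."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.alive = True

    def fetch(self, page_no):
        if not self.alive:
            raise ConnectionError("source node is gone")
        return self.pages[page_no]


class LazyMemory:
    """Destination-side memory that pulls a page on its first access."""

    def __init__(self, source):
        self.source = source
        self.local = {}
        self.remote_faults = 0

    def read(self, page_no):
        if page_no not in self.local:
            # "Page fault": the page still lives on the source node.
            self.local[page_no] = self.source.fetch(page_no)
            self.remote_faults += 1
        return self.local[page_no]


if __name__ == "__main__":
    src = SourceNode({0: b"code", 1: b"heap"})
    mem = LazyMemory(src)
    mem.read(0)            # fetched over the "network"
    src.alive = False      # the source dies before page 1 was pulled
    mem.read(0)            # still works: already local
    try:
        mem.read(1)
    except ConnectionError as err:
        print("container would die here:", err)
```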

Page 10: Live migration at length

• Memory pre-copy (iteratively, optional)

• Freeze + Save state

• Copy state

• Restore from state + Unfreeze and resume

• Memory post-copy (optional)

Page 11: Gotchas

VM vs. container

Page 12: Things to work with

• VM

– Environment: virtual hardware, paravirt

– CPU

– Memory

• Container

– Environment: cgroups, namespaces

– Processes and other animals

– Memory

Page 13: Memory pre-copy

• VM

– All memory at hand

– Plain address space

• Container

– Memory

● is scattered over the processes

● may or may not be shared

● may or may not be mapped to disk files
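
To make the container side concrete, here is a rough sketch that classifies the mappings of one process from /proc/<pid>/maps-style text. The sample data is made up, and the classification is deliberately simplistic (for instance, a private file mapping may still hold modified anonymous pages).

```python
# Made-up sample in the /proc/<pid>/maps format:
# address  perms  offset  dev  inode  pathname
SAMPLE_MAPS = """\
00400000-0040c000 r-xp 00000000 08:01 1311  /bin/cat
0060b000-0060c000 rw-p 0000b000 08:01 1311  /bin/cat
01b2f000-01b50000 rw-p 00000000 00:00 0     [heap]
7f1c2a000000-7f1c2a021000 rw-s 00000000 00:05 42  /dev/shm/example
7ffd3e5d0000-7ffd3e5f1000 rw-p 00000000 00:00 0   [stack]
"""


def classify(maps_text):
    """Return a list of (size_bytes, sharing, backing, path) per mapping."""
    result = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue                      # skip malformed lines
        addr, perms, _offset, _dev, inode = fields[:5]
        path = fields[5].strip() if len(fields) > 5 else ""
        start, end = (int(part, 16) for part in addr.split("-"))
        sharing = "shared" if perms[3] == "s" else "private"
        backing = "file" if inode != "0" else "anonymous"
        result.append((end - start, sharing, backing, path))
    return result


if __name__ == "__main__":
    for size, sharing, backing, path in classify(SAMPLE_MAPS):
        print(f"{size:>8} {sharing:<8} {backing:<10} {path}")
```

A migration tool has to repeat this kind of walk for every process in the container and then work out which mappings are shared between processes, which is what a VM's plain address space avoids.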

Page 14: Save state

• VM

– Hardware state

● Tree of ~100 objects

● Fixed amount of data per object

• Container

– State of all objects

● Graph of up to ~1000 objects

● Each has a different amount of data and a different API for reading it

Page 15: Restore from state

• VM

– Copy memory in place, write state into devices

• Container

– Creation of many small objects

– Not all of them have a sane API for creation

● Creation sequence can be non-trivial
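
One way to think about a non-trivial creation sequence is as a topological sort of the object graph. The sketch below uses Python's standard graphlib; the dependency table is hypothetical and much smaller than a real container, and this is not a description of how CRIU orders its restore.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: object -> objects that must exist first.
DEPENDS_ON = {
    "net namespace": set(),
    "mount namespace": set(),
    "process tree": {"mount namespace", "net namespace"},
    "listening socket": {"net namespace", "process tree"},
    "open file": {"mount namespace", "process tree"},
    "memory mapping of file": {"open file"},
}


def restore_order(depends_on):
    """Return the objects in an order where prerequisites come first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, name in enumerate(restore_order(DEPENDS_ON), start=1):
        print(step, name)
```

If the table contained a cycle, static_order() would raise graphlib.CycleError; such cases are where a restore needs special tricks rather than a plain ordering.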

Page 16: Memory post-copy

• UserfaultFD from Andrea Arcangeli

• VM

– Merged into Linux 4.3

• Container

– Non-cooperative operation of the uffd monitor and client needs further kernel patching

Page 17: And we also need this, this and this!

• Check CPU compatibility

• Check and load necessary kernel modules (iptables, filesystems)

• Non-shared filesystem should be copied

• Roll-back on source node if something fails in between

– Keep tasks frozen after dump, kill after restore
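
As an illustration of the first check in the list above, a minimal sketch: the destination CPU should offer every feature flag the source CPU has. It assumes x86-style /proc/cpuinfo text with a "flags" line; the sample strings and function names are hypothetical.

```python
def parse_flags(cpuinfo_text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_destination(src_cpuinfo, dst_cpuinfo):
    """Flags the source has but the destination lacks (empty set = compatible)."""
    return parse_flags(src_cpuinfo) - parse_flags(dst_cpuinfo)


# Made-up samples; real input would be read from /proc/cpuinfo on each node.
SRC = "model name\t: example\nflags\t\t: fpu sse sse2 avx avx2\n"
DST = "model name\t: example\nflags\t\t: fpu sse sse2 avx\n"

if __name__ == "__main__":
    missing = missing_on_destination(SRC, DST)
    if missing:
        print("refuse to migrate, destination lacks:", sorted(missing))
```

Migrating in the other direction (from DST to SRC) would pass this check, since extra features on the destination are harmless.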

Page 18: Implementation

• CRIU

– Save & restore state

– Memory pre/post copy

• P.Haul

– Checks

– Orchestrate all C/R steps

– Deal with filesystem

Page 19: P.Haul goals

• Provide an engine for container live migration using CRIU

• Perform necessary pre-checks (e.g. CPU compatibility)

• Organize memory pre-copy and/or post-copy

• Take care of file-system migration (if needed)

Page 20: Under the hood

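[Diagram: migration sequence between the src and dst nodes, each running docker -d, p.haul and CRIU. Steps shown: migrate; check (CPUs, kernels); pre-dump with memory transfer (pre-copy); FS copy; dump and transfer of other images; restore; lazy memory transfer (post-copy); kill; done. A "freeze time" marker accompanies the sequence.]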
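
The dump, restore and kill steps of this sequence, together with the roll-back rule from the earlier slide (keep tasks frozen after dump, kill after restore), can be sketched as below. All names are hypothetical stand-ins; this is not P.Haul's code, and pre-checks, pre-dump and FS copy are left out to keep the roll-back logic visible.

```python
class FakeContainer:
    """Stand-in for the container on the source node (hypothetical)."""

    def __init__(self):
        self.state = "running"


def migrate(container, restore_on_destination):
    """Return True if the container now runs on the destination.

    restore_on_destination is a hypothetical callable that raises on failure.
    """
    container.state = "frozen"        # dump: tasks stay frozen, not killed
    try:
        restore_on_destination()      # copy images and restore remotely
    except Exception:
        container.state = "running"   # roll back: thaw tasks on the source
        return False
    container.state = "killed"        # restore succeeded: drop the source copy
    return True


def failing_restore():
    raise RuntimeError("restore failed on destination")


if __name__ == "__main__":
    ok, bad = FakeContainer(), FakeContainer()
    print(migrate(ok, lambda: None), ok.state)
    print(migrate(bad, failing_restore), bad.state)
```

The point of the ordering is that the source copy is destroyed only after the destination is known to be running, so a failure in between costs downtime but not the container.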

Page 21: More info

• http://criu.org

• http://criu.org/P.Haul

[email protected]

• +CriuOrg / @__criu__

• https://github.com/xemul/(criu|p.haul)

Page 22: Thank you!

Pavel Emelyanov

@[email protected]