criu texas-linux-fest-2014
CRIU: Time and Space Travel Service for Linux Applications
Kir Kolyshkin
Texas Linux Fest, 14 Jun 2014
Agenda
What is CRIU?
Project history and state
Usage scenarios
- Live migration
- Reboot-less kernel upgrade
- Slow services startup
- Advanced debugging and testing
- and more...
What is CRIU?
Checkpoint Restore In Userspace
Checkpoint (or Dump)
Restore (or Restart)
Full info about state
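As a minimal sketch of what the two operations look like in practice, the helpers below build `criu dump` and `criu restore` command lines. The helper names, the PID and the image directory are illustrative assumptions, not taken from the talk. Actually running the commands requires root and an installed `criu` binary.

```python
from typing import List


def dump_cmd(pid: int, images_dir: str, shell_job: bool = False) -> List[str]:
    """Build a `criu dump` command line for the process tree rooted at pid."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if shell_job:
        # Needed when the task was started from an interactive shell.
        cmd.append("--shell-job")
    return cmd


def restore_cmd(images_dir: str, shell_job: bool = False) -> List[str]:
    """Build a `criu restore` command line that reads images from images_dir."""
    cmd = ["criu", "restore", "--images-dir", images_dir]
    if shell_job:
        cmd.append("--shell-job")
    return cmd


if __name__ == "__main__":
    # Hypothetical PID and directory, for illustration only.
    print(" ".join(dump_cmd(1234, "/tmp/ckpt", shell_job=True)))
    print(" ".join(restore_cmd("/tmp/ckpt", shell_job=True)))
```

The command lists can be passed to `subprocess.run(cmd, check=True)` on a host where CRIU is available.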
CRIU pre-history
OpenVZ project
Containers live migration feature
Containers into upstream Linux: 1500+ kernel patches from us
Kernel-level checkpoint-restore merge failed
User-level checkpoint-restore ...
Why in userspace?
Kernel
User-space
Dump:
- ptrace
- /proc
- netlink
- syscalls
Restore:
- syscalls
Process
kmod
C/R API
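To illustrate why a userspace dump is feasible at all, here is a small sketch that reads one kind of process state, the memory mappings, from `/proc`. It is only an illustration of the `/proc` interface listed above. CRIU itself is written in C and collects far more than this, and the `Mapping` type and function names are assumptions made for the example.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<pid>/maps."""
    # Fields: address range, permissions, offset, device, inode, optional path.
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], path)


def read_maps(pid: str = "self") -> List[Mapping]:
    """Read all memory mappings of a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]


if __name__ == "__main__":
    for m in read_maps()[:5]:
        print(f"{m.start:#x}-{m.end:#x} {m.perms} {m.path}")
```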
Some history
Project started almost 3 years ago
- an RFC on kernel memory API extension
- small command line tool
- minimal dump of process' internals
First release v0.1 -- 23 Jul 2012 (x86 and basic stuff)
Since then
- Kernel part completed a year ago (150+ kernel patches: new APIs for reading and setting process' state)
Current project state
The latest release: v1.3rc1
- supports x86_64, ARM and AArch64
- supports features that typical apps use
- works on unmodified Linux 3.11+
Included in Debian, Fedora, Ubuntu, Arch, SUSE, Gentoo, CoreOS...
Explicitly checked
- Apache, nginx, Oracle*, mysql, mongodb
- ssh/sshd, openvpn, cron, sendmail
- Java, gcc, make
- VNC + { gimp, mplayer, blender, supertux }
- Screen + { bash, top, tcpdump, tar/bz2 }
* some kernel tweaks required
Some vitals
- 55K lines of code
- 150+ kernel patches
- contribs from Google, Huawei, Samsung, Canonical
Usage scenarios
Live migration (incl. Docker, LXC, OpenVZ containers)
Kernel upgrade w/o reboot
Slow services startup
Periodic snapshots (HPC)
Advanced debugging and testing
Live migration
[Diagram: Host A, Host B, Shared FS]
Pre-migrate memory with memory tracker
http://criu.org/P.Haul
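A rough sketch of how memory pre-migration could be driven with the `criu` command line: a few `pre-dump` rounds while the application keeps running, then a final `dump` that only needs the pages changed since the last round. The function name, the directory layout (`pre0`, `pre1`, ..., `final`) and the number of rounds are assumptions for illustration. The option names are as I understand them from the CRIU documentation, so check them against `criu --help` before relying on them. Copying images between hosts, which P.Haul handles, is left out.

```python
from typing import List, Optional


def premigration_plan(pid: int, rounds: int) -> List[List[str]]:
    """Return criu command lines: `rounds` pre-dumps, then the final dump."""
    plan: List[List[str]] = []
    prev: Optional[str] = None
    for i in range(rounds):
        current = f"pre{i}"
        cmd = ["criu", "pre-dump", "--tree", str(pid), "--images-dir", current]
        if prev is not None:
            # The previous images path is given relative to --images-dir.
            cmd += ["--prev-images-dir", f"../{prev}"]
        plan.append(cmd)
        prev = current
    final = ["criu", "dump", "--tree", str(pid), "--images-dir", "final", "--track-mem"]
    if prev is not None:
        final += ["--prev-images-dir", f"../{prev}"]
    plan.append(final)
    return plan


if __name__ == "__main__":
    # Hypothetical PID, two pre-dump rounds.
    for cmd in premigration_plan(1234, rounds=2):
        print(" ".join(cmd))
```

Each round after the first refers to the previous image directory, so only memory that changed in between has to be written and transferred.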
Load balancing on cluster
[Diagram: Host A, Host B, Host C]
Power saving on cluster
[Diagram: Host A, Host B, Host C]
Node maintenance
[Diagram: Host A, Host B]
Kernel upgrade w/o reboot
[Diagram: Host: Kernel A, Kexec, Kernel B]
Slow services startup
# service foo start
Spawn process, Load config, Top-up caches, Initialize resource pools, Ready
Service readiness reaches 100% at time T
# service foo restore
Spawn process, Ready
Service readiness reaches 100% at time t < T
Periodic snapshots
Memory tracker helps to keep images smaller
HPC
[Timeline: progress 0%, 20%, 40%, 60%; power failure; back at 60%]
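The arithmetic behind the HPC timeline can be written down as a toy model: with a snapshot taken every fixed amount of progress, a failure only costs the work done since the last snapshot. The function names and the failure point of 75% are assumptions for illustration; the slide only shows snapshots at 20% steps and a restart at 60%.

```python
def resume_point(failure_at: float, interval: float) -> float:
    """Progress (in %) that is restored from the most recent snapshot."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return (failure_at // interval) * interval


def lost_work(failure_at: float, interval: float) -> float:
    """Progress (in %) that has to be recomputed after the failure."""
    return failure_at - resume_point(failure_at, interval)


if __name__ == "__main__":
    # Snapshots every 20% of the job, hypothetical failure at 75%.
    print(resume_point(75, 20), lost_work(75, 20))
```

Shorter intervals lose less work per failure but write more images, which is where the memory tracker helps by keeping each image small.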
Advanced debugging
Production Host: application in trouble
Developer Host: debugger
Advanced testing
...
New test or new hardware?
More (funny) use cases
Forgot to launch your program in screen
- Live-migrate it there
Playing a game without the save button
- Snapshot it
[Put your own use case here]
http://criu.org/Usage_scenarios
Recap
Started as containers live-migration tool
General tool to dump/restore application state
v1.2 + Linux-3.11+ can do the trick
A lot of interesting technologies
- Memory tracker
- Migration of TCP connections
- Injecting your code into a running application
- Detecting kernel objects sharing
- etc.
Resources
http://criu.org main site, documentation
http://git.criu.org git repo with tool sources
http://plus.google.com/+CRIU page
[email protected] mailing list
Kir Kolyshkin that's me
Thank you!