CRIU: Time and Space Travel Service for Linux Applications

Upload: openvz

Post on 28-Jul-2015

Category:

Software

TRANSCRIPT

1. CRIU: Time and Space Travel Service for Linux Applications
   Kir Kolyshkin
   Texas Linux Fest, 14 Jun 2014

2. Agenda
   What is CRIU?
   Project history and state
   Usage scenarios
      Live migration
      Reboot-less kernel upgrade
      Slow services startup
      Advanced debugging and testing
      and more...

3. What is CRIU?
   Checkpoint Restore In Userspace
   Checkpoint or Dump
   Restore or Restart
   Full info about state

4. CRIU pre-history
   OpenVZ project
   Containers live migration feature
   Containers in upstream Linux
   1500+ kernel patches from us
   Kernel-level checkpoint-restore merge failed
   User-level checkpoint-restore ...

5. Why in userspace?
   Dump: ptrace, /proc, netlink, syscalls
   Restore: syscalls
   [Diagram labels: Kernel, User-space, Process, kmod, C/R API]

6. Some history
   Project started almost 3 years ago
      an RFC on kernel memory API extension
      small command line tool
      minimal dump of process' internals
   First release v0.1: 23 Jul 2012 (x86 and basic stuff)
   Since then
      Kernel part completed a year ago
      (150+ kernel patches: new APIs for reading and setting process' state)

7. Current project state
   The latest release, v1.3rc1
      supports x86_64, ARM and AArch64
      supports features that typical apps use
      works on unmodified linux-3.11+
   Included into Debian, Fedora, Ubuntu, Arch, SUSE, Gentoo, CoreOS...
   Explicitly checked
      Apache, nginx, Oracle*, mysql, mongodb
      ssh/sshd, openvpn, cron, sendmail
      Java, gcc, make
      VNC + { gimp, mplayer, blender, supertux }
      Screen + { bash, top, tcpdump, tar/bz2 }
   * some kernel tweaks required

8. Some vitals
   55K lines of code
   150+ kernel patches
   contribs from Google, Huawei, Samsung, Canonical

9. Usage scenarios
   Live migration, incl. Docker, LXC, OpenVZ containers
   Kernel upgrade w/o reboot
   Slow services startup
   Periodic snapshots (HPC)
   Advanced debugging and testing

10. Live migration
    [Diagram: Host A, Host B]

11. Live migration
    [Diagram: Host A, Host B]
    Shared FS
    Pre-migrate memory with memory tracker
    http://criu.org/P.Haul

12. Load balancing on cluster
    [Diagram: Host A, Host B, Host C]

13. Power saving on cluster
    [Diagram: Host A, Host B, Host C]

14. Node maintenance
    [Diagram: Host A, Host B]

15. Kernel upgrade w/o reboot
    [Diagram: Host, Kernel A, Kexec, Kernel B]

16. Slow services startup
    # service foo start
    [Chart: service readiness over time, reaching 100% at time T]
    Stages: Spawn process, Load config, Top-up caches, Initialize resource pools, Ready

17. Slow services startup
    # service foo restore
    [Chart: service readiness over time, reaching 100% at time Tt < T]
    Stages: Spawn process, Ready

18. Periodic snapshots
    [Diagram: snapshots along a time axis]
    Memory tracker helps to keep images smaller

19. HPC
    [Diagram: progress along a time axis: 0%, 20%, 40%, 60%, 60%; Power failure]

20. Advanced debugging
    [Diagram: Production Host with Application in trouble; Developer Host with Debugger]

21. Advanced testing
    ...
    New test or new hardware?

22. More (funny) use cases
    Forgot to launch your program in screen? Live-migrate it there
    Playing a game without the save button? Snapshot it
    [Put your own use case here]
    http://criu.org/Usage_scenarios

23. Recap
    Started as containers live-migration tool
    General tool to dump/restore apps state
    v1.2 + Linux-3.11+ can do the trick
    A lot of interesting technologies
       Memory tracker
       Migration of TCP connections
       Injecting your code into a running application
       Detecting kernel objects sharing
       etc.

24. Resources
    http://criu.org: main site, documentation
    http://git.criu.org: git repo with tool sources
    http://plus.google.com/+CRIU: page
    [email protected]: mailing list
    Kir Kolyshkin: that's me
    Thank you!
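
The slides name the checkpoint (dump), restore, and periodic-snapshot scenarios but show no commands. The sketch below is one rough way those steps might be driven from a script; it is not taken from the talk. It only builds `criu` command lines and does not run them. The helper names (`dump_cmd`, `restore_cmd`, `snapshot_plan`), the PID 1234 and the `/tmp/...` directories are hypothetical. The options used (`dump`, `restore`, `-t`, `-D`, `--shell-job`, `--leave-running`, `--track-mem`, `--prev-images-dir`) are written from memory of the criu command line and should be checked against `criu --help` for the installed version.

```python
from pathlib import Path


def dump_cmd(pid, images_dir, shell_job=False, leave_running=False):
    """Build a `criu dump` command for the process tree rooted at pid."""
    cmd = ["criu", "dump", "-t", str(pid), "-D", str(images_dir)]
    if shell_job:
        # Assumption: needed when the task was started from an interactive shell.
        cmd.append("--shell-job")
    if leave_running:
        # Keep the application running after the images are written.
        cmd.append("--leave-running")
    return cmd


def restore_cmd(images_dir, shell_job=False):
    """Build a `criu restore` command that reads images from images_dir."""
    cmd = ["criu", "restore", "-D", str(images_dir)]
    if shell_job:
        cmd.append("--shell-job")
    return cmd


def snapshot_plan(pid, base_dir, rounds):
    """Build commands for periodic snapshots into base_dir/0, base_dir/1, ...

    Every round turns on memory tracking; rounds after the first point at the
    previous round's images so that only changed memory needs to be written.
    Assumption: --prev-images-dir is given relative to the -D directory.
    """
    base = Path(base_dir)
    plan = []
    for i in range(rounds):
        cmd = dump_cmd(pid, base / str(i), leave_running=True)
        cmd.append("--track-mem")
        if i > 0:
            cmd += ["--prev-images-dir", f"../{i - 1}"]
        plan.append(cmd)
    return plan


if __name__ == "__main__":
    # Print the commands instead of executing them (criu needs root to run).
    print(" ".join(dump_cmd(1234, "/tmp/img", shell_job=True)))
    print(" ".join(restore_cmd("/tmp/img", shell_job=True)))
    for cmd in snapshot_plan(1234, "/tmp/snap", 3):
        print(" ".join(cmd))
```

Building the argument lists separately from running them keeps the sketch safe to try on any machine; on a host with criu installed, the lists could be passed to `subprocess.run` as root.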