Distributed Object Storage System Ceph in Practice
Dominik Joe Pantůček, dominik.pantucek@trustica.cz
Trustica
8.10.2016
Dominik Joe Pantucek Trustica Practical Ceph 8.10.2016 1 / 32
Legal notice.
Why?
- Why daily operations?
  - We need convenient deployment of VMs: private IaaS cloud.
- Why cloud?
  - It’s convenient.
- Why private cloud?
  - Negotiations with cloud service providers are tough.
- Why Ceph?
  - Everything else failed miserably.
What is Ceph?
- Distributed,
- highly scalable,
- open source,
- object storage...
  - block storage,
  - file system
  - and other applications.
A parallel universe, where, thanks to masterfully written Ceph, peace and prosperity prevail.
Ceph Architecture
- Daemons:
  - OSD: Object Storage Daemon
  - Monitor
  - Metadata Server (MDS)
- Nodes:
  - OSD, Monitor, MDS,
  - client,
  - admin.
Ceph OSD
- Object Storage Device
- Disk or at least part of it.
- The Ceph OSD daemon.
- Multiple OSDs in one physical node,
- managed in pools by monitors.
Ceph Monitor
- Cluster state: the cluster map,
- consisting of maps:
  - monitor map,
  - OSD map,
  - placement group map,
  - CRUSH map,
  - MDS map.
- Only 1 monitor and 2 OSDs are needed for a bare-minimum cluster.
Ceph (OSD) Hierarchy
Cluster:
- Node
  - OSD
  - Monitor
  - MDS
  - Client
Object placement
- No files or directories.
- Object storage stores only objects.
- Behaves like a key:value store.
- The value can be really big.
- Find balance:
  - across all OSDs
  - and across nodes.
Placement groups
- Set of OSDs within a pool
- which can store an object.
- Size is based on the pool's number of replicas.
- The number of objects is vast compared to the number of PGs.
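A minimal sketch of why many objects end up sharing each placement group: an object name is hashed and reduced to a PG index. The use of CRC32, the modulo step, the pool size `PG_NUM` and the object names are all assumptions for illustration; Ceph uses its own hash function and mapping.

```python
import zlib


def object_to_pg(object_name: str, pg_num: int) -> int:
    """Map an object name to a placement group index (simplified)."""
    # CRC32 is a stand-in for Ceph's own hash; this is an illustrative assumption.
    return zlib.crc32(object_name.encode("utf-8")) % pg_num


PG_NUM = 128  # hypothetical number of placement groups in a pool
names = [f"object-{i}" for i in range(100_000)]  # hypothetical object names

counts = [0] * PG_NUM
for name in names:
    counts[object_to_pg(name, PG_NUM)] += 1

# Far more objects than PGs, so each PG holds many objects.
print("objects:", len(names), "PGs:", PG_NUM, "largest PG:", max(counts))
```

The same name always lands in the same PG, so no lookup table is needed to find it again.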
CRUSH
- Controlled Replication Under Scalable Hashing.
- Each client computes which PGs to use.
- Describes cluster hierarchy as a weighted tree.
- Selects sets of disks based on deterministic criteria.
- Does not need any central authority.
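The idea of deterministic, weighted selection without a central authority can be sketched with weighted rendezvous hashing. This is a swapped-in technique, not the actual CRUSH algorithm: it ignores the cluster hierarchy and treats all OSDs as a flat set. The OSD names and weights are hypothetical.

```python
import hashlib
import math


def _unit_hash(pg_id: int, osd: str) -> float:
    """Deterministic pseudo-random value strictly inside (0, 1) for a (PG, OSD) pair."""
    digest = hashlib.sha256(f"{pg_id}:{osd}".encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (value + 0.5) / 2**52


def select_osds(pg_id: int, weights: dict[str, float], replicas: int) -> list[str]:
    """Pick `replicas` distinct OSDs for a PG; heavier OSDs win more often."""
    # Weighted rendezvous score: -w / ln(h). ln(h) is negative, so the score is positive.
    ranked = sorted(
        weights,
        key=lambda osd: -weights[osd] / math.log(_unit_hash(pg_id, osd)),
        reverse=True,
    )
    return ranked[:replicas]


# Hypothetical cluster: osd.2 has twice the capacity of the others.
weights = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}
print(select_osds(42, weights, replicas=3))
```

Any client holding the same weights computes the same answer, which is the property the slide attributes to CRUSH.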
RADOS
- Set of protocols governing CRUSH,
- RBD and
- metadata.
- Reliable Autonomic Distributed Object Store.
Monitors
- Cluster management,
- voting,
- OSD handling,
- yes, everything.
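Voting among monitors relies on a strict majority. The arithmetic below is a small sketch of what that means for choosing the number of monitors; the function names are hypothetical helpers, not part of any Ceph tool.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


for n in (1, 3, 4, 5):
    print(f"{n} monitors: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failures")
```

Four monitors tolerate no more failures than three, which is why odd counts are the usual choice.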
Ceph File System
- Built on top of the underlying objects,
- POSIX file system with advanced features,
- directory size is reported immediately,
- virtually no limits on file sizes,
- data/metadata separate redundancy settings:
  - different pools.
Metadata Server
- Separate pool for metadata,
- MDS serves object metadata,
- reasonable performance,
- required for the actual FS implementation.
In the cloud
- Amazon Simple Storage Service (S3)?
- OpenStack?
- Swift.
- Ceph backend.
RBD for disks
- Block device,
- from multiple objects,
- naming conventions: prefix,
- provides caching,
- true concurrent access.
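How a block device can be assembled from multiple prefix-named objects is sketched below: a byte offset on the device is split into an object index and an offset inside that object. The 4 MiB object size, the `rbd_data.abc123` prefix and the 16-digit hexadecimal suffix are assumptions for illustration, and striping options are ignored.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size: 4 MiB


def locate(prefix: str, offset: int, object_size: int = OBJECT_SIZE) -> tuple[str, int]:
    """Return (object name, offset within that object) for a device byte offset."""
    index, inner = divmod(offset, object_size)
    # Object names share the image prefix and differ only in the index suffix.
    return f"{prefix}.{index:016x}", inner


# Hypothetical image prefix; byte offset 10 MiB falls into the third object.
print(locate("rbd_data.abc123", 10 * 1024 * 1024))
# ('rbd_data.abc123.0000000000000002', 2097152)
```

Objects that were never written simply do not exist, which is what makes thin provisioning possible.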
Ceph FS for shared data
- ISO images,
- configurations,
- shared data.
Ceph FS release
- Stable in the 10.x series,
- available as FUSE.
Kernel support
- Both Ceph RBD and the file system.
- Ceph FS support merged into 2.6.34 (released May 16, 2010); RBD followed in 2.6.37.
- Convenient usage.
- Ceph FS:
    mount -t ceph mon1,mon2,mon3:/ /mnt/ceph
- RBD:
    rbd map mypool/mydevice
    mkfs -t ext4 /dev/rbd1
But...
RBD thin provisioning
- Provision 20 PB,
- wait 8 days before de-provisioning finishes
- ...
- profit???
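A rough back-of-the-envelope sketch of why removing a huge thin-provisioned image can take days, assuming that removal has to visit every potential backing object. The 4 MiB object size and binary petabytes are assumptions; only the 20 PB and 8 days come from the slide.

```python
IMAGE_BYTES = 20 * 2**50         # 20 PB, taken as binary petabytes (assumption)
OBJECT_BYTES = 4 * 2**20         # assumed 4 MiB per backing object
REMOVAL_SECONDS = 8 * 24 * 3600  # the 8 days quoted above

potential_objects = IMAGE_BYTES // OBJECT_BYTES
rate = potential_objects / REMOVAL_SECONDS

print(potential_objects)  # 5368709120 potential objects
print(round(rate))        # 7767 objects checked per second on average
```

Under these assumptions, even an empty image means billions of per-object operations.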
RBD locking
- $ rbd map
- On two nodes,
- on one mkfs, on the other mount,
- everything works,
- another time on a single node...
- processes stuck in D state indefinitely.
- $ reboot
Ceph-FS locking
- Processes in D state any time,
- no way to unmount,
- $ reboot
Log clogging
- libceph ...
- Does not prevent the locking bugs.
- Does not even log them ...
- $ reboot
Userspace RBD
- RBD as storage backend for VMs
- in KVM: librbd in userspace (qemu-kvm)
Userspace Ceph filesystem
- In 10.x: the ceph-fuse implementation.
- Stable.
- Decent performance.
RBD cache
- Performance is an issue,
- commits are a problem,
- always use a backup battery for the controllers.
References
- Ceph, http://ceph.com
- Bugemos, Jojin&HedgeHog: The Chronicles of KOS - Alternativní přítomnost, 2006, available online at http://www.bugemos.com/?q=node/357
- CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data, Sage A. Weil, Scott A. Brandt, Ethan L. Miller and Carlos Maltzahn, Storage Systems Research Center, University of California, Santa Cruz, SC2006, November 2006, Tampa, Florida, USA, 0-7695-2700-0/06, available online at http://ceph.com/papers/weil-crush-sc06.pdf
Questions
... and answers.
Thank you!