Global deduplication for Ceph

Global deduplication for Ceph Myoungwon Oh SW-Defined Storage Lab SK Telecom

Post on 18-May-2022

Page 1: Global deduplication for Ceph

Global deduplication for Ceph

Myoungwon Oh

SW-Defined Storage Lab

SK Telecom

Page 2: Global deduplication for Ceph

Agenda

1. Why do we need global dedup?

2. Ceph deduplication design

3. Ceph extensible tier (implementation)

4. Upstream

5. Plan & issues

Page 3: Global deduplication for Ceph

SK with software-defined storage

5G, UHD 4K

Flash device: High Performance, Low Latency, SLA

Scalable, Available, Reliable, Unified Interface, Open Platform

High Performance, Low Latency

All-flash Ceph!

Contribution: QoS, Deduplication, etc.

Storage Solution for
- Private Cloud for Developers
- Virtual Desktop Infra

Page 4: Global deduplication for Ceph

Why do we need global dedup?

• Up to 40% of total storage space can be saved via deduplication (in our private cloud)
• Local dedup (at the block device level) cannot capture the data reduction available cluster-wide

A) Design comparison

[Figure: 1) Local deduplication and 2) Global deduplication, each showing two copies of Data A stored across A, B, C, D]

B) FIO workload with a deduplication ratio of 50% (32 KB block size)

               4 OSD    8 OSD    12 OSD   16 OSD
Local dedup    15.5%    8.1%     5.5%     4.1%
Global dedup   50%      50%      50%      50%

Page 5: Global deduplication for Ceph

Design challenges

• Which implementation is the most appropriate for shared-nothing scale-out storage?
  § Applicable to existing source
  § Transparent to the application
  § Efficient metadata management

• How to manage dedup metadata?

• What is the most appropriate dedup method (e.g., inline or post)?
  § Performance
  § I/O cost

Page 6: Global deduplication for Ceph

Design 1: Double distribution hash

• Do we need a new MDS (metadata server) for dedup?
  § A shared-nothing file system is scalable because there is no MDS.
  § An MDS does not fit the shared-nothing design
  § An MDS needs additional I/Os to complete I/O requests
    § E.g., metadata query to the MDS
  § How to rebalance if we add an MDS?
  § Synchronization between MDSs

Page 7: Global deduplication for Ceph

Design 1: Double distribution hash

• Can we implement dedup without an MDS?
  § Yes, a CAS (content-addressable storage) pool with double distribution hash!
  § (OID, Data) - chunking and fingerprinting -> (OID, Offsets[], FPs[]) - FP is new OID -> (FP, Data)
  § Pros
    § No central MDS
    § Applicable to existing source without major modifications
    § Transparent to the application
    § Efficient metadata management
    § Reusing existing architecture
      § DH, recovery, rebalance, data placement
  § Cons
    § I/O redirection (needs a translation layer)
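The (OID, Data) -> (OID, Offsets[], FPs[]) -> (FP, Data) flow above can be sketched in a few lines. This is an illustrative toy, not Ceph code: `Cluster`, `ChunkRef`, `fingerprint` and `place` are hypothetical names, `std::hash` stands in for a real collision-resistant fingerprint such as SHA-1 or SHA-256, and a plain hash modulo stands in for Ceph's CRUSH placement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a real fingerprint (e.g. SHA-1/SHA-256).
// std::hash is NOT collision-resistant; it only keeps the sketch dependency-free.
std::string fingerprint(const std::string& data) {
    return "fp-" + std::to_string(std::hash<std::string>{}(data));
}

// Hypothetical placement: the same function places both object OIDs (first hash)
// and fingerprint OIDs (second hash). Ceph uses CRUSH, not a plain modulo.
std::size_t place(const std::string& oid, std::size_t num_osds) {
    return std::hash<std::string>{}(oid) % num_osds;
}

struct ChunkRef {
    std::uint64_t offset;
    std::string fp;  // fingerprint, reused as the chunk object's OID
};

struct Cluster {
    std::map<std::string, std::vector<ChunkRef>> metadata_pool;  // OID -> (Offsets[], FPs[])
    std::map<std::string, std::string> chunk_pool;               // FP  -> Data

    void write(const std::string& oid, const std::string& data, std::size_t chunk_size) {
        assert(chunk_size > 0);
        std::vector<ChunkRef> refs;
        for (std::size_t off = 0; off < data.size(); off += chunk_size) {
            std::string chunk = data.substr(off, chunk_size);
            std::string fp = fingerprint(chunk);
            chunk_pool.emplace(fp, chunk);  // no-op if an identical chunk already exists
            refs.push_back(ChunkRef{off, fp});
        }
        metadata_pool[oid] = refs;
    }

    // The translation layer: follow the object's fingerprints into the chunk pool.
    std::string read(const std::string& oid) const {
        std::string out;
        for (const ChunkRef& r : metadata_pool.at(oid)) out += chunk_pool.at(r.fp);
        return out;
    }
};
```

Because identical chunks produce the same fingerprint, and therefore the same OID and the same placement, two objects that share a chunk end up referencing one stored copy without any central lookup service.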

Page 8: Global deduplication for Ceph

Design 2: Self-contained object for deduplication

• An external metadata structure needs additional complex linking between deduplication metadata and the existing scale-out storage system

• A self-contained object can be the answer
  § Dedup metadata is included in the original object

[Figure: Client sends Data to the Base Tier (Metadata Pool) and the Dedup Tier (Chunk Pool)]

Object foo: Object Size = 4 MB, Chunk Size = 1 MB

Metadata Pool: foo-object (has manifest)
  {0-32K, fxc039
   32-64K, Dxc045
   64-128K, fZc0y9}

Chunk Pool: Objects (chunked objects)
  OID: fxc039, Reference count: 1
  OID: Dxc045, Reference count: 2
  OID: fZc0y9, Reference count: 4

Post-processing

1. Find the dirty metadata object which contains dirty chunks from the dirty object ID list.

2. Find the dirty chunk ID from the dirty metadata object's chunk map.

3. The deduplication engine generates a chunk object and sends it to the chunk pool.

4. In the chunk pool, the chunk object generated in step 3 is placed.

5. Add reference count information to the object.

6. When the chunk write at the chunk pool ends, update the metadata object's chunk map.

[Figure label: A single chunk]
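The six post-processing steps can be sketched as one pass over the dirty object ID list. This is a minimal sketch under stated assumptions, not the deck's implementation: `DedupEngine`, `MetadataObject`, `ChunkEntry` and `ChunkObject` are hypothetical types, both pools are in-memory maps, `std::hash` again stands in for a real fingerprint, and dropping references to overwritten chunks is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <string>

struct ChunkEntry {
    std::uint64_t length = 0;
    std::string chunk_oid;  // empty until the chunk has been flushed
    bool dirty = true;
};

struct MetadataObject {
    std::string data;                               // bytes still held in the base tier
    std::map<std::uint64_t, ChunkEntry> chunk_map;  // offset -> chunk entry
};

struct ChunkObject {
    std::string data;
    std::uint64_t refcount = 0;
};

struct DedupEngine {
    std::map<std::string, MetadataObject> metadata_pool;
    std::map<std::string, ChunkObject> chunk_pool;
    std::set<std::string> dirty_oids;  // the dirty object ID list

    // Hypothetical stand-in for a real fingerprint function.
    static std::string fingerprint(const std::string& d) {
        return "fp-" + std::to_string(std::hash<std::string>{}(d));
    }

    void post_process() {
        for (const std::string& oid : dirty_oids) {           // step 1: dirty metadata objects
            MetadataObject& obj = metadata_pool.at(oid);
            for (auto& kv : obj.chunk_map) {                   // step 2: dirty chunks in the map
                ChunkEntry& e = kv.second;
                if (!e.dirty) continue;
                assert(kv.first + e.length <= obj.data.size());
                std::string bytes = obj.data.substr(kv.first, e.length);
                std::string fp = fingerprint(bytes);           // step 3: generate the chunk object
                ChunkObject& c = chunk_pool[fp];               // step 4: place it in the chunk pool
                if (c.refcount == 0) c.data = bytes;
                ++c.refcount;                                  // step 5: reference count
                e.chunk_oid = fp;                              // step 6: update the chunk map
                e.dirty = false;
            }
        }
        dirty_oids.clear();
    }
};
```

In this sketch a chunk that already exists in the chunk pool only has its reference count raised, which is where the space saving comes from.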

Page 9: Global deduplication for Ceph

Implementation: Extensible tier

• The key structure for extensible tier

struct object_manifest_t {
  enum {
    TYPE_NONE = 0,
    TYPE_REDIRECT = 1,
    TYPE_CHUNKED = 2,
    TYPE_DEDUP = 3,
  };
  uint8_t type;  // redirect, chunked, ...
  ghobject_t redirect_target;
  map<uint64_t, chunk_info_t> chunk_map;  // key is the offset
};

• Operations
  § Proxy read, write
  § Flush, promote

[Figure labels: object_info_t, obj]
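To show how a manifest of this shape could drive a proxy read, here is a hedged sketch that decides which object should serve a read at a given offset. It is not Ceph's code path: `manifest`, `chunk_info` and `resolve_read` are hypothetical simplifications, `std::string` replaces `ghobject_t`, and `TYPE_DEDUP` is not modelled because the slide does not describe its behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Simplified stand-in for chunk_info_t; the real type carries more fields.
struct chunk_info {
    std::string target_oid;
    std::uint64_t length;
};

// Simplified stand-in for object_manifest_t.
struct manifest {
    enum type_t : std::uint8_t {
        TYPE_NONE = 0, TYPE_REDIRECT = 1, TYPE_CHUNKED = 2, TYPE_DEDUP = 3
    };
    type_t type = TYPE_NONE;
    std::string redirect_target;
    std::map<std::uint64_t, chunk_info> chunk_map;  // key is the offset
};

// Returns the OID that should serve a read at `offset`, or `self` when the
// data is expected to be local (no manifest, or the offset is not covered).
std::string resolve_read(const std::string& self, const manifest& m, std::uint64_t offset) {
    switch (m.type) {
    case manifest::TYPE_REDIRECT:
        assert(!m.redirect_target.empty());  // a redirect must name its target
        return m.redirect_target;
    case manifest::TYPE_CHUNKED: {
        auto it = m.chunk_map.upper_bound(offset);   // first chunk starting after offset
        if (it == m.chunk_map.begin()) return self;  // nothing starts at or before offset
        --it;
        if (offset < it->first + it->second.length) return it->second.target_oid;
        return self;  // offset falls outside every chunk
    }
    default:
        return self;  // TYPE_NONE, and TYPE_DEDUP which this sketch does not model
    }
}
```

With the foo-object manifest from the earlier slide (0-32K -> fxc039, 32-64K -> Dxc045, 64-128K -> fZc0y9), a read at offset 40000 would resolve to Dxc045 in this sketch.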

Page 10: Global deduplication for Ceph

Implementation: write path

RBD Client
  Write(foo, offset, size)

Base tier: handle a write request, eviction limit (chunk case)
  if check eviction limit (all chunks are dirty)
    Write request is blocked until the dirty object is flushed
  else
    Handle a write request
  1. Chunk and write the object
  2. Update chunk_map (clean -> dirty)

Base tier (post-processing)
  1. Get dirty chunk list
  2. while (Chunklist.dirty_chunks())
     {
       if (has_old_reference)
         decrement old chunk's reference
       objecter->write(Chunklist[i])
       i++;
     }
  3. Receive all of the Acks and update the chunks' state (dirty -> clean)

Lower tier
  Write chunk data
  Increment reference count

Base tier (set-redirect or set-chunk)
  If chunk object
    set_chunk(source, target)
  else if redirect object
    set_redirect(source, target)
  Handle set_chunk or set_redirect
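The write path above combines three ideas: writes mark chunks dirty, a write is blocked once every chunk is dirty, and a flush drops the old chunk's reference before writing the new one. The sketch below models only that bookkeeping. `Tiers`, `BaseObject` and `Chunk` are hypothetical names, the lower tier is reduced to a reference-count map, acknowledgements are treated as immediate, and reading the eviction limit as "all chunks are dirty" follows the slide's wording rather than any confirmed Ceph behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct Chunk {
    std::string data;
    std::string flushed_oid;  // OID in the lower tier; empty if never flushed
    bool dirty = false;
};

struct BaseObject {
    std::vector<Chunk> chunks;
    // Eviction limit as read from the slide: every chunk is dirty.
    bool all_dirty() const {
        if (chunks.empty()) return false;
        for (const Chunk& c : chunks)
            if (!c.dirty) return false;
        return true;
    }
};

struct Tiers {
    std::map<std::string, std::uint64_t> lower_refcount;  // chunk OID -> reference count

    // Returns false when the write has to wait until the dirty object is flushed.
    bool write(BaseObject& obj, std::size_t idx, const std::string& data) {
        assert(idx < obj.chunks.size());
        if (obj.all_dirty()) return false;
        obj.chunks[idx].data = data;
        obj.chunks[idx].dirty = true;  // clean -> dirty
        return true;
    }

    // Post-processing: flush every dirty chunk to the lower tier.
    void flush(BaseObject& obj) {
        for (Chunk& c : obj.chunks) {
            if (!c.dirty) continue;
            if (!c.flushed_oid.empty())
                --lower_refcount[c.flushed_oid];  // decrement old chunk's reference
            // Hypothetical fingerprint; std::hash stands in for a real one.
            std::string oid = "fp-" + std::to_string(std::hash<std::string>{}(c.data));
            ++lower_refcount[oid];  // write chunk data, increment reference count
            c.flushed_oid = oid;
            c.dirty = false;        // dirty -> clean (ack assumed immediate)
        }
    }
};
```

A caller would retry a rejected write after `flush` has run; a real implementation would also remove lower-tier chunks whose reference count drops to zero, which this sketch does not do.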

Page 11: Global deduplication for Ceph

Upstream

• Proposal
  § http://marc.info/?l=ceph-devel&m=148172886923985&w=2

• Design
  § http://marc.info/?l=ceph-devel&m=148646542200947&w=2
  § Pad document (with Sage Weil)
    • http://pad.ceph.com/p/deduplication_how_dedup_manifists
    • http://pad.ceph.com/p/deduplication_how_do_we_store_chunk
    • http://pad.ceph.com/p/deduplication_how_do_we_chunk
    • http://pad.ceph.com/p/deduplication_how_to_drive_dedup_process

• Progress
  § osd,librados: add manifest, redirect
    https://github.com/ceph/ceph/pull/14894
  § osd,librados: add manifest, operations for chunked object
    https://github.com/ceph/ceph/pull/15482
  § osd: flush operations for chunked objects
    https://github.com/ceph/ceph/pull/19294
  § osd, librados: add a rados op (TIER_PROMOTE)
    https://github.com/ceph/ceph/pull/19362
  § WIP: osd: refcount for manifest object (redirect, chunked)
    https://github.com/ceph/ceph/pull/19935

Page 12: Global deduplication for Ceph

Plan & Issues

• Plan (To Do)
  § Reference counting methods and data types for redirect and chunk
  § Offline fingerprinting and then storing of dedup chunked manifest (whole object or parts of it)
  § Dedup processing
  § Background dedup worker
  § Refcount manager and methods for dedup (http://pad.ceph.com/p/deduplication_how_do_we_store_chunk)
  § Fixed-sized backpointers
  § Scrub
  § Test cases

• Issues
  § Small chunk (< 64 KB)
  § Minimizing performance degradation
  § Dedup methods (inline?)
  § CDC (content-defined chunking)