Redesigning Protection Storage for Modern Workloads · 2019-09-16
TRANSCRIPT
Redesigning Protection Storage for Modern Workloads
Can’t we all get along?
Yamini Allu, Fred Douglis*, Mahesh Kamat, Ramya Prabhakar, Philip Shilane, Rahul Ugale
* Perspecta Labs
Data Domain File System
• Purpose-built backup appliance
– Designed to identify duplicate regions of files and replace them with references
– Designed around backup workloads, where data is typically written and read sequentially
• De-duplication
– Content-defined chunks, fingerprinted with a secure hash
– Generally claims 10–40x data reduction
• Avoiding disk bottlenecks in Data Domain [Zhu 08]
– SISL: Stream-Informed Segment Layout
– Locality-preserved caching for segments
De-duplication storage pipeline:
1. Ingest via access protocols (NFS, CIFS, VTL, DDBOOST)
2. Partition data into chunks
3. Fingerprint chunks uniquely
4. Filter duplicates
5. Locally compress
6. Store to disk
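The pipeline above can be sketched in a few lines. This is a toy illustration, not the Data Domain implementation: it uses fixed-size chunks for brevity (the real system uses content-defined chunking), an in-memory dict as a stand-in for the fingerprint index, and zlib for local compression. The class and method names are invented for this example.

```python
import hashlib
import zlib

class DedupStore:
    """Toy sketch of the pipeline: chunk -> fingerprint -> filter -> compress -> store."""

    def __init__(self, chunk_size=8192):
        self.chunk_size = chunk_size  # fixed-size here; the real system is content-defined
        self.index = {}               # fingerprint -> container slot (stand-in for the FPI)
        self.containers = []          # compressed unique chunks

    def write(self, data: bytes):
        refs = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            fp = hashlib.sha256(chunk).digest()          # secure-hash fingerprint
            if fp not in self.index:                     # filter duplicates
                self.index[fp] = len(self.containers)
                self.containers.append(zlib.compress(chunk))  # local compression
            refs.append(fp)                              # file recipe: list of fingerprints
        return refs

    def read(self, refs):
        return b"".join(zlib.decompress(self.containers[self.index[fp]])
                        for fp in refs)
```

Writing the same data twice stores the unique chunks only once; the file is reconstructed by walking its list of fingerprints.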
Shift in Data Protection Workloads

Traditional backup/restore workloads:
• Sequential workload
• Few large files
• Offline backup image transfer for recovery
• Backup data format
• Throughput-oriented
• Weekly full, daily incremental

Modern backup/restore workloads:
• Non-sequential I/O (NSIO) + sequential
• Many small files
• Native data format
• Throughput- and latency-sensitive
• Instant data access/recovery
• Incremental forever / virtual full

The challenge was to enhance our file system stack to support BOTH traditional and modern workloads.
[Diagram: Data Domain architecture]
• Primary storage / backup applications write through the access protocols to the Directory Manager, whose namespace (/) holds User1, User2, and VM-1 … VM-3.
• A file (e.g., VM1) is represented as a Merkle tree whose leaves are chunk fingerprints (Afp, Bfp, Cfp … Xfp, Yfp, Zfp), grouped into metadata pages (p1, p2).
• The Fingerprint Index (FPI) on HDD maps the fingerprint of each data chunk to a container (e.g., ALfp → container 0, BLfp → container 0, DLfp → container 1, …).
• Data chunks (A B C / D E … / … Z) are packed into containers 0 … n on HDD.
Traditional workloads:
• Large files amortize Directory Manager lookup cost
• File metadata (FMD) can be prefetched into memory for sequential I/O
• Good data locality: only one FPI lookup per 1 MB
• Good data locality: fewer I/Os to fetch data chunks, since a number of neighboring chunks can be prefetched into memory

Modern workloads:
• Increased Directory Manager lookups for native-format backups (small files)
• FMD prefetch into memory is not efficient for NSIO
• Increased index lookups due to random locality (once per I/O)
• Random I/O to HDD for every request
I/O Profile for Traditional vs. Modern

Example: 1 MB sequential I/O
• I/O for DM lookup: 0.001
• I/O for FMD chunk load: 2
• I/O for FPI: 1+1
• I/O for data chunk load: 1+4
Total number of HDD I/Os ≈ 8 per 1 MB

Example: 8 KB NSIO
• I/O for DM lookup: 0.001
• I/O for FMD chunk load: 2
• I/O for FPI: 1+1
• I/O for data chunk load: 1+1
Total number of HDD I/Os ≈ 8 per 8 KB (≈ 1024 per 1 MB)
Fingerprint Index (FPI) Cache
• The two-level index requires 2 HDD I/Os per chunk read
• For sequential restores, index I/Os are 25% of total I/Os; for NSIO, they are 50%
• Index metadata is 1.5% of total physical space
• Caching short fingerprints (4 bytes) in SSD reduces the SSD space requirement to 0.4%, with a collision rate < 0.01%
• Collisions are resolved by comparing the full fingerprint on disk
• The FPI cache improves sequential-restore performance by up to 32% on I/O-bound configurations
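The short-fingerprint scheme can be sketched as follows. This is a minimal illustration under stated assumptions: the cache keys on the first 4 bytes of a fingerprint, and a caller-supplied function stands in for reading the full fingerprint from a container's on-disk metadata. Class and function names are invented for this example.

```python
class ShortFPCache:
    """Sketch: SSD-resident fingerprint cache keyed by 4-byte short fingerprints.

    On a short-fingerprint hit, the full fingerprint is verified against the
    container's on-disk copy; a mismatch (collision) is treated as a miss.
    """

    def __init__(self, read_full_fp_from_container):
        # fn(container_id, short_fp) -> full fingerprint bytes, or None
        self.read_full_fp = read_full_fp_from_container
        self.short_map = {}  # 4-byte prefix -> container id (what the SSD holds)

    def insert(self, full_fp: bytes, container_id: int):
        self.short_map[full_fp[:4]] = container_id

    def lookup(self, full_fp: bytes):
        cid = self.short_map.get(full_fp[:4])
        if cid is None:
            return None                   # miss: fall back to the on-HDD index
        on_disk = self.read_full_fp(cid, full_fp[:4])
        if on_disk == full_fp:
            return cid                    # verified hit
        return None                       # short-fp collision: treat as a miss
```

Storing 4 bytes instead of a full secure hash is what shrinks the SSD footprint; the rare collision costs one extra on-disk comparison.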
[Diagram: FPI on SSD holds short fingerprints (ASfp → 0, BSfp → 0, CSfp → 1, DSfp → 1, …) in front of the full FPI on HDD]
[Chart: 96-stream read IOPS (0–3,500), disk-only vs. disk + SSD fingerprint cache]
Addition of SSD Cache for PBBA (dense drive support)

Example: 1 MB sequential I/O
• SSD I/O for DM lookup: 0.001
• HDD I/O for FMD lookup: 1
• SSD I/O for FPI lookup (translates fingerprints to containers)
• HDD I/O for data access: 1+4
Total number of HDD I/Os ≈ 6 per 1 MB
[Diagram: same architecture as before, with the SSD now holding short fingerprints (ASfp, BSfp), DM data, and FMD pages, alongside the HDD-resident FPI and containers]
File Metadata (FMD) Cache
• I/Os for file metadata account for 50% of total I/Os for NSIO; FMD is cached only on NSIO accesses
• File metadata is variable-sized (average 16 KB)
– Accounts for 0.05% of logical backup size
• 10% of available SSD is reserved as FMD cache, to account for high metadata churn during NSIO
• FMD is packed into 1 MB write/eviction units on the SSD cache
• It is a write-through cache, populated on FMD updates and read misses
• Eviction is LRU-based, which keeps the most relevant data in cache
• The FMD cache alone reduces the HDD I/Os required for 8 KB NSIO from 8 to 4
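The write-through, LRU-evicted behavior described above can be sketched as follows. This is an illustrative model, not the product's code: a dict stands in for the HDD-resident FMD, eviction is per entry rather than per packed 1 MB unit, and all names are invented for this example.

```python
from collections import OrderedDict

class FMDCache:
    """Sketch of a write-through file-metadata cache with LRU eviction."""

    def __init__(self, capacity_bytes, backing_store):
        self.capacity = capacity_bytes
        self.backing = backing_store   # dict-like stand-in for HDD-resident FMD
        self.lru = OrderedDict()       # key -> bytes; most recently used last
        self.used = 0

    def _evict(self):
        # Drop least-recently-used entries until we fit in the budget.
        while self.used > self.capacity:
            _, val = self.lru.popitem(last=False)
            self.used -= len(val)

    def read(self, key):
        if key in self.lru:            # hit: refresh recency
            self.lru.move_to_end(key)
            return self.lru[key]
        val = self.backing[key]        # miss: load from HDD and populate cache
        self.lru[key] = val
        self.used += len(val)
        self._evict()
        return val

    def write(self, key, val):
        self.backing[key] = val        # write-through: HDD copy is always current
        if key in self.lru:
            self.used -= len(self.lru[key])
        self.lru[key] = val
        self.lru.move_to_end(key)
        self.used += len(val)
        self._evict()
```

Because the cache is write-through, an eviction never loses data: the backing copy on HDD already holds the latest version.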
Data Cache on SSD
• Caches chunks of data that are randomly accessed within a file
• Data is cached on a read miss and on write
• Chunks are gathered and written to the cache in 1 MB eviction units; the eviction algorithm is LRU
• Larger chunks are read ahead and cached until the cache is sufficiently warm
• 40% of the SSD cache is allocated for data, sufficient to instantly access up to 32 VMDKs and support up to 50K IOPS on our larger systems
• Data is not compressed on SSD, to reduce CPU utilization
• The data cache reduces NSIO I/Os to HDD to 0
Addition of SSD Cache for NSIO

Example: 8 KB NSIO
• SSD I/O for DM lookup: 0.001
• SSD I/O for FMD lookup: 1
• SSD I/O for index lookup (translates fingerprints to containers)
• SSD I/O for data access: 1
Total number of HDD I/Os = 0*
*Assumes a 100% cache hit rate; typical read/write workloads see >95% cache hits

Cache warmup
File-system Modifications for NSIO Support

Access pattern detection:
• Detect Sequential, NSIO-Monotonic, and NSIO patterns
• Detect multiple access patterns across different regions of a file
• Based on a sufficient history of past I/Os within a region of a file

Other changes:
• Disable simple prefetch for NSIO
• Enable parallel FMD and FPI lookups for NSIO
• Fixed-size chunking for image backups
• Disable container metadata read for NSIO
• Disable dedup for small NSIO
• Delayed FMD updates for NSIO

Access history per region (example):
• Span 0 GB–2 GB: accesses 0 MB–1 MB, 1 MB–2 MB, 2 MB–3 MB → label: Sequential
• Span 2 GB–4 GB: accesses 2 GB–2.1 GB, 2.8 GB–3 GB, 3.1 GB–3.2 GB → label: NSIO-Monotonic
• Span 4 GB–6 GB: accesses 4.2 GB–4.3 GB, 4.0 GB–4.1 GB, 4.9 GB–5.0 GB → label: NSIO
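The per-region labeling above can be sketched with a simple classifier. This is an assumed formulation, not the system's actual detector: offsets that are strictly contiguous are Sequential, non-decreasing but gappy offsets are NSIO-Monotonic, and anything else is NSIO.

```python
def classify(accesses):
    """Classify a region's access history.

    accesses: list of (offset, length) pairs, oldest first.
    Returns "SEQUENTIAL", "NSIO-MONOTONIC", or "NSIO".
    """
    pairs = list(zip(accesses, accesses[1:]))
    monotonic = all(b[0] >= a[0] for a, b in pairs)          # offsets never go backward
    contiguous = all(b[0] == a[0] + a[1] for a, b in pairs)  # each I/O starts where the last ended
    if monotonic and contiguous:
        return "SEQUENTIAL"
    if monotonic:
        return "NSIO-MONOTONIC"
    return "NSIO"
```

Applied to the example spans above, the 0–2 GB history (1 MB steps) is Sequential, the 2–4 GB history (forward jumps) is NSIO-Monotonic, and the 4–6 GB history (out-of-order offsets) is NSIO.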
QoS for Mixed Workloads
• Tunable shares for non-sequential workloads; the default is 20%
• CPU scheduling based on the least-loaded CPU (previously round-robin)
• Higher priority for random reads compared to writes
• Edge throttling based on feedback from different modules and on subsystem health
• Increasing the QoS share for NSIO workloads increased NSIO performance in our mixed-workload experiments

Share tree (PBBA shares: 80, NSIO shares: 20):
Root
• External (50)
– Backup (25)
– Restore (30)
– Repl (25)
– NSIO (20)
• Internal (50)
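One way to read the share tree is that a leaf's effective fraction of the system is the product of its weight, normalized against its siblings, at each level. The tree values below are from the slide; the traversal code and function name are assumptions made for this sketch.

```python
# Share tree from the slide: parent -> {child: shares}.
TREE = {
    "Root": {"External": 50, "Internal": 50},
    "External": {"Backup": 25, "Restore": 30, "Repl": 25, "NSIO": 20},
}

def effective_share(node, parent="Root"):
    """Effective system fraction for `node`: product of normalized weights
    along the path from the root. Returns None if the node is not found."""
    children = TREE.get(parent, {})
    total = sum(children.values())
    if node in children:
        return children[node] / total
    for child in children:
        if child in TREE:                       # descend into subtrees only
            sub = effective_share(node, child)
            if sub is not None:
                return (children[child] / total) * sub
    return None
```

Under this reading, NSIO's default 20 shares under External (50 of 100 at the root) correspond to 0.5 × 0.2 = 10% of the system, which matches the 10% QoS throttle used in the mixed-workload experiments below.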
Mixed Workload Performance
• NSIO performance was capped at 10K IOPS
• With the QoS throttle for NSIO set at 10%, backup/restore was impacted by at most 10%

[Chart: throughput in MB/s (0–14,000) for NFS Write, DDBOOST Write, DDBOOST Read, NFS Read/Write, and DDBOOST Read/Write, comparing three conditions: backup workload alone; backup + NSIO workload with 100% read; backup + NSIO workload with 70% read]

*DDBOOST: bandwidth-optimized OpenStorage protocol
NSIO Performance Evaluation (IOPS vs. number of VMs)

No. of VMs:                                                     1      8      16     24     32
Without flash cache, with software optimizations:              561   1932   3276   4312   5157
Metadata in flash cache, with software optimizations:         1389   9896  17735  24018  29427
Data & metadata in flash cache, with software optimizations:  9965  48159  56134  57658  50213
Data & metadata in flash cache, without software optimizations: 1610  9488  15588  19258  18529
Conclusion / Future Work

Improvements to our software and the addition of SSD caches allow Data Domain to support both new and traditional workloads:
• With an NSIO workload, with SSDs caching metadata, we measured a 5.7x IOPS improvement relative to a system without SSDs
• Adding the data cache improved performance by a further 1.7x
• Combining SSD caching with software optimizations throughout our system added an additional 2.7x IOPS increase for NSIO workloads
• Performance for traditional workloads did not see any degradation

Future work:
• Additional SSDs and IOPS based on use case
• QoS / priorities within random workloads
• Optimizations to improve CPU and disk utilization as we support higher IOPS
THANK YOU