alleviating garbage collection interference through spatial … · 2019. 10. 28. · through...
TRANSCRIPT
2019. 10. 23.
1
AlleviatingGarbageCollectionInterferencethroughSpatialSeparationinAllFlashArrays
Sam H. Noh노삼혁/盧三赫
UNIST(Ulsan National Institute of Science & Technology)
§ Trendofthetimes
§ SWAN
§ SummaryandConclusion
Outline
2019. 10. 23.
2
호랑이담배피던시절에…
3
호랑이담배피던시절에…
4
2019. 10. 23.
3
호랑이담배피던시절에…
5
호랑이담배피던시절에…
6
2019. 10. 23.
4
호랑이담배피던시절에…
7
Today’sNewGenerationofStorageDevices
§ Newgenerationofstorage• UltraLowLatency(ULL)drives
− NVMe
8
2019. 10. 23.
5
Today’s NewGenerationofStorageDevices
§ Newgenerationofstorage• DIMMslottedstorage
CourtesyofNVSL,UCSDarXiv:1903.05714v29
10
2019. 10. 23.
6
§ RAID• IncreaseI/Obandwidth
§ BufferCaching• Improvelatency
§ Swapping• Improveresourcesharing
§ ETC
PASTstoragetopicsofinterest?
11
§ RAID• IncreaseI/Obandwidth
§ BufferCaching• Improvelatency
§ Swapping• Improveresourcesharing
§ ETC
PASTstoragetopicsofinterest?
12
2019. 10. 23.
7
§ Trendofthetimes
§ SWAN
§ SummaryandConclusion
Outline
SWAN
It’s the network, stupid!
2019. 10. 23.
8
AlleviatingGarbageCollectionInterferencethroughSpatialSeparationinAllFlashArrays
HotStorage ‘17 & ATC ’19
김재호(UNIST-VT-Huawei)김병석(UNIST), 임광현 (Cornell)
정영돈 (DGIST), 이성진 (DGIST), 민창우 (VT)
§ BackgroundandObservations
§ DesignofSWAN
§ Evaluation
SWAN
2019. 10. 23.
9
§ AllFlashArray(AFA)• Storageinfrastructurethatcontainsonlyflashmemorydrives
− Solid-StateArray(SSA)
AllFlashArray
From:https://images.google.com/https://www.purestorage.com/resources/glossary/all-flash-array.html 17
ArchitectureofAll-FlashArray
18
2019. 10. 23.
10
ArchitectureofAll-FlashArray
19
SSDProductsforDataCenter
Manufacturer ProductName SequentialRead/Write(uptoGB/s)
Random4KBRead/Write(uptoIOPS)
Interface
Intel
P3700 2.1/1 470K/65K PCIe 3*4
P3520 1.7/1.3 370K/26K PCIe 3*4
P3608 5/3 850K/150K PCIe 3*8
S3710 0.5/0.5 85K/45K SATA6Gb/s
Samsung
PM1725a 6.4/3 1M/170K PCIe 3 *8
PM963 2/1.2 430K/40K PCIe 3*4
PM1633a 1.2/0.9 190K /31K SAS3.0
SM863 0.5 /0.5 97K/30K SATA6Gb/s
Intel:https://www-ssl.intel.com/content/www/us/en/solid-state-drives/data-center-family.htmlSamsung:http://www.samsung.com/semiconductor/products/flash-storage/enterprise-ssd/
20
2019. 10. 23.
11
BandwidthTrends
Interfaces:https://en.wikipedia.org/wiki/List_of_interface_bit_rates#Local_area_networksSATA:https://en.wikipedia.org/wiki/Serial_ATAPCIe:https://en.wikipedia.org/wiki/PCI_Express
01020304050607080
2000
2002
2004
2005
2007
2009
2010
2011
2012
2013
2014
2015
2017
2017
2018
2019
Band
width(G
B/s)
StorageInterface
NetworkInterface
Year
10GbE40GbE
Infiniband
Infiniband
100GbE
200GbE
400GbE
Infiniband
SATA2 SATA3
SAS-2
PCI-3
SAS-3SATAExp. SAS-4
PCIe4
PCIe5
2016
Storage is no longer the bottleneck!
21
ComparisonofAll-flashArray
SolidFire(NetApp) EMC PureStorage Nimble
Model SF19210 6X-Brick M70 AF9000
Capacity 20TB(10SSDs)
240TB(150SSDs)
136TB 500TB
Performance(Random I/O)
100K 7GB(900K IOPS*8KB)
9GB(300KIOPS* 32KB)
350K
Network 20Gb(iSCSI10Gb *2port)
240Gb(iSCSI10Gb*24port)
40Gb(iSCSI10Gb* 4port)
40Gb(iSCSI10Gb*4port)
Bottleneck Network Storage Network Network
EMC:https://www.emc.com/collateral/data-sheet/h12451-xtremio-4-system-specifications-ss.pdfPureStorage:https://www.purestorage.com/content/dam/purestorage/pdf/datasheets/ps_ds5p_flasharraym_04.pdfSolidFire:http://info.solidfire.com/rs/solidfire/images/SolidFire_ProductDatasheet.pdfNimblestorage:https://www.nimblestorage.com/technology-products/all-flash-array-specifications/
23
2019. 10. 23.
12
ComparisonofAll-flashArray
SolidFire(NetApp) EMC PureStorage Nimble
Model SF19210 6X-Brick M70 AF9000
Capacity 20TB(10SSDs)
240TB(150SSDs)
136TB 500TB
Performance(Random I/O)
100K 7GB(900K IOPS*8KB)
9GB(300KIOPS* 32KB)
350K
Network 20Gb(iSCSI10Gb *2port)
240Gb(iSCSI10Gb*24port)
40Gb(iSCSI10Gb* 4port)
40Gb(iSCSI10Gb*4port)
Bottleneck Network Storage Network Network
EMC:https://www.emc.com/collateral/data-sheet/h12451-xtremio-4-system-specifications-ss.pdfPureStorage:https://www.purestorage.com/content/dam/purestorage/pdf/datasheets/ps_ds5p_flasharraym_04.pdfSolidFire:http://info.solidfire.com/rs/solidfire/images/SolidFire_ProductDatasheet.pdfNimblestorage:https://www.nimblestorage.com/technology-products/all-flash-array-specifications/
DothesemanySSDsreallyhelp?AfewSSDseasilysaturates
networkthroughput!
24
RAID:TraditionalUseofMultipleDisks
SSD SSD SSDSSD
Randomwrites
GC
• Previoussolutions1)Harmonia[MSST’11]2)HPDA[TOS’12]3)GC-Steering[IPDPS’18]
• TraditionalRAIDemploysin-placeupdatestoservewriterequests
• HighGCoverheadinsideSSDduetorandomwrite fromthehost
Randomwrites
RAID4/5In-placewrite
OS
APP
AFA
Limitations
25
2019. 10. 23.
13
Experimentswith4SSDRAID0
RAID0with4NVMeSSDs(spec.read:2.4GB/s,write:1.2GB/s)(Measuredread:2.0GB/s,write1.0GB/s)
Idealperformance
Sequentialwritewith128KBI/Osize 26
Experimentswith4SSDRAID0
RAID0with4NVMeSSDs(spec.read:2.4GB/s,write:1.2GB/s)(Measuredread:2.0GB/s,write1.0GB/s)
Idealperformance
Sequentialwritewith128KBI/Osize
Inconsistentperformance
27
2019. 10. 23.
14
Experimentswith4SSDRAID0
RAID0with4NVMeSSDs(spec.read:2.4GB/s,write:1.2GB/s)(Measuredread:2.0GB/s,write1.0GB/s)
Idealperformance
Sequentialwritewith128KBI/Osize
Inconsistentperformance
10GbE(1.25GB/s)Doesnotevensaturatenetworkbandwidth
Sequentialwritewith128KBI/Osize 28
Log-(based)RAID:SSDAdaptedApproach
SSD SSD SSDSSD
Randomwrites
Sequentialwrites
Log-RAIDLog-structuredwrite
OS
APP
AFA
GC
• Log-basedRAIDemployslog-structuredwritestoreduceGCoverheadinsideSSD
• Log-structuredwritesinvolvehost-levelGC,whichreliesonidletime
• Ifnoidletime,GCwillcauseperformancedrop
• Previoussolutions1) SOFA[SYSTOR’14]2) SRC[Middleware’15]3) SALSA[MASCOTS’18]
Limitations
29
2019. 10. 23.
15
PerformanceofLog-basedRAID
GCstartshere
InterferencebetweenGCI/OanduserI/O
§ Configuration• 8SSDs(roughly1TB)
§ Workload• Randomwriterequestsfor2hours
30
Observations
▪ Inconsistentperformance
▪ duetogarbagecollection
▪ Performancewall
▪ networkbandwidth NOTstorage
31
2019. 10. 23.
16
Observations
▪ Inconsistentperformance
▪ duetogarbagecollection
▪ Performancewall
▪ networkbandwidth NOTstorage
Getridofgarbagecollection!
Asbestthatnetworkcansupport!
32
Ourgoal
Sustained,consistentfullnetworkbandwidthperformance!
33
2019. 10. 23.
17
§ BackgroundandObservations
§ DesignofSWAN
§ Evaluation
SWAN
§ Oursystem• SWAN (SpatialseparationWithinanArrayofSSDsonaNetwork)
§ Goals• ProvidesustainablehighperformanceforAFA
− AlleviatingGCinterferenceatbothSSD-levelandAFA-level
§ Approach• SpatialseparationofapplicationI/OandAFAI/O• MinimizeGCinterferencebyorganizingSSDsintotwo-dimensionalarray
DesignofSWAN
35
2019. 10. 23.
18
ComparisonofRAIDschemes writereq. readreq.
SSD
RAID
SSD SSD SSD SSD SSDTraditionalRAID
Log-RAIDLog-structuredwritingonRAID SSD SSD SSD SSD SSD SSD
SSD
SWAN
SSD
SSD
SSD
SSD
SSD
R-group0(Front-end)
R-group1(Back-end)
R-group2(Back-end)
SWAN- Twodimensionalarray- Log-structuredwritingperR-group- Front-end serverswriterequests- Back-end isusedforAFA-levelGC
36
§ KeyoperationsofSWAN
HowSWANWorks
SSD
SSD
SSD
SSD
SSD
SSD
R-group0(Front-end)
R-group1(Back-end)
R-group2(Back-end)
writereq.
readreq.
AFA-levelGC(includingTRIM)
GC TRIM
Writerequest(Log-structuredplacement)
SWAN
GCinterferencefree throughspatialseparation ofapplicationI/Oand
AFA-levelGCI/Oto
R-groupsizedeterminedbynetworkbandwidth
GCinterferencefree throughspatialseparation ofapplicationI/Oand
AFA-levelGCI/Oto38
2019. 10. 23.
19
§ Operations• SWANappendswritereq.tothelogandissueswritereq.tothefront-end• Readreq.willbeservedbyanyR-groupholdingtherequestedblocks
HandlingRead/WriteReq.inSWAN
w1
R-group 0
w3
r12
R-group 1 R-group 2
r27
w1
w3
r12
r27
... ...
... ...
Logical
Volume
segment
w3
w1
Physical
Volume
SSD
Array
<w1, r
12, w
3, r
27>
Logging
Block
Interface
w:writereq.forblocknwn
:readreq.forblocknrn
Conf.- R-group0:Front-end- R-group1,2:Back-end- Read/writereq.arrivesviablock
interface
40
ExampleofHandlingI/OinSWAN
… W1 W3 R7 R8 …
BlockI/OInterface W1 R7 W3 R8
… W1 W3 …
LogicalVolume
PhysicalVolume Logging
W1
W3
Parity
Segment
R7
R8
SSD
SSD
SSD
Writereq.
Readreq.
W RSSD
R7
likeRAIDparallelism
R8W1 W3
W1 W3
Front-end Back-end Back-end
41
2019. 10. 23.
20
ProcedureofI/OHandling:Phase1
SSD
SSD
SSD
SSD
SSD
SSD
Front-end Back-end Back-end
Segment
W
W
P
WriteReq.
Appendonly
• Front-end absorbsallwriterequestsinappend-only manner• ExploitsfullperformanceofSSDs
parallelismunit
Parity
PW
Write
42
SSD
SSD
SSD
SSD
SSD
SSD
Front-end Back-end Back-end
Segment
W
W
P
WriteReq.
• Whenthefront-endbecomesfull• Emptyback-endbecomes newfront-end toservewriterequests• Oldfullfront-endbecomesback-end• Again,newfront-endserveswriterequests
Parity
PW
Write
Front-endBack-end
Segment
Segment
Front-endbecomesfull
ProcedureofI/OHandling:Phase2
43
2019. 10. 23.
21
SSD
SSD
SSD
SSD
SSD
SSD
Back-end Back-end Front-end
Segment
W
W
P
WriteReq.
• Whenthereisnomoreemptyback-end• SWAN’sGC istriggered tomakefreespace• SWANchooses avictimsegmentfromoneoftheback-ends• SWANwritesvalidblockswithinthechosenback-end• Finally,thevictimsegmentistrimmed
Parity
PW
Write
Segment
Segment
Segment
Segment
Segment
SWANGC
TRIMmed
EnsuresegmentsarewrittensequentiallyinsideSSDs
GC TRIM
W
W
P
AllwriterequestsandGCarespatiallyseparated
ProcedureofI/OHandling:Phase3
44
FeasibilityAnalysisofSWAN
Front-end Back-end Back-end
SSD
SSD
SSD
SSD
SSD
SSDHowmanySSDsinfront-end?
Howmanyback-endsinAFA?
Pleaserefertoourpaperfordetails!
SWANGC
AnalyticmodelofSWANGC
45
2019. 10. 23.
22
§ BackgroundandObservations
§ DesignofSWAN
§ Evaluation
SWAN
§ Environment• DellR730serverequippedwith2XeonCPUsand64GBDRAM• Samsung850PRO128GBx9SATASSDs(upto1TBcapacity)• OpenchannelSSDformonitoringSSDinternalactivities
§ Targetconfig.• RAID-0/4/5• Log-RAID-0/4• SWAN-0/4
§ Workloads• Microbenchmark• YCSB-A,B,C,andD
Evaluation
47
2019. 10. 23.
23
AnalysisofGCBehavior
a
Thro
ughp
ut (M
B/se
c)
Time (Sec)600 1200 1800 2400 3000 3600
b
c
Thro
ughp
ut (M
B/se
c)
Time (Sec)
SSD 1
SSD 2
SSD 3
SSD 4
SSD 5
SSD 6
SSD 7
SSD 8
USER600 1200 1800 2400 3000 3600
Log-RAID(8SSDs) SWAN(4R-groups/2SSDperR-group)
§ Randomwriteworkload
48
AnalysisofLog-RAIDPerformance
a
Thro
ughp
ut (M
B/se
c)
Time (sec)
600 1200 1800 2400 3000 3600
SSD1
SSD2
SSD3
SSD4
SSD5
SSD6
SSD7
SSD8
USER
SSD1
SSD2
SSD3
SSD4
SSD5
SSD6
SSD7
SSD8
GCstartshere:AllSSDsinvolvedinGC
GCstartshere
Log-RAID0Userobservedthroughput
Write throughputRead throughput
PerformancefluctuatesasallSSDsareinvolvedinGC
Redlines increaseswhilebluelinesdropdownsinceGCincursreadandwriteoperations
49
2019. 10. 23.
24
AnalysisofSWANPerformance
b
c
Thro
ughp
ut (M
B/se
c)
Time (sec)
600 1200 1800 2400 3000 3600
SSD1
SSD2
SSD3
SSD4
SSD5
SSD6
SSD7
SSD8
USER
SSD1
SSD2
SSD3
SSD4
SSD5
SSD6
SSD7
SSD8
GCstartshere
GCstartshereOnlyoneback-endisinvolvedinGC
Userobservedthroughput
Front-end
Front-end
Front-end
Front-end
Back-end
Write throughputRead throughput
§ SWAN has1front-endand4back-ends
§ Front/back-endsconsistsof2SSDs
Configuration
§ Thispatterncontinues§ SWANseparateswrite
requestsandGC
50
§ Configuration• RAID4/5:8dataSSDs+1paritySSD• Log-RAID:8dataSSDs+1paritySSD• SWAN4:3R-groupwith2dataSSDsand1paritySSDperR-group
ThroughputResults
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
YCSB-A YCSB-B YCSB-C YCSB-D
Throughput(ops/sec)
Run Load
51
2019. 10. 23.
25
ReadLatencyResults(CDF)
0.995
0.996
0.997
0.998
0.999
1
0 20
0 40
0 60
0 80
0 10
00 12
00 14
00 16
00 18
00
CD
F
Time (msec)
SWANRAID5
RAID4LOGRAID
0.9995 0.9996 0.9997 0.9998 0.9999
1
0 10
0 20
0 30
0 40
0 50
0 60
0 0.995
0.996
0.997
0.998
0.999
1
0 10
0 20
0 30
0 40
0 50
0 60
0 70
0 80
0 90
0
Time (msec)
SWANRAID5
RAID4LOGRAID
0.9995 0.9996 0.9997 0.9998 0.9999
1
0 10
0 20
0 30
0 40
0 50
0 60
0
0.995
0.996
0.997
0.998
0.999
1
0 20
0 40
0 60
0 80
0 10
00
Time (msec)
SWANRAID5
RAID4LOGRAID
0.9995 0.9996 0.9997 0.9998 0.9999
1
0 10
0 20
0 30
0 40
0 50
0 60
0
YCSB-A YCSB-B
YCSB-C
0.995
0.996
0.997
0.998
0.999
1
0 20
0 40
0 60
0 80
0 10
00
Time (msec)
SWANRAID5
RAID4LOGRAID
0.9995 0.9996 0.9997 0.9998 0.9999
1
0 10
0 20
0 30
0 40
0 50
0 60
0
YCSB-D
52
BenefitswithSimplerSSDs
• SWANcansavecostandpowerconsumptionwithoutcompromisingperformancebyadoptingsimplerSSDs1) SmallerDRAMsize2) Smallerover-provisioningspace(OPS)
3) BlockorsegmentlevelFTLinsteadofpage-levelFTL
SWANsequentiallywritesdatatosegmentsandTRIMsalargechunkofdatainthesamesegmentatonce
53
2019. 10. 23.
26
§ Trendofthetimes
§ SWAN
§ SummaryandConclusion
Outline
§ ProposedSWAN• NewmanagementpolicyforAllFlashArray
§ KeyideaofSWAN• Performanceneedonlybetomaximumofnetwork• Spatialseparation
− DecoupleGCI/Os fromnormalonesbypartitioningtheSSDarrayinto2groups− full(networkbandwidth)writeperformance− “eliminate”GCeffect
§ ExtrabenefitsofSWAN• SSDcanbesimpler
SummaryandConclusion
It’s the network stupid!
55
2019. 10. 23.
27
Thankyou!!!
56