Controlling QoS Metrics in NVMe SSDs
Quality of Service Implications of the Error Correction Techniques in Solid State Drives
Lorenzo Zuolo†, Cristian Zambelli†, Rino Micheloni‡, Stephen Bates‡ and Piero Olivo†
†Dipartimento di Ingegneria – Università di Ferrara (Italy)
‡PMC-Sierra
[email protected]
Solid State Drives (SSDs) are now the most effective solution for mass storage applications
• High performance
  • Read and write bandwidth up to GB/s
  • Million IOPS
• High robustness
  • No mechanical parts
• Low power consumption
  • Power wall at 25 W
Reliability? An SSD's reliability is tightly coupled with that of its underlying storage medium: NAND Flash memories.
NAND Flash memories are subject to progressive wear-out due to endurance (program/erase cycles, i.e., P/E cycles) and data retention.
[Figure: Endurance in 1X MLC; Retention in 1X MLC (@ 10k P/E cycles)]
A direct indicator of such wear-out is the raw bit error rate (RBER).
Solution: To improve RBER figures, thus lowering the percentage of uncorrectable pages, NAND Flash vendors have introduced the Read Retry (RR) algorithm.
[Figure: Endurance in 1X MLC; Retention in 1X MLC (@ 10k P/E cycles)]
Take away: when RR is used, memory read time increases with respect to normal read time (up to 256% in the tested device).
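Why RR lengthens reads can be illustrated with a minimal sketch of a retry loop. It assumes that each retry re-reads the page with shifted read thresholds and costs one additional page read time. The function name, the decode_ok callback and the 100 µs read time are hypothetical and are not taken from the tested device.

```python
from typing import Callable, Tuple


def read_with_retry(decode_ok: Callable[[int], bool],
                    t_read_us: float,
                    max_retries: int) -> Tuple[bool, float, int]:
    """Model a page read with Read Retry (illustrative only).

    decode_ok(attempt) is a hypothetical stand-in for "the ECC decoded the
    page read with the attempt-th set of read thresholds" (attempt 0 is the
    normal read). Assumption: every attempt costs the same t_read_us.
    Returns (success, total_latency_us, retries_used).
    """
    latency_us = 0.0
    for attempt in range(max_retries + 1):
        latency_us += t_read_us  # each attempt is a full page read
        if decode_ok(attempt):
            return True, latency_us, attempt
    return False, latency_us, max_retries


if __name__ == "__main__":
    # Hypothetical page that only decodes after two threshold shifts.
    print(read_with_retry(lambda attempt: attempt >= 2, 100.0, 5))
```

Under these assumptions the read latency grows linearly with the number of retries, which is the kind of overhead the take-away refers to.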
Problem: Increasing RBERs result in a higher probability of erroneously decoding the bits read in a page (percentage of uncorrectable pages).
Take away 1: The percentage of uncorrectable pages is tightly coupled with the ECC's correction capabilities.
Take away 2: As soon as it becomes higher than zero, the whole SSD is considered no longer reliable.
[Figure: Endurance in 1X MLC; Retention in 1X MLC (@ 10k P/E cycles)]
Take away: To improve the SSD's reliability, advanced ECCs must be used.
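The link between RBER, ECC correction capability and uncorrectable pages can be sketched with a binomial tail. This is a simplified model, not the authors' method. It assumes independent bit errors, ignores parity bits, and treats a codeword as uncorrectable when it contains more than t errors. The values t = 100 and 4320 Bytes follow the BCH configuration quoted for the simulated architecture. The RBER values are hypothetical.

```python
import math


def uncorrectable_probability(rber: float, t: int, n_bits: int) -> float:
    """P(more than t bit errors in n_bits), assuming i.i.d. errors at rate rber."""
    if rber <= 0.0:
        return 0.0
    if rber >= 1.0:
        return 1.0 if t < n_bits else 0.0
    log_p = math.log(rber)
    log_q = math.log1p(-rber)
    total = 0.0
    # Sum the binomial tail directly, in log space, to keep tiny values accurate.
    for k in range(t + 1, n_bits + 1):
        log_pmf = (math.lgamma(n_bits + 1) - math.lgamma(k + 1)
                   - math.lgamma(n_bits - k + 1)
                   + k * log_p + (n_bits - k) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


if __name__ == "__main__":
    n_bits = 4320 * 8  # data bits only; parity bits ignored (assumption)
    for rber in (1e-3, 3e-3, 5e-3):  # hypothetical RBER values
        print(rber, uncorrectable_probability(rber, 100, n_bits))
```

In this model a stronger ECC (larger t) pushes the point where the probability leaves zero toward higher RBER, which is why correction capability and the uncorrectable-page percentage are tightly coupled.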
When looking at SSDs where Quality of Service (QoS) standards must be met, exploiting RR techniques could introduce a performance/reliability trade-off for the whole system
SSD bandwidth and latencies have been collected by means of the SSDExplorer co-simulation framework.
[Diagram labels: RBER and percentage of uncorrectable pages characteristics; host system command submission/completion timings]
Queue depth (1 thread) | Real Device Read BW | SSDExplorer Read BW | Matching BW Delta
32                     | 642691 KB/s         | 626112 KB/s         | -2.58%
64                     | 813615 KB/s         | 822295 KB/s         | 1.07%
128                    | 861639 KB/s         | 870096 KB/s         | 0.98%
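The BW delta figures can be checked with a few lines of arithmetic. The sketch below assumes the delta is defined as (simulated - real) / real. The slides do not state this definition, but it reproduces the reported values. The function name is hypothetical.

```python
# (queue depth, real device read BW in KB/s, SSDExplorer read BW in KB/s)
measurements = [
    (32, 642691, 626112),
    (64, 813615, 822295),
    (128, 861639, 870096),
]


def bw_delta_percent(real_kbps: float, simulated_kbps: float) -> float:
    """Relative difference of the simulated bandwidth versus the real device."""
    return (simulated_kbps - real_kbps) / real_kbps * 100.0


deltas = [round(bw_delta_percent(real, sim), 2) for _, real, sim in measurements]

for (queue_depth, _, _), delta in zip(measurements, deltas):
    print(f"QD {queue_depth}: {delta:+.2f}%")
```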
Hardware/Software Co-simulation
Simulated architecture: 8 channels, 4 targets per channel, 512 GB
ECC: multi-threaded BCH, 100 bit/4320 Bytes
[Figure: Endurance in 1X MLC; Retention in 1X MLC (@ 10k P/E cycles)]
SSD’s read latency distributions: minimum, 25th percentile, median, 75th percentile and maximum were measured.
An enterprise-class QoS of 5 ms has been set as the reference.
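The five statistics listed above and the QoS comparison can be sketched as follows. The sample latencies are hypothetical, and the sketch assumes the 5 ms reference applies to the maximum observed latency, which the slides do not state explicitly.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Minimum, quartiles and maximum of a set of read latencies (in ms)."""
    q1, median, q3 = statistics.quantiles(latencies_ms, n=4, method="inclusive")
    return {
        "min": min(latencies_ms),
        "p25": q1,
        "median": median,
        "p75": q3,
        "max": max(latencies_ms),
    }


def meets_qos(latencies_ms: Sequence[float], qos_limit_ms: float = 5.0) -> bool:
    """Assumption: QoS is met when the worst-case latency stays within the limit."""
    return max(latencies_ms) <= qos_limit_ms


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical latencies in ms
    print(latency_summary(samples), meets_qos(samples))
```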
Conclusions:
1. RR is able to enhance both endurance and data retention. In the measured device, endurance and retention were extended up to 31% and 300% respectively.
2. SSD read performance and latencies were heavily impacted by RR. If QoS has to be guaranteed, the actual endurance and data retention extensions provided by RR settle around 10% and 7% respectively.
SSD’s read bandwidth.
[Figure: Endurance in 1X MLC; Retention in 1X MLC (@ 10k P/E cycles)]
QoS limit marks a 10% performance degradation compared to the beginning of life (endurance) and beginning of retention time.
[Figure annotation on both panels: ECC/SSD fail point (ECCTH)]
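The 10% QoS limit on read bandwidth can be expressed as a simple threshold check. The sketch below is illustrative: the function name and the sample points are hypothetical, and the first sample is taken as the beginning-of-life (or beginning-of-retention) baseline.

```python
from typing import Optional, Sequence, Tuple


def first_qos_violation(samples: Sequence[Tuple[int, float]],
                        max_degradation: float = 0.10) -> Optional[int]:
    """Return the first wear point whose bandwidth falls below the QoS limit.

    samples: (wear point, read bandwidth) pairs ordered by increasing wear,
    e.g. P/E cycles or retention time. The first sample is the baseline.
    Returns None if the bandwidth never drops below the limit.
    """
    baseline = samples[0][1]
    threshold = baseline * (1.0 - max_degradation)
    for wear_point, bandwidth in samples:
        if bandwidth < threshold:
            return wear_point
    return None


if __name__ == "__main__":
    # Hypothetical (P/E cycles, read bandwidth in MB/s) points.
    curve = [(0, 1000.0), (1000, 950.0), (2000, 899.0), (3000, 800.0)]
    print(first_qos_violation(curve))
```

Comparing the wear point returned here with the ECC/SSD fail point gives the kind of gap the conclusions describe: the QoS limit can be reached well before the ECC threshold.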