ddr4 memory compliance testing barbara aichinger futureplus systems
FuturePlus Systems Corporation
15 Constitution Drive
Bedford NH 03110 USA
Barbara P. Aichinger Vice President New Business Development
DDR4 Memory Compliance Testing
Agenda
• DDR Memory Standards for Compliance Testing
• Memory problems continue to plague the industry
– Recent Published Papers
– Row Hammer Failures
– Security Issues
• The concept of an Audit for Compliance Testing
– Electrical
– Protocol
– Row Hammer
– SPD/MRS
– Performance/Margin
• Summary
Compliance Testing Documents
• Not yet…getting closer…
• FuturePlus Systems Sponsoring a
Protocol Checks Document
– Task Group has several Industry members
and several T&M vendors
– Several ballots have been passed and a
document is expected in 2017
Memory Errors continue to
plague the industry
Memory Errors in Modern Systems
This is called Thresholding
Average ~2%
Errors in Facebook’s Fleet of
Servers
If FB has 100K Servers
• ~2% have a memory failure every month
• Of those, 46% result in a DIMM swap
• Doing the math: 2% of 100K is 2,000
• 46% of 2,000 = 920 DIMM swaps a month!
• 30 days a month, 24 hours a day = 720 hours in a month
• 920 swaps over 720 hours is about 1.3 swaps per hour
Facebook is swapping out DIMMs every hour of every day of every month all year long!
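The arithmetic on this slide can be reproduced with a short sketch. The fleet size of 100K servers is the slide's own hypothetical; the 2% and 46% figures are the ones quoted above. The variable names are illustrative only.

```python
# Inputs taken from the slide; the 100K fleet size is the slide's "what if".
servers = 100_000
monthly_failure_rate = 0.02   # ~2% of servers see a memory failure each month
swap_fraction = 0.46          # 46% of those failures lead to a DIMM swap
hours_per_month = 30 * 24     # 720 hours

failing_servers = servers * monthly_failure_rate   # about 2,000 servers
dimm_swaps = failing_servers * swap_fraction       # about 920 swaps a month
swaps_per_hour = dimm_swaps / hours_per_month      # a little more than one per hour

print(f"Failing servers per month: {failing_servers:.0f}")
print(f"DIMM swaps per month:      {dimm_swaps:.0f}")
print(f"DIMM swaps per hour:       {swaps_per_hour:.2f}")
```

Because the result is above one swap per hour, the "every hour of every day" statement holds for a fleet of this assumed size.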
An Update on Row Hammer
Failures
• Seen on DDR4
– Passmark Blog
• Several reports for DDR4 failing the Row Hammer
test
– ThirdIO paper
• http://www.thirdio.com/rowhammer.pdf
– Usenix
– Blackhat
– SGI seeing DDR4 RH failures in HPC
Row Hammer
A quick review!
[Figure: DRAM array of Columns and Rows (pages), showing the Activate Command and the Victim Row]
USENIX Security Symposium
August 2016
ECC will not save you!
Row Hammer Failures on
DDR4
https://www.sgi.com/pdfs/4567.pdf
Introducing: The concept of an
AUDIT for JEDEC Compliance
Testing
• Not a repeat of a Design Verification
• A check to make sure the JEDEC
specification is being met
For the System and DIMM
• Audit the signal integrity of the memory channel
• Monitor the system for Protocol Violations
– BIOS programming errors
– SPD programmed incorrectly
– Memory Controller Issues
• SPD Check
• Row Hammer Testing
• Performance/Margin Testing
Using a Scan from a Logic
Analyzer instead of a Scope
• Allows for an easy and quick check of:
– Signal Alignment
– Relative Data Valid Eye
– Signal Swing
To see all signals at once a slot
interposer is used
DIMM Slot Interposer allows the system to operate up to 4200MT/s and run any application
Audit: Signal Swing
Slide Courtesy of
Overdriving DDR4 DRAM
to 1.4V could cause
damage.
Potential ODT setting issue. Threshold of first bit in burst has less swing than remainder of burst. Could also be ISI (inter-symbol interference)
Audit: Signal Alignment
For READS the Strobe is Edge Aligned to the Data
For WRITES the Strobe is Center Aligned to the Data
Signal Alignment
All the Data signals in a
Byte should be aligned
Relative Data Eye
DQ Write Eye overlay on Byte 5, 5000 cycles (2400 MT/s)
Eye threshold: centered at 790 mV – 838 mV
Eye size: avg. of 272 mV x 205 ps
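To put the measured eye width in context, it helps to compare it with the unit interval (one bit time) at the stated data rate. The sketch below does only that conversion; it does not apply any pass/fail limit, and the function name is hypothetical.

```python
def unit_interval_ps(data_rate_mts: float) -> float:
    """Return the bit time in picoseconds for a data rate given in MT/s."""
    # 1 / (rate * 1e6 transfers per second), expressed in picoseconds
    return 1e6 / data_rate_mts

data_rate_mts = 2400.0   # from the slide
eye_width_ps = 205.0     # average eye width from the slide

ui_ps = unit_interval_ps(data_rate_mts)
eye_fraction = eye_width_ps / ui_ps

print(f"Unit interval: {ui_ps:.1f} ps")
print(f"Eye width:     {eye_width_ps:.0f} ps ({eye_fraction:.1%} of the unit interval)")
```

At 2400 MT/s the unit interval is roughly 417 ps, so a 205 ps eye covers just under half of the bit time.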
Observations
All eyes are consistent in size and alignment.
Address Signals
Easy to check even at higher speeds
3200MT/s
Read data with Strobe
Write data with Strobe
Next Check for JEDEC Protocol
Violations by the memory controller
• The DDR4 JEDEC spec contains rules on
event ordering
• Examples
– Do not ACTIVATE a bank that is already open
– Do not PRECHARGE a bank that is already
closed
– Do not RD/WR to a page that is not open
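The three example rules above can be expressed as a small state machine that tracks which banks are open. This is a minimal sketch, not how any particular analyzer works: the trace format (a list of command/bank pairs) and the function name are assumptions made for illustration, and bank groups, ranks, and refresh are ignored.

```python
from typing import Iterable, List, Set, Tuple

Command = Tuple[str, int]  # (command name, bank number); hypothetical trace format

def check_event_ordering(trace: Iterable[Command]) -> List[str]:
    """Flag violations of the three example ordering rules from the slide."""
    open_banks: Set[int] = set()
    violations: List[str] = []
    for index, (cmd, bank) in enumerate(trace):
        if cmd == "ACT":
            if bank in open_banks:
                violations.append(f"{index}: ACT to bank {bank}, which is already open")
            open_banks.add(bank)
        elif cmd == "PRE":
            if bank not in open_banks:
                violations.append(f"{index}: PRE to bank {bank}, which is already closed")
            open_banks.discard(bank)
        elif cmd in ("RD", "WR"):
            if bank not in open_banks:
                violations.append(f"{index}: {cmd} to bank {bank} with no open page")
    return violations

example_trace = [("ACT", 0), ("RD", 0), ("ACT", 0), ("PRE", 0), ("PRE", 0), ("WR", 1)]
for line in check_event_ordering(example_trace):
    print(line)
```

In the example trace, the second ACT to bank 0, the second PRE to bank 0, and the WR to bank 1 are each reported once.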
Memory Controller
Timing Violations
• Clock edge boundary
– Commands cannot be too close together or too far apart
– Examples
• tREFI - Average refresh interval
• tRC - ACT to ACT or REF
• tMOD - MRS to PDE
• tCCD_L - RD to RD to Same Bank Group
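Both kinds of timing check, "too close together" and "too far apart", reduce to comparing clock counts between commands. The sketch below shows one check of each kind on a hypothetical trace of (clock, command) pairs. The limit of 6 clocks is a placeholder, not a JEDEC value; real limits depend on speed bin and configuration, and a real tCCD_L check would also have to track bank groups.

```python
from typing import List, Optional, Sequence, Tuple

Event = Tuple[int, str]  # (clock cycle, command name); hypothetical trace format

def min_spacing_violations(trace: Sequence[Event], first: str, second: str,
                           min_clocks: int) -> List[Tuple[int, int]]:
    """Return (clock of first, clock of second) pairs spaced closer than min_clocks."""
    violations: List[Tuple[int, int]] = []
    last_first: Optional[int] = None
    for clock, cmd in trace:
        # Check before updating so that first == second (e.g. RD to RD) works.
        if cmd == second and last_first is not None and clock - last_first < min_clocks:
            violations.append((last_first, clock))
        if cmd == first:
            last_first = clock
    return violations

def average_interval(trace: Sequence[Event], cmd: str) -> Optional[float]:
    """Return the average spacing in clocks between occurrences of cmd."""
    clocks = [clock for clock, name in trace if name == cmd]
    if len(clocks) < 2:
        return None
    return (clocks[-1] - clocks[0]) / (len(clocks) - 1)

example = [(0, "RD"), (4, "RD"), (10, "RD"), (100, "REF"), (9460, "REF"), (18820, "REF")]
print(min_spacing_violations(example, "RD", "RD", min_clocks=6))  # placeholder limit
print(average_interval(example, "REF"))  # compare against the applicable tREFI in clocks
```

The minimum-spacing helper covers parameters such as tRC, tMOD, and tCCD_L; the averaging helper covers tREFI, which is specified as an average rather than a per-command limit.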
65 violations identified with more than
1,000 simultaneous checks
Protocol and Timing Compliance
‘in the wild’
JEDEC Specification Violation
The SPD (Serial Presence Detect Device) has to be checked!
Mistakes in the SPD can lead to the BIOS not
programming the Memory
Controller correctly
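One basic SPD sanity check is to confirm the device type and the checksum of the base configuration block. The sketch below assumes the following layout, which should be verified against the JEDEC SPD specification before use: byte 2 holds the DRAM device type (0x0C for DDR4), and bytes 126 and 127 hold a CRC-16 (polynomial 0x1021, initial value 0, least significant byte first) over bytes 0 to 125. The function names are hypothetical, and a full audit would check many more fields.

```python
from typing import List

def spd_crc16(data: bytes) -> int:
    """CRC-16 with polynomial 0x1021 and initial value 0, computed bit by bit."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def check_spd_base_block(spd: bytes) -> List[str]:
    """Return a list of problems found in the first 128 bytes of an SPD image."""
    if len(spd) < 128:
        return ["SPD image is shorter than 128 bytes"]
    problems: List[str] = []
    if spd[2] != 0x0C:  # assumed DDR4 device type code
        problems.append(f"byte 2 is 0x{spd[2]:02X}, expected 0x0C for DDR4")
    stored = spd[126] | (spd[127] << 8)  # assumed LSB-first storage
    computed = spd_crc16(spd[:126])
    if stored != computed:
        problems.append(f"CRC mismatch: stored 0x{stored:04X}, computed 0x{computed:04X}")
    return problems

# Build a synthetic image purely to exercise the check; it is not a real SPD.
image = bytearray(128)
image[2] = 0x0C
crc = spd_crc16(bytes(image[:126]))
image[126], image[127] = crc & 0xFF, crc >> 8
print(check_spd_base_block(bytes(image)))
```

A check like this only catches corruption and gross mismatches; timing fields that are internally consistent but wrong for the fitted DRAM still need to be compared against the device datasheet.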
Mode Register Settings
Performance Metrics
Not necessary for JEDEC compliance, but nice to know!
• Which power management features are implemented
– Is Self Refresh ever being used?
– Is Max Power Down implemented?
• Can we look to see if any timing parameters can be improved?
Increasing Performance by
looking at timing margins
RD to WR same Rank: spec says 7, system is operating at 10
Operating right at
Specification
Not happening! No Power
Management
Making the Measurement
Photos Courtesy of Keysight Technologies
Photos Courtesy of FuturePlus Systems
Summary
• Memory Errors in the Field are pervasive!
• DDR Memory Compliance Testing can be
achieved using the method outlined
• Tools are available
– Purchase or Rent
• Companies needing help can hire industry
experts to perform the testing for them
Contact Information
Barbara P. Aichinger
FuturePlus Systems
603-472-5905
www.FuturePlus.com
www.DDRDetective.com