Source: meseec.ce.rit.edu/722-projects/spring2016/2-1.pdf (2016-05-11)
TRANSCRIPT
3D STACKED MEMORIES-PRESENTED BY KARISHMA REDDY
AGENDA• OBJECTIVES BEHIND DEVELOPING 3D STACKED MEMORIES
- Memory wall
- Existing memory technologies and their drawbacks
• HYBRID MEMORY CUBE
- Introduction
- Architecture
- Conceptual layout
- Benefits offered by the architecture
• HIGH BANDWIDTH MEMORY
- Introduction
- Architecture
- Conceptual layout
- Benefits offered by the architecture
• COMPARISON BETWEEN DDR4, HMC AND HBM
OBJECTIVES BEHIND DEVELOPING 3D STACKED
MEMORIES
MEMORY WALL
• Memory bandwidth has become one of the most fundamental bottlenecks to
higher performance in computer architectures.
• To continue exploiting Moore’s law, multicore and multithreaded processors
were introduced, and they do provide the required high peak performance.
• However, we notice a decrease in the efficient utilization of such
machines as we continue to increase the number of cores or threads for
enhanced performance.
CONTINUED…
• This can be attributed to the fact that over the years processors have become
faster while memory bandwidth has not improved at the same rate.
• As processors outpace memory, program execution time depends increasingly
on how fast the memory can feed data to these multiprocessors.
• The resulting demand for greater memory bandwidth and density is more
commonly known today as the ‘memory wall’ phenomenon.
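The effect described above can be illustrated with a simple roofline-style calculation. The core counts, per-core peak, and bandwidth figures below are illustrative assumptions, not measurements from the source:

```python
# Illustrative roofline-style sketch of the memory wall: attainable
# throughput is capped by min(compute peak, bandwidth * arithmetic
# intensity). All numbers here are assumed for illustration only.

def attainable_gflops(peak_gflops, bw_gbs, flops_per_byte):
    """Attainable performance under the roofline model."""
    return min(peak_gflops, bw_gbs * flops_per_byte)

# Doubling cores doubles peak compute, but memory bandwidth stays fixed,
# so utilization of the machine keeps dropping.
bw = 25.6  # GB/s, e.g. one DDR4-3200 channel
for cores in (4, 8, 16):
    peak = cores * 50.0  # assume 50 GFLOP/s per core
    perf = attainable_gflops(peak, bw, flops_per_byte=1.0)
    print(f"{cores:2d} cores: peak {peak:6.1f} GFLOP/s, "
          f"attainable {perf:5.1f} GFLOP/s "
          f"({100 * perf / peak:.0f}% utilization)")
```

With bandwidth fixed at 25.6 GB/s, utilization falls from 13% at 4 cores to 3% at 16 cores, which is exactly the diminishing-returns behaviour the slide describes.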
EXISTING MEMORY TECHNOLOGIES
• The majority of computing machines use DRAM as main memory since it
provides large capacity at low cost.
• A DDRx DRAM system consists of a memory controller on the processor
chip issuing commands to the DRAM devices plugged into the motherboard.
• Each device consists of multiple memory banks and associated circuitry.
• Newer generations basically maintain this same technology and implement
additional circuitry to enhance performance.
LAYOUT OF THE DDRx DRAM
MAIN DRAWBACKS
• However, the performance improvement from these newer generations is modest,
and further gains would require DRAM scaling.
• But DRAM cells can only be scaled down to the point where they can still
hold charge without needing to be refreshed incessantly.
• The electrical wires forming the connections between the controller and the
memory are long and dense, and hence consume considerable power.
• These wires connect through pins, which further increases the cost of the
system when the memory bus requires many such electrical wires.
CONTINUED…
• DRAM modules can be considered ‘not smart’ in the sense that they do not
function on their own; instead, they depend entirely on the memory controller
for their operation.
• A proposed solution is to leverage the recent advances in the 3D fabrication
technology to develop memory architectures with 3D configuration.
• This proposed solution is the inspiration behind development of the new
innovative architectures explained in the following slides.
HYBRID MEMORY CUBE
INTRODUCTION
• The HMC is a memory technology announced by Micron in 2011 that provides
a high-performance RAM interface for TSV-based stacked DRAM.
• It is a 3D configuration of DRAM layers stacked on top of each other with
a single control logic layer to handle all read/write traffic.
• The DRAM layers are connected using TSVs (through-silicon vias), which are
vertical electrical connections passing entirely through the die.
CLOSE UP OF HMC
ARCHITECTURE
• Start with a clean slate: re-partition the DRAM layer and strip away the
common logic, since we do not want it duplicated on each and every layer.
• Stack multiple such DRAM layers together using TSVs.
• The stacking and partitioning of DRAM layers results in the creation of
vaults: a vertical column of independent memory banks is referred to as a vault.
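One way to see why vaults enable parallelism is to sketch how an address could be decomposed across them. The field widths and vault/bank counts below are hypothetical, not taken from the HMC specification:

```python
# Hypothetical address decomposition for a vaulted 3D stack: low-order
# bits select the vault, so consecutive blocks land in different vaults
# and can be serviced in parallel. All field widths are illustrative.

NUM_VAULTS = 16      # vaults per cube (assumed)
BANKS_PER_VAULT = 8  # banks stacked vertically in one vault (assumed)
BLOCK_BYTES = 32     # granularity of one request (assumed)

def decode(addr):
    """Split a physical address into (vault, bank, row) fields."""
    block = addr // BLOCK_BYTES
    vault = block % NUM_VAULTS
    bank = (block // NUM_VAULTS) % BANKS_PER_VAULT
    row = block // (NUM_VAULTS * BANKS_PER_VAULT)
    return vault, bank, row

# A sequential access stream spreads across all vaults, so requests can
# proceed concurrently instead of queuing behind a single controller.
for addr in range(0, 4 * BLOCK_BYTES, BLOCK_BYTES):
    print(addr, decode(addr))
```

Interleaving the vault bits at the bottom of the address is one common design choice; a real device may map addresses differently.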
DESIGN PROCESS: STEP 1
DESIGN PROCESS: STEP 2
DESIGN PROCESS: STEP 3
CONCEPTUAL LAYOUT OF THE ARCHITECTURE
LAYOUT DESCRIPTION
• Single package containing multiple memory die and a single logic die stacked
together using TSV technology.
• It consists of memory organized into vaults with each vault being functionally
and operationally independent.
• Each vault has a memory controller in the logic base that manages all memory
reference operations within that vault.
CONTINUED…
• The segmentation of the DRAM layers results in the creation of structures
known as vaults, each made up of several banks.
• The main purpose of the vaults is to enhance parallelism within the
architecture.
• Similar to a DDRx channel, a vault consists of a common memory bus shared
by the several memory banks and a memory controller.
CONTINUED…
• However, in this case the common memory bus is formed by the TSVs and the
memory controller is the vault controller.
• A vault controller is present at the base of each vault and acts as a memory
controller for that vault.
• It performs the functions of monitoring the timing constraints and transmitting
different commands to the modules above.
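The independence of the vault controllers can be sketched minimally as one command queue per vault. The queue model and the one-command-per-cycle timing below are hypothetical simplifications, not the HMC's actual scheduling logic:

```python
# Minimal sketch of independent per-vault command scheduling: each
# vault controller drains its own queue, so a busy vault never blocks
# another. The one-command-per-cycle timing model is a simplification.

from collections import deque

class VaultController:
    def __init__(self, vault_id):
        self.vault_id = vault_id
        self.queue = deque()

    def enqueue(self, cmd):
        self.queue.append(cmd)

    def tick(self):
        """Issue at most one queued command this cycle (assumed timing)."""
        return self.queue.popleft() if self.queue else None

# Two vaults service their own traffic concurrently.
vaults = [VaultController(i) for i in range(2)]
vaults[0].enqueue("READ row 7")
vaults[0].enqueue("WRITE row 3")
vaults[1].enqueue("READ row 1")

for cycle in range(2):
    print(f"cycle {cycle}:", [v.tick() for v in vaults])
```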
BENEFITS OFFERED BY THE ARCHITECTURE
• The 3D design of the HMC helps in providing more density in terms of memory
available and reduced package footprint.
• Higher parallelism is possible due to multiple independent vaults within the
hybrid memory cube.
• Heterogeneity of the layers is made possible by the use of the TSV
technology.
CONTINUED…
• The memory device at the end of the link is now ‘smart’.
• Near-memory computation is possible reducing the amount of data that must
be transferred back and forth between the memory and the processor.
• Higher bandwidth between the layers is made possible due to the use of
TSV connections between the layers which are denser and can transfer
data at higher rates due to shorter lengths.
CONTINUED…
• As electrical connections become shorter and peripheral circuitry is moved
into the logic layer, the power cost is reduced.
• Reduced CPU pin requirement.
HIGH BANDWIDTH MEMORY
INTRODUCTION
• High Bandwidth Memory is another 3D-architecture-based solution to the
memory bandwidth problem, offered jointly by AMD and Hynix.
• The main inspiration behind the development of HBM was to satisfy the
needs of future high-performance GPUs and high-performance systems.
• As discussed earlier in the case of DRAM, limited scaling is a drawback
as far as the future of memories is concerned.
CONTINUED…
• Similarly, in the case of GDDR5, developing the next generation through
scaling with the same bandwidth growth as from GDDR3 to GDDR5 would incur
significant power costs.
• Due to the drawbacks mentioned in the previous slide, a new approach
was required that guaranteed higher performance and lower power
consumption.
• This is where HBM comes in: a new type of CPU/GPU memory. Similar to the
architecture of the HMC, HBM also consists of DRAM dies stacked on top of
each other with a logic base at the bottom.
ARCHITECTURE
• The connections between the DRAM dies are made using TSVs, and in
addition HBM provides an ultra-wide bus.
• The stacks are connected to the CPU/GPU using a fast interconnect known
as an interposer.
• Each HBM stack provides 8 independent channels, in the sense that no
operation in one channel can affect another channel.
CONTINUED…
• Each channel in turn provides a 128-bit bi-directional data interface,
similar to a standard DDR interface, delivering up to 16-32 GB/sec of
bandwidth.
• Since each stack provides 8 channels, a total of 128-256 GB/sec of
bandwidth is possible per stack.
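The per-stack figure follows directly from the channel parameters given above; the sketch below simply reproduces that arithmetic:

```python
# Reproduce the HBM per-stack bandwidth arithmetic from the slide:
# 8 channels x 128-bit interface x per-pin data rate.

CHANNELS_PER_STACK = 8
BITS_PER_CHANNEL = 128

def stack_bandwidth_gbs(gbps_per_pin):
    """Per-stack bandwidth in GB/s for a given per-pin rate in Gb/s."""
    bits_per_sec = CHANNELS_PER_STACK * BITS_PER_CHANNEL * gbps_per_pin
    return bits_per_sec / 8  # bits -> bytes

# A 1 Gb/s pin rate gives 16 GB/s per channel and 128 GB/s per stack;
# 2 Gb/s doubles both, matching the 16-32 GB/s and 128-256 GB/s ranges.
print(stack_bandwidth_gbs(1.0))  # 128.0
print(stack_bandwidth_gbs(2.0))  # 256.0
```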
LAYOUT OF THE HBM
BENEFITS OFFERED BY THE HBM
• Characteristics similar to on-chip integrated RAM, since the memory and the
CPU/GPU are closely connected through the interposer.
• It provides 3 times the bandwidth per watt of GDDR5.
• It fulfills the requirement of a smaller footprint: it can fit the same
amount of memory in 94 percent less space.
COMPARISON BETWEEN DDR4, HMC AND HBM
COMPARISON
Feature              DDR4                          HMC                              HBM
Target applications  General-purpose applications  High-end servers and enterprises Graphics and computing
JEDEC standard       Yes                           No                               Yes
Maximum bandwidth    Up to 25.6 GBps               Up to 320 GBps                   Up to 1 TBps
Maximum speed        Up to 3200 Mbps               Up to 30 Gbps                    Up to 2 Gbps
Inbuilt logic layer  No                            Yes                              Yes
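The comparison above can also be captured as a small lookup structure. The figures are copied directly from the slide and are not independently verified:

```python
# The comparison table as a lookup structure; figures are taken
# directly from the slide above and are not independently verified.

SPECS = {
    "DDR4": {"use": "General-purpose applications", "jedec": True,
             "max_bandwidth_gbps": 25.6, "max_speed": "3200 Mbps",
             "logic_layer": False},
    "HMC":  {"use": "High-end servers and enterprises", "jedec": False,
             "max_bandwidth_gbps": 320.0, "max_speed": "30 Gbps",
             "logic_layer": True},
    "HBM":  {"use": "Graphics and computing", "jedec": True,
             "max_bandwidth_gbps": 1000.0, "max_speed": "2 Gbps",
             "logic_layer": True},
}

def bandwidth_ratio(a, b):
    """How many times more peak bandwidth technology a offers over b."""
    return SPECS[a]["max_bandwidth_gbps"] / SPECS[b]["max_bandwidth_gbps"]

print(f"HMC vs DDR4: {bandwidth_ratio('HMC', 'DDR4'):.1f}x")  # 12.5x
```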