3d stacked memoriesmeseec.ce.rit.edu/722-projects/spring2016/2-1.pdf · 2016-05-11 · memory wall...

3D STACKED MEMORIES

PRESENTED BY KARISHMA REDDY

AGENDA

• OBJECTIVES BEHIND DEVELOPING 3D STACKED MEMORIES

- Memory wall

- Existing memory technologies and their drawbacks

• HYBRID MEMORY CUBE

- Introduction

- Architecture

- Conceptual layout

- Benefits offered by the architecture

• HIGH BANDWIDTH MEMORY

- Introduction

- Architecture

- Conceptual layout

- Benefits offered by the architecture

• COMPARISON BETWEEN DDR4, HMC AND HBM

OBJECTIVES BEHIND DEVELOPING 3D STACKED MEMORIES

MEMORY WALL

• Memory bandwidth is a more fundamental bottleneck to higher performance of computer architectures than any other factor.

• To continue exploiting Moore’s law, multicore and multithreaded processors were introduced, and they do provide the required high performance.

• However, these machines are utilized less efficiently as the number of cores or threads is increased for enhanced performance.

CONTINUED…

• This can be attributed to the fact that, over the years, processors have become faster while memory bandwidth has not improved much.

• As processors become faster than memory, program execution time comes to depend almost entirely on how fast the memory can feed data to these multiprocessors.

• The result is a growing need for memory bandwidth and density, commonly known today as the ‘memory wall’.
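The memory wall argument can be made concrete with a minimal sketch. It assumes a simple bound in which run time is the larger of the compute time and the memory transfer time. The function name and every number below are illustrative assumptions, not measurements from the presentation.

```python
def execution_time(flops, bytes_moved, cores, flops_per_core, mem_bandwidth):
    """Lower bound on run time: compute-limited or memory-limited, whichever is slower.

    All parameters are hypothetical illustration values, not data from the slides.
    """
    compute_time = flops / (cores * flops_per_core)
    memory_time = bytes_moved / mem_bandwidth
    return max(compute_time, memory_time)


# Assumed workload: 1e12 operations that move 1e11 bytes to or from memory.
FLOPS = 1e12
BYTES = 1e11
FLOPS_PER_CORE = 1e10   # assumed per-core throughput (operations/second)
BANDWIDTH = 25.6e9      # assumed fixed memory bandwidth (bytes/second)

for cores in (1, 2, 4, 8, 16, 32, 64):
    t = execution_time(FLOPS, BYTES, cores, FLOPS_PER_CORE, BANDWIDTH)
    print(f"{cores:2d} cores: {t:6.2f} s")
```

Under these assumed numbers, adding cores stops helping once the memory transfer time dominates: the run time is the same with 32 cores as with 64.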

EXISTING MEMORY TECHNOLOGIES

• The majority of computing machines use DRAM as main memory, since it provides large capacity at low cost.

• A DDRx DRAM system consists of a memory controller on the processor chip that issues commands to the DRAM devices plugged into the motherboard.

• Each device consists of multiple memory banks and associated circuitry.

• Newer versions keep essentially the same technology and add circuitry to enhance performance.

LAYOUT OF THE DDRx DRAM

MAIN DRAWBACKS

• However, the performance improvement from these new versions is modest, and further improvement will require DRAM scaling.

• But DRAM can only be scaled to the point where the cells can still hold charge without having to be recharged incessantly.

• The electrical wires that connect the controller and the memory are dense and hence tend to consume more power.

• These wires are connected using pins, which may further increase the cost of the system if the memory bus requires many such wires.

CONTINUED…

• DRAM modules can be considered ‘not smart’ in the sense that they do not function on their own; instead, they depend on the memory controller for their functioning.

• A proposed solution is to leverage recent advances in 3D fabrication technology to develop memory architectures with a 3D configuration.

• This proposed solution is the inspiration behind the new architectures explained in the following slides.

HYBRID MEMORY CUBE

INTRODUCTION

• The HMC is a memory technology announced by Micron in 2011 that consists of a high-performance RAM interface for TSV-based stacked DRAM.

• It is a 3D configuration made up of DRAM layers stacked on top of each other and a single control logic layer that handles all read/write traffic.

• The DRAM layers are connected using through-silicon vias (TSVs), which are vertical electrical connections passing entirely through the die.

CLOSE UP OF HMC

ARCHITECTURE

• Start with a clean slate. Re-partition the DRAM layer and strip away the common logic, since we do not want common logic duplicated in each and every layer.

• Stack multiple such DRAM layers together using TSVs.

• The stacking and partitioning of the DRAM layers creates vaults. A vault is a column of independent memory banks.

DESIGN PROCESS: STEP 1

CONCEPTUAL LAYOUT OF THE ARCHITECTURE

LAYOUT DESCRIPTION

• A single package contains multiple memory dies and a single logic die stacked together using TSV technology.

• The memory is organized into vaults, with each vault being functionally and operationally independent.

• Each vault has a memory controller in the logic base that manages all memory reference operations within that vault.

CONTINUED…

• The segmentation of the DRAM layers results in structures known as vaults, each made up of several banks.

• The main purpose of the vaults is to enhance parallelism within the architecture.

• Similar to a DDRx channel, a vault consists of a common memory bus shared by several memory banks and a memory controller.

CONTINUED…

• However, in this case the common memory bus is formed by the TSVs, and the memory controller is the vault controller.

• A vault controller is present at the base of each vault and acts as the memory controller for that vault.

• It monitors the timing constraints and transmits commands to the DRAM layers above.
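How independent vaults enhance parallelism can be sketched with a toy address decoder. The vault count, bank count, block size, and the low-order interleaving scheme below are all assumptions chosen for illustration; they are not taken from the slides or from the HMC specification.

```python
from collections import Counter

BLOCK_BYTES = 32      # assumed access granularity
NUM_VAULTS = 16       # assumed number of vaults
BANKS_PER_VAULT = 8   # assumed number of banks in each vault


def decode_address(addr):
    """Split a byte address into (vault, bank, row_block).

    Hypothetical low-order interleaving: consecutive blocks go to consecutive
    vaults first, then to consecutive banks within a vault.
    """
    block = addr // BLOCK_BYTES
    vault = block % NUM_VAULTS
    bank = (block // NUM_VAULTS) % BANKS_PER_VAULT
    row_block = block // (NUM_VAULTS * BANKS_PER_VAULT)
    return vault, bank, row_block


# A sequential 4 KiB read touches 128 blocks; count how many land in each vault.
requests = [decode_address(a) for a in range(0, 4096, BLOCK_BYTES)]
per_vault = Counter(vault for vault, _, _ in requests)
print(per_vault)
```

With this assumed mapping, a sequential stream is spread evenly over all vaults, so each vault controller can service its share of the requests over its own TSV bus at the same time.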

BENEFITS OFFERED BY THE ARCHITECTURE

• The 3D design of the HMC provides higher memory density and a reduced package footprint.

• Higher parallelism is possible due to the multiple independent vaults within the Hybrid Memory Cube.

• Heterogeneity of the layers is made possible by the use of TSV technology.

CONTINUED…

• The memory device at the end of the link is now ‘smart’.

• Near-memory computation is possible, reducing the amount of data that must be transferred back and forth between the memory and the processor.

• Higher bandwidth between the layers is possible because the TSV connections are denser and, being shorter, can transfer data at higher rates.

CONTINUED…

• As electrical connections become shorter and peripheral circuitry is moved into the logic layer, the power cost is reduced.

• Reduced CPU pin requirement.

HIGH BANDWIDTH MEMORY

INTRODUCTION

• High Bandwidth Memory (HBM) is another 3D-architecture-based solution to the memory bandwidth problem, offered jointly by AMD and Hynix.

• The main inspiration behind the development of HBM was to satisfy the needs of future high-performance GPUs and high-performance systems.

• As discussed earlier for DRAM, the limits of DRAM scaling are a drawback as far as the future of memories is concerned.

CONTINUED…

• Similarly, in the case of GDDR5, developing the next version by scaling, with the same growth in bandwidth as from GDDR3 to GDDR5, would carry significant power costs.

• Due to all the drawbacks mentioned in the previous slide, a new approach was required that offered higher performance and lower power consumption.

• This is where HBM comes in as a new type of CPU/GPU memory. Similar to the HMC, HBM consists of DRAM dies stacked on top of each other with a logic base at the bottom.

ARCHITECTURE

• The connections between the DRAM dies are made using TSVs; in addition, HBM has an ultra-wide bus.

• These stacks are connected to the CPU/GPU through a fast interconnect known as the interposer.

• Each HBM stack provides 8 independent channels, in the sense that no operation in one channel can affect another channel.

CONTINUED…

• Each channel in turn provides a 128-bit bidirectional data interface, similar to a standard DDR interface, with 16-32 GB/sec of bandwidth.

• Since each stack provides 8 channels, a total of 128-256 GB/sec of bandwidth is possible per stack.
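The per-channel and per-stack figures follow from the interface width and the per-pin data rate. The sketch below reproduces them; the 128-bit width and 8 channels come from the slides, while the per-pin rates of 1 and 2 Gbit/s are assumptions chosen to match the 16-32 GB/sec range (the function name is also illustrative).

```python
def channel_bandwidth_gb_s(width_bits, pin_rate_gbps):
    """Peak bandwidth of one channel in GB/s: width (bits) x per-pin rate (Gbit/s) / 8."""
    return width_bits * pin_rate_gbps / 8


CHANNEL_WIDTH_BITS = 128   # from the slide
CHANNELS_PER_STACK = 8     # from the slide

for pin_rate in (1.0, 2.0):  # assumed per-pin data rates in Gbit/s
    per_channel = channel_bandwidth_gb_s(CHANNEL_WIDTH_BITS, pin_rate)
    per_stack = per_channel * CHANNELS_PER_STACK
    print(f"{pin_rate} Gbit/s per pin: {per_channel} GB/s per channel, "
          f"{per_stack} GB/s per stack")
```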

LAYOUT OF THE HBM

BENEFITS OFFERED BY THE HBM:

• Characteristics similar to on-chip integrated RAM, since the memory and the CPU/GPU are closely connected through an interposer.

• It provides 3 times the bandwidth per watt of GDDR5.

• It meets the requirement for a smaller footprint: it can fit the same amount of memory in 94 percent less space.

COMPARISON BETWEEN DDR4, HMC AND HBM

COMPARISON

Feature             DDR4                          HMC                               HBM
Target use          General purpose applications  High end servers and enterprises  Graphics and computing
Standardization     JEDEC standard                Not a JEDEC standard              JEDEC standard
Maximum bandwidth   Up to 25.6 GBps               Up to 320 GBps                    Up to 1 TBps
Maximum speed       Up to 3200 Mbps               Up to 30 Gbps                     Up to 2 Gbps
Logic layer         No inbuilt logic layer        Has inbuilt logic layer           Has inbuilt logic layer
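Two of the bandwidth figures in the comparison can be cross-checked with simple arithmetic. This is a rough sketch: the 64-bit DDR4 bus width is an assumption (one standard channel), and reaching roughly 1 TBps by combining several HBM stacks at the 256 GB/sec per-stack upper figure is one possible reading, not something the slides state.

```python
import math


def peak_bandwidth_gb_s(bus_width_bits, pin_rate_mbps):
    """Peak bandwidth in GB/s from bus width (bits) and per-pin rate (Mbit/s)."""
    return bus_width_bits * pin_rate_mbps / 8 / 1000


DDR4_BUS_WIDTH_BITS = 64     # assumed: one standard 64-bit DDR4 channel
DDR4_PIN_RATE_MBPS = 3200    # maximum speed from the comparison

ddr4 = peak_bandwidth_gb_s(DDR4_BUS_WIDTH_BITS, DDR4_PIN_RATE_MBPS)
print(f"DDR4: {ddr4} GB/s")

HBM_STACK_GB_S = 256         # upper per-stack figure from the HBM slides
stacks_needed = math.ceil(1000 / HBM_STACK_GB_S)  # assumed: 1 TBps taken as 1000 GB/s
print(f"HBM: {stacks_needed} stacks give {stacks_needed * HBM_STACK_GB_S} GB/s")
```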

REFERENCES

• http://www.hotchips.org/

• http://www.hybridmemorycube.org/news.html

• http://community.cadence.com/cadence_blogs_8/b/ii/archive/2012/09/19/memcon-keynote-why-hybrid-memory-cube-will-revolutionize-system-memory

• http://www.amd.com/

• http://www.cs.utah.edu/thememoryforum/mike.pdf

REFERENCES CONTINUED…

• http://wccftech.com/

• http://www.memcon.com/

• https://www.ece.umd.edu/~blj/papers/thesis-PhD-paulr--HMC.pdf

• https://en.wikipedia.org/

• http://www.extremetech.com/
