3D STACKED MEMORIES · meseec.ce.rit.edu/722-projects/spring2016/2-1.pdf · 2016-05-11

3D STACKED MEMORIES - PRESENTED BY KARISHMA REDDY




AGENDA

• OBJECTIVES BEHIND DEVELOPING 3D STACKED MEMORIES

- Memory wall

- Existing memory technologies and their drawbacks

• HYBRID MEMORY CUBE

- Introduction

- Architecture

- Conceptual layout

- Benefits offered by the architecture

• HIGH BANDWIDTH MEMORY

- Introduction

- Architecture

- Conceptual layout

- Benefits offered by the architecture

• COMPARISON BETWEEN DDR4, HMC AND HBM


OBJECTIVES BEHIND DEVELOPING 3D STACKED MEMORIES


MEMORY WALL

• Memory bandwidth is a more fundamental bottleneck to higher performance of computer architectures than any other factor.

• To continue exploiting Moore’s law, multicore and multithreaded processors were introduced, and they do provide the required high performance.

• However, such machines are utilized less efficiently as the number of cores or threads is increased for more performance.


CONTINUED…

• This is because processors have become much faster over the years, while memory bandwidth has improved far less.

• As processors outpace memory, program execution time is increasingly determined by how fast the memory can feed data to these multiprocessors.

• The result is a growing need for memory bandwidth and density, a situation commonly known today as the ‘memory wall’.
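
The effect described above can be illustrated with a simple roofline-style bound, a model the slides do not name and which is used here only as a sketch. Attainable performance is taken as the lower of a compute ceiling (cores times per-core throughput) and a memory ceiling (bandwidth times operations per byte). The per-core rate and the operations-per-byte figure are hypothetical; 25.6 GB/s is the DDR4 bandwidth quoted in the comparison later in the deck.

```python
def attainable_gflops(cores, gflops_per_core, bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: the lower of the compute and memory ceilings."""
    compute_ceiling = cores * gflops_per_core
    memory_ceiling = bandwidth_gbs * flops_per_byte
    return min(compute_ceiling, memory_ceiling)


def utilization(cores, gflops_per_core, bandwidth_gbs, flops_per_byte):
    """Fraction of peak compute that the memory system can keep busy."""
    peak = cores * gflops_per_core
    return attainable_gflops(cores, gflops_per_core, bandwidth_gbs, flops_per_byte) / peak


# Hypothetical machine: 10 GFLOP/s per core, 25.6 GB/s of memory bandwidth,
# and a workload that performs 0.5 floating-point operations per byte moved.
for cores in (1, 2, 4, 8):
    perf = attainable_gflops(cores, 10.0, 25.6, 0.5)
    util = utilization(cores, 10.0, 25.6, 0.5)
    print(f"{cores} cores: {perf:.1f} GFLOP/s attainable, {util:.0%} of peak")
```

Under these assumed numbers the attainable rate stops growing once the memory ceiling is reached, so each added core lowers utilization rather than raising performance.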


EXISTING MEMORY TECHNOLOGIES

• The majority of computing machines use DRAM as main memory, since it provides large capacity at low cost.

• A DDRx DRAM system consists of a memory controller on the processor chip that issues commands to the DRAM devices plugged into the motherboard.

• Each device consists of multiple memory banks and associated circuitry.

• Newer versions keep this same basic technology and add circuitry to enhance performance.


LAYOUT OF THE DDRx DRAM


MAIN DRAWBACKS

• However, these newer versions improve performance only modestly, and further improvement will require DRAM scaling.

• DRAM can only be scaled to the point where the devices can still hold charge without having to be recharged incessantly.

• The electrical wires connecting the controller and the memory are dense and hence tend to consume more power.

• These wires are connected using pins, which further increases the cost of the system when the memory bus requires many such wires.


CONTINUED…

• DRAM modules can be considered ‘not smart’ in the sense that they do not function on their own; they depend on the memory controller to operate.

• A proposed solution is to leverage recent advances in 3D fabrication technology to develop memory architectures with a 3D configuration.

• This proposed solution is the inspiration behind the new architectures explained in the following slides.


HYBRID MEMORY CUBE


INTRODUCTION

• The HMC is a memory technology announced by Micron in 2011 that consists of a high-performance RAM interface for TSV-based stacked DRAM.

• It has a 3D configuration made up of DRAM layers stacked on top of each other and a single control logic layer that handles all read/write traffic.

• The DRAM layers are connected using TSVs (through-silicon vias), which are vertical electrical connections passing entirely through a die.


CLOSE UP OF HMC


ARCHITECTURE

• Start with a clean slate. Re-partition the DRAM layer and strip away the common logic, since we do not want that logic duplicated on each and every layer.

• Stack multiple such DRAM layers together using TSVs.

• The stacking and partitioning of DRAM layers results in the creation of vaults. A vault is a column of independent memory banks.


DESIGN PROCESS: STEP 1


DESIGN PROCESS: STEP 2


DESIGN PROCESS: STEP 3


CONCEPTUAL LAYOUT OF THE ARCHITECTURE


LAYOUT DESCRIPTION

• A single package contains multiple memory dies and a single logic die, stacked together using TSV technology.

• The memory is organized into vaults, with each vault being functionally and operationally independent.

• Each vault has a memory controller in the logic base that manages all memory reference operations within that vault.


CONTINUED…

• The segmentation of the DRAM layers results in the creation of structures known as vaults, each made up of several banks.

• The main purpose of the vaults is to enhance parallelism within the architecture.

• Similar to a DDRx channel, a vault consists of several memory banks, a memory controller, and a common memory bus connecting them.


CONTINUED…

• However, in this case the common memory bus is formed by the TSVs and the memory controller is the vault controller.

• A vault controller is present at the base of each vault and acts as the memory controller for that vault.

• It monitors the timing constraints and transmits commands to the DRAM layers above it.
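
How independent vaults enable parallelism can be sketched with a simple address mapping. The example below is illustrative only: the vault count, bank count, block size and the low-order interleaving scheme are all hypothetical choices, not the mapping defined by the HMC specification. It shows consecutive memory blocks landing on different vaults, so that each vault controller can serve its share of a sequential access stream independently.

```python
from collections import Counter

# Hypothetical parameters, chosen only for illustration.
NUM_VAULTS = 16
BANKS_PER_VAULT = 8
BLOCK_BYTES = 32


def map_address(addr):
    """Split a byte address into (vault, bank, row) by low-order interleaving."""
    block = addr // BLOCK_BYTES
    vault = block % NUM_VAULTS                      # consecutive blocks rotate across vaults
    bank = (block // NUM_VAULTS) % BANKS_PER_VAULT  # then across banks inside a vault
    row = block // (NUM_VAULTS * BANKS_PER_VAULT)
    return vault, bank, row


def vault_load(addresses):
    """Count how many requests each vault controller would receive."""
    return Counter(map_address(a)[0] for a in addresses)


# A sequential stream of 64 blocks spreads evenly over all vaults.
sequential = [i * BLOCK_BYTES for i in range(64)]
load = vault_load(sequential)
print(sorted(load.items()))
```

With this assumed mapping, the 64 requests are split four per vault, so all 16 vault controllers can work at the same time instead of queueing behind a single shared controller.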


BENEFITS OFFERED BY THE ARCHITECTURE

• The 3D design of the HMC provides higher memory density and a reduced package footprint.

• Higher parallelism is possible due to the multiple independent vaults within the hybrid memory cube.

• The use of TSV technology makes heterogeneous layers possible.


CONTINUED…

• The memory device at the end of the link is now ‘smart’.

• Near-memory computation is possible, reducing the amount of data that must be transferred back and forth between the memory and the processor.

• The TSV connections between the layers are dense and short, so they can transfer data at higher rates and provide higher bandwidth between the layers.


CONTINUED…

• As electrical connections become shorter and peripheral circuitry is moved into the logic layer, the power cost is reduced.

• Reduced CPU pin requirement.


HIGH BANDWIDTH MEMORY


INTRODUCTION

• High Bandwidth Memory (HBM) is another 3D-architecture-based solution to the memory bandwidth problem, offered jointly by AMD and Hynix.

• HBM was developed mainly to satisfy the needs of future high-performance GPUs and high-performance systems.

• As discussed earlier for DRAM, the limits of DRAM scaling are a drawback as far as the future of memories is concerned.


CONTINUED…

• Similarly, developing the next version of GDDR5 through scaling, with the same growth in bandwidth as from GDDR3 to GDDR5, would carry significant power costs.

• Because of the drawbacks mentioned on the previous slide, a new approach was required that delivers higher performance and lower power consumption.

• This is where HBM comes in, as a new type of CPU/GPU memory. Like the HMC, HBM consists of DRAM dies stacked on top of each other with a logic base at the bottom.


ARCHITECTURE

• The connections between the DRAM dies are made using TSVs. In addition, HBM has an ultra-wide bus.

• These stacks are connected to the CPU/GPU using a fast interconnect known as the interposer.

• Each HBM stack provides 8 independent channels, in the sense that no operation in one channel can affect another channel.


CONTINUED…

• Each channel in turn provides a 128-bit bidirectional data interface, similar to a standard DDR interface, with a peak bandwidth of 16 to 32 GB/sec.

• Since each stack provides 8 channels, a total bandwidth of 128 to 256 GB/sec is possible per stack.
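
The per-channel and per-stack figures follow from the interface width and the per-pin data rate. The short check below is a sketch under stated assumptions: the 128-bit width and the 8 channels come from the slides, the 2 Gbps rate matches the maximum HBM speed quoted in the comparison later in the deck, and the 1 Gbps rate is an assumption inferred from the 16 GB/sec lower figure.

```python
CHANNEL_WIDTH_BITS = 128   # from the slides
CHANNELS_PER_STACK = 8     # from the slides


def channel_bandwidth_gbs(width_bits, pin_rate_gbps):
    """Peak bandwidth in GB/s: width (bits) x per-pin rate (Gbit/s) / 8 bits per byte."""
    return width_bits * pin_rate_gbps / 8


# 1 Gbps is an assumed lower rate; 2 Gbps is the maximum speed quoted for HBM.
for pin_rate_gbps in (1.0, 2.0):
    per_channel = channel_bandwidth_gbs(CHANNEL_WIDTH_BITS, pin_rate_gbps)
    per_stack = per_channel * CHANNELS_PER_STACK
    print(f"{pin_rate_gbps} Gbps/pin: {per_channel} GB/s per channel, {per_stack} GB/s per stack")
```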


LAYOUT OF THE HBM


BENEFITS OFFERED BY THE HBM:

• Its characteristics are similar to on-chip integrated RAM, since the memory and the CPU/GPU are closely connected through an interposer.

• It provides 3 times the bandwidth per watt of GDDR5.

• It meets the requirement for a smaller footprint: the same amount of memory fits in 94 percent less space.


COMPARISON BETWEEN DDR4, HMC AND HBM


COMPARISON

                     DDR4                           HMC                                 HBM

Applications         General purpose applications   High end servers and enterprises    Graphics and computing

Standard             JEDEC standard                 Not a JEDEC standard                JEDEC standard

Maximum bandwidth    Up to 25.6 GBps                Up to 320 GBps                      Up to 1 TBps

Maximum speed        Up to 3200 Mbps                Up to 30 Gbps                       Up to 2 Gbps

Logic layer          No inbuilt logic layer         Has inbuilt logic layer             Has inbuilt logic layer
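
The comparison lists HBM with the lowest maximum speed but the highest maximum bandwidth. A rough check shows why: peak bandwidth is the per-pin rate multiplied by the data-bus width, and HBM's bus is far wider. This is a sketch under assumptions that are not in the slides: a 64-bit data bus for a DDR4 channel, and four HBM stacks of eight 128-bit channels each to reach the 1 TBps figure.

```python
def peak_bandwidth_gbs(bus_width_bits, pin_rate_gbps):
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbps / 8


# Assumed: one 64-bit DDR4 channel at 3200 Mbps (3.2 Gbps) per pin.
ddr4_gbs = peak_bandwidth_gbs(64, 3.2)

# Assumed: 4 HBM stacks, each with 8 channels of 128 bits, at 2 Gbps per pin.
hbm_width_bits = 4 * 8 * 128
hbm_gbs = peak_bandwidth_gbs(hbm_width_bits, 2.0)

print(f"DDR4: {ddr4_gbs:.1f} GB/s over 64 bits")
print(f"HBM:  {hbm_gbs:.1f} GB/s over {hbm_width_bits} bits")
```

Under these assumptions the results line up with the 25.6 GBps and roughly 1 TBps entries. The HMC entry is left out because its bandwidth comes from high-speed serial links rather than a wide parallel bus.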

