naresh - eri summitnaresh university of illinois at shanbhag. urbana-champaign. distribution a....

Post on 22-May-2020

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Distribution A. Approved for public release; distribution is unlimitedDistribution A. Approved for public release; distribution is unlimited

NARESH

UNIVERSITY OF ILLINOIS AT

SHANBHAG

URBANA-CHAMPAIGN

Distribution A. Approved for public release; distribution is unlimited

MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views,opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views orpolicies of the Department of Defense or the U.S. Government.

Distribution A. Approved for public release; distribution is unlimited

THE TEAM

Mike Burkland Naveen Verma(Princeton)

Pavan Hanumolu(UIUC)

Naresh Shanbhag(UIUC)

[systems & ckts][mixed-signal ICs][systems & ckts]

• L.-Y. Chen• P. Deaville• B. Zhang

• A. Patil, H. Hua, S. Gonugondla, K.-H. Kim

Ajey Jacob

[devices, foundry]

• S. Soss, D. Brown

• N. Gaul, B. Paul, B. Lanza

• J. Broussard• L. Suantak• J. Goulding• S. Cole• T. Gangwer• R. Kelley• C. Contreras

[DoD systems]

Raytheon MS GLOBALFOUNDRIES Princeton University of Illinois at Urbana-Champaign

Distribution A. Approved for public release; distribution is unlimited

REALIZING ARTIFICIAL INTELLIGENCE @ THE EDGE

AI at the Edge

model

data

AI in the Cloud(autonomous cognitive agent)

• client-server model• efficiency challenged• security + privacy issues

• energy-latency efficient, securebut……

• volume + resource constraineddecisions

[source: pexels.com][source: pexels.com]

Distribution A. Approved for public release; distribution is unlimited

THE DATA MOVEMENT PROBLEM𝑬𝑬𝒎𝒎𝒎𝒎𝒎𝒎𝑬𝑬𝒎𝒎𝒎𝒎𝒎𝒎

≈ ~𝟏𝟏𝟏𝟏𝟏𝟏 × (SRAM) → ~𝟓𝟓𝟏𝟏𝟏𝟏 × (DRAM) → ~𝟏𝟏𝟏𝟏𝟏𝟏𝟏𝟏 × (Flash)

[Canziani, Arxiv’16]

The Memory Wall in von Neumann Architectures

Distribution A. Approved for public release; distribution is unlimited

fundamental question

how do we design intelligent autonomous machinesthat operate at the limits of energy-delay-accuracy?

Distribution A. Approved for public release; distribution is unlimited

Proceedings of IEEE, Special Issue on non von Neumann Computing, January 2019.

Systems on Nanoscale Information fabriCswww.sonic-center.org

[2013-’17](STARnet Program by Semiconductor Research Corporation & DARPA)

Distribution A. Approved for public release; distribution is unlimited

SHANNON-INSPIRED MODEL OF COMPUTING

use information-based metrics e.g., mutual information 𝐼𝐼(𝑌𝑌𝑜𝑜; �𝑌𝑌)1design low SNR fabrics, e.g., deep in-memory architecture (DIMA)2

develop statistical error-compensation (SEC) techniques3

(noisy channel)

DIMA

[ISSCC’18]

Distribution A. Approved for public release; distribution is unlimited

integratethe MRAM device

within a

deep in-memory architecture (DIMA)

using

Shannon-inspired compute models

FRANC OBJECTIVEGLOBALFOUNDRIES PRINCETON & UIUC RAYTHEON MISSILE SYSTEMS

thermally limited

MRAM device

deep in-memory arch.

Shannon-inspired compute models

autonomous platforms

Distribution A. Approved for public release; distribution is unlimited

[ICASSP 2014 UIUC, JSSC 2017 Princeton, JSSC 2018 UIUC]

read functions of datastandard bit-cell array(preserves storage density)

THE DEEP IN-MEMORY ARCHITECTURE (DIMA)

Distribution A. Approved for public release; distribution is unlimited

SRAM DIMA PROTOTYPES

Multi-functionalinference processor

(65nm CMOS)

Random forest processor

(65nm CMOS)

On-chip training processor

(65nm CMOS)

Fully (128) row-parallel compute

(130nm CMOS)

100X EDP reduction over von Neumann equivalent* @ iso-accuracy* 8b fixed-function digital architecture with identical SRAM size

[Feb. JSSC’18] [ESSCIRC’17, JSSC July’18] [ISSCC’18, JSSC Nov.’18] [VLSI’16, JSSC’17]

Distribution A. Approved for public release; distribution is unlimited

MRAM OPPORTUNITIES & CHALLENGES

high conductance → large array currentssmall TMR → high sensitivity readouthuge 𝑅𝑅𝑀𝑀𝑀𝑀𝑀𝑀 − 𝑅𝑅𝑀𝑀𝑀𝑀𝑀𝑀 limits cell compute modelshigh cell density → severe pitch-matching const.high MTJ process variation restricts bit-line SNR

MTJ resistance distribution

Free Layer

Ref. Layer

Tunnel Barrier

Select Transistor

Top Electrode

BottomElectrode

WL

Source Line

pMTJ stack 40Mb MRAM demo

Challenges:Opportunities:high density → high on-chip compute densitynon-volatility → ultra low-power duty-cycled operationin production → enables system prototyping &

reduces time to DoD availability

Distribution A. Approved for public release; distribution is unlimited

MRAM-BASED DEEP IN-MEMORY ARCHITECTURES

four DIMAs proposed – two selected for prototypingBit-line Sensing (DIMA-BLS) Current Drive Time-based

Sensing (DIMA-CDTS)

𝑥𝑥[𝑖]

𝑀𝑀 × 𝑁𝑁 single-shot matrix vector multiply Functional read: 5-bit × 4-bit dot products on SLs Compute cell: 1T-1MTJ ( 4 × 1 multiplication) SL current sensing via time-based 5-bit ADCs SNR vs. energy trade-off due to sensing circuits

𝑀𝑀 × 𝑁𝑁 single-shot matrix vector multiply Functional read: binary dot products on BLs Compute cell: 2T-2MTJ (1b XNOR/AND) BL voltage sensing via 4-bit SAR ADCs SNR vs. energy trade-off due to sensing circuits

Distribution A. Approved for public release; distribution is unlimited

EDP GAINS OVER VON NEUMANN EQUIVALENT

(includes all peripherals)

43× EDP reduction (256-dimensional dot products)

250×-to-1000× EDP reduction (128-dimensional dot products)

DIMA-BLS

DIMA-CDTS

MVM: matrix-vector multiply

Distribution A. Approved for public release; distribution is unlimited

SHANNON-INSPIRED COMPUTE MODELS

• relaxes MRAM-based DIMA’s compute SNR requirements → enables substantial energy savings

• two methods: stochastic data-driven hardware resilience & coded DIMA

MRAM-based DIMA

use information-based metrics develop error-compensation techniques

Distribution A. Approved for public release; distribution is unlimited

STOCHASTIC DATA-DRIVEN HARDWARE RESILIENCE

• noise-tolerance enhanced well beyond MTJ variations

• can be exploited for energy-SNR tradeoff

MRAM-based DIMA InferenceModel Parameters𝜃𝜃𝑀𝑀𝑀𝑀𝑀𝑀𝑆𝑆𝑆𝑆(𝑥𝑥,𝐺𝐺)G (MRAM variation, ADC noise) gi

DIMA-aware Stochastic Training

[B. Zhang, ICASSP’19]

Simulated MRAM-based BNN (CIFAR-10):

Distribution A. Approved for public release; distribution is unlimited

CODED DIMA - ENHANCING DIMA’S COMPUTE SNRMTJ variation aware coding enables HIGH system SNR in spite of LOW circuit SNR

part

ial d

ecim

atio

n-in

-tim

eA

rchi

tect

ure

for m

appi

ng

FFT

on D

IMA

N-point Input Sequence

N/ M-pointInput

Sequence

N/ M-pointInput

Sequence

Input shuffling

N/ M-pointDFT

N/ M-pointDFT

Merge N/ M-point DFTs

N-point DFTCoefficients

N- p

oint

DFT

DFT, DHT, QFT

Coefficient precision (𝐵𝐵𝑊𝑊)

Out

put S

NR(d

B)

Limited byinput SNR

Limited byMRAM device

variation

Limited byquantization

noise

Input SNR: 40 dB; = 8; = 128

18dB

61 dB

41 dB

MRAM-DI MAWrite data

Functional read

Calculate error

Calibrate

write-time compensation (WTC)of MRAM variations

WTC

MTJ

𝐵𝐵𝑥𝑥 𝑁𝑁

Distribution A. Approved for public release; distribution is unlimited

CURRENT STATUS

Technology transition via SRAM-based DIMA DIMA prototype designs taped out

(2019) [single-bank] validate EDPgains & calibrate models

(2020) [multi-bank] enable scaling and system integration

DIMA technology transition to RMS foralgorithm prototyping & evaluation

facilitates future MRAM-based DIMAintegration by RMS

verified & quantified RMS’s mission capability enabled by MRAM-based DIMA MRAM-based DIMA demonstrates > 2 orders-of-magnitude EDP reduction

[H. Jia, arXiv:1811.04047, Princeton]

Distribution A. Approved for public release; distribution is unlimited

NEXT STEPS & SUMMARY

Mission: To realize > 200X in EDP gains in DoD workloads by integrating MRAM devicewithin Deep In-memory Architectures using Shannon-inspired compute models

FRANC Program: “to provide the foundation for new materials technology and newintegration approaches to be exploited in pursuit of novel compute architectures”

Multi-functioned DIMAs for Diverse Applications Parameterized DIMAs for Platform-design Tools

Summary

[Srivastava, et al., ISCA’18]

top related