device technologies for the n3xt 1,000x improvementdevice technologies for the n3xt 1,000x...

122
Stanford University Department of Electrical Engineering 2007.11.08 Stanford SystemX Alliance Device Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University

Upload: others

Post on 10-Mar-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong1

Stanford University

Department of Electrical Engineering2007.11.08Stanford SystemX Alliance

Device Technologies for the N3XT 1,000X Improvement in Computing Performance

H.-S. Philip WongStanford University

Page 2: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong2

21st century workload:Large amounts of loosely-structured data

Streaming video/audio Natural languages Real-time sensor Contextual environment

Page 3: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong3

Abundant-Data Applications

3

Genomics Smart Cities

Health Care

RetailScience

SecurityFinance Government

Page 4: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong4

Abundant-Data Applications

4

Genomics Smart Cities

Health Care

RetailScience

SecurityFinance Government

Computational demands

exceed

Processing capability

Page 5: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong5

Abundant-Data Applications

Huge memory wall

Application execution time

96%

Compute

Memory access

Source: S. Mitra (Stanford)

Page 6: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong6

Hardware Software

TodayCommodityhardware

Advanced image

analysis

Hardware-Software Co-Evolution

Source: Subhasish Mitra, Chris Ré

Page 7: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong7

Hardware Software

Today

Tomorrow

Commodityhardware

CGRAs

Advanced image

analysis

Automaticvideo

annotation

Hardware-Software Co-Evolution

Source: Subhasish Mitra, Chris Ré

Page 8: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong8

Hardware Software

Today

Tomorrow

The DayAfter Tomorrow

Commodityhardware

CGRAs

N3XTBrain

network

Advanced image

analysis

Automaticvideo

annotation

Hardware-Software Co-Evolution

Source: Subhasish Mitra, Chris Ré

Page 9: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong9

Hardware Software

Today

Tomorrow

The DayAfter Tomorrow

Commodityhardware

CGRAs

N3XTBrain

network

Advanced image

analysis

Automaticvideo

annotation

Source: Subhasish Mitra, Chris Ré

Page 10: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong10

Source: Intel

Logic

Memory Wire

Page 11: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong11

Computing Chips TodayLimited to Two-Dimensional CircuitsComputation immersed in memory

N3XT Nanosystems

Page 12: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong12

N3XT NanosystemsComputation immersed in memory

Page 13: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong13

MemoryUltra-dense

vertical connections

N3XT NanosystemsComputation immersed in memory

Computing logic

Page 14: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong14

MemoryUltra-dense

vertical connections

N3XT NanosystemsComputation immersed in memory

Impossible with today’s technologies

Computing logic

Page 15: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong15

Many of these breakthroughs will require new kinds of nanoscale devices and materials integrated into three-dimensional systemsand may take a decade or more to achieve.

Page 16: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong16

MemoryUltra-dense

vertical connections

N3XT NanosystemsComputation immersed in memory

Computing logic

Page 17: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong17

“New” Memories

filament

oxygen ion

Top Electrode

Bottom Electrode

metaloxide

oxygen vacancy

Bottom Electrode

Top Electrode

oxide isolation

switching region

phase change material

filament

Bottom Electrode

solid electrolyte

Active Top Electrode

metal atoms

STT-MRAM CBRAMRRAMPCM

Spin torque transfer magnetic random access memory

Phase change memory

Resistive switching

random access memory

Conductive bridge random access memory

Soft Magnet

Pinned Magnet

tunnel barrier (oxide)

current

Random access, non-volatile, no erase before write, on-chip integration

internal gate

HfZrO2

p-Si

n+ n+

HfO2

top gate

FERAM

Ferro-electric random access

memory

Page 18: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong18

Memory Hierarchy

CPU cycles 1 10 - 100 >100 106 - 108 >109

Speed 0.1’s ns 1 - 10’s ns 10’s -100’s ns 0.1-10’s ms > 100’s ms

100 M G T PSize (bytes) K

Main Memory(DRAM)

Tertiary Storage

(Disk, RAID)

Registers [R]L1 Instruction Cache (SRAM) [L1I]

L1 Data Cache (SRAM) [L1D]L2 Cache (SRAM) [L2]

L3 Cache (eDRAM) [L3]R L1I

L1DSecondary

Storage(Flash, Disk)

Core 1

Interco

nn

ect

L2 Cache

Peripheral controller

Mem

ory

Co

ntro

ller(s)

……

L2 Cache

L3 Cache (eDRAM)

Core N

L2

L3

Must change drastically in the coming decade

H.-S. P. Wong, S. Salahuddin, Nature Nanotech (2015)

Page 19: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong19

STT-MRAM

CPU cycles 1 10 - 100 >100 106 - 108 >109

Speed 0.1’s ns 1 - 10’s ns 10’s -100’s ns 0.1-10’s ms > 100’s ms

100 M G T PSize (bytes) K

Main Memory(DRAM)

Tertiary Storage

(Disk, RAID)R L1I

L1D

Secondary Storage

(Flash, Disk)

L2L3

Soft Magnet

Pinned Magnet

tunnel barrier (oxide)

current

You already know: VPROG ≤ 0.5V, VDD = 1.2 – 1.5 V A few – 10’s of ns read/write Write current ~ 10’s μA (F<45 nm) “infinite” endurance 1T1MTJ, > 6F2

Scalable to 20 nm (expt.)

Page 20: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong20

Embedded STT-MRAM Cache

H. Noguchi et al., “A 3.3ns-Access-Time 71.2μW/MHz 1Mb Embedded STT-MRAM Using Physically Eliminated Read-Disturb Scheme and Normally-Off Memory Architecture,” ISSCC, pp. 136 – 137 (2015) [Toshiba]

Page 21: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong21

Why has STT-MRAM not arrived?

Maybe I will be proven wrong soon …

Source: Intel

Page 22: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong22

Homework for STT-MRAM Folks

Device technology

Current (10’s μA) too high

Spin Hall effect promises to reduce current by 10X

Scalability demonstrated to 20 nm so far

Manufacturing

Not ready at the same time as logic (cf. SRAM)

MTJ – stacks of 16 layers , 4Å thick

Deposited and ion milled across 300 mm wafer

Does not survive 400°C BEOL fab temperature

(# = nm)

Page 23: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong23

STT-MRAM Design Opportunity

TkE BB /Thermal stability,

expoRetention time,

Bco EI Write current,

10-year retention: 40

Large array 60

But we don’t need 10 year retention for cache and main memory!

Write current can be significantly reduced

Page 24: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong24

RRAM – Fab Friendly, “Easy” to Make

CPU cycles 1 10 - 100 >100 106 - 108 >109

Speed 0.1’s ns 1 - 10’s ns 10’s -100’s ns 0.1-10’s ms > 100’s ms

100 M G T PSize (bytes) K

Main Memory(DRAM)

Tertiary Storage

(Disk, RAID)R L1I

L1D

Secondary Storage

(Flash, Disk)

L2L3

You already know: 10 ns read/write, VPROG ~1 – 2 V Write current ~ nA – 10’s μA 1E12 cycles endurance @ device 1T1R, ~ 6F2

3D RRAM (similar to 3D NAND) Scalable to < 5 nm & smaller Fab-friendly, easy to embed

filament

oxygen ion

Top Electrode

Bottom Electrode

metaloxide

oxygen vacancy

TiN, Al

HfOx, TaOx, AlOx, AlOxNy

TiN

Page 25: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong25

RRAM Integrated on CMOSRRAM

J. Provine et al., GOMAC, 2014

CMOS

Page 26: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong26

RRAM FPGA Integration

FPGA Chip

Cell

A

A’

Cross-section View

M1

M2

M3

M4

M5

M6

A A’

PR

Ti

N-doped AlOx (RCF) / Al (BE)

Al (TE)

Al (M6)

SiO2

Programmable Resistor

Al

N-doped AlOx

Al

FPGA Tiles & Configuration Memory

Test

Decoder/Driver

Dec

od

er/D

rive

r

FPGA IO

FPGA IO

FPG

A IO

FPG

A IO

Y. Liauw et al., Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012.

Page 27: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong27

RRAM Is Scalable

Bi-layer TiOx (2.5nm) / HfOx (1.5nm)

Y. Wu et al., IEDM 2013.

Scalable: 10 nmScalable: 12 nm

Page 28: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong28

RRAM Promises1012 cycles Multi-bit

VLSI’11 (Samsung)VLSI’12 (Samsung)

IEDM’12 (Stanford)

IPROG < 100 nA

VLSI’11 (Stanford)

40 nm

Pt

SiO2

SiO2

SiO2

Pt

TiN

HfOx

VRRAM

Bit-Cost Scalable

Page 29: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong29

3D RRAM

40 nm

Pt

SiO2

SiO2

SiO2

Pt

TiN

HfOx

VRRAM

S. Yu, H.-Y. Chen et al., Symp. VLSI Tech. 2013

Page 30: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong30

High Density 3D Memory

< 1 μA

1 – 2 V

5 ns

> 1G cycles

F = 5 nm

128 layers

64 Tb per chip

IEDM ’12, ’13, ’14VLSI ’13, ’14, ’16DATE ’15, Nature Comm ’15

Page 31: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong31

https://nano.stanford.edu/stanford-memory-trends

Energy vs Speed @ Device Level

Nothing faster than STT-MRAM

Speed limited by physics

10-1

100

101

102

103

104

105

106

10-14

10-13

10-12

10-11

10-10

10-9

10-8

10-7

10-6

10-5

RRAM

CBRAM

PCM

STT-MRAM

Write

En

erg

y (

J)

Write Time (ns)

Page 32: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong32

Write Energy Scaling @ Device Level

https://nano.stanford.edu/stanford-memory-trends

NANDSTT-MRAM Proportional to area

PCM: Proportional to area

RRAM & CBRAM: Material and resistance values

100

101

102

103

104

105

106

107

10-3

10-2

10-1

100

101

102

103

104

105

106

107

1 4 11 36 113 357 1128 35681 4 11 36 113 357 1128 3568

RRAM

CBRAM

PCM

STT-MRAM

Pro

gra

mm

ing

En

erg

y (

pJ)

Cell Area (nm2)

Equivalent Contact Diameter (nm)

Page 33: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong33

Logic Switches for 10, 7, 5, 3, 1 … nm

ETSOI

NWFET = “nanowire FET”, ETSOI = “extremely-thin silicon-on-insulator”, CNFET = “carbon nanotube FET”

NWFETFinFET CNFET

d~1nmCNT

Thin channel body (electrostatics control) Minimize parasitic capacitance and resistance

Source: G. Hills, S. Mitra (Stanford)

Page 34: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong34

Carbon Nanotube FET (CNFET)

10X EDP Benefit vs. silicon

Sub-10 nm node VLSI circuits

EDP: “energy-delay product”

2 µm

Gate

2 µm

Gate

LG<10nm

d~1.2nm

CNT

Source: G. Hills, S. Mitra (Stanford)

Page 35: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong35

CNFET: ~10X EDP vs. Si-CMOS

Nodes: 22 … 14 ... 11… 8 ... 5 (FinFET)

20 W/cm2

100 W/cm2

Nodes: 8 .. 5 (CNFET)

Performance (1015 transitions/sec.)

2

1.6

1.2

0.8

0.4

0

Energ

y (

fJ/t

ransitio

n)

0 4 8 12 16

x3

x1/3

IBM Power 7 modeled

Source: D. Frank and W.

Haensch, IBM (IEDM Short

Course 2012)

Page 36: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong36

CNT Computer

Instruction FetchData Fetch ALU Write-back

Instruction Fetch Data Fetch Write BackArithmetic Block

M. Shulaker et al., Nature 13

Page 37: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong37

CNT Computer

Instruction FetchData Fetch ALU Write-back

M. Shulaker et al., Nature 13

Page 38: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong38

CNFET Enables Monolithic 3D

CNT transfer decouples high temperature growth

High-temperature CNT growth

900 °C

2 µm

CNT transfer

Low-temperature circuit fabrication

120 °C

CNTs

Growth

2 µm

Post-transfer

N. Patil, Symp. VLSI Tech. 08

Page 39: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong39

Temperature < 250 oC

VO

UT

(V)

VIN (V)

Inter-layer logic 3-Layer integration Memory on logic

RRAM

Layer 1Layer 2

CNFETs

Conventional

vias, no TSVs

CNFET: Monolithic 3D Fabrication Flow

CNT transfer

Imperfection-immune circuit

Passivation

Inter-layer vias &

interconnects

H. Wei, IEDM 09, IEDM 13

Page 40: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong40

2D → 3D → 2D Structure, 2D → 3D Integration

FinFET NWFET ETSOI CNFET

5nm

[Tsutsui IEDM 05]

100nm

10nm8nm

[Intel VLSI 15] [IBM VLSI 15]

Source: G. Hills, S. Mitra (Stanford)

[Shulaker IEDM 14]

Page 41: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong41

Computation Immersed in Memory

thermal

thermal

MRAMQuick access

3D Resistive RAMMassive storage

Not TSV

thermal

1D CNFET, 2D FETCompute, RAM access

1D CNFET, 2D FETCompute, RAM access

1D CNFET, 2D FETCompute, Power,

Clock

Silicon compatible

Ultra-dense, fine-grained

vias

Page 42: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong42

Aly et al., IEEE Computer, 2015

Page 43: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong43

N3XT Architecture

Full physical design

–Standard tool flow

Cores: CNFET

Cache: STTRAMCNFET access

Memory control: CNFET

Memory: RRAM CNFET access

2 Million ILVs

Source: M. Sabry Aly, S. Mitra (Stanford)

Page 44: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong44

N3XT: 850× EDP Benefit

M. Aly…S. Mitra, IEEE Computer (2015)

Same power density (67 W/cm2) and peak temperature 63°C

Exec. Time: 23XEnergy: 37X

IBM graph

analytics

Data-intensive

computing

PageRank app.

0%

20%

40%

60%

80%

100%

2D N3XT

0%

20%

40%

60%

80%

100%

2D N3XT

2.7% 4.3%

Energy: 37X Exec. Time: 23X

Page 45: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong45

N3XT: 850× EDP Benefit

M. Aly…S. Mitra, IEEE Computer (2015)

0%

20%

40%

60%

80%

100%

2D N3XT0%

20%

40%

60%

80%

100%

2D N3XT

Processor active Processor idle Memory access

0%

1%

2%

3%

0%

3%

5%

Exec. Time: 23XEnergy: 37X

IBM graph

analytics

Data-intensive

computing

PageRank app.

Page 46: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong46

Nanosystems

46

Nano-Engineered Computing Systems Technology

Nanoscalecooling

Abundantdataapps

1D / 2D FETs,

RRAM, mRAM

Architecture&

software

Yield, reliability

New3D

fabrication

End-to-end approach

[M. Aly et al., IEEE Computer ‘15]

0 1 0 1

Page 47: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong47

CNT computing logic

Can Build Nanosystems Today

47

Classification accelerator

>2 Million carbon nanotube FETs, 1 Mbit Resistive RAM

[M. Shulaker et al., submitted ‘16]

Page 48: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong48

CNT computing logic

Can Build Nanosystems Today

48

Memory1 Megabit RRAM

[M. Shulaker et al., submitted ‘16]

Page 49: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong49

CNT computing logic

Can Build Nanosystems Today

49

Memory

Millions of sensorsCNTs

X100,000

[M. Shulaker et al., submitted ‘16]

Page 50: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong50

CNT computing logic

Can Build Nanosystems Today

50

Memory

Millions of sensors

Ultra-densevertical connections

[M. Shulaker et al., submitted ‘16]

Page 51: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong51

CNT computing logic

Can Build Nanosystems Today

51

Memory

Millions of sensors

Ultra-densevertical connections

Abundant data: Terabytes / second

[M. Shulaker et al., submitted ‘16]

Page 52: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong52

CNT computing logic

Can Build Nanosystems Today

Memory

Millions of sensors

Ultra-densevertical connections

Abundant data: Terabytes / second

In-situ classification:Extensive, accurate

[M. Shulaker et al., submitted ‘16]

Page 53: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong53 Source: Subhasish Mitra, Chris Ré

Page 54: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong54

Hardware Software

Today

Tomorrow

The DayAfter Tomorrow

Commodityhardware

CGRAs

N3XTBrain

network

Advanced image

analysis

Automaticvideo

annotation

Source: Subhasish Mitra, Chris Ré

Page 55: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong55

0 1 0 1

Page 56: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong56

MemoryUltra-dense

vertical connections

N3XT NanosystemsComputation immersed in memory

Computing logic

0 1 0 1

Page 57: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong57

These nanotechnology innovations will have to be developed in close coordination with new computer architectures, and will likely be informed by our growing understanding of the brain—a remarkable, fault-tolerant system that consumes less power than an incandescent light bulb.

Page 58: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong58

Goal: Energy Efficiency of the Brain

Learning algorithms

Image sources: stackoverflow

Page 59: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong59

Goal: Energy Efficiency of the Brain

Learning algorithms

Biological modelsImage sources: stackoverflow, Stanford

Pre-synaptic neuron

Post-synaptic neuron

Pre-spike

Synapse

Post-spike

Axon

hillock

Δt < 0 Δt > 0

pre

post

pre

post

-100 -50 0 50 100-60

-40

-20

0

20

40

60

80

100

120

Syn

ap

tic w

eig

ht

ch

an

ge

w (

%)

Spike timing t (ms)

(a)

(b) (c)

Page 60: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong60

Goal: Energy Efficiency of the Brain

Learning algorithms Electronic implementation

Biological modelsImage sources: stackoverflow, Stanford, IBM

Pre-synaptic neuron

Post-synaptic neuron

Pre-spike

Synapse

Post-spike

Axon

hillock

Δt < 0 Δt > 0

pre

post

pre

post

-100 -50 0 50 100-60

-40

-20

0

20

40

60

80

100

120

Syn

ap

tic w

eig

ht

ch

an

ge

w (

%)

Spike timing t (ms)

(a)

(b) (c)

Page 61: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong61

Goal: Energy Efficiency of the Brain

Learning algorithms Electronic implementation

Biological models

poly c-GSTpoly c-GSTpoly c-GST

amorphousamorphous

SiO2 SiO2 SiO2TiN TiN TiN

TiNTiNTiN

fully set state partially reset state fully reset state

Nanoscale electronic synapseImage sources: stackoverflow, Stanford, IBM

Pre-synaptic neuron

Post-synaptic neuron

Pre-spike

Synapse

Post-spike

Axon

hillock

Δt < 0 Δt > 0

pre

post

pre

post

-100 -50 0 50 100-60

-40

-20

0

20

40

60

80

100

120

Syn

ap

tic w

eig

ht

ch

an

ge

w (

%)

Spike timing t (ms)

(a)

(b) (c)

Page 62: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong62

Application Hardware used

Estimated

power

consumption

Emulating 4.5% of human brain:1013

synapses, 109 neurons

Blue Gene/P:

36,864 nodes,

147,456 cores

2.9 MW

(LINPACK)

Deep sparse autoencoder:

109 synapses, 10M images

1,000 CPUs

(16,000 cores)

~100 kW

(cores only)

Convolutional neural net with 60M

synapses, 650K neurons

2 GPUs 1,200 W

Restricted Boltzmann Machine:

28M synapses; 69,888 neurons

GPU 550 W

CPU 65 W

Processing 1 s of speech using deep

neural network

GPU 238 W

CPU (4 cores) 80 W

Larg

e s

cale

Sm

all

to

mo

dera

te s

cale

Scale Up Requires Energy Efficiency

S. B. Eryilmaz et al., IEDM 2015

Page 63: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong63

Custom Hardware for Energy Efficient AI

E. Lee et al, ISSCC 2016Y.-H. Chen et al, ISSCC 2016 S. Han et al., arXiv 2016

Inference in CNN hardware280 mW for AlexNet[MIT]

Inference in deep compressed networks600 mW for AlexNet[Nvidia/Stanford]

Matrix-vector multiplication with switch caps8 TOPS/W @ 2.5 GHz[Stanford]

Page 64: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong64

Approaches of Neuromorphic Hardware

Page 65: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong65

Approaches of Neuromorphic HardwareB

iolo

gy-b

ase

d

mo

de

ls /

al

gori

thm

s

Co

nve

nti

on

al

ML

algo

rith

ms

Page 66: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong66

Approaches of Neuromorphic Hardware

Conventional hardware (CPU, GPU, supercomputers, etc)

Neuromorphic hardware

Page 67: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong67

Approaches of Neuromorphic Hardware

Conventional hardware (CPU, GPU, supercomputers, etc)

with analog non-volatile memory synapses

Neuromorphic hardware

Page 68: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong68

Approaches of Neuromorphic HardwareB

iolo

gy-b

ase

d

mo

de

ls /

al

gori

thm

sConventional hardware (CPU, GPU, supercomputers, etc)

with analog non-volatile memory synapses

Neuromorphic hardware

Brain emulation on BlueGene [7]

HTM [3]

Page 69: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong69

Approaches of Neuromorphic HardwareB

iolo

gy-b

ase

d

mo

de

ls /

al

gori

thm

s

Co

nve

nti

on

al

ML

algo

rith

ms

Conventional hardware (CPU, GPU, supercomputers, etc)

with analog non-volatile memory synapses

Neuromorphic hardware

Brain emulation on BlueGene [7]

HTM [3]

“Cats on YouTube” ANNs: ConvNets, DNNs, DBNs [10-13]

Page 70: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong70

Approaches of Neuromorphic HardwareB

iolo

gy-b

ase

d

mo

de

ls /

al

gori

thm

s

Co

nve

nti

on

al

ML

algo

rith

ms

Conventional hardware (CPU, GPU, supercomputers, etc)

with analog non-volatile memory synapses

Neuromorphic hardware

Brain emulation on BlueGene [7]

HTM [3]

“Cats on YouTube” ANNs: ConvNets, DNNs, DBNs [10-13]

Human Brain Project [20]

TrueNorth [16]

SpiNNaker [19]

Page 71: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong71

Approaches of Neuromorphic HardwareB

iolo

gy-b

ase

d

mo

de

ls /

al

gori

thm

s

Co

nve

nti

on

al

ML

algo

rith

ms

Conventional hardware (CPU, GPU, supercomputers, etc)

with analog non-volatile memory synapses

Neuromorphic hardware

Brain emulation on BlueGene [7]

HTM [3]

“Cats on YouTube” ANNs: ConvNets, DNNs, DBNs [10-13]

Human Brain Project [20]

TrueNorth [16]

SpiNNaker [19]

Hebbian learningSpike-based ANN

PCM, RRAM, CBRAM

Page 72: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong72

Approaches of Neuromorphic HardwareB

iolo

gy-b

ase

d

mo

de

ls /

al

gori

thm

s

Co

nve

nti

on

al

ML

algo

rith

ms

Conventional hardware (CPU, GPU, supercomputers, etc)

with analog non-volatile memory synapses

Neuromorphic hardware

Brain emulation on BlueGene [7]

HTM [3]

“Cats on YouTube” ANNs: ConvNets, DNNs, DBNs [10-13]

Human Brain Project [20]

TrueNorth [16]

SpiNNaker [19]

Hebbian learningSpike-based ANN

PCM, RRAM, CBRAM

ANN, RBM, sparse learning

PCM, RRAM

Page 73: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong74

Today’s “Large” Scale Architectures

IFAT+HiAER (UCSD) Neurogrid (Stanford)

TrueNorth (IBM)

Digital neurons Mesh-based routing

Analog neurons Tree routing

Analog neurons Tree routing

SpiNNaker (U of Manchester)

Digital neurons Mesh routing

Page 74: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong75

Synapses Implemented Today

IFAT+HiAER (UCSD) Neurogrid (Stanford)

TrueNorth (IBM) SpiNNaker (U of Manchester)

DRAM (off-chip)

SRAM (on-chip)

DRAM (off-chip)

SRAM (on-chip)+DRAM (off-chip)

Energetically expensive Refresh Off-chip access

Scalability?

Area inefficient

Page 75: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong76

Non-Volatile Memory (NVM)

D. Kuzum et al., Nano Lett. 2013, Y. Wu et al., IEDM 2013; A. Calderoni et al., IMW 2014

Metal oxide resistive switching memory (RRAM)

25nm

12 nm

TiOx/

HfOx

TiN

10 nm

SiO2

filament

oxygen ion

Top Electrode

Bottom Electrode

metaloxide

oxygen vacancy

poly c-GSTpoly c-GSTpoly c-GST

amorphousamorphous

SiO2 SiO2 SiO2TiN TiN TiN

TiNTiNTiN

fully set state partially reset state fully reset state

Phase change memory (PCM)

Bottom Electrode

Top Electrode

oxide isolation

switching region

phase change material

Conductive bridge memory (CBRAM)

filament

Bottom Electrode

solid electrolyte

Active Top Electrode

metal atoms

Cu ionElectrolyte

Bottom electrode

Page 76: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong77

Non-Volatile Memory (NVM) → Synapse

Analog programmable

Scalable to a few nm

Stack in 3D

D. Kuzum et al., Nano Lett. 2013, Y. Wu et al., IEDM 2013; A. Calderoni et al., IMW 2014

Metal oxide resistive switching memory (RRAM)

25nm

12 nm

TiOx/

HfOx

TiN

10 nm

SiO2

poly c-GSTpoly c-GSTpoly c-GST

amorphousamorphous

SiO2 SiO2 SiO2TiN TiN TiN

TiNTiNTiN

fully set state partially reset state fully reset state

Phase change memory (PCM)

Conductive bridge memory (CBRAM)

Cu ionElectrolyte

Bottom electrode

Page 77: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong78

0 500 1000 1500 2000 250010

3

104

105

Re

sis

tan

ce

(O

hm

)

Pulse Number2100 2200 2300

103

104

105

Resis

tan

ce (

Oh

m)

Pulse Number

100-step grey scale (1% resolution)

Partial RESET

Partial SET

Phase change synapse

Nanoscale Memory as Synaptic Weights

Synaptic updates in the brain: basis for learningRequirement: analog resistance change

D. Kuzum et al., Nano Lett., p. 2179 (2012)

Page 78: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong79

Gradual Resistance Change

Larger pulse amplitude Fewer states

S. Yu et al. Adv. Mater. vol. 25, pp. 1774-1779, 2013

filament

oxygen ion

Top Electrode

Bottom Electrode

metaloxide

oxygen vacancy

RRAM

Page 79: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong80

Nanoscale Memory Can Emulate Biological Synaptic Behavior

-60 -40 -20 0 20 40 60

-40

0

40

80

120

=-11.3

=-18.6

=-29

=11

=20.7

LTP-1

LTP-2

LTP-3

LTD-1

LTD-2

LTD-3

Sy

na

pti

c w

eig

ht

ch

an

ge

w

(%

)

Spike timing t (ms)

=29.5

Various time constants

-50 0 50

-50

0

50

100

w

(%

)

t (ms)-50 0 50

-50

0

50

100

w

(%

)

t (ms)

-50 0 50

-50

0

50

100

w

(%

)

t (ms)-50 0 50

-50

0

50

100

w

(%

)

t (ms)

Various STDP kernels

0 20 40 60 80 100

0

20

40

60

80

100

120

140 10 ms 25 ms

15 ms 35 ms

20 ms 45 ms

Syn

ap

tic

Weig

ht

Ch

an

ge

w

(%

)

Number of pre/post spike pairs

Weight update saturation

STDP (spike-timing-dependent plasticity)

D. Kuzum et al., Nano Lett., p. 2179 (2012)

Page 80: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong81

Nanoscale Memory Can Emulate Biological Synaptic Behavior

-60 -40 -20 0 20 40 60

-40

0

40

80

120

=-11.3

=-18.6

=-29

=11

=20.7

LTP-1

LTP-2

LTP-3

LTD-1

LTD-2

LTD-3

Sy

na

pti

c w

eig

ht

ch

an

ge

w

(%

)

Spike timing t (ms)

=29.5

Various time constants

-50 0 50

-50

0

50

100

w

(%

)

t (ms)-50 0 50

-50

0

50

100

w

(%

)

t (ms)

-50 0 50

-50

0

50

100

w

(%

)

t (ms)-50 0 50

-50

0

50

100

w

(%

)

t (ms)

Various STDP kernels

0 20 40 60 80 100

0

20

40

60

80

100

120

140 10 ms 25 ms

15 ms 35 ms

20 ms 45 ms

Syn

ap

tic

Weig

ht

Ch

an

ge

w

(%

)

Number of pre/post spike pairs

Weight update saturation

STDP (spike-timing-dependent plasticity)

D. Kuzum et al., Nano Lett., p. 2179 (2012)

Page 81: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong82

Nanoscale Memory Can Emulate Biological Synaptic Behavior

-60 -40 -20 0 20 40 60

-40

0

40

80

120

=-11.3

=-18.6

=-29

=11

=20.7

LTP-1

LTP-2

LTP-3

LTD-1

LTD-2

LTD-3

Sy

na

pti

c w

eig

ht

ch

an

ge

w

(%

)

Spike timing t (ms)

=29.5

Various time constants

-50 0 50

-50

0

50

100

w

(%

)

t (ms)-50 0 50

-50

0

50

100

w

(%

)

t (ms)

-50 0 50

-50

0

50

100

w

(%

)

t (ms)-50 0 50

-50

0

50

100

w

(%

)

t (ms)

Various STDP kernels

0 20 40 60 80 100

0

20

40

60

80

100

120

140 10 ms 25 ms

15 ms 35 ms

20 ms 45 ms

Syn

ap

tic

Weig

ht

Ch

an

ge

w

(%

)

Number of pre/post spike pairs

Weight update saturation

STDP (spike-timing-dependent plasticity)

D. Kuzum et al., Nano Lett., p. 2179 (2012)

Page 82: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong83

Nanoscale Memory Can Emulate Biological Synaptic Behavior

-60 -40 -20 0 20 40 60

-40

0

40

80

120

=-11.3

=-18.6

=-29

=11

=20.7

LTP-1

LTP-2

LTP-3

LTD-1

LTD-2

LTD-3

Sy

na

pti

c w

eig

ht

ch

an

ge

w

(%

)

Spike timing t (ms)

=29.5

Various time constants

-50 0 50

-50

0

50

100

w

(%

)

t (ms)-50 0 50

-50

0

50

100

w

(%

)

t (ms)

-50 0 50

-50

0

50

100

w

(%

)

t (ms)-50 0 50

-50

0

50

100

w

(%

)

t (ms)

Various STDP kernels

0 20 40 60 80 100

0

20

40

60

80

100

120

140 10 ms 25 ms

15 ms 35 ms

20 ms 45 ms

Syn

ap

tic

Weig

ht

Ch

an

ge

w

(%

)

Number of pre/post spike pairs

Weight update saturation

STDP (spike-timing-dependent plasticity)

D. Kuzum et al., Nano Lett., p. 2179 (2012)

Page 83: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong84

Stochastic Weight Update

Program memory close to switching threshold

Emulate grey scale

Escape local minima in gradient descent

SET success probability: 14% @ VSET = 1.3 V 40% @ VSET = 1.6 V 76% @ VSET = 1.9 V

0 20 40 60 80 100

100

1k

10k

100k

1M

SET success probability ~14%

SET +1.3V/10ns

RESET -1.7V/10ns

Res

ista

nc

e (

)

Cycle (#)

0 20 40 60 80 100

100

1k

10k

100k

1M

SET success probability ~40%

SET +1.6V/10ns

RESET -1.7V/10ns

Resis

tan

ce (

)

Cycle (#)

0 20 40 60 80 100

100

1k

10k

100k

1M

Resis

tan

ce (

)Cycle (#)

SET +1.9V/10ns

RESET -1.7V/10ns

SET success probability ~76%SETRESET

S. Yu et al., Frontiers of Neuroscience, 2013

Page 84: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong85

Stochastic Weight Update

Program memory close to switching threshold

Emulate grey scale

Escape local minima in gradient descent

SET success probability: 14% @ VSET = 1.3 V 40% @ VSET = 1.6 V 76% @ VSET = 1.9 V

0 20 40 60 80 100

100

1k

10k

100k

1M

SET success probability ~14%

SET +1.3V/10ns

RESET -1.7V/10ns

Res

ista

nc

e (

)

Cycle (#)

0 20 40 60 80 100

100

1k

10k

100k

1M

SET success probability ~40%

SET +1.6V/10ns

RESET -1.7V/10ns

Resis

tan

ce (

)

Cycle (#)

0 20 40 60 80 100

100

1k

10k

100k

1M

Resis

tan

ce (

)Cycle (#)

SET +1.9V/10ns

RESET -1.7V/10ns

SET success probability ~76%SETRESET

S. Yu et al., Frontiers of Neuroscience, 2013

Page 85: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong86

Stochastic Weight Update

Program memory close to switching threshold

Emulate grey scale

Escape local minima in gradient descent

SET success probability: 14% @ VSET = 1.3 V 40% @ VSET = 1.6 V 76% @ VSET = 1.9 V

0 20 40 60 80 100

100

1k

10k

100k

1M

SET success probability ~14%

SET +1.3V/10ns

RESET -1.7V/10ns

Res

ista

nc

e (

)

Cycle (#)

0 20 40 60 80 100

100

1k

10k

100k

1M

SET success probability ~40%

SET +1.6V/10ns

RESET -1.7V/10ns

Resis

tan

ce (

)

Cycle (#)

0 20 40 60 80 100

100

1k

10k

100k

1M

Resis

tan

ce (

)Cycle (#)

SET +1.9V/10ns

RESET -1.7V/10ns

SET success probability ~76%SETRESET

S. Yu et al., Frontiers of Neuroscience, 2013

Page 86: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong87

RRAM Stochastic Weight Update

1.4 1.5 1.6 1.7 1.810

100

Pulse Height (V)

Pu

lse W

idth

(n

s)

4.0

16

29

41

54

66

78

91

100

SET Probability (%) 100

0

Page 87: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong88

3D VRRAM

3D NN

Vertical

logic nv

-PE

RM

nv-M

ajo

rity

nv

-NO

T

nv

-AD

D

nv

-MU

LT

IPLY

nv

-OR

nv

-AN

D

nv

-Ha

mm

ing

nv

-NA

ND

nv

-NO

R

nv

-XO

R

nv

-XN

OR

In-memory logic kernels

Input

neurons

Output

neurons

4th-layer VRRAM

3rd-layer VRRAM

2nd-layer VRRAM

1st-layer VRRAM

RRAM synapses

Vertical RRAM In-Memory Computing

Truly exploit the 3rd dimension

VRRAM: vertical RRAM; NN: neural network

H. Li et al., Symp. VLSI Tech. 2016

Page 88: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong89

In-Memory Logic Kernels for Hyper-Dimensional* Computing

Input

data

XOR

One-shot learning

Learned

information

Inference

Cognitive tasks

In-memory

logic kernels

Energy efficient

Error resilient

[*P. Kanerva, Cog. Comput.’09]

H. Li et al., Symp. VLSI Tech. 2016

Page 89: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong90

In-Memory Logic Kernel: Bit Permutation

Bit permutation is simple and efficient

VDD

VDD

VDD

gnd

gnd

gnd

L1

L2

L3

L4

0

0

0

1

0

0

1

0RESET

1

0

0

0

RESET

0

1

0

0

RESET

LRS

(~10kΩ)

HRS

(100-250kΩ)

Measured resistance states

1 0

Bit permutation

along vertical pillars

L1

L2

L3

L4

H. Li et al., Symp. VLSI Tech. 2016

Page 90: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong91

192 synapses with RRAMs: Hebbian learning

S. Park et al., Scientific Reports, 2015

Array Level Experimental Demonstrations165,000 PCM synapses: Gradient based backpropagation

G. Burr et al., IEDM 2014

100 PCM synapses: Biological Hebbian learning rule

PCM

S. B. Eryilmaz et al., IEDM 2013

Page 91: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong92

192 synapses with RRAMs: Hebbian learning

S. Park et al., Scientific Reports, 2015

Array Level Experimental Demonstrations165,000 PCM synapses: Gradient based backpropagation

G. Burr et al., IEDM 2014

100 PCM synapses: Biological Hebbian learning rule

PCM

S. B. Eryilmaz et al., IEDM 2013

Page 92: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong93

Array Level Experimental Demonstrations165,000 PCM synapses: Gradient based backpropagation

G. Burr et al., IEDM 2014

100 PCM synapses: Biological Hebbian learning rule

PCM

S. B. Eryilmaz et al., IEDM 2013

192 synapses with RRAMs: Hebbian learning

S. Park et al., Scientific Reports, 2015

Page 93: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong94

Restricted Boltzmann Machine with Resistive Phase Change Memory Synapses

Δw ∝ <vh>data - <vkhk>reconstruction

Hidden nodes = 0 (bj, j=0:4)

Visible nodes= 0 (ai, i=0:8)

Pairwise weights (wij) : train with 2-PCM synapses

Input pixels

S. B. Eryilmaz et al., submitted 2016

Page 94: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong95

Contrastive Divergence with Resistive Phase Change Memory Synapses

Number of pulses

Resis

tance (

)

Number of pulses

Resis

tance (

Ω)

S. B. Eryilmaz et al., submitted 2016

Page 95: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong96

Mapping Contrastive Divergence onto Phase Change Memory Array

PCM

WL

TE

S. B. Eryilmaz et al., submitted 2016

Page 96: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong97

Inference After Training

S. B. Eryilmaz et al., submitted 2016

Page 97: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong98

5 10 150

0.2

0.4

0.6

Error Rate vs Number of Patterns Stored

Number of patterns stored

Err

or

rate

Before training

S. B. Eryilmaz et al., submitted 2016

Page 98: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong99

5 10 150

0.2

0.4

0.6

Error Reduces After Training

Number of patterns stored

Err

or

rate

Before training

5 iteration (PCM syn.)

S. B. Eryilmaz et al., submitted 2016

Page 99: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong100

5 10 150

0.2

0.4

0.6

Error Reduces with Further Training

Number of patterns stored

Err

or

rate

Before training

5 iteration (PCM syn.)

10 iteration (PCM syn.)

S. B. Eryilmaz et al., submitted 2016

Page 100: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong101

5 10 150

0.2

0.4

0.6

Error Reduction Saturates with Training

Number of patterns stored

Err

or

rate

Before training

5 iteration (PCM syn.)

10 iteration (PCM syn.)

30 iteration (PCM syn.)

S. B. Eryilmaz et al., submitted 2016

Page 101: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong102

5 10 150

0.2

0.4

0.6

Saturation: PCM Synapse Starts to Unlearn

Number of patterns stored

Err

or

rate

70 iteration (PCM syn.)

Before training

5 iteration (PCM syn.)

10 iteration (PCM syn.)

30 iteration (PCM syn.)

S. B. Eryilmaz et al., submitted 2016

Page 102: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong103

5 10 150

0.2

0.4

0.6

“Precise” Synapse: Error Continues to Reduce

Number of patterns stored

Err

or

rate

70 iteration (PCM syn.)

Before training

5 iteration (PCM syn.)

10 iteration (PCM syn.)

30 iteration (PCM syn.)

70 iteration (64-bit syn.)

S. B. Eryilmaz et al., submitted 2016

Page 103: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong104

Energy Consumption per Epoch (Synapses only)

PCM

hardware

Conventional

hardware*

910 nJ(430 nJ logic, 480 nJ memory)

6.1 nJ

*Energy estimate of Intel Xeon Phi coprocessor (22 nm)

S. B. Eryilmaz et al., submitted 2016

Page 104: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong105

Open Research Questions1. Functionality → performance/Watt, performance/m2 →

variability → reliability

2. Scale up (system size), scale down (device size)

3. Role of variability (functionality, performance)

4. Fan-in / fan-out, hierarchical connections, power delivery

5. Low voltage (wire energy ≅ device energy)

6. Stochastic learning behavior → statistical learning rules

7. Meta-plasticity (internal state variables)

8. Timing as an internal variable

9. Learning rules: biological? AI?

10. Algorithm-device co-design

11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature

Page 105: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong106

Open Research Questions1. Functionality → performance/Watt, performance/m2 →

variability → reliability

2. Scale up (system size), scale down (device size)

3. Role of variability (functionality, performance)

4. Fan-in / fan-out, hierarchical connections, power delivery

5. Low voltage (wire energy ≅ device energy)

6. Stochastic learning behavior → statistical learning rules

7. Meta-plasticity (internal state variables)

8. Timing as an internal variable

9. Learning rules: biological? AI?

10. Algorithm-device co-design

11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature

Page 106: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong107

Open Research Questions1. Functionality → performance/Watt, performance/m2 →

variability → reliability

2. Scale up (system size), scale down (device size)

3. Role of variability (functionality, performance)

4. Fan-in / fan-out, hierarchical connections, power delivery

5. Low voltage (wire energy ≅ device energy)

6. Stochastic learning behavior → statistical learning rules

7. Meta-plasticity (internal state variables)

8. Timing as an internal variable

9. Learning rules: biological? AI?

10. Algorithm-device co-design

11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature

Page 107: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong108

Open Research Questions1. Functionality → performance/Watt, performance/m2 →

variability → reliability

2. Scale up (system size), scale down (device size)

3. Role of variability (functionality, performance)

4. Fan-in / fan-out, hierarchical connections, power delivery

5. Low voltage (wire energy ≅ device energy)

6. Stochastic learning behavior → statistical learning rules

7. Meta-plasticity (internal state variables)

8. Timing as an internal variable

9. Learning rules: biological? AI?

10. Algorithm-device co-design

11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature

Page 108: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong109

Open Research Questions1. Functionality → performance/Watt, performance/m2 →

variability → reliability

2. Scale up (system size), scale down (device size)

3. Role of variability (functionality, performance)

4. Fan-in / fan-out, hierarchical connections, power delivery

5. Low voltage (wire energy ≅ device energy)

6. Stochastic learning behavior → statistical learning rules

7. Meta-plasticity (internal state variables)

8. Timing as an internal variable

9. Learning rules: biological? AI?

10. Algorithm-device co-design

11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature

Page 109: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong110

Open Research Questions1. Functionality → performance/Watt, performance/m2 →

variability → reliability

2. Scale up (system size), scale down (device size)

3. Role of variability (functionality, performance)

4. Fan-in / fan-out, hierarchical connections, power delivery

5. Low voltage (wire energy ≅ device energy)

6. Stochastic learning behavior → statistical learning rules

7. Meta-plasticity (internal state variables)

8. Timing as an internal variable

9. Learning rules: biological? AI?

10. Algorithm-device co-design

11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature

Page 110: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong111

Open Research Questions1. Functionality → performance/Watt, performance/m2 →

variability → reliability

2. Scale up (system size), scale down (device size)

3. Role of variability (functionality, performance)

4. Fan-in / fan-out, hierarchical connections, power delivery

5. Low voltage (wire energy ≅ device energy)

6. Stochastic learning behavior → statistical learning rules

7. Meta-plasticity (internal state variables)

8. Timing as an internal variable

9. Learning rules: biological? AI?

10. Algorithm-device co-design

11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature

Page 111: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong112

Open Research Questions1. Functionality → performance/Watt, performance/m2 →

variability → reliability

2. Scale up (system size), scale down (device size)

3. Role of variability (functionality, performance)

4. Fan-in / fan-out, hierarchical connections, power delivery

5. Low voltage (wire energy ≅ device energy)

6. Stochastic learning behavior → statistical learning rules

7. Meta-plasticity (internal state variables)

8. Timing as an internal variable

9. Learning rules: biological? AI?

10. Algorithm-device co-design

11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature

Page 112: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong113

Open Research Questions1. Functionality → performance/Watt, performance/m2 →

variability → reliability

2. Scale up (system size), scale down (device size)

3. Role of variability (functionality, performance)

4. Fan-in / fan-out, hierarchical connections, power delivery

5. Low voltage (wire energy ≅ device energy)

6. Stochastic learning behavior → statistical learning rules

7. Meta-plasticity (internal state variables)

8. Timing as an internal variable

9. Learning rules: biological? AI?

10. Algorithm-device co-design

11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature

Page 113: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong114

Open Research Questions1. Functionality → performance/Watt, performance/m2 →

variability → reliability

2. Scale up (system size), scale down (device size)

3. Role of variability (functionality, performance)

4. Fan-in / fan-out, hierarchical connections, power delivery

5. Low voltage (wire energy ≅ device energy)

6. Stochastic learning behavior → statistical learning rules

7. Meta-plasticity (internal state variables)

8. Timing as an internal variable

9. Learning rules: biological? AI?

10. Algorithm-device co-design

11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature

Page 114: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong115

Open Research Questions1. Functionality → performance/Watt, performance/m2 →

variability → reliability

2. Scale up (system size), scale down (device size)

3. Role of variability (functionality, performance)

4. Fan-in / fan-out, hierarchical connections, power delivery

5. Low voltage (wire energy ≅ device energy)

6. Stochastic learning behavior → statistical learning rules

7. Meta-plasticity (internal state variables)

8. Timing as an internal variable

9. Learning rules: biological? AI?

10. Algorithm-device co-design

11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature

Page 115: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong116

Focus Area: Computation for Data Analytics

Page 116: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong117

Nano-EngineeredComputing Systems Technology

0 1 0 1

Aly et al., IEEE Computer, 2015

Page 117: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong118

Collaborators

Gert CauwenberghsSiddharth Joshi Emre Neftci(UC San Diego)

Jinfeng Kang (Peking U)

Chung LamSangBum KimMatt Brightsky

K.S. Lee, J.M. Shieh, W.K. Yen… (NDL, Taiwan)

Page 118: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong119

Students and Post-Docs

Page 119: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong120

Sponsors

Non-Volatile Memory Technology Research Initiative

Page 120: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong121

Stanford SystemX Alliance

Page 121: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong122

Non-Volatile Memory Technology Research Initiative (NMTRI) @ Stanford University

Page 122: Device Technologies for the N3XT 1,000X ImprovementDevice Technologies for the N3XT 1,000X Improvement in Computing Performance H.-S. Philip Wong Stanford University. 2 H.-S. Philip

Stanford University2015.04.15H.-S. Philip Wong123

End of Talk

Questions?