device technologies for the n3xt 1,000x improvementdevice technologies for the n3xt 1,000x...
TRANSCRIPT
Stanford University2015.04.15H.-S. Philip Wong1
Stanford University
Department of Electrical Engineering2007.11.08Stanford SystemX Alliance
Device Technologies for the N3XT 1,000X Improvement in Computing Performance
H.-S. Philip WongStanford University
Stanford University2015.04.15H.-S. Philip Wong2
21st century workload:Large amounts of loosely-structured data
Streaming video/audio Natural languages Real-time sensor Contextual environment
Stanford University2015.04.15H.-S. Philip Wong3
Abundant-Data Applications
3
Genomics Smart Cities
Health Care
RetailScience
SecurityFinance Government
Stanford University2015.04.15H.-S. Philip Wong4
Abundant-Data Applications
4
Genomics Smart Cities
Health Care
RetailScience
SecurityFinance Government
Computational demands
exceed
Processing capability
Stanford University2015.04.15H.-S. Philip Wong5
Abundant-Data Applications
Huge memory wall
Application execution time
96%
Compute
Memory access
Source: S. Mitra (Stanford)
Stanford University2015.04.15H.-S. Philip Wong6
Hardware Software
TodayCommodityhardware
Advanced image
analysis
Hardware-Software Co-Evolution
Source: Subhasish Mitra, Chris Ré
Stanford University2015.04.15H.-S. Philip Wong7
Hardware Software
Today
Tomorrow
Commodityhardware
CGRAs
Advanced image
analysis
Automaticvideo
annotation
Hardware-Software Co-Evolution
Source: Subhasish Mitra, Chris Ré
Stanford University2015.04.15H.-S. Philip Wong8
Hardware Software
Today
Tomorrow
The DayAfter Tomorrow
Commodityhardware
CGRAs
N3XTBrain
network
Advanced image
analysis
Automaticvideo
annotation
Hardware-Software Co-Evolution
Source: Subhasish Mitra, Chris Ré
Stanford University2015.04.15H.-S. Philip Wong9
Hardware Software
Today
Tomorrow
The DayAfter Tomorrow
Commodityhardware
CGRAs
N3XTBrain
network
Advanced image
analysis
Automaticvideo
annotation
Source: Subhasish Mitra, Chris Ré
Stanford University2015.04.15H.-S. Philip Wong10
Source: Intel
Logic
Memory Wire
Stanford University2015.04.15H.-S. Philip Wong11
Computing Chips TodayLimited to Two-Dimensional CircuitsComputation immersed in memory
N3XT Nanosystems
Stanford University2015.04.15H.-S. Philip Wong12
N3XT NanosystemsComputation immersed in memory
Stanford University2015.04.15H.-S. Philip Wong13
MemoryUltra-dense
vertical connections
N3XT NanosystemsComputation immersed in memory
Computing logic
Stanford University2015.04.15H.-S. Philip Wong14
MemoryUltra-dense
vertical connections
N3XT NanosystemsComputation immersed in memory
Impossible with today’s technologies
Computing logic
Stanford University2015.04.15H.-S. Philip Wong15
Many of these breakthroughs will require new kinds of nanoscale devices and materials integrated into three-dimensional systemsand may take a decade or more to achieve.
Stanford University2015.04.15H.-S. Philip Wong16
MemoryUltra-dense
vertical connections
N3XT NanosystemsComputation immersed in memory
Computing logic
Stanford University2015.04.15H.-S. Philip Wong17
“New” Memories
filament
oxygen ion
Top Electrode
Bottom Electrode
metaloxide
oxygen vacancy
Bottom Electrode
Top Electrode
oxide isolation
switching region
phase change material
filament
Bottom Electrode
solid electrolyte
Active Top Electrode
metal atoms
STT-MRAM CBRAMRRAMPCM
Spin torque transfer magnetic random access memory
Phase change memory
Resistive switching
random access memory
Conductive bridge random access memory
Soft Magnet
Pinned Magnet
tunnel barrier (oxide)
current
Random access, non-volatile, no erase before write, on-chip integration
internal gate
HfZrO2
p-Si
n+ n+
HfO2
top gate
FERAM
Ferro-electric random access
memory
Stanford University2015.04.15H.-S. Philip Wong18
Memory Hierarchy
CPU cycles 1 10 - 100 >100 106 - 108 >109
Speed 0.1’s ns 1 - 10’s ns 10’s -100’s ns 0.1-10’s ms > 100’s ms
100 M G T PSize (bytes) K
Main Memory(DRAM)
Tertiary Storage
(Disk, RAID)
Registers [R]L1 Instruction Cache (SRAM) [L1I]
L1 Data Cache (SRAM) [L1D]L2 Cache (SRAM) [L2]
L3 Cache (eDRAM) [L3]R L1I
L1DSecondary
Storage(Flash, Disk)
Core 1
Interco
nn
ect
L2 Cache
Peripheral controller
Mem
ory
Co
ntro
ller(s)
……
L2 Cache
L3 Cache (eDRAM)
Core N
L2
L3
Must change drastically in the coming decade
H.-S. P. Wong, S. Salahuddin, Nature Nanotech (2015)
Stanford University2015.04.15H.-S. Philip Wong19
STT-MRAM
CPU cycles 1 10 - 100 >100 106 - 108 >109
Speed 0.1’s ns 1 - 10’s ns 10’s -100’s ns 0.1-10’s ms > 100’s ms
100 M G T PSize (bytes) K
Main Memory(DRAM)
Tertiary Storage
(Disk, RAID)R L1I
L1D
Secondary Storage
(Flash, Disk)
L2L3
Soft Magnet
Pinned Magnet
tunnel barrier (oxide)
current
You already know: VPROG ≤ 0.5V, VDD = 1.2 – 1.5 V A few – 10’s of ns read/write Write current ~ 10’s μA (F<45 nm) “infinite” endurance 1T1MTJ, > 6F2
Scalable to 20 nm (expt.)
Stanford University2015.04.15H.-S. Philip Wong20
Embedded STT-MRAM Cache
H. Noguchi et al., “A 3.3ns-Access-Time 71.2μW/MHz 1Mb Embedded STT-MRAM Using Physically Eliminated Read-Disturb Scheme and Normally-Off Memory Architecture,” ISSCC, pp. 136 – 137 (2015) [Toshiba]
Stanford University2015.04.15H.-S. Philip Wong21
Why has STT-MRAM not arrived?
Maybe I will be proven wrong soon …
Source: Intel
Stanford University2015.04.15H.-S. Philip Wong22
Homework for STT-MRAM Folks
Device technology
Current (10’s μA) too high
Spin Hall effect promises to reduce current by 10X
Scalability demonstrated to 20 nm so far
Manufacturing
Not ready at the same time as logic (cf. SRAM)
MTJ – stacks of 16 layers , 4Å thick
Deposited and ion milled across 300 mm wafer
Does not survive 400°C BEOL fab temperature
(# = nm)
Stanford University2015.04.15H.-S. Philip Wong23
STT-MRAM Design Opportunity
TkE BB /Thermal stability,
expoRetention time,
Bco EI Write current,
10-year retention: 40
Large array 60
But we don’t need 10 year retention for cache and main memory!
Write current can be significantly reduced
Stanford University2015.04.15H.-S. Philip Wong24
RRAM – Fab Friendly, “Easy” to Make
CPU cycles 1 10 - 100 >100 106 - 108 >109
Speed 0.1’s ns 1 - 10’s ns 10’s -100’s ns 0.1-10’s ms > 100’s ms
100 M G T PSize (bytes) K
Main Memory(DRAM)
Tertiary Storage
(Disk, RAID)R L1I
L1D
Secondary Storage
(Flash, Disk)
L2L3
You already know: 10 ns read/write, VPROG ~1 – 2 V Write current ~ nA – 10’s μA 1E12 cycles endurance @ device 1T1R, ~ 6F2
3D RRAM (similar to 3D NAND) Scalable to < 5 nm & smaller Fab-friendly, easy to embed
filament
oxygen ion
Top Electrode
Bottom Electrode
metaloxide
oxygen vacancy
TiN, Al
HfOx, TaOx, AlOx, AlOxNy
TiN
Stanford University2015.04.15H.-S. Philip Wong25
RRAM Integrated on CMOSRRAM
J. Provine et al., GOMAC, 2014
CMOS
Stanford University2015.04.15H.-S. Philip Wong26
RRAM FPGA Integration
FPGA Chip
Cell
A
A’
Cross-section View
M1
M2
M3
M4
M5
M6
A A’
PR
Ti
N-doped AlOx (RCF) / Al (BE)
Al (TE)
Al (M6)
SiO2
Programmable Resistor
Al
N-doped AlOx
Al
FPGA Tiles & Configuration Memory
Test
Decoder/Driver
Dec
od
er/D
rive
r
FPGA IO
FPGA IO
FPG
A IO
FPG
A IO
Y. Liauw et al., Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012.
Stanford University2015.04.15H.-S. Philip Wong27
RRAM Is Scalable
Bi-layer TiOx (2.5nm) / HfOx (1.5nm)
Y. Wu et al., IEDM 2013.
Scalable: 10 nmScalable: 12 nm
Stanford University2015.04.15H.-S. Philip Wong28
RRAM Promises1012 cycles Multi-bit
VLSI’11 (Samsung)VLSI’12 (Samsung)
IEDM’12 (Stanford)
IPROG < 100 nA
VLSI’11 (Stanford)
40 nm
Pt
SiO2
SiO2
SiO2
Pt
TiN
HfOx
VRRAM
Bit-Cost Scalable
Stanford University2015.04.15H.-S. Philip Wong29
3D RRAM
40 nm
Pt
SiO2
SiO2
SiO2
Pt
TiN
HfOx
VRRAM
S. Yu, H.-Y. Chen et al., Symp. VLSI Tech. 2013
Stanford University2015.04.15H.-S. Philip Wong30
High Density 3D Memory
< 1 μA
1 – 2 V
5 ns
> 1G cycles
F = 5 nm
128 layers
64 Tb per chip
IEDM ’12, ’13, ’14VLSI ’13, ’14, ’16DATE ’15, Nature Comm ’15
Stanford University2015.04.15H.-S. Philip Wong31
https://nano.stanford.edu/stanford-memory-trends
Energy vs Speed @ Device Level
Nothing faster than STT-MRAM
Speed limited by physics
10-1
100
101
102
103
104
105
106
10-14
10-13
10-12
10-11
10-10
10-9
10-8
10-7
10-6
10-5
RRAM
CBRAM
PCM
STT-MRAM
Write
En
erg
y (
J)
Write Time (ns)
Stanford University2015.04.15H.-S. Philip Wong32
Write Energy Scaling @ Device Level
https://nano.stanford.edu/stanford-memory-trends
NANDSTT-MRAM Proportional to area
PCM: Proportional to area
RRAM & CBRAM: Material and resistance values
100
101
102
103
104
105
106
107
10-3
10-2
10-1
100
101
102
103
104
105
106
107
1 4 11 36 113 357 1128 35681 4 11 36 113 357 1128 3568
RRAM
CBRAM
PCM
STT-MRAM
Pro
gra
mm
ing
En
erg
y (
pJ)
Cell Area (nm2)
Equivalent Contact Diameter (nm)
Stanford University2015.04.15H.-S. Philip Wong33
Logic Switches for 10, 7, 5, 3, 1 … nm
ETSOI
NWFET = “nanowire FET”, ETSOI = “extremely-thin silicon-on-insulator”, CNFET = “carbon nanotube FET”
NWFETFinFET CNFET
d~1nmCNT
Thin channel body (electrostatics control) Minimize parasitic capacitance and resistance
Source: G. Hills, S. Mitra (Stanford)
Stanford University2015.04.15H.-S. Philip Wong34
Carbon Nanotube FET (CNFET)
10X EDP Benefit vs. silicon
Sub-10 nm node VLSI circuits
EDP: “energy-delay product”
2 µm
Gate
2 µm
Gate
LG<10nm
d~1.2nm
CNT
Source: G. Hills, S. Mitra (Stanford)
Stanford University2015.04.15H.-S. Philip Wong35
CNFET: ~10X EDP vs. Si-CMOS
Nodes: 22 … 14 ... 11… 8 ... 5 (FinFET)
20 W/cm2
100 W/cm2
Nodes: 8 .. 5 (CNFET)
Performance (1015 transitions/sec.)
2
1.6
1.2
0.8
0.4
0
Energ
y (
fJ/t
ransitio
n)
0 4 8 12 16
x3
x1/3
IBM Power 7 modeled
Source: D. Frank and W.
Haensch, IBM (IEDM Short
Course 2012)
Stanford University2015.04.15H.-S. Philip Wong36
CNT Computer
Instruction FetchData Fetch ALU Write-back
Instruction Fetch Data Fetch Write BackArithmetic Block
M. Shulaker et al., Nature 13
Stanford University2015.04.15H.-S. Philip Wong37
CNT Computer
Instruction FetchData Fetch ALU Write-back
M. Shulaker et al., Nature 13
Stanford University2015.04.15H.-S. Philip Wong38
CNFET Enables Monolithic 3D
CNT transfer decouples high temperature growth
High-temperature CNT growth
900 °C
2 µm
CNT transfer
Low-temperature circuit fabrication
120 °C
CNTs
Growth
2 µm
Post-transfer
N. Patil, Symp. VLSI Tech. 08
Stanford University2015.04.15H.-S. Philip Wong39
Temperature < 250 oC
VO
UT
(V)
VIN (V)
Inter-layer logic 3-Layer integration Memory on logic
RRAM
Layer 1Layer 2
CNFETs
Conventional
vias, no TSVs
CNFET: Monolithic 3D Fabrication Flow
CNT transfer
Imperfection-immune circuit
Passivation
Inter-layer vias &
interconnects
H. Wei, IEDM 09, IEDM 13
Stanford University2015.04.15H.-S. Philip Wong40
2D → 3D → 2D Structure, 2D → 3D Integration
FinFET NWFET ETSOI CNFET
5nm
[Tsutsui IEDM 05]
100nm
10nm8nm
[Intel VLSI 15] [IBM VLSI 15]
Source: G. Hills, S. Mitra (Stanford)
[Shulaker IEDM 14]
Stanford University2015.04.15H.-S. Philip Wong41
Computation Immersed in Memory
thermal
thermal
MRAMQuick access
3D Resistive RAMMassive storage
Not TSV
thermal
1D CNFET, 2D FETCompute, RAM access
1D CNFET, 2D FETCompute, RAM access
1D CNFET, 2D FETCompute, Power,
Clock
Silicon compatible
Ultra-dense, fine-grained
vias
Stanford University2015.04.15H.-S. Philip Wong42
Aly et al., IEEE Computer, 2015
Stanford University2015.04.15H.-S. Philip Wong43
N3XT Architecture
Full physical design
–Standard tool flow
Cores: CNFET
Cache: STTRAMCNFET access
Memory control: CNFET
Memory: RRAM CNFET access
2 Million ILVs
Source: M. Sabry Aly, S. Mitra (Stanford)
Stanford University2015.04.15H.-S. Philip Wong44
N3XT: 850× EDP Benefit
M. Aly…S. Mitra, IEEE Computer (2015)
Same power density (67 W/cm2) and peak temperature 63°C
Exec. Time: 23XEnergy: 37X
IBM graph
analytics
Data-intensive
computing
PageRank app.
0%
20%
40%
60%
80%
100%
2D N3XT
0%
20%
40%
60%
80%
100%
2D N3XT
2.7% 4.3%
Energy: 37X Exec. Time: 23X
Stanford University2015.04.15H.-S. Philip Wong45
N3XT: 850× EDP Benefit
M. Aly…S. Mitra, IEEE Computer (2015)
0%
20%
40%
60%
80%
100%
2D N3XT0%
20%
40%
60%
80%
100%
2D N3XT
Processor active Processor idle Memory access
0%
1%
2%
3%
0%
3%
5%
Exec. Time: 23XEnergy: 37X
IBM graph
analytics
Data-intensive
computing
PageRank app.
Stanford University2015.04.15H.-S. Philip Wong46
Nanosystems
46
Nano-Engineered Computing Systems Technology
Nanoscalecooling
Abundantdataapps
1D / 2D FETs,
RRAM, mRAM
Architecture&
software
Yield, reliability
New3D
fabrication
End-to-end approach
[M. Aly et al., IEEE Computer ‘15]
0 1 0 1
Stanford University2015.04.15H.-S. Philip Wong47
CNT computing logic
Can Build Nanosystems Today
47
Classification accelerator
>2 Million carbon nanotube FETs, 1 Mbit Resistive RAM
[M. Shulaker et al., submitted ‘16]
Stanford University2015.04.15H.-S. Philip Wong48
CNT computing logic
Can Build Nanosystems Today
48
Memory1 Megabit RRAM
[M. Shulaker et al., submitted ‘16]
Stanford University2015.04.15H.-S. Philip Wong49
CNT computing logic
Can Build Nanosystems Today
49
Memory
Millions of sensorsCNTs
X100,000
[M. Shulaker et al., submitted ‘16]
Stanford University2015.04.15H.-S. Philip Wong50
CNT computing logic
Can Build Nanosystems Today
50
Memory
Millions of sensors
Ultra-densevertical connections
[M. Shulaker et al., submitted ‘16]
Stanford University2015.04.15H.-S. Philip Wong51
CNT computing logic
Can Build Nanosystems Today
51
Memory
Millions of sensors
Ultra-densevertical connections
Abundant data: Terabytes / second
[M. Shulaker et al., submitted ‘16]
Stanford University2015.04.15H.-S. Philip Wong52
CNT computing logic
Can Build Nanosystems Today
Memory
Millions of sensors
Ultra-densevertical connections
Abundant data: Terabytes / second
In-situ classification:Extensive, accurate
[M. Shulaker et al., submitted ‘16]
Stanford University2015.04.15H.-S. Philip Wong53 Source: Subhasish Mitra, Chris Ré
Stanford University2015.04.15H.-S. Philip Wong54
Hardware Software
Today
Tomorrow
The DayAfter Tomorrow
Commodityhardware
CGRAs
N3XTBrain
network
Advanced image
analysis
Automaticvideo
annotation
Source: Subhasish Mitra, Chris Ré
Stanford University2015.04.15H.-S. Philip Wong55
0 1 0 1
Stanford University2015.04.15H.-S. Philip Wong56
MemoryUltra-dense
vertical connections
N3XT NanosystemsComputation immersed in memory
Computing logic
0 1 0 1
Stanford University2015.04.15H.-S. Philip Wong57
These nanotechnology innovations will have to be developed in close coordination with new computer architectures, and will likely be informed by our growing understanding of the brain—a remarkable, fault-tolerant system that consumes less power than an incandescent light bulb.
Stanford University2015.04.15H.-S. Philip Wong58
Goal: Energy Efficiency of the Brain
Learning algorithms
Image sources: stackoverflow
Stanford University2015.04.15H.-S. Philip Wong59
Goal: Energy Efficiency of the Brain
Learning algorithms
Biological modelsImage sources: stackoverflow, Stanford
Pre-synaptic neuron
Post-synaptic neuron
Pre-spike
Synapse
Post-spike
Axon
hillock
Δt < 0 Δt > 0
pre
post
pre
post
-100 -50 0 50 100-60
-40
-20
0
20
40
60
80
100
120
Syn
ap
tic w
eig
ht
ch
an
ge
w (
%)
Spike timing t (ms)
(a)
(b) (c)
Stanford University2015.04.15H.-S. Philip Wong60
Goal: Energy Efficiency of the Brain
Learning algorithms Electronic implementation
Biological modelsImage sources: stackoverflow, Stanford, IBM
Pre-synaptic neuron
Post-synaptic neuron
Pre-spike
Synapse
Post-spike
Axon
hillock
Δt < 0 Δt > 0
pre
post
pre
post
-100 -50 0 50 100-60
-40
-20
0
20
40
60
80
100
120
Syn
ap
tic w
eig
ht
ch
an
ge
w (
%)
Spike timing t (ms)
(a)
(b) (c)
Stanford University2015.04.15H.-S. Philip Wong61
Goal: Energy Efficiency of the Brain
Learning algorithms Electronic implementation
Biological models
poly c-GSTpoly c-GSTpoly c-GST
amorphousamorphous
SiO2 SiO2 SiO2TiN TiN TiN
TiNTiNTiN
fully set state partially reset state fully reset state
Nanoscale electronic synapseImage sources: stackoverflow, Stanford, IBM
Pre-synaptic neuron
Post-synaptic neuron
Pre-spike
Synapse
Post-spike
Axon
hillock
Δt < 0 Δt > 0
pre
post
pre
post
-100 -50 0 50 100-60
-40
-20
0
20
40
60
80
100
120
Syn
ap
tic w
eig
ht
ch
an
ge
w (
%)
Spike timing t (ms)
(a)
(b) (c)
Stanford University2015.04.15H.-S. Philip Wong62
Application Hardware used
Estimated
power
consumption
Emulating 4.5% of human brain:1013
synapses, 109 neurons
Blue Gene/P:
36,864 nodes,
147,456 cores
2.9 MW
(LINPACK)
Deep sparse autoencoder:
109 synapses, 10M images
1,000 CPUs
(16,000 cores)
~100 kW
(cores only)
Convolutional neural net with 60M
synapses, 650K neurons
2 GPUs 1,200 W
Restricted Boltzmann Machine:
28M synapses; 69,888 neurons
GPU 550 W
CPU 65 W
Processing 1 s of speech using deep
neural network
GPU 238 W
CPU (4 cores) 80 W
Larg
e s
cale
Sm
all
to
mo
dera
te s
cale
Scale Up Requires Energy Efficiency
S. B. Eryilmaz et al., IEDM 2015
Stanford University2015.04.15H.-S. Philip Wong63
Custom Hardware for Energy Efficient AI
E. Lee et al, ISSCC 2016Y.-H. Chen et al, ISSCC 2016 S. Han et al., arXiv 2016
Inference in CNN hardware280 mW for AlexNet[MIT]
Inference in deep compressed networks600 mW for AlexNet[Nvidia/Stanford]
Matrix-vector multiplication with switch caps8 TOPS/W @ 2.5 GHz[Stanford]
Stanford University2015.04.15H.-S. Philip Wong64
Approaches of Neuromorphic Hardware
Stanford University2015.04.15H.-S. Philip Wong65
Approaches of Neuromorphic HardwareB
iolo
gy-b
ase
d
mo
de
ls /
al
gori
thm
s
Co
nve
nti
on
al
ML
algo
rith
ms
Stanford University2015.04.15H.-S. Philip Wong66
Approaches of Neuromorphic Hardware
Conventional hardware (CPU, GPU, supercomputers, etc)
Neuromorphic hardware
Stanford University2015.04.15H.-S. Philip Wong67
Approaches of Neuromorphic Hardware
Conventional hardware (CPU, GPU, supercomputers, etc)
with analog non-volatile memory synapses
Neuromorphic hardware
Stanford University2015.04.15H.-S. Philip Wong68
Approaches of Neuromorphic HardwareB
iolo
gy-b
ase
d
mo
de
ls /
al
gori
thm
sConventional hardware (CPU, GPU, supercomputers, etc)
with analog non-volatile memory synapses
Neuromorphic hardware
Brain emulation on BlueGene [7]
HTM [3]
Stanford University2015.04.15H.-S. Philip Wong69
Approaches of Neuromorphic HardwareB
iolo
gy-b
ase
d
mo
de
ls /
al
gori
thm
s
Co
nve
nti
on
al
ML
algo
rith
ms
Conventional hardware (CPU, GPU, supercomputers, etc)
with analog non-volatile memory synapses
Neuromorphic hardware
Brain emulation on BlueGene [7]
HTM [3]
“Cats on YouTube” ANNs: ConvNets, DNNs, DBNs [10-13]
Stanford University2015.04.15H.-S. Philip Wong70
Approaches of Neuromorphic HardwareB
iolo
gy-b
ase
d
mo
de
ls /
al
gori
thm
s
Co
nve
nti
on
al
ML
algo
rith
ms
Conventional hardware (CPU, GPU, supercomputers, etc)
with analog non-volatile memory synapses
Neuromorphic hardware
Brain emulation on BlueGene [7]
HTM [3]
“Cats on YouTube” ANNs: ConvNets, DNNs, DBNs [10-13]
Human Brain Project [20]
TrueNorth [16]
SpiNNaker [19]
Stanford University2015.04.15H.-S. Philip Wong71
Approaches of Neuromorphic HardwareB
iolo
gy-b
ase
d
mo
de
ls /
al
gori
thm
s
Co
nve
nti
on
al
ML
algo
rith
ms
Conventional hardware (CPU, GPU, supercomputers, etc)
with analog non-volatile memory synapses
Neuromorphic hardware
Brain emulation on BlueGene [7]
HTM [3]
“Cats on YouTube” ANNs: ConvNets, DNNs, DBNs [10-13]
Human Brain Project [20]
TrueNorth [16]
SpiNNaker [19]
Hebbian learningSpike-based ANN
PCM, RRAM, CBRAM
Stanford University2015.04.15H.-S. Philip Wong72
Approaches of Neuromorphic HardwareB
iolo
gy-b
ase
d
mo
de
ls /
al
gori
thm
s
Co
nve
nti
on
al
ML
algo
rith
ms
Conventional hardware (CPU, GPU, supercomputers, etc)
with analog non-volatile memory synapses
Neuromorphic hardware
Brain emulation on BlueGene [7]
HTM [3]
“Cats on YouTube” ANNs: ConvNets, DNNs, DBNs [10-13]
Human Brain Project [20]
TrueNorth [16]
SpiNNaker [19]
Hebbian learningSpike-based ANN
PCM, RRAM, CBRAM
ANN, RBM, sparse learning
PCM, RRAM
Stanford University2015.04.15H.-S. Philip Wong74
Today’s “Large” Scale Architectures
IFAT+HiAER (UCSD) Neurogrid (Stanford)
TrueNorth (IBM)
Digital neurons Mesh-based routing
Analog neurons Tree routing
Analog neurons Tree routing
SpiNNaker (U of Manchester)
Digital neurons Mesh routing
Stanford University2015.04.15H.-S. Philip Wong75
Synapses Implemented Today
IFAT+HiAER (UCSD) Neurogrid (Stanford)
TrueNorth (IBM) SpiNNaker (U of Manchester)
DRAM (off-chip)
SRAM (on-chip)
DRAM (off-chip)
SRAM (on-chip)+DRAM (off-chip)
Energetically expensive Refresh Off-chip access
Scalability?
Area inefficient
Stanford University2015.04.15H.-S. Philip Wong76
Non-Volatile Memory (NVM)
D. Kuzum et al., Nano Lett. 2013, Y. Wu et al., IEDM 2013; A. Calderoni et al., IMW 2014
Metal oxide resistive switching memory (RRAM)
25nm
12 nm
TiOx/
HfOx
TiN
10 nm
SiO2
filament
oxygen ion
Top Electrode
Bottom Electrode
metaloxide
oxygen vacancy
poly c-GSTpoly c-GSTpoly c-GST
amorphousamorphous
SiO2 SiO2 SiO2TiN TiN TiN
TiNTiNTiN
fully set state partially reset state fully reset state
Phase change memory (PCM)
Bottom Electrode
Top Electrode
oxide isolation
switching region
phase change material
Conductive bridge memory (CBRAM)
filament
Bottom Electrode
solid electrolyte
Active Top Electrode
metal atoms
Cu ionElectrolyte
Bottom electrode
Stanford University2015.04.15H.-S. Philip Wong77
Non-Volatile Memory (NVM) → Synapse
Analog programmable
Scalable to a few nm
Stack in 3D
D. Kuzum et al., Nano Lett. 2013, Y. Wu et al., IEDM 2013; A. Calderoni et al., IMW 2014
Metal oxide resistive switching memory (RRAM)
25nm
12 nm
TiOx/
HfOx
TiN
10 nm
SiO2
poly c-GSTpoly c-GSTpoly c-GST
amorphousamorphous
SiO2 SiO2 SiO2TiN TiN TiN
TiNTiNTiN
fully set state partially reset state fully reset state
Phase change memory (PCM)
Conductive bridge memory (CBRAM)
Cu ionElectrolyte
Bottom electrode
Stanford University2015.04.15H.-S. Philip Wong78
0 500 1000 1500 2000 250010
3
104
105
Re
sis
tan
ce
(O
hm
)
Pulse Number2100 2200 2300
103
104
105
Resis
tan
ce (
Oh
m)
Pulse Number
100-step grey scale (1% resolution)
Partial RESET
Partial SET
Phase change synapse
Nanoscale Memory as Synaptic Weights
Synaptic updates in the brain: basis for learningRequirement: analog resistance change
D. Kuzum et al., Nano Lett., p. 2179 (2012)
Stanford University2015.04.15H.-S. Philip Wong79
Gradual Resistance Change
Larger pulse amplitude Fewer states
S. Yu et al. Adv. Mater. vol. 25, pp. 1774-1779, 2013
filament
oxygen ion
Top Electrode
Bottom Electrode
metaloxide
oxygen vacancy
RRAM
Stanford University2015.04.15H.-S. Philip Wong80
Nanoscale Memory Can Emulate Biological Synaptic Behavior
-60 -40 -20 0 20 40 60
-40
0
40
80
120
=-11.3
=-18.6
=-29
=11
=20.7
LTP-1
LTP-2
LTP-3
LTD-1
LTD-2
LTD-3
Sy
na
pti
c w
eig
ht
ch
an
ge
w
(%
)
Spike timing t (ms)
=29.5
Various time constants
-50 0 50
-50
0
50
100
w
(%
)
t (ms)-50 0 50
-50
0
50
100
w
(%
)
t (ms)
-50 0 50
-50
0
50
100
w
(%
)
t (ms)-50 0 50
-50
0
50
100
w
(%
)
t (ms)
Various STDP kernels
0 20 40 60 80 100
0
20
40
60
80
100
120
140 10 ms 25 ms
15 ms 35 ms
20 ms 45 ms
Syn
ap
tic
Weig
ht
Ch
an
ge
w
(%
)
Number of pre/post spike pairs
Weight update saturation
STDP (spike-timing-dependent plasticity)
D. Kuzum et al., Nano Lett., p. 2179 (2012)
Stanford University2015.04.15H.-S. Philip Wong81
Nanoscale Memory Can Emulate Biological Synaptic Behavior
-60 -40 -20 0 20 40 60
-40
0
40
80
120
=-11.3
=-18.6
=-29
=11
=20.7
LTP-1
LTP-2
LTP-3
LTD-1
LTD-2
LTD-3
Sy
na
pti
c w
eig
ht
ch
an
ge
w
(%
)
Spike timing t (ms)
=29.5
Various time constants
-50 0 50
-50
0
50
100
w
(%
)
t (ms)-50 0 50
-50
0
50
100
w
(%
)
t (ms)
-50 0 50
-50
0
50
100
w
(%
)
t (ms)-50 0 50
-50
0
50
100
w
(%
)
t (ms)
Various STDP kernels
0 20 40 60 80 100
0
20
40
60
80
100
120
140 10 ms 25 ms
15 ms 35 ms
20 ms 45 ms
Syn
ap
tic
Weig
ht
Ch
an
ge
w
(%
)
Number of pre/post spike pairs
Weight update saturation
STDP (spike-timing-dependent plasticity)
D. Kuzum et al., Nano Lett., p. 2179 (2012)
Stanford University2015.04.15H.-S. Philip Wong82
Nanoscale Memory Can Emulate Biological Synaptic Behavior
-60 -40 -20 0 20 40 60
-40
0
40
80
120
=-11.3
=-18.6
=-29
=11
=20.7
LTP-1
LTP-2
LTP-3
LTD-1
LTD-2
LTD-3
Sy
na
pti
c w
eig
ht
ch
an
ge
w
(%
)
Spike timing t (ms)
=29.5
Various time constants
-50 0 50
-50
0
50
100
w
(%
)
t (ms)-50 0 50
-50
0
50
100
w
(%
)
t (ms)
-50 0 50
-50
0
50
100
w
(%
)
t (ms)-50 0 50
-50
0
50
100
w
(%
)
t (ms)
Various STDP kernels
0 20 40 60 80 100
0
20
40
60
80
100
120
140 10 ms 25 ms
15 ms 35 ms
20 ms 45 ms
Syn
ap
tic
Weig
ht
Ch
an
ge
w
(%
)
Number of pre/post spike pairs
Weight update saturation
STDP (spike-timing-dependent plasticity)
D. Kuzum et al., Nano Lett., p. 2179 (2012)
Stanford University2015.04.15H.-S. Philip Wong83
Nanoscale Memory Can Emulate Biological Synaptic Behavior
-60 -40 -20 0 20 40 60
-40
0
40
80
120
=-11.3
=-18.6
=-29
=11
=20.7
LTP-1
LTP-2
LTP-3
LTD-1
LTD-2
LTD-3
Sy
na
pti
c w
eig
ht
ch
an
ge
w
(%
)
Spike timing t (ms)
=29.5
Various time constants
-50 0 50
-50
0
50
100
w
(%
)
t (ms)-50 0 50
-50
0
50
100
w
(%
)
t (ms)
-50 0 50
-50
0
50
100
w
(%
)
t (ms)-50 0 50
-50
0
50
100
w
(%
)
t (ms)
Various STDP kernels
0 20 40 60 80 100
0
20
40
60
80
100
120
140 10 ms 25 ms
15 ms 35 ms
20 ms 45 ms
Syn
ap
tic
Weig
ht
Ch
an
ge
w
(%
)
Number of pre/post spike pairs
Weight update saturation
STDP (spike-timing-dependent plasticity)
D. Kuzum et al., Nano Lett., p. 2179 (2012)
Stanford University2015.04.15H.-S. Philip Wong84
Stochastic Weight Update
Program memory close to switching threshold
Emulate grey scale
Escape local minima in gradient descent
SET success probability: 14% @ VSET = 1.3 V 40% @ VSET = 1.6 V 76% @ VSET = 1.9 V
0 20 40 60 80 100
100
1k
10k
100k
1M
SET success probability ~14%
SET +1.3V/10ns
RESET -1.7V/10ns
Res
ista
nc
e (
)
Cycle (#)
0 20 40 60 80 100
100
1k
10k
100k
1M
SET success probability ~40%
SET +1.6V/10ns
RESET -1.7V/10ns
Resis
tan
ce (
)
Cycle (#)
0 20 40 60 80 100
100
1k
10k
100k
1M
Resis
tan
ce (
)Cycle (#)
SET +1.9V/10ns
RESET -1.7V/10ns
SET success probability ~76%SETRESET
S. Yu et al., Frontiers of Neuroscience, 2013
Stanford University2015.04.15H.-S. Philip Wong85
Stochastic Weight Update
Program memory close to switching threshold
Emulate grey scale
Escape local minima in gradient descent
SET success probability: 14% @ VSET = 1.3 V 40% @ VSET = 1.6 V 76% @ VSET = 1.9 V
0 20 40 60 80 100
100
1k
10k
100k
1M
SET success probability ~14%
SET +1.3V/10ns
RESET -1.7V/10ns
Res
ista
nc
e (
)
Cycle (#)
0 20 40 60 80 100
100
1k
10k
100k
1M
SET success probability ~40%
SET +1.6V/10ns
RESET -1.7V/10ns
Resis
tan
ce (
)
Cycle (#)
0 20 40 60 80 100
100
1k
10k
100k
1M
Resis
tan
ce (
)Cycle (#)
SET +1.9V/10ns
RESET -1.7V/10ns
SET success probability ~76%SETRESET
S. Yu et al., Frontiers of Neuroscience, 2013
Stanford University2015.04.15H.-S. Philip Wong86
Stochastic Weight Update
Program memory close to switching threshold
Emulate grey scale
Escape local minima in gradient descent
SET success probability: 14% @ VSET = 1.3 V 40% @ VSET = 1.6 V 76% @ VSET = 1.9 V
0 20 40 60 80 100
100
1k
10k
100k
1M
SET success probability ~14%
SET +1.3V/10ns
RESET -1.7V/10ns
Res
ista
nc
e (
)
Cycle (#)
0 20 40 60 80 100
100
1k
10k
100k
1M
SET success probability ~40%
SET +1.6V/10ns
RESET -1.7V/10ns
Resis
tan
ce (
)
Cycle (#)
0 20 40 60 80 100
100
1k
10k
100k
1M
Resis
tan
ce (
)Cycle (#)
SET +1.9V/10ns
RESET -1.7V/10ns
SET success probability ~76%SETRESET
S. Yu et al., Frontiers of Neuroscience, 2013
Stanford University2015.04.15H.-S. Philip Wong87
RRAM Stochastic Weight Update
1.4 1.5 1.6 1.7 1.810
100
Pulse Height (V)
Pu
lse W
idth
(n
s)
4.0
16
29
41
54
66
78
91
100
SET Probability (%) 100
0
Stanford University2015.04.15H.-S. Philip Wong88
3D VRRAM
3D NN
Vertical
logic nv
-PE
RM
nv-M
ajo
rity
nv
-NO
T
nv
-AD
D
nv
-MU
LT
IPLY
nv
-OR
nv
-AN
D
nv
-Ha
mm
ing
nv
-NA
ND
nv
-NO
R
nv
-XO
R
nv
-XN
OR
In-memory logic kernels
Input
neurons
Output
neurons
4th-layer VRRAM
3rd-layer VRRAM
2nd-layer VRRAM
1st-layer VRRAM
RRAM synapses
Vertical RRAM In-Memory Computing
Truly exploit the 3rd dimension
VRRAM: vertical RRAM; NN: neural network
H. Li et al., Symp. VLSI Tech. 2016
Stanford University2015.04.15H.-S. Philip Wong89
In-Memory Logic Kernels for Hyper-Dimensional* Computing
Input
data
XOR
One-shot learning
Learned
information
Inference
Cognitive tasks
In-memory
logic kernels
Energy efficient
Error resilient
[*P. Kanerva, Cog. Comput.’09]
H. Li et al., Symp. VLSI Tech. 2016
Stanford University2015.04.15H.-S. Philip Wong90
In-Memory Logic Kernel: Bit Permutation
Bit permutation is simple and efficient
VDD
VDD
VDD
gnd
gnd
gnd
L1
L2
L3
L4
0
0
0
1
0
0
1
0RESET
1
0
0
0
RESET
0
1
0
0
RESET
LRS
(~10kΩ)
HRS
(100-250kΩ)
Measured resistance states
1 0
Bit permutation
along vertical pillars
L1
L2
L3
L4
H. Li et al., Symp. VLSI Tech. 2016
Stanford University2015.04.15H.-S. Philip Wong91
192 synapses with RRAMs: Hebbian learning
S. Park et al., Scientific Reports, 2015
Array Level Experimental Demonstrations165,000 PCM synapses: Gradient based backpropagation
G. Burr et al., IEDM 2014
100 PCM synapses: Biological Hebbian learning rule
PCM
S. B. Eryilmaz et al., IEDM 2013
Stanford University2015.04.15H.-S. Philip Wong92
192 synapses with RRAMs: Hebbian learning
S. Park et al., Scientific Reports, 2015
Array Level Experimental Demonstrations165,000 PCM synapses: Gradient based backpropagation
G. Burr et al., IEDM 2014
100 PCM synapses: Biological Hebbian learning rule
PCM
S. B. Eryilmaz et al., IEDM 2013
Stanford University2015.04.15H.-S. Philip Wong93
Array Level Experimental Demonstrations165,000 PCM synapses: Gradient based backpropagation
G. Burr et al., IEDM 2014
100 PCM synapses: Biological Hebbian learning rule
PCM
S. B. Eryilmaz et al., IEDM 2013
192 synapses with RRAMs: Hebbian learning
S. Park et al., Scientific Reports, 2015
Stanford University2015.04.15H.-S. Philip Wong94
Restricted Boltzmann Machine with Resistive Phase Change Memory Synapses
Δw ∝ <vh>data - <vkhk>reconstruction
Hidden nodes = 0 (bj, j=0:4)
Visible nodes= 0 (ai, i=0:8)
Pairwise weights (wij) : train with 2-PCM synapses
Input pixels
S. B. Eryilmaz et al., submitted 2016
Stanford University2015.04.15H.-S. Philip Wong95
Contrastive Divergence with Resistive Phase Change Memory Synapses
Number of pulses
Resis
tance (
MΩ
)
Number of pulses
Resis
tance (
Ω)
S. B. Eryilmaz et al., submitted 2016
Stanford University2015.04.15H.-S. Philip Wong96
Mapping Contrastive Divergence onto Phase Change Memory Array
PCM
WL
TE
S. B. Eryilmaz et al., submitted 2016
Stanford University2015.04.15H.-S. Philip Wong97
Inference After Training
S. B. Eryilmaz et al., submitted 2016
Stanford University2015.04.15H.-S. Philip Wong98
5 10 150
0.2
0.4
0.6
Error Rate vs Number of Patterns Stored
Number of patterns stored
Err
or
rate
Before training
S. B. Eryilmaz et al., submitted 2016
Stanford University2015.04.15H.-S. Philip Wong99
5 10 150
0.2
0.4
0.6
Error Reduces After Training
Number of patterns stored
Err
or
rate
Before training
5 iteration (PCM syn.)
S. B. Eryilmaz et al., submitted 2016
Stanford University2015.04.15H.-S. Philip Wong100
5 10 150
0.2
0.4
0.6
Error Reduces with Further Training
Number of patterns stored
Err
or
rate
Before training
5 iteration (PCM syn.)
10 iteration (PCM syn.)
S. B. Eryilmaz et al., submitted 2016
Stanford University2015.04.15H.-S. Philip Wong101
5 10 150
0.2
0.4
0.6
Error Reduction Saturates with Training
Number of patterns stored
Err
or
rate
Before training
5 iteration (PCM syn.)
10 iteration (PCM syn.)
30 iteration (PCM syn.)
S. B. Eryilmaz et al., submitted 2016
Stanford University2015.04.15H.-S. Philip Wong102
5 10 150
0.2
0.4
0.6
Saturation: PCM Synapse Starts to Unlearn
Number of patterns stored
Err
or
rate
70 iteration (PCM syn.)
Before training
5 iteration (PCM syn.)
10 iteration (PCM syn.)
30 iteration (PCM syn.)
S. B. Eryilmaz et al., submitted 2016
Stanford University2015.04.15H.-S. Philip Wong103
5 10 150
0.2
0.4
0.6
“Precise” Synapse: Error Continues to Reduce
Number of patterns stored
Err
or
rate
70 iteration (PCM syn.)
Before training
5 iteration (PCM syn.)
10 iteration (PCM syn.)
30 iteration (PCM syn.)
70 iteration (64-bit syn.)
S. B. Eryilmaz et al., submitted 2016
Stanford University2015.04.15H.-S. Philip Wong104
Energy Consumption per Epoch (Synapses only)
PCM
hardware
Conventional
hardware*
910 nJ(430 nJ logic, 480 nJ memory)
6.1 nJ
*Energy estimate of Intel Xeon Phi coprocessor (22 nm)
S. B. Eryilmaz et al., submitted 2016
Stanford University2015.04.15H.-S. Philip Wong105
Open Research Questions1. Functionality → performance/Watt, performance/m2 →
variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature
Stanford University2015.04.15H.-S. Philip Wong106
Open Research Questions1. Functionality → performance/Watt, performance/m2 →
variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature
Stanford University2015.04.15H.-S. Philip Wong107
Open Research Questions1. Functionality → performance/Watt, performance/m2 →
variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature
Stanford University2015.04.15H.-S. Philip Wong108
Open Research Questions1. Functionality → performance/Watt, performance/m2 →
variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature
Stanford University2015.04.15H.-S. Philip Wong109
Open Research Questions1. Functionality → performance/Watt, performance/m2 →
variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature
Stanford University2015.04.15H.-S. Philip Wong110
Open Research Questions1. Functionality → performance/Watt, performance/m2 →
variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature
Stanford University2015.04.15H.-S. Philip Wong111
Open Research Questions1. Functionality → performance/Watt, performance/m2 →
variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature
Stanford University2015.04.15H.-S. Philip Wong112
Open Research Questions1. Functionality → performance/Watt, performance/m2 →
variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature
Stanford University2015.04.15H.-S. Philip Wong113
Open Research Questions1. Functionality → performance/Watt, performance/m2 →
variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature
Stanford University2015.04.15H.-S. Philip Wong114
Open Research Questions1. Functionality → performance/Watt, performance/m2 →
variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature
Stanford University2015.04.15H.-S. Philip Wong115
Open Research Questions1. Functionality → performance/Watt, performance/m2 →
variability → reliability
2. Scale up (system size), scale down (device size)
3. Role of variability (functionality, performance)
4. Fan-in / fan-out, hierarchical connections, power delivery
5. Low voltage (wire energy ≅ device energy)
6. Stochastic learning behavior → statistical learning rules
7. Meta-plasticity (internal state variables)
8. Timing as an internal variable
9. Learning rules: biological? AI?
10. Algorithm-device co-design
11. Materials/fabrication:monolithic 3D integration a must,MUST be low temperature
Stanford University2015.04.15H.-S. Philip Wong116
Focus Area: Computation for Data Analytics
Stanford University2015.04.15H.-S. Philip Wong117
Nano-EngineeredComputing Systems Technology
0 1 0 1
Aly et al., IEEE Computer, 2015
Stanford University2015.04.15H.-S. Philip Wong118
Collaborators
Gert CauwenberghsSiddharth Joshi Emre Neftci(UC San Diego)
Jinfeng Kang (Peking U)
Chung LamSangBum KimMatt Brightsky
K.S. Lee, J.M. Shieh, W.K. Yen… (NDL, Taiwan)
Stanford University2015.04.15H.-S. Philip Wong119
Students and Post-Docs
Stanford University2015.04.15H.-S. Philip Wong120
Sponsors
Non-Volatile Memory Technology Research Initiative
Stanford University2015.04.15H.-S. Philip Wong121
Stanford SystemX Alliance
Stanford University2015.04.15H.-S. Philip Wong122
Non-Volatile Memory Technology Research Initiative (NMTRI) @ Stanford University
Stanford University2015.04.15H.-S. Philip Wong123
End of Talk
Questions?