
Modeling and Implementation of Firing-Rate Neuromorphic-Network Classifiers with Bilayer Pt/Al2O3/TiO2-x/Pt Memristors

M. Prezioso1*, I. Kataeva2§, F. Merrikh-Bayat1, B. Hoskins1, G. Adam1, T. Sota2, K. Likharev3‡, and D. Strukov1†

1 UC Santa Barbara, Santa Barbara, CA 93106-9560, U.S.A. 2 Research Laboratories, DENSO CORP., 500-1 Minamiyama, Komenoki-cho, Nisshin, Japan 470-0111

3 Stony Brook University, Stony Brook, NY 11794-3800, U.S.A. email: *[email protected], §[email protected], †[email protected], ‡[email protected]

Abstract

Neuromorphic pattern classifiers were implemented, for the first time, using transistor-free integrated crossbar circuits with bilayer metal-oxide memristors. 10×6- and 10×8-crosspoint neuromorphic networks were trained in-situ using a Manhattan-Rule algorithm to separate a set of 3×3 binary images into 3 classes (using batch-mode training) and into 4 classes (using stochastic-mode training), respectively. Simulation of much larger, multilayer neural network classifiers based on such technology has shown that their fidelity may be on a par with state-of-the-art results for software-implemented networks.

Introduction

Deep-learning convolutional neural networks, which are essentially multilayer perceptrons (Fig. 1a) with restricted connectivity between some layers, have been demonstrated to achieve some of the best classification performances on a variety of benchmark tasks (1). The major challenge in building fast and energy-efficient networks of this type in hardware is performing efficient vector-by-matrix multiplication, which in turn requires compact implementation of synaptic weights (2).

CrossNet circuits have emerged as an efficient solution to these challenges (3). In such a network, neural cell bodies (somas) are mimicked with analog CMOS circuits, which communicate via passive crossbars with integrated tunable resistive devices ("memristors") (4-7) playing the role of synapses (8-12) – see Figs. 1b-e. The two main goals of this work were to demonstrate the first neural networks with integrated crossbar circuits and to evaluate the possible performance of larger classifiers based on this emerging technology. (Some preliminary results were reported in (10).)

Memristive crossbar circuit

A 12×12 crossbar with 200-nm lines separated by 400-nm gaps (Fig. 2a), with a Ta/Pt/Al2O3/TiO2-x/Ti/Pt memristor at each crosspoint, was fabricated using standard lift-off patterning. The Al2O3/TiO2-x stack was deposited by reactive sputtering, with the titanium oxide stoichiometry controlled precisely via the oxygen flow. The thickness and stoichiometry were optimized to achieve low forming voltages (<2 V) and highly nonlinear I-V curves, with a ~10× ratio between the currents at the switching voltage (~1.5 V) and at half of it (Fig. 2b). The most outstanding feature of such memristors is their low variability (Fig. 2c); together with the nonlinear I-V curves and low forming voltages, it has enabled forming of most of the devices in the crossbar array. Other important characteristics are the ~100 ON/OFF current ratio at ~0.1 V, a switching endurance of at least 5,000 cycles, an estimated retention of at least 10 years at room temperature, and operation currents between ~100 nA and ~100 μA (10). Using short (e.g., 500 μs) pulses makes both the set and reset switching processes fairly continuous, enabling gradual tuning of the device conductance with at least 5-bit precision, even using a very simple (suboptimal) feedback algorithm (11) – see Fig. 3. Such precision is already acceptable for some neural network applications (3, 12).
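To illustrate the kind of write-verify tuning referenced above, here is a minimal sketch (not the exact algorithm of ref. 11): a read at the non-disturbing bias is followed by a set or reset pulse whose amplitude is ramped slowly until the conductance lands within the target tolerance. The pulse parameters and the read_conductance/apply_pulse helpers are hypothetical placeholders for the actual instrument interface.

```python
# Hedged sketch of a write-verify feedback loop for tuning one memristor's
# conductance; read_conductance() and apply_pulse() stand in for the hardware.

def tune_conductance(read_conductance, apply_pulse, target_g, tol=0.05,
                     v_start=1.0, v_step=0.02, v_max=1.5, max_pulses=1000):
    """Tune the device conductance to target_g within relative tolerance tol."""
    v_set, v_reset = v_start, v_start        # slowly ramped pulse amplitudes
    for _ in range(max_pulses):
        g = read_conductance()               # read at a non-disturbing bias (~0.1 V)
        error = (g - target_g) / target_g
        if abs(error) <= tol:                # within tolerance: done
            return g
        if error < 0:                        # conductance too low -> set pulse (+V)
            apply_pulse(+v_set, width=500e-6)
            v_set = min(v_set + v_step, v_max)
            v_reset = v_start                # restart the opposite-polarity ramp
        else:                                # conductance too high -> reset pulse (-V)
            apply_pulse(-v_reset, width=500e-6)
            v_reset = min(v_reset + v_step, v_max)
            v_set = v_start
    return read_conductance()                # best effort after max_pulses

# Toy usage with a crude simulated device (purely illustrative):
class FakeDevice:
    def __init__(self): self.g = 50e-6
    def read(self): return self.g
    def pulse(self, v, width): self.g *= 1.03 if v > 0 else 0.97   # toy response

dev = FakeDevice()
print(tune_conductance(dev.read, dev.pulse, target_g=70e-6))
```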

Experimental results

During the classifier's operation (Figs. 1e, 4, 5a), the vector-by-matrix multiplication of the input signals (represented by voltages) by the weights (represented by memristor conductances) is performed at the physical level, in the analog domain, using Ohm's and Kirchhoff's laws: the input voltages are applied to the crossbar's row lines, and the currents flowing into the virtually grounded column lines are read out (Figs. 1e, 5a). The training was performed in-situ, in both the batch and stochastic modes, using the Manhattan-Rule algorithm (13) – see Fig. 4. This rule is convenient for crossbar circuit implementation because it uses only the sign of the conventional Delta-Rule update. The advantage of stochastic training is that the weight update for the whole crossbar (of any size) may be performed in just four steps, by applying pulses in parallel to the rows and columns of the crossbar (12) – see Fig. 5b. Namely, the weights are grouped into four sets, each corresponding to a particular combination of the signs of V(n) and δ(n), and the weights in each group are updated in parallel. By contrast, in the batch mode the weights in different columns (or rows) have to be updated sequentially (Fig. 5c), so that the update time grows linearly with the crossbar size.
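A minimal numerical sketch of the operations just described: the crossbar's analog dot product is modeled as I = G·V with differential conductance pairs, and a stochastic Manhattan-Rule step keeps only the sign of the Delta-Rule update. The array sizes, conductance ranges, and pulse increment below are illustrative assumptions, not the experimental values.

```python
import numpy as np

# Hedged sketch of one stochastic Manhattan-Rule step on a simulated crossbar.
# Weights are stored as differential conductance pairs, W = G_plus - G_minus (Fig. 1e).

rng = np.random.default_rng(0)
n_in, n_out = 10, 4                       # e.g., 9 pixels + bias -> 4 classes
G_plus = rng.uniform(20e-6, 100e-6, (n_out, n_in))   # conductances, siemens
G_minus = rng.uniform(20e-6, 100e-6, (n_out, n_in))
dG = 1e-6                                 # assumed fixed conductance change per write pulse

def forward(v_in, beta=1e4):
    """Analog vector-by-matrix multiply (Ohm's/Kirchhoff's laws) plus tanh neuron."""
    i_out = (G_plus - G_minus) @ v_in     # column currents of the crossbar
    return np.tanh(beta * i_out)

def manhattan_step(v_in, f_target):
    """One pattern: push each weight by a fixed increment in the sign direction only."""
    global G_plus, G_minus
    f = forward(v_in)
    delta = (f_target - f) * (1.0 - f**2)   # Delta-Rule error term (beta factor omitted;
                                            # it does not affect the sign)
    sign = np.sign(np.outer(delta, v_in))   # Manhattan rule keeps only sgn(delta * V)
    # set/reset pulses applied to the differential pair (in four parallel phases on chip)
    G_plus += dG * (sign > 0)
    G_minus += dG * (sign < 0)

v = 0.1 * np.array([1, -1, 1, 1, -1, 1, -1, 1, 1, 1])   # +/-VR-encoded 3x3 image + bias
manhattan_step(v, f_target=np.array([1, -1, -1, -1]))
```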


Additionally, the batch-mode training may come with a large area overhead when implemented on-chip, due to the need to compute and store intermediate results for the weight update (14). Generally, device-to-device variations of the switching threshold present a significant challenge for in-situ training, because the exponential switching dynamics (7, 11) amplify even slight threshold variations. Additionally, the change in conductance depends on the initial conductance of the device. In this context, the fact that we have been able to achieve successful convergence for both the batch and stochastic in-situ training (Fig. 6), despite substantial device-to-device variations in switching dynamics (Fig. 7), is highly encouraging. The batch-mode training gave more stable convergence, while the update dynamics of the stochastic training were very close to those of the software-implemented network (Fig. 8).
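The sensitivity to threshold variations can be seen from a simple exponential switching model (in the spirit of refs. 7, 11, 15, though not the exact functional form used in this work): if the conductance change per pulse scales as exp[(V − Vth)/V0], a small device-to-device spread in Vth turns into a multiplicative spread in the update size. The constants below are assumed for illustration only.

```python
import numpy as np

# Hedged illustration (assumed model, not the exact one of ref. 15):
# per-pulse conductance change dG ~ exp((V_applied - V_th) / V0).
V0 = 0.05                              # characteristic voltage scale, volts (assumed)
V_applied = 1.3                        # nominal write amplitude, volts
V_th = np.array([0.95, 1.00, 1.05])    # +/-50 mV spread in switching threshold

relative_update = np.exp((V_applied - V_th) / V0)
print(relative_update / relative_update[1])
# -> [2.72, 1.00, 0.37]: a ~5% threshold spread gives a ~2.7x spread in update size,
# which the in-situ training must (and, per Fig. 6, can) tolerate.
```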

Simulation results

In another part of this work, an accurate, data-verified model of the adaptation of our memristors (15) was used to simulate the performance of pattern classifiers, based on a large-scale fully connected multilayer perceptron and a deep-learning convolutional network, on several representative benchmarks (16), using both in-situ and ex-situ training. Similarly to the experimental results, the classification performance was worse for the stochastic Manhattan-Rule training (Table 1a). However, a simple "variable-amplitude" modification (Fig. 9) of the training scheme (Fig. 5b-c) allows an implementation of the more efficient Delta-Rule algorithm (Fig. 4b), which dramatically improves the stochastic-mode fidelity (Table 1) and achieves state-of-the-art performance for the batch training. In such a variable-amplitude scheme, write voltages proportional to log[V(n)] and log[δ(n)], of the appropriate polarity, are applied to the corresponding lines of the crossbar. Since the change of the device conductance is roughly exponential in the applied voltage, this procedure results in a weight update proportional to the product δ(n)V(n), thus implementing the Delta Rule directly in the crossbar, without the need to calculate it in external hardware. The simulation results also show that the in-situ training is inherently robust to various network defects (Fig. 10), and that an 8-bit weight import during ex-situ training is sufficient to avoid classification fidelity degradation (Fig. 11).
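The variable-amplitude idea can be illustrated with a toy calculation: if the conductance change per pulse is roughly exponential in the applied voltage, ΔG ∝ exp(V/V0), then driving the row and column with amplitudes proportional to V0·log|V_j(n)| and V0·log|δ_i(n)| makes the voltage across crosspoint (i, j) produce ΔG ∝ |δ_i(n)·V_j(n)|, i.e. a Delta-Rule update computed by the device physics itself. The device law and constants below are assumptions for illustration, not the experimental parameters.

```python
import numpy as np

# Hedged sketch of the variable-amplitude Delta-Rule update. Assumed device law:
# dG = k * exp(V_cell / V0) for V_cell above threshold (constants illustrative).
V0, k = 0.05, 1e-9

def dG(v_cell):
    return k * np.exp(v_cell / V0)

delta_i, v_j = 0.6, 0.3            # example error and input magnitudes (0 < x <= 1)

# Row and column write amplitudes proportional to the logs of |V_j| and |delta_i|;
# the crosspoint sees their sum, and the exponent turns that sum back into a product.
v_row = V0 * np.log(v_j)           # negative for magnitudes below 1
v_col = V0 * np.log(delta_i)
v_cell = v_row + v_col             # = V0 * log(delta_i * v_j)

print(dG(v_cell), k * delta_i * v_j)   # both equal k * |delta_i * V_j|
```

Sign information is handled separately, by routing each update to the appropriate set or reset phase of the G+ or G− device of the differential pair, as in the four-step scheme of Fig. 5b.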

Conclusions

In summary, we have experimentally demonstrated an artificial neural network using memristors integrated into a dense, transistor-free crossbar circuit. We believe that this demonstration is a significant step toward the analog-hardware implementation of practically useful artificial neural networks. The simulation of such scaled-up networks, using a quantitatively verified model of our memristors, has shown that their performance can be competitive with state-of-the-art software implementations. Moreover, recent experiments (17) with similar but smaller (so far, discrete) memristors give hope that metal-oxide memristor networks may be scaled down to at least 30-nm devices. According to theoretical estimates (3), such scaling would enable CrossNets with an areal density higher than that of the human cerebral cortex, operating at much higher speed and with comparable energy efficiency.

Acknowledgements

This work was supported by AFOSR under MURI grant FA9550-12-1-0038, by DARPA under contract HR0011-13-C-0051 (UPSIDE program) via BAE Systems, Inc., and by DENSO CORP., Japan.

References

(1) A. Krizhevsky, I. Sutskever, and G.E. Hinton, “ImageNet classification with deep convolutional neural networks”, in: Proc. NIPS’12, Lake Tahoe, NV, Dec. 2012, pp. 1097-1105.

(2) D.B. Strukov, “Smart connections”, Nature, vol. 476, pp. 403-405, 2011.

(3) K.K. Likharev, “CrossNets: Neuromorphic hybrid CMOS/ nanoelectronic networks”, Sci. Adv. Mater., vol. 3, pp. 322–331, 2011.

(4) R. Waser, R. Dittmann, G. Staikov, and K. Szot, “Redox-based resistive switching memories”, Adv. Mater., vol. 21, pp. 2632–2663, 2009.

(5) H.-S. P. Wong et al., “Metal-oxide RRAM”, Proc. IEEE, vol. 100, pp. 1951-1970, 2012.

(6) W. Lu, D.S. Jeong, M. Kozicki, and R. Waser, “Electrochemical metallization cells – blending nanoionics into nanoelectronics”, MRS Bull., vol. 37 (2), pp. 124-130, 2012.

(7) J.J. Yang, D.B. Strukov, and D.R. Stewart, “Memristive devices for computing”, Nature Nanotechnology, vol. 8, pp. 13-24, 2013.

(8) S. Yu et al., “A neuromorphic visual system using RRAM synaptic devices with Sub-pJ energy and tolerance to variability: Experimental characterization and large-scale modeling”, IEDM Technical Digest, p. 10.4.1, 2012.

(9) S. Park et al., “RRAM-based synapse for neuromorphic system with pattern recognition function”, IEDM Technical Digest, p. 10.2.1, 2012.

(10) M. Prezioso, F. Merrikh-Bayat, B.D. Hoskins, G.C. Adam, K.K. Likharev, and D.B. Strukov, “Training and operation of an integrated neuromorphic network based on metal-oxide memristors”, Nature, vol. 521, pp. 61-64, 2015.

(11) F. Alibart, L. Gao, B. Hoskins and D.B. Strukov, “High-precision tuning of state for memristive devices by adaptable variation-tolerant algorithm”, Nanotechnology, vol. 23, p. 075201, 2012.

(12) F. Alibart, A. Zamanidoost, and D.B. Strukov, “Pattern classification by memristive crossbar circuits with ex-situ and in-situ training”, Nature Communications, vol. 4, p. 2072, 2013.

(13) W. Schiffmann, M. Joost, and R. Werner, “Optimization of the backpropagation algorithm for training multilayer perceptrons”, Technical Report, University of Koblenz, 1994.

(14) E. Zamanidoost, M. Klachko, I. Kataeva, and D.B. Strukov, “Low area overhead in-situ training approach for memristor-based classifiers”, in: Proc. NanoArch’15, Boston, MA, July 2015, pp. 139-142.

(15) F. Merrikh Bayat, B. Hoskins, and D.B. Strukov, “Phenomenological modeling of memristive devices”, Appl. Phys. A, vol. 118, pp. 770-786, 2015.

(16) D. Ciresan, U. Meier, and J. Schmidhuber, “Multi-column deep neural networks for image classification”, in: Proc. CVPR’12, Providence, RI, June 2012, pp. 3642-3649.

(17) B. Govoreanu et al., “Vacancy-modulated conductive oxide resistive RAM (VMCO-RRAM)”, IEDM Technical Digest, p. 10.2.1, 2013.



Fig. 1. Neuromorphic network implementation with CrossNet circuits [3]: (a) a graph representation of a multilayer perceptron; (b) a cartoon of a hybrid CMOS/memristor (CMOL) integrated circuit; (c) analog implementation of the dot product, (d) its mapping on the hybrid circuit, and (e) the implementation of vector-by-matrix multiplication using a memristive crossbar. (If negative weight values are required, a synapse may be implemented as a pair of memristors.)


Fig. 4. In-situ training of a single-layer perceptron classifier: (a) flow chart of one epoch for the batch- and stochastic-mode training algorithms. Gray-shaded boxes show the steps implemented inside the crossbar, while those with solid black borders denote the only steps required for performing the feedforward (classification) operation.

[Fig. 1 panel annotations: input voltages V ∝ x, conductances G ∝ W, output currents I_i = Σ_j G_ij V_j, converted by the CMOS neuron to a voltage V = RI; the differential synaptic weight is W_ij = G_ij^+ − G_ij^−.]

[Fig. 2 and Fig. 3 plot residue omitted. Fig. 2b inset, device stack on a SiO2/Si substrate (bottom to top): Ta (5 nm) / Pt (60 nm) / Al2O3 (4 nm) / TiO2-x (30 nm) / Ti (15 nm) / Pt (60 nm); the quasi-dc I-V curves show the Form, Set, and Reset branches and the read/write biases VR and VW±. Fig. 2c-e histogram axes: conductance @ 0.1 V (μS), forming voltage (V), and switching threshold voltage (V). Fig. 3a axes: average resistance @ 0.1 V vs. setpoint (target) resistance (kΩ).]

Fig. 2. Crossbar circuit with integrated Al2O3/TiO2-x resistive switching devices: (a) micrograph of a 12×12-crosspoint crossbar; (b) typical quasi-dc I-V curves of memristor forming and switching, with the inset showing the device stack; and histograms of (c) conductances before forming, (d) forming voltages, and (e) effective switching threshold voltages. (The threshold is conditionally defined as the point at which the device's resistance changes by at least 2 kΩ upon application of a 500-μs voltage pulse train with a slowly increasing amplitude, starting from the high/low conductive state for the reset/set transitions.)

[Fig. 4 flow chart and update rules (reconstructed from the figure): initialize W_ij; for each training pattern n, calculate I_i(n) = Σ_j W_ij V_j(n) and f_i(n) = tanh[βI_i(n)]; calculate the error δ_i(n) = [f_i^(g)(n) − f_i(n)]·(df/dI)|_{I=I_i(n)}, where f_i^(g)(n) is the desired (class) output; calculate Δ_ij(n) = δ_i(n)V_j(n). Weight updates: Stochastic Delta Rule, ΔW_ij(n) = η Δ_ij(n); Stochastic Manhattan Rule, ΔW_ij(n) = η sgn[Δ_ij(n)]; Batch Manhattan Rule, ΔW_ij = η sgn[Σ_{n=1}^{N} Δ_ij(n)]. In the stochastic mode the weights are updated after every pattern; in the batch mode the updates are accumulated and applied at the end of each epoch.]

Fig. 3. Analog properties of crossbar-integrated devices: (a) tuning of the resistance, measured at a non-disturbing voltage of 0.1 V (sampled at 1 kHz and averaged over a 200-s interval), to various values within the dynamic range; (b) a repeated state measurement over time; and (c) an example of a tuning sequence. Error bars in panel (a) show the standard deviations during the time sequence shown in panel (b). The sudden changes of resistance, visible in panel (b), are more noticeable at higher resistances.


Table 1(a). Error rate (%) of the 300-hidden-neuron multilayer perceptron on the MNIST benchmark (best / average):

Training mode | Software          | Xbar in-situ, Manhattan Rule | Xbar in-situ, var. ampl. | Xbar ex-situ, 2% import
Batch         | - / -             | 1.98 / 2.06±0.09             | 1.47 / 1.62±0.07         | - / -
Stochastic    | 1.57 / 1.75±0.07  | 19.26 / 20.16±1.3            | 4.06 / 4.31±0.33         | 1.54 / 1.62±0.04

Table 1(b). Error rate (%) of the deep-learning convolutional network (best / average):

Dataset  | Software           | Xbar in-situ (var. ampl.) | Xbar ex-situ, 2% import | Xbar ex-situ, 0.2% import
MNIST    | 0.40 / 0.47±0.05   | 0.4 / 0.48±0.024          | 0.61 / 0.89±0.22        | 0.41 / 0.42±0.01
GTSRB    | 1.36 / 1.53±0.18   | 1.26 / 1.56±0.27          | 1.42 / 1.56±0.1         | 1.46 / 1.47±0.01
CIFAR10  | 15.63 / 15.91±0.22 | 15.67 / 15.87±0.22        | 19.77 / 20.29±0.43      | 15.5 / 15.8±0.01


Fig. 11. Classification performance as a function of the weight import precision (simulated by adding normally distributed noise) for a deep-learning convolutional network [16] trained by an ex-situ training method.

Table 1. Classification fidelity for (a) a 300-hidden-neuron multilayer perceptron network tested on the MNIST benchmark, and (b) a deep-learning convolutional neural network tested on the three indicated benchmarks. The deep network architecture is similar to those described in [14]. 500 patterns per batch were used for the batch training.


Fig. 10. MNIST dataset classification fidelity of a multilayer perceptron as a function of the fraction of stuck-on-open or stuck-on-close devices, for several training approaches.

Fig. 5. Physical-level description of the classification experiment: (a) example of the operation of the classifier using a 10×6 fragment of the crossbar; example of the weight adjustment for (b) stochastic and (c) batch training, for a specific error matrix. Panels (b) and (c) show the voltages only for the first two steps. The read and write biases were always VR = 0.1 V and VW± = ±1.3 V, respectively (Fig. 2b).

Fig. 7. Average change in resistance, measured at 0.1 V, for 18 devices inside the crossbar upon application of one (a) reset and (b) set pulse. Error bars show the standard deviation.

Fig. 8. Update dynamics table for the 10×8 crossbar, showing the average number of times a device was set (counted as +1) and reset (counted as −1) during the stochastic training: (a) simulation results for a software-based perceptron, averaged over 500 runs, and (b) experimental results, for similar device initialization.


[Fig. 5b and Fig. 9 annotations: the four-step parallel weight update (step 1: set G+; step 2: reset G+; step 3: reset G−; step 4: set G−) is performed by applying half-amplitude write voltages ±VW±/2 (Fig. 5b) or, in the variable-amplitude scheme, voltages proportional to ±log[Vj(n)] and ±log[δi(n)] (Fig. 9) to the rows and columns of the crossbar.]

Fig. 9. An example of a “variable-amplitude” four-step weight update for a 2×2 crossbar and specific V1(n) > 0, V2(n) < 0, δ1(n) > 0, and δ2(n) > 0.

Note: the error rate is normalized with respect to the best results (the software entries in Table 1b).

Fig. 6. Results of the pattern classification experiments: the convergence of the network's output in the process of in-situ training for the (a) batch and (b) stochastic training modes; (c)-(d): the training and test images used for the (c) batch and (d) stochastic training experiments. For the batch training, one epoch is the input of 30 patterns, while for the stochastic training, one iteration is the application of one pattern. The batch-mode training (test) images are formed by flipping one pixel (two pixels) of the “ideal patterns” shown with the solid border.


[Fig. 5 annotation: sgn(ΔW_ij) = sgn[δi(n)Vj(n)]. Fig. 11 annotation: the 8-bit weight import precision is marked on the plot of the normalized misclassification rate vs. the normalized standard deviation of the weight import (%).]