
A cellular computing architecture for parallel memristive stateful logic

Eero Lehtonen a,*, Jari Tissari a, Jussi Poikonen b, Mika Laiho a, Lauri Koskinen a

a Technology Research Center, University of Turku, Joukahaisenkatu 1C, 20520 Turku, Finland
b Department of Communications and Networking, Aalto University, Espoo, Finland

Article info

Article history: Received 9 March 2014; received in revised form 3 September 2014; accepted 12 September 2014

Keywords: Memristor; Memristive crossbar; CMOL; Stateful logic; Implication logic

Abstract

We present a cellular memristive stateful logic computing architecture and demonstrate its operation with computational examples such as vectorized XOR, circular shift, and content-addressable memory. The considered architecture can perform parallel elementary memristor programming and stateful logic operations, namely implication and converse nonimplication. The topology of the crossbar structure used for computing can be dynamically reconfigured, enabling combinations of local and global operations with varying granularity. In the CMOS cells used for controlling the memristors, we apply a new type of capacitive keeper circuit, which allows for energy-efficient implementation of logic operations. The correct operation of this architecture is verified by detailed HSPICE simulations of a structure containing eight memristive crossbars. This work presents a hardware platform that enables future work on parallel stateful computing.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Memristive implication logic was originally proposed by Kuekes in [1] as a way to perform logic on memristors. In this form of logic, Boolean variables are represented by the low and high resistance states R_ON and R_OFF of binary memristors. This operation was first demonstrated empirically in [2]; since then, various memristive stateful logic operations and the corresponding synthesis of Boolean functions have been considered, for example, in [3–9].

Memristive stateful logic is inherently sequential, and as noted already in [2], it is most efficiently used in parallel form in memristive crossbar architectures. However, as demonstrated for example in [6], a monolithic memristive crossbar circuit allows only limited parallelism. Solutions for partitioning crossbar circuits to increase parallelism have been proposed, for example, in [6,8–10]. Specifically, Kim et al. [8] show how memristive stateful logic can be performed in a CMOL-type [11] FPNI architecture [12], which allows pipelining of stateful operations. A similar approach is discussed in [9], where, for example, a stateful eight-bit adder benefiting from crossbar partitioning is demonstrated. However, in previously presented parallel stateful computing architectures, the fan-out and fan-in of elementary stateful logic operations are limited. In this work we show how the keeper circuits presented in [13,6] can be used to facilitate large fan-in and fan-out in parallel stateful logic operations.

In the following we present a circuit architecture designed for efficient parallel stateful logic computing. This architecture consists of an array of small memristive crossbar circuits that can be connected to form larger crossbars. We show how such an architecture allows complex vector operations to be performed in parallel using a relatively small number of sequential stateful logic operations. We present a CMOS cell design with the objective of minimizing the number of transistors required per memristor, and show that this design allows the implementation of unconditional write operations and two stateful logic operations, enabling the computation of arbitrary Boolean logic functions. Correct operation of the circuit architecture is verified by detailed HSPICE circuit simulations using 0.13 μm CMOS technology and a memristor model whose characteristics are selected based on empirical results presented in [14].

To demonstrate parallel stateful computing algorithms in large-scale simulations, we developed a Matlab script language and a compiler that generates, from a list of commands, the control signals required in HSPICE simulations of the considered circuit. These commands are also used in the following text to define example computations. The main objective of this work is to propose and simulate a hardware platform that enables future work on parallel stateful computing.

The paper is organized as follows. In Section 2 we present the memristor model and define the stateful logic operations used in this work. In Section 3 we describe the cellular stateful logic architecture and define its control signals. The implementation of elementary write and logic operations and the related compiler commands are described in Section 4. Examples of parallel vector operations and computations implemented using the considered architecture are presented in Section 5. Section 6 concludes.

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/mejo

Microelectronics Journal

http://dx.doi.org/10.1016/j.mejo.2014.09.005
0026-2692/© 2014 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +358 23336963. E-mail address: eero.lennart.lehtonen@utu.fi (E. Lehtonen).

Please cite this article as: E. Lehtonen, et al., A cellular computing architecture for parallel memristive stateful logic, Microelectron. J (2014), http://dx.doi.org/10.1016/j.mejo.2014.09.005

Microelectronics Journal ∎ (∎∎∎∎) ∎∎∎–∎∎∎

2. Background

2.1. Memristors

In this paper we consider the use of rectifying linear bistable memristors in massively parallel logic computing. By rectifying memristors we mean devices which pass significant current only in the forward direction, while their switching behavior corresponds to that of a nonrectifying bipolar memristor; that is, positive voltages program the device into a more conductive state, whereas negative voltages program the device towards a more resistive state. In the following we define the mathematical model of the memristors assumed in this work. This model is inspired by the device demonstrated empirically in [14].

A memristor has a state variable w ∈ [0, 1], which corresponds to the value of its memristance R_m in the forward direction as follows. When w = 0, the memristor is said to be in the OFF-state, corresponding to R_m = R_OFF. When w = 1, the memristor is said to be in the ON-state, and R_m = R_ON. Formally,

$$R_m = \begin{cases} R_{\mathrm{OFF}} \left( R_{\mathrm{ON}} / R_{\mathrm{OFF}} \right)^{w}, & v \ge 0, \\ R_{\mathrm{OFF}}, & v < 0, \end{cases} \qquad (1)$$

where v is the voltage across the memristor in the forward direction. In accordance with [14], we assume that R_OFF = 500 MΩ and R_ON = 500 kΩ.

The memristor is programmed towards the ON-state by applying across it a voltage larger than a threshold voltage V_TH. Correspondingly, applying a voltage more negative than −V_TH programs the memristor towards the OFF-state. When the voltage across the memristor is between −V_TH and V_TH, the state of the device is assumed to remain unchanged. Note that for simplicity we assume symmetric threshold voltages for programming to the OFF- and ON-states. In practice these voltages may differ, as for example in the device of [14]. Such asymmetry must be taken into account when defining control voltages for programming and logic operations. The following dynamics are assumed for this bipolar memristor:

$$\frac{dw}{dt} = \begin{cases} \alpha \left( v - V_{\mathrm{TH}} \right), & v \ge V_{\mathrm{TH}}, \\ \alpha \left( v + V_{\mathrm{TH}} \right), & v \le -V_{\mathrm{TH}}, \\ 0, & \text{otherwise}, \end{cases} \qquad (2)$$

where α is a positive constant related to the programming rate, and V_TH = 1 V. In this work we assume α = 125 × 10^7 (V·s)^−1; with this value of α, a memristor initially in state w = 0 is programmed to state w = 1 in 4 ns by applying +1.2 V across it. Comparable programming rates have been predicted for memristive devices in [15] and reported empirically, for example, in [16,17]. This memristor model is highly simplified: for example, its programming rate depends piecewise linearly on the applied voltage, and its threshold voltages are fixed, in contrast to what is observed in many physical devices [18]. The main motivation of this work is to investigate a computing architecture with a multitude of memristors, and the simple model allows for efficient simulation of the presented circuitry. However, due to this simplification, simulated operation durations and energies should be considered only indicative; physical realizations of the considered circuits may have considerably different characteristics.
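The model of Eqs. (1) and (2) is simple enough to check numerically. The following Python sketch is an illustration under stated assumptions, not the authors' simulation code: it uses the parameter values given above, integrates Eq. (2) with a forward-Euler step (the 1 ps step size and the clipping of w to [0, 1] are assumptions, the latter mirroring the wmin/wmax bounds of the HSPICE netlist), and its function names are hypothetical.

```python
# Sketch of the idealized memristor model of Eqs. (1) and (2).
# Parameter values follow Section 2.1; the forward-Euler integrator, its 1 ps
# step, and clipping w to [0, 1] are illustrative assumptions.
ALPHA = 125e7   # programming-rate constant alpha, in (V*s)^-1
V_TH = 1.0      # symmetric programming threshold, in V
R_ON = 500e3    # ON-state memristance, in ohm
R_OFF = 500e6   # OFF-state memristance, in ohm


def memristance(w: float, v: float) -> float:
    """Eq. (1): state-dependent in the forward direction, R_OFF under reverse bias."""
    if v >= 0:
        return R_OFF * (R_ON / R_OFF) ** w
    return R_OFF


def dw_dt(v: float) -> float:
    """Eq. (2): the state changes only when |v| reaches the threshold V_TH."""
    if v >= V_TH:
        return ALPHA * (v - V_TH)
    if v <= -V_TH:
        return ALPHA * (v + V_TH)
    return 0.0


def switching_time(v: float) -> float:
    """Time to program w from 0 to 1 under a constant voltage v > V_TH."""
    return 1.0 / (ALPHA * (v - V_TH))


def apply_pulse(w0: float, v: float, duration: float, dt: float = 1e-12) -> float:
    """Forward-Euler integration of Eq. (2) for a constant-voltage pulse."""
    w = w0
    for _ in range(int(round(duration / dt))):
        w = min(1.0, max(0.0, w + dw_dt(v) * dt))
    return w


if __name__ == "__main__":
    print(f"switching time at +1.2 V: {switching_time(1.2) * 1e9:.2f} ns")
    print(f"w after 2 ns at +1.2 V: {apply_pulse(0.0, 1.2, 2e-9):.3f}")
    print(f"w after 10 ns at +0.8 V: {apply_pulse(0.0, 0.8, 10e-9):.3f}")
```

Running the script prints a switching time of 4.00 ns at +1.2 V, consistent with the value quoted above, and shows that a sub-threshold 0.8 V pulse leaves w unchanged.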

In the simulations presented in this work we use an HSPICE model of the memristor described above. A pinched hysteresis curve of this memristor is depicted in Fig. 1. The SPICE netlist for this simulation model is as follows:

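.SUBCKT memristor P M w
+ Ron=500k Roff=500Meg vth=1 vprog=1.2
+ dwdtprog=250e6 winit=0 wmin=0 wmax=1
*State variable: Gsv injects a current equal to dw/dt into node w,
*which Csv (1 F) integrates, so V(w) follows Eq. (2)
Gsv 0 w value='sgn(V(P,M))*dwdtprog*absgeq(V(P,M),vth)
+ *(abs(V(P,M))-vth)/(vprog-vth)*trunc(V(w),V(P,M))'
Csv w 0 1
.IC V(w)=winit
*I-V relation, Eq. (1): forward-direction memristance, Roff under reverse bias
Gmem P M value='V(P,M)/calcrecresistance(V(P,M),V(w))'
*Auxiliary functions
*sign2: 1 for positive arguments, 0 for negative arguments
.PARAM sign2(var)='(sgn(var)+1)/2'
*trunc: zero when w would be driven above wmax (v > 0) or below wmin (v < 0)
.PARAM trunc(var1,var2)='(sign2(var1-wmin)+sign2(var2))*(sign2(wmax-var1)+sign2(-var2))/2'
*absgeq: 1 when var1 > var2 or var1 < -var2, i.e. when |v| exceeds vth
.PARAM absgeq(var1,var2)='sign2(var1-var2)+sign2(-var2-var1)'
.PARAM calcresistance(w)='Roff*((Ron/Roff)**w)'
*calcrecresistance: Eq. (1) with var1 = v and var2 = w
.PARAM calcrecresistance(var1,var2)='sign2(var1)*calcresistance(var2)+sign2(-var1)*Roff'
.ENDS memristor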

2.2. Stateful logic

Stateful logic is a form of computational logic in which devices both store and perform operations on logical values. When implemented with memristors, Boolean variables are represented by the memristances of the devices. Stateful logic operations are realized by programming the states of a set of output memristors conditionally on the states of a set of input memristors. An example of a circuit performing memristive stateful logic is depicted in Fig. 2. Details of memristive stateful logic have been discussed extensively, for example, in [5,6]. In Section 3 we present a CMOS–memristor circuit which implements two stateful logic operations: material implication and converse nonimplication. These operations can be used to sequentially compute the value of any Boolean function, assuming that memristors can also be unconditionally programmed to the ON- or OFF-state.

[Fig. 1 plot: current magnitude (A) on a logarithmic axis from 10^−12 to 10^−5 versus voltage (V) from −1.5 to 1.5.]

Fig. 1. I–V curve of the rectifying memristor model used in this paper. The model was driven with a sinusoidal input voltage with frequency 25 MHz and amplitude 1.2 V.

Memristive implication, denoted in this work by IMP, works as follows: if every input memristor is in the OFF-state, the output memristors are programmed to the ON-state; otherwise the states of the output memristors do not change. In converse nonimplication, CNIMP, if any input memristor is in the ON-state, the output memristors are programmed to the OFF-state; otherwise they remain unchanged. The truth tables of these logic operations are specified in Table 1. In practice these operations are realized in the circuit presented in Section 4 by ensuring that the magnitude of the voltage across an output memristor exceeds V_TH only if its state should change in the selected logical operation.
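At the logic level, the two operations reduce to simple update rules on the output state. The Python sketch below (function names are hypothetical) encodes them with fan-in over any number of inputs, reproduces the single-input rows of Table 1, and illustrates the completeness argument: after an unconditional WRITE0, one IMP with two inputs yields NOR, and a WRITE0 followed by two single-input IMPs yields NAND.

```python
# Logic-level sketch of IMP and CNIMP as defined in Section 2.2.
# States: 0 = OFF-state (R_OFF), 1 = ON-state (R_ON). Function names are hypothetical.
from itertools import product
from typing import Sequence


def imp(inputs: Sequence[int], q: int) -> int:
    """IMP: output programmed to ON if every input is OFF, otherwise unchanged."""
    return 1 if not any(inputs) else q


def cnimp(inputs: Sequence[int], q: int) -> int:
    """CNIMP: output programmed to OFF if any input is ON, otherwise unchanged."""
    return 0 if any(inputs) else q


def nor_two_steps(a: int, b: int) -> int:
    """WRITE0 on a work memristor s, then one IMP with fan-in 2: s = NOT(a OR b)."""
    s = 0                      # unconditional WRITE0
    return imp([a, b], s)


def nand_three_steps(a: int, b: int) -> int:
    """WRITE0, then b IMP s (s = NOT b), then a IMP s (s = NOT a OR NOT b)."""
    s = 0                      # unconditional WRITE0
    s = imp([b], s)
    return imp([a], s)


if __name__ == "__main__":
    print("p q IMP CNIMP")     # single-input rows, compare with Table 1
    for p, q in product((0, 1), repeat=2):
        print(p, q, imp([p], q), cnimp([p], q))
```

Since NOR alone is functionally complete, this is one way to see why IMP together with unconditional writes suffices for arbitrary Boolean functions.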

3. Circuit architecture

The overall circuit architecture considered in this work consists of multiple small nanowire crossbars, as illustrated in Fig. 3, interfaced with CMOS control logic located physically beneath the crossbars. A rectifying memristor is located at each crossing of a vertical and a horizontal nanowire; in the examples presented in the following, each crossbar contains eight vertical and eight horizontal nanowires. Pass transistors are used to connect adjacent crossbars, which makes it possible to change the topology by merging small crossbars into larger ones.

In Fig. 4 we show the connectivity between the nanowires in an 8×8 crossbar and the CMOS layer. In this case the CMOS layer is divided into 16 CMOS cells. Eight of these cells, denoted by H1, …, H8, control horizontal nanowires, while the rest, V1, …, V8, control vertical nanowires. Each cell is connected to four CMOS wires, which are used to control the operation of the cells. In this paper we do not consider the peripheral control logic required to drive the control signals; however, the control signals are described in detail in Section 4. In addition to the control signals, each cell is connected to the supply voltages V_DD and V_SS, and to the global voltages V_H and V_L used in the four-state buffers of the CMOS cells. The regular structure of the CMOS cell array shown in Fig. 4 is designed so that when these arrays are tiled into a larger architecture, all horizontally or vertically aligned nanowires in different crossbars are driven by CMOS cells that are controlled by the same signals.

Fig. 2. A memristive circuit enabling the computation of stateful logic operation S, yielding m4 = S(OR(m1, m2), m4) and m5 = S(OR(m1, m2), m5). Here the operation S can be either IMP or CNIMP as specified in Table 1. In this example, memristor m3 does not participate in the stateful logic operation, and therefore it is driven by a safety voltage v_safe. To realize various stateful logic operations, the voltages v_set, v_cond, and v_safe should be selected as specified in Section 4. The keeper circuit is used to ensure correct operation of the circuit; the specific circuit applied in this work is depicted in Fig. 5. Notice that this circuit generalizes to a two-dimensional structure as presented in [6].

Table 1. Truth tables of memristive implication and converse nonimplication. Here, m_i1, …, m_ik are the input memristors and m_oj is an output memristor. Notice that the state of the output memristor depends on the logical OR of the states of the input memristors and its own initial state.

p = OR(m_i1, …, m_ik)   q = m_oj   m_oj = IMP(p, q)   m_oj = CNIMP(p, q)
0                       0          1                  0
0                       1          1                  1
1                       0          0                  0
1                       1          1                  0

Fig. 3. High-level view of the circuit architecture considered in this work. Nanowire crossbars are pictured in red; pass transistor control wires are drawn in gray. Columns of pass transistors are controlled by signals VP and rows by signals HP. The area of a single crossbar circuit, depicted in detail in Fig. 4, is marked with a black rectangle.

Fig. 4. An 8×8 memristor crossbar interfaced with CMOS cells.

The schematic of the CMOS cell is presented in Fig. 5. Note that this cell design is used for all CMOS cells, whether they control horizontal or vertical nanowires. It consists of a transmission gate controlled by signal C, which allows voltage D to be driven to the nanowire connected to the cell. Conversely, the voltage at the nanowire can be passed to D when reading out the state of a memristor. In addition to the transmission gate, each cell contains a latch consisting of an inverter and a four-state buffer controlled by signals A and B. The transistor-level schematic and logic-level description of this four-state buffer are presented in the inset of Fig. 5. Note that the output voltages corresponding to 0 and 1 in the logical description are V_H and V_L, and HZ refers to high impedance. In short, the four-state buffer operates as an inverter when A = 0, B = 1; is in high impedance when A = 1, B = 0; and, when A = 0, B = 0 or A = 1, B = 1, operates as an inverter which can drive only logical 1 or 0, respectively.
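The logic-level behavior of the four-state buffer summarized above can be written as a small lookup function. This Python sketch models only logical levels: high impedance is represented by None, and the mapping of logical values to V_H and V_L is deliberately left out. The function name is hypothetical.

```python
# Logic-level sketch of the four-state buffer controlled by signals A and B.
# HZ (high impedance) is modelled as None; voltage levels are omitted.
from typing import Optional

HZ = None


def four_state_buffer(a: int, b: int, x: int) -> Optional[int]:
    """Return the logical value driven for input x, or HZ if the output floats."""
    y = 1 - x                          # whenever the buffer drives, it inverts
    if (a, b) == (0, 1):
        return y                       # full inverter
    if (a, b) == (1, 0):
        return HZ                      # output disconnected
    if (a, b) == (0, 0):
        return y if y == 1 else HZ     # can drive only logical 1
    return y if y == 0 else HZ         # (1, 1): can drive only logical 0
```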

Digital signals A, B, and C, and the analog signal D, are the control signals depicted as buses of width 4 in Fig. 4. The objective in designing the presented CMOS cell was to simplify the necessary internal logic as much as possible; currently the design contains 10 minimum-sized transistors, as the switch controlled by signal C is realized by a transmission gate containing an inverter. The number of external control signals could be reduced by adding local memory registers within the cells; investigation of the tradeoff between the complexity of the control signaling and that of the CMOS cell is left for future work.

Including the pass transistors at the borders of an 8×8 nanowire crossbar, each crossbar circuit consists of 176 transistors and 64 memristors. Thus the transistor-to-memristor ratio is 2.75:1. In general, in an N×N crossbar this ratio is 22:N. Note that if it is possible to implement this architecture using, for example, a 32×32 nanowire crossbar, the transistor-to-memristor ratio becomes less than one. Note also that the footprint of a memristor depends here on both the number of nanowires in the crossbar and the circuit area required by a CMOS cell and its associated wiring.
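The ratio can be reproduced from the component counts. The Python sketch below assumes 2N CMOS cells of 10 transistors each per N×N tile and, as an inference from 176 − 16 × 10 = 16 for N = 8, 2N pass transistors per tile; under these assumptions the ratio is exactly 22/N and drops below one for N ≥ 23. The function names are hypothetical.

```python
# Transistor-to-memristor ratio of an N x N crossbar tile.
# Assumptions: 2N cells x 10 transistors, plus 2N pass transistors per tile
# (the latter inferred from 176 - 160 = 16 for N = 8).
from fractions import Fraction

TRANSISTORS_PER_CELL = 10


def tile_counts(n: int) -> tuple[int, int]:
    """Return (transistors, memristors) for one N x N tile."""
    cells = 2 * n                      # N horizontal plus N vertical cells
    pass_transistors = 2 * n           # assumed, see lead-in
    transistors = cells * TRANSISTORS_PER_CELL + pass_transistors
    return transistors, n * n


def transistor_memristor_ratio(n: int) -> Fraction:
    transistors, memristors = tile_counts(n)
    return Fraction(transistors, memristors)   # reduces to 22/N
```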

4. Elementary operations

In the following we present how elementary programming and logic operations are realized in the crossbar circuit of Fig. 4. We first describe unconditional programming of memristors to the ON- and OFF-states, and then consider the implementation of stateful logic operations. In the final part of this section we present how these elementary operations can be used to realize vectorized computations over several of the crossbar circuits shown in Fig. 3.

4.1. Definitions

Fig. 6 illustrates the control signaling in part of the crossbar circuit. Operations performed with memristors in the crossbar are divided into row and column operations, denoted in the following by R and C, respectively. In row operations, the inputs and outputs are rows of memristors, corresponding to row vector operations. Correspondingly, in column operations the inputs and the outputs are columns of memristors. A memristor can be excluded from a given row or column operation, meaning that its state will not change. To choose which memristors participate in an operation, and how their states are affected, voltages of different magnitudes and polarities must be applied across the memristors. For this purpose, we define five modes of operation for the CMOS cells: IN, OUT, SAFE, LET, and EXC. These are defined by the external control signals A, B, C, and D and the global signals V_H and V_L.

In a column operation, the CMOS cells connected to vertical nanowires may be in modes IN, OUT, or SAFE, while the CMOS cells connected to horizontal nanowires may be in modes LET or EXC. The converse applies for row operations. The IN, OUT, and SAFE modes are used to define columns or rows of memristors as inputs, outputs, or not participating in a given operation. Note especially that the states of input and non-participating memristors do not change during logic operations, while output memristors may change their states according to the applied logic operation. In the unconditional write operations, input memristors are programmed, while non-participating memristors do not change their states. In a column operation, CMOS cells in mode LET specify the rows on which an operation is performed; CMOS cells in mode EXC correspondingly specify the rows not affected by the operation. Similarly, in row operations CMOS cells in modes LET and EXC specify the columns on which an operation is or is not performed. In other words, a write or logic operation occurs in the memristors located between nanowires connected to CMOS cells in modes IN or OUT, and LET. The states of memristors on nanowires connected to CMOS cells in modes SAFE or EXC do not change in write or logic operations.

To simplify the descriptions of the examples below, we apply the following notation: an operation is defined as

C/R IN OP OUT ON LET,

where C/R specifies a column or row operation, IN specifies the column or row indices of the cells in mode IN, OP specifies the elementary operation type (WRITE0, WRITE1, IMP, or CNIMP), OUT specifies the column or row indices of the cells in mode OUT, and ON LET specifies the row or column indices of the cells in mode LET. For example, to specify an IMP operation from column 1 to column 2 on rows 3 and 4, we write

C 1 IMP 2 ON 3 4.
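As a sketch of how the notation maps onto cell modes, the following Python code parses a command string and assigns IN, OUT, SAFE, LET, and EXC to the cells H1, …, H8 and V1, …, V8 according to the rules above (vertical cells carry IN/OUT/SAFE in column operations, horizontal cells in row operations). It is a hypothetical illustration, not the Matlab compiler mentioned in the Introduction; the names Command, parse_command, and cell_modes are assumptions, as is the rule that write commands carry no OUT indices.

```python
# Hypothetical parser for the "C/R IN OP OUT ON LET" notation and a helper that
# assigns CMOS cell modes following the rules of Section 4.1.
from dataclasses import dataclass

OPS = {"WRITE0", "WRITE1", "IMP", "CNIMP"}


@dataclass
class Command:
    kind: str            # "C" (column) or "R" (row)
    inputs: list[int]    # indices of cells in mode IN
    op: str              # WRITE0, WRITE1, IMP, or CNIMP
    outputs: list[int]   # indices of cells in mode OUT (empty for writes)
    let: list[int]       # indices of cells in mode LET


def parse_command(text: str) -> Command:
    tokens = text.split()
    kind, rest = tokens[0], tokens[1:]
    if kind not in ("C", "R"):
        raise ValueError(f"expected C or R, got {kind!r}")
    op_pos = next((i for i, t in enumerate(rest) if t in OPS), None)
    if op_pos is None:
        raise ValueError("no operation token found")
    on_pos = rest.index("ON")
    inputs = [int(t) for t in rest[:op_pos]]
    outputs = [int(t) for t in rest[op_pos + 1:on_pos]]
    let = [int(t) for t in rest[on_pos + 1:]]
    if rest[op_pos].startswith("WRITE") and outputs:
        raise ValueError("write operations take no OUT indices")
    return Command(kind, inputs, rest[op_pos], outputs, let)


def cell_modes(cmd: Command, n: int = 8) -> dict[str, str]:
    """Column ops: V cells are IN/OUT/SAFE, H cells LET/EXC; row ops: converse."""
    line, select = ("V", "H") if cmd.kind == "C" else ("H", "V")
    modes = {}
    for i in range(1, n + 1):
        if i in cmd.inputs:
            modes[f"{line}{i}"] = "IN"
        elif i in cmd.outputs:
            modes[f"{line}{i}"] = "OUT"
        else:
            modes[f"{line}{i}"] = "SAFE"
        modes[f"{select}{i}"] = "LET" if i in cmd.let else "EXC"
    return modes
```

For the example above, cell_modes(parse_command("C 1 IMP 2 ON 3 4")) puts V1 in IN, V2 in OUT, H3 and H4 in LET, and every other cell in SAFE or EXC.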

In the following subsections we define the signals and voltages corresponding to the elementary operations. The set of voltages applied in these operations is V_SET,L, V_SET,H, V_COND,L, and V_COND,H. Corresponding to the memristor model presented in Section 2.1, these voltages are selected as 0 V, 1.2 V, 0.4 V, and 0.8 V, respectively.

Fig. 5. A single CMOS cell interfaced with a nanowire by a vertical via.

Fig. 6. Control signaling. Note that the CMOS cells Vi and Hj are drawn on the edges of the memristive crossbar for illustrative purposes; the actual architecture is as depicted in Fig. 4.


4.2. Write operations

In general, write operations are performed by applying sufficiently large positive or negative voltages across the memristors to be programmed while keeping the magnitudes of the voltages across other memristors below the programming threshold. In write operations, A = 1 and B = 0 in every cell so that the four-state buffers are in the high-impedance state, while C = 1 and the nanowire voltages corresponding to the cell modes are defined by the voltages D. As the write operations are unconditional, the OUT mode is not needed. Table 2 specifies the WRITE0 operation, in which memristors located between nanowires driven by CMOS cells in modes IN and LET are programmed to the OFF-state. Correspondingly, unconditional programming to the ON-state (WRITE1) is specified in Table 4. In our HSPICE simulations, the duration of the write operations is 30 ns. Of this total duration, 14 ns is used for the actual programming of the memristor, while the rest is used for configuring the control signals, allowing for a 2 ns rail-to-rail transition time per digital signal. This transition time is estimated to be sufficient also for large-scale circuits with long CMOS control wires and substantial capacitance.

Simulated mean power consumption estimates for the memristors during the write and logic operations are presented in Tables 3, 5, 7, and 9. These estimates were obtained as averages over all possible signal and memristor state configurations for each operation. In our simulations, the CMOS cell power consumption varied between approximately 1.5 μW and 3 μW per cell. However, as the CMOS power consumption depends heavily on the chosen CMOS process, in the following we concentrate on the simulated power consumption of the memristors.

Let us consider a WRITE0 operation on memristors m11 and m12 in the circuit of Fig. 6. It is realized by setting the modes of the CMOS cells as V1: IN, V2: IN, V3: SAFE, H1: LET, and H2: EXC. This is a row operation, and thus the corresponding control voltage values can be read from rows 5–8 of Table 2. Using the notation introduced in Section 4.1, the command for this operation is

R 1 2 WRITE0 ON 1.

4.3. Logic operations

Compared to the write operations described above, the elementary stateful logic operations (IMP, CNIMP) require more complex control signaling. The logic operations are divided into three distinct phases: precharge, read, and write. In the precharge phase, the voltages at the nanowires connected to CMOS cells in mode LET are initialized. In the read phase, the states of input memristors are measured and stored in the latches of cells in mode LET. This measurement relies on a sufficient capacitance at the four-state buffer output. The corresponding capacitor is first precharged according to the desired logic operation, after which it is charged by driving current through the input memristors. Since the ratio of the OFF and ON resistances of the memristors is assumed to be large (1000 in this work), the charging time of the capacitor in the read phase depends strongly on the states of the input memristors. In practice this capacitance is due to transistor gate and nanowire capacitance. In the simulations presented in this paper we assume a capacitance value of 1 fF; since R_ON = 500 kΩ and R_OFF = 500 MΩ, the associated RC time constants are 0.5 ns and 500 ns, respectively. In earlier studies on stateful logic [1,2,13], reference resistors have been used to perform voltage division allowing the measurement of the state of input memristors, whereas in this work the CMOS cells perform capacitive measurements of the inputs. This improves the energy efficiency of memristive stateful logic, as no direct current is driven through the input memristors during the logic operations. This efficiency is observable, for example, in the simulation results presented in Table 7: the read-phase power consumption estimates are on the order of 10 nW, whereas the write-phase power consumption can be close to 1 μW.
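To make the timing argument concrete, the Python sketch below treats the read node as an ideal RC circuit charged from a normalized drive level, starting from a discharged state, through the input memristors in parallel. The precharge level, the latch feedback, and the decision threshold of the real circuit are not modelled; the 0.5 threshold in max_fan_in and the 12 ns read window (the read duration used in the simulations) are assumptions for illustration, and the function names are hypothetical.

```python
# Idealized RC sketch of the capacitive read phase (not the simulated circuit).
# Assumptions: ideal charging from a normalized drive of 1, no precharge offset,
# no latch feedback, and a hypothetical decision threshold.
import math
from typing import Sequence

C_NODE = 1e-15   # read-node capacitance, F
R_ON = 500e3     # ohm
R_OFF = 500e6    # ohm


def time_constant(states: Sequence[int]) -> float:
    """RC time constant through the input memristors in parallel (1 = ON)."""
    conductance = sum(1 / R_ON if s else 1 / R_OFF for s in states)
    return C_NODE / conductance


def normalized_voltage(states: Sequence[int], t: float) -> float:
    """Fraction of the drive level reached on the node after charging for t."""
    return 1.0 - math.exp(-t / time_constant(states))


def max_fan_in(threshold: float = 0.5, t_read: float = 12e-9) -> int:
    """Largest number of all-OFF inputs that keeps the node below threshold."""
    bound = -math.log(1.0 - threshold) * R_OFF * C_NODE / t_read
    return math.ceil(bound) - 1
```

Under these assumptions a single ON input charges the node almost fully within 12 ns, eight OFF inputs reach less than 20% of the drive level, and up to 28 OFF inputs stay below the assumed 0.5 threshold. This suggests why an OFF/ON ratio of 1000 leaves room for large fan-in, although the actual limit depends on circuit details not captured here.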

In the write phase, the voltages across the output memristors depend on the states of the latches of CMOS cells in mode LET and on the chosen logical operation. Again, the voltages corresponding to cell modes SAFE and EXC must be chosen so that the states of non-participating memristors are not changed. Signals and voltages corresponding to the three phases of the IMP and CNIMP operations are defined in Tables 6 and 8 and illustrated in Fig. 7. In our simulations, the duration of these operations is 40 ns. Of this total duration, 4 ns is used for precharging, 12 ns for reading, and 12 ns for writing; the remaining 12 ns is reserved for configuring the CMOS cells. Again, 2 ns is allowed for each transition of the control signals.
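The durations above can be combined into a rough latency estimate for a command sequence. This Python sketch (hypothetical function names) uses only the simulated figures stated in Sections 4.2 and 4.3 and ignores any overhead of the peripheral control logic, which is not considered here. At this level of abstraction one command acts on all selected rows or columns at once, so the estimate does not grow with vector length.

```python
# Back-of-the-envelope latency of a command sequence, from the simulated
# durations: 30 ns per write operation and 40 ns per logic operation.
DURATION_NS = {"WRITE0": 30, "WRITE1": 30, "IMP": 40, "CNIMP": 40}
LOGIC_PHASES_NS = {"precharge": 4, "read": 12, "write": 12}


def sequence_latency_ns(ops: list[str]) -> int:
    """Total duration of a sequence of elementary operations, in ns."""
    return sum(DURATION_NS[op] for op in ops)


def logic_configuration_ns() -> int:
    """Part of a logic operation not spent in the precharge, read, or write phase."""
    return DURATION_NS["IMP"] - sum(LOGIC_PHASES_NS.values())
```

For example, a WRITE0 followed by an IMP is estimated at 70 ns, and logic_configuration_ns() returns the 12 ns of each logic operation reserved for configuring the cells.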

Table 2. Definition of the WRITE0 operation. It programs to the OFF-state the memristor located between nanowires driven by CMOS cells in modes IN and LET. In column operations (denoted by C), CMOS cells in modes IN and SAFE drive vertical nanowires, while CMOS cells in modes LET and EXC drive horizontal nanowires. The converse applies for row operations.

C/R   MODE   A  B  C   D
C     IN     1  0  1   V_SET,L
C     SAFE   1  0  1   V_COND,L
C     LET    1  0  1   V_SET,H
C     EXC    1  0  1   V_COND,L
R     IN     1  0  1   V_SET,H
R     SAFE   1  0  1   V_COND,H
R     LET    1  0  1   V_SET,L
R     EXC    1  0  1   V_COND,H

Table 3. Voltages across memristors and their simulated mean power consumptions in the WRITE0 operation.

MODE   IN                 SAFE
LET    −1.2 V (2 nW)      −0.8 V (10 nW)
EXC    −0.4 V (<1 nW)     0 V (<1 nW)

Table 4. Definition of the WRITE1 operation (notation as in Table 2).

C/R   MODE   A  B  C   D
C     IN     1  0  1   V_SET,H
C     SAFE   1  0  1   V_COND,L
C     LET    1  0  1   V_SET,L
C     EXC    1  0  1   V_COND,H
R     IN     1  0  1   V_SET,L
R     SAFE   1  0  1   V_COND,H
R     LET    1  0  1   V_SET,H
R     EXC    1  0  1   V_COND,L

Table 5. Voltages across memristors and their simulated mean power consumptions in the WRITE1 operation.

MODE | IN | SAFE
LET | 1.2 V (1.3 μW) | 0.4 V (0.1 μW)
EXC | 0.4 V (80 nW) | −0.4 V (<1 nW)
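Tables 3 and 5 can be cross-checked against Tables 2 and 4. The sketch below assumes that the voltage across a memristor is the vertical-nanowire voltage minus the horizontal-nanowire voltage, and that the four drive levels are V_SET,L = 0 V, V_COND,L = 0.4 V, V_COND,H = 0.8 V and V_SET,H = 1.2 V. These levels are inferred from the tabulated differences rather than stated in this section, so they should be read as an assumption.

```python
# Drive levels in volts, inferred (not stated here) from the differences in Tables 3 and 5.
LEVELS = {"V_SET,L": 0.0, "V_COND,L": 0.4, "V_COND,H": 0.8, "V_SET,H": 1.2}

# Column (C) and row (R) drive assignments from Tables 2 and 4.
WRITE0 = {
    "C": {"IN": "V_SET,L", "SAFE": "V_COND,L", "LET": "V_SET,H", "EXC": "V_COND,L"},
    "R": {"IN": "V_SET,H", "SAFE": "V_COND,H", "LET": "V_SET,L", "EXC": "V_COND,H"},
}
WRITE1 = {
    "C": {"IN": "V_SET,H", "SAFE": "V_COND,L", "LET": "V_SET,L", "EXC": "V_COND,H"},
    "R": {"IN": "V_SET,L", "SAFE": "V_COND,H", "LET": "V_SET,H", "EXC": "V_COND,L"},
}


def voltage_table(operation, orientation):
    """Voltage across each memristor, keyed by (LET/EXC mode, IN/SAFE mode).

    In column operations IN/SAFE cells drive vertical nanowires; in row
    operations they drive horizontal ones. The voltage is always taken as
    vertical minus horizontal (an assumed sign convention).
    """
    drives = operation[orientation]
    table = {}
    for mode_b in ("LET", "EXC"):
        for mode_a in ("IN", "SAFE"):
            v_a, v_b = LEVELS[drives[mode_a]], LEVELS[drives[mode_b]]
            v = v_a - v_b if orientation == "C" else v_b - v_a
            table[(mode_b, mode_a)] = round(v, 2)
    return table


for name, op in (("WRITE0", WRITE0), ("WRITE1", WRITE1)):
    for orientation in ("C", "R"):
        print(name, orientation, voltage_table(op, orientation))
```

With these levels, column and row operations give identical voltages, consistent with each table listing a single set of values per operation: only the memristor between the IN and LET nanowires sees ±1.2 V, while all other memristors stay within ±0.8 V.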

E. Lehtonen et al. / Microelectronics Journal ∎ (∎∎∎∎) ∎∎∎–∎∎∎ 5

Please cite this article as: E. Lehtonen, et al., A cellular computing architecture for parallel memristive stateful logic, Microelectron. J(2014), http://dx.doi.org/10.1016/j.mejo.2014.09.005i

Notice that the control signals A and B of the cells in mode LET change between the read and write phases. For example, in a column implication operation, the four-state buffer may drive only logical 1 in the read phase, while in the write phase it acts as an inverter. The reason for this is that while the conductance of the input memristors is being measured, the latch should not drive the horizontal nanowire towards V_L. The latch drives the horizontal nanowire towards V_H if the voltage at the nanowire rises, indicating that at least one of the input memristors is in the ON-state. After the read phase, the latch maintains either V_L or V_H at the horizontal nanowire, corresponding to the measured logical value. This operation realizes the keeper circuit depicted in Fig. 2 and discussed in [13,6].
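At the Boolean level, the keeper behaviour described above suggests a simple model: the latch ends the read phase at V_H if at least one input memristor is ON, so a multi-input read acts as an OR of the inputs. The sketch below combines this with the write-phase behaviour of IMP (the output is set unless the latch went high) and CNIMP (the output is reset if the latch went high). It is an abstraction under these assumptions, consistent with the way multi-input operations are used in the examples of this paper, not a circuit model; the function names are hypothetical.

```python
def keeper_read(inputs):
    """Latch state after the read phase: True (V_H) if at least one input is ON."""
    return any(inputs)


def imp(inputs, q):
    """Multi-input implication: q <- NOT(p1 OR p2 OR ...) OR q."""
    return int((not keeper_read(inputs)) or q)


def cnimp(inputs, q):
    """Multi-input converse nonimplication: q <- NOT(p1 OR p2 OR ...) AND q."""
    return int((not keeper_read(inputs)) and q)


print("p q | p IMP q | p CNIMP q")
for p in (0, 1):
    for q in (0, 1):
        print(p, q, "|", imp([p], q), "|", cnimp([p], q))
```

For a single input, this reproduces the usual truth tables of p → q and ¬p ∧ q.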

In the following, we present simulations carried out using HSPICE, Cadence Virtuoso, and the 130 nm ST Microelectronics CMOS process. The memristor model used in these simulations is presented in Section 2.1.

4.3.1. Monte Carlo simulation

To assess the tolerance of the considered architecture to variations in the values of R_ON and R_OFF, an 8×8 crossbar circuit was simulated for a sequence of write and logic operations; this simulation was repeated 100 times as a Monte Carlo simulation. The parameters varied in these simulations were wmin, wmax, and dwdtprog in the model netlist presented in Section 2.1. These were sampled from normal distributions with means 0, 1, and 250×10^6 and standard deviations 0.05, 0.05, and 50×10^6, respectively. Fig. 8 shows the corresponding memristance values of one memristor during this sequence of operations. The resulting values of R_OFF lie approximately between 250 MΩ and 1.4 GΩ, and the values of R_ON between 230 kΩ and 1.2 MΩ. Moreover, the switching duration varies approximately from 3 to 7 ns, as illustrated in the inset of Fig. 8. In these simulations, the observed variation in memristances and switching times did not affect the correct operation of the circuit.
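The sampling step of such a Monte Carlo run can be sketched as follows. The parameter names, means and standard deviations are those given above; drawing the parameters independently, the random seed and the run structure are our assumptions. Each parameter set would then be substituted into the memristor model netlist of Section 2.1 (not reproduced here) for one circuit simulation.

```python
import random
import statistics

# Model parameters varied in the Monte Carlo runs: (mean, standard deviation), from the text.
PARAMS = {
    "wmin": (0.0, 0.05),
    "wmax": (1.0, 0.05),
    "dwdtprog": (250e6, 50e6),
}


def sample_runs(n_runs=100, seed=2014):
    """Draw one independent parameter set per Monte Carlo run."""
    rng = random.Random(seed)
    return [
        {name: rng.gauss(mu, sigma) for name, (mu, sigma) in PARAMS.items()}
        for _ in range(n_runs)
    ]


runs = sample_runs()
for name in PARAMS:
    values = [run[name] for run in runs]
    print(f"{name}: mean {statistics.mean(values):.4g}, stdev {statistics.stdev(values):.4g}")
```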

4.4. Pass transistors

An important feature of the circuit architecture illustrated in Fig. 3 is the possibility to connect the nanowires via pass transistors. This facilitates reconfiguring the topology of the architecture, and allows independent crossbars of various sizes to be used for parallel stateful logic computations. For example, when all of the pass transistors are turned off, the circuit

[Fig. 7 panels: "Control voltages, IMP (P = 1, Q = 0)" and "Control voltages, CNIMP (P = 1, Q = 0)"; voltage (V) versus time (ns), 0–40 ns; traces for modes IN, OUT, SAFE, LET and EXC; phases PC, READ and WRITE.]

Fig. 7. Simulated voltages at the nanowires controlled by CMOS cells in different modes, corresponding to column operations P IMP Q and P CNIMP Q, where P is in state 1 and Q is initially in state 0. Note that since P = 1 in these examples, in the READ phase the voltage at the nanowire connected to a CMOS cell in mode LET follows the voltage driven by the CMOS cell in mode IN.

Table 6. IMP. For column operations, V_L = V_SET,L and V_H = V_COND,H, while for row operations V_L = V_COND,L and V_H = V_SET,H.

C/R | MODE | Precharge (ABC D) | Read (ABC D) | Write (ABC D)
C | IN | 101 V_SET,L | 101 V_COND,H | 101 V_SET,L
C | OUT | 101 V_SET,L | 101 V_SET,L | 101 V_SET,H
C | SAFE | 101 V_SET,L | 101 V_SET,L | 101 V_SET,L
C | LET | 101 V_SET,L | 000 X | 010 X
C | EXC | 101 V_SET,L | 101 V_COND,H | 101 V_COND,H
R | IN | 101 V_SET,H | 101 V_COND,L | 101 V_SET,H
R | OUT | 101 V_SET,H | 101 V_SET,H | 101 V_SET,L
R | SAFE | 101 V_SET,H | 101 V_SET,H | 101 V_SET,H
R | LET | 101 V_SET,H | 110 X | 010 X
R | EXC | 101 V_SET,H | 101 V_COND,L | 101 V_COND,L

Table 7. Voltages across memristors and their simulated mean power consumptions in the IMP operation. The notation 0–0.8 V means that the voltage during this operation is between 0 V and 0.8 V, while the notation −0.8/0 V means that the voltage is either −0.8 V or 0 V, depending on the result of the read phase.

Phase | MODE | IN | OUT | SAFE
Read | LET | 0–0.8 V (10 nW) | −0.8–0 V (<1 nW) | −0.8–0 V (<1 nW)
Read | EXC | 0 V (<1 nW) | −0.8 V (1 nW) | −0.8 V (1 nW)
Write | LET | −0.8/0 V (<1 nW) | 0.4/1.2 V (0.9 μW) | −0.8/0 V (<1 nW)
Write | EXC | −0.8 V (<1 nW) | 0.4 V (0.1 μW) | −0.8 V (1 nW)


architecture consists of a number of small (in our examples, 8×8) nanowire crossbars, within which write and logic operations can be performed in parallel. On the other hand, inter-crossbar logic operations can be performed when the pass transistors between adjacent crossbars are turned on. We assume that a single control signal drives the gates of all the pass transistors that form a column or row in the proposed architecture, as depicted in Fig. 3. In our script notation, the states of the pass transistors are defined by the command

C/R IN PASS STATE

where C/R defines whether the command controls a column or a row of pass transistors, IN is the set of column or row indices to which the command applies, and STATE is either 1 (pass transistors turned on) or 0 (pass transistors turned off). By default we assume that the pass transistors are turned off. For example, to turn on the 3rd and 5th columns of pass transistors, we write

C 3 5 PASS 1

To improve the readability of longer lists of commands, we introduce the delimiters IN PARALLEL and END PARALLEL. The commands between these delimiters are performed in parallel and independently of each other.
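The command syntax is regular enough to be parsed mechanically. The sketch below is a hypothetical parser (the class and function names are ours; the compiler used for the simulations is not described here) that turns single commands into records and groups each IN PARALLEL ... END PARALLEL block into one step. It assumes, as in the examples of this paper, that the numbers before the operation name select the cells in mode IN, the numbers between the operation name and ON select the cells in mode OUT, and the numbers after ON (or ALL) select the cells in mode LET.

```python
import re
from dataclasses import dataclass
from typing import List, Union

OPERATIONS = {"WRITE0", "WRITE1", "IMP", "CNIMP"}


@dataclass
class PassCommand:
    orientation: str   # "C" (column) or "R" (row) of pass transistors
    indices: List[int]
    state: int         # 1 = turned on, 0 = turned off


@dataclass
class OpCommand:
    orientation: str                  # "C" = column operation, "R" = row operation
    name: str                         # WRITE0, WRITE1, IMP or CNIMP
    in_lines: List[int]               # lines whose cells are in mode IN
    out_lines: List[int]              # lines whose cells are in mode OUT (empty for writes)
    let_lines: Union[List[int], str]  # lines selected by ON, or "ALL"


def parse_command(line):
    """Parse one command; parallel delimiters are returned as 'BEGIN' / 'END'."""
    text = re.sub(r"^\s*\d+\.\s*", "", line).strip().rstrip(".")  # drop list numbers
    if text == "IN PARALLEL":
        return "BEGIN"
    if text == "END PARALLEL":
        return "END"
    orientation, *rest = text.split()
    if orientation not in ("C", "R"):
        raise ValueError(f"command must start with C or R: {line!r}")
    if "PASS" in rest:
        k = rest.index("PASS")
        return PassCommand(orientation, [int(t) for t in rest[:k]], int(rest[k + 1]))
    ops = [i for i, t in enumerate(rest) if t in OPERATIONS]
    if len(ops) != 1 or "ON" not in rest:
        raise ValueError(f"malformed command: {line!r}")
    k, on = ops[0], rest.index("ON")
    targets = rest[on + 1:]
    return OpCommand(
        orientation,
        rest[k],
        [int(t) for t in rest[:k]],
        [int(t) for t in rest[k + 1:on]],
        "ALL" if targets == ["ALL"] else [int(t) for t in targets],
    )


def parse_script(lines):
    """Group commands into sequential steps; a parallel block forms one step."""
    steps, block = [], None
    for line in lines:
        if not line.strip():
            continue
        cmd = parse_command(line)
        if cmd == "BEGIN":
            block = []
        elif cmd == "END":
            steps.append(block)
            block = None
        elif block is not None:
            block.append(cmd)
        else:
            steps.append([cmd])
    return steps


print(parse_command("C 3 5 PASS 1"))
print(parse_command("R 1 7 IMP 8 ON ALL"))
```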

As an example of the above, let us consider a stateful logic circuit with eight 8×8 nanowire crossbars which are aligned horizontally and connected via pass transistors. This circuit is depicted in Fig. 9. It has 64 vertical nanowires controlled by the CMOS cells V1, …, V64, and 8 horizontal nanowires controlled by the cells H1, …, H8. It also contains seven columns of pass transistors, denoted by VP1, …, VP7.
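Which crossbars act together as one larger crossbar follows directly from the pass-transistor states. The sketch below assumes that the vertical nanowires V1, …, V64 are numbered from left to right in blocks of eight and that pass-transistor column VPk joins crossbars k and k+1; the function names are hypothetical.

```python
def segments(on_columns, n_crossbars=8, width=8):
    """Connected groups of crossbars as inclusive vertical-nanowire ranges.

    on_columns: set of pass-transistor column indices k (1..n_crossbars-1)
    that are turned on; column k joins crossbar k to crossbar k + 1.
    """
    groups = [[1]]
    for k in range(1, n_crossbars):
        if k in on_columns:
            groups[-1].append(k + 1)   # extend the current segment
        else:
            groups.append([k + 1])     # start a new, independent segment
    return [((g[0] - 1) * width + 1, g[-1] * width) for g in groups]


def same_segment(nanowires, on_columns, n_crossbars=8, width=8):
    """True if all given vertical nanowires lie in one connected segment."""
    return any(
        all(lo <= v <= hi for v in nanowires)
        for lo, hi in segments(on_columns, n_crossbars, width)
    )


print(segments(set()))           # eight independent 8x8 crossbars
print(segments({1, 3, 5, 7}))    # four 8x16 crossbars
print(same_segment([1, 64], set(range(1, 8))))  # all pass transistors on
```

For example, with pass-transistor columns 1, 3, 5 and 7 turned on, nanowires 56 and 57 lie in the same segment while nanowires 48 and 49 do not, so an operation between the latter pair needs a different pass-transistor configuration.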

4.4.1. Propagation delays

In practice, inter-crossbar stateful logic operations are slower than operations taking place within a single crossbar. This is because the result of the read phase of a logic operation must be passed on to the crossbar where the write phase of the operation is conducted. The pass transistors add resistive and capacitive load, which is driven by the latches in mode LET.

Let us assume that all of the memristors in this circuit are initially in state 0. The following list of commands first writes the states of the memristors in the leftmost and rightmost columns of the architecture, then turns on the pass transistors, and finally performs a stateful operation on the first four rows of the leftmost and rightmost columns of memristors:

C 1 WRITE1 ON 3 4 7 8

C 64 WRITE1 ON 2 4 6 8

C 1 2 3 4 5 6 7 PASS 1

C 1 IMP 64 ON 1 2 3 4.

Let us consider the propagation delay related to the implication operation above. On the third row, the memristor in the leftmost column is in state 1. To pass this state to the rightmost crossbar of the architecture, the latch voltage V_H must propagate through all of the crossbars. In Fig. 10 the voltages at the third

Table 8. CNIMP. For column operations, V_L = V_COND,L and V_H = V_SET,H, while for row operations V_L = V_SET,L and V_H = V_COND,H.

C/R | MODE | Precharge (ABC D) | Read (ABC D) | Write (ABC D)
C | IN | 101 V_COND,L | 101 V_SET,H | 101 V_COND,L
C | OUT | 101 V_COND,L | 101 V_COND,L | 101 V_SET,L
C | SAFE | 101 V_COND,L | 101 V_COND,L | 101 V_COND,L
C | LET | 101 V_COND,L | 000 X | 010 X
C | EXC | 101 V_COND,L | 101 V_COND,L | 101 V_COND,L
R | IN | 101 V_COND,H | 101 V_SET,L | 101 V_COND,H
R | OUT | 101 V_COND,H | 101 V_COND,H | 101 V_SET,H
R | SAFE | 101 V_COND,H | 101 V_COND,H | 101 V_COND,H
R | LET | 101 V_COND,H | 110 X | 010 X
R | EXC | 101 V_COND,H | 101 V_COND,H | 101 V_COND,H

Fig. 8. Monte Carlo simulation of a sequence of write and logic operations for a single memristor in an 8×8 crossbar. The black curve represents the nominal memristance of one memristor, while the other curves illustrate 100 Monte Carlo runs with variation in the device parameters. The left inset shows the variation of the programming time in a single WRITE0 operation.

Table 9. Voltages across memristors and their simulated mean power consumptions in the CNIMP operation.

Phase | MODE | IN | OUT | SAFE
Read | LET | 0–0.8 V (10 nW) | −0.8–0 V (<1 nW) | −0.8–0 V (<1 nW)
Read | EXC | 0.8 V (0.4 μW) | 0 V (<1 nW) | 0 V (<1 nW)
Write | LET | −0.8/0 V (<1 nW) | −1.2/−0.4 V (2 nW) | −0.8/0 V (<1 nW)
Write | EXC | 0 V (<1 nW) | −0.4 V (<1 nW) | 0 V (<1 nW)
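The write-phase rows of Tables 7 and 9 can be derived from Tables 6 and 8 in the same way. The sketch below reuses drive levels inferred from Tables 3 and 5 (V_SET,L = 0 V, V_COND,L = 0.4 V, V_COND,H = 0.8 V, V_SET,H = 1.2 V), takes the latch to sit at V_H when at least one input is ON, and adds a hypothetical switching threshold of 1.0 V, chosen only because it lies between the 0.8 V that must not disturb any memristor and the 1.2 V that must switch the output. Under these assumptions, only the output memristor can switch, and it does so as IMP and CNIMP require.

```python
# Drive levels (V) inferred from Tables 3 and 5; V_TH is a hypothetical threshold.
LEVELS = {"V_SET,L": 0.0, "V_COND,L": 0.4, "V_COND,H": 0.8, "V_SET,H": 1.2}
V_TH = 1.0

# Write-phase drives of column operations (Tables 6 and 8), plus the latch levels (V_L, V_H).
WRITE_PHASE = {
    "IMP": {"IN": "V_SET,L", "OUT": "V_SET,H", "SAFE": "V_SET,L", "EXC": "V_COND,H",
            "LET": ("V_SET,L", "V_COND,H")},
    "CNIMP": {"IN": "V_COND,L", "OUT": "V_SET,L", "SAFE": "V_COND,L", "EXC": "V_COND,L",
              "LET": ("V_COND,L", "V_SET,H")},
}


def write_voltage(op, vertical_mode, horizontal_mode, input_on):
    """Vertical-minus-horizontal voltage across one memristor in the write phase.

    input_on is the latched read result: True if at least one input memristor is ON.
    """
    drives = WRITE_PHASE[op]
    v_vertical = LEVELS[drives[vertical_mode]]
    if horizontal_mode == "LET":
        v_horizontal = LEVELS[drives["LET"][1 if input_on else 0]]
    else:
        v_horizontal = LEVELS[drives[horizontal_mode]]
    return round(v_vertical - v_horizontal, 2)


def switch(state, voltage):
    """Threshold model: SET above +V_TH, RESET below -V_TH, otherwise keep the state."""
    if voltage >= V_TH:
        return 1
    if voltage <= -V_TH:
        return 0
    return state


for op in ("IMP", "CNIMP"):
    for p in (0, 1):
        for q in (0, 1):
            v = write_voltage(op, "OUT", "LET", bool(p))
            print(op, "p =", p, "q =", q, "->", switch(q, v), f"({v:+.1f} V on output)")
```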


horizontal nanowires in each crossbar are illustrated: the propagation delay is approximately 1 ns per crossbar. This delay is more than an order of magnitude shorter than the 40 ns duration of a single logic operation, but since it accumulates with every crossbar the signal passes through, it becomes dominant in large circuits.
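A rough way to see when this delay starts to matter is to extrapolate the approximately 1 ns per crossbar linearly and compare it with the 12 ns read phase and the 40 ns operation. The linear model and the names below are our assumptions: Fig. 10 covers only eight crossbars, and the read phase must also leave time for the measurement itself, so the resulting numbers are optimistic upper bounds rather than design limits.

```python
DELAY_PER_CROSSBAR_NS = 1.0   # approximate value observed in Fig. 10
READ_PHASE_NS = 12.0          # read-phase duration used in the simulations
OPERATION_NS = 40.0           # total duration of one logic operation


def propagation_delay_ns(n_crossbars):
    """Linear estimate of the delay for a latch value to cross n crossbars."""
    return DELAY_PER_CROSSBAR_NS * n_crossbars


def crossbars_within(budget_ns):
    """Largest number of crossbars whose estimated delay fits in the budget."""
    return int(budget_ns // DELAY_PER_CROSSBAR_NS)


print(propagation_delay_ns(8))             # span of the eight-crossbar example
print(crossbars_within(READ_PHASE_NS))     # fits in one read phase
print(crossbars_within(OPERATION_NS))      # equals one full operation
```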

4.5. Synthesis of logical functions

The synthesis of arbitrary Boolean functions with stateful logic has been described extensively, for example, in [4,6]. For the purposes of the following examples, let us consider the synthesis of a vectorized XOR in the 64×8 circuit illustrated in Fig. 9. As this is a bitwise operation, we present here only the computations taking place in a single 8×8 crossbar. Suppose that the state of the circuit is initially as presented in Fig. 11(a), and that the goal is to compute the bitwise XOR of rows 1 and 2 and store the result in the bottom row. XOR can be synthesized in many ways; here we apply the NAND-OR approach presented in [6]. The list of commands required for this synthesis is given below:

R 1 IMP 6 ON ALL

R 2 IMP 7 ON ALL

R 1 7 IMP 8 ON ALL

R 2 6 IMP 8 ON ALL

The first two commands above copy the negated values of rows 1 and 2 to rows 6 and 7, respectively, as demonstrated in Fig. 11(b). The final two commands compute XOR, as

$$\mathrm{XOR}(p, q) = \mathrm{NAND}\big(\mathrm{OR}(p, \neg q), \mathrm{OR}(\neg p, q)\big), \qquad (3)$$

where ¬ denotes logical negation. These commands are illustrated in Fig. 11(c) and (d).

In our simulation of the 64×8 crossbar circuit, this computation takes 160 ns, as each of the four implication operations takes 40 ns. The mean total power consumption of the memristors during this computation is approximately 18.0 μW.
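The four commands can be checked at the Boolean level with the sketch below, which models a row implication as out ← ¬(in_1 ∨ in_2 ∨ …) ∨ out, applied column by column; the multi-input form is our reading of the keeper-based read. It assumes that rows 6, 7 and 8 are initially in the OFF-state, which the copy and accumulation steps require, and the helper names are hypothetical.

```python
def row_imp(grid, in_rows, out_row):
    """R <in_rows> IMP <out_row> ON ALL, with 1-based row numbers.

    For every column: out <- NOT(OR of the input rows) OR out.
    """
    out = grid[out_row - 1]
    for c in range(len(out)):
        read = any(grid[r - 1][c] for r in in_rows)   # keeper read: OR of inputs
        out[c] = int((not read) or out[c])


def vectorized_xor(p, q):
    """Bitwise XOR of two row vectors in an 8-row crossbar, following Eq. (3)."""
    n = len(p)
    grid = [list(p), list(q)] + [[0] * n for _ in range(6)]  # rows 3-8 start OFF
    row_imp(grid, [1], 6)       # R 1 IMP 6 ON ALL    -> row 6 = NOT p
    row_imp(grid, [2], 7)       # R 2 IMP 7 ON ALL    -> row 7 = NOT q
    row_imp(grid, [1, 7], 8)    # R 1 7 IMP 8 ON ALL  -> row 8 = NOT OR(p, NOT q)
    row_imp(grid, [2, 6], 8)    # R 2 6 IMP 8 ON ALL  -> row 8 = NAND(OR(p, NOT q), OR(NOT p, q))
    return grid[7]


p = [0, 0, 1, 1, 0, 1, 0, 1]
q = [0, 1, 0, 1, 1, 1, 0, 0]
print(vectorized_xor(p, q))   # [0, 1, 1, 0, 1, 0, 0, 1]
```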

5. Computational examples

5.1. Shift

As discussed in [6], a bit shift operation cannot be implemented efficiently in a nonsegmented crossbar. In a single N×N crossbar, the left-shift of an N-element vector requires O(N) operations, since each bit must be shifted separately. However, the left-shift can be implemented using a constant number of operations regardless of the vector length when the computation is parallelized over independent crossbars, as in the considered architecture. In the following we show a circular left-shift operation, where the first column of crossbar i, i = 1, …, 8, is copied to the first column of crossbar i−1, with the indices taken modulo 8.

Assume that the state of the 64×8 crossbar circuit is initially as depicted in Fig. 12(a). The shift operation can be performed by applying the following commands, assuming that all of the pass transistors are initially turned off:

1. C 1 3 5 7 PASS 1

2. IN PARALLEL

3. C 57 IMP 56 ON ALL

4. C 41 IMP 40 ON ALL

5. C 25 IMP 24 ON ALL

6. C 9 IMP 8 ON ALL

7. END PARALLEL

8. C 1 3 5 7 PASS 0

9. C 2 4 6 PASS 1

10. IN PARALLEL

11. C 49 IMP 48 ON ALL

12. C 33 IMP 32 ON ALL

13. C 17 IMP 16 ON ALL

14. END PARALLEL

15. C 1 3 5 7 PASS 1

16. C 1 IMP 64 ON ALL

17. C 1 2 3 4 5 6 7 PASS 0

18. C 1 9 17 25 33 41 49 57 WRITE1 ON ALL

19. IN PARALLEL

20. C 8 CNIMP 1 ON ALL

21. C 16 CNIMP 9 ON ALL

22. C 24 CNIMP 17 ON ALL

23. C 32 CNIMP 25 ON ALL

24. C 40 CNIMP 33 ON ALL

25. C 48 CNIMP 41 ON ALL

26. C 56 CNIMP 49 ON ALL

27. C 64 CNIMP 57 ON ALL

28. END PARALLEL

29. C 8 16 24 32 40 48 56 64 WRITE0 ON ALL

In Fig. 12(b), pass transistor columns 1, 3, 5, and 7 are turned on, and parallel implication operations are performed in the resulting four 8×16 crossbars. This corresponds to the first set of parallel operations in the given list of commands (commands 2–7). Correspondingly, in Fig. 12(c), pass transistor columns 2, 4, and 6 are turned on while the other columns are turned off, and

Fig. 9. Crossbar architecture consisting of eight 8×8 memristor crossbars and seven pass transistor columns. This architecture is used in the computation examples.

[Fig. 10 plot: voltage (V) versus time (ns), 65–80 ns.]

Fig. 10. HSPICE simulation of the propagation delays in the read phase of an IMP operation in the circuit of Fig. 9. The different curves correspond to the latch output voltages in the individual crossbar circuits through which the signal propagates.


implication operations are performed in the resulting 8×16 crossbars. In Fig. 12(d), all pass transistors are turned on in order to negate and copy the states of the memristors in the leftmost column to the rightmost column of the crossbar circuit. The depicted configuration corresponds to the state of the circuit after command 16.

Subsequently, all pass transistors are turned off (command 17), and parallel operations are performed within each 8×8 crossbar,

Fig. 12. Circular left shift. Crossbars are visually separated when the corresponding pass transistors are turned off. For example, in subfigure (a), representing the initial configuration, all pass transistors are turned off, which means that stateful operations can be performed in parallel in all 8×8 crossbars. Subfigures (b)–(d) show how the negated first column of crossbar i is copied to the last column of crossbar i−1. Subfigure (e) shows the final configuration of the circular left shift operation. Output memristors in stateful operations in each stage are framed with dashed lines.

Fig. 11. Computation of the bitwise XOR of the top two rows of an 8×8 crossbar. Each square represents a memristor; white squares denote memristors in the OFF-state, while black squares denote memristors in the ON-state. The depicted steps of the computation are (a) initial configuration, (b) copying the negated values of the inputs, (c) implication from rows one and seven to row eight, and (d) the result of the computation. The bottom row in the final step equals the XOR of the top two rows. Output memristors in stateful operations in each stage are framed with dashed lines.


corresponding to commands 18–28 and the WRITE0 command 29. The final configuration of the crossbar circuit is depicted in Fig. 12(e).

In our simulation, the circular left-shift operation takes 220 ns, as it consists of four sets of parallel logic operations of 40 ns each and two write operations of 30 ns each. It should be noted that the circular shift requires a logic operation from the first crossbar to the last crossbar; in larger circuits, this operation may take longer than the rest of the operations because of the propagation delay. In our small-scale simulation, however, this propagation delay is absorbed into the durations of the read phases of the logic operations. The mean total power consumption of the memristors during this computation is approximately 9.5 μW.
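The command list of the circular shift can be replayed at the Boolean level. The sketch below stores the 64×8 circuit as a dictionary of columns, uses the single-input models out ← ¬in ∨ out (IMP) and out ← ¬in ∧ out (CNIMP), and checks before each logic operation that its two columns are connected by the pass transistors. It assumes that the last column of every crossbar is initially in state 0, which the copy steps require; the function names are hypothetical, and timing and voltages are not modelled.

```python
import random

N_CROSSBARS, WIDTH, HEIGHT = 8, 8, 8
N_COLS = N_CROSSBARS * WIDTH


def segment_id(col, on):
    """Index of the connected segment containing a 1-based vertical nanowire."""
    crossbar = (col - 1) // WIDTH + 1
    return sum(1 for k in range(1, crossbar) if k not in on)  # OFF columns to the left


def logic(state, p, q, on, op):
    """C p IMP|CNIMP q ON ALL on a dict {column: list of row bits}."""
    if segment_id(p, on) != segment_id(q, on):
        raise ValueError(f"columns {p} and {q} are not connected")
    if op == "IMP":
        state[q] = [int((not a) or b) for a, b in zip(state[p], state[q])]
    else:  # CNIMP
        state[q] = [int((not a) and b) for a, b in zip(state[p], state[q])]


def circular_left_shift(state):
    firsts = range(1, N_COLS + 1, WIDTH)                  # columns 1, 9, ..., 57
    on = {1, 3, 5, 7}                                     # command 1
    for p, q in [(57, 56), (41, 40), (25, 24), (9, 8)]:   # commands 2-7
        logic(state, p, q, on, "IMP")
    on = {2, 4, 6}                                        # commands 8-9
    for p, q in [(49, 48), (33, 32), (17, 16)]:           # commands 10-14
        logic(state, p, q, on, "IMP")
    on = set(range(1, N_CROSSBARS))                       # command 15: all on
    logic(state, 1, 64, on, "IMP")                        # command 16
    on = set()                                            # command 17
    for f in firsts:                                      # command 18 (WRITE1)
        state[f] = [1] * HEIGHT
    for f in firsts:                                      # commands 19-28
        logic(state, f + WIDTH - 1, f, on, "CNIMP")
    for f in firsts:                                      # command 29 (WRITE0)
        state[f + WIDTH - 1] = [0] * HEIGHT
    return state


rng = random.Random(0)
state = {c: [rng.randint(0, 1) for _ in range(HEIGHT)] for c in range(1, N_COLS + 1)}
for last in range(WIDTH, N_COLS + 1, WIDTH):              # assumed initial state 0
    state[last] = [0] * HEIGHT
before = {c: list(bits) for c, bits in state.items()}
after = circular_left_shift(state)
print(after[1] == before[9], after[57] == before[1])      # True True
```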

5.2. Content-addressable memory

As a second example, we show how a content-addressable memory (CAM) can be implemented using the proposed circuit architecture. In this example, the CAM consists of a search vector, stored in bits 1–7 of the first column of the first 8×8 crossbar in the circuit of Fig. 9, and data vectors stored in the first columns of the other 8×8 crossbars. The rightmost column of each 8×8 crossbar contains a unique address for that crossbar. After the CAM operation, the rightmost column of the first 8×8 crossbar should contain the address of the crossbar whose data vector equals the search vector. This operation is achieved using parallel stateful logic operations; in the presented form, the CAM operation can locate only one matching data vector. Note that this CAM operation does not modify the input vectors, but it requires a fixed number of columns of memristors per crossbar for intermediate computations. In the following example the overhead required for computing is significant, but it is mitigated if larger crossbars are used.

The following list of commands specifies the elementary operations used in the CAM. Initially, the crossbar circuit is configured as shown in Fig. 13(a). The first set of parallel implications (commands 1–9) copies the negated data crossbar addresses. Then the pass transistors are turned on, and the negated search vector is copied to the data crossbars (commands 10–11), as depicted in Fig. 13(b). After this, all pass transistors are turned off (command 12), and the XORs of the data and search vectors are computed in parallel and stored in the sixth column of each data crossbar (commands 13–48). Subsequently, a parallel implication operation is performed on the sixth column of the data crossbars (command 49). If the data vector and the search

Fig. 13. Content-addressable memory implemented by stateful logic operations. Subfigure (a) shows the initial configuration; the leftmost crossbar contains the search vector, while the other crossbars contain data vectors and their addresses. In subfigure (b), the negated search vector and addresses have been copied into each of the data crossbars. Subfigure (c) shows the configuration of the circuit after computing the XOR of the search vector and the data vectors in parallel. The single memristor in the ON-state in the bottom row indicates that the search vector has been located in the corresponding data crossbar. In subfigures (d) and (e), the address of this data crossbar is copied to the last column of the first crossbar. Note that none of the data vectors or addresses have been modified in the CAM operation. Output memristors in stateful operations in each stage are framed with dashed lines.


vector are equal, the result of this implication operation equals 1. The configuration of the circuit at this point is depicted in Fig. 13(c). As can be seen, the search vector has been located in the fifth data crossbar, as indicated by the memristor in the ON-state in the bottom row.

1. IN PARALLEL

2. C 16 IMP 15 ON 1 2 3

3. C 24 IMP 23 ON 1 2 3

4. C 32 IMP 31 ON 1 2 3

5. C 40 IMP 39 ON 1 2 3

6. C 48 IMP 47 ON 1 2 3

7. C 56 IMP 55 ON 1 2 3

8. C 64 IMP 63 ON 1 2 3

9. END PARALLEL

10. C 1 2 3 4 5 6 7 PASS 1

11. C 1 IMP 11 19 27 35 43 51 59 ON 1 2 3 4 5 6 7

12. C 1 2 3 4 5 6 7 PASS 0

13. IN PARALLEL

14. C 11 IMP 12 ON 1 2 3 4 5 6 7

15. C 19 IMP 20 ON 1 2 3 4 5 6 7

16. C 27 IMP 28 ON 1 2 3 4 5 6 7

17. C 35 IMP 36 ON 1 2 3 4 5 6 7

18. C 43 IMP 44 ON 1 2 3 4 5 6 7

19. C 51 IMP 52 ON 1 2 3 4 5 6 7

20. C 59 IMP 60 ON 1 2 3 4 5 6 7

21. END PARALLEL

22. IN PARALLEL

23. C 9 IMP 13 ON 1 2 3 4 5 6 7

24. C 17 IMP 21 ON 1 2 3 4 5 6 7

25. C 25 IMP 29 ON 1 2 3 4 5 6 7

26. C 33 IMP 37 ON 1 2 3 4 5 6 7

27. C 41 IMP 45 ON 1 2 3 4 5 6 7

28. C 49 IMP 53 ON 1 2 3 4 5 6 7

29. C 57 IMP 61 ON 1 2 3 4 5 6 7

30. END PARALLEL

31. IN PARALLEL

32. C 9 11 IMP 14 ON 1 2 3 4 5 6 7

33. C 17 19 IMP 22 ON 1 2 3 4 5 6 7

34. C 25 27 IMP 30 ON 1 2 3 4 5 6 7

35. C 33 35 IMP 38 ON 1 2 3 4 5 6 7

36. C 41 43 IMP 46 ON 1 2 3 4 5 6 7

37. C 49 51 IMP 54 ON 1 2 3 4 5 6 7

38. C 57 59 IMP 62 ON 1 2 3 4 5 6 7

39. END PARALLEL

40. IN PARALLEL

41. C 12 13 IMP 14 ON 1 2 3 4 5 6 7

42. C 20 21 IMP 22 ON 1 2 3 4 5 6 7

43. C 28 29 IMP 30 ON 1 2 3 4 5 6 7

44. C 36 37 IMP 38 ON 1 2 3 4 5 6 7

45. C 44 45 IMP 46 ON 1 2 3 4 5 6 7

46. C 52 53 IMP 54 ON 1 2 3 4 5 6 7

47. C 60 61 IMP 62 ON 1 2 3 4 5 6 7

48. END PARALLEL

49. R 1 2 3 4 5 6 7 IMP 8 ON 14 22 30 38 46 54 62

50. IN PARALLEL

51. C 14 IMP 15 ON 8

52. C 22 IMP 23 ON 8

53. C 30 IMP 31 ON 8

54. C 38 IMP 39 ON 8

55. C 46 IMP 47 ON 8

56. C 54 IMP 55 ON 8

57. C 62 IMP 63 ON 8

58. END PARALLEL

59. R 8 CNIMP 1 2 3 ON 15 23 31 39 47 55 63

60. C 1 2 3 4 5 6 7 PASS 1

61. C 15 23 31 39 47 55 63 IMP 8 ON 1 2 3

After the correct data crossbar has been located, the last set of parallel operations and the final CNIMP and IMP operations (commands 50–61) copy its address to the first crossbar. The CNIMP operation (command 59) is used to erase the negated addresses of all data crossbars except the one whose data vector equals the search vector. This configuration is illustrated in Fig. 13(d). Finally, the pass transistors are turned on, and a multi-input implication operation is used to copy the correct address to the first crossbar, as shown in Fig. 13(e).

In our simulation, the above implementation of the content-addressable memory takes 400 ns, as it consists of ten sequential sets of logic operations of 40 ns each. The CAM requires two long-range stateful logic operations, namely when the search vector is copied to the data crossbars and when the data address is copied to the first crossbar. The mean total power consumption of the memristors during this computation is approximately 5.3 μW.
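The CAM command list can likewise be replayed at the Boolean level. The sketch below assumes the layout described above: the search vector in rows 1–7 of column 1, the data vector of each data crossbar in rows 1–7 of its first column, a 3-bit address in rows 1–3 of its last column, and all other memristors initially in state 0. It uses the multi-input models out ← ¬(OR of inputs) ∨ out for IMP and out ← ¬(OR of inputs) ∧ out for CNIMP, ignores the pass-transistor configuration, and exercises only the case of exactly one matching data vector described in the text. The function names and the example data are hypothetical.

```python
HEIGHT, WIDTH, N_CROSSBARS = 8, 8, 8


def stateful(bits_in, old, op):
    """Multi-input IMP/CNIMP on one output memristor; bits_in are the read inputs."""
    read = any(bits_in)                       # keeper read: OR of the inputs
    return int((not read) or old) if op == "IMP" else int((not read) and old)


def col_op(state, ins, outs, rows, op):
    """C <ins> IMP|CNIMP <outs> ON <rows> (1-based columns and rows)."""
    for r in rows:
        bits = [state[c][r - 1] for c in ins]
        for o in outs:
            state[o][r - 1] = stateful(bits, state[o][r - 1], op)


def row_op(state, ins, outs, cols, op):
    """R <ins> IMP|CNIMP <outs> ON <cols> (1-based rows and columns)."""
    for c in cols:
        bits = [state[c][r - 1] for r in ins]
        for o in outs:
            state[c][o - 1] = stateful(bits, state[c][o - 1], op)


def build(search, data, addresses):
    state = {c: [0] * HEIGHT for c in range(1, N_CROSSBARS * WIDTH + 1)}
    state[1][:7] = search
    for x, (d, a) in enumerate(zip(data, addresses), start=2):
        base = (x - 1) * WIDTH
        state[base + 1][:7] = d                # data vector in the first column
        state[base + 8][:3] = a                # address in the last column
    return state


def cam(state):
    bases = [(x - 1) * WIDTH for x in range(2, N_CROSSBARS + 1)]   # data crossbars
    bits7 = range(1, 8)
    for b in bases:                                                # commands 1-9
        col_op(state, [b + 8], [b + 7], [1, 2, 3], "IMP")
    col_op(state, [1], [b + 3 for b in bases], bits7, "IMP")       # commands 10-12
    for b in bases:                                                # commands 13-48
        col_op(state, [b + 3], [b + 4], bits7, "IMP")              # search vector
        col_op(state, [b + 1], [b + 5], bits7, "IMP")              # negated data
        col_op(state, [b + 1, b + 3], [b + 6], bits7, "IMP")
        col_op(state, [b + 4, b + 5], [b + 6], bits7, "IMP")       # XOR in column 6
    row_op(state, bits7, [8], [b + 6 for b in bases], "IMP")       # command 49
    for b in bases:                                                # commands 50-58
        col_op(state, [b + 6], [b + 7], [8], "IMP")
    row_op(state, [8], [1, 2, 3], [b + 7 for b in bases], "CNIMP")  # command 59
    col_op(state, [b + 7 for b in bases], [8], [1, 2, 3], "IMP")    # commands 60-61
    return state[8][:3]


search = [1, 0, 1, 1, 0, 0, 1]
data = [[1, 0, 1, 1, 0, 0, 0], [0] * 7, [1] * 7, [1, 0, 1, 1, 0, 0, 1],
        [0, 1, 0, 0, 1, 1, 0], [1, 0, 1, 1, 0, 1, 1], [0, 0, 1, 1, 0, 0, 1]]
addresses = [[(n >> 2) & 1, (n >> 1) & 1, n & 1] for n in range(1, 8)]
state = build(search, data, addresses)
print(cam(state))   # address stored with the matching data vector
```

The sketch also makes it easy to confirm that the search vector, the data vectors and the addresses are only read, never overwritten.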

6. Conclusion

In this work we have presented a CMOS/memristor circuit architecture which enables parallel computing using memristive stateful logic. In the CMOS cells used for controlling the memristors, we have applied a capacitive keeper circuit which improves the energy efficiency of the read phase of memristive stateful logic compared to earlier approaches using resistive voltage division. The presented architecture allows unconditional write operations, as well as implication and converse nonimplication operations with arbitrary fan-in and fan-out. We have verified the correct operation of this architecture by detailed HSPICE simulations for a structure containing eight memristive crossbars: 8 × 8² = 512 memristors and 8 × (2 × 8) = 128 CMOS cells. It should be noted that while the size of the simulated architecture was limited to eight memristive crossbars, the numbers of elementary operations reported for the computation examples of Section 5 are also valid for circuits containing larger numbers of crossbars. Naturally, as discussed in Section 4.4.1, worst-case propagation delays increase with the number of crossbars.

The reported architecture appears useful for the implementation of memristive stateful logic, as it significantly reduces the lengths of the computational sequences required for implementing vector operations and enables efficient parallel computation. The main objective of this work was to design and simulate a platform which enables future work on parallel stateful computing. The presented simulation results indicate that this objective was reached; based on this work, we identify significant topics for future studies, as outlined below.

In this work we have used a script language and a compiler to generate control voltages for HSPICE simulations. Such an approach allows abstract definitions of algorithms computed in the considered architecture. Future work on this topic includes research on compilers for specific hardware environments where the considered circuitry could be embedded.

To determine the practical applicability of this work, the CMOS logic required for the control signals and the peripheral circuitry should be considered. It also remains to be determined what types of computations are most efficiently implemented in the presented architecture, compared to traditional CMOS realizations. Furthermore, the suitability of memristive devices for digital computing remains to be validated empirically. For example, a potential limiting factor in the use of memristors for logic computation is the maximum number of state changes per memristor. Currently, switching endurances of more than 10^12 state changes have been reported [16], while it is projected in [15] that the best-case endurance could be of the order of 10^16. We note that 10^15 state changes would in practice allow the presented system to operate


for several years. Finally, in this work we applied a simplified memristor model; the effects of the detailed physical properties of specific memristor technologies, such as asymmetric threshold voltages, retention, and nonlinear I–V characteristics, should be considered.
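The link between endurance and operating lifetime can be made concrete with a back-of-the-envelope estimate. The sketch below assumes, hypothetically, that a memristor is used in back-to-back 40 ns operations and changes state in a given fraction of them; this usage model and the names are ours, not taken from the simulations.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
OPERATION_S = 40e-9   # duration of one logic operation in the simulations


def lifetime_years(endurance, switch_fraction=1.0, operation_s=OPERATION_S):
    """Years until a memristor reaches its endurance limit, assuming it changes
    state in `switch_fraction` of back-to-back operations (hypothetical model)."""
    if not 0 < switch_fraction <= 1:
        raise ValueError("switch_fraction must be in (0, 1]")
    return endurance * operation_s / switch_fraction / SECONDS_PER_YEAR


for endurance in (1e12, 1e15, 1e16):
    print(f"{endurance:.0e} changes: {lifetime_years(endurance):.3g} years "
          "if the memristor switches in every operation")
print(f"1e15 changes, one switch per four operations: {lifetime_years(1e15, 0.25):.3g} years")
```

Even in the pessimistic case of a state change in every operation, 10^15 changes correspond to roughly 1.3 years of continuous use, and lower switching activity extends this proportionally; under the same assumption, 10^12 changes would last only about 11 hours.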

Acknowledgments

This work was supported by the Academy of Finland (253596, 258831, 264914, 277383, 140108) and by the Samsung Global Research Outreach (GRO) program.

References

[1] P. Kuekes, Material implication: digital logic with memristors, in: Presentation in the Memristor and Memristive Systems Symposium at UC Berkeley, 2008.

[2] J. Borghetti, G.S. Snider, P.J. Kuekes, J.J. Yang, D.R. Stewart, R.S. Williams, Memristive switches enable stateful logic operations via material implication, Nature 464 (2010) 873–876.

[3] E. Lehtonen, J.H. Poikonen, M. Laiho, Two memristors suffice to compute all Boolean functions, Electron. Lett. (3) (2010) 239.

[4] J.H. Poikonen, E. Lehtonen, M. Laiho, On synthesis of Boolean expressions for memristive devices using sequential implication logic, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 31 (7) (2012) 1129–1134, http://dx.doi.org/10.1109/TCAD.2012.2187524.

[5] E. Lehtonen, J.H. Poikonen, M. Laiho, Implication logic synthesis methods for memristors, in: Proceedings of the IEEE International Symposium on Circuits and Systems, ISCAS 2012, 2012.

[6] E. Lehtonen, J. Poikonen, M. Laiho, Memristive stateful logic, in: A. Adamatzky, L. Chua (Eds.), Memristor Networks, Springer International Publishing, Switzerland, 2014.

[7] P. Teodorovic, S. Dautovic, V. Malbasa, Recursive Boolean formula minimization algorithms for implication logic, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 32 (11) (2013) 1829–1833.

[8] K. Kim, S. Shin, S.-M. Kang, Field programmable stateful logic array, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 30 (December (12)) (2011) 1800–1813.

[9] S. Kvatinsky, G. Satat, N. Wald, E. Friedman, A. Kolodny, U. Weiser, Memristor-based material implication (IMPLY) logic: design principles and methodologies, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2013.

[10] S. Shin, K. Kim, S.-M. Kang, Resistive computing: memristors-enabled signal multiplication, IEEE Trans. Circuits Syst. I: Reg. Pap. 60 (May (5)) (2013) 1241–1249.

[11] K.K. Likharev, D.B. Strukov, CMOL: devices, circuits, and architectures, in: G. Cuniberti, G. Fagas, K. Richter (Eds.), Introducing Molecular Electronics, Springer, Berlin, 2005, pp. 447–478.

[12] G.S. Snider, R.S. Williams, Nano/CMOS architectures using a field-programmable nanowire interconnect, Nanotechnology 18 (3) (2007).

[13] M. Laiho, E. Lehtonen, Cellular nanoscale network cell with memristors for local implication logic and synapses, in: Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), 2010, pp. 2051–2054.

[14] K.-H. Kim, S. Gaba, D. Wheeler, J.M. Cruz-Albrecht, T. Hussain, N. Srinivasa, W. Lu, A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications, Nano Lett. 12 (1) (2012) 389–395.

[15] J. Hutchby, M. Garnet, Emerging research devices and emerging device materials: memory assessment workshop summary, 2010, International Technology Roadmap for Semiconductors, Technical Report, 2010 (Online). Available: ⟨http://www.itrs.net/Links/2010ITRS/2010Update/ToPost/ERDERM2010FINALReport-MemoryAssessmentITRS.pdf⟩.

[16] M.-J. Lee, C.B. Lee, S.R. Lee, M. Chang, J.H. Hur, Y.-B. Kim, C.-J. Kim, D.H. Seo, S. Seo, U.-I. Chung, I.-K. Yoo, K. Kim, A fast, high-endurance and scalable non-volatile memory device made from asymmetric Ta2O5−x/TaO2−x bilayer structures, Nat. Mater. 10 (8) (2011) 625–630.

[17] A.C. Torrezan, J.P. Strachan, G. Medeiros-Ribeiro, R.S. Williams, Sub-nanosecond switching of a tantalum oxide memristor, Nanotechnology 22 (48) (2011) 485203 (Online). Available: ⟨http://stacks.iop.org/0957-4484/22/i=48/a=485203⟩.

[18] E. Lehtonen, J. Poikonen, M. Laiho, W. Lu, Time-dependency of the threshold voltage in memristive devices, in: 2011 IEEE International Symposium on Circuits and Systems (ISCAS), May 2011, pp. 2245–2248.
