parallel stateful logic in rram: theoretical analysis and
TRANSCRIPT
Parallel Stateful Logic in RRAM: Theoretical Analysis and Arithmetic Design
Feng Wang, Guojie Luo, Guangyu Sun, Jiaxi Zhang, Peng Huang, Jinfeng Kang
2019/07/16
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
RRAM Resistance for Representing Logic Values
[1] Akinaga, H., & Shima, H. (2010). Resistive random access memory (ReRAM) based on metal oxides. Proceedings of the IEEE.
3
HRS (logical 0)
LRS(logical 1)
SET (πππΏ β ππ΅πΏ > πππΈπ )
RESET (ππ΅πΏ β πππΏ > ππ πΈππΈπ )
metal
metal
insulator
WL
BL
RRAM State Switching as Primitive Logic Operations
[2] Kvatinsky, S., Member, S., Belousov, D., Liman, S., Satat, G., Member, S., β¦ Weiser, U. C. (2014). MAGIC β Memristor-Aided Logic. TCAS-II.
4
MAGIC NOR implementation π = NOR(π, π)
β ππΊ > 2 β ππ πΈππΈπ
Parallel execution over rows and columns
β WL parallelism: π ππ = NOR π π1, π π2 π β [1,π]
β BL parallelism: π ππ = NOR(π 1π , π 2π) π β [1, π]
VG VG GND R1
R1
R1
R1 R1
R1
R1
R1
R1
BL voltage controller
WL1
WL2
WLm
BL1 BL2 BLm
R11 R12 R1m
R21 R22 R2m
Rm1 Rm2 Rmm
RRAM cell
RRAM-based Stateful Logic Families
[3] Borghetti, J., Snider, G. S., Kuekes, P. J., Yang, J. J., Stewart, D. R., & Williams, R. S. (2010). βMemristiveβ switches enable βstatefulβ logic operations via
material implication. Nature.
[4] Huang, P., Kang, J., Zhao, Y., Chen, S., Han, R., Zhou, Z., β¦ Liu, X. (2016). Reconfigurable Nonvolatile Logic Operations in Resistance Switching Crossbar
Array for Large-Scale Circuits. Advanced Materials.
[5] Avati, V., Eggert, K., & Taylor, C. (2018). FELIX: Fast and Energy-Efficient Logic in Memory. In ICCAD.
[6] Xu, L., Bao, L., Zhang, T., Yang, K., Cai, Y., Yang, Y., & Huang, R. (2018). Nonvolatile memristor as a new platform for non-von Neumann computing. In
ICSICT.
5
Support parallel execution
Functionally complete
Work Stateful logic operations
[3] IMP
[2] NOR, NOT
[4] NAND, NIMP
[5] NOR, NAND, Min, OR
[6] NOR, NOT, NAND, NIMP, XOR
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
SIML Computation Model7
All of the stateful logic families satisfy the four assumptions
β The latency of a single stateful logic operation is identical.
β The input number of a single operation cannot exceed a constant.
β The latency of WL and BL operations is identical.
β The degree of parallelism can reach the crossbar size. The crossbar size scales with the problem size.
Lower Bounds of the Time Complexity (1)8
(a) Theorem 1 (b) Theorem 2
Condition bitwise functions most arithmetic functions
Parallelismupper bound
max(π€, β) ππ
π€ + β
Time complexity lower bound
trivial bound: ππ
max(π€,β)shape bound: π(π€ + β)
Example function ππ = ππ1 NOR ππ2 (π = 1, 2,β¦ , π) ππ = NORπ=1π ππ1 NOR ππ2 (π = 1, 2, β¦ , π)
Example algorithm (netlist)
Example layout in RRAM
π11 π21 β¦ ππ1
π12 π22 β¦ ππ2
π1 π2 β¦ ππ
π11 π21 β¦ ππ1
π12 π22 β¦ ππ2
π1 π2 β¦ ππ
π11
π12
π1 β¦ ππ1
ππ2
ππ
#1
#1 #1 #1
#n-1
#n #n #n
π11
π21 ππ1
β¦
ππ2
ππ
π(π): total cycles in the series implementation, π€: width of layout (# of BLs), β: height of layout (# of WLs), ππππππ₯: length of the critical path.
Lower Bounds of the Time Complexity (2)9
(c) Corollary 1 (d) Theorem 3
Condition square layout a given algorithm
Parallelismupper bound
π(π
π€β) π
π
ππππππ₯
Time complexity lower bound
function bound: π( π€β) algorithm bound: ππππππ₯
Example function β π = NORπ=1π ππ1
Example algorithm (netlist)
β
Example layout in RRAM
β
π(π): total cycles in the series implementation, π€: width of layout (# of BLs), β: height of layout (# of WLs), ππππππ₯: length of the critical path.
π1 π2 β¦ ππ ππ
π1
π2 ππ
πβ¦
#1 #n-1
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
Ripple Carry Adder11
One-bit full adder
Pulse Logic operation
1 π1 = NOR(π΄, π΅)
2 π2 = NOR(π΄, π1)
3 π3 = NOR π΅, π1
4 π4 = NOR(π1, π2)
5 π5 = NOR(π4, πΆπ)
6 πΆπ = NOR(π1, π5)
7 π6 = NOR(π4, π5)
8 π7 = NOR(π5, πΆπ)
9 π = NOR(π5, π7)
π¨ππ π¨ππ β¦ π¨ππ
π¨ππ π¨ππ β¦ π¨ππ
π11 π12 β¦ π1π
β¦ β¦ β¦ β¦
π41 π42 β¦ π4π
π51 π52 β¦ π5π
πΆπ1 πΆπ2 β¦ πΆππ
πΆπ1 πΆπ1(πΆπ2) β¦ πΆπ πβ1 (πΆππ)
π61 π62 β¦ π6π
β¦ β¦ β¦ β¦
πΊπ πΊπ β¦ πΊπ
β‘
β’
β
β β’ add π BLs in parallel β‘ generate and propagate carries in series
time complexity: π(π)shape / algorithm lower bound
Carry Select Adder12
π¨ππ β¦ π¨π π π¨ππ β¦ π¨π π 0 π1 β¦ π π
π¨π( π+π) β¦ π¨π(π π) π¨π( π+π) β¦ π¨π(π π) 0 π π+1 β¦ π2 π
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
π¨π(πβ π+π) β¦ π¨ππ π¨π(πβ π+π) β¦ π¨ππ 0 ππβ π+1 β¦ ππβ²
π΄1( π+1)β² β¦ π΄1(2 π)β² π΄2( π+1)β² β¦ π΄2(2 π)β² 1 π π+1β² β¦ π2 πβ²
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
π΄1(πβ π+1)β² β¦ π΄1πβ² π΄2(πβ π+1)β² β¦ π΄2πβ² 1 ππβ π+1β² β¦ ππβ²
β¦ β¦ β¦ β¦ β¦ β¦ πΊπ β¦ πΊ π
β¦ β¦ β¦ β¦ β¦ β¦ πΊ π+π β¦ πΊπ π
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
β¦ β¦ β¦ β¦ β¦ β¦ πΊπβ π+π β¦ πΊπ
β
β’
β‘
β copy the input β‘ add 2 π β 1 WLs in parallel β’ select the correct result
time complexity: π( π)function / algorithm lower bound
Carry Save Adder13
π¨ππ π¨ππ β¦ π¨ππ
π¨ππ π¨ππ β¦ π¨ππ
π¨ππ π¨ππ β¦ π¨ππ
β¦ β¦ β¦ β¦
πΆπ1 πΆπ2 β¦ πΆππ
ππ1 ππ2 β¦ πππ
0 πΆπ1 β¦ πΆπ(πβ1)
ππ1 ππ2 β¦ πππ
β¦ β¦ β¦ β¦
πΊπ πΊπ β¦ πΊπ
shift πΆπ1, β¦,πΆππ right
β
β’
β add π BLs in parallel β’ add the last two addends πΆπ, ππ
β‘
time complexity: π( π)function lower bound
π1 β¦ π π π΄1
π π+1 β¦ π2 π π΄2
β¦ β¦ β¦ β¦
ππβ π+1 β¦ ππ π΄ π
β¦ β¦ β¦ π
β accumulate each WL in parallelβ‘ accumulate the sums in the same BLs
β β‘
Adder Summary14
Function two π-bit integer addition π΄ integer addition
Algorithm ripple carry adder carry select adder carry select adder
Layout π π Γ π(1) π π Γ π( π) π π Γ π( π)
Time complexity π(π) π( π) π( π)
Lower bound type shape / algorithm bound function / algorithm bound function bound
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
Dot Product16
πππ β¦ ππ π΄ πππ β¦ ππ π΄ π1 β¦ π π π1
ππ( π΄+π) β¦ ππ(π π΄) ππ( π΄+π) β¦ ππ(π π΄) π π+1 β¦ π2 π π2
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
ππ(π΄β π΄+π) β¦ πππ΄ ππ(π΄β π΄+π) β¦ πππ΄ ππβ π+1 β¦ ππ π π
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ πΆπ
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ ππ
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦
β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ β¦ π¨π β π¨π
β
β’
β element-wise multiplication β‘ accumulate π WLs in parallel β’ accumulate ππs using the carry save adder
β‘
time complexity: π( π)function lower bound
Real Data Type Comparison
[7] KΓΆster, U., Webb, T. J., Wang, X., Nassar, M., Bansal, A. K., Constable, W. H., β¦ Rao, N. (2017). Flexpoint: An Adaptive Numerical Format for Efficient Training of
Deep Neural Networks. arXiv.
17
Β·Β·Β·
Β·Β·Β·
Β·Β·Β·Β·Β·Β·
Β·Β·Β·
Β·Β·Β·
Β·Β·Β· Β·Β·Β·
Β·Β·Β·
fixed-point flex-point
float-point
exponent mantissa
Flex-point Support18
πππ πππ β¦ π31
πππ πππ β¦ π32
β¦ β¦ β¦ β¦
πππ πππ β¦ π3π
Co
ntr
olle
rππππ ππππ β¦ ππ₯π3
β input exponent
β‘ arithmetic instruction
β’ output status
β£ shift instruction
β€ output exponent
Crossbar
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
Integer Algorithm Evaluation20
0
200
400
600
800
8 16 24 32 40 48 56 64
cycl
e
bit width n
MAGIC-adder [8]
ripple carry
carry select
100
1000
10000
100000
1000000
16 32 64 128 256 512 10242048
cycl
e
integer number M
MAGIC-adder [8]
APIM [9]
carry save
0
20000
40000
60000
80000
16 32 64 128 256 512 1024 2048
max
imal
M
crossbar size
APIM [9]
carry save
0
1000
2000
3000
4000
64 128 256 512
cycl
e
vector dimension M
IMAGING [10] Proposed
[8] Talati, N., Gupta, S., Mane, P., & Kvatinsky, S. (2016). Logic design within memristive memories using memristor-aided loGIC (MAGIC). TNANO.
[9] Imani, M., Gupta, S., & Rosing, T. (2017). Ultra-Efficient Processing In-Memory for Data Intensive Applications. In DAC.
Flex-support Evaluation21
0
2
4
6
8
10
100 400 700 1000
Late
ncy
(s)
Matrix size M
RRAM ARM
0
1
2
3
4
100 400 700 1000
Ener
gy /
MΒ³
(nJ)
Matrix size M
RRAM ARM
OUTLINE
Background
Theoretical Analysis
Integer Addition
Extension
Experimental Evaluation
Conclusion
Conclusion23
Theoretical analysis
β SIML model
β time complexity lower bound
Integer addition
β ripple carry adder
β carry select adder
β carry save adder
Extension
β multiplication support
β flex-point support
&A