TRANSCRIPT

On-line Learning Architectures
Tarek M. Taha
Electrical and Computer Engineering Department, University of Dayton
March 2017
Device SPICE Model
C. Yakopcic, T. M. Taha, G. Subramanyam, and R. E. Pino, "Memristor SPICE Model and Crossbar Simulation Based on Devices with Nanosecond Switching Time," IEEE International Joint Conference on Neural Networks (IJCNN), August 2013. [BEST PAPER AWARD]
[Figure: Measured current-voltage curves for memristor devices from the Univ. of Michigan, Boise State, and HP Labs, with the SPICE model output overlaid on each actual device's data.]
Memristor Backpropagation Circuit
Online training via backpropagation circuits.
The same memristor crossbar implements both the forward and backward passes.

[Figure: Two-layer memristor crossbar backpropagation circuit, shown in its forward-pass and backward-pass configurations. Training units for layers L1 and L2 compare outputs (output_1, output_2) against targets (target_1, target_2) to form the error terms δL2,1 and δL2,2, and back-propagate δL1,1 ... δL1,6 through the same crossbar.]
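As a software sketch (not the circuit itself), the forward and backward passes that the crossbar carries out can be written as follows. The network sizes follow the deck's six-input, two-output example; the learning rate and initialization are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network: W1 and W2 stand in for the
# conductances of the two crossbars (layers L1 and L2).
W1 = rng.normal(scale=0.5, size=(6, 3))   # 6 inputs -> 3 hidden neurons
W2 = rng.normal(scale=0.5, size=(3, 2))   # 3 hidden -> 2 outputs

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_step(x, target, lr=0.1):
    global W1, W2
    # Forward pass: each crossbar performs a vector-matrix product.
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)
    # Backward pass: the same matrices, driven in reverse, carry the
    # error terms delta_L2 and delta_L1 (the deck's δL2 and δL1).
    delta_L2 = (target - y) * y * (1 - y)
    delta_L1 = (delta_L2 @ W2.T) * h * (1 - h)
    # Conductance (weight) updates.
    W2 += lr * np.outer(h, delta_L2)
    W1 += lr * np.outer(x, delta_L1)
    return np.sum((target - y) ** 2)

x = np.array([1., 0., 1., 0., 1., 0.])
t = np.array([1., 0.])
errs = [train_step(x, t) for _ in range(200)]
```

Both passes reuse the same weight matrices, which is the point the slide makes about the crossbar: one physical array serves forward inference and backward error propagation.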
Socrates: Online Training Architecture
[Figure: 4×4 grid of neural cores (NC), each paired with a router (R), connected to 3D-stacked eDRAM.]

- Multicore processing system
- Implements backpropagation-based online training
- Two versions: (1) memristor core and (2) fully digital core
- Three 3D memory stacks store the training data

[Figure: each tile holds either a memristor learning core or a digital learning core.]
FPGA Implementation
NC = Neural Core, R = Router

[Figure: 4×4 grid of neural cores and routers. Router detail: input port, output port, and an 8-bit routing bus through a grid of switches (S).]

[Figure: actual images into and out of the FPGA-based multicore neural processor.]
We verified the system functionality by implementing the digital system on an FPGA.
Training Efficiency
Accuracy (%):

Dataset     GPU    Digital   Memristor
MNIST       99%    97%       97%
COIL-20     100%   98%       97%
COIL-100    99%    99%       99%

Training compared to a GTX 980 Ti GPU:
- Energy efficiency: Digital 5900x, Memristor 70000x
- Speedup: Digital 14x, Memristor 7x
  (batch processing on the GPU but not on the specialized processors)
Inference Efficiency
Inference compared to a GTX 980 Ti GPU:
- Energy efficiency: Digital 7000x, Memristor 358000x
- Speedup: Digital 29x, Memristor 60x

The memristor learning core is about 35x smaller in area:
- Digital: 0.52 mm²
- Memristor: 0.014 mm²
Large Crossbar Simulation
[Figure: Potential across each memristor (V) in a 300×300 crossbar, SPICE vs. MATLAB simulation, showing near-identical results in the 0.9-1.0 V range. Schematic: inputs y1 ... yN/2 drive the crossbar rows at VM = 1 V and V1 = 1 V, with op-amp column readout through R and Rf.]
Studying training on memristor crossbars must be done in SPICE to account for circuit parasitics. However, simulating even one cycle of a large crossbar can take over 24 hours in SPICE, and training requires thousands of iterations, which makes full SPICE training runs nearly impossible.
We developed a fast (~30 ms) algorithm that approximates the SPICE output with over 99% accuracy.
This enables us to examine learning on memristor crossbars.
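For intuition, the computation an ideal crossbar performs (before parasitics) can be sketched in a few lines. This is not the paper's fast approximation algorithm, which additionally corrects for wire parasitics; the conductance range and feedback resistor below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
rows, cols = 300, 300

# Illustrative memristor conductances for a 300x300 crossbar (siemens).
G = rng.uniform(1e-6, 1e-4, size=(rows, cols))

# 1 V applied on every row, as in the schematic (VM = V1 = 1 V).
V = np.full(rows, 1.0)

# Ideal crossbar: each column current is a dot product of the row
# voltages with that column's conductances, i.e. one vector-matrix
# multiply for the whole array.
I_col = V @ G

# Op-amp virtual-ground readout: output voltage per column through Rf.
Rf = 1e4                     # feedback resistor (assumed value)
V_out = -Rf * I_col
```

A single matrix product replaces hours of transistor-level simulation, which is why a calibrated analytical approximation of the SPICE result is so valuable for training studies.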
Memristor for On-Line Learning
On-line learning requires "highly tunable" or "slow" memristors. This does not affect inference speed.
[Figure: A memory memristor switches fully between Ron and Roff with a single 10 ns pulse. An on-line learning memristor moves only a fraction of the way per 10 ns pulse, taking about 500 ns to traverse the full Ron-Roff range.]
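A back-of-the-envelope sketch of why the "slow" device helps: if a 10 ns pulse moves the device only a fraction of the way across a 500 ns full-switching range, each pulse becomes a small, controllable weight increment. All numbers below are illustrative, taken loosely from the figure.

```python
# "Memory" device: one 10 ns pulse flips it fully between Ron and Roff
# (a single bit). "Learning" device: the same pulse moves it only a
# fraction of the range, yielding many distinguishable weight levels.
R_on, R_off = 1e3, 1e5          # ohms (illustrative)
pulse_width = 10e-9             # 10 ns programming pulse
full_switch_time = 500e-9       # learning device: 500 ns to switch fully

step = pulse_width / full_switch_time    # fraction of range per pulse
levels = int(round(1 / step))            # ~50 distinct states

state = 0.0                     # 0 -> R_off, 1 -> R_on
history = []
for _ in range(levels):
    state = min(1.0, state + step)
    history.append(R_off + state * (R_on - R_off))
```

Roughly 50 resistance levels from one device is what makes gradual, gradient-style weight updates feasible, while a single-pulse "memory" device would only ever store 0 or 1.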
LiNbO3 Memristor for Online Learning
The device is quite slow, with switching times on the order of tens of milliseconds.
This makes it useful for backpropagation training.
[Figure: I-V hysteresis curve of the LiNbO3 device, roughly ±10 mA over ±3 V.]
Approx. Switching Time   Switching Voltage   Minimum Pulse Width Required
10 ns                    6 V                 1 ps
10 ns                    4.5 V               10 ps
100 ns                   4.5 V               100 ps
1 µs                     4.5 V               1 ns
10 µs                    4.5 V               10 ns

Table: switching speed of devices and the minimum programming pulse needed in neuromorphic training circuits.
[Figure: Measured device current (mA) vs. programming pulse number, showing gradual, repeatable conductance change over hundreds to thousands of pulses.]
Unsupervised Clustering

[Figure: "Before training" and "after training" cluster plots for three datasets: Wisconsin breast cancer (Benign/Malignant), Iris (Setosa/Versicolor/Virginica), and Wine (Class 1/Class 2/Class 3). Network diagram: a 6-3-6 autoencoder with inputs x1 ... x6 plus a bias, hidden neurons n1 ... n3 (Layer L1), and reconstructions x1' ... x6' (Layer L2).]
We extended the backpropagation circuit to implement auto-encoder based unsupervised learning in memristor crossbars.
These graphs show the circuit learning and clustering data in an unsupervised manner.
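A rough software analogue of the circuit: a minimal 6-3-6 autoencoder trained by backpropagation to reconstruct its input, with no labels involved. The layer sizes follow the diagram; the data, learning rate, and initialization are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# 6-3-6 autoencoder matching the slide's diagram: inputs x1..x6,
# hidden code n1..n3 (Layer L1), reconstructions x1'..x6' (Layer L2).
W_enc = rng.normal(scale=0.5, size=(6, 3))
W_dec = rng.normal(scale=0.5, size=(3, 6))

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def step(x, lr=0.5):
    global W_enc, W_dec
    h = sigmoid(x @ W_enc)          # hidden code (cluster coordinates)
    x_rec = sigmoid(h @ W_dec)      # reconstruction x'
    # Unsupervised target is the input itself: no labels needed.
    d_out = (x - x_rec) * x_rec * (1 - x_rec)
    d_hid = (d_out @ W_dec.T) * h * (1 - h)
    W_dec += lr * np.outer(h, d_out)
    W_enc += lr * np.outer(x, d_hid)
    return np.sum((x - x_rec) ** 2)

# Two synthetic "clusters" of input patterns.
data = [np.array([1., 1., 1., 0., 0., 0.]),
        np.array([0., 0., 0., 1., 1., 1.])]
errs = [sum(step(x) for x in data) for _ in range(500)]
```

After training, plotting the hidden codes h for each sample is what produces the "after training" cluster plots: samples from the same class land near each other in the code space without any class labels having been used.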
Application: Cybersecurity
[Figure: Distance between input and reconstruction for normal packets vs. anomalous packets, and the resulting detection rate (%) vs. false detection rate (%) curve.]
The autoencoder-based unsupervised training circuits were used to learn normal network packets.
When the memristor circuits were presented with anomalous network packets, these were detected with a 96.6% accuracy and a 4% false positive rate.
We could also implement rule-based malicious packet detection (similar to SNORT) in a very compact circuit.
Regex: user\s[^\n]{10}
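For reference, the slide's rule behaves like this as a conventional regex (a plain Python check, not the memristor circuit; the sample packet strings are made up):

```python
import re

# The slide's example rule: "user", one whitespace character, then
# exactly ten non-newline characters (a SNORT-style content signature).
rule = re.compile(r"user\s[^\n]{10}")

hit = rule.search("login: user 0123456789 ...")  # 10 chars follow the space
miss = rule.search("login: user 123\n")          # too few chars before newline
```
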
Application: Autonomous Agent for UAVs
Exploring implementations on:
- IBM TrueNorth processor
- Memristor circuits
- High-performance clusters (CPUs and GPUs)

Cognitively Enhanced Complex Event Processing (CECEP) architecture: geared towards autonomous decision making in a variety of applications, including autonomous UAVs.

[Figure: CECEP architecture block diagram with IO "adapters", event output streams, and memristor systems; IBM TrueNorth chip pictured.]
Parallel Cognitive Systems Laboratory
http://taha-lab.org

Research Engineers:
- Raqib Hasan, PhD
- Chris Yakopcic, PhD
- Wei Song, PhD
- Tanvir Atahary, PhD

Doctoral Students:
- Zahangir Alom
- Rasitha Fernando
- Ted Josue
- Will Mitchell
- Yangjie Qi
- Nayim Rahman
- Stefan Westberg
- Ayesha Zaman
- Shuo Zhang

Master's Students:
- Omar Faruq
- Tom Mealy

Sponsors: