welcome to iscaconf.orgtitle microsoft powerpoint - snapea-isca18.pptx author vakhl created date...
TRANSCRIPT
![Page 1: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/1.jpg)
Vahideh Akhlaghi Amir YazdanbakhshKambiz Samadi Rajesh K. Gupta Hadi Esmaeilzadeh
SnaPEA:Predictive Early Activation for Reducing Computation
In Deep Convolutional Neural Networks
University of California, San Diego †Georgia Institute of Technology
‡Qualcomm Technologies, Inc
Equal Contribution
ISCA ’18
* *†
‡
*
![Page 2: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/2.jpg)
CNNs perform trillions of operations for one input
. . . Dog
GoogLeNet 283,000,000,000 Ops
SqueezeNet 222,000,000,000 Ops
AlexNet 1,147,000,000,000 OpsVGG-16 16,362,000,000,000 Ops
CNN models Operations for inference
2
![Page 3: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/3.jpg)
. . . Dog
GoogLeNet 283,000,000,000 Ops
SqueezeNet 222,000,000,000 Ops
AlexNet 1,147,000,000,000 OpsVGG-16 16,362,000,000,000 Ops
CNN models Operations for inference
≥ 90% of operations are for convolutional layers
Convolutions dominate CNN computation
3
![Page 4: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/4.jpg)
Research challenge:How to reduce CNN computation with minimal effect on accuracy?
Our solution: SnaPEA1. Leverage algorithmic structure2. Exploit runtime information3. Tune up with static multi-variable optimization
4
![Page 5: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/5.jpg)
(1) Algorithmic structure of CNNs guides SnaPEA
. . . . . .
Pooling
Conv
ReLU
Norm
al.
Conv
ReLU
Conv
ReLU
Kernels
. . .
𝒘𝒌𝒋𝒊𝒙𝒌𝒋𝒊𝒊𝒋𝒌
5
![Page 6: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/6.jpg)
(1) Algorithmic structure of CNNs guides SnaPEA
. . . . . .
Pooling
Conv
ReLU
Norm
al.
Conv
ReLU
Conv
ReLU
Rectified Linear Unit (ReLU)
Kernels
. . .
𝒘𝒌𝒋𝒊𝒙𝒌𝒋𝒊𝒊𝒋𝒌
6
![Page 7: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/7.jpg)
Opportunity to reduce the computation
Large number of negative convolution outputs (61% on average)
0%
20%
40%
60%
80%
100%
AlexNet GoogLeNet SqueezeNet VGGNet Average
Neg
ativ
e In
puts
to th
e Ac
tivat
ion
Laye
rs
. . .
Conv
. . .
ReLU
GoogLeNet
Black pixels are zero values
7
![Page 8: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/8.jpg)
ReLU makes negative outputs zero: cut convolution short
Early termination of convolution
Blue boxes are the performed operations in two highlighted convolutions
InputOutput
Convolution
8
![Page 9: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/9.jpg)
GoogLeNet
. . . . . .
Pooling
Conv
ReLU
Norm
al.
Conv
ReLU
Conv
ReLU
(2) Runtime information enables reducing computation
Varying distribution of zero and non-zero outputs
Rectified Linear Unit (ReLU)
9
![Page 10: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/10.jpg)
SnaPEA: Principles
SnaPEA: Leveraging algorithmic structure of CNNs and runtime information
Reduce computation without accuracy loss
Trade accuracy for further computation reduction
Add minimal hardware overhead
10
![Page 11: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/11.jpg)
+ + + + + + + + + + + +X
+ - + + + - + - - - + -w
+ - - + + + + + - - + -PartialSum
ReLU 0
Original convolution
SnaPEA: An illustrative example
11
![Page 12: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/12.jpg)
+ + + + + + + + + + + +X
+ - + + + - + - - - + -w
+ - - + + + + + - - + -PartialSum
ReLU 0
+ + + + + + - - - - - -w
+ + + + + + + + + + + +X
+ + + + + + + - - - -PartialSum -
ReLU 0
Convolution in SnaPEA (Exact mode)
Original convolution
SnaPEA: An illustrative example (Exact mode)
12
![Page 13: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/13.jpg)
0
20
40
60
80
100
AlexNet GoogleNet SqueezeNet VGGNet
% N
egat
ive
Wei
ghts
Potential benefits in the exact mode
On average, 54% of the weights are negative13
![Page 14: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/14.jpg)
w
X
+ + +PartialSum
≤ th+ + +
+ - +
SnaPEA: An illustrative example (Predictive mode)
Convolution in SnaPEA (Predictive mode)
Partial Sum
+ + + + + + + + + + + +X
+ - + + + + + - - - - -w
+ + + + + + + + - - - -
Yes
Speculation operations
14
+ + + + + + + + + + + +X
+ - + + + + + - - - - -w
+ + + + + + + + - - - -Partial Sum
No
![Page 15: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/15.jpg)
Speculation operations
X* w x * wGroup 1 Group 2 Group n
Largeabsolute value
Large Small
n largest weights
largest weights from each group
Smallabsolute value
X* w x * w15
![Page 16: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/16.jpg)
Speculation parameters:
Th: Threshold N: Number of speculation operations
Find (Th, N) for all kernels in a CNNto minimize operations and satisfy the accuracy
Optimize the level of speculation
16
![Page 17: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/17.jpg)
… … … …All convolution kernels in a CNN
Layer 2 Layer LLayer 1
(Th,N)
Kernel Profiling
Local Optimization
Global Optimization
Optimize the level of speculation
17
![Page 18: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/18.jpg)
ThresholdValue
# ofOperations
… … … …Layer 2 Layer LLayer 1
Dog
Per kernel sensitivity analysis
… … … …All convolution kernels in a CNN
Layer 2 Layer LLayer 1
(Th,N)
Kernel Profiling
Local Optimization
Global Optimization
Optimize the level of speculation
18
![Page 19: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/19.jpg)
ThresholdValue
# ofOperations
… … … …Layer 2 Layer LLayer 1
Cat
Per kernel sensitivity analysis
… … … …All convolution kernels in a CNN
Layer 2 Layer LLayer 1
(Th,N)
Kernel Profiling
Local Optimization
Global Optimization
Optimize the level of speculation
19
![Page 20: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/20.jpg)
ThresholdValue
# ofOperations
Per kernel sensitivity analysis
… … … …Layer 2 Layer LLayer 1
Dog
… … … …All convolution kernels in a CNN
Layer 2 Layer LLayer 1
(Th,N)
Kernel Profiling
Local Optimization
Global Optimization
Optimize the level of speculation
20
![Page 21: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/21.jpg)
Set of configurations per layer
… … … …Layer 2 Layer LLayer 1
Dog
…Kernel mKernel 2Kernel 1
…Kernel mKernel 2Kernel 1 …
Dog
… … … …All convolution kernels in a CNN
Layer 2 Layer LLayer 1
(Th,N)
Kernel Profiling
Local Optimization
Global Optimization
Optimize the level of speculation
21
![Page 22: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/22.jpg)
… … … …Layer 2 Layer LLayer 1
…Kernel kKernel 2Kernel 1
…Kernel kKernel 2Kernel 1 …
Set of configurations per layer
Dog
Dog
… … … …All convolution kernels in a CNN
Layer 2 Layer LLayer 1
(Th,N)
Kernel Profiling
Local Optimization
Global Optimization
Optimize the level of speculation
22
![Page 23: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/23.jpg)
… … … …Layer 2 Layer LLayer 1
Cat…Layer L
Kernel kLayer 1
Kernel 2Layer 1
Kernel 1
Adjust parameters regarding the cross-layer effect
… … … …All convolution kernels in a CNN
Layer 2 Layer LLayer 1
(Th,N)
Kernel Profiling
Local Optimization
Global Optimization
Optimize the level of speculation
23
![Page 24: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/24.jpg)
… … … …Layer 2 Layer LLayer 1
Bird…Layer L
Kernel kLayer 1
Kernel 2Layer 1
Kernel 1
Adjust parameters regarding the cross-layer effect
… … … …All convolution kernels in a CNN
Layer 2 Layer LLayer 1
(Th,N)
Kernel Profiling
Local Optimization
Global Optimization
Optimize the level of speculation
24
![Page 25: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/25.jpg)
Adjust parameters regarding the cross-layer effect
… … … …Layer 2 Layer LLayer 1
Dog…Layer L
Kernel kLayer 1
Kernel 2Layer 1
Kernel 1
… … … …All convolution kernels in a CNN
Layer 2 Layer LLayer 1
(Th,N)
Kernel Profiling
Local Optimization
Global Optimization
Optimize the level of speculation
25
![Page 26: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/26.jpg)
SnaPEA: Hardware implementation
Add low-overhead sign checks and threshold checks to the hardware
… …
PE1,1 PE1,m
PEn,1 PEn,m
≤
Partial result
Terminate
Terminate
Sign-bit
Exact mode
Predictive mode
Threshold
…PAU
MAC
PAU PAU
…MAC MAC
PAU
MAC
PAU PAU
…MAC MAC
…PAU
MAC
PAU PAU
…MAC MAC
PAU
MAC
PAU PAU
…MAC MAC
Prediction Activation Unit (PAU)
26
![Page 27: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/27.jpg)
Compute Lane
Weight and In/Out Buffer
Index Buffer
K Compute Lanes
MAC
Prediction Activation Unit (PAU)
SnaPEA: Hardware implementation
Processing Engine (PE)
27
![Page 28: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/28.jpg)
CNN Model AlexNet2012
GoogLeNet2015
SqueezeNet2016
VGG-162014
Top-1 Accuracy 57.2% 68.7% 57.5% 70.5%
Top-5 Accuracy 80.1% 89.0% 80.3% 89.9%
Benchmarks
Optimization
Hardware implementation
Optimization algorithm built on top of Caffe
Simulation: Cycle accuratePower estimation: Design Compiler using TSMC 45 nmBaseline design: Eyeriss with the same number MAC units (256)SnaPEA area overhead compared to Eyeriss: 4.5%
28
Experimental setup
![Page 29: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/29.jpg)
1.27 1.34 1.29 1.24 1.281.37 1.45 1.44
1.261.38
1.54
2.02
1.52 1.511.63
1.892.08
1.83 1.81 1.81
0
0.5
1
1.5
2
2.5
AlexNet GoogleNet SqueezeNet VGGNet Geomean
Exact Predictive (accuracy loss <= 1%)Predictive (accuracy loss <= 2%) Predictive (accuracy loss <= 3%)
Spee
dup
over
Eye
riss
29
Experimental results
![Page 30: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/30.jpg)
Layers in the predictive mode for accuracy loss ≤ 3%
Network % of Conv Layers
Speedup Energy Improvement
AlexNet 60.0 2.11 1.97GoogleNet 84.2 2.14 2.04
SqueezeNet 65.4 1.94 1.84VGGNet 61.5 1.87 1.73
Experimental results
On average, 68% of layers operate inthe predictive mode (3% accuracy drop). 30
![Page 31: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/31.jpg)
Spee
dup
Highest speedup (3.6x) in a layer in GoogLeNet31
Experimental results
![Page 32: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM](https://reader033.vdocuments.mx/reader033/viewer/2022050515/5f9f2a4316cfb605ef7eb30c/html5/thumbnails/32.jpg)
SnaPEAExploit algorithmic structure and runtime information Reduce computations in convolutional layersControl the accuracy with multi-variable optimizationAdd minimal hardware overhead
Future directionsLeverage runtime information (e.g., patterns in inputs and activations)Expand to other activation functions (e.g., sigmoid)Tune up the hardware for more parallelism
Conclusion
32