welcome to iscaconf.orgtitle microsoft powerpoint - snapea-isca18.pptx author vakhl created date...

32
Vahideh Akhlaghi Amir Yazdanbakhsh Kambiz Samadi Rajesh K. Gupta Hadi Esmaeilzadeh SnaPEA: Predictive Early Activation for Reducing Computation In Deep Convolutional Neural Networks University of California, San Diego Georgia Institute of Technology Qualcomm Technologies, Inc Equal Contribution ISCA ’18 * *† *

Upload: others

Post on 08-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

Vahideh Akhlaghi Amir YazdanbakhshKambiz Samadi Rajesh K. Gupta Hadi Esmaeilzadeh

SnaPEA:Predictive Early Activation for Reducing Computation

In Deep Convolutional Neural Networks

University of California, San Diego †Georgia Institute of Technology

‡Qualcomm Technologies, Inc

Equal Contribution

ISCA ’18

* *†

*

Page 2: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

CNNs perform trillions of operations for one input

. . . Dog

GoogLeNet 283,000,000,000 Ops

SqueezeNet 222,000,000,000 Ops

AlexNet 1,147,000,000,000 OpsVGG-16 16,362,000,000,000 Ops

CNN models Operations for inference

2

Page 3: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

. . . Dog

GoogLeNet 283,000,000,000 Ops

SqueezeNet 222,000,000,000 Ops

AlexNet 1,147,000,000,000 OpsVGG-16 16,362,000,000,000 Ops

CNN models Operations for inference

≥ 90% of operations are for convolutional layers

Convolutions dominate CNN computation

3

Page 4: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

Research challenge:How to reduce CNN computation with minimal effect on accuracy?

Our solution: SnaPEA1. Leverage algorithmic structure2. Exploit runtime information3. Tune up with static multi-variable optimization

4

Page 5: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

(1) Algorithmic structure of CNNs guides SnaPEA

. . . . . .

Pooling

Conv

ReLU

Norm

al.

Conv

ReLU

Conv

ReLU

Kernels

. . .

𝒘𝒌𝒋𝒊𝒙𝒌𝒋𝒊𝒊𝒋𝒌

5

Page 6: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

(1) Algorithmic structure of CNNs guides SnaPEA

. . . . . .

Pooling

Conv

ReLU

Norm

al.

Conv

ReLU

Conv

ReLU

Rectified Linear Unit (ReLU)

Kernels

. . .

𝒘𝒌𝒋𝒊𝒙𝒌𝒋𝒊𝒊𝒋𝒌

6

Page 7: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

Opportunity to reduce the computation

Large number of negative convolution outputs (61% on average)

0%

20%

40%

60%

80%

100%

AlexNet GoogLeNet SqueezeNet VGGNet Average

Neg

ativ

e In

puts

to th

e Ac

tivat

ion

Laye

rs

. . .

Conv

. . .

ReLU

GoogLeNet

Black pixels are zero values

7

Page 8: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

ReLU makes negative outputs zero: cut convolution short

Early termination of convolution

Blue boxes are the performed operations in two highlighted convolutions

InputOutput

Convolution

8

Page 9: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

GoogLeNet

. . . . . .

Pooling

Conv

ReLU

Norm

al.

Conv

ReLU

Conv

ReLU

(2) Runtime information enables reducing computation

Varying distribution of zero and non-zero outputs

Rectified Linear Unit (ReLU)

9

Page 10: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

SnaPEA: Principles

SnaPEA: Leveraging algorithmic structure of CNNs and runtime information

Reduce computation without accuracy loss

Trade accuracy for further computation reduction

Add minimal hardware overhead

10

Page 11: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

+ + + + + + + + + + + +X

+ - + + + - + - - - + -w

+ - - + + + + + - - + -PartialSum

ReLU 0

Original convolution

SnaPEA: An illustrative example

11

Page 12: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

+ + + + + + + + + + + +X

+ - + + + - + - - - + -w

+ - - + + + + + - - + -PartialSum

ReLU 0

+ + + + + + - - - - - -w

+ + + + + + + + + + + +X

+ + + + + + + - - - -PartialSum -

ReLU 0

Convolution in SnaPEA (Exact mode)

Original convolution

SnaPEA: An illustrative example (Exact mode)

12

Page 13: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

0

20

40

60

80

100

AlexNet GoogleNet SqueezeNet VGGNet

% N

egat

ive

Wei

ghts

Potential benefits in the exact mode

On average, 54% of the weights are negative13

Page 14: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

w

X

+ + +PartialSum

≤ th+ + +

+ - +

SnaPEA: An illustrative example (Predictive mode)

Convolution in SnaPEA (Predictive mode)

Partial Sum

+ + + + + + + + + + + +X

+ - + + + + + - - - - -w

+ + + + + + + + - - - -

Yes

Speculation operations

14

+ + + + + + + + + + + +X

+ - + + + + + - - - - -w

+ + + + + + + + - - - -Partial Sum

No

Page 15: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

Speculation operations

X* w x * wGroup 1 Group 2 Group n

Largeabsolute value

Large Small

n largest weights

largest weights from each group

Smallabsolute value

X* w x * w15

Page 16: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

Speculation parameters:

Th: Threshold N: Number of speculation operations

Find (Th, N) for all kernels in a CNNto minimize operations and satisfy the accuracy

Optimize the level of speculation

16

Page 17: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

… … … …All convolution kernels in a CNN

Layer 2 Layer LLayer 1

(Th,N)

Kernel Profiling

Local Optimization

Global Optimization

Optimize the level of speculation

17

Page 18: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

ThresholdValue

# ofOperations

… … … …Layer 2 Layer LLayer 1

Dog

Per kernel sensitivity analysis

… … … …All convolution kernels in a CNN

Layer 2 Layer LLayer 1

(Th,N)

Kernel Profiling

Local Optimization

Global Optimization

Optimize the level of speculation

18

Page 19: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

ThresholdValue

# ofOperations

… … … …Layer 2 Layer LLayer 1

Cat

Per kernel sensitivity analysis

… … … …All convolution kernels in a CNN

Layer 2 Layer LLayer 1

(Th,N)

Kernel Profiling

Local Optimization

Global Optimization

Optimize the level of speculation

19

Page 20: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

ThresholdValue

# ofOperations

Per kernel sensitivity analysis

… … … …Layer 2 Layer LLayer 1

Dog

… … … …All convolution kernels in a CNN

Layer 2 Layer LLayer 1

(Th,N)

Kernel Profiling

Local Optimization

Global Optimization

Optimize the level of speculation

20

Page 21: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

Set of configurations per layer

… … … …Layer 2 Layer LLayer 1

Dog

…Kernel mKernel 2Kernel 1

…Kernel mKernel 2Kernel 1 …

Dog

… … … …All convolution kernels in a CNN

Layer 2 Layer LLayer 1

(Th,N)

Kernel Profiling

Local Optimization

Global Optimization

Optimize the level of speculation

21

Page 22: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

… … … …Layer 2 Layer LLayer 1

…Kernel kKernel 2Kernel 1

…Kernel kKernel 2Kernel 1 …

Set of configurations per layer

Dog

Dog

… … … …All convolution kernels in a CNN

Layer 2 Layer LLayer 1

(Th,N)

Kernel Profiling

Local Optimization

Global Optimization

Optimize the level of speculation

22

Page 23: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

… … … …Layer 2 Layer LLayer 1

Cat…Layer L

Kernel kLayer 1

Kernel 2Layer 1

Kernel 1

Adjust parameters regarding the cross-layer effect

… … … …All convolution kernels in a CNN

Layer 2 Layer LLayer 1

(Th,N)

Kernel Profiling

Local Optimization

Global Optimization

Optimize the level of speculation

23

Page 24: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

… … … …Layer 2 Layer LLayer 1

Bird…Layer L

Kernel kLayer 1

Kernel 2Layer 1

Kernel 1

Adjust parameters regarding the cross-layer effect

… … … …All convolution kernels in a CNN

Layer 2 Layer LLayer 1

(Th,N)

Kernel Profiling

Local Optimization

Global Optimization

Optimize the level of speculation

24

Page 25: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

Adjust parameters regarding the cross-layer effect

… … … …Layer 2 Layer LLayer 1

Dog…Layer L

Kernel kLayer 1

Kernel 2Layer 1

Kernel 1

… … … …All convolution kernels in a CNN

Layer 2 Layer LLayer 1

(Th,N)

Kernel Profiling

Local Optimization

Global Optimization

Optimize the level of speculation

25

Page 26: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

SnaPEA: Hardware implementation

Add low-overhead sign checks and threshold checks to the hardware

… …

PE1,1 PE1,m

PEn,1 PEn,m

Partial result

Terminate

Terminate

Sign-bit

Exact mode

Predictive mode

Threshold

…PAU

MAC

PAU PAU

…MAC MAC

PAU

MAC

PAU PAU

…MAC MAC

…PAU

MAC

PAU PAU

…MAC MAC

PAU

MAC

PAU PAU

…MAC MAC

Prediction Activation Unit (PAU)

26

Page 27: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

Compute Lane

Weight and In/Out Buffer

Index Buffer

K Compute Lanes

MAC

Prediction Activation Unit (PAU)

SnaPEA: Hardware implementation

Processing Engine (PE)

27

Page 28: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

CNN Model AlexNet2012

GoogLeNet2015

SqueezeNet2016

VGG-162014

Top-1 Accuracy 57.2% 68.7% 57.5% 70.5%

Top-5 Accuracy 80.1% 89.0% 80.3% 89.9%

Benchmarks

Optimization

Hardware implementation

Optimization algorithm built on top of Caffe

Simulation: Cycle accuratePower estimation: Design Compiler using TSMC 45 nmBaseline design: Eyeriss with the same number MAC units (256)SnaPEA area overhead compared to Eyeriss: 4.5%

28

Experimental setup

Page 29: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

1.27 1.34 1.29 1.24 1.281.37 1.45 1.44

1.261.38

1.54

2.02

1.52 1.511.63

1.892.08

1.83 1.81 1.81

0

0.5

1

1.5

2

2.5

AlexNet GoogleNet SqueezeNet VGGNet Geomean

Exact Predictive (accuracy loss <= 1%)Predictive (accuracy loss <= 2%) Predictive (accuracy loss <= 3%)

Spee

dup

over

Eye

riss

29

Experimental results

Page 30: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

Layers in the predictive mode for accuracy loss ≤ 3%

Network % of Conv Layers

Speedup Energy Improvement

AlexNet 60.0 2.11 1.97GoogleNet 84.2 2.14 2.04

SqueezeNet 65.4 1.94 1.84VGGNet 61.5 1.87 1.73

Experimental results

On average, 68% of layers operate inthe predictive mode (3% accuracy drop). 30

Page 31: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

Spee

dup

Highest speedup (3.6x) in a layer in GoogLeNet31

Experimental results

Page 32: Welcome to Iscaconf.orgTitle Microsoft PowerPoint - snapea-isca18.pptx Author vakhl Created Date 6/7/2018 1:31:46 PM

SnaPEAExploit algorithmic structure and runtime information Reduce computations in convolutional layersControl the accuracy with multi-variable optimizationAdd minimal hardware overhead

Future directionsLeverage runtime information (e.g., patterns in inputs and activations)Expand to other activation functions (e.g., sigmoid)Tune up the hardware for more parallelism

Conclusion

32