
Page 1: Introduction to Neural Networks

Ritchie Zhao, Zhiru Zhang
School of Electrical and Computer Engineering

ECE 5775 (Fall’17)
High-Level Digital Design Automation

Page 2: Rise of the Machines

▸ Neural networks have revolutionized the world
  – Self-driving vehicles
  – Advanced image recognition
  – Game AI

Page 3: Rise of the Machines

▸ Neural networks have revolutionized research

Page 4: A Brief History

▸ 1950’s: First artificial neural networks created based on biological structures in the human visual cortex

▸ 1980’s – 2000’s: NNs considered inferior to other, simpler algorithms (e.g. SVM, logistic regression)

▸ Mid 2000’s: NN research considered “dead”, machine learning conferences outright reject most NN papers

▸ 2010 – 2012: NNs begin winning large-scale image and document classification contests, beating other methods

▸ 2012 – Now: NNs prove themselves in many industrial applications (web search, translation, image analysis)


Page 5: Classification Problems

▸ We’ll discuss neural networks for solving supervised classification problems

[Figure: a training set of labeled images (Cat, Bird, Person) and a test set of new inputs whose labels must be predicted]

• Given a training set consisting of labeled inputs

• Learn a function which maps inputs to labels

• This function is used to predict the labels of new inputs

Page 6: A Simple Classification Problem

▸ Inputs: pairs of numbers (x₁, x₂)
▸ Labels: 0 or 1

▸ This is a binary decision problem

▸ Real-life analogue:
  – Label = Raining or Not Raining
  – x₁ = relative humidity
  – x₂ = cloud coverage

Training Set

   x₁     x₂    Label
  -0.7   -0.6   0
  -0.6    0.4   0
  -0.5    0.7   1
  -0.5   -0.2   0
  -0.1    0.7   1
   0.1   -0.2   0
   0.3    0.5   1
   0.4    0.1   1
   0.4   -0.7   0
   0.7    0.2   1

Page 7: Visualizing the Data

[Figure: scatter plot of the training set in the (x₁, x₂) plane, with points marked by their labels]

Page 8: Decision Function

▸ In this case, the data points can be classified with a linear decision boundary

[Figure: the scatter plot with a linear decision boundary separating the two classes]

Page 9: The Perceptron

▸ The perceptron is the simplest possible neural network, containing only one neuron

▸ Described by the following equation:

  y = σ( Σᵢ wᵢxᵢ + b )

  wᵢ = weights
  b = bias
  σ = activation function
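As a concrete illustration (not part of the original slides), a minimal C sketch of a 2-input perceptron with a sigmoid activation could look like the following; the weights reuse the values from the backpropagation example later in the deck, and the function names are my own.

#include <math.h>
#include <stdio.h>

/* Sigmoid activation: squashes any real value into (0, 1). */
static double sigmoid(double s) {
    return 1.0 / (1.0 + exp(-s));
}

/* 2-input perceptron: y = sigma(w1*x1 + w2*x2 + b). */
static double perceptron(double x1, double x2,
                         double w1, double w2, double b) {
    double s = w1 * x1 + w2 * x2 + b;   /* linear part */
    return sigmoid(s);                  /* non-linear "squash" */
}

int main(void) {
    /* Training input (-0.7, -0.6) and the weights used in the
       backpropagation example on the later slides.             */
    double y = perceptron(-0.7, -0.6, 5.2, -8.7, 1.4);
    printf("predicted probability of label 1: %.2f\n", y);   /* ~0.95 */
    return 0;
}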

Page 10: Visualizing the Perceptron

▸ A 2-input, 1-output perceptron:

[Figure: dataflow diagram of the perceptron. Inputs x₁ and x₂ are scaled by weights w₁ and w₂ and summed with the bias b (a linear function of x⃗ = (x₁, x₂), namely Wx⃗ + b). The sum passes through the non-linear function σ, which “squashes” the output to the interval (0,1). The output y is the probability that the label should be “1”.]

Page 11: Activation Function

▸ The activation function σ is non-linear:

  Unit Step: discontinuous at origin, zero derivative everywhere
  Sigmoid: 1 / (1 + e⁻ˣ), continuous and differentiable everywhere
  ReLU: max(0, x), continuous but non-differentiable at the origin

▸ First perceptrons used Unit Step. Early neural networks used Sigmoid. Modern deep networks use ReLU.
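For reference, the three activation functions translate directly into C (a sketch I added; not from the slides):

#include <math.h>

double unit_step(double x) {          /* discontinuous at the origin */
    return x >= 0.0 ? 1.0 : 0.0;
}

double sigmoid(double x) {            /* smooth and differentiable everywhere */
    return 1.0 / (1.0 + exp(-x));
}

double relu(double x) {               /* continuous, but has a kink at the origin */
    return x > 0.0 ? x : 0.0;
}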

Page 12: The Perceptron Decision Boundary

▸ Let’s plot the 2-input perceptron (sigmoid activation):
  y = σ(w₁x₁ + w₂x₂ + b)

[Figure: plots of y for (w₁, w₂) = (10, 10), (5, 20), and (20, 5)]

The ratio of the weights changes the direction of the decision boundary

Page 13: The Perceptron Decision Boundary

▸ Let’s plot the 2-input perceptron (sigmoid activation):
  y = σ(w₁x₁ + w₂x₂ + b)

[Figure: plots of y for (w₁, w₂) = (10, 10), (5, 5), and (2, 2)]

The magnitude of the weights changes the steepness of the decision boundary

Page 14: The Perceptron Decision Boundary

▸ Let’s plot the 2-input perceptron (sigmoid activation):
  y = σ(w₁x₁ + w₂x₂ + b)

[Figure: plots of y for b = 0, b = 5, and b = −5]

The bias moves the boundary away from the origin

Page 15: Finding the Weights and Bias

▸ With the right weights and bias we can create any (linear) decision boundary we want!

▸ But how do we compute the weights and bias?

▸ Solution: backpropagation training


Page 16: Backpropagation

▸ Let’s take the 2-input perceptron and initialize the weights and bias randomly

[Figure: the 2-input perceptron with w₁ = 5.2, w₂ = −8.7, b = 1.4]

Inspired by course slides from L. Fei-Fei, J. Johnson, S. Yeung, CS231n at Stanford University

Page 17: Backpropagation

▸ Send one input pair through and calculate the output

[Figure: forward pass for the training sample (x₁, x₂) = (−0.7, −0.6) with label 0. The weighted sum w₁x₁ + w₂x₂ = 1.58; adding b gives s = 2.98; σ(2.98) = 0.95. Output: 0.95, Target: 0.0]

Page 18: Backpropagation Training

▸ How to change {w₁, w₂, b} to move y towards the target?

▸ Answer: the partial derivatives ∂y/∂w₁, ∂y/∂w₂, ∂y/∂b

[Figure: the same forward pass (s = 2.98, output 0.95, target 0.0)]

Page 19: Backpropagation

▸ We can calculate the derivatives function-by-function
▸ Green = partial derivative (in the figure)

[Figure: the computation graph annotated with ∂y/∂y = 1.0 at the output and ∂y/∂s = 0.046 at the input of σ]

  y = σ(s) = 1 / (1 + e⁻ˢ)

  ∂y/∂s = e⁻ˢ / (1 + e⁻ˢ)² = 0.046
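A detail the slide does not state explicitly: the same derivative can be written in terms of the sigmoid’s own output, which is how it is usually evaluated in code,

  ∂y/∂s = e⁻ˢ / (1 + e⁻ˢ)² = σ(s)(1 − σ(s)) ≈ 0.95 × 0.05 ≈ 0.046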

Page 20: Backpropagation

▸ We can calculate the derivatives function-by-function
▸ Propagate partial derivatives backwards

  s = w₁x₁ + w₂x₂ + b

  ∂s/∂b = 1,  ∂s/∂w₁ = x₁,  ∂s/∂w₂ = x₂

[Figure: the backward pass annotates the graph with ∂y/∂b = 0.046, ∂y/∂w₁ = −0.032, ∂y/∂w₂ = −0.028]

Page 21: Backpropagation

▸ What we’ve done is apply the chain rule of derivatives

  ∂y/∂w₁ = (∂y/∂s) · (∂s/∂w₁)

  ∂y/∂s = 0.046,  ∂s/∂w₁ = −0.7,  so ∂y/∂w₁ ≈ −0.032

[Figure: the same graph with the backpropagated values 0.046, −0.032, −0.028]

Page 22: Backpropagation

▸ In general, any operator in a neural network defines a forwards pass and a backwards pass

[Figure: an operator f with inputs x and y. The forward pass computes f(x, y); the backward pass receives ∂E/∂f from downstream and produces]

  ∂E/∂x = (∂E/∂f) · (∂f/∂x),  ∂E/∂y = (∂E/∂f) · (∂f/∂y)

▸ Backpropagation allows us to build complex networks and still be able to update the weights
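To make the numbers on the last few slides concrete, here is a small C sketch (my addition) that runs the forward and backward pass for the example perceptron; it should reproduce the gradients 0.046, −0.032, and −0.028.

#include <math.h>
#include <stdio.h>

int main(void) {
    /* Weights, bias, and training input from the slides' example. */
    double w1 = 5.2, w2 = -8.7, b = 1.4;
    double x1 = -0.7, x2 = -0.6;

    /* Forward pass: s = w1*x1 + w2*x2 + b, y = sigmoid(s). */
    double s = w1 * x1 + w2 * x2 + b;                  /* 2.98 */
    double y = 1.0 / (1.0 + exp(-s));                  /* 0.95 */

    /* Backward pass (chain rule); we minimize y directly since target = 0. */
    double dy_ds  = exp(-s) / pow(1.0 + exp(-s), 2.0); /*  0.046 */
    double dy_dw1 = dy_ds * x1;                        /* -0.032 */
    double dy_dw2 = dy_ds * x2;                        /* -0.028 */
    double dy_db  = dy_ds * 1.0;                       /*  0.046 */

    printf("y=%.2f  dy/dw1=%.3f  dy/dw2=%.3f  dy/db=%.3f\n",
           y, dy_dw1, dy_dw2, dy_db);
    return 0;
}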

Page 23: Weight Update

▸ We now know how to compute ∂y/∂w₁, ∂y/∂w₂, ∂y/∂b

▸ Parameter update rule (for minimizing y):

  w = w − η · ∂y/∂w        η = learning rate or step size

▸ This is optimization by gradient descent: we step against the gradient, in the direction of steepest descent
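A single update step on the example gradients might then look like the following sketch; η = 0.5 is an arbitrary value I chose for illustration.

#include <stdio.h>

int main(void) {
    double w1 = 5.2, w2 = -8.7, b = 1.4;              /* current parameters        */
    double g1 = -0.032, g2 = -0.028, gb = 0.046;      /* gradients computed above  */
    double eta = 0.5;                                 /* learning rate (made up)   */

    /* w = w - eta * dy/dw: step against the gradient to reduce y. */
    w1 -= eta * g1;
    w2 -= eta * g2;
    b  -= eta * gb;

    printf("w1=%.3f  w2=%.3f  b=%.3f\n", w1, w2, b);  /* 5.216, -8.686, 1.377 */
    return 0;
}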

Page 24: Practical Differences

▸ For simplicity we minimized y, since the label t = 0
▸ In practice we use a loss function L(y, t)
  – Square Loss: L(y, t) = (y − t)²
  – Cross Entropy: L(y, t) = −t ln y − (1 − t) ln(1 − y)

▸ For simplicity we dealt with 1 training sample
▸ In practice we process multiple training samples (a minibatch) and sum the gradients before updating the weights
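A sketch of how the two loss functions and the minibatch update could be coded (helper names are my own, not from the slides); dL/dw then follows from the chain rule as dL/dy · dy/dw:

/* Square loss: L = (y - t)^2, so dL/dy = 2*(y - t). */
double square_loss_grad(double y, double t) {
    return 2.0 * (y - t);
}

/* Cross entropy: L = -t*ln(y) - (1-t)*ln(1-y), so dL/dy = -t/y + (1-t)/(1-y). */
double cross_entropy_grad(double y, double t) {
    return -t / y + (1.0 - t) / (1.0 - y);
}

/* Minibatch update: sum the per-sample gradients, then take one step. */
void sgd_step(double *w, const double *grad_per_sample, int batch, double eta) {
    double g = 0.0;
    for (int i = 0; i < batch; i++)
        g += grad_per_sample[i];   /* accumulate gradients over the minibatch */
    *w -= eta * g;                 /* gradient-descent update */
}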

Page 25: Deep Neural Network

▸ The perceptron’s linear decision boundary is not very useful for complex problems
▸ A deep neural network (DNN) consists of many layers, each containing multiple neurons

[Figure: a fully connected network with an input layer, several hidden layers, and an output layer]

Image credit: http://www.opennn.net/

Page 26: What Happens in a DNN?

▸ Each hidden unit learns a hidden feature
▸ Later features depend on earlier features in the network, so they get more complex

[Figure: the same network, annotated from simple features in the early layers to more complex features in the later layers]

Image credit: http://www.opennn.net/

Page 27: Deep Neural Network

▸ A deep neural network creates highly non-linear decision boundaries

[Figure: decision boundaries learned with hidden layers of 2, 3, and 4 units]

Image credit: https://www.carl-olsson.com/fall-semester-2013/

Page 28: Convolutional Neural Network

▸ A convolutional neural network (CNN) is a neural network specifically built to analyze images
▸ State-of-the-art in image classification
▸ Key difference: the layers at the front are convolutional layers

[Figure: a CNN pipeline with convolutional layers at the front followed by fully-connected layers]

Image credit: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/

Page 29: The Convolutional Layer

Fully-connected Layer
1. An output neuron is connected to each input neuron

Convolutional Layer
1. Inputs and outputs are vectors of 2D feature maps
2. An output pixel is connected to its neighboring region on each input feature map
3. All pixels on an output feature map use the same weights

Image credit: https://inspirehep.net/record/1501944/plots

Page 30: The Convolutional Layer

▸ The conv layer moves a weight filter across the input map, doing a dot product at each location

▸ The weight filter detects local features in the input map

[Figure: a filter sliding over the input feature map to produce the output feature map]

Image credit: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
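Before the accelerator code on the later slides, here is a minimal single-map convolution sketch (stride 1, no padding) that I added to show the sliding dot product; the dimensions are illustrative.

#define H 8   /* input height  */
#define W 8   /* input width   */
#define K 3   /* filter size   */

/* Slide a KxK filter over one input feature map to produce one output map. */
void conv2d(const float in[H][W], const float filt[K][K],
            float out[H - K + 1][W - K + 1]) {
    for (int row = 0; row <= H - K; row++) {
        for (int col = 0; col <= W - K; col++) {
            float acc = 0.0f;
            /* dot product between the filter and the KxK input window */
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    acc += filt[i][j] * in[row + i][col + j];
            out[row][col] = acc;
        }
    }
}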

Page 31: Intuition on CNNs

▸ Convolutional layers detect visual features
▸ This shows features in the input image which cause high activity, across 3 different conv layers in a CNN

Image credit: https://devblogs.nvidia.com/parallelforall/accelerate-machine-learning-cudnn-deep-neural-network-library/; H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks”, CACM Oct 2011

Page 32: Accelerating CNNs on FPGA

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
Chen Zhang¹, Peng Li², Guangyu Sun¹˒³, Yijin Guan¹, Bingjun Xiao², Jason Cong²˒³˒¹

¹ Center for Energy-Efficient Computing and Applications, Peking University
² Computer Science Department, University of California, Los Angeles
³ PKU/UCLA Joint Research Institute in Science and Engineering
FPGA’15, Feb 2015

Page 33: Convolutional Layer Code

1 for(row=0; row<R; row++) {          // loop over output pixels
2  for(col=0; col<C; col++) {
3   for(to=0; to<M; to++) {           // loop over output feature maps
4    for(ti=0; ti<N; ti++) {          // loop over input feature maps
5     for(i=0; i<K; i++) {            // loop over filter pixels
6      for(j=0; j<K; j++) {
        output_fm[to][row][col] +=
            weights[to][ti][i][j] * input_fm[ti][S*row+i][S*col+j];
}}}}}}

[Figure: N input feature maps, M output feature maps of size R×C, and K×K filters]

Page 34: Challenges for FPGA

[Figure: accelerator template with a computation resource (PE-1 … PE-n), on-chip memory (buffer1, buffer2) connected through an interconnect, and external memory over an off-chip bus]

▸ We can’t just unroll all the loops due to limited FPGA resources
▸ Must choose the right pragmas and code transformations

  Limited compute and routing resources  →  Solution: loop pipelining
  Limited on-chip storage                →  Solution: loop tiling
  Limited off-chip bandwidth             →  Solution: data reuse, compute-memory balance

Page 35: Loop Tiling

1 for(row=0; row<R; row++) {          // loop over pixels in an output map
2  for(col=0; col<C; col++) {
3   for(to=0; to<M; to++) {
4    for(ti=0; ti<N; ti++) {
5     for(i=0; i<K; i++) {
6      for(j=0; j<K; j++) {
        output_fm[to][row][col] +=
            weights[to][ti][i][j] * input_fm[ti][S*row+i][S*col+j];
}}}}}}

[Figure: the same N input maps, M output maps (R×C), and K×K filters]

Page 36: Loop Tiling

1 for(row=0; row<R; row+=Tr) {                    // loop over different tiles in the output map
2  for(col=0; col<C; col+=Tc) {
3   for(to=0; to<M; to++) {
4    for(ti=0; ti<N; ti++) {
5     for(trr=row; trr<min(row+Tr, R); trr++) {   // loops over pixels in each tile
6      for(tcc=col; tcc<min(col+Tc, C); tcc++) {
7       for(i=0; i<K; i++) {
8        for(j=0; j<K; j++) {
          output_fm[to][trr][tcc] +=
              weights[to][ti][i][j] * input_fm[ti][S*trr+i][S*tcc+j];
}}}}}}}}

[Figure: the output feature maps are partitioned into Tr×Tc tiles]

Page 37: Code with Loop Tiling

// Software portion: loop over tiles, maximize memory reuse
1  for(row=0; row<R; row+=Tr) {
2   for(col=0; col<C; col+=Tc) {
3    for(to=0; to<M; to+=Tm) {
4     for(ti=0; ti<N; ti+=Tn) {
       // software: write output feature map
       // software: write input feature map + filters

       // FPGA portion: compute one tile, maximize computational performance
5      for(trr=row; trr<min(row+Tr, R); trr++) {
6       for(tcc=col; tcc<min(col+Tc, C); tcc++) {
7        for(too=to; too<min(to+Tm, M); too++) {
8         for(tii=ti; tii<min(ti+Tn, N); tii++) {
9          for(i=0; i<K; i++) {
10          for(j=0; j<K; j++) {
             output_fm[too][trr][tcc] +=
                 weights[too][tii][i][j] * input_fm[tii][S*trr+i][S*tcc+j];
}}}}}}
       // software: read output feature map
}}}}

Page 38: Optimizing for On-Chip Performance

Starting from the FPGA portion of the tiled code, apply three transformations:
1. Reorder the two filter loops (9 and 10, over i and j) to the top level
2. Pipeline the loop over tile pixels (#pragma HLS pipeline)
3. Unroll the too and tii loops (#pragma HLS unroll)

9   for(i=0; i<K; i++) {
10   for(j=0; j<K; j++) {
5     for(trr=row; trr<min(row+Tr, R); trr++) {
6      for(tcc=col; tcc<min(col+Tc, C); tcc++) {
       #pragma HLS pipeline
7       for(too=to; too<min(to+Tm, M); too++) {
        #pragma HLS unroll
8        for(tii=ti; tii<min(ti+Tn, N); tii++) {
         #pragma HLS unroll
          output_fm[too][trr][tcc] +=
              weights[too][tii][i][j] * input_fm[tii][S*trr+i][S*tcc+j];
}}}}}}

Generated hardware:
– Performance is determined by the tile factors Tn and Tm (the unrolled loops)
– The amount of data transfer is determined by Tr and Tc

Page 39: Design Space Complexity

▸ Challenge: the available optimizations present a huge space of possible designs
  – Optimal loop order? Tile size for each loop?

▸ Implementing and testing each design by hand will be slow and error-prone
  – Some designs exceed the on-chip resources

▸ Solution: performance modeling + automated design space exploration

Page 40: Performance Modeling

▸ We calculate the following design metrics:
  – Total number of operations (FLOP)
    • Depends on the CNN model parameters
  – Total external memory access (Byte)
    • Depends on the CNN weight and activation size
  – Total execution time (s)
    • Depends on the hardware architecture (e.g. tile factors Tm and Tn)
    • Ignore resource constraints for now

▸ Execution Time = Number of Cycles × Clock Period

▸ Number of Cycles ≈ (M/Tm) × (N/Tn) × R × C × K²
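A small C sketch of this model (my addition; the layer shape and tile factors below are illustrative values, not taken from the slides):

#include <math.h>
#include <stdio.h>

int main(void) {
    double M = 256, N = 192, R = 13, C = 13, K = 3;   /* conv layer shape (example) */
    double Tm = 64, Tn = 7;                           /* tile factors (example)     */
    double clock_period_ns = 10.0;                    /* 100 MHz clock              */

    double ops    = 2.0 * M * N * R * C * K * K;      /* one multiply + one add per MAC */
    double cycles = ceil(M / Tm) * ceil(N / Tn) * R * C * K * K;
    double time_s = cycles * clock_period_ns * 1e-9;

    printf("ops: %.2f GFLOP, cycles: %.0f, time: %.3f ms, throughput: %.1f GFLOP/s\n",
           ops * 1e-9, cycles, time_s * 1e3, ops / time_s * 1e-9);
    return 0;
}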

Page 41: Performance Modeling

[Figure: the accelerator template again, with PEs, on-chip buffers, and external memory over the off-chip bus]

Computational Throughput (GFLOP/s)
  = Total number of operations / Total execution time

Computation To Communication (CTC) Ratio (FLOP / Byte accessed)
  = Total number of operations / Total external memory access

Required Bandwidth (GByte/s)
  = (GFLOP/s) / (FLOP / Byte accessed)
  = Computational Throughput / CTC Ratio
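These three metrics map directly to code; a sketch with names of my own choosing:

typedef struct {
    double total_ops;       /* FLOP                      */
    double total_bytes;     /* external memory accessed  */
    double total_time_s;    /* execution time in seconds */
} design_point;

double throughput_gflops(design_point d) {        /* GFLOP/s */
    return d.total_ops / d.total_time_s * 1e-9;
}

double ctc_ratio(design_point d) {                /* FLOP per byte accessed */
    return d.total_ops / d.total_bytes;
}

double required_bandwidth_gbs(design_point d) {   /* GByte/s = throughput / CTC */
    return throughput_gflops(d) / ctc_ratio(d);
}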

Page 42: Roofline Method

[Figure: a single design plotted with Computational Throughput (GFLOP/s) on the y-axis and Computation To Communication (CTC) Ratio (FLOP/Byte) on the x-axis; the slope from the origin to the design point is its Required Bandwidth (GByte/s)]

Page 43: Roofline Method

[Figure: the roofline plot. The Computational Roof is the max device throughput and the Bandwidth Roof is set by the device memory bandwidth; together they form “the Roofline”. Designs above the computational roof exceed on-chip resources. A bandwidth-bottlenecked design has a theoretical throughput above the bandwidth roof, so its real throughput falls back onto the roof.]
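One way to read the roofline in code (a sketch under my own naming; the numbers in main are made up): a design’s attainable throughput is capped by its own theoretical throughput, the computational roof, and the bandwidth roof scaled by its CTC ratio.

#include <stdio.h>

static double min2(double a, double b) { return a < b ? a : b; }

double attainable_gflops(double theoretical_gflops,   /* design's own throughput  */
                         double ctc_ratio,            /* FLOP per byte            */
                         double comp_roof_gflops,     /* max device throughput    */
                         double bw_roof_gbs) {        /* device memory bandwidth  */
    double bw_limit = ctc_ratio * bw_roof_gbs;        /* what the memory can feed */
    return min2(theoretical_gflops, min2(comp_roof_gflops, bw_limit));
}

int main(void) {
    /* A bandwidth-bottlenecked example: min(120, 100, 8*4.5) = 36 GFLOP/s. */
    printf("%.1f GFLOP/s attainable\n", attainable_gflops(120.0, 8.0, 100.0, 4.5));
    return 0;
}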

Page 44: Design Space Exploration

[Figure: the candidate designs plotted as points of Computational Throughput vs. Computation To Communication (CTC) Ratio, with the Computational Roof and Bandwidth Roof drawn in; two designs are highlighted, B near the Computational Roof and A near the Bandwidth Roof]

Page 45: Experimental Evaluation

▸ The accelerator design is implemented with Vivado HLS on the VC707 board containing a Virtex7 FPGA

▸ Hardware resource utilization

  Resource      DSP    BRAM   LUT      FF
  Used          2240   1024   186251   205704
  Available     2800   2060   303600   607200
  Utilization   80%    50%    61%      34%

▸ Comparison of theoretical and real performance

  Layer 2   Theoretical   Measured   Difference
            5.11 ms       5.25 ms    ~5%

Page 46: Experimental Results

[Figure: two bar charts comparing 1 thread -O3, 16 threads -O3, and the FPGA. Speedup: the FPGA is roughly 18x faster than 1 CPU thread and 5x faster than 16 threads. Energy: roughly 90x and 20x lower than 1 thread and 16 threads, respectively.]

CPU: Xeon E5-2430 (32nm), 16 cores, 2.2 GHz, gcc 4.7.2 -O3, OpenMP 3.0
FPGA: Virtex7-485t (28nm), 448 PEs, 100 MHz, Vivado 2013.4, Vivado HLS 2013.4

Page 47: FIN