Optimizing FPGA Accelerator Design for Deep Convolutional Neural Networks
By: Mohamad Kanafanai

Post on 19-Dec-2015

Page 2: Outline

Introduction
Background
Methodology
Results
Evaluation of the system
Criticism
Q&A

Page 3: Introduction

A CNN is an extension of the artificial neural network
Applications include image processing
Requires high-performance computation hardware
Design exploration is a must!

Page 4: What are Deep Convolutional Neural Networks?

A type of machine learning
8 steps
Limitations
Feed-forward computation

Page 5: Roofline model

Provides a graphical representation of performance and productivity
◦ Rates and efficiencies (GFLOPS, % of peak)
◦ Limitations
◦ Benefits
Focus
◦ Computation
◦ Communication
◦ Locality
Not for fine-tuning
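
The bound that a roofline plot draws can be sketched in a few lines. This is a minimal illustration, not taken from the slides: the function names and the example numbers (a 100 GFLOPS compute roof and 4 GB/s of off-chip bandwidth) are assumptions chosen only to show the two regimes.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline bound: performance is capped by the compute roof or by
    bandwidth times the computation-to-communication ratio, whichever is lower."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def is_memory_bound(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """True when the bandwidth slope, not the compute roof, limits the design."""
    return bandwidth_gb_s * flops_per_byte < peak_gflops


if __name__ == "__main__":
    # Hypothetical platform numbers, for illustration only.
    peak, bw = 100.0, 4.0
    for ratio in (10.0, 50.0):
        bound = attainable_gflops(peak, bw, ratio)
        print(ratio, bound, is_memory_bound(peak, bw, ratio))
```

A design with a low ratio of operations per byte sits on the sloped part of the roof, so improving data reuse helps more than adding compute units.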

Page 6: Types of data

Irrelevant
Independent
Dependent

Page 7: Double buffering

Allows for two-way communication
Increases throughput
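
The throughput gain from double buffering can be illustrated with a simple timing model and a ping-pong loop. This is a sketch under assumptions: the function names are hypothetical, the per-tile load and compute times are taken to be constant, and in Python the "overlap" is only modeled, whereas in hardware the load of the next tile would run concurrently with the computation.

```python
def total_time_single_buffer(n_tiles, t_load, t_compute):
    """One buffer: each tile is loaded, then computed, with no overlap."""
    return n_tiles * (t_load + t_compute)


def total_time_double_buffer(n_tiles, t_load, t_compute):
    """Two buffers: loading tile i+1 overlaps with computing tile i."""
    if n_tiles == 0:
        return 0
    return t_load + (n_tiles - 1) * max(t_load, t_compute) + t_compute


def process_tiles(tiles, compute):
    """Ping-pong between two buffers while walking over the tiles."""
    results = []
    if not tiles:
        return results
    buffers = [None, None]
    buffers[0] = list(tiles[0])  # prefetch the first tile
    for i in range(len(tiles)):
        cur, nxt = i % 2, (i + 1) % 2
        if i + 1 < len(tiles):
            # In hardware this load would proceed in parallel with compute.
            buffers[nxt] = list(tiles[i + 1])
        results.append(compute(buffers[cur]))
    return results
```

With 4 tiles, a load time of 3 and a compute time of 5, the model gives 32 time units for a single buffer and 23 with two buffers.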

Page 8: Main concerns

Communication overhead
Buffer management
Bandwidth optimization
Better utilization of the FPGA

Page 9: Design exploration

Computation
◦ Loop scheduling
◦ Loop tile sizes
Communication ratio
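
Exploring loop tile sizes amounts to enumerating candidates and scoring each one. The sketch below is an illustration under assumptions, not the method from the slides: it uses a simplified cycle estimate for a convolution layer with M output maps, N input maps, R x C outputs and K x K kernels, and a hypothetical `max_parallel_macs` budget standing in for the FPGA's resource limit.

```python
from math import ceil


def conv_cycles(M, N, R, C, K, Tm, Tn):
    """Simplified cycle estimate when Tm output maps and Tn input maps
    are processed in parallel (assumed model, ignores memory stalls)."""
    return ceil(M / Tm) * ceil(N / Tn) * R * C * K * K


def explore(M, N, R, C, K, max_parallel_macs):
    """Return (cycles, Tm, Tn) with the fewest estimated cycles whose
    Tm * Tn parallel multiply-accumulates fit in the assumed budget."""
    best = None
    for Tm in range(1, M + 1):
        for Tn in range(1, N + 1):
            if Tm * Tn > max_parallel_macs:
                break  # larger Tn only increases resource use
            candidate = (conv_cycles(M, N, R, C, K, Tm, Tn), Tm, Tn)
            if best is None or candidate < best:
                best = candidate
    return best
```

A fuller exploration would also score each candidate's communication ratio and discard points that fall above the bandwidth slope of the roofline.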

Page 10: Directives: loop pipelining

Software pipelining
Increases throughput
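
Why pipelining increases throughput can be seen from a cycle-count model. This is a rough sketch with hypothetical function names, assuming a fixed per-iteration latency and a constant initiation interval (the number of cycles between the starts of consecutive iterations).

```python
def sequential_cycles(trip_count, iteration_latency):
    """Each iteration finishes before the next one starts."""
    return trip_count * iteration_latency


def pipelined_cycles(trip_count, iteration_latency, initiation_interval=1):
    """A new iteration starts every `initiation_interval` cycles, so only
    the first iteration pays the full latency."""
    if trip_count == 0:
        return 0
    return iteration_latency + (trip_count - 1) * initiation_interval


if __name__ == "__main__":
    # 100 iterations, each 5 cycles deep.
    print(sequential_cycles(100, 5))    # 500
    print(pipelined_cycles(100, 5, 1))  # 104
```

Loop-carried dependencies can force a larger initiation interval, which is why the achievable interval matters more than the depth of the pipeline for long loops.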

Page 11: Directives: loop unrolling

Maximizes computation
Data-flow design
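
Unrolling can be shown by hand on a dot product. This is a minimal sketch with an assumed unroll factor of 4 and hypothetical function names; in Python the four products still run one after another, but in an HLS flow they could map to four parallel multipliers feeding an adder tree.

```python
def dot(a, b):
    """Reference loop: one multiply-accumulate per iteration."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_unrolled4(a, b):
    """Same computation with the loop body replicated four times."""
    n = len(a)
    main = n - n % 4  # largest multiple of 4 not exceeding n
    total = 0
    for i in range(0, main, 4):
        # Four independent products, summed as a small adder tree.
        p0 = a[i] * b[i]
        p1 = a[i + 1] * b[i + 1]
        p2 = a[i + 2] * b[i + 2]
        p3 = a[i + 3] * b[i + 3]
        total += (p0 + p1) + (p2 + p3)
    for i in range(main, n):  # leftover iterations
        total += a[i] * b[i]
    return total
```

The remainder loop handles trip counts that are not a multiple of the unroll factor.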

Page 12: Directives: loop tiling

Divides loops into smaller loops
◦ Ensures data stays in cache
◦ Great for data reuse
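
Loop tiling is easiest to see on a matrix multiply. The sketch below is an illustration with hypothetical names and an assumed tile size of 2: the outer three loops walk over tiles, and the inner three loops work on one tile at a time, so the small block of data being touched can stay in fast memory (a cache on a CPU, on-chip buffers on an FPGA).

```python
def matmul_naive(A, B):
    """Reference triple loop over the full iteration space."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C


def matmul_tiled(A, B, tile=2):
    """Same computation, split into tile x tile x tile blocks."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for pp in range(0, k, tile):
                # Inner loops touch only one small block of A, B and C.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, m)):
                        for p in range(pp, min(pp + tile, k)):
                            C[i][j] += A[i][p] * B[p][j]
    return C
```

The `min(...)` bounds handle matrix sizes that are not a multiple of the tile size.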

Page 13: Memory optimization

Polyhedral-based optimization
Local memory promotion for irrelevant-type communications
Data reuse
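
Local memory promotion can be illustrated by counting external memory accesses. This sketch rests on assumptions: `CountingMemory` is a hypothetical stand-in for off-chip memory, and the example takes an array whose index does not depend on the inner loop variable (the "irrelevant" case), so its load and store can be hoisted out of that loop.

```python
class CountingMemory:
    """Hypothetical external memory that counts loads and stores."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def load(self, i):
        self.reads += 1
        return self.data[i]

    def store(self, i, value):
        self.writes += 1
        self.data[i] = value


def accumulate_naive(out, w, x):
    """out[i] is loaded and stored on every inner iteration."""
    for i in range(len(w)):
        for j in range(len(x)):
            out.store(i, out.load(i) + w[i][j] * x[j])


def accumulate_promoted(out, w, x):
    """The inner loop index j does not appear in out[i], so the access
    is promoted to a local accumulator outside the j loop."""
    for i in range(len(w)):
        acc = out.load(i)
        for j in range(len(x)):
            acc += w[i][j] * x[j]
        out.store(i, acc)
```

For a 2 x 3 weight matrix the naive version issues 6 loads and 6 stores to external memory, while the promoted version issues 2 of each and produces the same result.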

Page 14: Designed model

Page 15: Detail of the final design

Page 16: Results

Virtex-7 at 100 MHz, implemented as an IP using Vivado HLS (VHLS)
Intel Xeon E5 at 2.2 GHz with 15 MB cache
Pre-synthesis report used for performance and exploration

Page 17: Evaluation of the system

17.42x speedup over the 1-thread general-purpose (GP) processor implementation
4.8x speedup over the 16-thread GP implementation
18.6 watts vs. 95 watts for the GP
3.62x speedup over the ICCD 2013 design
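
The speedup and power figures can be combined into an energy comparison. This is a back-of-the-envelope sketch, not a result reported on the slide: it assumes energy is power times run time, and that the 95 W figure applies to the GP processor while it runs the 16-thread implementation. The function name is hypothetical.

```python
def energy_ratio(speedup, baseline_watts, accelerator_watts):
    """Baseline energy divided by accelerator energy for the same workload.

    energy = power * time, and speedup = baseline_time / accelerator_time,
    so the ratio is speedup * baseline_watts / accelerator_watts.
    """
    return speedup * baseline_watts / accelerator_watts


if __name__ == "__main__":
    # Figures from the slide; the pairing with 95 W is an assumption.
    print(round(energy_ratio(4.8, 95.0, 18.6), 1))
```

Under these assumptions the accelerator uses roughly 24.5 times less energy than the 16-thread GP run.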

Page 18: My opinion

The techniques used to optimize loops are well thought out
It's a unique way of looking at an accelerator
The memory enhancements offer great insight

Page 19: Pitfalls of the claims

Pre-cached data tests
Evaluation metrics when comparing with other designs
Only tested using one image
Technology differences
Claiming the design has the best utilization

Page 20: Q&A

Page 21: References

http://crd.lbl.gov/assets/pubs_presos/parlab08-roofline-talk.pdf
https://www.youtube.com/watch?v=n6hpQwq7Inw
http://en.wikipedia.org/wiki/Loop_tiling
http://en.wikipedia.org/wiki/Polytope_model
Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, Jason Cong, Center for Energy-Efficient Computing and Applications, Peking University, China; Computer Science Department, University of California, Los Angeles, USA