Optimizing FPGA Accelerator Design for Deep Convolutional Neural Networks
By: Mohamad Kanafanai
Outline
• Introduction
• Background
• Methodology
• Results
• Evaluation of the system
• Criticism
• Q&A
Introduction
• CNNs are an extension of artificial neural networks
• Applications include image processing
• They require high-performance computation hardware
• Design exploration is a must!
What Are Deep Convolutional Neural Networks?
• A type of machine learning
• 8 steps
• Limitations
• Feed-forward computation
Roofline Model
• Provides a graphical representation of performance and productivity
  ◦ Rates and efficiencies (GFLOPS, % of peak)
  ◦ Limitations
  ◦ Benefits
• Focus
  ◦ Computation
  ◦ Communication
  ◦ Locality
• Not for fine-tuning
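The roofline bound can be sketched in a few lines. This is a minimal illustration, not the presenter's tooling: attainable performance is the lower of the compute roof and the bandwidth roof (bandwidth times the computation-to-communication ratio). The function name and the platform numbers are assumptions chosen for the example.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, ctc_flop_per_byte):
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    bandwidth_roof = bandwidth_gb_s * ctc_flop_per_byte
    return min(peak_gflops, bandwidth_roof)


# Hypothetical platform: 100 GFLOPS peak, 4 GB/s off-chip bandwidth.
memory_bound = attainable_gflops(100.0, 4.0, 10.0)   # limited by bandwidth: 40.0
compute_bound = attainable_gflops(100.0, 4.0, 50.0)  # limited by compute: 100.0
```

A design with low data reuse sits under the sloped part of the roof, so raising its computation-to-communication ratio helps more than adding compute units.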
Types of Data
• Irrelevant
• Independent
• Dependent
Double Buffering
• Allows two-way communication: one buffer is filled while the other is being processed
• Increases throughput
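A rough software sketch of the ping-pong idea follows. It is only an illustration: in hardware the load of the next tile and the computation on the current tile run concurrently, while this Python version runs them one after the other and only shows how the two buffers swap roles. The function and variable names are hypothetical.

```python
def process_with_double_buffering(tiles, compute):
    """Process tiles using two buffers that swap roles each step (ping-pong)."""
    if not tiles:
        return []
    buffers = [None, None]
    results = []
    buffers[0] = list(tiles[0])  # prologue: load the first tile
    for i in range(len(tiles)):
        current = i % 2          # buffer being computed on
        nxt = 1 - current        # buffer being filled
        if i + 1 < len(tiles):
            # On an FPGA this load would overlap with the compute below.
            buffers[nxt] = list(tiles[i + 1])
        results.append(compute(buffers[current]))
    return results


tiles = [[1, 2], [3, 4], [5, 6]]
sums = process_with_double_buffering(tiles, sum)  # [3, 7, 11]
```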
Main Concerns
• Communication overhead
• Buffer management
• Bandwidth optimization
• Better utilization of the FPGA
Design Exploration
• Computation
  ◦ Loop scheduling
  ◦ Loop tile sizes
• Communication ratio (computation to communication)
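Tile-size exploration can be sketched as a brute-force search. The cost model below is an assumption made for illustration and is not taken from the slides: it supposes that `Tm * Tn` multiply-accumulate units work in parallel on a layer with `M` output maps, `N` input maps, an `R x C` output and a `K x K` kernel, and it ignores memory traffic entirely.

```python
from math import ceil


def explore_tile_sizes(M, N, R, C, K, max_units):
    """Return (cycles, Tm, Tn) with the fewest estimated cycles.

    Hypothetical model: Tm * Tn parallel units, limited by max_units.
    """
    best = None
    for Tm in range(1, M + 1):
        for Tn in range(1, N + 1):
            if Tm * Tn > max_units:
                continue  # does not fit the assumed resource budget
            cycles = ceil(M / Tm) * ceil(N / Tn) * R * C * K * K
            if best is None or cycles < best[0]:
                best = (cycles, Tm, Tn)
    return best


best = explore_tile_sizes(M=8, N=4, R=2, C=2, K=3, max_units=8)  # (144, 2, 4)
```

A fuller exploration would also score each candidate's communication ratio and discard points that exceed the available bandwidth.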
Directives: Loop Pipelining
• Software pipelining
• Increases throughput
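The throughput gain from pipelining can be estimated with a standard cycle count. This is a simplified sketch with made-up numbers: `depth` is the latency of one iteration and `initiation_interval` is the number of cycles between the starts of consecutive iterations; real tools report both after scheduling.

```python
def loop_cycles(iterations, depth, initiation_interval=None):
    """Estimate loop latency in cycles, with or without pipelining."""
    if iterations == 0:
        return 0
    if initiation_interval is None:
        # No pipelining: each iteration finishes before the next starts.
        return iterations * depth
    # Pipelined: a new iteration starts every initiation_interval cycles.
    return depth + (iterations - 1) * initiation_interval


sequential = loop_cycles(100, 4)     # 400 cycles
pipelined = loop_cycles(100, 4, 1)   # 103 cycles
```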
Directives: Loop Unrolling
• Maximizes computation
• Data flow design
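Unrolling can be shown by hand on a dot product. This is a software analogy only: an HLS tool would turn the four independent multiply-accumulates into parallel hardware, whereas Python still executes them in sequence. The unroll factor of 4 and the function name are arbitrary choices for the example.

```python
def dot_unrolled_by_4(a, b):
    """Dot product with the loop body manually unrolled four times."""
    n = len(a)
    acc0 = acc1 = acc2 = acc3 = 0
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        # Four independent accumulators, so no step waits on another.
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
    total = acc0 + acc1 + acc2 + acc3
    for i in range(main, n):  # leftover iterations
        total += a[i] * b[i]
    return total


result = dot_unrolled_by_4([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1])  # 56
```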
Directives: Loop Tiling
• Divides loops into smaller loops
  ◦ Ensures data stays in cache
  ◦ Great for data reuse
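As an illustration of tiling, here is a small tiled matrix multiplication. It is a sketch on plain Python lists, not the convolution loop nest from the presentation: the outer three loops walk over tiles, and the inner three loops stay inside one tile so that its data can be reused while it is still in fast memory. The `tile` parameter is an assumed tuning knob.

```python
def matmul_tiled(A, B, tile):
    """Multiply A (n x k) by B (k x m) using square tiles of size tile."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for kk in range(0, k, tile):
                # Work restricted to one tile of A, B and C.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, m)):
                        acc = C[i][j]
                        for p in range(kk, min(kk + tile, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
product = matmul_tiled(A, B, tile=2)  # [[58, 64], [139, 154]]
```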
Memory Optimization
• Polyhedral-based optimization
• Local memory promotion for irrelevant-type communications
• Data reuse
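Local memory promotion can be sketched with an access counter. In this hypothetical example `out[i]` does not depend on the inner loop index `j` (an "irrelevant" access), so its load and store can be moved outside the inner loop and a local variable used instead. The counting of accesses is illustrative and stands in for off-chip transfers.

```python
def accumulate_naive(weights, inputs):
    """Touch out[i] on every inner iteration (one read and one write)."""
    out = [0] * len(weights)
    accesses = 0
    for i in range(len(weights)):
        for j in range(len(inputs)):
            out[i] += weights[i][j] * inputs[j]
            accesses += 2
    return out, accesses


def accumulate_promoted(weights, inputs):
    """Promote out[i] to a local value: one read and one write per row."""
    out = [0] * len(weights)
    accesses = 0
    for i in range(len(weights)):
        local = out[i]       # single load before the inner loop
        accesses += 1
        for j in range(len(inputs)):
            local += weights[i][j] * inputs[j]
        out[i] = local       # single store after the inner loop
        accesses += 1
    return out, accesses


weights = [[1, 2, 3], [4, 5, 6]]
inputs = [1, 1, 1]
naive_out, naive_accesses = accumulate_naive(weights, inputs)           # 12 accesses
promoted_out, promoted_accesses = accumulate_promoted(weights, inputs)  # 4 accesses
```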
Designed Model
Detail of the final design
Results
• Virtex-7 at 100 MHz, built as an IP core using Vivado HLS (VHLS)
• Intel Xeon E5 at 2.2 GHz with 15 MB cache
• Pre-synthesis report used for performance and exploration
Evaluation of the System
• 17.42x speedup over the 1-thread general-purpose (GP) implementation
• 4.8x speedup over the 16-thread GP implementation
• 18.6 watts vs 95 watts for the GP
• 3.62x speedup over the ICCD 2013 design
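The speedup and power figures can be combined into a rough energy comparison. This is back-of-the-envelope arithmetic, assuming each device draws its stated power for the whole run; the slides give only the speedups and the two wattages, so the derived ratios are estimates rather than reported results.

```python
FPGA_WATTS = 18.6
CPU_WATTS = 95.0


def energy_ratio(speedup, cpu_watts=CPU_WATTS, fpga_watts=FPGA_WATTS):
    """CPU energy divided by FPGA energy for the same workload.

    Energy = power * time, and the FPGA run is `speedup` times shorter.
    """
    return speedup * cpu_watts / fpga_watts


power_ratio = CPU_WATTS / FPGA_WATTS        # about 5.1x less power
energy_vs_1_thread = energy_ratio(17.42)    # about 89x less energy
energy_vs_16_threads = energy_ratio(4.8)    # about 24.5x less energy
```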
My Opinion
• The techniques used to optimize loops are well thought out
• It’s a unique way of looking at an accelerator
• The memory enhancements offer great insight
Pitfalls of the Claims
• Tests used pre-cached data
• Evaluation metrics when comparing with other designs
• Only tested using one image
• Technology differences
• Claims the design has the best utilization
Q&A
References
• http://crd.lbl.gov/assets/pubs_presos/parlab08-roofline-talk.pdf
• https://www.youtube.com/watch?v=n6hpQwq7Inw
• http://en.wikipedia.org/wiki/Loop_tiling
• http://en.wikipedia.org/wiki/Polytope_model
• Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, Jason Cong. Center for Energy-Efficient Computing and Applications, Peking University, China; Computer Science Department, University of California, Los Angeles, USA