![Page 1: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/1.jpg)
High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware
Adrian Ludwin, Vaughn Betz & Ketan Padalia
FPGA Seminar Presentation
Nov 10, 2009
![Page 2: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/2.jpg)
Overview
Motivation
Review simulated annealing
Approaches
Summary
![Page 3: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/3.jpg)
Motivation
![Page 4: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/4.jpg)
Simulated Annealing Placement
Probabilistic approach to finding an optimal solution
Behavior
  Moves through the solution space
    Greedily
    Randomly
  Balance between greediness and randomness is controlled by a temperature
  Temperature evolves over time according to a cooling schedule
![Page 5: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/5.jpg)
Simulated Annealing Placement
For a single move
  Compute change in cost: ΔC
  Accept move:
    if ΔC < 0, always
    if ΔC > 0, with probability $e^{-\Delta C / T}$
Repeat while gradually decreasing T and window size
[Figure: example placement with labels c1 to c5 and t3]
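
The acceptance rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the placer described in the talk: the names `accept_move` and `anneal`, the geometric cooling factor `alpha`, the toy cost function, and the treatment of ΔC = 0 are all assumptions, and the shrinking move window is left out for brevity.

```python
import math
import random


def accept_move(delta_c, temperature, rng):
    """Acceptance test for one proposed move (hypothetical helper)."""
    if delta_c < 0:
        return True  # improving moves are always accepted
    # Worsening moves are accepted with probability e^(-dC/T).
    # Assumption: dC == 0 falls through here and is always accepted.
    return rng.random() < math.exp(-delta_c / temperature)


def anneal(cost, propose, state, t_start=10.0, alpha=0.9,
           moves_per_t=100, t_min=1e-3, seed=0):
    """Toy annealing loop with an assumed geometric cooling schedule."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    t = t_start
    current_cost = cost(state)
    while t > t_min:
        for _ in range(moves_per_t):
            candidate = propose(state, rng)
            delta_c = cost(candidate) - current_cost
            if accept_move(delta_c, t, rng):
                state = candidate
                current_cost += delta_c
        t *= alpha  # gradually decrease T
    return state, current_cost


# Toy usage: minimize (x - 3)^2 over the integers with +/-1 moves.
result = anneal(lambda x: (x - 3) ** 2,
                lambda x, rng: x + rng.choice([-1, 1]),
                state=20)
```

At high T almost every move is accepted (random behavior); as T falls, the exponential term shrinks and the search becomes greedy.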
![Page 6: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/6.jpg)
Constraints
Runs on commodity hardware
Good quality of results
Robust
Determinism
  Bug reporting
  Consistent regression results
![Page 7: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/7.jpg)
Selected Previous Work
Closely related
  Move acceleration
  Parallel moves
Other methods
  Independent sets
  Partitioned placements
  Speculative
![Page 8: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/8.jpg)
Algorithm #1
![Page 9: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/9.jpg)
Algorithm #2
![Page 10: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/10.jpg)
Objective
Determine efficacy
Analyze runtime and categorize
  Memory
  Synchronization
  Infrastructure
  Evaluation
  Proposal
![Page 11: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/11.jpg)
Methodology
Parallel equivalent flow
  Serial flow that mimics the parallel flow
  Emulates the behavior of a multithreaded application using only one thread/core
  Useful for comparison
  Accounts for infrastructure overhead
![Page 12: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/12.jpg)
Methodology
Attributing runtime
Two types of measurements
  Bottom-up (bu): measure each component of a move
  End-to-end (e2e): measure runtime for the entire run
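
The two measurement styles can be illustrated with a small timing harness. This is only a sketch under assumptions: `MoveTimer`, `run`, and the component names `"proposal"` and `"evaluation"` are hypothetical, and the placeholder lambdas stand in for real move code.

```python
import time
from collections import defaultdict


class MoveTimer:
    """Accumulates bottom-up (per-component) time across many moves."""

    def __init__(self):
        self.totals = defaultdict(float)

    def timed(self, component, fn, *args):
        # Time one component of one move and add it to that component's total.
        start = time.perf_counter()
        result = fn(*args)
        self.totals[component] += time.perf_counter() - start
        return result


def run(n_moves, propose, evaluate, timer):
    """Returns (end-to-end time, bottom-up total, per-component totals)."""
    e2e_start = time.perf_counter()
    for i in range(n_moves):
        move = timer.timed("proposal", propose, i)
        timer.timed("evaluation", evaluate, move)
    e2e = time.perf_counter() - e2e_start
    bu = sum(timer.totals.values())
    return e2e, bu, dict(timer.totals)


# Placeholder work standing in for real proposal and evaluation code.
e2e, bu, parts = run(1000, lambda i: i * 2, lambda m: m + 1, MoveTimer())
```

The gap `e2e - bu` is time that no component accounts for, such as loop and instrumentation overhead, which is one reason to take both kinds of measurement.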
![Page 13: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/13.jpg)
Methodology
![Page 14: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/14.jpg)
Methodology
Test sets
  Set of 11 Stratix® II FPGA benchmark designs
    IP and customer circuits
    10k to 100k logic cells
  Also tested on 40 Stratix II FPGA circuits
    Obtained similar results
![Page 15: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/15.jpg)
Results for Algorithm #1
![Page 16: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/16.jpg)
Moves attribution
![Page 17: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/17.jpg)
Overhead analysis
![Page 18: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/18.jpg)
Observations
Theoretical speedup: 1.7x
  Measured: 1.3x (best)
Increase in evaluation runtime
  Due to reduced cache locality
Proposal time is “hidden”
![Page 19: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/19.jpg)
Analysis
Time spent on stalls is negligible
Evaluation accounts for most of the overhead
Little to gain by removing determinism
  Serial equivalency costs less than 3% of runtime
![Page 20: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/20.jpg)
Summary for Algorithm #1
Speedup: 1x to 1.3x
Memory inefficiency is the biggest bottleneck
Theoretically, the algorithm should scale
  However, it is difficult to partition and balance the two stages
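
Why stage balance matters can be seen from the ideal steady-state bound for a pipeline with one core per stage: throughput is limited by the slowest stage. The sketch below assumes this standard bound applies to the two-stage scheme summarized here. The function name and the per-move stage costs are hypothetical illustrations, not measurements from the talk.

```python
def pipeline_speedup(stage_times):
    """Ideal steady-state speedup over running all stages serially,
    assuming one core per stage and no communication cost."""
    return sum(stage_times) / max(stage_times)


# Hypothetical per-move stage costs in arbitrary units.
balanced = pipeline_speedup([1.0, 1.0])    # two equal stages
unbalanced = pipeline_speedup([1.0, 4.0])  # one stage dominates
```

With equal stages the bound is 2x on two cores; when one stage takes four times as long as the other, the bound drops to 1.25x even before any memory or synchronization overhead is counted.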
![Page 21: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/21.jpg)
Speedups for Algorithm #2
![Page 22: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/22.jpg)
Attribution on 2 cores
![Page 23: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/23.jpg)
Attribution on 2 cores
![Page 24: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/24.jpg)
Attribution on 4 cores
![Page 25: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/25.jpg)
Attribution on 4 cores
![Page 26: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/26.jpg)
Observations
Memory latency due to inter-processor communication
  Worsens with more cores
![Page 27: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/27.jpg)
Summary for Algorithm #2
Parallel moves scale better than pipelined moves
Bottleneck is still memory
Again, serial equivalency costs little
![Page 28: High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56649e205503460f94b0c515/html5/thumbnails/28.jpg)
Take Home Messages
Memory is important
Good algorithms are even more important