High-Level Synthesis with LegUp
A Crash Course for Users and Researchers
Jason Anderson, Stephen Brown, Andrew Canis, Jongsok (James) Choi
11 February 2013, ACM FPGA Symposium, Monterey, CA
Dept. of Electrical and Computer Engineering, University of Toronto
[Figure: LegUp logos shown with the city names Hong Kong, Berlin, Tokyo, and New York City]
Tutorial Outline
• Overview of LegUp and its algorithms (60 min)
• Labs (“hands-on” via VirtualBox)
  – Lab 1: Using the LegUp framework (30 min)
  – Break
  – Lab 2: Adding resource constraints (30 min)
  – Lab 3: Changing how LegUp implements hardware (30 min)
Project Motivation
• Hardware design has advantages over software:
  – Speed
  – Energy efficiency
• Hardware design is difficult and skills are rare:
  – 10 software engineers for every hardware engineer*
• We need a CAD flow that simplifies hardware design for software engineers
*US Bureau of Labor Statistics, 2008
Top-Level Vision
[Figure: top-level vision. Program code is compiled by a C compiler and run on a self-profiling processor (MIPS). Its profiling data (execution cycles, power, cache misses) suggests program segments to target to HW. High-level synthesis turns those segments into hardened program segments on the FPGA fabric next to the processor (µP), and an altered SW binary calls the HW accelerators.]
Example program code:
int FIR(int ntaps, int sum) {
    int i;
    for (i = 0; i < ntaps; i++)
        sum += h[i] * z[i];
    return (sum);
}
....
LegUp: Key Features
• C to Verilog high-level synthesis
• Many benchmarks (incl. 12 CHStone)
• MIPS processor (Tiger)
• Hardware profiler
• Automated verification tests
• Open source, freely downloadable
  – Like ABC (synthesis) or VPR (place & route)
  – 600+ downloads since March 2011
  – http://legup.eecg.utoronto.ca
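System Architecture
[Figure: an FPGA (Cyclone II on the Altera DE2 board, or Stratix IV on the DE4) containing a MIPS processor and hardware accelerators, each accelerator with its own local memory, all connected over the Avalon interface to a memory controller with an on-chip cache, which in turn connects to off-chip memory]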
High-Level Synthesis Framework
• Leverage LLVM compiler infrastructure:
  – Language support: C/C++
  – Standard compiler optimizations
  – More on this shortly
• We support a large subset of ANSI C:
  – Supported: functions, arrays, structs, global variables, pointer arithmetic, floating point
  – Unsupported: dynamic memory, recursion
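One way to stay inside the supported subset is sketched below: a recursive definition is rewritten as a loop, and storage that might otherwise come from `malloc()` is a fixed-size global array. This is a general C rewriting pattern, not a LegUp-specific API; the names `gcd`, `gcd_of_items`, `items`, and `MAX_ITEMS` are hypothetical.

```c
#include <assert.h>

/* Recursion is unsupported, so the recursive definition
   gcd(a, b) = gcd(b, a % b) is written as a loop instead. */
unsigned gcd(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Dynamic memory is unsupported, so instead of malloc() the data lives in a
   fixed-size global array (MAX_ITEMS is a hypothetical compile-time bound). */
#define MAX_ITEMS 64
unsigned items[MAX_ITEMS];

/* GCD of the first n entries of items[]. */
unsigned gcd_of_items(int n) {
    assert(n > 0 && n <= MAX_ITEMS);
    unsigned g = items[0];
    for (int i = 1; i < n; i++)
        g = gcd(g, items[i]);
    return g;
}
```

For example, with `items` holding 48, 18, and 30, `gcd_of_items(3)` returns 6.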
Hardware Profiler Architecture
[Figure: an op decoder watches the instruction bus between the MIPS µP and its instruction cache (Instr. $) for call and ret instructions. A call's target address goes through the address hash to give a function number (F#), which is pushed onto the call stack; a ret pops it. A data counter (for the current function) increments when the PC changes, and counts are kept in counter storage memory (for all functions), indexed by F#.]
Address hash (in hardware):
tAddr += V1
tAddr += (tAddr << 8)
tAddr ^= (tAddr >> 4)
b = (tAddr >> B1) & B2
a = (tAddr + (tAddr << A1)) >> A2
fNum = (a ^ tab[b])
See paper: IEEE ASAP’11
• Monitor instr. bus to detect function call/ret.
• Call: Hash (in HW) from function address to index; push to stack.
• Ret: pop function index from stack.
• Use function indexes to associate profiling data (e.g. cycles, power) with counters.
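The profiler's call/ret bookkeeping can be modelled in software as a minimal sketch. It follows the address-hash steps above, but the constants `V1`, `A1`, `A2`, `B1`, `B2`, and `tab`, the table sizes, and the helper names are all hypothetical. As a simplification, each cycle is added straight into counter storage instead of going through a separate per-function data counter.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COUNTERS 16  /* hypothetical size of counter storage */
#define STACK_DEPTH  32  /* hypothetical call-stack depth */

/* Hypothetical hash parameters. Real values would be picked so that every
   function address in the program maps to a distinct index. */
static const uint32_t V1 = 1, A1 = 1, A2 = 2, B1 = 4, B2 = 0x3;
static const uint32_t tab[4] = {0, 5, 9, 12};

/* Same sequence of steps as the address hash; the final mask to the
   counter-table size is an added assumption. */
uint32_t addr_hash(uint32_t tAddr) {
    tAddr += V1;
    tAddr += (tAddr << 8);
    tAddr ^= (tAddr >> 4);
    uint32_t b = (tAddr >> B1) & B2;
    uint32_t a = (tAddr + (tAddr << A1)) >> A2;
    return (a ^ tab[b]) & (NUM_COUNTERS - 1);
}

static uint32_t counters[NUM_COUNTERS];  /* counter storage (all functions) */
static uint32_t call_stack[STACK_DEPTH]; /* function indexes */
static int depth = 0;                    /* entries on the call stack */

/* Call detected: hash the target address to an index and push it. */
void on_call(uint32_t target_addr) {
    assert(depth < STACK_DEPTH);
    call_stack[depth++] = addr_hash(target_addr);
}

/* Ret detected: pop, so the caller becomes the current function again. */
void on_ret(void) {
    assert(depth > 0);
    depth--;
}

/* One cycle elapses: charge it to the function on top of the stack. */
void on_cycle(void) {
    assert(depth > 0);
    counters[call_stack[depth - 1]]++;
}

/* Cycles attributed so far to the function at func_addr. */
uint32_t cycles_of(uint32_t func_addr) {
    return counters[addr_hash(func_addr)];
}
```

With these made-up constants, addresses 0x100 and 0x200 hash to indexes 1 and 8, so calling 0x100, spending 3 cycles, calling 0x200 for 5 cycles, returning, and spending 2 more cycles charges 5 cycles to each function.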
Processor/Accelerator Hybrid Flow
int main () {
    …
    sum = dotproduct(N);
    ...
}
int dotproduct(int N) {
    …
    for (i = 0; i < N; i++) {
        sum += A[i] * B[i];
    }
    return sum;
}
#define dotproduct_DATA   (volatile int *) 0xf0000000
#define dotproduct_STATUS (volatile int *) 0xf0000008
#define dotproduct_ARG1   (volatile int *) 0xf000000C
int legup_dotproduct(int N) {
    *dotproduct_ARG1 = (volatile int) N;
    *dotproduct_STATUS = 1;
    return *dotproduct_DATA;
}
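To make the register layout concrete, here is a host-runnable sketch of the same three steps. The wrapper takes the register window as a parameter instead of using the fixed address 0xf0000000, `accelerator_model` is a hypothetical software stand-in for the synthesized hardware, and 32-bit `int` registers are assumed, so the byte offsets 0x8 and 0xC become word indexes 2 and 3. How the real processor waits for the accelerator to finish is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets of the registers, from the #defines
   (DATA at +0x0, STATUS at +0x8, ARG1 at +0xC from 0xf0000000). */
enum { DATA_OFFSET = 0x0, STATUS_OFFSET = 0x8, ARG1_OFFSET = 0xC };

/* Pointer to the 32-bit register at a byte offset from the window base. */
static volatile int *reg(volatile int *base, size_t byte_offset) {
    assert(byte_offset % sizeof(int) == 0);
    return base + byte_offset / sizeof(int);
}

/* Hypothetical arrays read by the accelerator. */
#define N_MAX 8
int A[N_MAX], B[N_MAX];

/* Hypothetical software stand-in for the synthesized hardware: once STATUS
   is 1, compute the dot product of the first ARG1 elements into DATA. */
static void accelerator_model(volatile int *base) {
    if (*reg(base, STATUS_OFFSET) != 1)
        return;
    int n = *reg(base, ARG1_OFFSET), sum = 0;
    assert(n >= 0 && n <= N_MAX);
    for (int i = 0; i < n; i++)
        sum += A[i] * B[i];
    *reg(base, DATA_OFFSET) = sum;
    *reg(base, STATUS_OFFSET) = 0;
}

/* Same three steps as legup_dotproduct, but with the register window passed
   in so the sketch runs on a host. */
int legup_dotproduct_sim(volatile int *base, int N) {
    *reg(base, ARG1_OFFSET) = N;    /* write the argument */
    *reg(base, STATUS_OFFSET) = 1;  /* start the accelerator */
    accelerator_model(base);        /* on the FPGA, the hardware runs here */
    return *reg(base, DATA_OFFSET); /* read back the result */
}
```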
set_accelerator_function “dotproduct”
[Figure: HLS turns dotproduct into a HW accelerator]
int main () {
    …
    sum = legup_dotproduct(N);
    ...
}
[Figure: main, now calling legup_dotproduct(N), and the legup_dotproduct wrapper are compiled as SW for the MIPS processor]
How Does LegUp Handle Memory and Pointers?
• LegUp stores each array in a separate FPGA BRAM
• BRAM data width matches the width of the array's elements
• Each BRAM is identified by a 9-bit tag
• Addresses consist of the RAM tag and the array index:
  – Bits 31–23: 9-bit tag; bits 22–0: 23-bit index
• A shared memory controller uses the tag bits to determine which BRAM to read from or write to
• The array index is the address passed to the BRAM
Pointer Example
• We have two arrays in the C function:
  – int A[100], B[100];
• Tag 0 is reserved for NULL pointers
• Tag 1 is reserved for off-chip memory
• Assign tag 2 to array A and tag 3 to array B
• Address of A[3]: Tag=2 (bits 31–23), Index=3 (bits 22–0)
• Address of B[7]: Tag=3 (bits 31–23), Index=7 (bits 22–0)
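The address format can be expressed as a pair of bit-packing helpers. This is a sketch assuming the 9-bit tag / 23-bit index split and the tag assignments above; `make_addr`, `addr_tag`, and `addr_index` are hypothetical names, not part of LegUp.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_BITS   9
#define INDEX_BITS 23
#define INDEX_MASK ((UINT32_C(1) << INDEX_BITS) - 1)

/* Tags from the example: 0 = NULL, 1 = off-chip memory, 2 = A, 3 = B. */
enum { TAG_NULL = 0, TAG_OFFCHIP = 1, TAG_A = 2, TAG_B = 3 };

/* Tag in bits 31-23, array index in bits 22-0. */
uint32_t make_addr(uint32_t tag, uint32_t index) {
    assert(tag < (UINT32_C(1) << TAG_BITS));
    assert(index <= INDEX_MASK);
    return (tag << INDEX_BITS) | index;
}

uint32_t addr_tag(uint32_t addr)   { return addr >> INDEX_BITS; }
uint32_t addr_index(uint32_t addr) { return addr & INDEX_MASK; }
```

So a pointer to A[3] holds 0x01000003 and a pointer to B[7] holds 0x01800007.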
Shared Memory Controller
• Both arrays A and B have 100-element BRAMs
• Load from pointer D: Tag=2 (bits 31–23), Index=13 (bits 22–0)
[Figure: the index (13) is applied to both BRAMs, tag 2 holding A[0]–A[99] and tag 3 holding B[0]–B[99]; the tag (2) selects between the two 32-bit BRAM outputs, so the load returns A[13]]
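A software model of the controller's load and store paths, assuming the same 9-bit tag / 23-bit index layout and the example's tag assignments (2 for A, 3 for B). The lookup table `bram` and the functions `mem_load`/`mem_store` are hypothetical, and the model ignores timing and port limits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDEX_BITS 23
#define INDEX_MASK ((UINT32_C(1) << INDEX_BITS) - 1)
#define NUM_TAGS 4  /* hypothetical: only tags 0-3 exist in this example */

int A[100], B[100];

/* BRAM selected by each tag. Tags 0 (NULL) and 1 (off-chip memory) have no
   on-chip BRAM in this sketch, so their entries stay empty. */
static const struct { int *data; size_t depth; } bram[NUM_TAGS] = {
    [2] = { A, 100 },
    [3] = { B, 100 },
};

/* Load through a tagged pointer: the tag picks the BRAM, the index is the
   address given to it. In hardware the index goes to every BRAM and the tag
   drives an output multiplexer; indexing only the selected array is
   equivalent here. */
int mem_load(uint32_t addr) {
    uint32_t tag = addr >> INDEX_BITS, index = addr & INDEX_MASK;
    assert(tag < NUM_TAGS && bram[tag].data != NULL);
    assert(index < bram[tag].depth);
    return bram[tag].data[index];
}

/* Store through a tagged pointer, decoded the same way. */
void mem_store(uint32_t addr, int value) {
    uint32_t tag = addr >> INDEX_BITS, index = addr & INDEX_MASK;
    assert(tag < NUM_TAGS && bram[tag].data != NULL);
    assert(index < bram[tag].depth);
    bram[tag].data[index] = value;
}
```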
Core Benchmarks (+Many More)
• 12 CHStone benchmarks (JIP’09) and Dhrystone
  – Too large/complex for academic HLS tools
• Include golden input/output test vectors
• Not supported by academic tools

| Category | Benchmarks | Lines of C code |
|---|---|---|
| Arithmetic | 64-bit double precision: add, mult, div, sin | 376–755 |
| Encryption | AES, Blowfish, SHA | 716–1,406 |
| Processor | MIPS processor | 232 |
| Media | JPEG decoder, Motion, GSM, ADPCM | 393–1,692 |
| General | Dhrystone | 491 |

Experimental Results
LegUp 1.0 (2011) for Cyclone II
1. Pure software on MIPS
Hybrid (software/hardware):
2. Second most compute-intensive function (and descendants) in H/W
3. Same as 2, but with the most compute-intensive function (and descendants) in H/W
4. Pure hardware using LegUp
5. Pure hardware using eXCite (commercial tool)
Experimental Results
[Figure: execution time and number of LEs (both geometric means) for MIPS-SW, LegUp-Hybrid2, LegUp-Hybrid1, LegUp-HW, and eXCite-HW]
Comparison: LegUp vs eXCite
• Benchmarks compiled to hardware
• eXCite: commercial high-level synthesis tool
• eXCite couldn’t compile Dhrystone

| Geomean | LegUp | eXCite | LegUp/eXCite |
|---|---|---|---|
| Circuit runtime (μs) | 292 | 357 | 0.82 (LegUp 1.22x faster) |
| Logic elements | 15,646 | 13,101 | 1.19 |
| Area-delay product | 4.57M | 4.68M | 0.98 |

Energy Consumption
[Figure: energy (μJ, geometric mean) for MIPS-SW, LegUp-Hybrid2, LegUp-Hybrid1, LegUp-HW, and eXCite-HW; chart annotation: 18x less energy than software]
Current Release: LegUp 3.0
• Loop pipelining
• Dual and multi-ported memory support
• Bitwidth minimization
• Multi-pumping DSP units for area reduction
• Alias analysis for dependency checks
• Parallel accelerators via Pthreads & OpenMP
Results are now considerably better than with the LegUp 1.0 release.
LegUp 3.0 vs. LegUp 1.0
[Figure: LegUp 3.0 / LegUp 1.0 ratio (axis 0.4 to 1.6) of wall-clock time, cycles, Fmax, and LEs for each CHStone benchmark circuit (adpcm, aes, blowfish, dfadd, dfdiv, dfmul, dfsin, dhrystone, gsm, jpeg, mips, motion, sha) and their geomean]
• Wall-clock time: 16% better
• Cycle latency: 31% better
• Fmax: 18% worse
• LEs (area): 28% better
LLVM Compiler and HLS Algorithms
LLVM Compiler
• Open-source compiler framework
  – http://llvm.org
• Used by Apple, NVIDIA, AMD, others
• Competitive quality with gcc
• LegUp HLS is a “back-end” of LLVM
• LLVM: low-level virtual machine
LLVM Compiler
• LLVM compiles C code into a control flow graph (CFG)
• LLVM performs standard optimizations
  – 50+ different optimizations in LLVM
C program:
int FIR(int ntaps, int sum) {
    int i;
    for (i = 0; i < ntaps; i++)
        sum += h[i] * z[i];
    return sum;
}
....
[Figure: the C program passes through the LLVM compiler to produce a CFG of basic blocks BB0, BB1, BB2]
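To see how a C function breaks into basic blocks, the FIR example can be rewritten with one label per block. The array contents are hypothetical (the slide elides the declarations of `h` and `z`), and the entry/loop/exit split shown is an illustration of a typical CFG for this loop, not a dump of what LLVM actually produces.

```c
#include <assert.h>

#define NTAPS_MAX 4
/* Hypothetical contents for the global arrays h and z. */
int h[NTAPS_MAX] = {1, 2, 3, 4};
int z[NTAPS_MAX] = {5, 6, 7, 8};

/* The FIR function from the slide, made self-contained. */
int FIR(int ntaps, int sum) {
    int i;
    assert(ntaps <= NTAPS_MAX);
    for (i = 0; i < ntaps; i++)
        sum += h[i] * z[i];
    return sum;
}

/* The same computation written with one label per basic block, each block
   ending in exactly one branch. The three-block shape is an illustration,
   not LLVM's actual output. */
int FIR_blocks(int ntaps, int sum) {
    int i;
    /* BB0: entry. Initialize and skip the loop when there are no taps. */
    assert(ntaps <= NTAPS_MAX);
    i = 0;
    if (i >= ntaps) goto BB2;
BB1:    /* BB1: loop body, with a conditional back edge to itself. */
    sum += h[i] * z[i];
    i++;
    if (i < ntaps) goto BB1; else goto BB2;
BB2:    /* BB2: exit. */
    return sum;
}
```

Both versions compute the same result, e.g. 70 for `FIR(4, 0)` with the arrays above.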
Control Flow Graph
• A control flow graph is composed of basic blocks
• Basic block: a sequence of instructions terminated by exactly one branch
  – Can be represented by an acyclic data flow graph:
[Figure: CFG with basic blocks BB0, BB1, BB2; one block is expanded into a data flow graph of load, load, +, load, +, and store operations]
LLVM Details
• Instructions in basic blocks are primitive computational operations:
  – shift, add, divide, xor, and, etc.
• Or are control-flow operations:
  – branch, call, etc.
• The CDFG (control/data flow graph) is represented in LLVM’s intermediate representation (IR)
  – IR is machine-independent assembly code
High-Level Synthesis Flow
[Figure: C program → C compiler (LLVM) → optimized LLVM IR → allocation → scheduling → binding → RTL generation → synthesizable Verilog; user constraints (timing, resource) and the target H/W characterization feed the synthesis steps]
Scheduling
• Scheduling is the task of assigning operations to clock cycles, using a finite state machine
[Figure: the load/add/store data flow graph scheduled into an FSM with states 0 to 3]
Binding
• Binding is the task of assigning scheduled operations to functional units in the datapath
[Figure: the schedule mapped onto a datapath containing a 2-port RAM, an adder (+), and flip-flops (FF)]
High-Level Synthesis: Scheduling
SDC Scheduling
• SDC: System of Difference Constraints
  – Cong, Zhang, “An efficient and versatile scheduling algorithm based on SDC formulation”. DAC 2006: 433-438.
• Basic idea: formulate scheduling as a mathematical optimization problem
  – Linear objective function + linear constraints (==, <=, >=)
• The problem is a linear program (LP)
  – Solvable in polynomial time with standard solvers
Define Variables
• For each operation i to schedule, create a variable $t_i$.
• The $t_i$’s will hold the cycle # in which each op is scheduled.
• Here we have:
  – $t_{add}$, $t_{shift}$, $t_{sub}$
[Figure: DFG in which an add (+) and a shift (<<) both feed a subtract (-)]
Data flow graph (DFG): already accessible in LLVM.
Dependency Constraints
• In this example, the subtract can only happen after the add and shift.
• $t_{sub} - t_{add} \geq 0$
• $t_{sub} - t_{shift} \geq 0$
• Hence the name difference constraints.
[Figure: add and shift both feeding sub]
Handling Clock Period Constraints
• Target period: P (e.g., 10 ns)
• For each chain of dependent operations in the DFG, estimate the path delay D (using LegUp’s delay models)
  – E.g., D from mod to or = 23 ns
• Compute: $R = \lceil D/P \rceil - 1$
  – E.g., $R = \lceil 23/10 \rceil - 1 = 2$
• Add the difference constraint:
  – $t_{or} - t_{mod} \geq 2$
[Figure: chain of dependent operations mod → xor → shr → or]
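The clock-period rule reduces to a ceiling division. The sketch below assumes delays are given as integer picoseconds to avoid floating point; the function name `min_cycle_gap` is hypothetical.

```c
#include <assert.h>

/* R = ceil(D / P) - 1: how many cycles apart the first and last operations
   of a chain with delay D must be scheduled when the clock period is P.
   Delays are integer picoseconds here (an assumption to avoid floats). */
int min_cycle_gap(int chain_delay_ps, int period_ps) {
    assert(chain_delay_ps > 0 && period_ps > 0);
    int cycles_needed = (chain_delay_ps + period_ps - 1) / period_ps; /* ceil(D/P) */
    return cycles_needed - 1;
}
```

For the mod → or chain, `min_cycle_gap(23000, 10000)` gives 2, which becomes the constraint $t_{or} - t_{mod} \geq 2$; a chain that fits in one period gives 0, i.e. no extra constraint.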
Resource Constraints
• Restriction on the # of operations of a given type that can execute in a cycle
• Why do we need it?
  – Want to use dual-port RAMs in the FPGA
    • Allow up to 2 load/store operations in a cycle
  – Floating point
    • Do not want to instantiate many FP cores of a given type, probably just one
    • Scheduling must honour the # of FP cores available
Resource Constraints in SDC
• Resource-constrained scheduling is NP-hard.
• Implemented approach in [Cong & Zhang DAC2006]
[Figure: data flow graph of eight add (+) operations, A to H]
Say we want to schedule with only 2 adders in the HW (lab #2).
Add SDC Constraints
• Generate a topological ordering of the resource-constrained operations:
  – A B C E F D G H
• Say we are constrained to 2 adders in HW.
• Starting at C in the ordering, create a constraint: $t_C - t_A > 0$
• Next consider E; add the constraint: $t_E - t_B > 0$
• Continue to the end
• The resulting schedule will have <= 2 adds / cycle
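Generating the resource constraints from the ordering is a single pass: with k units, each operation is constrained to start at least one cycle after the operation k positions earlier. This sketch encodes the slide's strict "> 0" as "≥ 1", which is equivalent for integer cycle numbers; the type `Constraint` and the function `resource_constraints` are hypothetical names.

```c
#include <assert.h>

/* Difference constraint: t[later] - t[earlier] >= min_gap. */
typedef struct {
    int earlier, later, min_gap;
} Constraint;

/* order[] is a topological ordering of n operations competing for the same
   kind of unit, and num_units units are available. Each operation is chained
   to the one num_units positions earlier, so no more than num_units of them
   can share a cycle. Writes the constraints to out[] (room for n entries is
   enough) and returns how many were written. */
int resource_constraints(const int *order, int n, int num_units,
                         Constraint *out) {
    assert(n >= 0 && num_units > 0);
    int count = 0;
    for (int i = num_units; i < n; i++) {
        out[count].earlier = order[i - num_units];
        out[count].later = order[i];
        out[count].min_gap = 1;  /* the slide's "> 0" on integer cycles */
        count++;
    }
    return count;
}
```

For the ordering A B C E F D G H with 2 adders, this yields the six constraints C after A, E after B, F after C, D after E, G after F, and H after D.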
ASAP Objective Function
• Minimize the sum of the variables: $\min \sum_i t_i$
• Operations will be scheduled as early as possible, subject to the constraints
• The LP is solvable in polynomial time
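The formulation can be tried out without an LP solver. For a system made only of difference constraints (with all $t_i \geq 0$), the componentwise-smallest feasible schedule also minimizes $\sum_i t_i$, and it is given by longest paths, so this sketch swaps in a Bellman-Ford-style relaxation for a general LP solver; it does not handle general linear objectives. The names `DiffConstraint` and `sdc_asap` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Difference constraint: t[to] - t[from] >= min_gap. */
typedef struct {
    int from, to, min_gap;
} DiffConstraint;

/* Finds the earliest schedule t[0..n-1] (all t[i] >= 0) that satisfies every
   constraint, by repeatedly raising t[to] to t[from] + min_gap until nothing
   changes (longest paths, Bellman-Ford style). The result is the smallest
   feasible schedule in every component, so it also minimizes sum(t[i]).
   Returns false if the constraints contain a positive cycle (infeasible). */
bool sdc_asap(int n, const DiffConstraint *cs, int m, int *t) {
    for (int i = 0; i < n; i++)
        t[i] = 0;
    for (int pass = 0; pass <= n; pass++) {
        bool changed = false;
        for (int k = 0; k < m; k++) {
            assert(cs[k].from >= 0 && cs[k].from < n);
            assert(cs[k].to >= 0 && cs[k].to < n);
            int earliest = t[cs[k].from] + cs[k].min_gap;
            if (t[cs[k].to] < earliest) {
                t[cs[k].to] = earliest;
                changed = true;
            }
        }
        if (!changed)
            return true;  /* every constraint holds */
    }
    return false;  /* still changing after n + 1 passes */
}
```

With the two-adder constraints for the ordering A B C E F D G H, the result is A and B in cycle 0, C and E in cycle 1, F and D in cycle 2, and G and H in cycle 3, never more than two adds per cycle. For the mod → xor → shr → or chain with $t_{or} - t_{mod} \geq 2$, it places or in cycle 2 and the rest in cycle 0.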
High-Level Synthesis: Binding
High-Level Synthesis: Binding
• Weighted bipartite matching-based binding
  – Huang, Chen, Lin, Hsu, “Data path allocation based on bipartite weighted matching”. DAC 1990: 499-504.
• Finds the minimum weighted matching of a bipartite graph at each step
  – Solve using the Hungarian Method (polynomial)
[Figure: bipartite graph connecting operations to hardware functional units, with edge costs]
Binding
• Bind the following scheduled program
[Figure: program scheduled into states 0 to 3]
Binding
• Resource sharing: requires 3 multipliers
[Figure: the scheduled program, states 0 to 3]
Binding
• Bind the first cycle
[Figure: states 0 to 3; operations bound so far to each of the 3 functional units: 1, 1, 1]
Binding
• Bind the second cycle
[Figure: states 0 to 3; operations bound so far to each of the 3 functional units: 2, 2, 1]
Binding
• Bind the third cycle
[Figure: states 0 to 3; operations bound so far to each of the 3 functional units: 2, 2, 2]
Binding
• Bind the fourth cycle
[Figure: states 0 to 3; operations bound so far to each of the 3 functional units: 3, 2, 2]
Binding
• Required multiplexing
[Figure: multiplexing required at the inputs of the 3 functional units, which are shared by 3, 2, and 2 operations]
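The cycle-by-cycle binding above can be reproduced with a small assignment search. Two things are assumptions: the edge cost of binding an operation to a unit is taken to be the number of operations already on that unit (a stand-in for multiplexer growth, not necessarily LegUp's cost function), and the minimum-cost matching is found by exhaustive search rather than the Hungarian method, which is only practical for a handful of units. The operation counts per state (3, 2, 1, 1) are inferred from the running totals on the slides, and the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define MAX_UNITS 8

/* Exhaustive search for the cheapest injective assignment of operations
   op..n_ops-1 to unused units. The cost of putting an operation on unit u
   is load[u], the number of operations already bound to u (hypothetical
   cost model). Strict improvement is required, so ties keep the first
   assignment found, i.e. the lowest-numbered units. */
static void search(int op, int n_ops, int n_units, const int *load,
                   int *used, int *cur, int cost, int *best, int *best_cost) {
    if (cost >= *best_cost)
        return;  /* costs are non-negative: cannot improve */
    if (op == n_ops) {
        *best_cost = cost;
        for (int i = 0; i < n_ops; i++)
            best[i] = cur[i];
        return;
    }
    for (int u = 0; u < n_units; u++) {
        if (used[u])
            continue;
        used[u] = 1;
        cur[op] = u;
        search(op + 1, n_ops, n_units, load, used, cur, cost + load[u],
               best, best_cost);
        used[u] = 0;
    }
}

/* Bind the n_ops operations of one state. load[] holds, per unit, how many
   operations are already bound to it and is updated; assignment[i] receives
   the unit chosen for operation i. */
void bind_state(int n_ops, int n_units, int *load, int *assignment) {
    assert(n_ops >= 0 && n_ops <= n_units && n_units <= MAX_UNITS);
    int used[MAX_UNITS] = {0}, cur[MAX_UNITS];
    int best_cost = INT_MAX;
    search(0, n_ops, n_units, load, used, cur, 0, assignment, &best_cost);
    for (int i = 0; i < n_ops; i++)
        load[assignment[i]]++;
}
```

Binding states with 3, 2, 1, and 1 multiplications onto 3 units yields the same running totals as the slides (1,1,1 → 2,2,1 → 2,2,2 → 3,2,2), with ties going to the lowest-numbered unit.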
High-Level Synthesis: Challenges
• Easy to extract instruction-level parallelism using dependencies within a basic block
• But C code is inherently sequential, and it is difficult to extract higher-level parallelism
• Coarse-grained parallelism: – function pipelining
• Fine-grained parallelism: – loop pipelining
Loop Pipelining
Motivating Example
for (int i = 0; i < N; i++) {
    sum[i] = a + b + c + d;
}
[Figure: three chained adders computing a + b, then + c, then + d, in cycles 1, 2, and 3]
• Cycles: 3N
• Adders: 3
• Utilization: 33%
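Loop Pipelining

| Cycle | 1 | 2 | 3 | 4 | 5 | … | N | N+1 | N+2 |
|---|---|---|---|---|---|---|---|---|---|
| i=0 | + | + | + | | | | | | |
| i=1 | | + | + | + | | | | | |
| i=2 | | | + | + | + | | | | |
| … | | | | | | … | | | |
| i=N-2 | | | | | | + | + | + | |
| i=N-1 | | | | | | | + | + | + |

• Cycles: N+2 (~1 cycle per iteration)
• Adders: 3
• Utilization: 100% in steady state (cycles 3 to N)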
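The cycle counts follow from simple arithmetic: without pipelining each iteration takes the full schedule length, and with pipelining a new iteration starts every II cycles (the initiation interval, II), so the last of N iterations finishes (N-1)·II + depth cycles after the first one starts. A sketch with hypothetical function names:

```c
#include <assert.h>

/* Without pipelining: each of the n iterations runs the whole `depth`-cycle
   schedule before the next one starts. */
long cycles_sequential(long n, long depth) {
    assert(n >= 0 && depth > 0);
    return n * depth;
}

/* With pipelining: a new iteration starts every `ii` cycles, so the last
   iteration starts (n - 1) * ii cycles after the first and finishes `depth`
   cycles after that. */
long cycles_pipelined(long n, long ii, long depth) {
    assert(n > 0 && ii > 0 && depth > 0);
    return (n - 1) * ii + depth;
}
```

With depth 3 and II = 1, this gives 3N sequential cycles versus N + 2 pipelined, matching the table.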
Loop Pipelining Example
for (int i = 0; i < N; i++) {
    a[i] = b[i] + c[i];
}
• Each iteration requires:
  – 2 loads from memory
  – 1 store
• No dependencies between iterations
Loop Pipelining Example
for (int i = 0; i < N; i++) {
    a[i] = b[i] + c[i];
}
• Cycle latency of operations:
  – Load: 2 cycles
  – Store: 1 cycle
  – Add: 1 cycle
• Single memory port
LLVM Instructions
for (int i = 0; i < N; i++) {
    a[i] = b[i] + c[i];
}
%i.04 = phi i32 [ 0, %bb.nph ], [ %3, %bb ]
%scevgep5 = getelementptr %b, %i.04
%0 = load %scevgep5
%scevgep6 = getelementptr %c, %i.04
%1 = load %scevgep6
%2 = add nsw i32 %1, %0
%scevgep = getelementptr %a, %i.04
store %2, %scevgep
%3 = add %i.04, 1
%exitcond = eq %3, 100
br %exitcond, %bb2, %bb
Scheduling LLVM Instructions
for (int i = 0; i < N; i++) {
    a[i] = b[i] + c[i];
}
• Each iteration requires:
  – 2 loads from memory
  – 1 store
• There are no dependencies between iterations
[Figure: the LLVM instructions scheduled by cycle]
Scheduling LLVM Instructions
for (int i = 0; i < N; i++) {a[i] = b[i] + c[i]
}• Each iteration requires:
• 2 loads from memory• 1 store
• There are no dependencies between iterations
Memory Port Conflict
Cycle:
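To make the memory port conflict concrete, here is a minimal C sketch that ASAP-schedules one iteration of the loop with and without a limit on memory ports. The operation list, the dependence edges, and the latencies (2 cycles per load, 1 per add and store) are hypothetical assumptions, so the cycle counts will not match the slide's figure; the point is only that an unconstrained schedule issues both loads in cycle 1, which a single-ported memory cannot do.

```c
/* Hypothetical model of one iteration of a[i] = b[i] + c[i]:
 * op 0 = load b[i], op 1 = load c[i], op 2 = add, op 3 = store a[i].
 * Latencies are assumed for illustration, not taken from LegUp. */
enum { NOPS = 4, MAX_CYCLES = 64 };
static const int latency[NOPS] = {2, 2, 1, 1};
static const int uses_mem[NOPS] = {1, 1, 0, 1};
/* preds[k] holds up to two predecessors of op k; -1 means none. */
static const int preds[NOPS][2] = {{-1, -1}, {-1, -1}, {0, 1}, {2, -1}};

/* ASAP-schedule the ops (already in dependence order) into 1-indexed
 * cycles. If mem_ports > 0, at most mem_ports memory ops may start in any
 * one cycle; mem_ports == 0 means "no limit". Returns the schedule length. */
int schedule(int mem_ports, int start[NOPS]) {
    int mem_in_cycle[MAX_CYCLES] = {0};
    int length = 0;
    for (int k = 0; k < NOPS; k++) {
        int t = 1;
        for (int p = 0; p < 2; p++) {       /* wait for predecessors */
            int q = preds[k][p];
            if (q >= 0 && start[q] + latency[q] > t)
                t = start[q] + latency[q];
        }
        /* Slide later while the memory port is already busy. */
        while (mem_ports > 0 && uses_mem[k] && mem_in_cycle[t] >= mem_ports)
            t++;
        if (uses_mem[k])
            mem_in_cycle[t]++;
        start[k] = t;
        if (t + latency[k] - 1 > length)
            length = t + latency[k] - 1;
    }
    return length;
}
```

With `mem_ports = 0` both loads start in cycle 1 (the conflict); with `mem_ports = 1` the second load slips to cycle 2, and the add and store slip with it.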
Loop Pipelining Example
for (int i = 0; i < N; i++) {
  a[i] = b[i] + c[i];
}
• Initiation Interval (II)
  • Constant time interval between starting successive iterations of the loop
• The loop requires 6 cycles per iteration (II=6)
• Can we do better?
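The payoff of a smaller II can be estimated with simple arithmetic. A minimal C sketch, assuming no stalls and ignoring loop-control overhead: n iterations whose schedule is `depth` cycles deep, started every `ii` cycles, finish after (n - 1)·ii + depth cycles. The helper name is illustrative. Taking N = 100 (the LLVM exit test compares against 100), the unpipelined loop needs 100 × 6 = 600 cycles; a pipelined version with II = 3 and an assumed depth of 9 cycles (three 3-cycle stages) needs 99 × 3 + 9 = 306.

```c
/* Cycles to run n iterations of a loop whose single-iteration schedule is
 * `depth` cycles long, when a new iteration starts every `ii` cycles.
 * Without pipelining ii == depth, and this reduces to n * depth.
 * Assumes no stalls and no extra loop-control cycles. */
long total_cycles(long n, long ii, long depth) {
    if (n <= 0)
        return 0;
    return (n - 1) * ii + depth;
}
```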
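Minimum Initiation Interval
• Resource minimum II (ResMII):
  – Due to the limited number of functional units
  – ResMII = $\left\lceil \frac{\text{uses of functional unit}}{\text{\# of functional units}} \right\rceil$, taking the maximum over all functional-unit types
• Recurrence minimum II (RecMII):
  – Due to loop-carried dependencies
• Minimum II = max(ResMII, RecMII)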
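The Minimum II bounds above can be computed directly. A minimal C sketch, assuming each functional-unit type is described only by how many times one iteration uses it and how many instances of it exist; the function names are illustrative, not LegUp's.

```c
/* Resource-constrained minimum II: for each resource type r,
 * ceil(uses[r] / units[r]); ResMII is the maximum over all types.
 * Assumes units[r] > 0 for every r. */
int res_mii(int nres, const int uses[], const int units[]) {
    int mii = 1;                                  /* II is at least 1 */
    for (int r = 0; r < nres; r++) {
        int need = (uses[r] + units[r] - 1) / units[r];   /* integer ceiling */
        if (need > mii)
            mii = need;
    }
    return mii;
}

/* Overall lower bound: Minimum II = max(ResMII, RecMII). */
int min_ii(int resmii, int recmii) {
    return resmii > recmii ? resmii : recmii;
}
```

For this loop, with one memory port used three times per iteration (2 loads + 1 store), one adder used once, and no loop-carried dependencies (RecMII = 0), the result is 3.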
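Resource Constraints
• Assume unlimited functional units (adders, …)
• Only constraint: single-ported memory controller
• Reservation table: (figure)
• Each iteration issues 3 memory operations (2 loads + 1 store) through 1 port, so the resource minimum initiation interval is $\lceil 3/1 \rceil = 3$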
Iterative Modulo Scheduling
• There are no loop-carried dependencies, so Minimum II = ResMII = 3
• Iterative: it is not always possible to schedule the loop at the minimum II
(Flowchart: set II = minII → attempt to modulo schedule the loop with II → on failure, set II = II + 1 and retry; on success, stop)
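The flowchart can be sketched in C as follows. This is a deliberately simplified, greedy variant of iterative modulo scheduling: each op is placed once, in dependence order, with no eviction or backtracking as in Rau's full algorithm. The op list and latencies are hypothetical assumptions (2-cycle loads, 1-cycle add and store, one memory port), and the function names are illustrative.

```c
/* Hypothetical model of one iteration of a[i] = b[i] + c[i]:
 * op 0 = load b[i], op 1 = load c[i], op 2 = add, op 3 = store a[i].
 * Latencies are assumed; ops 0, 1 and 3 share the single memory port. */
enum { NOPS = 4, MAX_II = 16 };
static const int latency[NOPS] = {2, 2, 1, 1};
static const int uses_mem[NOPS] = {1, 1, 0, 1};
static const int preds[NOPS][2] = {{-1, -1}, {-1, -1}, {0, 1}, {2, -1}};

/* Place each op at its earliest dependence-legal cycle, sliding forward at
 * most ii-1 cycles to find a free row in the modulo reservation table.
 * Trying more than ii cycles is pointless: the rows would just repeat.
 * Returns 1 on success, 0 if some op finds every row taken. */
static int try_modulo_schedule(int ii, int start[NOPS]) {
    int mrt[MAX_II + 1] = {0};         /* mrt[row] = 1 if memory port busy */
    for (int k = 0; k < NOPS; k++) {
        int earliest = 1;
        for (int p = 0; p < 2; p++) {
            int q = preds[k][p];
            if (q >= 0 && start[q] + latency[q] > earliest)
                earliest = start[q] + latency[q];
        }
        int placed = 0;
        for (int t = earliest; t < earliest + ii && !placed; t++) {
            int row = (t - 1) % ii + 1;
            if (!uses_mem[k] || !mrt[row]) {
                if (uses_mem[k])
                    mrt[row] = 1;
                start[k] = t;
                placed = 1;
            }
        }
        if (!placed)
            return 0;
    }
    return 1;
}

/* Iterative part: start at min_ii and increase II until an attempt succeeds.
 * Returns the II found, or -1 if none up to MAX_II. */
int iterative_modulo_schedule(int min_ii, int start[NOPS]) {
    for (int ii = min_ii; ii <= MAX_II; ii++)
        if (try_modulo_schedule(ii, start))
            return ii;
    return -1;
}
```

Starting from II = 1 or 2, the attempt fails because three memory operations cannot share fewer than three modulo rows, and the loop settles on II = 3. With the assumed latencies, the store's earliest cycle (5) maps to an occupied row and slides to cycle 6; the slide's cycle numbers differ because this sketch's latencies are assumed.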
Iterative Modulo Scheduling
• Operations in the loop that execute in cycle i must also execute in cycles i + k*II, for k = 0 to N-1
• Therefore, to detect resource conflicts, look in the reservation table under cycle (i-1) mod II + 1
• Hence the name “modulo scheduling”
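The row lookup is a one-line function. A small C sketch with illustrative names, which also checks the property the bullets rely on: every copy i + k*II of an operation lands in the same reservation-table row.

```c
/* Reservation-table row (1-indexed) used by an operation issued in
 * 1-indexed cycle `cycle` when iterations start every `ii` cycles. */
int modulo_slot(int cycle, int ii) {
    return (cycle - 1) % ii + 1;
}

/* Returns 1 if the copies of an op issued at cycle i by iterations
 * k = 0..n-1 (cycles i + k*ii) all map to the same table row. */
int same_slot_every_iteration(int i, int ii, int n) {
    for (int k = 0; k < n; k++)
        if (modulo_slot(i + k * ii, ii) != modulo_slot(i, ii))
            return 0;
    return 1;
}
```

`modulo_slot(6, 3)` evaluates to 3, the same row computed for the store on the Modulo Reservation Table slide.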
New Pipelined Schedule
Modulo Reservation Table
• The store couldn’t be scheduled in cycle 6:
  – Slot = (6-1) mod 3 + 1 = 3
  – Slot 3 is already taken by an earlier load
Iterative Modulo Scheduling
• Now we have a valid schedule for II=3
• We need to construct the loop kernel, prologue, and epilogue
• The loop kernel is what is executed when the pipeline is in steady state
  – The kernel is executed every II cycles
• First, we divide the schedule into stages of II cycles each
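Dividing the schedule into stages is integer division on the 1-indexed cycle number. A minimal C sketch with illustrative names:

```c
/* Stage (1-indexed) containing an op scheduled in 1-indexed cycle `cycle`,
 * when the schedule is cut into consecutive stages of `ii` cycles each. */
int stage_of(int cycle, int ii) {
    return (cycle - 1) / ii + 1;
}

/* Number of stages needed to cover a schedule `length` cycles long. */
int num_stages(int length, int ii) {
    return (length + ii - 1) / ii;
}
```

With II = 3, cycles 1-3 form Stage 1, cycles 4-6 Stage 2, and cycles 7-9 Stage 3, so any schedule 7 to 9 cycles long needs three stages.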
Pipeline Stages
(Figure: the II=3 schedule divided into Stages 1, 2, and 3, each II cycles long)
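Pipelined Loop Iterations
Each stage lasts 3 cycles; a new iteration enters Stage 1 every II = 3 cycles.

| Stage | Cycles 1-3 | Cycles 4-6 | Cycles 7-9 | Cycles 10-12 | Cycles 13-15 | Cycles 16-18 | Cycles 19-21 |
|---|---|---|---|---|---|---|---|
| Stage 1 | i=0 | i=1 | i=2 | i=3 | i=4 | | |
| Stage 2 | | i=0 | i=1 | i=2 | i=3 | i=4 | |
| Stage 3 | | | i=0 | i=1 | i=2 | i=3 | i=4 |
| Phase | Prologue | Prologue | Kernel (Steady State) | Kernel (Steady State) | Kernel (Steady State) | Epilogue | Epilogue |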
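The pipelined-iterations figure follows a simple rule: in time slot t (each slot is II cycles long), stage s holds iteration t - s, counting slots and stages from 1 and iterations from 0. A minimal C sketch of that rule and of the prologue/kernel/epilogue split, assuming at least as many iterations as stages; the names are illustrative.

```c
/* Iteration (0-indexed) that stage s (1-indexed) works on during time
 * slot t (1-indexed, each slot II cycles long); -1 if the stage is idle. */
int iteration_in(int t, int s, int n_iters) {
    int it = t - s;
    return (it >= 0 && it < n_iters) ? it : -1;
}

/* Slots needed until the last iteration leaves the last stage. */
int total_slots(int n_iters, int n_stages) {
    return n_iters + n_stages - 1;
}

typedef enum { PROLOGUE, KERNEL, EPILOGUE } Phase;

/* The first n_stages-1 slots fill the pipeline (prologue), the last
 * n_stages-1 drain it (epilogue), and the kernel runs in between.
 * Assumes n_iters >= n_stages. */
Phase phase_of(int t, int n_iters, int n_stages) {
    if (t <= n_stages - 1)
        return PROLOGUE;
    if (t > n_iters)
        return EPILOGUE;
    return KERNEL;
}
```

With 5 iterations and 3 stages, the pipeline drains after 5 + 3 - 1 = 7 slots, i.e. 21 cycles at II = 3, matching the figure.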
Loop Dependencies
for (i = 0; i < M; i++)
  for (j = 0; j < N; j++)
    a[j] = b[i] + a[j-1];  // depends on previous iteration
• May cause a non-zero recurrence minimum II
• Several papers at FPGA 2013 deal with discovering/optimizing loop dependencies
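A recurrence bounds II from below: if the operations on a dependence cycle take `delay` cycles in total and the loop-carried edge spans `distance` iterations, successive iterations cannot start closer than ceil(delay / distance) cycles apart. This is the standard formulation from the modulo-scheduling literature, sketched in C with illustrative names. For the a[j-1] example (distance 1) and an assumed load → add → store chain of 4 cycles, it gives RecMII = 4.

```c
/* One recurrence (dependence cycle) in the loop body: the summed latency
 * of the ops on the cycle, and how many iterations the loop-carried edge
 * spans (its dependence distance, assumed > 0). */
typedef struct {
    int delay;
    int distance;
} Recurrence;

/* RecMII = max over recurrences of ceil(delay / distance);
 * 0 when the loop has no recurrences. */
int rec_mii(int n, const Recurrence recs[]) {
    int mii = 0;
    for (int r = 0; r < n; r++) {
        int need = (recs[r].delay + recs[r].distance - 1) / recs[r].distance;
        if (need > mii)
            mii = need;
    }
    return mii;
}
```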
Limitations and Current Research
LegUp HLS Limitations
• HLS will likely do better on the datapath-oriented parts of a design.
• Results are likely quite sensitive to how loops are structured in your C code.
• It is difficult for HLS to “beat” an optimized, structured hardware design.
FPGA/Altera-Specific Aspects of LegUp
• Memory
  – On-chip (AltSyncRAM), off-chip (DDR2/SDRAM controller)
• IP cores
  – Divider, floating-point units
• On-chip SoC interconnect
  – Avalon interface
• LegUp-generated Verilog is fairly FPGA-agnostic:
  – Not difficult to migrate to target ASICs
Current Research Work
• Impact of compiler optimizations on HLS
• Enhanced parallel accelerator support
  – Combining Pthreads+OpenMP
• Smaller processor
• Improved loop pipelining
• Software fallback for bitwidth-optimized accelerators
• Enhanced GUI to display the CDFG connected with the schedule
Current Work: PCIe Support
• Enable use of LegUp-generated accelerators in an HPC environment
  – Communicating with an x86 processor via PCIe
• Message passing or memory transfers
  – Software API for fpga_malloc, fpga_free, send, receive
• DE4 / Stratix IV support in next LegUp release
On to the Labs!