1 multi-ported memories for fpgas via xor eric laforest, ming liu, emma rapati, and greg steffan...
TRANSCRIPT
![Page 1: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/1.jpg)
1
Multi-ported Memories for FPGAs via XOR
Eric LaForest, Ming Liu, Emma Rapati, and Greg SteffanECE, University of Toronto
![Page 2: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/2.jpg)
2
Multi-Ported Memories (MPM)• MPM: Memory with more than 2 ports• Many uses:– register files– queues/buffers
• FPGA BRAMs:– have only 2 ports
• Building MPMs:– multiple BRAMs– logic elements (ALMs/LEs)– clever combinations
Write Ports Read Ports
… …
![Page 3: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/3.jpg)
3
Example: 2W/2R MPM
How can we build this?
![Page 4: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/4.jpg)
4
2W/2R: Pure-ALMs/LEs
Scales very poorly with memory depth
![Page 5: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/5.jpg)
5
1W/2R: Replicated BRAMS
Fairly efficient Only one write port
M0
M1
R0
R1
W0 X
![Page 6: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/6.jpg)
6
2W/2R Banked BRAMs
Multiple write ports Fragmented data
M0
M1
R0
R1
W0
W1X
![Page 7: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/7.jpg)
7
2W/2R “Multipumping”
No fragmentation Divides clock speed
M0
R0W0
W1 R1
![Page 8: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/8.jpg)
8
Review:The Live Value Table (LVT) Approach
(FPGA’10)
Efficient Multi-Ported Memories for FPGAs, Eric LaForest and J. Gregory Steffan, International Symposium on Field-Programmable Gate Arrays, Monterey, CA, February, 2010.
![Page 9: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/9.jpg)
9
LVT-Based MPM
Bank for twowrite ports, replicate to provide read ports
Muxes select bankto read from
LVT: rememberswhich bank hasmost recently-written value
LVT
![Page 10: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/10.jpg)
10
LVT-Based MPM
SignificantMultiplexing!
Many ALMs!
Punchline: LVT is a big freq win, but...
![Page 11: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/11.jpg)
11
An XOR Approach
![Page 12: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/12.jpg)
12
XOR• XOR basics:
A 0 = A⊕ B B = 0⊕
• Implication: A B B = A⊕ ⊕
Can we exploit XOR to build better MPMs?
Intuition: avoid LVT-table, multiplexing
![Page 13: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/13.jpg)
13
2W/2R XOR Design
R0
Goal: a read is only an XOR operation
![Page 14: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/14.jpg)
14
2W/2R XOR Design
R0
Focus on one location for now
OLDOLD
A OLD⊕
A OLD⊕
A OLD OLD⊕ ⊕=A
![Page 15: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/15.jpg)
15
2W/2R XOR Design
R0
W0
XOR new value with old value
A
OLD
OLD
A OLD⊕
A OLD⊕
![Page 16: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/16.jpg)
16
2W/2R XOR Design
R0
W0
Support multiple locations, two write ports
![Page 17: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/17.jpg)
17
2W/2R XOR Design
R0W1
W0
Most-recently-written bank holds new value XOR old(s)
A OLD1⊕
OLD1
A
B OLD2⊕
OLD2
B
![Page 18: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/18.jpg)
18
2W/2R XOR Design
R0
R1
W1
W0
Add support for second read port---done! (almost)
![Page 19: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/19.jpg)
19
2W/2R XOR Design
R0
R1
W1
W0
Writing requires reading: hence 2 cycles to write!
TICK
A
Solution: need pipelining to avoid stalling
TOCK
![Page 20: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/20.jpg)
20
2W/2R XOR Design
R0
R1
W1
W0
What if read a location one cycle after written?
TICK
A
Solution: bypass with forwarding logic
Read?
![Page 21: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/21.jpg)
21
Generalized XOR Design
![Page 22: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/22.jpg)
22
Generalized XOR Design
![Page 23: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/23.jpg)
23
LVT vs XOR
![Page 24: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/24.jpg)
24
Methodology
Use Quartus 10.0 to target Stratix IV Favor speed over area, optimize Average over 10 seeds to get Fmax
Measure area as Total Equivalent Area (TEA) Expresses area in a single unit (ALMs) 1 M9K == 28.7ALMs **
Measure a large design space Depth: 32-8192 memory locations Ports: 2W/4R, 4W/8R, 8W/16R
** H. Wong, J. Rose and V. Betz, "Comparing FPGA vs. Custom CMOS and the Impact on Processor Microarchitecture," ACM Int. Symp.on FPGAs, 2011
![Page 25: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/25.jpg)
25
Example Layout: 8192-deep 2W/4R
Significant resource diversity!LVT XOR
![Page 26: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/26.jpg)
26
2W/4R
LVT better for small designs, XOR better for large
8192 XOR:15% faster,2x smaller
CAD anomalyFast
er
Smaller (log)
![Page 27: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/27.jpg)
27
Navigating the Design Space (2W/4R)
Which is best? That depends...
![Page 28: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/28.jpg)
Summary
28
2W/4R 4W/8R
8W/16RUse LVT when: • want to minimize BRAMs• building <= 128 depthelse use XOR, i.e. when: • >= 256 & spare BRAMS
![Page 29: 1 Multi-ported Memories for FPGAs via XOR Eric LaForest, Ming Liu, Emma Rapati, and Greg Steffan ECE, University of Toronto](https://reader030.vdocuments.mx/reader030/viewer/2022032415/56649f0b5503460f94c1e921/html5/thumbnails/29.jpg)
29
Conclusions• Use LVT when– building up to 128-entry designs– you want to minimize BRAM usage
• Use XOR when– building 256-entry or larger designs– you want to minimize ALM usage
• Interesting Library/Generator?– help the designer automatically navigate this space
• Further work– Exploring “true-dual-port” mode, stalls, power– Results on other vendor’s FPGAs