![Page 1: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/1.jpg)
Fully Pipelined FPU for OR1200
Eric Zhang
Electrical & Computer Engineering
![Page 2: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/2.jpg)
Introduction & Motivation
• Floating Point Unit:
– Performs floating point operations such as:
• add/sub, multiplication, division, sine, cosine, FMA
– Wide dynamic range and high precision
– Required by many algorithms and applications
• E.g., Hotspot, SRAD, etc.
– High performance and low power consumption
![Page 3: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/3.jpg)
FPU in OR1200
• Arithmetic, Conversion, Comparison
![Page 4: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/4.jpg)
FPU in OR1200
• Serial implementation with long stalls
– 10 cycles total
– 38 cycles total
– 37 cycles total
![Page 5: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/5.jpg)
Goals and Objectives
• Pipeline the current version of floating point
multiplication and division
• Reduce number of clock cycles
• Eliminate the stalls due to serial implementation
• Synthesize and obtain the physical layout of the
pipelined FPU using Synopsys Top-Down design flow
![Page 6: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/6.jpg)
Methodology
• Analyze existing floating point implementation
– Identify serial logic that can be pipelined
• Pipeline the FPU multiplier and divider using Synopsys
Register Retiming design flow
• Design Compiler (DC) for synthesis, VCS for functional
simulation and verification, and IC Compiler for physical
layout, plus power and area measurement
![Page 7: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/7.jpg)
Register Retiming
![Page 8: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/8.jpg)
Register Retiming
1. Library setup
2. Constraint setup
3.
4. Compile
5. New constraint
6. Retiming
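As a rough illustration of what retiming is meant to achieve, the sketch below models a datapath as a chain of combinational blocks and searches for the register positions that minimize the worst stage delay. The block delays and the function name are hypothetical, and this is a toy model of the idea, not the algorithm Design Compiler runs.

```python
from itertools import combinations

def best_register_placement(delays, num_registers):
    """Return (worst_stage_delay, cut_positions) for the best placement.

    delays: combinational delays in ns of the blocks in a chain
            (hypothetical values, not taken from the FPU).
    num_registers: number of pipeline registers to place between blocks.
    A cut position i means a register sits just before block i.
    """
    best = None
    for cuts in combinations(range(1, len(delays)), num_registers):
        bounds = (0, *cuts, len(delays))
        # Delay of each stage is the sum of the blocks between two cuts.
        worst = max(sum(delays[a:b]) for a, b in zip(bounds, bounds[1:]))
        if best is None or worst < best[0]:
            best = (worst, cuts)
    return best

# Hypothetical block delays in ns.
delays = [3, 2, 4, 1, 2, 3]

# Registers bunched near the output (cuts before blocks 4 and 5)
# leave a long first stage that limits the clock period.
naive = max(sum(delays[0:4]), sum(delays[4:5]), sum(delays[5:6]))

# Moving the same two registers back into the logic balances the stages.
worst, cuts = best_register_placement(delays, 2)
print(f"naive worst stage: {naive} ns, retimed worst stage: {worst} ns, cuts: {cuts}")
```

The number of registers on the path is unchanged, so latency in cycles stays the same; only the achievable clock period improves.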
![Page 9: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/9.jpg)
Register Retiming Flow
![Page 10: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/10.jpg)
Register Retiming Timing Report
![Page 11: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/11.jpg)
Schematic Before Retiming
![Page 12: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/12.jpg)
Schematic After Retiming
![Page 13: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/13.jpg)
VCS Functional Simulation
1.6 * 4.0 = 6.4
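When reading the simulation waveform, it can help to know the raw bit patterns of the operands and the expected product. The sketch below prints them, assuming the FPU operates on IEEE 754 single-precision values; the helper names are illustrative only.

```python
import struct

def f32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def f32_fields(x):
    """Split x into (sign, biased exponent, fraction) fields."""
    bits = f32_bits(x)
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

for value in (1.6, 4.0, 6.4):
    sign, exponent, fraction = f32_fields(value)
    print(f"{value}: 0x{f32_bits(value):08X} "
          f"sign={sign} exp={exponent} frac=0x{fraction:06X}")
```

Multiplying by 4.0 only adds 2 to the exponent, so the product shares the fraction bits of 1.6.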
![Page 14: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/14.jpg)
VCS Functional Simulation
1.6 / 4.0 = 0.4
![Page 15: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/15.jpg)
Physical Layout
![Page 16: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/16.jpg)
Specification Results
| Spec | Pipelined | Original |
|---|---|---|
| Frequency | 222 MHz | 222 MHz |
| VDD | 1.05 V | 1.05 V |
| Metal Layers | 9 | 9 |
| # of input pins | 143 | 143 |
| # of output pins | 80 | 80 |
| Area | 0.5 mm^2 | 0.45 mm^2 |
| FPMUL Cycles | 13 | 38 |
| FPDIV Cycles | 11 | 37 |
| Dynamic Power | 3.79 mW | 0.65 mW |
| Leakage Power | 1.33 mW | 0.69 mW |
| Total Power | 5.13 mW | 1.34 mW |
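To put these numbers in perspective, the sketch below derives latency speedup, the cost ratios, and the cycle count for a burst of back-to-back operations. One assumption is not stated on the slide: that the pipelined unit accepts a new operation every cycle, while the original unit must finish one operation before starting the next.

```python
FREQ_HZ = 222e6  # both designs run at 222 MHz

CYCLES = {
    "FPMUL": {"pipelined": 13, "original": 38},
    "FPDIV": {"pipelined": 11, "original": 37},
}
TOTAL_POWER_MW = {"pipelined": 5.13, "original": 1.34}
AREA_MM2 = {"pipelined": 0.5, "original": 0.45}

def latency_ns(cycles):
    """Latency of one operation in nanoseconds at FREQ_HZ."""
    return cycles / FREQ_HZ * 1e9

def cycles_for_burst(latency, n_ops, pipelined):
    """Cycles to complete n_ops back-to-back operations.

    Assumption: a pipelined unit issues one operation per cycle;
    a serial unit processes operations strictly one after another.
    """
    return latency + (n_ops - 1) if pipelined else latency * n_ops

for op, c in CYCLES.items():
    speedup = c["original"] / c["pipelined"]
    print(f"{op}: {latency_ns(c['original']):.1f} ns -> "
          f"{latency_ns(c['pipelined']):.1f} ns ({speedup:.2f}x latency speedup)")
    print(f"  100 ops: {cycles_for_burst(c['original'], 100, False)} cycles serial, "
          f"{cycles_for_burst(c['pipelined'], 100, True)} cycles pipelined")

print(f"power ratio: {TOTAL_POWER_MW['pipelined'] / TOTAL_POWER_MW['original']:.2f}x")
print(f"area ratio:  {AREA_MM2['pipelined'] / AREA_MM2['original']:.2f}x")
```

Under that assumption the throughput gain on long bursts is much larger than the single-operation latency gain, which is the trade made against the higher power and area.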
![Page 17: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/17.jpg)
DesignWare IP
• Technology-independent
• Microarchitecture-level library
• Synthesizable for ASIC, SoC, and FPGA design
• IPs include:
– Arithmetic components: multiplier, divider, adder, etc.
• DW01_add, DW02_mult, DW_fp_mult
– DSP, AMBA Bus, Memory Controller
• DW_fir
– etc
![Page 18: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/18.jpg)
DesignWare IP
• To use DesignWare IP:
1. set synthetic_library dw_foundation.sldb
2. set link_library "* $target_library $synthetic_library"
3. License: DesignWare
• Instantiation In Verilog file:
– DW02_mult #(8, 8) U1 (A, B, TC, PRODUCT);
• Synthesize using normal flow
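A small reference model can be useful for checking simulation results of the 8×8 multiplier instantiated above. The sketch below assumes the usual meaning of the TC port (0 selects unsigned, 1 selects two's-complement operands) and a product that is A_width + B_width bits wide; the function name is hypothetical and the model is not taken from the DesignWare simulation library.

```python
def dw_mult_model(a, b, tc, a_width=8, b_width=8):
    """Behavioural sketch of an integer multiplier with a TC control input.

    a, b: operand bit patterns (masked to a_width and b_width bits).
    tc:   0 treats operands as unsigned, 1 as two's complement (assumed).
    Returns the product as an (a_width + b_width)-bit pattern.
    """
    def to_signed(value, width):
        # Reinterpret a bit pattern as a two's-complement integer.
        return value - (1 << width) if value & (1 << (width - 1)) else value

    a &= (1 << a_width) - 1
    b &= (1 << b_width) - 1
    if tc:
        a = to_signed(a, a_width)
        b = to_signed(b, b_width)
    return (a * b) & ((1 << (a_width + b_width)) - 1)

# Same input bits, different interpretation:
print(f"unsigned: 0x{dw_mult_model(0xFF, 0x02, tc=0):04X}")  # 255 * 2
print(f"signed:   0x{dw_mult_model(0xFF, 0x02, tc=1):04X}")  # -1 * 2
```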
![Page 19: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/19.jpg)
DesignWare IP
• Benefits of using DesignWare IP
– Increased productivity: parameterized, pre-verified
– Better quality of results (QoR): optimized by Synopsys
– Design reusability
![Page 20: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/20.jpg)
Improved Scripts for design flow
• Automatically set up all necessary folders and scripts
• Automatically set up scratch storage for synthesis
results
• Scripts common to different projects are created as
symbolic links
– E.g., setup.tcl
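The symbolic-link arrangement can be sketched as follows. The folder names (`common`, `src`, `scripts`, `reports`) and the function name are assumptions made for illustration; the actual layout used by the project scripts is shown only in the slide screenshots.

```python
import os
import tempfile
from pathlib import Path

def create_project(top, name, shared=("setup.tcl",)):
    """Create a project folder whose shared scripts are symbolic links.

    top:    top-level folder holding all projects (hypothetical layout).
    name:   project name, e.g. "test".
    shared: scripts kept once in top/common and linked into each project.
    """
    common = Path(top) / "common"
    common.mkdir(parents=True, exist_ok=True)
    project = Path(top) / name
    for sub in ("src", "scripts", "reports"):
        (project / sub).mkdir(parents=True, exist_ok=True)
    for script in shared:
        target = common / script
        target.touch(exist_ok=True)
        link = project / "scripts" / script
        if not (link.is_symlink() or link.exists()):
            # A relative link keeps working if the whole tree is moved.
            link.symlink_to(os.path.relpath(target, link.parent))
    return project

with tempfile.TemporaryDirectory() as top:
    project = create_project(top, "test")
    for path in sorted(project.rglob("*")):
        kind = "link" if path.is_symlink() else "dir"
        print(f"{kind}: {path.relative_to(top)}")
```

Because every project links to the same file, a fix to a shared script takes effect in all projects at once.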
![Page 21: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/21.jpg)
Improved Scripts for design flow
Top level folder without any projects:
Create a project called “test”:
![Page 22: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/22.jpg)
Improved Scripts for design flow
Top level folder after creating “test”:
Folder layout of project “test” :
Other useful scripts:
– timing_closure.sh: binary search for the minimum delay
– project_init.tcl: project-specific information (top-level design name, language, etc.)
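The binary search for the minimum delay could look roughly like the sketch below. It is not the contents of timing_closure.sh: the `meets_timing` callback stands in for a synthesis run plus a slack check, and the 4.5 ns critical path is a made-up value. The search also assumes that any period longer than a passing one passes too.

```python
def min_clock_period(meets_timing, lo_ns, hi_ns, tol_ns=0.01):
    """Binary-search the smallest clock period that still meets timing.

    meets_timing: callable taking a period in ns and returning True when the
                  design closes timing at that period (hypothetical stand-in
                  for a synthesis run).
    lo_ns, hi_ns: search bounds; hi_ns must meet timing.
    tol_ns:       stop once the bounds are this close together.
    """
    if not meets_timing(hi_ns):
        raise ValueError("upper bound does not meet timing")
    while hi_ns - lo_ns > tol_ns:
        mid = (lo_ns + hi_ns) / 2
        if meets_timing(mid):
            hi_ns = mid   # passes: try a shorter period
        else:
            lo_ns = mid   # fails: the minimum lies above mid
    return hi_ns

# Stand-in for synthesis: pretend the critical path is 4.5 ns.
CRITICAL_PATH_NS = 4.5
period = min_clock_period(lambda p: p >= CRITICAL_PATH_NS, lo_ns=1.0, hi_ns=10.0)
print(f"minimum period is about {period:.2f} ns ({1e3 / period:.0f} MHz)")
```

Each iteration halves the search range, so the number of synthesis runs grows only with the logarithm of the range divided by the tolerance.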
![Page 23: Fully Pipelined FPU for OR1200](https://reader033.vdocuments.mx/reader033/viewer/2022051118/5681672f550346895ddbd452/html5/thumbnails/23.jpg)
Thank you!