Reconfigurable Computing
OUTLINE
• INTRODUCTION
• RECONFIGURABLE COMPUTING ARCHITECTURES
• RECONFIGURATION MANAGEMENT
• PROGRAMMING RECONFIGURABLE SYSTEMS
• COMPILING C FOR SPATIAL COMPUTING
• HW/SW Partitioning
• BEE2: A High-End Reconfigurable Computing System
• REFERENCES
INTRODUCTION Conventional Computing
• Software-programmed microprocessors
– Processors execute a set of instructions.
– Performance can suffer: even when the clock speed is high, the work done per cycle can be low.
– Lower performance than ASICs.
INTRODUCTION Conventional Computing
• Hardwired (ASICs)
– Special purpose.
– Very fast and efficient.
– Circuit cannot be altered after fabrication; any change requires a redesign.
INTRODUCTION Reconfigurable Computing
• Fills the gap between hardware and software.
– Much higher performance than software.
– Higher level of flexibility than hardware.
INTRODUCTION Reconfigurable Computing
• Uses FPGAs or other programmable hardware for compute-intensive calculations.
• Usually coupled with a general-purpose microprocessor that is responsible for
– Controlling the reconfigurable logic.
– Executing program code that cannot be efficiently accelerated.
INTRODUCTION Reconfigurable devices
• Contain an array of computational elements.
• Functionality is determined through configuration bits.
INTRODUCTION Reconfigurable devices
• Most current FPGAs and reconfigurable devices are SRAM-programmable. The SRAM configuration bits control:
– Routing.
– Multiplexers, LUTs, …
– Control signals for the computational units.
Figure: a 3-input LUT and a D flip-flop with optional bypass
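How configuration bits determine functionality can be sketched in a few lines of Python. This is a toy model of one logic block, assuming a 3-input LUT feeding a D flip-flop with optional bypass; the class name `LogicBlock` and its fields are hypothetical and do not describe any specific device.

```python
class LogicBlock:
    """Toy model of one logic block: 3-input LUT + D flip-flop with bypass."""

    def __init__(self, lut_bits, bypass_ff):
        # lut_bits: 8 configuration bits, one output bit per input combination.
        assert len(lut_bits) == 8
        self.lut_bits = list(lut_bits)
        # bypass_ff: one more configuration bit. True routes the LUT output
        # straight to the block output, False registers it first.
        self.bypass_ff = bypass_ff
        self.ff_state = 0  # value currently held in the flip-flop

    def lut(self, a, b, c):
        # The three inputs form an address into the 8-entry truth table.
        return self.lut_bits[(a << 2) | (b << 1) | c]

    def clock(self, a, b, c):
        """Advance one clock cycle and return the block output."""
        value = self.lut(a, b, c)
        if self.bypass_ff:
            return value
        out = self.ff_state      # in this model the registered output lags by one cycle
        self.ff_state = value
        return out


# Same hardware, two configurations: 3-input XOR (parity) and 3-input AND.
XOR3 = [0, 1, 1, 0, 1, 0, 0, 1]
AND3 = [0, 0, 0, 0, 0, 0, 0, 1]

xor_block = LogicBlock(XOR3, bypass_ff=True)
and_block = LogicBlock(AND3, bypass_ff=True)
```

Changing only the nine configuration bits turns the same block from a parity generator into an AND gate, with or without a registered output.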
INTRODUCTION Reconfigurable devices
• Reconfigurable Processing Fabric (RPF)
– Fine-grained
– Coarse-grained
INTRODUCTION Reconfigurable devices
• Fine-grained RPF
– Well suited to bit manipulation tasks
– Complex calculations require numerous fine-grained processing elements (PEs)
• This leads to slower clock rates
INTRODUCTION Reconfigurable devices
• Coarse-grained RPF
– Uses bus interconnect and PEs
– PEs perform more than just bitwise operations, e.g. ALUs and multipliers.
RECONFIGURABLE COMPUTING ARCHITECTURES
S. Goldstein, H. Schmit, M. Moe, M. Budiu, S. Cadambi, R. R. Taylor, R. Laufer. PipeRench: A coprocessor for streaming multimedia acceleration.
RPF integration
RECONFIGURABLE COMPUTING ARCHITECTURES
• RPF integration
– Separate processor (coprocessor)
• Data communication takes place through main memory
• Limited bandwidth between CPU and RPF
RECONFIGURABLE COMPUTING ARCHITECTURES
• RPF integration
– Loosely coupled RPF and processor architecture
• RPF with the host processor on the same chip
• Direct interaction between RPF and processor
• RPF with direct memory access
Chameleon’s architecture
RECONFIGURABLE COMPUTING ARCHITECTURES
• RPF integration
– Tightly coupled RPF and processor
• RPF integrated as a functional unit, alongside units such as ALUs and multipliers.
• The reconfigurable functional unit (RFU) accesses input data through the register files.
The datapath of the processor + RFU architecture
RECONFIGURABLE COMPUTING ARCHITECTURES
• RPF integration
– Tightly coupled RPF and processor
• Virtual Instruction Configurations (VICs) in the RFU typically run during the execute stage (and possibly the memory stage) of the pipeline.
An example of a pipeline of a processor with an RFU
RECONFIGURATION MANAGEMENT Problem Definition
• Reconfigurability allows hardware to perform different tasks at different times.
• An application’s configurations can be swapped in and out of the hardware as needed.
• Reconfiguring the hardware at runtime is called Runtime Reconfiguration (RTR).
RECONFIGURATION MANAGEMENT Problem Definition
• RTR
– Run-time reconfiguration is based upon the concept of virtual hardware, which is similar to virtual memory.
• The physical hardware is much smaller than the sum of the resources required.
• Configurations are swapped in and out of the actual hardware.
RECONFIGURATION MANAGEMENT Problem Definition
• RTR
– Increases hardware utilization
– Introduces significant reconfiguration overhead
– Time consuming
• Can require hundreds of milliseconds
RECONFIGURATION MANAGEMENT Problem Definition
• Computation and reconfiguration are mutually exclusive
– time spent reconfiguring is time lost in terms of application acceleration.
• Reconfiguration can occupy approximately 25 to 98 percent of total execution time
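A small worked example of how such percentages arise, assuming computation and reconfiguration never overlap. The function name and all timing values are hypothetical, chosen only to land near the two ends of the quoted range.

```python
def reconfig_fraction(t_compute_ms, t_reconfig_ms, n_reconfigs):
    """Fraction of total run time spent reconfiguring.

    Assumes computation and reconfiguration are mutually exclusive,
    so the total time is simply their sum.
    """
    overhead = n_reconfigs * t_reconfig_ms
    return overhead / (t_compute_ms + overhead)


# Hypothetical workloads, not measurements:
light = reconfig_fraction(t_compute_ms=300, t_reconfig_ms=100, n_reconfigs=1)
heavy = reconfig_fraction(t_compute_ms=10, t_reconfig_ms=100, n_reconfigs=5)
```

With one 100 ms reconfiguration per 300 ms of useful work the overhead is a quarter of the run; with five reconfigurations around 10 ms of work it exceeds 98 percent.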
RECONFIGURATION MANAGEMENT Configuration Architectures
• What is a configuration architecture?
• Architectures
– Single-context
– Multi-context
– Partially Reconfigurable
– Others
RECONFIGURATION MANAGEMENT Configuration Architectures
Single-context
• Configurations are grouped into contexts, and each full context is swapped in and out of the FPGA as needed.
RECONFIGURATION MANAGEMENT Configuration Architectures
Single-context
• Configuration information is loaded into the programmable array through a serial shift chain
RECONFIGURATION MANAGEMENT Configuration Architectures
Single-context
• Requires few pins for configuration, potentially simplifying board-level design
• Entire chip must be reprogrammed for any change to the configuration data because the data cannot be selectively “reused” on the chip.
RECONFIGURATION MANAGEMENT Configuration Architectures
Single-context
• Configuration cycles can be reduced by widening the configuration path
– Virtex-5 devices allow a configuration data bus up to 32 bits wide
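The effect of widening the configuration path is simple arithmetic. The sketch below assumes one path-width word is loaded per configuration clock cycle and ignores any protocol overhead; the 8,000,000-bit bitstream size is a made-up round number, not a figure for any real device.

```python
def config_cycles(bitstream_bits, path_width_bits=1):
    """Clock cycles needed to load a full bitstream over a path of the given width."""
    # Integer ceiling division: a partial final word still costs a cycle.
    return (bitstream_bits + path_width_bits - 1) // path_width_bits


BITSTREAM_BITS = 8_000_000                            # hypothetical size
serial_cycles = config_cycles(BITSTREAM_BITS, 1)      # serial shift chain
wide_cycles = config_cycles(BITSTREAM_BITS, 32)       # 32-bit configuration bus
```

Under these assumptions the 32-bit path needs 250,000 cycles instead of 8,000,000, a 32x reduction, but the whole chip is still rewritten.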
RECONFIGURATION MANAGEMENT Configuration Architectures
Multi-context
• Provides storage for multiple configurations
– Facilitates configuration prefetching and fast reconfiguration
– Contains multiple planes (contexts) of configuration data
RECONFIGURATION MANAGEMENT Configuration Architectures
Multi-context
• Multiplexer chooses between the context planes
RECONFIGURATION MANAGEMENT Configuration Architectures
Multi-context advantage
• Background loading of configuration data
• Fast switching between stored configurations
– Some devices switch in a single clock cycle
• Overlapping computations with configuration
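The multi-context idea can be modelled as a set of configuration planes behind one select signal. This is a minimal sketch; `MultiContextFabric` and the configuration names are hypothetical, and real devices store bit planes rather than strings.

```python
class MultiContextFabric:
    """Toy model: several configuration planes behind one multiplexer."""

    def __init__(self, n_contexts, initial_config):
        self.planes = [None] * n_contexts   # one slot per context plane
        self.planes[0] = initial_config
        self.active = 0                     # multiplexer select

    def load(self, plane, config):
        # Background load: any plane except the one currently driving the fabric.
        if plane == self.active:
            raise ValueError("cannot overwrite the active context")
        self.planes[plane] = config

    def switch(self, plane):
        # Fast switch: only the multiplexer select changes, no data is moved.
        if self.planes[plane] is None:
            raise ValueError("context plane is empty")
        self.active = plane

    def running(self):
        return self.planes[self.active]


fabric = MultiContextFabric(n_contexts=2, initial_config="fir_filter")
fabric.load(1, "fft")        # loads while "fir_filter" keeps running
before = fabric.running()
fabric.switch(1)
after = fabric.running()
```

The slow step (`load`) happens while the active context keeps computing; the step on the critical path (`switch`) only changes a select value.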
RECONFIGURATION MANAGEMENT Configuration Architectures
Multi-context drawbacks
• Area overhead
– Additional configuration data
– Multiplexing
• Single cycle configuration
– Dynamic power?
RECONFIGURATION MANAGEMENT Configuration Architectures
Partially Reconfigurable
• Not all configurations require the entire chip area
• Reconfigure utilized resources only
• Use addressable configuration memory
RECONFIGURATION MANAGEMENT Configuration Architectures
Partially Reconfigurable
– Decrease reconfiguration time
– Decrease configuration data
– Issue: a configuration occupying a large area still takes a long time to load
– Issue: what happens when independent configurations need overlapping hardware?
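Addressable configuration memory can be sketched as a frame array where an update names only the frames it uses. The frame granularity, the frame contents, and the function name are assumptions made for illustration.

```python
def partial_reconfigure(memory, update):
    """Write only the addressed frames; all other frames stay untouched.

    memory: list of frame contents, indexed by frame address.
    update: dict mapping frame address -> new frame contents.
    Returns the number of frames written, a proxy for reconfiguration time.
    """
    for addr, frame in update.items():
        memory[addr] = frame
    return len(update)


# Hypothetical 8-frame device: task A occupies frames 0-3, task B frames 4-7.
config_memory = ["A0", "A1", "A2", "A3", "B0", "B1", "B2", "B3"]

# Replace task B with a smaller task C that needs only two frames.
frames_written = partial_reconfigure(config_memory, {4: "C0", 5: "C1"})
full_rewrite_cost = len(config_memory)   # what a single-context device would pay
```

Task A's frames are never rewritten, so only two of the eight frames are transferred in this example.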
PROGRAMMING RECONFIGURABLE SYSTEMS
• Application programmers may ignore reconfigurable systems unless they can easily incorporate them into their own systems.
• A software design environment that aids in the creation of configurations for the reconfigurable hardware is required.
PROGRAMMING RECONFIGURABLE SYSTEMS
• Software design environment
– Manual
• Powerful method for the creation of high-quality circuit designs.
• Requires a great deal of background knowledge of the particular reconfigurable system employed.
• Significant amount of design time.
PROGRAMMING RECONFIGURABLE SYSTEMS
• Software design environment
– Fully automatic
• Quick and easy.
• Makes the use of reconfigurable hardware more accessible to general application programmers.
• Quality may suffer.
Compiling C for spatial computing Why C?
• There are many more C programmers than hardware designers.
• Writing an algorithm in C is typically faster than in an HDL.
• Large existing code base.
• Allows both hardware (HW) and software (SW) versions to be created
– The operating system can choose at runtime which is better
Compiling C for spatial computing Why C?
• Easy for the designer or compiler to quickly explore the tradeoffs between different hardware/software partitionings.
• The code can be easily tested on a conventional microprocessor.
Compiling C for spatial computing How C runs on spatial hardware (overview)
• In a C program, the statements execute in order.
• With spatial computation, each operation is implemented as its own functional unit
Compiling C for spatial computing How C runs on spatial hardware (overview)
Memory loads and stores
• Memory access operations must be scheduled
– To allow sharing among memory operations.
– To preserve sequential C semantics.
Compiling C for spatial computing How C runs on spatial hardware (overview)
If-then-else using multiplexers
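A minimal sketch of the idea, modelled in Python rather than hardware: both branches are computed unconditionally and a multiplexer picks one result. The function names are hypothetical, and the approach as written only applies when neither branch has side effects.

```python
def mux(select, if_true, if_false):
    """2-to-1 multiplexer: forwards one of two already-computed values."""
    return if_true if select else if_false


def abs_diff_sequential(a, b):
    # Software view: only one branch executes.
    if a > b:
        return a - b
    else:
        return b - a


def abs_diff_spatial(a, b):
    # Spatial view: each line stands for its own functional unit.
    # The comparator and both subtractors all evaluate, in parallel in hardware.
    cond = a > b
    then_val = a - b
    else_val = b - a
    return mux(cond, then_val, else_val)
```

Both versions return the same value for every input; the spatial form trades extra hardware (two subtractors) for the removal of the branch.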
Compiling C for spatial computing How C runs on spatial hardware (overview)
More than just simple if-then-else control flow
– Use sub-circuits
Compiling C for spatial computing How C runs on spatial hardware (overview)
Optimizing the common path
Compiling C for spatial computing How C runs on spatial hardware (overview)
• What about
– Parallelism?
– Pipelining?
– Memory dependencies?
– Operator size?
Overall compiler flow
Compiling C for spatial computing Automatic Compilation
Figure: C source code → Control Flow Graph → Hyperblocks → Data Flow Graph → Circuit Generation
Compiling C for spatial computing Automatic Compilation
Control Flow Graph (CFG)
• Breaks the code into basic blocks of simple instructions.
• Blocks are connected by control edges indicating a possible branch.
• All instructions inside a given block execute once the block is entered.
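Basic-block construction can be sketched with the classic "leader" rule: a block starts at the first instruction, at every branch target, and right after every branch. The toy instruction format `(kind, target)` below is hypothetical and stands in for whatever intermediate form a real compiler uses.

```python
def basic_blocks(instrs):
    """Split a list of (kind, target) instructions into basic blocks.

    kind is "op", "branch" (conditional) or "jump"; target is an
    instruction index for branches and jumps, otherwise None.
    Returns a list of blocks, each a list of instruction indices.
    """
    leaders = {0}
    for i, (kind, target) in enumerate(instrs):
        if kind in ("branch", "jump"):
            leaders.add(target)            # a branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)         # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(instrs)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


program = [
    ("op", None),      # 0: x = a + b
    ("branch", 4),     # 1: if x < 0 goto 4
    ("op", None),      # 2: y = x
    ("jump", 5),       # 3: goto 5
    ("op", None),      # 4: y = -x
    ("op", None),      # 5: return y
]
blocks = basic_blocks(program)
```

For this program the result is four blocks, `[0, 1]`, `[2, 3]`, `[4]` and `[5]`, which shows how small basic blocks tend to be.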
Compiling C for spatial computing Automatic Compilation
Hyperblocks
• CFG basic blocks are quite small and limit our opportunities for parallelism.
• The compiler combines blocks along commonly taken paths into hyperblocks.
• Hyperblocks have a single entry point at the top and one or more exits.
Compiling C for spatial computing Automatic Compilation
Data Flow Graph (DFG)
• The DFG is composed of nodes and edges.
• Nodes
– Inputs, constants, operations, memory access and exit nodes
• Edges
– Data transfer edges, ordering edges, exit edges
Compiling C for spatial computing Automatic Compilation
DFG optimizations
• Strength reduction
– Replaces an operator with one or more other operators that have less overall latency or area.
• Replace x*2 with x+x or x<<1.
• x*7 can be expressed as (x<<2)+(x<<1)+x, but even better as (x<<3)-x.
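The equivalence of the three forms of x*7 can be checked directly. Python integers are unbounded, so the sketch masks results to 32 bits to mimic a 32-bit datapath; the mask and function names are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit datapath with wraparound


def times7_mul(x):
    return (x * 7) & MASK32                       # one multiplier


def times7_three_terms(x):
    return ((x << 2) + (x << 1) + x) & MASK32     # 4x + 2x + x: two adders


def times7_two_terms(x):
    return ((x << 3) - x) & MASK32                # 8x - x: one subtractor


samples = [0, 1, 5, 12345, 0x7FFFFFFF, 0xFFFFFFFF]
```

In hardware a shift by a constant is only wiring, so the last form costs a single subtractor instead of a multiplier.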
Compiling C for spatial computing Automatic Compilation
DFG optimizations
• Boolean value identification
– ISO C90 does not contain a Boolean data type (C99 later added _Bool).
– Although the result of a comparison is defined to be either 0 or 1, the type of the result is a signed integer (int), typically 32 bits.
– The compiler can identify such values and use only one bit for them.
Compiling C for spatial computing Automatic Compilation
DFG optimizations
• Type-based operator size reduction
– ISO C semantics dictate that arithmetic and logical operations involving type char and/or short operands must be performed at the precision of type int.
– However, when the result is stored back into a 16-bit variable, a 16-bit adder will give the same result as a 32-bit adder.
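Why the narrower adder is safe can be shown with a small model, assuming the sum of two 16-bit operands is stored back into a 16-bit variable with two's-complement wraparound. The helper names are hypothetical.

```python
def add_n_bits(a, b, n):
    """Model an n-bit adder: operands and result are truncated to n bits."""
    mask = (1 << n) - 1
    return ((a & mask) + (b & mask)) & mask


def add_promoted_then_truncated(a, b):
    # C-style: promote to 32 bits, add, then keep the low 16 bits on assignment.
    return add_n_bits(a, b, 32) & 0xFFFF


def add_narrow(a, b):
    # What the optimized circuit does: a 16-bit adder from the start.
    return add_n_bits(a, b, 16)


pairs = [(0, 0), (1, 2), (0xFFFF, 1), (0x8000, 0x8000), (-1, 1), (12345, 54321)]
```

The low 16 bits of a sum depend only on the low 16 bits of the operands, so the upper half of the 32-bit adder contributes nothing to the stored result.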
Compiling C for spatial computing Automatic Compilation
DFG optimizations
• Type-based operator size reduction
– Analyze number of bits actually required by variables and operators.
• Example: integer i within the loop below only takes values 0 to 100, so 7 bits suffice
for (i = 0; i < 100; i++)
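The width needed by the loop counter follows from its value range. The sketch assumes the counter is unsigned and counts from 0; note that i reaches 100 once, when the exit test fails. The function name is hypothetical.

```python
def bits_needed(max_value):
    """Minimum unsigned width that can hold every value in 0..max_value."""
    return max(1, max_value.bit_length())


# In `for (i = 0; i < 100; i++)`, i takes the values 0..100 inclusive.
counter_bits = bits_needed(100)
```

Here `counter_bits` is 7, so the counter, its incrementer and its comparator can each be 7 bits wide instead of 32.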
Compiling C for spatial computing Automatic Compilation
DFG to Reconfigurable Fabric
• Mapping DFG nodes to modules.
• Scheduling each module to a specific timestep.
• Finally, connections are made between modules from different hyperblock sub-circuits to complete the overall circuit.
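One simple way to assign timesteps is as-soon-as-possible (ASAP) scheduling. The slides do not name a scheduling algorithm, so this is a swapped-in illustration; it assumes every module takes one timestep and that the DFG is acyclic. The graph encoding and node names are hypothetical.

```python
def asap_schedule(deps):
    """Assign each DFG node the earliest timestep after all of its inputs.

    deps maps each node name to the list of nodes it reads from.
    Returns a dict node -> timestep, with source nodes at timestep 0.
    """
    step = {}

    def visit(node):
        if node not in step:
            # One step after the latest producer; sources have no producers.
            step[node] = 1 + max((visit(d) for d in deps[node]), default=-1)
        return step[node]

    for node in deps:
        visit(node)
    return step


# DFG for y = (a + b) * (c - d)
dfg = {
    "a": [], "b": [], "c": [], "d": [],
    "add": ["a", "b"],
    "sub": ["c", "d"],
    "mul": ["add", "sub"],
}
schedule = asap_schedule(dfg)
```

The adder and subtractor land in the same timestep, which is the parallelism a spatial implementation exploits; the multiplier follows one step later.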
HW/SW Partitioning
• For systems that include both reconfigurable hardware and a traditional microprocessor.
• The program must first be partitioned into
– Sections to be executed on the reconfigurable hardware
• e.g. fixed datapath operations
– Sections to be executed in software on the microprocessor
• e.g. complex control sequences such as variable-length loops
HW/SW Partitioning
• Partitioning
– Manually
• The resulting program ends up tuned to a specific machine
• Alternative solution is to use compiler directives
The NAPA C language [Gokhale and Stone 1998] provides pragma statements to allow a programmer to specify whether a section of code is to be executed in software on the Fixed Instruction Processor (FIP), or in hardware on the Adaptive Logic Processor (ALP).
HW/SW Partitioning
• Partitioning
– Automatically
• The compiler and runtime system take full responsibility for determining the right code and granularity to move to the reconfigurable fabric.
• The reconfigurable hardware is transparent to the designer.
• Cost functions based upon the acceleration gained
• determine whether the cost of configuration is outweighed by the benefits of hardware execution.
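A cost function of this kind can be sketched as a break-even test. The model below is an assumption, not taken from any specific compiler: one configuration load, then a fixed time per call in hardware versus in software. All names and numbers are hypothetical.

```python
def worth_moving_to_hw(t_sw, t_hw, t_config, n_calls):
    """True if hardware execution, including configuration, beats software."""
    return t_config + n_calls * t_hw < n_calls * t_sw


def break_even_calls(t_sw, t_hw, t_config):
    """Smallest call count at which hardware wins, or None if it never does."""
    gain = t_sw - t_hw            # time saved per call
    if gain <= 0:
        return None
    # Hardware wins once n_calls * gain strictly exceeds t_config.
    return int(t_config // gain) + 1


# Hypothetical kernel: 10 us per call in software, 1 us in hardware,
# 100 us to load the configuration.
threshold = break_even_calls(t_sw=10, t_hw=1, t_config=100)
```

With these numbers the kernel must run at least 12 times before the configuration cost is recovered; below that it is cheaper to stay in software.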
BEE2: A High-End Reconfigurable Computing System
• BEE: Berkeley Emulation Engine
– BEE2 can provide over 10 times more computing throughput than a DSP-based system with similar power consumption and cost.
– It can provide over 100 times the throughput of a microprocessor-based system.
BEE2: A High-End Reconfigurable Computing System
• BEE: Berkeley Emulation Engine
– Applications
• Emulation and design of novel wireless communications systems.
• High-performance real-time digital signal processing.
• Real-time scientific computation and simulation.
• The acceleration of CAD tools.
BEE2: A High-End Reconfigurable Computing System
• BEE: Berkeley Emulation Engine
– BEE2 system uses Xilinx Virtex-2 Pro FPGAs
– Virtex-2 Pro embeds PowerPC 405 processor cores into the reconfigurable fabric.
– BEE2 has no hardware-managed caches, hence all data transfers within the system have tightly bounded latency.
• BEE2 is therefore well suited for real-time applications
BEE2: A High-End Reconfigurable Computing System
• BEE: Berkeley Emulation Engine
– Programming environment
• High-level block diagram design environment based on Mathworks Simulink and the Xilinx System Generator library.
• Uses automatic compilation tools
BEE2: A High-End Reconfigurable Computing System
• Compute modules:
– Each compute module consists of five Xilinx Virtex 2 Pro 70 FPGA chips (one control FPGA and four compute FPGAs), each directly connected to four Dual Data Rate 2 (DDR2) 240-pin DRAM DIMMs, with a maximum capacity of 4 Gbytes per FPGA.
– The local mesh connects the four compute FPGAs on a 2D grid.
BEE2: A High-End Reconfigurable Computing System
• Compute modules:
– Each link between adjacent FPGAs on the grid provides over 40 Gbps of data throughput.
– The four down links from the control FPGA to each of the computing FPGAs provide up to 20 Gbps per link
REFERENCES
• Scott Hauck and André DeHon, “Reconfigurable Computing: The Theory and Practice of FPGA-Based Computing”.
• Katherine Compton, “Reconfigurable Computing: A Survey of Systems and Software”, Northwestern University.
• Chen Chang, John Wawrzynek, and Robert W. Brodersen, “Berkeley BEE2: A High-End Reconfigurable Computing System”, University of California.
Thank You