A Reconfigurable Computing System Based on a Cache-Coherent Fabric
Presenter: Neal Oliver Intel Corporation
Authors- Neal Oliver, Rahul R Sharma, Stephen Chang, Bhushan Chitlur, Elkin Garcia, Joseph Grecco, Aaron Grier, Nelson Ijih, Yaping Liu, Pratik Marolia, Henry Mitchel, Suchit Subhaschandra, Arthur Sheiman, Tim Whisonant, and Prabhat Gupta
June 10, 2012
Presented at CARL 2012
2
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copyright © 2012, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others
3
Topics
• Motivations for this work
• Usage Models
• Overview of Intel QuickPath Interconnect (QPI)
• Platform Architecture
• Hardware Architecture
• Programming Model
• Simulation
• Future Work
4
Motivation for this work
• Keep innovation on Intel platforms
• High-throughput, low-latency attachment to server
• Cache-coherent memory
• Enlarge memory space for FPGA platform
• Enable additional programming paradigms
• Drive requirements for future fabrics
• Platform for simulation/emulation
5
Usage Models
Intel QuickAssist Accelerator
Platform (QAP)
Accelerators •Algorithmic accelerators •e.g. Seismic imaging, genomics, computational finance
ASIC Prototyping •Significantly reduces risk for ASICs connecting to QPI •e.g. QPI attached ASICs for telecom, node controllers
CPU Bridge
FPGA QPI
FPGA Farm
OR
Big Box Emulator
CPU Accel
FPGA QPI
QPI
QPI
RTL
ASIC
Logic
ASIC
HW/SW Co-Emulation Platforms • Enables development of high performance emulation platforms • Pre-Si SW development
6
Intel QuickPath Interconnect (QPI)
QPI: A low latency, point-to-point coherent system interconnect.
QPI Caching Agent (CA) – cache devices
QPI Home Agent (HA) – DDR memory controller
Latest Intel platforms have IO integrated into CPU (not shown)
Four Socket Platform
P0 M P1
CS
I/O I/O
QPI Links
Two Socket Platform
M Memory
P Processor
CS Chipset
QPI Links
P0 M
P2 M
P1 M
P3 M
CS
I/O I/O
CS
I/O I/O
QPI Links
QPI Links
7
QPI-attached Accelerator Hardware Module (AHM)
AHM installed in Server
Substitution of AHM for CPU
QPI Links
P M
P M
M
P M
CS
I/O I/O
CS
I/O I/O
QPI Links
AHM
Altera Stratix IV Module
Xilinx Virtex 6 Module
Intel® Xeon processor 7000 series
Accelerator Hardware Module (AHM)
FPGA
PCIe attached FPGA
8
Platform Architecture
Link and Protocol
System Protocol Layer
Accelerator Function Unit(s)
(AFUs)
PHY
CPU FPGA
Intel QPI
SPL Interface
CCI - Core Cache Interface
AAL Interface
Direct Driver Interface
Accelerator Abstraction Layer
Link and Protocol
PHY
SW
ele
ments
Host Application
9
FPGA
QPI HW stack
•FPGA high speed IO operating at 4.8 to 6.4 GT/sec •Full width : 20 data lanes per direction
QPI Link and Protocol (QLP)
System Protocol Layer
(virtual memory subsystem)
Accelerator Function Units
(AFU)
QPI PHY (QPH)
•QPI Caching (cache) and Home Agent (memory controller) implemented •64KB to 256KB set-associative cache
•Manages coherency of FPGA attached DDR memory
•Implements virtual to physical address translation •Handles all cache line reordering in blocks of data •DMA engine •Memory management and protection
•Application-specific hardware module, user algorithm to be accelerated. •Can connect to SPL or CCI interface
10
QPI Home+Cache Agent Architecture
•Modular: CA, HA, CA+HA
• Interfaces: CCI- hides complexity of underlying coherent fabric. MC (optional) – provides interface to memory controller.
•QPH electrical uses existing FPGA IOs
•QLP designed using soft logic, tightly integrated, highly parallel.
•Programmable cache/snoop tables
SPL
QPI Link + Protocol
Control
QPI PHY
(QPH)Rx
AlignTx
Align
8 flits (640 bits)
Rx Control
Tx Control
8 flits (640 bits)
Core Cache
Module
Cache
Data
RAM
CCI
Cache tagCache
table
To AFU
QLP
phits
Memory
Queue
Snoop tag
Snoop
table
Memory Controller
DDR
To
DIMM
QPI link to CPU
MC interface
11
Cache Hit/Miss protocol flow
• Cache hits are an order of magnitude faster.
• Prefetch
striding patterns to improve cache hit rate
12
Programming Model
AAL functions: • Allocate shared workspaces WSi and pin in system memory
• Support both physical and virtual memory access
• Allocate and manage AFUs for application (via “proxy AFU” (PAFUi) abstraction)
• Provide remote procedure call (RFP) abstraction of AFU to application
QPI Physical (QPH) + Link / Protocol layer
System Protocol Layer ( SPL )
AFU 0 AFU 1 AFU n … .
QuickAssist FPGA
Hardware Module
System
Memory
QPI
WS 0 WS 1 WS n … .
CPU (
SOFTWARE )
Accelerator Abstraction Layer ( AAL )
Kernel Space
User
Space
Application User Library
QPI
PAFU 0 PAFU 1 PAFU n … .
Accelerator Interface Adaptor ( AIA )
( QLP )
Proxy AFU: • Provide abstraction of AFU to application
• Enables staged development of AFU algorithms
13
AFU Simulation Environment (ASE)
1 – Intel and Xeon are registered trademarks of Intel Corporation. Other trademarks are the property of their respective owners.
•AFU HW-SW co-design environment. •Models platform behavior. •Design and validate the SW against the AFU RTL in the simulator environment. •Faster simulation. •Ease of debug. System
Memory
AFU Simulator
(Transaction Level Model)
SW APP AFU RTL
Driver/MonitorAccelerator Interface
Adapter (AIA)
SPL or
CCI
Modelsim1/VCS
1 runtime
14
Future Work
• Research and pathfinding on other accelerator architectures
• Programming language/compiler development
• Benchmarking and performance characterization