design of a high- density soc fpga at 20nm - hot chips · design of a high- density soc fpga at...

24
Design of a High-Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, Mike Hutton Altera, San Jose

Upload: vodieu

Post on 30-Apr-2018

224 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Design of a High-Density SoC FPGA at 20nm

Brad Vest, Sean Atsatt, Mike Hutton Altera, San Jose

Page 2: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Arria 10 Device Outline

Device Goals and Overview Routing and Logic Architecture Transceiver and I/O Architecture DSP Block and Floating Point Hard Processor System (HPS) Power Summary

2

Page 3: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Device Goals

Mid-Range FPGA: balance of performance/power/cost targeting Key Market Applications

Key Targets and Metrics: − 491 MHz fixed-point DSP datapath for Wireless RRU − 1M+LEs at 350 MHz for 4xOTU4 (400G) OTN networks, with Partial Reconfig − Cloud Server Acceleration – Hardened Floating-Point − 28G transceivers to support 200G to 400G networking/routing − Dramatic die-size reduction

3

Page 4: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Overview and Floorplan

4

TSMC 20SOC Process − 5.3B Tx, 11LM

Resources − 1.15M LEs, 1.7M FFs − 64Mb embedded SRAM − 32 fPLL, 16 PLLs, 32 GCLK − 1.5 TFlops IEEE754 DSP − Dual-Core ARM A9 − Row-based redundancy

I/O − 28G SERDES, >1.7Tb b/w − x72 2.667Gbps DDR4 w/

Hard memory Controller − Hardened PCIe/ILKN/10GE

Page 5: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Design Challenges

Process − More restrictive design rules – double patterning for lowest metal layers

Took significant advantage of structured custom layout to scale − Increasing variation

Significantly more statistical analysis used on critical analog, memory, and sensing circuits to ensure robustness and manufacturability

Digitally assisted analog design − Metal parasitics not scaling with transistors

Strategic metal planning to ensure critical signals get the lowest RC paths through the power mesh

Clock latency and insertion − Increased clock network flexibility to provide SW P&R more options for critical

transfers though the FPGA fabric IP Integration

− Moved to a more modular tile based floorplan including embedded IOs Provides significant area reduction, but requires more upfront planning of interfaces,

metal grid, and feed-thru Extended row based redundancy to work with embedded IOs

5

Page 6: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

ALM registers and dedicated logic

6

Many high-performance designs have FF:LUT > 1 Providing 4 FFs and 8 inputs to an ALM complex allows

for more efficient packing ALM retains most Stratix V features:

− Ternary add, shared LUT-mask, 20b carry-skip

Page 7: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Architectural improvements on Arria 10 Core

7

Column based CRAM CRC on top of Row Based Up to 100x Faster Error Detection and Correction

Tri-state long-lines (V27, H32) Maintain long wire performance despite poor metal process scaling

Enhanced range Time-Borrowing FF for micro-retiming

Page 8: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Arria 10 Transceiver Overview

Support for a wide range of protocols, data rates, and applications

Integrated protocol hard

IP blocks

Low jitter programmable clock sources

Flexible clock distribution networks

Gearbox for configurable

interface widths Advanced adaptive

equalization

Hard IP for: PCIe Ethernet 10GBASE-KR Interlaken More

Page 9: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Arria 10 Transceiver Overview

Wide Range of Data Rates − 611 Mbps – 28.1 Gbps (Native) − Down to 125 Mbps (Oversampling)

High Transceiver Density

Notable improvements − 5 tap Transmitter pre-emphasis − Adaptive CTLE (Continuous Time Linear Equalizer)

High Gain & High Data Rate Modes − Adaptive DFE (Decision Feedback Equalizer)

7 tap fixed, 4 tap floating − Hard Forward Error Correction (FEC) − Total Equalization capability > 30db

9

Feature Arria 10

Transceiver Count Up to 96

Max Data Rate (Select Ch) 28.1 Gbps

Max 28G Channels Up to 16

Max Data Rate (All Ch) 17.4 Gbps

Max Backplane Data Rate 17.4 Gbps

Page 10: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Hardened Calibration – Digitally Assisted Analog

2 Hard micro controllers (uC) handle all transceiver calibration on the device

Key Advantages − Enable calibration before FPGA core is

programmed Critical for Configuration via Protocol (CVP)

− Earlier generations required customers to instantiate increasing number of soft IPs Converges prior generation soft controllers and hard

state machines into one highly flexible system

− Ability to access firmware from application layers − Enables advance calibration techniques and on-

die instrumentation capabilities

10

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

XCVR x6

uC uC

Page 11: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

PMA Tx Jitter Compensation

Replicates clock pattern to act as noise compensation − Avoid hitting PDN resonance (<100MHz) − Add switching current using duplicate pre-driver path during every non-transition bit

to eliminate mid frequency noise

Reduces PDN induced jitter by 80% − Achieves same result as adding capacitance that would increase XCVR area by 50% − Average power increases slightly, maximum power does not

11

Compensating Driver

Serializer

VCC VDDIO

CLK

High Z PDN

10 0 10 1 10 2 10 3 10 4 10 -8

10 -7

10 -6

10 -5

10 -4

10 -3

10 -2

Freq [MHz]

Cur

rent

[A]

CP off CP on

Page 12: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

GPIO and EMIF

Hardened memory controller (HMC) for DDR − Programmably ganged to multiple memory interfaces

IOAUX per column – managed by hard-uC

12

Page 13: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Hardened Floating Point DSP

Hardened IEEE 754 Floating Point adder & Multiplier − 12% DSP Area increase (<<1% die area)

100% Fixed Point backwards compatible − No performance or power penalty

‘Have your cake and eat it too’ How is this possible?

− Overlaid FP algorithms on Fixed point circuits

13

Major Innovation – Hard Floating Point on a Commercial FPGA

X

+

32 32

32

32

Page 14: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

DSP Block – 1000s of blocks at very low latency

14

1.5 TFLOPS of aggregate computation; 50 GFLOPS/W − 1678 blocks @ 2 FLOPS/clock @ 450 MHz = 1.520 GFLOPs − Can run individually or as large integrated DSP system

Hardware recursive structure support (Vector Mode) − 10s/100s of DSP blocks can be seamlessly integrated − Internal/External pipeling of individual DSP elements

Very small latency − Floating Point used for iterative algorithms – require small latency − Arria 10 Floating Point - 256 length dot products ~ 25 clocks − Standard FPGA Technology - 256 length systolic FIR filter ~750 clocks

X

+

A B

AB+CD

X

+

C D AB+CD

X

+

E F

EF+GH

EF+GH

X

+

G H

X

+

I J IJ+KL+ MN+OP

AB+CD+EF+GH IJ+KL

AB+CD+ EF+GH

AB+CD+EF+GH+ IJ+KL+MN+OP

Page 15: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

15

Arria 10 HPS: Faster, Secure and SW Compatible

Arria 10 SoCs feature a Dual Core ARM Processor for − Communications processing, acceleration, host

offload, deeply embedded processing, and FPGA management

Faster − Up to 1.5 GHz per core, total 7500 MIPS

More Secure − Secure Boot with EC DSA Authentication − Root of Trust Support (Certification Authority) − Hierarchal Public Key Infrastructure

Software Compatible − Extensive reuse of software, OS/BSP, tools reuse

with 28nm SoC

Page 16: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Anti-tamper, Secure Boot, POF Encryption (AES) and Authentication (SHA) and Root of Trust Support − Secure boot will only execute code that is

provable from a known source and is unmodified

54 Programmable general-purpose I/O 7x general-purpose timers 4x watchdog timers

Arria 10 HPS Peripheral Feature Summary

16

NEW

NEW

Hard Memory Controller (HMC) − DDR4/3, LPDDR2/3

− Up to 72-bit DDR4

− Multiport Front End (MPFE) Scheduler interface to HMC sharable with core logic

QSPI flash controller with SIO, DIO, QIO SPI Flash support

NAND flash controller (ONFI 1.0 or later) with DMA and ECC support − Added 16-bit Flash device support for higher

throughput

SD/SDIO/MMC controller 4.5 with DMA with CE-ATA digital command support − Updated to eMMC for additional flexibility

256MB of Scratch RAM

EXTERNAL MEMORY CONTROLLERS

COMMUNICATION CONTROLLERS

UPDATED

UPDATED

3x 10/100/1000 Ethernet media access control (MAC) with DMA − Enables simultaneous ingress, egress and control

2x USB On-The-Go (OTG) controller with DMA 2x UART 16550 controller 5x I2C controller

− 3 can be used by EMAC for MIO to external PHY

4x serial peripheral interface (SPI) − 2 Master, 2 Slaves

SECURITY & PERIPHERALS

+DDR4

Larger

Page 17: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

HPS + FPGA enables Hardware & Software Co-processing

High throughput bridge to FPGA − Can access header/packet data an order of magnitude faster than the typical PCIe latency

Non blocking low latency bridge to FPGA − Simple accesses to the fabric

Shared FPGA/HPS bridge with smart scheduler to DDR interface

17

RD & WR Ports FPGA LOGIC

Security

FPGA SECURITY(AES)

Power

SECURE BOOT

(Authentication)

DEVICE POWER MANAGEMENT

Config

FPGA CONFIGMANAGEMENT

Memory Controllers

FPGAHARD MEMORY CONTROLLERS

HPS HARD MEMORY CONTROLLER

Hard Processor System (HPS)

Low Latency Bridge High Throughput Bridges

LW HPS TO CORE BRIDGE HPS to FPGA BRIDGE FPGA to HPSBRIDGE

HPS DEDICATED

GPIO (17)

Flash ControllersQSPI

Peripheral Controllers

Processor PeripheralsScratch RAM

(256 KB)

Shared Memory Bridge

FPGA to HPS SDRAM BRIDGE

HPS Dedicated IO (Boot)

ON CHIP ROM

GP TIMERS (x7)

HPS I/O

MU

X

FPGA CONFIG MANAGER

8 Channel DMA

Watch Dog Timer (x4)

HPS Memory Scheduler

SCHEDULERHPS SDRAM INTERCONNECT

NAND SD/ eMMC

16550 UART (x2)I2C(x5)

SPI (x2)

USB 2.0 OTG (x2)

10/100/1000 ETHERNET MAC with DMA (x3)

AM

BA

INTER

CO

NN

ECT

AM

BA

INTER

CO

NN

ECT

AcceleratorCoherency Port

512 KB L2 CACHE (SHARED)w/ ECC

Snoop Control Unit

ARM CoreSight ™ Multicore Debug/ Trace

AMBA INTERCONNECT

CO

RE 2

32 KB I-CACHEw/ Parity

ARM CORTEX™ – A9 MP

CO

RE 1

NEON SIMD FPU32 KB D-CACHE

w/ Parity

AXI 32 AXI 32/64/128 AXI 32/64/128

Page 18: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Prevention • AES-256 Bitstream Encryption

• Key masked prior to storing

• DPA Resistance

• JTAG readback not allowed

• JTAG disable

• Factory Test-mode disable

• Tamper-Protection mode

• On-chip Oscillator

Detection • JTAG monitoring • Built-in SEU detection

• On-chip Temperature sensor • On-chip Oscillator

• Unique Chip ID • Secure Boot (Code Authentication)

• VBAT Under-voltage detection

Response • JTAG disable

• Built-in SEU correction

• Chip-core zeroize

• Volatile Key zeroize

• And more!

Arria 10 Device Security Feature Types

Page 19: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Arria 10 SoC Secure Key Storage

Every Certification results in 3 keys − A private key, a public key and a code signing

key

Private Key Storage

− This key remains on the server or laptop and is invisible

Public Key Storage − The hash value of the public key is stored in the

eFUSE memory of the Arria 10 device

− In manufacturing the eFUSE is blown by the Quartus II programmer so all devices in the field have to run authenticated software

Code signing Key − Stored in the Flash from where the processor

boots

19

Private Key remains on a secure server or laptop

Arria 10 SoC

eFUSE

Public Key is stored in device eFuse memory

Code signing Key is Boot Flash

CSK*

Boot 1*

OS/App*

POF*

Page 20: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

20

Arria 10 Power Saving Features

Vcc PowerManager

Programmable Power

Technology

Enables device to run at lower than nominal Vcc while retaining same performance level reducing static and dynamic power

Enables lower power transistors for non- performance critical paths to reduce static power

Lower operating Vcc to trade off performance to achieve lower total power

SmartVoltage ID

Page 21: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Stat

ic P

ower

Dis

trib

utio

n Static Power Savings

Standard Static Power Limit

SmartVoltage ID Static Power

Limit

SmartVoltage ID Power Reduction

Allows FPGA to be operated at lower core Vcc while retaining same performance

Reduces worst case static power

Reduces average dynamic power consumption across distribution of devices − Lower OpEx

Requires power system controller that can support tuned voltage

21

Reduce Static Power by Up to 35%

Arria 10 FPGA

Host Power System Controller

Option 2Option 1

Tuned Vcc

Voltage

Vcc

SmartVoltage ID

Page 22: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Programmable Power Technology

22

Accelerate speed-critical paths while reducing power on non-speed

critical paths

Quartus II optimizes your design automatically, enabling high-speed

logic only where needed

Get performance where you need it, and reduced power

everywhere else

Static Power

Threshold Voltage

Reduces Core Static Power by Up to 20%

Page 23: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Summary

Arria 10 engineering focus was maintaining the right balance for a mid range device − Development cost, performance, power, unit cost − Focus on design efficiency through new methodologies

Arria 10 supports key hard features attractive to the target markets − HMC, processor, security

Hardened Floating point DSP feature is opening new

markets for FPGAs

23

Page 24: Design of a High- Density SoC FPGA at 20nm - Hot Chips · Design of a High- Density SoC FPGA at 20nm Brad Vest, Sean Atsatt, ... protocol hard IP blocks ... AXI 32 AXI 32/64/128 AXI

Thank You Thank You