The GPU Advantage: Tianhe-1A at NSC Tianjin, the World’s Fastest Supercomputer (2.507 Petaflops, 7168 Tesla M2050 GPUs)


Post on 14-Mar-2020


Page 1
Page 2

GPU Computing

1. The GPU Advantage
2. To ExaScale and Beyond
3. The GPU is the Computer

Page 3

The GPU Advantage

Page 4

A Tale of Two Machines

The GPU Advantage

Page 5

Tianhe-1A at NSC Tianjin

Page 6

Tianhe-1A at NSC Tianjin

The World’s Fastest Supercomputer

2.507 Petaflops

7168 Tesla M2050 GPUs

Page 7

Tesla M2050 GPUs

Page 8

3 of the Top 5 Supercomputers

[Chart: performance (Gigaflops, 0 to 2500) of Tianhe-1A, Jaguar, Nebulae, Tsubame, and Hopper II]

Page 9

Top 5 Performance and Power

[Chart: performance (Gigaflops, 0 to 2500) and power (Megawatts, 0 to 8) of Tianhe-1A, Jaguar, Nebulae, Tsubame, and Hopper II]

Page 10

NVIDIA/NCSA Green 500 Entry

Page 11

NVIDIA/NCSA Green 500 Entry

Page 12

NVIDIA/NCSA Green 500 Entry

128 nodes, each with:
1x Core i3 530 (2 cores, 2.93 GHz => 23.4 GFLOPS DP peak)
1x Tesla C2050 (14 cores (SMs), 1.15 GHz => 515.2 GFLOPS DP peak)
4x QDR InfiniBand
4 GB DRAM

Theoretical peak performance: 68.95 TF

Footprint: ~20 ft^2 => 3.45 TF/ft^2

Cost: $500K (street price) => 137.9 MF/$

Linpack: 33.62 TF, 36.0 kW => 934 MF/W

Page 13

Efficiency and Programmability

The GPU Advantage

Page 14

GPU: 200 pJ/Instruction

CPU: 2 nJ/Instruction

Page 15

GPU: 200 pJ/Instruction
Optimized for Throughput
Explicit Management of On-chip Memory

CPU: 2 nJ/Instruction
Optimized for Latency
Caches

Page 16

CUDA GPU Roadmap

[Chart: DP GFLOPS per Watt (0 to 16) versus year (2007 to 2013) for the Tesla, Fermi, Kepler, and Maxwell GPU generations]

Page 17

Efficiency and Programmability

The GPU Advantage

Page 18

CUDA Enables Programmability

The GPU Advantage

Page 19

CUDA C: C with a Few Keywords

Standard C Code

void saxpy_serial(int n, float a, float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a*x[i] + y[i];
}

// Invoke serial SAXPY function
saxpy_serial(n, 2.0f, x, y);

CUDA C Code

__global__ void saxpy_parallel(int n, float a, float *x, float *y)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;  // global index of this thread
    if (i < n) y[i] = a*x[i] + y[i];               // guard: the last block may be partial
}

// Invoke parallel SAXPY kernel with 256 threads/block
int nblocks = (n + 255) / 256;  // round up so every element gets a thread
saxpy_parallel<<<nblocks, 256>>>(n, 2.0f, x, y);

Page 20

GPU Computing Ecosystem

Languages & APIs (including DirectX and Fortran)
Tools & Partners
Integrated Development Environment: Parallel Nsight for MS Visual Studio
Mathematical Packages
Libraries
Consultants, Training & Certification
Research & Education
All Major Platforms

Page 21

GPU Computing Today, By the Numbers:

CUDA-capable GPUs: 200 million
CUDA Toolkit downloads: 600,000
Active GPU computing developers: 100,000
Members in Parallel Nsight Developer Program: 8,000
Universities teaching CUDA worldwide: 362
CUDA Centers of Excellence worldwide: 11

Page 22

To ExaScale and Beyond

Page 23

Science Needs 1000x More Computing

[Chart: computing in Gigaflops (log scale, 1 to 1,000,000,000) versus year (1982 to 2012) for molecular simulations of increasing size: BPTI (3K atoms), Estrogen Receptor (36K atoms), F1-ATPase (327K atoms), Ribosome (2.7M atoms), Chromatophore (50M atoms), and Bacteria (billions of atoms); the 1 Petaflop and 1 Exaflop levels are marked]

Ran for 8 months to simulate 2 nanoseconds

Page 24

DARPA Study Identifies Four Challenges for ExaScale Computing

Report published September 28, 2008:

Four Major Challenges
Energy and Power challenge
Memory and Storage challenge
Concurrency and Locality challenge
Resiliency challenge

Number one issue is power
Extrapolations of current architectures and technology indicate over 100 MW for an Exaflop!
Power also constrains what we can put on a chip

Available at www.darpa.mil/ipto/personnel/docs/ExaScale_Study_Initial.pdf

Page 25

Power is THE Problem

Page 26

A GPU is the Solution

Power is THE Problem

Page 27

ExaFLOPS at 20 MW = 50 GFLOPS/W

[Chart: GFLOPS/W (core), log scale 0.1 to 100, for 2010 to 2016, showing GPU, CPU, and the exaflops (EF) target; today's GPU is at 5 GFLOPS/W]

Page 28

10x Energy Gap for Today’s GPU

[Chart: GFLOPS/W (core), log scale 0.1 to 100, for 2010 to 2016, showing GPU, CPU, and EF; today's GPU at 5 GFLOPS/W sits 10x below the 50 GFLOPS/W needed for ExaFLOPS (the "ExaFLOPS Gap")]

Page 29

GPUs Close the Gap with Process and Architecture

[Chart: GFLOPS/W (core), log scale 0.1 to 100, for 2010 to 2016, showing GPU, CPU, and EF]

4x Process Technology
4x Architecture

Page 30

GPUs Close the Gap with Process and Architecture

[Chart: GFLOPS/W, log scale 0.1 to 100, for 2010 to 2016, showing GPU, CPU*, and EF]

Page 31

GPUs Close the Gap
With CPUs, a Gap Remains

[Chart: GFLOPS/W, log scale 0.1 to 100, for 2010 to 2016, showing GPU, CPU*, and EF]

6x Residual CPU Gap

Page 32

GPUs Close the Gap
With CPUs, a Gap Remains

Heterogeneous Computing is Required to get to ExaScale

[Chart: GFLOPS/W, log scale 0.1 to 100, for 2010 to 2016, showing GPU, CPU*, and EF]

Page 33

NVIDIA’s Extreme-Scale Computing Project

Echelon

Page 34

Echelon Team

Page 35

System Sketch

Software: Self-Aware OS, Self-Aware Runtime, Locality-Aware Compiler & Autotuner

Echelon System
Cabinet 0 (C0): 2.6 PF, 205 TB/s, 32 TB
Module 0 (M0) through M15: 160 TF, 12.8 TB/s, 2 TB each
Node 0 (N0) through N7: 20 TF, 1.6 TB/s, 256 GB each
Processor Chip (PC): SM0 through SM127, each with lanes C0 through C7 (each with an L0); NoC; MC; NIC; L2 banks L2_0 through L2_1023
DRAM Cubes and NV RAM
High-Radix Router Module (RM), with links labeled LC0 through LC7 and CN
Dragonfly Interconnect (optical fiber)

Page 36

Execution Model

[Diagram: threads and objects in a global address space, layered over an abstract memory hierarchy; a thread at A reaches object B either with a load/store or by sending an active message to B]

Page 37

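The High Cost of Data Movement
Fetching operands costs more than computing on them

Energy per operation at 28nm, on a 20mm die:
64-bit DP operation: 20 pJ
256-bit access to an 8 kB SRAM: 50 pJ
256-bit on-chip buses: 26 pJ, 256 pJ, and 1 nJ as the distance grows across the die
Efficient off-chip link: 500 pJ
DRAM Rd/Wr: 16 nJ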

Page 38

An NVIDIA ExaScale Machine

Page 39

Lane – 4 DFMAs, 20 GFLOPS

[Diagram: four DFMA units, main registers, two LSIs, operand registers, an L0 I$, and an L0 D$]

Page 40

SM – 8 lanes – 160 GFLOPS

[Diagram: eight lanes (P) connected through a switch to a shared L1$]

Page 41

Chip – 128 SMs – 20.48 TFLOPS + 8 Latency Processors

128 SMs, 160 GF each
1024 SRAM Banks, 256 KB each

[Diagram: SMs and latency processors (LP) connected by a NoC to the SRAM banks, memory controllers (MC), and a network interface (NI)]

Page 42

Node MCM – 20 TF + 256 GB

GPU Chip: 20 TF DP, 256 MB on-chip
DRAM BW: 1.4 TB/s
Network BW: 150 GB/s
DRAM Stacks and NV Memory

Page 43

Cabinet – 128 Nodes – 2.56 PF – 38 kW

32 Modules, 4 Nodes/Module, Central Router Module(s), Dragonfly Interconnect

[Diagram: modules of four nodes each, connected through central router modules]

Page 44

System – to ExaScale and Beyond

Dragonfly Interconnect
400 Cabinets is ~1 EF and ~15 MW

Page 45
Page 46

GPU Computing is the Future

1. GPU Computing is #1 Today: On Top 500 AND Dominant on Green 500
2. GPU Computing Enables ExaScale at Reasonable Power
3. The GPU is the Computer: A general purpose computing engine, not just an accelerator
4. The Real Challenge is Software

Page 47
Page 48
Page 49

Power is THE Problem

1. Data Movement Dominates Power
2. Optimize the Storage Hierarchy
3. Tailor Memory to the Application

Page 50

Some Applications Have Hierarchical Re-Use

[Chart: % miss (0 to 120) versus size (1.0E+0 to 1.0E+12) for DGEMM]

Page 51

Applications with Hierarchical Reuse Want a Deep Storage Hierarchy

[Diagram: 16 processors (P), each with a private L1; four L2s, each shared by four L1s; then a shared L3 and an L4]

Page 52

Some Applications Have Plateaus in Their Working Sets

[Chart: % miss (0 to 120) versus size (1.0E+0 to 1.0E+12) for Table]

Page 53

Applications with Plateaus Want a Shallow Storage Hierarchy

[Diagram: 16 processors (P), each with an L1, and four L2s, connected by a NoC]

Page 54

Configurable Memory Can Do Both At the Same Time

Flat hierarchy for large working sets
Deep hierarchy for reuse
“Shared” memory for explicit management
Cache memory for unpredictable sharing

[Diagram: 16 processors (P) with private L1s and four SRAM banks, connected by a NoC]

Page 55

Configurable Memory Reduces Distance and Energy

[Diagram: tiles of processors (P) with L1s interleaved with SRAM banks, linked by routers]