© ARM 2016 Boost SoC performance from edge to cloud ARM ® CoreLink™ System IP Neil Parris, Director interconnect marketing China Tech Symposia Systems and software group, ARM November 2016


Boost SoC performance from edge to cloud ARM® CoreLink™ System IP

Neil Parris, Director interconnect marketing

China Tech Symposia

Systems and software group, ARM

November 2016


3.7 exabytes per month

22x bandwidth increase

More nodes, new use cases

1 ms end to end

~30x access nodes


Intelligent, flexible cloud to enable new use cases

Compute

Acceleration


Heterogeneous compute requires coherency

Flexible heterogeneous architecture

Blend compute and acceleration for target solution

Fast, reliable transport to shared memory

Maximize throughput, minimize latency

Coherency simplifies software

Accelerate SoC development and deployment

IP designed, optimized and validated for systems

[Diagram: Cortex-A CPUs and accelerators connected by a coherent backplane of CoreLink Interconnect and CoreLink Controllers, with CoreSight, TrustZone and ARM IP Tooling]


3rd-generation ARM coherent backplane IP

CoreLink CMN-600 Coherent Mesh Network

CoreLink DMC-620 Dynamic Memory Controller

Optimized for next-generation intelligent connected systems


Build more powerful systems

Boost performance

Up to 5x more throughput

Fastest path to DDR4 memory

Up to 50% latency reduction

Performance at any design point

Up to 32 clusters (128 CPUs)

Frequencies exceeding 2.5 GHz

Tailor designs from edge to cloud

>1 TB/s bandwidth

Performance comparison to ARM CoreLink CCN and CoreLink DMC-520


Delivering maximum compute density

6x compute

5x throughput

[Chart: relative performance (0 to 6) of three systems: 16x Cortex-A57 with CoreLink CCN-504 and 2x DMC-520; 32x Cortex-A72 with CoreLink CCN-508 and 4x DMC-520; 64x Cortex-A72 with CoreLink CMN-600 and 8x DMC-620. One bar is labeled 2.5x.]

Compute = measured by SPECint_rate2006

Throughput = achieved requested bandwidth

Same process node and test conditions.


Fastest path to DDR4 memory

[Chart: static latency, Cortex-A72 load-to-use, split into interconnect + DMC and DDR PHY + memory components, for CoreLink CCN-504 with DMC-520 versus CoreLink CMN-600 with DMC-620, with a 50% reduction marked]

Same process node and test conditions

Estimated DDR PHY + memory cycles for a third-party PHY and closed-page DRAM

CoreLink CMN-600 configured with four CPU clusters to match CoreLink CCN-504

Increase CPU performance

50% backplane latency reduction

High-frequency mesh transport

One cycle per mesh cross point

Improved area efficiency

60% more bandwidth for the same area
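With one cycle per mesh cross point, the transport part of latency is proportional to the number of cross points a request passes through. A minimal sketch, assuming dimension-ordered (X then Y) routing, a fixed cost for every cross point on the route including source and destination, and a hypothetical placement of a CPU cluster and a memory controller in a 4x4 mesh. It ignores contention and the DMC, PHY and DRAM components of load-to-use latency; the function names are illustrative only.

```python
def xy_route(src, dst):
    """Cross points visited by X-then-Y routing, source and destination included."""
    (x, y), (tx, ty) = src, dst
    path = [(x, y)]
    step = 1 if tx > x else -1
    while x != tx:              # travel along X first
        x += step
        path.append((x, y))
    step = 1 if ty > y else -1
    while y != ty:              # then along Y
        y += step
        path.append((x, y))
    return path

def transport_cycles(src, dst, cycles_per_crosspoint=1):
    """Transport latency if every cross point on the route costs a fixed number of cycles."""
    return len(xy_route(src, dst)) * cycles_per_crosspoint

# Hypothetical placement: CPU cluster at (0, 0), memory controller at (3, 2).
print(xy_route((0, 0), (3, 2)))
print(transport_cycles((0, 0), (3, 2)))  # 6
```

Placing heavily communicating devices close together shortens the route directly, which is one reason custom mesh size and device placement matter.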

Tailor solutions from edge to cloud


New scalable coherent mesh architecture

[Diagram: CoreLink CMN-600 mesh connecting Cortex-A clusters, accelerators, Agile System Cache, DMC-620 controllers driving DDR4-3200, and IO (NIC-450, PCIe, 100GbE)]

Custom mesh size and device placement

Agile System Cache with snoop filter

1 to 32 clusters (128 CPUs); mix ARMv8-A CPUs and accelerators

1 to 8 high-performance DDR4-3200 controllers

Up to 32 IO coherent subsystems

Coherent Multichip Link with CCIX support
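The memory figures translate into peak DRAM bandwidth with simple arithmetic: DDR4-3200 performs 3200 million transfers per second. A sketch, assuming a standard 64-bit data channel (ECC bits excluded) and decimal gigabytes; the optional derating uses the "up to 95% utilization with random traffic" figure quoted for DMC-620, and actual utilization depends on the access pattern. The function name is illustrative.

```python
def dram_bandwidth_gbs(mega_transfers_per_s, channels=1, bus_width_bits=64, utilization=1.0):
    """Peak (or derated) DRAM bandwidth in GB/s (10**9 bytes/s); bus width excludes ECC bits."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels * utilization / 1e9

print(dram_bandwidth_gbs(3200))                              # 25.6 per channel
print(dram_bandwidth_gbs(3200, channels=8))                  # 204.8 for eight controllers
print(round(dram_bandwidth_gbs(3200, utilization=0.95), 2))  # 24.32 at 95% utilization
```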


Scalable solutions from edge to cloud

[Diagram: CoreLink CMN-600 configurations ranging from an access point to data center compute, built from Cortex-A CPUs, accelerators, system cache, DMC-620 memory controllers and IO (NIC-450, 100GbE, PCIe)]

Automated interconnect generation with ARM CoreLink Creator

                  Data center compute   Access point
Bandwidth         >1 TB/s               20 GB/s
System cache      128 MB                0 MB
DDR channels      8                     1
Cortex-A CPUs     128                   1


Why a mesh topology?

Interconnect capabilities scale with system size

Links, wires and cross-point routers are added naturally along with resources

Mesh cross-sectional bandwidth scales with N, versus constant for a ring topology

Mesh latency scales with √N, versus N for a ring topology

[Chart: bandwidth scaling comparison. Achieved coherent bandwidth as observed by requestors (GB/s, 0 to 1200) versus number of CPU clusters (0 to 32), for the CoreLink CCN family and CoreLink CMN-600. Same process node and test conditions.]
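A simple analytical model illustrates the scaling argument. The sketch assumes N nodes, a bidirectional ring with shortest-path routing, a square k x k mesh (k even, N = k²) with Manhattan (XY) routing, and uniform traffic between all distinct node pairs. In this model the ring's minimum bisection stays at two links while the mesh's grows with k = √N, and average hop counts grow roughly as N/4 for the ring and (2/3)√N for the mesh. It is an illustration of topology, not a model of CCN or CMN-600 performance; the function names are hypothetical.

```python
from itertools import product

def ring_hops(n):
    """Average shortest-path hop count between distinct nodes of a bidirectional ring."""
    total = sum(min(d, n - d) for d in range(1, n))  # distances from one node; ring is symmetric
    return total / (n - 1)

def mesh_hops(k):
    """Average Manhattan (XY-routed) hop count between distinct nodes of a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a in nodes for b in nodes)
    return total / (len(nodes) * (len(nodes) - 1))   # self-pairs contribute zero distance

def bisection_links(topology, n):
    """Links crossed by the minimum cut that splits n nodes into two equal halves."""
    if topology == "ring":
        return 2                  # a balanced cut of a ring crosses two links
    return int(round(n ** 0.5))   # even-sided k x k mesh: a cut between columns crosses k links

for k in (2, 4, 8):
    n = k * k
    print(f"N={n:3d}  ring: {ring_hops(n):5.2f} hops, bisection {bisection_links('ring', n)}"
          f"  mesh: {mesh_hops(k):5.2f} hops, bisection {bisection_links('mesh', n)}")
```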


CoreLink CMN-600

DMC-620

Innovations to increase throughput

Intelligent cache allocation

Throughput uplift for RDMA, networking, storage

IO allocate on ingress, de-allocate on egress

Combine with integrated scratch pad

Lock critical counters, stats, and tables on-chip

Software configurable cache partitioning

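[Chart: relative IO throughput (0 to 2) with IO data held in DDR versus in the Agile System Cache]

[Diagram: IO data allocated into the Agile System Cache on ingress and read and invalidated on egress; part of the cache configured as a scratch pad to lock critical data]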
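A toy model makes the benefit of allocate-on-ingress and read-and-invalidate-on-egress concrete. It assumes an LRU cache sized in lines, one packet per line, packets leaving a fixed number of steps after they arrive, and that every evicted line is dirty and must be written back to DRAM. SystemCache and forward_packets are hypothetical names; this is not the Agile System Cache's real allocation or replacement policy.

```python
from collections import OrderedDict

class SystemCache:
    """Toy LRU system cache that counts DRAM traffic (hypothetical model)."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # address -> True, least recently used first
        self.dram_reads = 0
        self.dram_writes = 0

    def io_write(self, addr, allocate):
        if not allocate:
            self.dram_writes += 1            # write bypasses the cache and goes to DRAM
            return
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict LRU line; assumed dirty, so written back
            self.dram_writes += 1
        self.lines[addr] = True

    def io_read(self, addr, invalidate):
        if addr not in self.lines:
            self.dram_reads += 1             # miss: data comes from DRAM (no allocation on read)
        elif invalidate:
            del self.lines[addr]             # consumed: drop the line without a write-back
        else:
            self.lines.move_to_end(addr)

def forward_packets(n_packets, in_flight, cache_lines, allocate, invalidate):
    """Each packet is written on ingress and read on egress `in_flight` steps later."""
    cache = SystemCache(cache_lines)
    for step in range(n_packets + in_flight):
        if step < n_packets:
            cache.io_write(step, allocate)
        if step >= in_flight:
            cache.io_read(step - in_flight, invalidate)
    return cache.dram_reads + cache.dram_writes

for allocate, invalidate in [(False, False), (True, False), (True, True)]:
    traffic = forward_packets(10_000, 64, 256, allocate, invalidate)
    print(f"allocate={allocate!s:5} invalidate={invalidate!s:5} DRAM accesses={traffic}")
```

With these parameters, bypassing the cache costs one DRAM write and one DRAM read per packet (20000 accesses); allocating without invalidation still writes almost every packet back on eviction (9744); allocating and invalidating keeps the in-flight packets on chip, so the model generates no DRAM traffic.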


Maximizing heterogeneous SoC performance

New working groups for affinity or isolation

Assign cache, bandwidth and memory resources

Flexible assignment, software programmable

Provides predictable multi-application performance

QoS regulation for compute, accelerators, IO

End-to-end regulation from master through memory

Tune for bandwidth, latency or real-time traffic

Intelligent memory scheduling to meet guarantees

[Diagram: CoreLink CMN-600 mesh with CPUs, system cache, accelerator, DMC-620 controllers and IO (NIC-450, 100GbE, PCIe) divided into working groups: a control plane, a data plane and four virtual machines]
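QoS regulation is described here only at a high level. One common way to cap a requester's bandwidth while still allowing short bursts is a token bucket; the sketch below models one, assuming tokens measured in bytes and refilled once per cycle. TokenBucket, simulate and their parameters are hypothetical and are not the CMN-600 or DMC-620 programming interface.

```python
class TokenBucket:
    """Hypothetical per-requester bandwidth regulator (classic token bucket)."""

    def __init__(self, rate_bytes_per_cycle, burst_bytes):
        self.rate = rate_bytes_per_cycle
        self.capacity = burst_bytes
        self.tokens = burst_bytes       # start full so an initial burst is allowed
        self.last_cycle = 0

    def _refill(self, cycle):
        elapsed = cycle - self.last_cycle
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_cycle = cycle

    def try_issue(self, cycle, size_bytes):
        """Spend tokens and return True if a request of size_bytes may issue this cycle."""
        self._refill(cycle)
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False

def simulate(rate, burst, request_bytes, cycles):
    """Count requests issued when a requester attempts one request every cycle."""
    bucket = TokenBucket(rate, burst)
    return sum(bucket.try_issue(c, request_bytes) for c in range(cycles))

print(simulate(rate=16, burst=256, request_bytes=64, cycles=100))  # 28
```

Here a requester limited to 16 bytes per cycle with a 256-byte burst allowance issues 28 of 100 attempted 64-byte requests: the burst covers the first five, then one request issues every four cycles.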


Enterprise-class DDR3/4 memory controller

System optimized

Lowest memory latency and highest bandwidth utilization with efficient QoS

Up to 95% utilization with random traffic

Up to 50% reduction in static pipeline latency

Latest DDR standards

Up to DDR4-3200 memory

DDR3/4 with UDIMM, RDIMM, LRDIMM

Up to 1 TB per channel with 3D stacked DRAM

Secure, reliable, protected data

Advanced security and RAS

Integrated ARM TrustZone

SECDED or symbol-based error correction

End-to-end data path parity protection

Performance comparison to CoreLink DMC-520
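SECDED means single-error correction, double-error detection. As an illustration of the principle only, the sketch below implements an extended Hamming(8,4) code: three Hamming parity bits locate a single flipped bit, and one overall parity bit distinguishes single from double errors. Real DRAM ECC typically protects 64 data bits with 8 check bits, and the DMC-620's actual code and its symbol-based option are not described in this deck; the function names are hypothetical.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # Hamming(7,4) data positions; 1, 2 and 4 hold parity

def encode(data_bits):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (index 0 = overall parity)."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        code[pos] = bit
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity makes the total number of 1s even
    return code

def decode(code):
    """Return (data_bits, status); corrects one flipped bit, detects two flipped bits."""
    code = list(code)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p  # the syndrome equals the position of a single flipped bit
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        code[syndrome] ^= 1  # single error; syndrome 0 means the overall parity bit flipped
        status = "corrected"
    else:
        return None, "double error detected"
    return [code[i] for i in DATA_POSITIONS], status

word = encode([1, 0, 1, 1])
one_flip = word.copy()
one_flip[6] ^= 1
two_flips = word.copy()
two_flips[2] ^= 1
two_flips[5] ^= 1
print(decode(word))       # ([1, 0, 1, 1], 'ok')
print(decode(one_flip))   # ([1, 0, 1, 1], 'corrected')
print(decode(two_flips))  # (None, 'double error detected')
```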

Multichip interconnect standards


Interconnect standards for different needs

ARM AMBA

The standard for on-chip communication, enabling IP portability, creation and re-use

CCIX

Extends the benefits of cache coherency to the multichip server node for evolving accelerator and IO use cases

GenZ

Enables a new data-centric computing approach with scalable memory pools at both server node and rack level


CCIX: Extending coherency benefits to multichip

New workloads require more shared data, higher bandwidth and lower latency

Coherency eliminates the software and DMA overhead of transferring data between devices

Free-flowing, high-frequency AMBA 5 CHI data transfers carried over multichip topologies

Accelerates time to deployment by leveraging existing PCIe transport

IP, electricals, mechanicals and software exist today

Extends top-end bandwidth to 25 Gbps

[Diagram: server node with a shared address space: a compute node and an accelerator, each with its own DDR memory, linked by CCIX]
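To put the 25 Gbps figure in perspective, a back-of-the-envelope sketch, assuming it is a per-lane signaling rate, an illustrative x16 link, and PCIe 3.0's 8 GT/s per lane as the comparison point. Protocol overhead is ignored and CCIX line encoding is not modeled, so the CCIX number is a raw upper bound; the function name is hypothetical.

```python
def raw_link_gbytes(gbit_per_lane, lanes, encoding_efficiency=1.0):
    """One-direction link bandwidth in GB/s; encoding_efficiency=1.0 means raw signaling rate."""
    return gbit_per_lane * lanes * encoding_efficiency / 8

ccix_x16 = raw_link_gbytes(25, 16)                 # assumed 25 Gbps per lane
pcie3_x16 = raw_link_gbytes(8, 16)                 # PCIe 3.0: 8 GT/s per lane
pcie3_x16_eff = raw_link_gbytes(8, 16, 128 / 130)  # after PCIe 3.0's 128b/130b encoding
print(ccix_x16, pcie3_x16, round(pcie3_x16_eff, 2))  # 50.0 16.0 15.75
```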


GenZ: A new approach to data access

Data-centric computing approach to big data

Interconnect based on memory operations

Eliminates traditional, complex, code-intensive block-based storage software stacks

Storage Class Memory (SCM)

New, emerging non-volatile memory technologies

Latencies closer to traditional DDR than to SSDs

Disaggregated memory at rack scale

Large pool of low-latency, volatile and non-volatile memory at rack scale

Dynamic utilization/allocation lowers TCO

[Diagram: data center rack in which server nodes, each with DDR memory and CCIX links, share pooled memory and storage class memory over GenZ]
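A rough accounting model contrasts the two access styles for a small update. It assumes 4 KiB storage blocks that must be read, modified and written back, and 64-byte cache lines for memory-semantic stores, and it ignores read-for-ownership, protocol headers and the software-stack cost that the slide emphasizes. The function names and sizes are illustrative assumptions, not part of the GenZ specification.

```python
def units_touched(offset, length, unit):
    """Number of aligned units of size `unit` overlapped by [offset, offset + length)."""
    return (offset + length - 1) // unit - offset // unit + 1

def block_update_bytes(offset, length, block=4096):
    """Block storage: read every touched block, modify it in memory, write it back."""
    return 2 * units_touched(offset, length, block) * block

def memory_update_bytes(offset, length, line=64):
    """Memory semantics: store to the touched cache lines only (no read-for-ownership modeled)."""
    return units_touched(offset, length, line) * line

# Update one 8-byte counter, aligned and straddling a block boundary (hypothetical offsets).
for off in (0, 4092):
    print(off, block_update_bytes(off, 8), memory_update_bytes(off, 8))
```

In this model, updating an 8-byte counter moves 8192 bytes through a block interface but 64 bytes with a store; if the counter straddles a block boundary, the block cost doubles to 16384 bytes.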

Accelerating system deployment


Assemble systems in days with IP tooling

Enables guided, intelligent IP configuration, creation and assembly

Ensures system viability with design rule checks (DRCs)

Reduces design iterations and converges quickly

[Diagram: systems IP flow (configure, create, assemble) using ARM IP tooling for Cortex-A CPUs and accelerators on a CoreLink CMN-600 coherent backplane (1 to 32 clusters) with CoreLink DMC-620, CoreSight and TrustZone]
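Design rule checks are what keep an automatically generated configuration within supported limits. A minimal sketch of the idea, using only limits quoted in this deck (1 to 32 clusters, 128 CPUs, 1 to 8 memory controllers, up to 32 IO coherent subsystems). The MeshConfig fields, the lower bound of zero IO subsystems and check_design_rules are hypothetical and do not represent CoreLink Creator's actual rules or interface.

```python
from dataclasses import dataclass

@dataclass
class MeshConfig:
    """Hypothetical subset of a CMN-600-style configuration (field names are illustrative)."""
    cpu_clusters: int
    cpus_per_cluster: int
    memory_controllers: int
    io_coherent_subsystems: int

# (min, max) limits taken from the figures quoted in this deck; the 0 minimum is assumed.
LIMITS = {
    "cpu_clusters": (1, 32),
    "memory_controllers": (1, 8),
    "io_coherent_subsystems": (0, 32),
    "total_cpus": (1, 128),
}

def check_design_rules(cfg):
    """Return a list of rule violations; an empty list means the configuration passes."""
    values = {
        "cpu_clusters": cfg.cpu_clusters,
        "memory_controllers": cfg.memory_controllers,
        "io_coherent_subsystems": cfg.io_coherent_subsystems,
        "total_cpus": cfg.cpu_clusters * cfg.cpus_per_cluster,
    }
    errors = []
    for name, (lo, hi) in LIMITS.items():
        if not lo <= values[name] <= hi:
            errors.append(f"{name}={values[name]} outside [{lo}, {hi}]")
    return errors

print(check_design_rules(MeshConfig(32, 4, 8, 32)))  # []
print(check_design_rules(MeshConfig(40, 4, 9, 0)))
```

The second call reports three violations: too many clusters, too many memory controllers and too many CPUs in total.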


Accelerate software development

[Diagram: software stack (application and ODP API, Linux kernel and hypervisor, device drivers with UEFI/ACPI) running on an ARM Fixed Virtual Platform (FVP) of the reference system: Cortex-A CPUs and accelerators on a CoreLink CMN-600 coherent backplane (1 to 32 clusters) with CoreLink DMC-620, CoreSight, TrustZone and ARM IP tooling]

Reference software stack

Open source device drivers for CoreLink IP

Linux kernel and OS boot ready

Compliant with UEFI, ACPI and Server Base System Architecture (SBSA)

Prototype with fixed virtual platforms

Prototyping model of reference system

Built with ARM Fast Models for IP components

Reference subsystem memory map and registers


Jump start SoC designs

[Diagram: software stack (application and ODP API, Linux kernel and hypervisor, device drivers with UEFI/ACPI) on an ARM Fixed Virtual Platform (FVP) of the reference system: Cortex-A CPUs and accelerators on a CoreLink CMN-600 coherent backplane (1 to 32 clusters) with CoreLink DMC-620, CoreSight, TrustZone and ARM IP tooling]

System reference design data

Petacycles of system validation

Measured RTL industry benchmark reports

Measured area, frequency and power in targeted process nodes


Trusted and proven ARM CoreLink family

ARM CoreLink System IP: silicon-proven in billions of devices

CoreLink CMN-600 and DMC-620 applicable to multiple applications

>75 coherent interconnect licenses

>75 memory controller licenses

>500 total interconnect licenses


Build more powerful SoCs – faster

CoreLink CMN-600 Coherent Mesh Network and CoreLink DMC-620 Dynamic Memory Controller

Boost performance

5x more throughput

50% lower latency

Tailor solutions

1 to 32 clusters (128 CPUs)

Mix compute and acceleration

Accelerate deployment

Automated interconnect creation

Software virtual prototyping


The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.

Copyright © 2016 ARM Limited
