1

System IP for 64-bit Systems ARM® CoreLink™ and CoreSight™ System IP

ARM Tech Symposia

November, 2014

2

64-bit Mobile Devices

3

The Mobile Consumer Expects Something New Every Year

Your next flagship devices

Timeline, 2011 to 2014, in order of appearance:

Dual-core CPU performance

>5.0 inch 1080p 60fps screens

>13 MPixel camera

ARMv8-A and shift to 64-bit

4

Why 64-bit in Mobile?

Performance through architecture

Cleaner instruction set architecture

Hard-float ABI by default in ARMv8-A

More registers, less stack spillage

Cheaper function calls

Up to 16x crypto acceleration

Preparation for larger memory devices

5

Increasing Demand for System Bandwidth

[Chart: peak on-chip system bandwidth (GB/s, axis 10 to 50) against year of device shipping (2009 to 2015). Annotated drivers: screen sizes and resolutions; capture and screen frame rates; >20 Mpixel cameras and 4K output]

6

Designing Within an Energy and Thermal Envelope

High-end feature rich gaming

Video editing on the move

SoC mobile power envelope

2.5 - 3W 4 - 5W 7W

7

Mobile Application Workloads

Mobile users spend a high amount of time on a range of mobile applications*:

38% on web browsing and Facebook

32% on gaming

16% on audio, video and utility

Common “building blocks” in workloads:

Short bursts of high intensity

Long periods of sustained high intensity

Low intensity

[Charts: power against time for web browsing, gaming and audio playback, measured on a Quad Cortex-A7 Symmetric Multiprocessing platform]

* Source: Flurry Analytics

8

big.LITTLE Technology

“Right Task on the Right Core”

Heterogeneous Computing

2x higher performance vs. SMP*

Up to 75% CPU power savings vs. SMP*

Up to 40% SoC power savings**

Architecturally Identical Processors

High performance tuned big cores

Low power tuned LITTLE cores

Hardware Coherency

Cache Coherent Interconnect (CCI)

L1 and L2 snooping between clusters

Seamless & Automatic Task Allocation

[Diagram: big cluster and LITTLE cluster, each with an L2 cache; Cache Coherent Interconnect; Interrupt Control]

* Quad Cortex-A15 Symmetrical Multiprocessing System (SMP)

** Measured across a set of casual games and common use-cases on an ARM Partner 4xCortex-A15.4xCortex-A7 big.LITTLE device

9

big.LITTLE is Mainstream

Cortex-A15/A7 big.LITTLE in product in 2014

MediaTek MT8135, Samsung Exynos 5422, Allwinner A80

High-end mobile moving to A57 and A53 big.LITTLE

Benefits for additional high-end performance, 64-bit

Silicon expected in late 2014, e.g. Qualcomm Snapdragon 810, Exynos 7 Octa

Global Task Scheduling is now a differentiation point

HMP access to all cores

10

CoreLink CCI-400 System Coherency for 64-bit big.LITTLE

First of a generation supporting multi-cluster coherency.

We are actively working on the next generation CCI.

[Diagram: Quad Cortex-A57 (L2), Quad Cortex-A53 (L2) and Mali GPU; CoreLink™ CCI-400 Cache Coherent Interconnect with AMBA® 4 ACE™; System and I/O; DDR]

High performance hub interconnect for smartphone and beyond

2 CPU clusters, 8 core GPU, DMC

Performance and power efficiency with big.LITTLE

Supporting Cortex-A53 and Cortex-A57

Integrated clock gating

System level hardware coherency

Full coherency for CPU

I/O coherency for GPU

Mature and silicon proven, over 30 licensees

11

64-bit Mobile Sub-System Example

Common memory view for all SoC components

Unified interrupts for complex processors

Software Debug, Hardware Performance Trace

Hardware coherency enables big.LITTLE and simplifies software

Optimized path to memory for best performance

Configurable interconnect enables flexible system design

[Diagram labels: GIC-500; I/O Coherent Masters; Cortex-A57; Cortex-A53; Peripherals; MMU-500 (two instances); NIC-400 (two instances); Mali T760 GPU; Mali V500; Display; CCI-400; TZC-400; Memory System; DMC-400; 3rd Party: LPDDR3/4; DRAM; ETM; STM]

12

64-bit Mobile Sub-System – Debug and Trace

Run-control debug

Real-time trace

Off-chip trace

Self-hosted trace

Debugger access to peripherals

Debugger access to memory

System & Software Trace

Cross communication of events

Event & trace correlation

Performance Analysis

[Diagram: the mobile sub-system of the previous slide with debug and trace components added: ETM (two instances); PMU (two instances); TPIU; CTM; timestamp; TMC; STM]

13

64-bit Infrastructure Devices

14

Content in the Cloud Drives Intelligence in the Network

Wide range of network performance and intelligence behaviors

Content moving closer to user for better performance

Rendering moving into network for greater UI possibilities

[Diagram labels: The Cloud / Data Center; STB; Display Clients]

15

Infrastructure Compute Challenges

Networking and Datacenter Infrastructure requires solving diverse problems…

Heterogeneous platforms for diverse environments

Data center to shopping center!

Power efficiency and elasticity are always important

Evolving compute problems

Demanding performance/efficiency requirements

Different cores for different problems

Common SW Framework on heterogeneous compute platforms

16

Heterogeneous Compute Requirements

Specialised Processing

L1, Content Delivery, Security

Diverse requirements

Trend: Advanced modulation schemes

Need: DSPs, Accelerators

Data Plane Processing

Throughput driven, IO intensive

Deterministic performance

Trend: Higher packet rates

Need: Small Cores at Maximum Efficiency

Control Plane Processing

Fast Event Processing

Complex signalling

Trend: Evolving Software

Need: Efficient, High Compute Performance

MAC Scheduling

Real Time, Latency Driven

Multiple core processing

Trend: More Complexity (LTE-A, 5G)

Need: High Compute, Low Latency Performance

High Bandwidth, Low Latency Interconnect

Wide Range of Implementations from Few to Many Coherent Devices

17

Scalable Efficient Interconnect for Compelling Solutions

[Chart: system performance against system size (cost-efficient, mid-range, high-end) for CCI-400, CCN-502, CCN-504, CCN-508 and CCN-512; protocol labels: AMBA 4 ACE, AMBA 5 CHI]

Range across the chart:

Level-3 Cache Size: 0MB to 32MB

DDR Bandwidth: 20 GB/s to 100 GB/s

On-chip bandwidth: 0.2 Tb/s to 1.8 Tb/s

18

Extending the ARM CoreLink™ Cache Coherent Network Family

• 2 new members extend the scalability of the CCN family

• Native AMBA 5 CHI interfaces providing high frequency, non-blocking data transfers

• End-to-end QoS and RAS

• Integrated Level 3 Cache and Snoop Filter

CoreLink CCN-502: High Performance, Small Footprint

Up to 4 Clusters (16 cores)

Small to Mid-Range Systems

CoreLink CCN-512: Maximize Compute Density

Up to 12 Clusters (48 cores)

High-End Systems
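
The cluster limits quoted in this deck (CCI-400 with 2 CPU clusters, CCN-502 with up to 4 clusters and 16 cores, CCN-512 with up to 12 clusters and 48 cores) can be turned into a small lookup. The sketch below is only an illustration: the helper names are hypothetical, CCN-504 and CCN-508 are left out because the slides do not state their limits, and the four-cores-per-cluster figure is inferred from the "16 cores" and "48 cores" numbers.

```python
# Cluster limits as quoted on the slides. CCN-504 and CCN-508 are omitted
# because this deck does not state their limits.
MAX_CLUSTERS = {"CCI-400": 2, "CCN-502": 4, "CCN-512": 12}

# Inferred from "4 Clusters (16 cores)" and "12 Clusters (48 cores)".
CORES_PER_CLUSTER = 4


def smallest_interconnect(clusters):
    """Return the smallest listed interconnect that fits (hypothetical helper)."""
    if clusters < 1:
        raise ValueError("need at least one cluster")
    for name, limit in sorted(MAX_CLUSTERS.items(), key=lambda item: item[1]):
        if clusters <= limit:
            return name
    raise ValueError("more clusters than any listed interconnect supports")


def max_cores(name):
    """Maximum core count, assuming four cores in every cluster."""
    return MAX_CLUSTERS[name] * CORES_PER_CLUSTER


print(smallest_interconnect(3))  # CCN-502
print(max_cores("CCN-512"))      # 48
```

A real selection would also weigh L3 cache size, DDR bandwidth and area, which this lookup ignores.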

[Diagram: CoreLink™ CCN-512 Cache Coherent Network. Labels: Snoop Filter; 1-32MB L3 cache; Cortex-A57 and Cortex-A53 clusters; Cortex CPU or CHI master; DSPs; ACE; Memory Controller DMC-520 with x72 DDR4-3200; Network Interconnect NIC-400 with Flash, USB, SRAM, GPIO, PCIe, AHB; I/O Virtualisation CoreLink MMU-500 with PCIe, 10-40 GbE, DPI, Crypto, DSP, SATA; GIC-500]

[Diagram: CoreLink™ CCN-502 Cache Coherent Network. Labels: Snoop Filter; 0-8MB L3 cache; Cortex-A57 and Cortex-A53 clusters; DSPs; Memory Controller DMC-520 with x72 DDR4-3200; Network Interconnect NIC-400 with Flash, USB, SRAM, GPIO, PCIe; I/O Virtualisation CoreLink MMU-500 with PCIe, 10-40 GbE, DSP, SATA; GIC-500]
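
As a rough sanity check on the "100 GB/s" DDR bandwidth figure quoted for the high end of the family, the sketch below computes the theoretical peak of the x72 DDR4-3200 interfaces shown on the DMC-520 blocks. It rests on two assumptions that the slides do not state: that x72 means 64 data bits plus 8 ECC bits (the usual ECC layout), and that a high-end system uses four such channels. The function name is hypothetical.

```python
def ddr_peak_mb_per_s(transfers_mt_s, data_bits, channels=1):
    """Theoretical peak bandwidth in MB/s (decimal megabytes).

    transfers_mt_s: transfer rate in mega-transfers per second (3200 for DDR4-3200)
    data_bits:      data width per transfer, excluding ECC bits
    channels:       number of independent memory channels
    """
    bytes_per_transfer = data_bits // 8
    return transfers_mt_s * bytes_per_transfer * channels


# Assumption: an x72 interface carries 64 data bits plus 8 ECC bits.
per_channel = ddr_peak_mb_per_s(3200, 64)
# Assumption: four channels, chosen for illustration.
four_channels = ddr_peak_mb_per_s(3200, 64, channels=4)

print(per_channel)    # 25600 MB/s, i.e. 25.6 GB/s
print(four_channels)  # 102400 MB/s, i.e. 102.4 GB/s
```

Under those assumptions, four channels give about 102 GB/s of theoretical peak, in line with the 100 GB/s on the scalability chart. Sustained bandwidth in a real system is lower.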

19

Scalable Platform for Diverse Processing Needs

Cost-Efficient, Power-Optimized: Cortex-A7, Cortex-A53; CCI-400, CCN-502

Mid-range Performance: Cortex-A53, Cortex-A57; CCN-502, CCN-504

High Performance Networking and Server: Cortex-A53, Cortex-A57; CCN-508, CCN-512

20

Efficient Hardware-Assisted Virtualization

Direct hardware access with MMU-500

Low latency interrupt delivery with GIC-500

Support for on-chip or off-chip peripherals

21

64-bit Infrastructure System Example

Software Debug, Hardware Debug And System Profiling

Configurable interconnect enables flexible system design

Common memory view for all SoC components

Unified interrupts for complex processors

Optimized path to memory for best performance

Hardware coherency enables scaling and simplifies software

[Diagram: CoreLink™ CCN-502 Cache Coherent Network. Labels: Snoop Filter; 0-8MB L3 cache; Cortex-A57 and Cortex-A53 clusters; DSPs; Memory Controller DMC-520 with x72 DDR4-2400 and x72 DDR4-3200; NIC-400 with Flash, USB, SRAM, GPIO, PCIe; I/O Virtualisation CoreLink MMU-500 with PCIe, 10-40 GbE, DSP, SATA; GIC-500; PTM; STM]

22

System IP for 64-bit Systems

Summary

23

Mobile Picking Up the Pace and Reach

Need for more performance in a constrained thermal envelope

Premium mobile expects something new every year

ARMv8-A and shift to 64-bit

24

Scalable Platforms for Diverse Processing Needs

[Diagram labels: NFV; SDN; Cloud; XaaS Cloud; CDN; Cloud RAN Equipment; Base Station Controller; Optical Core Networking Equipment; B-RAS; Evolved Packet Core; SGSN; GGSN; Storage Array Network Controller; Edge Server; Core Server; Email; Web; HPC Scientific Compute; Media content web; Scale out storage; Cellular Macro Cell Base Stations; Cellular Small Cell Base Stations; Cellular Remote Radio Head/Antenna; Femto BTS; DSLAM; Microwave Backhaul; Optical Line Termination; Optical Network Termination; Mobile Broadband Access and Aggregation; Edge Router; Core Router; Cable Modem; DSL Modem; Home Gateway; Set Top Box; Wi-Fi Access Point]

25

Debug and Trace Solutions for 64-bit Systems

26

Juno – The First ARMv8-A 64-bit Software Development Target

64 and 32-bit Software

64-bit applications support enabled

100% compatibility for 32-bit applications

System IP

Interconnect, interrupts, virtualization, debug and trace

ARMv8-A Juno

Premium ARMv8-A software target platform

Available now