ARM Technical Symposium: Developing Software for Multicore ARM Systems


Upload: raghu-guruvayurappan

Post on 29-Nov-2015


DESCRIPTION

arm processor

TRANSCRIPT

1

Developing Software for Multicore ARM Platforms

2

Agenda

What do we mean by multicore?

Benefiting from SMP

OS support

Applications

Existing software

Impact on other ARM technologies

3

What do we Mean by Multicore?

Heterogeneous multicore systems have existed for a long time:

[Diagram: Cortex™-A8 (application), Mali™-400 MP (user interface and 3D graphics) and Cortex-M3 (power manager), connected through an interconnect to memory]

4

Coherent Multicore Cluster

Cortex family of MPCore™ processors:

Cortex-A5 MPCore, Cortex-A9 MPCore and Cortex-A15 MPCore

Cortex-R5 MPCore, Cortex-R7 MPCore

Homogeneous multicore cluster, as part of a heterogeneous system:

[Diagram: cluster of Cortex-A9 CPUs joined by coherency logic, alongside a Mali-400 MP (user interface and 3D graphics) and a Cortex-M3 (power manager), all connected through an interconnect]

5

Coherency

ARM MPCore processors provide:

Hardware-maintained coherency between L1 data caches

Broadcast of cache and TLB maintenance operations

Inter-processor interrupt signalling using the integrated interrupt controller

Coherency with external un-cached masters using ACP

[Diagram: Cortex-A series CPUs, each with its own I$ and D$, joined by coherency logic that provides the AXI connection to the memory system and the ACP]


7

SMP OS

A symmetric multi-processing (SMP) OS runs across multiple CPUs

Each CPU sees the same memory system

Tasks can be scheduled to any CPU

A multi-threaded task may run on several CPUs at once

The OS can hide much of the complexity from applications

Many widely used operating systems already support SMP on ARM

[Diagram: MPCore cluster with two CPUs; an SMP OS runs across both and schedules the threads of one application onto them]

8

Multiple OS

It is also possible to run multiple different operating systems

For example, SMP Linux in parallel with a RTOS

[Diagram: MPCore cluster in which two CPUs run an SMP OS hosting a multi-threaded application, while a third CPU runs an RTOS task]

The two operating systems perform different, but related tasks

User interface on SMP Linux, Modem stack on the RTOS

9

Applications

Applications can be written to take advantage of a multicore environment

Work is split across multiple independent threads, which the OS can schedule to different CPUs

Functional blocks are serially dependent but temporally independent

Each block runs as a separate thread, running on different CPUs

[Diagram: simplified MPEG encoding functional block diagram. Blocks: analogue video sampling, remove inter-frame redundancy, motion compensation, quantise samples, run-length compress, buffer and store. The blocks are distributed across CPU0, CPU1, CPU2 and CPU3]
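
The idea of running each functional block as its own thread can be sketched in C with POSIX threads. This is a minimal illustration, not the encoder from the slide: the three stages (`sample_stage`, `quantise_stage`, `store_stage`), the one-slot `mailbox_t` helper and the "multiply by 10" stand-in for real work are all hypothetical names and behaviour chosen for the example. The OS remains free to place each stage thread on a different CPU.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define FRAMES 8
#define DONE   (-1)              /* end-of-stream marker */

/* One-slot mailbox passing values between two stages (hypothetical helper). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int             value;
    int             full;
} mailbox_t;

static mailbox_t sampled   = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };
static mailbox_t quantised = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };
int stored[FRAMES];              /* output of the final stage */

static void mailbox_put(mailbox_t *m, int v) {
    pthread_mutex_lock(&m->lock);
    while (m->full)                          /* wait until the slot is free */
        pthread_cond_wait(&m->changed, &m->lock);
    m->value = v;
    m->full = 1;
    pthread_cond_broadcast(&m->changed);
    pthread_mutex_unlock(&m->lock);
}

static int mailbox_get(mailbox_t *m) {
    pthread_mutex_lock(&m->lock);
    while (!m->full)                         /* wait until a value arrives */
        pthread_cond_wait(&m->changed, &m->lock);
    int v = m->value;
    m->full = 0;
    pthread_cond_broadcast(&m->changed);
    pthread_mutex_unlock(&m->lock);
    return v;
}

/* Stage 1: stands in for sampling; emits frame numbers 0..FRAMES-1. */
static void *sample_stage(void *arg) {
    (void)arg;
    for (int i = 0; i < FRAMES; i++)
        mailbox_put(&sampled, i);
    mailbox_put(&sampled, DONE);
    return NULL;
}

/* Stage 2: stands in for quantisation; multiplies each value by 10. */
static void *quantise_stage(void *arg) {
    (void)arg;
    int v;
    while ((v = mailbox_get(&sampled)) != DONE)
        mailbox_put(&quantised, v * 10);
    mailbox_put(&quantised, DONE);
    return NULL;
}

/* Stage 3: stands in for buffering and storing the result. */
static void *store_stage(void *arg) {
    int *count = arg;
    int v;
    while ((v = mailbox_get(&quantised)) != DONE)
        stored[(*count)++] = v;
    return NULL;
}

/* Runs the three stages concurrently; returns the number of frames stored. */
int run_pipeline(void) {
    void *(*stage[3])(void *) = { sample_stage, quantise_stage, store_stage };
    pthread_t tid[3];
    int count = 0;
    for (int i = 0; i < 3; i++) {
        int rc = pthread_create(&tid[i], NULL, stage[i], &count);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 3; i++)
        pthread_join(tid[i], NULL);
    return count;
}
```

Calling `run_pipeline()` processes all frames in order. While stage 2 works on one frame, stage 1 can already produce the next, which is where the temporal independence pays off.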

10

Applications (cont)

Other examples of splitting across threads:

Image processing:

Single frame divided into multiple regions

Each region handled by a different thread

Spawn a new thread per frame

For example, rapid shot mode on a camera

[Diagram: applications whose work is split across several threads]
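
Dividing a single frame into regions might look like the following C sketch using POSIX threads. The frame size, the thread count, the `region_t` type and the "invert every pixel" operation are assumptions made for the example. Each thread owns a disjoint band of rows, so no locking is needed while the regions are processed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define WIDTH   64
#define HEIGHT  48
#define THREADS 4

/* A horizontal band of the frame: rows [first_row, last_row). */
typedef struct {
    uint8_t *pixels;
    size_t   first_row;
    size_t   last_row;
} region_t;

/* Worker: inverts every pixel in its own band only. */
static void *invert_region(void *arg) {
    region_t *r = arg;
    for (size_t y = r->first_row; y < r->last_row; y++) {
        for (size_t x = 0; x < WIDTH; x++) {
            uint8_t *p = &r->pixels[y * WIDTH + x];
            *p = (uint8_t)(255 - *p);
        }
    }
    return NULL;
}

/* Splits the frame into THREADS bands and processes them concurrently. */
void invert_frame(uint8_t *pixels) {
    pthread_t tid[THREADS];
    region_t  region[THREADS];

    for (size_t i = 0; i < THREADS; i++) {
        region[i].pixels    = pixels;
        region[i].first_row = i * HEIGHT / THREADS;
        region[i].last_row  = (i + 1) * HEIGHT / THREADS;
        int rc = pthread_create(&tid[i], NULL, invert_region, &region[i]);
        assert(rc == 0);
        (void)rc;
    }
    /* The frame is complete only once every region has finished. */
    for (size_t i = 0; i < THREADS; i++)
        pthread_join(tid[i], NULL);
}
```

The join loop acts as the synchronization point: the caller may read the frame only after all region threads have returned.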

11

Barriers and Synchronization

The tasks being run on different cores will require synchronization

Example within an MPCore cluster:

CPU 0: load application into memory, then signal CPU 1 that the new application can be run

CPU 1: wait for signal from CPU 0, then run application

Synchronization code will normally include manual barriers

Data Synchronization Barrier (DSB) or Data Memory Barrier (DMB)
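
In portable C, the same signal-and-wait pattern can be sketched with C11 atomics rather than hand-written barriers. The names `payload`, `ready`, `cpu0_loader` and `cpu1_runner` are hypothetical, and the threads are not pinned to particular CPUs. The release store and acquire load express the required ordering; on ARMv7-A, compilers typically implement them with DMB instructions, but that is a property of the toolchain rather than something this code controls.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static int        payload;   /* ordinary data written before the flag */
static atomic_int ready;     /* 0 = not ready, 1 = payload is valid   */

/* Plays the role of CPU 0: prepare the data, then signal. */
static void *cpu0_loader(void *arg) {
    (void)arg;
    payload = 42;                                   /* "load application" */
    /* Release: the payload write is visible before the flag is seen. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Plays the role of CPU 1: wait for the signal, then use the data. */
static void *cpu1_runner(void *arg) {
    int *result = arg;
    /* Acquire: reads after the loop cannot be reordered before it. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        sched_yield();
    *result = payload;                              /* "run application" */
    return NULL;
}

/* Runs one handshake and returns the value the waiting thread observed. */
int run_handshake(void) {
    pthread_t loader, runner;
    int seen = 0;

    payload = 0;
    atomic_store(&ready, 0);

    int rc = pthread_create(&runner, NULL, cpu1_runner, &seen);
    assert(rc == 0);
    rc = pthread_create(&loader, NULL, cpu0_loader, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(loader, NULL);
    pthread_join(runner, NULL);
    return seen;
}
```

Without the release/acquire pair, the waiting thread could in principle observe the flag before the payload write, which is exactly the hazard the barriers guard against.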

12

Barriers in Action

P0

STR r11, [r1] ; Save instruction to program memory

DCCMVAU r1 ; clean D-$ so instruction visible to I-$

DSB ; ensure clean completes on all CPUs

ICIMVAU r1 ; discard stale data from I-$ …

BPIMVA r1 ; … and from Branch Predictor

DSB ; ensure I-$/BP invalidates complete for all CPUs

STR r0, [r2] ; set flag == 1 to signal completion

ISB ; synchronize context on this processor

MOV pc, r1 ; branch to new code

P1-Pn

WAIT ([r2] == 1) ; wait for flag signaling completion

; no barrier required here

ISB

MOV pc, r1 ; execute newly saved instruction

13

Single Threaded Tasks and SMP

Existing software may not be optimized for a multicore environment

Single threaded applications

The SMP OS can schedule different applications to different CPUs

Complexity of the multicore environment is hidden from the application by the OS

[Diagram: single-threaded applications (browser, video player, e-mail client), each one thread, distributed by the SMP OS scheduler across CPU 0, CPU 1 ... CPU<n>]

14

Case Study

Single threaded browser saw a 1.54x performance improvement when run on a dual-core system

No code changes required in the browser

Improvement comes from the OS's ability to schedule the non-browser tasks to the other CPU

Dual-core Cortex-A9 MPCore, running Android ‘Froyo’, 2.6.32 kernel, BBench2010_server

[Charts: "Browser Performance", 1 core vs 2 cores, with 2 cores marked 1.54x; "Available Compute Profile", 1 core vs 2 cores, with the second core labelled off or idle]

15

Case Study (cont.)

Running on a single core, the browser performance fell (0.78x) when also listening to streaming audio

But browser performance increased when run on a dual-core system

Could also choose to maintain performance (1.0x), and lower frequency

[Chart: "Browser Performance with Streaming Web Radio Application". Bars: 1 core, browser only (baseline); 1 core, browser and web radio (0.78x); 2 cores, browser and web radio (1.5x); 2 cores, browser only (1.54x)]


17

Introducing TrustZone® Technology

What about multicore?

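[Diagram: Normal world and Secure world, each split into User and Privileged levels. Normal world: application, vendor specific library, operating system, TrustZone driver. Secure world: trusted service(s), TEE. The Secure Monitor links the two worlds]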

18

TrustZone in a Multicore System

Architecture allows for a full SMP OS in the Secure world

The design aim for a TEE is simplicity, which aids certification

SMP support is normally not needed, and represents unnecessary complexity

TEE executes on one processor only

[Diagram: four CPUs running applications on an SMP OS in the Normal world; the TEE in the Secure world runs on one CPU]

19

Dedicated CPU

Alternative model is to dedicate one CPU to TrustZone

Only justified if making heavy use of trusted services

[Diagram: four CPUs; one is dedicated to the TEE, the others run application(s) on an SMP OS]

20

Large Physical Address

The Cortex-A15 processor introduces support for the Large Physical Address Extension (LPAE)

Extension to ARMv7-A VMSA

Provides 32-bit virtual address mapped to a greater than 32-bit physical address

Each application is limited to a 4GB virtual address space

But OS has potentially more than 4GB of memory to work with

Becomes more important as the number of current processes increases

And the amount of memory consumed by those processes increases
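
The address-space arithmetic behind these points can be checked with a few lines of C. The sketch assumes the 40-bit physical address width that LPAE defines for ARMv7-A; the helper names `address_space_bytes` and `full_virtual_spaces` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)   /* bytes in one gibibyte */

/* Bytes addressable with the given number of address bits (bits < 64). */
uint64_t address_space_bytes(unsigned bits) {
    assert(bits < 64);
    return UINT64_C(1) << bits;
}

/* How many complete 32-bit (4GB) virtual address spaces fit into the
   physical range reachable with phys_bits address bits. */
uint64_t full_virtual_spaces(unsigned phys_bits) {
    return address_space_bytes(phys_bits) / address_space_bytes(32);
}
```

With these helpers, `address_space_bytes(32)` is 4GB per application, while `address_space_bytes(40)` is 1024GB of physical address range, enough in principle for 256 fully populated 4GB address spaces.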

21

Multiple Clusters

The Cortex-A15 MPCore processor, together with AMBA® 4 ACE™, supports multiple coherent clusters

[Diagram: two clusters of Cortex-A15 CPUs, each with coherency logic in its L2 cache, joined by a coherent interconnect]

SMP OS can be extended across multiple clusters of CPUs

Expands the number of CPUs available to scheduler

22

Any Questions?