ARM Technical Symposium: Developing Software for Multicore ARM Systems
2
Agenda
What do we mean by multicore?
Benefiting from SMP
OS support
Applications
Existing software
Impact on other ARM technologies
3
What do we Mean by Multicore?
Heterogeneous multicore systems have existed for a long time:
[Diagram: Cortex™-A8 (application), Mali™-400 MP (user interface and 3D graphics) and Cortex-M3 (power manager), connected through an interconnect to memory]
4
Coherent Multicore Cluster
Cortex family of MPCore™ processors:
Cortex-A5 MPCore, Cortex-A9 MPCore and Cortex-A15 MPCore
Cortex-R5 MPCore, Cortex-R7 MPCore
Homogeneous multicore cluster, as part of a heterogeneous system:
[Diagram: a cluster of Cortex-A9 CPUs joined by coherency logic, alongside a Mali-400 MP (user interface and 3D graphics) and a Cortex-M3 (power manager), all connected through an interconnect]
5
Coherency
ARM MPCore processors provide:
Hardware maintained coherency between L1 data caches
Broadcast of cache and TLB maintenance operations
Inter-processor interrupt signalling using integrated interrupt controller
Coherency with external un-cached masters using ACP
[Diagram: Cortex-A series CPUs, each with its own instruction cache (I$) and data cache (D$), joined by coherency logic that provides the ACP and an AXI connection to the memory system]
7
SMP OS
A symmetric multi-processing (SMP) OS runs across multiple CPUs
Each CPU sees the same memory system
Tasks can be scheduled to any CPU
A multi-threaded task may run on several CPUs at once
The OS can hide much of the complexity from applications
Many widely used operating systems already support SMP on ARM
[Diagram: an application's threads running on an SMP OS that spans the CPUs of an MPCore cluster]
8
Multiple OS
It is also possible to run multiple different operating systems
For example, SMP Linux in parallel with a RTOS
[Diagram: an MPCore cluster in which two CPUs run an SMP OS hosting a multi-threaded application, while a third CPU runs an RTOS task]
The two operating systems perform different, but related tasks
User interface on SMP Linux, Modem stack on the RTOS
9
Applications
Applications can be written to take advantage of a multicore environment
Work is split across multiple independent threads, which the OS can schedule to different CPUs
Functional blocks are serially dependent but temporally independent
Each block runs as a separate thread, running on different CPUs
[Diagram: simplified MPEG encoding functional block diagram. Blocks shown: Analogue Video Sampling, Remove Inter-Frame Redundancy, Quantise Samples, Run-Length Compress, Buffer, Store and Motion Compensation, distributed across CPU0, CPU1, CPU2 and CPU3]
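The pipeline idea on this slide can be sketched in C with POSIX threads. This is a minimal illustration, not the encoder from the slide: the stage names, the `queue_t` type, the `END_OF_STREAM` sentinel and the "drop the low two bits" stand-in for quantisation are all hypothetical. Each stage is a thread connected to the next by a small bounded queue, so an SMP OS is free to run the stages on different CPUs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4
#define END_OF_STREAM  (-1)   /* hypothetical sentinel: samples are never negative */

/* Bounded queue connecting two pipeline stages. */
typedef struct {
    int items[QUEUE_CAPACITY];
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} queue_t;

static void queue_init(queue_t *q) {
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void queue_destroy(queue_t *q) {
    pthread_cond_destroy(&q->not_full);
    pthread_cond_destroy(&q->not_empty);
    pthread_mutex_destroy(&q->lock);
}

static void queue_push(queue_t *q, int v) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAPACITY)          /* block while the queue is full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = v;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static int queue_pop(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                       /* block while the queue is empty */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int v = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return v;
}

static queue_t sampled, quantised;

/* Stage 1: stand-in for sampling; emits 0, 1, 2, ... then the sentinel. */
static void *sample_stage(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        queue_push(&sampled, i);
    queue_push(&sampled, END_OF_STREAM);
    return NULL;
}

/* Stage 2: stand-in for quantisation; clears the low two bits of each sample. */
static void *quantise_stage(void *arg) {
    (void)arg;
    for (;;) {
        int v = queue_pop(&sampled);
        if (v == END_OF_STREAM)
            break;
        queue_push(&quantised, v & ~3);
    }
    queue_push(&quantised, END_OF_STREAM);
    return NULL;
}

/* Runs the pipeline. The final stage (a stand-in for "store") runs on the
 * calling thread and returns the sum of everything it received. */
long run_pipeline(int n_samples) {
    assert(n_samples >= 0);
    pthread_t t1, t2;
    long total = 0;
    queue_init(&sampled);
    queue_init(&quantised);
    pthread_create(&t1, NULL, sample_stage, &n_samples);
    pthread_create(&t2, NULL, quantise_stage, NULL);
    for (;;) {
        int v = queue_pop(&quantised);
        if (v == END_OF_STREAM)
            break;
        total += v;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    queue_destroy(&sampled);
    queue_destroy(&quantised);
    return total;
}
```

The stages are serially dependent (each sample passes through them in order) but once the queues fill, all stages can work on different samples at the same time, which is the property the slide relies on.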
10
Applications (cont)
Other examples of splitting across threads:
Image Processing
Single frame divided into multiple regions
Each region handled by a different thread
Spawn a new thread per frame
For example, rapid shot mode on a camera
[Diagrams: applications divided into multiple threads]
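As a rough sketch of the "single frame divided into multiple regions" approach, the C code below splits a pixel buffer into equal slices and processes each slice on its own POSIX thread. The names `invert_frame`, `region_t` and `NUM_REGIONS`, and the choice of pixel inversion as the per-region work, are illustrative assumptions rather than anything taken from the slides.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGIONS 4   /* hypothetical: one region per CPU of a quad-core cluster */

/* One slice of the frame: pixels[begin] up to, but not including, pixels[end]. */
typedef struct {
    uint8_t *pixels;
    size_t begin, end;
} region_t;

/* Per-thread work: invert every pixel in the region. */
static void *invert_region(void *arg) {
    region_t *r = arg;
    for (size_t i = r->begin; i < r->end; i++)
        r->pixels[i] = (uint8_t)(255 - r->pixels[i]);
    return NULL;
}

/* Inverts a greyscale frame using one thread per region.
 * Returns 0 on success, -1 if a thread could not be created
 * (regions already started are still completed). */
int invert_frame(uint8_t *pixels, size_t n_pixels) {
    assert(pixels != NULL);
    pthread_t threads[NUM_REGIONS];
    region_t regions[NUM_REGIONS];
    int started = 0, status = 0;

    for (int t = 0; t < NUM_REGIONS; t++) {
        regions[t].pixels = pixels;
        regions[t].begin = n_pixels * (size_t)t / NUM_REGIONS;
        regions[t].end = n_pixels * (size_t)(t + 1) / NUM_REGIONS;
        if (pthread_create(&threads[t], NULL, invert_region, &regions[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);
    return status;
}
```

Because the regions do not overlap, the threads need no locking while they work; the only synchronization point is the join at the end of the frame.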
11
Barriers and Synchronization
The tasks being run on different cores will require synchronization
CPU 0
Load application into memory
Signal CPU 1 that new application can be run
CPU 1
Wait for signal from CPU 0
Run application
(CPU 0 and CPU 1 are in the same MPCore cluster)
Synchronization code will normally include manual barriers
Data Synchronization Barrier (DSB) or Data Memory Barrier (DMB)
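In portable C the same signal-and-wait pattern can be sketched with C11 atomics instead of hand-written barriers. The example below is an illustration only: `payload`, `ready`, `loader`, `runner` and `signal_and_run` are hypothetical names, and the threads stand in for CPU 0 and CPU 1 (the OS decides where they actually run). On ARMv7 a compiler will typically implement the release store and acquire load using DMB instructions, so the barrier is still there, just generated for you.

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* stands in for the application loaded into memory */
static atomic_int ready;   /* flag: 0 = not yet loaded, 1 = ready to run */

/* "CPU 0": prepare the data, then signal that it can be used. */
static void *loader(void *arg) {
    (void)arg;
    payload = 42;
    /* Release store: the write to payload becomes visible before the flag. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* "CPU 1": wait for the signal, then use the data. */
static void *runner(void *arg) {
    int *seen = arg;
    /* Acquire load: once the flag reads 1, payload is guaranteed to be 42. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;   /* spin until signalled */
    *seen = payload;
    return NULL;
}

/* Runs both sides once and returns the value the waiting side observed. */
int signal_and_run(void) {
    pthread_t t0, t1;
    int seen = 0;
    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&t1, NULL, runner, &seen);
    pthread_create(&t0, NULL, loader, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return seen;
}
```

Without the release/acquire pairing (or an explicit barrier), the waiting side could observe the flag set while still reading a stale value of `payload`.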
12
Barriers in Action
P0
STR r11, [r1]     ; Save instruction to program memory
DCCMVAU r1        ; clean D-$ so instruction visible to I-$
DSB               ; ensure clean completes on all CPUs
ICIMVAU r1        ; discard stale data from I-$ …
BPIMVA r1         ; … and from Branch Predictor
DSB               ; ensure I-$/BP invalidates complete for all CPUs
STR r0, [r2]      ; set flag == 1 to signal completion
ISB               ; synchronize context on this processor
MOV pc, r1        ; branch to new code
P1-Pn
WAIT ([r2] == 1)  ; wait for flag signaling completion
                  ; no barrier required here
ISB
MOV pc, r1        ; execute newly saved instruction
13
Single Threaded Tasks and SMP
Existing software may not be optimized for a multicore environment
Single threaded applications
The SMP OS can schedule different applications to different CPUs
Complexity of multicore environment hidden from the application by the OS
[Diagram: single-threaded applications (browser, video player, e-mail client) distributed by the SMP OS scheduler across CPU 0, CPU 1 and CPU<n>]
14
Case Study
Single threaded browser saw a 1.54x performance improvement when run on a dual-core system
No code changes required in the browser
Improvement comes from the OS's ability to schedule the non-browser tasks to the other CPU
Dual core Cortex-A9 MPCore, running Android ‘Froyo’, 2.6.32 kernel, BBench2010_server
[Chart: Browser Performance, 1 Core versus 2 Cores, showing the 1.54x improvement]
[Chart: Available Compute Profile, 1 Core versus 2 Core, with Off and Idle states shown for Core 2]
15
Case Study (cont.)
Running on a single core, the browser performance fell (0.78x) when also listening to streaming audio
But browser performance increased when run on a dual-core
Could also choose to maintain performance (1.0x), and lower frequency
[Chart: Browser Performance with Streaming Web Radio Application, relative to 1 Core (Browser only): 1 Core (Browser & Web Radio) 0.78x; 2 Cores (Browser and Web Radio) 1.5x; 2 Cores (Browser only) 1.54x]
17
Introducing TrustZone® Technology
What about multicore?
[Diagram: Normal and Secure worlds, each split into User and Privileged levels. Components shown: Application, Vendor Specific Library, Operating System, TrustZone Driver, Trusted Service(s), TEE and Secure Monitor]
18
TrustZone in a Multicore System
Architecture allows for a full SMP OS in the Secure world
Design aim for the TEE is simplicity, which aids certification
SMP support is normally not needed, and represents unnecessary complexity
TEE executes on one processor only
[Diagram: four CPUs. In the Normal world an SMP OS and its applications span all CPUs; in the Secure world the TEE runs on a single CPU]
19
Dedicated CPU
Alternative model is to dedicate one CPU to TrustZone
Only justified if making heavy use of trusted services
[Diagram: four CPUs, with the TEE on one CPU and the SMP OS and application(s) on the others]
20
Large Physical Address
The Cortex-A15 processor introduces support for the Large Physical Address Extension (LPAE)
Extension to ARMv7-A VMSA
Provides 32-bit virtual address mapped to a greater than 32-bit physical address
Each application is limited to a 4GB virtual address space
But OS has potentially more than 4GB of memory to work with
Becomes more important as the number of current processes increases
And the amount of memory consumed by those processes increases
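The address-space arithmetic behind this slide can be checked with a few lines of C. The slide only says "greater than 32-bit"; the 40-bit figure used in the note after the code is the physical address width that LPAE defines for ARMv7-A, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes addressable with the given number of address bits. */
uint64_t addressable_bytes(unsigned bits) {
    assert(bits < 64);              /* keep the shift well defined */
    return (uint64_t)1 << bits;
}

/* How many complete 32-bit (4GB) virtual address spaces would fit in a
 * physical address space of the given width. */
uint64_t full_address_spaces(unsigned phys_bits) {
    assert(phys_bits >= 32);
    return addressable_bytes(phys_bits) / addressable_bytes(32);
}
```

With 32 address bits the result is 4,294,967,296 bytes (4GB), which is the per-application limit on the slide. With 40 physical address bits the OS can address 1,099,511,627,776 bytes (1TB), enough to back 256 fully populated 4GB address spaces.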
21
Multiple Clusters
The Cortex-A15 MPCore processor, together with AMBA® 4 ACE™, supports multiple coherent clusters
[Diagram: two clusters of Cortex-A15 CPUs, each with coherency logic in its L2 cache, joined by a coherent interconnect]
SMP OS can be extended across multiple clusters of CPUs
Expands the number of CPUs available to scheduler
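From an application's point of view the cluster boundaries are not visible; the scheduler simply has more CPUs to use. A minimal sketch of how a program might size its thread pool accordingly is shown below. It assumes a Linux-like system where `sysconf(_SC_NPROCESSORS_ONLN)` is available (a widely supported extension rather than a strict POSIX requirement), and `online_cpus` is a hypothetical helper name.

```c
#include <unistd.h>

/* Returns the number of CPUs currently online, across all clusters,
 * or 1 if the system cannot report it. */
long online_cpus(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

A program that creates one worker thread per online CPU scales from a single cluster to multiple clusters without code changes.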