Multicore 101: Migrating Embedded Apps to Multicore with Linux


Upload: brad-dixon

Post on 14-Dec-2014


DESCRIPTION

Joint presentation with Ian Forsyth of Freescale Semiconductor (2008)

TRANSCRIPT

Page 1: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Multicore 101: Migrating Embedded Applications to a Multicore Environment with Linux

Presented by MontaVista Software and Freescale Semiconductor

Ian Forsyth Senior Enablement Architect

Freescale Semiconductor

Brad Dixon Director of Product Management

MontaVista Software

Attend Vision for more in-depth multicore sessions www.mvista.com/Vision

Page 2: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Agenda

►The Challenge in Migrating Applications
• The “Net Effect”
• Changing networking topology
• The multicore challenge
►Proposed Multicore Solutions
• Combined hardware/software
• Virtualization and hypervisor
►The Pathway to Migrating Your Applications
• Contain – Exploit – Analyze – Optimize
• Use the right tools
►Learn more and evaluate multicore solutions
• Evaluate MontaVista TestDrive: Freescale + MontaVista Linux

Multicore 101

Page 3: Multicore 101: Migrating Embedded Apps to Multicore with Linux

The “Net Effect”

[Diagram labels: Network Admission Control; Service Provider Routers; Storage Networks; Unified Threat Management; IMS Controller; Integrated Services Routers; Access Point Aggregation; Serving Node Router (GSN); Metro Carrier Edge Router; Access Gateway; TelePresence; Wireless; IP Services; Enterprise; Converged Networking; SSL, IPSec, Firewall]

Networking trends drive the need for more performance

Page 4: Multicore 101: Migrating Embedded Apps to Multicore with Linux

The Changing Networking Topology

►Layer 4-7 (Application) processing in the network is now common

►Increasing integration in datacom deployments

►Both are driving higher computational capabilities from hardware vendors

Page 5: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Why Multicore in Embedded Networks?

►Demand for differentiating features

►Advanced services are implemented in software running on general-purpose CPUs

►Frequency scaling of CPU cores is no longer viable, primarily due to power

►Multicore processors are viewed as the most viable approach

[Chart labels: Power; Performance Requirement; 1xCPU; nxCPU; Device Hot-spot Power Limit]

Page 6: Multicore 101: Migrating Embedded Apps to Multicore with Linux

The Multicore Challenge – It’s All About the Software

► Multicore silicon devices have raced ahead of the embedded software market’s ability to support them

► Millions of lines of single-threaded legacy code will need to be rewritten in a parallel fashion in order to utilize multicore devices

► This creates a paradigm shift in how developers must think about and implement future programs

► No automated or “quick-fix” approaches exist for this software migration and paradigm shift – significant programmer effort is required

► Tools and support – simulators, compilers, OS, virtualization packages, performance profilers, debuggers, example applications and training will all be key to the widespread adoption of multicore solutions

[Diagram: Single-threaded Legacy Software contrasted with Multicore Software across several Power Architecture™ cores, each with D-Cache, I-Cache and L2 Cache]

Page 7: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Multicore Tools and Solutions

Software Pyramid

• Market-specific multicore stacks, apps, libraries. Support green field.
• Support for standard and OS-dependent programming models, often leveraging multiprocessor.
• Base multicore infrastructure: Operating System, boot standards.
• First-rate tools: debuggers, performance and trace analyzers, simulators, compilers.

[Diagram labels: SMP/AMP OS’s; Advance Debug; Libraries; Early Code Partitioning Hardware & Software Hypervisor; Stacks; N/W Accel]

Page 8: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Simulation to Hardware: Same Software

[Diagram: the same software stack (Applications, Optimized High-Speed Drivers, Hypervisor) shown on both the Virtualized Development Environment and the QorIQ™ Solution Platforms. Other labels: Freescale QorIQ™ Silicon; Performance Model; Functional Model; API; IDE (compiler / debugger / build tools); Simics; Freescale-supplied]

Page 9: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Hybrid Functional/Performance Simulator

[Diagram labels: API; Functional Model; Performance Model; CPU; RAM; ROM; I/O; Bus; Ethernet; Hardware Acceleration]

[Timeline labels, along Simulated Time: Functional Mode Simulation - High Speed; Periodic Checkpoints; Performance Mode Simulation]

Page 10: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Virtualization for Reduced Cycle Time

Freescale with Virtutech and MontaVista provide a multicore development platform that accelerates software development before and after silicon availability

A Hybrid Model: Functional, Performance

Single Simulation Environment: Core, SOC, Boards, Systems
• e200, e300, e500, e600, …
• MPC8360/MPC8641D, MPC8548/MPC8572, Multicore Platform/ …
• Products and Systems

• Provides programmer's view of the SoC
• Deterministic
• Non-invasive
• Control of time
• Systematic control of validation and error
• Control of cores
• Control of configuration
• Force and detect race conditions
• Optimized solutions

Page 11: Multicore 101: Migrating Embedded Apps to Multicore with Linux

MPC8641/40D Dual Core Block Diagram

► Dual e600 PowerPC cores @ 1.25/1.0 GHz
• 1MB L2 Cache w/ECC per core
• 36-bit physical addressing
► System Unit
• 64b DDR/DDR2 w/ECC
• 4x 10/100/1000 Ethernet Controllers
► High-speed Interfaces
• 1x/4x SRIO (2.5GB/s) and x1/x2/x4/x8 PCI-Express (4GB/s) OR two x1/x2/x4/x8 PCI-Express (8GB/s)
► Pin and Software compatible to MC8641D
► Max Power (Watts)
• 31.0 W @ 1.25 GHz
• 21.0 W @ 1.00 GHz
► Production Availability
• 0 to 105C – Now
• -40 to 105C – Q408
► MontaVista commercial support
• Professional Edition 5.0
• Carrier Grade Edition 5.0

Page 12: Multicore 101: Migrating Embedded Apps to Multicore with Linux

QorIQ™ P4080 Multicore

It’s a smarter approach to multicore. Freescale’s Multicore Platform

Features
• Eight e500mc cores
• CoreNet™ scales to 32 cores
• PCI Express® 2.0, 10GbE
• PME 2.0, SEC 4.0
• Data path acceleration
• Trust/secure boot
• Hypervisor
• Standardized debug
• Virtualization with real applications

• High-performance SoC
• Advanced technology
• Tier one partnerships
• Outstanding ecosystem
• MontaVista Linux support

Innovative Multicore Micro-architecture for unprecedented computing efficiency, performance and scalability.
• On-chip coherency fabric
• Back-side cache per CPU core
• On-demand application acceleration

Multicore Simulation Environment for accurate, fast code development and debugging.
• Fully tap the capabilities of the multicore platform
• Debug software not hardware
• Dynamic, real-time debug with non-intrusive capture

45-nm Process Technology for industry-leading power-to-performance solution.
• Provides highest instructions-per-cycle (IPC) and frequency for given Milliwatt/area

Page 13: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Datapath Acceleration Architecture

QorIQ™ P4 Platform DPAA

Datapath Acceleration Architecture simultaneously enables a lower complexity software environment as well as very high networking performance

[Diagram labels: Cores; Accelerators; Network Interfaces; FMan; QMan; BMan; Congestion Mgmt; Parse; Classify; Steer; Policing; Stash Context Enqueue; Manage Work Q]

Page 14: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Multicore Operating Systems

► Wide variation of customer use-cases
• Multiple operating systems utilized across cores on a single device
• Proprietary, 3rd party and Open Source multicore operating systems
• Symmetric Multi-Processing (SMP) and Asymmetric Multi-Processing (AMP), often running concurrently
• Often no OS, or engineered light OS, used on forwarding/data plane cores

► Leverage Power Architecture™ technology’s 3rd party OS ecosystem
• Freescale embedded Hypervisor
• Freescale boot standards, including u-boot
• Leverage open boot protocol and API standards (e.g. Power.org™)
• Freescale Light Weight Executive (LWE) for run to completion data plane processing
• Demonstrate performance and provide reference example for customers

[Diagram labels: eight Power Architecture™ Cores; Forwarding/Data Plane; Control Plane; Services; Light Weight Executive; MontaVista Linux® (three instances); SMP; AMP; AMP]

Page 15: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Light Weight Executive Summary

►The LWE provides a set of services and abstractions to an application

►Focus is on run-to-completion model

►Freescale provides example applications to demonstrate the use of the LWE

►The LWE helps Freescale customers and partners develop functionality using cores as highly optimized accelerators

[Diagram labels: Light Weight Executive; Application; Software on other Cores – e.g. running Linux®; interaction]

Page 16: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Hypervisor Contrasts

Freescale Hypervisor Implementation
• Requirement: isolation, performance
• Implications: No more than one OS per core, OS has direct control of high-speed peripherals

Traditional Hypervisor Implementation
• Requirement: solves problem of under-utilized CPUs, plus isolation
• Implications: more than one OS per core, complexity, performance implications

QorIQ™ P4080 hypervisor hardware assists in meeting both requirement sets

[Diagram labels: CPU (x3); Guest OS (x4)]

Page 17: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Natural Virtualization via QorIQ™ P4080 Datapath

[Diagram labels: Network Interface; P4080 Datapath; portal, portal; two Power Architecture™ Cores]

Cores can access the same network interface with no SW synchronization because cores have their own portals

►Datapath decouples cores and peripherals – allows N cores to share M peripherals

►Accessed by “Portals” that are per-core

►Allows direct and efficient access by cores to many high-speed peripherals

Page 18: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Solution

Solution = Freescale software + ecosystem software + customer software

[Diagram labels, two stacks: Freescale QorIQ™ Silicon; Hypervisor; Drivers; Stacks; IPC; High Level IPC; Applications; LWE; Example Apps; MontaVista Linux; Partition Mgmt. Legend: Freescale; 3rd Party and/or Customer]

Page 19: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Market Analysis

Source: Embedded Systems Design Survey

“Developers overwhelmingly voted for the chip's software-development tools as the most important thing when evaluating a new embedded processor.”

“The most valuable feature of a chip isn't even the chip itself. Compilers and debuggers trump MIPS and megahertz.”

- Jim Turley, ESD

Page 20: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Migrating to Multicore: What is the pathway?

►Contain
►Exploit
►Analyze
►Optimize

Page 21: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Containment

Goal: Migrate application codebase to multicore platform without disruption

►Risk – concurrent execution will expose latent race conditions and synchronization issues

►Technique – utilize Linux's processor and interrupt affinity APIs to contain your application's threads and processes to a single core
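As a rough illustration of the containment technique, the sketch below pins a process to one core using the Linux scheduler affinity call exposed by Python's standard library (`os.sched_getaffinity` and `os.sched_setaffinity`, available on Linux only). The helper name `contain_to_core` is hypothetical and not part of the presentation; a production system would more likely make the equivalent call from C or use a command-line tool such as `taskset`.

```python
import os


def contain_to_core(core, pid=0):
    """Restrict `pid` (0 means the caller) to a single CPU core.

    Hypothetical helper for illustration. Returns the affinity set
    that is in effect after the change.
    """
    allowed = os.sched_getaffinity(pid)
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    # After this call the scheduler will only run the task on `core`.
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)
```

Calling this early in startup, before worker threads are created, means the threads created afterwards inherit the same single-core mask, so the application keeps behaving as it did on a uniprocessor.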


Page 22: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Containment

[Diagram labels: Your App; Housekeeping Utilities]

Page 23: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Containment

[Diagram labels, shown twice: Your App; Housekeeping Utilities]

Page 24: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Containment

Benefits:
►Delay exposing latent concurrency defects
►Easily gain an efficiency boost by exploiting available cores
►I/D/L2 cache efficiency by minimizing scheduler bounces

[Diagram labels, shown twice: Your App; Housekeeping Utilities]

Page 25: Multicore 101: Migrating Embedded Apps to Multicore with Linux

The designer can explicitly control which CPUs are permitted to handle particular threads and interrupts

Migration with Containment

Shown on Freescale 8641D multicore processor
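Interrupt affinity on Linux is normally set by writing a hexadecimal CPU bitmask to `/proc/irq/<N>/smp_affinity`. The sketch below, which is illustrative only, builds that mask from a list of CPU numbers. The function names and the `proc_root` parameter are assumptions made for the example, and the code deliberately handles only CPUs 0 to 31 (larger systems use comma-separated 32-bit groups, which are not covered here).

```python
def cpu_mask(cpus):
    """Return the hex bitmask string for an iterable of CPU numbers (0..31)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0..31")
        mask |= 1 << cpu  # bit N set means CPU N may service the interrupt
    return format(mask, "x")


def set_irq_affinity(irq, cpus, proc_root="/proc"):
    """Write the mask for `cpus` to <proc_root>/irq/<irq>/smp_affinity.

    Requires root on a real system. `proc_root` is a hypothetical
    parameter so the function can be pointed at a scratch directory.
    """
    path = f"{proc_root}/irq/{irq}/smp_affinity"
    with open(path, "w") as handle:
        handle.write(cpu_mask(cpus))
    return path
```

On a dual-core part such as the 8641D, `cpu_mask([1])` gives `2`, which would steer an interrupt to the second core and leave the first core to the contained application.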


Page 26: Multicore 101: Migrating Embedded Apps to Multicore with Linux

A Quick Sidebar…

►Why SMP?
►Linux's long march to multicore
►On virtualization

Page 27: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Why SMP?

►Multicore CPUs can permit a number of processing scenarios

►SMP maximizes run-time flexibility to match CPU to the needs of the moment

►SMP ends up playing a role in many system architectures

►Combined with a hypervisor, SMP does not exclude any other design options

Page 28: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Linux’s Long March to Multicore

►Linux has been MC ready for years
►Kernel, drivers, protocol stacks, and apps are ready
►As core count scales the focus shifts to exploiting MC at the application layer

Page 29: Multicore 101: Migrating Embedded Apps to Multicore with Linux

On Virtualization…

►Difficulties applying virtualization to telecom/datacom
• The isolation vs. latency trade-off
• Hardware contention
• I/O devices
►Hardware support minimizes virtualization overhead

Page 30: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Sidebar Summary

►SMP is the natural way for Linux to exploit multicore processors.
►Hypervisors can permit new flexibilities
►New hardware features are making hypervisor-based architectures more efficient to use

Page 31: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Migrating to Multicore: What is the Pathway?

►Contain
• Migrate to multicore but contain code to a single core
►Exploit
►Analyze
►Optimize

Page 32: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Goal: Identify code that will benefit from multicore execution and modify code to exploit available cores

Exploit


Page 33: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Objective: scale efficiently across multiple cores so that more client work can be handled rapidly

Key question is how to map client requests (or packets) to workers quickly and obtain speed-up from multicore

Application Architectures to Exploit MC


Page 34: Multicore 101: Migrating Embedded Apps to Multicore with Linux

►Each request requires a small amount of work

►Requests are largely independent of each other

►Requires read-only access to a moderate amount of state

►Small amount of state may travel with the request

►Must be able to manage overload effectively

Application Characteristics


Page 35: Multicore 101: Migrating Embedded Apps to Multicore with Linux


Application Characteristics

►Some anti-patterns
• Non-concurrent
• Process/Thread per client
• Spawn process/thread per request
• HPC message passing such as MPI

Page 36: Multicore 101: Migrating Embedded Apps to Multicore with Linux


For telecom/datacom applications an event driven architecture is ideal to facilitate multicore migration

Application Characteristics


Page 37: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Sample Application Architecture

►Similar to that used by memcached & Apache
►Dispatcher can handle overload, monitoring, etc.
►Multicore awareness only for central services
►Pluggable Dispatcher is feasible if planned correctly
►Managing global, per service, per session, and per request state is the battleground for scalability
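The dispatcher-and-workers arrangement described on this slide can be sketched as a fixed worker pool fed by a bounded queue. The code below is a minimal illustration and not the architecture shown in the presentation. The class name `Dispatcher`, its parameters, and the load-shedding policy (refuse new work when the queue is full) are all assumptions chosen for the example. In CPython, threads illustrate the structure rather than deliver CPU-bound speed-up.

```python
import queue
import threading


class Dispatcher:
    """Hands independent requests to a fixed pool of workers via a bounded queue."""

    def __init__(self, handler, workers=4, max_pending=64):
        self._handler = handler
        self._pending = queue.Queue(maxsize=max_pending)  # bound = overload limit
        self._results = queue.Queue()
        self._threads = [
            threading.Thread(target=self._work, daemon=True) for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def submit(self, request):
        """Queue a request; return False (shed load) when the queue is full."""
        try:
            self._pending.put_nowait(request)
            return True
        except queue.Full:
            return False

    def _work(self):
        while True:
            request = self._pending.get()
            if request is None:  # sentinel: stop this worker
                self._pending.task_done()
                return
            try:
                outcome = self._handler(request)
            except Exception as exc:  # keep the worker alive on handler errors
                outcome = exc
            self._results.put((request, outcome))
            self._pending.task_done()

    def close(self):
        """Wait for queued work, stop the workers, and return all results."""
        self._pending.join()
        for _ in self._threads:
            self._pending.put(None)
        for thread in self._threads:
            thread.join()
        results = []
        while not self._results.empty():
            results.append(self._results.get())
        return results
```

Only the dispatcher knows how many workers exist, which mirrors the point that multicore awareness can be confined to central services. The handlers themselves stay single-threaded and work on per-request state.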


Page 38: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Migrating to Multicore: What is the Pathway?

►Contain
• Migrate to multicore but contain code to a single core
►Exploit
• Use an event-driven architecture to add explicit functional parallelism
►Analyze
►Optimize

Page 39: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Goal: Understand MC performance bottlenecks and diagnose unexpected faults

Benchmark first... the bottlenecks may not be where you think they are

Analyze


Page 40: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Analysis Tools

Profiling
• Can be used for far more than CPU cycles per function or line
• e500mc core has a rich set of performance attributes it can monitor
• MontaVista DevRocket can use oprofile to collect and correlate this data to your code

Runtime Monitoring
• “top” in SMP mode will give you a broad overview of CPU stats

Tracing
• Fine grained CPU-aware tracing

Page 41: Multicore 101: Migrating Embedded Apps to Multicore with Linux

MontaVista DevRocket Analysis Tools


Page 42: Multicore 101: Migrating Embedded Apps to Multicore with Linux

MontaVista DevRocket Analysis Tools


Page 43: Multicore 101: Migrating Embedded Apps to Multicore with Linux

MontaVista DevRocket Analysis Tools


Page 44: Multicore 101: Migrating Embedded Apps to Multicore with Linux

CGE5 Only: Microstate Accounting

Per process & thread information
• Time in nanoseconds
• Time consumed since process start.

See: /proc/<PID>/tasks/<TID>/msa for per-thread information

# cat /proc/1845/msa
State: Interruptible
Now: 2287392468035
ONCPU_USER 1473381312
ONCPU_SYS 3110032766
INTERRUPTIBLE 1183737626438
UNINTERRUPTIBLE 1011435
INTERRUPTED 546291
ACTIVEQUEUE 2217218048
EXPIREDQUEUE 0
STOPPED 0
ZOMBIE 0
SLP_POLL 0
SLP_PAGING 0
SLP_FUTEX 0
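Because the microstate output is plain "NAME value" text, it is straightforward to post-process. The following sketch parses the sample shown on this slide and totals two groups of counters. The function names are hypothetical, and reading ACTIVEQUEUE plus EXPIREDQUEUE as "time spent waiting on a run queue" is an assumption made for the example, not something the slide states.

```python
SAMPLE = """\
State: Interruptible
Now: 2287392468035
ONCPU_USER 1473381312
ONCPU_SYS 3110032766
INTERRUPTIBLE 1183737626438
UNINTERRUPTIBLE 1011435
INTERRUPTED 546291
ACTIVEQUEUE 2217218048
EXPIREDQUEUE 0
STOPPED 0
ZOMBIE 0
SLP_POLL 0
SLP_PAGING 0
SLP_FUTEX 0
"""


def parse_msa(text):
    """Return {counter_name: nanoseconds} for the 'NAME value' counter lines.

    Header lines such as 'State:' and 'Now:' (names ending in a colon) are skipped.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit() and not parts[0].endswith(":"):
            counters[parts[0]] = int(parts[1])
    return counters


def summarize(counters):
    """Return (on_cpu_ns, queue_wait_ns) from parsed counters.

    Treating the two *QUEUE counters as run-queue wait is an assumption.
    """
    on_cpu = counters["ONCPU_USER"] + counters["ONCPU_SYS"]
    queue_wait = counters["ACTIVEQUEUE"] + counters["EXPIREDQUEUE"]
    return on_cpu, queue_wait
```

Comparing the two totals before and after an affinity change is one way to check whether a thread is spending less time waiting for a core.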


Page 45: Multicore 101: Migrating Embedded Apps to Multicore with Linux

DevRocket IDE

Debug “Multi-Anything”

Debug process, thread, and kernel context

Page 46: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Migrating to Multicore: What is the Pathway?

►Contain
• Migrate to multicore but contain code to a single core
►Exploit
• Use an event-driven architecture to add explicit functional parallelism
►Analyze
• Use available profiling, tracing, and performance monitoring tools and APIs
►Optimize

Page 47: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Goal: Get the most from the available MC performance

►Focus attention on areas where Amdahl's law indicates the most benefit can occur!

►Leverage data parallelization for CPU bound computations

►Utilize interrupt and process/thread affinity to tune the system

Optimize
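Amdahl's law gives the upper bound on speed-up as 1 / ((1 - p) + p / n), where p is the fraction of run time that can be parallelized and n is the number of cores. The short sketch below evaluates that bound. The function name and the example fractions are illustrative choices, not figures from the presentation.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speed-up when `parallel_fraction` of the work scales across `cores`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

For instance, code that is 90% parallelizable tops out at roughly 4.7x on eight cores, so profiling effort is usually best spent on shrinking the remaining serial portion.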


Page 48: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Migrating to Multicore: What is the Pathway?

►Contain
• Migrate to multicore but contain code to a single core
►Exploit
• Use an event-driven architecture to add explicit functional parallelism
►Analyze
• Use available profiling, tracing, and performance monitoring tools and APIs
►Optimize
• Specialize cores as needed. Explore other MC optimizations

Page 49: Multicore 101: Migrating Embedded Apps to Multicore with Linux

MontaVista Support for Freescale Multicore

MontaVista offers comprehensive support of Freescale Power Architecture processors today

►Carrier Grade Edition 4.0
• 8572
• 8641D, 8640D
►Carrier Grade Edition 5.0
• 8641D, 8640D
►Professional Edition 4.0
• 8641D, 8640D
►Professional Edition 5.0
• 8572
• 8641D, 8640D

Freescale P4080 operating today on the Virtutech Simics simulator in advance of hardware availability

Page 50: Multicore 101: Migrating Embedded Apps to Multicore with Linux

Two Ways to Learn More About Multicore

MontaVista TestDrive
Evaluate Freescale multicore and MontaVista Linux for free, visit: www.mvista.com/freescale/eval

MontaVista Vision
October 1-3, 2008, San Francisco, CA. Where embedded Linux gets real.
For more information on in-depth multicore sessions, visit: www.mvista.com/vision