
“Embedded Operating Systems: The State of the Art”

Presented by: Jeff Schaffer, Sr. Field Applications Engineer, QNX Software Systems
jpschaffer@qnx.com, 818-227-5105

QNX is a leading provider of real-time operating system (RTOS) software, development tools, and services for mission-critical embedded applications.

Role of the Embedded OS

Traditional

– Permit sharing of common resources of the computer (disks, printers, CPU)

– Provide low-level control of I/O devices that may be complex, time-dependent, and non-portable

– Provide device-independent abstractions (e.g. files, filenames, directories)

Additional Roles

– Prevent common causes of system failure and instability; minimize impact when they occur

– Extend system life cycles

– Isolate problems during development and at runtime

Architecture Comparison

REAL TIME EXECUTIVE
– Advantage: single address space
– Disadvantage: single address space, different binary images
– Failure: means reboot

MONOLITHIC KERNEL
– Advantage: applications run in their own memory space
– Disadvantage: kernel not protected, kernel testing
– Failure: might mean reboot

TRUE MICROKERNEL
– Advantages: modules run in their own memory space; add/replace services on the fly; reusable modules; direct hardware access
– Disadvantage: context switching
– Failure: usually does not mean reboot

MicroKernel – Neutrino

(Diagram: the Neutrino microkernel, available for x86, PPC, MIPS, SH4, ARM, StrongARM, and XScale, with the Process Manager and separate processes for an application, the Photon GUI, a flash filesystem, an audio driver, TCP/IP, a serial driver, an HTTP server, and Java.)

• Dynamic architecture makes hot-start and upgrades easy, even with drivers
• Philosophy: a trusted kernel running a system of untrusted software components
• Processes provide a reusable component model with well-defined message interfaces
• Processes communicate via messages or other methods, such as shared memory; this permits loose coupling between modules
• No requirement for filesystem, GUI, etc.

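Typical Forms of IPC

(Diagram: Process 1 and Process 2 communicating through pipes; through shared memory, where one shared memory object is mapped into each process's address map; through message queues holding queued messages (msg 2 to msg 5); and through mailboxes. The kernel is also shown.)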
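As a rough illustration of two of these forms, the Python sketch below (not QNX code) pushes bytes through a pipe and maps one backing object twice, so that a write through one mapping is visible through the other, which is what two processes see when they map the same shared memory object. To stay self-contained it runs in a single process and uses a temporary file as the backing object; it assumes a host where Python's `mmap` creates shared mappings by default, as it does on POSIX systems. The function names are hypothetical.

```python
import mmap
import os
import tempfile


def pipe_roundtrip(message: bytes) -> bytes:
    """Write bytes into one end of a pipe and read them back from the other.
    Assumes the message fits in the pipe buffer, since this single-threaded
    demo only reads after it has finished writing."""
    read_fd, write_fd = os.pipe()
    try:
        try:
            os.write(write_fd, message)
        finally:
            os.close(write_fd)  # closing the write end lets the reader see EOF
        chunks = []
        while chunk := os.read(read_fd, 4096):
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(read_fd)


def shared_mapping_demo(payload: bytes) -> bytes:
    """Map one backing object twice (standing in for two process address maps)
    and show that a write through one mapping is seen through the other.
    The payload must fit in one page."""
    size = mmap.PAGESIZE
    with tempfile.TemporaryFile() as backing:
        backing.truncate(size)                      # give the object a size to map
        view_a = mmap.mmap(backing.fileno(), size)  # shared, read/write mapping
        view_b = mmap.mmap(backing.fileno(), size)
        try:
            view_a[: len(payload)] = payload
            return bytes(view_b[: len(payload)])
        finally:
            view_a.close()
            view_b.close()
```

A pipe copies data through the kernel on every transfer, while the shared mapping lets both sides touch the same memory directly; the trade-off is that shared memory needs separate synchronization.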

Which Architecture for me?

• Depends on your application and processor!
• Simple applications (such as single control loops) generally need only a real-time executive
• As a system becomes more complex, it typically needs a more complex operating system architecture
• Need to look at factors such as scalability and reliability
• Do standards matter?

APIs

• Two most common standards
• Advantages of standards:
– Portability of code
– Hiring of programmers

Do I Need Real-Time?

What is real time?
• Less than 1 second response?
• Less than 1 millisecond response?
• Less than 1 microsecond response?
• Maybe ...

Real-Time

"A real-time system is one in which the correctness of the computations not only depends upon the logical correctness of the computation but also upon the time at which the result is produced. If the timing constraints of the system are not met, system failure is said to have occurred."

Donald Gillies (comp.realtime FAQ)

A Simple Example...

“it doesn’t do you any good if the signal that cuts fuel to the jet engine arrives a millisecond after the engine has exploded”

Bill O. Gallmeister, POSIX.4 Programming for the Real World

“Hard” vs. “Soft” Real Time

(Slide image: ATM)

Hard
– Absolute deadlines
– Late responses cannot be tolerated and may have a catastrophic effect on the system
– Example: flight control

Soft
– Systems that have reduced constraints on “lateness”; e.g., late responses may still have some value
– Still must operate very quickly and repeatably
– Example: cardiac pacemaker

Real-time OS Requirements

Operating system factors that permit real-time:
– Thread scheduling
– Control of priority inversion
– Time spent in kernel
– Interrupt processing

Factor #1: Scheduling

Non-real-time scheduling
– Round-robin
– FIFO
– Adaptive

Real-time scheduling
– Priority-based
– Sporadic

Factor #2: Priority Inversion

Sequence:
1. Low-priority task (meteorological data gathering) acquires the bus mutex to transfer data
2. High-priority task (information bus manager) blocks until the mutex is released
3. Medium-priority task (communications) pre-empts the low-priority task
4. Watchdog timer resets the system because the bus manager has not run in some time

(Diagram: Information Bus Manager, Meteorological Data Gathering Task, and Communications Task. Source: Embedded Systems Programming)
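The sequence above can be reproduced with a toy simulation. The sketch below is a hypothetical, tick-based, single-CPU priority scheduler with one mutex; each task runs a script of one-tick operations (`L` lock, `C` compute, `U` unlock). With `inherit=True`, the mutex owner temporarily runs at the priority of the highest-priority task blocked on it, which is the priority-inheritance protocol that POSIX exposes as `PTHREAD_PRIO_INHERIT` via `pthread_mutexattr_setprotocol()`. Task names, priorities, and timings are illustrative only, and this is not how any particular RTOS implements inheritance.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int   # larger number = higher priority
    release: int    # tick at which the task becomes ready
    script: str     # one op per tick: "L" lock mutex, "C" compute, "U" unlock mutex
    pc: int = 0     # index of the next op to execute
    finish: int = -1


def simulate(specs, inherit, max_ticks=100):
    """Single-CPU, pre-emptive priority scheduler with one shared mutex.
    Returns {task name: tick at which the task finished (-1 if it did not)}."""
    tasks = [Task(*spec) for spec in specs]
    owner = None  # task currently holding the mutex

    def blocked(t):
        return t.script[t.pc] == "L" and owner is not None and owner is not t

    for tick in range(max_ticks):
        if all(t.pc == len(t.script) for t in tasks):
            break
        live = [t for t in tasks if t.release <= tick and t.pc < len(t.script)]
        effective = {t.name: t.priority for t in tasks}
        if inherit and owner is not None:
            # Priority inheritance: the owner runs at the highest priority
            # of any task blocked waiting for the mutex it holds.
            for t in live:
                if blocked(t):
                    effective[owner.name] = max(effective[owner.name], t.priority)
        runnable = [t for t in live if not blocked(t)]
        if not runnable:
            continue  # idle tick
        current = max(runnable, key=lambda t: effective[t.name])
        op = current.script[current.pc]
        if op == "L":
            owner = current
        elif op == "U":
            owner = None
        current.pc += 1
        if current.pc == len(current.script):
            current.finish = tick + 1
    return {t.name: t.finish for t in tasks}


# Pathfinder-like scenario (made-up numbers, not real timings).
PATHFINDER = [
    ("meteo", 1, 0, "LCCU"),    # low priority, takes the bus mutex first
    ("bus", 3, 1, "LCU"),       # high priority, needs the same mutex
    ("comms", 2, 2, "CCCCCC"),  # medium priority, long-running, no mutex
]
```

With these numbers the high-priority `bus` task finishes at tick 13 without inheritance, because the medium-priority task runs to completion first, and at tick 7 with inheritance.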

Factor #3: Kernel Time

• Kernel operations must be pre-emptible
– If they are not, an unknown amount of time can be spent in the kernel performing an operation on behalf of a user process
– This can cause a real-time process to miss its deadline
• All kernels have some window (or multiple windows) of time where pre-emption cannot occur
• Some operating systems attempt to provide real-time capability by adding “checkpoints” within the kernel so that it can be interrupted at these points

Example: A Kernel Call Is a Software Interrupt

(Diagram: a kernel call enters the kernel (KER) through int and leaves through iret.)

– Entry: a few opcodes, interrupts off
– Unlocked kernel operation, which may include a message pass: µsecs to msecs, pre-emptable
– Locked: µsecs, no pre-emption, interrupts on
– Unlocked: µsecs, pre-emptable
– Exit: a few opcodes, interrupts off

Split Out Long Operations

(Diagram: the microkernel (Nto) keeps the thread, sync, message, sched, signal, channel, clock/timer, and intr services; longer operations (fork, exec, pathname, spawn, mmap, waitpid, session, UID/GID, and debug) are split out into the Process Manager (Proc).)

Factor #4: Interrupts

This breaks down into the following areas:
• Method of handling the interrupt processing chain
• Handling of nested interrupts

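Interrupt Processing Chain

(Diagram: interrupts INT x and INT y each run a short interrupt service routine (ISR) that hands work off to an interrupt service thread (IST), shown for two models.)

• Model 1: ISTs are scheduled whenever the queue is emptied: non-deterministic
• Model 2: ISTs are scheduled by normal OS scheduling: deterministic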

Can I Make Any Conventional OS Real-Time?

(Diagram: a conventional OS running on top of a real-time kernel.)

Method
– Add a real-time layer below the conventional OS, running the conventional OS as a low-priority real-time process
– Add a real-time layer to the hardware service layer

Problems
– Different APIs
– Real-time layer is proprietary
– Existing OS applications are not real-time
– Poor communication between the operating systems
– Loss-of-control issue

Scalability

Scaling Solution #1: Single Board, Single Node

(Diagram: one CPU with memory and a bridge to a PCI bus with peripherals.)

The only scaling possible is a CPU replacement.

Scaling Solution #2: Single Board, Multiple Nodes

(Diagram: Node 1 and Node 2, each with its own CPU, bridge, and PCI bus with peripherals.)

Advantages
• Relatively simple to implement
• Allows “scaling on demand”
• Suitable if nodes have independent “work”

Disadvantages
• Inter-node IPC is slower than memory access
• Complexity in maintaining a global view of data
• Difficult to break up computationally intensive tasks

Scaling Solution #3: Single Board, Multiple Processors

(Diagram: CPU0 and CPU1 sharing memory and a bridge to a PCI bus with peripherals.)

• Tightly coupled symmetric multiprocessing (SMP)
• All processors have a symmetric and consistent view of physical memory and peripherals
• Scales processing power
• Needs software (RTOS) support

The SMP OS Dilemma

• SMP systems to date use desktop operating systems, which are not responsive enough for real-time requirements. Typical uses:
– Application servers
– Databases
– Web servers
• Typical real-time operating systems (home-built or commercial), such as those commonly used in routers and switches today, do not have SMP support
• SMP-capable real-time operating systems run the CPUs as independent processors with independent operating systems

SMP Support

• True (tightly coupled) SMP support
• Only the kernel needs SMP awareness
• Transparent to application software and drivers: identical binaries for uniprocessor (UP) and SMP systems
• Automatic scheduling across all CPUs

QNX “True” SMP

(Diagram: a running thread from a process on each of CPU 0 and CPU 1; priority-based ready queues from priority 63 down to 0; other threads in blocked states.)

• One STATE_RUNNING thread on each processor
• Priority-based ready queues
• Each thread can be locked to a specific CPU by using a processor affinity mask
• Scheduler remembers the last CPU a thread ran on, to:
– Minimize thread migration
– Optimize cache usage
• Highest-priority READY thread is always scheduled immediately
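A minimal sketch of the ready-queue idea, assuming 64 priorities (0 to 63) and an integer affinity mask per thread. A bitmap of non-empty priority levels lets the scheduler find the highest-priority READY thread quickly. As an assumed cache-friendly tie-breaker, among eligible threads of that priority it prefers one that last ran on the picking CPU. The class and method names are hypothetical, and this is not how the QNX scheduler is actually implemented.

```python
from collections import deque


class ReadyQueues:
    NUM_PRIORITIES = 64  # priorities 0 (lowest) to 63 (highest)

    def __init__(self):
        self.queues = [deque() for _ in range(self.NUM_PRIORITIES)]
        self.bitmap = 0  # bit p is set while queue p is non-empty

    def make_ready(self, thread, priority, runmask=-1, last_cpu=None):
        # runmask is an affinity mask: bit n set means "may run on CPU n";
        # -1 (all bits set) means "any CPU".
        self.queues[priority].append((thread, runmask, last_cpu))
        self.bitmap |= 1 << priority

    def pick(self, cpu):
        """Dequeue the highest-priority READY thread allowed on `cpu`.
        Within one priority, prefer a thread that last ran on this CPU."""
        bits = self.bitmap
        while bits:
            prio = bits.bit_length() - 1  # index of the highest set bit
            queue = self.queues[prio]
            eligible = [i for i, (_, mask, _) in enumerate(queue) if (mask >> cpu) & 1]
            if eligible:
                warm = [i for i in eligible if queue[i][2] == cpu]
                index = warm[0] if warm else eligible[0]
                thread = queue[index][0]
                del queue[index]
                if not queue:
                    self.bitmap &= ~(1 << prio)
                return thread, prio
            bits &= ~(1 << prio)  # nothing at this priority may run here
        return None
```

For example, a thread made ready with `runmask=0b10` is only ever returned by `pick(1)`; a CPU that cannot run any thread at the top priority falls through to the next non-empty level.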

Why Is Cache Important?

• Cache efficiency is probably the single largest determinant of performance on SMP
• A coherent view of physical memory is maintained using cache snooping
• Cache snooping is done at the CPU bus level, so it operates at lower speeds than the core
• Coherency is “invisible” to software

Performance Implications

• Snoop traffic is expected on SMP
• Cache hits generally cause no bus transaction
• Multiple processors writing to the same location degrade performance (ping-pong effect)
• Performance degrades when a large amount of data is modified on one processor and read on the other
• Sometimes it is better to have specific threads in a process run on the same CPU

Designing for SMP: One Big Task

(Diagram: a giant application running as a single thread.)

• Will not work with SMP: a single thread can run on only one CPU at a time

Designing for SMP: Single-Threaded Tasks

(Diagram: App 1 and App 2, each a single thread.)

• Works with SMP
• Process data can be shared with shared memory
• Good concurrency, some complexity
• IPC not usually as efficient as memory sharing

Designing for SMP: Scaling Software with Threads

(Diagram: a single server process containing multiple threads.)

• Single-copy server
• All process data is implicitly shared and accessible
• Can achieve good concurrency with less complexity
• POSIX synchronization used:
– Mutexes
– Semaphores
– Condition variables
• Usually more efficient than inter-process synchronization

Note: SMP finds concurrency problems fast!
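To illustrate the threaded single-copy server style, here is a minimal Python sketch of a bounded queue shared by the threads of one process, protected by one mutex and two condition variables. Python's `threading.Condition` wraps a lock in much the same way a POSIX condition variable is paired with a `pthread_mutex_t`. `BoundedQueue` and `run_pipeline` are hypothetical names; in production Python, the standard `queue.Queue` already provides this behavior.

```python
import threading
from collections import deque


class BoundedQueue:
    """FIFO shared by threads of one process, guarded by a single mutex
    and two condition variables built on that mutex."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.lock = threading.Lock()
        self.not_empty = threading.Condition(self.lock)
        self.not_full = threading.Condition(self.lock)

    def put(self, item):
        with self.not_full:  # acquires self.lock
            while len(self.items) >= self.capacity:
                self.not_full.wait()  # releases the lock while waiting
            self.items.append(item)
            self.not_empty.notify()

    def get(self):
        with self.not_empty:
            while not self.items:
                self.not_empty.wait()
            item = self.items.popleft()
            self.not_full.notify()
            return item


def run_pipeline(n_items, capacity=4):
    """One producer thread feeds one consumer thread through the queue."""
    queue = BoundedQueue(capacity)
    results = []

    def producer():
        for i in range(n_items):
            queue.put(i)
        queue.put(None)  # sentinel: no more work

    def consumer():
        while (item := queue.get()) is not None:
            results.append(item * item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The `while` loops around `wait()` matter: a woken thread re-checks its condition, which guards against spurious wakeups and against another thread consuming the item first. Races of exactly this kind are what SMP tends to expose quickly.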

Optimizing Compute-Intensive Applications

(Diagram: an application's main thread dispatching work to a pool of worker threads.)

• Pool of worker threads
• Dispatch “work” to worker threads
• Scales very well with SMP
• The tricky part is “breaking up” the problem
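Here is a sketch of the main-thread/worker-pool pattern using Python's `concurrent.futures.ThreadPoolExecutor`. The main thread breaks the problem (a sum of squares, chosen only for illustration) into contiguous chunks and dispatches one chunk per worker. One caveat: in CPython, the global interpreter lock prevents pure-Python threads from executing bytecode in parallel, so this shows the structure rather than an SMP speedup. Real parallel execution would need `ProcessPoolExecutor` or native threads in C. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Break [0, n) into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def partial_sum_of_squares(bounds):
    """The unit of work one worker performs on its chunk."""
    lo, hi = bounds
    return sum(x * x for x in range(lo, hi))


def parallel_sum_of_squares(n, workers=4):
    # The main thread breaks up the problem and dispatches one chunk per worker,
    # then combines the partial results.
    chunks = split_range(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))
```

Equal-sized chunks work here because every element costs the same. When work items vary in cost, many smaller chunks usually balance the load across CPUs better.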

Interrupt Handling

(Diagram: IRQs 7 to 10 routed to CPU 0 and CPU 1; the targeted CPU runs the ISR, which hands off to an IST.)

IRQ-to-CPU routing shown:
• IRQ 7 → CPU 0
• IRQ 8 → CPU 1
• IRQ 9 → CPU 1
• IRQ 10 → CPU 1

• Interrupt is processed on the CPU that was targeted
• Can distribute load by handling interrupts on different processors
• Sometimes not the optimal strategy, due to cache effects

Scaling Solution #4: Multiple Processors/Nodes

(Diagram: Node 1 and Node 2, each a board with CPU0 and CPU1, a bridge, and a PCI bus with peripherals.)

Example

(Diagram: a chassis of line cards, each connected to external networks, linked to one another by a high-speed interconnect and a low-speed bus.)

The QNET MicroNetwork

(Diagram: two microkernel-based nodes, each running components such as applications, Photon, audio, TCP/IP, flash and CD-ROM filesystems, and the Process Manager, linked through QNET over a LAN, the Internet, or a backplane.)

• Messages flow transparently through QNET from one message bus to another.
• All applications and servers become network-distributed without any special code.

QNX Qnet Manager

(Diagram: a control card and two line cards.)

• Extends message passing across multiple QNX microkernels
• Over anything with a packet driver:
– Ethernet, RapidIO, 3GIO, InfiniBand, Stargen, etc.
• Class of service
• Use symbolic prefixes to make client code independent of the location of the resource manager

QNET Class of Service

(Diagram: a control card linked to two line cards.)

One or multiple links can connect different nodes.

QNET: Load-Balanced Distribution

(Diagram: a control card linked to two line cards.)

Data is sent out the link that will deliver it the fastest, based on the link speed and queue length of each link.

QNET: Ordered Distribution

(Diagram: a control card linked to two line cards.)

Data is sent out a primary link. If it fails, data is diverted to a secondary link. The primary link is probed, and when it comes back online, data is diverted back to it.

QNET: Parallel Distribution

(Diagram: a control card linked to two line cards.)

Data is sent out both links at the same time. A failure on either link is handled gracefully.
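The three distribution policies can be sketched as link-selection functions. This is a hypothetical model, not Qnet's implementation. Each `Link` has a speed, a count of queued bytes, and an `up` flag that stands in for the result of link probing. Load-balanced picks the link with the earliest estimated delivery time, (queued bytes + message size) / speed. Ordered uses the first link in the list (the primary) while it is up and falls back to the next one. Parallel sends on every link that is up. The link names in the test are made up.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    bytes_per_sec: float
    queued_bytes: int = 0
    up: bool = True  # stand-in for the outcome of probing the link


def load_balanced(links, size):
    """Pick the up link with the earliest estimated delivery time."""
    candidates = [l for l in links if l.up]
    if not candidates:
        return []
    best = min(candidates, key=lambda l: (l.queued_bytes + size) / l.bytes_per_sec)
    return [best]


def ordered(links, size):
    """Use the primary (first) link while it is up, otherwise the next one.
    `size` is unused; it is kept so all policies share one signature."""
    for l in links:
        if l.up:
            return [l]
    return []


def parallel(links, size):
    """Send on every link that is up, so losing one link loses no data."""
    return [l for l in links if l.up]


def send(policy, links, size):
    """Queue a message of `size` bytes on the link(s) the policy chooses."""
    chosen = policy(links, size)
    for l in chosen:
        l.queued_bytes += size
    return [l.name for l in chosen]
```

Because `ordered` re-checks the `up` flag on every send, traffic returns to the primary as soon as a probe marks it up again, which matches the behavior described for ordered distribution.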

Designing for Networked SMP: Single-/Multi-Threaded Tasks

(Diagram: App 1 with multiple threads and App 2 with a single thread.)

• Different processes are necessary for different nodes
• Works with SMP
• Process data can be shared with shared memory
• IPC for networked communication

Transparent Redirection

(Diagram: a client on the client node reaches the service through a simple link to /net/a/dev/service on node A; node B provides /net/b/dev/service.)

• A simple link provides transparent redirection
• A process has to monitor the status of the link
• Switch-over is not transparent to the client

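Transparent Redirection

(Diagram: a client opens /dev/service, which is provided by a service manager on the client node; the service manager forwards requests to /net/a/dev/service on node A or /net/b/dev/service on node B.)

• Service manager acts as a proxy
• Monitors the health of and/or load on services/nodes
• Switch-over is transparent to the client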

Redundant Links

(Diagram: the same layout, with the service manager at /dev/service sending each request to both /net/a/dev/service and /net/b/dev/service.)

• Requests serviced redundantly
• First/majority/best result
• Different implementations
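The service-manager proxy from the last two slides can be sketched as follows. The backend paths mirror the slides, but `ServiceManager`, its handler callables, and the use of `ConnectionError` to signal a failed node are assumptions for illustration. `failover` implements transparent redirection by trying each node in order. `majority` implements the redundant scheme, returning a reply only when a strict majority of the nodes that answered agree on it.

```python
from collections import Counter


class ServiceManager:
    """Hypothetical proxy registered at /dev/service that forwards requests to
    backends such as /net/a/dev/service and /net/b/dev/service."""

    def __init__(self, backends):
        self.backends = backends  # ordered {path: callable(request) -> reply}

    def failover(self, request):
        # Transparent redirection: try backends in order; the client never
        # learns which node actually answered.
        for handler in self.backends.values():
            try:
                return handler(request)
            except ConnectionError:
                continue  # treat this node as failed and try the next one
        raise ConnectionError("no backend available")

    def majority(self, request):
        # Redundant servicing: ask every backend and return the majority reply.
        replies = []
        for handler in self.backends.values():
            try:
                replies.append(handler(request))
            except ConnectionError:
                pass
        if not replies:
            raise ConnectionError("no backend available")
        reply, votes = Counter(replies).most_common(1)[0]
        if votes * 2 <= len(replies):
            raise ValueError("no majority among replies")
        return reply
```

"First result" and "best result" variants would replace the vote with, respectively, returning the earliest reply or scoring the replies; which one fits depends on whether the redundant nodes run identical or different implementations.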

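(Diagram: three nodes connected by Qnet over a MOST bus: one running a flash filesystem, TCP/IP, Bluetooth, and applications; one running a flash filesystem, graphics, a browser, audio, and Photon; and one running a CD-ROM filesystem, graphics, a browser, audio, and Photon.)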

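(Diagram: a variation on the same Qnet/MOST bus system. Components shown: a flash filesystem, TCP/IP, applications, and Bluetooth; a flash filesystem, graphics, a browser, audio, and Photon; a CD-ROM filesystem and graphics; and a separate browser.)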

Reliability and Availability

Why?

Embedded systems are different! Failure in an embedded system can have severe effects, like death …

“Pilots really hate to be told they have to reboot their plane while in flight” (Walter Shawlee)

Definitions

• MTBF: Mean Time Between Failures
– The average number of hours between failures for a large number of components over a long time (e.g., MIL-HDBK-217)
• MTTR: Mean Time To Repair
– Total time spent performing all corrective maintenance repairs, divided by the number of repairs
• MTBI: Mean Time Between Interruptions
– The average number of hours between failures while a redundant component is down

Defining HA

Reliability
• Quantified by failure rate (MTBF)
• Time to resume service after a failure is the MTTR

Availability
• Allows for failure, with quick service restoration
• As MTTR → 0, availability → 100%

5 Nines
• About 5 minutes of downtime per year (99.999% uptime allows roughly 5.26 minutes)
• Assume faults exist: design to contain, notify, recover, and restore rapidly
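These relationships follow from the standard steady-state availability formula, which the slide does not state explicitly. The five-nines figure can be checked against a year of about 525,945.6 minutes (365.24 days):

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}},
\qquad
\lim_{\mathrm{MTTR} \to 0} A = 1 = 100\%

\text{downtime per year} = (1 - A) \times 525\,945.6\ \text{min},
\qquad
A = 0.99999 \;\Rightarrow\; 10^{-5} \times 525\,945.6\ \text{min} \approx 5.26\ \text{min}
```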

Annual Cost of Downtime versus Availability

(Chart: annual losses by annual availability)
• 99%: $68,372,928
• 99.9%: $6,837,293
• 99.99%: $683,729
• 99.999%: $68,373

Source: Gartner Group ($13,000/minute cross-industry average)

Costs speak for themselves
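The chart's figures can be reproduced with a few lines of Python, assuming the $13,000-per-minute average and a year of 365.24 days (525,945.6 minutes). That year length yields exactly these numbers after rounding to whole dollars. The function names are illustrative.

```python
MINUTES_PER_YEAR = 365.24 * 24 * 60  # 525,945.6 minutes
COST_PER_MINUTE = 13_000             # Gartner cross-industry average quoted on the slide


def annual_downtime_minutes(availability):
    """Minutes per year a system with the given availability (0..1) is down."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def annual_downtime_cost(availability, cost_per_minute=COST_PER_MINUTE):
    """Expected yearly loss, rounded to whole dollars."""
    return round(annual_downtime_minutes(availability) * cost_per_minute)


for a in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{a:.3%}  {annual_downtime_minutes(a):10.2f} min  ${annual_downtime_cost(a):,}")
```

Each additional nine cuts both the downtime and the cost by a factor of ten, which is why the chart falls off so steeply.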

Availability via Reliability and Repair

• Low MTTR → high availability
– The system is composed of reliable components that are protected from each other and communicate ONLY through well-known interfaces
• This leads to:
– Fault isolation
– Speedy recovery
– Resetting a component, not a board/system
– Dynamic control (stop/start, upgrade)

Software vs. Hardware HA

• Hardware HA utilizes redundancy of key components
– A single fault cannot cause all redundant components to fail (no single point of failure, SPOF); e.g., mirrored disks, multiple system boards, I/O cards
– Active/active, active/spare, active/standby
• Software is a significant cause of downtime

But that’s only part of the problem!!!

Comparison

(Pie chart: causes of downtime)
• Software fault: 40%
• Planned outage: 30%
• Operator error: 15%
• Hardware: 10%
• Environment: 5%

High-Level Look at a Core Router/Switch

(Diagram: front view of a router/switch shelf with OCLD cards 1W to 4W and 1E to 4E, paired OCI cards 1A/1B to 4A/4B, paired OCM cards A and B, a shelf processor, fillers, two power switches, a maintenance panel, slots 1 to 20, a fiber management trough, an optical multiplexer tray (OMX), and a cooling unit.)

• One or more control elements

Handling Failures

(Diagram: the same router/switch shelf.)

• Isolate fault to a board
• Switch to backup

(Diagram: the same router/switch shelf, with one board's software expanded: applications, a route manager, a TCP/IP stack, an SNMP manager, flash drivers, a device manager, and a network manager running on an RTOS above the hardware.)

• Isolate the fault to a software component
• The fault may not be in the hardware

Ideal: Identify and Fix

(Diagram: the same software stack, with a faulty software component highlighted.)

• Isolate and contain
• Repair (e.g., restart)
• Notify
• Diagnose
• Upgrade

Component-Level Recovery Rarely Done

• Lack of suitable protection and isolation
• Lack of modularity
• Tight component coupling
• Few dynamic capabilities

Software failures are normally handled by:
• Hardware watchdogs
• Redundant boards

Repair Time

• Board replacement: hours
• Reboot: minutes
• Failover to standby: seconds
• SW component restart: tens of milliseconds
• SW failover: milliseconds

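High Availability Manager

(Diagram: a microkernel system running TCP/IP, flash and disk filesystems, ATM, and the HA Manager.)

• A process (TCP/IP in the example) causes a memory violation
• The kernel notifies the HA Manager
• A dump file is written for post-mortem analysis
• The HA Manager restarts the service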

Notification and Recovery

(Diagram: the HA Manager (HAM), backed by a HAM Guardian, monitoring a driver, a stack, and an application; the HAM and the application each keep checkpointed state.)

• HA Manager (HAM) monitors components and sends notification of component failure
• Heartbeat services detect component hangs
• A core file can be created on a crash for debugging and analysis
• Checkpointing permits recovering the current state
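Here is a minimal sketch of heartbeat-based hang detection with checkpointed state, in the spirit of the HAM description above. It is hypothetical and does not use or model QNX's actual HAM API. Components call `heartbeat()` with the current time and, optionally, a state snapshot. `poll()` declares hung any component whose last heartbeat is older than `timeout`, then passes that component's last checkpoint to a `restart(name, state)` callback. Time is passed in explicitly so the logic is deterministic and easy to test.

```python
class HeartbeatMonitor:
    """Hypothetical watchdog: components must check in within `timeout`
    seconds or they are treated as hung and restarted from their checkpoint."""

    def __init__(self, timeout, restart):
        self.timeout = timeout
        self.restart = restart  # callback(name, state) that restarts a component
        self.last_beat = {}     # component name -> time of last heartbeat
        self.checkpoints = {}   # component name -> last saved state

    def heartbeat(self, name, now, state=None):
        self.last_beat[name] = now
        if state is not None:
            self.checkpoints[name] = dict(state)  # keep a copy, not a live reference

    def poll(self, now):
        """Restart every component whose heartbeat is overdue; return their names."""
        hung = [n for n, t in self.last_beat.items() if now - t > self.timeout]
        for name in hung:
            self.restart(name, self.checkpoints.get(name, {}))
            self.last_beat[name] = now  # give the restarted instance a fresh deadline
        return hung
```

A crash (as opposed to a hang) would normally be reported directly by the OS, as on the previous slide. Heartbeats exist to catch components that are still alive but no longer making progress.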

Recovery

• A second “shadow” server attaches to the same name

Recovery

• A second “shadow” server attaches to the same name
• If the primary faults, new clients connect to the shadow server
• Old clients can reconnect to the shadow server

Recovery

• Start a new “shadow” server

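Service Upgrades

(Diagram: an existing client connected to server v1.0 through /dev/service, and a new client connected to server v1.1 through the same /dev/service.)

• A new version of the server attaches to the same name
• New clients connect to the new server
• The old server exits when all of its old clients have exited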
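The upgrade sequence can be modeled with a small, hypothetical registry of server versions attached to one pathname. New clients always open the newest attached server, existing clients stay with the server they opened, and an older server is retired once its last client has closed. Class and method names are illustrative only.

```python
class ServiceName:
    """Hypothetical model of several server versions attached to one pathname."""

    def __init__(self, path):
        self.path = path     # e.g. "/dev/service"
        self.attached = []   # server versions, oldest first
        self.clients = {}    # version -> set of client ids it still serves

    def attach(self, version):
        # A new version of the server attaches to the same name.
        self.attached.append(version)
        self.clients[version] = set()
        self._retire_idle()

    def open(self, client):
        # New clients always connect to the newest attached server.
        version = self.attached[-1]
        self.clients[version].add(client)
        return version

    def close(self, version, client):
        self.clients[version].discard(client)
        self._retire_idle()

    def _retire_idle(self):
        # An older server exits once all of its clients have gone;
        # the newest server always stays attached.
        for version in self.attached[:-1]:
            if not self.clients[version]:
                self.attached.remove(version)
                del self.clients[version]
```

Because the old version keeps serving the clients it already has, the upgrade never interrupts an in-progress session; the cost is that two versions run side by side until the last old client exits.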

QNX Momentics Tools

Design Goals

• Tools needed to be easy to learn
• Tools that could take advantage of QNX
• Tools that could integrate tools from other vendors, company-designed tools, and industry-specific tools, and have them work with our tools and with each other
• Tools needed to be customizable to the user or the company

QNX® Momentics: The Best Tools and the Best RTOS

(Diagram: the IDE workbench, built on the Eclipse framework and hosted on Windows, Solaris, or QNX Neutrino, with a C/C++ code developer, Java code developer, source debugger, target information, system builder, profiler, Photon app builder, and memory analysis. It can invoke command-line tools and integrate 3rd-party tools (Virtio, Rational, …TBA). Alongside are BSPs, DDKs, and the Neutrino runtime. The host connects over Ethernet, serial, JTAG, or ROMulator to target agents on QNX® Neutrino® RTOS targets (e.g., XScale) running the microkernel, Photon microGUI, flash filesystem, TCP/IP, HTTP server, and Java.)

QNX IDE: Standards Based

• IBM-donated framework
– Java IDE
– 200 person-years of effort
– Open source
• Consortium founding members include

System Profiling

Systems Analysis Toolkit

(Diagram: an instrumented microkernel, running an application, a TCP/IP protocol stack, and a device driver, produces a trace that is collected in a system event log and fed to a data display, a printer, and statistical and numerical analysis.)

• System events: interrupts, scheduler, messages, system calls
• System characterization: performance analysis, field diagnostics, live or post-mortem

Providing Technology for Today…
Architecture for Tomorrow

Irvine Office: 949-727-0444
David Weintraub, Regional Sales Manager

Woodland Hills Office: 818-227-5105
Jeff Schaffer, Sr. Field Applications Engineer
jpschaffer@qnx.com