arm handouts


Upload: sundar

Post on 13-Nov-2014


Page 1: ARM Handouts

The ARM Architecture

Day 10 Agenda

Exceptions

System Design

Memory Interface

Synchronization

Input / Output

Page 2: ARM Handouts

Exception Handling

When an exception occurs, the ARM:
  Copies CPSR into SPSR_<mode>
  Sets appropriate CPSR bits
    Change to ARM state
    Change to exception mode
    Disable interrupts (if appropriate)
  Stores the return address in LR_<mode>
  Sets PC to vector address

To return, the exception handler needs to:
  Restore CPSR from SPSR_<mode>
  Restore PC from LR_<mode>

This can only be done in ARM state.

Vector Table

Address  Exception
0x1C     FIQ
0x18     IRQ
0x14     (Reserved)
0x10     Data Abort
0x0C     Prefetch Abort
0x08     Software Interrupt
0x04     Undefined Instruction
0x00     Reset

The vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices.
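The vector table layout can be sketched as a small C helper that maps an exception to the address the PC is forced to. This is only an illustration: the enum and function names are hypothetical, and the offsets and the two base addresses are the ones listed on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Exceptions in vector-table order; names are hypothetical. */
typedef enum {
    EXC_RESET = 0,          /* 0x00 */
    EXC_UNDEFINED,          /* 0x04 */
    EXC_SWI,                /* 0x08 */
    EXC_PREFETCH_ABORT,     /* 0x0C */
    EXC_DATA_ABORT,         /* 0x10 */
    EXC_RESERVED,           /* 0x14 */
    EXC_IRQ,                /* 0x18 */
    EXC_FIQ,                /* 0x1C */
    EXC_COUNT
} exception_t;

/* Address the PC is forced to when `exc` is taken.
 * `high_vectors` selects the 0xFFFF0000 base instead of 0x00000000. */
uint32_t vector_address(exception_t exc, int high_vectors)
{
    assert((int)exc >= 0 && (int)exc < (int)EXC_COUNT);
    uint32_t base = high_vectors ? 0xFFFF0000u : 0x00000000u;
    return base + 4u * (uint32_t)exc;   /* one 32-bit word per vector */
}
```

For example, `vector_address(EXC_IRQ, 1)` gives 0xFFFF0018, matching the high-vector IRQ address used later in the deck.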

Page 3: ARM Handouts


PSR Mode Bit Values

Page 4: ARM Handouts


Normal and High Vector Address

Page 5: ARM Handouts

Reset

When the nRESET signal goes LOW, the core abandons the executing instruction.

When nRESET goes HIGH again, the core:
  Overwrites R14_svc and SPSR_svc by copying the current values of the PC and CPSR into them. The value of the saved PC and SPSR is not defined.
  Forces M[4:0] to 10011 (Supervisor mode), sets the I and F bits in the CPSR, and clears the CPSR's T bit.
  Forces the PC to fetch the next instruction from address 0x00.
  Resumes execution in ARM state.

Page 6: ARM Handouts

Undefined Exception

When the core comes across an instruction which it cannot handle, it takes the undefined instruction trap. This mechanism may be used to extend either the THUMB or ARM instruction set by software emulation.

Entering the undefined instruction trap
  R14_und = Address of the next instruction after the undefined instruction
  SPSR_und = CPSR
  CPSR[4:0] = 0b11011 (mode bits forced to Undefined mode)
  CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
  Forces the PC to fetch the next instruction from address 0x04 or 0xFFFF0004

After emulating the failed instruction, the trap handler should execute the following, irrespective of the state (ARM or Thumb):
  CPSR = SPSR_und
  MOVS PC,R14_und (this restores the CPSR and returns to the instruction following the undefined instruction)

Page 7: ARM Handouts

Software Interrupts

The software interrupt instruction (SWI) is used for entering Supervisor mode, usually to request a particular supervisor function.

Entering SWI
  R14_svc = Address of the next instruction after the SWI instruction
  SPSR_svc = CPSR
  CPSR[4:0] = 0b10011
  CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
  Forces the PC to fetch the next instruction from address 0x08 or 0xFFFF0008

Upon exiting SWI
  CPSR = SPSR_svc
  MOVS PC,R14_svc (this restores the PC and CPSR, and returns to the instruction following the SWI)

SWI instruction format

Bits    Field
31-28   Cond (condition field)
27-24   1 1 1 1
23-0    SWI number (ignored by processor)
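Because the processor ignores bits 23-0, it is the SWI handler that decodes them. A minimal sketch of that decoding in C, assuming the handler has already fetched the 32-bit ARM-state opcode (commonly read from the address in R14_svc minus 4); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 24-bit SWI number (bits 23..0) from an ARM-state SWI opcode. */
uint32_t swi_number(uint32_t instruction)
{
    /* Bits 27..24 are 1111 in every SWI instruction. */
    assert(((instruction >> 24) & 0xFu) == 0xFu);
    return instruction & 0x00FFFFFFu;
}

/* Extract the condition field (bits 31..28). */
uint32_t swi_condition(uint32_t instruction)
{
    return instruction >> 28;
}
```

With the opcode 0xEF000011, `swi_number` yields 0x11 and `swi_condition` yields 0xE (the "always" condition).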

Page 8: ARM Handouts

Pre-fetch Abort

If a pre-fetch abort occurs, the pre-fetched instruction is marked as invalid, but the exception will not be taken until the instruction reaches the head of the pipeline. If the instruction is not executed (for example because a branch occurs while it is in the pipeline), the abort does not take place.

Entering Pre-fetch Abort
  R14_abt = Address of aborted instruction + 4
  SPSR_abt = CPSR
  CPSR[4:0] = 0b10111
  CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
  Forces the PC to fetch the next instruction from address 0x0C or 0xFFFF000C

Upon exiting Pre-fetch Abort
  CPSR = SPSR_abt
  SUBS PC,R14_abt,#4 (this restores the PC and CPSR, and retries the aborted instruction)

Page 9: ARM Handouts

Data Abort

If a data abort occurs, the action taken depends on the instruction type:
  Single data transfer instructions (LDR, STR) write back modified base registers: the Abort handler must be aware of this.
  The swap instruction (SWP) is aborted as though it had not been executed.
  Block data transfer instructions (LDM, STM) complete.
    If write-back is set, the base is updated.
    If the instruction would have overwritten the base with data (i.e. it has the base in the transfer list), the overwriting is prevented.
    All register overwriting is prevented after an abort is indicated, which means in particular that R15 (always the last register to be transferred) is preserved in an aborted LDM instruction.

The abort mechanism allows the implementation of a demand paged virtual memory system. In such a system the processor is allowed to generate arbitrary addresses. When the data at an address is unavailable, the Memory Management Unit (MMU) signals an abort.

Page 10: ARM Handouts

Data Abort

The abort handler must then work out the cause of the abort, make the requested data available, and retry the aborted instruction. The application program needs no knowledge of the amount of memory available to it, nor is its state in any way affected by the abort.

Entering Data Abort
  R14_abt = Address of aborted instruction + 8
  SPSR_abt = CPSR
  CPSR[4:0] = 0b10111
  CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
  Forces the PC to fetch the next instruction from address 0x10 or 0xFFFF0010

Upon exiting Data Abort
  CPSR = SPSR_abt
  SUBS PC,R14_abt,#8 (this restores the PC and CPSR, and re-executes the aborted instruction)
  SUBS PC,R14_abt,#4 (this restores the PC and CPSR, and returns to the instruction following the aborted instruction)

Page 11: ARM Handouts

Interrupt Request (IRQ) Exception

The IRQ (Interrupt Request) exception is a normal interrupt caused by a LOW level on the nIRQ input. IRQ has a lower priority than FIQ and is masked out when a FIQ sequence is entered. It may be disabled at any time by setting the I bit in the CPSR, though this can only be done from a privileged (non-User) mode.

Entering IRQ
  R14_irq = Address of next instruction + 4
  SPSR_irq = CPSR
  CPSR[4:0] = 0b10010
  CPSR[T,IRQ] = 0b01 (ARM state, and disable IRQs)
  Forces the PC to fetch the next instruction from address 0x18 or 0xFFFF0018

Exiting IRQ
  CPSR = SPSR_irq
  SUBS PC,R14_irq,#4 (this restores the PC and CPSR, and returns to the interrupted code)

Page 12: ARM Handouts

Fast Interrupt Request (FIQ) Exception

The FIQ (Fast Interrupt Request) exception is designed to support a data transfer or channel process, and in ARM state has sufficient private registers to remove the need for register saving (thus minimizing the overhead of context switching).

FIQ is externally generated by taking the nFIQ input LOW. This input can accept either synchronous or asynchronous transitions, depending on the state of the ISYNC input signal. When ISYNC is LOW, nFIQ and nIRQ are considered asynchronous, and a cycle delay for synchronization is incurred before the interrupt can affect the processor flow.

Entering FIQ
  R14_fiq = Address of next instruction + 4
  SPSR_fiq = CPSR
  CPSR[4:0] = 0b10001
  CPSR[T,FIQ,IRQ] = 0b011 (ARM state, and disable FIQs and IRQs)
  Forces the PC to fetch the next instruction from address 0x1C or 0xFFFF001C

Exiting FIQ
  CPSR = SPSR_fiq
  SUBS PC,R14_fiq,#4 (this restores the PC and CPSR, and returns to the interrupted code)

Page 13: ARM Handouts

Return Address Calculation

       Return Instruction       Previous State            Cycles
                                ARM R14_x   THUMB R14_x
BL     MOV PC, R14              PC + 4      PC + 2        1
SWI    MOVS PC, R14_svc         PC + 4      PC + 2        1
UDEF   MOVS PC, R14_und         PC + 4      PC + 2        1
FIQ    SUBS PC, R14_fiq, #4     PC + 4      PC + 4        2
IRQ    SUBS PC, R14_irq, #4     PC + 4      PC + 4        2
PABT   SUBS PC, R14_abt, #4     PC + 4      PC + 4        1
DABT   SUBS PC, R14_abt, #8     PC + 8      PC + 8        3
RESET  NA                       –           –             4
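The return instructions in the table differ only in how much they subtract from R14_x. The sketch below models that adjustment in C; the type and function names are hypothetical, and the offsets are read directly from the "Return Instruction" column (0 for the MOV/MOVS forms).

```c
#include <assert.h>
#include <stdint.h>

/* Rows of the return-address table, in table order (names hypothetical). */
typedef enum {
    RET_BL, RET_SWI, RET_UDEF, RET_FIQ, RET_IRQ, RET_PABT, RET_DABT,
    RET_KIND_COUNT
} return_kind_t;

/* Value subtracted from R14_x by the return instruction for each row. */
static const uint32_t lr_adjust[RET_KIND_COUNT] = { 0, 0, 0, 4, 4, 4, 8 };

/* Address at which execution resumes, given the saved link register. */
uint32_t resume_address(return_kind_t kind, uint32_t r14)
{
    assert((int)kind >= 0 && (int)kind < (int)RET_KIND_COUNT);
    return r14 - lr_adjust[kind];
}
```

For a data abort on an instruction at 0x8000, R14_abt holds 0x8008, and `resume_address(RET_DABT, 0x8008)` gives 0x8000, so the aborted instruction is retried.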

Page 14: ARM Handouts


Exception Priorities

Highest priority:

1. Reset

2. Data abort

3. FIQ

4. IRQ

5. Pre-fetch abort

Lowest priority:

6. Undefined Instruction and Software interrupt.
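When several exceptions arise together, the fixed ordering above decides which is handled first. A minimal sketch of that selection in C, assuming a hypothetical bitmask of pending exceptions whose bit positions follow the priority list (bit 0 is highest priority):

```c
#include <assert.h>

/* Pending-exception flags; bit positions are hypothetical and simply
 * follow the priority order on the slide (lowest bit = highest priority). */
enum {
    PEND_RESET          = 1 << 0,
    PEND_DATA_ABORT     = 1 << 1,
    PEND_FIQ            = 1 << 2,
    PEND_IRQ            = 1 << 3,
    PEND_PREFETCH_ABORT = 1 << 4,
    PEND_UNDEF_OR_SWI   = 1 << 5
};

/* Return the flag of the highest-priority pending exception, or 0 if none. */
unsigned highest_priority(unsigned pending)
{
    assert((pending & ~0x3Fu) == 0);     /* only the six flags above */
    return pending & (~pending + 1u);    /* isolate the lowest set bit */
}
```

Undefined Instruction and SWI share one flag here because they share the lowest priority level in the list.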

Page 15: ARM Handouts


Agenda

Exceptions

System Design

Memory Interface

Synchronization

Input / Output

Page 16: ARM Handouts

Example ARM-based System

[Figure: block diagram with an ARM core, 8-bit ROM, 16-bit RAM, 32-bit RAM, I/O peripherals, and an interrupt controller connected to nFIQ and nIRQ]

Page 17: ARM Handouts

AMBA

AMBA: Advanced Microcontroller Bus Architecture
Open specification framework for System-on-Chip (SoC) designs

[Figure: AMBA system block diagram. Buses: System Bus (AHB or ASB) and Peripheral Bus (APB), joined by a bridge. Blocks shown: ARM, TIC, arbiter, decoder, on-chip RAM, external bus interface to external ROM and external RAM, timer, interrupt controller, remap/pause, reset.]

Page 18: ARM Handouts

AMBA

AHB
  The widely adopted AHB System Bus connects embedded processors such as an ARM core to high-performance peripherals, DMA controllers, on-chip memory and interfaces.

APB
  The AMBA APB (Advanced Peripheral Bus) is a simpler bus protocol designed for ancillary or general purpose peripherals.

ADK
  The AMBA Design Kit is a library of components which enables system developers to build AMBA based systems quickly and accurately.

ACT
  The AMBA Compliance Testbench is a comprehensive environment which enables the rapid development of tests to certify the IP as AMBA compliant.

PrimeCell
  ARM’s AMBA compliant peripherals

Page 19: ARM Handouts


Agenda

Exceptions

System Design

Memory Interface

Synchronization

Input / Output

Page 20: ARM Handouts


Memory Interface

Memory Hierarchy

Memory Size and Speed

ARM MMU

Memory Interfacing

Page 21: ARM Handouts

Memory

Memories come in many shapes, sizes and types.

Shape means the package, like TQFP, TSOP, DIP, or surface mount.
Size means the organization, like 4M x 8-bit or 16K x 1-bit.

Page 22: ARM Handouts

Memory Technologies

DRAM: Dynamic Random Access Memory
  upside: very dense (1 transistor per bit) and inexpensive
  downside: requires refresh and often not the fastest access times
  often used for main memories

SRAM: Static Random Access Memory
  upside: fast and no refresh required
  downside: not so dense and not so cheap
  often used for caches

ROM: Read Only Memory
  often used for bootstrapping and such

[Figure: memory cell schematics, with labels for word line, pass transistor, capacitor and bit line]

Page 23: ARM Handouts

Exploiting Memory Hierarchy

Users want large and fast memories! (1997 figures)
  SRAM access times are 2 - 25ns at cost of $100 to $250 per Mbyte.
  DRAM access times are 60-120ns at cost of $5 to $10 per Mbyte.
  Disk access times are 10 to 20 million ns at cost of $.10 to $.20 per Mbyte.

Try and give it to them anyway: build a memory hierarchy.

[Figure: levels in the memory hierarchy, from the CPU through Level 1, Level 2, ... Level n; access time and the size of the memory at each level increase with distance from the CPU]

Page 24: ARM Handouts


The Memory Pyramid

Page 25: ARM Handouts


Locality

A principle that makes having a memory hierarchy a good idea

If an item is referenced,

temporal locality: it will tend to be referenced again soon

spatial locality: nearby items will tend to be referenced soon.

Why does code have locality?

Our initial focus: two levels (upper, lower)
  block: minimum unit of data
  hit: data requested is in the upper level
  miss: data requested is not in the upper level

Page 26: ARM Handouts

Cache

Two issues:
  How do we know if a data item is in the cache?
  If it is, how do we find it?

Our first example:
  block size is one word of data
  "direct mapped"

For each item of data at the lower level, there is exactly one location in the cache where it might be.

e.g., lots of items at the lower level share locations in the upper level
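Both questions (is the item in the cache, and where) can be answered by splitting the address into an index and a tag. The following is a minimal sketch of a direct-mapped cache with one-word blocks; the 8-line size and all names are hypothetical choices for illustration, not parameters from the deck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 8u   /* hypothetical: eight one-word cache lines */

typedef struct {
    bool     valid;    /* line holds real data */
    uint32_t tag;      /* identifies which lower-level word is stored */
    uint32_t data;     /* the cached word */
} cache_line_t;

typedef struct { cache_line_t line[NUM_LINES]; } cache_t;

/* The single location where a word-aligned address may live. */
uint32_t cache_index(uint32_t addr)
{
    assert((addr & 3u) == 0);          /* word-aligned addresses only */
    return (addr >> 2) % NUM_LINES;
}

/* Remaining address bits, stored to tell apart words sharing an index. */
uint32_t cache_tag(uint32_t addr) { return (addr >> 2) / NUM_LINES; }

/* Returns true on a hit and copies the word to *out. */
bool cache_lookup(const cache_t *c, uint32_t addr, uint32_t *out)
{
    const cache_line_t *l = &c->line[cache_index(addr)];
    if (l->valid && l->tag == cache_tag(addr)) {
        *out = l->data;
        return true;
    }
    return false;
}

/* On a miss, the fetched word replaces whatever occupied the line. */
void cache_fill(cache_t *c, uint32_t addr, uint32_t word)
{
    cache_line_t *l = &c->line[cache_index(addr)];
    l->valid = true;
    l->tag   = cache_tag(addr);
    l->data  = word;
}
```

Addresses 0x100 and 0x120 map to the same index in this sketch, so filling one evicts the other; that is the sharing of upper-level locations the slide refers to.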

Page 27: ARM Handouts

Direct Mapped Cache

[Figure: cache organization; labels: 64, cache line, index, CAM, RAM]

Cache Memory

  64-way set-associative cache with I-Cache and D-Cache of 16KB each
  8 words per line, with one valid bit and two dirty bits per line
  Pseudo-random or round-robin replacement algorithm
  Write-through or write-back cache operation to update the main memory
  The write buffer can hold 16 words of data and four addresses.

Page 28: ARM Handouts


Memory Interface

Memory Hierarchy

Memory Size and Speed

ARM MMU

Memory Interfacing

Page 29: ARM Handouts


Storage Basics

CPU sees the RAM as one long, thin line of bytes

That doesn't mean that it's actually laid out that way

Real RAM chips don't store whole bytes, but rather they store individual bits in a grid, which you can address one bit at a time

Page 30: ARM Handouts

SRAM Memory Timing for Read Accesses

Example device: 2147H High-Speed 4096x1-bit static RAM (pins: A11-A0, Din, WE, CS, Dout)

Address and chip select signals are provided tAA before data is available

Outputs reflect new data

tRC = Read cycle time
tAA = Address access time
tACS = Chip select access time
tHZ = Chip deselection to high Z out

[Figure: read timing diagram. Traces: Address (A11-A0) changing from old address to new address, CS, WE, and Dout going from high impedance through undefined to data valid; intervals tRC, tAA, tACS and tHZ are marked.]

Page 31: ARM Handouts

SRAM Memory Timing for Write Accesses

Example device: 2147H High-Speed 4096x1-bit static RAM (pins: A11-A0, Din, WE, CS, Dout)

Address and data must be stable tS time-units before write enable signal falls

tS = Signal setup time
tRC = Read cycle time
tAA = Address access time
tACS = Chip select access time
tHZ = Chip deselection to high Z out

[Figure: write timing diagram. Traces: Address (A11-A0) changing from old address to new address, CS, WE, and Din changing from old data to new data; intervals tWC, tAA, tACS, tHZ and tS are marked.]

Page 32: ARM Handouts


DRAM Organization and Operations

In the traditional DRAM, any storage location can be randomly accessed for read/write by inputting the address of the corresponding storage location.

A typical DRAM of bit capacity 2^N * 2^M consists of an array of memory cells arranged in 2^N rows (word-lines) and 2^M columns (bit-lines).

Each memory cell has a unique location represented by the intersection of a word line and a bit line.

Each memory cell consists of a transistor and a capacitor. The charge on the capacitor represents 0 or 1 for the memory cell. The support circuitry for the DRAM chip is used to read/write to a memory cell.

Page 33: ARM Handouts

DRAM Organization and Operations

Address decoders
  To select a row and a column.

Sense amps
  To detect and amplify the charge in the capacitor of the memory cell.

Read/Write logic
  To read/store information in the memory cell.

Output Enable logic
  Controls whether data should appear at the outputs.

Refresh counters
  To keep track of refresh sequence.

Page 34: ARM Handouts


DRAM Memory Access

DRAM Memory is arranged in a XY grid pattern of rows and columns.

First, the row address is sent to the memory chip and latched, then the column address is sent in a similar fashion.

This row and column-addressing scheme (called multiplexing) allows a large memory address to use fewer pins.

The charge stored in the chosen memory cell is amplified using the sense amplifier and then routed to the output pin.

Read/Write is controlled using the read/write logic.
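The multiplexed addressing described above amounts to cutting one cell address into a row half and a column half that are sent over the same pins. A minimal sketch in C, assuming a hypothetical device with 10 row bits and 10 column bits (the names and widths are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

#define ROW_BITS 10u   /* hypothetical: 2^10 rows */
#define COL_BITS 10u   /* hypothetical: 2^10 columns */

typedef struct {
    uint32_t row;      /* sent to the chip first and latched */
    uint32_t col;      /* sent second, over the same address pins */
} dram_addr_t;

/* Split a linear cell address into its row and column parts. */
dram_addr_t dram_split(uint32_t cell)
{
    assert(cell < (1u << (ROW_BITS + COL_BITS)));
    dram_addr_t a;
    a.row = cell >> COL_BITS;
    a.col = cell & ((1u << COL_BITS) - 1u);
    return a;
}
```

With these widths a 20-bit cell address needs only 10 address pins, which is the pin saving the slide mentions.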

Page 35: ARM Handouts


How DRAM Works

Page 36: ARM Handouts

DRAM Memory Access

A typical DRAM read operation:

1. The row address is placed on the address pins via the address bus.

2. The RAS pin is activated, which places the row address onto the Row Address Latch.

3. The Row Address Decoder selects the proper row to be sent to the sense amps.

4. The Write Enable is deactivated, so the DRAM knows that it’s not being written to.

5. The column address is placed on the address pins via the address bus.

6. The CAS pin is activated, which places the column address on the Column Address Latch.

7. The CAS pin also serves as the Output Enable, so once the CAS signal has stabilized, the sense amps place the data from the selected row and column on the Data Out pin so that it can travel the data bus back out into the system.

8. RAS and CAS are both deactivated so that the cycle can begin again.

Page 37: ARM Handouts

DRAM Performance Specs

Important DRAM performance considerations
  Random access time: time required to read any random single cell
  Fast Page Cycle time: time required for page mode access, i.e. read/write to a memory location on the most recently accessed page (no need to repeat RAS in this case)
  Extended Data Out (EDO): allows setup of next address while current data access is maintained
  SDRAM Burst Mode: Synchronous DRAMs use a self-incrementing counter and a mode register to determine the column address sequence after the first memory location accessed on a page; effective for applications that usually require streams of data from one or more pages on the DRAM
  Required refresh rate: minimum rate of refreshes

Page 38: ARM Handouts

Turning Bits Into Bytes (2x This Picture)

Page 39: ARM Handouts


Memory Interface

Memory Hierarchy

Memory Size and Speed

ARM MMU

Memory Interfacing

Page 40: ARM Handouts

ARM MMU

Complex VM and protection mechanisms

Presents 4 GB address space (why?)

Memory granularity: 3 options supported
  1 MB sections
  Large pages (64 KBytes): access control within a large page on 16 KByte subpages
  Small pages (4 KBytes): access control within a small page on 1 KByte subpages

Puts processor in Abort mode when a virtual address is not mapped or a permission check fails

Change the pointer to the page tables (called the translation table base, in ARM jargon) to change the virtual address space; useful for context switching of processes

Page 41: ARM Handouts

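Example: Single-Level Page Table

[Figure: a 32-bit virtual address split into a page-table index (bits 31-12) and a page offset (bits 11-0). The page table has 2^20 entries of 32 bits each; the selected entry gives the page frame, and the offset selects one of the 2^12 bytes (8 bits each) of data in that frame.]

Size of page table = 2^20 * 32 bits = 4 Mbytes

Size of page = 2^12 * 8 bits = 4 Kbytes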

Page 42: ARM Handouts

Single-Level Page Table

Assumptions
  32-bit virtual addresses
  4 Kbyte page size = 2^12 bytes
  32-bit address space

How many virtual page numbers?
  2^32 / 2^12 = 2^20 = 1,048,576 virtual page numbers = number of entries in the page table

If each page table entry occupies 4 bytes, how much memory is needed to store the page table?
  2^20 entries * 4 bytes = 2^22 bytes = 4 Mbytes
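The arithmetic above generalizes to any address width and page size. A small C sketch, with hypothetical function names, that reproduces the slide's numbers:

```c
#include <assert.h>
#include <stdint.h>

/* Number of page table entries = 2^(va_bits - page_bits). */
uint64_t page_table_entries(unsigned va_bits, unsigned page_bits)
{
    assert(page_bits < va_bits && va_bits <= 63);
    return (uint64_t)1 << (va_bits - page_bits);
}

/* Memory needed for a single-level table with entries of `entry_bytes`. */
uint64_t page_table_bytes(unsigned va_bits, unsigned page_bits,
                          unsigned entry_bytes)
{
    return page_table_entries(va_bits, page_bits) * entry_bytes;
}
```

`page_table_bytes(32, 12, 4)` evaluates to 4,194,304 bytes, the 4 Mbytes computed on the slide.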

Page 43: ARM Handouts

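Example: Two level Page Table

[Figure: a 32-bit virtual address split into a page directory index (bits 31-22), a page table index (bits 21-12) and a page offset (bits 11-0). The page directory has 2^10 entries of 32 bits, each selecting a page table; each page table has 2^10 entries of 32 bits, each selecting a page frame; the offset selects one of the 2^12 bytes (8 bits each) of data in that frame.]

Size of page directory = 2^10 * 32 bits = 4 Kbytes

Size of page table = 2^10 * 32 bits = 4 Kbytes

Size of page = 2^12 * 8 bits = 4 Kbytes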

Page 44: ARM Handouts

Two-Level Page Table

Assumptions
  2^10 entries in page directory (= max number of page tables)
  2^10 entries in page table
  32 bits allocated for each page directory entry
  32 bits allocated for each page table entry

How much memory is needed?
  Page table size = 2^10 entries * 32 bits = 2^12 bytes = 4 Kbytes
  Page directory size = 2^10 entries * 32 bits = 2^12 bytes = 4 Kbytes

Page 45: ARM Handouts

Two-Level Page Table

Small (typical) system
  One page table might be enough
    Page directory size + Page table size = 8 Kbytes of memory would suffice for virtual memory management
  How much physical memory could this one page table handle?
    Number of page tables * Number of page table entries * Page size = 1 * 2^10 * 2^12 bytes = 4 Mbytes

Large system
  You might need the maximum number of page tables
    Max number of page tables * Page table size = 2^10 directory entries * 2^12 bytes = 2^22 bytes = 4 Mbytes of memory would be needed for virtual memory management
  How much physical memory could these 2^10 page tables handle?
    Number of page tables * Number of page table entries * Page size = 2^10 * 2^10 * 2^12 bytes = 4 Gbytes
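The 10/10/12 split used in the two-level example can be sketched as a table walk in C. This is a simplified model under stated assumptions: the structures and names are hypothetical, tables are plain in-memory arrays, an empty directory slot is a NULL pointer, and there are no valid or permission bits as a real MMU would have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES 1024u   /* 2^10 entries per directory and per table */

typedef struct { uint32_t dir_index, table_index, offset; } va_fields_t;

/* Page table: each entry holds a 4 Kbyte-aligned frame base address. */
typedef struct { uint32_t frame_base[ENTRIES]; } page_table_t;

/* Page directory: each entry points to a page table (NULL = none). */
typedef struct { const page_table_t *table[ENTRIES]; } page_dir_t;

/* Split a virtual address into its three fields. */
va_fields_t va_split(uint32_t va)
{
    va_fields_t f;
    f.dir_index   = va >> 22;              /* bits 31..22 */
    f.table_index = (va >> 12) & 0x3FFu;   /* bits 21..12 */
    f.offset      = va & 0xFFFu;           /* bits 11..0  */
    return f;
}

/* Walk the tables. Returns 1 and writes *pa on success, 0 if the
 * directory has no page table for this address. */
int translate(const page_dir_t *dir, uint32_t va, uint32_t *pa)
{
    assert(dir != NULL && pa != NULL);
    va_fields_t f = va_split(va);
    const page_table_t *t = dir->table[f.dir_index];
    if (t == NULL)
        return 0;
    *pa = t->frame_base[f.table_index] | f.offset;
    return 1;
}
```

Only the directory and the page tables actually in use need to exist, which is why the small system on the slide gets by with 8 Kbytes.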

Page 46: ARM Handouts


Memory Interface

Memory Hierarchy

Memory Size and Speed

ARM MMU

Memory Interfacing

Page 47: ARM Handouts


Interfacing External Memory

Little/Big Endian support

Address space: 4 Gbytes (differs between processor implementations)

Supports programmable 8/16/32-bit data bus width for each bank

External address lines vary for a specific processor implementation

Programmable bank start address and bank size for bank 7

Eight memory banks: Memory banks for ROM, SRAM or Synchronous DRAM

Fully Programmable access cycles for all memory banks

Supports external wait signals to extend the bus cycle

Supports self-refresh mode in SDRAM for power down

Supports various types of ROM for booting (NOR/NAND Flash, EEPROM, and others)

The write buffer can hold 16 words of data and four addresses.

Page 48: ARM Handouts

CPU Memory Interface

CPU memory interface usually consists of:
  unidirectional address bus
  bidirectional data bus
  read control line
  write control line
  ready control line
  size (byte, word) control line

Memory access involves a memory bus transaction
  read: (1) set address, read and size, (2) copy data when ready is set by memory
  write: (1) set address, data, write and size, (2) done when ready is set

[Figure: CPU and memory connected by the address bus, the data bus, and the Read, Write, Ready and Size lines]

Page 49: ARM Handouts

Memory Subsystem Components

Memory subsystems generally consist of chips + controller
  Each chip provides few bits (e.g., 1 to 4) per access
  Bits from multiple chips are accessed in parallel to fetch bytes and words
  Memory controller decodes/translates address and control signals
  Controller can also be on memory chip

Example: contains 8 16x1-bit chips and a very simple controller

[Figure: CPU connected over the address bus, data bus, and Read, Write, Ready and Size lines to a 16x8-bit memory array built from eight 16x1-bit memory chips; a 1-of-16 decoder selects one of addresses 0000 to 1111, and the chips drive data lines D7 to D0]
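The example array can be modelled in C to show how one byte is assembled from eight one-bit chips accessed in parallel. This is a sketch only: each chip is represented as a 16-bit word with one bit per cell, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CHIPS  8u    /* one chip per data line D0..D7 */
#define CHIP_DEPTH 16u   /* 16 one-bit cells per chip */

/* A 16x1-bit chip: bit `a` of `cells` is the cell at address `a`. */
typedef struct { uint16_t cells; } chip16x1_t;

/* The 16x8-bit array: chip[i] supplies data line Di. */
typedef struct { chip16x1_t chip[NUM_CHIPS]; } mem16x8_t;

/* Read one byte: every chip outputs its bit at the same address. */
uint8_t mem_read(const mem16x8_t *m, unsigned addr)
{
    assert(addr < CHIP_DEPTH);
    uint8_t byte = 0;
    for (unsigned i = 0; i < NUM_CHIPS; i++) {
        unsigned bit = (m->chip[i].cells >> addr) & 1u;
        byte |= (uint8_t)(bit << i);
    }
    return byte;
}

/* Write one byte: bit i of the byte goes to chip i at the same address. */
void mem_write(mem16x8_t *m, unsigned addr, uint8_t byte)
{
    assert(addr < CHIP_DEPTH);
    uint16_t mask = (uint16_t)(1u << addr);
    for (unsigned i = 0; i < NUM_CHIPS; i++) {
        if ((byte >> i) & 1u)
            m->chip[i].cells |= mask;
        else
            m->chip[i].cells &= (uint16_t)~mask;
    }
}
```

The shared `addr` plays the role of the 1-of-16 decoder output: the same cell is selected in every chip, and the eight bits together form D7 to D0.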

Page 50: ARM Handouts

EEPROM Interfacing

ARM         MEMORY
A0 – A15    A0 – A15
D0 – D7     DQ0 – DQ7
WE          WE
OE          OE
GCS         CE

Memory Interface with 8-bit ROM

Page 51: ARM Handouts


Interfacing 8 - Bit Memory Banks

Memory Interface with 8-bit ROM x 2

Page 52: ARM Handouts


Interfacing 16 - Bit Memory Banks

Memory Interface with 16-bit ROM x 2

Extra Signals

BE – Bank Enable

Page 53: ARM Handouts


Interfacing Banked SDRAM

Memory Interface with 16-bit SDRAM x 2

Page 54: ARM Handouts

Memory Interface with 16-bit SDRAM x 2

ARM     SDRAM   Signal Description
SCKE    SCKE    Clock Enable (High/Low)
SCLK    SCLK    System Clock
SCS0    SCS     Chip Select
SRAS    SRAS    Row Address Strobe
SCAS    SCAS    Column Address Strobe
WE      WE      Write Enable

Signals in Interfacing SDRAM

Page 55: ARM Handouts


Critical Thinking

It’s a commonly held belief that adding more RAM increases your performance. If you wanted to speed up your computer, what kind of RAM would you buy and why?

Page 56: ARM Handouts


Agenda

Exceptions

System Design

Memory Interface

Synchronization

Input / Output

Page 57: ARM Handouts


What is the Problem

Adding two array elements to another array element

LDR R0 A[0]

LDR R1 A[1]

ADD R2,R1,R0

STR R2 A[3]

Swapping the Variables

LDR R0 X

LDR R1 Y

STR R1 X

STR R2 Y

What to do ?????

Page 58: ARM Handouts


The Solution

Adding two array elements to another array element

LDR R0 A[0]

LDR R1 A[1]

ADD R2,R1,R0

Bubble or other instructions

STR R2 A[3]

Swapping the Variables

LDR R0 X

LDR R1 Y

STR R0 Y

STR R1 X

That’s Synchronization

Page 59: ARM Handouts


How to Achieve in ARM

SINGLE DATA SWAP (SWP)

[3:0] Source Register

[15:12] Destination Register

[19:16] Base Register

[22] Byte/Word Bit

0 = Swap word quantity

1 = Swap byte quantity

[31:28] Condition Field

SWP R0,R1,[R2]

Load R0 with the word addressed by R2, and store R1 at R2.

SWPB R2,R3,[R4]

Load R2 with the byte addressed by R4, and store bits 0 to 7 of R3 at R4.

SWPEQ R0,R0,[R1]

Conditionally swap the contents of the word addressed by R1 with R0.
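The field positions listed above are enough to decode a SWP opcode. Below is a minimal C sketch with hypothetical names. The fixed-bit check in `is_swp` comes from the standard ARM instruction encoding rather than from this slide, so treat it as an assumption of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned cond;   /* [31:28] condition field */
    bool     byte;   /* [22] 1 = swap byte, 0 = swap word */
    unsigned rn;     /* [19:16] base register */
    unsigned rd;     /* [15:12] destination register */
    unsigned rm;     /* [3:0] source register */
} swp_fields_t;

/* Assumed fixed bits of SWP: [27:23]=00010, [21:20]=00, [11:4]=00001001. */
bool is_swp(uint32_t instr)
{
    return (instr & 0x0FB00FF0u) == 0x01000090u;
}

/* Pull the operand fields out of a SWP/SWPB opcode. */
swp_fields_t swp_decode(uint32_t instr)
{
    assert(is_swp(instr));
    swp_fields_t f;
    f.cond = (instr >> 28) & 0xFu;
    f.byte = ((instr >> 22) & 1u) != 0;
    f.rn   = (instr >> 16) & 0xFu;
    f.rd   = (instr >> 12) & 0xFu;
    f.rm   = instr & 0xFu;
    return f;
}
```

Decoding 0xE1020091 gives Rd = 0, Rm = 1, Rn = 2 with the byte bit clear, which corresponds to the first example, SWP R0,R1,[R2].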

Page 60: ARM Handouts


How to Achieve in ARM

The data swap instruction is used to swap a byte or word quantity between a register and external memory. This instruction is implemented as a memory read followed by a memory write which are “locked” together (the processor cannot be interrupted until both operations have completed, and the memory manager is warned to treat them as inseparable). This class of instruction is particularly useful for implementing software semaphores.

The swap address is determined by the contents of the base register (Rn). The processor first reads the contents of the swap address. Then it writes the contents of the source register (Rm) to the swap address, and stores the old memory contents in the destination register (Rd). The same register may be specified as both the source and destination.

The LOCK output goes HIGH for the duration of the read and write operations to signal to the external memory manager that they are locked together, and should be allowed to complete without interruption. This is important in multi-processor systems where the swap instruction is the only indivisible instruction which may be used to implement semaphores; control of the memory must not be removed from a processor while it is performing a locked operation.
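The semaphore use of an indivisible swap can be sketched in portable C, using C11 `atomic_exchange` as a stand-in for SWP. This is an illustration, not the deck's own code: the type and function names are hypothetical, and the compiler is free to implement the exchange with whatever atomic instructions the target provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { LOCK_FREE_VALUE = 0, LOCK_TAKEN_VALUE = 1 };

typedef struct { atomic_int word; } swap_lock_t;

/* One attempt: swap TAKEN into the lock word. The lock was acquired
 * only if the value swapped out was FREE. */
bool swap_lock_try(swap_lock_t *l)
{
    assert(l != NULL);
    int old = atomic_exchange(&l->word, LOCK_TAKEN_VALUE);
    return old == LOCK_FREE_VALUE;
}

/* Spin until the lock is acquired. */
void swap_lock_acquire(swap_lock_t *l)
{
    while (!swap_lock_try(l)) {
        /* busy-wait; another owner holds the lock */
    }
}

/* Release by storing FREE back into the lock word. */
void swap_lock_release(swap_lock_t *l)
{
    assert(l != NULL);
    atomic_store(&l->word, LOCK_FREE_VALUE);
}
```

Because the read and the write happen as one indivisible step, two contenders can never both see FREE, which is the property the locked read-write pair of SWP provides.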

Page 61: ARM Handouts


Processor Independent Techniques

Semaphores

Mutual Exclusion

Message Queues

Pipes … etc

Page 62: ARM Handouts


Agenda

Exceptions

System Design

Memory Interface

Synchronization

Input / Output

Page 63: ARM Handouts

CPU Bus I/O

CPU needs to talk with I/O devices such as keyboard, mouse, video, network, disk drive, LEDs

Memory mapped I/O
  Devices are mapped to specific memory locations just like RAM
  Uses load/store instructions just like accesses to memory

Ported I/O
  Special bus line and instructions

[Figure: two bus diagrams, each with a CPU, memory and an I/O device on shared Address, Data, Read and Write lines; the second adds an I/O Port line labelled Memory I/O]

Page 64: ARM Handouts

I/O Register Basics

I/O registers are NOT like normal memory
  Device events can change their values (e.g., status registers)
  Reading a register can change its value (e.g., error condition reset)
    so, for example, can't expect to get same value if read twice
  Some are read only (e.g., receive registers)
  Some are write only (e.g., transmit registers)
  Sometimes multiple I/O registers are mapped to same address
    selection of one based on other info (e.g., read vs. write or extra control bits)

The bits in a control register often each specify something different and important and have significant side effects

Cache must be disabled for memory mapped addresses

When polling I/O registers, should tell compiler that value can change on its own
  volatile int *ptr;
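A polling loop built on a volatile pointer might look like the sketch below. The ready-bit position and the function name are hypothetical; on real hardware the pointer would be set to the device's documented register address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_READY (1u << 0)   /* hypothetical "ready" bit */

/* Poll a status register until the ready bit is set, giving up after
 * `max_polls` reads. `volatile` forces a fresh read on every pass, so
 * the compiler cannot hoist the load out of the loop. */
bool wait_ready(volatile uint32_t *status, unsigned max_polls)
{
    assert(status != NULL);
    while (max_polls-- > 0) {
        if (*status & STATUS_READY)
            return true;
    }
    return false;
}
```

Without `volatile`, an optimizing compiler may read the register once and loop on the stale value forever; the bounded retry count is a design choice of this sketch so that a dead device cannot hang the caller.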

Page 65: ARM Handouts


Up Next - Bus Architectures

Page 66: ARM Handouts

Bus Protocols

Protocol refers to the set of rules agreed upon by both the bus master and bus slave
  Synchronous bus: transfers occur in relation to successive edges of a clock
  Asynchronous bus: transfers bear no particular timing relationship
  Semi-synchronous bus: operations/control initiate asynchronously, but data transfer occurs synchronously

[Figure: CPU, Device 1, Device 2 and Device 3 attached to a shared bus]

Page 67: ARM Handouts

Synchronous Bus Protocol

Transfer occurs in relation to successive edges of the system clock

Example:
  Memory address is placed on the address bus within a certain time, relative to the rising edge of the clock
  By the trailing edge of this same clock pulse, the address information has had time to stabilize, so the READ line is asserted
  Once the chip has been selected, then the memory can place the contents of the specified location on the data bus

[Figure: timing diagram with traces Clock, Address, Master (CPU) RD, Master (CPU) CS and Data, showing an instruction address followed by a data address, stable and unstable intervals, the I-fetch and data transfers, and the decoding delay and access time]

Page 68: ARM Handouts

Asynchronous Bus Protocol

No system clock used

Useful for systems where CPU and I/O devices run at different speeds

Example:
  Master puts address and data on the bus and then raises the Master signal ("there's some data")
  Slave sees Master signal, reads the data and then raises the Slave signal ("I've got it")
  Master sees Slave signal and lowers Master signal ("I see you got it")
  Slave sees Master signal lowered and lowers Slave signal ("I see you see I got it")

We call this exchange “handshaking”

[Figure: timing diagram of a write and a read, with traces Address, Master, Slave and Data]
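The four steps of the handshake can be traced in a small sequential C model. It is a sketch under simplifying assumptions: both sides run in one function, signal propagation is instantaneous, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    master;   /* Master signal line */
    bool    slave;    /* Slave signal line */
    uint8_t data;     /* shared data lines */
} bus_t;

/* Run one write transfer and return the value the slave latched. */
uint8_t handshake_write(bus_t *bus, uint8_t value)
{
    assert(!bus->master && !bus->slave);   /* both lines idle */

    bus->data   = value;     /* 1. master drives the data ...          */
    bus->master = true;      /*    ... then raises Master              */

    assert(bus->master);
    uint8_t latched = bus->data;   /* 2. slave sees Master, reads data */
    bus->slave = true;             /*    ... then raises Slave         */

    assert(bus->slave);
    bus->master = false;     /* 3. master sees Slave, lowers Master    */

    assert(!bus->master);
    bus->slave = false;      /* 4. slave sees Master low, lowers Slave */

    return latched;
}
```

Each side changes its own line only after observing the other side's previous change, which is why no shared clock is needed and why the bus returns to idle ready for the next transfer.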

Page 69: ARM Handouts

Thank You

Any Questions?