
2Systems Architecture, Fifth Edition

Chapter Goals

• Describe the system bus and bus protocol
• Describe how the CPU and bus interact with peripheral devices
• Describe the purpose and function of device controllers
• Describe how interrupt processing coordinates the CPU with secondary storage and I/O devices

3Systems Architecture, Fifth Edition

Chapter Goals (continued)

• Describe how buffers, caches, and data compression improve computer system performance

4Systems Architecture, Fifth Edition

5Systems Architecture, Fifth Edition

System Bus

• Connects CPU with main memory and peripheral devices

• Set of data lines, control lines, and status lines
• Bus protocol
– Number and use of lines
– Procedures for controlling access to the bus

• Subsets of bus lines: data bus, address bus, control bus

6Systems Architecture, Fifth Edition

7Systems Architecture, Fifth Edition

Bus Clock and Data Transfer Rate

• Bus clock pulse
– Common timing reference for all attached devices
– Frequency measured in MHz
• Bus cycle
– Time interval from one clock pulse to the next
• Data transfer rate
– Measure of communication capacity
– Bus capacity = data transfer unit x clock rate (see the sketch below)
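The capacity formula above is simple arithmetic; here is a minimal C sketch of it, using made-up example numbers (a 66 MHz clock and an 8-byte transfer unit), not figures from the text.

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical figures, for illustration only. */
    double clock_rate_hz       = 66e6;  /* 66 MHz bus clock */
    double transfer_unit_bytes = 8;     /* 64-bit data bus  */

    /* Bus capacity = data transfer unit x clock rate */
    double capacity = transfer_unit_bytes * clock_rate_hz;
    printf("Peak bus capacity: %.0f MB/s\n", capacity / 1e6);
    return 0;
}
```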

8Systems Architecture, Fifth Edition

Bus Protocol

• Governs the format, content, and timing of data, memory addresses, and control messages sent across the bus
• Two devices cannot be allowed to put data on the bus at the same time, so access control is needed
• Approaches for access control
– Master-slave approach (traditional): the CPU is bus master and all other devices are slaves

9Systems Architecture, Fifth Edition

Bus Protocol

• Approaches for access control (transferring data without the CPU):
– Direct memory access (DMA): a DMA controller gets data from the device and stores it in RAM
– Peer-to-peer buses: any device can become master via a bus arbitration protocol

Local Bus vs External Bus

• Traditionally, the local bus connects the CPU, cache, RAM, and other internal devices
• The external bus connects the main processing unit to I/O devices
• The difference between local and external buses is getting fuzzy – new bus protocols can support both

10Systems Architecture, Fifth Edition

Parallel vs Serial Bus

• A parallel bus is the older technology, in which the bus is a set of wires that a device "plugs into"
• A serial bus interconnects one device after another, creating a daisy chain of devices

• Timing skew has become a problem with parallel bus design

11Systems Architecture, Fifth Edition

Serial Bus vs Parallel Bus

12Systems Architecture, Fifth Edition

[Figure: a parallel bus versus a serial (daisy-chain) bus]

Example System Buses

• IBM PC bus: 8-bit data, 20-bit address; used in all early IBM PCs and clones
• PC-AT bus (ISA): compatible with the PC bus, but has a second strip of connectors with an additional 36 lines; these lines give a 16-bit data bus for the 80286 chip
• VESA Local Bus (VL-bus or VLB): found alongside the ISA bus in PCs; acted as a high-speed bus for DMA and memory-mapped I/O (aka the "Very Long Bus"!)

13Systems Architecture, Fifth Edition

Example System Buses

• IBM Micro Channel: bus for the IBM PS/2 computer; closed architecture with high licensing costs
• EISA (Extended Industry Standard Architecture): several non-IBM companies reacted to Micro Channel and designed EISA; provides a 32-bit data bus

14Systems Architecture, Fifth Edition

VME Bus

• Used in SGI systems
• Begun by Motorola; became an IEEE standard (IEEE P1014)
• 32-bit bus, asynchronous design (see next slide)
• No circuitry on the motherboard
• Hundreds of companies design boards for VME; a 300-page set of VME definitions; very stable
• Bus lines provide automatic self-testing and status reporting
• Now VME64 with a 64-bit bus

15Systems Architecture, Fifth Edition

16Systems Architecture, Fifth Edition

PCI Bus

• PCI (Peripheral Component Interconnect): used in PC and Mac systems
• Well defined and fast
• Serves as the local bus in a machine with other buses
• Intel-based design; the CPU bus and peripherals plug directly into the PCI bus
• Allows devices to talk to each other without CPU intervention

17Systems Architecture, Fifth Edition

18Systems Architecture, Fifth Edition

Older architecture

19Systems Architecture, Fifth Edition

Newer architecture

[Figure: newer architecture – the CPU connects to the northbridge chip over the front side bus (system bus) and to cache over the backside bus; the northbridge chip attaches RAM and the video card; the southbridge chip handles the PCI bus, real-time clock, USB, power management, and other devices]

PCI Bus

• Plug-in boards have software settings, not DIP switches

• 532 Mbps transfer speed (PCI v.3.0)
• Synchronous bus (see figure on next slide)
• Initiator and target design (master/slave)
• Address and data lines multiplexed

20Systems Architecture, Fifth Edition

21Systems Architecture, Fifth Edition

PCI Bus

• OS queries all PCI buses at boot time to find out what devices are present and what system resources (interrupt lines, memory, etc.) each needs. It then allocates the resources and tells each device what its allocation is.

• Each device can request up to six areas of memory space or I/O port space
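As a hedged illustration of the boot-time query described above, here is a C sketch of the legacy x86 "configuration mechanism #1" (I/O ports 0xCF8/0xCFC) that reads each device's vendor and device IDs on bus 0. It assumes a Linux/x86 system with root privilege for port I/O, and it is not how a modern OS necessarily enumerates PCI (ACPI tables and memory-mapped configuration space are also used).

```c
#include <stdio.h>
#include <stdint.h>
#include <sys/io.h>   /* Linux/x86 port I/O: iopl(), outl(), inl() */

#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

/* Read one 32-bit register from a device's configuration space
   using the legacy "mechanism #1" I/O ports. */
static uint32_t pci_config_read(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t offset)
{
    uint32_t address = 0x80000000u             /* enable bit */
                     | ((uint32_t)bus << 16)
                     | ((uint32_t)dev << 11)
                     | ((uint32_t)fn  << 8)
                     | (offset & 0xFCu);
    outl(address, PCI_CONFIG_ADDRESS);
    return inl(PCI_CONFIG_DATA);
}

int main(void)
{
    if (iopl(3) != 0) { perror("iopl"); return 1; }  /* port I/O needs root */

    /* Walk bus 0; vendor ID 0xFFFF means "no device in this slot". */
    for (int dev = 0; dev < 32; dev++) {
        uint32_t id = pci_config_read(0, (uint8_t)dev, 0, 0x00);
        if ((id & 0xFFFFu) != 0xFFFFu)
            printf("bus 0 dev %2d: vendor %04x device %04x\n",
                   dev, (unsigned)(id & 0xFFFFu), (unsigned)(id >> 16));
    }
    return 0;
}
```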

22Systems Architecture, Fifth Edition

PCI Versions

• 32-bit, 33 MHz (5 V, added in Rev. 2.0)
• 64-bit, 33 MHz (5 V, added in Rev. 2.0)
• 32-bit, 66 MHz (3.3 V only, added in Rev. 2.1)
• 64-bit, 66 MHz (3.3 V only, added in Rev. 2.1)

PCI-X

• PCI-eXtended
• Twice as fast as PCI – 1.06 GB/s
• Designed for servers to support Gigabit Ethernet cards, Fibre Channel, and Ultra320 SCSI controllers
• Backwards compatible with older PCI standards (except the 5 V ones)
• PCI-X only runs as fast as the slowest device
• In 2003 the PCI-SIG ratified PCI-X 2.0, which added 266 MHz and 533 MHz options, roughly 2.15 GB/s and 4.3 GB/s of throughput (but losing ground to PCIe)
• PCI-X 3.0 is in development, but how far will it get given the popularity of PCIe?

PCI-Express

• PCIe or PCI-E

• Not the same as PCI: PCI is a parallel bus, whereas PCIe is a serial bus (like USB)
• A hub on the motherboard acts as a crossbar switch, allowing multiple simultaneous full-duplex connections

• Serial format starting to win out over parallel format due in part to timing skew

• PCIe is a layered protocol, consisting of a Transaction Layer, a Data Link Layer, and a Physical Layer (fairly complex, like USB)

From top to bottom: PCIe x4, x16, x1, x16, and an older PCI connector (image from Wikipedia)

A PCIe card will fit in any slot that is at least as wide as the card

27Systems Architecture, Fifth Edition

SCSI (Small Computer System Interface)

• Family of standard buses designed primarily for secondary storage devices

• Most often used for disk drives, but can interface with pretty much any device

• Implements both a low-level physical I/O protocol and a high-level logical device control protocol

28Systems Architecture, Fifth Edition

SCSI Interfaces – Parallel

• Still common is the older parallel SCSI (aka SPI)
• Popular forms include:

– SCSI-1

– Fast SCSI

– Fast-Wide SCSI

– Ultra Wide SCSI

• See handout on parallel SCSI specs

29Systems Architecture, Fifth Edition

SCSI Interfaces – Serial

• Serial SCSI: a modern addition to the SCSI family
• Faster data rates, hot swapping, and improved fault isolation are among the advantages of serial SCSI
• Once again, the clock-skew problem of high-speed parallel interfaces is driving the change from parallel to serial

30Systems Architecture, Fifth Edition

SCSI Interfaces – iSCSI

• The SCSI command set stays the same; it's just that the physical specifications essentially no longer exist
• The physical "specs" are TCP/IP
• SCSI-3 implemented over a network
• iSCSI competes with Fibre Channel
• Many felt iSCSI would not be as fast as Fibre Channel due to TCP/IP overhead, but systems now use TCP Offload Engines and 10G Ethernet

31Systems Architecture, Fifth Edition

32Systems Architecture, Fifth Edition

33Systems Architecture, Fifth Edition

Desirable Characteristics of a SCSI Bus

• Non-proprietary standard
• High data transfer rate
• Peer-to-peer capability
• High-level (logical) data access commands
• Multiple command execution
• Interleaved command execution
• But typically quite a bit more expensive

34Systems Architecture, Fifth Edition

I/O Ports

• I/O ports are the pathways between the CPU and a peripheral device

• Logical and physical access
– Usually a memory address that can be read/written by the CPU and a single peripheral device
– Also a logical abstraction that enables the CPU and bus to interact with each peripheral device as if the device were a storage device with a linear address space

35Systems Architecture, Fifth Edition

Physical access: System bus is usually physically implemented on a large printed circuit board with attachment points for devices.

36Systems Architecture, Fifth Edition

Logical access: The device, or its controller, translates linear sector address into corresponding physical sector location on a specific track and platter.
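For illustration, the translation the caption describes can be written as a few lines of C; the geometry constants here are invented for the example, not taken from the text.

```c
#include <stdio.h>

#define SECTORS_PER_TRACK 63
#define HEADS             16   /* tracks per cylinder */

/* Map a linear (logical) sector number onto cylinder, head, and sector. */
static void lba_to_chs(unsigned lba, unsigned *cyl, unsigned *head, unsigned *sec)
{
    *cyl  = lba / (HEADS * SECTORS_PER_TRACK);
    *head = (lba / SECTORS_PER_TRACK) % HEADS;
    *sec  = (lba % SECTORS_PER_TRACK) + 1;   /* CHS sector numbers start at 1 */
}

int main(void)
{
    unsigned c, h, s;
    lba_to_chs(123456, &c, &h, &s);
    printf("linear sector 123456 -> cylinder %u, head %u, sector %u\n", c, h, s);
    return 0;
}
```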

37Systems Architecture, Fifth Edition

Device Controllers

• Implement the bus interface and access protocols
• Translate logical addresses into physical addresses
• Enable several devices to share access to a bus connection

38Systems Architecture, Fifth Edition

39Systems Architecture, Fifth Edition

Mainframe Channels

• Advanced type of device controller used in mainframe computers
• Compared with ordinary device controllers:
– Greater data transfer capacity
– Larger maximum number of attached peripheral devices
– Greater variability in the types of devices that can be controlled

40Systems Architecture, Fifth Edition

Interrupt Processing

• Used by application programs to coordinate data transfers to/from peripherals, notify CPU of errors, and call operating system service programs

• When an interrupt is detected, the executing program is suspended: the CPU pushes current register values onto the stack and transfers control to an interrupt handler
• When the interrupt handler finishes executing, the stack is popped and the suspended process resumes from the point of interruption

41Systems Architecture, Fifth Edition

Interrupt Processing

• Secondary storage and I/O devices are much slower than RAM, ROM, cache memory, and the CPU (see table on next slide)

• When the CPU asks for data from an I/O device, what should the CPU do?
– Sit in a wait cycle?
– Go do something else? (see the sketch below)
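To make the two options concrete, here is a toy C simulation (not from the textbook): the "device" is just a countdown, and the polling routine shows how sitting in a wait cycle burns CPU cycles that interrupt-driven I/O would give back to other work.

```c
#include <stdio.h>
#include <stdbool.h>

static int  device_ticks_remaining = 5;   /* pretend the I/O takes 5 cycles */
static bool data_ready = false;

/* Simulated hardware: one call = one elapsed cycle of device activity. */
static void device_tick(void)
{
    if (device_ticks_remaining > 0 && --device_ticks_remaining == 0)
        data_ready = true;                /* a real device would raise an interrupt here */
}

/* Option 1: sit in a wait cycle (polling) - the CPU does no useful work. */
static int wait_by_polling(void)
{
    int wasted_cycles = 0;
    while (!data_ready) {
        device_tick();
        wasted_cycles++;                  /* every iteration is a cycle spent doing nothing */
    }
    return wasted_cycles;
}

int main(void)
{
    printf("Polling wasted %d CPU cycles.\n", wait_by_polling());
    printf("With interrupts, those cycles could have run another program;\n"
           "an interrupt handler would set data_ready when the device finished.\n");
    return 0;
}
```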

Interrupt Processing

42Systems Architecture, Fifth Edition

43Systems Architecture, Fifth Edition

Multiple Types of Interrupts

• Categories of interrupts
– I/O event
– Error condition
– Service request
– Processor-to-processor communication

• Can one interrupt be interrupted by another type of interrupt?

44Systems Architecture, Fifth Edition

45Systems Architecture, Fifth Edition

Buffers and Caches

• Improve overall computer system performance by employing RAM to overcome mismatches in data transfer rate and data transfer unit size

46Systems Architecture, Fifth Edition

Buffers

• Small storage areas (usually DRAM or SRAM) that hold data in transit from one device to another

• Use interrupts to enable devices with different data transfer rates and unit sizes to efficiently coordinate data transfer

• Buffer overflow: occurs when more data arrives than the buffer can hold

47Systems Architecture, Fifth Edition

Classic example of a buffer: a print buffer
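Here is a toy ring buffer of the kind a print buffer uses, written in C purely as an illustration (the names and size are made up): the fast side deposits bytes, the slow side drains them later, and overflow is what happens when the producer outruns the buffer.

```c
#include <stdio.h>

#define BUF_SIZE 8

static unsigned char buf[BUF_SIZE];
static int head = 0, tail = 0, count = 0;

/* Producer side: returns 0 on success, -1 on buffer overflow. */
static int buffer_put(unsigned char c)
{
    if (count == BUF_SIZE) return -1;     /* overflow: data arriving too fast */
    buf[head] = c;
    head = (head + 1) % BUF_SIZE;
    count++;
    return 0;
}

/* Consumer side (the printer): returns -1 if the buffer is empty. */
static int buffer_get(void)
{
    if (count == 0) return -1;
    unsigned char c = buf[tail];
    tail = (tail + 1) % BUF_SIZE;
    count--;
    return c;
}

int main(void)
{
    const char *line = "HELLO";
    for (const char *p = line; *p; p++)
        buffer_put((unsigned char)*p);    /* fast device fills the buffer */
    int c;
    while ((c = buffer_get()) != -1)
        putchar(c);                       /* slow device drains it later */
    putchar('\n');
    return 0;
}
```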

48Systems Architecture, Fifth Edition

Computer system performance improves dramatically with larger buffer.

49Systems Architecture, Fifth Edition

Computer system performance improves dramatically with larger buffer.

Assumes a 32-bit bus

50Systems Architecture, Fifth Edition

Computer system performance improves dramatically with larger buffer.

2 interrupts each time we fill up the buffer.

Buffer will be filled 64KB / buffer size times.

51Systems Architecture, Fifth Edition

Computer system performance improves dramatically with larger buffer.

Sum of bus transfers and bus interrupts

52Systems Architecture, Fifth Edition

Computer system performance improves dramatically with larger buffer.

Assumes 100 CPU cycles to handle an interrupt.
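A small C program can reproduce the kind of trade-off these slides chart, under the stated assumptions (64 KB to move, a 32-bit bus, two interrupts per buffer fill, 100 CPU cycles per interrupt). This is my reading of the figure's assumptions, not code from the text.

```c
#include <stdio.h>

int main(void)
{
    const double total_bytes          = 64 * 1024; /* 64 KB to move            */
    const double bytes_per_transfer   = 4;         /* 32-bit bus               */
    const double interrupts_per_fill  = 2;         /* per the slide annotation */
    const double cycles_per_interrupt = 100;       /* per the slide annotation */

    printf("%10s %14s %12s %14s\n",
           "buffer", "bus ops", "interrupts", "CPU cycles");
    for (double buf = 4; buf <= total_bytes; buf *= 4) {
        double fills      = total_bytes / buf;                /* 64KB / buffer size       */
        double transfers  = total_bytes / bytes_per_transfer; /* data crosses word by word */
        double interrupts = fills * interrupts_per_fill;
        printf("%9.0fB %14.0f %12.0f %14.0f\n",
               buf, transfers + interrupts, interrupts,
               interrupts * cycles_per_interrupt);
    }
    return 0;
}
```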

53Systems Architecture, Fifth Edition

Diminishing Returns

• When multiple resources are required to produce something useful, adding more and more of a single resource produces fewer and fewer benefits

• Applicable to buffer size

54Systems Architecture, Fifth Edition

Law of diminishing returns affects both bus and CPU performance

Similar chart to the last one, but now the amount to transfer is 64B instead of 64KB.

Note how improvement stops once the buffer size equals the transfer amount.

55Systems Architecture, Fifth Edition

Cache

• Differs from a buffer:
– Data content not automatically removed as used

– Used for bidirectional data

– Used only for storage device accesses

– Usually much larger

– Content must be managed intelligently

• Achieves performance improvements differently for read and write accesses

56Systems Architecture, Fifth Edition

Write access: Sending confirmation (2) before data is written to secondary storage device (3) can improve program performance; program can immediately proceed with other processing tasks.

57Systems Architecture, Fifth Edition

Read accesses are routed to cache (1). If data is already in cache, it is accessed from there (2). If data is not in cache, it must be read from the storage device (3). Performance improvement realized only if requested data is already waiting in cache.
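Below is a minimal C sketch of that read path, using a tiny direct-mapped cache of disk sectors; read_sector_from_device() is a hypothetical stand-in for the slow device access in step 3, and the structure is illustrative rather than how any particular cache controller works.

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define SECTOR_SIZE 512
#define CACHE_SLOTS 64

struct cache_slot {
    bool     valid;
    uint32_t sector;                 /* which sector this slot currently holds */
    uint8_t  data[SECTOR_SIZE];
};

static struct cache_slot cache[CACHE_SLOTS];

/* Placeholder for the slow path (step 3 in the slide). */
static void read_sector_from_device(uint32_t sector, uint8_t *buf)
{
    (void)sector;
    memset(buf, 0, SECTOR_SIZE);
}

void cached_read(uint32_t sector, uint8_t *buf)
{
    struct cache_slot *slot = &cache[sector % CACHE_SLOTS];   /* step 1: route to cache */

    if (slot->valid && slot->sector == sector) {              /* step 2: hit, serve from cache */
        memcpy(buf, slot->data, SECTOR_SIZE);
        return;
    }
    read_sector_from_device(sector, slot->data);              /* step 3: miss, go to the device */
    slot->valid  = true;
    slot->sector = sector;
    memcpy(buf, slot->data, SECTOR_SIZE);
}

int main(void)
{
    uint8_t buf[SECTOR_SIZE];
    cached_read(7, buf);   /* miss: read from device, then cache */
    cached_read(7, buf);   /* hit: served from cache this time   */
    return 0;
}
```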

58Systems Architecture, Fifth Edition

Cache Controller

• Processor that manages cache content
• Guesses what data will be requested and loads it from the storage device into the cache before it is requested
• Can be implemented in:
– A storage device controller or communication channel
– The operating system

59Systems Architecture, Fifth Edition

Cache

Primary storage cache:
• Can limit wait states by using an SRAM cache between the CPU and SDRAM primary storage
• Level one (L1): within the CPU
• Level two (L2): on-chip
• Level three (L3): off-chip

Secondary storage cache:
• Gives frequently accessed files higher priority for cache retention
• Uses read-ahead caching for files that are read sequentially
• Gives files opened for random access lower priority for cache retention

60Systems Architecture, Fifth Edition

Intel Itanium® 2 microprocessor uses three levels of primary storage caching.

61Systems Architecture, Fifth Edition

Processing Parallelism

• Increases computer system computational capacity; breaks problems into pieces and solves each piece in parallel with separate CPUs

• Techniques (see the sketch after this list):
– Multicore processors
– Multi-CPU architecture
– Clustering
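As a minimal illustration of breaking a problem into pieces, here is a hypothetical example assuming POSIX threads; on a multicore or multi-CPU machine the pieces can run on separate cores, and a cluster would split the work across machines instead.

```c
/* Compile with: cc -pthread parallel_sum.c */
#include <stdio.h>
#include <pthread.h>

#define N        1000000
#define NTHREADS 4

static long data[N];

struct piece { int start, end; long sum; };

/* Each thread sums one piece of the array. */
static void *sum_piece(void *arg)
{
    struct piece *p = arg;
    p->sum = 0;
    for (int i = p->start; i < p->end; i++)
        p->sum += data[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++) data[i] = 1;

    pthread_t    tid[NTHREADS];
    struct piece pieces[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        pieces[t].start = t * (N / NTHREADS);
        pieces[t].end   = (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, sum_piece, &pieces[t]);
    }

    long total = 0;
    for (int t = 0; t < NTHREADS; t++) {      /* combine the partial results */
        pthread_join(tid[t], NULL);
        total += pieces[t].sum;
    }
    printf("total = %ld\n", total);
    return 0;
}
```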

62Systems Architecture, Fifth Edition

Multicore Processors

• Include multiple CPUs and shared memory cache in a single microchip

• Typically share memory cache, memory interface, and off-chip I/O circuitry among the cores

• Reduce total transistor count and cost and provide synergistic benefits

63Systems Architecture, Fifth Edition

64Systems Architecture, Fifth Edition

Multi-CPU Architecture

• Employs multiple single or multicore processors sharing main memory and the system bus within a single motherboard or computer system

• Common in midrange computers, mainframe computers, and supercomputers

• Cost-effective for:
– A single system that executes many different application programs and services
– Workstations

65Systems Architecture, Fifth Edition

Scaling Up

• Increasing processing by using larger and more powerful computers

• Used to be most cost-effective
• Still cost-effective when maximal computer power is required and flexibility is not as important

66Systems Architecture, Fifth Edition

Scaling Out

• Partitioning processing among multiple systems
• The speed of communication networks has diminished the relative performance penalty
• Economies of scale have lowered costs
• Distributed organizational structures emphasize flexibility
• Improved software for managing multiprocessor configurations

67Systems Architecture, Fifth Edition

High-Performance Clustering

• Connects separate computer systems with high-speed interconnections

• Used for the largest computational problems (e.g., modeling three-dimensional physical phenomena)

68Systems Architecture, Fifth Edition

Partitioning the problem to match the cluster architecture ensures that most data exchange traverses high-speed paths.

69Systems Architecture, Fifth Edition

Compression

• Reduces number of bits required to encode a data set or stream

• Effectively increases capacity of a communication channel or storage device

• Requires increased processing resources to implement compression/decompression algorithms while reducing resources needed for data storage and/or communication

Trading data size against CPU time

70Systems Architecture, Fifth Edition

Compression Algorithms

• Vary in:
– Type(s) of data for which they are best suited

– Whether information is lost during compression

– Amount by which data is compressed

– Computational complexity

• Lossless versus lossy compression
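As one concrete lossless example (chosen for brevity, not taken from the text), run-length encoding shows the basic trade: extra CPU work in exchange for fewer bytes to store or transmit.

```c
#include <stdio.h>
#include <stddef.h>

/* Encode src[0..n) as (count, byte) pairs; returns the encoded length. */
static size_t rle_encode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; ) {
        unsigned char run = 1;
        while (i + run < n && src[i + run] == src[i] && run < 255)
            run++;
        dst[out++] = run;        /* CPU work spent here...                     */
        dst[out++] = src[i];     /* ...buys fewer bytes to store or transmit   */
        i += run;
    }
    return out;
}

int main(void)
{
    unsigned char data[] = "aaaaaaaabbbbccccccccccccdd";
    unsigned char enc[64];                       /* worst case is 2 x input size */
    size_t n = rle_encode(data, sizeof data - 1, enc);
    printf("original %zu bytes, encoded %zu bytes\n", sizeof data - 1, n);
    return 0;
}
```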

71Systems Architecture, Fifth Edition

Compression can be used to reduce disk storage requirements (a) or to increase communication channel capacity (b).

72Systems Architecture, Fifth Edition

MPEG standards address recording and encoding formats for both images and sound.

Exploits varying sensitivity of the ear to sounds to perform lossy compression

Chip Interfacing

• You are working for Nokia on a new cell phone
• This phone will have a processor, one EPROM, one RAM, and an I/O chip to control the display and keyboard
• The processor has a 16-bit address bus
• With 16 bits, you can address 65,536 bytes of storage

73Systems Architecture, Fifth Edition

Chip Interfacing

• For the I/O chip, we could attach it as an I/O device, then connect the CS line on the PIO to the IORQ line on the CPU

• Or we could choose a particular address and have that address go into the CS line of the I/O chip

• The latter form is called memory-mapped I/O

74Systems Architecture, Fifth Edition

Chip Interfacing

• The I/O chip needs 4 bytes of address space (3 I/O ports and 1 status register)

• The EPROM is an 8K chip so it needs 8K of address space (13 bits needed to select 8K)

• Likewise, the RAM needs 8K of address space

75Systems Architecture, Fifth Edition

Chip Interfacing

• You don't want the chips' addresses to overlap, so place the devices in memory as follows:
– EPROM starts at address 0 (0000h) and is 8K (8192, or 2000h) long, so it ends at 1FFFh
– RAM starts at address 32K (32,768, or 8000h) and is 8K long, so it ends at 9FFFh
– I/O starts at address 65532 (FFFCh) and is 4 bytes long, so it ends at 65535 (FFFFh)

76Systems Architecture, Fifth Edition

Chip Interfacing

• So, the hexadecimal address ranges for each chip are:

– EPROM: 0000 – 1FFF

– RAM: 8000 – 9FFF

– I/O: FFFC – FFFF

• That would place the devices at the following binary addresses (see the decoding sketch below):
– EPROM: 000xxxxxxxxxxxxx

– RAM: 100xxxxxxxxxxxxx

– I/O: 11111111111111xx
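Those bit patterns are exactly what the address-decoding logic has to check. A small C sketch of the same decisions (masks chosen to match the ranges above) might look like this:

```c
#include <stdio.h>
#include <stdint.h>

/* Given a 16-bit address, decide which chip's ~CS line would be asserted. */
const char *select_chip(uint16_t addr)
{
    if ((addr & 0xE000) == 0x0000) return "EPROM"; /* 000x xxxx xxxx xxxx -> 0000-1FFF */
    if ((addr & 0xE000) == 0x8000) return "RAM";   /* 100x xxxx xxxx xxxx -> 8000-9FFF */
    if ((addr & 0xFFFC) == 0xFFFC) return "I/O";   /* 1111 1111 1111 11xx -> FFFC-FFFF */
    return "(unmapped)";
}

int main(void)
{
    uint16_t probes[] = { 0x0000, 0x1FFF, 0x8000, 0x9FFF, 0xFFFD, 0x4000 };
    for (int i = 0; i < 6; i++)
        printf("%04Xh -> %s\n", probes[i], select_chip(probes[i]));
    return 0;
}
```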

77Systems Architecture, Fifth Edition

Memory Allocation

78Systems Architecture, Fifth Edition

[Figure: memory allocation map – EPROM occupies 0K to 8K−1, RAM occupies 32K to 40K−1, and I/O occupies addresses 65532–65535]

Interface

79Systems Architecture, Fifth Edition

[Figure: interface logic – low-order address lines A0–A12 go to the chips' address inputs, while the high-order lines A13–A15 are decoded to drive the active-low ~CS (chip select) inputs of the EPROM, RAM, and I/O chips]

80Systems Architecture, Fifth Edition

Summary

• How the CPU uses the system bus and device controllers to communicate with secondary storage and input/output devices

• Hardware and software techniques for improving data efficiency, and thus overall computer system performance: bus protocols, interrupt processing, buffering, caching, and compression
