I/O Interface CS510 Computer Architectures Lecture 17 - 1
Lecture 17
I/O Interfaces and I/O Busses

Storage System Issues
• Historical Context of Storage I/O
• Storage I/O Performance Measures
• Secondary and Tertiary Storage Devices
• Processor Interface Issues
• I/O Buses
• Redundant Arrays of Inexpensive Disks (RAID)

Disk Time Example
• Disk Parameters:
– Transfer size is 8K bytes
– Advertised average seek is 12 ms
– Disk spins at 7200 RPM
– Transfer rate is 4 MB/sec
– Controller overhead is 2 ms
• Assume that disk is idle so no queuing delay
• What is Average Disk Access Time for a Sector?
– Ave seek + ave rot delay + transfer time + controller overhead
– 12 ms + 0.5/(7200 RPM/60) + 8 KB/(4 MB/s) + 2 ms
– 12 + 4.17 + 2 + 2 = 20.17 ms, or about 20 ms
• Advertised seek time assumes no locality: with locality, actual seek is typically 1/4 to 1/3 of the advertised seek time
– Average Disk Access Time for a Sector: 20 ms => 12 ms (with a 4 ms seek: 4 + 4.17 + 2 + 2)
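The arithmetic above can be checked with a short sketch. The function name and parameter names are illustrative, and 8K bytes and 4 MB/sec are taken as decimal units (8,000 and 4,000,000) so that the transfer time comes out to the slide's 2 ms; that choice is an assumption.

```python
def disk_access_time_ms(seek_ms, rpm, transfer_bytes, rate_bytes_per_s, controller_ms):
    """Average access time = seek + rotational delay + transfer + controller overhead."""
    rotation_ms = 0.5 * 60_000 / rpm                      # half a revolution on average
    transfer_ms = transfer_bytes / rate_bytes_per_s * 1000
    return seek_ms + rotation_ms + transfer_ms + controller_ms


# Advertised seek (12 ms) versus a seek of 1/3 the advertised value (locality).
advertised = disk_access_time_ms(12, 7200, 8_000, 4_000_000, 2)
with_locality = disk_access_time_ms(12 / 3, 7200, 8_000, 4_000_000, 2)
print(f"{advertised:.1f} ms, {with_locality:.1f} ms")     # 20.2 ms, 12.2 ms
```

Rotational delay and transfer time do not shrink with locality, so the seek reduction from 12 ms to 4 ms accounts for the whole drop from about 20 ms to about 12 ms.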

Processor Interface Issues
• Interconnections
– Busses
• Processor interface
– Interrupts
– Memory mapped I/O
• I/O Control Structures
– Polling
– Interrupts
– DMA
– I/O Controllers
– I/O Processors
• Capacity, Access Time, Bandwidth

I/O Interface
Independent I/O Bus
• Separate I/O instructions (in, out)
[Figure: CPU connects to Memory over the memory bus and to the Peripherals, through their Interfaces, over an independent I/O bus]
Common Memory & I/O bus
• Lines distinguish between I/O and memory transfers
• VME bus, Multibus-II, Nubus
• 40 Mbytes/sec optimistically
• 10 MIPS processor completely saturates the bus!
[Figure: CPU, Memory, and the peripheral Interfaces all share one bus]

Memory Mapped I/O
• Single Memory & I/O Bus
• No Separate I/O Instructions
[Figure: CPU, Memory, and the peripheral Interfaces on a single bus; the address space is divided among ROM, RAM, and I/O]
[Figure: CPU with cache ($) and L2 cache (L2 $) on the Memory Bus; a Memory Bus Adapter connects the Memory Bus to the I/O bus]
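As a rough illustration of "no separate I/O instructions", the toy model below routes ordinary loads and stores to either RAM or device registers based only on the address. The class name, the `IO_BASE` boundary, and the addresses are hypothetical and not taken from the slides.

```python
class MemoryMappedBus:
    """Toy single address space: RAM below IO_BASE, device registers at or above it."""

    IO_BASE = 0xF000  # hypothetical boundary between memory and I/O addresses

    def __init__(self):
        self.ram = {}
        self.device_regs = {}

    def store(self, addr, value):
        # The same store operation serves memory and I/O; only the address differs.
        target = self.device_regs if addr >= self.IO_BASE else self.ram
        target[addr] = value

    def load(self, addr):
        source = self.device_regs if addr >= self.IO_BASE else self.ram
        return source.get(addr, 0)


bus = MemoryMappedBus()
bus.store(0x0100, 42)   # ordinary memory write
bus.store(0xF004, 0x1)  # identical operation, but it lands in a device register
```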

Programmed I/O (Polling)
[Figure: CPU, Memory, IOC, and device]
Flowchart:
– Is data ready? If no, check again
– If yes: read data, then store data
– done? If no, return to "Is data ready?"; if yes, finish
• busy wait loop is not an efficient way to use the CPU, unless the device is very fast!
• but checks for I/O completion can be dispersed among computationally intensive code
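The flowchart can be sketched as a nested loop. `SimulatedDevice` is a hypothetical stand-in that becomes ready after a fixed number of status checks; the counter of wasted checks is there only to make the busy-wait cost visible.

```python
class SimulatedDevice:
    """Hypothetical device: reports ready on every `latency`-th status check."""

    def __init__(self, data, latency=3):
        self.pending = list(data)
        self.latency = latency
        self.countdown = latency

    def data_ready(self):
        self.countdown -= 1
        return self.countdown <= 0

    def read_data(self):
        self.countdown = self.latency
        return self.pending.pop(0)


def polled_read(device, count):
    memory, wasted_checks = [], 0
    while len(memory) < count:              # done?
        while not device.data_ready():      # is data ready? (busy wait)
            wasted_checks += 1
        memory.append(device.read_data())   # read data, store data
    return memory, wasted_checks


memory, wasted = polled_read(SimulatedDevice([10, 20, 30]), 3)
print(memory, wasted)  # [10, 20, 30] 6
```

With a slower device (a larger `latency`), the wasted checks grow while the useful work stays the same, which is the inefficiency the slide points at.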

Interrupt Driven Data Transfer
[Figure: CPU, Memory, IOC, and device; memory holds the user program (add, sub, and, or, nop) and the interrupt service routine (read, store, ..., rti). Arrows: (1) I/O interrupt, (2) save PC, (3) interrupt service addr, (4)]
User program halts only during actual transfer
1000 transfers/second:
– 1000 interrupts @ 2 µsec per interrupt => 2 msec
– 1000 interrupt service @ 98 µsec each => 98 msec
– 100 msec = 0.1 CPU seconds
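A minimal sketch of the overhead calculation, using the per-interrupt and per-service costs in microseconds given on this slide (the function name is illustrative):

```python
def interrupt_cpu_seconds(transfers, interrupt_us, service_us):
    """CPU time when every transfer costs one interrupt plus one service routine."""
    return transfers * (interrupt_us + service_us) / 1_000_000


cpu = interrupt_cpu_seconds(transfers=1000, interrupt_us=2, service_us=98)
print(cpu)  # 0.1
```

At 1000 transfers per second, that is 10% of the CPU spent on I/O handling.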

Direct Memory Access
[Figure: CPU, Memory, DMAC, IOC, and device. Memory map from 0 to n: ROM, RAM, Peripherals, DMAC (Memory Mapped I/O)]
CPU sends a starting address, direction (R/W), and word count to DMAC. Then issues "start".
DMAC provides:
– Handshake signals for the Peripheral controller
– Memory Addresses and Handshake signals for Memory
Time to do 1000 xfers in 1 msec:
– 1 DMA set-up sequence @ 50 µsec
– 1 interrupt @ 2 µsec
– 1 interrupt service sequence @ 48 µsec
– 100 µsec = .0001 second of CPU time
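The set-up described above (starting address, direction, word count, then "start") can be modeled as a small request record handed to a toy DMAC. The names `DmaRequest` and `run_dma` are hypothetical, and reading "R" as device-to-memory is an assumption. The two constants reuse the timing figures from this slide and from the interrupt-driven slide.

```python
from dataclasses import dataclass


@dataclass
class DmaRequest:
    start_address: int
    direction: str   # "R": device to memory, "W": memory to device (assumed meaning)
    word_count: int


def run_dma(request, memory, device_words):
    """Toy DMAC: moves word_count words with no per-word CPU involvement."""
    for i in range(request.word_count):
        addr = request.start_address + i
        if request.direction == "R":
            memory[addr] = device_words[i]
        else:
            device_words.append(memory[addr])
    return "interrupt"   # one completion interrupt for the whole block


DMA_CPU_US = 50 + 2 + 48             # set-up + interrupt + service
INTERRUPT_CPU_US = 1000 * (2 + 98)   # one interrupt and service per transfer

memory = {}
status = run_dma(DmaRequest(0x100, "R", 3), memory, [7, 8, 9])
print(status, INTERRUPT_CPU_US // DMA_CPU_US)  # interrupt 1000
```

Under these figures, DMA cuts the CPU cost of the 1000 transfers by a factor of 1000 relative to per-transfer interrupts.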

Input/Output Processors
[Figure: CPU and IOP share the main memory bus with Mem; the IOP drives the I/O bus to devices D1, D2, . . ., Dn]
(1) CPU issues instruction to IOP
    Format: OP | Device | Address
    Device: target device; Address: where cmnds are
(2) IOP looks in memory for commands
    Format: OP | Addr | Cnt | Other
    OP: what to do; Addr: where to put data; Cnt: how much; Other: special requests
(3) Device to/from memory transfers are controlled by the IOP directly. IOP steals memory cycles from CPU
(4) IOP interrupts CPU when done
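One way to picture the two formats is as a pair of records: the instruction the CPU issues, and the command block the IOP fetches from memory. The sketch below handles only a device-to-memory transfer; the class names, the `iop_run` helper, and the "READ" opcode are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IopInstruction:      # issued by the CPU: OP, Device, Address
    op: str
    device: int            # target device
    command_address: int   # where the commands are in memory


@dataclass
class IopCommand:          # fetched by the IOP: OP, Addr, Cnt, Other
    op: str                # what to do
    addr: int              # where to put data
    count: int             # how much
    other: str = ""        # special requests


def iop_run(instruction, command_memory, devices, main_memory):
    command = command_memory[instruction.command_address]   # (2) look in memory for commands
    words = devices[instruction.device][:command.count]     # (3) device-to-memory transfer
    for offset, word in enumerate(words):
        main_memory[command.addr + offset] = word
    return "interrupt"                                       # (4) interrupt when done


command_memory = {0x10: IopCommand(op="READ", addr=0x200, count=2)}
devices = {1: [5, 6, 7]}
main_memory = {}
result = iop_run(IopInstruction("START", 1, 0x10), command_memory, devices, main_memory)
```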

Relationship to Processor Architecture
• I/O instructions and busses have largely disappeared
• Interrupt vectors have been replaced by jump tables
    PC <- M [ IVA + interrupt number ]
• Interrupts:
– Stack replaced by shadow registers
– Handler saves registers and re-enables higher priority interrupts
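The transfer `PC <- M [ IVA + interrupt number ]` is a table lookup: the new program counter is read from memory at the interrupt vector address plus the interrupt number. A minimal sketch, with a hypothetical table base and handler addresses:

```python
def dispatch(memory, iva, interrupt_number):
    """Return the new PC: the handler address stored at M[IVA + interrupt number]."""
    return memory[iva + interrupt_number]


IVA = 0x40  # hypothetical base address of the table
memory = {IVA + 0: 0x1000, IVA + 1: 0x2000, IVA + 2: 0x3000}  # made-up handler addresses
pc = dispatch(memory, IVA, interrupt_number=2)
print(hex(pc))  # 0x3000
```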

Relationship to Processor Architecture
• Caches required for processor performance cause problems for I/O
– Flushing is expensive, I/O pollutes cache
– Solution is borrowed from shared memory multiprocessors: "snooping"
• Virtual memory frustrates DMA
• Stateful processors are hard to switch context

Interconnect Trends
• Interconnect = glue that interfaces computer system components
• High speed hardware interfaces + logical protocols
• Networks, channels, backplanes

              Network              Channel              Backplane
Distance      >1000 m              10 - 100 m           1 m
Bandwidth     10 - 100 Mb/s        40 - 1000 Mb/s       320 - 1000+ Mb/s
Latency       high (>ms)           medium               low (<µs)
Reliability   low, Extensive CRC   medium, Byte Parity  high, Byte Parity

• Network end: message-based, narrow pathways, distributed arbitration
• Backplane end: memory-mapped, wide pathways, centralized arbitration

Backplane Architectures

Metric                                 VME              FutureBus        MultiBus II      SCSI-I
Bus Width (signals)                    128              96               96               25
Address/data multiplexed?              No               Yes              Yes              na
Data Width                             16 - 32          32               32               8
Xfer Size                              Single/Multiple  Single/Multiple  Single/Multiple  Single/Multiple
# of Bus Masters                       Multiple         Multiple         Multiple         Multiple
Split Transactions                     No               Optional         Optional         Optional
Clocking                               Async            Async            Sync             Either
Bandwidth, Single Word (0 ns mem)      25               37               20               5, 1.5
Bandwidth, Single Word (150 ns mem)    12.9             15.5             10               5, 1.5
Bandwidth, Multiple Word (0 ns mem)    27.9             95.2             40               5, 1.5
Bandwidth, Multiple Word (150 ns mem)  13.6             20.8             13.3             5, 1.5
Max # of devices                       21               20               21               7
Max Bus Length                         .5 m             .5 m             .5 m             25 m
Standard                               IEEE 1014        IEEE 896         ANSI/IEEE 1296   ANSI X3.131

Distinctions begin to blur:
• SCSI channel is like a bus
• FutureBus is like a channel (disconnect/reconnect)
• HIPPI forms links in high speed switching fabrics

Bus-Based Interconnect
• Bus: a shared communication link between subsystems
– Low cost: a single set of wires is shared multiple ways
– Versatility: easy to add new devices; peripherals may even be ported between computers that use a common bus
• Disadvantage
– A communication bottleneck, possibly limiting the maximum I/O throughput
• Bus speed is limited by physical factors
– the bus length
– the number of devices (and, hence, bus loading)
– these physical limits prevent arbitrary bus speedup

Bus-Based Interconnect
• Two generic types of busses:
– I/O busses (sometimes called a channel)
  • lengthy
  • many types of devices connected
  • wide range in the data bandwidth
  • follow a bus standard
– CPU-Memory buses (sometimes called a backplane)
  • short
  • high speed, matched to the memory system to maximize memory-CPU bandwidth
– To lower costs, low cost (older) systems combine the two types into a single bus
• Bus transaction
– Sending address & receiving or sending data

Bus Protocols
[Figure: Master and Slave connected by Control Lines, Address Lines, and Data Lines]
Multibus: 20 address, 16 data, 5 control, 50ns Pause
• Bus Master: has ability to control the bus, initiates a transaction
• Bus Slave: module activated by the transaction
• Bus Communication Protocol: specification of sequence of events and timing requirements in transferring information.
– Asynchronous Bus Transfers: control lines (req., ack.) serve to orchestrate sequencing
– Synchronous Bus Transfers: sequence relative to the common clock

Synchronous Bus Protocols
Read transaction
[Figure: timing diagram with Clock, Address, Data, Read, and Wait signals, marking "begin read" and "Read complete"]
Pipelined/Split transaction Bus Protocol
[Figure: timing diagram with Address (addr 1, addr 2, addr 3), Data (data 0, data 1), and Wait (wait 1, OK 1) signals]

Asynchronous Handshake
Write Transaction
[Figure: timing diagram over t0 to t5 with Address, Data, Read, Req., and Ack. signals; Master Asserts Address, then Next Address; Master Asserts Data]
4 Cycle Handshake
t0: Master has obtained control and asserts address, direction, data; waits a specified amount of time for slaves to decode target
t1: Master asserts request line
t2: Slave asserts ack, indicating data received
t3: Master releases req
t4: Slave releases ack
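The write handshake can be traced step by step with a toy model in which the bus is a dictionary of signal values. This is only a sequential sketch of the t0 to t4 ordering; real hardware overlaps these events with propagation delays, and the function and signal names are illustrative.

```python
def async_write(address, data, slave_memory):
    """Trace a 4-cycle (req/ack) handshake for one write; returns the event log."""
    bus = {"address": None, "data": None, "read": None, "req": 0, "ack": 0}
    log = []

    bus.update(address=address, data=data, read=0)   # t0: direction = write
    log.append("t0: master asserts address, direction, data")

    bus["req"] = 1                                    # t1
    log.append("t1: master asserts req")

    slave_memory[bus["address"]] = bus["data"]        # slave latches the data
    bus["ack"] = 1                                    # t2
    log.append("t2: slave asserts ack (data received)")

    bus["req"] = 0                                    # t3
    log.append("t3: master releases req")

    bus["ack"] = 0                                    # t4
    log.append("t4: slave releases ack")
    return log


slave_memory = {}
for event in async_write(0x20, 99, slave_memory):
    print(event)
```

Each of the four req/ack edges waits for the previous one, which is why no common clock is needed.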

Asynchronous Handshake
Read Transaction
Time Multiplexed Bus: address and data share lines
[Figure: timing diagram over t0 to t5 with Address, Data, Read, Req, and Ack signals; Master Asserts Address, then Next Address]
4 Cycle Handshake
t0: Master has obtained control and asserts address, direction; waits a specified amount of time for slaves to decode
t1: Master asserts request line
t2: Slave asserts ack, indicating ready to transmit data; Slave asserts data
t3: Master releases req, data received
t4: Slave releases ack

Bus Arbitration
BR: Bus Request
BG: Bus Grant
• Parallel (Centralized) Arbitration
[Figure: each master (M) has its own BR and BG lines to a central Bus Arbiter]
• Serial Arbitration (daisy chaining)
[Figure: masters (M) share a BR line to the arbitration unit (A.U.); the BG signal from the A.U. passes from master to master through BGi/BGo]
• Polling
[Figure: masters (M) connect to the arbitration unit (A.U.) through shared BR and A lines]
Bus sequence: BR, BG, Busy, data
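Serial (daisy-chain) arbitration can be sketched as a grant that enters the first master on BGi and is forwarded on BGo until a requesting master keeps it. The function below is a simplified model under that reading; treating index 0 as the master closest to the arbiter is an assumption.

```python
def daisy_chain_grant(requests):
    """requests[i] is True if master i asserts BR; index 0 is closest to the arbiter.

    Returns the index of the master that keeps the grant, or None if nobody asked.
    """
    if not any(requests):        # shared BR line not asserted: no grant is issued
        return None
    for position, wants_bus in enumerate(requests):
        if wants_bus:
            return position      # this master absorbs the grant; BGo stays low
    return None                  # unreachable: some master requested


print(daisy_chain_grant([False, True, True]))  # 1
```

Priority is fixed by position in the chain, so a master far from the arbiter can be starved by busier masters ahead of it.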

Bus Options

Option              High performance                      Low cost
Bus width           Separate address & data lines         Multiplex address & data lines
Data width          Wider is faster (e.g., 32 bits)       Narrower is cheaper (e.g., 8 bits)
Transfer size       Multiple words have less bus          Single-word transfer is simpler
                    overhead
Bus masters         Multiple (requires arbitration)       Single master (no arbitration)
Split transaction?  Yes - separate Request and Reply      No - continuous connection is
                    packets get higher bandwidth          cheaper and has lower latency
                    (needs multiple masters)
Clocking            Synchronous                           Asynchronous

1990 Bus Survey

                  VME        FutureBus   Multibus II     IPI          SCSI
Signals           128        96          96              16           8
Addr/Data mux     no         yes         yes             n/a          n/a
Data width        16 - 32    32          32              16           8
Masters           multi      multi       multi           single       multi
Clocking          Async      Async       Sync            Async        either
MB/s (0ns, word)  25         37          20              25           1.5 (asyn), 5 (sync)
150ns word        12.9       15.5        10              =            =
0ns block         27.9       95.2        40              =            =
150ns block       13.6       20.8        13.3            =            =
Max devices       21         20          21              8            7
Max meters        0.5        0.5         0.5             50           25
Standard          IEEE 1014  IEEE 896.1  ANSI/IEEE 1296  ANSI X3.129  ANSI X3.131

VME
• Three 96-pin connectors
• 128 signals defined as standard, rest customer defined
– 32 address
– 32 data
– 64 command & power/ground lines

SCSI: Small Computer System Interface
• Up to 8 devices communicate on a bus or “string” at sustained speeds of 4-5 MBytes/sec
• SCSI-2 up to 20 MB/sec
• Devices can be slave (“target”) or master (“initiator”)
• SCSI protocol: a series of phases, during which specific actions are taken by the controller and the SCSI disks
– Bus Free: No device is currently accessing the bus
– Arbitration: When the SCSI bus goes free, multiple devices may request (arbitrate for) the bus; fixed priority by address
– Selection: informs the target that it will participate (Reselection if disconnected)
– Command: the initiator reads the SCSI command bytes from host memory and sends them to the target
– Data Transfer: data in or out, between initiator and target
– Message Phase: message in or out, between initiator and target (identify, save/restore data pointer, disconnect, command complete)
– Status Phase: from the target, just before command complete

SCSI “Bus”: Channel Architecture
• peer-to-peer protocols
• initiator/target
• linear byte streams
• disconnect/reconnect
Phase sequence for a command:
Command Setup
– Arbitration
– Selection
– Message Out (Identify)
– Command
Disconnect to seek/fill buffer (skipped if no disconnect is needed)
– Message In (Disconnect)
– - - Bus Free - -
– Arbitration
– Reselection
– Message In (Identify)
Data Transfer
– Data In
Disconnect to fill buffer (skipped if no disconnect is needed)
– Message In (Save Data Ptr)
– Message In (Disconnect)
– - - Bus Free - -
– Arbitration
– Reselection
– Message In (Identify)
– Message In (Restore Data Ptr)
Command Completion
– Status
– Message In (Command Complete)
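The phase sequence for a read can be generated with a small helper. This is a simplification that models only the first disconnect (to seek or fill the buffer) and leaves out the mid-transfer disconnect with its save/restore data pointer messages; the function and constant names are hypothetical.

```python
RECONNECT = ["Bus Free", "Arbitration", "Reselection", "Message In (Identify)"]


def scsi_read_phases(disconnect_to_seek=True):
    """List the bus phases for one read command, with or without a disconnect."""
    phases = ["Arbitration", "Selection", "Message Out (Identify)", "Command"]
    if disconnect_to_seek:
        # Target releases the bus while it seeks, then reselects the initiator.
        phases += ["Message In (Disconnect)"] + RECONNECT
    phases += ["Data In", "Status", "Message In (Command Complete)"]
    return phases


for phase in scsi_read_phases():
    print(phase)
```

The disconnect adds five phases of bus traffic but frees the bus for other devices during the seek.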

1993 I/O Bus Survey

Bus                 SBus      TurboChannel  MicroChannel   PCI
Originator          Sun       DEC           IBM            Intel
Clock Rate (MHz)    16-25     12.5-25       async          33
Addressing          Virtual   Physical      Physical       Physical
Data Sizes (bits)   8,16,32   8,16,24,32    8,16,24,32,64  8,16,24,32,64
Master              Multi     Single        Multi          Multi
Arbitration         Central   Central       Central        Central
32 bit read (MB/s)  33        25            20             33
Peak (MB/s)         89        84            75             111 (222)
Max Power (W)       16        26            13             25

1993 MP Server Memory Bus Survey

Bus                 Summit     Challenge   XDBus
Originator          HP         SGI         Sun
Clock Rate (MHz)    60         48          66
Split transaction?  Yes        Yes         Yes?
Address lines       48         40          ??
Data lines          128        256         144 (parity)
Data Sizes (bits)   512        1024        512
Clocks/transfer     4          5           4?
Peak (MB/s)         960        1200        1056
Master              Multi      Multi       Multi
Arbitration         Central    Central     Central
Addressing          Physical   Physical    Physical
Slots               16         9           10
Busses/system       1          1           2
Length              13 inches  12? inches  17 inches
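The peak figures appear to follow from the other rows as clock rate times bytes per transfer divided by clocks per transfer. That relationship is an inference from the numbers, not something the survey states. The sketch below reproduces 960 and 1056 MB/s exactly and gives about 1229 MB/s for Challenge against the listed 1200.

```python
def peak_mb_per_s(clock_mhz, transfer_bits, clocks_per_transfer):
    """Peak bandwidth in MB/s, assuming one full-size transfer every N clocks."""
    return clock_mhz * (transfer_bits / 8) / clocks_per_transfer


# (clock rate in MHz, data size in bits, clocks per transfer) from the table
buses = {"Summit": (60, 512, 4), "Challenge": (48, 1024, 5), "XDBus": (66, 512, 4)}
peaks = {name: peak_mb_per_s(*params) for name, params in buses.items()}
print(peaks)
```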

Communications Networks
Performance limiter is memory system, OS overhead, not protocols
• Send/receive queues in processor memories
• Network controller copies back and forth via DMA
• No host intervention needed
• Interrupt host when message sent or received
[Figure: a Node's Processor and Processor Memory connect over the Peripheral Backplane Bus to a Network Controller containing Control Reg. I/F, Request Block, Receive Block, Memory, and a Net I/F to the Media. Processor Memory holds a list of request blocks (data to be transmitted), a list of receive blocks (data received), and a list of free blocks; the controller moves the data by DMA.]

I/O Controller Architecture
• Request/Response block interface
• Backdoor access to host memory
[Figure: Host Processor with Processor Cache and Host Memory sit on the Peripheral Bus (VME, FutureBus, etc.). The I/O Controller attached to that bus contains a Peripheral Bus Interface/DMA, Buffer Memory, ROM, a µProc, and an I/O Channel Interface.]

I/O Data Flow
Impediment to high performance: multiple copies, complex hierarchy

Host Processor:       Application Address Space
                        | Memory-to-Memory Copy
                      OS Buffers (>10 MByte)
                        | DMA over Peripheral Bus
I/O Controller:       HBA Buffers (1 M - 4 MBytes)
                        | Xfer over Disk Channel
Embedded Controller:  Track Buffers (32K - 256 KBytes)
                        | Xfer over Serial Interface
I/O Device:           Head/Disk Assembly