Multi-Processor Systems
Roberto Airoldi, Department of Computer Systems, Tampere University of Technology
TG 307
Lecture outline
Why are multiprocessor systems needed?
Basics of MP design strategies
– Memory hierarchy
– Homogeneous or heterogeneous
– Communication infrastructure
– …
A few words on the research work of our research group
Multi-Processor Systems 2
Uniprocessor Performance (SPECint)
[Figure: performance relative to the VAX-11/780 (log scale, 1 to 10,000) versus year, 1978 to 2006; growth annotated as 25%/year, then 52%/year, then ??%/year]
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006
Other Factors ⇒ Multiprocessors
Growth in data-intensive applications
– Databases, file servers, …
Growing interest in servers and server performance
Increasing desktop performance less important
– Outside of graphics
Improved understanding of how to use multiprocessors effectively
– Especially in servers, where there is significant natural TLP
Advantage of leveraging design investment by replication
– Rather than unique design
Flynn’s Taxonomy
Flynn classified computers by their data and control (instruction) streams in 1966
SIMD ⇒ Data-Level Parallelism
MIMD ⇒ Thread-Level Parallelism
MIMD popular because
– Flexible: runs either N independent programs or 1 multithreaded program
– Cost-effective: same MPU in desktop & MIMD
Single Instruction, Single Data (SISD): uniprocessor
Single Instruction, Multiple Data (SIMD): single PC; vector, CM-2
Multiple Instruction, Single Data (MISD): (????)
Multiple Instruction, Multiple Data (MIMD): clusters, SMP servers
M. J. Flynn, “Very High-Speed Computing Systems,” Proc. of the IEEE, vol. 54, no. 12, pp. 1901-1909, Dec. 1966.
Back to Basics
“A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast.”
Parallel Architecture = Computer Architecture + Communication Architecture
2 classes of multiprocessors with respect to memory:
– Centralized-memory multiprocessor
• < a few dozen processor chips (and < 100 cores) in 2006
• Small enough to share a single, centralized memory
– Physically distributed-memory multiprocessor
• Larger number of chips and cores
• BW demands ⇒ memory distributed among processors
Centralized vs. Distributed Memory
[Figure: Centralized Memory vs. Distributed Memory organizations, each built from processors P1 … Pn with caches ($), memory modules (Mem), and an interconnection network; axis label: Scale]
Centralized Memory Multiprocessor
Also called symmetric multiprocessors (SMPs) because the single main memory has a symmetric relationship to all processors
Large caches ⇒ a single memory can satisfy the memory demands of a small number of processors
Can scale to a few dozen processors by using a switch and many memory banks
Although scaling beyond that is technically conceivable, it becomes less attractive as the number of processors sharing the centralized memory increases
Distributed Memory Multiprocessor
Pro: cost-effective way to scale memory bandwidth
– If most accesses are to local memory
Pro: reduces latency of local memory accesses
Con: communicating data between processors is more complex
Con: software must change to take advantage of the increased memory BW
2 Models for Communication and Memory Architecture
Communication occurs by explicitly passing messages among the processors:
– message-passing multiprocessors
Communication occurs through a shared address space (via loads and stores): shared-memory multiprocessors, either
– UMA (Uniform Memory Access time) for shared-address, centralized-memory MP
– NUMA (Non-Uniform Memory Access time) for shared-address, distributed-memory MP
In the past, there was confusion over whether “sharing” means sharing physical memory (symmetric MP) or sharing the address space
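The two communication models can be contrasted in a small Python sketch. This is an illustration under assumptions, not a model of real MP hardware: threads stand in for processors, and all function names are hypothetical. Because Python threads always share one address space, the message-passing version only imitates the discipline. Workers never touch shared variables and talk only through an explicit queue. The shared-memory version updates one location with ordinary loads and stores, and the programmer must add the synchronization.

```python
import queue
import threading

def message_passing_sum(chunks):
    """Workers send partial sums to a collector as explicit messages."""
    mailbox = queue.Queue()  # the only channel between workers and collector

    def worker(chunk):
        mailbox.put(sum(chunk))  # explicit send

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)  # explicit receives

def shared_memory_sum(chunks):
    """Workers update one shared variable with ordinary loads and stores."""
    total = [0]               # a location visible to every thread
    lock = threading.Lock()   # synchronization is the programmer's job

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total[0] += partial  # load, add, store on shared data

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

chunks = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
print(message_passing_sum(chunks), shared_memory_sum(chunks))  # 499500 499500
```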
Challenges of Parallel Processing
The first challenge is the fraction of the program that is inherently sequential.
Suppose we want an 80X speedup from 100 processors. What fraction of the original program can be sequential?
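$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
$$80 = \frac{1}{(1 - \text{Fraction}_{\text{parallel}}) + \dfrac{\text{Fraction}_{\text{parallel}}}{100}}$$
$$80 \times \left[(1 - \text{Fraction}_{\text{parallel}}) + 0.01 \times \text{Fraction}_{\text{parallel}}\right] = 1$$
$$79 = 80 \times \text{Fraction}_{\text{parallel}} - 0.8 \times \text{Fraction}_{\text{parallel}}$$
$$\text{Fraction}_{\text{parallel}} = 79/79.2 = 99.75\%$$
So at most about 0.25% of the original program can be sequential.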
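The same calculation can be done in closed form. Solving Amdahl's law for the sequential fraction s gives s = (p / S − 1) / (p − 1), where p is the processor count and S is the target speedup. The helper names below are illustrative. The second function plugs the answer back into Amdahl's law as a check.

```python
def max_sequential_fraction(target_speedup, processors):
    """Largest sequential fraction that still reaches target_speedup.

    From Amdahl's law: target = 1 / (s + (1 - s) / p)
    which rearranges to  s = (p / target - 1) / (p - 1).
    """
    if processors < 2:
        raise ValueError("need at least 2 processors")
    if not 1 <= target_speedup <= processors:
        raise ValueError("target speedup must lie between 1 and the processor count")
    return (processors / target_speedup - 1) / (processors - 1)

def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: overall speedup when only parallel_fraction is sped up."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / processors)

seq = max_sequential_fraction(80, 100)
print(f"max sequential fraction: {seq:.4%}")
print(f"check: speedup = {amdahl_speedup(1 - seq, 100):.2f}")
```

The exact value is 0.25/99 ≈ 0.2525%, which matches the 99.75% parallel fraction above after rounding.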
Challenges of Parallel Processing
The second challenge is the long latency to remote memory.
Suppose a 32-CPU MP at 2 GHz with 200 ns remote memory access, where all local accesses hit in the memory hierarchy and the base CPI is 0.5. (At 2 GHz the clock cycle is 0.5 ns, so a remote access costs 200 ns / 0.5 ns = 400 clock cycles.)
What is the performance impact if 0.2% of instructions involve a remote access?
CPI = Base CPI + Remote request rate × Remote request cost
CPI = 0.5 + 0.2% × 400 = 0.5 + 0.8 = 1.3
With no communication, the machine is 1.3/0.5 = 2.6X faster than when 0.2% of instructions involve a remote access.
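The CPI formula above can be written as a small function. The name and parameters are hypothetical. The remote latency is converted from nanoseconds to clock cycles, then weighted by how often instructions go remote.

```python
def effective_cpi(base_cpi, remote_rate, remote_latency_ns, clock_ghz):
    """CPI = base CPI + remote request rate x remote request cost (in cycles)."""
    cycle_time_ns = 1.0 / clock_ghz                      # 2 GHz -> 0.5 ns per cycle
    remote_cost_cycles = remote_latency_ns / cycle_time_ns
    return base_cpi + remote_rate * remote_cost_cycles

cpi = effective_cpi(base_cpi=0.5, remote_rate=0.002, remote_latency_ns=200, clock_ghz=2.0)
slowdown = cpi / 0.5
print(f"CPI = {cpi:.2f}, slowdown = {slowdown:.1f}X")
```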
Challenges of Parallel Processing
Application parallelism ⇒ addressed primarily via new algorithms that have better parallel performance
Long remote latency impact ⇒ addressed both by the architect and by the programmer
For example, reduce the frequency of remote accesses either by
– Caching shared data (HW)
– Restructuring the data layout to make more accesses local (SW)
Fundamental MP design decisions
We have already discussed:
– Shared memory versus message passing?
Other decisions:
– Homogeneous versus heterogeneous?
– Bus versus network?
– Generic versus application-specific?
– What types of parallelism to support?
– Focus on performance, power, or cost?
– Memory organization?
Homogeneous or Heterogeneous
Homogeneous:
– Replication effect
– Memory dominated anyway
– Solve realization issues once and for all
– Less flexible
Heterogeneous:
– Better fit to application domain
– Smaller increments
Hybrid approaches
Middle-of-the-road approach:
– Flexible tiles
– Fixed tile structure at the top level
Shared vs. Switched Media
Shared media: nodes share a single interconnection medium (e.g., Ethernet, bus)
– Only one message sent at a time
– Inexpensive: one medium used by all processors
– Limited bandwidth: the medium becomes the bottleneck
– Needs arbitration
Switched media: allow direct communication between source and destination nodes (e.g., ATM)
– Multiple users at a time
– More expensive: need to replicate the medium
– Higher total bandwidth
– No arbitration
– Added latency to go through the switch
– Claimed to be more scalable
• but router overhead
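Shared media "need arbitration" because only one sender may drive the medium at a time. The slides do not name an arbitration policy. Round-robin is one common choice, and the sketch below assumes it. The class name and interface are hypothetical. Each cycle, priority rotates to just after the last winner, so a requester that keeps asking cannot be starved.

```python
class RoundRobinArbiter:
    """Grants a shared medium (e.g. a bus) to one requester per cycle.

    The requester just granted becomes the lowest priority next time,
    so every active requester is served within n cycles.
    """

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so requester 0 has the highest priority initially

    def grant(self, requests):
        """requests: sequence of n booleans. Returns the granted index or None."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None  # nobody requested: the bus is idle this cycle

arb = RoundRobinArbiter(4)
all_requesting = [True, True, True, True]
print([arb.grant(all_requesting) for _ in range(6)])  # [0, 1, 2, 3, 0, 1]
```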
Which types of parallelism to support?
[Figure: types of parallelism (ILP, DLP, program/thread-level parallelism (TLP)) shown against the kernel, module, and application levels]
Heterogeneity
[Figure: degree of heterogeneity (homogeneous, heterogeneous, very heterogeneous) at the kernel, module, and application levels]
Example MP supporting DLP and ILP: IMAGINE Stream Processor
Focus
Low cost? Low power? High performance? It depends on the platform class.
Intrinsic computational efficiency of silicon
[Figure: computational efficiency (MOPS/W, log scale 10^0 to 10^6) versus feature size (2 µm down to 0.07 µm), comparing the intrinsic computational efficiency of silicon with the computational efficiency of microprocessors (i386SX, i486DX, P5, 68040, microSPARC, SuperSPARC, TurboSPARC, UltraSPARC, 601, 604, 604e, 7400, P6, 21164a, 21364, Itanium2, XScale, C5510, C6411, Xtensa). Ref: Roza]
Problem: Energy & Performance
Memories
– Bandwidth
• Slow (memory wall)
– Energy
• Expensive
• RISC: 27% of power in the instruction cache
– StrongARM SA110: A 160MHz 32b 0.5W CMOS ARM processor
• VLIW: 30% of power in the instruction cache
– C6x, 0.15 µm, 250 MHz, Texas Instruments Inc.
– Lx, HP & STMicroelectronics, L. Benini et al. (ISLPED)
Wiring hierarchy
How far can signal reach in one local clock cycle (local frequency) ?
[Figure: line length (mm, 0.0 to 3.0) reachable in one local clock cycle versus technology node (40 to 100), plotted for local, intermediate, and global lines]
INTERCONNECTION NETWORKS
Network design parameters
Large network design space:
– Topology, degree
– Routing algorithm
• Path, path control, collision resolution, network support, deadlock handling, livelock handling
– Virtual layer support
– Flow control
– Buffering
– QoS guarantees
– Error handling
– etc.
Network-on-Chip
Some common characteristics of NoC
– More than a single shared medium (like a bus)
• A true network
– Provides a point-to-point connection between any two hosts connected to the network
• Real point-to-point by a crossbar switch (or equivalent)
• Virtual point-to-point connection
– Higher aggregate bandwidth due to parallelism
• Many concurrent packet flows or parallel circuit paths
– Clearly separates communication from computation
– A layered approach is used for the network implementation
– A practical implication of the above points is that the network contains storage elements for the data
• Implicit pipelining
Additional reading: J. Nurmi, “Network-on-Chip – A New Paradigm for System-on-Chip Design,” in Proc. International Symposium on System-on-Chip, Nov. 2005, pp. 2-6.
Protocol layers
Circuit-switching and packet-switching
Circuit-switching
– A connection has to be established between the communicating parties before the transfer can start
– The line is reserved for the whole time, independently of the amount of traffic (just as a phone line is reserved whether you are silent or talking at your maximum rate)
– The connection is explicitly closed when the communication is no longer needed
Packet-switching
– The data is packed together with information about the destination address (like a letter or parcel in the mail)
– Separate packets or sub-packets (flits or phits) can be sent along different routes (if the network allows that)
– Other senders can also inject packets into the network that travel along (partially) the same route
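Packet switching can be sketched as splitting a message into flits. The packet format here is hypothetical: a separate head flit carries the destination address, and body and tail flits carry payload only. Real NoCs differ in flit width and header layout. The 4-byte flit size is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Flit:
    kind: str             # "head", "body" or "tail"
    dest: Optional[int]   # only the head flit carries the routing information
    payload: bytes

def packetize(data: bytes, dest: int, flit_bytes: int = 4) -> List[Flit]:
    """Split a message into one packet of flits (hypothetical format)."""
    chunks = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)] or [b""]
    flits = [Flit("head", dest, b"")]  # header: tells each switch where to go
    for i, chunk in enumerate(chunks):
        kind = "tail" if i == len(chunks) - 1 else "body"
        flits.append(Flit(kind, None, chunk))  # payload follows the head's path
    return flits

def reassemble(flits: List[Flit]) -> bytes:
    """Receiver side: concatenate payloads in arrival order."""
    return b"".join(f.payload for f in flits)

pkt = packetize(b"hello, NoC!", dest=5)
print([(f.kind, f.dest, f.payload) for f in pkt])
```

Only the head flit needs a routing decision. The body and tail flits follow the path it set up, which makes flit-level forwarding cheap.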
Nodes, switches, resources
A node or a switch
– Routes the packet along the network
– Often includes some buffering (FIFOs)
– The name 'switch' refers to a crossbar switch connecting any of the inputs to any of the outputs
– Different kinds of switch architectures can be used in NoC
– Often also connects the RESOURCES to the network
A resource
– Processor
– Memory
– Fixed-function or reconfigurable block
Node characteristics – examples
3 x 3 switch
– This size of crossbar is suitable for connecting a resource to a bidirectional ring
5 x 5 switch
– This is the typical size for mesh networks: four nearest neighbors + the local resource
16 x 16 switch
– From this size upwards, the switches start to absorb the whole network operation into a single centralized switch
– The switch is the network!
Network Topology
Switched media have a topology that indicates how nodes are connected
Topology determines
– Degree: number of links from a node
– Diameter: maximum number of links crossed between any two nodes (along a shortest path)
– Average distance: number of links to a random destination
– Bisection: minimum number of links that separate the network into two halves
– Bisection bandwidth: link bandwidth × bisection
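Degree, diameter, and average distance can be measured directly from a topology's adjacency list. Breadth-first search from every node gives the shortest hop counts. This sketch assumes unit-cost links and averages over the N − 1 other nodes. Bisection is omitted because finding the minimum cut is much harder in general. The function names are illustrative.

```python
from collections import deque

def ring(n):
    """Bidirectional ring: node i links to i-1 and i+1 (mod n)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def mesh(k):
    """k x k 2D mesh: each node links to its up/down/left/right neighbors."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(x, y): [(x + dx, y + dy) for dx, dy in steps
                     if 0 <= x + dx < k and 0 <= y + dy < k]
            for x in range(k) for y in range(k)}

def hop_counts(adj, src):
    """Breadth-first search: shortest hop count from src to every node."""
    dist = {src: 0}
    frontier = deque([src])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

def topology_metrics(adj):
    degree = max(len(links) for links in adj.values())
    diameter, total, pairs = 0, 0, 0
    for s in adj:
        for d, hops in hop_counts(adj, s).items():
            if d != s:
                diameter = max(diameter, hops)
                total += hops
                pairs += 1
    return {"degree": degree, "diameter": diameter, "avg_distance": total / pairs}

print(topology_metrics(ring(16)))  # diameter 8 = N/2
print(topology_metrics(mesh(4)))   # diameter 6 = 2(sqrt(N) - 1)
```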
Common Topologies
| Type | Degree | Diameter | Ave Dist | Bisection |
|---|---|---|---|---|
| 1D mesh | 2 | N-1 | N/3 | 1 |
| 2D mesh | 4 | 2(N^(1/2) - 1) | 2N^(1/2)/3 | N^(1/2) |
| 3D mesh | 6 | 3(N^(1/3) - 1) | 3N^(1/3)/3 | N^(2/3) |
| nD mesh | 2n | n(N^(1/n) - 1) | nN^(1/n)/3 | N^((n-1)/n) |
| Ring | 2 | N/2 | N/4 | 2 |
| 2D torus | 4 | N^(1/2) | N^(1/2)/2 | 2N^(1/2) |
| Hypercube | log2 N | n = log2 N | n/2 | N/2 |
| 2D tree | 3 | 2 log2 N | ~2 log2 N | 1 |
| Crossbar | N-1 | 1 | 1 | N^2/2 |

N = number of nodes, n = dimension
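Several rows of the Common Topologies table can be evaluated for a concrete N, using bisection bandwidth = link bandwidth × bisection. The formulas are copied from the table. The link bandwidth of 32 Gbit/s is a hypothetical value chosen only for illustration.

```python
import math

# Closed-form rows of the "Common Topologies" table for N nodes.
TABLE = {
    "ring":      {"degree": lambda N: 2, "diameter": lambda N: N / 2,
                  "bisection": lambda N: 2},
    "2D mesh":   {"degree": lambda N: 4, "diameter": lambda N: 2 * (math.sqrt(N) - 1),
                  "bisection": lambda N: math.sqrt(N)},
    "2D torus":  {"degree": lambda N: 4, "diameter": lambda N: math.sqrt(N),
                  "bisection": lambda N: 2 * math.sqrt(N)},
    "hypercube": {"degree": lambda N: math.log2(N), "diameter": lambda N: math.log2(N),
                  "bisection": lambda N: N / 2},
}

def bisection_bandwidth(topology, n_nodes, link_bw):
    """Bisection bandwidth = link bandwidth x bisection (number of cut links)."""
    return link_bw * TABLE[topology]["bisection"](n_nodes)

for name, row in TABLE.items():
    print(f"{name:9s} degree={row['degree'](64):g} diameter={row['diameter'](64):g} "
          f"bisection BW={bisection_bandwidth(name, 64, link_bw=32):g} Gbit/s")
```

For 64 nodes, the hypercube buys its 16X bisection advantage over the ring with a node degree of 6 instead of 2. That is the kind of trade-off the table summarizes.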
Network topologies - Mesh
Two-dimensional mesh – very intuitive, indeed
[Figure: two-dimensional mesh of nodes connected by links, with a resource attached to each node]
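A common deterministic routing scheme for this mesh is XY (dimension-order) routing. The slides do not name it, so treat this as an assumed example. A packet first travels along X until it reaches the destination column, then along Y. The path depends only on source and destination, and its length equals the Manhattan distance.

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing on a 2D mesh.

    Nodes are (x, y) coordinates. Move along X until the column matches,
    then along Y. The path is fixed by src and dst alone (deterministic).
    """
    x, y = src
    dx, dy = dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

route = xy_route((0, 0), (2, 3))
print(route)  # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
print(len(route) - 1, "hops")  # 5 hops
```

Corner to corner in a 4 x 4 mesh takes 6 hops, matching the 2(N^(1/2) - 1) diameter in the topology table.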
Network topologies - Torus
Torus and folded torus extend the mesh with wraparound links; the folded torus has links twice as long as those of the mesh!
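The mesh, torus, and folded torus can be compared in a short sketch. Wraparound gives every torus node four neighbors, while mesh edge and corner nodes have fewer. Folding, assumed here to interleave the nodes of each row (even k only), keeps every link at most two node pitches long. A naive torus instead has a wraparound wire spanning the whole row. Function names are hypothetical.

```python
def neighbors(x, y, k, torus=False):
    """Neighbors of node (x, y) in a k x k mesh or torus."""
    result = []
    for ddx, ddy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + ddx, y + ddy
        if torus:
            result.append((nx % k, ny % k))    # wraparound link
        elif 0 <= nx < k and 0 <= ny < k:
            result.append((nx, ny))            # mesh: no link past the edge
    return result

def folded_position(i, k):
    """Physical slot of logical node i when a row of k nodes (k even) is folded."""
    if k % 2:
        raise ValueError("this folding sketch assumes an even k")
    return 2 * i if i < k // 2 else 2 * (k - 1 - i) + 1

def max_link_length(k, folded=True):
    """Longest link, in node pitches, of one torus row of k nodes."""
    pos = (lambda i: folded_position(i, k)) if folded else (lambda i: i)
    return max(abs(pos(i) - pos((i + 1) % k)) for i in range(k))

print(neighbors(0, 0, 4))              # mesh corner: 2 neighbors
print(neighbors(0, 0, 4, torus=True))  # same node in a torus: 4 neighbors
print(max_link_length(8, folded=False), max_link_length(8))  # 7 vs 2
```

Every mesh link is one pitch long, so the folded torus's 2-pitch links are the 2x length mentioned on the slide.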
Network topologies - Ring
Ring is the simplest of the networks
– Unidirectional or bidirectional
– Does not correspond to the physical placement of resources!
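The difference between a unidirectional and a bidirectional ring shows up in routing. On a bidirectional ring a packet can take the shorter way round, so the worst case is N/2 hops. A unidirectional ring can need N − 1 hops. The function name and the "cw"/"ccw" labels are illustrative.

```python
def ring_route(src, dst, n, bidirectional=True):
    """Return (direction, hops) from src to dst on an n-node ring."""
    cw = (dst - src) % n           # hops going clockwise
    if not bidirectional:
        return "cw", cw            # unidirectional ring: only one way round
    ccw = (src - dst) % n          # hops going counter-clockwise
    return ("cw", cw) if cw <= ccw else ("ccw", ccw)

print(ring_route(1, 7, 8))                       # ('ccw', 2)
print(ring_route(1, 7, 8, bidirectional=False))  # ('cw', 6)
```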
Network topologies – Octagon and Spidergon
Extend the ring by introducing shortcuts
– Octagon: 8 nodes with 4 diagonals
– Spidergon: N nodes, with a diagonal from every node to the opposite node
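The effect of these shortcuts can be measured with breadth-first search. In this sketch each node is assumed to link to its two ring neighbors plus the node directly opposite (i + N/2), so every node has degree 3. With N = 8 this is the Octagon arrangement. The diagonals roughly halve the ring's diameter.

```python
from collections import deque

def spidergon(n):
    """Ring of n nodes (n even) plus a diagonal from each node to node i + n/2."""
    assert n % 2 == 0, "this sketch needs an even node count"
    return {i: [(i + 1) % n, (i - 1) % n, (i + n // 2) % n] for i in range(n)}

def plain_ring(n):
    return {i: [(i + 1) % n, (i - 1) % n] for i in range(n)}

def diameter(adj):
    """Largest shortest-path hop count over all node pairs (BFS from each node)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        worst = max(worst, max(dist.values()))
    return worst

for n in (8, 16, 32):
    print(n, "nodes: ring diameter", diameter(plain_ring(n)),
          "| with diagonals", diameter(spidergon(n)))
```

The cost is one extra port per switch (3 x 3 becomes 4 x 4 including the resource) and longer diagonal wires.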
Routing schemes
Deterministic vs. adaptive
– Deterministic routing is fixed (by a routing table or by information in the packet)
– Adaptive routing can adapt to, e.g., network congestion
Packet handling property
– Store-and-forward
• The whole packet is always stored in a node before it is forwarded
– Virtual cut-through
• Forwarding may start before the whole packet has arrived, if there is enough room in the next node
– Wormhole routing
• Each flit is routed onward immediately, without waiting for the rest of the packet
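A first-order, zero-load latency model shows why the packet handling property matters. This sketch assumes no contention, a fixed per-hop router delay, and no separate header serialization or wire delay. All numbers are hypothetical. Store-and-forward pays the full packet serialization time at every hop. With cut-through and wormhole, the packet streams behind the header, so serialization is paid once. These two schemes only differ when the packet blocks.

```python
def store_and_forward_latency(hops, packet_bits, link_bw, router_delay):
    """Every node receives the whole packet before forwarding it."""
    serialization = packet_bits / link_bw          # cycles to push the packet over one link
    return hops * (router_delay + serialization)

def cut_through_latency(hops, packet_bits, link_bw, router_delay):
    """Virtual cut-through or wormhole with no contention: the header pays the
    per-hop routing delay and the body streams behind it, so the packet's
    serialization time is paid only once."""
    return hops * router_delay + packet_bits / link_bw

# Hypothetical: 5 hops, 512-bit packet, 32 bits/cycle links, 2-cycle routers.
args = dict(hops=5, packet_bits=512, link_bw=32, router_delay=2)
print("store-and-forward:", store_and_forward_latency(**args), "cycles")  # 90.0
print("cut-through/wormhole:", cut_through_latency(**args), "cycles")    # 26.0
```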
Network performance
[Figure: throughput and latency (0 to 160) versus network load (10% to 90%)]
Throughput
– Throughput per input
– Aggregate throughput
– Bisection throughput
Latency
– Time from data injection to readout completion
Throughput and latency are inter-related, and both are functions of the network load!
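The slides do not give a formula for the dependence on load. A single M/M/1-style queue is one simple stand-in for the curve's shape: latency grows as 1/(1 − load) and explodes near saturation. This sketch uses that simplified queueing model, not a measured NoC characteristic. The 20-cycle zero-load latency is hypothetical.

```python
def latency_vs_load(zero_load_latency, load):
    """Mean latency of a single M/M/1-style queue at a given utilization."""
    if not 0 <= load < 1:
        raise ValueError("load must be in [0, 1); at 1 the queue saturates")
    return zero_load_latency / (1 - load)

for pct in (10, 30, 50, 70, 90):
    print(f"load {pct:2d}%: latency = {latency_vs_load(20, pct / 100):6.1f} cycles")
```

Doubling the load from 30% to 60% more than doubles the queueing delay. This is why networks are usually operated well below saturation.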
Quality-of-Service (QoS)
Normally, packet-switched networks provide best-effort throughput to all traffic
– “Your mileage may vary...”
For some services, guaranteed service is needed
– Predictable delays
– Guaranteed throughput
This can be handled by
– A circuit-switched channel
– Time-multiplexing (giving a certain number of time slots to the priority service)
– It can be approximated by assigning packet priorities (may lead to starvation of other services)
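Time-multiplexing can be sketched with a repeating slot table. Each guaranteed-service flow owns some slots, and a flow with k of T slots is guaranteed k/T of the link bandwidth. Unreserved slots (None) are left for best-effort traffic. Flow names, table size, and the 1000 Mbit/s link are hypothetical.

```python
def build_slot_table(reservations, table_size):
    """Assign time slots of a repeating TDMA table to guaranteed-service flows.

    reservations: dict mapping flow -> number of slots requested.
    Slots left over (None) carry best-effort traffic.
    """
    if sum(reservations.values()) > table_size:
        raise ValueError("over-subscribed: not enough slots for all guarantees")
    table = [None] * table_size
    slot = 0
    for flow, count in reservations.items():
        for _ in range(count):
            table[slot] = flow
            slot += 1
    return table

def guaranteed_bandwidth(table, flow, link_bw):
    """A flow gets (its slots / table size) of the link, every table period."""
    return link_bw * table.count(flow) / len(table)

table = build_slot_table({"video": 4, "audio": 1}, table_size=8)
print(table)
print(guaranteed_bandwidth(table, "video", link_bw=1000), "Mbit/s for video")
```

Slots are handed out contiguously here for simplicity. Spreading a flow's slots evenly over the table would keep its throughput guarantee the same but shorten its worst-case wait for the next slot.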
Fault and error tolerance
More permanent manufacturing defects in large, complex circuits
More soft errors due to lower error margins
More dynamic transient faults due to crosstalk and other on-chip noise
For error tolerance, error detection and correction methods from telecommunications can be used to some extent
– Must not cause too much overhead in area and power
– Resending or error correction codes???
Fault tolerance may cause too much overhead as well
– The network provides natural redundancy for fixing faults
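Forward error correction can be illustrated with the classic Hamming(7,4) code. It is chosen here only as a small example, not as a scheme the slides prescribe. Three check bits protect four data bits, and the receiver fixes any single flipped bit without asking for a resend. The area and power cost of the extra bits is exactly the overhead the slide warns about.

```python
def hamming74_encode(d):
    """d: four data bits [d1, d2, d3, d4] -> 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # syndrome = 1-based position of the bad bit
    if pos:
        c[pos - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], pos

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                      # a transient fault flips bit position 5
print(hamming74_decode(word))     # ([1, 0, 1, 1], 5)
```

The alternative mentioned on the slide, resending, needs only a cheaper detection code. It trades that saving for retransmission latency and buffering at the sender.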
Synchronous, asynchronous, GALS
A single-clock SoC is not practical
A single-clock NoC (the communication part) may be...
However, synchronization between clock domains is needed anyway
GALS (Globally Asynchronous, Locally Synchronous) is a natural mode of operation for large NoCs with many connected clock domains
– Synchronization is carried out between the network and the synchronous resource
– A place for trade-offs: where to place the asynchronous/synchronous (A/S) border???
Asynchronous functional blocks are rare, but the network can be fully asynchronous (with synchronizing wrappers)
Our Research group
Prof. Jari Nurmi leads the group
Research areas:
Computer Architecture:
– Networks-on-Chip
– Multi-Processor architecture design
– Reconfigurable hardware
Implementation of signal processing for Communications
Navigation and Positioning:
– GNSS receivers
– Indoor positioning