Parallel Computer Architectures Chapter 8

Post on 20-Dec-2015

Page 1: Parallel Computer Architectures Chapter 8. Parallel Computer Architectures (a) On-chip parallelism. (b) A coprocessor. (c) A multiprocessor. (d) A multicomputer

Parallel Computer Architectures

Chapter 8

Parallel Computer Architectures

(a) On-chip parallelism. (b) A coprocessor. (c) A multiprocessor. (d) A multicomputer. (e) A grid.

Parallelism

a) Introduced at various levels

b) Within CPU chip (multiple instructions per cycle)
   – Instruction level: VLIW (Very Long Instruction Word)
   – Superscalar
   – On-chip multithreading
   – Single-chip multiprocessors

c) Extra CPU boards (

d) Multiprocessor/Multicomputer

e) Computer grids

f) Tightly Coupled – computationally intimate

g) Loosely Coupled – computationally remote

Instruction-Level Parallelism

(a) A CPU pipeline. (b) A sequence of VLIW instructions. (c) An instruction stream with bundles marked.

The TriMedia VLIW CPU (1)

A typical TriMedia instruction, showing five possible operations.

The TriMedia VLIW CPU (2)

The TM3260 functional units, their quantity, latency, and which instruction slots they can use.

The TriMedia VLIW CPU (3)

The major groups of TriMedia custom operations.

The TriMedia VLIW CPU (4)

(a) An array of 8-bit elements. (b) The transposed array. (c) The original array fetched into four registers. (d) The transposed array in four registers.
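
The register-level transpose in this caption can be sketched in Python by modeling each register as a 32-bit integer holding one row of four 8-bit elements. This is only an illustration: the helper names are hypothetical, the byte order (first element in the most significant byte) is an assumption, and the TriMedia custom operations themselves are not modeled.

```python
def pack(row):
    """Pack four 8-bit values into one 32-bit word, first element in the top byte."""
    word = 0
    for value in row:
        word = (word << 8) | (value & 0xFF)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit elements."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def transpose_registers(regs):
    """Transpose a 4x4 byte matrix stored one row per 32-bit register."""
    rows = [unpack(r) for r in regs]
    cols = [[rows[r][c] for r in range(4)] for c in range(4)]
    return [pack(col) for col in cols]


# Original array, one row per register.
regs = [pack([0, 1, 2, 3]), pack([4, 5, 6, 7]),
        pack([8, 9, 10, 11]), pack([12, 13, 14, 15])]
transposed = transpose_registers(regs)
```

Transposing twice returns the original four registers, which is a quick sanity check for any packed-byte shuffle of this kind.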

Multithreading

a) Fine-grained multithreading
   – Run multiple threads round-robin, one instruction from each in turn
   – Will never stall if there are enough active threads
   – Requires hardware to track which instruction belongs to which thread

b) Coarse-grained multithreading
   – Run one thread until it stalls, then switch (one cycle wasted)

c) Simultaneous multithreading
   – On a superscalar CPU, instructions from more than one thread can issue in the same cycle, so a stalled thread wastes no cycle

d) Hyperthreading
   – A 5% increase in chip area gives roughly a 25% performance gain
   – Resource sharing
      • Partitioned resource sharing
      • Full resource sharing
      • Threshold sharing
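
A toy model can make the difference between fine-grained and coarse-grained multithreading concrete. The sketch below assumes a single issue slot per cycle. The stall encoding (for each instruction, how many cycles the thread must wait after issuing it) and the function names are illustrative assumptions, and simultaneous multithreading is not modeled.

```python
def fine_grained(threads):
    """Strict round-robin: one instruction from each thread in turn.

    threads[t][k] is the number of cycles thread t must wait after issuing
    its k-th instruction. Returns the issue trace; None marks a wasted cycle.
    """
    n = len(threads)
    pc = [0] * n        # next instruction index per thread
    ready = [0] * n     # first cycle at which each thread may issue again
    trace, cycle = [], 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        t = cycle % n
        if pc[t] < len(threads[t]) and ready[t] <= cycle:
            trace.append((t, pc[t]))
            ready[t] = cycle + 1 + threads[t][pc[t]]
            pc[t] += 1
        else:
            trace.append(None)
        cycle += 1
    return trace


def coarse_grained(threads):
    """Run the current thread until it cannot issue, then switch (one cycle lost)."""
    n = len(threads)
    pc = [0] * n
    ready = [0] * n
    trace, cycle, current = [], 0, 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        if pc[current] < len(threads[current]) and ready[current] <= cycle:
            trace.append((current, pc[current]))
            ready[current] = cycle + 1 + threads[current][pc[current]]
            pc[current] += 1
        else:
            trace.append(None)              # the switch costs one cycle
            current = (current + 1) % n
        cycle += 1
    return trace


# Three threads of four instructions each; no stall lasts more than two cycles.
threads = [[0, 2, 0, 0], [2, 0, 0, 0], [0, 0, 2, 0]]
fine = fine_grained(threads)
coarse = coarse_grained(threads)
```

With three threads and stalls of at most two cycles, the round-robin schedule never wastes a cycle, while the coarse-grained schedule loses one cycle at every switch.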

On-Chip Multithreading (1)

(a)–(c) Three threads. The empty boxes indicate that the thread has stalled waiting for memory. (d) Fine-grained multithreading. (e) Coarse-grained multithreading.

On-Chip Multithreading (2)

Multithreading with a dual-issue superscalar CPU. (a) Fine-grained multithreading. (b) Coarse-grained multithreading. (c) Simultaneous multithreading.

Hyperthreading on the Pentium 4

Resource sharing between two threads (shown in white and gray) in the Pentium 4 NetBurst microarchitecture.

Single-Chip Multiprocessor

a) Two areas of interest: servers and consumer electronics

b) Homogeneous chips
   – Two pipelines, one CPU
   – Two CPUs (same design)

c) Heterogeneous chips
   – CPUs for DVD players or cell phones
   – More software => slower but cheaper
   – Many different cores (essentially libraries)

Sample chip

a) Cores on a chip for a DVD player:
   – Control
   – MPEG video
   – Audio decoder
   – Video encoder
   – Disk controller
   – Cache

Cores require an interconnect:
   – IBM CoreConnect
   – AMBA (Advanced Microcontroller Bus Architecture)
   – VCI (Virtual Component Interconnect)
   – OCP-IP (Open Core Protocol)

Homogeneous Multiprocessors on a Chip

Single-chip multiprocessors.

(a) A dual-pipeline chip. (b) A chip with two cores.

Heterogeneous Multiprocessors on a Chip (1)

The logical structure of a simple DVD player contains a heterogeneous multiprocessor containing multiple cores for different functions.

Heterogeneous Multiprocessors on a Chip (2)

An example of the IBM CoreConnect architecture.

Coprocessors

a) Come in a variety of sizes
   – Separate cabinets for mainframes
   – Separate boards
   – Separate chips

b) Primary purpose is to offload work from, and assist, the main processor

c) Different types
   – I/O
   – DMA
   – Floating point
   – Network
   – Graphics
   – Encryption

Introduction to Networking (1)

How users are connected to servers on the Internet.

Networks

a) LAN – local area network

b) WAN – Wide area network

c) Packet – chunk of data sent over a network, typically 64 to 1500 bytes

d) Store-and-forward packet switching – what a router does

e) Internet – series of WANs linked by routers

f) ISP – Internet service provider

g) Firewall – specialized computer that filters traffic

h) Protocols – set of formats, exchange sequences, and rules

i) HTTP – HyperText Transfer Protocol

j) TCP – Transmission Control Protocol

k) IP – Internet Protocol

Networks

a) CRC – Cyclic Redundancy Check

b) TCP Header – information about data for TCP level

c) IP header – routing header: source, destination, hops

d) Ethernet header – next-hop address, address, CRC

e) ASIC – Application Specific Integrated Circuit

f) FPGA – Field-Programmable Gate Array

g) Network processor – programmable device that handles incoming and outgoing packets at wire speed

h) PPE – Protocol/Programmable/Packet Processing Engines
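
The CRC in item a) can be illustrated with Python's standard `zlib.crc32`, which uses the same CRC-32 generator polynomial as the Ethernet frame check. The sketch below appends a CRC to a payload and verifies it on receipt. The function names and the big-endian trailer layout are assumptions made for illustration and do not reproduce the on-the-wire Ethernet bit ordering.

```python
import zlib


def append_crc(payload: bytes) -> bytes:
    """Return the payload followed by its 4-byte CRC-32 (big-endian here by choice)."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    payload, trailer = frame[:-4], frame[-4:]
    return (zlib.crc32(payload) & 0xFFFFFFFF) == int.from_bytes(trailer, "big")


frame = append_crc(b"hello, router")
# Flip one bit in the first payload byte to simulate a transmission error.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
```

Here `crc_ok(frame)` holds and `crc_ok(corrupted)` does not, since a CRC-32 detects every single-bit error.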

Introduction to Networking (2)

A packet as it appears on the Ethernet.

Introduction to Network Processors

A typical network processor board and chip.

Packet Processing

a) Checksum verification

b) Field extraction

c) Packet classification

d) Path selection

e) Destination network determination

f) Route lookup

g) Fragmentation and reassembly

h) Computation (compression/encryption)

i) Header management

j) Queue management

k) Checksum generation

l) Accounting

m) Statistics gathering
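
As one concrete instance of checksum verification and generation, the sketch below assumes the IPv4 header checksum, which is the 16-bit ones' complement sum described in RFC 1071. The slide does not say which checksum is meant, so this choice, the function name, and the sample header bytes are illustrative assumptions.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                        # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample 20-byte IPv4 header with the checksum field (bytes 10-11) set to zero.
header_zeroed = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header_zeroed)

# The same header with the checksum filled in, as a receiver would see it.
header_full = header_zeroed[:10] + checksum.to_bytes(2, "big") + header_zeroed[12:]
```

Generation computes the checksum over the header with the field zeroed. Verification runs the same routine over the received header and accepts it only if the result is zero.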

Improving Performance

a) Performance is the name of the game.

b) How to measure it
   – Packets per second
   – Bytes per second

c) Ways to speed up
   – Performance is not linear with clock speed
   – Introduce more PPEs
   – Specialized processors
   – More internal buses
   – Widen existing buses
   – Replace SDRAM with SRAM

The Nexperia Media Processor

The Nexperia heterogeneous multiprocessor on a chip.

Multiprocessors

(a) A multiprocessor with 16 CPUs sharing a common memory.

(b) An image partitioned into 16 sections, each being analyzed by a different CPU.

Shared-Memory Multiprocessors

a) Multiprocessor – has shared memory

b) SMP (Symmetric Multiprocessor) – every CPU can access any I/O device

c) Multicomputer (distributed memory system) – each computer has its own memory

d) Multiprocessor – has one address space

e) Multicomputer has one address space per computer

f) Multicomputers pass messages to communicate

g) Ease of programming vs. ease of construction

h) DSM – distributed shared memory: shared memory for multicomputers, implemented with page faults
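
The two programming styles contrasted in this list can be sketched with Python threads. In the first function the workers update one shared variable under a lock, which is the single-address-space style. In the second, each worker keeps its data private and sends a message with its result. Python threads really share memory in both cases, so the queue only imitates message passing, and all names here are illustrative assumptions.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Workers add their partial sums into one shared variable, guarded by a lock."""
    total = [0]
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:                      # synchronize access to the shared word
            total[0] += partial

    workers = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return total[0]


def message_passing_sum(chunks):
    """Workers send their partial sums as messages; the receiver combines them."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))         # the only communication is this message

    workers = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sum(mailbox.get() for _ in chunks)


data = list(range(100))
chunks = [data[i::4] for i in range(4)]   # split the work four ways
```

Both functions return the same sum. The difference lies in how the workers communicate, which is the ease-of-programming versus ease-of-construction trade-off in item g).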

Multicomputers (1)

(a) A multicomputer with 16 CPUs, each with its own private memory.

(b) The bit-map image of Fig. 8-17 split up among the 16 memories.

Multicomputers (2)

Various layers where shared memory can be implemented. (a) The hardware. (b) The operating system. (c) The language runtime system.

Taxonomy of Parallel Computers (1)

Flynn’s taxonomy of parallel computers.

Taxonomy of Parallel Computers (2)

A taxonomy of parallel computers.

Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc. All rights reserved. 0-13-148521-0

MIMD categories

a) UMA – Uniform Memory Access

b) NUMA – NonUniform Memory Access

c) COMA – Cache Only Memory Access

d) Multicomputers are NORMA (No Remote Memory Access)
   – MPP: Massively Parallel Processor

Consistency Models

a) How hardware and software will work with memory

b) Strict consistency – any read of location X returns the most recent value written to location X

c) Sequential consistency – all CPUs see the reads and writes in the same single order, which need not be the true time order

d) Processor consistency – writes by any CPU are seen by all other CPUs in the order they were issued

e) Processor consistency also requires that, for every memory word, all CPUs see all writes to it in the same order

f) Weak consistency – no ordering guarantee unless synchronization is used

g) Release consistency – writes must be completed before the critical section can be reentered

Sequential Consistency

(a) Two CPUs writing and two CPUs reading a common memory word. (b)–(d) Three possible ways the two writes and four reads might be interleaved in time.
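
The scenario in this caption is small enough to enumerate exhaustively. The sketch below lists every interleaving of two writers and two readers (each reading twice) that preserves each CPU's own program order, and collects what the readers can observe. The written values 100 and 200, the initial value 0, and the function names are assumptions chosen for illustration.

```python
def interleavings(programs):
    """Yield every merge of the per-CPU operation lists that keeps each CPU's order."""
    if all(not p for p in programs):
        yield []
        return
    for i, p in enumerate(programs):
        if p:
            rest = programs[:i] + [p[1:]] + programs[i + 1:]
            for tail in interleavings(rest):
                yield [(i, p[0])] + tail


def run(schedule, n_cpus, initial=0):
    """Apply one global order to a single memory word; record what each CPU reads."""
    memory = initial
    seen = [[] for _ in range(n_cpus)]
    for cpu, op in schedule:
        if op[0] == "W":
            memory = op[1]
        else:
            seen[cpu].append(memory)
    return tuple(tuple(s) for s in seen)


# CPUs 0 and 1 each write once; CPUs 2 and 3 each read the word twice.
programs = [[("W", 100)], [("W", 200)], [("R",), ("R",)], [("R",), ("R",)]]
schedules = list(interleavings(programs))
results = [run(s, len(programs)) for s in schedules]
outcomes = {(r[2], r[3]) for r in results}
```

Every outcome in `outcomes` is legal under sequential consistency. An outcome in which one reader sees 100 then 200 while the other sees 200 then 100 never appears, because it would require the two writes to occur in both orders.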

Weak Consistency

Weakly consistent memory uses synchronization operations to divide time into sequential epochs.

UMA Symmetric Multiprocessor Architectures

Three bus-based multiprocessors. (a) Without caching. (b) With caching. (c) With caching and private memories.

Cache as Cache can

a) Cache coherence protocol – keeps different versions of the same memory block from appearing in more than one cache (e.g., write-through)

b) Snooping cache – monitors the bus for accesses to memory blocks it holds

c) Choose between an update strategy and an invalidate strategy

d) MESI protocol – named after its states
   – Invalid
   – Shared
   – Exclusive
   – Modified
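
The four MESI states can be exercised with a small state machine for a single cache line shared by several caches. This is a simplified sketch of the invalidate strategy, not a cycle-accurate bus model. The class and method names are hypothetical, and write-backs are only counted rather than simulated.

```python
class MesiLine:
    """Track the MESI state of one memory block in each of n caches."""

    def __init__(self, n_caches):
        self.state = ["I"] * n_caches
        self.writebacks = 0

    def read(self, cpu):
        if self.state[cpu] != "I":
            return                                  # hit: no state change
        holders = [c for c, s in enumerate(self.state) if c != cpu and s != "I"]
        for c in holders:
            if self.state[c] == "M":
                self.writebacks += 1                # dirty copy goes back to memory
            self.state[c] = "S"
        self.state[cpu] = "S" if holders else "E"   # Exclusive only if nobody else has it

    def write(self, cpu):
        for c, s in enumerate(self.state):
            if c != cpu and s != "I":
                if s == "M":
                    self.writebacks += 1
                self.state[c] = "I"                 # snooping caches invalidate their copy
        self.state[cpu] = "M"


line = MesiLine(3)
history = []
for op, cpu in [("read", 0), ("read", 1), ("write", 1), ("read", 2), ("write", 0)]:
    getattr(line, op)(cpu)
    history.append("".join(line.state))
```

After each step at most one cache holds the block in state M or E, which is the invariant the protocol exists to maintain.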

Snooping Caches

The write-through cache coherence protocol. The empty boxes indicate that no action is taken.

The MESI Cache Coherence Protocol

The MESI cache coherence protocol.

UMA Multiprocessors Using Crossbar Switches

(a) An 8 × 8 crossbar switch. (b) An open crosspoint. (c) A closed crosspoint.

UMA Multiprocessors Using Multistage Switching Networks (1)

(a) A 2 × 2 switch.

(b) A message format.

UMA Multiprocessors Using Multistage Switching Networks (2)

An omega switching network.
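
Routing through an omega network can be sketched as follows, assuming n inputs and n outputs with n a power of two, log2(n) stages of 2 × 2 switches, and perfect-shuffle wiring in front of every stage. At stage i the switch looks at bit i of the destination number, counting from the most significant bit: 0 selects the upper output and 1 the lower. The function names and the zero-based switch numbering are assumptions.

```python
def omega_route(src, dst, n):
    """Return [(stage, switch_index, output_bit), ...] for a message from src to dst."""
    k = n.bit_length() - 1
    if n < 2 or (1 << k) != n:
        raise ValueError("n must be a power of two")
    pos, path = src, []
    for stage in range(k):
        # Perfect shuffle: rotate the k-bit line number left by one position.
        pos = ((pos << 1) | (pos >> (k - 1))) & (n - 1)
        bit = (dst >> (k - 1 - stage)) & 1      # routing bit for this stage
        path.append((stage, pos >> 1, bit))
        pos = (pos & ~1) | bit                  # leave on the upper (0) or lower (1) output
    return path


def omega_switch_count(n):
    """Number of 2x2 switches: n/2 per stage times log2(n) stages."""
    return (n // 2) * (n.bit_length() - 1)


route = omega_route(0b011, 0b110, 8)
```

For 8 inputs the network needs 12 switches, compared with 64 crosspoints for an 8 × 8 crossbar. The price is that two messages may contend for the same switch output.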

NUMA Multiprocessors

A NUMA machine based on two levels of buses. The Cm* was the first multiprocessor to use this design.

Cache Coherent NUMA Multiprocessors

(a) A 256-node directory-based multiprocessor. (b) Division of a 32-bit memory address into fields. (c) The directory at node 36.
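
The address division in part (b) can be sketched as a few shifts and masks. With 256 nodes the node field needs 8 bits. The remaining split into an 18-bit block number and a 6-bit offset (64-byte cache lines) is an assumption, since the transcript does not give the field widths. The function name and the sample address are also illustrative.

```python
NODE_BITS, BLOCK_BITS, OFFSET_BITS = 8, 18, 6   # assumed widths; they total 32 bits


def split_address(addr):
    """Split a 32-bit address into (node, block, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    block = (addr >> OFFSET_BITS) & ((1 << BLOCK_BITS) - 1)
    node = (addr >> (OFFSET_BITS + BLOCK_BITS)) & ((1 << NODE_BITS) - 1)
    return node, block, offset


# A sample address whose home is node 36, the node whose directory is shown in (c).
node, block, offset = split_address(0x24000108)
```

The node field selects which directory to consult, and the block field indexes the entry within that directory.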

The Sun Fire E25K NUMA Multiprocessor (1)

The Sun Microsystems E25K multiprocessor.

The Sun Fire E25K NUMA Multiprocessor (2)

The Sun Fire E25K uses a four-level interconnect. Dashed lines are address paths. Solid lines are data paths.

Message-Passing Multicomputers

A generic multicomputer.

Topology

Various topologies. The heavy dots represent switches. The CPUs and memories are not shown. (a) A star. (b) A complete interconnect. (c) A tree. (d) A ring. (e) A grid. (f) A double torus. (g) A cube. (h) A 4D hypercube.
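
Among these topologies the hypercube has a particularly compact description: number the nodes in binary, and two nodes are neighbors exactly when their numbers differ in one bit. The sketch below uses that rule to list neighbors and compute hop distances. The function names are illustrative.

```python
def hypercube_neighbors(node, dims):
    """Neighbors of a node in a dims-dimensional hypercube: flip one bit at a time."""
    return [node ^ (1 << b) for b in range(dims)]


def hypercube_distance(a, b):
    """Minimum number of hops between two nodes: the count of differing bits."""
    return bin(a ^ b).count("1")


dims = 4                                   # the 4D hypercube of part (h)
nodes = range(1 << dims)
diameter = max(hypercube_distance(a, b) for a in nodes for b in nodes)
```

In a 4D hypercube every node has four neighbors and the longest shortest path is four hops, so the diameter grows only logarithmically with the number of nodes.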

BlueGene (1)

The BlueGene/L custom processor chip.

BlueGene (2)

The BlueGene/L. (a) Chip. (b) Card. (c) Board. (d) Cabinet. (e) System.

Red Storm (1)

Packaging of the Red Storm components.

Red Storm (2)

The Red Storm system as viewed from above.

A Comparison of BlueGene/L and Red Storm

A comparison of BlueGene/L and Red Storm.

Google (1)

Processing of a Google query.

Google (2)

A typical Google cluster.

Scheduling

Scheduling a cluster. (a) FIFO. (b) Without head-of-line blocking. (c) Tiling. The shaded areas indicate idle CPUs.

Distributed Shared Memory (1)

A virtual address space consisting of 16 pages spread over four nodes of a multicomputer.

(a) The initial situation. ….

Distributed Shared Memory (2)

A virtual address space consisting of 16 pages spread over four nodes of a multicomputer. …

(b) After CPU 0 references page 10. …

Distributed Shared Memory (3)

A virtual address space consisting of 16 pages spread over four nodes of a multicomputer. …

(c) After CPU 1 references page 10, here assumed to be a read-only page.

Linda

Three Linda tuples.
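
Linda's tuple space can be imitated in a few lines of Python. The sketch below offers `out` to deposit a tuple, `read` to find a matching tuple without removing it, and `in_` to find and remove one (`in` is a reserved word in Python). A template field that is a type matches any value of that type, while any other field must match exactly. Unlike real Linda, these calls do not block when nothing matches. The class, its method names, and the sample tuples are illustrative assumptions.

```python
class TupleSpace:
    """A minimal, single-process stand-in for a Linda tuple space."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Deposit a tuple in the space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(template, candidate):
        if len(template) != len(candidate):
            return False
        for want, have in zip(template, candidate):
            if isinstance(want, type):          # formal field: match by type
                if not isinstance(have, want):
                    return False
            elif want != have:                  # actual field: match by value
                return False
        return True

    def read(self, *template):
        """Return a matching tuple without removing it, or None."""
        for t in self._tuples:
            if self._matches(template, t):
                return t
        return None

    def in_(self, *template):
        """Return a matching tuple and remove it from the space, or None."""
        t = self.read(*template)
        if t is not None:
            self._tuples.remove(t)
        return t


space = TupleSpace()
space.out("abc", 2, 5)
space.out("matrix-1", 1, 6, 3.14)
peeked = space.read("abc", int, int)
taken = space.in_("abc", 2, int)
gone = space.read("abc", int, int)
```

Because tuples are selected by content rather than by address, producers and consumers never need to know about each other.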

Orca

A simplified Orca stack object, with internal data and two operations.

Software Metrics (1)

Real programs achieve less than the perfect speedup indicated by the dotted line.

Software Metrics (2)

(a) A program has a sequential part and a parallelizable part. (b) Effect of running part of the program in parallel.
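
The effect described in this caption is Amdahl's law. If a fraction f of the running time T is inherently sequential and the rest is spread perfectly over n CPUs, the new running time is fT + (1 − f)T/n, so the speedup is n / (1 + (n − 1)f). A minimal sketch, with the function name and sample values chosen for illustration:

```python
def amdahl_speedup(n, f):
    """Speedup on n CPUs when a fraction f of the work is strictly sequential."""
    if n < 1 or not 0.0 <= f <= 1.0:
        raise ValueError("need n >= 1 and 0 <= f <= 1")
    return n / (1 + (n - 1) * f)


# Even 10% sequential code keeps 16 CPUs far from a 16-fold speedup.
speedups = {n: amdahl_speedup(n, 0.10) for n in (1, 4, 16, 64)}
```

As n grows, the speedup approaches 1/f, which is why real programs fall below the perfect-speedup line.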

Achieving High Performance

(a) A 4-CPU bus-based system. (b) A 16-CPU bus-based system. (c) A 4-CPU grid-based system. (d) A 16-CPU grid-based system.

Grid Computing

The grid layers.