Computer Science and Engineering: Advanced Computer Architecture, CSE 8383, February 21, 2008, Session 6


Upload: lizbeth-todd

Post on 18-Jan-2016


Page 1: Computer Science and Engineering Advanced Computer Architecture CSE 8383 February 21 2008 Session 6

Computer Science and Engineering

Advanced Computer Architecture

CSE 8383

February 21, 2008

Session 6

Page 2:

Contents

Interconnection Networks (cont.)
  Static (cont.)
  Dynamic

Performance Evaluations
  Grosch’s Law
  Moore’s Law
  Von Neumann’s Bottleneck
  Speedup
  Amdahl’s Law
  The Gustafson-Barsis Law

Page 3:

Hypercubes

$N = 2^d$ nodes

d dimensions ($d = \log_2 N$)

A cube with d dimensions is made out of 2 cubes of dimension d-1

Symmetric

Degree, Diameter, Cost, Fault tolerance

Node labeling – number of bits

Page 4:

Hypercubes

[Figure: hypercubes of dimension d = 0, d = 1, d = 2, and d = 3, with each node labeled by a d-bit binary address (for example 000 through 111 for d = 3)]

Page 5:

Hypercubes

[Figure: hypercube of dimension d = 4, with nodes labeled by 4-bit binary addresses]

Page 6:

Hypercube of dimension d

$N = 2^d$, $d = \log_2 N$

Node degree = d

Number of bits to label a node = d

Diameter = d

Number of edges = $N \cdot d / 2$

Hamming distance!

Routing
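The link between Hamming distance and routing can be sketched in a few lines of Python. The slide does not name a routing scheme, so this sketch assumes dimension-order (e-cube) routing, which corrects the differing address bits one at a time from the lowest bit upward. The function names are illustrative only.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two node labels differ."""
    return bin(a ^ b).count("1")


def ecube_route(src: int, dst: int, d: int) -> list[int]:
    """Path from src to dst in a d-dimensional hypercube.

    Assumption: dimension-order routing, fixing differing bits
    from bit 0 (rightmost) up to bit d-1.
    """
    path = [src]
    cur = src
    for i in range(d):
        if ((cur ^ dst) >> i) & 1:   # labels differ in dimension i
            cur ^= 1 << i            # cross the link in dimension i
            path.append(cur)
    return path


route = ecube_route(0b0000, 0b1011, 4)
print([format(node, "04b") for node in route])
print("hops:", len(route) - 1, "Hamming distance:", hamming_distance(0b0000, 0b1011))
```

The number of hops equals the Hamming distance between the two labels, which is why the diameter of the d-cube is d.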

Page 7:

Subcubes and Cube Fragmentation

What is a subcube?

Shared Environment

Fragmentation Problem

Is it similar to something you know?

Page 8:

Cube Connected Cycles (CCC)

k-cube: $2^k$ nodes

k-CCC: from the k-cube, replace each vertex of the k-cube with a ring of k nodes

k-CCC: $k \cdot 2^k$ nodes

Degree, diameter: 3, 2k

Try it for 3-cube

Page 9:

K-ary n-Cube

d = cube dimension

k = # nodes along each dimension

$N = k^d$

Wraparound

Hypercube: binary d-cube

Torus: k-ary 2-cube

Page 10:

Analysis and performance metrics: static networks

Network        Degree (d)  Diameter (D)    Cost       Symmetry  Worst delay
CCNs           N-1         1               N(N-1)/2   Yes       1
Linear Array   2           N-1             N-1        No        N
Binary Tree    3           2(log2 N - 1)   N-1        No        log2 N
n-cube         log2 N      log2 N          nN/2       Yes       log2 N
2D-Mesh        4           2(n-1)          2(N-n)     No        N
K-ary n-cube   2n          nk/2            nN         Yes       K x log2 N

Page 11:

Dynamic IN

Page 12:

Bus Based IN

[Figure: bus-based interconnection networks; labels: Global Memory, processors (P), caches (C)]

Page 13:

Dynamic Interconnection Networks

Communication patterns are based on program demands

Connections are established on the fly during program execution

Multistage Interconnection Network (MIN) and Crossbar

Page 14:

Switch Modules

A x B switch module: A inputs and B outputs

In practice, A = B = power of 2

Each input is connected to one or more outputs (conflicts must be avoided)

One-to-one (permutation) and one-to-many are allowed

Page 15:

Binary Switch

2x2 Switch

Legitimate States = 4

Permutation Connections = 2

Page 16:

Legitimate Connections

[Figure: the different settings of the 2x2 switching element (SE): straight, exchange, upper-broadcast, lower-broadcast]

Page 17:

Group Work

General Case ??

Page 18:

Multistage Interconnection Networks

[Figure: successive stages of switches separated by inter-stage connection patterns ISC1, ISC2, ..., ISCn]

ISC: Inter-stage Connection Patterns

Page 19:

Perfect-Shuffle Routing Function

Given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$

$P(x) = \{a_{n-1}, \ldots, a_2, a_1, a_n\}$

x = 110001

P(x) = 100011

Page 20:

Perfect Shuffle Example

x     P(x)
000   000
001   010
010   100
011   110
100   001
101   011
110   101
111   111
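As a quick check of the example above, the perfect shuffle can be sketched as a one-bit left rotation of the n-bit address. The function name and the choice to pass the address width `n` explicitly are assumptions made for illustration.

```python
def perfect_shuffle(x: int, n: int) -> int:
    """Rotate the n-bit address x left by one position.

    The leading bit a_n moves to the rightmost position, matching
    P(x) = {a_{n-1}, ..., a_2, a_1, a_n}.
    """
    msb = (x >> (n - 1)) & 1                  # a_n
    return ((x << 1) & ((1 << n) - 1)) | msb  # drop a_n on the left, append it on the right


for x in range(8):
    print(format(x, "03b"), "->", format(perfect_shuffle(x, 3), "03b"))
```

Running it for n = 3 reproduces the eight input/output pairs listed on the slide.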

Page 21:

Perfect-Shuffle

[Figure: perfect-shuffle connections between inputs 000 through 111 and outputs 000 through 111]

Page 22:

Exchange Routing Function

Given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$

$E_i(x) = \{a_n, a_{n-1}, \ldots, \bar{a}_i, \ldots, a_2, a_1\}$ (bit $a_i$ is complemented)

x = 0000000

$E_3(x)$ = 0000100

Page 23:

Exchange E1

x     E1(x)
000   001
001   000
010   011
011   010
100   101
101   100
110   111
111   110
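The exchange function amounts to complementing a single address bit, which can be sketched with an XOR. The function name is illustrative; bits are numbered from 1 at the right, following the slide's a_1 ... a_n labeling.

```python
def exchange(x: int, i: int) -> int:
    """E_i(x): complement bit a_i of x (bits numbered from 1 at the right)."""
    return x ^ (1 << (i - 1))


for x in range(8):
    print(format(x, "03b"), "->", format(exchange(x, 1), "03b"))
```

With i = 1 this reproduces the E1 pairs above, and `exchange(0b0000000, 3)` gives 0000100 as in the routing-function example.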

Page 24:

Exchange E1

[Figure: exchange E1 connections between inputs 000 through 111 and outputs 000 through 111]

Page 25:

Butterfly Routing Function

Given $x = \{a_n, a_{n-1}, \ldots, a_2, a_1\}$

$B(x) = \{a_1, a_{n-1}, \ldots, a_2, a_n\}$

x = 010001

B(x) = 110000

Page 26:

Butterfly Example

x     B(x)
000   000
001   100
010   010
011   110
100   001
101   101
110   011
111   111
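The butterfly function swaps the leftmost and rightmost address bits and leaves the rest alone. A minimal sketch, with an illustrative function name and the address width `n` passed explicitly as an assumption:

```python
def butterfly(x: int, n: int) -> int:
    """B(x): swap the most significant bit a_n and the least significant bit a_1."""
    msb = (x >> (n - 1)) & 1
    lsb = x & 1
    if msb != lsb:
        # Swapping two different bits is the same as flipping both.
        x ^= (1 << (n - 1)) | 1
    return x


for x in range(8):
    print(format(x, "03b"), "->", format(butterfly(x, 3), "03b"))
```

For n = 3 the output matches the eight pairs in the example above.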

Page 27:

Butterfly

[Figure: butterfly connections between inputs 000 through 111 and outputs 000 through 111]

Page 28:

Multi-stage network

[Figure: multi-stage network connecting inputs 000 through 111 to outputs 000 through 111]

Page 29:

MIN (cont.)

[Figure: an 8x8 Banyan network with switches numbered 1 through 12, connecting inputs 000 through 111 to outputs 000 through 111]

Page 30:

MIN Implementation

Control (X)

Source (S) Destination (D)

X = f(S,D)

Page 31:

Example

X = 0: crossed

X = 1: straight

[Figure: the two switch settings, each drawn with terminals labeled A, B, C, D]

Page 32:

Consider this MIN

[Figure: a MIN with stage 1, stage 2, and stage 3 connecting sources S1 through S8 to destinations D1 through D8]

Page 33:

Example (Cont.)

Let the control variables be X1, X2, X3.

Find the values of X1, X2, X3 to connect:

S1 to D6

S7 to D5

S4 to D1

Page 34:

The 3 connections

[Figure: the three requested connections traced through stage 1, stage 2, and stage 3 of the MIN, from sources S1 through S8 to destinations D1 through D8]

Page 35:

Boolean Functions

X = x1, x2, x3

S = s1, s2, s3

D = d1, d2, d3

Find X = f(S,D)

Page 36:

Crossbar Switch

[Figure: crossbar switch with rows P1 through P8 and columns M1 through M8]

Page 37:

Analysis and performance metrics: dynamic networks

Networks       Delay       Cost          Blocking  Degree of FT
Bus            O(N)        O(1)          Yes       0
Multiple-bus   O(mN)       O(m)          Yes       (m-1)
MIN            O(log N)    O(N log N)    Yes       0
Crossbar       O(1)        O(N^2)        No        0

Page 38:

Performance Evaluations

Page 39:

Grosch’s Law (1960s)

“To sell a computer for twice as much, it must be four times as fast”

Vendors skip small speed improvements in favor of waiting for large ones

Buyers of expensive machines would wait for a twofold improvement in performance for the same price.

Page 40:

Moore’s Law

Gordon Moore (cofounder of Intel)

Processor performance would double every 18 months

This prediction has held for several decades

Unlikely that single-processor performance continues to increase indefinitely

Page 41:

Von Neumann’s bottleneck

Great mathematician of the 1940s and 1950s

Single control unit connecting a memory to a processing unit

Instructions and data are fetched one at a time from memory and fed to the processing unit

Speed is limited by the rate at which instructions and data are transferred from memory to the processing unit.

Page 42:

Past Trends in Parallel Architecture (inside the box)

Completely custom designed components (processors, memory, interconnects, I/O)

Longer R&D time (2-3 years)

Expensive systems

Quickly becoming outdated

– Bankrupt companies!!

Page 43:

Current Trends in Parallel Architecture (outside the box) -- before multicore!!

Advances in commodity processors and network technology

Network of PCs and workstations connected via LAN or WAN forms a Parallel System

Network Computing

Compete favorably (cost/performance)

Utilize unused cycles of systems sitting idle

Page 44:

Speedup

S = Speed(new) / Speed(old)

S = (Work / time(new)) / (Work / time(old))

S = time(old) / time(new)

S = time(before improvement) / time(after improvement)

Page 45:

Speedup

Time (one CPU): T(1)

Time (n CPUs): T(n)

Speedup: S

S = T(1)/T(n)

Page 46:

Two Important Laws Influenced Parallel Computing

Page 47:

Argument Against Massively Parallel Processing. Gene Amdahl, 1967.

For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of multiplicity of computers in such a manner as to permit cooperative solution .. The nature of this overhead (in parallelism) appears to be sequential so that it is unlikely to be amenable to parallel processing techniques. Overhead alone would then place an upper limit on throughput of five to seven times the sequential processing rate, even if the housekeeping were done in a separate processor… At any point in time it is difficult to foresee how the previous bottlenecks in a sequential computer will be effectively overcome.

Page 48:

What does that mean?

The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode cannot be used.

Unparallelizable part of the code severely limits the speedup.

Page 49:

Trip Analogy

[Figure: a trip from A to B. One stretch of 200 miles can be covered by any means of travel; the remaining stretch must be walked and takes 20 hours.]

Walk: 4 miles/hour
Bike: 10 miles/hour
Car-1: 50 miles/hour
Car-2: 120 miles/hour
Car-3: 600 miles/hour

Page 50:

Speedup Analysis

Speed             Time           Speedup
4 miles/hour      70 hours       (baseline)
10 miles/hour     40 hours       S = 1.8
50 miles/hour     24 hours       S = 2.9
120 miles/hour    21.67 hours    S = 3.2
600 miles/hour    20.33 hours    S = 3.4
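The times and speedups in this analysis can be reproduced with a short Python sketch. It assumes, as the numbers imply, that the trip is 200 miles at the chosen speed plus a fixed 20 hours that must be walked; the constant and function names are illustrative.

```python
FLEXIBLE_MILES = 200.0    # stretch that any vehicle can cover
MUST_WALK_HOURS = 20.0    # fixed stretch that is always walked


def trip_time(mph: float) -> float:
    """Total trip time in hours when the flexible stretch is covered at mph."""
    return FLEXIBLE_MILES / mph + MUST_WALK_HOURS


baseline = trip_time(4)   # walking the whole way
for mph in (4, 10, 50, 120, 600):
    t = trip_time(mph)
    print(f"{mph:>4} miles/hour: time = {t:6.2f} hours, S = {baseline / t:.2f}")
```

However fast the vehicle, the speedup stays below 70 / 20 = 3.5, because the 20 walked hours never shrink.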

Page 51:

Amdahl’s Law

$S = \frac{T(1)}{T(N)}$

$T(N) = \alpha\,T(1) + \frac{(1-\alpha)\,T(1)}{N}$

$S = \frac{1}{\alpha + \frac{1-\alpha}{N}} = \frac{N}{\alpha N + (1-\alpha)}$

$\alpha$: The fraction of the program that is naturally serial

$(1-\alpha)$: The fraction of the program that is naturally parallel
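A small Python sketch of the formula, using `alpha` for the serial fraction and `n` for the number of CPUs (both names chosen here for illustration):

```python
def amdahl_speedup(alpha: float, n: int) -> float:
    """Amdahl's Law: S = 1 / (alpha + (1 - alpha) / n)."""
    return 1.0 / (alpha + (1.0 - alpha) / n)


for n in (4, 16, 1000):
    print(f"n = {n:>4}: S = {amdahl_speedup(0.10, n):.2f} with 10% serial code")
```

As n grows, S approaches 1 / alpha, so a program that is 10% serial cannot run more than 10 times faster no matter how many CPUs are added.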

Page 52:

Amdahl’s Law

[Figure: speedup (0 to 25) versus % serial (10% to 99%) for 4 CPUs, 16 CPUs, and 1000 CPUs]

Page 53:

Gustafson – Barsis Law (1988)

Gordon Bell Prize

Overcoming the conceptual barrier established by Amdahl’s law

Scale the problem to the size of the parallel system

No fixed size problem

$\alpha$: The fraction of the program that is naturally serial

$T(N) = 1$

$T(1) = \alpha + (1-\alpha)N$

$S = N - (N-1)\alpha$
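The scaled speedup can be put next to the fixed-size (Amdahl) speedup in a few lines of Python. This is a sketch only; the function and parameter names are illustrative.

```python
def gustafson_speedup(alpha: float, n: int) -> float:
    """Gustafson-Barsis: S = N - (N - 1) * alpha, with T(N) normalized to 1."""
    return n - (n - 1) * alpha


def amdahl_speedup(alpha: float, n: int) -> float:
    """Amdahl's Law for a fixed-size problem, shown for comparison."""
    return 1.0 / (alpha + (1.0 - alpha) / n)


n = 100
for alpha in (0.1, 0.5, 0.9):
    print(f"alpha = {alpha:.1f}: Gustafson-Barsis S = {gustafson_speedup(alpha, n):5.1f}, "
          f"Amdahl S = {amdahl_speedup(alpha, n):5.2f}")
```

Because the problem grows with the machine, the scaled speedup falls off linearly in alpha instead of saturating at 1 / alpha.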

Page 54:

Amdahl vs. Gustafson-Barsis

[Figure: speedup (0 to 100) versus % serial (10% to 99%), with one curve for Gustafson-Barsis and one for Amdahl]

Page 55:

Data Parallelism – Scale up

Parallelism is in the data, not the control portion of the application

Problem size scales up to the size of the system

Data Parallelism is to the 1990’s what vector parallelism was to the 1970’s

Supercomputer data parallel

Page 56:

Problem

Assume that a switching component such as a transistor can switch in zero time. We propose to construct a disk-shaped computer chip with such a component. The only limitation is the time it takes to send electronic signals from one edge of the chip to the other. Make the simplifying assumption that electronic signals travel 300,000 kilometers per second. What must be the diameter of a round chip so that it can switch $10^9$ times per second? What would the diameter be if the switching requirement were $10^{12}$ times per second?
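One way to set up the arithmetic, offered as a sketch and not as the intended solution: assume a signal must cross the full diameter once per switching cycle, so the largest allowed diameter is the signal speed divided by the switching rate. The names below are illustrative.

```python
SIGNAL_SPEED_M_PER_S = 300_000 * 1_000   # 300,000 km/s expressed in meters per second


def max_diameter_m(switches_per_second: float) -> float:
    """Largest chip diameter (meters), assuming one edge-to-edge crossing per switch."""
    return SIGNAL_SPEED_M_PER_S / switches_per_second


for rate in (1e9, 1e12):
    d = max_diameter_m(rate)
    print(f"{rate:.0e} switches/s: diameter <= {d:.4g} m ({d * 1000:.4g} mm)")
```

Under this assumption the bound is about 0.3 m for the first rate and about 0.3 mm for the second.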