CS/ECE 552: I/O Prof. Matthew D. Sinclair Lecture notes based in part on slides created by Mikko Lipasti, Mark Hill, David Wood, Guri Sohi, John Shen, Joshua San Miguel, and Jim Smith


Post on 24-Jun-2021


CS/ECE 552: I/O

Prof. Matthew D. Sinclair

Lecture notes based in part on slides created by Mikko Lipasti, Mark Hill, David Wood, Guri Sohi, John Shen, Joshua San Miguel, and Jim Smith


Announcements 4/6

• HW5 Due Friday 4/10

– H&P 5.9, 5.12 may be helpful

– 5.12 posted on Canvas

– Goal: get feedback back ASAP for Phase 2.3

• Project Phase 2.3 due 4/17

• Optionally work on Phases 2.1 and 2.2

– Will make Phase 3 easier


Memory Hierarchy

[Figure: split L1-I and L1-D caches above a unified I/D L2 cache, above main memory]

Input/Output

[Figure: L1-I and L1-D caches, unified I/D L2 cache, and main memory, with three I/O devices attached alongside main memory]

Reliability: RAID

• Error correction: more important for disk than for memory

– Error correction/detection per block (handled by disk hardware)

– Mechanical disk failures (entire disk lost) most common failure mode

• Many disks means high failure rates

• Entire file system can be lost if files striped across multiple disks

• RAID (redundant array of inexpensive disks)

– Add redundancy

– Similar to DRAM error correction, but…

– Major difference: which disk failed is known

• Even parity can be used to recover from single failures

• Parity disk can be used to reconstruct the data of the faulty disk

– RAID design balances bandwidth and fault-tolerance

– Implemented in hardware (fast, expensive) or software
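The parity idea above can be illustrated with a minimal Python sketch. The function names and the three small example blocks are hypothetical and chosen only for illustration; the point is that because the failed disk is known, XOR-ing the surviving blocks with the parity block gives back the lost data.

```python
from functools import reduce

def parity_block(blocks):
    """XOR equal-length blocks byte by byte (even parity per bit position)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors plus parity."""
    return parity_block(list(surviving_blocks) + [parity])

# Three hypothetical data disks, each holding one 4-byte block.
data = [b"disk", b"fail", b"safe"]
parity = parity_block(data)                # contents of the parity disk

# Assume disk 1 is known to have failed: rebuild it from disks 0, 2 and parity.
rebuilt = reconstruct([data[0], data[2]], parity)
```

Note that this only works for a single failure; with two blocks missing, one parity block does not carry enough information.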


Levels of RAID - Summary

• RAID-0 - no redundancy

– Multiplies read and write bandwidth

• RAID-1 - mirroring

– Pair disks together (write both, read one)

– 2x storage overhead

– Multiplies only read bandwidth (not write bandwidth)

• RAID-3 - bit-level parity (dedicated parity disk)

– N+1 disks, calculate parity (write all, read all)

– Good sequential read/write bandwidth, poor random accesses

– If N=8, only 13% overhead

• RAID-4/5 - block-level parity

– Reads only data you need

– Writes require read, calculate parity, write data & parity

• RAID-6 – diagonal parity


RAID-3: Bit-level parity

• RAID-3 - bit-level parity

– dedicated parity disk

– N+1 disks, calculate parity (write all, read all)

– Good sequential read/write bandwidth, poor random accesses

– If N=8, only 13% overhead

© 2003 Elsevier Science

RAID 4/5 - Block-level Parity

© 2003 Elsevier Science

• RAID-4/5

– Reads only the data you need

– Writes require read, calculate parity, write data & parity

• Naïve approach

1. Read all disks

2. Calculate parity

3. Write data & parity

• Better approach

– Read old data & old parity

– Calculate new parity

– Write new data & parity

• Still worse for small writes than RAID-3
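The "better approach" can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of any particular controller: the helper names and the example stripe are hypothetical. The new parity is the old parity XOR the old data XOR the new data, so only the target disk and the parity disk are touched.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def full_parity(blocks):
    """Naive approach: read every data block and XOR them all together."""
    result = bytes(len(blocks[0]))
    for block in blocks:
        result = xor_bytes(result, block)
    return result

def small_write_parity(old_data, old_parity, new_data):
    """Better approach: needs only the old data block and the old parity."""
    return xor_bytes(old_parity, xor_bytes(old_data, new_data))

# Hypothetical stripe of three data blocks plus its parity block.
stripe = [b"\x0f\x0f", b"\xf0\x01", b"\x33\x44"]
parity = full_parity(stripe)

# Overwrite block 1: two reads (old data, old parity), two writes (new data, new parity).
new_block = b"\xaa\x55"
new_parity = small_write_parity(stripe[1], parity, new_block)
stripe[1] = new_block
```

The cost of the shortcut is two reads and two writes regardless of how many data disks are in the stripe, whereas the naive approach reads every disk.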


[Figure: excerpt from the original RAID paper]

RAID-4 vs RAID-5

• RAID-5 rotates parity across the disks, avoiding the single parity-disk bottleneck of RAID-4

© 2003 Elsevier Science
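The difference in parity placement can be sketched in Python. The rotation order used here (parity moving one disk to the left on each successive stripe) is an assumption made for illustration; real RAID-5 layouts differ in how they rotate, and the function names are hypothetical.

```python
def parity_disk(stripe, num_disks, rotate):
    """Index of the disk that holds parity for a given stripe."""
    if not rotate:
        return num_disks - 1                      # RAID-4: dedicated parity disk
    return (num_disks - 1 - stripe) % num_disks   # RAID-5: assumed rotation order

def layout(num_stripes, num_disks, rotate):
    """One row per stripe; 'P' marks the parity block, 'D' a data block."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks, rotate)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows

raid4 = layout(5, 5, rotate=False)
raid5 = layout(5, 5, rotate=True)
for row4, row5 in zip(raid4, raid5):
    print(" ".join(row4), "   ", " ".join(row5))
```

With the dedicated layout every write updates the same disk; with rotation, parity updates for different stripes land on different disks.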


In color: RAID 4 vs. RAID 5

Source: Wikipedia

In color: RAID 6

Source: Wikipedia

Input/Output

• How to communicate from host processor to I/O device?

– Memory-mapped I/O vs. ISA commands for I/O

[Figure: with memory-mapped I/O, the I/O register appears in the virtual memory space; with ISA commands, dedicated instructions in the instruction set read/write the I/O register]

Input/Output

• How to communicate from I/O device to host processor?

– Polling vs. interrupts

[Figure: the cache hierarchy with I/O devices attached; the processor polls (reads) an I/O register, or the device interrupts the processor (similar to exceptions)]

Designing an I/O System for Bandwidth

• Approach

– Find bandwidths of individual components

– Configure components you can change…

– To match bandwidth of bottleneck component you can’t

• Example

– Parameters

• 300 MIPS CPU, 100 MB/s I/O bus

• 50K OS insns + 100K user insns per I/O operation

• SCSI-2 controllers (20 MB/s): each accommodates up to 7 disks

• 5 MB/s disks with t_seek + t_rotation = 10 ms, 64 KB reads

– Determine

• What is the maximum sustainable I/O rate?

• How many SCSI-2 controllers and disks does it require?

– Assuming random reads


Designing an I/O System for Bandwidth

• First: determine I/O rates of components we can’t change

– CPU: (300M insns/s) / (150K insns/IO) = 2000 IO/s, where 150K = 50K OS + 100K user insns

– I/O bus: (100M B/s) / (64K B/IO) = 1562 IO/s

– Peak I/O rate determined by bus: 1562 IO/s

• Second: configure remaining components to match rate

– Disk: 1 / [10 ms/IO + (64K B/IO) / (5M B/s)] = 43.9 IO/s

– How many disks?

• (1562 IO/s) / (43.9 IO/s) = 36 disks

– How many controllers?

• For 100 MB/s we need five 20 MB/s controllers

• But each can only hold 7 disks, and 7 × 5 = 35 < 36

• So, we need six SCSI-2 controllers

• Caveat: real I/O systems modeled with simulation
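The arithmetic above can be reproduced with a short Python sketch. The constant names are hypothetical; the values come from the example's parameters, with K and M taken as 10^3 and 10^6, which is what the slide's 1562 IO/s figure implies.

```python
import math

CPU_INSNS_PER_SEC = 300e6           # 300 MIPS CPU
INSNS_PER_IO = 50e3 + 100e3         # OS + user instructions per I/O operation
BUS_BYTES_PER_SEC = 100e6           # I/O bus bandwidth
IO_BYTES = 64e3                     # 64 KB read
CTRL_BYTES_PER_SEC = 20e6           # SCSI-2 controller bandwidth
DISKS_PER_CTRL = 7
DISK_BYTES_PER_SEC = 5e6
SEEK_PLUS_ROTATION_SEC = 10e-3

# First: rates of the components we cannot change.
cpu_rate = CPU_INSNS_PER_SEC / INSNS_PER_IO
bus_rate = BUS_BYTES_PER_SEC / IO_BYTES
peak_rate = min(cpu_rate, bus_rate)          # the bottleneck sets the peak

# Second: size the remaining components to match that rate.
disk_rate = 1 / (SEEK_PLUS_ROTATION_SEC + IO_BYTES / DISK_BYTES_PER_SEC)
disks = math.ceil(peak_rate / disk_rate)
ctrls_for_bandwidth = math.ceil(BUS_BYTES_PER_SEC / CTRL_BYTES_PER_SEC)
ctrls_for_disks = math.ceil(disks / DISKS_PER_CTRL)
controllers = max(ctrls_for_bandwidth, ctrls_for_disks)

print(f"peak {peak_rate:.0f} IO/s, {disks} disks, {controllers} controllers")
```

The controller count is the larger of two constraints: enough aggregate controller bandwidth, and enough slots to attach every disk.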


CS/ECE 552: Parallel Processors (Part 1)

Prof. Matthew D. Sinclair

Lecture notes based in part on slides created by Mikko Lipasti, Mark Hill, David Wood, Guri Sohi, John Shen, Joshua San Miguel, and Jim Smith


Announcements 4/16

• Project Phase 2.3 due tomorrow

– Posted trace of randbench

• Phase 3 next

– Integrate Phase 2.3 into Phase 2

– Add forwarding, etc. if you haven’t already

• Posted additional register renaming practice

• AEFIS Final Evals Released 4/17

– Will post requests for feedback shortly



Quiz Week 13 Renaming Question

• Key idea:

– Every instruction takes 1 cycle in X

– Start with 4 free physical registers

– By the time the 5th instruction (lw $t0) reaches Di, the 1st instruction is in R in the same cycle

– Since we do R first in a cycle, reuse its free physical register right away for 5th instruction

– This process repeats for subsequent instructions


insn \ cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14

lw $t0, 0($s2) F De Di S X C R

and $s2, $t2, $t1 F De Di S X C R

or $s1, $s1, $t2 F De Di S X C R

sub $t2, $s0, $s2 F De Di S X C R

lw $t0, 4($t0) F De Di S X C R

lw $s2, 0($s1) F De Di S X C R

sub $t0, $t1, $s1 F De Di S X C R

add $t1, $t2, $s1 F De Di S X C R

or $s1, $t2, $t0 F De Di S X C

lw $t2, 0($t0) F De Di


Types of Parallelism

Compute-Level Parallelism:

➢ Executing multiple computation streams simultaneously

Data-Level Parallelism:

➢ Processing multiple data streams simultaneously

*-Level Parallelism

Request-Level Parallelism:

➢ Users issue simultaneous requests (e.g., databases, web servers, transactional systems)

➢ Compute-level parallelism in multi-chip processors

Task-Level Parallelism:

➢ Programs invoke simultaneous tasks (e.g., dataflow, message passing, speculative processors)

➢ Compute-level parallelism in multi-chip processors and multiprocessors

Thread-Level Parallelism:

➢ Programs execute simultaneous threads (e.g., pthreads, OpenMP, GPUs)

➢ Compute-level parallelism in multiprocessors

Memory-Level Parallelism:

➢ Threads issue simultaneous memory accesses (e.g., cache misses, prefetchers)

➢ Data-level parallelism in multiprocessors

*-Level Parallelism

Instruction-Level Parallelism:

➢ Threads issue simultaneous instructions (e.g., out-of-order, superscalar)

➢ Compute-level parallelism in uniprocessors

Superword-Level Parallelism:

➢ Instructions process simultaneous data words (e.g., vector instructions, Intel SSE/AVX)

➢ Data-level parallelism in uniprocessors

Bit-Level Parallelism:

➢ Operations manipulate simultaneous bits (e.g., functional units, bitwise arithmetic)

➢ Data-level parallelism in uniprocessors

Amdahl’s Law, Redux

• Speedup = 1 / ((1 – f) + f/s)

– f is the fraction of execution that can be done in parallel

– s is the speedup of that parallel fraction (e.g., the number of cores)
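As a minimal sketch, the formula can be written as a Python function. The function name and the sample values of f and s are hypothetical, chosen only to show how the serial fraction limits the overall speedup.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of execution is sped up by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)

# Hypothetical workload: 90% of execution parallelizes perfectly across s cores.
for cores in (2, 4, 16, 1024):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```

As s grows, the speedup approaches 1 / (1 – f), so the serial portion sets a hard ceiling no matter how many cores are added.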


Amdahl’s Law Example

• Applying Amdahl's Law, you estimate that when executing on two cores, the speedup of your entire program is 1.5x. What is the fraction of your program that can be parallelized?

Speedup = 1.5, s = 2

Speedup = 1.5 = 1 / ((1 – x) + x/2)

x = 2/3
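The intermediate algebra, written out step by step with the same symbol x for the parallelizable fraction:

```latex
\begin{align*}
1.5 &= \frac{1}{(1 - x) + \frac{x}{2}} = \frac{1}{1 - \frac{x}{2}} \\
1 - \frac{x}{2} &= \frac{1}{1.5} = \frac{2}{3} \\
\frac{x}{2} &= \frac{1}{3} \\
x &= \frac{2}{3}
\end{align*}
```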


Example Vector ISA Extensions (SIMD)

• Extend ISA with floating point (FP) vector storage …

– Vector register: fixed-size array of 32- or 64-bit FP elements

– Vector length: For example: 4, 8, 16, 64, …

• … and example operations for vector length of 4

– Load vector: lv $v1, X($r1) // [X+r1]->v1

lv [X+r1+0]->v1_0

lv [X+r1+1]->v1_1

lv [X+r1+2]->v1_2

lv [X+r1+3]->v1_3

– Add two vectors: addv.f v3, v1, v2 // v3 = v1 + v2

addv.f v3_i, v1_i, v2_i (where i is 0,1,2,3)

– Add vector to scalar: addv.f v3, v1, f2

addv.f v3_i, v1_i, f2 (where i is 0,1,2,3)

• Today’s vectors: short (128 or 256 bits), but fully parallel


Example Use of Vectors – 4-wide

• Operations

– Load vector: lv v1, X(r1) // [X+r1]->v1

– Multiply vector by scalar: mulv.f v3, v1, f2 // v1 * f2->v3

– Add two vectors: addv.f v3, v1, v2 // v1 + v2->v3

– Store vector: sv v1, X(r1) // v1->[X+r1]

• Performance?

– Best case: 4x speedup

– Tradeoff: execution width (implementation) vs vector width (ISA)

Scalar loop (7x1024 instructions):

L1: lwc1 f1, X(r1)

mul.s f2, f0, f1

lwc1 f3, Y(r1)

add.s f4, f2, f3

swc1 f4, Z(r1)

addi r1, r1, 4

bne r1, 4096, L1

Vector loop (7x256 instructions, 4x fewer instructions):

L1: lv v1, X(r1)

mulv.f v2, v1, f0

lv v3, Y(r1)

addv.f v4, v2, v3

sv v4, Z(r1)

addi r1, r1, 16

bne r1, 4096, L1
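The instruction counts can be checked with a Python sketch that models both loops computing Z = f0 * X + Y. This is only a counting model: the function names are hypothetical, each loop body is charged 7 instructions as in the listings, and the vector operations are emulated with ordinary Python lists rather than real SIMD hardware.

```python
VLEN = 4  # elements per vector register, as in the 4-wide example

def scalar_loop(x, y, f0):
    """One element per iteration; 7 instructions per iteration."""
    z, insns = [], 0
    for i in range(len(x)):
        z.append(f0 * x[i] + y[i])
        insns += 7      # lwc1, mul.s, lwc1, add.s, swc1, addi, bne
    return z, insns

def vector_loop(x, y, f0):
    """VLEN elements per iteration; still 7 instructions per iteration."""
    z, insns = [], 0
    for i in range(0, len(x), VLEN):
        xv, yv = x[i:i + VLEN], y[i:i + VLEN]
        z.extend(f0 * a + b for a, b in zip(xv, yv))
        insns += 7      # lv, mulv.f, lv, addv.f, sv, addi, bne
    return z, insns

n = 1024                # 4096 bytes / 4 bytes per single-precision element
x = [float(i) for i in range(n)]
y = [1.0] * n
z_scalar, count_scalar = scalar_loop(x, y, 2.0)
z_vector, count_vector = vector_loop(x, y, 2.0)
```

The 4x reduction is in instruction count; whether it becomes a 4x speedup depends on the execution width of the implementation, which is the tradeoff noted above.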


Parallelism and Concurrency

Concurrency: Coexistence of multiple tasks/computations in progress

➢ Logically doing multiple things at once

➢ Typically algorithm/system-defined

Parallelism: Execution of multiple tasks/computations simultaneously

➢ Physically doing multiple things at once

➢ Typically architecture-defined

Parallelism and Concurrency


e.g., Concurrent but not parallel:

➢ Context-switching threads on a single core

[Figure: thread 1 and thread 2 alternating on one core, separated by context switches]
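A small Python sketch of this case, using generators as stand-ins for threads. The names and the round-robin policy are assumptions for illustration: each `yield` plays the role of a context switch, and only one "thread" ever runs at a time, so both are in progress together without any simultaneous execution.

```python
def thread(name, steps):
    """A toy thread: does one unit of work, then yields the core."""
    for step in range(steps):
        yield f"{name}:{step}"

def run_on_single_core(threads):
    """Round-robin scheduler: exactly one thread makes progress at a time."""
    trace = []
    ready = list(threads)
    while ready:
        current = ready.pop(0)
        try:
            trace.append(next(current))   # run until the next context switch
            ready.append(current)         # back of the ready queue
        except StopIteration:
            pass                          # thread finished; drop it
    return trace

trace = run_on_single_core([thread("thread 1", 3), thread("thread 2", 3)])
print(trace)
```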


e.g., Parallel but not concurrent:

➢ Pipelining

[Figure: pipeline diagram over cycles 1 to 16; seven instructions each pass through stages F D X M W, overlapping in time]

e.g., Parallel but not concurrent:

➢ Parallelism via speculation

[Figure: a sequence of tasks 1 through 5, annotated "predict input, start early" and "run speculatively"]

e.g., Parallel but not concurrent:

➢ Parallelism via surrogates

[Figure: a sort step between task 1 and task 2, with radix sort, quick sort, and bubble sort shown as alternatives]

CS/ECE 552: Parallel Processors (Part 3)

Prof. Matthew D. Sinclair

Lecture notes based in part on slides created by Mikko Lipasti, Mark Hill, David Wood, Guri Sohi, John Shen, Joshua San Miguel, and Jim Smith


Announcements 4/20

• Phase 2.3 grading on-going

• Phase 3 due 5/1

• AEFIS final evals out

– Specific questions posted


Multithreading

[Figure: context-switching, simultaneous multithreading, and chip multiprocessor, compared along parallelism/throughput and area/complexity axes]

[Figure: analogy with three panels labeled convoying, self-navigating, and separate roads]

Chip Multiprocessor

[Figure: three cores, each with its own L1-I cache, L1-D cache, and unified I/D L2 cache, sharing main memory]

Simultaneous Multithreading

• Fetch and execute instructions from different threads in parallel on the same superscalar processor

• An out-of-order (OoO) engine is naturally well-suited to SMT

• Partition the instruction/decode buffer, dispatch buffer, reorder buffer and map table

• Pool the reservation stations, functional units and physical registers

[Figure: pipeline structures marked either as split between Thread 1 and Thread 2 or as shared by both threads 1 and 2]

thread 1 map table (partitioned)

$t0 P15 rdy

$t1 P9 rdy

$s0 P6 !rdy

$s1 P4 rdy

$s2 P0 rdy

thread 2 map table (partitioned)

$t0 P2 !rdy

$t1 P10 rdy

$s0 P3 !rdy

$s1 P11 !rdy

$s2 P7 rdy

free list (pooled)

P8, P1, P14, P12, P5, P13
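The partitioned/pooled split can be sketched in Python using the register mappings shown above. The `rename` function, the FIFO order in which the free list hands out registers, and the example instruction are assumptions for illustration; ready bits are left out to keep the sketch short.

```python
from collections import deque

# Partitioned: each thread has its own map table (architectural -> physical).
map_tables = {
    1: {"$t0": "P15", "$t1": "P9", "$s0": "P6", "$s1": "P4", "$s2": "P0"},
    2: {"$t0": "P2", "$t1": "P10", "$s0": "P3", "$s1": "P11", "$s2": "P7"},
}

# Pooled: both threads allocate from a single free list of physical registers.
free_list = deque(["P8", "P1", "P14", "P12", "P5", "P13"])

def rename(thread_id, dest, srcs):
    """Rename one instruction: look up sources, allocate a new destination."""
    table = map_tables[thread_id]
    phys_srcs = [table[reg] for reg in srcs]   # read this thread's mappings only
    old_dest = table[dest]                     # freed later, when the insn retires
    new_dest = free_list.popleft()             # shared pool, assumed FIFO order
    table[dest] = new_dest
    return new_dest, phys_srcs, old_dest

# The same hypothetical instruction (dest $t0, sources $s1 and $s2) in each thread.
first = rename(1, "$t0", ["$s1", "$s2"])
second = rename(2, "$t0", ["$s1", "$s2"])
```

Both threads name the same architectural registers, yet they never collide: the lookups go through separate tables, while the allocations draw distinct physical registers from the one shared pool.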

[Figure: processor pipeline with structures marked as partitioned or pooled. Credit: J. E. Smith]

e.g., Intel Hyperthreading:

• Partition the instruction/decode buffer, dispatch buffer, reorder buffer and map table

• Pool the reservation stations, functional units and physical registers