Comp Sci 251 -- Ch. 13 Pipelining

Upload: elwin-hunter

Post on 04-Jan-2016



Pipelining

Performance of Pipeline

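• Ideally, one instruction completes every clock cycle

• With n stages in the pipeline, execution can be up to n times faster

• Speedup < n because some instructions do not need every stage (but still pass through all of them) and the stages are not perfectly balanced

• Note: individual instructions are not faster; pipelining increases throughput, not the latency of a single instruction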
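The ideal-case arithmetic can be sketched in Python. This assumes n perfectly balanced stages, one stage per clock cycle and no hazards; the function names are illustrative only. Even under these assumptions, k instructions need n + k - 1 cycles because the pipeline must first fill, so the speedup only approaches n for long instruction sequences.

```python
def nonpipelined_time(k: int, n: int, stage_time: float) -> float:
    """Each instruction uses all n stages before the next one starts."""
    return k * n * stage_time


def pipelined_time(k: int, n: int, stage_time: float) -> float:
    """The first instruction needs n cycles to fill the pipeline; each later
    one completes one cycle after its predecessor (no hazards, no stalls)."""
    return (n + k - 1) * stage_time


def speedup(k: int, n: int) -> float:
    # stage_time cancels out of the ratio, so 1.0 is used for both
    return nonpipelined_time(k, n, 1.0) / pipelined_time(k, n, 1.0)


for k in (1, 10, 1000):
    print(f"{k:5d} instructions, 5 stages: speedup = {speedup(k, 5):.2f}")
# prints speedups of 1.00, 3.57 and 4.98
```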

MIPS Pipeline

5 stages:
• IF: instruction fetch
• ID: instruction decode (read registers)
• EX: instruction execution, address calculation
• MEM: memory access
• WB: write back (to register)

Implementing a single-cycle pipeline

IF  ID  EX  MEM WB
    IF  ID  EX  MEM WB
        IF  ID  EX  MEM WB

• Every stage takes the same time, whether there is work or not

• Each stage must be stretched to accommodate the slowest instruction
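A small Python sketch that prints this staircase diagram for any number of instructions, assuming each instruction enters IF one cycle after the previous one and never stalls (pipeline_diagram is a hypothetical helper name):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions: int) -> list[str]:
    """Return one text row per instruction; instruction i enters IF in cycle i."""
    rows = []
    for i in range(num_instructions):
        # Every cell is 4 characters wide; i empty cells shift the row right
        cells = ["    "] * i + [f"{stage:<4}" for stage in STAGES]
        rows.append("".join(cells).rstrip())
    return rows


for row in pipeline_diagram(3):
    print(row)
```

Each column is one clock cycle, so the last instruction of k leaves WB in cycle 5 + k - 1.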

Total time for each instruction

Instruction class | Instr fetch | Reg read | ALU operation | Data access | Reg write | Total time
Load word (lw) | 200 ps | 100 ps | 200 ps | 200 ps | 100 ps | 800 ps
Store word (sw) | 200 ps | 100 ps | 200 ps | 200 ps | - | 700 ps
R-format (add, sub, and, or, slt) | 200 ps | 100 ps | 200 ps | - | 100 ps | 600 ps
Branch (beq) | 200 ps | 100 ps | 200 ps | - | - | 500 ps
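The totals, and the clock periods they imply, can be checked with a short Python sketch. The latencies are copied from the table; "R" is shorthand for the R-format row and None marks a stage the instruction does not use. A single-cycle clock must fit the slowest instruction, while a pipelined clock must fit the slowest stage.

```python
# Stage order: instr fetch, reg read, ALU operation, data access, reg write (ps)
LATENCY_PS = {
    "lw":  [200, 100, 200, 200, 100],
    "sw":  [200, 100, 200, 200, None],
    "R":   [200, 100, 200, None, 100],
    "beq": [200, 100, 200, None, None],
}


def total_time(stages):
    """Sum only the stages this instruction class actually uses."""
    return sum(t for t in stages if t is not None)


totals = {name: total_time(stages) for name, stages in LATENCY_PS.items()}

# Single-cycle design: one clock period must fit the slowest instruction
single_cycle_clock = max(totals.values())

# Pipelined design: one clock period must fit the slowest stage
pipelined_clock = max(t for stages in LATENCY_PS.values() for t in stages if t is not None)

print(totals)              # {'lw': 800, 'sw': 700, 'R': 600, 'beq': 500}
print(single_cycle_clock)  # 800
print(pipelined_clock)     # 200
```

The 100 ps register stages are stretched to 200 ps in the pipeline, which is one reason the speedup stays below the number of stages.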

Single-cycle non-pipelined execution

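lw $7, 100($5)
lw $8, 200($5)
lw $9, 300($5)

[Figure: each lw goes through instruction fetch, register read, ALU, memory access and register write within one 800 ps clock cycle; the next lw starts only after the previous one has finished]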

3 independent lw instructions will take 3 x 800 ps = 2400 ps

Pipelined execution

lw $7, 100($5)
lw $8, 200($5)
lw $9, 300($5)

[Figure: timeline from 200 ps to 1400 ps; each lw goes through IF, register read, ALU, MEM and register write, one 200 ps stage per cycle, and each lw starts 200 ps after the previous one]

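A new lw starts every 200 ps and, once the pipeline is full, one lw completes every 200 ps instead of every 800 ps

This is 4 times the throughput of the single-cycle non-pipelined execution; the three lw instructions above finish at 1400 ps instead of 2400 ps, and the gain approaches 4x only for long instruction sequences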
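The completion times in the two diagrams can be reproduced with a small Python sketch, assuming independent instructions and no stalls (completion_times is a hypothetical helper):

```python
def completion_times(k: int, n_stages: int, clock_ps: int) -> list[int]:
    """Finish time of each of k independent instructions in an ideal pipeline:
    instruction i starts at i * clock_ps and needs n_stages cycles."""
    return [(i + n_stages) * clock_ps for i in range(k)]


# Single-cycle design: each lw takes one 800 ps cycle and they run back to back
non_pipelined = [(i + 1) * 800 for i in range(3)]
# Pipelined design: 5 stages, 200 ps per stage
pipelined = completion_times(3, 5, 200)

print(non_pipelined)  # [800, 1600, 2400]
print(pipelined)      # [1000, 1200, 1400]
# Gap between successive completions: 800 ps versus 200 ps, i.e. 4x throughput
print(non_pipelined[1] - non_pipelined[0], pipelined[1] - pipelined[0])  # 800 200
```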

Pipeline Hazards

Control hazards

• Caused by conditional branch instructions: the processor cannot decide which instruction comes next until stage 3 (or stage 2 with extra hardware)

• But the pipeline wants to start the next instruction while the branch is in stage 2

Solutions to Control Hazards

Stall:
– Start the next instruction during the branch's stage 3 (assuming the branch is resolved in stage 2)
– Equivalent to placing a “nop” after every branch
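The cost of always stalling can be sketched as an effective CPI (cycles per instruction), assuming an otherwise ideal pipeline with CPI 1 and a one-cycle bubble per branch. The 17% branch frequency below is a hypothetical instruction mix, not a figure from the slides.

```python
def cpi_with_branch_stalls(branch_fraction: float, stall_cycles: int = 1) -> float:
    """CPI of an otherwise ideal pipeline (CPI = 1) that stalls on every branch."""
    # Each branch adds stall_cycles bubbles, like a nop placed after the branch
    return 1.0 + branch_fraction * stall_cycles


# Hypothetical instruction mix: 17% of executed instructions are branches
print(f"{cpi_with_branch_stalls(0.17):.2f}")  # 1.17
```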

Predict:
– If the prediction is incorrect, flush the wrongly started instruction
– Some prediction strategies:
   – Assume all branches are not taken
   – Static: assume some branches are always taken and others never taken
   – Dynamic: use past history; keep statistics on each branch
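One common form of dynamic prediction is a 2-bit saturating counter per branch; the slides do not say which scheme they have in mind, so this Python sketch is only one possibility. The loop branch outcome pattern is hypothetical: taken 9 times, then falling through once, with the loop executed twice.

```python
def two_bit_predictor(outcomes, state=0):
    """Simulate one 2-bit saturating counter and count mispredictions.

    States 0-1 predict "not taken", states 2-3 predict "taken".
    Each misprediction would require flushing the wrongly fetched instruction."""
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Move one step toward the actual outcome, saturating at 0 and 3
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions


# Hypothetical loop branch: taken 9 times, then not taken; the loop runs twice
loop_branch = ([True] * 9 + [False]) * 2

print(two_bit_predictor(loop_branch))       # 4 (dynamic prediction)
print(sum(1 for t in loop_branch if t))     # 18 (always predict "not taken")
```

Because the counter needs two wrong guesses in a row to flip its prediction, the single loop exit does not disturb the prediction for the next run of the loop.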

Solutions to Control Hazards

Delayed decision (used in MIPS and SPARC):
– The instruction following the branch always executes
– The branch takes effect after this instruction
– The compiler or assembler fills the “delay slots” with useful instructions (or nops):
   – Change the order of neighboring instructions, if logically acceptable
   – Or insert a nop
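A simplified Python sketch of how an assembler might fill a single delay slot. The instruction representation (text, register written, registers read) and the function name are hypothetical, and a real tool must also check that no label targets the moved instruction.

```python
# Hypothetical instruction format: (assembly text, register written or None, registers read)
def fill_delay_slot(before, branch):
    """Return `before` followed by the branch, with the delay slot filled.

    Moves the instruction just before the branch into the slot when the branch
    does not read the register it writes; otherwise inserts a nop."""
    if before:
        prev = before[-1]
        _, prev_dest, _ = prev
        _, _, branch_reads = branch
        if prev_dest not in branch_reads:
            # prev still executes (the delay slot always executes), just one slot later
            return before[:-1] + [branch, prev]
    return before + [branch, ("nop", None, ())]


beq = ("beq $t0, $zero, L1", None, ("$t0", "$zero"))
independent = [("add $t0, $t1, $t2", "$t0", ("$t1", "$t2")),
               ("sub $s1, $s2, $s3", "$s1", ("$s2", "$s3"))]
dependent = [("add $t0, $t1, $t2", "$t0", ("$t1", "$t2"))]

print([text for text, _, _ in fill_delay_slot(independent, beq)])
# ['add $t0, $t1, $t2', 'beq $t0, $zero, L1', 'sub $s1, $s2, $s3']
print([text for text, _, _ in fill_delay_slot(dependent, beq)])
# ['add $t0, $t1, $t2', 'beq $t0, $zero, L1', 'nop']
```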

Data Hazards

Assume the register write happens in the WB stage

add $s0, $t0, $t1

sub $t2, $s0, $t3

sub cannot read $s0 until add has written it, so this example requires three pipeline stalls
– too costly to allow
– data hazards occur too frequently for the compiler to resolve them all
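The three stalls can be counted with a small Python sketch. It follows the slide's assumption that the result is written in WB and, as an extra assumption made explicit here, that the consumer can read it in ID only in a later cycle. The stage numbering and function name are illustrative.

```python
# Stage numbers in the 5-stage MIPS pipeline
IF, ID, EX, MEM, WB = 1, 2, 3, 4, 5


def stalls_without_forwarding(distance: int = 1) -> int:
    """Stalls before a consumer `distance` instructions after the producer can
    read the producer's result from the register file.

    Assumes the producer enters IF in cycle 1, writes its result in WB, and the
    consumer's ID stage may read it only in a later cycle."""
    producer_write_cycle = WB
    consumer_read_cycle = distance + ID   # consumer enters IF in cycle 1 + distance
    return max(0, producer_write_cycle + 1 - consumer_read_cycle)


print(stalls_without_forwarding(1))  # add $s0, ... followed by sub ..., $s0, ... -> 3
```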

Solution to Data Hazards

Forwarding: getting the missing item early from internal resources

• sub gets the $s0 value from the ALU output, not from the register file

• Sometimes forwarding avoids stalls entirely

Solution to Data Hazards

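Sometimes forwarding only reduces stalls: a value loaded by lw is not available until the end of the MEM stage, so an instruction that uses it right after the load must still stall for one cycle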
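Both forwarding cases can be counted with a Python sketch, assuming a forwarded result can be used by the consumer's EX stage in the cycle after the producer's result is ready (end of EX for ALU instructions, end of MEM for lw). The stage numbering and function name are illustrative.

```python
# Stage numbers in the 5-stage MIPS pipeline
IF, ID, EX, MEM, WB = 1, 2, 3, 4, 5


def stalls_with_forwarding(producer_ready_stage: int, distance: int = 1) -> int:
    """Stalls when the producer's result is forwarded to the consumer's EX stage.

    producer_ready_stage: stage at whose end the result exists
    (EX for add/sub, MEM for lw). The producer enters IF in cycle 1 and the
    consumer enters IF in cycle 1 + distance."""
    ready_cycle = producer_ready_stage
    needed_cycle = distance + EX   # consumer's EX cycle
    return max(0, ready_cycle + 1 - needed_cycle)


print(stalls_with_forwarding(EX))   # add $s0, ... then sub ..., $s0, ... -> 0
print(stalls_with_forwarding(MEM))  # lw $s0, ... then sub ..., $s0, ...  -> 1
```

If the compiler can place one independent instruction between the lw and its use (distance 2), the remaining stall disappears as well.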

Advanced Pipelining Techniques

Superpipelining: a large number of stages

Superscalar:
– multiple copies of each stage
– several instructions started/finished per cycle
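The ideal effect of both techniques can be compared with a rough Python sketch. All numbers are hypothetical: a 10-stage superpipeline at 100 ps assumes the 200 ps stages split evenly with no extra latch overhead, the superscalar design issues 2 instructions per cycle, and hazards are ignored entirely.

```python
import math


def ideal_time_ps(k: int, stages: int, clock_ps: int, width: int = 1) -> int:
    """Ideal time (no hazards) for k instructions when `width` instructions
    enter the pipeline each cycle."""
    cycles = stages + math.ceil(k / width) - 1
    return cycles * clock_ps


k = 1000
baseline = ideal_time_ps(k, stages=5, clock_ps=200)
superpipelined = ideal_time_ps(k, stages=10, clock_ps=100)     # hypothetical
superscalar = ideal_time_ps(k, stages=5, clock_ps=200, width=2)  # hypothetical

print(baseline, superpipelined, superscalar)  # 200800 100900 100800
```

In this idealized case both approaches roughly halve the run time; in practice deeper pipelines raise the branch and load-use penalties, and wider issue needs enough independent instructions to fill the extra slots.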

Advanced Pipelining Techniques

Dynamic Pipeline Scheduling

Pentium Pro / Power PC 604