pipeline - ic.unicamp.brpannain/mc722/aulas/arq_hp6.pdf · 2 paulo c. centoducatte 1998 morgan...

52
1 Ch6.a-1 1998 Morgan Kaufmann Publishers Paulo C. Centoducatte Pipeline Ch6.a-2 1998 Morgan Kaufmann Publishers Paulo C. Centoducatte Lavanderia analogia com o pipelining Time 7 6 PM 8 9 10 11 12 1 2 AM A B C D Task order Time 7 6 PM 8 9 10 11 12 1 2 AM A B C D Task order

Upload: tranhanh

Post on 23-Mar-2019

228 views

Category:

Documents


0 download

TRANSCRIPT

1

Ch6.a-11998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipeline

Ch6.a-21998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Lavanderia – analogia com o pipelining

Time76 PM 8 9 10 11 12 1 2 AM

A

B

C

D

Task�order

Time76 PM 8 9 10 11 12 1 2 AM

A

B

C

D

Task�order

2

Ch6.a-31998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipeline de instruções no MIPS

• Fetch da instrução

• Leitura dos registradores e decodificação

• Execução da operação ou cálculo de endereço

• Acesso ao operando na memória

• Escrita do resultado em um registrador

Ch6.a-41998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Exemplo

• Compare o tempo médio entre instruções da implementação

em single-cycle (uma instrução por ciclo) com uma

implementação com pipeline. Supor maior tempo de

operação para acesso à memória = 2ns, operação da ULA =

2ns e acesso ao register file = 1ns. (Instrs lw, sw, add, sub,

and, or slt e beq).

• Inicialmente suponha a execução de 3 instruções lw (tempo

entre o início da 1ª instrução e o início da 4ª instrução)

3

Ch6.a-51998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Tempo total para as oito instruções calculado a partir do tempo de cada componente

Ch6.a-61998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Execução não-pipeline X pipeline

Instructionfetch

Reg ALUData

accessReg

8 nsInstruction

fetchReg ALU

Dataaccess

Reg

8 nsInstruction

fetch

8 ns

Time

lw $1, 100($0)

lw $2, 200($0)

lw $3, 300($0)

2 4 6 8 10 12 14 16 18

2 4 6 8 10 12 14

...

Programexecutionorder(in instructions)

Instructionfetch

Reg ALUData

accessReg

Time

lw $1, 100($0)

lw $2, 200($0)

lw $3, 300($0)

2 nsInstruction

fetchReg ALU

Dataaccess

Reg

2 nsInstruction

fetchReg ALU

Dataaccess

Reg

2 ns 2 ns 2 ns 2 ns 2 ns

Programexecutionorder(in instructions)

4

Ch6.a-71998 Morgan Kaufmann PublishersPaulo C. Centoducatte

OBS.:

• Sob condições ideais, com estágios balanceados, o speedup do

pipeline é igual ao número de estágios do pipeline ( 5 estágios

, 5 vezes mais rápido)

• Na realidade o tempo de execução de uma instrução é um

pouco superior (overheads) � speedup é menor que o número

de estágios do pipeline

Suponha a execução de 1003 instruções

com pipeline ����

1000 X 2ns + 14 = 2014 (para cada instrução adiciono 2ns)

sem pipeline ���� 1000 X 8ns + 24 = 8024

spedup = 8024 / 2014 = 3.98 ~~ 8 / 2

Desempenho do pipeline

é devido ao aumento do

throughput.

Ch6.a-81998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Projeto de um conjunto de instruções para pipeline

• O que torna a implementação mais fácil

– Instruções de mesmo tamanho

– Poucos formatos, com campos de registradores sempre dispostos no mesmo lugar (Simetria, no 2º

estágio podemos ler registradores e decodificar ao mesmo tempo).

– Acesso à memória apenas com as instruções lw e sw.

– Operandos alinhados na memória: o dado pode ser transferido da memória para a CPU e CPU para amemória em um único estágio do pipeline.

5

Ch6.a-91998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Projeto de um conjunto de instruções para pipeline

• O que torna a implementação mais dificil

– Hazard

• Hazard Estrural

• Hazard de Controle

• Hazard de Dados

Ch6.a-101998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipeline Hazards

• Hazard Estrutural

– O hardware não suporta uma combinação de instruções que queremos executar em um único período de clock

• Ex.: escrever e ler da memória em um mesmo ciclo

6

Ch6.a-111998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipeline Hazards

• Hazard de Controle

– Problemas devido à execução de instruções de desvio

• Ex.: Quando um branch é tomado, como tratar a(s) instruções que seguem (fisicamente) o branch no programa e que já estão no pipeline

Ch6.a-121998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipelining stalling para instruções branch

Instructionfetch

Reg ALUData

accessReg

Time

beq $1, $2, 40

add $4, $5, $6

lw $3, 300($0)4 ns

Instructionfetch

Reg ALUData

accessReg

2ns

Instructionfetch

Reg ALUData

accessReg

2ns

2 4 6 8 10 12 14 16

Programexecutionorder(in instructions)

7

Ch6.a-131998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Branch prediction: Tentar “adivinhar” qual dos caminhos do branch

será tomado

Instructionfetch

Reg ALUData

accessReg

Time

beq $1, $2, 40

add $4, $5, $6

lw $3, 300($0)

Instructionfetch

Reg ALUData

accessReg

2 ns

Instructionfetch

Reg ALUData

accessReg

2 ns

Program

executionorder(in instructions)

Instructionfetch

Reg ALUData

accessReg

Time

beq $1, $2, 40

add $4, $5 ,$6

or $7, $8, $9

Instructionfetch

Reg ALUData

accessReg

2 4 6 8 10 12 14

2 4 6 8 10 12 14

Instructionfetch

Reg ALUData

accessReg

2 ns

4 ns

bubble bubble bubble bubble bubble

Programexecution

order(in instructions)

O branch não será tomado

Ch6.a-141998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipeline delayed branch

Instructionfetch

Reg ALUData

accessReg

Time

beq $1, $2, 40

add $4, $5, $6

lw $3, 300($0)

Instructionfetch

Reg ALUData

accessReg

2 ns

Instructionfetch

Reg ALUData

accessReg

2 ns

2 4 6 8 10 12 14

2 ns

(Delayed branch slot)

Programexecutionorder(in instructions)

8

Ch6.a-151998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Hazard de Dados

• Quando uma instrução necessita de um dado que ainda não foi calculado

– Ex.: add $s0,$t0,$t1

sub $t2,$s0,$t3

Soluções : Compilador (programador) gera código livre de data hazard (introduzindo, por ex., instruções nopno código; alterando a ordem das instruções; ...)

Stall; Forwarding ou bypassing

Ch6.a-161998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Hazard de Dados

Time2 4 6 8 10

add $s0, $t0, $t1 IF ID WBEX MEM

add $s0, $t0, $t1

sub $t2, $s0, $t3

Programexecutionorder(in instructions)

IF ID WBEX

IF ID MEMEX

Time2 4 6 8 10

MEM

WBMEM

9

Ch6.a-171998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Hazard de Dados

Time2 4 6 8 10 12 14

lw $s0, 20($t1)

sub $t2, $s0, $t3

Programexecutionorder(in instructions)

IF ID WBMEMEX

IF ID WBMEMEX

bubble bubble bubble bubble bubble

Stall

Ch6.a-181998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Exemplo

• Encontre o hazard no código abaixo e resolva-o:

# $t1 tem o end. de v[k]lw $t0, 0($t1) # $t0 = v[k]lw $t2,4($t1) # $t2 = v[k+1]sw $t2, 0($t1) # v[k] = $t2sw $t0, 4($t1) # v[k+1] = $t0

Solução:# $t1 tem o end. de v[k]

lw $t0, 0($t1) # $t0 = v[k]lw $t2,4($t1) # $t2 = v[k+1]sw $t0, 4($t1) # v[k+1] = $t0sw $t2, 0($t1) # v[k] = $t2

10

Ch6.a-191998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipeline: Idéia Básica • 5 estágios: Fetch; Decodificação e leitura dos regs; execução ou cálculo

de end. ; acesso à memória; escrita no reg. destino

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Instruction

Mux

0

1

Add

PC

0Writedata

Mux

1Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

ReaddataAddress

Datamemory

1

ALUresult

Mux

ALUZero

IF: Instruction fetch ID: Instruction decode/register file read

EX: Execute/address calculation

MEM: Memory access WB: Write back

O que é necessário para tornar cada divisão em estágios?

Ch6.a-201998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Instruções sendo executadas pelo datapath

IM Reg DM RegALU

IM Reg DM RegALU

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7

Time (in clock cycles)

lw $2, 200($0)

lw $3, 300($0)

Programexecution

order

(in instructions)

lw $1, 100($0) IM Reg DM RegALU

11

Ch6.a-211998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipelined Datapath

Instructionmemory

Address

4

32

0

AddAdd

result

Shiftleft 2

Inst ructio

n

IF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALUZero

ID/EX

Datamemory

Address

Pode acontecer algum problema nesta solução se não existir dependência de dados?

A execução de qual instrução causa o problema?

Ch6.a-221998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipelined Datapath

Instruction�memory

Address

4

32

0

Add Add�result

Shift�left 2

Instr

uct

ion

IF/ID EX/MEM MEM/WB

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

16Sign�

extend

Write�register

Write�data

Read�data

1

ALU�result

M�u�x

ALU

Zero

ID/EX

Instruction fetch

lw

Address

Data�memory

12

Ch6.a-231998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipelined Datapath

Instruction�memory

Address

4

32

0

Add Add�result

Shift�left 2

Instr

uct

ion

IF/ID EX/MEM

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

16Sign�

extend

Write�register

Write�data

Read�data

1

ALU�result

M�u�x

ALU

Zero

ID/EX MEM/WB

Instruction decode

lw

Address

Data�memory

Ch6.a-241998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipelined Datapath

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Instr

uct

ion

IF/ID EX/MEM

Mux

0

1

Add

PC

0Writedata

Mux

1

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALU

Zero

ID/EX MEM/WB

Execution

lw

Address

D atamemory

13

Ch6.a-251998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipelined Datapath

Instruction�memory

Address

4

32

0

AddAdd�

result

Shift�left 2

Inst

ructio

n

IF/ID EX/MEM

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

16Sign�

extend

Write�register

Write�data

Read�data

Data�memory

1

ALU�result

M�u�x

ALU

Zero

ID/EX MEM/WB

Memory

lw

Address

Ch6.a-261998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipelined Datapath

Instruction�memory

Address

4

32

0

Add Add�result

Shift�left 2

Instr

uct

ion

IF/ID EX/MEM

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

16Sign�

extend

Write�data

Read�dataData�

memory

1

ALU�result

M�u�x

ALU

Zero

ID/EX MEM/WB

Write back

lw

Write�register

Address

14

Ch6.a-271998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipelined Datapath

Instruction

memory

Address

4

32

0

AddAdd

result

Shift

left 2

Inst

ructio

nIF/ID EX/MEM

Mux

0

1

Add

PC

0Writedata

Mux

1

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

Data

memory

1

ALUresult

Mux

ALU

Zero

ID/EX MEM/WB

Execution

sw

Address

Ch6.a-281998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipelined Datapath

Instruction�memory

Address

4

32

0

AddAdd�

result

Shift�left 2

Inst

ructio

n

IF/ID EX/MEM

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

16Sign�

extend

Write�register

Write�data

Read�data

Data�memory

1

ALU�result

M�u�x

ALU

Zero

ID/EX MEM/WB

Memory

sw

Address

15

Ch6.a-291998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipelined Datapath

Instruction�memory

Address

4

32

0

Add Add�result

Shift�left 2

Instr

uct

ion

IF/ID EX/MEM

M�u�x

0

1

Add

PC

0

Address

Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

16Sign�

extend

Write�register

Write�data

Read�data

Data�memory

1

ALU�result

M�u�x

ALU

Zero

ID/EX MEM/WB

Write back

sw

Ch6.a-301998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Datapath Correto

Instruction�memory

Address

4

32

0

Add Add�result

Shift�left 2

Instr

uct

ion

IF/ID EX/MEM MEM/WB

M�u�x

0

1

Add

PC

0

Address

Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

16Sign�

extend

Write�register

Write�data

Read�data

Data�memory

1

ALU�result

M�u�x

ALU

Zero

ID/EX

Problema na execução da instrução load.

Qual é o erro?

16

Ch6.a-311998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Datapath com os estágios usados para um instrução lw

Instructionmemory

Address

4

32

0

Add Addresult

Shiftleft 2

Ins

truct io

nIF/ID EX/MEM MEM/WB

Mux

0

1

Add

PC

0Writedata

Mux

1

Registers

Readdata 1

Readdata 2

Readregister 1

Readregister 2

16Sign

extend

Writeregister

Writedata

Readdata

1

ALUresult

Mux

ALU

Zero

ID/EX

Address

Datamemory

Ch6.a-321998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Representações Gráfica do Pipeline

IM R eg D M R e g

IM R e g D M R eg

C C 1 C C 2 C C 3 C C 4 C C 5 C C 6

T im e (in c lo c k c y cles )

lw $1 0 , 2 0 ( $1 )

P rog ra me x e c u t io no rd e r( in in s tr uc tio n s )

s ub $ 11 , $ 2 , $3

A L U

A L U

Ajuda a responder perguntas como:

Quantos ciclos são gasto para executar este código?

O que a ALU está fazendo durante o ciclo 10?

Ajuda a entender os datapaths

17

Ch6.a-331998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Representações Gráfica do Pipeline

Programexecutionorder(in instructions)

Time ( in clock cycles)

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Instructionfetch

Instructiondecode

Instructionfetch

Instructiondecode Execution Write back

Execution

Dataaccess

Dataaccess Write backlw $10, $20($1)

sub $11, $2, $3

Ch6.a-341998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Instruction�memory

Address

4

32

0

Add Add�result

Shift�left 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

16Sign�

extend

Write�register

Write�data

Read�data

1

ALU�result

M�u�x

ALU

Zero

ID/EX

Instruction fetch

lw $10, 20($1)

Address

Data�memory

Clock 1

18

Ch6.a-351998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Instruction�memory

Address

4

32

0

Add Add�result

Shift�left 2

Instr

uct

ion

IF/ID EX/MEM MEM/WB

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

16Sign�

extend

Write�register

Write�data

Read�data

1

ALU�result

M�u�x

ALU

Zero

ID/EX

Instruction decode

lw $10, 20($1)

Instruction fetch

sub $11, $2, $3

Address

Data�memory

Clock 2

Ch6.a-361998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Instruction�memory

Address

4

0

Add Add�result

Shift�left 2

Inst

ruct

ion

IF/ID EX/MEM MEM/WB

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

Write�register

Write�data

Read�data

1

ALU�result

M�u�x

ALU

Zero

ID/EX

Execution

lw $10, 20($1)

Instruction decode

sub $11, $2, $3

3216Sign�

extend

Address

Data�memory

Clock 3

19

Ch6.a-371998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Instruction�memory

Address

4

0

Add Add�result

Shift�left 2

Instru

ctio

n

IF/ID EX/MEM MEM/WB

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

3216Sign�

extend

Write�register

Write�data

Memory

lw $10, 20($1)

Read�data

1

ALU�result

M�u�x

ALU

Zero

ID/EX

Execution

sub $11, $2, $3

Data�memory

Address

Clock 4

Ch6.a-381998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Instruction�memory

Address

4

32

0

Add Add�result

1

ALU�result

Zero

Shift�left 2

Inst

ruct

ion

IF/ID EX/MEMID/EX MEM/WB

Write backM�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

16Sign�

extend

M�u�x

ALURead�data

Write�register

Write�data

lw $10, 20($1)

Memory

sub $11, $2, $3

Address

Data�memory

Clock 5

20

Ch6.a-391998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Instruction�memory

Address

4

32

0

Add Add�result

1

ALU�result

Zero

Shift�left 2

Instr

uct

ion

IF/ID EX/MEMID/EX MEM/WB

Write backM�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

16Sign�

extend

M�u�x

ALURead�data

Write�register

Write�data

sub $11, $2, $3

Address

Data�memory

Clock 6

Ch6.a-401998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Controle do Pipeline

PC

Instructionmemory

Address

Inst

ruct

ion

Instruction[20– 16]

MemtoReg

ALUOp

Branch

RegDst

ALUSrc

4

16 32Instruction[15– 0]

0

0Registers

Writeregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1Write

data

Read

data Mux

1

ALUcontrol

RegWrite

MemRead

Instruction[15– 11]

6

IF/ID ID/EX EX/MEM MEM/WB

MemWrite

Address

Datamemory

PCSrc

Zero

AddAdd

result

Shift

left 2

ALUresult

ALU

Zero

Add

0

1

Mux

0

1

Mux

21

Ch6.a-411998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Controle do Pipeline

• 5 estágios. O que deve ser controlado em cada estágio?

– 10: Fetch da instrução e incremento do PC

– 20: Decodificação da instrução e Fetch dos registradores

– 30: Execução

– 40: Acesso à memória

– 50: Write back

Ch6.a-421998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Sinais de controle

22

Ch6.a-431998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Sinais de controle

Ch6.a-441998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Sinais de controle

Execution/Address

Calculation stage

control lines

Memory access stage

control lines

Write-back

stage control

lines

Instruction

Reg

Dst

ALU

Op1

ALU

Op0

ALU

Src Branch

Mem

Read

Mem

Write

Reg

write

Mem to

Reg

R-format 1 1 0 0 0 0 0 1 0

lw 0 0 0 1 0 1 0 1 1

sw X 0 0 1 0 0 1 0 X

beq X 0 1 0 1 0 0 0 X

23

Ch6.a-451998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Control

EX

M

WB

M

WB

WB

IF/ID ID/EX EX/MEM MEM/WB

Instruction

Ch6.a-461998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Datapath com Controle

PC

Instruc tionmemory

Instr

ucti o

n

Add

Instruction[20– 16]

Mem

t oR

eg

ALU Op

Branch

RegD st

ALUSrc

4

16 32Instruction[15– 0]

0

0

Mux

0

1

AddAdd

result

R egistersW riteregister

W ritedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1

ALUresult

Zero

Writedata

Readdata

Mux

1

ALUcontrol

Shiftleft 2

Re

gWr ite

M emRead

Control

ALU

Instruction[15– 11]

6

EX

M

W B

M

WB

WBIF/ID

PCSrc

ID/EX

EX/MEM

MEM/WB

Mux

0

1

Mem

Write

Address

Datamemory

Address

24

Ch6.a-471998 Morgan Kaufmann PublishersPaulo C. Centoducatte

lw $10, 20 ($1)sub $11, $2, $3and $12, $4, $5or $13, $6, $7add $14, $8, $9

Instruction�memory

Instruction�[20–16]

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

Instruction�[15– 0]

0

M�u�x

0

1

AddAdd�

result

RegistersWrite�register

Write�data

Read�data 1

Read�data 2

Read�register 1

Read�register 2

Sign�extend

M�u�x

1

ALU�result

Zero

ALU�control

Shift�

left 2R

egW

rite

MemRead

Control

ALU

Instruction�[15–11]

EX

M

WB

M

WB

WB

Instr

uct

ion

IF/ID EX/MEMID/EX

ID: before<1> EX: before<2> MEM: before<3> WB: before<4>

MEM/WB

IF: lw $10, 20($1)

000

00

0000

000

00

000

0

00

00

0

0

0

M�u�x

0

1

Add

PC

0

Data�

memory

Address

Write�data

Read�data

M�u�x

1

Me

mW

rite

Address

Clock 1

1o ciclo

Ch6.a-481998 Morgan Kaufmann PublishersPaulo C. Centoducatte

lw $10, 20 ($1)sub $11, $2, $3and $12, $4, $5or $13, $6, $7add $14, $8, $9

WB

EX

M

Instruction�

memory

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

M�u�x

0

1

Add Add�result

Write�register

Write�data

M�u�x

1

ALU�result

Zero

ALU�control

Shift�

left 2

Re

gW

rite

ALU

M

WB

WB

Instr

uct

ion

IF/ID EX/MEMID/EX

ID: lw $10, 20($1) EX: before<1> MEM: before<2> WB: before<3>

MEM/WB

IF: sub $11, $2, $3

010

11

0001

000

00

00

0

0

00

0

0

0

0

0

M�u�x

0

1

Add

PC

0Write�data

Read�data

M�u�x

1

lwControl

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

X

10

20

X

1

Instruction�[20– 16]

Instruction�[15– 0] Sign�

extend

Instruction�[15– 11]

20

$X

$1

10

X

MemRead

Me

mW

rite

Data�

memory

Address

Address

Clock 2

2o ciclo

25

Ch6.a-491998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Instruction�memory

Address

Instruction�[20– 16]

Mem

toR

eg

Branch

ALUSrc

4

Instruction�[15– 0]

0

1

AddAdd�

result

RegistersWrite�register

Write�data

Read�data 1

Read�data 2

Read�register 1

Read�register 2

ALU�result

Shift�

left 2

Re

gW

rite

MemRead

Control

ALU

Instruction�[15– 11]

EX

M

WB

WBIn

str

uct

ion

IF/ID EX/MEMID/EX

ID: sub $11, $2, $3 EX: lw $10, . . . MEM: before<1> WB: before<2>

MEM/WB

IF: and $12, $4, $5

000

10

1100

010

11

000

1

00

00

0

0

0

M�u�x

0

1

Add

PC

0Write�data

Read�data

M�u�x

1

Me

mW

rite

sub

11

X

X

3

2

X

$3

$2

X

11

$1

20

10

M�u�x

0

M�u�x

1

ALUOp

RegDst

ALU�

control

M

WB

Zero

Sign�extend

Data�memory

Address

Clock 3

lw $10, 20 ($1)sub $11, $2, $3and $12, $4, $5or $13, $6, $7add $14, $8, $9

3o ciclo

Ch6.a-501998 Morgan Kaufmann PublishersPaulo C. Centoducatte

WB

EX

M

Instruction�memory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

Add Add�result

Write�register

Write�data 1

ALU�result

ALU�control

Shift�left 2

Re

gW

rite

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: and $12, $2, $3 EX: sub $11, . . . MEM: lw $10, . . . WB: before<1>

MEM/WB

IF: or $13, $6, $7

000

10

1100

000

10

101

0

11

10

0

0

0

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

andControl

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

12

X

X

5

4

Instruction�[20– 16]

Instruction�[15– 0]

Instruction�[15– 11]

X

$5

$4

X

12

MemRead

Me

mW

rite

$3

$2

11

M�u�x

M�u�x

ALUAddress Read�

dataData�

memory

10

WB

Zero

Sign�extend

Clock 4

lw $10, 20 ($1)sub $11, $2, $3and $12, $4, $5or $13, $6, $7add $14, $8, $9

4o ciclo

26

Ch6.a-511998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Instruction�

memory

Address

Instruction�[20– 16]

Branch

ALUSrc

4

Instruction�[15– 0]

0

1

AddAdd�

result

RegistersWrite�register

Write�data

Read�data 1

Read�data 2

Read�register 1

Read�register 2

ALU�result

Shift�

left 2R

egW

rite

MemRead

Control

ALU

Instruction�[15– 11]

EX

M

WB

Instr

uct

ion

IF/ID EX/MEMID/EX

ID: or $13, $6, $7 EX: and $12, . . . MEM: sub $11, . . . WB: lw $10, . . .

MEM/WB

IF: add $14, $8, $9

000

10

1100

000

10

10

1

0

10

0

0

0

M�u�x

0

1

Add

PC

0Write�data

Read�data

M�u�x

1

Me

mW

rite

or

13

X

X

7

6

X

$7

$6

X

13

$4

M�u�x

0

M�u�x

1

ALUOp

RegDst

ALU�

control

M

WB

11 10

10$5

12

WB

Mem

toR

eg

1

1

Zero

Data�

memory

Address

Sign�

extend

Clock 5

lw $10, 20 ($1)sub $11, $2, $3and $12, $4, $5or $13, $6, $7add $14, $8, $9

5o ciclo

Ch6.a-521998 Morgan Kaufmann PublishersPaulo C. Centoducatte

WB

EX

M

Instruction�

memory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

Add Add�result

1

ALU�result

ALU�control

Shift�left 2

Re

gW

rite

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: add $14, $8, $9 EX: or $13, . . . MEM: and $12, . . . WB: sub $11, . . .

MEM/WB

IF: after<1>

000

10

1100

000

10

101

0

10

00

0

1

0

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

addControl

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

14

X

X

9

8

Instruction�[20– 16]

Instruction�[15– 0]

Instruction�[15– 11]

X

$9

$8

X

14

MemRead

Me

mW

rite

$7

$6

13

M�u�x

M�u�x

ALURead�data

12

WB

11

11

Write�register

Write�data

Zero

Data�memory

Address

Sign�extend

Clock 6

lw $10, 20 ($1)sub $11, $2, $3and $12, $4, $5or $13, $6, $7add $14, $8, $9

6o ciclo

27

Ch6.a-531998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Instruction�

memory

Address

Instruction�[20–16]

Branch

ALUSrc

4

Instruction�[15–0]

0

1

AddAdd�

result

RegistersWrite�register

Write�data

ALU�result

Shift�left 2

Re

gW

rite

MemRead

Control

ALU

Instruction�[15–11]

Sign�

extend

EX

M

WB

Instr

uct

ion

IF/ID EX/MEMID/EX

ID: after<1> EX: add $14, . . . MEM: or $13, . . . WB: and $12, . . .

MEM/WB

IF: after<2>

000

00

0000

000

10

101

0

10

00

0

M�u�x

0

1

Add

PC

0Write�data

Read�data

M�u�x

1

ID: after<2> EX: after<1> MEM: add $14, . . . WB: or $13, . . .IF: after<3>M

em

Wri

te

$8

M�u�x

0

M�u�x

1

ALUOp

RegDst

ALU�

control

M

WB

13 12

12$9

14

WB

Mem

toR

eg

1

0

Read�data 1

Read�data 2

Read�register 1

Read�register 2 Zero

Data�

memory

Address

Clock 7

lw $10, 20 ($1)sub $11, $2, $3and $12, $4, $5or $13, $6, $7add $14, $8, $9

7o ciclo

Ch6.a-541998 Morgan Kaufmann PublishersPaulo C. Centoducatte

WB

EX

M

Instruction�

memory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

Add Add�result

1

ALU�result

Zero

ALU�control

Shift�

left 2

Re

gW

rite

M

WB

Inst

ruct

ion

IF/ID EX/MEMID/EX

ID: after<2> EX: after<1> MEM: add $14, . . . WB: or $13, . . .

MEM/WB

IF: after<3>

000

00

0000

000

00

00

0

0

10

0

0

0

1

0

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Control

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

Instruction�[20–16]

Instruction�[15–0] Sign�

extend

Instruction�[15–11]

MemRead

Me

mW

rite

M�u�x

M�u�x

ALURead�data

14

WB

13

13

Write�register

Write�data

Data�memory

Address

Clock 8

lw $10, 20 ($1)sub $11, $2, $3and $12, $4, $5or $13, $6, $7add $14, $8, $9

8o ciclo

28

Ch6.a-551998 Morgan Kaufmann PublishersPaulo C. Centoducatte

WB

EX

M

Instruction�

memory

Address

Mem

toR

eg

ALUOp

Branch

RegDst

ALUSrc

4

0

0

1

AddAdd�

result

1

ALU�result

Zero

ALU�

control

Shift�

left 2R

eg

Write

M

WB

Instr

uct

ion

IF/ID EX/MEMID/EX

ID: after<3> EX: after<2> MEM: after<1> WB: add $14, . . .

MEM/WB

IF: after<4>

000

00

0000

000

00

000

0

00

00

0

1

0

M�u�x

0

1

Add

PC

0Write�data

M�u�x

1

Control

Registers

Read�data 1

Read�data 2

Read�register 1

Read�register 2

Instruction�[20– 16]

Instruction�[15– 0] Sign�

extend

Instruction�[15– 11]

MemRead

Mem

Wri

te

M�u�x

M�u�x

ALURead�data

WB

14

14

Write�register

Write�data

Data�memory

Address

Clock 9

lw $10, 20 ($1)sub $11, $2, $3and $12, $4, $5or $13, $6, $7add $14, $8, $9

9o ciclo

Ch6.a-561998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Dependências

• Problema com iniciar a execução de uma instrução antes do término da anterior

– Exemplo: Instruções com dependências

sub $2, $1, $3 # reg $2 modificadoand $12, $2, $5 # valor (1º operando) de $2

# depende do subor $13, $6, $2 # idem (2º operando)add $14, $2, $2 # idem (1º e 2º operando)

sw $15, 100($2) # idem (base do endereçamento)

29

Ch6.a-571998 Morgan Kaufmann PublishersPaulo C. Centoducatte

IM Reg

IM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

sub $2, $1, $3

Programexecutionorder(in instructions)

and $12, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

10 10 10 10 10/–20 –20 –20 –20 –20

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value ofregister $2:

DM Reg

Reg

Reg

Reg

DM

Se a escrita no banco de registradores éfeita no 1o semi-cicloe a leitura no 2o, em CC5 não há hazard.

CC4 CC5 CC6

Reg WR Reg RD

Ch6.a-581998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Solução por Software

• Compilador garante um código sem Hazard inserindo nops

– Onde inserir os nops?

sub $2, $1, $3and $12, $2, $5or $13, $6, $2add $14, $2, $2sw $15, 100($2)

sub $2, $1, $3nopnopand $12, $2, $5or $13, $6, $2add $14, $2, $2sw $15, 100($2)

Problema: reduz o desempenho

30

Ch6.a-591998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Solução por Hardware

• Usar o resultado assim que calculado, não esperar que ele seja escrito.

• Neste caso é necessário mecanismo para detectar o hazard. Qual a condição para que haja hazard?

Quando uma instrução tenta ler um registrador (estágio EX) e esse registrador será escrito poruma instrução anterior no estágio WB

Ch6.a-601998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Forwarding

O que ocorre se $13 no lugar de $2

IM Reg

IM Reg

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

sub $2, $1, $3

Programexecution order(in instructions)

and $12, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

10 10 10 10 10/– 20 – 20 – 20 – 20 – 20

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value of register $2 :

DM Reg

Reg

Reg

Reg

X X X – 20 X X X X XValue of EX/MEM :X X X X – 20 X X X XValue of MEM/W B :

DM

31

Ch6.a-611998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Condições que determinam Hazards de dados

• EX/MEM

– EX/MEM.RegisterRd = ID/EX.RegisterRs

– EX/MEM.RegisterRd = ID/EX.RegisterRt

• MEM/WB

– MEM/WB.RegisterRd = ID/EX.RegisterRs

– MEM/WB.RegisterRd = ID/EX.RegisterRt

Ch6.a-621998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Condições que determinam Hazard de dados

• Exemplo:

– sub–and: EX/MEM.RegisterRd = ID/EX.RegisterRs = $2

– sub-or: MEM/WB.RegisterRd = ID/EX.RegisterRt = $2

– sub-add: não tem hazard

– sub-sw: não tem hazard

sub $2, $1, $3 # reg $2 modificadoand $12, $2, $5 # valor de $2 depende do subor $13, $6, $2 # idem (2º operando)add $14,$2, $2 # idem (1º e 2º operandos)sw $15, 100($2) # idem (base do endereçamento)

32

Ch6.a-631998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Condições que determinam Hazard de dados

• Esta não é uma política precisa, pois existem instruções que não escrevem no register file.

– solução seria verificar o sinal RegWrite

• MIPS usa $0 para operandos de valor 0. Para instruções onde $0 é destino?

• sll $0, $1, 2

• valor diferente de zero nos regs de pipeline

• add $3, $0, $2

• se houver fowarding, $3 terá valor errado no fim da instrução add

• Para isto temos que incluir as condições

• EX/MEM.registerRd <> 0 (1º hazard)

• MEM/WB.registerRd <> 0 (2º hazard)

Ch6.a-641998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Datapath sem Forwarding

M�u�x

ALU

ID/EX MEM/WB

Data�memory

EX/MEM

a. No forwarding

Registers

33

Ch6.a-651998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Datapath com Forwarding ( para add, sub, and e or )

Registers

M�u�x M�

u�x

ALU

ID/EX MEM/WB

Data�memory

M�u�x

Forwarding�unit

EX/MEM

b. With forwarding

ForwardB

RdEX/MEM.RegisterRd

MEM/WB.RegisterRd

Rt

Rt

Rs

ForwardA

M�u�x

Ch6.a-661998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Valores dos sinais ForwardA e ForwardB

Mux Control Source ExplanaçãoForwardA = 00 ID/EX Primeiro operando da ULA ����

register fileForwardA = 10 EX/MEM Primeiro operando da ULA ����

resultado anteiror da ULAForwardA = 01 MEM/WB Primeiro operando da ULA é

antecipado da memória de dadosou um resultado anterior da ULA

ForwardB = 00 ID/EX Segundo operando da ULA ����

register fileForwardB = 10 EX/MEM Segundo operando da ULA ����

resultado anteiror da ULAForwardB = 01 MEM/WB Segundo operando da ULA é

antecipado da memória de dadosou um resultado anterior da ULA

34

Ch6.a-671998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Detecção e Controle de hazard

• O controle de fowarding será no estágio EX, pois é neste estágio que se encontram os multiplexadores de fowarding da ULA. Portanto devemos passar o númerodo registrador operando do estágio ID via registrador depipeline ID/EX ���� adicionar campo rs (bits 25-21).

Ch6.a-681998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Condições para detecção de hazard

EX hazardif (EX/MEM.RegWrite and (EX/MEM.RegisterRd <> 0 )

and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) FowardA = 10

if (EX/MEM.RegWrite and (EX/MEM.RegisterRd <> 0 ) and (EX/MEM.RegisterRd = ID/EX.RegisterRt))

FowardB = 10

MEM hazardif (MEM/WB.RegWrite and (MEM/WB.RegisterRd <> 0 )

and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) FowardA = 01

if (MEM/WB.RegWrite and (MEM/WB.RegisterRd <> 0 ) and (MEM/WB.RegisterRd = ID/EX.RegisterRt))

FowardB = 01

35

Ch6.a-691998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Observações:

• Não existe data hazard no estágio WB, pois estamos assumindo que o register file supre o resultado correto se a instrução no estágio ID é o mesmo registrador escrito pela instrução no estágio WB ���� forwarding no registerfile

– Na 1a edição do livro P&H tem hazard

Ch6.a-701998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Observações:

• Um data hazard potencial pode ocorrer entre o resultado de uma instrução em WB, o resultado de uma instrução em MEM e o operando fonte da instrução no estágio da ULA

• Neste caso o resultado é antecipado do estágio MEM porque o resultado neste estágio é mais recente

add $1, $1, $2add $1, $1, $3add $1, $1, $4

if (MEM/WB.RegWrite and (MEM/WB.RegisterRd <> 0 ) and (EX/MEM.RegisterRd <> ID/EX.registerRs)and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) FowardA = 01

if (MEM/WB.RegWrite and (MEM/WB.RegisterRd <> 0 ) and (EX/MEM.RegisterRd <> ID/EX.registerRt)and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) FowardB = 01

36

Ch6.a-711998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Datapath modificado para fowarding

PCInstruction

memory

Registers

Mux

Mux

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

Data

memory

Mux

Forwarding

unit

IF/ID

Instr

uc

tion

Mux

RdEX/MEM.RegisterRd

MEM/WB.RegisterRd

Rt

Rt

Rs

IF/ID.RegisterRd

IF/ID.RegisterRt

IF/ID.RegisterRt

IF/ID.RegisterRs

Ch6.a-721998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Forwarding

PCInstruction�

memory

Registers

M�u�x

M�u�x

M�u�x

EX

M

WB

WB

Data�memory

M�u�x

Forwarding�unit

Inst

ruct

ion

IF/ID

and $4, $2, $5 sub $2, $1, $3

ID/EX

before<1>

EX/MEM

before<2>

MEM/WB

or $4, $4, $2

Clock 3

2

5

10 10

$2

$5

5

2

4

$1

$3

3

1

2

Control

ALU

M

WB

sub $2, $1, $3and $4, $2, $5or $4, $4, $2add $9, $4, $2

37

Ch6.a-731998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Forwarding

PCInstruction�

memory

Registers

M�u�x

M�u�x

M�u�x

EX

M

WB

M

WB

Data�memory

M�u�x

Forwarding�unit

Inst

ructio

n

IF/ID

or $4, $4, $2 and $4, $2, $5

ID/EX

sub $2, . . .

EX/MEM

before<1>

MEM/WB

add $9, $4, $2

Clock 4

4

6

10 10

$4

$2

6

2

4

$2

$5

5

2

4

Control

ALU

10

2

WB

sub $2, $1, $3and $4, $2, $5or $4, $4, $2add $9, $4, $2

Ch6.a-741998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Forwarding

PCInstruction�

memory

Registers

M�u�x

M�u�x

M�u�x

EX

M

WB

M

WB

Data�memory

M�u�x

Forwarding�unit

Inst

ruct

ion

IF/ID

add $9, $4, $2 or $4, $4, $2

ID/EX

and $4, . . .

EX/MEM

sub $2, . . .

MEM/WB

after<1>

Clock 5

4

2

10 10

$4

$2

2

4

9

$4

$2

4

2

24

Control

ALU

10

WB

2

1

4

sub $2, $1, $3and $4, $2, $5or $4, $4, $2add $9, $4, $2

38

Ch6.a-751998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Forwarding

PCInstruction�

memory

M�u�x

M�u�x

M�u�x

EX

M

WB

M

WB

Data�memory

M�u�x

Forwarding�unit

after<1>after<2> add $9, $4, $2 or $4, . . .

EX/MEM

and $4, . . .

MEM/WB

Clock 6

10

$4

$2

2

4

9

ALU

10

4

WB

4

1

Registers

Inst

ructio

n

IF/ID

ID/EX

4

Control

sub $2, $1, $3and $4, $2, $5or $4, $4, $2add $9, $4, $2

Ch6.a-761998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Forwarding: Datapath para entrada signed-imediate

necessária para lw e sw

ALUSrcRegisters

Mux

Mux

Mux

ALU

ID/EX MEM/WB

Datamemory

Mux

Forwardingunit

EX/MEM

Mux

39

Ch6.a-771998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Data Hazard e Stalls

Reg

IM

Reg

Reg

IM

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

lw $2, 20($1)

Program�execution�order�(in instructions)

and $4, $2, $5

IM Reg DM Reg

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9

or $8, $2, $6

add $9, $4, $2

slt $1, $6, $7

DM Reg

Reg

Reg

DM

Ch6.a-781998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Data Hazards e Stalls

• Quando uma instrução tenta ler um registrador precedida por uma instrução de load, que escreve no mesmo registrador ���� o dado tem que ser mantido (ciclo 4) enquanto a ULA executa a operação ���� atrasar o pipelinepara que a instrução leia o valor correto.

• Condição de detecção de hazard para atraso no pipeline

# testa se é um load:

if (ID/EX.MemRead and

# verifica se registrador destino da instrução load em EX é o registrador fonte da instrução em ID:

((ID/EX.RegisterRt = IF/ID.RegisterRs) or( ID/EX.RegisterRt = IF/ID.RegisterRt)))

stall pipeline

40

Ch6.a-791998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Data Hazards e Stalls

lw $2, 20($1)

Program�execution�order�(in instructions)

and $4, $2, $5

or $8, $2, $6

add $9, $4, $2

slt $1, $6, $7

Reg

IM

Reg

Reg

IM DM

CC 1 CC 2 CC 3 CC 4 CC 5 CC 6

Time (in clock cycles)

IM Reg DM RegIM

IM DM Reg

IM DM Reg

CC 7 CC 8 CC 9 CC 10

DM Reg

RegReg

Reg

bubble

Ch6.a-801998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Data Hazards e Stalls

• Se a instrução no estágio ID é atrasada, a instrução no estágio IF também deve ser atrasada

• Como fazer?Impedir a mudança no PC e no registrador IF/IF. A instrução em IF continua sendo lida e em ID continuam sendo lido os mesmos campos da instrução.

• Stall pipeline: mesmo efeito da instrução nop começandopelo estágio EX ���� desativar os 9 sinais de controle dos estágios EX, MEM e WB

41

Ch6.a-811998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Datapath com forwarding e data hazard detection

PCInstruction

memory

Registers

Mux

Mux

Mux

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

Datamemory

Mux

Hazarddetection

unit

Forwarding

unit

0

Mux

IF/ID

Instr

uct

ion

ID/EX.MemRead

IF/I

DW

r ite

PC

Wri

te

ID/EX.RegisterRt

IF/ID.RegisterRd

IF/ID.RegisterRt

IF/ID.RegisterRt

IF/ID.RegisterRs

Rt

Rs

Rd

RtEX/MEM.RegisterRd

MEM/WB.RegisterRd

Ch6.a-821998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Seqüência de execução do exemplo anterior

Hazard�detection�

unit

0

M�u�x

IF/I

DW

rite

PC

Wri

te

ID/EX.RegisterRt

ID/EX.MemRead

M

WB

$1

$X

X

1

2

before<3>

PCInstruction�

memory

Registers

M�u�x

M�u�x

M�u�x

EX WB

Data�memory

M�u�x

Forwarding�unit

Inst

ruct

ion

IF/ID

ID/EX

EX/MEM

MEM/WB

and $4, $2, $5 lw $2, 20($1) before<1> before<2>

Clock 2

1

1

X

X11

Control

ALU

M

WB

42

Ch6.a-831998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Seqüência de execução do exemplo anterior

Hazard�detection�

unit

0

M�u�x

IF/I

DW

rite

PC

Wri

te

ID/EX.RegisterRt

lw $2, 20($1)

PCInstruction�

memory

Registers

M�u�x

M�u�x

M�u�x

EX

M

WB

WB

Data�memory

M�u�x

Forwarding�unit

Inst

ructio

n

IF/ID

and $4, $2, $5

ID/EX

before<1>

EX/MEM

before<2>

MEM/WB

or $4, $4, $2

Clock 3

2

5

2

500 11

$2

$5

5

2

4

$1

$X

X

1

2

Control

ALU

M

WB

ID/EX.MemRead

Ch6.a-841998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Seqüência de execução do exemplo anterior

$2

$5

5

2

24

WB

Hazard�detection�

unit

0

M�u�x

IF/ID

Wri

te

PC

Write

ID/EX.RegisterRt

PCInstruction�

memory

Registers

M�u�x

M�u�x

M�u�x

EX

M

WB

Data�memory

M�u�x

Instr

uctio

n

IF/ID

and $4, $2, $5 bubble

ID/EX

lw $2, . . .

EX/MEM

before<1>

MEM/WB

Clock 4

2

2

5

510

11

00

$2

$5

5

2

4

Control

ALU

M

WB

Forwarding�unit

ID/EX.MemRead

or $4, $4, $2

43

Ch6.a-851998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Seqüência de execução do exemplo anterior

Hazard�detection�

unit

0

M�u�x

IF/ID

Wri

te

PC

Write

ID/EX.RegisterRt

2

bubble lw $2, . . .

PCInstruction�

memory

Registers

M�u�x

M�u�x

M�u�x

EX

M

WB

M

WB

Data�memory

M�u�x

Forwarding�unit

Instr

uct

ion

IF/ID

and $4, $2, $5

ID/EX

EX/MEM

MEM/WB

add $9, $4, $2

Clock 5

2

2

10 10

11

$4

$2

2

4

4

4

2

4

$2

$5

5

2

4

Control

ALU

0

WB

ID/EX.MemRead

or $4, $4, $2

Ch6.a-861998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Seqüência de execução do exemplo anterior

PCInstruction�

memory

Hazard�detection�

unit

0

M�u�x

IF/I

DW

rite

PC

Wri

te

ID/EX.RegisterRt

bubble

Registers

M�u�x

M�u�x

EX

M

WB

M

WB

Data�memory

M�u�x

Inst

ructio

n

IF/ID

add $9, $4, $2

ID/EX

and $4, . . .

EX/MEM

MEM/WB

Clock 6

4

4

2

210 10

$4

$2

2

4

49

$2

2

Control

ALU

10

WB0

after<1>

Forwarding�unit

$4

4

4

or $4, $4, $2

ID/EX.MemRead

M�u�x

44

Ch6.a-871998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Seqüência de execução do exemplo anterior

Registers

Instr

uct

ion

ID/EX

4

Control

PCInstruction�

memory

IF/I

DW

rite

PC

Wri

te

add $9, $4, $2 or $4, . . . and $4, . . .after<2> after<1>

Clock 7

M�u�x

M�u�x

M�u�x

EX

M

WB

M

WB

Data�memory

M�u�x

Forwarding�unit

EX/MEM

MEM/WB

10 10

$4

$2

2

4

9

ALU

10

WB

44

1

Hazard�detection�

unit

0

M�u�x

ID/EX.RegisterRt

ID/EX.MemRead

IF/ID

Ch6.a-881998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Branch Hazards

• Devemos ter um fetch de instrução por ciclo de clock, decisão:qual caminho de um branch deve ocorrer até o estágio MEM.

Reg

Reg

CC 1

Time (in clock cycles)

40 beq $1, $3, 7

Program�execution�order�(in instructions)

IM Reg

IM DM

IM DM

IM DM

DM

DM Reg

Reg Reg

Reg

Reg

RegIM

44 and $12, $2, $5

48 or $13, $6, $2

52 add $14, $2, $2

72 lw $4, 50($7)

CC 2 CC 3 CC 4 CC 5 CC 6 CC 7 CC 8 CC 9

Reg

45

Ch6.a-891998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Branch Hazards

• Tês esquemas para resolver control hazard :

• Branch-delay Slots

• Assume Branch Not Taken

• Dynamic Branch Prediction

• Assume Branch Not Taken

• continua a execução seqüencialmente e se o branch for tomado, descarta as instruções entre a instrução de branch e a instrução no endereço alvo, fazendo seus sinais de controle iguais a zero

Ch6.a-901998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Branch Hazards

• Redução do atraso de branches

• reduzir o custo se o branch for tomado

• adiantar a execução da instrução de branch.

• O next PC para uma instrução de branch é selecionado no estágio MEM

• executar o branch no estágio ID(apenas uma instrução será descartada)

• deslocar o cálculo do endereço de branch (branch adder) do MEM para o ID e comparando os registradores lidos do register file.

46

Ch6.a-911998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Branch Hazards

PCInstruction

memory

4

Registers

Mux

Mux

Mux

ALU

EX

M

WB

M

WB

WB

ID/EX

0

EX/MEM

MEM/WB

Datamemory

Mux

Hazarddetection

unit

Forwardingunit

IF.Flush

IF/ID

Signextend

Control

Mux

=

Shiftleft 2

Mux

Ch6.a-921998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Branch Hazards (exemplo)

36 sub $10, $4, $840 beq $1, $3, 7 # PC 40 + 4 +7*4 = 7244 and $12, $2, $548 or $13, $2, $652 add $14, $4, $256 slt $15, $6, $7.... ................ ............72 lw $4, 50($7)

47

Ch6.a-931998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Branch Hazards (exemplo)

PCInstruction�

memory

4

Registers

Sign�

extend

M�u�x

M�u�x

Control

EX

M

WB

M

WB

WB

M�u�x

Hazard�detection�

unit

Forwarding�unit

M�u�x

IF.Flush

IF/ID

and $12, $2, $5 beq $1, $3, 7 sub $10, $4, $8

MEM/WB

EX/MEM

ID/EX

Clock 3

72 44

48 44

28

7

$1

$3

10

48

72

72

0

$4

$8

ALUData�

memory

M�u�x

Shift�

left 2

before<1> before<2>

=

Ch6.a-941998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Branch Hazards (exemplo)

M�u�x

0

bubble (nop)lw $4, 50($7)

Clock 4

� beq $1, $3, 7 sub $10, . . . before<1>

PCInstruction�

memory

4

Registers

Sign�extend

M�u�x

M�u�x

Control

EX

M

WB

M

WB

WB

M�u�x

Hazard�

detection�

unit

Forwarding�unit

IF.Flush

IF/ID

MEM/WB

EX/MEM

ID/EX

76 72

76 72

$1

$3

10

76

ALUData�

memory

M�u�x

Shift�

left 2

=

48

Ch6.a-951998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Dymanic Branch Prediction

• Branch not taken

• forma de branch predicton que assume que o

branch não será tomado.

• Dymanic Branch Prediction

• descobre se o branch foi tomado ou não na última vez

que foi executado e faz o fetch das instruções pelo

mesmo local da última vez.

Ch6.a-961998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Dymanic Branch Prediction

• Implementação

• branch prediction buffer

• branch history table

• Branch prediction buffer: pequena memória indexada por bits menos significativos do endereço da instrução de branch. Ela contém um bit que diz se o branch foi recentemente tomado ou não (Neste esquema não sabemos se a previsão é correta ou não, pois este buffer pode ser alterado por outra instrução de branch que tem os mesmos bits menos significativos de endereço).

49

Ch6.a-971998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Tratamento de Exceção

PCInstruction�

memory

4

Registers

Sign�extend

M�u�x

M�u�x

M�u�x

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

M�u�x

Data�memory

M�u�x

Hazard�detection�

unit

Forwarding�unit

IF.Flush

IF/ID

=

Except�PC

40000040

0

M�u�x

0

M�u�x

0

M�u�x

ID.Flush EX.Flush

Cause

Shift�left 2

Ch6.a-981998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Tratamento de Exceção

Exemplo:

Dado a seqüência abaixo:

40hex sub $11, $2, $4

44hex and $12, $2, $5

48hex or $13, $2, $6

4Chex add $1, $2, $1

50hex slt $15, $6, $7

54hex lw $16, 50($7)

Assumir que as instruções a serem chamadas em um tratamento de exceção

comecem com:

40000040hex sw $25, 1000($0)

40000044hex sw $26, 1004($0)

Mostre o que acontece no pipeline se uma exceção de overflow ocorre na

instrução add.

50

Ch6.a-991998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Tratamento de Exceção - A figura abaixo mostra o que acontece a

partir da instrução add em EX (clock 5)s lt $ 1 5 , $ 6 , $ 7lw $ 1 6 , 5 0 ($ 7 ) a d d $ 1 , $ 2 , $ 1 o r $ 1 3 , . . . a n d $ 1 2 , . . .

C lo c k 5

4 0 0 0 0 0 4 0

0

0

0

0 1 0

1 0

0

0

1

5 8 5 4

5 4

1 2

$ 6

$ 7

1 5

5 0

$ 2

$ 1

$ 1

1 3 1 2

D a t a�m e m o r y

b u b b l e ( n o p )s w $ 2 5 , 1 0 0 0 ( $ 0 ) b u b b le b u b b le o r $ 1 3 , . . .

C lo c k 6

4 0 0 0 0 0 4 4

4 0 0 0 0 0 4 4

1 3

0

0

0

0 0 0

0 00 0

0 0

1

1 3

D a ta�m e m o r y

4 0 0 0 0 0 4 0

P C

4

R e g is t e r s

S ig n �e x te n d

M �

u �x

M �

u �x

M �

u �x

A L U

E X

M

W B

M

W B

W B

ID / E X

E X / M E M

M E M / W B

M �

u�x

M �

u �x

H a z a rd�

d e te c t i o n �

u n i t

F o r w a r d in g�u n it

IF .F l u s h

IF / ID

=

E x c e p t�

P C

4 0 0 0 0 0 4 0

0

M �

u�x

0

M �

u�x

0

M �

u�x

I D . F lu s h E X . F lu s h

C a u s e

S h ift�

le ft 2

P C

4

R e g is t e r s

S i g n �e x te n d

M �

u �x

M �

u �x

M �

u �x

C o n t ro l

A L U

E X

M

W B

M

W B

W B

ID / E X

E X / M E M

M E M / W B

M �

u�x

M �

u �x

H a z a rd�d e te c t i o n �

u n i t

F o r w a r d in g�

u n it

IF .F l u s h

IF / ID

=

E x c e p t�P C

4 0 0 0 0 0 4 0

0

M �

u�x

0

M �

u�x

0

M �

u�x

ID .F lu s h E X . F lu s h

C a u s e

S h i ft�le f t 2

In s t r u c t io n �m e m o r y

In s t r u c t io n �

m e m o r y

C o n tr o l

Ch6.a-1001998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipeline Super Escalar

PCInstruction�

memory

4

Registers

M�u�x

M�u�x

ALU

M�u�x

Data�memory

M�u�x

40000040

Sign�extend Sign�

extend

ALU Address

Write�data

51

Ch6.a-1011998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipeline Super Escalar

Exemplo:

Como o loop abaixo será escalonado em um MIPS superscalar:

Loop: lw $t0, $0($s2) # t0= elemento do arrayaddu$t0, $t0, $t2 # add escalar em $s2sw $t0, 0($s2) # armazena o resultadoaddi $s1, $s1,-4 # decrementa o ponteirobne $s1, $zero, Loop # branch se $s1 != 0

Reordenar as instruções para evitar tantos pipelines stalls quanto possível

Ch6.a-1021998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipeline Super Escalar

Solução:A primeira com a terceira e as duas últimas instruções tem dependência de dados.

52

Ch6.a-1031998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Pipeline Super Escalar

Loop unrolling para pipelines escalares ���� técnica para aumentar o desempenho para loops que acessam arrays ���� múltiplas cópias do corpo do loop são feitas e instruções de diferentes iterações são escalonadas juntas

Exemplo ���� supor exemplo anterior

Solução ���� 4 cópias do corpo do loop

Ch6.a-1041998 Morgan Kaufmann PublishersPaulo C. Centoducatte

Datapath final

PC

Instructionmemory

4

Registers

Signextend

Mux

Mux

Mux

Control

ALU

EX

M

WB

M

WB

WB

ID/EX

EX/MEM

MEM/WB

Mux

Datamemory

Mux

Hazarddetection

unit

Forwardingunit

IF.Flush

IF/ID

Mux

ExceptPC

40000040

0

Mux

0

Mux

0

Mux

ID.Flush EX.Flush

Cause

Shiftleft 2

Writedata

Readdata

Address

Readdata

Address Writeregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

ALUcontrol

3216

Instruction

Instruction [15–11]

Instruction [20–16]

Instruction [20–16]

Instruction [25–21]

RegWrite

ALUOp

ALUSrc

RegDst

MemWrite

MemRead

MemtoReg

Branch

=