Page 1: Static ILP   Static (Compiler Based) Scheduling

StaticILP.1 2/12/02

Static ILP Static (Compiler Based) Scheduling

• Notes from UW-Madison
• Read chapter 4 of the textbook, and
• Paper on Itanium on the course web page

Today’s Theme and Contents

• Let compiler uncover the ILP
  – Objective: more ILP / simpler hardware / faster clock / less power

• How:
  – Static Scheduling
  – Loop Unrolling
  – Software pipelining
  – Static Multiple Issue: VLIW
    » local, global scheduling
    » static branch prediction
    » software speculation: trace scheduling, superblocks
    » nops, lockstep
    » conditional moves, predication
    » speculative loads

• IA-64 and Itanium

Basic Idea

• The compiler moves dependent instructions apart to avoid hazards

• This means:
  – independent instructions exist to place between them (if not, employ transformations to create them)
  – the compiler knows implementation details
    » latency AND superscalarity (issue width)

• What happens if the implementation changes?

• Static ILP is applicable to both statically and dynamically scheduled processors

• Statically scheduled processors: the compiler dictates which instructions can execute together (scheduling done in software)
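The basic idea can be illustrated with a small sketch of local (basic-block) list scheduling. Everything in it is an assumption made for illustration: the `Instr` record, the single-issue in-order pipeline model, the load latency of 2 cycles, and the greedy priority rule are not taken from the slides. It only shows how knowing the latencies lets a compiler move dependent instructions apart.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()
    latency: int = 1  # assumed cycles until the result can be consumed


def earliest(ins: Instr, cycle: int, ready: Dict[str, int]) -> int:
    """Earliest issue cycle: not before `cycle`, and not before operands are ready."""
    return max([cycle] + [ready.get(r, 0) for r in ins.srcs])


def issue_cycles(order: List[Instr]) -> List[int]:
    """Issue cycle of each instruction on a single-issue, in-order pipeline
    that stalls on read-after-write hazards."""
    ready: Dict[str, int] = {}
    cycle, cycles = 0, []
    for ins in order:
        start = earliest(ins, cycle, ready)
        cycles.append(start)
        if ins.dest is not None:
            ready[ins.dest] = start + ins.latency
        cycle = start + 1
    return cycles


def depends(a: Instr, b: Instr) -> bool:
    """True if b, which follows a in program order, must stay after a."""
    raw = a.dest is not None and a.dest in b.srcs
    war = b.dest is not None and b.dest in a.srcs
    waw = a.dest is not None and a.dest == b.dest
    return raw or war or waw


def list_schedule(block: List[Instr]) -> List[Instr]:
    """Greedy list scheduling: among instructions whose predecessors are all
    scheduled, pick the one that can issue earliest (ties: longer latency first)."""
    remaining = list(range(len(block)))
    ready: Dict[str, int] = {}
    cycle, order = 0, []
    while remaining:
        candidates = [i for i in remaining
                      if not any(depends(block[j], block[i])
                                 for j in remaining if j < i)]
        best = min(candidates,
                   key=lambda i: (earliest(block[i], cycle, ready),
                                  -block[i].latency, i))
        start = earliest(block[best], cycle, ready)
        if block[best].dest is not None:
            ready[block[best].dest] = start + block[best].latency
        cycle = start + 1
        order.append(block[best])
        remaining.remove(best)
    return order


# a = b + c; d = e + f, with an assumed load latency of 2 cycles
block = [
    Instr("lw  r1, b", "r1", (), 2),
    Instr("lw  r2, c", "r2", (), 2),
    Instr("add r3, r1, r2", "r3", ("r1", "r2")),
    Instr("lw  r4, e", "r4", (), 2),
    Instr("lw  r5, f", "r5", (), 2),
    Instr("add r6, r4, r5", "r6", ("r4", "r5")),
]
scheduled = list_schedule(block)
print(issue_cycles(block))      # original order stalls after each pair of loads
print(issue_cycles(scheduled))  # loads hoisted, adds pushed later
```

In this model the original order needs two stall cycles, while the reordered block issues one instruction every cycle. If the assumed load latency changes, the same schedule may no longer be stall-free, which is the "what happens if the implementation changes?" concern.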

(Local Scheduling)

(useful for large iteration counts)

Software speculation/Global Scheduling

HOW??

Static prediction, profile, frequency, path

Which is better: the above or dynamic prediction?

Register pressure

Superblocking: overcomes some of the complexities of trace scheduling

Single entry (superblock) vs. multiple entries (trace)

Does not have

Pentium IV at 3+ GHz vs. Itanium at 1 GHz

Lockstep: any hazard stalls the whole issue group / NOPs fill the empty slots if there is not enough parallelism
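A minimal sketch of how NOPs appear when a compiler packs operations into fixed-width VLIW instructions. The tuple layout `(text, dest, srcs)`, the function name `pack_bundles`, the unit latencies, and the conservative rule that any register dependence forces a new bundle are all assumptions for illustration; real machines and compilers differ.

```python
from typing import List, Tuple

Op = Tuple[str, str, Tuple[str, ...]]  # (text, destination register, source registers)


def pack_bundles(ops: List[Op], width: int) -> List[List[str]]:
    """Pack ops, in program order, into bundles of `width` slots.
    Ops in one bundle must be mutually independent; unused slots become NOPs."""
    bundles: List[List[Op]] = []
    current: List[Op] = []
    for text, dest, srcs in ops:
        written = {d for _, d, _ in current}
        read = {s for _, _, ss in current for s in ss}
        conflict = (bool(written & set(srcs))   # read-after-write
                    or dest in written          # write-after-write
                    or dest in read)            # write-after-read (conservative)
        if conflict or len(current) == width:
            bundles.append(current)             # close the bundle, start a new one
            current = []
        current.append((text, dest, srcs))
    if current:
        bundles.append(current)
    # Pad every bundle to the full issue width with explicit NOPs.
    return [[t for t, _, _ in b] + ["nop"] * (width - len(b)) for b in bundles]


ops = [
    ("add r1,r2,r3", "r1", ("r2", "r3")),
    ("sub r4,r1,r5", "r4", ("r1", "r5")),
    ("mul r6,r7,r8", "r6", ("r7", "r8")),
    ("or  r9,r4,r6", "r9", ("r4", "r6")),
]
bundles = pack_bundles(ops, width=3)
nop_count = sum(slot == "nop" for b in bundles for slot in b)
for b in bundles:
    print(" | ".join(b))
```

With this dependence chain, more than half of the slots in the three bundles hold NOPs, which is the code-size cost of too little parallelism.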

Predicated Execution & Conditional Moves

Convert control dependences to data dependences

if (a == 0) s = t;    (a in R1, s in R2, t in R3)

With a branch:

    bnez  R1, L
    addu  R2, R3, R0
L:

With a conditional move:

    cmovz R2, R3, R1

Applying the above to all instruction types is called predication…

+/-?
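The equivalence of the branch and conditional-move versions can be checked with a small simulation. This is only a sketch: the function names are hypothetical, registers are modeled as plain Python integers, and the mask-based select is one possible way to show that the result becomes a pure function of R1, R2 and R3 (a data dependence) with no control flow.

```python
def with_branch(r1: int, r2: int, r3: int) -> int:
    """bnez R1, L / addu R2, R3, R0 / L:  (returns the final value of R2)"""
    if r1 != 0:        # bnez R1, L: skip the move when a != 0
        return r2
    r2 = r3 + 0        # addu R2, R3, R0
    return r2


def cmovz(dest: int, src: int, cond: int) -> int:
    """cmovz dest, src, cond: dest = src if cond == 0, computed without a branch."""
    mask = -int(cond == 0)            # all ones if cond == 0, else 0
    return (src & mask) | (dest & ~mask)


def with_cmov(r1: int, r2: int, r3: int) -> int:
    """cmovz R2, R3, R1  (returns the final value of R2)"""
    return cmovz(r2, r3, r1)


for a in (0, 1, -7):
    for s in (5, -3):
        for t in (42, 0):
            assert with_branch(a, s, t) == with_cmov(a, s, t)
print("branch and cmovz versions agree")
```

The conditional move always executes and always reads R2, R3 and R1, so it costs an instruction slot even when the condition is false, but it can never be mispredicted.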

Speculative Loads

Bypass stores speculatively; repair code runs in case of misspeculation

Use an address buffer:

1. Lookup table: updated with the address of the speculative load

2. Updated with the addresses of intervening stores

3. Check instruction verifies that no store conflicted and releases the entry
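The three steps above can be sketched as a toy model. The class `AddressBuffer`, its method names, and the choice to drop an entry when a store hits the same address are assumptions for illustration (loosely in the spirit of IA-64 advanced loads); the slides do not specify the buffer's organization.

```python
from typing import Dict


class AddressBuffer:
    """Toy address buffer for loads that were hoisted above stores."""

    def __init__(self) -> None:
        self.entries: Dict[str, int] = {}  # destination register -> load address

    def speculative_load(self, reg: str, addr: int, memory: Dict[int, int]) -> int:
        # Step 1: record the address of the speculative load.
        self.entries[reg] = addr
        return memory[addr]

    def store(self, addr: int, value: int, memory: Dict[int, int]) -> None:
        memory[addr] = value
        # Step 2: an intervening store to the same address kills matching entries.
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def check(self, reg: str) -> bool:
        # Step 3: True if no store conflicted; the entry is released either way.
        return self.entries.pop(reg, None) is not None


def run(store_addr: int) -> int:
    """Load from 0x10 is hoisted above a store to `store_addr`."""
    memory = {0x10: 1, 0x20: 2}
    buf = AddressBuffer()
    value = buf.speculative_load("r1", 0x10, memory)  # load moved above the store
    buf.store(store_addr, 99, memory)                 # the store it bypassed
    if not buf.check("r1"):
        value = memory[0x10]                          # repair code: redo the load
    return value


print(run(0x20))  # store to a different address: speculation succeeded
print(run(0x10))  # store to the same address: repair code reloads the value
```

When the store does not alias the load, the check passes and the early value is used; when it does alias, the repair path picks up the stored value instead.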

Let the compiler do the work

• All
• Most of it
• As long as it improves performance
• …

by Harsh Sharangpani and Ken Arora

see web page

Idea

Compiler has a larger instruction window than hardware.

Communicate to the hardware more of the information gleaned at compile time.

Six instructions wide and ten stages deep

Tries to minimize the latency of the most frequent operations

Hardware support for compile-time indeterminacies

Software-initiated prefetch (requests filtered by the instruction cache)

Prefetch must be issued 12 cycles before the branch to hide latency

L2 -> streaming buffer -> instruction cache

Four-level branch predictor hierarchy to prevent 9-cycle pipeline stalls

Decoupling buffer holds up to 8 bundles of code (bundle?)

Conclusion/Future

• Compiler can do a lot of the work but needs hardware assistance

• Currently in pursuit of the best of both worlds

• Future:
  – How long will IA-32 last, and will IA-64 take over the IA-32 market?
  – Will IA-64 be the only ISA in the world?