Profile-Guided Microarchitectural Floorplanning for Deep Submicron Processor Design

Mongkol Ekpanyapong, Jacob R. Minz, Thaisiri Watewai*, Hsien-Hsin S. Lee, and Sung Kyu Lim

Georgia Institute of Technology, * University of California at Berkeley

Upload: hugh-clarke

Post on 26-Dec-2015

Page 1: Profile-Guided Microarchitectural Floorplanning for Deep Submicron Processor Design

Mongkol Ekpanyapong, Jacob R. Minz, Thaisiri Watewai*, Hsien-Hsin S. Lee, and Sung Kyu Lim

Georgia Institute of Technology, * University of California at Berkeley

Page 2: Current Processor Design Paradigm

Computer Architecture Design
  Exploit the available silicon area.
  Exploit higher clock speeds to enhance performance.
  Assume a unit delay model.
  Architects do their own job well, assuming that smart CAD tools will do the rest of the work.

VLSI & Physical Design CAD
  Minimize both gate and wire delay.
  Minimize total die area.
  Accomplish the above while knowing as little as possible about the design.
  CAD designers just design good tools, assuming that computer architects did their job well.

Page 3: Next Generation Processor Design

Computer Architecture Design
  Larger capacity no longer means better performance.
  Higher clock speed does not imply the same rate of performance improvement.
  The unit delay model is no longer practical.
  A good processor needs some interaction with CAD tools.

VLSI & Physical Design CAD
  Performance-driven physical planning is not enough.
  Employing some knowledge of the design can result in better performance.
  Iteration between computer architecture design and CAD tools is necessary.
  Smart CAD tools need some help from computer architects.

Page 4: Terminology

Profiling
  Techniques used by compilers or computer architectures to collect statistical information that can result in better optimization.

Instructions Per Cycle (IPC)
  Number of instructions that can be issued per cycle.

Billions of Instructions Per Second (BIPS)
  Number of instructions, in billions, that can be issued per second.
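
The two metrics are linked by the clock frequency: BIPS is IPC multiplied by the clock frequency in GHz. A minimal sketch of that arithmetic follows; the IPC and frequency values are made up for illustration and do not come from the slides.

```python
def bips(ipc: float, freq_ghz: float) -> float:
    """Billions of instructions per second for a given IPC and clock (GHz)."""
    return ipc * freq_ghz


# Hypothetical design points: the second has lower IPC but a faster clock.
slow_clock = bips(ipc=2.0, freq_ghz=5.0)
fast_clock = bips(ipc=1.5, freq_ghz=10.0)
print(slow_clock, fast_clock)
```

A design with lower IPC can still deliver higher BIPS if its clock is fast enough, which is why the two metrics can rank designs differently.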

Page 5: Outline

  Introduction
  Related Work
  Wire Delay Issues
  Profile-Guided Floorplanning
  Simulation Infrastructure
  Experimental Results
  Conclusions

Page 6: Related Work

Ho et al. [SRC 1999, IEEE 2001]
  Discussed the impact of wire delay in deep submicron technology.

Agarwal et al. [ISCA 2000]
  Raised the issue of how wirelength affects the design of conventional microarchitectures in deep submicron processors.

Cong et al. [DAC 2003]
  Proposed that BIPS be used instead of IPC, the metric widely used in current processor design.

Page 8: When Wire Delay Becomes the Problem

Ho et al. classify wires into three classes:
  Local wire.
  Global wire.
  Repeated wire.

For 30 nm technology, the repeated wire delay is approximately 80 ps/mm. An FO4 gate delay is approximately 17 ps.

To achieve the target high frequency, flip-flop insertion is required.

For example, the Pentium 4 processor design has 2 dedicated pipeline stages for moving signals across [...]
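
To see why these numbers force flip-flop insertion, the sketch below computes how far a signal can travel on a repeated wire within one clock cycle, using the 80 ps/mm and 17 ps figures from this slide. It assumes the whole cycle is available to the wire (no flip-flop setup or clock-skew overhead), which is a simplification, and the function names are illustrative.

```python
REPEATED_WIRE_DELAY_PS_PER_MM = 80.0  # from the slide, 30 nm technology
FO4_DELAY_PS = 17.0                   # from the slide


def cycle_time_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds."""
    return 1000.0 / freq_ghz


def reach_mm(freq_ghz: float) -> float:
    """Distance a repeated wire covers in one cycle (assumes no overhead)."""
    return cycle_time_ps(freq_ghz) / REPEATED_WIRE_DELAY_PS_PER_MM


def fo4_per_cycle(freq_ghz: float) -> float:
    """Clock period expressed in FO4 gate delays."""
    return cycle_time_ps(freq_ghz) / FO4_DELAY_PS


for f in (5.0, 10.0, 20.0):
    print(f"{f:>4} GHz: {reach_mm(f):.3f} mm per cycle, "
          f"{fo4_per_cycle(f):.1f} FO4 per cycle")
```

Under these assumptions, any connection longer than the per-cycle reach needs at least one extra pipeline stage.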

Page 9: Reducing Wire Delay Impact

Buffer Insertion
  Ho et al. provide the repeated wire delay equation as follows:
  [Equation not preserved in the transcript]

Flip-flop Insertion
  [Figure: flip-flops (FF) inserted on the wires between Module 1 and Module 2]

Page 11: Microarchitectural Planning Framework

CACTI: area and delay estimator for buffer-like structures.

GENESYS: area and delay estimator for other structures.

PROFILING: uses a cycle-accurate simulator to acquire statistical information.

FLOORPLANNER

CYCLE-ACCURATE SIMULATOR: evaluates the result.

[Figure: framework flow diagram with boxes CACTI, GENESYS, PROFILING, FLOORPLANNER, and CYCLE ACCURATE SIMULATOR; labels: Technology Parameter, Machine Description, Benchmark, Module Info., Interconnect Statistic Info., Frequency Target Range, Architecture Redesign]

Page 12: Microarchitecture Planning

[Figure: floorplan annotated with inter-module latencies of 1, 2, and 3 cycles; labels: To Simulator, Microarchitecture Redesign]

Page 13: Mixed Integer Non-Linear Programming

Inputs:
  $f_{ij}$ = number of flip-flops between modules $i$ and $j$ before considering wire delay impact.
  $L$ = target cycle time (1/clock frequency).
  $g_i$ = gate delay for module $i$.
  $w_{\max,i}$, $w_{\min,i}$ = maximum and minimum half width of module $i$.
  (symbol lost in extraction, indexed by $ij$) = interconnect traffic info between modules $i$ and $j$.
  (symbol lost in extraction) = repeated wire delay per mm.

Parameters:
  $x_i$, $y_i$ = location info for module $i$.
  $w_i$ = half width of module $i$.

Output:
  $z_{ij}$ = number of flip-flops between modules $i$ and $j$.

Note that $M$ is a large number.

Page 14: (MINP) Non-overlap Constraint

$a_i = 2h_i \times 2w_i$

$x_i + w_i \le x_j - w_j$, $i$ is on the left of $j$

$x_i - w_i \ge x_j + w_j$, $i$ is on the right of $j$

$y_i + \frac{a_i}{4w_i} \le y_j - \frac{a_j}{4w_j}$, $i$ is below $j$

$y_i - \frac{a_i}{4w_i} \ge y_j + \frac{a_j}{4w_j}$, $i$ is above $j$

The relation between modules $i$ and $j$ can be left, right, above, or below, based on the values of the binary variables $c_{ij}$ and $d_{ij}$.

[Figure: modules $i$ and $j$ with centers $x_i$, $x_j$ and half widths $w_i$, $w_j$]

Page 15: (MINP) Non-linear Relationship

The relation between modules $i$ and $j$ can be left, right, above, or below, based on the values of the binary variables $c_{ij}$ and $d_{ij}$.

$a_i = 2h_i \times 2w_i$

$x_i + w_i \le x_j - w_j$, $i$ is on the left of $j$

$x_i - w_i \ge x_j + w_j$, $i$ is on the right of $j$

$4 y_i w_i w_j + a_i w_j \le 4 y_j w_i w_j - a_j w_i$, $i$ is below $j$

$4 y_i w_i w_j - a_i w_j \ge 4 y_j w_i w_j + a_j w_i$, $i$ is above $j$

The vertical constraints are non-linear because they contain products of the variables $y_i$, $w_i$, and $w_j$.
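
The four relations can be checked numerically. The sketch below is illustrative only: the `Module` class and `relations` function are hypothetical names, and the vertical tests are derived from $h_i = a_i / (4 w_i)$ by multiplying $y_i + h_i \le y_j - h_j$ (below) and $y_i - h_i \ge y_j + h_j$ (above) through by $4 w_i w_j$.

```python
from dataclasses import dataclass


@dataclass
class Module:
    x: float  # center x coordinate
    y: float  # center y coordinate
    w: float  # half width
    a: float  # area, a = 2h * 2w


def relations(i: Module, j: Module) -> list:
    """Return which of left/right/below/above hold for module i relative to j."""
    found = []
    if i.x + i.w <= j.x - j.w:
        found.append("left")
    if i.x - i.w >= j.x + j.w:
        found.append("right")
    # Vertical tests, multiplied through by 4 * w_i * w_j to avoid division.
    if 4 * i.y * i.w * j.w + i.a * j.w <= 4 * j.y * i.w * j.w - j.a * i.w:
        found.append("below")
    if 4 * i.y * i.w * j.w - i.a * j.w >= 4 * j.y * i.w * j.w + j.a * i.w:
        found.append("above")
    return found


m1 = Module(x=1.0, y=1.0, w=1.0, a=4.0)  # 2 x 2 square centered at (1, 1)
m2 = Module(x=4.0, y=1.0, w=1.0, a=4.0)  # same size, shifted right
m3 = Module(x=1.0, y=4.0, w=1.0, a=4.0)  # same size, shifted up
m4 = Module(x=1.5, y=1.5, w=1.0, a=4.0)  # overlaps m1

print(relations(m1, m2), relations(m1, m3), relations(m1, m4))
```

Two modules are non-overlapping exactly when at least one relation holds; an empty list means they overlap.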

Page 16: (MINP) Flipflop Constraint

The number of flip-flops between modules $i$ and $j$ must be at least the sum of the gate delay and the wire delay between these two modules, divided by the target cycle time.

[Figure: example with delays of 3 ns and 2 ns; Cycle Time (L) = 4 ns]
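
A minimal sketch of this constraint as arithmetic follows. It reads the figure's 3 ns as a gate delay and its 2 ns as a wire delay, which is an assumption since the figure itself is not preserved, and `min_flipflops` is a hypothetical helper name. Rounding up reflects that the flip-flop count is an integer.

```python
import math


def min_flipflops(gate_delay_ns: float, wire_delay_ns: float,
                  cycle_time_ns: float) -> int:
    """Smallest integer count satisfying z >= (gate + wire delay) / cycle time."""
    return math.ceil((gate_delay_ns + wire_delay_ns) / cycle_time_ns)


# Assumed reading of the slide's example: 3 ns gate, 2 ns wire, L = 4 ns.
print(min_flipflops(3.0, 2.0, 4.0))
```

With these values the combined delay of 5 ns exceeds one 4 ns cycle, so a single stage is not enough.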

Page 17: (MINP) Objective

Minimize the weighted wirelength, where the weights are the interconnect traffic information obtained from profiling.

Note that for the same target technology and clock frequency, $g_i$, the repeated wire delay per mm, and $L$ are constants.
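
As a rough illustration of a traffic-weighted wirelength objective, the sketch below sums Manhattan distances between module centers, each weighted by a profiled traffic value. The Manhattan metric, the module names, and all numbers are assumptions for illustration; the slide's exact objective formula is not preserved in the transcript.

```python
def weighted_wirelength(centers: dict, traffic: dict) -> float:
    """Sum of traffic-weighted Manhattan distances between module centers."""
    total = 0.0
    for (i, j), weight in traffic.items():
        xi, yi = centers[i]
        xj, yj = centers[j]
        total += weight * (abs(xi - xj) + abs(yi - yj))
    return total


# Hypothetical placement (mm) and normalized traffic from profiling.
centers = {"fetch": (0.0, 0.0), "icache": (1.0, 0.0), "l2": (5.0, 3.0)}
traffic = {("fetch", "icache"): 0.9, ("icache", "l2"): 0.1}

print(weighted_wirelength(centers, traffic))
```

Heavily used connections dominate the sum, so the optimizer is pushed to keep frequently communicating modules close even if rarely used wires grow longer.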

Page 18: Non-Linear Relaxation

[Figure: $h_i$ plotted against $w_i$ over the range $w_{\min,i}$ to $w_{\max,i}$]

$h_i = \frac{a_i}{4 w_i}$

$h_i = m_i w_i + k_i$

$m_i = -\frac{a_i}{4\, w_{\min,i}\, w_{\max,i}}$

$k_i = \frac{a_i}{4 w_{\min,i}} + \frac{a_i}{4 w_{\max,i}}$
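
The relaxation replaces the curve $h_i = a_i / (4 w_i)$ with a straight line. The sketch below assumes that line is the secant through the two endpoints of the allowed width range, which is one standard way to obtain $m_i$ and $k_i$; the function names and numbers are illustrative.

```python
def exact_half_height(a: float, w: float) -> float:
    """Half height implied by area a and half width w: h = a / (4w)."""
    return a / (4.0 * w)


def secant_coeffs(a: float, w_min: float, w_max: float) -> tuple:
    """Slope m and intercept k of the line through both endpoints of h(w)."""
    m = -a / (4.0 * w_min * w_max)
    k = a / (4.0 * w_min) + a / (4.0 * w_max)
    return m, k


a, w_min, w_max = 16.0, 1.0, 4.0
m, k = secant_coeffs(a, w_min, w_max)
for w in (w_min, 2.0, w_max):
    print(w, exact_half_height(a, w), m * w + k)
```

The line matches the exact curve at both endpoints and overestimates the height in between, because $a / (4w)$ is convex in $w$.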

Page 19: Mixed Integer Linear Programming

Page 20: Integer Relaxation

Solving mixed integer programming is NP-hard.

Use bipartitioning for relaxation.

Page 21: Linear Programming

$r_j$, $l_j$, $t_j$, $b_j$ are the right, left, top, and bottom of the hard virtual box constraints imposed on our floorplanner.

Soft virtual box constraints allow modules to relocate (crossing between blocks) while maintaining center-of-gravity constraints.

Page 22: Floorplanning Algorithm

[Figure: floorplanning algorithm; label: Last iteration]

Page 24: Simulation Infrastructure

[Figure: processor block diagram with modules fetch, fetch q, i1cache, mmu, bpred, btb, dispatch, issue, commit, reg file, fp reg file, ruu, fruu, loadq, storeq, wb, ialu (multiple instances), fpu, fpissue, dl1cache, d2cache, i2cache, L3cache, biu, memctrl]

Page 25: Simulator Modifications

Added a new feature: configurable pipeline depth.
  Because of wire delay, the pipeline depth can be affected by module locations.

Non-uniform forwarding latency.
  Uniform latency is no longer practical.
  Location information is necessary to determine forwarding latency.

Page 26: Microarchitecture Configurations

Structure    Config 1   Config 2   Config 3   Config 4   Bits
Bpred        128        512        512        512        2
BTB          128        512        512        512        96
RUU          64         128        512        512        168
Int RF       32         32         32         32         64
FP RF        32         32         32         32         64
L1 Icache    8K         64K        8K         8K         512
L1 Dcache    8K         64K        8K         8K         512
L2 Ucache    64K        512K       128K       128K       1024
L3 Ucache    -          -          2M         2M         1024
ITLB         32         128        128        128        112
DTLB         32         128        128        128        112
ALU          2          4          4          8          -
FPU          1          2          2          4
LSQ          16         64         128        128        84
Mem port     1          4          4          4

Page 28: IPC Improvement

[Figure: bar chart of normalized IPC (axis 0 to 4.5) for gzip, vpr, mcf, gap, bzip2, twolf, swim, art, equake, lucas, and Avg.; series: WL_CONFIG1, PGF_CONFIG1, WL_CONFIG2, PGF_CONFIG2, WL_CONFIG3, PGF_CONFIG3, WL_CONFIG4, PGF_CONFIG4]

Page 29: Impact on Wirelength

[Figure: bar chart of WL ratio (axis 0 to 3) for gzip, vpr, mcf, gap, bzip2, twolf, swim, art, equake, lucas, and Avg.; series: config1, config2, config3, config4]

Page 30: BIPS Impact on Frequency Scaling

[Figure: chart with vertical axis from 0.8 to 3 at clock frequencies of 5 GHz, 5.5 GHz, 7.1 GHz, 10 GHz, 14.3 GHz, and 20 GHz; series: Wirelength, Profile-Guided]

Page 31: Conclusions

Profile-guided floorplanning is formulated using linear programming.

Technology scaling parameters and information about dynamic interconnection traffic between microarchitectural modules are employed to guide the floorplanner to minimize weighted wirelength.

Our algorithm shows up to 40% improvement in results over wirelength-objective floorplanning.

Our floorplanner is more scalable than a conventional approach.

Profile-guided floorplanning can outperform timing-driven floorplanning at high frequencies.
