![Page 1: Seeing the Forest and the Trees: Steiner Wirelength Optimization …imarkov/pubs/sli/MarkovUT... · 2006. 8. 1. · Quality of Steiner-tree evaluation for 9+ pins matters ... WS 15%](https://reader035.vdocuments.mx/reader035/viewer/2022071010/5fc83aa14b8af7661f54d0a0/html5/thumbnails/1.jpg)
Seeing the Forest and the Trees: Steiner Wirelength Optimization in Placement
Jarrod A. Roy, James F. Lu and Igor L. Markov
University of Michigan, Ann Arbor
Outline
- Motivation
  - Why current placement tools are outdated
  - Analysis of placement objectives
  - A naïve attempt at optimization
- Our placement framework
- New techniques
- Empirical results
- Conclusions
Motivation (1)
- Place-and-route
  - Pivotal step in any design flow
  - Closely related to physical synthesis
  - Is becoming harder every year
    - Greater scale, “boulders and dust”, fixed obstacles
    - Novel design techniques require P&R support
    - Heavily affected by variability
- P&R in tool flows
  - Single step for designers?
  - P&R implemented as separate point tools
    - Very little interaction/communication
    - Use different optimization objectives
Motivation (2)
- The HPWL (half-perimeter wirelength) objective: hopelessly outdated, does not account for
  - Routing demand of multi-pin nets
  - Detours around obstacles
  - Vias
  - Impact of buffers on delay (and where buffers can be inserted)
- Our goal: reduce the gap between placement and routing by replacing the HPWL objective with realistic routes
- Empirical results: consistent improvement over all published P&R results
  - Routability, routed wirelength, via counts
  - Compared to Silicon Ensemble (Cadence): 26% better routed WL, 3% fewer vias
HPWL vs. Steiner Tree WL vs. MST WL
- HPWL ≤ Steiner Tree WL ≤ MST WL
- MST WL: most accurate on average
- Steiner WL: best fidelity
- HPWL > rWL? Internal cell wiring not counted in rWL
[Figure labels: half-perimeter wirelength; Steiner (tree) wirelength; Minimum Spanning Tree (MST) wirelength]
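The ordering HPWL ≤ Steiner Tree WL ≤ MST WL can be checked on a small net. The sketch below is illustrative only: the helper names `hpwl` and `rmst_length` are hypothetical, and the MST uses a simple quadratic Prim's algorithm rather than the P log P method mentioned in the talk.

```python
def manhattan(a, b):
    """Rectilinear distance between two pins."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def hpwl(pins):
    """Half-perimeter of the bounding box of the pins (linear time)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def rmst_length(pins):
    """Length of a rectilinear minimum spanning tree (Prim, O(P^2))."""
    if len(pins) < 2:
        return 0
    # dist[i] = distance from pin i to the tree built so far
    dist = {i: manhattan(pins[0], pins[i]) for i in range(1, len(pins))}
    total = 0
    while dist:
        nearest = min(dist, key=dist.get)
        total += dist.pop(nearest)
        for i in dist:
            dist[i] = min(dist[i], manhattan(pins[nearest], pins[i]))
    return total

net = [(0, 0), (2, 0), (1, 2)]
print(hpwl(net), rmst_length(net))  # prints: 4 5
```

For this 3-pin net a Steiner point at (1, 0) yields a tree of length 4, which equals the HPWL and is shorter than the MST.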
Computing Steiner Trees
- Computing HPWL takes linear time, MST takes superlinear time (P log P), but computing minimum Steiner trees is NP-hard
- Steiner Tree tools we evaluate:
  - Batched Iterated 1-Steiner (BI1ST) [Kahng, Robins 1992]
    - Slow (n^3)
    - Very accurate, even for 20+ pins
  - FastSteiner [Kahng, Mandoiu, Zelikovsky 2003]
    - Faster but less accurate than BI1ST
  - FLUTE [Chu 2004, 2005]
    - Very fast
    - Optimal lookup tables for ≤ 9 pins
    - Less accurate for 10+ pins
Optimizing Steiner Tree Length
- Simple experiment
  - Take a floorplanner that uses Sim. Annealing (we used Parquet)
  - Consider the wirelength term in its objective function
  - Replace the HPWL computation with Min. Steiner-tree length (we used FLUTE)
- Empirical observations
  - Slow-down (even for 3-pin nets): expected
  - Did not improve StWL: very surprising result!
Outline
- Motivation
  - Why current placement tools are outdated
  - Analysis of placement objectives
  - A naïve attempt at optimization
- Our placement framework
- New techniques
- Empirical results
- Conclusions
Existing Placement Framework
- Consider placement bins
- Partition them
  - Use min-cut bisection
- Place end-cases optimally
- Traditional min-cut placement tracks HPWL
[Figure: placement bins numbered 1 to 4; end-case placement]
Existing Placement Framework
- Propagate terminals before partitioning
  - Terminals: fixed cells or cells outside the current bin
  - Each terminal is assigned to one of the partitions
  - Saves runtime: a 20-pin net may “propagate” into a 3-pin net
  - “Inessential nets”: fixed terminals in both partitions (can be entirely ignored)
- Traditional min-cut placement tracks HPWL
[Figure: placement bins numbered 1 to 3; pins of one net propagated]
Better Modeling of HPWL by Net Weights in Min-cut
- Introduced in Theto placer [Selvakkumaran 2004]
- Refined in [Chen 2005]
  - Shown to accurately track HPWL
  - Uses three net costs
    - w_left: HPWL when all cells on left side (a)
    - w_right: HPWL when all cells on the right (b)
    - w_cut: HPWL when cells on both sides (c)
  - In min-cut partitioning, represents each net with 1 or 2 hyper-edges
[Figure from [Chen, Chang, Lin 2005]]
Key Observation
- For bisection, cost of each net is characterized by 3 cases
  - Cost of net when cut: w_cut
  - Cost of net when entirely in left partition: w_left
  - Cost of net when entirely in right partition: w_right
- In our work, we compute these costs using realistic routes
  - Can/should account for both X and Y components of cost
  - Real difficulty in data structures!
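As a rough illustration of the three-cost model and of how a net can be represented with 1 or 2 hyper-edges, the sketch below computes the three costs with HPWL for a vertical cutline and then derives hyper-edge weights. Everything here is an assumption made for illustration: the function names are hypothetical, movable cells are approximated by partition centers, and the hyper-edge construction is one scheme that reproduces the three costs, not necessarily the exact construction of [Chen 2005].

```python
def hpwl(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def three_costs(terminals, left_center, right_center):
    """Net cost with movable cells all left, all right, or on both sides."""
    w_left = hpwl(terminals + [left_center])
    w_right = hpwl(terminals + [right_center])
    w_cut = hpwl(terminals + [left_center, right_center])
    return w_left, w_right, w_cut

def hyperedges(w_left, w_right, w_cut):
    """Return (base_cost, edges); each edge is (weight, anchor_side or None).

    An unanchored edge connects only the movable cells. An anchored edge
    also contains a fixed pseudo-terminal on the cheaper side, so it is cut
    whenever some movable cell lands on the more expensive side.
    """
    w_lo, w_hi = min(w_left, w_right), max(w_left, w_right)
    cheap_side = "left" if w_left <= w_right else "right"
    edges = []
    if w_cut > w_hi:
        edges.append((w_cut - w_hi, None))
    if w_hi > w_lo:
        edges.append((w_hi - w_lo, cheap_side))
    return w_lo, edges

def modeled_cost(base, edges, sides):
    """Cost seen by the partitioner when movable cells occupy `sides`."""
    cost = base
    for weight, anchor in edges:
        occupied = set(sides) | ({anchor} if anchor else set())
        if len(occupied) > 1:  # the hyper-edge is cut
            cost += weight
    return cost

w = three_costs([(3, 5)], left_center=(2, 5), right_center=(6, 5))
base, edges = hyperedges(*w)
print(w, base, edges)  # prints: (1, 3, 4) 1 [(1, None), (2, 'left')]
```

With these weights, `modeled_cost` returns 1, 3 and 4 for cells all left, all right, and split, matching w_left, w_right and w_cut.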
Our Contributions
- Optimization of Steiner WL
  - In global placement (runtime penalty ~25%)
  - In detail placement
- Whitespace allocation to tame congestion
- Empirical evaluation of ROOSTER
  - No violations on 16 IBMv2 benchmarks (easy + hard)
  - Consistent improvements of published results
    - 4-10% by routed wirelength
    - 10-15% by via counts
  - Vs Cadence: 26% better rWL, 3% fewer vias
Optimizing Steiner WL During Global Placement
- Recall: each net can be modeled by 3 numbers
  - This has only been applied to HPWL optimization
- We calculate w_top, w_bottom, w_cut using a Steiner-tree evaluator
  - For each net, before partitioning starts
  - The bottleneck is still in partitioning → can afford a fast Steiner-tree evaluator
Net Weights from Steiner Trees
- For horizontal cutlines: w_top, w_bottom, w_cut
- For vertical cutlines: w_left, w_right, w_cut
- Optimal tree may look very different for each cost
  - Recompute tree from scratch each time
[Figure: example trees for w_top, w_bottom, w_cut]
Net Weights from Steiner Trees
- Pitfall: cannot propagate terminals!
  - Nets that were inessential are now essential
  - Must consider all pins of each net
  - More accurate modeling, but potentially much slower
[Figure: example trees for w_top, w_bottom, w_cut]
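The pitfall can be seen on a toy net whose fixed terminals already span the bounding box: HPWL gives the same cost on either side of the cutline, so the net would be ignored, yet the Steiner length differs. The sketch below assumes a brute-force exact evaluator over Hanan-grid points, which is only practical for a handful of pins and is a stand-in for FastSteiner, FLUTE or BI1ST; all names and coordinates are hypothetical.

```python
from itertools import combinations

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def hpwl(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def mst_length(points):
    """Rectilinear MST length via Prim's algorithm."""
    if len(points) < 2:
        return 0
    dist = {i: manhattan(points[0], points[i]) for i in range(1, len(points))}
    total = 0
    while dist:
        j = min(dist, key=dist.get)
        total += dist.pop(j)
        for i in dist:
            dist[i] = min(dist[i], manhattan(points[j], points[i]))
    return total

def rsmt_length(pins):
    """Exact rectilinear Steiner length for tiny nets (exponential time).

    Tries every set of up to n-2 Hanan-grid points as Steiner points and
    keeps the shortest resulting MST.
    """
    pins = sorted(set(pins))
    xs = sorted({x for x, _ in pins})
    ys = sorted({y for _, y in pins})
    candidates = [(x, y) for x in xs for y in ys if (x, y) not in pins]
    best = mst_length(pins)
    for k in range(1, max(len(pins) - 2, 0) + 1):
        for extra in combinations(candidates, k):
            best = min(best, mst_length(pins + list(extra)))
    return best

# Three fixed terminals; movable cells approximated by partition centers.
terminals = [(0, 0), (0, 20), (20, 20)]
left, right = (5, 10), (15, 10)
cases = [terminals + [left], terminals + [right], terminals + [left, right]]
print([hpwl(c) for c in cases])         # prints: [40, 40, 40]
print([rsmt_length(c) for c in cases])  # prints: [45, 50, 50]
```

Under HPWL the three costs coincide, so the partitioner has no reason to prefer a side. Under Steiner length the left side is cheaper, so the net has become essential.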
New Data Structure for Global Placement
- For each net, two pointsets with multiplicities
  - Unique locations of fixed & movable pins
  - At top placement layers, very few unique pin positions (except for fixed I/O pins)
- Avoid repetitive/expensive re-computation
  - Maintain the number of pins at each location
    - Sorted by (x,y) to enable batched linear-time operations
    - Easy detection of duplicates; binary search
  - Fast maintenance when pins get reassigned to partitions (or move)
- Facilitates efficient computation of the 3 costs
- If net has large number (> 20) of unique locations, resort to HPWL
[Figure: pin counts at unique locations: 4, 2, 6, 1]
Pointsets in Action
- Consider a net with 4 movable pins
[Figure: pin counts per unique location as partitioning proceeds: 4; then 2, 2; then 2, 1, 1; then 1, 1, 1, 1]
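A pointset with multiplicities can be sketched as a sorted list of unique (x, y) locations with a parallel list of pin counts. This is a minimal illustration under assumed names (`PointSet`, `add`, `remove`, `move`), not the data structure used in ROOSTER; it uses binary search for lookups and replays the 4-pin example with made-up bin-center coordinates.

```python
import bisect

class PointSet:
    """Unique pin locations, kept sorted by (x, y), with pin counts."""

    def __init__(self):
        self._points = []  # sorted unique (x, y) tuples
        self._counts = []  # _counts[i] pins sit at _points[i]

    def add(self, point, count=1):
        i = bisect.bisect_left(self._points, point)
        if i < len(self._points) and self._points[i] == point:
            self._counts[i] += count  # duplicate location: bump the count
        else:
            self._points.insert(i, point)
            self._counts.insert(i, count)

    def remove(self, point, count=1):
        i = bisect.bisect_left(self._points, point)
        if (i == len(self._points) or self._points[i] != point
                or self._counts[i] < count):
            raise KeyError(point)
        self._counts[i] -= count
        if self._counts[i] == 0:
            del self._points[i]
            del self._counts[i]

    def move(self, src, dst, count=1):
        """Reassign `count` pins from one location to another."""
        self.remove(src, count)
        self.add(dst, count)

    def items(self):
        return list(zip(self._points, self._counts))

ps = PointSet()
ps.add((8, 8), 4)           # all 4 movable pins at the top-level bin center
ps.move((8, 8), (4, 8), 2)  # first bisection: 2 pins go left
ps.move((8, 8), (12, 8), 2) # and 2 pins go right
print(ps.items())           # prints: [((4, 8), 2), ((12, 8), 2)]
```

A Steiner evaluator would then be called on the unique locations only, with a fallback to HPWL when there are more than 20 of them.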
Improvement in Global Placement
- Results depend on the Steiner tree evaluator
  - Surprisingly, running 2 or 3 evaluators and picking min wirelength is worse than using a single evaluator
  - Quality of Steiner-tree evaluation for 9+ pins matters
  - But for 20+ unique locations use HPWL (also tried MST)
- We choose FastSteiner (versus BI1ST and FLUTE)
  - Details in Appendix B of our ISPD'06 paper
- Impact of changes to global placement
  - Results consistent across IBMv2 benchmarks
  - Steiner WL ↓2.9%, HPWL ↑1.3%, runtime ↑27%
Optimizing Steiner WL in Detail Placement
- We leverage the speed of FLUTE with two sliding-window optimizers
  - Exhaustive enumeration for 4-5 cells in a single row
  - Interleaving by dynamic programming (5-8 cells)
    - Explores an exponential solution space in polynomial time
    - Fast but not always optimal
- Steiner WL ↓0.69%, routed WL ↓1.39%
- [global + detail] runtime ↑11.83%
[Figure: exhaustive enumeration reorders cells 1 2 3 4 5 into 3 2 5 4 1; interleaving merges 1 2 3 4 and A B C D into 1 A 2 B 3 4 C D]
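The exhaustive-enumeration window can be sketched as follows: try every ordering of the cells in the window, pack them left to right, and keep the cheapest ordering. To stay self-contained the sketch works in one dimension and scores nets by their x-span, which is an assumed stand-in for the FLUTE-based Steiner cost; the function names and the tiny netlist are hypothetical.

```python
from itertools import permutations

def pack(order, widths, left_edge):
    """Place cells side by side from left_edge; return their center x."""
    x_of, x = {}, left_edge
    for cell in order:
        x_of[cell] = x + widths[cell] / 2
        x += widths[cell]
    return x_of

def net_cost(nets, x_of):
    """Sum of x-spans of all nets (stand-in for a Steiner evaluator)."""
    return sum(max(x_of[p] for p in net) - min(x_of[p] for p in net)
               for net in nets)

def best_window_order(window, widths, left_edge, nets, fixed_x):
    """Exhaustively search all orderings of the cells in one window."""
    best = None
    for order in permutations(window):
        x_of = dict(fixed_x)  # pins outside the window keep their x
        x_of.update(pack(order, widths, left_edge))
        cost = net_cost(nets, x_of)
        if best is None or cost < best[0]:
            best = (cost, order)
    return best

widths = {"a": 2, "b": 2, "c": 2}
fixed_x = {"L": -10, "R": 20}    # pins outside the window
nets = [("a", "R"), ("c", "L")]  # a is pulled right, c is pulled left
print(best_window_order(["a", "b", "c"], widths, 0, nets, fixed_x))
# prints: (26.0, ('c', 'b', 'a'))
```

Enumeration costs k! evaluations for a k-cell window, which is why it is limited to 4-5 cells and why a fast evaluator matters here.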
Congestion-based Cutline Shifting
- Non-uniform whitespace allocation
  - Performed during global placement
  - Uses progressive top-down congestion estimates
- Main idea: after each min-cut, shift the cutline to balance congestion
  - Area constraints must always be met
  - More whitespace to the more congested bin
- Compared to WSA [Li 2004], no need for legalization, reduces #vias
- Technical difficulty: maintain congestion estimates efficiently over a slicing floorplan (not a grid)
[Figure: cutline shifting. Before: two bins with 15% WS each, congestion 100 and 200. After: 10% WS and 20% WS, congestion 150 in each bin.]
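The arithmetic of a cutline shift can be sketched for a vertical cutline. The rule used here, splitting the parent bin's whitespace in proportion to the congestion estimates of the two child bins, is an assumption chosen to mimic the figure; the talk only states that the more congested bin receives more whitespace. The function name and parameters are hypothetical.

```python
def shift_cutline(x_min, x_max, height, cell_area, congestion):
    """Return the x position of a vertical cutline.

    cell_area:  (left, right) total cell area assigned by min-cut
    congestion: (left, right) congestion estimates for the child bins
    """
    whitespace = (x_max - x_min) * height - sum(cell_area)
    if whitespace < 0:
        raise ValueError("cells do not fit in the parent bin")
    total = sum(congestion)
    if total > 0:
        left_ws = whitespace * congestion[0] / total
    else:
        left_ws = whitespace / 2  # no congestion data: split evenly
    # Each child receives at least its cell area, so area constraints hold.
    left_area = cell_area[0] + left_ws
    return x_min + left_area / height

# Parent bin 200 wide, 1 tall; each side holds 85 units of cell area.
print(shift_cutline(0, 200, 1, (85, 85), (100, 100)))  # prints: 100.0
print(shift_cutline(0, 200, 1, (85, 85), (100, 200)))  # prints: 95.0
```

In the second call the right bin is twice as congested, so it receives 20 of the 30 units of whitespace and the cutline moves left from 100 to 95.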
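Empirical Results: IBMv2
ROOSTER: Rigorous Optimization Of Steiner Trees Eases Routing

Published results:

| Placer | Routed WL Ratio | Via Ratio | Routes with Violation |
|---|---|---|---|
| FengShui 2.6 | 1.093 | Not published | 7/16 |
| Dragon 3.01 | 1.107 | Not published | 1/16 |
| Capo 9.2 | 1.056 | Not published | 0/16 |
| APlace 1.0 | 1.042 | 1.119 | 1/8 |
| mPL-R+WSA | 1.055 | 1.156 | 0/16 |
| ROOSTER | 1.000 | 1.000 | 0/16 |

Most recent results:

| Placer | Routed WL Ratio | Via Ratio | Routes with Violation |
|---|---|---|---|
| FengShui 5.1 | 1.097 | 1.230 | 10/16 |
| APlace 2.04 | 0.968 | 1.073 | 2/16 |
| mPL-R+WSA | 1.007 | 1.069 | 0/16 |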
ROOSTER with several detail placers: IBMv2

| Placer | Routed WL Ratio | Via Ratio | Routes with Violation |
|---|---|---|---|
| ROOSTER+FengShui 5.1 DP | 1.114 | 1.248 | 16/16 |
| ROOSTER+Dragon 4.0 DP | 1.041 | 1.089 | 2/16 |
| ROOSTER+WSA | 0.990 | 1.004 | 0/16 |
| ROOSTER | 1.000 | 1.000 | 0/16 |
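AmoebaPlace vs. IWLS 2005 benchmarks
http://iwls.org/iwls2005/benchmarks.html
All IWLS placements routed with NanoRoute

| Benchmark | Rooster rWL | Rooster Vias | Rooster Viols | AmoebaPlace rWL | AmoebaPlace Vias | AmoebaPlace Viols |
|---|---|---|---|---|---|---|
| aes_core | 1.271 | 126645 | 1 | 1.657 | 131049 | 1 |
| ethernet | 6.145 | 413323 | 2 | 7.745 | 471800 | 1 |
| mem_ctrl | 0.890 | 89153 | 0 | 1.224 | 90067 | 0 |
| pci_bridge32 | 1.176 | 115675 | 0 | 1.598 | 117326 | 2 |
| usb_funct | 0.860 | 85329 | 0 | 1.106 | 85739 | 0 |
| vga_lcd | 24.447 | 1083504 | 1 | 25.405 | 1076178 | 2 |
| Ratio | 1.000 | 1.000 | | 1.265 | 1.032 | |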
Improvement Breakdown: IBMv2 easy
V = Violations
Improvement Breakdown: IBMv2 hard
Congestion with and without
Capo -uniformWS
5 hours to route; 120 violations
ROOSTER
22 mins to route; 0 violations
Conclusions
- Steiner WL should be optimized in global and detail placement
  - Improves routability and routed WL
  - 10-15% improvement in via counts (vs academic placers)
  - Better Steiner evaluators may further reduce routed WL
- Congestion-driven cutline shifting in global placement is competitive with WSA
  - Better via counts
  - May be improved if better congestion maps available
- Compared to Cadence P&R
  - 26% reduction in routed WL
  - 3% fewer vias
- ROOSTER freely available for all uses
  - http://vlsicad.eecs.umich.edu/BK/PDtools
Ongoing Work: ECO-system
- Challenge: repair/improve an existing placement
  - A strong detail placer and legalizer (useful with analytical global placers)
  - A strong ECO placer (useful in physical synthesis)
- Complications: fixed obstacles, movable macros
- Philosophy
  - Do no harm (leave most cells where they are)
  - When a section of layout must be redone, be prepared to re-place all gates in a region
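ECO-system
- Legalize top-down
- For each bin:
  - Quickly determine cut-line
  - Check cut-line with single FM pass
  - If cut improved significantly by FM or causes overfull child bin, replace
[Figure: bins numbered 1 to 6. Legend: overlap; original placement; untouched by legalizer; replaced from scratch]

| Benchmark | APlace 2.04 Global: Overlap | APlace 2.04 Legalizer: Runtime | APlace 2.04 Legalizer: HPWL | APlace 2.04 Legalizer: WL Increase | ECO-system: Runtime | ECO-system: HPWL | ECO-system: WL Increase |
|---|---|---|---|---|---|---|---|
| adaptec1 | 34.74% | 1346 | 83.87 | 3.48% | 1730 | 84.84 | 4.67% |
| adaptec2 | 47.25% | 2543 | 101.64 | 7.88% | 2042 | 99.47 | 5.58% |
| adaptec3 | 47.12% | 11495 | 231.17 | 9.49% | 4500 | 227.32 | 7.67% |
| adaptec4 | 36.78% | 15271 | 206.23 | 4.56% | 4132 | 203.24 | 3.04% |
| bigblue1 | 28.53% | 2486 | 101.96 | 1.44% | 1804 | 105.14 | 4.61% |
| bigblue2 | 30.15% | 14252 | 159.08 | 2.96% | 5183 | 156.63 | 1.37% |
| bigblue3 | 41.06% | 38873 | 414.29 | 7.50% | 13708 | 388.46 | 0.79% |
| bigblue4 | 32.01% | 56809 | 884.39 | 2.24% | 14910 | 881.04 | 1.85% |
| Average | | | | 4.91% | | | 3.67% |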
DAC'06: floorplan assistant (FLOORIST)
- AI-based floorplan legalizer
- Preliminary results:
  - Removes overlaps quickly, e.g., from APlace placements
  - Mostly preserves initial placement
  - Minimal increase in wirelength
[Figure: APlace placement. Red: overlaps. Blue: displacement]
DAC'06: floorplan assistant (FLOORIST)