Circuit Placement on Multicore CPUs
May 10-02
Mike Drob
Grant Furgiuele
Ben Winters
Advisor: Dr. Chris Chu
Client: IBM
IBM Contact: Karl Erickson
Project Overview
  Circuit placement is a bottleneck of physical design
  Current implementation is single-core only, with no threads
  Will attempt to parallelize some functions of the FastPlace algorithm using the Linux pthreads library
  Implement the RQL idea (IBM) in FastPlace
Project Plan
  Start with the existing serial FastPlace algorithm
  Parallelize the FastPlace algorithm to decrease run-time
  Hope to get as close to an N-times speedup (N = cores) as possible
    Realistically, expect 0.75N or 0.5N
  End goal is mostly proof of concept
    IBM uses an in-house algorithm
    Contains proprietary circuit processing
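As a quick worked version of the targets above: the symbols T_1 (serial run-time), T_N (run-time on N cores), S (speedup) and E (efficiency) are notation introduced here for illustration and do not appear on the slides.

```latex
S(N) = \frac{T_1}{T_N}, \qquad E(N) = \frac{S(N)}{N}
% Ideal case: S(N) = N, so E(N) = 1.
% Targets from the plan: S(N) = 0.75N or S(N) = 0.5N.
% For N = 8 cores: 0.75 \cdot 8 = 6\times \quad\text{or}\quad 0.5 \cdot 8 = 4\times.
```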
Project Design
  Written in C
  Run under Linux using the POSIX thread library
  Consider scalability: 2, 4, 8, etc. cores
  RQL implementation
    IBM concept
    Netlist optimization for placement
Implementation – Overall
  Using data parallelism as the scheme
    Assigning loop iterations to threads
    Localizing variable usage
    Where absolutely necessary, using thread synchronization (mutex, etc.)
  To maximize speed improvement with threads, minimize the total number of tasks for threads to accomplish
    Have individual threads do as much as possible
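The scheme on this slide can be sketched roughly as follows. This is a minimal illustration and not FastPlace code: the summation workload, `NUM_THREADS`, `chunk_t`, `sum_chunk` and `parallel_sum` are all hypothetical. Each thread gets one contiguous block of loop iterations, accumulates into a local variable, and takes the mutex only once to merge its result.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4   /* assumed thread count, for illustration only */

/* Hypothetical description of one thread's share of the loop. */
typedef struct {
    const double *data;     /* shared, read-only input */
    size_t begin, end;      /* half-open iteration range [begin, end) */
    double *total;          /* shared result, protected by lock */
    pthread_mutex_t *lock;
} chunk_t;

static void *sum_chunk(void *arg)
{
    chunk_t *c = arg;
    double local = 0.0;     /* localized variable: nothing shared inside the loop */

    for (size_t i = c->begin; i < c->end; i++)
        local += c->data[i];

    pthread_mutex_lock(c->lock);    /* synchronize once per thread, not per iteration */
    *c->total += local;
    pthread_mutex_unlock(c->lock);
    return NULL;
}

/* Sum n values, giving each thread one large block of iterations. */
double parallel_sum(const double *data, size_t n)
{
    pthread_t tid[NUM_THREADS];
    chunk_t chunk[NUM_THREADS];
    pthread_mutex_t lock;
    double total = 0.0;

    pthread_mutex_init(&lock, NULL);
    for (int t = 0; t < NUM_THREADS; t++) {
        chunk[t].data = data;
        chunk[t].begin = n * (size_t)t / NUM_THREADS;
        chunk[t].end = n * (size_t)(t + 1) / NUM_THREADS;
        chunk[t].total = &total;
        chunk[t].lock = &lock;
        int rc = pthread_create(&tid[t], NULL, sum_chunk, &chunk[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);

    pthread_mutex_destroy(&lock);
    return total;
}
```

Calling `parallel_sum(values, count)` from a program linked with `-pthread` returns the same total as a serial loop when the values are exactly representable; with general floating-point data the merge order may change the last few bits.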
Implementation – Thread Pool
  Threads are created once at start
  Various benefits:
    Minimizes overhead from thread creation
    Increases cache performance
    Allows core scalability: number of threads running can equal cores available
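One way such a pool could look is sketched below. It is an assumption-laden illustration, not the project's pool: `POOL_SIZE`, `pool_t`, `pool_init`, `pool_run`, `pool_destroy` and the example job `count_visit` are hypothetical names. Workers are created once, sleep on a condition variable, and every call to `pool_run` has all of them execute the same job with their own worker id.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define POOL_SIZE 4   /* assumed; would normally match the number of cores */

typedef void (*job_fn)(int worker_id, void *arg);
typedef struct pool pool_t;
typedef struct { pool_t *pool; int id; } worker_t;

struct pool {
    pthread_t threads[POOL_SIZE];
    worker_t workers[POOL_SIZE];
    pthread_mutex_t lock;
    pthread_cond_t start;   /* signalled when a new job is posted */
    pthread_cond_t done;    /* signalled when the last worker finishes */
    job_fn job;
    void *arg;
    unsigned generation;    /* incremented once per posted job */
    int pending;            /* workers still running the current job */
    bool shutdown;
};

static void *worker_main(void *p)
{
    worker_t *w = p;
    pool_t *pool = w->pool;
    unsigned seen = 0;      /* last generation this worker has executed */

    pthread_mutex_lock(&pool->lock);
    for (;;) {
        while (!pool->shutdown && pool->generation == seen)
            pthread_cond_wait(&pool->start, &pool->lock);
        if (pool->shutdown)
            break;
        seen = pool->generation;
        job_fn job = pool->job;
        void *arg = pool->arg;
        pthread_mutex_unlock(&pool->lock);

        job(w->id, arg);    /* run the job outside the lock */

        pthread_mutex_lock(&pool->lock);
        if (--pool->pending == 0)
            pthread_cond_signal(&pool->done);
    }
    pthread_mutex_unlock(&pool->lock);
    return NULL;
}

/* Create the worker threads once; they are reused by every pool_run call. */
void pool_init(pool_t *pool)
{
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->start, NULL);
    pthread_cond_init(&pool->done, NULL);
    pool->job = NULL;
    pool->arg = NULL;
    pool->generation = 0;
    pool->pending = 0;
    pool->shutdown = false;
    for (int i = 0; i < POOL_SIZE; i++) {
        pool->workers[i].pool = pool;
        pool->workers[i].id = i;
        int rc = pthread_create(&pool->threads[i], NULL, worker_main, &pool->workers[i]);
        assert(rc == 0);
        (void)rc;
    }
}

/* Have every worker run job(worker_id, arg); returns when all have finished. */
void pool_run(pool_t *pool, job_fn job, void *arg)
{
    pthread_mutex_lock(&pool->lock);
    pool->job = job;
    pool->arg = arg;
    pool->pending = POOL_SIZE;
    pool->generation++;
    pthread_cond_broadcast(&pool->start);
    while (pool->pending > 0)
        pthread_cond_wait(&pool->done, &pool->lock);
    pthread_mutex_unlock(&pool->lock);
}

/* Wake the workers for shutdown and release all resources. */
void pool_destroy(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutdown = true;
    pthread_cond_broadcast(&pool->start);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->start);
    pthread_cond_destroy(&pool->done);
}

/* Example job: each worker increments its own slot, so no locking is needed. */
void count_visit(int worker_id, void *arg)
{
    ((int *)arg)[worker_id] += 1;
}
```

A caller would run `pool_init` once at start-up, call `pool_run` for each parallel phase, and call `pool_destroy` at exit, so thread creation cost is paid only once.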
Implementation - RQL
  Force-vector modulation
  Forces acting upon cells
    Forces are modeled as a spring potential energy problem
    Native force in the algorithm tries to reduce wire length by bringing connected cells closer to each other
    Spreading force tries to move cells into sparse areas within the placement region
    Need a balance of the two to meet placement and wire length objectives
  Modulate the spreading forces
    A high spreading force means the connection belongs to a fan-out net or boundary
    Therefore, cells with connections in the top 5 percent of spreading forces are skipped in quadratic placement
    Skipping these leaves the cell’s other connections minimized instead of degrading them
    Results in placing cells in their overall optimal location
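For readers unfamiliar with the spring analogy, the usual quadratic-placement form is written out below. The slides do not give a formula, so the notation (connection weight w_ij, cell coordinates x_i, y_i) is an assumption introduced here.

```latex
% Spring energy summed over connected cell pairs (i, j):
E = \frac{1}{2} \sum_{(i,j)} w_{ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right]

% The force on cell i in the x direction is the negative gradient,
% pulling it toward the cells it is connected to:
F_{x_i} = -\frac{\partial E}{\partial x_i} = -\sum_{j} w_{ij} \, (x_i - x_j)
```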
Implementation - RQL
  During quadratic placement (global placement process)
    Calculate magnitude of spreading forces for all cells in each iteration
    Calculate force on current cell
    If current cell’s force is above the top-5% threshold, skip its placement
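The steps above could be sketched as follows. The slides do not say how the threshold is computed, so sorting a copy of the magnitudes is an assumption, and `squared_magnitude`, `skip_threshold` and `should_skip` are hypothetical names. Squared magnitudes are compared because they rank cells in the same order as the magnitudes themselves.

```c
#include <float.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Squared magnitude of a 2-D force vector (same ranking as the magnitude). */
double squared_magnitude(double fx, double fy)
{
    return fx * fx + fy * fy;
}

/*
 * Return the largest value that is NOT in the top `top_percent` percent.
 * Cells whose value is strictly greater are the ones to skip.
 * On empty input or allocation failure, returns DBL_MAX so no cell is skipped.
 */
double skip_threshold(const double *mag2, size_t n, unsigned top_percent)
{
    if (n == 0)
        return DBL_MAX;
    double *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return DBL_MAX;
    memcpy(sorted, mag2, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    size_t k = n * top_percent / 100;   /* number of cells in the top share */
    if (k >= n)
        k = n - 1;
    double threshold = sorted[n - 1 - k];
    free(sorted);
    return threshold;
}

/* True when this cell's spreading force falls in the skipped top share. */
bool should_skip(double mag2, double threshold)
{
    return mag2 > threshold;
}
```

In each iteration, a caller would compute `skip_threshold(mag2, n, 5)` once and then test every cell with `should_skip` before placing it.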
Implementation - Functions
  move_8pt family
    move_8pt, move_8pt_withMap, move_8pt_mixedMode, move_8pt_mixedMode_withMap, move_8pt_clustering, move_8pt_clustering_withMap
    Calculates score based on cell coordinates and bin utilization
    Doesn’t lend itself well to parallelization
    The fix?
      If a new cell is within the 3x3 box of the cell currently being calculated, the new cell is skipped
      Helps remove significant wirelength degradation
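The 3x3 test might look like the sketch below. The slides do not define the box, so measuring it in bins centered on the current cell's bin is an assumption, and `bin_pos_t` and `in_3x3_box` are hypothetical names.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bin coordinates of a cell in the placement grid. */
typedef struct { int bin_x, bin_y; } bin_pos_t;

/* True when `other` lies in the 3x3 block of bins centered on `current`. */
bool in_3x3_box(bin_pos_t current, bin_pos_t other)
{
    return abs(other.bin_x - current.bin_x) <= 1 &&
           abs(other.bin_y - current.bin_y) <= 1;
}
```

A thread about to process a new cell would skip it whenever `in_3x3_box` reports that it sits next to a cell already being calculated.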
Implementation - Functions
  swap_move family
    swap_move_FM, vswap_move, local_order3_FM, flipAllCells
    Row-based data processing
    Break up matrix into segments based on number of threads
    Assign each thread to do X rows
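A small sketch of the row split is given below. The slides only say each thread does X rows, so handing the leftover rows to the first threads is an assumption, and `row_range_t` and `rows_for_thread` are hypothetical names.

```c
#include <stddef.h>

/* Half-open range of rows [first, last) owned by one thread. */
typedef struct { size_t first, last; } row_range_t;

/*
 * Split num_rows rows across num_threads threads as evenly as possible.
 * When the division is not exact, the first (num_rows % num_threads)
 * threads each take one extra row. num_threads must be at least 1.
 */
row_range_t rows_for_thread(size_t num_rows, size_t num_threads, size_t t)
{
    size_t base = num_rows / num_threads;
    size_t extra = num_rows % num_threads;
    row_range_t r;

    r.first = t * base + (t < extra ? t : extra);
    r.last = r.first + base + (t < extra ? 1 : 0);
    return r;
}
```

With 10 rows and 4 threads this yields segments of 3, 3, 2 and 2 rows, so no thread has more than one row beyond any other.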
Testing
  Profiled original FastPlace algorithm
    gprof gives CPU time per function
  Profiling parallel FastPlace
    Valgrind
  FastPlace code outputs actual time elapsed
    Can be used to compare performance
    Not 100% consistent
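The difference between CPU time and elapsed time matters when comparing threaded runs, and can be sketched as below. This is not FastPlace's timing code; `wall_seconds` and `cpu_seconds` are hypothetical helpers built on the standard `clock_gettime` and `clock` calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Elapsed (wall-clock) seconds from a monotonic clock. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU seconds used by the whole process, added up across its threads. */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}
```

Taking `wall_seconds()` before and after a placement phase and subtracting gives the elapsed time that a speedup comparison needs. Process CPU time adds up the work of all threads, so it does not drop when the same work is spread over more cores.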
Testing & Results
  Test results for correctness
    Compare “wire length” results
    Average total wirelength no worse than 1% greater
    Thread pool is tested and working
  Test results for speedup
    Compared actual run-time
    See slides on next page
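The 1% correctness criterion could be checked along these lines. Reading it as the mean relative increase across benchmarks is an interpretation, and `mean_relative_increase` and `wirelength_acceptable` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Mean relative change of parallel wirelength versus serial wirelength
 * across n benchmarks. Positive means the parallel result is longer.
 * Serial values are assumed to be non-zero.
 */
double mean_relative_increase(const double *serial, const double *parallel, size_t n)
{
    double sum = 0.0;

    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (parallel[i] - serial[i]) / serial[i];
    return sum / (double)n;
}

/* True when the average wirelength is no more than 1% worse. */
bool wirelength_acceptable(const double *serial, const double *parallel, size_t n)
{
    return mean_relative_increase(serial, parallel, n) <= 0.01;
}
```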
Test Results – RQL Implementation
  Wire length results
    Wire length decreased by 0.12% to 2.08% on ISPD98 benchmarks, with an average of 0.98%
    Wire length decreased by 0.11% to 3.18% on ISPD2005 benchmarks, with an average of 1.39%
  Run-time results
    Some run-time slowdown
    Average increase of 3.36% on ISPD98
    Average increase of 4.02% on ISPD2005
Test Results – Global Placement
  [Bar chart: benchmarks adaptec2 and adaptec4; series 1 Core, 2 Cores, 8 Cores; vertical axis 0 to 600, units not shown]
Test Results – Detailed Placement
  [Bar chart: benchmarks adaptec2 and adaptec4; series 1 Core, 2 Cores, 8 Cores; vertical axis 0 to 700, units not shown]
Project Impact
  Shows that threads can be used to speed up the placement process
  With the availability of multi-core CPUs and the scalability of the thread implementation, speed improvement could continue
  Reduces bottleneck in development process
Questions?