TRANSCRIPT
Presented by: SHASHI MYSORE [email protected]
Profiling over Adaptive Ranges
Shashidhar Mysore, Banit Agrawal, Timothy Sherwood,
Nisheeth Shrivastava, Subhash Suri
Department of Computer Science
University of California, Santa Barbara
4th Annual ACM/IEEE International Symposium on Code
Generation and Optimization (CGO), 27th March 2006,
Manhattan, NY
Motivation
Program Profiling: Understand system-workload interactions - gather data, quantify, analyze, and optimize
• At the core: We need to count events
• Basic blocks, load value distribution, load instructions, load addresses, zero-value loads, narrow-width operands, etc.
• Challenge:
– Huge, complex programs
– Limited storage - tiny streaming profilers
– Runtime analysis - feasible hardware solutions
Let us consider an example of code profiling …
An example: Code Profiling
[Figure: program code drawn as a column of basic blocks, BB1 … BBn, filled with repeated copies of an x86 fragment such as the one below]
push %ebp
mov %esp,%ebp
sub $0x38,%esp
and $0xfffffff0,%esp
mov $0x0,%eax
sub %eax,%esp
sub $0x8,%esp
push $0x28
push $0x8048468
call 80482b0
add $0x10,%esp
Each basic block executes some number of times
Use counters …
Where are the hot regions?
Some are hot, some are not
How hot are they?
… And can we discover this knowledge at run time?
Naïve Approach: Unlimited Counters
[Figure: code column, BB1 … BBn; legend: counter, coverage of this counter]
N basic blocks – N counters
Each counter covers one basic block
We get:
• High Coverage
• High Precision
Problem:
• Many programs have 800,000 basic blocks or more!
• But not all of them are important enough to be quantified
So let’s limit the number of counters …
Naïve Approach: Limited Counters
[Figure: code column, BB1 … BBn, with counters C1 … CK]
N basic blocks – K counters
Pick K basic blocks and let the K counters cover them
We get:
• High Precision - for the hot spots
• Low Coverage - at the right spots
But what if we did …
Naïve Approach: Limited Counters
[Figure: code column, BB1 … BBn, with counters C1 … CK]
N basic blocks – K counters
Pick K basic blocks and let the K counters cover them
We get:
• Low Coverage – and at unimportant regions
• High Precision – but it is not as useful
Problem:
• We have zero information about hot regions
• How do we know which region of the code to cover with the K counters?
Distribute the basic blocks among the counters …
Naïve Approach: Uniform Ranges
[Figure: code column, BB1 … BBn, with counters C1 … CK]
K counters to cover the entire program
Each counter counts a range of basic blocks
We get:
• High Coverage with K counters
• Low Precision
Problem:
• One counter associated with a huge set of basic blocks
• Only average behavior – low precision
• Precision important – especially for hot regions
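The trade-off between one counter per block and uniform ranges can be seen in a few lines of Python. This is only an illustrative sketch: the block IDs, the synthetic stream, and the helper name `uniform_range_counts` are assumptions made for the example and do not come from the talk.

```python
from collections import Counter

def uniform_range_counts(stream, num_blocks, k):
    """Count events with K counters, each covering an equal slice of block IDs."""
    width = -(-num_blocks // k)  # ceiling division: blocks per counter
    counts = [0] * k
    for block_id in stream:
        counts[block_id // width] += 1
    return counts

# Hypothetical stream over 16 basic blocks: block 5 is hot, blocks 0..9 run once each.
stream = [5] * 90 + list(range(10))

exact = Counter(stream)                        # N counters: one per basic block
ranged = uniform_range_counts(stream, 16, 4)   # K = 4 counters over 16 blocks

print(exact[5])   # precise count for the hot block
print(ranged)     # block 5 is blurred together with blocks 4, 6 and 7
```

With four uniform counters every event is still counted, but the hot block can no longer be told apart from its three neighbours, which is the low-precision problem the slide describes.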
Related Work
• Profile gathering and analysis schemes
– [Anderson et al., ’97], [Arnold et al., ’01], [Heil and Smith, ’00], [Sastry et al., ’01], [Ball and Larus, ’96], [Calder et al., ’97], [Hirzel and Chilimbi, ’01]
• Hardware-assisted profiling and optimizations
– [Brooks et al., ’99], [Conte et al., ’94, ’96], [Dean et al., ’97], [Narayanasamy et al., ’03], [Zhou et al., ’04], [Zilles and Sohi, ’01], [Nagpurkar et al., ’05], [Mousa et al., ’05]
• High coverage
• High precision
• Limited number of counters
• Covers any stream of profile data
• Low precision information on cold regions
• Divide profile data hierarchically
Ideally: Best Ranges
[Figure: code column, BB1 … BBn]
We want:
• High precision for hot regions
• Lower precision information about colder regions
• High coverage by optimal use of a few counters
Challenge:
Discovering what to count and with how much precision
Challenges
Ideal Profiler: Selects the best possible ranges; decides the precision
Real Problem: Identifying the best possible ranges to count before we have even started counting
What comes first: start counting, or identify ranges?
Range Adaptive Profiler solves exactly this problem by dynamically identifying ranges as we count
Our Approach: Adaptive Profiling
[Figure: code column, BB1 … BBn, with counters that split over the hot regions]
Initially: One counter - covers the entire range
When the counter is “hot” – split it
Next: Whenever a node is “hot” – split it
As the program executes: allocate counters towards regions that are hotter
Dynamically adapt counter coverage to suit the execution frequency
At the end of the profiling phase, we get:
• High precision for hot regions
• Lower precision information about colder regions
• High coverage by optimally using a few counters
Range Adaptive Profiling
Advantages:
• A streaming (one-pass) technique to hierarchically classify events
• Fixed number of counters – O(log(R) * 1/E)
• Precision adaptive to hot regions
• Guaranteed error bounds
Any stream of profile data that can be divided hierarchically:
• Code profiling
• Values profiling
• Load address profiling
• Zero-value load profiling
• Narrow-width operand profiling
Outline
• Program Profiling
– An example: Code profiles
– Related work
• Range Adaptive Profiling
– Advantages and Applications
– Splits
– Merges
• Making it efficient
– Batching merges
– Branching Factor
• RAP implementation
– Results – Quantify error and memory
– Hardware and Software
• Conclusions
Adaptive Profiling - Splits
[Figure: code column, BB1 … BBn; hot counters split repeatedly]
… when to split?
• SplitThreshold = E·N / log(R)
• Adaptive to stream size
• Relative importance crucial
• Bounds error in counting
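The split rule can be sketched as a small tree of range counters. The sketch below reads E as the error tolerance, N as the number of events seen so far, and R as the size of the profiled range; the slides do not define these symbols, so that reading, the base-2 logarithm, the binary branching, and all names (`Node`, `add_point`, `split_threshold`) are assumptions for illustration rather than the authors' implementation.

```python
import math

class Node:
    """One counter covering the half-open range [lo, hi)."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.count = 0
        self.children = []   # empty until this counter is split

def split_threshold(eps, n, range_size):
    # SplitThreshold = E * N / log(R); base-2 log is an assumption here.
    return eps * n / math.log2(range_size)

def add_point(root, x, n, eps):
    """Route event x to the smallest range covering it, then split it if hot."""
    node = root
    while node.children:
        mid = (node.lo + node.hi) // 2
        node = node.children[0] if x < mid else node.children[1]
    node.count += 1
    range_size = root.hi - root.lo
    if node.hi - node.lo > 1 and node.count > split_threshold(eps, n, range_size):
        mid = (node.lo + node.hi) // 2
        # The parent keeps what it has counted; the children start from zero.
        node.children = [Node(node.lo, mid), Node(mid, node.hi)]

def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

root = Node(0, 16)
stream = [5] * 200 + [12] * 5          # hypothetical stream: block 5 is hot
for n, x in enumerate(stream, start=1):
    add_point(root, x, n, eps=0.2)

for leaf in leaves(root):
    print((leaf.lo, leaf.hi), leaf.count)
```

Because the threshold grows with N, the tree narrows down to block 5 early on, while the few late events for block 12 stay in the coarse counter for the range 8 to 16 and never trigger a split.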
Adaptive Profiling - Merges
[Figure: code column, BB1 … BBn]
Adaptive Precision Profiling accomplished!
Beware - temporary hot regions!
For example – program initialization phase …
Adaptive Profiling - Splits
[Figure: code column, BB1 … BBn; the split counters follow the hot region over time]
Adapting to initialization phase …
Note: Something that is hot now may become cold later
End of initialization phase – precisely captured temporary hot regions
Program continues to execute – overall hot region shifts …
Range adaptive profiler adapts to the new hot region
The earlier hot region is no longer hot
Adaptive Profiling - Merges
[Figure: code column, BB1 … BBn; counters over regions that are no longer hot merge back into their parents]
Problem:
• Changing hot region - undo unnecessary adaptation
• Uses a lot more counters than needed!
Merge ‘non-hot counters’ …
RAP solution:
• Recursive merges
• Collapse counters to parents
We never throw away profile information, we only merge
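A recursive merge can be sketched as a bottom-up pass that folds a cold subtree into its parent counter. The `Node` class, the caller-supplied `threshold`, and the rule "a subtree is cold when its total is below the threshold" are assumptions made for this example; the slides only state that counters collapse to their parents and that no counts are discarded.

```python
class Node:
    """One counter covering [lo, hi); children refine the same range."""
    def __init__(self, lo, hi, count=0, children=()):
        self.lo, self.hi, self.count = lo, hi, count
        self.children = list(children)

def total(node):
    """Events recorded anywhere in this subtree."""
    return node.count + sum(total(c) for c in node.children)

def size(node):
    """Number of counters in this subtree."""
    return 1 + sum(size(c) for c in node.children)

def merge_cold(node, threshold):
    """Bottom-up pass: collapse children into the parent when the subtree is cold."""
    for child in node.children:
        merge_cold(child, threshold)
    all_leaves = node.children and all(not c.children for c in node.children)
    if all_leaves and total(node) < threshold:
        node.count = total(node)   # counts move up to the parent, nothing is dropped
        node.children = []

# Hypothetical tree: the range 0..8 was hot during start-up, 8..16 is hot now.
init = Node(0, 8, 1, [Node(0, 4, 2), Node(4, 8, 3)])
main = Node(8, 16, 4, [Node(8, 12, 150), Node(12, 16, 40)])
root = Node(0, 16, 1, [init, main])

before = (size(root), total(root))
merge_cold(root, threshold=20)
after = (size(root), total(root))
print(before, after)
```

The pass frees the two counters that were refining the stale region while the overall event total stays the same, which matches the "we only merge" property on the slide.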
Range Adaptive Profiling
Advantages
• Precision dynamically adaptive to hot regions
• Guaranteed error bounds
• Optimal usage of a few counters
Plus:
• Independent of the stream size
• Independent of the stream order
Batched Merges
When do we initiate a merge cycle?
• Periodic merging
• Exponentially increasing periods
RAP tree does not grow faster than logarithmic rate
[Figure: number of counters (0 to 500) versus number of points in billions (0 to 100) for gcc, with merge cycles marked]
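One way to picture exponentially increasing periods is to list the event counts at which a merge cycle would start. The starting period of 1000 events and the doubling factor below are assumptions chosen for illustration; the slides only say that the periods grow exponentially.

```python
def merge_points(first_period, growth, limit):
    """Event counts at which a merge cycle starts; each gap is `growth` times the last."""
    points = []
    period = first_period
    at = first_period
    while at <= limit:
        points.append(at)
        period *= growth
        at += period
    return points

schedule = merge_points(first_period=1000, growth=2, limit=100_000)
print(schedule)
```

With doubling gaps, a stream of 100,000 events triggers only six merge cycles, so the cost of merging is spread over ever longer stretches of the stream.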
Branching Factor
• We show that the optimal branching factor is four
[Figure: branching factors 2, 4, and 8 on an axis from less memory to more memory and faster convergence]
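The tension behind the choice of branching factor can be illustrated with simple arithmetic: a wider tree reaches a single value in fewer levels, but every split allocates more counters. The 32-bit range and the helper name `tree_depth` are assumptions for this example, and the sketch does not reproduce the analysis that leads to the value of four.

```python
import math

def tree_depth(range_bits, branching_factor):
    """Levels needed to narrow a 2**range_bits range down to a single value."""
    bits_per_level = int(math.log2(branching_factor))
    return math.ceil(range_bits / bits_per_level)

for b in (2, 4, 8):
    print(b, "->", tree_depth(32, b), "levels,", b, "new counters per split")
```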
Results
• Range adaptive profiler
– Online technique
– Does not have ideal knowledge – counts everything
– Error introduced by not splitting early enough
[Figure: percent error (0 to 3) across SPEC benchmarks: gcc, gzip, mcf, parser, vortex, vpr, average]
Average percent error less than 2%
Results
High accuracy – but at what cost?
[Figure: number of counters (0 to 300) across SPEC benchmarks: gcc, bzip2, mcf, vpr, gzip, parser, vortex, average]
• On average, 150 counters provide 99% accurate information on code profiles
We show more results in the paper
Software implementation
• Simple set of APIs
– Offline and online profiling
– rap_init
– rap_add_points – builds the RAP tree and takes care of splits and merges
– rap_finalize
• Webpage
– www.cs.ucsb.edu/~arch/rap
Extremely high throughput profile data analysis …
Hardware Profiling Engine
[Figure: five-stage pipeline with a pipeline latch between stages]
Stage 0: Event Buffer – event name (IP address, memory, PC, value, etc.) and quantity
Stage 1: Range Matching – TCAM cells holding range prefixes such as 00*, 0001*, 0000*, 0010*, 000000*, 0011*, 000010*, 000011*, 0000001*, …
Stage 2: Longest Match
Stage 3: Counter Maintenance – counter indices; the counter is incremented (+), the split threshold is subtracted (-), and the result is tested (positive?)
Stage 4: Split Handling – control block; update pointers on split
Also shown: N x 1 Arbiter
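The first stages of the pipeline can be mimicked in software. The sketch below treats each TCAM entry as a bit prefix followed by `*`, picks the longest matching prefix, and applies the "subtract the threshold and test for positive" check from Stage 3. The function names, the dictionary of counters, and the 8-bit keys are assumptions for illustration; this is not a description of the actual hardware.

```python
def matches(pattern, key_bits):
    """True if a ternary pattern such as '0001*' covers the key's leading bits."""
    return key_bits.startswith(pattern.rstrip("*"))

def longest_match(patterns, key_bits):
    """Stages 1 and 2: find every matching range, keep the most specific one.

    Assumes at least one pattern matches the key.
    """
    hits = [p for p in patterns if matches(p, key_bits)]
    return max(hits, key=lambda p: len(p.rstrip("*")))

def process_event(counters, patterns, key_bits, split_threshold):
    """Stage 3: bump the selected counter and report whether it should split."""
    selected = longest_match(patterns, key_bits)
    counters[selected] = counters.get(selected, 0) + 1
    return selected, counters[selected] - split_threshold > 0

patterns = ["00*", "0001*", "0000*", "0010*", "000000*", "0011*"]
counters = {}
first = process_event(counters, patterns, "00000011", split_threshold=1)
second = process_event(counters, patterns, "00000011", split_threshold=1)
third = process_event(counters, patterns, "00011111", split_threshold=1)
print(first, second, third)
```

The key `00000011` matches three entries, and the longest one, `000000*`, receives the count; on its second hit the counter exceeds the threshold, which is the condition that would hand the entry to Stage 4 for splitting.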
Conclusion
• Range Adaptive Profiling
– Summarizes high bandwidth profile data
– Fully streaming scheme
– Bounded memory and error
– General purpose – high applicability
– Multi-dimensional Profiling
Future Work
• Multi-Dimensional Profiling
– Edges
– Code-mem
– Code-val
– Val-mem
Thank You
Profiling over Adaptive Ranges -
http://www.cs.ucsb.edu/~arch/rap
http://www.cs.ucsb.edu/~shashimc