![Page 1: OCR Introspection EDT Characterization & Profiling Infrastructure Intel TG Team](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eba5503460f94bc2440/html5/thumbnails/1.jpg)
OCR Introspection: EDT Characterization & Profiling Infrastructure
Intel TG Team
![Page 2: OCR Introspection EDT Characterization & Profiling Infrastructure Intel TG Team](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eba5503460f94bc2440/html5/thumbnails/2.jpg)
Overview
• Background
• Approach
• Current Methodology
• Initial Results
• Implementation & Interfaces
• Next Steps
![Page 3: OCR Introspection EDT Characterization & Profiling Infrastructure Intel TG Team](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eba5503460f94bc2440/html5/thumbnails/3.jpg)
Background
• Can we benefit from any ‘predictable behavior’ of EDTs?
  – Code or data placement choices based on EDTs’ resource requirements
  – Programmer productivity tool (datablock partitioning, EDT functionality)
• Characterizing EDT behavior
• Exploiting this information at runtime
![Page 4: OCR Introspection EDT Characterization & Profiling Infrastructure Intel TG Team](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eba5503460f94bc2440/html5/thumbnails/4.jpg)
Approach
[Figure: sequential_cholesky() EDT. x-axis: TileSize (0 to 120). y-axis (log scale, 1 to 100,000,000): Reads, Writes = Bytes; Floating Point Ops = Count. Legend: Read (bytes), with polynomial trend lines.]
Fitted trend lines shown on the plot:
f(x) = 0.333333333333 x³ + 0.5 x² + 0.166666666665 x + 1.61995338E-11
f(x) = 2 x³ + 10.0000000000001 x² + 51.9999999999917 x + 100.000000000032
f(x) = 22 x³ + 62.0000000000046 x² + 83.9999999997752 x + 68.0000000023669
• Express EDT metrics as a function of size of data operated upon by EDT
• Use the above function at runtime to quickly estimate EDT’s resource requirements
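As a minimal sketch of the runtime side of this idea, a fitted polynomial can be evaluated with Horner's rule in a handful of multiply-adds. The function name `poly_eval` and the coefficient order (constant term first) are assumptions made for illustration, not part of OCR.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate c[0] + c[1]*x + ... + c[n-1]*x^(n-1) using Horner's rule.
 * Hypothetical helper: the name and the constant-term-first coefficient
 * order are assumptions of this sketch, not an OCR interface. */
static double poly_eval(const double *c, size_t n, double x)
{
    assert(c != NULL);
    double y = 0.0;
    /* Walk from the highest-order coefficient down to the constant term. */
    for (size_t i = n; i-- > 0; )
        y = y * x + c[i];
    return y;
}
```

For example, with `fit = {100, 52, 10, 2}` (the rounded coefficients of f(x) = 2x³ + 10x² + 52x + 100 from the plot), `poly_eval(fit, 4, 50.0)` returns 277700. The cost is one multiply and one add per coefficient, which is what makes a quick estimate at runtime plausible.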
![Page 5: OCR Introspection EDT Characterization & Profiling Infrastructure Intel TG Team](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eba5503460f94bc2440/html5/thumbnails/5.jpg)
Current Methodology
Several simplifying assumptions
– EDT behavior is consistent
– Metrics can be expressed as polynomials
– Sum of sizes of datablocks used as a proxy for size of data operated upon by EDT
[Diagram: profiling pipeline]
OCR App → (LLVM Pass, Statistics) → Raw EDT Metrics → (Summarize, Best-fit) → EDT Metrics → (Generate header) → Profiling header → (Rebuild) → Runtime
• Raw EDT Metrics: Instruction Count, Floating Point Ops, Reads & Writes, Datablock sizes
• EDT Metrics: represented as polynomial function on sum of sizes of all datablocks used by an EDT
• Profiling header: “edt1”, {polynomial coeffs}, “edt2”, {polynomial coeffs}, “edt3”, {polynomial coeffs},
• Runtime: can use the projections to inform its heuristics
#include <ocr-profiling.h>
struct _profileStruct gProfilingTable[] = {
    {"mainEdt", 1, 1, 1, 1, 0.0, 0.0, 0.0, 0.0, (double []){229,}, (double []){16,}, (double []){2068,}, (double []){2632,}, 0, },
    {"node_110_body", 2, 2, 2, 2, 0.0, 0.0, 0.0, 0.0, (double []){77620,25,}, (double []){0,0,}, (double []){14052,0,}, (double []){13432,1,}, 0, },
    {"node_210_body", 2, 2, 3, 2, 0.0, 0.0, 0.0, 0.0, (double []){63115,0,}, (double []){0,0,}, (double []){-12838.4,0.853455,-1.40977e-05,}, (double []){28000,7.27596e-12,}, 0, },
    {"node_220_body", 3, 3, 4, 2, 0.0, 0.0, 0.0, 0.0, (double []){113716,-0.0242976,2.13737e-07,}, (double []){7004.25,-0.00177152,1.55834e-08,}, (double []){33.7103,0.993679,-4.52053e-07,3.38311e-12,}, (double []){-3109.02,0.547309,}, 0, },
};
OCR: LLVM Pass, Statistics credit: Romain Cledat
Example: FP Ops = y = 0.33x^2 + 1.5x + 8000, where x = sum(size_of(DB[i]))
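The lookup-and-project step implied by this slide can be sketched as follows. Everything here is hypothetical: `edt_profile_t` is a simplified stand-in that keeps only the FP-ops polynomial (the real `_profileStruct` layout lives in `ocr-profiling.h` and is not shown in the slides), the constant-term-first coefficient order is an assumption, and `"example_edt"` simply carries the example polynomial above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one row of the profiling table.
 * Only the FP-ops polynomial is kept; the real _profileStruct is not
 * reproduced here. */
typedef struct {
    const char   *edt_name;
    size_t        n_coeffs;
    const double *fp_coeffs;   /* assumed order: constant term first */
} edt_profile_t;

/* 0.33x^2 + 1.5x + 8000, the example polynomial from the slide. */
static const double example_fp[] = { 8000.0, 1.5, 0.33 };

static const edt_profile_t profiles[] = {
    { "example_edt", 3, example_fp },
};

/* x = sum(size_of(DB[i])): total bytes across the EDT's datablocks. */
static double total_db_bytes(const size_t *db_sizes, size_t n_dbs)
{
    double x = 0.0;
    for (size_t i = 0; i < n_dbs; ++i)
        x += (double)db_sizes[i];
    return x;
}

/* Look up an EDT by function name; NULL means no profile entry exists. */
static const edt_profile_t *find_profile(const char *name)
{
    for (size_t i = 0; i < sizeof profiles / sizeof profiles[0]; ++i)
        if (strcmp(profiles[i].edt_name, name) == 0)
            return &profiles[i];
    return NULL;
}

/* Projected FP-op count for an EDT about to run on the given datablocks. */
static double project_fp_ops(const edt_profile_t *p,
                             const size_t *db_sizes, size_t n_dbs)
{
    assert(p != NULL);
    double x = total_db_bytes(db_sizes, n_dbs);
    double y = 0.0;
    for (size_t i = p->n_coeffs; i-- > 0; )   /* Horner's rule */
        y = y * x + p->fp_coeffs[i];
    return y;
}
```

With two 100-byte datablocks, x = 200 and the projection is 0.33·200² + 1.5·200 + 8000 = 21500 FP ops. A linear scan by name is used only for brevity; a runtime that resolves the entry once at template creation would not need to repeat the lookup per EDT instance.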
![Page 6: OCR Introspection EDT Characterization & Profiling Infrastructure Intel TG Team](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eba5503460f94bc2440/html5/thumbnails/6.jpg)
Initial Results
[Figure: bar chart of Weighted Normalized Projection RMS Errors (%), y-axis 0 to 16, for each application: SCF-NWCHEM, Cholesky, Lulesh, Conjugate Gradient, Embarrassingly Parallel, Fourier Transform, Integer Sort, LU Solver, Multi-Grid. Series: Instr Count, FP Ops, Reads, Writes.]
OCR Apps credit: Chih-Chieh Yang, Adam Smith (NAS), Roger Golliver (LULESH), Jamie Arteaga (SCF)
Two phases
Three modes
![Page 7: OCR Introspection EDT Characterization & Profiling Infrastructure Intel TG Team](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eba5503460f94bc2440/html5/thumbnails/7.jpg)
Implementation & Interfaces
• Profiling
  – On x86, LLVM-pass + OCR statistics hooks added by Romain
  – On FSim, hardware counters
• Data analysis
  – Offline tools for summarizing & best fit
  – Making them online will add to runtime overhead (how much?)
• Projections
  – Optional “profile.h” #include’d by OCR app
    • Lists function name corresponding to each EDT with the polynomial coefficients of its metrics
  – Runtime picks it up during template creation
  – Runtime can use them when EDT is ready to run
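The slides do not show how the offline best-fit tool works. One common way to obtain such coefficients is an ordinary least-squares polynomial fit, sketched below by solving the normal equations with Gaussian elimination. The names `poly_fit`, `dabs`, and `MAX_TERMS` are hypothetical, and the actual OCR post-processing tools may use a different method.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TERMS 5   /* up to degree 4; an arbitrary limit for this sketch */

static double dabs(double v) { return v < 0.0 ? -v : v; }

/* Least-squares fit of y ~ c[0] + c[1]*x + ... + c[degree]*x^degree.
 * Builds the normal equations (A^T A) c = A^T y as an augmented matrix and
 * solves them by Gaussian elimination with partial pivoting.
 * Returns 0 on success, -1 on bad input or a (near-)singular system. */
static int poly_fit(const double *x, const double *y, size_t n,
                    int degree, double *c)
{
    int m = degree + 1;
    double a[MAX_TERMS][MAX_TERMS + 1] = {{0}};

    assert(x != NULL && y != NULL && c != NULL);
    if (degree < 0 || m > MAX_TERMS || n < (size_t)m)
        return -1;

    /* Accumulate sums of powers of x and of x^i * y over all samples. */
    for (size_t k = 0; k < n; ++k) {
        double pw[2 * MAX_TERMS - 1];
        pw[0] = 1.0;
        for (int p = 1; p < 2 * m - 1; ++p)
            pw[p] = pw[p - 1] * x[k];
        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < m; ++j)
                a[i][j] += pw[i + j];
            a[i][m] += pw[i] * y[k];
        }
    }

    /* Forward elimination with partial pivoting. */
    for (int col = 0; col < m; ++col) {
        int piv = col;
        for (int r = col + 1; r < m; ++r)
            if (dabs(a[r][col]) > dabs(a[piv][col]))
                piv = r;
        if (dabs(a[piv][col]) < 1e-12)
            return -1;
        for (int j = 0; j <= m; ++j) {
            double t = a[col][j];
            a[col][j] = a[piv][j];
            a[piv][j] = t;
        }
        for (int r = col + 1; r < m; ++r) {
            double f = a[r][col] / a[col][col];
            for (int j = col; j <= m; ++j)
                a[r][j] -= f * a[col][j];
        }
    }

    /* Back substitution. */
    for (int i = m - 1; i >= 0; --i) {
        double s = a[i][m];
        for (int j = i + 1; j < m; ++j)
            s -= a[i][j] * c[j];
        c[i] = s / a[i][i];
    }
    return 0;
}
```

Fitting the samples x = {1, 2, 3, 4}, y = {6, 11, 18, 27} with degree 2 recovers coefficients close to {3, 2, 1}, that is y = x² + 2x + 3. Normal equations become ill-conditioned when x spans large values, as datablock sizes in bytes can, so a real tool would likely rescale x or use a QR-based solver.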
![Page 8: OCR Introspection EDT Characterization & Profiling Infrastructure Intel TG Team](https://reader036.vdocuments.mx/reader036/viewer/2022072015/56649eba5503460f94bc2440/html5/thumbnails/8.jpg)
Next Steps
• Clean up the post-processing tools & integrate with OCR
• Using the projections
  – Code/data placement decisions
  – Reacting to temperature, failures
• Better projections
  – Polynomial functions not always accurate
  – Datablock reads vs. writes & stack accesses
  – Quantifying other metrics