![Page 1: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/1.jpg)
CS 744: TVM
Shivaram Venkataraman
Fall 2020
Tensor Virtual Machine
![Page 2: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/2.jpg)
ADMINISTRIVIA
- Course project titles
- Project proposal aka Introduction (10/16)
  - Introduction
  - Related Work
  - Timeline (with eval plan)
  → 2 page writeup
- Midterm: Oct 22
![Page 3: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/3.jpg)
MACHINE LEARNING: STACK
- Distributed training
- Inference: just the forward pass
- Make distributed training and inference easy
- Hardware
![Page 4: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/4.jpg)
MOTIVATION: PERFORMANCE PORTABILITY
PyTorch → model file
→ Compute primitives: matrix multiply, conv
You want high performance across hardware backends
Dependence on vendor-specific libraries
ML models evolve fast ⇒ new operators, new combinations of operators
⇒ Not available in existing vendor libraries
![Page 5: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/5.jpg)
→ Python code describes ML model
→ TVM
→ Binary file that runs on hardware
![Page 6: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/6.jpg)
OPTIMIZATION: COMPUTATION GRAPHS
Operator Fusion
→ 1-1 operators, "map"
→ Sum reduction, scaling after
Data layout
→ Row major, column major, blocked
→ How the tensor is represented in memory
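As a rough illustration of the two graph-level ideas on this slide, the sketch below fuses a 1-1 "map" operator into a sum reduction with scaling afterwards, and shows how row-major and column-major layouts map a 2-D index to a flat offset. The function names and the choice of squaring as the map operator are assumptions for illustration, not TVM code.

```python
import math

def unfused(xs, scale):
    # Three separate operators; the map step materializes an intermediate list.
    squares = [x * x for x in xs]   # 1-1 "map" operator
    total = sum(squares)            # sum reduction
    return total * scale            # scaling after the reduction

def fused(xs, scale):
    # One loop and no intermediate buffer: the map is folded into the reduction.
    total = 0.0
    for x in xs:
        total += x * x
    return total * scale

def row_major_index(i, j, rows, cols):
    # Elements of the same row are adjacent in memory.
    return i * cols + j

def col_major_index(i, j, rows, cols):
    # Elements of the same column are adjacent in memory.
    return j * rows + i

xs = [0.5, 1.5, 2.0]
same_result = math.isclose(unfused(xs, 0.1), fused(xs, 0.1))
```

Both versions compute the same value; the fused one avoids writing and re-reading the intermediate buffer, which is where the saving would come from on real hardware.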
![Page 7: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/7.jpg)
TENSOR EXPRESSION LANGUAGE
Common Arithmetic, Math operations
Know the shape of the output and the data accessed
↳ Operator expressed in tensor expression language
↳ (Tensor) math operations
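One way to picture "know the shape of the output and the data accessed" is a declarative operator description: an output shape plus a rule that computes each element from its indices. The `compute` helper below is a hypothetical pure-Python stand-in, not TVM's actual API.

```python
def compute(shape, fn):
    # Hypothetical helper: build a 2-D output from its shape and an index rule.
    rows, cols = shape
    return [[fn(i, j) for j in range(cols)] for i in range(rows)]

A = [[1, 2, 3], [4, 5, 6]]      # 2 x 3
B = [[1, 0], [0, 1], [2, 2]]    # 3 x 2
K = 3                           # length of the reduction axis

# Matrix multiply: the output shape is (2, 2), and element (i, j)
# reads row i of A and column j of B.
C = compute((2, 2), lambda i, j: sum(A[i][k] * B[k][j] for k in range(K)))
```

Because the description says nothing about loop order or tiling, a compiler is free to choose those separately as a schedule.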
![Page 8: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/8.jpg)
CODE GENERATION
Nested parallelism
→ Halide → expression; OpenMP
→ for i in 1:n
    for j in 1:m
Tensorization
→ load, store, add → what is the hardware instruction set?
→ Allows you to register operator intrinsics
→ Extensible!
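A minimal sketch of tensorization, assuming a made-up `tile_intrinsic` that stands in for a hardware instruction operating on small tiles: the scalar loop nest is split into tiles and the innermost tile computation is replaced by a call to the intrinsic. The outer tile loops are the ones a code generator could map to parallel threads.

```python
def matmul_naive(A, B, n):
    # Reference scalar loop nest.
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def tile_intrinsic(A, B, C, i0, j0, k0, t):
    # Hypothetical stand-in for a hardware tensor instruction on t x t tiles.
    for i in range(i0, i0 + t):
        for j in range(j0, j0 + t):
            acc = 0
            for k in range(k0, k0 + t):
                acc += A[i][k] * B[k][j]
            C[i][j] += acc

def matmul_tensorized(A, B, n, t):
    # Tiled loop nest; assumes the tile size t divides n evenly.
    assert n % t == 0
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, t):          # candidates for parallel execution
        for j0 in range(0, n, t):
            for k0 in range(0, n, t):
                tile_intrinsic(A, B, C, i0, j0, k0, t)
    return C
```

Swapping in a different intrinsic for a new accelerator changes only `tile_intrinsic`, which is the extensibility point the slide refers to.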
![Page 9: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/9.jpg)
LATENCY HIDING
What is the goal?
↳ Same as PyTorch etc.
↳ Overlap computation and communication
↳ Schedule that utilizes memory bandwidth & compute units
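The overlap idea can be sketched in software with a loader thread and a two-slot buffer: while the consumer computes on one tile, the loader is already fetching the next. This is only an analogy under assumed names (`load_tiles`, `run_pipeline`); on an accelerator the same effect comes from how the schedule orders memory and compute instructions.

```python
import queue
import threading

def load_tiles(tiles, buf):
    # Producer: stands in for the memory unit fetching the next tile.
    for tile in tiles:
        buf.put(tile)
    buf.put(None)  # sentinel marking the end of the stream

def run_pipeline(tiles):
    buf = queue.Queue(maxsize=2)  # two slots, i.e. double buffering
    loader = threading.Thread(target=load_tiles, args=(tiles, buf))
    loader.start()
    results = []
    while True:
        tile = buf.get()          # waits only if the loader has fallen behind
        if tile is None:
            break
        results.append(sum(x * x for x in tile))  # compute on the current tile
    loader.join()
    return results
```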
![Page 10: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/10.jpg)
AUTOMATING OPTIMIZATION
Goal: Create a specialized operator for input shape and layout
Challenge: Choose appropriate schedule optimizations
- Tiling size, loop unrolling
Automate the optimizer!
→ Lots of different choices, and also lots of parameters to choose
→ What configurations to try?
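To give a feel for how quickly the number of choices grows, the sketch below enumerates a small, made-up schedule space for a square loop nest: two tile sizes and one unroll factor. The knob names and candidate values are assumptions for illustration.

```python
import itertools

TILE_SIZES = [1, 2, 4, 8, 16]
UNROLL_FACTORS = [1, 2, 4]

def config_space(n):
    # Keep only tile sizes that divide the loop extent n evenly.
    valid_tiles = [t for t in TILE_SIZES if n % t == 0]
    return [
        {"tile_i": ti, "tile_j": tj, "unroll": u}
        for ti, tj, u in itertools.product(valid_tiles, valid_tiles, UNROLL_FACTORS)
    ]

space = config_space(16)
```

Even with three knobs and a handful of values each, there are dozens of configurations for one operator and one input shape, so measuring every one on hardware becomes impractical as more knobs are added.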
![Page 11: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/11.jpg)
ML-BASED COST MODEL
Machine Learning Model Design Choices
- Speed: Faster than time it takes to evaluate a config
- Quality: Use a rank objective to predict the relative order of runtime
Gradient tree boosting model
- memory access count
- reuse ratio of each memory buffer at each loop level
- one-hot encoding of loop annotations
→ Code generated → features
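The sketch below illustrates two pieces of this slide under assumed names: turning one loop level into a feature vector (access count, reuse ratio, one-hot annotation), and scoring a model by how many pairs of configs it orders correctly, which is what a rank objective cares about. The annotation list and dictionary keys are hypothetical, and no actual gradient boosting model is trained here.

```python
ANNOTATIONS = ["none", "unroll", "vectorize", "parallel"]  # assumed set

def one_hot(annotation):
    return [1.0 if annotation == a else 0.0 for a in ANNOTATIONS]

def featurize(loop):
    # loop: dict with hypothetical keys describing one loop level.
    return [float(loop["mem_access_count"]), float(loop["reuse_ratio"])] + one_hot(
        loop["annotation"]
    )

def pairwise_rank_accuracy(predicted, measured):
    # Fraction of config pairs whose relative order the model gets right.
    correct = 0
    total = 0
    for a in range(len(measured)):
        for b in range(a + 1, len(measured)):
            if measured[a] == measured[b]:
                continue  # ties carry no ordering information
            total += 1
            if (predicted[a] < predicted[b]) == (measured[a] < measured[b]):
                correct += 1
    return correct / total if total else 1.0
```

A model whose predictions are off by a constant factor still scores perfectly here, which is why relative order is enough for choosing which config to try next.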
![Page 12: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/12.jpg)
ML-BASED COST MODEL
Iteration
- Select a batch of candidates
- Collect data
- Use as training data to update the model
How to select candidates? Parallel Simulated Annealing
- Start from a random config
- Walk to a nearby config
- Successful if cost decreases, else reject
→ Cost: model-predicted performance when using the config
→ Each candidate is a configuration; measurements such as <c2, 20ms> become training data
→ Ask model: is the new config c_j better?
   Yes → go & try c_j on cluster
   No → generate another config
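A single-chain sketch of the candidate search, assuming a toy cost function in place of the learned model and a made-up two-knob space. Unlike the simplified rule on the slide, this version uses the standard Metropolis acceptance step, so a worse neighbor is occasionally accepted while the temperature is high; the slide's "else reject" corresponds to the low-temperature limit. The parallel variant would run several such chains at once.

```python
import math
import random

def neighbor(config, choices, rng):
    # Move one knob to an adjacent value in its list of options.
    key = rng.choice(sorted(config))
    options = choices[key]
    idx = options.index(config[key])
    new_idx = min(max(idx + rng.choice([-1, 1]), 0), len(options) - 1)
    new = dict(config)
    new[key] = options[new_idx]
    return new

def anneal(cost, choices, steps=200, temp=1.0, cooling=0.97, seed=0):
    rng = random.Random(seed)
    current = {k: rng.choice(v) for k, v in sorted(choices.items())}
    best = current
    for _ in range(steps):
        cand = neighbor(current, choices, rng)
        delta = cost(cand) - cost(current)
        # Always accept improvements; accept worse moves with decaying probability.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current = cand
            if cost(current) < cost(best):
                best = current
        temp = max(temp * cooling, 1e-6)
    return best

CHOICES = {"tile": [1, 2, 4, 8, 16], "unroll": [1, 2, 4]}  # hypothetical knobs

def toy_cost(config):
    # Stand-in for the cost model's predicted runtime.
    return abs(config["tile"] - 8) + abs(config["unroll"] - 2)

best = anneal(toy_cost, CHOICES)
```

Only the configs the search ends on would be sent to real hardware; their measured runtimes then feed back into the model as new training data.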
![Page 13: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/13.jpg)
DISTRIBUTED DEVICE POOL
Pool of devices to speed up profiling
RPC interface to run a trial on device
Share device pools for multiple graphs
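A small sketch of the pooling idea, assuming a caller-supplied `run_trial(device, config)` function that stands in for the RPC call to a real device. Each trial borrows a free device, runs, and returns it, so several configs are profiled concurrently. The names are hypothetical and at least one device is assumed.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

def profile_all(configs, devices, run_trial):
    free = queue.Queue()
    for device in devices:
        free.put(device)

    def job(config):
        device = free.get()           # borrow a free device
        try:
            return config, run_trial(device, config)
        finally:
            free.put(device)          # hand it back for the next trial

    # One worker per device, so a worker never waits long for a free one.
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        return list(pool.map(job, configs))
```

Because the pool is just a queue of devices, trials from different models or graphs could share it by submitting to the same `profile_all` infrastructure.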
![Page 14: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/14.jpg)
SUMMARY
TVM: Compiler for ML inference models
Support high performance for range of models, hardware devices
Key ideas
- Graph-level optimizations → operator fusion
- Tensor expression language: Code-gen, Latency hiding etc
- ML based Cost Model for automation
![Page 15: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/15.jpg)
DISCUSSION
https://forms.gle/WiVgJ3abGXXgfBN99
![Page 16: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/16.jpg)
Consider that you are building an optimizer for Spark programs instead of ML inference. What would be some configuration knobs that you could similarly tune? What might be different from the TVM optimizer?
Similar logic
→ Latency hiding: overlap computation, communication
→ Operator fusion → map operators, access patterns
   ↳ Operators are user-defined: challenging?
→ Partitioning → can you automate?
   ↳ Number of partitions / co-partitioning → performance!
→ Cache → config space
→ Persistence → manually insert
![Page 17: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/17.jpg)
What is your takeaway from the following figure?
![Page 18: L Llvm T CS 744: TVM](https://reader031.vdocuments.mx/reader031/viewer/2022022217/6214689ef1f05a588f5c32a1/html5/thumbnails/18.jpg)
NEXT STEPS
Next class: Ray
Course project: Oct 16 (introductions)
Midterm: Oct 22
Latency hiding in Spark?
→ rdd1: map tasks
→ reduce tasks; rdd2 map ← shuffle files
→ no comm, wait