Download - L Llvm T CS 744: TVM
CS 744: TVM
Shivaram VenkataramanFall 2020
TensorVirtual Machine
x.dk! L, Llvm
T-
ADMINISTRIVIA
- Course project titles- Project proposal aka Introduction (10/16)
IntroductionRelated WorkTimeline (with eval plan)
- Midterm: Oct 22
Assignment→
<
]→ 2 pagewriteup
MACHINE LEARNING: STACK
✓Distributed
noowed
Train efferent just lie↳ forward pass
→ Interplayinference
&
quondam \, training
makedistributedeasy ( dealing inference
groin v
Hardwareand saddle
MOTIVATION: PERFORAMNCE PORTABILITYPytoreh → model file
"intent :?÷:www.rayf/4TE-iTIyconfute primitives matrix cow
multiply ed
you want high performance I 1
across hardware backends-
Dependence onvendor specific o o • q
libraries
MLmodels evolve fast ⇒ new operators
new combination of operators Y ⇒ Notavailable
in existingvendor
libraries
AM
→ Python code describes
ML model
→Tvm
. . )=
→ Binary file thatruns on hardware
OPTIMIZATION COMPUTATION GRAPHSOperator Fusion
Data layout
[ I¥÷÷÷¥÷÷÷÷÷:* " "÷ . :÷÷÷:- -T -
→ 1-1 operators ,"
map"
-
→ Sum reduction, scaling after
↳(Spg-
Kow Major ,
column Major ,Blocked
g, Infest
teatisrepresented
2- layerNN
.in/TEHtEi/:::IS...:: :as layout
TENSOR EXPRESSION LANGUAGE
Common Arithmetic, Math operationsKnow the shape of the output and the data accessed
operator cry↳ expressed in tensor expression language
↳tensor) math operations
CODE GENERATION
Nested parallelism
Tensorization
Halide→ expression OpenMP ← gu+ of imtmhns
ead does
a.← anime fi:÷÷÷÷÷÷:÷÷i÷÷÷'
jaihe"
for i in l : "
for j . int :S
threads can use as , Bstdgmptd.im#isIL;i!dffIHdeu- poker=doopiterah#bad ,
store, add → whet is the
hardware instructionset-
= Allows you to
- - register operatorExtensible ! intrinsic
Latency HIDING
What is the goal?
Someas
Pytorchetc .
9↳ Overlap computation and
communication
Schedule thatutilizes
-
memorybandwidth &
compute units
ig:*year 'fad
AUTOMATING OPTIMIZATION
Goal: Create a specialized operator for input shape and layoutChallenge:
Choose appropriate schedule optimizationsTiling size, loop unrolling
Automate the optimizer!
---
- - - - r . lots of differentchoices&
also lots of parameters Huntsto
choose .
FimMl ? -
"" m.
what configurationsI
to Try ←
ML-Based Cost model
Machine Learning Model Design ChoicesSpeed: Faster than time it takes to evaluate a configQuality: Use a rank objective to predict the relative order of runtime
Gradient tree boosting modelmemory access countreuse ratio of each memory buffer at each loop levelone-hot encoding of loop annotations
as to
→ →n seconds
take
←
code generated
features
ML-BASED COST MODEL
IterationSelect a batch of candidates Collect data Use as training data to update the model
How to select candidates?Parallel Simulated Annealing
Start from a random configWalk to a nearby config à
Successful if cost decreases Else Reject
model perfwhen using ←config
,
tom >LE -y
( Cz ,20ms
→each candidate is
ccz,
8ms>41
a wyignratim lahhhh
< £ , 'fail>fief a ← :
harder:c ,
→ step a)above trashy data
- toaB7✓~,
Aa ↳ → ↳'
Task model is cj better than b
Yes → go& try d
,
on cluster
No → generateanother
oyer config
Distributed device pool
Pool of devices to speed up profilingRPC interface to run a trial on device
Share device pools for multiple graphs
SUMMARY
TVM: Compiler for ML inference modelsSupport high performance for range of models, hardware devices
Key ideasGraph-level optimizationsTensor expression language: Code-gen, Latency hiding etcML based Cost Model for automation
→→ operator fusion
→
-
DISCUSSIONhttps://forms.gle/WiVgJ3abGXXgfBN99
Consider that you are building an optimizer for Spark programs instead of ML inference. What would be some configuration knobs that you could similarly tune? What might be different from the TVM optimizer?
Similar logic → latency hidingoverlap comp ,
communication
7rYYdimemim'operatorfmon→mapBoperahnaccess patterns
↳laa#↳ operators are
user definedchallenging ??-Partitioning → can you automate
↳ number of partitions / co- partitioning\ performance !Had .
cache →config space
!
Persistence → manually insert
What is your takeaway from the following figure?
→ fastingqmon
f-T" bae:Enea.:c.
.
honey! week,unite:b.or
NEXT STEPS
Next class: RayCourse project: Oct 16 (introductions)Midterm: Oct 22
latency hiding in spark ?
Drddlsmaf tasks > ✓
credence tasks|D÷i;:D rddz map ← Hanffiles:
Edna > ← nocomm
D wait