![Page 1: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/1.jpg)
Joseph E. Gonzalez
Asst. Professor, UC Berkeley
Co-founder, GraphLab (now Turi Inc.)
what happens after learning?
Prediction Serving
![Page 2: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/2.jpg)
Prediction Serving
Learning Systems
Graph Systems
Graph Frames
Time Series: Frequency Domain Analytics Systems
Cluster Management: Multi-Task Learning for Job Scheduling
Cross-Cloud Perf. Estimation
![Page 3: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/3.jpg)
Outline
Active Collaborators
Daniel Crankshaw
Xin Wang
Michael Franklin
Ion Stoica
![Page 4: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/4.jpg)
Big
Data
Big Model
Training
Learning
Timescale: minutes to days
Systems: offline and batch optimized
Heavily studied ... major focus of the AMPLab
![Page 5: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/5.jpg)
Big
Data
Big Model
Training
Application
Decision
Query
?
Learning Inference
![Page 6: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/6.jpg)
Big
Data
Training
Learning Inference
Big Model
Application
Decision
Query
Timescale: ~10 milliseconds
Systems: online and latency optimized
Less studied …
![Page 7: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/7.jpg)
Big
Data
Big Model
Training
Application
Decision
Query
Learning Inference
Feedback
![Page 8: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/8.jpg)
Big
Data
Training
Application
Decision
Learning Inference
Feedback
Timescale: hours to weeks
Systems: combination of systems
Less studied …
![Page 9: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/9.jpg)
Big
Data
Big Model
Training
Application
Decision
Query
Learning Inference
Feedback
Responsive (~10 ms)
Adaptive (~1 second)
![Page 10: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/10.jpg)
Responsive (~10 ms)
Adaptive (~1 second)
VELOX Model Serving System [CIDR’15]
Daniel Crankshaw, Peter Bailis, Haoyuan Li, Zhao Zhang, Joseph Gonzalez, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan
Key Insight:
Decompose models into fast and slow changing components
![Page 11: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/11.jpg)
Big
Data
Training
Application
Decision
Query
Learning Inference
Feedback
![Page 12: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/12.jpg)
Big
Data
Training
Application
Decision
Query
Learning Inference
Feedback
Slow
Slow Changing Model
Fast Changing Model
![Page 13: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/13.jpg)
Hybrid Offline + Online Learning
Update the user weights online:
• Simple to train + more robust model
• Address rapidly changing user statistics
Update feature functions offline using batch solvers:
• Leverage high-throughput systems (TensorFlow)
• Exploit slow change in population statistics
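The split described above can be sketched in a few lines. This is a minimal illustration, not Velox's implementation: the class name `SplitModel`, the methods `observe` and `swap_features`, and the choice of a single SGD step on squared error are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class SplitModel:
    """Shared feature function (slow, retrained offline) plus
    per-user weight vectors (fast, updated online)."""

    def __init__(self, features: Callable[[str], Vector], dim: int, lr: float = 0.1):
        self.features = features   # hypothetical output of an offline batch job
        self.dim = dim
        self.lr = lr               # learning rate for the online step
        self.user_weights: Dict[str, Vector] = {}

    def predict(self, user: str, item: str) -> float:
        w = self.user_weights.get(user, [0.0] * self.dim)
        return dot(w, self.features(item))

    def observe(self, user: str, item: str, label: float) -> None:
        """One SGD step on squared error; only this user's weights change."""
        f = self.features(item)
        w = self.user_weights.setdefault(user, [0.0] * self.dim)
        err = dot(w, f) - label
        for i in range(self.dim):
            w[i] -= self.lr * err * f[i]

    def swap_features(self, features: Callable[[str], Vector]) -> None:
        """Install a newly retrained feature function; user weights are kept."""
        self.features = features


# Toy feature function standing in for a trained one.
item_features = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
model = SplitModel(lambda item: item_features[item], dim=2, lr=0.5)
for _ in range(10):
    model.observe("u", "a", 1.0)
```

Each call to `observe` touches one small weight vector, which is why the online path can be much cheaper than retraining the shared feature function.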
![Page 14: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/14.jpg)
Common modeling structure
Matrix Factorization (Users × Items)
Deep Learning (Input)
Ensemble Methods
![Page 15: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/15.jpg)
Velox Online Learning for Recommendations (Simulated News Rec.)
[Plot: Error (0 to 0.6) vs. Examples (0 to 30)]
Partial Updates: 0.4 ms
Retraining: 7.1 seconds
>4 orders-of-magnitude faster adaptation
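As a quick check of the speedup claim, using only the two timings reported on the slide:

$$
\frac{7.1\ \text{s}}{0.4\ \text{ms}} = \frac{7100\ \text{ms}}{0.4\ \text{ms}} = 17{,}750 \approx 1.8 \times 10^{4}, \qquad \log_{10}(17{,}750) \approx 4.25
$$

so a partial update is a little over four orders of magnitude faster than full retraining.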
![Page 16: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/16.jpg)
Big
Data
Training
Application
Decision
Query
Learning Inference
Feedback
Slow
Slow Changing Model
Fast Changing Model per user
![Page 17: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/17.jpg)
Big
Data Training
Application
Decision
Query
Learning Inference
Feedback
Slow
Slow Changing Model
Fast Changing Model per user
Velox
![Page 18: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/18.jpg)
VELOX: the Missing Piece of BDAS
Berkeley Data Analytics Stack (BDAS)
Tachyon
Mesos
Spark
HDFS, S3, …
Spark Streaming
Spark SQL
BlinkDB
GraphX
Graph Frames
MLLib
Keystone ML
Learning
![Page 19: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/19.jpg)
VELOX: the Missing Piece of BDAS
Berkeley Data Analytics Stack (BDAS)
Tachyon
Mesos
Spark
HDFS, S3, …
Spark Streaming
Spark SQL
BlinkDB
GraphX
Graph Frames
MLLib
Keystone ML
Learning
Management and Serving
Velox
![Page 20: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/20.jpg)
VELOX: the Missing Piece of BDAS
Berkeley Data Analytics Stack (BDAS)
Mesos
HDFS, S3, …
Spark Streaming
Spark SQL
BlinkDB
GraphX
Graph Frames
Learning
Management and Serving
Velox
Tachyon
Spark
MLLib
Keystone ML
![Page 21: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/21.jpg)
VELOX Architecture
Spark
MLLib
Single JVM Instance
Velox
Keystone ML
Content Rec.
Fraud Detection
![Page 22: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/22.jpg)
VELOX Architecture
Spark
MLLib
Single JVM Instance
Velox
Keystone ML
Content Rec.
Fraud Detection
Personal Asst.
Robotic Control
Machine Translation
Create
VW
Caffe
![Page 23: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/23.jpg)
VELOX as a Middle Layer Arch?
Spark MLLib
Velox
Keystone ML
Content Rec.
Fraud Detection
Personal Asst.
Robotic Control
Machine Translation
Create
VW
Caffe
Generalize?
![Page 24: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/24.jpg)
Clipper Generalizes Velox Across ML Frameworks
Clipper
Content Rec.
Fraud Detection
Personal Asst.
Robotic Control
Machine Translation
Create
VW
Caffe
![Page 25: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/25.jpg)
Clipper
Create
VW
Caffe
Key Insight:
The challenges of prediction serving can be addressed between end-user applications and machine learning frameworks
As a result, Clipper is able to:
• hide complexity by providing a common prediction interface
• bound latency and maximize throughput through approximate caching and adaptive batching
• enable robust online learning and personalization through generalized split-model correction policies
without modifying machine learning frameworks or end-user applications
![Page 26: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/26.jpg)
Clipper Design Goals
Low and bounded latency predictions
• interactive applications need reliable latency objectives
Up-to-date and personalized predictions across models and frameworks
• generalize the split model decomposition
Optimize throughput for performance under heavy load
• single query can trigger many predictions
Simplify deployment
• serve models using the original code and systems
![Page 27: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/27.jpg)
Clipper Architecture
Clipper
Content Rec.
Fraud Detection
Personal Asst.
Robotic Control
Machine Translation
VW
Caffe
Create
![Page 28: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/28.jpg)
Clipper Architecture
Clipper
Applications
Predict
Observe
RPC/REST Interface
VW
Caffe
Create
![Page 29: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/29.jpg)
Clipper Architecture
Clipper
Caffe
Applications
Predict
Observe
RPC/REST Interface
Model Wrapper (MW) MW MW MW
RPC RPC RPC RPC
![Page 30: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/30.jpg)
Clipper Architecture
Clipper
Caffe
Applications
Predict
Observe
RPC/REST Interface
Model Wrapper (MW) MW MW MW
RPC RPC RPC RPC
Model Abstraction Layer: Provide a common interface to models while bounding latency and maximizing throughput.
Correction Layer: Improve accuracy through ensembles, online learning, and personalization
![Page 31: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/31.jpg)
Clipper Architecture
Clipper
Caffe
Applications
Predict
Observe
RPC/REST Interface
Model Wrapper (MW) MW MW MW
RPC RPC RPC RPC
Correction Layer: Correction Policy
Model Abstraction Layer: Approximate Caching, Adaptive Batching
![Page 32: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/32.jpg)
Caffe
Correction Layer: Correction Policy
Model Abstraction Layer: Approximate Caching, Adaptive Batching
Model Wrapper (MW) MW MW MW
RPC RPC RPC RPC
Provides a unified generic prediction API across frameworks
• Reduce Latency: Approximate Caching
• Increase Throughput: Adaptive Batching
• Simplify Deployment: RPC + Model Wrapper
![Page 33: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/33.jpg)
Caffe
Correction Layer: Correction Policy
Model Wrapper (MW) MW MW MW
RPC RPC RPC RPC
Model Abstraction Layer: Approximate Caching, Adaptive Batching
Provide a common interface to models while bounding
![Page 34: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/34.jpg)
Correction Layer: Correction Policy
Model Abstraction Layer: Approximate Caching, Adaptive Batching
Model Wrapper (MW) MW MW MW
RPC RPC RPC RPC
Caffe
Common Interface Simplifies Deployment:
• Evaluate models using original code & systems
• Models run in separate processes: resource isolation
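One way to picture the common interface is an abstract wrapper that every framework-specific model implements. The sketch below is hypothetical: `ModelWrapper`, `predict_batch`, `FunctionWrapper`, and `serve` are names invented for illustration, and the wrappers run in-process here, whereas the slide describes separate processes reached over RPC.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence


class ModelWrapper(ABC):
    """Hypothetical common interface implemented once per framework."""

    @abstractmethod
    def predict_batch(self, inputs: Sequence[List[float]]) -> List[float]:
        """Return one prediction per input, in order."""


class FunctionWrapper(ModelWrapper):
    """Wraps a per-input callable, standing in for a framework's native predict call."""

    def __init__(self, fn: Callable[[List[float]], float]):
        self.fn = fn

    def predict_batch(self, inputs: Sequence[List[float]]) -> List[float]:
        return [self.fn(x) for x in inputs]


def serve(models: Dict[str, ModelWrapper], name: str,
          inputs: Sequence[List[float]]) -> List[float]:
    """Dispatch by model name; the caller never touches framework code."""
    return models[name].predict_batch(inputs)


# Two toy models behind the same interface.
models = {
    "linear": FunctionWrapper(lambda x: 2.0 * x[0] + 1.0),
    "norm1": FunctionWrapper(lambda x: sum(abs(v) for v in x)),
}
```

Because the interface is batch-oriented, the same call shape works whether a wrapper evaluates inputs one at a time or hands the whole batch to an accelerator.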
![Page 35: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/35.jpg)
Correction Layer: Correction Policy
Model Abstraction Layer: Approximate Caching, Adaptive Batching
Model Wrapper (MW) MW MW MW MW MW
RPC RPC RPC RPC RPC RPC
Caffe
Common Interface Simplifies Deployment:
• Evaluate models using original code & systems
• Models run in separate processes: resource isolation, scale-out
Problem: frameworks are optimized for batch processing, not latency
![Page 36: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/36.jpg)
Adaptive Batching to Improve Throughput
A single page load may generate many queries
Optimal batch depends on:
• hardware configuration
• model and framework
• system load
Clipper Solution: be as slow as allowed…
• Application specifies latency objective
• Clipper uses a TCP-like tuning algorithm to increase latency up to the objective
Why batching helps:
• Hardware acceleration
• Helps amortize system overhead
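The "TCP-like tuning algorithm" is not spelled out on the slide. A common reading is additive-increase, multiplicative-decrease (AIMD), as in TCP congestion control; the sketch below assumes that reading. The class `AimdBatchSizer`, the step and backoff values, and the toy latency model are all illustrative assumptions, not Clipper's actual parameters.

```python
class AimdBatchSizer:
    """Additive-increase, multiplicative-decrease search for the largest
    batch size whose latency stays within the application's objective."""

    def __init__(self, latency_objective_ms: float, step: int = 2, backoff: float = 0.9):
        self.latency_objective_ms = latency_objective_ms
        self.step = step          # additive increase after a batch that met the objective
        self.backoff = backoff    # multiplicative decrease after a missed objective
        self.batch_size = 1

    def update(self, observed_latency_ms: float) -> int:
        """Adjust the batch size after observing one batch's latency."""
        if observed_latency_ms > self.latency_objective_ms:
            self.batch_size = max(1, int(self.batch_size * self.backoff))
        else:
            self.batch_size += self.step
        return self.batch_size


def simulated_latency_ms(batch_size: int) -> float:
    """Toy cost model (assumption): fixed overhead plus a per-query cost."""
    return 2.0 + 0.5 * batch_size


def tune(rounds: int = 100, objective_ms: float = 20.0) -> int:
    """Run the tuner against the toy cost model and return the final batch size."""
    sizer = AimdBatchSizer(objective_ms)
    for _ in range(rounds):
        sizer.update(simulated_latency_ms(sizer.batch_size))
    return sizer.batch_size
```

With this toy cost model the batch size climbs steadily and then hovers between 33 and 37 queries, right around the 20 ms objective, which is the "be as slow as allowed" behaviour described above.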
![Page 37: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/37.jpg)
TensorFlow Conv. Net (GPU)
[Plot: Throughput (Queries Per Second) and Latency (ms) vs. Batch Size (Queries), with the Latency Deadline and Optimal Batch Size marked]
![Page 38: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/38.jpg)
Comparison to TensorFlow Serving
Takeaway: Clipper is able to match the average latency of
TensorFlow Serving while reducing tail latency (2x) and
improving throughput (2x)
![Page 39: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/39.jpg)
Approximate Caching to Reduce Latency
Opportunity for caching: popular items may be evaluated frequently
Need for approximation: high-dimensional and continuous-valued queries have a low cache hit rate
Clipper Solution: Approximate Caching
• apply locality sensitive hash functions
[Figure: Bag-of-Words Model and Images examples, showing Cache Hit, Cache Miss, and Cache Hit Error]
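A minimal sketch of the idea, assuming random-hyperplane (sign) hashing as the locality sensitive hash; the slide does not say which hash family Clipper uses, and `ApproximateCache` and its parameters are hypothetical names. Queries that fall on the same side of every hyperplane share a cache entry, so a hit may return a prediction computed for a nearby but different input.

```python
import random
from typing import Callable, Dict, List, Tuple


class ApproximateCache:
    """Prediction cache keyed by a locality sensitive hash of the query."""

    def __init__(self, model: Callable[[List[float]], float], dim: int,
                 num_bits: int = 16, seed: int = 0):
        rng = random.Random(seed)
        # One random hyperplane per signature bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.model = model
        self.store: Dict[Tuple[bool, ...], float] = {}
        self.hits = 0
        self.misses = 0

    def _signature(self, x: List[float]) -> Tuple[bool, ...]:
        # Bit i records which side of hyperplane i the query lies on.
        return tuple(sum(p * v for p, v in zip(plane, x)) >= 0.0
                     for plane in self.planes)

    def predict(self, x: List[float]) -> float:
        key = self._signature(x)
        if key in self.store:
            self.hits += 1
            return self.store[key]      # possibly approximate answer
        self.misses += 1
        y = self.model(x)               # evaluate the real model on a miss
        self.store[key] = y
        return y


cache = ApproximateCache(model=lambda x: sum(x), dim=3)
first = cache.predict([1.0, 2.0, 3.0])    # miss: the model is evaluated
second = cache.predict([2.0, 4.0, 6.0])   # same direction, same signature: hit
```

The second query is a hit even though its exact answer differs from the cached one, which is the "cache hit error" trade-off the slide illustrates. Fewer signature bits give a coarser hash, with more hits and more error.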
![Page 40: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/40.jpg)
Clipper Architecture
Clipper
Caffe
Applications
Predict
Observe
RPC/REST Interface
Model Wrapper (MW) MW MW MW
RPC RPC RPC RPC
Correction Layer: Correction Policy
Model Abstraction Layer: Approximate Caching, Adaptive Batching
![Page 41: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/41.jpg)
Goal:
Maximize accuracy through ensembles, online learning, and personalization
Generalize the split-model insight from Velox to achieve:
• robust predictions by combining multiple models & frameworks
• online learning and personalization by correcting and personalizing predictions in response to feedback
Clipper
Correction Layer: Correction Policy
![Page 42: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/42.jpg)
Big
Data
Application
Learning Inference
Feedback
Slow
Slow Changing Model
Fast Changing User Model
Velox
![Page 43: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/43.jpg)
Caffe
Big
Data
Application
Learning Inference
Feedback
Slow
Slow Changing Model
Fast Changing User Model
Clipper
![Page 44: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/44.jpg)
Caffe
Slow Changing Model
Fast Changing User Model
Clipper
Correction Policy
Improves prediction accuracy by:
• Incorporating real-time feedback
• Managing personalization
• Combining models & frameworks: enables frameworks to compete
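To make the correction-policy idea concrete, here is a minimal sketch that combines model outputs by weighted vote and down-weights models that turn out to be wrong. It uses a plain multiplicative-weights update as a stand-in; the slides do not specify the update rule, and `WeightedEnsemblePolicy`, `eta`, and the model names are assumptions for illustration.

```python
import math
from typing import Dict, Iterable


class WeightedEnsemblePolicy:
    """Combines per-model labels by weighted vote and learns weights from feedback."""

    def __init__(self, model_names: Iterable[str], eta: float = 0.5):
        self.weights = {name: 1.0 for name in model_names}
        self.eta = eta  # how sharply a wrong prediction is penalised

    def predict(self, predictions: Dict[str, str]) -> str:
        """Return the label with the largest total weight behind it."""
        votes: Dict[str, float] = {}
        for name, label in predictions.items():
            votes[label] = votes.get(label, 0.0) + self.weights[name]
        return max(votes, key=lambda label: votes[label])

    def observe(self, predictions: Dict[str, str], true_label: str) -> None:
        """Feedback step: shrink the weight of every model that was wrong."""
        for name, label in predictions.items():
            if label != true_label:
                self.weights[name] *= math.exp(-self.eta)


policy = WeightedEnsemblePolicy(["model_a", "model_b", "model_c"])
outputs = {"model_a": "cat", "model_b": "dog", "model_c": "dog"}
before = policy.predict(outputs)      # the two wrong models outvote the right one
for _ in range(3):
    policy.observe(outputs, "cat")    # feedback says the answer was "cat"
after = policy.predict(outputs)
```

Keeping one weight vector per user instead of a single global one would turn the same mechanism into a form of personalization, and the models compete for weight regardless of which framework they come from.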
![Page 45: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/45.jpg)
Improved Prediction Accuracy (ImageNet)
| System | Model | Error Rate | #Errors |
|---|---|---|---|
| Caffe | VGG | 13.05% | 6525 |
| Caffe | LeNet | 11.52% | 5760 |
| Caffe | ResNet | 9.02% | 4512 |
| TensorFlow | Inception v3 | 6.18% | 3088 |
sequence of pre-trained state-of-the-art models
![Page 46: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/46.jpg)
Improved Prediction Accuracy
| System | Model | Error Rate | #Errors |
|---|---|---|---|
| Caffe | VGG | 13.05% | 6525 |
| Caffe | LeNet | 11.52% | 5760 |
| Caffe | ResNet | 9.02% | 4512 |
| TensorFlow | Inception v3 | 6.18% | 3088 |
| Clipper | Ensemble | 5.86% | 2930 |

5.2% relative improvement in prediction accuracy!
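The 5.2% figure appears to be the relative reduction in error rate against the best single model (Inception v3). That baseline is an assumption, since the slide does not state it:

$$
\frac{6.18\% - 5.86\%}{6.18\%} = \frac{0.32}{6.18} \approx 0.052 = 5.2\%
$$

The raw error counts give a similar value, $(3088 - 2930)/3088 \approx 5.1\%$.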
![Page 47: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/47.jpg)
Cost of Ensembles
Increased Load
• Solutions: Caching and Batching
• Load-shedding: correction policy can prioritize frameworks
Stragglers
• e.g., framework fails to meet SLO
• Solution: Anytime predictions
• Correction policy must render predictions with missing inputs
• e.g., built-in correction policies substitute expected value
Caffe
Slow Changing Model
Fast Changing User Model
Clipper
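A minimal sketch of an anytime prediction, assuming a weighted-sum ensemble and a running mean as each model's "expected value". The names `RunningMean` and `anytime_predict`, and the choice of a linear combination, are assumptions for illustration rather than Clipper's built-in policy.

```python
from typing import Dict


class RunningMean:
    """Tracks the average output of one model, used as its stand-in value."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.mean += (value - self.mean) / self.count


def anytime_predict(weights: Dict[str, float],
                    arrived: Dict[str, float],
                    expected: Dict[str, RunningMean]) -> float:
    """Combine whatever arrived before the deadline; for stragglers,
    substitute the model's expected (average) output."""
    total = 0.0
    for name, weight in weights.items():
        if name in arrived:
            total += weight * arrived[name]
        else:
            total += weight * expected[name].mean
    return total


# Hypothetical two-model ensemble; "slow" misses the deadline on this query.
expected = {"fast": RunningMean(), "slow": RunningMean()}
for past_output in (0.6, 1.0):
    expected["slow"].add(past_output)
weights = {"fast": 0.5, "slow": 0.5}
result = anytime_predict(weights, {"fast": 1.0}, expected)
```

The response is always ready at the deadline; the cost of a straggler is a less informed prediction rather than a late one.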
![Page 48: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/48.jpg)
Caffe
Slow Changing Model
Fast Changing User Model
Clipper
Anytime Predictions
Application
20ms ✓
✓
![Page 49: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/49.jpg)
Caffe
Slow Changing Model
Fast Changing User Model
Anytime Predictions
+ +
✓ ✓
![Page 50: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/50.jpg)
Evaluation of Throughput Under Heavy Load
[Plot: Accuracy vs. Throughput (queries per second)]
Takeaway: Clipper is able to gracefully degrade accuracy to
maintain availability under heavy load.
![Page 51: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/51.jpg)
Coarsening + Anytime Predictions
[Figure: regions labeled No Coarsening, Best, and Overly Coarsened; annotations: More Features, Approx. Expectation, Better, Coarser Hash]
![Page 52: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/52.jpg)
Conclusion
Clipper
Create
VW
Caffe
Clipper sits between applications and ML frameworks to
• simplify deployment
• bound latency and increase throughput
• enable real-time learning and personalization
across machine learning frameworks
![Page 53: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/53.jpg)
Big
Data
Model
Training
Application
Decision
Query
Feedback
VELOX
Clipper
Create
VW
Caffe
![Page 54: Prediction Serving · 2018-01-04 · Prediction Serving. Prediction Serving Learning Systems GraphSystems Graph Frames Time Series ... Streaming Spark SQL BlinkDB GraphX Graph Frames](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec5630d09faa8021b7aabe0/html5/thumbnails/54.jpg)
Ongoing & Future Research Directions
Serving and updating RL models
Bandit techniques in correction policies (collaboration with MSR)
Splitting inference across the cloud and the client to reduce latency and bandwidth requirements
Secure model evaluation on the client (model DRM)