pipelineai + aws sagemaker + distributed tensorflow + ai model training and serving - december 2017...
TRANSCRIPT
PIPELINE.AI: HIGH PERFORMANCE MODEL TRAINING & SERVING WITH GPUS…
…AND AWS SAGEMAKER, GOOGLE CLOUD ML, AZURE ML & KUBERNETES!
CHRIS FREGLY, FOUNDER @ PIPELINE.AI
RECENT PIPELINE.AI NEWS
[Slide images: announcements from Sept 2017 and Dec 2017]
INTRODUCTIONS: ME
§ Chris Fregly, Founder & Engineer @PipelineAI
§ Formerly Netflix, Databricks, IBM Spark Tech
§ Advanced Spark and TensorFlow Meetup
  § Please Join Our 60,000+ Global Members!!
Contact [email protected]
@cfregly
Global Locations
* San Francisco
* Chicago
* Austin
* Washington DC
* Düsseldorf
* London
INTRODUCTIONS: YOU
§ Software Engineer, Data Scientist, Data Engineer, Data Analyst
§ Interested in Optimizing and Deploying TF Models to Production
§ Nice to Have a Working Knowledge of TensorFlow (Not Required)
PIPELINE.AI IS 100% OPEN SOURCE
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
§ Some VC’s Value GitHub Stars @ $15,000 Each (?!)
PIPELINE.AI OVERVIEW
450,000 Docker Downloads
60,000 Users Registered for GA
60,000 Meetup Members
40,000 LinkedIn Followers
2,200 GitHub Stars
12 Enterprise Beta Users
WHY HEAVY FOCUS ON MODEL SERVING?

Model Training
§ Batch & Boring
§ Offline in Research Lab
§ Pipeline Ends at Training
§ No Insight into Live Production
§ Small Number of Data Scientists
§ Optimizations Very Well-Known
§ 100's of Training Jobs per Day

Model Serving
§ Real-Time & Exciting!!
§ Online in Live Production
§ Pipeline Extends into Production
§ Continuous Insight into Live Production
§ Huge Number of Application Users
§ Many Optimizations Not Yet Utilized
§ 1,000,000's of Predictions per Sec
AGENDA
§ Deploy and Tune Models + Runtimes Safely in Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
§ Live, Continuous Model Training in Production
PACKAGE MODEL + RUNTIME AS ONE
§ Build Model with Runtime into Immutable Docker Image
§ Emphasize Immutable Deployment and Infrastructure
§ Same Runtime Dependencies in All Environments
  § Local, Development, Staging, Production
  § No Library or Dependency Surprises
§ Deploy and Tune Model + Runtime Together

Build Local Model Server A:

pipeline predict-server-build --model-type=tensorflow \
                              --model-name=mnist \
                              --model-tag=A \
                              --model-path=./models/tensorflow/mnist/
LOAD TEST LOCAL MODEL + RUNTIME
§ Perform Mini-Load Test on Local Model Server
§ Immediate, Local Prediction Performance Metrics
§ Compare to Previous Model + Runtime Variations

Start Local Model Server A:

pipeline predict-server-start --model-type=tensorflow \
                              --model-name=mnist \
                              --model-tag=A

Load Test Local Model Server A:

pipeline predict --model-endpoint-url=http://localhost:8080 \
                 --test-request-path=test_request.json \
                 --test-request-concurrency=1000
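The shape of such a mini-load test can be sketched in plain Python. This is an illustration only, not the `pipeline predict` implementation: `predict()` here is a dummy stand-in for an HTTP POST to the local model server, and the request counts are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def predict(request):
    """Stand-in for an HTTP POST to the local model server (hypothetical)."""
    time.sleep(0.001)  # simulate ~1 ms of model latency
    return {"prediction": 7, "confidence": 0.98}

def load_test(request, num_requests=200, concurrency=20):
    """Fire concurrent requests and report latency percentiles."""
    latencies = []

    def timed_call(_):
        start = time.perf_counter()
        predict(request)
        latencies.append(time.perf_counter() - start)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed_call, range(num_requests)))
    latencies.sort()
    return {
        "p50_ms": latencies[len(latencies) // 2] * 1000,
        "p99_ms": latencies[int(len(latencies) * 0.99)] * 1000,
    }

metrics = load_test({"image": [0.0] * 784})
```

Latency percentiles (not averages) are the useful comparison point across model + runtime variations, since tail latency is what production users feel.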
PUSH IMAGE TO DOCKER REGISTRY
§ Supports All Public + Private Docker Registries
  § DockerHub, Artifactory, Quay, AWS, Google, …
  § Or a Self-Hosted, Private Docker Registry

Push Image to Docker Registry:

pipeline predict-server-push --image-registry-url=<your-registry> \
                             --image-registry-repo=<your-repo> \
                             --model-type=tensorflow \
                             --model-name=mnist \
                             --model-tag=A
CLOUD-BASED OPTIONS
§ AWS SageMaker
  § Released Nov 2017 @ re:Invent
  § Custom Docker Images for Training & Serving (e.g. PipelineAI Images)
  § Distributed TensorFlow Training through the Estimator API
  § Traffic Splitting for A/B Model Testing
§ Google Cloud ML Engine
  § Mostly Command-Line Based
  § Driving the TensorFlow Open Source API (e.g. the Experiment API)
§ Azure ML
TUNE MODEL + RUNTIME AS SINGLE UNIT
§ Model Training Optimizations
  § Model Hyper-Parameters (e.g. Learning Rate)
  § Reduced Precision (e.g. FP16 Half Precision)
§ Post-Training Model Optimizations
  § Quantize Model Weights + Activations from 32-bit to 8-bit
  § Fuse Neural Network Layers Together
§ Model Runtime Optimizations
  § Runtime Configs (e.g. Request Batch Size)
  § Different Runtimes (e.g. TensorFlow Lite, Nvidia TensorRT)
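The 32-bit-to-8-bit weight quantization mentioned above can be illustrated with a toy linear quantizer. This is a deliberate simplification: real tools (e.g. the Graph Transform Tool) also calibrate activation ranges and fuse layers, and the example weights below are made up.

```python
def quantize_weights(weights):
    """Linearly map 32-bit float weights onto 8-bit integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize_weights(q_weights, scale):
    """Recover approximate float weights at serving time."""
    return [q * scale for q in q_weights]

weights = [0.82, -0.44, 0.05, -1.27, 0.99]   # hypothetical FP32 weights
q, scale = quantize_weights(weights)
approx = dequantize_weights(q, scale)
# Each recovered weight is within scale/2 of the original 32-bit value,
# while the stored representation shrinks from 32 bits to 8 bits per weight.
```

The 4x size reduction is why quantization both shrinks the model and speeds up matrix math on integer-friendly hardware.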
POST-TRAINING OPTIMIZATIONS
§ Prepare Model for Serving
  § Simplify Network
  § Reduce Model Size
  § Quantize for Fast Matrix Math
§ Some Tools
  § Graph Transform Tool (GTT)
  § tfcompile

[Figure: linear regression fit, after training vs. after optimizing]

pipeline optimize --optimization-list=[quantize_weights,tfcompile] \
                  --model-type=tensorflow \
                  --model-name=mnist \
                  --model-tag=A \
                  --model-path=./tensorflow/mnist/model \
                  --output-path=./tensorflow/mnist/optimized_model
RUNTIME OPTION: TENSORFLOW LITE
§ Post-Training Model Optimizations
§ Currently Supports iOS and Android
§ On-Device Prediction Runtime
  § Low-Latency, Fast Startup
§ Selective Operator Loading
  § 70KB Min - 300KB Max Runtime Footprint
§ Supports Accelerators (GPU, TPU)
  § Falls Back to CPU without Accelerator
§ Java and C++ APIs
RUNTIME OPTION: NVIDIA TENSORRT
§ Post-Training Model Optimizations
  § Specific to Nvidia GPUs
§ GPU-Optimized Prediction Runtime
  § Alternative to TensorFlow Serving
§ PipelineAI Supports TensorRT!
DEPLOY MODELS SAFELY TO PROD
§ Deploy from CLI or Jupyter Notebook
§ Tear Down or Roll Back Models Quickly
§ Shadow Canary Deploy: e.g. 20% Live Traffic
§ Split Canary Deploy: e.g. 97-2-1% Live Traffic

Start Production Model Cluster B:

pipeline predict-cluster-start --model-runtime=tflite \
                               --model-type=tensorflow \
                               --model-name=mnist \
                               --model-tag=B \
                               --traffic-split=2

Start Production Model Cluster C:

pipeline predict-cluster-start --model-runtime=tensorrt \
                               --model-type=tensorflow \
                               --model-name=mnist \
                               --model-tag=C \
                               --traffic-split=1

Start Production Model Cluster A:

pipeline predict-cluster-start --model-runtime=tfserving_gpu \
                               --model-type=tensorflow \
                               --model-name=mnist \
                               --model-tag=A \
                               --traffic-split=97
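The 97-2-1% split canary above boils down to weighted random routing per request. A minimal sketch (the `TRAFFIC_SPLIT` dict and `route_request` helper are illustrative names, not PipelineAI internals):

```python
import random

# Hypothetical weights matching the 97-2-1% split canary deploy.
TRAFFIC_SPLIT = {"A": 97, "B": 2, "C": 1}

def route_request():
    """Pick a model cluster for one request, proportional to its traffic weight."""
    models = list(TRAFFIC_SPLIT)
    weights = [TRAFFIC_SPLIT[m] for m in models]
    return random.choices(models, weights=weights, k=1)[0]

random.seed(0)
counts = {m: 0 for m in TRAFFIC_SPLIT}
for _ in range(100_000):
    counts[route_request()] += 1
# Roughly 97% of requests land on cluster A, 2% on B, 1% on C.
```

Because only 3% of live traffic ever reaches the new B and C clusters, a bad new model variant hurts few users and can be torn down quickly.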
AGENDA
§ Deploy and Tune Models + Runtimes Safely in Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
§ Live, Continuous Model Training in Production
COMPARE MODELS OFFLINE & ONLINE
§ Offline, Batch Metrics
  § Validation + Training Accuracy
  § CPU + GPU Utilization
§ Live Prediction Values
  § Compare Relative Precision
  § Newly-Seen, Streaming Data
§ Online, Real-Time Metrics
  § Response Time, Throughput
  § Cost ($) Per Prediction
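The cost-per-prediction metric is simple amortization arithmetic. A sketch with made-up numbers (the $3.06/hr instance price and 1,000 predictions/sec throughput are hypothetical, not measured):

```python
def cost_per_prediction(instance_cost_per_hour, predictions_per_second):
    """Amortize hourly instance cost over the predictions served in that hour."""
    predictions_per_hour = predictions_per_second * 3600
    return instance_cost_per_hour / predictions_per_hour

# Hypothetical: a $3.06/hr GPU instance sustaining 1,000 predictions/sec
cost = cost_per_prediction(3.06, 1000)
print(f"${cost * 1_000_000:.2f} per million predictions")  # $0.85 per million
```

Tracking this per model variant makes cloud and runtime choices directly comparable in dollars, not just milliseconds.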
VIEW REAL-TIME PREDICTION STREAM
§ Visually Compare Real-Time Predictions

[Figure: live dashboard of prediction inputs plus prediction results & confidences for Models A, B, and C]
PREDICTION PROFILING AND TUNING
§ Pinpoint Performance Bottlenecks
§ Fine-Grained Prediction Metrics
§ 3 Steps in Real-Time Prediction
  1. transform_request()
  2. predict()
  3. transform_response()
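The three steps can be sketched as a handler that times each stage separately, which is what makes bottlenecks (often request/response transformation, not the model itself) visible. The step names come from the slide; the bodies below are dummy stand-ins, not PipelineAI's actual implementations.

```python
import json
import time

def transform_request(raw_body):
    """Step 1: decode the wire format into model inputs."""
    return json.loads(raw_body)["image"]

def predict(inputs):
    """Step 2: dummy stand-in for the actual TensorFlow model invocation."""
    return {"digit": 7, "confidence": 0.98}

def transform_response(outputs):
    """Step 3: encode model outputs back into the wire format."""
    return json.dumps(outputs)

def handle(raw_body):
    """Run all three steps, recording per-step timings for fine-grained metrics."""
    timings = {}
    start = time.perf_counter()
    inputs = transform_request(raw_body)
    timings["transform_request"] = time.perf_counter() - start

    start = time.perf_counter()
    outputs = predict(inputs)
    timings["predict"] = time.perf_counter() - start

    start = time.perf_counter()
    response = transform_response(outputs)
    timings["transform_response"] = time.perf_counter() - start
    return response, timings

response, timings = handle('{"image": [0.0, 0.1]}')
```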
AGENDA
§ Deploy and Tune Models + Runtimes Safely in Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
§ Live, Continuous Model Training in Production
LIVE, ADAPTIVE TRAFFIC ROUTING
§ A/B Tests
  § Inflexible and Boring
§ Multi-Armed Bandits
  § Adaptive and Exciting!

Adjust Traffic Routing Dynamically:

pipeline traffic-router-split --model-type=tensorflow \
                              --model-name=mnist \
                              --model-tag-list=[A,B,C] \
                              --model-weight-list=[1,2,97]
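To make the bandit idea concrete, here is an epsilon-greedy router, one classic multi-armed bandit algorithm. The slide does not say which algorithm PipelineAI uses, and the per-model revenue numbers are invented; the point is only that traffic concentrates on the best-observed model instead of a fixed A/B split.

```python
import random

class EpsilonGreedyRouter:
    """Mostly route to the best-observed model; occasionally explore the rest."""

    def __init__(self, models, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {m: 0 for m in models}
        self.total_reward = {m: 0.0 for m in models}

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))        # explore
        return max(self.counts, key=self._mean_reward)     # exploit

    def record(self, model, reward):
        self.counts[model] += 1
        self.total_reward[model] += reward

    def _mean_reward(self, model):
        return self.total_reward[model] / max(self.counts[model], 1)

random.seed(42)
router = EpsilonGreedyRouter(["A", "B", "C"])
# Hypothetical average revenue per prediction, held fixed for the sketch.
REVENUE_PER_PREDICTION = {"A": 0.10, "B": 0.12, "C": 0.15}
for _ in range(5000):
    m = router.choose()
    router.record(m, REVENUE_PER_PREDICTION[m])
# Traffic concentrates on model C, the highest-revenue variant.
```

Unlike a static A/B test, the routing keeps adapting if a model's observed reward drifts, which is exactly the "adaptive and exciting" property the slide contrasts with fixed splits.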
SHIFT TRAFFIC TO MAX(REVENUE)
§ Shift Traffic to Winning Model using AI Bandit Algos
SHIFT TRAFFIC TO MIN(CLOUD CO$T)
§ Based on Cost ($) Per Prediction
§ Cost Changes Throughout the Day
  § Lose AWS Spot Instances
  § Google Cloud Becomes Cheaper
§ Shift Across Clouds & On-Prem
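The cost-based variant of the routing decision reduces to picking the current minimum of a per-prediction price table. A sketch with hypothetical prices and target names (not actual PipelineAI routing code):

```python
# Hypothetical per-prediction costs, refreshed as spot prices change.
COST_PER_PREDICTION = {
    "aws-spot": 0.0000009,
    "gcp": 0.0000012,
    "on-prem": 0.0000015,
}

def cheapest_target(costs):
    """Route new traffic to whichever cloud (or on-prem cluster) is cheapest."""
    return min(costs, key=costs.get)

target = cheapest_target(COST_PER_PREDICTION)      # currently "aws-spot"
# If spot capacity is lost and the effective AWS price spikes, routing flips:
COST_PER_PREDICTION["aws-spot"] = 0.0000020
target = cheapest_target(COST_PER_PREDICTION)      # now "gcp"
```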
AGENDA
§ Deploy and Tune Models + Runtimes Safely in Prod
§ Compare Models Both Offline and Online
§ Auto-Shift Traffic to Winning Model or Cloud
§ Live, Continuous Model Training in Production
LIVE, CONTINUOUS MODEL TRAINING
§ The Holy Grail of Machine Learning
§ Q1 2018: PipelineAI Supports Continuous Model Training!
  § Kafka, Kinesis
  § Spark Streaming
PSEUDO-CONTINUOUS TRAINING
§ Identify and Fix Borderline Predictions (~50/50 Confidence)
§ Fix Along Class Boundaries
§ Retrain Newly-Labeled Data
§ Game-ify Labeling Process
§ Enable Crowd Sourcing
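Identifying borderline predictions is just a confidence filter around the decision boundary. A sketch for a binary classifier, where the +/-0.10 band width and the prediction records are hypothetical:

```python
def borderline(predictions, band=0.10):
    """Flag binary predictions whose confidence sits near the 50/50 decision
    boundary; these are the candidates for human re-labeling and retraining."""
    low, high = 0.5 - band, 0.5 + band
    return [p for p in predictions if low <= p["confidence"] <= high]

stream = [
    {"id": 1, "confidence": 0.97},
    {"id": 2, "confidence": 0.52},   # borderline: send to labelers
    {"id": 3, "confidence": 0.44},   # borderline: send to labelers
    {"id": 4, "confidence": 0.88},
]
to_label = borderline(stream)  # items 2 and 3
```

Routing only these low-confidence cases to (possibly crowd-sourced) labelers keeps the labeling workload small while concentrating new training data exactly where the class boundary is fuzzy.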
DEMO: TRAIN, DEPLOY, TEST MODEL
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
pipeline predict-server-build --model-type=tensorflow \
                              --model-name=mnist \
                              --model-tag=A \
                              --model-path=./models/tensorflow/mnist/
THANK YOU!!
§ https://github.com/PipelineAI/pipeline/
§ Please Star 🌟 this GitHub Repo!
§ Reminder: VC’s Value GitHub Stars @ $15,000 Each (!!)
Contact [email protected]
@cfregly