© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Machine Learning on AWS with Amazon SageMaker
Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services
December 2017
Computer Programming (1936 – Today)
Machine Learning (1959 – Today)
Machine Learning At Amazon (1995)
Deep Learning (1986 – Today)
Thousands Of Employees Across The Company Focused on AI
Discovery & Search
Fulfilment & Logistics
Enhance Existing Products
Define New Product Categories
Bring Machine Learning To All
Artificial Intelligence At Amazon
AWS Customers using AI
Netflix Recommendation Engine
Pinterest Lens
AWS ML Stack
Application Services
• Vision: Rekognition Image, Rekognition Video
• Speech: Polly, Transcribe
• Language: Lex, Translate, Comprehend
Platform Services
• Amazon SageMaker
• AWS DeepLens
• Amazon Machine Learning
• Mechanical Turk
• Spark & EMR
Frameworks & Infrastructure
• AWS Deep Learning AMI
• Apache MXNet, PyTorch, Cognitive Toolkit, Keras, Caffe2 & Caffe, TensorFlow, Gluon
• GPU (P3 Instances), Mobile, CPU, IoT (Greengrass)
ML Process
• Business Problem – ML problem framing
• Data Collection
• Data Integration
• Data Preparation & Cleaning
• Data Visualization & Analysis
• Feature Engineering
• Model Training & Parameter Tuning
• Model Evaluation
• Are Business Goals met? (Yes / No)
• Model Deployment
• Monitoring & Debugging – Predictions
• Data Augmentation
• Feature Augmentation
• Re-training
ML Process: Discovery
• Ask the right questions
• Domain Knowledge
ML Process: Integration – Data Architecture
• Build the Data Platform
• Amazon S3
• AWS Glue
• Amazon Athena
• Amazon EMR
• Amazon Redshift
• Amazon Kinesis
AWS Big Data Services
• Data Sources; Transactional Data
• Amazon S3 Data Lake
• Amazon Kinesis Streams & Firehose
• AWS Lambda
• Apache Storm on EMR
• Apache Flink on EMR
• Spark Streaming on EMR
• Amazon Kinesis Analytics
• Amazon EMR: Hadoop / Spark
• Amazon Redshift: Data Warehouse
• Amazon DynamoDB: NoSQL Database
• Amazon Aurora: Relational Database
• Amazon Elasticsearch Service
• Amazon ElastiCache: Redis
• Amazon Athena: Clusterless SQL Query
• AWS Glue: Clusterless ETL
• Amazon Machine Learning: Predictive Analytics
• Deep Learning GPU Instances
• Any Open Source Tool of Choice on EC2
• Data Science Sandbox
• Visualization / Reporting
ML Process: Model Training
• Set up and manage notebook environments
• Set up and manage training clusters
• Write data connectors
• Scale ML algorithms to large datasets
• Distribute ML training algorithms to multiple machines
• Secure and manage model artifacts
ML Process: Model Deployment
• Set up and manage model inference clusters
• Manage and scale model inference APIs
• Monitor and debug model predictions
• Model versioning and performance tracking
• Automate new model version promotion to production (A/B testing)
Amazon SageMaker
A fully managed service that enables data scientists and developers to quickly and easily build machine-learning-based models into production smart applications.
Amazon SageMaker
Build
• Pre-built notebook instances
• Highly-optimized machine learning algorithms
Train
• One-click training for ML, DL, and custom algorithms
• Easier training with hyperparameter optimization
Deploy
• Deployment without engineering effort
• Fully-managed hosting at scale
Amazon SageMaker
Build, train, and deploy machine learning models at scale
• End-to-End Machine Learning Platform
• Zero setup
• Flexible Model Training
• Pay by the second
Behind the scenes
• Amazon ECR: Training code, Inference code
• Model Training (on EC2): Training code, Helper code
• Model Hosting (on EC2): Inference code, Helper code
• Training data
• Model artifacts
• Ground Truth
• Client application
• Inference Endpoint: Inference request, Inference response
Customer Example: Intuit
Near real-time fraud detection in AWS using Amazon SageMaker
[Diagram labels: Client Service; Calculate Features; Reader; Cleanser; Processor; Data Lookup; Feature Store; Training; Amazon EMR; Model Training (SageMaker); Model; Model Hosting (SageMaker)]
Amazon SageMaker
1. Notebook Instances
2. Algorithms
3. ML Training Service
4. ML Hosting Service
1. Notebook Instances
Zero Setup For Exploratory Data Analysis
• Authoring & Notebooks
• ETL Access to AWS Database services
• Access to S3 Data Lake
• Recommendations/Personalization
• Fraud Detection
• Forecasting
• Image Classification
• Churn Prediction
• Marketing Email/Campaign Targeting
• Log processing and anomaly detection
• Speech to Text
• More…
“Just add data”
2. Algorithms
Amazon SageMaker: 10x better algorithms
• Streaming datasets, for cheaper training
• Train faster, in a single pass
• Greater reliability on extremely large datasets
• Choice of several ML algorithms
Linear Learner
• Find linear equation that best approximates the data
• Supervised
• Supports:
  • Binary classification
  • Multiclass classification
  • Linear regression
  • Floating-point or test
  • CSV or recordIO-protobuf
• Parallel training of multiple models with automatic hyperparameter optimization
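To make "find the linear equation that best approximates the data" concrete, here is a minimal least-squares sketch in plain Python for a single feature. It is only an illustration of the idea, not the SageMaker Linear Learner implementation; `fit_line`, `mean_squared_error`, and the sample data are hypothetical names and values chosen for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least squares for one feature: slope = cov(x, y) / var(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the fitted line and the observed values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
error = mean_squared_error(xs, ys, slope, intercept)
```

On this noise-free data the fit recovers the generating line, so the mean squared error is zero up to rounding.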
Linear Learner
Regression (mean squared error)
SageMaker  Other
1.02       1.06
1.09       1.02
0.332      0.183
0.086      0.129
83.3       84.5

Classification (F1 Score)
SageMaker  Other
0.980      0.981
0.870      0.930
0.997      0.997
0.978      0.964
0.914      0.859
0.470      0.472
0.903      0.908
0.508      0.508
30 GB datasets for web-spam and web-url classification
[Chart: Cost in Dollars vs. Billable time in Minutes; series: sagemaker-url, sagemaker-spam, other-url, other-spam]
Factorization Machines
• Extends linear model to pair-wise interactions between features
• Good for high dimensional sparse datasets, e.g.:
  • Click prediction
  • Item recommendation systems
• Supports binary classification or linear regression
• recordIO-protobuf data format
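The "pair-wise interactions" idea can be sketched with the standard factorization machine scoring formula: a bias, a linear term, and a dot product of learned factor vectors for every pair of features. The code below is a plain-Python illustration under that assumption, not SageMaker's implementation; the function names and parameter values are made up for the example.

```python
def fm_predict(x, w0, w, v):
    """Factorization machine score for one example.

    x: feature values (length n); w0: bias; w: linear weights (length n);
    v: factor vectors, where v[i] has length k for feature i.
    """
    n, k = len(x), len(v[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # Pairwise term sum_{i<j} <v_i, v_j> x_i x_j, computed in O(n*k) as
    # 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, v):
    """Same model with every feature pair written out explicitly (O(n^2 * k))."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            score += dot * x[i] * x[j]
    return score


# Hypothetical parameters: 3 features, 2 latent factors each
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, -0.2, 0.3]
v = [[1.0, 2.0], [0.5, -1.0], [3.0, 1.0]]
fast = fm_predict(x, w0, w, v)
slow = fm_predict_naive(x, w0, w, v)
```

Both functions return the same score; the first form is the reason this model stays cheap on high dimensional sparse inputs, since features with value zero contribute nothing.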
Factorization Machines
                  Log_loss  F1 Score  Seconds
SageMaker         0.494     0.277     820
Other (10 Iter)   0.516     0.190     650
Other (20 Iter)   0.507     0.254     1300
Other (50 Iter)   0.481     0.313     3250
Click Prediction 1 TB advertising dataset, m4.4xlarge machines, perfect scaling.
[Chart: Cost in Dollars vs. Billable Time in Hours; points labeled 10, 20, 30, 40, 50 machines]
K-Means Clustering
• Cluster data into k groups
• Unsupervised
• Algorithm:
  1. Start with K = k*x cluster centers
  2. Iterate over cluster centers in mini-batches and adjust them based on training data
  3. Reduce K cluster centers to k final clusters by using the same algorithm.
Image: johnloeber.com. Used under CC BY-NC-SA 4.0
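The three steps can be sketched in plain Python as follows. This is a toy reading of the slide, assuming a standard mini-batch k-means update with a per-center learning rate; it is not the SageMaker implementation, and all function names, the batch size, and the sample points are hypothetical.

```python
import random


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))


def minibatch_kmeans(points, centers, batch_size, rng):
    """One pass over points in mini-batches, nudging centers toward assigned points."""
    centers = [list(c) for c in centers]
    counts = [0] * len(centers)
    order = list(range(len(points)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = [points[i] for i in order[start:start + batch_size]]
        # Assign the whole batch first, then update the centers
        assignments = [nearest(p, centers) for p in batch]
        for p, c in zip(batch, assignments):
            counts[c] += 1
            step = 1.0 / counts[c]  # per-center learning rate shrinks over time
            centers[c] = [(1 - step) * q + step * v for q, v in zip(centers[c], p)]
    return centers


def cluster(points, k, x, batch_size=4, seed=0):
    """Step 1: start with K = k*x centers. Step 2: refine. Step 3: reduce K to k."""
    rng = random.Random(seed)
    initial = rng.sample(points, k * x)
    many = minibatch_kmeans(points, initial, batch_size, rng)
    # Cluster the K centers themselves down to k with the same routine
    seeds = rng.sample(many, k)
    return minibatch_kmeans(many, seeds, batch_size, rng)


# Hypothetical 2-D data with two well-separated groups
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.0, 0.2),
          (5.0, 5.0), (5.2, 5.1), (5.1, 5.3), (5.3, 5.2), (5.2, 5.4), (5.0, 5.2)]
centers = cluster(points, k=2, x=3)
```

Starting with more centers than needed makes a single pass less sensitive to a poor initial choice; the final reduction step then merges nearby centers.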
K-Means Clustering
Running Time vs. Number of Clusters
[Chart: Billable Time in Minutes vs. Number of Clusters (10, 100, 500); series: sagemaker, other]
~10x Faster!

Dataset                 k    SageMaker  Other
Text (1.2 GB)           10   1.18E3     1.18E3
Text (1.2 GB)           100  1.00E3     9.77E2
Text (1.2 GB)           500  9.18E2     9.03E2
Images (9 GB)           10   3.29E2     3.28E2
Images (9 GB)           100  2.72E2     2.71E2
Images (9 GB)           500  2.17E2     Failed
Videos (27 GB)          10   2.19E2     2.18E2
Videos (27 GB)          100  2.03E2     2.02E2
Videos (27 GB)          500  1.86E2     1.85E2
Advertising (127 GB)    10   1.72E7     Failed
Advertising (127 GB)    100  1.30E7     Failed
Advertising (127 GB)    500  1.03E7     Failed
Synthetic (1100 GB)     10   3.81E7     Failed
Synthetic (1100 GB)     100  3.51E7     Failed
Synthetic (1100 GB)     500  2.81E7     Failed
Principal Component Analysis (PCA)
• Reduce dimensionality while retaining as much information as possible
• Finds new components, sorted by information value.
• Unsupervised
• Regular (for sparse) and Randomized (for large, dense datasets) modes
Image: Wikimedia Commons
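As a small illustration of "reduce dimensionality while retaining as much information as possible", the sketch below finds the first principal component with power iteration on the covariance matrix and projects the data onto it. It is a plain-Python teaching example, not either of the SageMaker PCA modes; the function names and data are hypothetical.

```python
def first_principal_component(rows, iterations=100):
    """Approximate the top principal component of rows via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Arbitrary start vector; assumes it is not orthogonal to the top component
    vec = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = sum(v * v for v in nxt) ** 0.5
        vec = [v / norm for v in nxt]
    return vec


def project(rows, component):
    """Reduce each row to one coordinate along the given component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [sum((r[j] - means[j]) * component[j] for j in range(d)) for r in rows]


# Hypothetical 2-D data lying exactly on the line y = x
data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
pc = first_principal_component(data)
reduced = project(data, pc)
```

Because all the variance in this data lies along the diagonal, one component is enough to describe every point.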
Principal Component Analysis (PCA)
More than 10x faster at a fraction of the cost!
[Chart: Throughput and Scalability; Mb/Sec/Machine vs. Number of Machines (8, 10, 20); series: other, sagemaker-deterministic, sagemaker-randomized]
[Chart: Cost vs. Time; Cost in Dollars vs. Billable time in Minutes; series: other, sagemaker-deterministic, sagemaker-randomized]
Neural Topic Modeling
• Input term counts vector
• Encoder: feedforward net
• Document Posterior (µ, σ)
• Sampled Document Representation (z)
• Decoder: Softmax
• Output term counts vector
Perplexity vs. Number of Topics (~200K documents, ~100K vocabulary)
[Chart: Perplexity vs. Number of Topics; series: NTM, Other]
Spectral Latent Dirichlet Allocation (LDA)
Training Time vs. Number of Topics
[Chart: Training Time in Minutes vs. Number of Topics; series: lda-data-a, lda-data-b, other-data-a, other-data-b]
Boosted Decision Trees
• XGBoost gradient boosted trees algorithm
• Combines multiple weak decision tree models
• Supervised
• Binary and multiclass classification
• Libsvm or CSV data
Image: Arxiv.org
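"Combines multiple weak decision tree models" can be illustrated with a toy gradient boosting loop: each round fits a one-split tree (a stump) to the errors left by the ensemble so far. This is a plain-Python sketch for squared error on one feature, not XGBoost, which adds regularization, second-order gradient information, and much more; all names and data here are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on 1-D inputs (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Sum of weak stumps, each fit to the residuals of the ensemble so far."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the given data."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: y = x^2, which no single stump can represent
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
weak = boost(xs, ys, rounds=1)
strong = boost(xs, ys, rounds=20)
weak_error = mse(weak, xs, ys)
strong_error = mse(strong, xs, ys)
```

One stump alone fits the curve poorly; twenty of them combined leave a much smaller training error.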
Boosted Decision Trees
XGBoost is one of the most commonly used implementations of boosted decision trees in the world.
It is now available in Amazon SageMaker!
Throughput vs. Number of Machines
[Chart: Throughput in MB/Sec vs. Number of Machines (C4.8xLarge)]
Sequence to Sequence
• Based on Sockeye Seq2Seq implementation
• Encoder, Attention, Decoder architecture
See: https://aws.amazon.com/blogs/ai/train-neural-machine-translation-models-with-sockeye/
Sequence to Sequence
English-German Translation
[Chart: BLEU Score vs. Billable Time in Hours; series: P2.16x, P2.8x, P2.x]
Best known result!
Based on Sockeye and Apache MXNet (incubating), with multi-GPU support; can be used for Neural Machine Translation.
Supports both RNN and CNN as encoder/decoder.
Image Classification
• Build your own “Rekognition” service!
• Currently based on ResNet
• Configurable number of layers
• Full training and transfer learning
• Data formats:
  • Apache MXNet RecordIO
  • .jpg or .png
Image Classification
Implementation of ResNet in MXNet. Other networks such as DenseNet and Inception will be added in the future.
Transfer learning: begin with a model already trained on ImageNet!
Speedup with Horizontal Scaling
[Chart: Speedup vs. Number of Machines (P2)]
Your Own Algorithms
• Bring your own algorithm!
• Just wrap it into a Docker container
  • One container for training
  • One container for inference
• Combine SageMaker containers with your own
• Documented example on GitHub
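A rough sketch of what the training and inference halves of such a container might do is shown below. It assumes the SageMaker container layout in which training data is mounted under `/opt/ml/input/data/<channel>` and model artifacts are written to `/opt/ml/model`; the `train` channel name, the `model.json` file, and the trivial "model" (a column mean) are hypothetical choices for illustration only.

```python
import csv
import json
import os
import tempfile

# Paths assumed from the SageMaker container layout; "train" is our own channel name.
DEFAULT_DATA_DIR = "/opt/ml/input/data/train"
DEFAULT_MODEL_DIR = "/opt/ml/model"


def train(data_dir=DEFAULT_DATA_DIR, model_dir=DEFAULT_MODEL_DIR):
    """Toy training step: learn the mean of the last CSV column and save it."""
    values = []
    for name in sorted(os.listdir(data_dir)):
        if name.endswith(".csv"):
            with open(os.path.join(data_dir, name), newline="") as handle:
                values.extend(float(row[-1]) for row in csv.reader(handle) if row)
    model = {"mean": sum(values) / len(values)}
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, "model.json"), "w") as handle:
        json.dump(model, handle)
    return model


def predict(model_dir=DEFAULT_MODEL_DIR):
    """Toy inference step: load the saved artifact and return its prediction."""
    with open(os.path.join(model_dir, "model.json")) as handle:
        return json.load(handle)["mean"]


def demo():
    """Run both halves against temporary directories instead of /opt/ml."""
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        model_dir = os.path.join(root, "model")
        os.makedirs(data_dir)
        with open(os.path.join(data_dir, "part-0.csv"), "w", newline="") as handle:
            csv.writer(handle).writerows([[1, 2.0], [2, 4.0], [3, 6.0]])
        train(data_dir, model_dir)
        return predict(model_dir)


result = demo()
```

Keeping the paths as parameters lets the same code run locally for testing and inside the container with the defaults.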
Deep Neural Networks
• Train your own Deep Neural Networks!
• TensorFlow and Apache MXNet supported
• You provide training/inference scripts with your DNN
• SageMaker does the rest
Using SageMaker with Apache Spark
• Apache Spark library for Amazon SageMaker provided
• Both Python and Scala
• Makes org.apache.spark.sql.DataFrame objects available to SageMaker
• Training and Inference supported
2. Algorithms
Amazon SageMaker: 10x better algorithms
Training code:
• Amazon provided Algorithms: Matrix Factorization, Regression, Principal Component Analysis, K-Means Clustering, Gradient Boosted Trees, And More!
• Bring Your Own Script (SageMaker builds the Container)
• SageMaker Estimators in Apache Spark
• Bring Your Own Algorithm (You build the Container)
3. ML Training Service
Training code:
• Amazon provided Algorithms: Matrix Factorization, Regression, Principal Component Analysis, K-Means Clustering, Gradient Boosted Trees, And More!
• Bring Your Own Script (SageMaker builds the Container)
• SageMaker Estimators in Apache Spark
• Bring Your Own Algorithm (You build the Container)
Fully managed, secured:
• Fetch Training data
• Save Model Artifacts
• Save Inference Image
• CPU, GPU, HPO
4. Model Deployment
Versions of the same inference code are saved in inference containers in Amazon ECR. Prod is the primary one; 50% of the traffic must be served there!
1. Create a Model from Model Artifacts and an Inference Image:
   ModelName: prod
2. Create versions of a Model.
3. Create weighted ProductionVariants (weights shown: 30, 50, 10, 10):
   InstanceType: c3.4xlarge
   InitialInstanceCount: 3
   ModelName: prod
   VariantName: primary
   InitialVariantWeight: 50
4. Create an Endpoint from one EndpointConfiguration (Inference Endpoint).
One-Click!
Amazon Provided Algorithms
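The weighted ProductionVariants can be sketched as plain Python dictionaries using the field names shown on the slide. The traffic split assumes each variant receives its weight divided by the sum of all weights. Only `prod`, `primary`, and the weights 50, 30, 10, 10 come from the slide; the other model, variant, and configuration names are hypothetical, and nothing here calls AWS.

```python
def production_variant(variant_name, model_name, weight,
                       instance_type="c3.4xlarge", instance_count=3):
    """One ProductionVariant entry, using the field names shown on the slide."""
    return {
        "VariantName": variant_name,
        "ModelName": model_name,
        # Copied from the slide; the live API expects "ml."-prefixed type names.
        "InstanceType": instance_type,
        "InitialInstanceCount": instance_count,
        "InitialVariantWeight": weight,
    }


def traffic_shares(variants):
    """Fraction of requests per variant: its weight divided by the total weight."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# "prod" / "primary" and the weights are from the slide; other names are hypothetical.
variants = [
    production_variant("primary", "prod", 50),
    production_variant("candidate-a", "prod-v2", 30),
    production_variant("candidate-b", "prod-v3", 10),
    production_variant("candidate-c", "prod-v4", 10),
]
endpoint_config = {
    "EndpointConfigName": "demo-config",  # hypothetical name
    "ProductionVariants": variants,
}
shares = traffic_shares(variants)
# With AWS credentials, a dict like this could be passed to
# boto3.client("sagemaker").create_endpoint_config(**endpoint_config)
```

With these weights the primary variant serves half of the requests, matching the "50% of the traffic" requirement, while the remaining variants share the other half for A/B testing.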
Model Deployment
Inference APIs
✓ Auto-Scaling
✓ A/B Testing (more to come)
✓ Low Latency & High Throughput
✓ Bring Your Own Model
✓ Python SDK
Amazon SageMaker—Your Turn
• Getting started with Amazon SageMaker: https://aws.amazon.com/sagemaker/
• Use the Amazon SageMaker SDK:
  • For Python: https://github.com/aws/sagemaker-python-sdk
  • For Spark: https://github.com/aws/sagemaker-spark
• SageMaker Examples: https://github.com/awslabs/amazon-sagemaker-examples
• Let us know what you build!