AWS January 2016 Webinar Series - Building Smart Applications with Amazon Machine Learning
TRANSCRIPT
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Alex Ingerman, Sr. Manager, Tech. Product Management, Amazon Machine Learning
1/28/2016
Real-World Smart Applications with Amazon Machine Learning
Agenda
• Why social media + machine learning = happy customers
• Using Amazon ML to find important social media conversations
• Building an end-to-end application to act on these conversations
Application details
Goal: build a smart application for social media listening in the cloud
Full source code and documentation are on GitHub: http://bit.ly/AmazonMLCodeSample
Amazon Kinesis
AWS Lambda
Amazon Machine Learning
Amazon SNS
Amazon Mechanical Turk
Motivation for listening to social media
• Customer is reporting a possible service issue
• Customer is making a feature request
• Customer is angry or unhappy
• Customer is asking a question
Why do we need machine learning for this?
The social media stream is high-volume, and most of the messages are not customer service (CS)-actionable
Amazon Machine Learning in one slide
• Easy to use, managed machine learning service built for developers
• Robust, powerful machine learning technology based on Amazon’s internal systems
• Create models using your data already stored in the AWS cloud
• Deploy models to production in seconds
Formulating the problem
We would like to…
Instantly find new tweets mentioning @awscloud, ingest and analyze each one to predict whether a customer service agent should act on it, and, if so, send that tweet to customer service agents.
Twitter API → Amazon Kinesis → AWS Lambda → Amazon Machine Learning → Amazon SNS
Building smart applications
1. Pick the ML strategy
2. Prepare dataset
3. Create ML model
4. Write and configure code
5. Try it out!
Picking the machine learning strategy
Question we want to answer: Is this tweet customer service-actionable, or not?
Our dataset: Text and metadata from past tweets mentioning @awscloud
Machine learning approach: Create a binary classification model to answer a yes/no question, and provide a confidence score
Retrieve past tweets
The Twitter API can be used to search for tweets containing our company’s handle (e.g., @awscloud)
import twitter

twitter_api = twitter.Api(**twitter_credentials)
twitter_handle = 'awscloud'
search_query = '@' + twitter_handle + ' -from:' + twitter_handle
results = twitter_api.GetSearch(term=search_query, count=100, result_type='recent')
# We can go further back in time by issuing additional search requests
Good news: data is well-structured and clean
Bad news: tweets are not categorized (labeled) for us
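The note about going further back in time can be sketched as a paging loop: each request asks only for tweets older than the oldest one already seen. This is a minimal sketch, not the webinar's code. `search_page` is a hypothetical callable supplied by the caller, and `Tweet` and `make_fake_search` are stand-ins that let the loop run without network access.

```python
from collections import namedtuple

# Stand-in for a tweet object; only the `id` attribute is used by the paging logic.
Tweet = namedtuple("Tweet", ["id", "text"])


def collect_past_tweets(search_page, query, max_pages=5, page_size=100):
    """Page backwards through search results, newest page first.

    `search_page(query, count, max_id)` is a hypothetical callable supplied by
    the caller; it must return a list of objects with an `id` attribute.
    """
    collected = []
    max_id = None  # None means "start from the most recent tweets"
    for _ in range(max_pages):
        page = search_page(query, page_size, max_id)
        if not page:
            break  # no older tweets available
        collected.extend(page)
        # Ask only for tweets strictly older than the oldest one seen so far.
        max_id = min(tweet.id for tweet in page) - 1
    return collected


def make_fake_search(all_ids):
    """Build an in-memory search function for trying the loop without network access."""
    tweets = [Tweet(i, "tweet %d" % i) for i in sorted(all_ids, reverse=True)]

    def search_page(query, count, max_id):
        eligible = [t for t in tweets if max_id is None or t.id <= max_id]
        return eligible[:count]

    return search_page
```

With the python-twitter client from the slide, `search_page` could plausibly be a thin wrapper that forwards `max_id` to `twitter_api.GetSearch`. How far back the search API actually reaches is not covered here.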
Labeling past tweets
Why label tweets? (Many) machine learning algorithms work by discovering patterns connecting data points and labels
How many tweets need to be labeled? Several thousand to start with
Can I pay someone to do this? Yes! Amazon Mechanical Turk is a marketplace for tasks that require human intelligence
Creating the Mechanical Turk task
Publishing the task
Preview labeling results
Sample tweets from our previously collected dataset + their labels (most metadata fields removed for clarity)
The label column was created from Mechanical Turk responses
Amazon ML process, in a nutshell
1. Create your datasources
   Two API calls to create your training and evaluation data
   Sanity-check your data in the service console
2. Create your ML model
   One API call to build a model, with smart defaults or custom settings
3. Evaluate your ML model
   One API call to compute your model’s quality metric
4. Adjust your ML model
   Use the console to align performance trade-offs to your business goals
Create the data schema string
{
  "dataFileContainsHeader": true,
  "dataFormat": "CSV",
  "targetAttributeName": "trainingLabel",
  "attributes": [
    { "attributeName": "description", "attributeType": "TEXT" },
    <additional attributes here>,
    { "attributeName": "trainingLabel", "attributeType": "BINARY" }
  ]
}
Schemas communicate metadata about your dataset:
• Data format
• Attributes’ names, types, and order
• Names of special attributes
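The schema string can also be generated rather than typed by hand. The sketch below is an illustration only: the `build_schema` helper is hypothetical, and treating every non-target column as TEXT is a simplification, since a real tweet dataset would likely mix attribute types.

```python
import json


def build_schema(text_attributes, target="trainingLabel"):
    """Return an Amazon ML schema string for a CSV file with a header row.

    `text_attributes` lists the non-target columns in file order; treating all
    of them as TEXT is a simplification made for this sketch.
    """
    attributes = [{"attributeName": name, "attributeType": "TEXT"}
                  for name in text_attributes]
    # The target attribute is listed last, as in the schema shown above.
    attributes.append({"attributeName": target, "attributeType": "BINARY"})
    schema = {
        "dataFileContainsHeader": True,
        "dataFormat": "CSV",
        "targetAttributeName": target,
        "attributes": attributes,
    }
    return json.dumps(schema, indent=2)


data_schema = build_schema(["description", "text"])
```

The resulting `data_schema` string is the kind of value passed as `DataSchema` when creating a datasource.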
Create the training datasource
import boto
ml = boto.connect_machinelearning()

data_spec = {
    'DataLocationS3': s3_uri,    # E.g.: s3://my-bucket/dir/data.csv
    'DataSchema': data_schema,   # Schema string (previous slide)
}

# Use only the first 70% of the datasource for training.
data_spec['DataRearrangement'] = '{ "splitting": {"percentBegin": 0, "percentEnd": 70 } }'

ml.create_data_source_from_s3(
    data_source_id="ds-tweets-train",
    data_source_name="Tweet training data (70%)",
    data_spec=data_spec,
    compute_statistics=True)
Create the evaluation datasource
import boto
ml = boto.connect_machinelearning()

data_spec = {
    'DataLocationS3': s3_uri,    # E.g.: s3://my-bucket/dir/data.csv
    'DataSchema': data_schema,   # Schema string (previous slide)
}

# Use the last 30% of the datasource for evaluation.
data_spec['DataRearrangement'] = '{ "splitting": {"percentBegin": 70, "percentEnd": 100 } }'

ml.create_data_source_from_s3(
    data_source_id="ds-tweets-eval",
    data_source_name="Tweet evaluation data (30%)",
    data_spec=data_spec,
    compute_statistics=True)
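Because the training and evaluation ranges must meet at the same percentage, it can help to derive both `DataRearrangement` strings from a single split point. A minimal sketch follows; the `split_rearrangements` helper is hypothetical and not part of boto.

```python
import json


def split_rearrangements(train_percent=70):
    """Return (training, evaluation) DataRearrangement strings for one split point."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100, exclusive")

    def splitting(begin, end):
        # Same JSON shape as the strings used on the datasource slides.
        return json.dumps({"splitting": {"percentBegin": begin, "percentEnd": end}})

    return splitting(0, train_percent), splitting(train_percent, 100)


train_rearrangement, eval_rearrangement = split_rearrangements(70)
```

The two ranges are slices of the file in order, so this sketch assumes the rows are not sorted by label or by time.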
Visually inspecting training data
Create the ML model
import boto
ml = boto.connect_machinelearning()

ml.create_ml_model(
    ml_model_id="ml-tweets",
    ml_model_name="Tweets screening model",
    ml_model_type="BINARY",
    training_data_source_id="ds-tweets-train")
Input data location is looked up from the training datasource ID
Default model parameters and automatic data transformations are used, or you can provide your own
Evaluate the ML model
import boto
ml = boto.connect_machinelearning()

ml.create_evaluation(
    evaluation_id="ev-tweets",
    evaluation_name="Evaluation of tweet screening model",
    ml_model_id="ml-tweets",
    evaluation_data_source_id="ds-tweets-eval")
Input data location is looked up from the evaluation datasource ID
Amazon ML automatically selects and computes an industry-standard evaluation metric based on your ML model type
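For binary models, the metric Amazon ML reports is the area under the ROC curve (AUC). The service computes it for you. The sketch below only illustrates what the number means, using a direct pairwise definition rather than whatever implementation the service uses internally.

```python
def auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive example is scored higher than a randomly chosen negative one
    (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value near 1.0 means actionable tweets are almost always scored above non-actionable ones. A value near 0.5 means the scores carry little information.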
Visually inspecting and adjusting the ML model
Reminder: Our data flow
Twitter API → Amazon Kinesis → AWS Lambda → Amazon Machine Learning → Amazon SNS
Create an Amazon ML endpoint for retrieving real-time predictions
import boto
ml = boto.connect_machinelearning()

ml.create_realtime_endpoint("ml-tweets")
# Endpoint information can be retrieved using the get_ml_model() method. Sample output:
# "EndpointInfo": {
#     "CreatedAt": 1424378682.266,
#     "EndpointStatus": "READY",
#     "EndpointUrl": "https://realtime.machinelearning.us-east-1.amazonaws.com",
#     "PeakRequestsPerSecond": 200}
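Before sending prediction requests, a caller would typically check that the endpoint has reached the READY status shown in the sample output. The helper below is a hypothetical sketch that reads the `EndpointInfo` structure from a model description dictionary. It assumes that dictionary has the same shape as the sample above.

```python
def realtime_endpoint_url(model_description):
    """Return the endpoint URL if the model's endpoint is READY, else None."""
    info = model_description.get("EndpointInfo", {})
    if info.get("EndpointStatus") == "READY":
        return info.get("EndpointUrl")
    return None


# Example input shaped like the sample output above.
sample_description = {
    "EndpointInfo": {
        "CreatedAt": 1424378682.266,
        "EndpointStatus": "READY",
        "EndpointUrl": "https://realtime.machinelearning.us-east-1.amazonaws.com",
        "PeakRequestsPerSecond": 200,
    }
}
endpoint_url = realtime_endpoint_url(sample_description)
```

In the application, the dictionary would come from `ml.get_ml_model("ml-tweets")`, and a `None` result would mean waiting and checking again.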
Create an Amazon Kinesis stream for receiving tweets
import boto
kinesis = boto.connect_kinesis()

kinesis.create_stream(stream_name='tweetStream', shard_count=1)
# Each open shard can support up to 5 read transactions per second, up to a
# maximum total of 2 MB of data read per second. Each shard can support up to
# 1000 records written per second, up to a maximum total of 1 MB data written
# per second.
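The per-shard write limits quoted in the comment above translate into a simple sizing rule. The sketch below is a back-of-the-envelope estimate; the `shards_needed` helper is hypothetical, and reading "1 MB" as 1,048,576 bytes is an assumption.

```python
import math

# Per-shard write limits quoted in the comment above.
MAX_RECORDS_PER_SECOND = 1000
MAX_BYTES_PER_SECOND = 1024 * 1024  # "1 MB" read as 1,048,576 bytes (an assumption)


def shards_needed(records_per_second, average_record_bytes):
    """Smallest shard count that satisfies both per-shard write limits."""
    by_count = math.ceil(records_per_second / float(MAX_RECORDS_PER_SECOND))
    by_bytes = math.ceil(
        records_per_second * average_record_bytes / float(MAX_BYTES_PER_SECOND))
    # A stream always has at least one shard.
    return int(max(1, by_count, by_bytes))
```

For a single company handle, the tweet volume is likely to fit in one shard, which is consistent with `shard_count = 1` on the slide.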
Set up AWS Lambda to coordinate the data flow
The Lambda function is our application’s backbone. We will:
1. Write the code that will process and route tweets
2. Configure the Lambda execution policy (what is it allowed to do?)
3. Add the Kinesis stream as the data source for the Lambda function
Create Lambda functions
// These are our function's signatures and globals only. See GitHub repository for full source.
var ml = new AWS.MachineLearning();
var endpointUrl = '';
var mlModelId = 'ml-tweets';
var snsTopicArn = 'arn:aws:sns:{region}:{awsAccountId}:{snsTopic}';
var snsMessageSubject = 'Respond to tweet';
var snsMessagePrefix = 'ML model ' + mlModelId + ': Respond to this tweet: https://twitter.com/0/status/';

var processRecords = function() {…}                  // Base64 decode the Kinesis payload and parse JSON
var callPredict = function(tweetData) {…}            // Call Amazon ML real-time prediction API
var updateSns = function(tweetData) {…}              // Publish CS-actionable tweets to SNS topic
var checkRealtimeEndpoint = function(err, data) {…}  // Get Amazon ML endpoint URI
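The first step in the outline, decoding the Kinesis payload, can be illustrated in a few lines. The sketch below is written in Python rather than the Node.js of the sample application, and it is not the repository's code. It assumes the standard Kinesis event shape delivered to Lambda, where each record carries base64-encoded data under `kinesis.data`.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads carried by a Kinesis-triggered Lambda event."""
    tweets = []
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        tweets.append(json.loads(payload.decode("utf-8")))
    return tweets
```

Each decoded tweet would then be passed to the prediction call and, if the model flags it, published to the SNS topic.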
Configure Lambda execution policy
{
  "Statement": [
    {
      "Action": [ "logs:*" ],
      "Effect": "Allow",
      "Resource": "arn:aws:logs:{region}:{awsAccountId}:log-group:/aws/lambda/{lambdaFunctionName}:*"
    },
    {
      "Action": [ "sns:publish" ],
      "Effect": "Allow",
      "Resource": "arn:aws:sns:{region}:{awsAccountId}:{snsTopic}"
    },
    {
      "Action": [ "machinelearning:GetMLModel", "machinelearning:Predict" ],
      "Effect": "Allow",
      "Resource": "arn:aws:machinelearning:{region}:{awsAccountId}:mlmodel/{mlModelId}"
    },
    {
      "Action": [ "kinesis:ReadStream", "kinesis:GetRecords", "kinesis:GetShardIterator", "kinesis:DescribeStream", "kinesis:ListStreams" ],
      "Effect": "Allow",
      "Resource": "arn:aws:kinesis:{region}:{awsAccountId}:stream/{kinesisStream}"
    }
  ]
}
Allow request logging in Amazon CloudWatch
Allow publication of notifications to SNS topic
Allow calls to Amazon ML real-time prediction APIs
Allow reading of data from Kinesis stream
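The `{region}`, `{awsAccountId}` and similar placeholders in the policy have to be filled in before it can be attached to a role. The sketch below shows one way to build the four Resource ARNs. The `lambda_policy_resources` helper is hypothetical, and the account ID and SNS topic name in the usage example are made-up values.

```python
def lambda_policy_resources(region, account_id, function_name,
                            sns_topic, ml_model_id, stream):
    """Return the four Resource ARNs used by the execution policy, keyed by service."""
    prefix = "%s:%s" % (region, account_id)
    return {
        "logs": "arn:aws:logs:%s:log-group:/aws/lambda/%s:*" % (prefix, function_name),
        "sns": "arn:aws:sns:%s:%s" % (prefix, sns_topic),
        "machinelearning": "arn:aws:machinelearning:%s:mlmodel/%s" % (prefix, ml_model_id),
        "kinesis": "arn:aws:kinesis:%s:stream/%s" % (prefix, stream),
    }


# Illustrative values only: the account ID and topic name are placeholders.
arns = lambda_policy_resources(
    "us-east-1", "123456789012", "process_tweets",
    "tweet-alerts", "ml-tweets", "tweetStream")
```

Scoping each statement to a single resource, as the policy does, keeps the function from reading other streams or calling other models.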
Connect Kinesis stream and Lambda function
import boto
aws_lambda = boto.connect_awslambda()

aws_lambda.add_event_source(
    event_source='arn:aws:kinesis:' + region + ':' + aws_account_id + ':stream/' + "tweetStream",
    function_name="process_tweets",
    role='arn:aws:iam::' + aws_account_id + ':role/' + lambda_execution_role)
Amazon ML real-time predictions test
Here is a tweet:
Here is the same tweet…as a JSON blob:
{
  "statuses_count": "8617",
  "description": "Software Developer",
  "friends_count": "96",
  "text": "`scala-aws-s3` A Simple Amazon #S3 Wrapper for #Scala 1.10.20 available : https://t.co/q76PLTovFg",
  "verified": "False",
  "geo_enabled": "True",
  "uid": "3800711",
  "favourites_count": "36",
  "screen_name": "turutosiya",
  "followers_count": "640",
  "user.name": "Toshiya TSURU",
  "sid": "647222291672100864"
}
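Every value in the blob above is a string, including counts and booleans, because a prediction record is a flat map of attribute names to string values. The sketch below shows that conversion step. The `to_ml_record` helper and the field list in the usage example are hypothetical.

```python
def to_ml_record(tweet, fields):
    """Build a prediction record: selected fields only, every value as a string."""
    record = {}
    for name in fields:
        value = tweet.get(name)
        if value is not None:  # leave missing attributes out of the record
            record[name] = str(value)
    return record


record = to_ml_record(
    {"text": "hi @awscloud", "followers_count": 640, "verified": False, "lang": None},
    ["text", "followers_count", "verified", "lang"])
```

Leaving out attributes that have no value, rather than sending an empty string, is a design choice of this sketch.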
Let’s use the AWS Command Line Interface to request a prediction for this tweet:
aws machinelearning predict \
    --predict-endpoint https://realtime.machinelearning.us-east-1.amazonaws.com \
    --ml-model-id ml-tweets \
    --record '<json_blob>'
{
  "Prediction": {
    "predictedLabel": "0",
    "predictedScores": { "0": 0.012336540967226028 },
    "details": { "PredictiveModelType": "BINARY", "Algorithm": "SGD" }
  }
}
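The response can be turned into a notify-or-ignore decision with a few lines of code. The sketch below assumes the score expresses how strongly the model leans towards label 1, which is consistent with the sample above (label 0, score close to 0). The `should_notify` helper and its 0.5 default threshold are illustrative choices, not values from the webinar.

```python
def should_notify(response, threshold=0.5):
    """Return (notify, score) for a binary-model prediction response."""
    prediction = response["Prediction"]
    label = prediction["predictedLabel"]
    # predictedScores is keyed by the predicted label.
    score = prediction["predictedScores"][label]
    return score >= threshold, score


sample_response = {
    "Prediction": {
        "predictedLabel": "0",
        "predictedScores": {"0": 0.012336540967226028},
        "details": {"PredictiveModelType": "BINARY", "Algorithm": "SGD"},
    }
}
notify, score = should_notify(sample_response)
```

For the sample tweet, `notify` is false, so nothing would be sent to customer service. A stricter or looser threshold trades missed tweets against noise for the agents.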
Recap: Our application’s data flow
Twitter API → Amazon Kinesis → AWS Lambda → Amazon Machine Learning → Amazon SNS
End-to-end application demo
Generalizing to more feedback channels
Amazon Kinesis → AWS Lambda → Model 1, Model 2, Model 3 → Amazon SNS
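One way to read this diagram is that a single Lambda function scores each incoming message against several models and notifies for every model that fires. The sketch below illustrates that fan-out under stated assumptions: the model names, thresholds and the `predict` and `notify` callables are all hypothetical stand-ins for real-time prediction calls and SNS publishing.

```python
def route_message(record, models, notify):
    """Score one record against several models and notify for each that fires.

    `models` maps a model name to a (predict, threshold) pair, where
    `predict(record)` returns a score in [0, 1]. `notify(name, record, score)`
    stands in for publishing to an SNS topic.
    """
    fired = []
    for name, (predict, threshold) in sorted(models.items()):
        score = predict(record)
        if score >= threshold:
            notify(name, record, score)
            fired.append(name)
    return fired


# Stand-in models with fixed scores, for illustration only.
sent = []
example_models = {
    "feature-requests": (lambda r: 0.2, 0.5),
    "service-issues": (lambda r: 0.9, 0.5),
}
fired = route_message(
    {"text": "Is S3 down?"}, example_models,
    lambda name, rec, score: sent.append((name, score)))
```

Keeping the routing in one function means that adding a feedback channel or a model is a configuration change rather than a new pipeline.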
What’s next?
Try the service: http://aws.amazon.com/machine-learning/
Download the Social Media Listening application code: http://bit.ly/AmazonMLCodeSample
Get in touch: [email protected]
Thank you!