[aws la media & entertainment event 2015]: cloud analytics for audience engagement
TRANSCRIPT
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Cloud Analytics for Audience Engagement
Mike Limcaco | AWS Principal Solutions Architect
Mick Bass | 47Lining CEO
http://mashable.com
Search
Watch
Listen
Play
Download
Purchase
Rate It
Review It
Sharing
Tagging
Bookmarking
GB TB PB EB ZB
65M+ Subscribers 50 Countries
1000+ Devices 10B+ Hours/Quarter
Netflix: Over 75% of what people watch comes from recommendations [1]
[1] http://bit.ly/1WIInoh
Machine Learning for Predictive Analytics
• Branch of Artificial Intelligence and Statistics
• Programming computers based on historical experience
• Focuses on prediction based on known properties learned from training data
Signals Predictions
A Few Machine Learning Tasks
Recommendations
Clustering
Classification
Why Cloud?
• Scale
• Adaptability
• Agility
(Some) Big Data Services
Amazon S3
Internet scale storage
Amazon Machine Learning
Hosted predictive analytics service
Amazon EMR
Hosted Hadoop framework
Maximizing Audience Engagement
• Recommendations
• Ad / Marketing Campaign Targeting
  – Sentiment analysis
  – Automated segmentation
• Predict audience churn
Examples
Recommendations
The Concept
✓ Capture Audience Signal Data
✓ Create history of user and item preferences
✓ Estimate similar users and items
✓ Record these in Search Engine
✓ Query Search Engine with User History
✓ Enjoy recommendations!
http://www.slideshare.net/tdunning/recommendation-techn
Log Storage
ETL
User Interface
Serving Layer
users
Recommend Engine
users
Media platforms
Mobile
Search Play Buy Rate
Recommendations
Apache Mahout on Amazon EMR
• Library of scalable machine-learning algorithms
• Single-node as well as distributed capabilities
  – Hadoop Map-Reduce (BATCH | OFFLINE)
  – Evolving to support other execution platforms (NEAR-REALTIME)
    • Apache Spark
    • H2O
Spark H2O
Recommendation Clustering Classification
Math Library
Hadoop
Map-Reduce
mike,view,movie-a
mike,view,movie-b
mike,view,movie-c
mike,buy,movie-b
chris,view,movie-b
chris,buy,movie-d
…
movie-b movie-c:2.772588722239781 movie-a:2.772588722239781
movie-d ….
Indicators (“Items Similar To This….”)
% mahout spark-itemsimilarity -i input-folder/data.txt -o output-folder/ --filter1 buy -fc 1 -ic 2 --filter2 view
Step 1: Logs → History Matrix
User1 Thing1
User2 Thing2
User3 Thing3
User2 Thing4
User5 Thing1
User1 Thing2
User1 Thing3
Mike
Jon
Mary
Phil
Kris
Logs History Matrix
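A minimal sketch of Step 1, assuming the logs have already been reduced to (user, thing) pairs like the ones on the slide. The function names are illustrative and not taken from the talk.

```python
from collections import defaultdict

# (user, thing) pairs as listed on the slide.
LOGS = [
    ("User1", "Thing1"), ("User2", "Thing2"), ("User3", "Thing3"),
    ("User2", "Thing4"), ("User5", "Thing1"), ("User1", "Thing2"),
    ("User1", "Thing3"),
]

def build_history(logs):
    """Collapse raw log pairs into {user: set of things}, a sparse binary matrix."""
    history = defaultdict(set)
    for user, thing in logs:
        history[user].add(thing)
    return dict(history)

def to_dense(history):
    """Expand the sparse history into row labels, column labels and 0/1 rows."""
    users = sorted(history)
    things = sorted({t for seen in history.values() for t in seen})
    rows = [[1 if t in history[u] else 0 for t in things] for u in users]
    return users, things, rows

history = build_history(LOGS)
users, things, matrix = to_dense(history)
for user, row in zip(users, matrix):
    print(user, row)
```

Each row is one user and each column one thing; a 1 means the user interacted with that thing at least once.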
Step 2: Estimate Similar Things
History Matrix
2 8
2 4
8
4
Item-Item Matrix
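One way to picture Step 2 is as a co-occurrence count: for every pair of things, count the users whose history contains both. The sketch below assumes the history from Step 1 is held as a dictionary of sets. The data is the slide's example and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

# History matrix from Step 1, stored sparsely as {user: set of things}.
HISTORY = {
    "User1": {"Thing1", "Thing2", "Thing3"},
    "User2": {"Thing2", "Thing4"},
    "User3": {"Thing3"},
    "User5": {"Thing1"},
}

def cooccurrence(history):
    """Count, for each unordered pair of things, how many users have both."""
    counts = Counter()
    for things in history.values():
        for a, b in combinations(sorted(things), 2):
            counts[(a, b)] += 1
    return counts

ITEM_ITEM = cooccurrence(HISTORY)
for pair, count in sorted(ITEM_ITEM.items()):
    print(pair, count)
```

This is the sparse upper triangle of the item-item matrix; pairs that never co-occur are simply absent.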
Step 3: Reduce to Interesting Pairs
2 8
2 4
8
4
Item-Item Matrix
LLR
Indicators (“Items Similar To This….”)
Items Similar To This
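The LLR filter in Step 3 can be sketched as Dunning's log-likelihood ratio, written in the entropy form commonly used for co-occurrence scoring. The argument names k11..k22 are this sketch's own: k11 counts users with both behaviors, k12 and k21 users with only one of them, and k22 users with neither. That this mirrors what the mahout command computes internally is an assumption; the check is that the two-user sample log shown earlier (mike bought movie-b and viewed movie-a, chris did neither) reproduces the 2.77 score in the sample output.

```python
import math

def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of user counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

# bought movie-b AND viewed movie-a: mike (k11 = 1); neither: chris (k22 = 1)
print(llr(1, 0, 0, 1))  # approximately 2.7726, i.e. 4 * ln(2)

# bought movie-b vs viewed movie-b: both users viewed it, so the view says nothing
print(llr(1, 0, 1, 0))
```

Pairs with a high score co-occur more than chance would suggest and are kept as indicators; pairs scoring near zero are dropped.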
Step 4: Store Indicators in a Search Engine (BATCH)
Superman: Highlander, Dune
Star Wars: Raiders, Minority Report
Highlander: Superman
Mulan: Home Alone, Mermaid
Star Trek: …
…
4587: 223, 5234
748: 5345, 235
12: 8234
245: 9543, 7673
3456: 4587
…
Index
Indicators
Step 5: Query Search Engine w/ User History (REALTIME)
748 Star Wars: 45, 235
12 Highlander: 8234
245 Mulan: 9543, 7673
4587 Superman: 12, 5234
3456 Star Trek: 2458 …
Query
“12”
5345
3456
12
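Steps 4 and 5 can be sketched with a small in-memory inverted index standing in for the search engine. The item IDs and indicator lists are copied from the Step 5 slide (the Star Trek list is elided there and is kept short here). The scoring, a plain count of matching indicators, is an assumption that stands in for a real engine's relevance ranking.

```python
from collections import defaultdict

# Item documents: item id -> (title, indicator ids), as shown on the slide.
DOCS = {
    748: ("Star Wars", [45, 235]),
    12: ("Highlander", [8234]),
    245: ("Mulan", [9543, 7673]),
    4587: ("Superman", [12, 5234]),
    3456: ("Star Trek", [2458]),  # the slide elides the rest of this list
}

def build_index(docs):
    """Step 4 (batch): indicator id -> ids of the items whose document lists it."""
    index = defaultdict(set)
    for item_id, (_title, indicators) in docs.items():
        for indicator in indicators:
            index[indicator].add(item_id)
    return index

def recommend(index, user_history, top_n=3):
    """Step 5 (realtime): rank items by how many indicators match the history."""
    scores = defaultdict(int)
    for seen in user_history:
        for item_id in index.get(seen, ()):
            if item_id not in user_history:
                scores[item_id] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ranked[:top_n]]

INDEX = build_index(DOCS)
for item_id in recommend(INDEX, [12]):
    print(item_id, DOCS[item_id][0])
```

A user whose history contains item 12 matches the document whose indicator field lists 12. The expensive similarity work stays in the batch step, and the realtime step is only a lookup.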
Sentiment Analysis
“I thought Episode 29 was not without merit :)”
Is this Positive or Negative?
The Concept
✓ Capture social media signals
  ✓ NRT (near-real-time) streams (Tweets, Comments on FB, IMDB, YouTube …)
✓ Push through Sentiment Analyzer
  ✓ Tokenize
  ✓ Classify (estimate) as Positive | Negative
✓ Provide actionable insight
  ✓ Improve sort order of recommendations
  ✓ Alert / advise Digital Marketing team
Training
Positive Negative
Knowledge Base
Segment Classify
Segment Classify
Segment Classify
Model Training
Positive Negative
Stream Ingest
Stream Ingest
Stream Ingest
Knowledge Base
“I adored this movie”
“adore” = POSITIVE
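The tokenize-then-classify idea can be sketched with a tiny hand-written knowledge base. Everything here is an assumption made for illustration: the lexicon entries, the crude suffix stripping and the function names. In the talk the knowledge base comes from model training on labeled positive and negative examples, not from a hard-coded dictionary.

```python
import re

# Hypothetical knowledge base; a real one would be learned from training sets.
LEXICON = {
    "adore": "POSITIVE",
    "love": "POSITIVE",
    "hate": "NEGATIVE",
    "boring": "NEGATIVE",
}
SUFFIXES = ("ed", "s")

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def normalize(token):
    """Map simple inflections ('adored') onto a lexicon entry ('adore')."""
    if token in LEXICON:
        return token
    for suffix in SUFFIXES:
        if token.endswith(suffix):
            stem = token[: -len(suffix)]
            for candidate in (stem, stem + "e"):
                if candidate in LEXICON:
                    return candidate
    return token

def classify(text):
    """Sum the word-level labels and report the overall polarity."""
    score = 0
    for token in tokenize(text):
        label = LEXICON.get(normalize(token))
        if label == "POSITIVE":
            score += 1
        elif label == "NEGATIVE":
            score -= 1
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"

print(classify("I adored this movie"))
```

A word-level lexicon like this has nothing to say about phrases such as "not without merit", which is one reason the pipeline relies on a trained model.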
GNIP
Datasift
Other
Positive Negative
Amazon Kinesis
Amazon Kinesis
Amazon Kinesis
Model Training & Storage
Stream Ingest
Sentiment Classification
Amazon Machine Learning
Trending Sentiment
Neutral Negative Positive
Sentiment Training Sets
Case Study: OTT Predictive Analytics
Using Data for Strategic Advantage
Proof-of-Concept
OTT services provide the opportunity to establish a “conversation with the customer.” Big data techniques can be applied to fuse massive amounts of OTT logs, 3rd party demographics, and social media sentiment to gain insight into the audience and develop predictive capabilities. 47Lining and Amazon Web Services partnered to develop a PoC for an OTT customer to use a machine-driven approach to accurately predict user behavior within their consumer video application.
100M Consumer interactions
Millions of Users
3rd Party Demographics
10K Titles
Social Media
Fuse / Visualize
5 Data Sources*
Per-Segment Recommendations
Predictors
Enablers
71% Accurate Churn
Enhance audience engagement through relevant offers
Results
Scalability
Automation
Agility
Cost effectiveness
Machine Learning
*Order of Magnitude for PoC Scope
Speed, Agility, Scalability
Why Cloud?
AWS Machine Learning. Delivered 71% predictive accuracy for user churn during PoC.
Hardware. Cloud economics allow us to only pay for what we use.
Redshift. Petabyte-scale data warehouse. Effectively feeds Machine Learning or EMR.
Elastic MapReduce. Unparalleled speed and scale for big data recommendations.
Managed services that “just work” providing speed, agility and scale
Pre-Cloud Challenges
Need to procure hardware for peaks
Proprietary machine learning software only data scientists can use
Need to procure data warehouse
Always-on, on-premise Hadoop clusters with high management costs
PoC Architecture
OTT App
S3
Redshift
Machine Learning
Transformations
Predictor Services
Business Intelligence Tools
Visualizations & Dashboards
Periodic
3rd Party Demographic Data
Sentiment Analysis / Social Data
S3
Title Data
AWS ML API
Mahout ALS Recommender
Elastic MapReduce
R Analysis Sandboxes
Less than ~$1K in infrastructure
AWS CloudFormation template stack
Logs
Periodic
Automation
Environment can be replicated using Nucleator, Ansible and CloudFormation
The Power of Knowing your Users
Per-Account Signatures
Clustering Dimensions
Distance Heuristic
Profiling Dimensions
Cluster Analysis
hierarchical | model-based
Segment Analysis
…
…
…
1
2
n
segments ← Cohorts →
100M Consumer interactions
Millions of Users
3rd Party Demographics
10K Titles
Social Media
Account Signature Definitions
6 distinct user personas emerged
*Order of Magnitude for PoC Scope
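As a rough illustration of how per-account signatures plus a distance heuristic yield segments, the sketch below runs a small average-linkage hierarchical clustering. The signature dimensions (weekly viewing hours, share of viewing on mobile), the account IDs and the choice of Euclidean distance are all hypothetical. The slide does not say which dimensions or heuristic the PoC used.

```python
import math

# Hypothetical per-account signatures: (hours watched per week, mobile share).
SIGNATURES = {
    "acct-1": (10.0, 0.9),
    "acct-2": (11.0, 0.8),
    "acct-3": (1.0, 0.1),
    "acct-4": (1.5, 0.2),
}

def distance(a, b):
    """Euclidean distance between two signatures (one possible heuristic)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(signatures, n_segments):
    """Merge the two closest clusters until only n_segments remain."""
    clusters = [{account} for account in signatures]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(distance(signatures[a], signatures[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_segments:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

SEGMENTS = agglomerate(SIGNATURES, n_segments=2)
for number, members in enumerate(SEGMENTS, start=1):
    print(number, sorted(members))
```

In practice the dimensions would be scaled before measuring distance, and the number of segments would be chosen by inspecting the resulting hierarchy.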
Turning User Knowledge into Engagement
1
2
n
segments
Per-Segment Recommendations
Predictors
71% Accurate Churn
Account Churn Predictor. Using the identified Segments with Amazon Machine Learning, we developed a model to predict when an account is at risk for churn. Accurate predictions enable meaningful offers to users before this occurs.
Viewed Content Predictor. Using the identified Segments, we used Redshift, R and Amazon Elastic MapReduce / Mahout to predict content that users may find desirable that also achieves engagement objectives. Such recommendations enable proactive sculpting of user behavior.
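To make the churn predictor concrete, here is a small stand-in: a logistic regression trained by batch gradient descent, with accuracy measured as the share of accounts classified correctly. It is not the Amazon Machine Learning API. The two features (scaled days since last view, scaled hours watched last month), the six training rows and every function name are hypothetical, and the toy data has no relation to the PoC's 71% figure.

```python
import math

# Hypothetical training data: (days since last view / 30, hours last month / 40).
ROWS = [(0.9, 0.05), (0.8, 0.10), (0.7, 0.20), (0.1, 0.80), (0.2, 0.60), (0.05, 0.90)]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = account churned

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for k, v in enumerate(x):
                grad_w[k] += error * v
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def churn_risk(model, x):
    """Estimated probability that the account with features x will churn."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def accuracy(model, rows, labels):
    """Share of accounts whose thresholded prediction matches the label."""
    hits = sum((churn_risk(model, x) >= 0.5) == bool(y) for x, y in zip(rows, labels))
    return hits / len(rows)

MODEL = train(ROWS, LABELS)
print(accuracy(MODEL, ROWS, LABELS))
print(churn_risk(MODEL, (0.85, 0.10)))
```

Accounts whose estimated risk crosses the threshold are the ones that would receive a retention offer before they leave.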
Realizing a “Stickiness” Strategy through Data
AWS’ speed, agility and scalability are a natural fit for OTT predictive analytics.
Customer Strategy Action
With capabilities demonstrated and de-risked, customer is integrating predictors into their 2016 user experience. De-risking cost ~$1K in infrastructure
71% accuracy in churn prediction
Retention offers in 2016
Per-Persona Recommendations
Keep users glued in 2016
PoC was only possible in the Cloud
Know your customer
Stickiness = ability to steer the conversation with your customer
Summary
• Audience demands increasingly personalized content
• We want to understand and predict these needs and adapt discovery & delivery of media
• Audience interactions can be analyzed to
  – Surface patterns of common behavior
  – Estimate or predict audience demand / churn
• AWS enables tools & techniques for scalable machine learning