AWS Summit Benelux 2013 - Media and Online Advertising on AWS
Media and Online Advertising on AWS
Jan Borch – AWS Solutions Architect
Media Application on AWS
• Music streaming
• Video streaming
• Digital publishing
• Media sharing
Martijn Bakker, Chief Engineer, WeTransfer
AWS Summit Benelux 2013
WeTransfer
• Send big files anywhere
• Up to 2GB per transfer
• Files are stored for 7 days
• Beautiful backgrounds
(50/50 split between ads & art)
WeTransfer
• WeTransfer Plus: € 120 per year
• Up to 5GB per transfer
• Store up to 50GB permanently
• Your own backgrounds (no ads)
WeTransfer
• 1.8 Million transfers per day
• Downloads: 25 Gigabit per second
• 1.5 Million requests per hour (site + API)
• Over a Petabyte of storage used on S3
(Peak measurements - July 2013)
Company
• 15 wonderful, dedicated people
• Founded & based in Amsterdam
• Originated from Oy Communications
Origins
• Purely a functional tool
• Design company needs to send big files to clients
• FTP & friends “technical”, “confusing”
Early development
• OyTransfer + advertising = WeTransfer
• Goals:
• beautiful
• easy to use
• secure
Growth
• Transfers double every 3 months
• Previous hoster:
  • could not keep pace with the growth
  • hardware-based platform adds maintenance & development costs
Growth
• Our 3 necessities:
• Storage
• Support
• Scalability
Growth
• AWS: on-demand, available right away
• Initial migration:
• development time: 1 month
• using S3 through EC2 instances
The new WeTransfer
• Built for & with AWS: uses RDS, EC2, S3, CloudWatch, DynamoDB, Route 53, ElastiCache
• Ruby + HTML5 + JavaScript (frontend)
• Backend tailored around S3
• Launched in January 2013
WeTransfer and S3
• Virtually unlimited storage capacity
• Redundancy: always available
• Fast, and cheap compared to similar offers
• Dramatically lower costs
WeTransfer and S3
• Uses the multipart upload mechanism (where possible)
• Resumable uploads
• Uploads go directly to S3 thanks to CORS support
• Worker instances to process uploaded content
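A minimal sketch of how a client might plan a multipart, resumable upload. It is not WeTransfer's code: the function names are hypothetical, and the only facts assumed are the S3 limits of a 5 MiB minimum part size (except for the last part) and at most 10,000 parts per upload.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum for every part except the last
MAX_PARTS = 10_000                # S3 maximum number of parts per upload


def plan_parts(file_size, part_size=MIN_PART_SIZE):
    """Split file_size bytes into (part_number, offset, length) tuples."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum")
    # Grow the part size if the file would otherwise need too many parts.
    part_size = max(part_size, math.ceil(file_size / MAX_PARTS))
    parts = []
    offset = 0
    number = 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def remaining_parts(parts, uploaded_numbers):
    """Return the parts still to upload when a transfer is resumed."""
    done = set(uploaded_numbers)
    return [part for part in parts if part[0] not in done]
```

Resuming then only needs the list of part numbers the storage side already holds; everything else is re-derived from the file size.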
WeTransfer and S3
• Secure upload / download, and encryption
• Regionalized: storage facilities all over the world to ensure proper speeds to end users
• No maintenance
So why S3?
• Fast & flexible
• Almost no time spent on maintenance
• Virtually limitless capacity at your fingertips
https://wetransfer.com/jobs
Online Advertising on AWS
YOUR AD HERE
Common challenge for Advertising Platforms ...
... device and media fragmentation ...
... scaled to millions of users
503 Service Temporarily Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.
Maintain availability from one server…
…to thousands
Let's take a journey ...
Store
Amazon S3: Storage for the Internet
STATIC FILES REPOSITORY
AMAZON S3
MEDIA AD SERVED TO USER
Let's take a journey ...
Transform
Amazon Elastic Transcoder: Video transcoding in the cloud
VIDEO FILES REPOSITORY
AMAZON S3
MEDIA AD SERVED TO USER
AMAZON ELASTIC TRANSCODER
Let's take a journey ...
Deliver
Amazon CloudFront: Web service for content delivery
CloudFront edge locations (map):
United States: Ashburn (2), Dallas (2), Jacksonville, Los Angeles (2), Miami, Newark, New York (2), Palo Alto, San Jose, Seattle, South Bend, St. Louis
Europe: Amsterdam, Dublin, Frankfurt (2), London (2), Milan, Paris (2), Stockholm
Asia Pacific: Hong Kong, Osaka, Singapore (2), Sydney, Tokyo
South America: Sao Paulo
Reach a global audience
CONTENT DELIVERY NETWORK
AMAZON CLOUDFRONT
IMPRESSION LOGS
Simple HLS video streaming architecture
• In-house content publication server
• Source video assets in S3
• Video transcoded into HLS (Elastic Transcoder)
• Edge delivery using CloudFront (Stockholm, NY)
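To make "transcoded into HLS" concrete: an HLS stream is usually published as a master playlist that points at one media playlist per bitrate. The sketch below builds such a master playlist as text. The function name, bitrates, resolutions and paths are illustrative assumptions, not values from the talk.

```python
def master_playlist(variants):
    """Build an HLS master playlist from (bandwidth_bps, resolution, uri) tuples."""
    lines = ["#EXTM3U"]
    # Sort by bandwidth, highest first, so the output order is deterministic.
    for bandwidth, resolution, uri in sorted(variants, reverse=True):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}"
        )
        lines.append(uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical renditions; the paths would be keys in the output bucket.
    print(master_playlist([
        (600000, "480x320", "hls_600k/index.m3u8"),
        (2000000, "1024x768", "hls_2m/index.m3u8"),
    ]))
```

The player fetches this small file through CloudFront and then picks a rendition that suits the viewer's connection.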
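AWS CLI
# Upload the source video; the key must match the job input below
aws s3 cp video.avi s3://mybucket/video/video.avi
# Submit a transcoding job to an existing pipeline
aws elastictranscoder create-job \
  --pipeline-id 1379510897399-mxjrif \
  --input '{"Key":"video/video.avi"}' \
  --outputs '[{"Key":"sample","PresetId":"1234-123", ...}]'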
Let's take a journey ...
Match
Amazon EC2: Resizable compute capacity in the cloud
EC2 instance types
m1.small
Virtual cores: 1
Memory: 1.7 GiB
I/O performance: Moderate
cc2.8xlarge
Virtual cores: 32 (2 x Intel Xeon)
Memory: 60.5 GiB
I/O performance: 10 Gbit
cr1.8xlarge
Virtual cores: 32 (2 x Intel Xeon)
Memory: 240 GiB
I/O performance: 10 Gbit
SSD instance store: 240 GB
hi1.4xlarge
Virtual cores: 16
Memory: 60.5 GiB
I/O performance: 10 Gbit
SSD instance store: 2 x 1 TB
hs1.8xlarge
Virtual cores: 16
Memory: 117 GiB
I/O performance: 10 Gbit
Instance store: 24 x 2 TB
Amazon Route 53: Highly available and scalable Domain Name System
Extremely reliable and cost effective
Feature: Details
Global: Supported from AWS global edge locations for fast and reliable domain name resolution
Scalable: Automatically scales based upon query volumes
Latency-based routing: Supports resolution of endpoints based upon latency, enabling multi-region application delivery
Integrated: Integrates with other AWS services, allowing Route 53 to front load balancers, S3 and EC2
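The idea behind latency-based routing can be sketched in a few lines. This is only an illustration of the decision, not how Route 53 implements it; the function name, the region names and the latency figures are assumptions.

```python
def pick_region(latencies_ms, healthy=None):
    """Return the region with the lowest measured latency for a client.

    latencies_ms maps region name -> latency in milliseconds.
    healthy, if given, restricts the choice to regions currently in service.
    """
    candidates = {
        region: ms
        for region, ms in latencies_ms.items()
        if healthy is None or region in healthy
    }
    if not candidates:
        raise ValueError("no candidate regions")
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    # Hypothetical measurements for a client in the Benelux.
    measured = {"eu-west-1": 25, "us-east-1": 95, "ap-southeast-1": 210}
    print(pick_region(measured))
```

With the DNS layer making this choice per query, the same hostname can front ad servers in several regions.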
Reach a global audience
Link to Ad Resource
AMAZON EC2 + AUTOSCALING
Ad Servers
AMAZON ELB
Amazon DynamoDB: Fast & fully managed NoSQL database service
AMAZON DYNAMODB
PROFILES DATABASE
Very general table structure
Ads (not many rows; frequent, near-realtime updates)
ad-id | advertiser | max-price | imps to deliver | imps delivered
1 | AAA | 100 | 50000 | 1200
2 | BBB | 150 | 30000 | 2500
Profiles (user-cookie) (so many rows; batch-manner updates)
user-id | attribute1 | attribute2 | attribute3 | attribute4
A | XXX | XXX | XXX | XXX
B | YYY | YYY | YYY | YYY
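As an illustration of how an ad server might use the Ads table, here is a small in-memory sketch built on the two sample rows. The selection rule (highest max-price among ads that still have impressions to deliver) is an assumption made for the example; the slides do not specify one. In DynamoDB the counter update would be an atomic write against the table, which is not shown here.

```python
ADS = [
    {"ad_id": 1, "advertiser": "AAA", "max_price": 100,
     "imps_to_deliver": 50000, "imps_delivered": 1200},
    {"ad_id": 2, "advertiser": "BBB", "max_price": 150,
     "imps_to_deliver": 30000, "imps_delivered": 2500},
]


def select_ad(ads):
    """Pick the highest-priced ad that has not yet met its impression goal."""
    eligible = [ad for ad in ads if ad["imps_delivered"] < ad["imps_to_deliver"]]
    if not eligible:
        return None
    return max(eligible, key=lambda ad: ad["max_price"])


def record_impression(ad):
    """Count one served impression (the frequent, near-realtime update)."""
    ad["imps_delivered"] += 1


if __name__ == "__main__":
    winner = select_ad(ADS)
    record_impression(winner)
    print(winner["ad_id"], winner["imps_delivered"])
```

The Profiles table would feed targeting attributes into the same decision; that part is left out to keep the sketch short.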
Let's take a journey ...
Capture
Click-through Servers
AMAZON EC2 + AUTOSCALING
AWS OpsWorks: Integrated application management
Stack, Layer, Instances, Scale
OpsWorks talks with an agent on each EC2 instance
The heart of the service
Instance lifecycle and configuration hooks
Cookbooks
# Install Composer and the app's PHP dependencies in the current release
script "install_composer" do
  interpreter "bash"
  user "root"
  cwd "#{node[:deploy][:myphotoapp][:deploy_to]}/current"
  code <<-EOH
    curl -s https://getcomposer.org/installer | php
    php composer.phar install
  EOH
end
Amazon S3
Git repository
Let's take a journey ...
Report
CLICK-THROUGH LOG FILES
AMAZON S3
AMAZON ELASTIC MAPREDUCE
CLICK-THROUGH LOG FILES
Data Growth: GB, TB, PB
Server Logs
Click Analysis
Impression logs
Complexities of Big Data
• Sampling
• Time to process
• Inflexible
Elastic MapReduce & Redshift
• “Queryable”
• All data
Turning Data into Information
From Data to Insight: Elastic MapReduce, Redshift
Amazon Elastic MapReduce: Process vast amounts of data using Hadoop
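A tiny local sketch of the map and reduce steps such a job could run over click-through logs, counting clicks per ad. The log layout (tab-separated timestamp, ad-id, user-id) and the function names are assumptions for illustration; on a cluster the two steps would run as separate distributed tasks rather than in one process.

```python
from collections import defaultdict


def map_clicks(lines):
    """Map step: emit (ad_id, 1) for every well-formed click record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed records instead of failing the job
        _timestamp, ad_id, _user_id = fields
        yield ad_id, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    sample = [
        "2013-09-26T10:00:00\t1\tA\n",
        "2013-09-26T10:00:01\t2\tB\n",
        "2013-09-26T10:00:02\t1\tB\n",
    ]
    print(reduce_counts(map_clicks(sample)))
```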
Amazon Redshift: Fast, fully managed, petabyte-scale data warehouse service
Let's take a journey ...
Store
Transform
Deliver
Match
Capture
Report
Amazon Web Services
Garry Turkington, CTO, Improve Digital
Amazon Web Services at Improve Digital
26 September 2013, Garry Turkington, CTO
@garryimprove
ABOUT US
• Cloud-based Real Time Advertising Technology
• Focus on the premium publisher / media owner
• Integrations with thousands of Demand Partners
• Decisions driven by Real Time Data
• Offices in UK, NL, DE & ES
• 100+ Premium Publishers
Use of AWS
Use AWS in conjunction with dedicated physical infrastructure
2 sides to the story
• Front end: serving of ads to end-users
• Back-end: Data processing and dev/test
Serving ads
• Fleet of ad servers running mostly on EC2
• Ad serving process is computationally expensive and has strict time constraints
• Need ability to spin up additional instances based on demand: horizontally scalable system
• Place ad servers in different regions to reduce serving latency; big benefit of EC2 over physical kit
• Grow fleets in different regions separately
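The "spin up additional instances based on demand" point can be sketched as a simple sizing rule applied per region. This is not Improve Digital's scaling policy; the function names, the capacity figure and the fleet bounds are assumptions for the example.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance,
                      min_instances=2, max_instances=100):
    """Size a fleet for the current load, clamped to configured bounds."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


def plan_fleets(load_by_region, capacity_per_instance):
    """Size each region's fleet independently of the others."""
    return {
        region: desired_instances(rps, capacity_per_instance)
        for region, rps in load_by_region.items()
    }


if __name__ == "__main__":
    # Hypothetical load figures, in requests per second.
    print(plan_fleets({"eu-west-1": 4500, "us-east-1": 12000}, 1000))
```

Keeping a minimum above one instance per region reflects the fault-tolerance concern: a single failed server should not take a region offline.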
Backend systems
• S3/Glacier used for policy-driven data retention
• S3 is the starting point for AWS and on-premises data processing jobs
• S3 used as a shared storage space between distributed components
• VPC used to integrate AWS and on-premises flows and systems
• Automated deployment of dev/test software into VPC EC2 has been great
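Policy-driven retention with S3 and Glacier is typically expressed as a bucket lifecycle rule: move objects to Glacier after some days, delete them later. The sketch below builds such a rule as a plain dictionary shaped like the current S3 lifecycle configuration document. The helper names, the prefix and the day counts are assumptions, not Improve Digital's actual policy.

```python
def retention_rule(prefix, archive_after_days, delete_after_days):
    """Build one lifecycle rule: archive to Glacier, then expire."""
    if not 0 <= archive_after_days < delete_after_days:
        raise ValueError("objects must be archived before they are deleted")
    return {
        "ID": f"retain-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": delete_after_days},
    }


def lifecycle_configuration(rules):
    """Wrap rules in the top-level document applied to a bucket."""
    return {"Rules": list(rules)}


if __name__ == "__main__":
    # Hypothetical policy: archive logs after 30 days, delete after a year.
    print(lifecycle_configuration([retention_rule("logs/", 30, 365)]))
```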
Why do we use AWS?
As a startup it was almost a no-brainer
• Didn't want/need overhead of own physical infrastructure
• Pricing model with hugely reduced (or zero) up-front cost is an easy sell
• Coordinating the ability to quickly grow the ad server fleet is *hard* with a physical data centre
• As a more mature company the above still apply
• In addition our needs have also matured from the lower level "give us servers and storage"
Lessons learned: It works!
• Service integration is often ridiculously easy: pull S3 data into EMR, set up auto-scaling etc.
• Geographic data locality helps with compliance
• Automatic cost reductions do wonders for corporate acceptance
• Continuous evolution of the services means that they can suddenly become a great fit
Lessons learned: It works in its own way
• Need to understand exactly what each service offers
• Need to design for fault tolerance; instances can fail at high scale
• Had to work hard to get our network to integrate with VPC
• Can't save you from yourself; poor design is poor design
• Would still love to see another region in EU
The future
• Growth means more of all the above
• Want to re-evaluate services that weren't a great fit for us in the past (RDS, DynamoDB)
• Believe we can use data processing services (Elastic MapReduce in particular) alongside on-premises systems
• Looking to CloudFormation, Elastic Beanstalk and OpsWorks to extend automation much further