aws re:invent 2016: deep learning at cloud scale: improving video discoverability by scaling up...


Upload: amazon-web-services

Post on 08-Jan-2017


Page 1: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

November 2016

MAC205

Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS

Andres Rodriguez, PhD, Solutions Architect, Intel Corporation

Juan Carlos Riveiro, CEO, Vilynx

Page 2: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Content Outline

• Deep learning overview and usages

• Worked example for fine-tuning a NN

• Some theory behind deep learning

• Vilynx – video discoverability

Page 3: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Deep Learning

• A branch of machine learning

• Data is passed through multiple non-linear transformations

• Goal: Learn the parameters of the transformations that minimize a cost function

Page 4: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Why Now?

Bigger Data
• Image: 1000 KB / picture
• Audio: 5000 KB / song
• Video: 5,000,000 KB / movie

Better Hardware
• Transistor density doubles every 18 months
• Cost / GB in 1995: $1000.00
• Cost / GB in 2015: $0.03

Smarter Algorithms
• Advances in algorithm innovation, including neural networks, leading to better accuracy in training models

Page 5: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Types of Deep Learning

• Supervised learning

• Data -> Labels

• Unsupervised learning

• No labels; Clustering; Reducing dimensionality

• Reinforcement learning

• Reward actions (e.g., robotics)

http://ode.engin.umich.edu/presentations/idetc2014/img/image_feature_learning_clear.png


Page 6: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

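Training

Figure: data flows through the network by forward propagation to produce an output score per class (person 0.10, cat 0.15, dog 0.20, … bike 0.05). The output is compared with the expected label (person 0, cat 1, dog 0, … bike 0) to compute a penalty (error or cost), which back propagation sends back through the network.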

Page 7: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

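Training and inference

Figure: the same network as on the previous slide. Forward propagation from data to output (person 0.10, cat 0.15, dog 0.20, … bike 0.05) is labeled inference. Training adds the comparison with the expected label (person 0, cat 1, dog 0, … bike 0), the penalty (error or cost), and back propagation.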
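The training loop in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' code: it assumes a softmax output and a cross-entropy penalty (the slides do not name the cost function), and the raw scores are hypothetical values chosen for the example.

```python
import math

CLASSES = ["person", "cat", "dog", "bike"]

def softmax(scores):
    """Forward propagation's last step: turn raw scores into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, expected):
    """Penalty (cost): -sum_k expected_k * log(probs_k). Assumed cost function."""
    return -sum(t * math.log(p) for p, t in zip(probs, expected) if t > 0)

def grad_wrt_scores(probs, expected):
    """Back propagation's first step: for softmax + cross-entropy,
    d(penalty)/d(score_k) = probs_k - expected_k."""
    return [p - t for p, t in zip(probs, expected)]

scores = [0.2, 0.6, 0.9, -0.5]   # hypothetical raw network outputs
expected = [0, 1, 0, 0]          # one-hot label: the image is a cat

probs = softmax(scores)
penalty = cross_entropy(probs, expected)
grads = grad_wrt_scores(probs, expected)

# One small step against the gradient lowers the penalty.
step = 0.5
new_scores = [s - step * g for s, g in zip(scores, grads)]
new_penalty = cross_entropy(softmax(new_scores), expected)

for name, p in zip(CLASSES, probs):
    print(f"{name}: {p:.3f}")
print(f"penalty before: {penalty:.4f}, after one step: {new_penalty:.4f}")
```

In a real network the gradient is propagated further back to the weights of every layer; the sketch stops at the output scores to keep the forward/backward idea visible.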

Page 8: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Deep Learning Use Cases

• Fraud / face detection
• Gaming, check processing
• Computer server monitoring
• Financial forecasting and prediction
• Network intrusion detection
• Recommender systems
• Personal assistant
• Automatic speech recognition
• Natural language processing
• Image & video recognition/tagging
• Targeted ads

Industries: Cloud Service Providers, Financial Services, Healthcare, Automotive

Page 9: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Optimized Deep Learning Environment

Fuel the development of vertical solutions

Deliver excellent deep learning environment

Develop deep networks across frameworks

Maximum performance on Intel architecture

EC2

Intel® Math Kernel Library (Intel® MKL)


Page 10: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Elastic Compute Cloud (EC2)

C4 Instances

• “Highest performing processors and the lowest price/compute performance in EC2” [1]
• Vilynx
  • Deep learning for video content extraction
  • Supports various companies: CBS, TBS, etc.
• Pikazo app
  • Transforms photos into artistic renders

[1] https://aws.amazon.com/ec2/instance-types/
https://www.stlmag.com/news/st-louis-app-pikazo-will-turn-your-profile-picture/

Page 11: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Elastic Compute Cloud (EC2)

C4 Instances

c4.8xlarge On-Demand:
• $1.675/hr

GoogLeNet inference:
• batch size 32
• 237 images/sec = 4.2 ms/image
• 1 million images cost $1.96

Spot prices are cheaper

Configuration: OS: Linux version 3.13.0-86-generic (buildd@lgw01-51) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)) #131-Ubuntu SMP Thu May 12 23:33:13 UTC 2016. MXNet tip of tree: commit de41c736422d730e7cfad72dd6afc229ce08cf90, Tue Nov 1 11:43:04 2016 +0800. MKL 2017 Gold update 1.

Chart: c4.8xlarge MXNet Inference (images/sec)

Network         No MKL    MKL
AlexNet         6.1       679.5
GoogLeNet v1    2.4       262.5
ResNet-50       1.2       79.7
GoogLeNet v3    0.8       73.9
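The per-image latency and cost figures on this slide follow directly from the hourly price and the throughput. The sketch below reproduces that arithmetic. It also derives MKL speedups from the chart, assuming the smaller number in each pair is the "No MKL" bar and the larger one the "MKL" bar; the function and variable names are made up for this example.

```python
PRICE_PER_HOUR = 1.675    # c4.8xlarge On-Demand price from the slide, USD
IMAGES_PER_SEC = 237.0    # GoogLeNet inference throughput from the slide

def ms_per_image(images_per_sec):
    """Average latency per image in milliseconds."""
    return 1000.0 / images_per_sec

def cost_for_images(n_images, images_per_sec, price_per_hour):
    """Cost in USD of processing n_images at the given throughput."""
    hours = n_images / images_per_sec / 3600.0
    return hours * price_per_hour

latency_ms = ms_per_image(IMAGES_PER_SEC)
cost_1m = cost_for_images(1_000_000, IMAGES_PER_SEC, PRICE_PER_HOUR)

# Chart values in images/sec: (No MKL, MKL). The pairing is an assumption.
throughput = {
    "AlexNet": (6.1, 679.5),
    "GoogLeNet v1": (2.4, 262.5),
    "ResNet-50": (1.2, 79.7),
    "GoogLeNet v3": (0.8, 73.9),
}
speedup = {name: mkl / base for name, (base, mkl) in throughput.items()}

print(f"{latency_ms:.1f} ms/image, ${cost_1m:.2f} per million images")
for name, factor in speedup.items():
    print(f"{name}: about {factor:.0f}x faster with MKL")
```

The same function gives the Spot cost by passing the current Spot price instead of the On-Demand price.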

Page 12: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Intel® Math Kernel Library 2017 (Intel® MKL 2017)

• Optimized for EC2 instances with Intel® Xeon® CPUs

• Optimized for common deep learning operations

• GEMM (useful in RNNs and fully connected layers)

• Convolutions

• Pooling

• ReLU

• Batch normalization

Figures: Recurrent NN, Convolutional NN

Page 13: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Naïve Convolution

https://en.wikipedia.org/wiki/Convolutional_neural_network
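As a rough illustration of what a naive convolution looks like in code, here is a direct four-loop version. It is a sketch under simplifying assumptions (single channel, stride 1, no padding, no kernel flip, as is usual in CNN frameworks) and is not taken from the talk.

```python
def conv2d_naive(image, kernel):
    """Slide the kernel over the image and sum elementwise products.

    image and kernel are lists of rows; the output shrinks by
    (kernel size - 1) in each dimension because there is no padding.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):              # output row
        for x in range(ow):          # output column
            acc = 0.0
            for dy in range(kh):     # kernel row
                for dx in range(kw): # kernel column
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            out[y][x] = acc
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_naive(image, kernel)
print(result)  # [[-4.0, -4.0], [-4.0, -4.0]]
```

Every output element re-reads its input window from memory, which is the access pattern that cache-friendly layouts try to improve.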


Page 14: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Cache Friendly Convolution

arxiv.org/pdf/1602.06709v1.pdf


Page 15: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Gradient Descent

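$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$

Plot: J versus w, with the starting point w^{(0)} marked.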

Page 16: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Gradient Descent

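$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$

$$\frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$

Plot: J versus w, with w^{(0)} and the slope dJ(w^{(0)})/dw marked.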

Page 17: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Gradient Descent

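$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$

$$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$

Plot: J versus w, with w^{(0)} marked.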

Page 18: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Gradient Descent

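$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$

$$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$

where \alpha is the learning rate.

Plot: J versus w, with w^{(0)} marked.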

Page 19: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Gradient Descent

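$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$

$$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$

Plot: J versus w, showing the step from w^{(0)} to w^{(1)}, labeled "too small".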

Page 20: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Gradient Descent

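$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$

$$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$

Plot: J versus w, showing the step from w^{(0)} to w^{(1)}, labeled "too large".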

Page 21: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Gradient Descent

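$$J(\mathbf{w}^{(0)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(0)}, \mathbf{x}_i)$$

$$\mathbf{w}^{(1)} = \mathbf{w}^{(0)} - \alpha \frac{dJ(\mathbf{w}^{(0)})}{d\mathbf{w}}$$

Plot: J versus w, showing the step from w^{(0)} to w^{(1)}, labeled "good enough".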

Page 22: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Gradient Descent

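$$J(\mathbf{w}^{(1)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(1)}, \mathbf{x}_i)$$

$$\mathbf{w}^{(2)} = \mathbf{w}^{(1)} - \alpha \frac{dJ(\mathbf{w}^{(1)})}{d\mathbf{w}}$$

Plot: J versus w, with w^{(1)} and w^{(2)} marked.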

Page 23: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Gradient Descent

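$$J(\mathbf{w}^{(2)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(2)}, \mathbf{x}_i)$$

$$\mathbf{w}^{(3)} = \mathbf{w}^{(2)} - \alpha \frac{dJ(\mathbf{w}^{(2)})}{d\mathbf{w}}$$

Plot: J versus w, with w^{(2)} and w^{(3)} marked.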

Page 24: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Gradient Descent

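$$J(\mathbf{w}^{(3)}) = \sum_{i=1}^{N} \mathrm{cost}(\mathbf{w}^{(3)}, \mathbf{x}_i)$$

$$\mathbf{w}^{(4)} = \mathbf{w}^{(3)} - \alpha \frac{dJ(\mathbf{w}^{(3)})}{d\mathbf{w}}$$

Plot: J versus w, with w^{(3)} and w^{(4)} marked.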
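The update rule built up over the preceding slides can be tried on a toy problem. The sketch below assumes a scalar weight and a squared-error cost per sample, cost(w, x_i) = (w - x_i)^2, which the slides do not specify; the three learning rates are picked only to reproduce the "too small", "too large", and "good enough" cases.

```python
def total_cost(w, xs):
    """J(w) = sum_i cost(w, x_i), with an assumed squared-error cost."""
    return sum((w - x) ** 2 for x in xs)

def gradient(w, xs):
    """dJ/dw for the squared-error cost above."""
    return sum(2.0 * (w - x) for x in xs)

def gradient_descent(w0, xs, alpha, steps):
    """Repeat w(k+1) = w(k) - alpha * dJ(w(k))/dw and keep every iterate."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - alpha * gradient(w, xs)
        history.append(w)
    return history

xs = [1.0, 2.0, 3.0]   # toy data; J is minimized at the mean, w = 2.0
rates = {"too small": 0.001, "too large": 0.4, "good enough": 0.1}

results = {
    label: gradient_descent(w0=10.0, xs=xs, alpha=alpha, steps=20)
    for label, alpha in rates.items()
}

for label, history in results.items():
    w_final = history[-1]
    print(f"{label}: alpha={rates[label]}, w after 20 steps={w_final:.4g}, "
          f"J={total_cost(w_final, xs):.4g}")
```

With the small rate the weight barely moves in 20 steps, with the large rate each step overshoots and the iterates move away from the minimum, and with the middle rate the weight settles near 2.0.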

Page 25: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Transfer learning via fine-tuning

• First few layers are usually very similar within a domain

• Last layers are task specific

• Take a trained model and fine-tune it for a particular task

http://vision.stanford.edu/Datasets/collage_s.png

https://www.kaggle.com/c/dogs-vs-cats

http://adas.cvc.uab.es/task-cv2016/papers/0026.pdf


Page 26: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Fine-tuning steps

• Install Intel-Optimized Caffe (or your favorite framework)
  • https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture
• Download a pre-trained model
  • http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel
• Modify the training model (next slide)

Page 27: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Fine-tuning: ILSVRC -> DogsVsCats

Original network (trained on ILSVRC):

layer {
  name: "data"
  type: "Data"
  data_param {
    source: "ilsvrc12_train_lmdb"
    ...
  }
  ...
}
...
layer {
  name: "fc8"
  type: "InnerProduct"
  inner_product_param {
    num_output: 1000
    ...
  }
}

Fine-tuned network (DogsVsCats):

layer {
  name: "data"
  type: "Data"
  data_param {
    source: "dogs_cats_train_lmdb"
    ...
  }
  ...
}
...
layer {
  name: "fc8-ft"
  type: "InnerProduct"
  inner_product_param {
    num_output: 2
    ...
  }
}

# From the command line
caffe train -solver solver.prototxt -weights trainedModel.caffemodel

Page 28: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Fine-tuning guidelines

• Freeze all but the last layer (fine-tune more layers if the new dataset is very different)
  • Set lr_mult=0 in the local learning rates of the frozen layers
  • Earlier layer weights won't change very much
• Drop the initial learning rate (in the solver.prototxt) by 10x

Replace the 1000-unit layer with a 2-unit layer

Train the (4096 + 1) x 2 weights

http://www.mdpi.com/remotesensing/remotesensing-07-14680/article_deploy/html/images/remotesensing-07-14680-g002-1024.png
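The effect of these guidelines can be illustrated with a small sketch. In Caffe, a layer's effective learning rate is the solver's base learning rate multiplied by the layer's lr_mult, so lr_mult=0 leaves a layer's weights untouched. The layer names, toy weights, toy gradients, and the original base learning rate of 0.01 below are assumptions for illustration only.

```python
ORIGINAL_BASE_LR = 0.01                      # assumed value of the original solver
FINE_TUNE_BASE_LR = ORIGINAL_BASE_LR / 10    # dropped by 10x for fine-tuning

def trainable_params(inputs, outputs):
    """Weights plus one bias per output unit of a fully connected layer."""
    return (inputs + 1) * outputs

def sgd_step(weights, grads, base_lr, lr_mult):
    """Plain SGD with a per-layer multiplier: lr = base_lr * lr_mult."""
    return [w - base_lr * lr_mult * g for w, g in zip(weights, grads)]

# Toy two-layer model: an early layer that is frozen and the new last layer.
layers = {
    "conv1": {"weights": [0.5, -0.2], "lr_mult": 0.0},   # frozen
    "fc8-ft": {"weights": [0.1, 0.3], "lr_mult": 1.0},   # trained
}
grads = {"conv1": [1.0, 1.0], "fc8-ft": [1.0, -1.0]}     # made-up gradients

updated = {
    name: sgd_step(layer["weights"], grads[name], FINE_TUNE_BASE_LR, layer["lr_mult"])
    for name, layer in layers.items()
}

old_fc8 = trainable_params(4096, 1000)   # original 1000-class layer
new_fc8 = trainable_params(4096, 2)      # replacement 2-class layer

print("conv1 after step:", updated["conv1"])
print("fc8-ft after step:", updated["fc8-ft"])
print(f"fc8 parameters: {old_fc8:,} -> {new_fc8:,}")
```

Only the 8,194 parameters of the new layer are trained here, compared with 4,097,000 in the layer it replaces, which is why fine-tuning the last layer is fast.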


Page 29: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Demo

• Fine-tune a trained model to classify dogs vs. cats

http://vision.stanford.edu/Datasets/collage_s.png

https://www.kaggle.com/c/dogs-vs-cats


Page 30: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)


Juan Carlos Riveiro: CEO and Cofounder


Page 31: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Hello. We're Vilynx, the video personalization company.

We select the relevant content targeted to individual needs, solving the content discovery problem.

How? By building the biggest dataset for video deep learning: auto-tagging selected video scenes in real time and leveraging the web and social media to continuously update the tags.

Benefit? Increased views, time spent watching videos, and in-video search.

Markets: Media, Smart Phones, Drones, Security, Robots, Smart Cities.

Page 32: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Outstanding Tech Team: Experienced and Very Successful

Juan Carlos Riveiro, CEO
More than 100 patents in Signal Processing, Data statistics/algorithms and Machine Learning. Founder and CEO of Gigle Networks (acquired by Broadcom), CTO & VP of R&D at DS2 (acquired by Marvell).

Elisenda Bou, CTO
PhD from UPC and MIT and expert on Machine Learning and Complex SW Architectures. Worked on adaptive satellite control systems and recipient of the 2013 Google Faculty Research Awards.

José Cordero Rama
MS for Deep Learning at UPC/BSC. Data Scientist at King, Bdigital and Gen-Med.

Joan Capdevila, PhD
MS and PhD for Machine Learning at Georgia Tech and UPC/BSC. Data Scientist at AIS and Accenture.

Jordi Pont-Tuset, PhD
PostDoc on Machine Learning at ETH Zurich. PhD on Image Segmentation at UPC. Disney Research.

Asier Aduriz
Computer Science and Telecom Engineering degree at UPC (Top 1% of class). Engineer at CERN.

Dèlia Fernàndez
MS on Deep Learning at Columbia University. Signal Processing Researcher at Northeastern University. Data Scientist at InnoTech.

David Varas, PhD
PhD for Video Object Tracking at UPC. Adjunct Professor on Computer Vision & Statistical Signal Processing at UPC.

Page 33: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Vilynx: Indexing Visual Knowledge

• 8 cameras/car
• Smart Cities
• Connecting Everything
• VR/AR Changing Everything
• A camera at every corner in London
• Drones everywhere (Amazon)
• More than 1000 hours of video uploaded to the internet every minute

How is all this visual content going to be indexed?

Just like the internet before Google

Page 34: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

The Vilynx Knowledge Graph

The average vocabulary of a 5-year-old is 5000 words
• 4800 words/concepts
• 1.8 tags per video
• 8M videos

The average vocabulary of an adult is 30,000 words
• 2M words/concepts
• 50 tags per video
• 10M videos

Page 35: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

First Market driven by Video Content Producers

Media companies need content personalization to drive audience through multiple channels

Page 37: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Vilynx Products

Inputs: Videos; Audience Data; Contextual Data (Social Networks, YouTube, Web)

Outputs: Key 5 sec clips; Intelligent Auto Tagging

Applications: Video Thumbnails; Social Sharing; Recommendations; Video Search; Ad Market
• Better video discovery
• Native Ad integration
• Programmatic Ad matching
• More video views and longer engagement times
• VOD & Live Events
• Drive branding
• Amplification with keyword recommendation
• Drive click-through rates
• Better user experience

Page 38: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Vilynx | Workflow

1. We ingest customer videos and the contextual information around them.

2. We then take cues from around the Web and social networks.

3. This combined input is fed to the most advanced convolutional deep neural network in the industry.

4. The outputs are video previews optimized to engage your audience and rich metadata that can further drive your video content.

Machine Learning or Deep Learning
• 98% accuracy in finding the relevant parts of the video
• CTR increase of 50% to 500% (customer validated)

Page 39: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Advancing Deep Learning Networks: Move from simple classification to indexing all visual content

A training data set of video moments that includes:
• 10M (and growing) tagged 5 sec video moments (ImageNet for video has only 4000 moments)
• 2M contextual tags (and growing)
• A continuously updated training set of new tags from crawling social media and the web
• Real-time unsupervised training of the network to autonomously learn and identify new patterns

Page 40: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Demo Results

• Fine-tune dogs vs cats classifier results

http://vision.stanford.edu/Datasets/collage_s.png

https://www.kaggle.com/c/dogs-vs-cats


Page 41: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Call to action

• Use Intel Optimized Frameworks for workloads

• https://github.com/intel/caffe

• https://github.com/dmlc/mxnet

• https://github.com/intel/theano

• https://github.com/intel/torch

• other frameworks coming soon…

• Deep learning tutorial

• https://software.intel.com/en-us/articles/training-and-deploying-deep-learning-networks-with-caffe-optimized-for-intel-architecture

• Distributed training of deep networks on AWS

• https://software.intel.com/en-us/articles/distributed-training-of-deep-networks-on-amazon-web-services-aws

Page 42: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Legal Notices & Disclaimers

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

Intel, the Intel logo, Pentium, Celeron, Atom, Core, Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

© 2016 Intel Corporation.

Page 43: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Thank you!

(huge) contributions from:

Joseph Spisak, Elisenda Bou, Hendrik Van der Meer, Zhenlin Luo, Ravi Panchumarthy, Ryan Saffores, Niv Sundaram, and many more.

Page 44: AWS re:Invent 2016: Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS (MAC205)

Remember to complete your evaluations!