
Page 1: PAKDD2016 Tutorial DLIF: Introduction and Basics

Tutorial: Deep Learning Implementations and Frameworks

Seiya Tokui*, Kenta Oono*, Atsunori Kanemura+, Toshihiro Kamishima+

*Preferred Networks, Inc. (PFN): {tokui,oono}@preferred.jp

+National Institute of Advanced Industrial Science and Technology (AIST): [email protected], [email protected]

2016-04-19, DLIF Tutorial @ PAKDD2016

Page 2: PAKDD2016 Tutorial DLIF: Introduction and Basics

Introduction

Atsunori Kanemura, AIST, Japan


Page 3: PAKDD2016 Tutorial DLIF: Introduction and Basics

Objective

•  Get into deep learning research and practice

•  1) Learn the building blocks that are common to most deep learning frameworks
   –  Review the key technologies

•  2) Understand the differences between the various implementations
   –  How specific DL frameworks differ
   –  Useful for deciding which framework to start with

•  Not about coding know-how (although coding examples will be given)


Page 4: PAKDD2016 Tutorial DLIF: Introduction and Basics

Target audience

•  Want to use neural networks
•  Want to model neural network architectures for practical problems
•  Expected background:
   –  Basics of computer science and numerical computation
   –  General machine learning terminology (in particular around supervised learning)
   –  Basic knowledge of or practice with neural networks (recommended)
   –  Basic knowledge of the Python programming language (recommended)


Page 5: PAKDD2016 Tutorial DLIF: Introduction and Basics

Overview

•  1st session (8:30 – 10:00)
   –  Introduction (AK)
   –  Basics of neural networks (AK)
   –  Common design of neural network implementations (KO)

•  2nd session (10:30 – 12:30)
   –  Differences between deep learning frameworks (ST)
   –  Coding examples of frameworks (KO & ST)
   –  Conclusion (ST)


Page 6: PAKDD2016 Tutorial DLIF: Introduction and Basics

Frameworks to be (and not to be) explained

•  Explained in depth, with coding examples
   –  Chainer (Python)
   –  Keras (Python)
   –  TensorFlow (Python)

•  Also compared
   –  Torch.nn (Lua)
   –  Theano (Python)
   –  Caffe (C++, Python, MATLAB)
   –  MXNet (many languages)
   –  autograd (Python, Lua)

•  Others, not explained
   –  Cloud computing, MATLAB toolboxes, DL4J, H2O, CNTK
   –  Wrappers: Lasagne, Blocks, skflow
   –  TensorBoard, DIGITS (mentioned by name only)


Page 7: PAKDD2016 Tutorial DLIF: Introduction and Basics

Basics of Neural Networks

Atsunori Kanemura, AIST, Japan


Page 8: PAKDD2016 Tutorial DLIF: Introduction and Basics

Artificial neural networks

•  Biologically inspired
   –  A biological neuron is a nonlinear unit connected through synapses at the dendrites (input) and the axon (output)

•  A building block for pattern recognition systems (and more)


Page 9: PAKDD2016 Tutorial DLIF: Introduction and Basics

Why neural networks?

•  Superior performance
   –  Image recognition
      •  ImageNet Large Scale Visual Recognition Challenge (ILSVRC): exceeds human-level performance
   –  Playing games
      •  AlphaGo: has defeated human experts

•  Extended to other problems
   –  Images and text
      •  Show & Tell: generates text from images via intermediate representations ("embeddings")
   –  Learning artistic styles
   –  Many others (translation, speech recognition, …)


Page 10: PAKDD2016 Tutorial DLIF: Introduction and Basics

Technical internals of NNs

•  Layered processing: a linear transformation (i.e., matrix multiplication or affine transformation) followed by a nonlinear operation (the activation function)

•  Adapts to data


Page 11: PAKDD2016 Tutorial DLIF: Introduction and Basics

Mathematical model of a neuron

•  Compare the product of the input and the weights (parameters) with a threshold
   –  Plasticity of the neuron = change of the parameters (w and b)

[Figure: a neuron with inputs x1, …, xD, weights w, threshold b, a summation unit Σ, and a nonlinear transform f producing the output y]

$$ y = f\Big(\sum_{d=1}^{D} w_d x_d - b\Big) = f(\mathbf{w}^\top \mathbf{x} - b) $$

(f: nonlinear transform)
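To make the formula concrete, here is a minimal NumPy sketch of a single neuron; the logistic (sigmoid) choice of f and the toy values of x, w, and b are illustrative assumptions, not something specified on the slide.

```python
import numpy as np

def f(a):
    # Logistic (sigmoid) nonlinearity used as the activation f
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, b):
    # y = f(w^T x - b): weighted sum of the inputs compared against the threshold b
    return f(np.dot(w, x) - b)

x = np.array([0.5, -1.2, 3.0])   # D = 3 inputs
w = np.array([0.8, 0.1, -0.4])   # weights (parameters)
b = 0.2                          # threshold
print(neuron(x, w, b))
```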

Page 12: PAKDD2016 Tutorial DLIF: Introduction and Basics

Generalized linear discriminant

•  Generalized linear discriminant
   –  f(·): nonlinear transformation
   –  ⇒ Logistic (classical), probit, etc.


$$ y = f\Big(\sum_{d=1}^{D} w_d x_d - b\Big) = f(\mathbf{w}^\top \mathbf{x} - b), \qquad
y_n = \begin{cases} 1 & (x_n \text{ is positive}) \\ 0 & (x_n \text{ is negative}) \end{cases} $$
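A small sketch of how the logistic choice of f turns the unit into a binary discriminant; the 0.5 decision threshold, the helper name predict_label, and the toy weights are illustrative assumptions.

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict_label(x, w, b):
    # f(w^T x - b) lies in (0, 1); threshold it at 0.5 to get the binary label y_n
    return 1 if logistic(np.dot(w, x) - b) >= 0.5 else 0

w = np.array([1.0, -2.0])
b = 0.0
print(predict_label(np.array([3.0, 1.0]), w, b))   # positive side of the hyperplane -> 1
print(predict_label(np.array([-1.0, 2.0]), w, b))  # negative side -> 0
```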

Page 13: PAKDD2016 Tutorial DLIF: Introduction and Basics

Learning with loss minimization

•  Learn from many samples
•  Binary output
•  Define the loss function
•  Minimize J to learn (estimate) the parameters


Training samples: $\{x_n, y^*_n\}_{n=1}^{N}$, with binary targets
$$ y^*_n = \begin{cases} 1 & (x_n \text{ is positive}) \\ 0 & (x_n \text{ is negative}) \end{cases} $$

Loss function (squared error):
$$ J(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big(f(\mathbf{w}^\top \mathbf{x}_n) - y^*_n\big)^2 $$

Parameter estimation:
$$ \mathbf{w}^* = \operatorname*{argmin}_{\mathbf{w}} J(\mathbf{w}) $$
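A minimal NumPy sketch of the squared-error loss above over a tiny synthetic dataset; the logistic f, the omission of the bias b, and the toy data are assumptions made for brevity.

```python
import numpy as np

def f(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss(w, X, y_star):
    # J(w) = 1/2 * sum_n (f(w^T x_n) - y*_n)^2
    pred = f(X.dot(w))                 # predictions, shape (N,)
    return 0.5 * np.sum((pred - y_star) ** 2)

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.5, 0.5]])  # N = 3 samples, D = 2
y_star = np.array([1.0, 1.0, 0.0])                    # binary targets
w = np.zeros(2)
print(loss(w, X, y_star))
```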

Page 14: PAKDD2016 Tutorial DLIF: Introduction and Basics

Neural networks

•  Multi-layered
•  Minimize the loss to learn the parameters


$$ \mathbf{y}^1 = f_1(W^{10}\mathbf{x}), \quad
\mathbf{y}^2 = f_2(W^{21}\mathbf{y}^1), \quad
\mathbf{y}^3 = f_3(W^{32}\mathbf{y}^2), \quad \ldots, \quad
\mathbf{y}^L = f_L(W^{L(L-1)}\mathbf{y}^{L-1}) $$

※ f works element-wise

$$ J(\{W\}) = \frac{1}{2}\sum_{n=1}^{N} \big(\mathbf{y}^L(\mathbf{x}_n) - y^*_n\big)^2 $$
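A minimal NumPy sketch of the layered forward computation y^l = f_l(W^{l(l-1)} y^{l-1}); the layer sizes, the shared sigmoid activation, and the random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    # Applied element-wise, as noted above
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.RandomState(0)
# Weight matrices W^{10}, W^{21}, W^{32} for layer sizes 4 -> 5 -> 3 -> 1
Ws = [rng.randn(5, 4), rng.randn(3, 5), rng.randn(1, 3)]

def forward(x, Ws):
    y = x
    for W in Ws:
        y = sigmoid(W.dot(y))  # y^l = f_l(W^{l(l-1)} y^{l-1})
    return y

x = rng.randn(4)
print(forward(x, Ws))
```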

Page 15: PAKDD2016 Tutorial DLIF: Introduction and Basics

Gradient descent

•  The gradient of the loss for the 1-layer model (with the logistic f, for which f′ = f(1 − f)) is:

•  The update rule:


$$ \nabla_{\mathbf{w}} J(\mathbf{w})
 = \frac{1}{2}\sum_{n=1}^{N} \nabla_{\mathbf{w}} \big(f(\mathbf{w}^\top\mathbf{x}_n) - y^*_n\big)^2
 = \sum_{n=1}^{N} \big(f(\mathbf{w}^\top\mathbf{x}_n) - y^*_n\big)\,\nabla_{\mathbf{w}} f(\mathbf{w}^\top\mathbf{x}_n)
 = \sum_{n=1}^{N} \big(f(\mathbf{w}^\top\mathbf{x}_n) - y^*_n\big)\,f(\mathbf{w}^\top\mathbf{x}_n)\big(1 - f(\mathbf{w}^\top\mathbf{x}_n)\big)\,\mathbf{x}_n $$

Update rule (r is a constant learning rate):
$$ \mathbf{w} \leftarrow \mathbf{w} - r\,\nabla_{\mathbf{w}} J(\mathbf{w})
 = \mathbf{w} - r\sum_{n=1}^{N} h(\mathbf{x}_n, \mathbf{w})\,\mathbf{x}_n, \qquad
 h(\mathbf{x}_n, \mathbf{w}) \overset{\mathrm{def}}{=} \big(f(\mathbf{w}^\top\mathbf{x}_n) - y^*_n\big)\,f(\mathbf{w}^\top\mathbf{x}_n)\big(1 - f(\mathbf{w}^\top\mathbf{x}_n)\big) $$
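A minimal NumPy sketch of the full-batch update rule above for the 1-layer logistic model; the learning rate, iteration count, and toy data are arbitrary assumptions.

```python
import numpy as np

def f(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad_J(w, X, y_star):
    # sum_n (f(w^T x_n) - y*_n) * f(w^T x_n) * (1 - f(w^T x_n)) * x_n
    p = f(X.dot(w))
    h = (p - y_star) * p * (1.0 - p)
    return X.T.dot(h)

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.5, 0.5], [-2.0, -1.0]])
y_star = np.array([1.0, 1.0, 0.0, 0.0])
w = np.zeros(2)
r = 0.5                                   # constant learning rate
for _ in range(100):
    w = w - r * grad_J(w, X, y_star)      # w <- w - r * grad J(w)
print(w)
```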

Page 16: PAKDD2016 Tutorial DLIF: Introduction and Basics

Backprop

•  Use the chain rule to derive the gradient
•  E.g., the 2-layer case
   –  ⇒ Calculate the gradient recursively from the top layer down to the bottom layer
•  Cf. vanishing gradients, ReLU

2-layer model:
$$ \mathbf{y}^1_n = f(W^{10}\mathbf{x}_n), \qquad y^2_n = f(\mathbf{w}^{21} \cdot \mathbf{y}^1_n) $$

Loss:
$$ J(W^{10}, \mathbf{w}^{21}) = \frac{1}{2}\sum_{n} \big(y^2_n - y^*_n\big)^2 $$

Chain rule for the bottom-layer weights:
$$ \frac{\partial J}{\partial W^{10}_{kl}} = \sum_{n,i} \frac{\partial J}{\partial y^1_{ni}} \, \frac{\partial y^1_{ni}}{\partial W^{10}_{kl}} $$
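A minimal NumPy sketch of the 2-layer chain rule above, propagating the error from the top layer down to obtain dJ/dW10; the sigmoid activation, the tiny shapes, and the random data are illustrative assumptions.

```python
import numpy as np

def f(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.RandomState(0)
N, D, H = 4, 3, 5                 # samples, input dim, hidden units
X = rng.randn(N, D)
y_star = rng.randint(0, 2, size=N).astype(float)
W10 = rng.randn(H, D)
w21 = rng.randn(H)

# Forward pass
y1 = f(X.dot(W10.T))              # (N, H): y1_n = f(W10 x_n)
y2 = f(y1.dot(w21))               # (N,):   y2_n = f(w21 . y1_n)

# Backward pass (chain rule, top layer first)
delta2 = (y2 - y_star) * y2 * (1 - y2)             # error at the top pre-activation
grad_w21 = y1.T.dot(delta2)                        # dJ/dw21
delta1 = (delta2[:, None] * w21) * y1 * (1 - y1)   # error at the bottom pre-activation
grad_W10 = delta1.T.dot(X)                         # dJ/dW10, shape (H, D)
print(grad_W10)
```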

Page 17: PAKDD2016 Tutorial DLIF: Introduction and Basics

Automatic Differentiation

•  The math for backprop is straightforward (but tedious) once the NN architecture has been defined

•  The gradients can be computed automatically after the NN model is defined

•  This is called automatic differentiation (which is a general concept that makes use of the chain rule)
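As a concrete illustration, here is a small sketch using Chainer (one of the frameworks covered later). The model and values are arbitrary assumptions; the point is only that calling backward() fills in the gradients via the chain rule, with no hand-derived math. Treat it as a sketch of the idea rather than the tutorial's own code.

```python
import numpy as np
from chainer import Variable
import chainer.functions as F

# A single logistic unit written with Chainer Variables
x = Variable(np.array([0.5, -1.2, 3.0], dtype=np.float32))  # input
w = Variable(np.array([0.8, 0.1, -0.4], dtype=np.float32))  # parameters
t = Variable(np.array(1.0, dtype=np.float32))                # target

y = F.sigmoid(F.sum(w * x))          # forward pass: y = f(w^T x)
loss = 0.5 * (y - t) * (y - t)       # squared-error loss
loss.backward()                      # automatic differentiation via the chain rule
print(w.grad)                        # dJ/dw, no hand-derived gradient needed
```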


Page 18: PAKDD2016 Tutorial DLIF: Introduction and Basics

Parameter update

•  Gradient Descent (GD)

•  Stochastic Gradient Descent (SGD)
   –  Take several samples (say, 128) from the dataset (a mini-batch) and estimate the gradient from them (see the sketch below)
   –  Theoretically motivated as the Robbins-Monro algorithm

•  From SGD to general gradient-based algorithms
   –  Adam, AdaGrad, etc.
   –  Use momentum and other techniques


GD update (repeated from the previous slide):
$$ \mathbf{w} \leftarrow \mathbf{w} - r\,\nabla_{\mathbf{w}} J(\mathbf{w})
 = \mathbf{w} - r\sum_{n=1}^{N} h(\mathbf{x}_n, \mathbf{w})\,\mathbf{x}_n, \qquad
 h(\mathbf{x}_n, \mathbf{w}) \overset{\mathrm{def}}{=} \big(f(\mathbf{w}^\top\mathbf{x}_n) - y^*_n\big)\,f(\mathbf{w}^\top\mathbf{x}_n)\big(1 - f(\mathbf{w}^\top\mathbf{x}_n)\big) $$
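A minimal NumPy sketch of the mini-batch variant: at each step a random subset of the data is used to estimate the gradient. The batch size of 2 (instead of a typical 128), the toy data, and the learning rate are illustrative assumptions.

```python
import numpy as np

def f(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad_J(w, X, y_star):
    # Same per-sample gradient as before, summed over the given (mini-)batch
    p = f(X.dot(w))
    h = (p - y_star) * p * (1.0 - p)
    return X.T.dot(h)

rng = np.random.RandomState(0)
X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.5, 0.5], [-2.0, -1.0]])
y_star = np.array([1.0, 1.0, 0.0, 0.0])
w = np.zeros(2)
r = 0.5
batch_size = 2
for _ in range(200):
    idx = rng.choice(len(X), size=batch_size, replace=False)  # draw a mini-batch
    w = w - r * grad_J(w, X[idx], y_star[idx])                # SGD update
print(w)
```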

Page 19: PAKDD2016 Tutorial DLIF: Introduction and Basics

Overfitting and generalization error

•  The goal of learning is to decrease the generalization error, which is the error on previously unseen data

•  Having a low error on the data at hand is not enough (it can even be harmful)
   –  We can achieve 0% error simply by memorizing all the examples in the training data
   –  Complicated models (i.e., NNs with many parameters and layers) can achieve this (if the learning algorithm is clever enough)


Page 20: PAKDD2016 Tutorial DLIF: Introduction and Basics

Training procedure

•  Avoid overfitting
•  Split the data into two parts (see the sketch below)
   –  Training dataset
      •  We optimize the parameters using this training dataset
   –  Validation dataset
      •  We evaluate the performance of the learned NN on this validation dataset

•  Optional: test error
   –  If you want to estimate the generalization error, split the data three ways and use the last part, the test dataset, to measure the generalization error


[Figure: the available data is split into a training part and a validation part]
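A minimal NumPy sketch of the two-way split: shuffle the available data, then hold out a fraction for validation. The 80/20 ratio and the random synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 5)           # available data: 100 samples, 5 features
y = rng.randint(0, 2, size=100)

perm = rng.permutation(len(X))  # shuffle before splitting
n_train = int(0.8 * len(X))     # e.g. 80% training / 20% validation
train_idx, valid_idx = perm[:n_train], perm[n_train:]

X_train, y_train = X[train_idx], y[train_idx]   # used to optimize the parameters
X_valid, y_valid = X[valid_idx], y[valid_idx]   # used only to evaluate the learned NN
print(X_train.shape, X_valid.shape)
```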

Page 21: PAKDD2016 Tutorial DLIF: Introduction and Basics

Extra topics implemented by most frameworks

•  Weight initialization
   –  Random
   –  Pretraining
   –  Transfer from another trained network

•  Techniques for avoiding overfitting
   –  Dropout
   –  Batch normalization
   –  ResNet

•  Convolution

•  Visualization
   –  Deconvolution

Page 22: PAKDD2016 Tutorial DLIF: Introduction and Basics

Summary of this part

•  Neural networks are computational models that stack neurons, i.e., non-linear computational units

•  The gradients of the loss w.r.t. the parameters are calculated recursively from the top layer to the bottom layer by backprop

•  Care must be taken to avoid overfitting by following validation procedures
