Towards Chainer v1.5
10/14 Chainer meetup @ PFI/PFN
Seiya Tokui (Preferred Networks)
Development history
• 6/12: v1.0
  – Basics of Variable/Function, FunctionSet & Optimizer, CUDA support
• 7/7: v1.1
  – Caffe reference model, type checking (forward/backward), Py3 support
• 8/19: v1.2
  – Many functions added, collect_parameters deprecated, type checking on backward removed
• 9/2: v1.3
  – CuPy, functions module reorganized
CuPy
• CUDA array implementation with a NumPy-subset API
• Custom elementwise and reduction kernels are still supported (with broadcasting)
• No dependence on PyCUDA or scikits.cuda
  – Cf. the sudden renaming of scikits.cuda to scikit-cuda
• NumPy API coverage is still incomplete
• Most operations are not yet supported at the Function/Variable level
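The point of the NumPy-subset API is that array code can be written once against a module alias and run on either backend. A minimal sketch of the idea, shown here with NumPy (with a GPU one would bind the alias to CuPy instead; the function name `softplus` is just an illustration):

```python
import numpy

# Write array code against a module alias; swap in CuPy when a GPU is
# available. The function body is identical for both backends.
xp = numpy  # with a GPU: xp = cupy

def softplus(x):
    # Uses only NumPy-subset operations; the scalar 1 broadcasts
    # against the array exactly as in NumPy.
    return xp.log(1 + xp.exp(x))

x = xp.array([0.0, 1.0, -1.0])
y = softplus(x)
```

This drop-in style is why incomplete NumPy API coverage matters: any call outside the implemented subset breaks the symmetry between the two backends.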
Development history
• 10/28: v1.4 (planned, delayed)
  – Some functions added?
The cause of the delay
• New model structure (#363)
• I've been working on this since the release of v1.3
• The design is unexpectedly difficult to get right
  – Still in the design phase
  – I'm planning to release this feature in v1.5
Objective
• Replacement of FunctionSet/Optimizer
• Goals:
  – Provide a solid way of sharing and reusing (sub)network definitions
  – Avoid the "to_cpu/to_gpu trap" between FunctionSet and Optimizer
  – Portable save/load
  – Make all functions pure, for more flexibility and reusability
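The "pure function" goal can be sketched in plain Python (hypothetical names, not the actual Chainer API): the function takes everything it needs, including parameters, as arguments, while a separate object bundles the parameters for convenience.

```python
# Pure function: the output depends only on the arguments, so the same
# function can be reused with any parameter set.
def linear(x, W, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

class Linear:
    """Bundles parameters and delegates to the pure function."""
    def __init__(self, W, b):
        self.W, self.b = W, b

    def __call__(self, x):
        return linear(x, self.W, self.b)

layer = Linear(W=[[1.0, 0.0], [0.0, 2.0]], b=[0.5, -0.5])
y = layer([3.0, 4.0])  # equivalent to linear([3.0, 4.0], layer.W, layer.b)
```

Keeping the computation in a pure function means the parameter-holding object adds no hidden state to the math, which is what makes the definitions shareable and reusable.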
Solution (current idea)
• Hierarchy of network definitions
• Example:
  – An autoencoder uses an encoder network and a decoder network
  – Each of these networks might be an MLP, a ConvNet, etc.
  – An MLP consists of several fully-connected layers
  – Each fully-connected layer defines a simple operation on the input variable
• Call each component a chain
• Modeling in Chainer becomes linking several chains into one big chain
Terminology
• Link
  – A minimal component of a chain (e.g. Linear, Convolution2D, etc.)
  – A "parameterized function" in previous versions
  – It combines parameter variables with input variables to compute the output variables
• Chain, ChainList
  – Composition of child chains (including links)
  – Chain manages its children by a dictionary, while ChainList manages them by a list
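A toy sketch of the containment idea (these are deliberately minimal stand-ins, not the real Chainer classes): a Chain addresses its children by name, a ChainList by position.

```python
class Link:
    """Minimal component; parameter arrays would live here."""
    def __init__(self, **params):
        self.params = params

class Chain:
    """Manages child chains/links by a dictionary of names."""
    def __init__(self, **children):
        self.children = dict(children)

    def __getattr__(self, name):
        # Expose named children as attributes (e.g. model.predictor).
        return self.children[name]

class ChainList:
    """Manages child chains/links by a list, addressed by index."""
    def __init__(self, *children):
        self.children = list(children)

    def __getitem__(self, i):
        return self.children[i]

# An MLP as an ordered list of Linear-like links; a classifier as a
# Chain owning the MLP under the name "predictor".
mlp = ChainList(Link(W=1), Link(W=2), Link(W=3))
classifier = Chain(predictor=mlp)
```

The dictionary/list split matches the two common cases: named submodules that code refers to explicitly, and homogeneous stacks of layers that are iterated in order.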
Schematic of Link/Chain
[Diagram: example of a classifier with a multi-layer perceptron. A Classifier chain holds an MLP chain named "predictor"; the MLP contains three Linear links (layer1, layer2, layer3). Input x and target t flow through the predictor and a loss function to produce loss. The legend distinguishes Link, Chain, and Function.]
Schematic of Link/Chain
Example of Variational AutoEncoder
[Diagram: a VariationalAutoEncoder chain holds an encoder chain and a decoder chain, each an MLP of Linear links. Input x goes through the encoder to the latent variable z, and the decoder reconstructs from z; the kld and nll terms are summed into loss.]
Define by Run
• Note that these diagrams do not mean the computational graph must be fixed at the definition of the chains
  – The graph is dynamically constructed during the forward computation (define-by-run)
• A chain might implement multiple methods that construct different graphs
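A toy illustration of define-by-run (not the Chainer internals): the "graph" is simply whatever operations actually execute during forward, so two methods on the same object yield two different graphs.

```python
class Tape:
    """Records operations as they run; the record *is* the graph."""
    def __init__(self):
        self.ops = []

    def apply(self, name, fn, *xs):
        self.ops.append(name)  # graph node created at execution time
        return fn(*xs)

class Net:
    # Two methods on the same "chain" build two different graphs.
    def forward_small(self, tape, x):
        return tape.apply('square', lambda v: v * v, x)

    def forward_big(self, tape, x):
        h = tape.apply('square', lambda v: v * v, x)
        return tape.apply('add1', lambda v: v + 1, h)

net = Net()
t1, t2 = Tape(), Tape()
y1 = net.forward_small(t1, 3)
y2 = net.forward_big(t2, 3)
```

Because the graph is a trace of the actual execution, control flow (loops, branches, even recursion) can shape it per call, which is the defining property of define-by-run.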
Example (gist: https://goo.gl/JKQgSy)
• The user can freely design the predictor chain.
• The user can freely design the encoder/decoder chains.
Planned features of Link/Chain/ChainList
• The hierarchy is directly mapped to the HDF5 format on serialization
  – Only the parameters and auxiliary variables (computed during learning) are saved
• Helper methods to traverse the hierarchy
  – Iterate over all subchains in the hierarchy
  – Iterate over all parameter variables in the hierarchy
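A sketch of what such a traversal helper might look like (hypothetical structure: here a chain is modeled as a plain dict mapping names to parameter values or to child chains, and paths use HDF5-style '/' separators):

```python
def iter_params(chain, prefix=''):
    """Yield (path, value) for every parameter in the hierarchy,
    depth-first, building '/'-separated paths as HDF5 would."""
    for name, child in sorted(chain.items()):
        path = prefix + '/' + name
        if isinstance(child, dict):
            yield from iter_params(child, path)  # recurse into subchain
        else:
            yield path, child

model = {
    'predictor': {
        'layer1': {'W': 1.0, 'b': 0.1},
        'layer2': {'W': 2.0, 'b': 0.2},
    },
}
paths = dict(iter_params(model))
```

Because the paths mirror the containment hierarchy, the same walk serves both serialization (each parameter gets a stable dataset name) and bulk operations such as moving every parameter to a GPU.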
New Optimizer
• Optimizer is also updated
• Optimizer will be aware of its target chain
  – It tracks the migration of the target chain between CPUs and GPUs
• Optimizer is also serializable (in HDF5 format)
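The "aware of the target chain" idea can be sketched as follows (hypothetical API shape, not the real one): `setup()` remembers the target itself rather than copying parameters out of it, so the optimizer keeps working after the chain migrates between devices.

```python
class SGD:
    """Toy optimizer bound to a target chain of parameters."""
    def __init__(self, lr=0.1):
        self.lr = lr
        self.target = None

    def setup(self, target):
        # Remember the chain itself, not snapshots of its parameters.
        self.target = target

    def update(self):
        # Walk the target's parameters at update time, so any earlier
        # migration (e.g. CPU <-> GPU) is automatically reflected.
        for p in self.target:
            p['data'] -= self.lr * p['grad']

params = [{'data': 1.0, 'grad': 0.5}, {'data': -2.0, 'grad': -1.0}]
opt = SGD(lr=0.1)
opt.setup(params)
opt.update()
```

This removes the old FunctionSet/Optimizer trap, where calling to_cpu/to_gpu after setup left the optimizer pointing at stale parameter arrays.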
Parallel work: introduction of Cython
• CuPy drawback: the CPU-side manipulation is slow
• No single huge bottleneck: the causes of the slowdown are scattered
• The easiest point to fix: ctypes
  – ctypes is verrrrrrrrrrrry slow
  – Even querying the current device consumes non-negligible running time
  – @okuta san is working on replacing it with Cython
• Major impact on the Chainer package
  – The low-level interface will change
  – setup.py is drastically updated (a Cython extension requires Cython to build, while the package must remain installable in environments where Cython is not yet installed)
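The ctypes overhead is easy to observe directly. A rough illustration (POSIX assumption: the C library's `abs()` is resolvable in the current process via `CDLL(None)`):

```python
import ctypes
import timeit

# Bind C's abs() through ctypes (POSIX: symbols of the running process).
libc = ctypes.CDLL(None)
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

def py_abs(x):
    return -x if x < 0 else x

n = 100_000
t_ctypes = timeit.timeit(lambda: libc.abs(-3), number=n)
t_python = timeit.timeit(lambda: py_abs(-3), number=n)
# Each ctypes call pays for argument conversion and FFI dispatch, so
# t_ctypes is typically a multiple of t_python; in a driver-call-heavy
# loop (as in CuPy's CPU-side code) this adds up, which is what a
# Cython binding avoids by calling the C API directly.
```

The numbers vary by platform, which is why the slide stresses that the slowdown is scattered rather than one hot spot: every small driver query pays this per-call tax.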
Future work
• Lazy computation
  – See the VAE example: it computes all intermediate variables in the __call__ operator, while a user might want only some of them
  – Chainer currently computes eagerly, which causes unneeded computations
  – Avoiding unneeded computations is one of the easiest graph optimizations
  – More generally, I believe the future lies in a fusion of the symbolic and dynamic paradigms
• Symbolic optimization of computations on Variables (loop fusion, etc.)
• Variable tags (or annotations)
  – Cf. Blocks
• Learning process abstraction, data loading abstraction, etc.
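The lazy-computation idea above can be sketched with memoized thunks (a hypothetical design, using stand-in formulas for kld/nll): intermediate results become deferred values evaluated only when a caller actually asks for them.

```python
class Lazy:
    """A memoized thunk: computes its value at most once, on demand."""
    def __init__(self, fn):
        self.fn, self.done, self.value = fn, False, None

    def get(self):
        if not self.done:
            self.value, self.done = self.fn(), True
        return self.value

calls = []  # records which intermediate values were actually computed

def make_losses(x):
    # Stand-in formulas; in the VAE these would be real graph pieces.
    kld = Lazy(lambda: (calls.append('kld'), x * 0.1)[1])
    nll = Lazy(lambda: (calls.append('nll'), x * 0.9)[1])
    loss = Lazy(lambda: kld.get() + nll.get())
    return kld, nll, loss

kld, nll, loss = make_losses(10.0)
only_kld = kld.get()  # nll is never computed in this use
```

Asking for `loss.get()` would pull in both terms, while asking for `kld` alone skips `nll` entirely, which is exactly the "avoid unneeded computations" optimization described above.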