
Generating Sequences using Deep LSTMs & RNNs

Andre Pemmelaar @QuantixResearch Julia Tokyo Meetup - April 2015

About Me
Andre Pemmelaar
• 5 yrs Financial System Solutions
• 12 yrs Buy-Side Finance
• 7 yrs Japanese Gov’t Bond Options Market Maker (JGBs)
• 5 yrs Statistical Arbitrage (Global Equities)
• Low latency & quantitative algorithms
• Primarily use a mixture of basic statistics and machine learning (Java, F#, Python, R)
• Using Julia for most of my real work (90%) since July 2014
• Can be reached at @QuantixResearch

Why my interest in LSTMs & RNNs
• In my field, finance, so much of the work involves sequence models.
• Most deep learning models are not built for use with sequences. You have to jury-rig them to make it work.
• RNNs and LSTMs are specifically designed to work with sequence data.
• Sequence models can be combined with Reinforcement Learning to produce some very nice results (more on this and a demo later).
• They have begun producing amazing results:
  - Better initialization procedures
  - Use of Rectified Linear Units for RNNs and “Memory cells” in LSTMs

So what is a Recurrent Neural Network?

In a word … Feedback

What are Recurrent Neural Networks?
1. In their simplest form (RNNs), they are just Neural Networks with a feedback loop.
2. The previous time step’s hidden layer and final outputs are fed back into the network as part of the input to the next time step’s hidden layers (a minimal sketch of one step follows).
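
For concreteness, here is a minimal sketch of one RNN time step in Julia (written in current Julia syntax; this is an illustration of the feedback loop, not RecurrentNN.jl's API, and the weight names Wxh, Whh, Why are made up for the example):

    # Minimal sketch of one RNN time step (illustrative, not RecurrentNN.jl's API)
    function rnn_step(Wxh, Whh, Why, bh, by, x, h)
        h_new = tanh.(Wxh * x .+ Whh * h .+ bh)   # feedback: the previous h enters the new hidden state
        y     = Why * h_new .+ by                 # this time step's output
        return y, h_new
    end

    # Toy usage: 3 inputs, 5 hidden units, 2 outputs, a 3-step sequence
    function run_sequence()
        Wxh, Whh, Why = randn(5, 3), randn(5, 5), randn(2, 5)
        bh, by = zeros(5), zeros(2)
        h = zeros(5)
        ys = Vector{Float64}[]
        for x in (rand(3), rand(3), rand(3))
            y, h = rnn_step(Wxh, Whh, Why, bh, by, x, h)
            push!(ys, y)
        end
        return ys
    end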


Why Generate Sequences?
• To improve classification?
• To create synthetic training data?
• Practical tasks like speech synthesis?
• To simulate situations?
• To understand the data

[Several slides reproduced from “Generating Sequences with Recurrent Neural Networks” - Alex Graves]

Some great examples: Alex Graves
Formerly at University of Toronto, now part of the Google DeepMind team.
Has a great example of generating handwriting using an LSTM:
• 3 inputs: Δx, Δy, pen up/down
• 121 output units (see the sketch after this list)
• 20 two-dimensional Gaussians for x,y = 40 means (linear) + 40 std. devs (exp) + 20 correlations (tanh) + 20 weights (softmax)
• 1 sigmoid for up/down
• 3 hidden layers, 400 LSTM cells in each
• 3.6M weights total
• Trained with RMSprop, learn rate 0.0001, momentum 0.9
• Error clipped during backward pass (lots of numerical problems)
• Trained overnight on a fast multicore CPU
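
As a rough illustration of how those 121 output units decompose, here is a Julia sketch that simply follows the counts above; it is not Graves' code, and the variable names are made up:

    # Sketch: splitting 121 raw network outputs into mixture-density parameters,
    # following the counts above (illustrative only, not Graves' actual code).
    function split_outputs(z::Vector{Float64})
        @assert length(z) == 121
        K = 20                                  # number of bivariate Gaussians
        mu    = reshape(z[1:40], 2, K)          # 40 means (linear)
        sigma = exp.(reshape(z[41:80], 2, K))   # 40 std devs (exp keeps them positive)
        rho   = tanh.(z[81:100])                # 20 correlations in (-1, 1)
        w     = exp.(z[101:120]); w ./= sum(w)  # 20 mixture weights (softmax)
        e     = 1 / (1 + exp(-z[121]))          # 1 sigmoid for pen up/down
        return mu, sigma, rho, w, e
    end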

Handwriting demo: http://www.cs.toronto.edu/~graves/handwriting.html

Some great examples: Andrej Karpathy
Now at Stanford University.
Has a great example of generating characters using an LSTM:
• 51 inputs (unique characters)
• 2 hidden layers, 20 LSTM cells in each
• Trained with RMSprop, learn rate 0.0001, momentum 0.9
• Error clipped during backward pass

Character generation demo: http://cs.stanford.edu/people/karpathy/recurrentjs/

Some great examples: @hardmaru (Tokyo, Japan)
Has a great example of an RNN + Reinforcement Learning on the pole-balancing task:
• Uses a recurrent neural network
• Uses genetic algorithms to train the network
• The demo is doing the inverted double pendulum balancing task, which I suspect is quite hard even for humans
• All done in JavaScript, which makes for some great demos

Pole balancing demo: http://otoro.net/ml/pendulum-esp-mobile/index.html

RecurrentNN.jl

RecurrentNN.jl
• My first public package (Yay!!)
• Based on Andrej Karpathy’s implementation in recurrentjs
• https://github.com/Andy-P/RecurrentNN.jl
• Implements both Recurrent Neural Networks and Long Short-Term Memory (LSTM) networks
• Allows one to compose arbitrary network architectures using graph.jl
• Makes use of RMSProp (a variant of stochastic gradient descent)

graph.jl
• Has functionality to construct arbitrary expression graphs over which the library can perform automatic differentiation
• Similar to what you may find in Theano for Python, or in Torch
• Basic idea is to allow the user to compose neural networks, then call backprop() and have it all work with the solver
• https://github.com/Andy-P/RecurrentNN/src/graph.jl

    type Graph
        backprop::Array{Function,1}
        doBackprop::Bool
        function Graph(backPropNeeded::Bool)
            new(Array(Function,0), backPropNeeded)
        end
    end

    function sigmoid(g::Graph, m::NNMatrix)
        …
        if g.doBackprop
            push!(g.backprop,
                function ()
                    …
                    @inbounds m.dw[i,j] += out.w[i,j] * (1. - out.w[i,j]) * out.dw[i,j]
                end )
        end
        return out
    end

graph.jl
During the forward pass we build up an array of anonymous functions that calculate each of the gradients.

graph.jl (continued)

    # use built up graph of backprop functions
    # to compute backprop (set .dw fields in matrices)
    for i = length(g.backprop):-1:1
        g.backprop[i]()
    end

Then we loop backwards through the array, calling each of the functions to propagate the gradients back through the network.
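
To make the mechanism concrete, here is a small standalone Julia sketch of the same pattern written against plain numbers rather than NNMatrix; it illustrates the idea and is not the package's code:

    # Standalone sketch of the graph.jl pattern: each forward-pass op pushes a
    # closure that adds its local gradient; the backward pass replays them in reverse.
    backprop = Function[]
    x, w = 0.7, 2.0
    grads = Dict{Symbol,Float64}(:w => 0.0, :p => 0.0, :y => 1.0)   # seed dL/dy = 1

    # forward: p = w * x
    p = w * x
    push!(backprop, function ()          # multiply backward
        grads[:w] += x * grads[:p]
    end)

    # forward: y = sigmoid(p)
    y = 1 / (1 + exp(-p))
    push!(backprop, function ()          # sigmoid backward
        grads[:p] += y * (1 - y) * grads[:y]
    end)

    # backward: replay in reverse, exactly like the loop above
    for i = length(backprop):-1:1
        backprop[i]()
    end
    grads[:w]                            # == x * y * (1 - y)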

solver.jl

    function step(solver::Solver, model::Model, …)
        …
        for k = 1:length(modelMatices)
            @inbounds m = modelMatices[k]   # mat ref
            @inbounds s = solver.stepcache[k]
            for i = 1:m.n
                for j = 1:m.d

                    # rmsprop adaptive learning rate
                    @inbounds mdwi = m.dw[i,j]
                    @inbounds s.w[i,j] = s.w[i,j] * solver.decayrate + (1.0 - solver.decayrate) * mdwi^2

                    # gradient clip
                    …

                    # update and regularize
                    @inbounds m.w[i,j] += - stepsize * mdwi / sqrt(s.w[i,j] + solver.smootheps) - regc * m.w[i,j]
                end
            end
        end
        …
    end

Now that we have calculated each of the gradients, we can call the solver to loop through and update each of the weights based on the gradients we stored during the backprop pass

RMSProp uses an adaptive learning rate for each individual parameter
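
Written out for a single weight, the RMSProp rule in step() boils down to the lines below; the decay, step size, and epsilon values here are just typical defaults for illustration, not necessarily the package's:

    # RMSProp for one scalar weight: its own running average of squared gradients
    # gives it, in effect, its own learning rate.
    decayrate, stepsize, smootheps, regc = 0.999, 0.01, 1e-8, 1e-6
    w, dw, cache = 0.5, 0.3, 0.0
    cache = cache * decayrate + (1 - decayrate) * dw^2           # per-parameter cache
    w += -stepsize * dw / sqrt(cache + smootheps) - regc * w     # update + regularize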

solver.jl
Examples of RMSProp vs other optimization algorithms: http://imgur.com/a/Hqolp

example.jl
• Based on I. Sutskever et al., “Generating Text with Recurrent Neural Networks”, ICML 2011
• Closely follows Andrej Karpathy’s example
• Reads in about 1,400 English sentences from Paul Graham’s essays on what makes a successful start-up
• Learns to predict the next character from the previous character
• Uses perplexity for the cost function (see the sketch after this list)
• Takes about 8-12 hrs to get a good model (need to anneal the learning rate)
• letter embedding = 6, hidden units = 100 (note: example default is set to 5 & [20,20])
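
As a rough illustration of the perplexity cost (using the base-2 convention that recurrentjs uses; not necessarily the package's exact implementation):

    # Perplexity over the probabilities the model assigned to the characters
    # that actually came next; worse predictions => higher perplexity.
    function perplexity(probs::Vector{Float64})
        return 2.0 ^ (-sum(log2, probs) / length(probs))
    end

    perplexity([0.25, 0.5, 0.125])   # == 4.0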

Sample output - 1 hr
• be bet sroud thir an
• the to be startups dalle a boticast that co thas as tame goudtent wist
• the dase mede dosle on astasing sandiry if the the op
• that the dor slous seof the pos to they wame mace thas theming obs and secofcagires morlillers dure t
• you i it stark to fon'te nallof the they coulker imn to suof imas to ge thas int thals le withe the t

Sample output - 5 hrs
• you dire prefor reple take stane to of conwe that there cimh the don't than high breads them one gro
• but startups you month
• work of have not end a will araing thec sow about startup maunost matate thinkij the show that's but
• you dire prefor reple take stane to of conwe that there cimh the don't than high breads them one gro
• but cashe the sowe the mont pecipest fitlid just
• Argmax: it's the startups the the seem the startups the the seem the startups the the seem the startups the

Sample output - 10 hrs
• and if will be dismiss we can all they have to be a demo every looking
• you stall the right take to grow fast, you won't back
• new rectionally not a lot of that the initial single of optimizing money you don't prosperity don't pl
• when you she have to probably as one there are on the startup ideas week
• the startup need of to a company is the doesn't raise in startups who confident is that doesn't usual

What’s not yet so great about this package?


Garbage Collection
• Tried to keep close to the original implementation to make regression testing easier
• Karpathy’s version frequently uses JS’ push to build arrays of matrices
• This is appropriate in JavaScript but creates a lot of GC in Julia
• The likely fix is to create the arrays only once and then update them in place on each pass (version 0.2!) - see the sketch below
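
A sketch of the preallocation idea, assuming the fix is simply to allocate the work matrices once and overwrite them on every pass; randn! stands in for whatever the real forward pass computes, and none of this is the package's v0.2 code:

    using Random   # for randn!

    # per-pass allocation: a new matrix every step => lots of garbage to collect
    function forward_alloc(n)
        outs = Matrix{Float64}[]
        for t = 1:n
            push!(outs, randn(100, 100))
        end
        return outs
    end

    # preallocate once, then reuse the same storage on every pass
    function forward_inplace!(outs)
        for t = 1:length(outs)
            randn!(outs[t])          # overwrite in place, no new allocation
        end
        return outs
    end

    outs = [zeros(100, 100) for _ = 1:50]   # created once
    forward_inplace!(outs)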

Model Types
• Models need some kind of interface that the solver can call to get the collection of matrices
• At the moment that is implemented in the collectNNMat() function
• Could be tightened up by making this part of the initialization of the models (a hypothetical sketch follows)
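
A hypothetical sketch of what the tightened-up version might look like (current Julia syntax; the type and field names are assumptions, not RecurrentNN.jl's actual API): each model collects its parameter matrices once at construction, so the solver just reads a field instead of calling collectNNMat().

    # Hypothetical model interface sketch (names are assumptions)
    abstract type Model end

    struct SimpleRNN <: Model
        Wxh::Matrix{Float64}
        Whh::Matrix{Float64}
        params::Vector{Matrix{Float64}}     # gathered once, at construction
        function SimpleRNN(nin, nhidden)
            Wxh, Whh = randn(nhidden, nin), randn(nhidden, nhidden)
            new(Wxh, Whh, [Wxh, Whh])
        end
    end

    params(m::Model) = m.params             # what the solver would call

    m = SimpleRNN(10, 20)
    length(params(m))                       # == 2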

Thank you!

Andre Pemmelaar @QuantixResearch Julia Tokyo Meetup - April 2015