creative ai & multimodality: looking ahead

Creative AI & multimodality: looking ahead

Roelof Pieters (@graphific)

Imperial College London, 1 Dec 2015

http://artificialexperience.com/

http://www.csc.kth.se/~roelof/

Creative AI

AI

I kinda expect the audience to know AI & Machine Learning. Let’s move on, shall we?

All references link to:

• arXiv, or

• GitXiv if the “code” or “dataset” is available

GitXiv: Collaborative Open Computer Science, more info (Medium)

https://medium.com/@samim/gitxiv-collaborative-open-computer-science-e5fea734cd45

AI > today’s focus

“Deep learning is a set of algorithms in machine learning that attempt to learn in multiple levels, corresponding to different levels of abstraction.”

AI > today’s focus

Multimodality

The use of several modes (media) to create a single artifact.

“Mode”: a socially and culturally shaped resource for making meaning. (Gunther Kress)

Creativity


• Many definitions: philosophical, sociological, historical, practical

Creativity

1. “Remix”: making unfamiliar combinations of familiar ideas.

2. “Exploration”: exploring a structured conceptual space.

3. “Transformation”: (radically) transforming one’s structured conceptual space.

(Margaret Boden, “The Creative Mind”)

Creativity > software traits

“Traits” that software has to exhibit in order to avoid easy criticism of being “non-creative” (Simon Colton):

• Skill

• Appreciation

• Imagination

• Learning

• Innovation

• Accountability

• Subjectivity

• Intentionality


Creative AI

Creative AI > Current possibilities

• Appropriating “standard” nets for creative use

• Reinforcement Learning: Creativity as a Game

• RNNs/LSTMs/GRUs

• Sequence-to-Sequence: Creativity as a Translation Task

• Auto-Encoders

• Attention-based Models

• Generative Adversarial Nets

Creative AI > Current possibilities > Appropriating “standard” nets for creative use

Deep Dream

see also: http://www.csc.kth.se/~roelof/deepdream/

Roelof Pieters, 2015 (code, YouTube)

https://github.com/graphific/DeepDreamVideo

https://www.youtube.com/watch?v=oyxSerkkP4o

C.M. Kosemen & Roelof Pieters, 2015 (Gizmodo)

http://gizmodo.com/this-human-artist-is-making-hauting-paintings-with-goog-1716597566

Creative AI > Current possibilities > Appropriating “standard” nets for creative use

Style Net

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, 2015. A Neural Algorithm of Artistic Style (GitXiv)

http://gitxiv.com/posts/jG46ukGod8R7Rdtud/a-neural-algorithm-of-artistic-style
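The style representation in the Gatys et al. paper is built from correlations between feature maps (Gram matrices) of a convolutional network. Below is a minimal NumPy sketch of that one ingredient, not the full optimisation loop. It assumes a feature tensor of shape (channels, height, width). The function names are illustrative and are not taken from any released code.

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel correlations of a (C, H, W) feature map."""
    channels, height, width = features.shape
    flat = features.reshape(channels, height * width)
    return flat @ flat.T


def style_loss(generated, style):
    """Squared Gram difference, scaled by 1 / (4 * N^2 * M^2) as in the paper."""
    channels, height, width = generated.shape
    diff = gram_matrix(generated) - gram_matrix(style)
    return float(np.sum(diff ** 2) / (4.0 * channels ** 2 * (height * width) ** 2))
```

Because the Gram matrix sums over spatial positions, it keeps which features co-occur but discards where they occur, which is why it captures “style” rather than layout.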

Gene Kogan, 2015. Why is a Raven Like a Writing Desk? (vimeo)

https://vimeo.com/139123754







Creative AI > Current possibilities > Reinforcement Learning

• AMN: Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov, 2015. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning (arxiv)

http://arxiv.org/abs/1511.06342

• DQN: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis, 2015. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.

Creative AI > Current possibilities > Reinforcement Learning

Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, Raul Vicente, 2015. Multiagent Cooperation and Competition with Deep Reinforcement Learning (GitXiv) (YouTube)

http://gitxiv.com/posts/ZgYBWCDtWQZDZcidJ/multiagent-cooperation-and-competition-with-deep

https://www.youtube.com/watch?v=nn6_GUVDnVw&list=PLfLv_F3r0TwyaZPe50OOUx8tRf0HwdR_u&index=1

Reinforcement Learning

Ning Xie, Hirotaka Hachiya, Masashi Sugiyama, 2013. Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting (Paper, Lecture, YouTube)

http://www.ms.k.u-tokyo.ac.jp/2013/ArtistAgent.pdf

http://techtalks.tv/talks/artist-agent-a-reinforcement-learning-approach-to-automatic-stroke-generation-in-oriental-ink-painting/57470/

https://www.youtube.com/watch?v=f4T2eiS55qA







Creative AI > Current possibilities > Auto-Encoders




• Standard (“denoising”) Autoencoders

• Variational Autoencoder (VAE) / Stochastic Gradient VB

• Deep Convolutional Inverse Graphics Network

• Variational RNN (VRNN)

Vincent et al., 2010. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion (paper) (code)

http://www.jmlr.org/papers/volume11/vincent10a/vincent10a.pdf

http://deeplearning.net/tutorial/SdA.html


• Variational Autoencoder (VAE) / Stochastic Gradient VB

Diederik P. Kingma, Max Welling, 2013. Auto-Encoding Variational Bayes (GitXiv)

http://gitxiv.com/posts/HLWGLLZALt8ZCLp6m/auto-encoding-variational-bayes
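Two pieces do most of the work in Auto-Encoding Variational Bayes: the reparameterisation trick, which samples a latent code in a way that stays differentiable, and the closed-form KL term that pulls the latent distribution toward a standard normal prior. Below is a minimal NumPy sketch of just those two pieces. It assumes a diagonal Gaussian encoder that outputs `mu` and `log_var`. The encoder and decoder networks are left out, and the function names are illustrative.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)
```

The training objective adds this KL term to a reconstruction loss computed on the decoder output for `z`.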



• Deep Convolutional Inverse Graphics Network (modified VAE)

Tejas D. Kulkarni, Will Whitney, Pushmeet Kohli, Joshua B. Tenenbaum, 2015. Deep Convolutional Inverse Graphics Network (GitXiv)

http://gitxiv.com/posts/mEcXHof7Aw4ofufmk/deep-convolutional-inverse-graphics-network



• Variational RNN (VRNN) (VAE at every time step)

Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio, 2015. A Recurrent Latent Variable Model for Sequential Data (GitXiv) (Audio Samples)

http://gitxiv.com/posts/G4pqRmwMGELtdHdSb/variational-recurrent-neural-network

https://github.com/kastnerkyle/vrnn-samples







Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra, 2015. DRAW: A Recurrent Neural Network For Image Generation (GitXiv) (YouTube)

[Figure panels: Variational Auto-Encoder; Deep Recurrent Attentive Writer (DRAW) Network]

http://gitxiv.com/posts/caZHv4PNrGMZAh6tY/draw-deep-recurrent-attentive-writer-network

https://www.youtube.com/watch?v=Zt-7MI9eKEo




Creative AI > Current possibilities > Generative Adversarial Nets

Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus, 2015. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks (GitXiv)

http://gitxiv.com/posts/D4vqTMemCfR2P6d6C/eyescream-deep-generative-image-models-laplacian-pyramid-of

Alec Radford, Luke Metz, Soumith Chintala, 2015. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (GitXiv)

http://gitxiv.com/posts/pNosr4Zrgn8uPEi9R/unsupervised-representation-learning-with-deep-convolutional

• “turn” vector created from four averaged samples of faces looking left vs. looking right

• walking through the manifold

• top: unmodified samples; bottom: same samples dropping out “window” filters
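The “turn” vector and the manifold walk both reduce to simple arithmetic on latent codes. Below is a minimal NumPy sketch under stated assumptions: the latent size of 100 and the random codes are stand-ins for the z vectors of real generated faces, and the function names are illustrative. A trained generator, not shown here, would turn each row of `path` into an image.

```python
import numpy as np


def direction_vector(z_a, z_b):
    """Difference between the mean latent codes of two groups of samples."""
    return np.mean(z_a, axis=0) - np.mean(z_b, axis=0)


def walk(z_start, z_end, steps):
    """Linear interpolation between two latent codes, endpoints included."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - alphas) * z_start + alphas * z_end


rng = np.random.default_rng(0)
z_dim = 100  # assumed latent size
z_left = rng.uniform(-1, 1, size=(4, z_dim))   # stand-ins: four faces looking left
z_right = rng.uniform(-1, 1, size=(4, z_dim))  # stand-ins: four faces looking right

turn = direction_vector(z_right, z_left)
z_face = rng.uniform(-1, 1, size=z_dim)
path = walk(z_face, z_face + turn, steps=8)    # one latent code per frame
```

Averaging several samples per group before subtracting is what makes the direction stable; a single pair of codes would also carry all their unrelated differences.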

Creativity?

[Diagram axes: Autonomy, Supervision; Process, Result]

• unsupervised training

• generator/discriminator

• latent/z space

• auto encoders

• multimodality

• query

• target/class

Creative AI > Needs as I see it

Creative AI as a “tool” or “brush” to paint with

A system which marries the need for a creative process with the need for a creative output:

• with as little human input as possible (data)

• with its own style

• with the possibility of human-level supervision for rapid experimentation

Creative AI > a “brush”



Creative AI > a “brush” > data

• reuse nets as much as possible

• combining unsupervised & supervised

• multiple modalities

• plug in external knowledge bases

Creative AI > a “brush” > data input

• unlabeled & labeled data

• external knowledge bases (dbpedia, wikipedia)

• one-shot learning

• zero-shot learning

Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng, 2013. Zero-Shot Learning Through Cross-Modal Transfer (slides)

A zero-shot model that can predict both seen and unseen classes.

http://www.slideshare.net/roelofp/zero-shot-learning-through-crossmodal-transfer




• with the possibility of human-level supervision for rapid experimentation


• “rich” latent (“z”) space

• easy user supervision over output:

• priors

• constrain network (units, layers, etc)

• guided input

• mixed input

• latent space

Creative AI > a “brush” > data

Deep Dream

Alexander Mordvintsev, Christopher Olah, Mike Tyka, 2015. Inceptionism: Going Deeper into Neural Networks (Google Research Blog)

http://googleresearch.blogspot.se/2015/06/inceptionism-going-deeper-into-neural.html


Roelof Pieters, 2015. DeepDream - Class visualization Experiment (link)

http://www.csc.kth.se/~roelof/deepdream/visclasses.html






Constrain Layers

Roelof Pieters, 2015. DeepDream - Overview of standard bvlc googlenet (inception) layers (link)



Constrain Units

Roelof Pieters, 2015. Single Unit Activations (early layer) (Flickr Album)

https://www.flickr.com/photos/134901469@N05/albums/72157657250972188






Roelof Pieters, 2015. DeepDream Video (GitHub)

https://github.com/graphific/DeepDreamVideo





Creative AI > a “brush” > data

Style Net

Roelof Pieters (graphific) (tweet)

https://twitter.com/graphific/status/638348974152355840

Roelof Pieters (graphific) (tweet)

https://twitter.com/graphific/status/638349474541211648





Image -> Text

“A person riding a motorcycle on a dirt road.” ???

“Two hockey players are fighting over the puck.” ???

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (arxiv) (info) (code)

http://kelvinxu.github.io/projects/capgen.html

https://github.com/kelvinxu/arctic-captions

Andrej Karpathy, Li Fei-Fei, 2015. Deep Visual-Semantic Alignments for Generating Image Descriptions (pdf) (info) (code)

http://cs.stanford.edu/people/karpathy/cvpr2015.pdf

http://cs.stanford.edu/people/karpathy/deepimagesent/

https://github.com/karpathy/neuraltalk2

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, 2015. Show and Tell: A Neural Image Caption Generator (arxiv)


Text -> Image

“A stop sign is flying in blue skies.”

“A herd of elephants flying in the blue skies.”

Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov, 2015. Generating Images from Captions with Attention (arxiv) (examples)

http://www.cs.toronto.edu/~emansim/cap2im.html

Video -> Text

Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko, 2015. Sequence to Sequence -- Video to Text (GitXiv)

http://gitxiv.com/posts/uSMDejbFhduzEDoLs/sequence-to-sequence-video-to-text





Creative AI > a “brush” > rapid experimentation

Reusing Nets: Bigger Net

Widening, Deepening

Tianqi Chen, Ian Goodfellow, Jonathon Shlens, 2015. Net2Net: Accelerating Learning via Knowledge Transfer (arxiv) / code (torch)


https://github.com/soumith/net2net.torch
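The “widening” operation can be illustrated on a single fully connected hidden layer. New units are copies of randomly chosen existing units, and the outgoing weights of each replicated unit are divided by its number of copies, so the wider net computes the same function as the original. This is a minimal NumPy sketch of that idea with illustrative names, not the Torch code linked above. The small noise the paper suggests for breaking symmetry between copies is left out.

```python
import numpy as np


def net2wider(w_in, b, w_out, new_width, rng):
    """Widen a hidden layer while preserving the network's function.

    w_in: (n_inputs, n_hidden), b: (n_hidden,), w_out: (n_hidden, n_outputs)
    """
    old_width = w_in.shape[1]
    # Existing units map to themselves; each extra unit copies a random old unit.
    extra = rng.integers(0, old_width, size=new_width - old_width)
    mapping = np.concatenate([np.arange(old_width), extra])
    counts = np.bincount(mapping, minlength=old_width)
    wider_in = w_in[:, mapping]
    wider_b = b[mapping]
    # Split each unit's outgoing weights evenly across its copies.
    wider_out = w_out[mapping, :] / counts[mapping][:, None]
    return wider_in, wider_b, wider_out


def forward(x, w_in, b, w_out):
    """One ReLU hidden layer followed by a linear output layer."""
    return np.maximum(x @ w_in + b, 0.0) @ w_out
```

Starting the bigger net from the function the smaller net already computes is what allows training to continue rather than restart from scratch.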

Reusing Nets: Smaller Net

Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio, 2014. FitNets: Hints for Thin Deep Nets (arxiv)

[Figure panels: Teacher and Student net; Hint training; Knowledge distillation. Result tables: SVHN Error, MNIST Error]
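The knowledge distillation stage trains the small student net to match the teacher’s softened output distribution as well as the true labels. Below is a minimal NumPy sketch of such a loss, covering only the distillation term. The hint-training stage of FitNets is not shown. The temperature and mixing weight are illustrative defaults, not values from the paper.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperatures give flatter distributions."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=3.0, weight=0.5):
    """Mix of cross-entropy to softened teacher outputs and to the hard labels."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_ce = -np.sum(soft_teacher * np.log(soft_student + 1e-12), axis=-1)
    hard_student = softmax(student_logits)
    hard_ce = -np.log(hard_student[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(weight * soft_ce + (1.0 - weight) * hard_ce))
```

The softened teacher outputs carry information about which wrong classes are “nearly right”, and the hard labels alone do not provide that.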


Shrinking Nets: Hashing

Hashed Net

Wenlin Chen, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, Yixin Chen, 2015. Compressing Neural Networks with the Hashing Trick (arxiv)
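The hashing trick stores only a small vector of shared parameters and maps every position (i, j) of the “virtual” weight matrix to one of them via a hash function. Below is a minimal sketch of that weight sharing. It uses `zlib.crc32` as a stand-in hash, which is an assumption made for illustration and not the hash used in the paper, and it omits the sign hash. All function names are hypothetical.

```python
import zlib

import numpy as np


def bucket_index(i, j, n_buckets, seed=0):
    """Deterministically map a weight position (i, j) to a shared bucket."""
    key = f"{seed}:{i}:{j}".encode()
    return zlib.crc32(key) % n_buckets


def virtual_weights(shared, n_in, n_out, seed=0):
    """Expand a small shared parameter vector into an (n_in, n_out) matrix."""
    idx = np.array([[bucket_index(i, j, shared.size, seed)
                     for j in range(n_out)] for i in range(n_in)])
    return shared[idx], idx


def shared_gradient(grad_virtual, idx, n_buckets):
    """Sum the gradients of all virtual weights that share a bucket."""
    return np.bincount(idx.ravel(), weights=grad_virtual.ravel(),
                       minlength=n_buckets)
```

Only `shared` needs to be stored and trained, so a 6 x 5 layer backed by 8 buckets keeps 8 numbers instead of 30.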


Shrinking Nets: Pruning, Quantization & Huffman coding

Song Han, Huizi Mao, William J. Dally, 2015. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (arxiv)
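The first two stages of this pipeline can be sketched in a few lines of NumPy: magnitude pruning, then weight sharing through 1-D k-means over the surviving weights. This is a simplified illustration under stated assumptions. It skips the retraining between stages and the final Huffman coding step, and the function names are made up for the example.

```python
import numpy as np


def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask


def quantize_shared(weights, mask, n_clusters, n_iter=20):
    """Cluster surviving weights with 1-D k-means (linearly spaced initial centroids)."""
    values = weights[mask]
    centroids = np.linspace(values.min(), values.max(), n_clusters)
    for _ in range(n_iter):
        assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = values[assign == k].mean()
    assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
    quantized = np.zeros_like(weights)
    quantized[mask] = centroids[assign]  # each weight becomes its cluster centroid
    return quantized, centroids
```

After quantization each surviving weight can be stored as a short cluster index plus a small codebook of centroids, which is where most of the size reduction comes from.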



• experiments need “tooling”, specialised design software to

• try things

• explore latent spaces (z-space)

• push the AI in the right direction

• be surprised by AI


human-machine collaboration


(YouTube, Paper)

https://www.youtube.com/watch?v=ob1y8mJ6rfk

http://meyumer.com/pdfs/SemanticEditing.pdf


(YouTube, Paper)

https://www.youtube.com/watch?v=7FQrJ6sScbk

http://www.meyumer.com/pdfs/PmAutoencoder.pdf


(Vimeo, Paper)

https://vimeo.com/33408708

http://graphics.stanford.edu/~lfyg/cds.pdf


• Advertising and marketing

• Architecture

• Crafts

• Design: product, graphic and fashion design

• Film, TV, video, radio and photography

• IT, software and computer services

• Publishing

• Museums, galleries and libraries

• Music, performing and visual arts

Questions?

Love letters? Existential dilemmas? Academic questions? Gifts?

Find me at: http://www.csc.kth.se/~roelof/
