creative ai & multimodality: looking ahead

Creative AI & multimodality : looking ahead Roelof Pieters @graphific Imperial College London, 1 Dec 2015 roelof@graph-technologies.com http://artificialexperience.com/ http://www.csc.kth.se/~roelof/

TRANSCRIPT

Page 1: Creative AI & multimodality: looking ahead

Creative AI & multimodality: looking ahead

Roelof Pieters

@graphific

Imperial College London, 1 Dec 2015

roelof@graph-technologies.com

http://artificialexperience.com/

http://www.csc.kth.se/~roelof/

Page 2: Creative AI & multimodality: looking ahead

Creative AI

Page 3: Creative AI & multimodality: looking ahead

AI

I kinda expect the audience to know AI & Machine Learning. Let’s move on, shall we?

Page 4: Creative AI & multimodality: looking ahead

AI

All references to:

- Arxiv, or

- GitXiv if the “code” or “dataset” is available

Collaborative Open Computer Science (more info: Medium)

Page 5: Creative AI & multimodality: looking ahead

AI > today’s focus

Page 6: Creative AI & multimodality: looking ahead

AI > today’s focus

Page 7: Creative AI & multimodality: looking ahead

“Deep learning is a set of algorithms in machine learning that attempt to learn in multiple levels, corresponding to different levels of abstraction.”

Page 8: Creative AI & multimodality: looking ahead

AI > today’s focus

Multimodality: use of several modes (media) to create a single artifact.

“Mode”: socially and culturally shaped resource for making meaning. (Gunther Kress)

Page 9: Creative AI & multimodality: looking ahead

Creativity

Page 10: Creative AI & multimodality: looking ahead

Creativity

• Many definitions: philosophical, sociological, historical, practical

Page 11: Creative AI & multimodality: looking ahead

Creativity

1. Making unfamiliar combinations of familiar ideas (“Remix”)

2. Exploring a structured conceptual space (“Exploration”)

3. (Radically) transforming one’s structured conceptual space (“Transformation”)

Margaret Boden, “The Creative Mind”

Page 12: Creative AI & multimodality: looking ahead

• Skill

• Appreciation

• Imagination

• Learning

• Innovation

• Accountability

• Subjectivity

• Intentionality

Creativity > “Traits” software has to exhibit in order to avoid easy criticism of being “non-creative”.

(Simon Colton)

Page 21: Creative AI & multimodality: looking ahead

Creative AI

Page 22: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities

• Appropriating “standard” nets for creative use

• Reinforcement Learning: Creativity as a Game

• RNNs/LSTMs/GRUs

• Sequence-to-Sequence: Creativity as a Translation Task

• Auto-Encoders

• Attention-based Models

• Generative Adversarial Nets

Page 24: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities > Appropriating “standard” nets for creative use Deep Dream

see also: www.csc.kth.se/~roelof/deepdream/

Page 25: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities > Appropriating “standard” nets for creative use Deep Dream

see also: www.csc.kth.se/~roelof/deepdream/ (code) (youtube)

Roelof Pieters, 2015

Page 26: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities > Appropriating “standard” nets for creative use Deep Dream

see also: www.csc.kth.se/~roelof/deepdream/

C. M. Kosemen & Roelof Pieters (2015) (Gizmodo)

Page 27: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities > Appropriating “standard” nets for creative use

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, 2015. A Neural Algorithm of Artistic Style (GitXiv)

Style Net
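
A minimal sketch of the style representation behind Style Net: the Gram matrix of a layer's feature maps, which records which channels fire together and discards where they fire. The toy feature values, the function names, and the unnormalised squared-error loss are illustrative assumptions; the paper's weighting and normalisation constants are left out.

```python
def gram_matrix(features):
    """features: list of channels, each a flat list of activations."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def style_loss(feats_a, feats_b):
    """Unnormalised squared difference between two Gram matrices."""
    ga, gb = gram_matrix(feats_a), gram_matrix(feats_b)
    n = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2 for i in range(n) for j in range(n))


# Two channels, two spatial positions (made-up numbers).
feats = [[1.0, 2.0], [0.0, 1.0]]
# Same activations with the spatial positions swapped.
shuffled = [[2.0, 1.0], [1.0, 0.0]]

gram = gram_matrix(feats)
loss_after_shuffle = style_loss(feats, shuffled)
```

Swapping spatial positions leaves the Gram matrix unchanged, which is why this statistic captures texture or "style" and not layout.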

Page 28: Creative AI & multimodality: looking ahead
Page 29: Creative AI & multimodality: looking ahead

Gene Kogan, 2015. Why is a Raven Like a Writing Desk? (vimeo)

Page 30: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities

• Appropriating “standard” nets for creative use

• Reinforcement Learning: Creativity as a Game

• RNNs/LSTMs/GRUs

• Sequence-to-Sequence: Creativity as a Translation Task

• Auto-Encoders

• Attention-based Models

• Generative Adversarial Nets

Page 31: Creative AI & multimodality: looking ahead
Page 32: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities > Reinforcement Learning

• AMN: Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov 2015, Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning (arxiv)

• DQN: Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A., Veness, Joel, Bellemare, Marc G., Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K., Ostrovski, Georg, Petersen, Stig, Beattie, Charles, Sadik, Amir, Antonoglou, Ioannis, King, Helen, Kumaran, Dharshan, Wierstra, Daan, Legg, Shane, and Hassabis, Demis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
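
A minimal sketch of the Q-learning update that DQN approximates with a deep network in place of a table. The state names, table layout and hyperparameters below are illustrative assumptions; experience replay and the target network used in the paper are not shown.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step; q maps state -> list of action values."""
    best_next = max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state, two-action table.
q_table = {"s0": [0.0, 0.0], "s1": [1.0, 0.0]}
new_value = q_update(q_table, "s0", 0, reward=1.0, next_state="s1",
                     alpha=0.5, gamma=0.9)
```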

Page 33: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities > Reinforcement Learning

Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, Raul Vicente, 2015. Multiagent Cooperation and Competition with Deep Reinforcement Learning (GitXiv)

(YouTube)

Page 34: Creative AI & multimodality: looking ahead

Reinforcement Learning

Ning Xie, Hirotaka Hachiya, Masashi Sugiyama, 2013. Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting (Paper, Lecture, YouTube)

Page 36: Creative AI & multimodality: looking ahead

Ning Xie, Hirotaka Hachiya, Masashi Sugiyama, 2013. Artist Agent: A Reinforcement Learning Approach to Automatic Stroke Generation in Oriental Ink Painting (Paper, Lecture, YouTube)

Page 37: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities

• Appropriating “standard” nets for creative use

• Reinforcement Learning: Creativity as a Game

• RNNs/LSTMs/GRUs

• Sequence-to-Sequence: Creativity as a Translation Task

• Auto-Encoders

• Attention-based Models

• Generative Adversarial Nets

Page 40: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities

• Standard (“denoising”) Autoencoders

• Variational Autoencoder (VAE) / Stochastic Gradient VB

• Deep Convolutional Inverse Graphics Network

• Variational RNN (VRNN)

Vincent et al., 2010. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion (paper) (code)

Page 41: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities

• Standard “denoising” Autoencoders

• Variational Autoencoder (VAE) / Stochastic Gradient VB

• Deep Convolutional Inverse Graphics Network

• Variational RNN (VRNN)

• Diederik P Kingma, Max Welling, 2013. Auto-Encoding Variational Bayes (GitXiv)
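
A minimal sketch of the two ingredients that make a VAE trainable with stochastic gradients: the reparameterised sample z = mu + sigma * eps and the closed-form KL term against a standard normal prior. The function names and the example numbers are illustrative assumptions; encoder, decoder and reconstruction loss are left out.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
kl_at_prior = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
kl_shifted = kl_to_standard_normal([1.0], [0.0])
```

The KL term is zero when the encoder output equals the prior and grows as the mean moves away from it.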

Page 42: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities

• Standard “denoising” Autoencoders

• Variational Autoencoder (VAE)

• Deep Convolutional Inverse Graphics Network (modified VAE)

• Variational RNN (VRNN)

Tejas D. Kulkarni, Will Whitney, Pushmeet Kohli, Joshua B. Tenenbaum, 2015. Deep Convolutional Inverse Graphics Network (GitXiv)

Page 43: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities

• Standard “denoising” Autoencoders

• Variational Autoencoder (VAE)

• Deep Convolutional Inverse Graphics Network

• Variational RNN (VRNN) (VAE at every time step)

Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio, 2015. A Recurrent Latent Variable Model for Sequential Data (GitXiv)

Page 44: Creative AI & multimodality: looking ahead

Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio, 2015. A Recurrent Latent Variable Model for Sequential Data (GitXiv) (Audio Samples)

Page 45: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities

• Appropriating “standard” nets for creative use

• Reinforcement Learning: Creativity as a Game

• RNNs/LSTMs/GRUs

• Sequence-to-Sequence: Creativity as a Translation Task

• Auto-Encoders

• Attention-based Models

• Generative Adversarial Nets

Page 46: Creative AI & multimodality: looking ahead

Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra, 2015. DRAW: A Recurrent Neural Network For Image Generation (GitXiv)

[Figure panels: Variational Auto-Encoder; Deep Recurrent Attentive Writer (DRAW) Network]

Page 48: Creative AI & multimodality: looking ahead

Creative AI > Current possibilities

• Appropriating “standard” nets for creative use

• Reinforcement Learning: Creativity as a Game

• RNNs/LSTMs/GRUs

• Sequence-to-Sequence: Creativity as a Translation Task

• Auto-Encoders

• Attention-based Models

• Generative Adversarial Nets

Page 49: Creative AI & multimodality: looking ahead

Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus, 2015. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks (GitXiv)

Page 50: Creative AI & multimodality: looking ahead

Alec Radford, Luke Metz, Soumith Chintala, 2015. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (GitXiv)

Page 51: Creative AI & multimodality: looking ahead

Alec Radford, Luke Metz, Soumith Chintala, 2015. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (GitXiv)

Page 52: Creative AI & multimodality: looking ahead

“turn” vector created from four averaged samples of faces looking left vs looking right.

Alec Radford, Luke Metz, Soumith Chintala, 2015. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (GitXiv)
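
A minimal sketch of how such a “turn” vector could be built: average the latent codes of each group, take the difference, and add a scaled copy of it to another latent code. The 2-dimensional codes, the group sizes and the function names are illustrative assumptions; real latent codes are much higher-dimensional, and the generator that maps them to images is not shown.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def turn_vector(z_left, z_right):
    """Difference between the mean latent codes of the two groups."""
    left, right = mean_vector(z_left), mean_vector(z_right)
    return [r - l for l, r in zip(left, right)]


def shift(z, direction, amount):
    """Move a latent code along a direction by the given amount."""
    return [zi + amount * di for zi, di in zip(z, direction)]


# Made-up latent codes for faces looking left and looking right.
z_left = [[0.0, 1.0], [0.0, 3.0]]
z_right = [[2.0, 1.0], [4.0, 3.0]]

turn = turn_vector(z_left, z_right)
moved = shift([1.0, 1.0], turn, 0.5)
```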

Page 53: Creative AI & multimodality: looking ahead

walking through the manifold
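
A minimal sketch of one way to walk through the manifold: linear interpolation between two latent codes, with each intermediate code then fed to the generator (not shown). The function name, the 2-dimensional codes and the step count are illustrative assumptions.

```python
def interpolate(z_start, z_end, steps):
    """Return `steps` latent codes evenly spaced from z_start to z_end.

    Both endpoints are included, so `steps` must be at least 2.
    """
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


path = interpolate([0.0, 0.0], [1.0, 2.0], steps=5)
```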

Page 54: Creative AI & multimodality: looking ahead

top: unmodified samples

bottom: same samples dropping out “window” filters

Page 55: Creative AI & multimodality: looking ahead

Autonomy Supervision

Creativity?

- unsupervised training

- generator/discriminator

- latent/z space

- auto encoders

- multimodality

- query

- target/class

Page 56: Creative AI & multimodality: looking ahead

Creativity?

Process Result

Page 57: Creative AI & multimodality: looking ahead

Creative AI > Needs as I see it

Creative AI as a “tool”

or “brush” to paint with

Page 58: Creative AI & multimodality: looking ahead

A system which marries the need for a creative process with the need for a creative output

• with as little human input as possible (data)

• with its own style

• with the possibility for human-level supervision for rapid experimentation

Creative AI > a “brush”

Page 59: Creative AI & multimodality: looking ahead

A system which marries the need for a creative process with the need for a creative output

• with as little human input as possible (data)

• with its own style

• with the possibility for human-level supervision for rapid experimentation

Creative AI > a “brush”

Page 60: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

• reuse nets as much as possible

• combining unsupervised & supervised

• multiple modalities

• plug in external knowledge bases

Page 61: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data input

• unlabeled & labeled data

• external knowledge bases (dbpedia, wikipedia)

• one-shot learning

• zero-shot learning

Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng, 2013. Zero-Shot Learning Through Cross-Modal Transfer

a zero-shot model that can predict both seen and unseen classes
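
A minimal sketch of the cross-modal idea: once an image has been mapped into the word-vector space, an unseen class can be predicted by picking the closest class word vector. Cosine similarity, the 2-dimensional vectors and the class names are illustrative assumptions standing in for the paper's own model, and the mapping from images into the word space is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_label(image_embedding, class_word_vectors):
    """Return the class whose word vector is most similar to the image."""
    return max(class_word_vectors,
               key=lambda name: cosine(image_embedding,
                                       class_word_vectors[name]))


# Hypothetical word vectors; "truck" has no training images at all.
word_vectors = {"cat": [1.0, 0.0], "truck": [0.0, 1.0]}
label_a = zero_shot_label([0.9, 0.2], word_vectors)
label_b = zero_shot_label([0.1, 0.8], word_vectors)
```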

Page 62: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data input

Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng, 2013. Zero-Shot Learning Through Cross-Modal Transfer

(slides)

Page 65: Creative AI & multimodality: looking ahead

A system which marries the need for a creative process with the need for a creative output

• with as little human input as possible (data)

• with its own style

• with the possibility for human-level supervision for rapid experimentation

Creative AI > a “brush”

Page 66: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

• “rich” latent (“z”) space

• easy user supervision over output:

• priors

• constrain network (units, layers, etc)

• guided input

• mixed input

• latent space

Page 68: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

Deep Dream

Alexander Mordvintsev, Christopher Olah, Mike Tyka, 2015. Inceptionism: Going Deeper into Neural Networks

Google Research Blog

Page 69: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

Deep Dream

Roelof Pieters, 2015. DeepDream - Class visualization Experiment (link)

Page 70: Creative AI & multimodality: looking ahead

Roelof Pieters, 2015. DeepDream - Class visualization Experiment (link)

Page 71: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

• “rich” latent (“z”) space

• easy user supervision over output:

• priors

• constrain network (units, layers, etc)

• guided input

• mixed input

• latent space

Page 72: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

Deep Dream

Roelof Pieters, 2015. DeepDream - Overview of standard bvlc googlenet (inception) layers (link)

Constrain Layers

Page 73: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

Deep Dream

Roelof Pieters, 2015. Single Unit Activations (early layer) (Flickr Album)

Constrain Units

Page 74: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

• “rich” latent (“z”) space

• easy user supervision over output:

• priors

• constrain network (units, layers, etc)

• guided input

• mixed input

• latent space

Page 75: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

Deep Dream

Roelof Pieters, 2015. DeepDream Video (GitHub)

Page 76: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

• “rich” latent (“z”) space

• easy user supervision over output:

• priors

• constrain network (units, layers, etc)

• guided input

• mixed input

• latent space

Page 77: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

Style Net

Roelof Pieters (graphific) (tweet) Roelof Pieters (graphific) (tweet)

Page 78: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > data

• “rich” latent (“z”) space

• easy user supervision over output:

• priors

• constrain network (units, layers, etc)

• guided input

• mixed input

• latent space

Page 79: Creative AI & multimodality: looking ahead

Image -> Text

“A person riding a motorcycle on a dirt road.”???

Page 80: Creative AI & multimodality: looking ahead

Image -> Text

“Two hockey players are fighting over the puck.”???

Page 81: Creative AI & multimodality: looking ahead

Image -> Text

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (arxiv) (info) (code)

Andrej Karpathy, Li Fei-Fei, 2015. Deep Visual-Semantic Alignments for Generating Image Descriptions (pdf) (info) (code)

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, 2015. Show and Tell: A Neural Image Caption Generator (arxiv)

Page 82: Creative AI & multimodality: looking ahead

Text -> Image

“A stop sign is flying in blue skies.”

“A herd of elephants flying in the blue skies.”

Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov, 2015. Generating Images from Captions with Attention (arxiv) (examples)

Page 83: Creative AI & multimodality: looking ahead

Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov, 2015. Generating Images from Captions with Attention (arxiv) (examples)

Text -> Image

Page 84: Creative AI & multimodality: looking ahead

Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko, 2015. Sequence to Sequence -- Video to Text (GitXiv)

Video -> Text

Page 85: Creative AI & multimodality: looking ahead

A system which marries the need for a creative process with the need for a creative output

• with as little human input as possible (data)

• with its own style

• with the possibility for human-level supervision for rapid experimentation

Creative AI > a “brush”

Page 86: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > rapid experimentation

Page 87: Creative AI & multimodality: looking ahead

Reusing Nets: Bigger Net

• Widening

• Deepening

Tianqi Chen, Ian Goodfellow, Jonathon Shlens, 2015. Net2Net: Accelerating Learning via Knowledge Transfer (arxiv) / code (torch)
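
A minimal sketch of function-preserving widening in the spirit of Net2Net: copy a hidden unit and split its outgoing weight between the two copies, so the wider net computes the same output as the original. The tiny two-layer ReLU net, the weights and the function names are illustrative assumptions; the small noise the paper adds to break symmetry is left out.

```python
def forward(x, w_in, w_out):
    """Two-layer net: ReLU hidden layer, linear scalar output."""
    hidden = [max(0.0, sum(xi * wi for xi, wi in zip(x, row)))
              for row in w_in]
    return sum(h * w for h, w in zip(hidden, w_out))


def widen(w_in, w_out, unit):
    """Duplicate hidden `unit` and halve its outgoing weight on both copies."""
    new_in = w_in + [list(w_in[unit])]
    new_out = list(w_out)
    new_out[unit] = w_out[unit] / 2.0
    new_out.append(w_out[unit] / 2.0)
    return new_in, new_out


w_in = [[1.0, -1.0], [0.5, 2.0]]   # one weight row per hidden unit
w_out = [2.0, -3.0]
x = [1.0, 0.5]

wide_in, wide_out = widen(w_in, w_out, unit=0)
before = forward(x, w_in, w_out)
after = forward(x, wide_in, wide_out)
```

Because the output is unchanged, training can continue from the wider net without losing what the smaller one had learned.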

Page 88: Creative AI & multimodality: looking ahead

Reusing Nets: Smaller Net

• Teacher and Student net

• Hint training

• Knowledge distillation

Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio, 2014. FitNets: Hints for Thin Deep Nets (arxiv)

[Tables: SVHN error, MNIST error]
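
A minimal sketch of the knowledge-distillation part of teacher/student training: soften the teacher's outputs with a temperature and score the student by cross-entropy against those soft targets. The logits, the temperature and the function names are illustrative assumptions; the hint training on intermediate layers is not shown.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer outputs."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature):
    """Cross-entropy of the student against the teacher's soft targets."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


teacher = [4.0, 1.0, 0.0]
hard = softmax(teacher, 1.0)
soft = softmax(teacher, 4.0)
loss_match = distillation_loss(teacher, teacher, 4.0)
loss_wrong = distillation_loss(teacher, [0.0, 1.0, 4.0], 4.0)
```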

Page 89: Creative AI & multimodality: looking ahead

Shrinking Nets: Hashing

Hashed Net

Wenlin Chen, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, Yixin Chen, 2015. Compressing Neural Networks with the Hashing Trick (arxiv)
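
A minimal sketch of the hashing trick for weight sharing: a large "virtual" weight matrix is never stored; each position (i, j) is hashed into a small list of real parameters. Using `zlib.crc32` as the hash, the seed format and the sizes below are illustrative assumptions, and the extra sign hash described in the paper is left out.

```python
import zlib


def bucket(i, j, n_buckets, seed=0):
    """Deterministically map matrix position (i, j) to a parameter index."""
    key = f"{seed}:{i}:{j}".encode()
    return zlib.crc32(key) % n_buckets


def virtual_weight(params, i, j):
    """Look up the shared parameter that stands in for weight (i, j)."""
    return params[bucket(i, j, len(params))]


params = [0.1, -0.2, 0.3, 0.4]   # only 4 real parameters
# A 5 x 6 virtual matrix (30 weights) backed by those 4 values.
matrix = [[virtual_weight(params, i, j) for j in range(6)] for i in range(5)]
```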

Page 90: Creative AI & multimodality: looking ahead

Shrinking Nets: Pruning, Quantization & Huffman coding

Song Han, Huizi Mao, William J. Dally, 2015. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (arxiv)
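
A minimal sketch of the first two stages: magnitude pruning, then replacing each surviving weight by the index of its nearest shared value. The threshold, the fixed centroids and the function names are illustrative assumptions; the paper learns the shared values and retrains between stages, and the Huffman coding stage is not shown.

```python
def prune(weights, threshold):
    """Zero out weights whose magnitude is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, centroids):
    """Map each non-zero weight to its nearest centroid index (None if pruned)."""
    codes = []
    for w in weights:
        if w == 0.0:
            codes.append(None)
        else:
            codes.append(min(range(len(centroids)),
                             key=lambda k: abs(w - centroids[k])))
    return codes


weights = [0.9, -0.05, 0.02, -1.1, 0.8]
pruned = prune(weights, 0.1)
codes = quantize(pruned, centroids=[-1.0, 1.0])
```

After these two steps only a short codebook and small integer indices need storing, which is what makes the final entropy-coding stage effective.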

Page 91: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > rapid experimentation

• experiments need “tooling”, specialised design software to:

• try things

• explore latent spaces (z-space)

• push the AI in the right direction

• be surprised by AI

Page 92: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > rapid experimentation

human-machine collaboration

Page 93: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > rapid experimentation

(YouTube, Paper)

Page 94: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > rapid experimentation

(YouTube, Paper)

Page 95: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > rapid experimentation

(Vimeo, Paper)

Page 96: Creative AI & multimodality: looking ahead

Creative AI > a “brush” > rapid experimentation

• Advertising and marketing

• Architecture

• Crafts

• Design: product, graphic and fashion design

• Film, TV, video, radio and photography

• IT, software and computer services

• Publishing

• Museums, galleries and libraries

• Music, performing and visual arts

Page 97: Creative AI & multimodality: looking ahead

Questions?

love letters? existential dilemmas? academic questions? gifts? find me at: www.csc.kth.se/~roelof/

roelof@graph-technologies.com