Deep Dreams

Notes on Google's DeepDream algorithm

Deep Dreams (with Caffe)

This notebook demonstrates how to use the Caffe neural network framework to produce the "dream" visuals shown in the Google Research blog post.

    It'll be interesting to see what imagery people are able to generate using the described technique. If you post images to Google+, Facebook, or Twitter, be sure to tag them with #deepdream so other researchers can check them out too.

Dependencies

This notebook is designed to have as few dependencies as possible:

Standard Python scientific stack: NumPy, SciPy, PIL, IPython. These libraries can also be installed as part of a scientific Python distribution such as Anaconda or Canopy.

    Caffe deep learning framework (installation instructions).

Google's protobuf library, which is used for Caffe model manipulation.
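For reference (my addition, not part of the original notebook), one way to pull in the Python-side dependencies is with pip; Pillow is the maintained fork of PIL, and Caffe itself must still be installed separately:

pip install numpy scipy Pillow ipython protobuf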

    In [33]:

# imports and basic notebook setup
import sys
sys.path.append("/usr/local/lib/python2.7/dist-packages")
from cStringIO import StringIO
import numpy as np
import scipy.ndimage as nd
import PIL.Image
from IPython.display import clear_output, Image, display
from google.protobuf import text_format

    import caffe

def showarray(a, fmt='jpeg'):
    a = np.uint8(np.clip(a, 0, 255))
    f = StringIO()
    PIL.Image.fromarray(a).save(f, fmt)
    display(Image(data=f.getvalue()))

Loading DNN model

In this notebook we are going to use a GoogLeNet model trained on the ImageNet dataset. Feel free to experiment with other models from the Caffe Model Zoo. One particularly interesting model was trained on the MIT Places dataset; it produced many of the visuals in the original blog post.

    In [34]:

model_path = '../caffe/models/bvlc_googlenet/' # substitute your path here
net_fn = model_path + 'deploy.prototxt'
param_fn = model_path + 'bvlc_googlenet.caffemodel'

# Patching model to be able to compute gradients.
# Note that you can also manually add "force_backward: true" line to "deploy.prototxt".

model = caffe.io.caffe_pb2.NetParameter()
text_format.Merge(open(net_fn).read(), model)
model.force_backward = True
open('tmp.prototxt', 'w').write(str(model))

net = caffe.Classifier('tmp.prototxt', param_fn,
                       mean=np.float32([104.0, 116.0, 122.0]), # ImageNet mean, training set dependent
                       channel_swap=(2, 1, 0)) # the reference model has channels in BGR order instead of RGB

# a couple of utility functions for converting to and from Caffe's input image layout
def preprocess(net, img):
    return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']

def deprocess(net, img):
    return np.dstack((img + net.transformer.mean['data'])[::-1])
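As a quick sanity check (my addition, not part of the original notebook), the two helpers should round-trip an image up to floating-point error:

img = np.float32(PIL.Image.open('sky1024px.jpg')) # any RGB image will do
assert np.allclose(img, deprocess(net, preprocess(net, img))) # RGB <-> mean-subtracted BGR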

Producing dreams

Making the "dream" images is very simple. Essentially it is just a gradient ascent process that tries to maximize the L2 norm of activations of a particular DNN layer. Here are a few simple tricks that we found useful for getting good images:

    offset image by a random jitter

    normalize the magnitude of gradient ascent steps

    apply ascent across multiple scales (octaves)

    First we implement a basic gradient ascent step function, applying the first two tricks:

    In [35]:

def make_step(net, step_size=1.5, end='inception_4c/output', jitter=32, clip=True):
    '''Basic gradient ascent step.'''
    src = net.blobs['data'] # input image is stored in Net's 'data' blob
    dst = net.blobs[end]

    ox, oy = np.random.randint(-jitter, jitter+1, 2)
    src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2) # apply jitter shift

    net.forward(end=end)
    dst.diff[:] = dst.data # specify the optimization objective
    net.backward(start=end)
    g = src.diff[0]
    # apply normalized ascent step to the input image
    src.data[:] += step_size/np.abs(g).mean() * g

    src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2) # unshift image

    if clip:
        bias = net.transformer.mean['data']
        src.data[:] = np.clip(src.data, -bias, 255-bias)

    Next we implement an ascent through different scales. We call these scales "octaves".

    In [36]:

def deepdream(net, base_img, iter_n=10, octave_n=4, octave_scale=1.4,
              end='inception_4c/output', clip=True, **step_params):
    # prepare base images for all octaves
    octaves = [preprocess(net, base_img)]
    for i in xrange(octave_n-1):
        octaves.append(nd.zoom(octaves[-1], (1, 1.0/octave_scale, 1.0/octave_scale), order=1))

    src = net.blobs['data']
    detail = np.zeros_like(octaves[-1]) # allocate image for network-produced details
    for octave, octave_base in enumerate(octaves[::-1]):
        h, w = octave_base.shape[-2:]
        if octave > 0:
            # upscale details from the previous octave
            h1, w1 = detail.shape[-2:]
            detail = nd.zoom(detail, (1, 1.0*h/h1, 1.0*w/w1), order=1)

        src.reshape(1,3,h,w) # resize the network's input image size
        src.data[0] = octave_base+detail
        for i in xrange(iter_n):
            make_step(net, end=end, clip=clip, **step_params)

            # visualization
            vis = deprocess(net, src.data[0])
            if not clip: # adjust image contrast if clipping is disabled
                vis = vis*(255.0/np.percentile(vis, 99.98))
            showarray(vis)
            print octave, i, end, vis.shape
            clear_output(wait=True)

        # extract details produced on the current octave
        detail = src.data[0]-octave_base

    # returning the resulting image
    return deprocess(net, src.data[0])

Now we are ready to let the neural network reveal its dreams! Let's take a cloud image as a starting point:

    In [37]:

from PIL import Image
image = Image.open('sky1024px.jpg')
img = np.float32(image)
showarray(img)

    Running the next code cell starts the detail generation process. You may see how new patterns start to form, iteration by iteration, octave by octave.

    In [26]:

    _=deepdream(net, img)


    The complexity of the details generated depends on which layer's activations we try to maximize. Higher layers produce complex features, while lower ones enhance edges and textures, giving the image an impressionist feeling:

    In [ ]:

    _=deepdream(net, img, end='inception_3b/5x5_reduce')

    We encourage readers to experiment with layer selection to see how it affects the results. Execute the next code cell to see the list of different layers. You can modify the make_step function to make it follow some different objective, say to select a subset of activations to maximize, or to maximize multiple layers at once. There is a huge design space to explore!

    In [ ]:

    net.blobs.keys()
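As one concrete illustration of that design space (a sketch of mine, not from the original notebook), the objective in make_step can be restricted to a subset of the target layer's channels; everything else mirrors the make_step code above:

def channel_subset_step(net, step_size=1.5, end='inception_4c/output',
                        jitter=32, clip=True, channels=slice(0, 16)):
    '''Gradient ascent step that maximizes only a subset of the layer's channels.'''
    src = net.blobs['data']
    dst = net.blobs[end]

    ox, oy = np.random.randint(-jitter, jitter+1, 2)
    src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2)

    net.forward(end=end)
    dst.diff[:] = 0 # zero the objective everywhere ...
    dst.diff[:, channels] = dst.data[:, channels] # ... except the chosen channels
    net.backward(start=end)
    g = src.diff[0]
    src.data[:] += step_size/np.abs(g).mean() * g

    src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2)
    if clip:
        bias = net.transformer.mean['data']
        src.data[:] = np.clip(src.data, -bias, 255-bias)

Calling this in place of make_step inside the deepdream loop steers the dream toward whatever those channels happen to encode.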

    What if we feed the deepdream function its own output, after applying a little zoom to it? It turns out that this leads to an endless stream of impressions of the things that the network saw during training. Some patterns fire more often than others, suggestive of basins of attraction.

    We will start the process from the same sky image as above, but after some iteration the original image becomes irrelevant; even random noise can be used as the starting point.

    In [ ]:

!mkdir frames
frame = img
frame_i = 0

    In [ ]:

h, w = frame.shape[:2]
s = 0.05 # scale coefficient
for i in xrange(100):
    frame = deepdream(net, frame)
    PIL.Image.fromarray(np.uint8(frame)).save("frames/%04d.jpg"%frame_i)
    frame = nd.affine_transform(frame, [1-s,1-s,1], [h*s/2,w*s/2,0], order=1)
    frame_i += 1

    Be careful running the code above, it can bring you into very strange realms!

    In [ ]:

    Image(filename='frames/0029.jpg')
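If you want to stitch the saved frames into a video, one option (my addition, assuming ffmpeg is installed on the machine) is:

ffmpeg -framerate 10 -i frames/%04d.jpg -c:v libx264 -pix_fmt yuv420p dream.mp4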

Let's make it brain-dead simple to launch your very own deepdreaming server (in the cloud, on an Ubuntu machine, on a Mac via Docker, and maybe even Windows if you try out Kitematic by Docker)!

Motivation

I decided to create a self-contained Caffe+GoogLeNet+Deepdream Docker image which has everything you need to generate your own deepdream art. In order to make the Docker image very portable, it uses the CPU version of Caffe and comes bundled with the GoogLeNet model.

The compilation procedure was done on Docker Hub and, for advanced users, the final image can be pulled down via:

    docker pull visionai/clouddream

    The docker image is 2.5GB, but it contains a precompiled version of Caffe, all of the python dependencies, as well as the pretrained GoogLeNet model.

For those of you who are new to Docker, I hope you will pick up some valuable engineering skills and tips along the way. Docker makes it very easy to bundle complex software. If you're somebody like me who likes to keep a clean Mac OS X on a personal laptop and do the heavy lifting in the cloud, then read on.

Instructions

We will be monitoring the inputs directory for source images and dumping results into the outputs directory. Nginx (also inside a Docker container) will be used to serve the resulting files, and a simple AngularJS GUI will render the images in a webpage.
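To make that flow concrete, here is a minimal sketch (mine, not the actual clouddream code) of what polling the inputs directory for unprocessed images could look like:

import os
import shutil
import time

IN_DIR = 'deepdream/inputs'
OUT_DIR = 'deepdream/outputs'

while True:
    done = set(os.listdir(OUT_DIR))
    for name in os.listdir(IN_DIR):
        if name.lower().endswith('.jpg') and name not in done:
            # stand-in for the real work: run deepdream on the input
            # and write the result under the same name in OUT_DIR
            shutil.copy(os.path.join(IN_DIR, name), os.path.join(OUT_DIR, name))
    time.sleep(5) # poll every few seconds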

    Prerequisite:

You've launched a cloud instance using a VPS provider like DigitalOcean, and this instance has Docker running. If you don't know about DigitalOcean, then you should give them a try. You can launch a Docker-ready cloud instance in a few minutes. If you're going to set up a new DigitalOcean account, consider using my referral link: https://www.digitalocean.com/?refcode=64f90f652091.

You will need an instance with at least 1GB of RAM for processing small output images.

    Let's say our cloud instance is at the address 1.2.3.4 and we set it up so that it contains our SSH key for passwordless log-in.

ssh root@1.2.3.4
git clone https://github.com/VISIONAI/clouddream.git
cd clouddream
./start.sh

    To make sure everything is working properly you can do

    docker ps

    You should see three running containers: deepdream-json, deepdream-compute, and deepdream-files

root@deepdream:~/clouddream# docker ps
CONTAINER ID   IMAGE                 COMMAND                CREATED        STATUS        PORTS                         NAMES
21d495211abf   ubuntu:14.04          "/bin/bash -c 'cd /o   7 minutes ago  Up 7 minutes                                deepdream-json
7dda17dafa5a   visionai/clouddream   "/bin/bash -c 'cd /o   7 minutes ago  Up 7 minutes                                deepdream-compute
010427d8c7c2   nginx                 "nginx -g 'daemon of   7 minutes ago  Up 7 minutes  0.0.0.0:80->80/tcp, 443/tcp   deepdream-files

    If you want to stop the processing, just run:

    ./stop.sh

    If you want to jump inside the container to debug something, just run:

    ./enter.sh

cd /opt/deepdream
python deepdream.py
# This will take input.jpg, run deepdream, and write output.jpg

Feeding images into deepdream

From your local machine you can just scp images into the inputs directory inside deepdream as follows:

# From your local machine
scp images/*jpg root@1.2.3.4:~/clouddream/deepdream/inputs/

Instructions for Mac OS X and boot2docker

First, install boot2docker. Now start boot2docker:

    boot2docker start

    My boot2docker on Mac returns something like this:

Waiting for VM and Docker daemon to start................o
Started.
Writing /Users/tomasz/.boot2docker/certs/boot2docker-vm/ca.pem
Writing /Users/tomasz/.boot2docker/certs/boot2docker-vm/cert.pem
Writing /Users/tomasz/.boot2docker/certs/boot2docker-vm/key.pem

To connect the Docker client to the Docker daemon, please set:
    export DOCKER_TLS_VERIFY=1
    export DOCKER_HOST=tcp://192.168.59.103:2376
    export DOCKER_CERT_PATH=/Users/tomasz/.boot2docker/certs/boot2docker-vm

    So I simply paste the last three lines (the ones starting with export) right into the terminal.

export DOCKER_TLS_VERIFY=1
export DOCKER_HOST=tcp://192.168.59.103:2376
export DOCKER_CERT_PATH=/Users/tomasz/.boot2docker/certs/boot2docker-vm

    Keep this IP address in mind. For me it is 192.168.59.103.

    NOTE: if running a docker ps command fails at this point and it says something about certificates, you can try:

    boot2docker ssh sudo /etc/init.d/docker restart

    Now proceed just like you're in a Linux environment.

cd ~/projects
git clone https://github.com/VISIONAI/clouddream.git
cd clouddream
./start.sh

    You should now be able to visit http://192.168.59.103 in your browser.

Processing a YouTube video

If you don't have your own source of cool jpg images to process, or simply want to see what the output looks like on a YouTube video, I've included a short youtube.sh script which does all the work for you.

    If you want to start processing the "Charlie Bit My Finger" video, simply run:

    ./youtube.sh https://www.youtube.com/watch?v=DDZQAJuB3rI

And then visit the http://1.2.3.4:8000 URL to see the frames show up as they are being processed one by one. The final result will be written to http://1.2.3.4/out.mp4

Navigating the Image Gallery

You should now be able to visit http://1.2.3.4 in your browser and see the resulting images appear in a nicely formatted, mobile-ready grid.

You can also show only N images by changing the URL to something like this:

    http://1.2.3.4/#/?N=20

And instead of showing N random images, you can view the latest images:

    http://1.2.3.4/#/?latest

    You can view the processing log here:

    http://1.2.3.4/log.html

    You can view the current image being processed:

    http://1.2.3.4/image.jpg

    You can view the current settings:

    http://1.2.3.4/settings.json

    Here is a screenshot of what things should look like when using the 'conv2/3x3' setting:

    Additionally, you can browse some more cool images on the deepdream.vision.ai server, which I've currently configured to run deepdream through some Dali art. When you go to the page, just hit refresh to see more goodies.

Changing image size and processing layer

Inside deepdream/settings.json you'll find a settings file that looks like this:

    { "maxwidth" : 400, "layer" : "inception_4c/output"}

You can change maxwidth to something larger like 1000 if you want big output images for big input images; remember that you will need more RAM for processing larger images. For testing, a maxwidth of 200 will give you results much faster. If you change the settings and want to regenerate outputs for your input images, simply remove the contents of the outputs directory:

    rm deepdream/outputs/*

Possible values for layer come from the tmp.prototxt file, which lists the layers of the GoogLeNet network used in this demo. Note that the ReLU and Dropout layers are not valid for deepdreaming.
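If you want to enumerate the candidate names yourself, here is a small sketch (my addition; it assumes the prototxt uses the newer format where layer types are strings) that parses tmp.prototxt with the same protobuf tooling used earlier:

from google.protobuf import text_format
import caffe

model = caffe.io.caffe_pb2.NetParameter()
text_format.Merge(open('tmp.prototxt').read(), model)

# print every layer name except the types that make no sense as dream targets
for layer in model.layer:
    if layer.type not in ('ReLU', 'Dropout'):
        print(layer.name)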

Just a few days ago, the Google Research blog published a post demonstrating a unique, interesting, and perhaps even disturbing method to visualize what's going on inside the layers of a Convolutional Neural Network (CNN).

Note: Before you go, I suggest taking a look at the images generated using bat-country; most of them came out fantastic, especially the Jurassic Park images.

Their approach works by turning the CNN upside down, inputting an image, and gradually tweaking the image toward what the network thinks a particular object or class looks like.

    The results are breathtaking to say the least. Lower levels reveal edge-like regions in the images. Intermediate layers are able to represent basic shapes and components of objects (doorknob, eye, nose, etc.). And lastly, the final layers are able to form the complete interpretation (dog, cat, tree, etc.) and often in a psychedelic, spaced out manner.

Along with their results, Google also published an excellent IPython Notebook allowing you to play around and create some trippy images of your own.

The IPython Notebook is indeed fantastic. It's fun to play around with. And since it's an IPython Notebook, it's fairly easy to get started with. But I wanted to take it a step further. Make it modular. More customizable. More like a Python module that acts and behaves like one. And of course, it has to be pip-installable (you'll need to bring your own Caffe installation).

That's why I put together bat-country, an easy-to-use, highly extendible, lightweight Python module for inceptionism and deep dreaming with Convolutional Neural Networks and Caffe.

Comparatively, my contributions here are honestly pretty minimal. All the real research has been done by Google; I'm simply taking the IPython Notebook and turning it into a Python module, while keeping in mind the importance of extensibility, such as custom step functions.

Before we dive into the rest of this post, I would like to take a second and call attention to Justin Johnson's cnn-vis, a command line tool for generating inceptionism images. His tool is quite powerful and more like what Google is (probably) using for their own research publications. If you're looking for a more advanced, complete package, definitely go take a look at cnn-vis. You also might be interested in Vision.ai co-founder Tomasz Malisiewicz's clouddream Docker image to quickly get Caffe up and running.

But in the meantime, if you're interested in playing around with a simple, easy-to-use Python package, go grab the source from GitHub or install it via pip install bat-country.

    The rest of this blog post is organized as follows:

    A simple example. 3 lines of code to generate your own deep dream/inceptionism images.

    Requirements. Libraries and packages required to run bat-country (mostly just Caffe and its associated dependencies).

What's going on under the hood? The anatomy of bat-country and how to extend it.

Show and tell. If there is any section of this post that you don't want to miss, it's this one. I have put together a gallery of some really awesome images I generated over the weekend using bat-country. The results are quite surreal, to say the least.

bat-country: an extendible, lightweight Python package for deep dreaming with Caffe and CNNs

Again, I want to make it clear that the code for bat-country is heavily based on the work from the Google Research team. My contributions here are mainly refactoring the code into a usable Python package, making the package easily extendible via custom preprocessing, deprocessing, step functions, etc., and ensuring that the package is pip-installable. With that said, let's go ahead and get our first look at bat-country.

A simple example.

As I mentioned, one of the goals of bat-country is simplicity. Provided you have already installed Caffe and bat-country on your system, it only takes 3 lines of Python code to generate a deep dream/inceptionism image:


# we can't stop here...
bc = BatCountry("caffe/models/bvlc_googlenet")
image = bc.dream(np.float32(Image.open("/path/to/image.jpg")))
bc.cleanup()

    After executing this code, you can then take the image returned by the dream method and write it to file:


result = Image.fromarray(np.uint8(image))
result.save("/path/to/output.jpg")

And that's it! You can view the source code of demo.py on GitHub.

Requirements.

The bat-country package requires Caffe, an open-source CNN implementation from Berkeley, to be installed on your system already. This section will detail the basic steps to get Caffe set up on your system. However, an excellent alternative is to use the Docker image provided by Tomasz of Vision.ai. Using the Docker image will get you up and running quite painlessly. But for those who would like their own install, keep reading.

Step 1: Install Caffe

Take a look at the official installation instructions to get Caffe up and running. Instead of installing Caffe on your own system, I recommend spinning up an Amazon EC2 g2.2xlarge instance (so you have access to the GPU) and working from there.

Step 2: Compile Python bindings for Caffe

Again, use the official install instructions from Caffe. Creating a separate virtual environment for all the packages from requirements.txt is a good idea, but certainly not required.

An important step here is to update your $PYTHONPATH to include your Caffe installation directory:

export PYTHONPATH=/path/to/caffe/python:$PYTHONPATH

On Ubuntu, I also like to put this export in my .bashrc file so that it's loaded each time I login or open up a new terminal, but that's up to you.

Step 3: Optionally install cuDNN

Caffe works fine out of the box on the CPU. But if you really want to make Caffe scream, you should be using the GPU. Installing cuDNN isn't too difficult of a process, but if you've never done it before, be prepared to spend some time working through this step.

Step 4: Set your $CAFFE_ROOT

The $CAFFE_ROOT directory is the base directory of your Caffe install:

export CAFFE_ROOT=/path/to/caffe

Here's what my $CAFFE_ROOT looks like:

export CAFFE_ROOT=/home/ubuntu/libraries/caffe

Again, I would suggest putting this in your .bashrc file so it's loaded each time you login.

Step 5: Download the pre-trained GoogLeNet model

You'll need a pre-trained model to generate deep dream images. Let's go ahead and use the GoogLeNet model which Google used in their blog post. The Caffe package provides a script that downloads the model for you:


$ cd $CAFFE_ROOT
$ ./scripts/download_model_binary.py models/bvlc_googlenet/

Step 6: Install bat-country

The bat-country package is dead simple to install. The easiest way is to use pip:

$ pip install bat-country

But you can also pull down the source from GitHub if you want to do some hacking:


$ git clone https://github.com/jrosebr1/bat-country.git
$ cd bat-country
... do some hacking ...
$ python setup.py install

What's going on under the hood and how to extend bat-country

The vast majority of the bat-country code is from Google's IPython Notebook. My contributions are pretty minimal, just refactoring the code to make it act and behave like a Python module and to facilitate easy modifications and customizability.

The first important method to consider is the BatCountry constructor, which allows you to pass in custom CNNs like GoogLeNet, MIT Places, or other models from the Caffe Model Zoo. All you need to do is modify the base_path, deploy_path, model_path, and image mean. The mean itself will have to be computed from the original training set. Take a look at the BatCountry constructor for more details.

    The internals of BatCountry take care of patching the model to compute gradients, along with loading the network itself.
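As a hedged sketch of what that might look like (the paths here are hypothetical, and the keyword names are my reading of the description above, so check the constructor source before relying on them):

from batcountry import BatCountry
import numpy as np

# hypothetical paths for a MIT Places model from the Caffe Model Zoo
bc = BatCountry(base_path="caffe/models/googlenet_places205",
    deploy_path="caffe/models/googlenet_places205/deploy.prototxt",
    model_path="caffe/models/googlenet_places205/places205.caffemodel",
    mean=np.float32([104.0, 116.0, 122.0])) # must be the mean of the model's own training set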

Now, let's say you wanted to override the standard gradient ascent function for maximizing the L2-norm activations of a given layer. All you would need to do is provide your custom function to the dream method. Here's a trivial example of overriding the default behavior of the gradient ascent function to use a smaller step size and a larger jitter:


def custom_step(net, step_size=1.25, end="inception_4c/output",
    jitter=48, clip=True):
    src = net.blobs["data"]
    dst = net.blobs[end]

    ox, oy = np.random.randint(-jitter, jitter + 1, 2)
    src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2)

    net.forward(end=end)
    dst.diff[:] = dst.data
    net.backward(start=end)
    g = src.diff[0]

    src.data[:] += step_size / np.abs(g).mean() * g
    src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2)

    if clip:
        bias = net.transformer.mean["data"]
        src.data[:] = np.clip(src.data, -bias, 255 - bias)

image = bc.dream(np.float32(Image.open("image.jpg")),
    step_fn=custom_step)

    Again, this is just a demonstration of implementing a custom step function and not meant to be anything too exciting.

You can also override the default preprocess and deprocess functions by passing a custom preprocess_fn and deprocess_fn to dream:


def custom_preprocess(net, img):
    # do something interesting here...
    pass

def custom_deprocess(net, img):
    # do something interesting here...
    pass

image = bc.dream(np.float32(Image.open("image.jpg")),
    preprocess_fn=custom_preprocess,
    deprocess_fn=custom_deprocess)

    Finally, bat-country also supports visualizing each octave, iteration, and layer of the network:


bc = BatCountry(args.base_model)
(image, visualizations) = bc.dream(np.float32(Image.open(args.image)),
    end=args.layer, visualize=True)
bc.cleanup()

for (k, vis) in visualizations:
    outputPath = "{}/{}.jpg".format(args.vis, k)
    result = Image.fromarray(np.uint8(vis))
    result.save(outputPath)

    To see the full demo_vis.py script on GitHub, just click here.

Show and tell.

I had a lot of fun playing with bat-country over the weekend, specifically with images from Fear and Loathing in Las Vegas, The Matrix, and Jurassic Park. I also included a few of my favorite desktop wallpapers and photos from my recent vacation in the western part of the United States, just for fun.

For each of the original images (top), I have generated a deep dream using the conv2/3x3, inception_3b/5x5_reduce, and inception_4c/output layers, respectively.

The conv2/3x3 and inception_3b/5x5_reduce layers are lower-level layers in the network that give more edge-like features. The inception_4c/output layer is the final output that generates trippy hallucinations of dogs, snails, birds, and fish.
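To close, here is a small sketch of how those three renderings could be produced in one loop (my addition; it assumes dream accepts the end keyword, as in the demo_vis.py snippet above, and a hypothetical input.jpg):

from batcountry import BatCountry
from PIL import Image
import numpy as np

layers = ("conv2/3x3", "inception_3b/5x5_reduce", "inception_4c/output")
bc = BatCountry("caffe/models/bvlc_googlenet")

for layer in layers:
    image = bc.dream(np.float32(Image.open("input.jpg")), end=layer)
    # one output per layer, with '/' made filename-safe
    Image.fromarray(np.uint8(image)).save("out_%s.jpg" % layer.replace("/", "_"))

bc.cleanup()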