A Gentle Autoencoder Tutorial (with keras)

Benjamin Irving
30 October 2016

https://github.com/benjaminirving/mlseminars-autoencoders

How to run these slides yourself

Set up the Python environment

  • Jupyter notebook
  • Requirements: numpy, keras, theano, jupyter
  • Python 3
  • Install RISE for an interactive presentation viewer
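
One possible way to install everything with pip (the RISE commands below are an assumption and depend on the RISE version):

# Install the Python dependencies
pip install numpy keras theano jupyter
# Install and enable the RISE slideshow extension
pip install RISE
jupyter-nbextension install rise --py --sys-prefix
jupyter-nbextension enable rise --py --sys-prefix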

Get and run code

# Clone the repository
git clone https://github.com/benjaminirving/mlseminars-autoencoders
cd mlseminars-autoencoders
# Run the notebook
jupyter notebook

Introduction

  • A type of neural network
  • Designed to learn automatically from unlabelled data
  • Learns a compact and meaningful representation of the input
  • Unsupervised (or self-supervised) - no annotations required
  • The linear version is equivalent to Principal Component Analysis, but nonlinear autoencoders can learn much more sophisticated representations (see the sketch below)
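
A minimal sketch (not from the original slides) of the linear case: with no nonlinear activations and a mean-squared-error loss, the 32-unit code layer learns the same subspace as the top 32 principal components (up to an invertible linear transform).

In [ ]:
from keras.layers import Input, Dense
from keras.models import Model

# Linear encoder and decoder: equivalent to PCA up to a linear transform
x = Input(shape=(784,))
code = Dense(32, activation='linear')(x)
recon = Dense(784, activation='linear')(code)

linear_ae = Model(input=x, output=recon)
linear_ae.compile(optimizer='adam', loss='mse')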

Introduction

  • Train a neural network to reproduce the input image as the output
  • A hidden layer creates a code that represents the input
  • Hidden layer $h=f(x)$ and reconstruction $r=g(h)$
  • Autoencoders need to be limited or regularised in some way so that they can't simply copy the input
  • This forces them to learn useful representations

Introduction - Applications

  • Dimensionality reduction
  • Feature learning
  • Denoising or filling holes
  • Pretraining deep networks (not so common anymore)

The ultimate in unsupervised learning?

  • Well... unsupervised deep learning (strictly, self-supervised) is still an active research area

But...

  • Understanding autoencoders helps in understanding other deep networks
  • Autoencoder ideas are integrated into, for example, adversarial networks and recurrent neural networks, with some very interesting unsupervised applications
  • Medical imaging with large unlabelled data...?

Basic autoencoder example in Keras (keras.io)

In [1]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

from keras.layers import Input, Dense
from keras.models import Model

# this is the size of our encoded representations
encoding_dim = 32  # 32 floats -> compression of factor 24.5, assuming the input is 784 floats

# this is our input placeholder
input_img = Input(shape=(784,))
# "encoded" is the encoded representation of the input
encoded = Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(784, activation='sigmoid')(encoded)

# this model maps an input to its reconstruction
autoencoder = Model(input=input_img, output=decoded)
print("autoencoder model created")
Using Theano backend.
autoencoder model created
In [2]:
# this model maps an input to its encoded representation
encoder = Model(input=input_img, output=encoded)

# create a placeholder for an encoded (32-dimensional) input
encoded_input = Input(shape=(encoding_dim,))
# retrieve the last layer of the autoencoder model
decoder_layer = autoencoder.layers[-1]
# create the decoder model
decoder = Model(input=encoded_input, output=decoder_layer(encoded_input))
In [3]:
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
In [6]:
 
from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()

# plt.imshow(x_train[2001], cmap='gray')

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))
print(x_train.shape)
print(x_test.shape)
(60000, 784)
(10000, 784)
In [7]:
autoencoder.fit(x_train, x_train,
                nb_epoch=5,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 4s - loss: 0.2762 - val_loss: 0.1879
Epoch 2/5
60000/60000 [==============================] - 4s - loss: 0.1701 - val_loss: 0.1539
Epoch 3/5
60000/60000 [==============================] - 4s - loss: 0.1447 - val_loss: 0.1342
Epoch 4/5
60000/60000 [==============================] - 4s - loss: 0.1285 - val_loss: 0.1210
Epoch 5/5
60000/60000 [==============================] - 4s - loss: 0.1178 - val_loss: 0.1125
Out[7]:
<keras.callbacks.History at 0x1128c7d30>
In [10]:
# encode and decode some digits
# note that we take them from the *test* set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)

n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

Basic example

  • Learnt a representation of handwritten digits using only 32 features
  • A single hidden layer was used here, but the approach extends easily to a deep network, as in the snippet below
In [ ]:
input_img = Input(shape=(784,))

# stacked encoder: 784 -> 128 -> 64 -> 32
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(32, activation='relu')(encoded)

# stacked decoder: 32 -> 64 -> 128 -> 784
decoded = Dense(64, activation='relu')(encoded)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)

deep_autoencoder = Model(input=input_img, output=decoded)
deep_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

Fully connected vs convolutional layers

  • The previous examples used dense (fully connected) layers
  • For images it makes sense to take a convolutional approach
  • i.e. reuse weights across the image
  • The encoder and decoder steps are now convolutional layers
  • Essentially a fully convolutional network, but self-supervised (if the encoding layer is also convolutional)

In [11]:
from keras.layers import Input, Dense, Convolution2D, MaxPooling2D, UpSampling2D
from keras.models import Model

input_img = Input(shape=(1, 28, 28))

x = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(input_img)
x = MaxPooling2D((2, 2), border_mode='same')(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
x = MaxPooling2D((2, 2), border_mode='same')(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
encoded = MaxPooling2D((2, 2), border_mode='same')(x)

# at this point the representation is (8, 4, 4) i.e. 128-dimensional
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(encoded)

x = UpSampling2D((2, 2))(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
x = UpSampling2D((2, 2))(x)
x = Convolution2D(16, 3, 3, activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Convolution2D(1, 3, 3, activation='sigmoid', border_mode='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
In [12]:
from keras.datasets import mnist
import numpy as np

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 1, 28, 28))
x_test = np.reshape(x_test, (len(x_test), 1, 28, 28))
In [ ]:
autoencoder.fit(x_train, x_train,
                nb_epoch=10,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test, x_test))
autoencoder.save_weights('data/convauto.h5')
In [13]:
print("loading weights")
autoencoder.load_weights('data/convauto.h5')

print("predicting output")
decoded_imgs = autoencoder.predict(x_test)

n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    
    # original
    ax = plt.subplot(2, n, i+1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
loading weights
predicting output

Regularisation types

  • Recap: encoder and decoder steps can be fully connected, convolutional, recurrent...
  • The key feature of an autoencoder is some type of "regularisation"

Types:

- Undercomplete autoencoder
- Denoising autoencoder
- Sparse autoencoder
- Variational autoencoder

Undercomplete autoencoder

  • Limit the number of hidden units

Denoising autoencoder

  • Learn a more robust representation by forcing the autoencoder to reconstruct the input from a corrupted version of itself (see the sketch below)
  • Inpainting (filling in masked regions) is a stronger variant of denoising autoencoders
  • Denoising or inpainting can also be the end goal itself, rather than just a regularisation strategy
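
A minimal denoising sketch (not a cell from the original notebook), reusing the autoencoder model and the x_train/x_test arrays from the cells above: corrupt the inputs with Gaussian noise and train the network to recover the clean images.

In [ ]:
import numpy as np

# Corrupt the inputs with additive Gaussian noise
noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(size=x_test.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)

# Train with noisy images as input and the clean images as target
autoencoder.fit(x_train_noisy, x_train,
                nb_epoch=5,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test_noisy, x_test))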

Sparse autoencoder

  • Add a sparsity constraint to the hidden layer
  • Still discovers interesting structure even if the number of hidden units is large
  • Mean activation of a single hidden unit over $m$ training examples: $$ \rho_j = \frac{1}{m} \sum^m_{i=1} a_j(x^{(i)})$$
  • Add a penalty that limits the overall activation of the layer to a small value
  • activity_regularizer in keras (see the sketch below)

Reference: http://web.stanford.edu/class/cs294a/sae/sparseAutoencoderNotes.pdf
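
A minimal sketch of the keras version (the penalty weight 1e-5 is an illustrative assumption): an L1 activity penalty on the code layer pushes most activations towards zero, so only a few units fire for any given input.

In [ ]:
from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model

input_img = Input(shape=(784,))
# L1 activity penalty on the code layer encourages sparse activations
encoded = Dense(32, activation='relu',
                activity_regularizer=regularizers.activity_l1(1e-5))(input_img)
decoded = Dense(784, activation='sigmoid')(encoded)

sparse_autoencoder = Model(input=input_img, output=decoded)
sparse_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')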

Variational autoencoders (VAE)

  • A probabilistic approach to autoencoders
  • Places constraints on the encoded representation being learned
  • Instead of learning an arbitrary representation, the network learns a latent variable model of the data
    • latent variables: unobserved variables of a probabilistic model
  • Allows unsupervised learning of complex distributions
  • Generally little parameter tuning required

VAE - Latent variable model

  • Learns a set of latent variables that roughly follow a unit Gaussian
  • The decoder acts as a complex mapping function from the latent variables to the data

any distribution in $d$ dimensions can be generated by taking a set of $d$ normally distributed variables and mapping them through a sufficiently complex function (see the illustrative sketch below)

  • Performed in the network using the reparameterisation trick
    • instead of learning real values directly, the encoder learns the means and (log) variances of the latent code

References: http://kvfrans.com/variational-autoencoders-explained/
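
An illustrative sketch (not from the original slides) of the quoted idea: pushing 2-D unit-Gaussian samples through a fixed nonlinear function $g$ yields a clearly non-Gaussian, ring-shaped distribution; a VAE decoder learns such a mapping from data.

In [ ]:
import numpy as np
import matplotlib.pyplot as plt

# latent samples z ~ N(0, I) in 2 dimensions
z = np.random.normal(size=(5000, 2))
# a fixed hand-crafted mapping g(z) that produces a ring
g = z / 10 + z / np.linalg.norm(z, axis=1, keepdims=True)

plt.scatter(g[:, 0], g[:, 1], s=2)
plt.title("g(z) for z ~ N(0, I)")
plt.show()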

In [ ]:
# Sketch of the encoding step of a VAE (hidden, batch_size and
# epsilon_std are defined in the full example further down)
z_mean = Dense(2)(hidden)
z_log_var = Dense(2)(hidden)


def sampling(args):
    # reparameterisation trick: z = mean + std * epsilon,
    # with std = exp(z_log_var / 2)
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(batch_size, 2),
                              mean=0., std=epsilon_std)
    return z_mean + K.exp(z_log_var / 2) * epsilon

# Lambda wraps the sampling function as a Keras layer
z = Lambda(sampling, output_shape=(2,))([z_mean, z_log_var])

VAE - Loss

  • Optimisation is performed using a modified loss function that combines two terms.
  • 1) Reconstruction error
    • Mean squared error or cross entropy
    • Binary cross entropy: $H(p, q) = - \sum_x p(x) \log q(x)$
    • Measures how closely the reconstructed image matches the original image
  • 2) KL divergence
    • KL divergence: $D_{KL}(P||Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)}dx$
    • Measures how closely the latent variables match a unit Gaussian
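
For a diagonal Gaussian encoder $q(z|x) = \mathcal{N}(\mu, \sigma^2)$ and a unit Gaussian prior, the KL integral has a closed form, which is what the kl_loss line in the model code below computes:

$$ D_{KL}\left(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, 1)\right) = -\frac{1}{2} \sum_j \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$$

where $\log \sigma_j^2$ is the z_log_var output of the encoder.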

Why

  • Allows unsupervised learning of complex distributions
  • Generally no parameter tuning required
  • Generative model so provides a framework for generation of new examples
  • Lots of extensions that I will get into later...

Demo: Learned face manifold

In [14]:
from IPython.display import YouTubeVideo
YouTubeVideo("nHX7hCeOtFc")
Out[14]:
In [4]:
# Define model (the hyperparameters batch_size, original_img_size,
# img_chns, nb_filters, nb_conv, intermediate_dim, latent_dim and
# epsilon_std are set in an earlier cell, not shown here)

x = Input(batch_shape=(batch_size,) + original_img_size)
conv_1 = Convolution2D(img_chns, 2, 2, border_mode='same', activation='relu')(x)
conv_2 = Convolution2D(nb_filters, 2, 2,
                       border_mode='same', activation='relu',
                       subsample=(2, 2))(conv_1)
conv_3 = Convolution2D(nb_filters, nb_conv, nb_conv,
                       border_mode='same', activation='relu',
                       subsample=(1, 1))(conv_2)
conv_4 = Convolution2D(nb_filters, nb_conv, nb_conv,
                       border_mode='same', activation='relu',
                       subsample=(1, 1))(conv_3)
flat = Flatten()(conv_4)
hidden = Dense(intermediate_dim, activation='relu')(flat)

z_mean = Dense(latent_dim)(hidden)
z_log_var = Dense(latent_dim)(hidden)


def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(batch_size, latent_dim),
                              mean=0., std=epsilon_std)
    # reparameterisation trick: std = exp(z_log_var / 2)
    return z_mean + K.exp(z_log_var / 2) * epsilon

# note that "output_shape" isn't necessary with the TensorFlow backend
# so you could write `Lambda(sampling)([z_mean, z_log_var])`
z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

# we instantiate these layers separately so as to reuse them later
decoder_hid = Dense(intermediate_dim, activation='relu')
decoder_upsample = Dense(nb_filters * 14 * 14, activation='relu')

if K.image_dim_ordering() == 'th':
    output_shape = (batch_size, nb_filters, 14, 14)
else:
    output_shape = (batch_size, 14, 14, nb_filters)

decoder_reshape = Reshape(output_shape[1:])
decoder_deconv_1 = Deconvolution2D(nb_filters, nb_conv, nb_conv,
                                   output_shape,
                                   border_mode='same',
                                   subsample=(1, 1),
                                   activation='relu')
decoder_deconv_2 = Deconvolution2D(nb_filters, nb_conv, nb_conv,
                                   output_shape,
                                   border_mode='same',
                                   subsample=(1, 1),
                                   activation='relu')
if K.image_dim_ordering() == 'th':
    output_shape = (batch_size, nb_filters, 29, 29)
else:
    output_shape = (batch_size, 29, 29, nb_filters)
decoder_deconv_3_upsamp = Deconvolution2D(nb_filters, 2, 2,
                                          output_shape,
                                          border_mode='valid',
                                          subsample=(2, 2),
                                          activation='relu')
decoder_mean_squash = Convolution2D(img_chns, 2, 2,
                                    border_mode='valid',
                                    activation='sigmoid')

hid_decoded = decoder_hid(z)
up_decoded = decoder_upsample(hid_decoded)
reshape_decoded = decoder_reshape(up_decoded)
deconv_1_decoded = decoder_deconv_1(reshape_decoded)
deconv_2_decoded = decoder_deconv_2(deconv_1_decoded)
x_decoded_relu = decoder_deconv_3_upsamp(deconv_2_decoded)
x_decoded_mean_squash = decoder_mean_squash(x_decoded_relu)

def vae_loss(x, x_decoded_mean):
    # NOTE: binary_crossentropy expects a batch_size by dim
    # for x and x_decoded_mean, so we MUST flatten these!
    x = K.flatten(x)
    x_decoded_mean = K.flatten(x_decoded_mean)
    xent_loss = img_rows * img_cols * objectives.binary_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return xent_loss + kl_loss

vae = Model(x, x_decoded_mean_squash)
vae.compile(optimizer='rmsprop', loss=vae_loss)
# vae.summary()
In [7]:
# Scatter plot of the 2-D latent space of the test digits, coloured
# by digit class (x_test_encoded and y_test come from earlier cells,
# not shown here)
plt.figure(figsize=(6, 6))
plt.scatter(x_test_encoded[:, 0], x_test_encoded[:, 1], c=y_test, cmap="jet")
plt.colorbar()
plt.show()
In [21]:
# `figure` is a grid of decoded digits sampled over the 2-D latent
# space, assembled in an earlier cell (not shown here)
plt.figure(figsize=(10, 10))
plt.imshow(figure, cmap="jet")
plt.show()