Deep meditations: a meaningful exploration of inner self, a controlled navigation of latent space.

It’s all about the Journey.

“Deep Meditations: When I’m stuck with a day that’s grey…”. A journey in a 512D latent space, carefully constructed to tell a particular story.

Abstract

Background

‘Generative’ Models

‘Deep’ Learning (in this context)

A sledgehammer. Big and heavy. (image source: https://www.machines4u.com.au/mag/100-years-ago-today-antique-woodworking-tools/)
Andy Dufresne (played by Tim Robbins) using his rock hammer in The Shawshank Redemption (source: Castle Rock Entertainment, via https://small-change.uq.edu.au/blog/2016/03/hammer-shaped-university)

Control, Relinquishing control, Meaningful control

Introduction

Challenges

  1. The space is very large and high-dimensional (e.g. 128, 512, or 2048 dimensions).
  2. It is not distributed ‘evenly’, or as one might expect or desire. I.e. if we were to sample uniformly across the space, we might end up with many more images of one type than another (e.g. in the case of our test model, flowers seem to occupy a large portion of the space, probably due to their higher representation in the training data).
  3. As a result of the uneven distribution, interpolating from latent vector z_A to z_B at a constant speed might produce images changing at a perceptually variable speed.
  4. It is very difficult to anticipate trajectories in high dimensions. E.g. interpolating from z_A to z_B might pass through points z_X and z_Y, which may be undesirable.
  5. The mass of the distribution is concentrated in the shell of a hypersphere (see the short sketch after this list).
  6. The latent space changes with subsequent training iterations.
  6. The latent space changes with subsequent training iterations.
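
On point 5: a minimal numpy sketch (assuming a standard Gaussian prior, as is typical for such models; the article does not spell out its prior here) showing the shell concentration empirically:

```python
import numpy as np

# In d dimensions, the norm of a sample from a standard Gaussian prior
# concentrates tightly around sqrt(d): almost no mass sits near the origin.
for d in (128, 512, 2048):
    z = np.random.randn(10000, d)        # 10k 'random' latent samples
    norms = np.linalg.norm(z, axis=1)
    print(f"d={d:5d}: mean |z| = {norms.mean():7.2f} "
          f"(sqrt(d) = {np.sqrt(d):7.2f}), std = {norms.std():.2f}")
```

This is also why straight-line interpolation between two samples cuts through the sparsely populated interior of the hypersphere, while spherical interpolation (see the appendix) stays close to the shell.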

Method

Background

Summary

  1. We can edit the videos in a conventional non-linear video editor (NLE) such as Adobe Premiere, Apple Final Cut, Avid Media Composer or Kdenlive (my weapon of choice, discussed more in the appendix).
  2. Run a custom script to conform the edit with the corresponding numpy arrays containing the z-sequences (i.e. apply the video edit from the NLE onto the corresponding numpy arrays; see the sketch below).
  3. Feed the resulting conformed z-sequence into the model for final output.
“A brief history of almost everything in 5 minutes”. A journey in a 512D latent space, carefully constructed to tell a particular story.
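
The conform step in point 2 could look roughly like the following minimal sketch. It assumes the edit has been exported from the NLE as a list of kept frame ranges; the function name conform and the file names are hypothetical stand-ins, not the article's actual script:

```python
import numpy as np

def conform(z_sequence, kept_ranges):
    # z_sequence: (n_frames, z_dim) array, one latent vector per video frame.
    # kept_ranges: (in_frame, out_frame) pairs that survive the edit,
    # in playback order, with out_frame exclusive.
    # Concatenating the kept spans in edit order mirrors the cut video.
    return np.concatenate([z_sequence[a:b] for a, b in kept_ranges])

# e.g. keep frames 0-99 and 250-499 of a 1000-frame sequence
z = np.load('z_sequence.npy')                  # hypothetical file, shape (1000, 512)
z_edited = conform(z, [(0, 100), (250, 500)])
np.save('z_sequence_conformed.npy', z_edited)
```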

Process

  1. Take many (e.g. hundreds or thousands of) unbiased (i.e. totally ‘random’) samples in latent space and render. This produces a video (and corresponding z-sequence) where each frame is an entirely different ‘random’ image. This gives us an idea of what the model has learnt, and how it is distributed. It also gives us an idea of how the distribution changes across subsequent training iterations, and which snapshots provide more aesthetically desirable images.
  2. Edit the video in an NLE to remove undesirable (i.e. ‘bad’) images or to bias (or de-bias) the distribution (e.g. remove some frames containing flowers if there are too many, or duplicate frames containing bacteria if there are not enough of them, etc.)
  3. Run the script to conform the edit with the original z-sequence and re-render. This produces a new video (and corresponding z-sequence) where each frame is still an entirely different ‘random’ image, but which hopefully has the desired distribution (i.e. no ‘bad’ images, and a desirable balance between different images).
  4. Repeat steps 2–3 until we’re happy with the distribution (one or two rounds is usually enough). Optionally apply varying amounts of noise in z to explore neighborhoods of selected frames (e.g. to look for and include more images of bacteria, with more variation; see the sketch after this list).
  5. Load the final edited z-sequence (with desired distribution) and render many (e.g. tens or hundreds of) short journeys interpolating between two or three random (or hand-picked) z (selected from the z-sequence). This produces tens or hundreds of short videos (and corresponding z-sequences) that contain smooth, slow interpolations between two or three keyframes, where the keyframes are chosen from our preferred distribution. This gives us an idea of how the model transitions between selected images. E.g. the shortest path from a mountain to a face might have to go through buildings, which might not be desirable, but inserting a flower in between might avoid the buildings and look nicer — both aesthetically and conceptually.
  6. Repeat step 5, homing in on journeys which seem promising, optionally applying varying amounts of noise in z to explore neighborhoods of selected frames and journeys.
A one hour seamless loop. A journey in a 512D latent space, carefully constructed to tell a particular story.
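
A minimal sketch of the neighborhood exploration in steps 4–6: perturbing chosen keyframes with Gaussian noise, and picking keyframes for a short journey. The file name and frame index are hypothetical; the interpolation itself is discussed in the appendix:

```python
import numpy as np

rng = np.random.default_rng(42)
z_seq = np.load('z_sequence_conformed.npy')   # hypothetical file: (n_frames, 512)

# Step 4: explore the neighborhood of a selected frame by adding varying
# amounts of Gaussian noise to its latent vector.
z_key = z_seq[137]                            # a hand-picked frame
for sigma in (0.05, 0.1, 0.2, 0.4):
    variations = z_key + sigma * rng.standard_normal((8, z_key.shape[0]))
    # feeding 'variations' through the model would yield 8 images near
    # the keyframe, increasingly varied as sigma grows

# Step 5: pick two or three keyframes from the preferred distribution
# to serve as endpoints of a short, slow journey.
keys = z_seq[rng.choice(len(z_seq), size=3, replace=False)]
```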

Appendix

Model architecture and data

Video editing and conforming the edit

Example of editing a z-sequence in Kdenlive

Interpolation

Fig C1. z sequence using spherical interpolation
Fig C2. z sequence using physical interpolation
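
For Fig C1, here is a minimal, textbook implementation of spherical linear interpolation (slerp), which keeps interpolated vectors near the high-density shell rather than cutting through the interior. This is the standard formulation, not necessarily the article's exact code:

```python
import numpy as np

def slerp(z_a, z_b, t):
    # Spherical linear interpolation between latent vectors z_a and z_b,
    # with t in [0, 1]. Clip the cosine to keep arccos in its valid domain.
    omega = np.arccos(np.clip(
        np.dot(z_a, z_b) / (np.linalg.norm(z_a) * np.linalg.norm(z_b)),
        -1.0, 1.0))
    so = np.sin(omega)
    if so == 0:
        return (1.0 - t) * z_a + t * z_b  # parallel vectors: fall back to lerp
    return (np.sin((1.0 - t) * omega) / so) * z_a + (np.sin(t * omega) / so) * z_b

# a 60-frame journey from z_a to z_b
z_a, z_b = np.random.randn(512), np.random.randn(512)
journey = np.stack([slerp(z_a, z_b, t) for t in np.linspace(0, 1, 60)])
```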

Snapshots across time

An example z-sequence fed through different snapshots from training. Each frame shows the same z-vector decoded from 28 snapshots spaced 1000 training iterations apart.
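
A rough sketch of how such a comparison could be assembled, with load_snapshot and generate as hypothetical stand-ins for whatever checkpoint-loading and decoding calls the training framework provides:

```python
import numpy as np

# 'load_snapshot' and 'generate' below are hypothetical placeholders,
# not real library calls; substitute the framework's own interface.
z = np.load('z_sequence.npy')[0]          # one fixed latent vector
iterations = range(1000, 29000, 1000)     # 28 snapshots, 1000 iterations apart

frames = []
for it in iterations:
    model = load_snapshot(f'snapshots/model_{it:06d}.ckpt')
    frames.append(generate(model, z))     # same z, decoded by each snapshot
# 'frames' now shows how the decoding of z drifts as training progresses
```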

Conclusion

Acknowledgments

Memo Akten

computational ar̹͒ti͙̕s̼͒t engineer curious philomath; nature ∩ science ∩ tech ∩ ritual; spirituality ∩ arithmetic; PhD AI×expressive human-machine interaction;