Latent Terrain: Dissecting Neural Audio Codecs

Type

Creative Coding Tool

Materials

Max/MSP package, portfolio

Display

Sónar+D 2025, NIME 2026

Jun, 2026

There's a category of audio technology that's been doing interesting things quietly in the background of the current AI frenzy: neural audio codecs. A neural codec is a component behind many of the generative AI audio tools that have been making the rounds. I'm not particularly interested in typing prompts to make stuff; I'm interested in breaking these systems and dissecting them, and when you crack them open, what's inside turns out to be genuinely playable.

A neural audio codec is, at its core, a compression system. It squeezes audio into a compact numerical representation, the "latent code", and then reconstructs audio from it. Unlike MP3 or AAC, whose compression schemes are hand-designed, a neural codec is learned from data. This means it implicitly captures something about the structure and texture of sound. The result is a kind of internal map of timbral territory. Move through that map, and you move through sound.
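To make "compact" concrete, here is a small back-of-the-envelope sketch. It assumes a hypothetical frame-based codec that summarises every hop of 2048 samples with a 16-dimensional latent vector at 44.1 kHz. These figures are illustrative only, not the settings of any specific codec used with nn.terrain~.

```python
def latent_budget(sample_rate: int, hop_size: int, latent_dim: int, seconds: float):
    """Return (latent frames per second, complete latent frames, compression factor).

    Assumes a hypothetical frame-based codec that maps every `hop_size` samples
    of mono audio to one latent vector of `latent_dim` numbers.
    """
    frames_per_second = sample_rate / hop_size
    num_frames = int(sample_rate * seconds) // hop_size  # complete frames only
    compression = hop_size / latent_dim  # raw samples per latent number
    return frames_per_second, num_frames, compression


if __name__ == "__main__":
    # Illustrative settings only: 44.1 kHz audio, hop of 2048 samples, 16-D latents.
    fps, frames, ratio = latent_budget(44100, 2048, 16, seconds=10.0)
    print(f"{fps:.2f} latent frames/s, {frames} frames, {ratio:.0f}x fewer numbers")
```

With those assumed settings the script prints "21.53 latent frames/s, 215 frames, 128x fewer numbers". Ten seconds of audio becomes 215 vectors of 16 numbers each. Real codecs differ in hop size and latent size, and in whether their latents are continuous or quantised.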

Latent Terrain was introduced as a Max external package, nn.terrain~, developed over 12 months through our own performance practice and in collaboration with four musicians. It builds a navigable 2D map from a customised audio collection and lets you move through that map (with a stylus, a touchscreen, sensor inputs, whatever). It then synthesises audio in real time from where you are and how you're moving.

This connects to something that's been brewing in NIME (New Interfaces for Musical Expression) research for a few years — the idea of using the decoder half of a codec as a synthesis algorithm, with live input driving the latent codes. People call it latent space navigation, or latent space walking. You can find it in a handful of tools, but accessing it has typically required some comfort with Python and a willingness to dig through research code.
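In its simplest form, latent space walking means generating a sequence of latent vectors and handing each one to the decoder. The sketch below assumes plain linear interpolation between two latent codes. `latent_walk` is a hypothetical helper. The decoder call is left out because it depends on the codec and runtime you use.

```python
from typing import List, Sequence


def latent_walk(start: Sequence[float], end: Sequence[float], steps: int) -> List[List[float]]:
    """Linearly interpolate from `start` to `end` in `steps` latent vectors.

    Hypothetical helper: each returned vector would be sent to the codec's
    decoder for one frame (the decoder itself is not shown here).
    """
    if len(start) != len(end):
        raise ValueError("latent vectors must have the same dimensionality")
    if steps < 2:
        raise ValueError("need at least two steps to include both endpoints")
    walk = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the start code, 1.0 at the end code
        walk.append([(1.0 - t) * a + t * b for a, b in zip(start, end)])
    return walk


if __name__ == "__main__":
    for z in latent_walk([0.0, 1.0], [1.0, -1.0], steps=5):
        print(z)
```

Linear interpolation is only one way to walk. In a live setting, the path would come from whatever input is driving the latent codes, one vector per decoder frame.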

Latent Terrain aims to make this more accessible for instrument building. With the Max externals, you load samples, plot paths on a 2D canvas in the GUI, and train a small neural network; that network then becomes your sound space. Navigate it slowly, and you get gradual timbral shifts; move faster, and it fractures, jumps, and develops texture.
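A toy example can illustrate why speed matters. Controllers are usually sampled at a fixed rate, so moving faster means each control frame lands further from the previous one. The latent vector sent to the decoder then changes more abruptly. The sketch assumes a hand-written `toy_terrain` function and a 100 Hz control rate, both hypothetical. It only illustrates the geometry, not nn.terrain~'s internals.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def toy_terrain(x: float, y: float) -> List[float]:
    """Hypothetical stand-in for a trained terrain: (x, y) -> 3-D latent vector.

    The high-frequency terms make the surface 'rough', so small moves can
    still change the latent vector noticeably.
    """
    return [
        math.sin(2.0 * x) + 0.5 * math.sin(13.0 * y),
        math.cos(3.0 * y) + 0.5 * math.cos(11.0 * x),
        math.sin(x + y) + 0.3 * math.sin(17.0 * x * y),
    ]


def mean_latent_step(
    terrain: Callable[[float, float], List[float]],
    start: Point,
    velocity: Point,
    control_rate: float,
    frames: int,
) -> float:
    """Average distance between consecutive latent vectors for a straight-line
    gesture moving at `velocity` (map units per second), sampled at `control_rate` Hz."""
    dt = 1.0 / control_rate
    prev = terrain(*start)
    total = 0.0
    for n in range(1, frames + 1):
        z = terrain(start[0] + velocity[0] * n * dt, start[1] + velocity[1] * n * dt)
        total += math.dist(prev, z)  # Euclidean jump in latent space this frame
        prev = z
    return total / frames


if __name__ == "__main__":
    slow = mean_latent_step(toy_terrain, (0.1, 0.1), (0.05, 0.05), 100.0, 200)
    fast = mean_latent_step(toy_terrain, (0.1, 0.1), (2.0, 2.0), 100.0, 200)
    print(f"slow gesture: {slow:.4f} per frame, fast gesture: {fast:.4f} per frame")
```

On a rough terrain like this one, the fast gesture produces much larger jumps between consecutive latent vectors than the slow one. That is one plausible reading of why quick movements sound fractured.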

How this works

To build a terrain from scratch, we need a training dataset: pairs of latent trajectories and spatial trajectories.

- Latent trajectories are sequences of latent codes encoded from audio buffers by a neural codec.

- Spatial trajectories are sequences of coordinates in a control space. Examples include the trajectory of a mouse on an XY trackpad, the trajectory of hand gestures in a 3D (XYZ) space, or timestamps in a timeline playback system.
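Pairing the two trajectories mostly comes down to giving every latent frame exactly one coordinate. Here is a minimal sketch, assuming the spatial trajectory is a hand-drawn 2D path with a different number of points than the latent trajectory. It resamples the path linearly by point index to match the latent frame count, then zips the two. `resample_path` and `make_pairs` are hypothetical helpers, not part of the package.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def resample_path(path: Sequence[Point], num_frames: int) -> List[Point]:
    """Resample a drawn 2-D path to `num_frames` points, evenly spaced by point
    index, so that every latent frame gets exactly one coordinate."""
    if len(path) < 2 or num_frames < 2:
        raise ValueError("need at least two path points and two frames")
    last = len(path) - 1
    out: List[Point] = []
    for i in range(num_frames):
        pos = i * last / (num_frames - 1)  # fractional index into `path`
        k = min(int(pos), last - 1)        # start of the segment containing pos
        t = pos - k                        # 0..1 position within that segment
        (x0, y0), (x1, y1) = path[k], path[k + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def make_pairs(latents: Sequence[Sequence[float]], path: Sequence[Point]):
    """Pair each latent frame (from the codec's encoder) with one coordinate."""
    coords = resample_path(path, len(latents))
    return list(zip(coords, latents))


if __name__ == "__main__":
    drawn = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]           # hypothetical mouse path
    latent_traj = [[0.1 * i, -0.1 * i] for i in range(5)]  # placeholder latents
    for coord, z in make_pairs(latent_traj, drawn):
        print(coord, z)
```

Resampling by point index assumes the path was drawn at a roughly steady pace; resampling by arc length is an alternative. For a timeline playback system, the coordinate for frame i of T could simply be i / (T - 1).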

A terrain is a supervised machine learning model that learns from these coordinate-to-latent pairs. It can then produce new latent vectors for any coordinates in the control space, so the control space can be rendered as a "map" of the latent space.
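To show the shape of this supervised problem, here is a deliberately simple stand-in for the terrain model. It maps the (x, y) coordinate through random Fourier features, then applies a linear readout fitted in closed form with ridge regression. This technique is swapped in for clarity. nn.terrain~ trains a small neural network, and `FourierTerrain` is a hypothetical class that reflects neither its architecture nor its API.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Solve A X = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class FourierTerrain:
    """Hypothetical stand-in for a terrain model (not nn.terrain~'s architecture):
    random Fourier features of (x, y) followed by a linear readout fitted with
    ridge regression, mapping control coordinates to latent vectors."""

    def __init__(self, num_features: int = 32, scale: float = 3.0, seed: int = 0):
        rng = random.Random(seed)
        # Random frequencies and phases for features cos(wx * x + wy * y + phase).
        self.freqs = [(rng.gauss(0.0, scale), rng.gauss(0.0, scale)) for _ in range(num_features)]
        self.phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
        self.weights: List[List[float]] = []  # one weight vector per latent dimension

    def features(self, p: Point) -> List[float]:
        return [math.cos(wx * p[0] + wy * p[1] + ph) for (wx, wy), ph in zip(self.freqs, self.phases)]

    def fit(self, coords: Sequence[Point], latents: Sequence[Sequence[float]], ridge: float = 1e-4):
        """Supervised fit on (coordinate, latent) pairs via the ridge normal equations:
        (Phi^T Phi + ridge * I) W = Phi^T Z."""
        phi = [self.features(p) for p in coords]
        m, dims = len(phi[0]), len(latents[0])
        a = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
              for j in range(m)] for i in range(m)]
        b = [[sum(row[i] * z[d] for row, z in zip(phi, latents)) for d in range(dims)]
             for i in range(m)]
        w = _solve(a, b)  # m x dims
        self.weights = [[w[i][d] for i in range(m)] for d in range(dims)]
        return self

    def __call__(self, p: Point) -> List[float]:
        """Predict a latent vector for any coordinate, visited during training or not."""
        f = self.features(p)
        return [sum(wi * fi for wi, fi in zip(wd, f)) for wd in self.weights]


if __name__ == "__main__":
    # Placeholder training pairs: a 5x5 grid of coordinates and smooth 3-D "latents".
    grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]
    targets = [[x, y, x * y] for x, y in grid]
    terrain = FourierTerrain(num_features=64, scale=2.0).fit(grid, targets)
    print(terrain((0.3, 0.7)))  # latent vector at a coordinate not in the training set
```

Once fitted, the model can be queried at any coordinate, including ones never visited during training. That is what lets the whole control space act as a map of the latent space. Random Fourier features are a common way to let a small model represent fine spatial detail. The `scale` parameter controls how quickly the output varies across the plane.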

Portfolio of works

After the initial package release, we collaborated with four artist-researchers over three to five months to create an annotated portfolio of works — a research method for surfacing design possibilities that might not emerge from one person working alone. What's interesting is how differently each of them found their way into the material, and how centrally the uncertainty of the tool figures in every piece.

Keigo Yoshida built a meditative data sonification patch in which EEG readings from an OpenBCI headset drive the terrain navigation — while the sound space itself continuously retrains on incoming data, creating what he describes as an adversarial tension between the performer's desire for calm and the system's pull toward arousal. The tool doesn't cooperate. That resistance becomes the piece.

Jiatong Liu's nn/mémoire is a virtual gallery soundscape built from archival recordings of Beijing's Hutong neighbourhoods — a rapidly disappearing urban soundworld. The terrain becomes an ambient archive you move through spatially. Liu described "learning to deal with the unpredictability" as a central design question, not a problem to eliminate.

The honest bit

This is an exploratory tool, not an automation tool. Dan described the workflow as "explorative and serendipitous." Nico talked about active listening as a compositional approach — you work with what the space gives you. Keigo framed it as letting the system "trace and access past states in real-time, not to establish full control over the outcome." That's not a polite way of saying it's hard to control. It's the actual musical experience of the thing.

It's worth noting that Latent Terrain doesn't replace corpus-based synthesis tools like those in FluCoMa; it extends the palette. If you're already working with fluid.corpus~ or similar, this is complementary rather than competing territory.

Get it

Package + documentation + tutorials: https://jasper-zheng.github.io/nn_terrain/

Research article behind it: https://ualresearchonline.arts.ac.uk/id/eprint/26518/