This repository contains code, experiments and writing aimed at developing a perceptually-lossless audio codec that is sparse, interpretable, and easy to manipulate. The work is currently focused on "natural" sounds, and leverages knowledge about physics, acoustics, and human perception to remove perceptually-irrelevant or redundant information.
Modern lossy audio codecs are built on a model that slices audio into fixed-size, fixed-rate "frames" or "windows". This is true even of cutting-edge "neural" audio codecs such as Descript's Audio Codec. While convenient, flexible, and generally able to represent the universe of possible sounds, this representation is not easy to understand or manipulate, and it is not how humans conceptualize sound. Although modern music can sometimes deviate from this model, humans in natural settings typically perceive sound as a combination of streams, or sound sources, sharing some physical space.
This small model attempts to decompose audio featuring acoustic instruments into the following components (a rough sketch of the representation follows the list):
- Some maximum number of small (16-dimensional) event vectors, representing individual audio events
- Times at which each event occurs
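As a rough illustration, this encoded representation could be held in a pair of tensors like the following; the maximum event count and variable names are illustrative, and only the 16-dimensional event vectors come from the description above:

```python
import torch

n_events = 32    # assumed maximum number of events per segment (illustrative)
event_dim = 16   # dimensionality of each event vector, as described above

# One vector per potential event; unused slots can be pushed toward silence.
event_vectors = torch.zeros(n_events, event_dim, requires_grad=True)

# One scalar time per event, e.g. in seconds relative to the segment start.
event_times = torch.zeros(n_events, requires_grad=True)
```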
In this work, we apply a Gaussian Splatting-like approach to audio to produce a lossy, sparse, interpretable, and manipulable representation. Each audio "atom" uses a source-excitation model, implemented by convolving a burst of band-limited noise with a variable-length "resonance" built from a number of exponentially decaying harmonics, meant to mimic the resonance of physical objects. Envelopes are built in both the time and frequency domains using gamma and/or Gaussian distributions. Sixty-four atoms are randomly initialized and then fitted to a short segment of audio over 3,000 iterations using a multi-resolution STFT loss. A sparse solution, with few active atoms, is encouraged by a second, weighted loss term. Complete code for the experiment can be found on GitHub. The segments used for fitting come from the MusicNet dataset.
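The sketch below shows roughly how a single atom of this kind might be synthesized: a short noise burst convolved with a resonance made of exponentially decaying harmonics. All parameter names and defaults are illustrative, and the band-limiting and gamma/Gaussian envelopes described above are omitted for brevity:

```python
import math
import torch

def synthesize_atom(f0, n_harmonics=8, decays=None, burst_len=512,
                    n_samples=2 ** 15, samplerate=22050):
    """Noise burst convolved with a resonance built from exponentially
    decaying harmonics. Illustrative sketch only."""
    t = torch.arange(n_samples) / samplerate

    # Resonance: harmonics of f0, each with its own exponential decay,
    # mimicking the ringing of a physical object.
    if decays is None:
        decays = torch.linspace(3.0, 12.0, n_harmonics)
    harmonics = torch.arange(1, n_harmonics + 1)
    resonance = (
        torch.sin(2 * math.pi * f0 * harmonics[:, None] * t[None, :])
        * torch.exp(-decays[:, None] * t[None, :])
    ).sum(dim=0)

    # Excitation: a short burst of noise (band-limiting omitted here).
    burst = torch.zeros(n_samples)
    burst[:burst_len] = torch.randn(burst_len) * torch.hann_window(burst_len)

    # Convolve excitation with resonance via FFT (zero-padded to avoid wrap-around).
    n = 2 * n_samples
    atom = torch.fft.irfft(
        torch.fft.rfft(burst, n=n) * torch.fft.rfft(resonance, n=n), n=n
    )[:n_samples]
    return atom / atom.abs().max()

atom = synthesize_atom(220.0)   # a single decaying atom with a 220 Hz fundamental
```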
Assuming that our decoder is parameter-less (i.e., only the encoded representation is needed) and that the path from decoder to loss function is fully differentiable, we can perform gradient descent to find the best encoded representation for a given piece of audio. The Gaussian/Gamma Splatting Experiment is an example of this: audio is defined as a set of events/parameters passed to a fixed synthesizer.
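A minimal sketch of that loop, assuming a PyTorch setting: `decode` stands in for the fixed, parameter-less synthesizer, and the multi-resolution STFT loss is an illustrative implementation rather than the exact one used in the experiments:

```python
import torch

def multires_stft_loss(recon, target, fft_sizes=(256, 512, 1024, 2048)):
    # Compare magnitude spectrograms at several window sizes and sum the L1 differences.
    loss = recon.new_zeros(())
    for size in fft_sizes:
        window = torch.hann_window(size)
        r = torch.stft(recon, size, hop_length=size // 4, window=window,
                       return_complex=True).abs()
        t = torch.stft(target, size, hop_length=size // 4, window=window,
                       return_complex=True).abs()
        loss = loss + torch.abs(r - t).mean()
    return loss

def overfit_encode(target, decode, n_events=64, dim=16, steps=3000, lr=1e-2):
    # Gradient descent over the representation itself; `decode` is the fixed,
    # parameter-less, differentiable synthesizer described above.
    params = torch.zeros(n_events, dim, requires_grad=True)
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = multires_stft_loss(decode(params), target)
        loss.backward()
        opt.step()
    return params.detach()
```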
This work takes a slightly different approach to extracting a sparse audio representation, but ties into the Overfitting as Encoder idea. We attempt to decompose a single musical audio signal into two distinct components (a minimal sketch follows the list below):
- A state-space model representing the resonances, or transfer function, of the system as a whole, including the instrument and the room in which it was played
- A sparse control signal, representing the ways in which energy is injected into the system
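The following is a minimal, purely linear sketch of this decomposition: a handful of impulses (the sparse control signal) drives a small state-space model of decaying resonances. The frequencies, matrix sizes, and overall structure are illustrative assumptions, not the repository's actual model:

```python
import math
import torch

def resonant_transition(freqs_hz, decay, samplerate):
    # One decaying 2-D rotation per resonance, arranged block-diagonally.
    blocks = []
    for f in freqs_hz:
        theta = 2 * math.pi * f / samplerate
        c, s = math.cos(theta), math.sin(theta)
        blocks.append(decay * torch.tensor([[c, -s], [s, c]]))
    return torch.block_diag(*blocks)

def run_state_space(control, A, B, C):
    # control: (n_samples,) sparse signal; A/B/C: transition, input map, readout.
    state = torch.zeros(A.shape[0])
    out = []
    for u in control:
        state = A @ state + B * u       # energy enters only where control is non-zero
        out.append(torch.dot(C, state)) # the readout mixes the resonant state
    return torch.stack(out)

samplerate, n_samples = 22050, 8192
A = resonant_transition([220.0, 440.0, 660.0, 880.0], decay=0.9995, samplerate=samplerate)
B, C = torch.randn(A.shape[0]), torch.randn(A.shape[0])

control = torch.zeros(n_samples)
control[torch.randint(0, n_samples, (8,))] = 1.0   # a handful of energy injections
audio = run_state_space(control, A, B, C)
```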
Other ideas being explored include:
- simpler, linear sparse decompositions, such as matching pursuit
- perceptually-motivated loss functions, inspired by Mallat's scattering transform and the Auditory Image Model.
- approximate convolutions for long kernels
- since many useful kernels in music are highly redundant, can we find a low-rank approximation in the frequency domain and perform the multiplication/convolution there? (sketched after this list)
- approximate convolutions via hyperdimensional computing
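To sketch the low-rank frequency-domain idea above: the spectra of a bank of redundant kernels can be factorized with a truncated SVD, so that convolving with every kernel reduces to a few point-wise spectral products plus a mixing step. The function name, shapes, and the choice of a plain SVD are all assumptions for illustration:

```python
import torch

def lowrank_fft_conv(signal, kernels, rank=8):
    """Approximate convolution of `signal` with many long, redundant kernels
    by factorizing the kernel spectra with a truncated SVD."""
    n = signal.shape[-1] + kernels.shape[-1] - 1
    X = torch.fft.rfft(signal, n=n)            # (n_freq,)
    K = torch.fft.rfft(kernels, n=n)           # (n_kernels, n_freq), complex

    # Truncated SVD of the kernel spectra: K ≈ U_r @ diag(S_r) @ Vh_r
    U, S, Vh = torch.linalg.svd(K, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank]

    # Convolve with the `rank` basis spectra, then mix back to n_kernels outputs.
    basis = (S_r[:, None] * Vh_r) * X[None, :]  # (rank, n_freq)
    Y = U_r @ basis                             # (n_kernels, n_freq)
    return torch.fft.irfft(Y, n=n)              # (n_kernels, n)

# Example: 64 long, highly similar kernels applied to a short signal with rank 8.
signal = torch.randn(2 ** 14)
base = torch.randn(2 ** 13)
kernels = torch.stack([torch.roll(base, i) * 0.99 ** i for i in range(64)])
out = lowrank_fft_conv(signal, kernels, rank=8)
```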
The following environment variables should be configured:
AUDIO_PATH=
PORT=9999
IMPULSE_RESPONSE_PATH=
S3_BUCKET=
The MusicNet dataset should be downloaded and extracted to a location on your local machine.
You can then update the `AUDIO_PATH` environment variable to point to the `musicnet/train_data` directory, wherever that may be on your local machine.
Room impulse responses to support convolution-based reverb can be downloaded here.
You can then update the `IMPULSE_RESPONSE_PATH` environment variable to point at the directory on your local machine that contains the impulse response audio files.
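As an illustration of how these variables might be consumed, the snippet below loads a random MusicNet segment and a random impulse response and applies convolution-based reverb via the FFT. The use of `python-dotenv` and `soundfile` is an assumption for this sketch, not necessarily what the repository does:

```python
import os
import random

import numpy as np
import soundfile as sf           # assumed audio I/O choice, not necessarily the repo's
from dotenv import load_dotenv   # assumes python-dotenv is installed

load_dotenv()
audio_dir = os.environ['AUDIO_PATH']            # e.g. the musicnet/train_data directory
ir_dir = os.environ['IMPULSE_RESPONSE_PATH']

# Pick a random training segment and a random room impulse response.
audio_file = random.choice(os.listdir(audio_dir))
ir_file = random.choice(os.listdir(ir_dir))

samples, sr = sf.read(os.path.join(audio_dir, audio_file))
impulse, _ = sf.read(os.path.join(ir_dir, ir_file))   # resampling to `sr` omitted for brevity

# Down-mix to mono if needed.
samples = samples if samples.ndim == 1 else samples.mean(axis=1)
impulse = impulse if impulse.ndim == 1 else impulse.mean(axis=1)

# Convolution-based reverb via the FFT (zero-padded to avoid wrap-around).
n = len(samples) + len(impulse) - 1
wet = np.fft.irfft(np.fft.rfft(samples, n) * np.fft.rfft(impulse, n), n)
wet = wet / np.abs(wet).max()
```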