My current area of research is sparse, interpretable, and easy-to-manipulate models of audio (especially music). My ambitious goal is to define and build an audio codec that is sparse, compressive, perceptually lossless, and directly manipulable. In such a nascent codec, some types of musical composition might take place directly in the codec space, rather than in a DAW or other specialized tool. My ideal is that this codec will be:
- sparse and event-based, enumerating conceptually distinct events and their times of occurrence (see the sketch after this list)
- sample-rate independent: not rasterized, and not dependent on block coding of fixed-size frames, with audio generated by something akin to neural operators, NeRF, or SIREN-type models
- perceptually lossless, taking advantage of masking, invariances to phase, the loss of phase-locking above roughly 5 kHz, and other perceptual phenomena
- compressive, with bitrates at least competitive with those of current codecs like MP3 or Ogg Vorbis
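
To make the first two goals concrete, here is a minimal sketch of what an event-based, sample-rate-independent representation might look like. The `Event` fields and the closed-form damped-sinusoid renderer are purely illustrative stand-ins for a learned (e.g. SIREN-style) decoder; nothing here is the codec's actual format.

```python
# A minimal sketch of an event-based, sample-rate-independent representation.
# The Event fields and the damped-sinusoid "decoder" are illustrative only.
from dataclasses import dataclass
import numpy as np

@dataclass
class Event:
    time: float          # onset, in seconds (not in samples)
    amplitude: float
    frequency: float     # Hz
    decay: float         # exponential decay rate, 1/seconds

def render(events, duration, sample_rate):
    """Evaluate each event's continuous-time function on an arbitrary grid."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    out = np.zeros_like(t)
    for e in events:
        rel = t - e.time
        active = rel >= 0.0
        out[active] += (
            e.amplitude
            * np.exp(-e.decay * rel[active])
            * np.sin(2 * np.pi * e.frequency * rel[active])
        )
    return out

# The same events can be rendered at any sample rate, without re-encoding.
events = [Event(0.1, 0.5, 440.0, 6.0), Event(0.75, 0.3, 660.0, 3.0)]
audio_44k = render(events, duration=2.0, sample_rate=44_100)
audio_22k = render(events, duration=2.0, sample_rate=22_050)
```

Because events live in continuous time, the representation itself never commits to a sample rate or a frame size.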

This evolving work describes a deep-learning architecture for iteratively decomposing musical audio into a sparse, event-based representation, something like a neural matching pursuit.
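
As a rough sketch of the idea (not the actual architecture), the decomposition can be pictured as a greedy loop in which a model proposes one event at a time and its rendering is subtracted from a residual, much like classic matching pursuit. The `event_model` below, and its return signature, are hypothetical placeholders.

```python
import torch

def decompose(audio: torch.Tensor, event_model, n_events: int = 32):
    """Greedily peel one event per iteration off a mono signal."""
    residual = audio.clone()
    events = []
    for _ in range(n_events):
        # The model inspects what remains and proposes the next event:
        # a rendered waveform plus an integer onset (in samples).
        waveform, onset = event_model(residual)
        events.append((waveform, onset))
        # Subtract the rendered event from the residual, as in matching pursuit.
        segment = residual[onset:onset + waveform.shape[-1]]
        segment -= waveform[:segment.shape[-1]]
    return events, residual
```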

This work posits that one key to sparsity is something like a "resonance prior" that factors audio signals into two distinct parts:
- A model of the dynamics of the instrument, room, synth, vocal tract, or combination thereof
- A sparse "control signal" that describes how energy is injected into this system by a performer
In addition to sparsity, this factorization also means that a playable instrument of sorts might be extracted from an audio signal, allowing new compositions to be created by altering the sparse control signal while holding the model constant.
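
A toy version of this factorization, assuming (only for illustration) that the "instrument" behaves like a linear system: a resonance model built from a few decaying partials, driven by a sparse control signal of impulses. The real models are learned and not restricted to this form.

```python
import numpy as np

sr = 22_050
t = np.arange(sr) / sr

# Resonance model: a small bank of exponentially decaying partials,
# standing in for the learned dynamics of an instrument or room.
freqs = np.array([220.0, 440.0, 662.0])
decays = np.array([4.0, 6.0, 9.0])
resonance = sum(
    np.exp(-d * t) * np.sin(2 * np.pi * f * t) for f, d in zip(freqs, decays)
)

# Sparse control signal: a handful of impulses, i.e. the "performance".
control = np.zeros(2 * sr)
control[[2_000, 11_000, 26_000]] = [1.0, 0.6, 0.8]

# The audio is the control signal driving the resonant system.
audio = np.convolve(control, resonance)[: len(control)]

# Editing the control signal while holding the resonance fixed
# yields a new performance on the "same instrument".
new_control = np.zeros_like(control)
new_control[[5_000, 15_000, 30_000, 38_000]] = 0.7
new_audio = np.convolve(new_control, resonance)[: len(new_control)]
```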
In this work, I "learn" a sparse representation without an encoder model by overfitting a number of events to a single audio example, again using an event generator with roughly physics-based assumptions.
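
Sketched below under a pile of simplifying assumptions: the per-event parameters are free tensors overfit to one target recording with Adam, using a toy damped-sinusoid generator in place of the physics-inspired one, and plain sample-wise MSE in place of a perceptually-motivated loss.

```python
import torch

sr, n_events, n_samples = 22_050, 16, 22_050
target = torch.randn(n_samples)  # stand-in for a real audio example

# Learnable event parameters: onset, amplitude, frequency, decay.
params = torch.randn(n_events, 4, requires_grad=True)
t = torch.arange(n_samples) / sr

def synthesize(p):
    onset = torch.sigmoid(p[:, 0:1])             # seconds, within the 1 s clip
    amp = torch.sigmoid(p[:, 1:2])
    freq = 100 + 900 * torch.sigmoid(p[:, 2:3])  # Hz
    decay = 1 + 20 * torch.sigmoid(p[:, 3:4])
    rel = t[None, :] - onset
    # Gate each event so it only contributes after its onset.
    env = torch.relu(torch.sign(rel)) * torch.exp(-decay * rel.clamp(min=0))
    return (amp * env * torch.sin(2 * torch.pi * freq * rel)).sum(dim=0)

opt = torch.optim.Adam([params], lr=1e-2)
for step in range(1000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(synthesize(params), target)
    loss.backward()
    opt.step()
```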
This Web Audio demo uses a custom AudioWorklet to perform physical-modeling synthesis using a spring-mass system. It was inspired by the amazing work of mi-creative. Models with physics-based assumptions baked in are a natural tool for deriving sparser audio representations and imposing the "resonance prior". This toy was a fun way to start learning about this topic!
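
The worklet itself is JavaScript, but the underlying math fits in a few lines; here is a NumPy transcription of the core idea, with illustrative (not the demo's) constants: a damped mass-spring system integrated with semi-implicit Euler, one step per audio sample, reading the mass position out as the waveform.

```python
import numpy as np

sr = 44_100
mass, stiffness, damping = 1.0, 2.0e6, 40.0   # illustrative constants only
position, velocity = 0.0, 0.0
dt = 1.0 / sr

out = np.zeros(sr)
for n in range(sr):
    # Inject a short burst of energy at t = 0, standing in for a pluck or
    # strike; in the demo this excitation comes from user interaction.
    force = 1000.0 if n < 32 else 0.0
    accel = (force - stiffness * position - damping * velocity) / mass
    velocity += accel * dt          # semi-implicit Euler: velocity first,
    position += velocity * dt       # then position from the new velocity
    out[n] = position

out /= np.abs(out).max()            # normalize for listening
```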
Each "Interactive Instrument" section in the Playable State-Space Models article includes a custom AudioWorklet
that performs inference given the weights of pre-trained, single layer RNN, using them to produce real-time audio!
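
Conceptually, each worklet does something like the following, shown here in NumPy with random stand-in weights (the actual weight names, shapes, and export format come from the article's pre-trained models): step a single-layer RNN once per sample and project the hidden state to an output sample.

```python
import numpy as np

hidden = 64
rng = np.random.default_rng(0)
W_in = rng.standard_normal((hidden, 1)) * 0.1       # input -> hidden
W_h = rng.standard_normal((hidden, hidden)) * 0.05  # hidden -> hidden
W_out = rng.standard_normal((1, hidden)) * 0.1      # hidden -> output sample

def process(control, h=None):
    """Generate one audio sample per control-signal sample, in order."""
    h = np.zeros(hidden) if h is None else h
    out = np.zeros(len(control))
    for n, u in enumerate(control):
        h = np.tanh(W_in[:, 0] * u + W_h @ h)
        out[n] = (W_out @ h).item()
    return out, h

# e.g. excite the model with a single impulse and let it ring.
control = np.zeros(44_100)
control[0] = 1.0
audio, state = process(control)
```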
I'm an Austin, TX-based software engineer with over 20 years of experience. In addition to deep familiarity with NumPy, SciPy, PyTorch, and scientific computing in general, I am also a full-stack engineer who has worked extensively designing and building APIs and frontends on teams large and small. Feel free to reach out if you have a project that you think I'd be a good fit for!
If you find my research interesting and would like to see more, I'd be deeply appreciative if you chose to sponsor me!