AudioTree documentation

AudioTree is a JAX library for audio data loading and augmentations. The source code is available at https://github.com/DBraun/audiotree. AudioTree follows Effort-based Versioning.

Two requirements must be installed from their Git repositories first:

pip install "git+https://github.com/DBraun/argbind.git@improve.subclasses"
pip install "git+https://github.com/boris-kuz/jaxloudnorm.git"

Then AudioTree can be installed with pip:

pip install audiotree

The namesake class AudioTree is a flax.struct.dataclass with properties for audio data, sample rate, on-demand data such as loudness, and optional data such as filepaths, MIDI pitch, velocity, and duration. An AudioTree can also store arrays for codebooks or latent embeddings.

Although AudioTree is a specific class, we loosely refer to any combination of dictionaries and lists of AudioTrees as an AudioTree too (see the JAX documentation on pytrees). For example, in the code below, we consider batch to be an AudioTree.

from jax import numpy as jnp
from audiotree import AudioTree
sample_rate = 44_100
data = jnp.zeros((16, 2, sample_rate*10)) # dummy placeholder shaped (B, C, T)
audio_tree = AudioTree(data, sample_rate)
batch = {"src": [audio_tree, audio_tree], "target": audio_tree}

The batch above can be used with jax.tree.map to create a new batch. That’s essentially what the Transform classes in transforms do. They perform GPU-based, jax.jit-compatible augmentations on arbitrarily nested AudioTrees. When used with ArgBind, they are also highly configurable from the command line, YAML, and Python.
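To make the jax.tree.map idea concrete, here is a minimal sketch that needs only JAX. The class Clip and the function halve_gain are hypothetical stand-ins invented for illustration; they are not part of AudioTree. Clip treats its array as a pytree leaf and its sample rate as static metadata, which is an assumption about how such a class could be laid out, not a statement about AudioTree internals.

```python
import jax
from jax import numpy as jnp
from jax.tree_util import register_pytree_node_class


@register_pytree_node_class
class Clip:
    """Hypothetical stand-in for AudioTree: one array leaf plus a static sample rate."""

    def __init__(self, audio_data, sample_rate):
        self.audio_data = audio_data
        self.sample_rate = sample_rate

    def tree_flatten(self):
        # Children are the leaves that tree.map visits; aux data rides along unchanged.
        return (self.audio_data,), self.sample_rate

    @classmethod
    def tree_unflatten(cls, sample_rate, children):
        return cls(children[0], sample_rate)


sample_rate = 44_100
clip = Clip(jnp.ones((2, 1, sample_rate)), sample_rate)  # data shaped (B, C, T)
batch = {"src": [clip, clip], "target": clip}  # same nesting as the batch above


@jax.jit
def halve_gain(tree):
    # Apply the function to every array leaf and rebuild the same nesting.
    return jax.tree.map(lambda x: 0.5 * x, tree)


new_batch = halve_gain(batch)
```

The result has the same dictionary and list structure as the input, with each array scaled and each sample rate carried over untouched. A real transform would presumably do more per leaf, but the traversal pattern is the same.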

Whether you’re creating a data loader for training, validation, testing, or prompt generation, AudioTree can help.

Content

Introduction

AudioTree API

Acknowledgments

AudioTree is inspired by AudioTools. Thank you!

Citation

@software{Braun_AudioTree_2025,
   author = {Braun, David},
   month = mar,
   title = {{AudioTree}},
   url = {https://github.com/DBraun/audiotree},
   version = {0.2.0},
   year = {2025}
}