AudioTree documentation

AudioTree is a JAX library for audio data loading and augmentations. The source code is available at https://github.com/DBraun/audiotree.

Three dependencies must first be installed from specific Git branches:

pip install "git+https://github.com/DBraun/argbind.git@improve.subclasses"
pip install "git+https://github.com/DBraun/dm_aux.git@DBraun-patch-2"
pip install "git+https://github.com/DBraun/jaxloudnorm.git@feature/speed-optimize"

Then AudioTree can be installed with pip:

pip install audiotree

The namesake class AudioTree is a flax.struct.dataclass with properties for audio data, sample rate, on-demand data such as loudness, and optional metadata such as MIDI pitch, velocity, and duration.

Although AudioTree is a specific class, we also loosely refer to any combination of dictionaries and lists of AudioTrees as an AudioTree (see the JAX documentation on pytrees). For example, in the code below, we consider batch to be an AudioTree.

from jax import numpy as jnp
from audiotree import AudioTree

sample_rate = 44100
# Placeholder audio shaped (B, C, T): 16 items, 2 channels, 10 seconds at 44.1 kHz
data = jnp.zeros((16, 2, 441000))
audio_tree = AudioTree(data, sample_rate)
# Any nesting of dicts and lists of AudioTrees is itself treated as an AudioTree
batch = {"src": [audio_tree, audio_tree], "target": audio_tree}

The batch above can be used with jax.tree.map to create a new batch. That’s essentially what the Transform classes in transforms do. They perform GPU-based, jax.jit-compatible augmentations on arbitrarily shaped AudioTrees. When used with ArgBind, they are also highly configurable from the command line and from YAML files.
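To make the tree-mapping idea concrete without depending on AudioTree itself, here is a minimal sketch in plain JAX. ToyAudioTree and halve_gain are hypothetical names invented for this illustration. They are not part of the audiotree API, and treating the sample rate as static metadata is an assumption rather than a statement about how AudioTree is implemented.

```python
import jax
import jax.numpy as jnp
from jax import tree_util


@tree_util.register_pytree_node_class
class ToyAudioTree:
    """Hypothetical stand-in for AudioTree (not the real class)."""

    def __init__(self, audio_data, sample_rate):
        self.audio_data = audio_data
        self.sample_rate = sample_rate

    def tree_flatten(self):
        # The audio array is the only leaf; the sample rate rides along
        # as static auxiliary data (an assumption made for this sketch).
        return (self.audio_data,), self.sample_rate

    @classmethod
    def tree_unflatten(cls, sample_rate, children):
        return cls(children[0], sample_rate)


def halve_gain(x):
    # Toy "augmentation": scale every audio array by 0.5.
    return 0.5 * x


tree = ToyAudioTree(jnp.ones((2, 1, 8)), 44100)  # shaped (B, C, T)
batch = {"src": [tree, tree], "target": tree}

# jax.tree_util.tree_map is the long-standing spelling of jax.tree.map.
# The whole mapping is jit-compatible because the batch is a pytree.
scaled = jax.jit(lambda b: tree_util.tree_map(halve_gain, b))(batch)
```

The result scaled has the same dict-and-list structure as batch, with each audio array transformed and each sample rate carried over unchanged.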

Whether you’re creating a data loader for training, validation, testing, or prompt generation, AudioTree can help.

Acknowledgments

AudioTree is inspired by AudioTools. Thank you!

Citation

@software{Braun_AudioTree_2024,
   author = {Braun, David},
   month = aug,
   title = {{AudioTree}},
   url = {https://github.com/DBraun/audiotree},
   version = {0.1.2},
   year = {2024}
}