
Transforms in audiotree.transforms are Grain transformations that operate on batches. Examples include:

  • GPU-based volume normalization to a LUFS value in a configurable uniformly sampled range

  • Encoding to DAC-JAX audio tokens

  • Swapping stereo channels

  • Randomly shifting or corrupting the phase(s) of a waveform

  • and more…


AudioTree is compatible with ArgBind but does not require it. The examples directly below need some additional setup, so treat them as an overview. Before transformations, your data source might provide a single AudioTree or a “tree” of AudioTrees:

from jax import numpy as jnp
from audiotree import AudioTree
sample_rate = 44100
data = jnp.zeros((16, 2, 441000))  # dummy placeholder shaped (B, C, T)
audio_tree = AudioTree(data, sample_rate)
batch = {"src": [audio_tree, audio_tree], "target": audio_tree}

Then from YAML you can write the following to get a 90% chance of a random volume change between -12 and 3 decibels on just the "src" AudioTree:

VolumeChange.config:
    min_db: -12
    max_db: 3
VolumeChange.prob: 0.9
VolumeChange.scope:
    src:
        scope: True
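
To make the "90% chance" and the decibel range concrete, here is a minimal sketch in plain Python. It is not audiotree's implementation: `db_to_gain` and `maybe_change_volume` are hypothetical helpers, the waveform is a plain list of floats, and the sketch assumes the usual amplitude convention in which a change of `db` decibels multiplies samples by 10^(db/20).

```python
import random


def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude factor (20*log10 convention)."""
    return 10.0 ** (db / 20.0)


def maybe_change_volume(samples, rng, min_db=-12.0, max_db=3.0, prob=0.9):
    """With probability `prob`, scale `samples` by a gain drawn uniformly in dB."""
    if rng.random() >= prob:
        return list(samples)  # transform skipped, samples returned unchanged
    db = rng.uniform(min_db, max_db)  # one dB value for the whole waveform
    gain = db_to_gain(db)
    return [s * gain for s in samples]


rng = random.Random(0)  # stand-in for a PRNG key; an assumption of this sketch
out = maybe_change_volume([0.5, -0.25, 0.1], rng)
```

The gain is sampled in decibels rather than as a linear factor, so the range -12 to 3 dB corresponds to linear factors between roughly 0.25 and 1.41.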

Split Seed

By setting split_seed to False, you can apply the same augmentations to both the src and target.

VolumeChange.split_seed: 0

This would make the most sense if the waveforms in src and target have the same dimensions. For some transformations, having differently sized tensors would cause the augmentations to be different despite sharing the same jax.random.PRNGKey.
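
The difference between a shared seed and split seeds can be sketched with the standard library. This is only an illustration of the idea: `leaf_seeds` and `sample_db` are hypothetical helpers, integer seeds stand in for PRNG keys, and whether audiotree derives per-leaf keys in exactly this way is an assumption.

```python
import random


def leaf_seeds(base_seed: int, leaf_names, split_seed: bool) -> dict:
    """Return one integer seed per leaf.

    split_seed=True:  each leaf gets its own seed derived from the base seed.
    split_seed=False: every leaf reuses the base seed.
    """
    if not split_seed:
        return {name: base_seed for name in leaf_names}
    derive = random.Random(base_seed)
    return {name: derive.getrandbits(32) for name in leaf_names}


def sample_db(seed: int, min_db: float = -12.0, max_db: float = 3.0) -> float:
    """Draw the volume change (in dB) that a leaf with this seed would receive."""
    return random.Random(seed).uniform(min_db, max_db)


names = ["src", "target"]
shared = {k: sample_db(s) for k, s in leaf_seeds(0, names, split_seed=False).items()}
split = {k: sample_db(s) for k, s in leaf_seeds(0, names, split_seed=True).items()}
```

With `split_seed=False`, `shared["src"]` and `shared["target"]` are the same dB value, so both leaves receive the same augmentation. With `split_seed=True` they are drawn independently.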

Output Key

You can specify an output key so that the result of the transformation is stored in a new sibling key:

VolumeChange.output_key: "src_modified"
VolumeChange.scope:
    src:
        scope: True

The above will produce a batch shaped like this:

{
    "src": [audio_tree, audio_tree],
    "src_modified": [audio_tree, audio_tree],
    "target": audio_tree,
}


Depending on the scope, we can end up with multiple new output leaves. Let’s start with this batch:

batch = {
    "src": {"GT": audio_tree},
    "target": {"GT": audio_tree},
}

Then with a scope of None (default) and this YAML:

VolumeChange.output_key: "modified"

We can produce this shape:

{
    "src": {"GT": audio_tree, "modified": audio_tree},
    "target": {"GT": audio_tree, "modified": audio_tree},
}

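The sibling-key behaviour with a scope of None can be sketched as a small recursive function over nested dictionaries. This is a hedged illustration, not audiotree's code: `apply_with_output_key` is a hypothetical helper, floats stand in for AudioTree leaves, and the parent keys "src" and "target" are placeholders.

```python
def is_leaf(value) -> bool:
    """Treat anything that is not a dict as a leaf (a stand-in for an AudioTree)."""
    return not isinstance(value, dict)


def apply_with_output_key(tree: dict, fn, output_key: str) -> dict:
    """Apply `fn` to every leaf and store the result under a sibling `output_key`.

    The input tree is not modified. If one dict held several leaves, they would
    all write to the same sibling key; this sketch does not handle that case.
    """
    out = {}
    for key, value in tree.items():
        if is_leaf(value):
            out[key] = value             # keep the original leaf
            out[output_key] = fn(value)  # add the transformed sibling
        else:
            out[key] = apply_with_output_key(value, fn, output_key)
    return out


batch = {"src": {"GT": 1.0}, "target": {"GT": 1.0}}
result = apply_with_output_key(batch, lambda x: 0.5 * x, "modified")
```

Here `result` keeps each "GT" leaf and gains a "modified" leaf next to it in both sub-dictionaries, which is the multiple-output-leaves situation described above.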

You can also make more powerful (but complex) configs and scopes:

VolumeChange.config:
    max_db: 3
    src:
        min_db: -12
    target:
        min_db: -2

Note that the max_db is inherited by both src and target. This ability to inherit comes at the cost of potential name clashes between the keys of the config (e.g., "min_db", "max_db") and the keys in the AudioTree ("src", "target", etc.). The user is expected to use a data source to create AudioTrees that avoid these clashes.
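
One way to picture the inheritance is a lookup that walks from the root of the config down to a leaf, collecting plain values along the way and letting deeper levels override shallower ones. The sketch below assumes exactly that rule; `resolve_config` is a hypothetical helper and not part of audiotree.

```python
def resolve_config(config: dict, path) -> dict:
    """Collect settings for the leaf at `path`; deeper levels override shallower ones.

    Nested dicts are treated as sub-scopes named after AudioTree keys,
    and every other value is treated as a setting.
    """
    resolved = {}
    node = config
    for step in (None, *path):  # None means "start at the root"
        if step is not None:
            node = node.get(step)
            if not isinstance(node, dict):
                break  # no overrides defined at this level
        for key, value in node.items():
            if not isinstance(value, dict):
                resolved[key] = value
    return resolved


config = {"max_db": 3, "src": {"min_db": -12}, "target": {"min_db": -2}}
src_cfg = resolve_config(config, ["src"])        # max_db inherited, min_db = -12
target_cfg = resolve_config(config, ["target"])  # max_db inherited, min_db = -2
```

The sketch also shows where the clash comes from: settings and sub-scopes share one namespace, so an AudioTree key named "max_db" could not be told apart from the setting of the same name.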

Without ArgBind

Above, we’ve been using ArgBind and YAML, but we can create transforms with just Python:

from audiotree.transforms import VolumeNorm

config = {
    "max_db": -6,
    "src": {"min_db": -20},
    "target": {"min_db": -15},
}

transform = VolumeNorm(config=config, split_seed=True, prob=0.9, scope=None)
audio_tree = transform.random_map(audio_tree)

Further examples

For now, the tests in tests/transforms/ are a somewhat useful place to think through the expected outputs. AudioTree is also used in DAC-JAX, which shows how to use ArgBind and data sources.