Transforms¶
Transforms in audiotree.transforms
are Grain
transformations that operate on batches.
Examples include:
GPU-based volume normalization to a LUFS value in a configurable uniformly sampled range
Encoding to DAC-JAX audio tokens
Swapping stereo channels
Randomly shifting or corrupting the phase(s) of a waveform
and more…
Config¶
AudioTree is compatible with ArgBind but does not require it.
For the examples directly below, some other setup is required, so consider this to be an overview.
Before transformations, your data source might provide a single AudioTree
or a “tree” of AudioTree
:
from jax import numpy as jnp
from audiotree import AudioTree
sample_rate = 44100
data = jnp.zeros((16, 2, 441000)) # dummy placeholder shaped (B, C, T)
audio_tree = AudioTree(data, sample_rate)
batch = {"src": [audio_tree, audio_tree], "target": audio_tree}
Then from YAML you can write the following to get a 90% chance of a random volume change between -12 and 3 decibels on just the "src"
AudioTree
:
VolumeChange.prob:
0.9
VolumeChange.config:
min_db: -12
max_db: 3
VolumeChange.scope:
src:
scope: True
Split Seed¶
By setting split_seed
to False, you can apply the same augmentations to both the src
and target
.
VolumeChange.split_seed: 0
This would make the most sense if the waveforms in src
and target
have the same dimensions.
For some transformations, having differently sized tensors would cause the augmentations to be different despite sharing the same jax.random.PRNGKey
.
Output Key¶
You can specify an output key so that the result of the transformation is stored in a new sibling key:
VolumeChange.output_key: "src_modified"
VolumeChange.scope:
src:
scope: True
The above will produce a batch shaped like this:
{
"src": [audio_tree, audio_tree],
"src_modified": [audio_tree, audio_tree],
"target": audio_tree,
}
Scope¶
Depending on the scope, we can end up with multiple new output leaves. Let’s start with this batch:
batch = {
"src":
{
"GT": audio_tree
},
"target":
{
"GT": audio_tree
}
}
Then with a scope of None
(default) and this YAML:
VolumeChange.output_key: "modified"
We can produce this shape:
{
"src":
{
"GT": audio_tree,
"modified": audio_tree
},
"target":
{
"GT": audio_tree,
"modified": audio_tree
}
}
Inheritance¶
You can also make more powerful (but complex) configs and scopes:
VolumeChange.config:
max_db: 3
src:
min_db: -12
target:
min_db: -2
Note that the max_db
is inherited by both src
and target
.
This ability to inherit comes at the cost of potential name clashes between the keys of the config (e.g., "min_db"
, "max_db"
) and the keys in the AudioTree ("src"
, "target"
, etc.).
The user is expected to use a data source to create AudioTrees that avoid these clashes.
Without ArgBind¶
Above, we’ve been using ArgBind and YAML, but we can create transforms with just Python:
from audiotree.transforms import VolumeNorm
config = {
"max_db": -6,
"src": {"min_db": -20},
"target": {"min_db": -15},
}
transform = VolumeNorm(config=config, split_seed=True, prob=0.9, scope=None)
audio_tree = transform.random_map(audio_tree)
Further examples¶
For now, the tests/transforms/test_core.py is somewhat useful for thinking through the expected outputs. AudioTree is also used in DAC-JAX, which shows how to use ArgBind and data sources.