audiotree

class AudioTree(audio_data: numpy.ndarray, sample_rate: int, loudness: numpy.ndarray = None, pitch: numpy.ndarray = None, velocity: numpy.ndarray = None, duration: numpy.ndarray = None, codes: numpy.ndarray = None, latents: numpy.ndarray = None, metadata: dict = <factory>)

A flax.struct.dataclass for holding audio information including a waveform, sample rate, and metadata.

The AudioTree class is inspired by Descript AudioTools’s AudioSignal.

Parameters:

audio_data (jnp.ndarray) – Audio waveform data as a JAX NumPy array shaped (Batch, Channels, Samples).
sample_rate (int) – Sample rate of audio_data, such as 44100 Hz.
loudness (jnp.ndarray, optional) – Loudness of the audio waveform in LUFS. You usually do not need to set this when initializing. Instead, use replace_loudness() to create a new AudioTree with the loudness calculated.
pitch (jnp.ndarray, optional) – The MIDI pitch where 60 is middle C. The shape is (Batch,).
velocity (jnp.ndarray, optional) – The MIDI velocity between 0 and 127. The shape is (Batch,).
duration (jnp.ndarray, optional) – The duration of the audio waveform in seconds (like a note duration). The shape is (Batch,).
codes (jnp.ndarray, optional) – The neural audio codec tokens for the audio.
latents (jnp.ndarray, optional) – The latent representations of the audio.
metadata (dict) – Any extra metadata can be placed here.
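The pitch field uses MIDI note numbers. As a minimal sketch of what those numbers mean, the conversion below maps a MIDI pitch to a frequency, assuming twelve-tone equal temperament with A4 (MIDI 69) tuned to 440 Hz. The helper midi_to_hz is hypothetical and not part of AudioTree.

```python
def midi_to_hz(pitch: float) -> float:
    """Convert a MIDI pitch number to Hz (assumes A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12)


middle_c_hz = midi_to_hz(60)  # middle C, roughly 261.63 Hz
a4_hz = midi_to_hz(69)        # the 440 Hz tuning reference
```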

classmethod excerpt(audio_path: str, rng: Generator, offset: float = 0.0, duration: float = None, search_function: Callable = None, **kwargs) → Self

Create an AudioTree from a random section of audio from a file path.

Parameters:

audio_path (str) – Path to audio file.
rng (np.random.Generator) – Random number generator.
offset (float, optional) – Offset in seconds to audio data.
duration (float, optional) – Duration in seconds of audio data. The audio data will be trimmed or lengthened as necessary.
search_function (Callable, optional) – A function that determines the random offset.
**kwargs – Keyword arguments passed to AudioTree.__init__.

Returns:

An instance of AudioTree.

Return type:

AudioTree
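One way to picture the random section is as a start time drawn so that the excerpt still fits inside the file. The sketch below assumes a uniform draw. The helper random_offset and its arguments are hypothetical, and the library's own search may differ.

```python
import numpy as np


def random_offset(rng: np.random.Generator, total_duration: float, duration: float) -> float:
    """Pick a start time (seconds) so that the excerpt fits inside the file."""
    # If the file is shorter than the excerpt, the only valid start is 0.
    latest_start = max(total_duration - duration, 0.0)
    return float(rng.uniform(0.0, latest_start))


rng = np.random.default_rng(42)
offset = random_offset(rng, total_duration=10.0, duration=2.0)
short_file_offset = random_offset(rng, total_duration=1.0, duration=2.0)
```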

property filepath: List[str]: Returns a list of file paths, assuming the information exists in metadata['filepath'].

classmethod from_array(audio_data: ndarray, sample_rate: int) → Self

Create an AudioTree from an audio array and a sample rate.

Parameters:

audio_data (np.ndarray) – Audio data shaped (Samples,), (Channels, Samples), or (Batch, Channels, Samples).
sample_rate (int) – Sample rate of audio data, such as 44100 Hz.

Returns:

An instance of AudioTree.

Return type:

AudioTree
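The three accepted input shapes can all be promoted to the (Batch, Channels, Samples) layout. The following is a minimal NumPy sketch of that promotion, assuming missing leading axes are filled with size 1. The helper to_batch_channels_samples is hypothetical and not taken from the library.

```python
import numpy as np


def to_batch_channels_samples(audio: np.ndarray) -> np.ndarray:
    """Promote 1-D or 2-D audio to the (Batch, Channels, Samples) layout."""
    if audio.ndim == 1:      # (Samples,) -> (1, 1, Samples)
        return audio[None, None, :]
    if audio.ndim == 2:      # (Channels, Samples) -> (1, Channels, Samples)
        return audio[None, :, :]
    if audio.ndim == 3:      # already (Batch, Channels, Samples)
        return audio
    raise ValueError(f"Expected 1 to 3 dimensions, got {audio.ndim}")


mono_clip = to_batch_channels_samples(np.zeros(44100))
stereo_clip = to_batch_channels_samples(np.zeros((2, 44100)))
```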

classmethod from_file(audio_path: str, sample_rate: int = None, offset: float = 0.0, duration: float = None, mono: bool = False, pad_mode: Literal['constant'] = 'constant')

Create an AudioTree from an audio file path.

Parameters:

audio_path (str) – Path to audio file.
sample_rate (int, optional) – Sample rate of audio data, such as 44100 Hz. If left as None, the file’s original sample rate will be used.
offset (float, optional) – Offset in seconds to audio data.
duration (float, optional) – Duration in seconds of audio data. The audio data will be trimmed or extended as necessary.
mono (bool, optional) – Whether to force the audio data to be single-channel.
pad_mode (Literal) – If duration is not None and is greater than the length of the audio, pad_mode controls how the audio is right-padded. The default is “constant” (zeros). A choice of None results in no padding. Another useful choice is “wrap”, which loops the audio.

Returns:

An instance of AudioTree.

Return type:

AudioTree
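The difference between the “constant” and “wrap” padding modes can be illustrated with numpy.pad, which uses the same mode names. This is a sketch under the assumption that padding is applied only on the right of the last axis. The helper pad_to_length is hypothetical and is not the library's loading code.

```python
import numpy as np


def pad_to_length(audio: np.ndarray, target_samples: int, pad_mode="constant") -> np.ndarray:
    """Trim or right-pad the last axis so it holds target_samples samples."""
    missing = target_samples - audio.shape[-1]
    if missing <= 0 or pad_mode is None:
        # Long enough (trim), or padding disabled (return what is available).
        return audio[..., :target_samples]
    # Pad only the end of the last (Samples) axis.
    pad_width = [(0, 0)] * (audio.ndim - 1) + [(0, missing)]
    return np.pad(audio, pad_width, mode=pad_mode)


clip = np.array([[[1.0, 2.0, 3.0]]])           # (Batch, Channels, Samples)
zero_padded = pad_to_length(clip, 5)            # zeros appended
looped = pad_to_length(clip, 5, "wrap")         # audio repeats from the start
unpadded = pad_to_length(clip, 5, None)         # left at its original length
```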

replace(**updates): Returns a new object replacing the specified fields with new values.
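Because the object is an immutable dataclass, updates produce a new instance rather than modifying the original. The sketch below shows the same pattern with a frozen standard-library dataclass and dataclasses.replace. The Clip class is a hypothetical stand-in, not AudioTree itself.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Clip:
    """Hypothetical stand-in for an immutable audio container."""
    sample_rate: int
    metadata: dict = field(default_factory=dict)


original = Clip(sample_rate=44100)
updated = replace(original, sample_rate=16000)  # new object, original untouched
```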

replace_loudness() → Self: Returns a new AudioTree whose loudness field is replaced with the calculated loudness as a JAX scalar.

resample(sample_rate: int, zeros: int = 24, rolloff: float = 0.945, output_length: int = None, full: bool = False) → Self

Resample the AudioTree’s audio_data to a new sample rate. The algorithm is a JAX port of ResampleFrac from the PyTorch library Julius.

Parameters:

sample_rate (int) – The new sample rate of audio data, such as 44100 Hz.
zeros (int, optional) – Number of zero crossings to keep in the sinc filter.
rolloff (float) – Use a lowpass filter with cutoff rolloff * sample_rate / 2 to leave sufficient margin for the imperfection of the FIR filter used. Lowering this value reduces aliasing, but also removes some of the highest frequencies.
output_length (None or int) – The desired output length (last dimension). Allowed values are between 0 and ceil(length * sample_rate / old_sr). When None (default) is specified, the floored output length is used. To select the largest possible size, use the full argument.
full (bool) – Return the longest possible output from the input. This can be useful if you chain resampling operations and want to give the output_length only for the last one, while passing full=True to all the others.

Returns:

An instance of AudioTree.

Return type:

AudioTree
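The output-length rules and the lowpass cutoff above are plain arithmetic, worked through here with integer math. This sketch only follows the formulas stated in the parameter descriptions. The helper names resample_lengths and lowpass_cutoff_hz are hypothetical.

```python
def resample_lengths(length: int, old_sr: int, new_sr: int) -> tuple:
    """Return (default, full) output lengths for resampling `length` samples."""
    default_length = (length * new_sr) // old_sr      # floor(length * new_sr / old_sr)
    full_length = -((-length * new_sr) // old_sr)     # ceil(length * new_sr / old_sr)
    return default_length, full_length


def lowpass_cutoff_hz(sample_rate: int, rolloff: float = 0.945) -> float:
    """Cutoff of the anti-aliasing filter: rolloff * sample_rate / 2."""
    return rolloff * sample_rate / 2


# 1000 samples at 44100 Hz resampled to 16000 Hz: 362.81... samples before rounding.
default_length, full_length = resample_lengths(1000, 44100, 16000)
cutoff = lowpass_cutoff_hz(16000)  # about 7560 Hz
```

Valid output_length values for this example therefore range from 0 to full_length.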

classmethod salient_excerpt(audio_path: str | Path, rng: Generator, saliency_params: SaliencyParams, **kwargs) → Self

Create an AudioTree from a salient section of audio from a file path.

Parameters:

audio_path (str) – Path to audio file.
rng (np.random.Generator) – Random number generator such as np.random.default_rng(42).
saliency_params (SaliencyParams) – Saliency parameters to use to find a salient section.
**kwargs – Keyword arguments passed to AudioTree.__init__.

Returns:

An instance of AudioTree.

Return type:

AudioTree
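A salient-section search can be thought of as a bounded retry loop: draw a candidate excerpt, measure its loudness, and stop once it passes the cutoff. The sketch below is an assumption about the general idea, not the library's implementation. It uses RMS level in dB as a crude stand-in for LUFS, falls back to the last attempt if none passes, and the helpers rms_db and find_salient_start are hypothetical.

```python
import numpy as np


def rms_db(x: np.ndarray) -> float:
    """RMS level in dB; a crude stand-in for a LUFS loudness measurement."""
    rms = float(np.sqrt(np.mean(np.square(x))))
    return 20.0 * float(np.log10(max(rms, 1e-12)))  # floor avoids log10(0)


def find_salient_start(audio, excerpt_len, rng, num_tries=8, loudness_cutoff=-40.0):
    """Try random start indices until an excerpt reaches the loudness cutoff."""
    latest_start = max(audio.shape[-1] - excerpt_len, 0)
    start = 0
    for _ in range(num_tries):
        start = int(rng.integers(0, latest_start + 1))
        if rms_db(audio[..., start:start + excerpt_len]) >= loudness_cutoff:
            break
    return start  # assumption: keep the last attempt if none was loud enough


rng = np.random.default_rng(0)
audio = np.zeros((1, 1, 1000))
audio[..., 500:] = 0.5  # silence followed by a loud region
start = find_salient_start(audio, excerpt_len=100, rng=rng, num_tries=64)
```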

to_mono() → Self

Reduce the audio_data to mono.

Returns:

An instance of AudioTree.

Return type:

AudioTree
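A common way to reduce audio to mono is to average across the channel axis while keeping that axis with size 1. The NumPy sketch below assumes averaging. The library may combine channels differently.

```python
import numpy as np

# One stereo clip shaped (Batch=1, Channels=2, Samples=3).
stereo = np.array([[[1.0, 0.0, -1.0],
                    [0.0, 0.5, 1.0]]])

# Average over the channel axis and keep it, giving (Batch, 1, Samples).
mono = stereo.mean(axis=1, keepdims=True)
```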

class SaliencyParams(enabled: bool = False, num_tries: int = 8, loudness_cutoff: float = -40.0, search_function: str = 'SaliencyParams.search_uniform')

The parameters for saliency detection.

Parameters:

enabled (bool) – Whether to enable saliency detection.
num_tries (int) – Maximum number of attempts to find a salient section of audio (default 8).
loudness_cutoff (float) – Minimum loudness cutoff in decibels for determining salient audio (default -40).
search_function (Union[Callable, str]) – The search function for determining the random offset. The default is SaliencyParams.search_uniform. Another option is SaliencyParams.search_bias_early which gradually searches earlier in the file as more attempts are made.
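To illustrate what "gradually searches earlier in the file" could mean, the sketch below shrinks the upper bound of the random offset linearly as the attempt index grows. The function early_biased_offset, its signature, and the linear schedule are all assumptions for illustration and are not the library's search_bias_early.

```python
import numpy as np


def early_biased_offset(rng: np.random.Generator, latest_start: float,
                        attempt: int, num_tries: int) -> float:
    """Draw an offset whose upper bound shrinks as attempts accumulate."""
    # attempt runs from 0 to num_tries - 1, so the bound never reaches zero.
    remaining = 1.0 - attempt / num_tries
    return float(rng.uniform(0.0, latest_start * remaining))


rng = np.random.default_rng(0)
offsets = [early_biased_offset(rng, 8.0, attempt, 8) for attempt in range(8)]
```

With latest_start set to 8 seconds and 8 tries, the first attempt may land anywhere in the first 8 seconds, while the final attempt is confined to the first second.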

replace(**updates): Returns a new object replacing the specified fields with new values.