audiotree

class AudioTree(audio_data: ~jax.Array, sample_rate: int, loudness: float = None, pitch: float = None, velocity: float = None, duration: float = None, codes: ~jax.Array = None, metadata: dict = <factory>)

A flax.struct.dataclass for holding audio information including a waveform, sample rate, and metadata.

The AudioTree class is inspired by Descript AudioTools’s AudioSignal.

Parameters:
  • audio_data (jnp.ndarray) – Audio waveform data as a JAX NumPy array shaped (Batch, Channels, Samples).

  • sample_rate (int) – Sample rate of audio_data, such as 44100 Hz.

  • loudness (float, optional) – Loudness of the audio waveform in LUFS. Do not set this when initializing. Instead, call replace_loudness() to create a new AudioTree with the loudness calculated.

  • pitch (float, optional) – The MIDI pitch where 60 is middle C.

  • velocity (float, optional) – The MIDI velocity between 0 and 127.

  • duration (float, optional) – The duration of the audio waveform in seconds.

  • codes (jnp.ndarray, optional) – The neural audio codec tokens for the audio.

  • metadata (dict) – Any extra metadata can be placed here.
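
As a rough illustration of how these fields relate, the sketch below derives a duration from a (Batch, Channels, Samples) shape and converts a MIDI pitch to a frequency. The helpers duration_seconds and midi_to_hz are hypothetical and not part of AudioTree. The pitch conversion assumes 12-tone equal temperament with A4 (MIDI 69) tuned to 440 Hz.

```python
def duration_seconds(shape, sample_rate):
    """Duration in seconds implied by a (Batch, Channels, Samples) shape."""
    _batch, _channels, samples = shape
    return samples / sample_rate


def midi_to_hz(pitch):
    """Equal-temperament frequency for a MIDI pitch (69 maps to 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


print(duration_seconds((1, 2, 88200), 44100))  # 2.0
print(round(midi_to_hz(60), 2))                # 261.63 (middle C)
```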

classmethod excerpt(audio_path: str, rng: Generator, offset: float = 0.0, duration: float = None, search_function: Callable = None, **kwargs) Self

Create an AudioTree from a random section of audio from a file path.

Parameters:
  • audio_path (str) – Path to audio file.

  • rng (np.random.Generator) – Random number generator.

  • offset (float, optional) – Offset in seconds to audio data.

  • duration (float, optional) – Duration in seconds of audio data. The audio data will be trimmed or lengthened as necessary.

  • search_function (Callable, optional) – A function that determines the random offset.

  • **kwargs – Keyword arguments passed to AudioTree.__init__.

Returns:

An instance of AudioTree.

Return type:

AudioTree
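
The way a search function might pick the random offset can be sketched as follows. This is an illustration under stated assumptions, not the library's code: uniform_offset and its signature are hypothetical, and it assumes the start time is drawn uniformly so that the excerpt still fits inside the file.

```python
import numpy as np


def uniform_offset(rng, offset, duration, total_duration):
    """Pick a start time in seconds so that the excerpt fits in the file.

    Hypothetical stand-in for a search function.
    """
    lower = max(0.0, offset)
    # If the file is shorter than the excerpt, fall back to the lower bound.
    upper = max(lower, total_duration - duration)
    return float(rng.uniform(lower, upper))


rng = np.random.default_rng(42)
start = uniform_offset(rng, offset=0.0, duration=2.0, total_duration=10.0)
print(0.0 <= start <= 8.0)  # True
```

Passing a seeded np.random.Generator, as in the sketch, makes the chosen excerpt reproducible.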

property filepath: List[str]

Return a list of filepaths, assuming the information exists in metadata['filepath'].

classmethod from_array(audio_data: ndarray, sample_rate: int) Self

Create an AudioTree from an audio array and a sample rate.

Parameters:
  • audio_data (jnp.ndarray) – Audio data shaped (Batch, Channels, Samples)

  • sample_rate (int) – Sample rate of audio data, such as 44100 Hz.

Returns:

An instance of AudioTree.

Return type:

AudioTree
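
Since audio_data is expected in (Batch, Channels, Samples) layout, a mono or unbatched array may need leading axes added first. The sketch below shows one way to do that with NumPy. The helper to_batch_channels_samples is hypothetical, and whether from_array performs any reshaping itself is not stated here.

```python
import numpy as np


def to_batch_channels_samples(x):
    """Add leading axes until the array is (Batch, Channels, Samples)."""
    x = np.asarray(x, dtype=np.float32)
    if x.ndim > 3:
        raise ValueError(f"expected at most 3 dimensions, got {x.ndim}")
    while x.ndim < 3:
        x = x[np.newaxis, ...]
    return x


sample_rate = 44100
t = np.arange(sample_rate) / sample_rate      # one second of time stamps
sine = 0.5 * np.sin(2 * np.pi * 440.0 * t)    # mono 440 Hz tone, shape (44100,)
audio = to_batch_channels_samples(sine)
print(audio.shape)  # (1, 1, 44100)
# With the library installed, the array could then be passed as
# AudioTree.from_array(audio, sample_rate).
```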

classmethod from_file(audio_path: str, sample_rate: int = None, offset: float = 0.0, duration: float = None, mono: bool = False, cpu: bool = False)

Create an AudioTree from an audio file path.

Parameters:
  • audio_path (str) – Path to audio file.

  • sample_rate (int, optional) – Sample rate of audio data, such as 44100 Hz. If left as None, the file’s original sample rate will be used.

  • offset (float, optional) – Offset in seconds to audio data.

  • duration (float, optional) – Duration in seconds of audio data. The audio data will be trimmed or extended as necessary.

  • mono (bool, optional) – Whether to force the audio data to be single-channel.

Returns:

An instance of AudioTree.

Return type:

AudioTree
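
The offset and duration arguments are given in seconds, so they have to be converted to sample indices at some point. A minimal sketch of that conversion is shown below. The helper slice_seconds is hypothetical, and zero-padding at the end is an assumption about how audio is "extended"; the library may behave differently.

```python
import numpy as np


def slice_seconds(audio, sample_rate, offset=0.0, duration=None):
    """Cut `duration` seconds starting at `offset` from the last axis.

    Assumption: audio shorter than requested is zero-padded at the end.
    """
    start = int(round(offset * sample_rate))
    out = audio[..., start:]
    if duration is None:
        return out
    length = int(round(duration * sample_rate))
    out = out[..., :length]
    missing = length - out.shape[-1]
    if missing > 0:
        # Pad only the sample axis; leave batch and channel axes alone.
        pad = [(0, 0)] * (out.ndim - 1) + [(0, missing)]
        out = np.pad(out, pad)
    return out


audio = np.ones((1, 2, 8000), dtype=np.float32)  # 1 s of stereo at 8 kHz
print(slice_seconds(audio, 8000, offset=0.25, duration=0.5).shape)  # (1, 2, 4000)
print(slice_seconds(audio, 8000, offset=0.5, duration=1.0).shape)   # (1, 2, 8000)
```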

replace(**updates)

Returns a new object replacing the specified fields with new values.
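
Because a flax.struct.dataclass is immutable, fields are changed by building a new object rather than assigning in place. The standard library's dataclasses.replace follows the same pattern, so it is used below as a dependency-free illustration. The Clip class is a hypothetical stand-in, not AudioTree itself.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Clip:
    """Hypothetical frozen stand-in for an AudioTree-like record."""
    sample_rate: int
    pitch: float = None


a = Clip(sample_rate=44100)
b = replace(a, pitch=60.0)   # new object; `a` is left untouched

print(a.pitch, b.pitch)      # None 60.0
try:
    a.pitch = 60.0           # in-place mutation is rejected
except FrozenInstanceError:
    print("frozen")
```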

replace_loudness() Self

Return a new AudioTree whose loudness field is replaced with the calculated loudness, stored as a JAX scalar.

resample(sample_rate: int, zeros: int = 24, rolloff: float = 0.945, output_length: int = None, full: bool = False) Self

Resample the AudioTree’s audio_data to a new sample rate. The algorithm is a JAX port of ResampleFrac from the PyTorch library Julius.

Parameters:
  • sample_rate (int) – The new sample rate of audio data, such as 44100 Hz.

  • zeros (int, optional) – Number of zero crossings to keep in the sinc filter.

  • rolloff (float) – Use a lowpass filter with cutoff rolloff * sample_rate / 2, to leave sufficient margin for the imperfection of the FIR filter used. Lowering this value reduces aliasing, but also removes some of the highest frequencies.

  • output_length (None or int) – The desired output length (last dimension). Allowed values are between 0 and ceil(length * sample_rate / old_sr), where old_sr is the current sample rate. When None (default) is specified, the floored output length is used. To select the largest possible size, use the full argument.

  • full (bool) – Return the longest possible output from the input. This can be useful if you chain resampling operations and want to give the output_length only for the last one, while passing full=True to all the other ones.

Returns:

An instance of AudioTree.

Return type:

AudioTree
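
The output_length bounds and the rolloff cutoff described above come down to a little arithmetic, sketched here. The helpers output_lengths and lowpass_cutoff_hz are hypothetical names used only for illustration, and the sketch covers the length and cutoff formulas, not the resampling filter itself.

```python
def output_lengths(length, old_sr, new_sr):
    """Return (floored, full) output lengths for a resampled signal."""
    scaled = length * new_sr
    floored = scaled // old_sr       # default when output_length is None
    full = -(-scaled // old_sr)      # ceiling division, as with full=True
    return floored, full


def lowpass_cutoff_hz(sample_rate, rolloff=0.945):
    """Cutoff frequency rolloff * sample_rate / 2 from the rolloff parameter."""
    return rolloff * sample_rate / 2


print(output_lengths(1000, 44100, 16000))   # (362, 363)
print(output_lengths(44100, 44100, 16000))  # (16000, 16000)
```

With the default rolloff of 0.945 and a 16000 Hz sample rate, lowpass_cutoff_hz gives a cutoff of about 7560 Hz, a little below the 8000 Hz Nyquist frequency.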

classmethod salient_excerpt(audio_path: str | Path, rng: Generator, saliency_params: SaliencyParams, **kwargs) Self

Create an AudioTree from a salient section of audio from a file path.

Parameters:
  • audio_path (str) – Path to audio file.

  • rng (np.random.Generator) – Random number generator such as np.random.default_rng(42).

  • saliency_params (SaliencyParams) – Saliency parameters to use to find a salient section.

  • **kwargs – Keyword arguments passed to AudioTree.__init__.

Returns:

An instance of AudioTree.

Return type:

AudioTree
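
One plausible shape for a saliency search is a retry loop: draw a random window, measure its level, and accept the first window above the cutoff. The sketch below follows that idea under clearly stated assumptions. find_salient_start and rms_db are hypothetical helpers, and the RMS level in dB is a crude proxy used here instead of a true LUFS loudness measurement.

```python
import numpy as np


def rms_db(x, eps=1e-12):
    """RMS level in dB relative to full scale (a crude proxy, not LUFS)."""
    return float(10.0 * np.log10(np.mean(np.square(x)) + eps))


def find_salient_start(audio, sample_rate, duration, rng,
                       num_tries=8, loudness_cutoff=-40.0):
    """Return (start_sample, level_db) for the first window above the cutoff.

    Falls back to the last window tried if none passes. Hypothetical helper.
    """
    if num_tries < 1:
        raise ValueError("num_tries must be at least 1")
    length = int(duration * sample_rate)
    max_start = max(audio.shape[-1] - length, 0)
    for _ in range(num_tries):
        start = int(rng.integers(0, max_start + 1))
        level = rms_db(audio[..., start:start + length])
        if level >= loudness_cutoff:
            break
    return start, level


sr = 8000
quiet = np.zeros(sr * 3)                                      # 3 s of silence
loud = 0.5 * np.sin(2 * np.pi * 220.0 * np.arange(sr) / sr)   # 1 s tone
audio = np.concatenate([quiet, loud])[None, None, :]          # (1, 1, 32000)
start, level = find_salient_start(audio, sr, 0.5, np.random.default_rng(42))
print(start, round(level, 1))
```

A larger num_tries raises the chance of landing on the loud section, at the cost of more level measurements per file.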

to_mono() Self

Reduce the audio_data to mono.

Returns:

An instance of AudioTree.

Return type:

AudioTree
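
A common way to reduce multi-channel audio to mono is to average across the channel axis while keeping that axis, so the result is still (Batch, 1, Samples). The sketch below shows this with NumPy. Equal-weight averaging is an assumption, and downmix_to_mono is a hypothetical helper rather than the method's actual implementation.

```python
import numpy as np


def downmix_to_mono(audio):
    """Average the channel axis of a (Batch, Channels, Samples) array.

    Assumption: channels are averaged with equal weight and the axis is kept.
    """
    return audio.mean(axis=1, keepdims=True)


stereo = np.stack([np.ones(4), np.zeros(4)])[None, ...]  # (1, 2, 4)
mono = downmix_to_mono(stereo)
print(mono.shape)   # (1, 1, 4)
print(mono[0, 0])   # [0.5 0.5 0.5 0.5]
```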

class SaliencyParams(enabled: bool = False, num_tries: int = 8, loudness_cutoff: float = -40, search_function: str = 'SaliencyParams.search_uniform')

The parameters for saliency detection.

Parameters:
  • enabled (bool) – Whether to enable saliency detection.

  • num_tries (int) – Maximum number of attempts to find a salient section of audio (default 8).

  • loudness_cutoff (float) – Minimum loudness cutoff in decibels for determining salient audio (default -40).

  • search_function (Union[Callable, str]) – The search function for determining the random offset. The default is SaliencyParams.search_uniform. Another option is SaliencyParams.search_bias_early which gradually searches earlier in the file as more attempts are made.
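
To make the "searches earlier as more attempts are made" idea concrete, the sketch below shrinks the upper bound of the random offset on each attempt. This is only one possible interpretation: bias_early_offset, its signature, and the linear shrinking rule are all assumptions, not the behavior of SaliencyParams.search_bias_early.

```python
import numpy as np


def bias_early_offset(rng, max_offset, attempt, num_tries):
    """Draw an offset from a range that shrinks toward the file start.

    Hypothetical rule: the upper bound falls linearly with the attempt index.
    """
    remaining = 1.0 - attempt / num_tries   # 1.0 on the first attempt
    return float(rng.uniform(0.0, max_offset * remaining))


rng = np.random.default_rng(42)
offsets = [bias_early_offset(rng, 8.0, attempt, 4) for attempt in range(4)]
for attempt, value in enumerate(offsets):
    print(attempt, round(value, 3))
```

A uniform search, by contrast, would keep the same upper bound on every attempt.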

replace(**updates)

Returns a new object replacing the specified fields with new values.