audiotree
- class AudioTree(audio_data: jax.Array, sample_rate: int, loudness: float = None, pitch: float = None, velocity: float = None, duration: float = None, codes: jax.Array = None, metadata: dict = <factory>)
A flax.struct.dataclass for holding audio information including a waveform, sample rate, and metadata.
- The AudioTree class is inspired by Descript AudioTools’s AudioSignal.
- Parameters:
audio_data (jnp.ndarray) – Audio waveform data as a JAX array shaped (Batch, Channels, Samples).
sample_rate (int) – Sample rate of audio_data, such as 44100 Hz.
loudness (float, optional) – Loudness of the audio waveform in LUFS. Don’t set this when initializing. Instead, use replace_loudness() to create a new AudioTree with loudness calculated.
pitch (float, optional) – The MIDI pitch where 60 is middle C.
velocity (float, optional) – The MIDI velocity between 0 and 127.
duration (float, optional) – The duration of the audio waveform in seconds.
codes (jnp.ndarray, optional) – The neural audio codec tokens for the audio.
metadata (dict) – Any extra metadata can be placed here.
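The pitch field uses MIDI note numbers. As a small illustration of that convention, the sketch below converts a MIDI pitch to a frequency using the standard equal-tempered mapping (MIDI 69 = A4 = 440 Hz). The helper midi_to_hz is hypothetical and is not part of AudioTree.

```python
def midi_to_hz(pitch: float) -> float:
    """Convert a MIDI pitch number to a frequency in Hz.

    Hypothetical helper for illustration only; assumes 12-tone equal
    temperament with A4 (MIDI 69) tuned to 440 Hz.
    """
    return 440.0 * 2.0 ** ((pitch - 69.0) / 12.0)


# Middle C is MIDI pitch 60, roughly 261.63 Hz under this tuning.
middle_c_hz = midi_to_hz(60)
```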
- classmethod excerpt(audio_path: str, rng: Generator, offset: float = 0.0, duration: float = None, search_function: Callable = None, **kwargs) -> Self
Create an AudioTree from a random section of audio from a file path.
- Parameters:
audio_path (str) – Path to audio file.
rng (np.random.Generator) – Random number generator.
offset (float, optional) – Offset in seconds to audio data.
duration (float, optional) – Duration in seconds of audio data. The audio data will be trimmed or lengthened as necessary.
search_function (Callable, optional) – A function that determines the random offset.
**kwargs – Keyword arguments passed to AudioTree.__init__.
- Returns:
An instance of AudioTree.
- Return type:
AudioTree
- property filepath: List[str]
Return a list of filepaths, assuming the information exists in metadata['filepath'].
- classmethod from_array(audio_data: ndarray, sample_rate: int) -> Self
Create an AudioTree from an audio array and a sample rate.
- Parameters:
audio_data (jnp.ndarray) – Audio data shaped (Batch, Channels, Samples).
sample_rate (int) – Sample rate of audio data, such as 44100 Hz.
- Returns:
An instance of AudioTree.
- Return type:
AudioTree
- classmethod from_file(audio_path: str, sample_rate: int = None, offset: float = 0.0, duration: float = None, mono: bool = False, cpu: bool = False)
Create an AudioTree from an audio file path.
- Parameters:
audio_path (str) – Path to audio file.
sample_rate (int, optional) – Sample rate of audio data, such as 44100 Hz. If left as None, the file’s original sample rate will be used.
offset (float, optional) – Offset in seconds to audio data.
duration (float, optional) – Duration in seconds of audio data. The audio data will be trimmed or extended as necessary.
mono (bool, optional) – Whether to force the audio data to be single-channel.
- Returns:
An instance of AudioTree.
- Return type:
AudioTree
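The duration parameter says the audio is trimmed or extended as necessary. A minimal sketch of that idea on a plain Python list follows. It assumes that extension means appending zeros (silence) at the end, which this page does not confirm, and fit_to_duration is a hypothetical helper rather than a library function.

```python
from typing import List


def fit_to_duration(samples: List[float], sample_rate: int, duration: float) -> List[float]:
    """Trim or zero-pad a mono signal so it lasts `duration` seconds.

    Hypothetical helper. Padding with trailing zeros is an assumption;
    the library may extend audio differently.
    """
    target = int(round(duration * sample_rate))
    if len(samples) >= target:
        # Too long (or exact): keep only the first `target` samples.
        return samples[:target]
    # Too short: append silence until the target length is reached.
    return samples + [0.0] * (target - len(samples))


trimmed = fit_to_duration([1.0] * 10, sample_rate=4, duration=2.0)
padded = fit_to_duration([1.0] * 3, sample_rate=4, duration=1.0)
```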
- replace(**updates)
Returns a new object replacing the specified fields with new values.
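Because the object is an immutable dataclass, replace returns a modified copy instead of mutating in place. The sketch below shows the same pattern with the standard library's dataclasses.replace on a frozen dataclass. Clip is a hypothetical stand-in used only for illustration, not the real AudioTree.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Clip:
    """Hypothetical stand-in for an immutable audio container."""
    sample_rate: int
    pitch: Optional[float] = None


original = Clip(sample_rate=44100)

# replace() builds a new instance; `original` is left untouched.
updated = replace(original, pitch=60.0)
```

Under this pattern, chained updates each yield a fresh object, so earlier references stay valid.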
- replace_loudness() -> Self
Replace the loudness property with a JAX scalar.
- resample(sample_rate: int, zeros: int = 24, rolloff: float = 0.945, output_length: int = None, full: bool = False) -> Self
Resample the AudioTree’s audio_data to a new sample rate. The algorithm is a JAX port of ResampleFrac from the PyTorch library Julius.
- Parameters:
sample_rate (int) – The new sample rate of audio data, such as 44100 Hz.
zeros (int, optional) – Number of zero crossings to keep in the sinc filter.
rolloff (float) – Use a lowpass filter with a cutoff of rolloff * sample_rate / 2, to ensure sufficient margin due to the imperfection of the FIR filter used. Lowering this value will reduce aliasing, but will also attenuate some of the highest frequencies.
output_length (None or int) – This can be set to the desired output length (last dimension). Allowed values are between 0 and ceil(length * sample_rate / old_sr). When None (default) is specified, the floored output length will be used. In order to select the largest possible size, use the full argument.
full (bool) – Return the longest possible output from the input. This can be useful if you chain resampling operations, and want to give the output_length only for the last one, while passing full=True to all the other ones.
- Returns:
An instance of AudioTree.
- Return type:
AudioTree
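The output_length and full parameters differ only in how the fractional output length is rounded. The sketch below works through that arithmetic from the formulas given above (floor by default, ceil for the longest possible output). resampled_lengths is a hypothetical helper and does not perform any resampling.

```python
from typing import Tuple


def resampled_lengths(length: int, old_sr: int, new_sr: int) -> Tuple[int, int]:
    """Return (default_length, full_length) for a resampled signal.

    Hypothetical helper: default_length is floor(length * new_sr / old_sr),
    full_length is ceil(length * new_sr / old_sr). Integer arithmetic
    avoids floating-point rounding surprises.
    """
    scaled = length * new_sr
    default_length = scaled // old_sr       # floor division
    full_length = -(-scaled // old_sr)      # ceiling via negated floor
    return default_length, full_length


# 1000 samples at 44100 Hz resampled to 16000 Hz: 362.81... samples.
default_len, full_len = resampled_lengths(1000, 44100, 16000)
```

Any explicit output_length would then have to lie between 0 and full_len.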
- classmethod salient_excerpt(audio_path: str | Path, rng: Generator, saliency_params: SaliencyParams, **kwargs) -> Self
Create an AudioTree from a salient section of audio from a file path.
- Parameters:
audio_path (str or Path) – Path to audio file.
rng (np.random.Generator) – Random number generator such as np.random.default_rng(42).
saliency_params (SaliencyParams) – Saliency parameters to use to find a salient section.
**kwargs – Keyword arguments passed to AudioTree.__init__.
- Returns:
An instance of AudioTree.
- Return type:
AudioTree
- class SaliencyParams(enabled: bool = False, num_tries: int = 8, loudness_cutoff: float = -40, search_function: str = 'SaliencyParams.search_uniform')
The parameters for saliency detection.
- Parameters:
enabled (bool) – Whether to enable saliency detection.
num_tries (int) – Maximum number of attempts to find a salient section of audio (default 8).
loudness_cutoff (float) – Minimum loudness cutoff in decibels for determining salient audio (default -40).
search_function (Union[Callable, str]) – The search function for determining the random offset. The default is SaliencyParams.search_uniform. Another option is SaliencyParams.search_bias_early, which gradually searches earlier in the file as more attempts are made.
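To make the interplay of num_tries, loudness_cutoff, and search_function concrete, here is a rough sketch of a retry loop. Everything in it is an assumption for illustration: the function signatures are hypothetical and do not match the library's actual search functions, random.Random stands in for np.random.Generator, and loudness_at is a caller-supplied stub rather than a real loudness measurement.

```python
import random
from typing import Callable


def search_uniform(rng: random.Random, max_offset: float, attempt: int, num_tries: int) -> float:
    """Hypothetical: draw an offset uniformly over the whole file."""
    return rng.uniform(0.0, max_offset)


def search_bias_early(rng: random.Random, max_offset: float, attempt: int, num_tries: int) -> float:
    """Hypothetical: shrink the upper bound linearly as attempts accumulate."""
    remaining = 1.0 - attempt / num_tries
    return rng.uniform(0.0, max_offset * remaining)


def find_salient_offset(
    loudness_at: Callable[[float], float],
    max_offset: float,
    rng: random.Random,
    num_tries: int = 8,
    loudness_cutoff: float = -40.0,
    search: Callable = search_uniform,
) -> float:
    """Try up to num_tries offsets; stop at the first one loud enough.

    If no attempt passes the cutoff, the last offset tried is returned
    (an assumption about the fallback behaviour).
    """
    offset = 0.0
    for attempt in range(num_tries):
        offset = search(rng, max_offset, attempt, num_tries)
        if loudness_at(offset) >= loudness_cutoff:
            break
    return offset


# Stub loudness: quiet (-70) before 5 s, loud (-20) afterwards.
rng = random.Random(42)
offset = find_salient_offset(lambda t: -20.0 if t >= 5.0 else -70.0, 10.0, rng)
```

In this sketch the early-biased search trades coverage of the file's end for a higher chance of landing near the start on later attempts.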
- replace(**updates)
Returns a new object replacing the specified fields with new values.