audiotree.transforms

class Choose(*transforms, c: int = 1, weights=None, prob: float = 1)

With probability prob, choose c of the given transforms and apply them. If weights is provided, it gives the sampling probability weight of each transform.
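The selection logic can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper name choose_and_apply is hypothetical, and it assumes that transforms are sampled without replacement and applied in the order they are drawn, neither of which is stated above.

```python
import random


def choose_and_apply(element, transforms, c=1, weights=None, prob=1.0, rng=None):
    """With probability `prob`, apply `c` distinct transforms picked from `transforms`."""
    rng = rng or random.Random()
    # Skip the whole step with probability 1 - prob.
    if rng.random() >= prob:
        return element
    # Assumption: sampling is without replacement; weights default to uniform.
    pool = list(range(len(transforms)))
    w = list(weights) if weights is not None else [1.0] * len(transforms)
    for _ in range(min(c, len(pool))):
        pick = rng.choices(range(len(pool)), weights=w, k=1)[0]
        idx = pool.pop(pick)
        w.pop(pick)
        # Assumption: chosen transforms are applied in the order drawn.
        element = transforms[idx](element)
    return element
```

Passing a seeded random.Random as rng makes the choice reproducible, which mirrors the explicit rng argument of random_map.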

random_map(element, rng: Generator)

Maps a single element.

class CorruptPhase(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)

Perform a phase corruption on the audio. The phase shift is drawn from the range [-pi * amount, pi * amount] and is selected independently for each frequency in the STFT.
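As a rough illustration of per-frequency phase corruption, the sketch below rotates each complex bin of a single STFT frame by its own random angle. The function name corrupt_phase is hypothetical, the uniform distribution is an assumption (only the range is documented), and the STFT and inverse STFT steps are omitted.

```python
import cmath
import math
import random


def corrupt_phase(frame_spectrum, amount=1.0, rng=None):
    """Rotate each complex STFT bin by its own random angle."""
    rng = rng or random.Random()
    limit = math.pi * amount
    # One independent draw per frequency bin, assumed uniform in
    # [-pi * amount, pi * amount]. Magnitudes are left untouched.
    return [z * cmath.exp(1j * rng.uniform(-limit, limit)) for z in frame_spectrum]
```

Because each bin is only multiplied by a unit-magnitude complex number, the magnitude spectrum is preserved and only the phase changes.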

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {
        "amount": 1,
        "hop_factor": 0.5,
        "frame_length": 2048,
        "window": "hann",
    }
static get_default_config() -> Dict[str, Any]

Get the default configuration for the transform.

Returns:

Default configuration dictionary

class Identity(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)

A transform that returns each item without any modifications.

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {}
static get_default_config() -> Dict[str, Any]

Get the default configuration for the transform.

Returns:

Default configuration dictionary

class InvertPhase(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)

Invert the phase of all channels of audio.
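Phase inversion is commonly implemented as a polarity flip, that is, multiplying every sample by -1. The sketch below assumes that reading; invert_phase is a hypothetical helper operating on a list of channels, not the library's code.

```python
def invert_phase(channels):
    """Negate every sample in every channel (a polarity flip)."""
    return [[-sample for sample in channel] for channel in channels]
```

Applying the function twice returns the original signal.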

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {}
static get_default_config() -> Dict[str, Any]

Get the default configuration for the transform.

Returns:

Default configuration dictionary

class NeuralAudioCodecEncodeTransform(encode_audio_fn: Callable[[Array], Array], num_codebooks: int)

Use a neural audio codec such as Descript Audio Codec (DAC) or EnCodec to encode audio into tokens.

Parameters:
  • encode_audio_fn (Callable) – A jitted function that takes audio shaped (B, C, T) and returns tokens shaped ((B C), K, S), where T is length in samples, S is encoded sequence length, and K is number of codebooks.

  • num_codebooks (int) – The number of codebooks in the codec.
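The shape contract above can be made concrete with a small sketch. It assumes that "(B C)" means the batch and channel axes are merged into one leading axis of size B * C, with the channel index varying fastest; both helper names are hypothetical, and no real codec is invoked.

```python
def flatten_batch_channels(audio):
    """Merge the batch and channel axes: (B, C, T) -> ((B C), T).

    Assumption: channels of the same batch item stay adjacent.
    """
    return [channel for item in audio for channel in item]


def token_shape(batch, channels, num_codebooks, seq_len):
    """Shape of the token array described for encode_audio_fn: ((B C), K, S)."""
    return (batch * channels, num_codebooks, seq_len)
```

For example, a batch of 3 stereo clips encoded with 9 codebooks into 75 steps would give tokens shaped (6, 9, 75) under this reading.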

map(audio_signal: AudioTree)

Maps a single element.

class ReduceBatchTransform(sample_rate: int)

map(audio_signal: AudioTree) -> AudioTree

Maps a single element.

class RescaleAudio(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)

Rescale the audio so that the largest absolute value is 1.0, but only when some value lies outside the range [-1., 1.]; otherwise no transformation is applied.
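A minimal sketch of this behavior on a flat list of samples follows. The function name rescale_audio is hypothetical, and the sketch ignores batching and channel layout.

```python
def rescale_audio(samples):
    """Divide by the peak only when the peak exceeds 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= 1.0:
        # Already within [-1, 1]: leave the audio unchanged.
        return list(samples)
    return [s / peak for s in samples]
```

Note that quiet audio is never amplified; only audio that would clip is scaled down.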

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {}
static get_default_config() -> Dict[str, Any]

Get the default configuration for the transform.

Returns:

Default configuration dictionary

class ShiftPhase(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)

Perform a phase shift on the audio. The phase shift is drawn from the range [-pi * amount, pi * amount].
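In contrast to a per-frequency corruption, a phase shift can be pictured as one angle applied to every bin. The sketch below assumes that interpretation and a uniform draw; sample_shift and shift_phase are hypothetical names, and the STFT round trip is left out.

```python
import cmath
import math
import random


def sample_shift(amount=1.0, rng=None):
    """Draw a single angle, assumed uniform in [-pi * amount, pi * amount]."""
    rng = rng or random.Random()
    limit = math.pi * amount
    return rng.uniform(-limit, limit)


def shift_phase(spectrum, angle):
    """Rotate every complex bin by the same angle (in radians)."""
    rotation = cmath.exp(1j * angle)
    return [z * rotation for z in spectrum]
```

A shift of pi radians negates every bin, which is equivalent to inverting the polarity of the signal.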

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {
        "amount": 1,
        "hop_factor": 0.5,
        "frame_length": 2048,
        "window": "hann",
    }
static get_default_config() -> Dict[str, Any]

Get the default configuration for the transform.

Returns:

Default configuration dictionary

class SwapStereo(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)

Swap the channels of stereo audio.
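For a two-channel signal the swap amounts to reversing the channel order, as in the sketch below. The helper name swap_stereo is hypothetical, and leaving non-stereo input unchanged is an assumption rather than documented behavior.

```python
def swap_stereo(channels):
    """Return [right, left] for stereo input."""
    if len(channels) != 2:
        # Assumption: anything that is not stereo passes through unchanged.
        return list(channels)
    left, right = channels
    return [right, left]
```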

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {}
static get_default_config() -> Dict[str, Any]

Get the default configuration for the transform.

Returns:

Default configuration dictionary

class VolumeChange(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)

Change the volume by a decibel value drawn uniformly at random from [min_db, max_db].
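The decibel-to-amplitude conversion behind such a change can be sketched as follows. The function name change_volume is hypothetical; min_db and max_db follow the keys of this transform's default configuration, and the standard amplitude relation gain = 10 ** (dB / 20) is used.

```python
import random


def change_volume(samples, min_db=0.0, max_db=0.0, rng=None):
    """Scale samples by a gain drawn uniformly in decibels."""
    rng = rng or random.Random()
    db = rng.uniform(min_db, max_db)
    # Convert decibels to a linear amplitude factor.
    gain = 10.0 ** (db / 20.0)
    return [s * gain for s in samples]
```

With min_db and max_db both 0, the gain is exactly 1 and the audio is unchanged.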

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {
        "min_db": 0,
        "max_db": 0,
    }
static get_default_config() -> Dict[str, Any]

Get the default configuration for the transform.

Returns:

Default configuration dictionary

class VolumeNorm(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)

Normalize the volume to a randomly selected loudness value specified in LUFS.
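Once the loudness of a signal has been measured, moving it to a target loudness is a single gain. The sketch below assumes the measured value is supplied by a separate loudness meter, which is not shown; loudness_gain and normalize_to are hypothetical helpers. It relies on the fact that a difference in LUFS is a difference in decibels.

```python
def loudness_gain(measured_lufs, target_lufs):
    """Linear amplitude gain that moves measured loudness to the target."""
    return 10.0 ** ((target_lufs - measured_lufs) / 20.0)


def normalize_to(samples, measured_lufs, target_lufs):
    """Apply the loudness gain to a flat list of samples."""
    gain = loudness_gain(measured_lufs, target_lufs)
    return [s * gain for s in samples]
```

For instance, raising a signal measured at -29 LUFS to a target of -23 LUFS needs +6 dB, a linear gain of roughly 2.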

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {
        "min_db": 0,
        "max_db": 0,
    }
static get_default_config() -> Dict[str, Any]

Get the default configuration for the transform.

Returns:

Default configuration dictionary