audiotree.transforms

class Choose(*transforms, c: int = 1, weights=None, prob: float = 1)

With probability prob, choose c transform(s) from transforms, optionally sampling them according to the probabilities given in weights.

random_map(element, rng: Generator): Maps a single element.
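The selection rule described for Choose can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper name choose_and_apply is hypothetical, random.Random stands in for the Generator argument, and sampling without replacement and applying the picked transforms in their original order are both assumptions.

```python
import random


def choose_and_apply(element, transforms, rng, c=1, weights=None, prob=1.0):
    """Hypothetical helper: with probability `prob`, apply `c` chosen transforms."""
    # rng.random() lies in [0, 1), so prob=1.0 always applies and prob=0.0 never does.
    if rng.random() >= prob:
        return element

    # Weighted sampling without replacement (an assumption, see the lead-in).
    pool = list(range(len(transforms)))
    w = list(weights) if weights is not None else [1.0] * len(transforms)
    picked = []
    for _ in range(min(c, len(pool))):
        i = rng.choices(range(len(pool)), weights=w, k=1)[0]
        picked.append(pool.pop(i))
        w.pop(i)

    # Apply the picked transforms in their original order (also an assumption).
    for idx in sorted(picked):
        element = transforms[idx](element)
    return element
```

With weights=[1.0, 0.0] and c=1, only the first transform can ever be picked. Note that random.choices raises ValueError if all remaining weights are zero.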

class CorruptPhase(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)

Perform a phase corruption on the audio. Each phase shift lies in the range [-pi * amount, pi * amount] and is selected independently for each frequency in the STFT.

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {
        "amount": 1,
        "hop_factor": 0.5,
        "frame_length": 2048,
        "window": "hann",
    }

static get_default_config() → Dict[str, Any]

Get the default configuration for the transform.

Returns: Default configuration dictionary.
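The per-frequency phase corruption described for CorruptPhase can be illustrated on a single frame. The sketch below is not the library's code: it uses a naive DFT from the standard library instead of a windowed STFT (so frame_length, hop_factor, and window play no role), random.Random as the random source, and the hypothetical helper names dft, idft_real, and corrupt_phase_frame. Leaving the DC and Nyquist bins untouched is an assumption made to keep the output real-valued.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spec):
    """Inverse DFT that keeps only the real part of each sample."""
    n = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def corrupt_phase_frame(frame, amount, rng):
    """Rotate each frequency bin by its own random phase; magnitudes are kept."""
    n = len(frame)
    spec = dft(frame)
    # Positive-frequency bins only; DC (and Nyquist for even n) are untouched.
    for k in range(1, (n + 1) // 2):
        shift = rng.uniform(-math.pi * amount, math.pi * amount)
        spec[k] *= cmath.exp(1j * shift)
        # Mirror the bin so the spectrum stays conjugate-symmetric.
        spec[n - k] = spec[k].conjugate()
    return idft_real(spec)
```

Because only the phase of each bin changes, the magnitude spectrum of the output matches that of the input, and amount = 0 returns the frame unchanged up to rounding.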

class Identity(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)

A transform that returns each item without any modifications.

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {}

static get_default_config() → Dict[str, Any]

Get the default configuration for the transform.

Returns: Default configuration dictionary.

class InvertPhase(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)

Invert the phase of all channels of audio.

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {}

static get_default_config() → Dict[str, Any]

Get the default configuration for the transform.

Returns: Default configuration dictionary.
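Inverting the phase of a real-valued signal is commonly understood as flipping its polarity, that is, negating every sample. A minimal sketch under that assumption, using nested lists (one list per channel) and a hypothetical helper name rather than the library's AudioTree type:

```python
def invert_phase(channels):
    """Flip the polarity of every sample in every channel."""
    return [[-sample for sample in channel] for channel in channels]
```

Applying the helper twice returns the original samples, and summing a signal with its inverted copy gives silence.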

class NeuralAudioCodecEncodeTransform(encode_audio_fn: Callable[[Array], Array], num_codebooks: int)

Use a neural audio codec such as Descript Audio Codec (DAC) or EnCodec to encode audio into tokens.

Parameters:

encode_audio_fn (Callable) – A jitted function that takes audio shaped (B, C, T) and returns tokens shaped ((B C), K, S), where T is length in samples, S is encoded sequence length, and K is number of codebooks.
num_codebooks (int) – The number of codebooks in the codec.

map(audio_signal: AudioTree): Maps a single element.

class ReduceBatchTransform(sample_rate: int)

map(audio_signal: AudioTree) → AudioTree: Maps a single element.

class RescaleAudio(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)

Rescale the audio so that its largest absolute value is 1.0. If no value lies outside the range [-1.0, 1.0], the audio is returned unchanged.

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {}

static get_default_config() → Dict[str, Any]

Get the default configuration for the transform.

Returns: Default configuration dictionary.
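The rescaling rule described for RescaleAudio amounts to dividing by the peak only when the peak exceeds 1.0. The sketch below shows that arithmetic on a flat list of samples. The helper name is hypothetical, and whether the library computes the peak per channel, per item, or per batch is not stated here, so a single global peak is an assumption.

```python
def rescale_audio(samples):
    """Divide by the peak absolute value, but only if the peak exceeds 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= 1.0:
        # Nothing lies outside [-1.0, 1.0]; return a copy unchanged.
        return list(samples)
    return [s / peak for s in samples]
```

For example, [0.5, -2.0, 1.0] has a peak of 2.0 and becomes [0.25, -1.0, 0.5], while [0.5, -0.25] is returned as-is.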

class ShiftPhase(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)

Perform a phase shift on the audio. The phase shift lies in the range [-pi * amount, pi * amount].

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {
        "amount": 1,
        "hop_factor": 0.5,
        "frame_length": 2048,
        "window": "hann",
    }

static get_default_config() → Dict[str, Any]

Get the default configuration for the transform.

Returns: Default configuration dictionary.
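To illustrate the phase shift described for ShiftPhase, the sketch below draws one shift from [-pi * amount, pi * amount] and applies it to every frequency bin of a single frame. It is not the library's implementation: it uses a naive standard-library DFT rather than a windowed STFT, random.Random as the random source, and a hypothetical helper name. Treating the shift as shared by all bins and leaving DC and Nyquist untouched are assumptions.

```python
import cmath
import math
import random


def shift_phase_frame(frame, amount, rng):
    """Apply one random phase shift to every frequency bin of a single frame."""
    n = len(frame)
    shift = rng.uniform(-math.pi * amount, math.pi * amount)
    # Naive O(n^2) DFT of the real-valued frame.
    spec = [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Rotate positive-frequency bins by the shared shift and mirror them so
    # the spectrum stays conjugate-symmetric and the output stays real.
    for k in range(1, (n + 1) // 2):
        spec[k] *= cmath.exp(1j * shift)
        spec[n - k] = spec[k].conjugate()
    # Inverse DFT; the imaginary part is only rounding error.
    shifted = [
        sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]
    return shifted, shift
```

For a pure sinusoid sin(w * t) that fits the frame exactly, the output is sin(w * t + shift), which makes the effect easy to check by hand.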

class SwapStereo(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)

Swap the channels of stereo audio.

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {}

static get_default_config() → Dict[str, Any]

Get the default configuration for the transform.

Returns: Default configuration dictionary.

class VolumeChange(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)

Change the volume by a decibel value selected uniformly at random.

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {
        "min_db": 0,
        "max_db": 0,
    }

static get_default_config() → Dict[str, Any]

Get the default configuration for the transform.

Returns: Default configuration dictionary.
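The arithmetic behind a decibel volume change can be sketched as follows. This assumes the gain is drawn uniformly between the min_db and max_db config values and converted with the standard amplitude convention gain = 10 ** (dB / 20). The helper name is hypothetical, and random.Random stands in for the library's random source.

```python
import random


def change_volume(samples, min_db, max_db, rng):
    """Scale samples by a gain drawn uniformly in decibels from [min_db, max_db]."""
    gain_db = rng.uniform(min_db, max_db)
    # Amplitude convention: +20 dB multiplies the amplitude by 10.
    gain = 10.0 ** (gain_db / 20.0)
    return [s * gain for s in samples], gain_db
```

With the default configuration (min_db = 0 and max_db = 0) the gain is always 0 dB, so the audio passes through unchanged. A range such as -12 to 0 would attenuate by a factor between roughly 0.25 and 1.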

class VolumeNorm(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)

Normalize the volume to a randomly selected loudness value specified in LUFS.

@staticmethod
def get_default_config() -> Dict[str, Any]:
    return {
        "min_db": 0,
        "max_db": 0,
    }

static get_default_config() → Dict[str, Any]

Get the default configuration for the transform.

Returns: Default configuration dictionary.
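The idea of normalizing to a randomly selected target level can be sketched with a deliberately simplified loudness measure. The code below substitutes plain RMS level in dB for LUFS: real LUFS measurement (ITU-R BS.1770) adds K-weighting and gating, both omitted here, so the numbers will not match a LUFS meter. The helper names are hypothetical, and drawing the target uniformly from [min_db, max_db] is an assumption based on the config keys.

```python
import math
import random


def rms_level_db(samples):
    """RMS level in dB relative to full scale (a simplified stand-in for LUFS)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def normalize_level(samples, min_db, max_db, rng):
    """Scale samples so their RMS level matches a randomly drawn target."""
    target_db = rng.uniform(min_db, max_db)
    if not any(samples):
        # Silence has no defined level; return it unchanged.
        return list(samples), target_db
    # The gain in dB is the difference between the target and current level.
    gain = 10.0 ** ((target_db - rms_level_db(samples)) / 20.0)
    return [s * gain for s in samples], target_db
```

After the call, rms_level_db of the returned samples equals the drawn target up to rounding error.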