audiotree.transforms
- class Choose(*transforms, c: int = 1, weights=None, prob: float = 1)
With probability prob, choose c transform(s) among transforms with optional probability weights weights.
- random_map(element, rng: Generator)
Maps a single element.
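The selection behaviour described for Choose can be illustrated with a small NumPy-only sketch. The helper name `choose_and_apply` is hypothetical, `rng` is assumed to be a `numpy.random.Generator`, and sampling without replacement and applying the picks in their listed order are assumptions; audiotree's own implementation may differ.

```python
import numpy as np

def choose_and_apply(element, transforms, rng, c=1, weights=None, prob=1.0):
    """Hypothetical stand-in for the behaviour described above (not audiotree code)."""
    # With probability (1 - prob), leave the element untouched.
    if rng.random() >= prob:
        return element
    # Normalize the optional weights into a probability vector.
    p = None
    if weights is not None:
        w = np.asarray(weights, dtype=float)
        p = w / w.sum()
    # Assumption: the c transforms are drawn without replacement.
    picked = rng.choice(len(transforms), size=c, replace=False, p=p)
    # Assumption: picked transforms run in the order they were listed.
    for i in sorted(picked):
        element = transforms[i](element)
    return element

rng = np.random.default_rng(0)
transforms = [lambda x: x + 1, lambda x: x * 2]
result = choose_and_apply(3, transforms, rng, c=2)  # both picked: (3 + 1) * 2
```

Setting `prob=0` makes the sketch a no-op, which mirrors the "with probability prob" wording in the description.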
- class CorruptPhase(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)
Perform a phase corruption on the audio. The phase shift is in the range [-pi * amount, pi * amount], and it’s independently selected for each frequency in the STFT.
    @staticmethod
    def get_default_config() -> Dict[str, Any]:
        return {
            "amount": 1,
            "hop_factor": 0.5,
            "frame_length": 2048,
            "window": "hann",
        }
- static get_default_config() -> Dict[str, Any]
Get the default configuration for the transform.
- Returns:
Default configuration dictionary
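To make "independently selected for each frequency" concrete, here is a simplified NumPy sketch. It is not audiotree's implementation: it uses one FFT over the whole signal instead of an STFT with `frame_length`, `hop_factor`, and `window`, and the function name `corrupt_phase_single_frame` is hypothetical.

```python
import numpy as np

def corrupt_phase_single_frame(x, amount, rng):
    """Rotate every frequency bin by its own random phase (single-FFT simplification)."""
    n = x.shape[-1]
    spec = np.fft.rfft(x)
    # One independent phase offset per bin, drawn from [-pi * amount, pi * amount].
    phi = rng.uniform(-np.pi * amount, np.pi * amount, size=spec.shape)
    # DC (and Nyquist, for even n) must stay real for a real-valued output.
    phi[..., 0] = 0.0
    if n % 2 == 0:
        phi[..., -1] = 0.0
    return np.fft.irfft(spec * np.exp(1j * phi), n=n)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
y = corrupt_phase_single_frame(x, amount=1.0, rng=rng)
```

The magnitude spectrum of `y` matches that of `x`; only the phases differ, which is the property a phase corruption is meant to have.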
- class Identity(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)
A transform that returns each item without any modifications.
    @staticmethod
    def get_default_config() -> Dict[str, Any]:
        return {}
- static get_default_config() -> Dict[str, Any]
Get the default configuration for the transform.
- Returns:
Default configuration dictionary
- class InvertPhase(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)
Invert the phase of all channels of audio.
    @staticmethod
    def get_default_config() -> Dict[str, Any]:
        return {}
- static get_default_config() -> Dict[str, Any]
Get the default configuration for the transform.
- Returns:
Default configuration dictionary
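Inverting the phase of a waveform is usually understood as flipping its sign (polarity inversion). The sketch below shows that reading on a NumPy array; the function name `invert_phase` is hypothetical, and this is an illustration rather than audiotree's code.

```python
import numpy as np

def invert_phase(x):
    """Flip the polarity of every sample in every channel."""
    return -x

# Shape (B, C, T): one example, two channels, three samples.
x = np.array([[[0.1, -0.2, 0.3], [0.5, 0.0, -0.5]]])
y = invert_phase(x)
```

Summing `x` and `y` gives silence, which is a quick way to confirm the inversion.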
- class NeuralAudioCodecEncodeTransform(encode_audio_fn: Callable[[Array], Array], num_codebooks: int)
Use a neural audio codec such as Descript Audio Codec (DAC) or EnCodec to encode audio into tokens.
- Parameters:
encode_audio_fn (Callable) – A jitted function that takes audio shaped (B, C, T) and returns tokens shaped ((B C), K, S), where T is length in samples, S is encoded sequence length, and K is number of codebooks.
num_codebooks (int) – The number of codebooks in the codec.
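The shape contract for `encode_audio_fn` can be checked with a stand-in encoder. Everything in this sketch is hypothetical: `fake_encode_audio` is plain NumPy rather than a jitted codec, its tokens carry no meaning, and the `hop` and `vocab_size` values are arbitrary. It only demonstrates the (B, C, T) to ((B C), K, S) layout, assuming channels fold into the batch axis in row-major order.

```python
import numpy as np

def fake_encode_audio(audio, num_codebooks=9, hop=512, vocab_size=1024):
    """Hypothetical encoder that only reproduces the documented output shape."""
    b, c, t = audio.shape
    s = t // hop                                  # encoded sequence length S
    flat = audio.reshape(b * c, t)                # (B, C, T) -> ((B C), T)
    frames = flat[:, : s * hop].reshape(b * c, s, hop)
    level = np.abs(frames).mean(axis=-1)          # ((B C), S)
    base = np.clip((level * vocab_size).astype(np.int32), 0, vocab_size - 1)
    # One token stream per codebook, stacked on axis 1: ((B C), K, S).
    return np.stack([(base + k) % vocab_size for k in range(num_codebooks)], axis=1)

audio = np.zeros((2, 2, 5120), dtype=np.float32)  # B=2, C=2, T=5120
tokens = fake_encode_audio(audio)
print(tokens.shape)  # (4, 9, 10)
```

With a real codec, `num_codebooks` passed to the transform would need to match K in the returned tokens.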
- class ReduceBatchTransform(sample_rate: int)
- class RescaleAudio(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)
Rescale the audio so that the largest absolute value is 1.0. If none of the values are outside the range [-1., 1.], then no transformation is applied.
    @staticmethod
    def get_default_config() -> Dict[str, Any]:
        return {}
- static get_default_config() -> Dict[str, Any]
Get the default configuration for the transform.
- Returns:
Default configuration dictionary
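The rescaling rule above amounts to a conditional peak normalization. The following NumPy sketch illustrates it; the function name `rescale_audio` is hypothetical, and taking the peak over the whole array (rather than per example or per channel) is an assumption.

```python
import numpy as np

def rescale_audio(x):
    """Divide by the peak only when some sample lies outside [-1, 1]."""
    peak = np.max(np.abs(x))
    if peak <= 1.0:
        return x          # already in range: no transformation
    return x / peak       # largest absolute value becomes 1.0

loud = np.array([0.5, -2.0, 1.0])
quiet = np.array([0.5, -0.25])
rescaled = rescale_audio(loud)     # [0.25, -1.0, 0.5]
unchanged = rescale_audio(quiet)   # returned as-is
```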
- class ShiftPhase(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)
Perform a phase shift on the audio. The phase shift is in the range [-pi * amount, pi * amount].
    @staticmethod
    def get_default_config() -> Dict[str, Any]:
        return {
            "amount": 1,
            "hop_factor": 0.5,
            "frame_length": 2048,
            "window": "hann",
        }
- static get_default_config() -> Dict[str, Any]
Get the default configuration for the transform.
- Returns:
Default configuration dictionary
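A constant phase shift can be sketched by rotating every frequency bin by the same angle. As a simplification, this NumPy example uses one FFT over the whole signal rather than an STFT driven by `frame_length`, `hop_factor`, and `window`; the function name `shift_phase` is hypothetical and the code is not taken from audiotree.

```python
import numpy as np

def shift_phase(x, phi):
    """Rotate all non-DC, non-Nyquist bins of a real signal by the angle phi."""
    n = x.shape[-1]
    spec = np.fft.rfft(x)
    rot = np.full(spec.shape[-1], np.exp(1j * phi))
    rot[0] = 1.0            # keep DC real
    if n % 2 == 0:
        rot[-1] = 1.0       # keep Nyquist real
    return np.fft.irfft(spec * rot, n=n)

rng = np.random.default_rng(0)
amount = 1.0
# Draw the shift from [-pi * amount, pi * amount], as in the description.
phi = rng.uniform(-np.pi * amount, np.pi * amount)

t = np.arange(64)
x = np.cos(2 * np.pi * 4 * t / 64)   # tone aligned to FFT bin 4
y = shift_phase(x, phi)              # equals cos(2*pi*4*t/64 + phi)
```

For a bin-aligned tone the result is the same tone with `phi` added to its phase, so a shift of pi/2 turns a cosine into a negated sine.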
- class SwapStereo(config: Dict[str, Dict[str, Any]] = None, scope: Dict[str, Dict[str, Any]] = None, output_key: str | Callable[[List[str]], str] = None)
Swap the channels of stereo audio.
    @staticmethod
    def get_default_config() -> Dict[str, Any]:
        return {}
- static get_default_config() -> Dict[str, Any]
Get the default configuration for the transform.
- Returns:
Default configuration dictionary
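Swapping stereo channels is a reversal along the channel axis. This NumPy sketch assumes a (B, C, T) layout with the channel axis second to last; the function name `swap_stereo` and the decision to raise on non-stereo input are choices made for this illustration only, not documented audiotree behaviour.

```python
import numpy as np

def swap_stereo(x):
    """Exchange left and right channels of audio shaped (..., 2, T)."""
    if x.shape[-2] != 2:
        # Illustration-only choice: reject anything that is not stereo.
        raise ValueError("expected exactly 2 channels")
    return x[..., ::-1, :]

# One example, two channels, three samples.
x = np.array([[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]])
y = swap_stereo(x)
```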
- class VolumeChange(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)
Change the volume by a uniformly randomly selected decibel value.
    @staticmethod
    def get_default_config() -> Dict[str, Any]:
        return {
            "min_db": 0,
            "max_db": 0,
        }
- static get_default_config() -> Dict[str, Any]
Get the default configuration for the transform.
- Returns:
Default configuration dictionary
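A decibel change maps to a linear amplitude factor of 10^(dB / 20). The NumPy sketch below draws the decibel value uniformly between `min_db` and `max_db` and applies that factor; the function name `change_volume` is hypothetical, and using the amplitude (20 log10) convention is an assumption about how the transform interprets decibels.

```python
import numpy as np

def change_volume(x, min_db, max_db, rng):
    """Scale x by a gain drawn uniformly in decibels from [min_db, max_db]."""
    db = rng.uniform(min_db, max_db)
    gain = 10.0 ** (db / 20.0)      # amplitude convention
    return x * gain, db

rng = np.random.default_rng(0)
x = np.ones(4)
y, db = change_volume(x, min_db=-12.0, max_db=3.0, rng=rng)
```

With the default configuration (`min_db` and `max_db` both 0), the drawn value is 0 dB and the gain is 1, so the audio passes through unchanged.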
- class VolumeNorm(config: Dict[str, Any] = None, split_seed: bool = True, prob: float = 1.0, scope: Dict[str, Any] = None, output_key: str | Callable[[List[str]], str] = None)
Normalize the volume to a randomly selected loudness value specified in LUFS.
    @staticmethod
    def get_default_config() -> Dict[str, Any]:
        return {
            "min_db": 0,
            "max_db": 0,
        }
- static get_default_config() -> Dict[str, Any]
Get the default configuration for the transform.
- Returns:
Default configuration dictionary
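Once the loudness of a clip has been measured, normalizing to a target reduces to a gain computation. The sketch below shows only that arithmetic. The measurement itself (an ITU-R BS.1770 style meter) is not implemented here and is passed in as the hypothetical `measured_lufs` argument; the function names are also hypothetical, and drawing the target uniformly from [`min_db`, `max_db`] is an assumption.

```python
import numpy as np

def loudness_gain(measured_lufs, target_lufs):
    """Linear amplitude gain that moves measured_lufs to target_lufs."""
    return 10.0 ** ((target_lufs - measured_lufs) / 20.0)

def normalize_volume(x, measured_lufs, min_db, max_db, rng):
    """Scale x toward a target loudness drawn from [min_db, max_db] (assumed uniform)."""
    target = rng.uniform(min_db, max_db)
    return x * loudness_gain(measured_lufs, target), target

rng = np.random.default_rng(0)
x = np.ones(4)
# Hypothetical measurement: the clip was metered at -30 LUFS.
y, target = normalize_volume(x, measured_lufs=-30.0, min_db=-24.0, max_db=-16.0, rng=rng)
```

Scaling amplitude by a factor g changes integrated loudness by 20 log10(g) dB, gating edge cases aside, which is why the gain follows directly from the difference between target and measurement.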