librosax.layers
- class DropStripes(axis: int, drop_width: int, stripes_num: int, deterministic: bool | None = None, parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)
A module that randomly drops stripes (time or frequency bands) from a spectrogram.
This module is used for data augmentation in audio tasks by randomly masking time frames or frequency bands in the spectrogram.
- Variables:
axis (int) – Axis along which to drop stripes. 2 for time, 3 for frequency.
drop_width (int) – Maximum width of stripes to drop.
stripes_num (int) – Number of stripes to drop.
deterministic (bool | None) – If True, no dropping is performed. Default is None.
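The masking that DropStripes performs can be sketched in NumPy as follows. This illustrates the technique only and is not the librosax implementation: the helper `drop_stripes`, the use of a `numpy.random.Generator` in place of Flax RNG streams, per-example sampling of stripes, and filling dropped stripes with zeros are all assumptions made for the sketch.

```python
import numpy as np


def drop_stripes(x, axis, drop_width, stripes_num, rng, deterministic=False):
    """Zero out up to `stripes_num` random stripes along `axis` of each example.

    Hypothetical NumPy sketch; axis 0 is assumed to be the batch axis.
    """
    if deterministic:
        return x
    x = x.copy()  # leave the caller's array untouched
    total = x.shape[axis]
    for b in range(x.shape[0]):
        for _ in range(stripes_num):
            # Stripe width is drawn from [0, drop_width); the start is chosen so it fits.
            width = min(int(rng.integers(0, drop_width)), total)
            start = int(rng.integers(0, total - width + 1))
            index = [slice(None)] * x.ndim
            index[0] = b
            index[axis] = slice(start, start + width)
            x[tuple(index)] = 0
    return x


rng = np.random.default_rng(0)
x = np.ones((2, 1, 100, 64), dtype=np.float32)  # assumed (batch, channels, time, freq)
y = drop_stripes(x, axis=2, drop_width=16, stripes_num=2, rng=rng)  # time stripes
```

Because whole slices along the chosen axis are zeroed, each dropped time frame loses all of its frequency bins (and vice versa for `axis=3`).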
- class LogMelFilterBank(sr: int = 22050, n_fft: int = 2048, n_mels: int = 64, fmin: float = 0.0, fmax: float = None, is_log: bool | None = True, ref: float = 1.0, amin: float = 1e-10, top_db: float | None = 80.0, freeze_parameters: bool | None = True, parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)
A module that converts spectrograms to (log) mel spectrograms.
This module applies a mel filterbank to a spectrogram and optionally converts the result to a logarithmic (dB) scale.
- Variables:
sr (int) – Sample rate of the audio signal. Default is 22_050.
n_fft (int) – FFT size. Default is 2048.
n_mels (int) – Number of mel filterbanks. Default is 64.
fmin (float) – Minimum frequency for mel filterbank. Default is 0.0.
fmax (float) – Maximum frequency for the mel filterbank. Default is sr // 2 (used when fmax is None).
is_log (bool | None) – If True, convert to log scale. Default is True.
ref (float) – Reference value for log scaling. Default is 1.0.
amin (float) – Minimum value for log scaling. Default is 1e-10.
top_db (float | None) – Maximum dynamic range in dB. Default is 80.0.
freeze_parameters (bool | None) – If True, parameters are not updated during training. Default is True.
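A rough NumPy sketch of the two steps LogMelFilterBank describes: projecting a power spectrogram onto triangular mel filters, then converting to dB with `ref`, `amin` and `top_db`. The helper names are hypothetical, the filters use the HTK mel formula (librosa defaults to the Slaney variant, so the weights will differ), the frequency axis is assumed to be last, and `top_db` clips relative to the maximum of the whole array.

```python
import numpy as np


def hz_to_mel(f):
    # HTK mel formula (an assumption; not necessarily what librosax uses).
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters of shape (n_fft // 2 + 1, n_mels)."""
    fmax = sr / 2 if fmax is None else fmax
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    # n_mels + 2 points equally spaced on the mel scale give every filter
    # a left edge, a peak and a right edge.
    hz = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    lower, center, upper = hz[:-2], hz[1:-1], hz[2:]
    rising = (fft_freqs[:, None] - lower) / (center - lower)
    falling = (upper - fft_freqs[:, None]) / (upper - center)
    return np.maximum(0.0, np.minimum(rising, falling))


def power_to_db(s, ref=1.0, amin=1e-10, top_db=80.0):
    """10 * log10(s / ref), floored at amin and clipped to top_db below the peak."""
    log_spec = 10.0 * np.log10(np.maximum(amin, s))
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    if top_db is not None:
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec


rng = np.random.default_rng(0)
sr, n_fft = 22050, 2048
power = rng.random((1, 1, 10, n_fft // 2 + 1)) ** 2  # (batch, channels, time, freq)
mel = power @ mel_filterbank(sr, n_fft, n_mels=64)    # (..., time, n_mels)
log_mel = power_to_db(mel)
```

With `ref=1.0` and `amin=1e-10`, the floor is 10 * log10(1e-10) = -100 dB, and `top_db=80.0` further limits the output to an 80 dB range below the maximum.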
- class MFCC(sr: int = 22050, n_fft: int = 2048, n_mels: int = 64, fmin: float = 0.0, fmax: float = None, is_log: bool | None = True, ref: float = 1.0, amin: float = 1e-10, top_db: float | None = 80.0, freeze_parameters: bool | None = True, n_mfcc: int = 20, dct_type: int = 2, norm: str = 'ortho', lifter: int = 0, parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)
A module that computes Mel-Frequency Cepstral Coefficients (MFCCs).
This module extends LogMelFilterBank to compute MFCCs by applying a Discrete Cosine Transform (DCT) to the log-mel spectrogram.
- Variables:
n_mfcc (int) – Number of MFCCs to return. Default is 20.
dct_type (int) – Type of DCT (1-4). Default is 2.
norm (str) – Normalization mode for DCT. Default is “ortho”.
lifter (int) – Liftering coefficient. 0 means no liftering. Default is 0.
is_log (bool | None) – If True, convert to log scale (must be True for MFCCs). Default is True.
All other attributes are inherited from LogMelFilterBank.
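Assuming the log-mel features have the mel axis last, the DCT and liftering steps of MFCC can be sketched with an explicit orthonormal DCT-II matrix, which corresponds to `dct_type=2` and `norm='ortho'`. The helper names are hypothetical; the liftering formula follows librosa's convention, and whether librosax uses the same one is an assumption.

```python
import numpy as np


def dct2_ortho_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix; row k is basis vector k."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)  # row 0 gets weight sqrt(1/n) so rows are orthonormal
    return basis


def mfcc_from_log_mel(log_mel, n_mfcc=20, lifter=0):
    """DCT-II (norm='ortho') over the last (mel) axis, keeping n_mfcc coefficients."""
    n_mels = log_mel.shape[-1]
    coeffs = log_mel @ dct2_ortho_matrix(n_mels)[:n_mfcc].T
    if lifter > 0:
        # Sinusoidal liftering: scale coefficient k by 1 + (L / 2) * sin(pi * (k + 1) / L).
        k = np.arange(1, n_mfcc + 1)
        coeffs = coeffs * (1.0 + (lifter / 2.0) * np.sin(np.pi * k / lifter))
    return coeffs


log_mel = np.random.default_rng(0).normal(size=(1, 1, 10, 64))  # (..., time, n_mels)
mfcc = mfcc_from_log_mel(log_mel, n_mfcc=20)                    # (..., time, n_mfcc)
```

With `lifter=0` (the default above) the coefficients are returned unscaled; a positive `lifter` boosts the higher-order coefficients relative to the first one.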
- class SpecAugmentation(time_drop_width: int, time_stripes_num: int, freq_drop_width: int, freq_stripes_num: int, deterministic: bool | None = None, parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)
A module that applies SpecAugment data augmentation to spectrograms.
SpecAugment is a data augmentation technique that applies both time and frequency masking to spectrograms for audio tasks.
- Variables:
time_drop_width (int) – Maximum width of time stripes to drop.
time_stripes_num (int) – Number of time stripes to drop.
freq_drop_width (int) – Maximum width of frequency stripes to drop.
freq_stripes_num (int) – Number of frequency stripes to drop.
deterministic (bool | None) – If True, no augmentation is applied. Default is None.
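SpecAugmentation can be read as stripe dropping along the time axis followed by stripe dropping along the frequency axis. The NumPy sketch below builds both as boolean masks; the function names, the (batch, channels, time, freq) layout, per-example masks shared across channels, and the `numpy.random.Generator` argument are assumptions, not the librosax API.

```python
import numpy as np


def stripe_keep_mask(rng, batch, length, drop_width, stripes_num):
    """Boolean (batch, length) mask that is False inside the dropped stripes."""
    keep = np.ones((batch, length), dtype=bool)
    positions = np.arange(length)
    for _ in range(stripes_num):
        # One stripe per example: width in [0, drop_width), start chosen so it fits.
        widths = np.minimum(rng.integers(0, drop_width, size=(batch, 1)), length)
        starts = rng.integers(0, length - widths + 1)
        keep &= ~((positions >= starts) & (positions < starts + widths))
    return keep


def spec_augment(x, time_drop_width, time_stripes_num, freq_drop_width,
                 freq_stripes_num, rng, deterministic=False):
    """Time masking plus frequency masking on x of shape (batch, channels, time, freq)."""
    if deterministic:
        return x
    batch, _, n_time, n_freq = x.shape
    time_keep = stripe_keep_mask(rng, batch, n_time, time_drop_width, time_stripes_num)
    freq_keep = stripe_keep_mask(rng, batch, n_freq, freq_drop_width, freq_stripes_num)
    # A bin survives only if neither its frame nor its frequency band was dropped.
    keep = time_keep[:, None, :, None] & freq_keep[:, None, None, :]
    return x * keep


rng = np.random.default_rng(0)
x = np.ones((2, 1, 100, 64), dtype=np.float32)
y = spec_augment(x, time_drop_width=64, time_stripes_num=2,
                 freq_drop_width=8, freq_stripes_num=2, rng=rng)
```

Building the two masks separately and combining them with a broadcasted AND keeps the cost to one elementwise multiply over the spectrogram.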
- class Spectrogram(n_fft: int = 2048, hop_length: int = None, win_length: int = None, window: str = 'hann', center: bool | None = True, pad_mode: str = 'reflect', power: float = 2.0, freeze_parameters: bool | None = True, parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)
A module that computes a spectrogram from a waveform using JAX.
This module transforms time-domain audio signals into a time-frequency representation.
- Variables:
n_fft (int) – FFT size. Default is 2048.
hop_length (int) – Step between successive frames. Default is n_fft // 4 (used when hop_length is None).
win_length (int) – Window size. Default is n_fft (used when win_length is None).
window (str) – Window function type. Default is "hann".
center (bool | None) – If True, the waveform is padded so that frames are centered. Default is True.
pad_mode (str) – Padding mode for the waveform. Default is "reflect".
power (float) – Exponent for the magnitude (2.0 gives a power spectrogram). Default is 2.0.
freeze_parameters (bool | None) – If True, parameters are not updated during training. Default is True.
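The following NumPy sketch shows what a centered, Hann-windowed power spectrogram computes with the defaults listed above. It is not the librosax implementation: the function name is hypothetical, only the Hann window is supported, and it returns an array of shape (frames, frequency bins) for a single 1-D signal.

```python
import numpy as np


def spectrogram(y, n_fft=2048, hop_length=None, win_length=None,
                center=True, pad_mode="reflect", power=2.0):
    """|STFT| ** power of a 1-D signal, shape (n_frames, n_fft // 2 + 1)."""
    hop_length = n_fft // 4 if hop_length is None else hop_length
    win_length = n_fft if win_length is None else win_length
    # Periodic Hann window of win_length samples, zero-padded to n_fft and centered.
    n = np.arange(win_length)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / win_length)
    left = (n_fft - win_length) // 2
    window = np.pad(window, (left, n_fft - win_length - left))
    if center:
        # Pad n_fft // 2 samples on each side so frame t is centered at t * hop_length.
        y = np.pad(y, n_fft // 2, mode=pad_mode)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack([y[t * hop_length : t * hop_length + n_fft] for t in range(n_frames)])
    return np.abs(np.fft.rfft(frames * window, axis=-1)) ** power


sr, n_fft = 22050, 2048
tone_hz = 93 * sr / n_fft  # a sine placed exactly on FFT bin 93
y = np.sin(2 * np.pi * tone_hz * np.arange(sr) / sr)  # 1 s of audio
S = spectrogram(y, n_fft=n_fft)
```

For this 1 s signal with hop_length = 2048 // 4 = 512, centering yields 1 + 22050 // 512 = 44 frames, each with n_fft // 2 + 1 = 1025 frequency bins.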