librosax.layers

class DropStripes(axis: int, drop_width: int, stripes_num: int, deterministic: bool | None = None, parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)

A module that randomly drops stripes (time or frequency bands) from a spectrogram.

This module is used for data augmentation in audio tasks by randomly masking time frames or frequency bands in the spectrogram.

Variables:

axis (int) – Axis along which to drop stripes. 2 for time, 3 for frequency.
drop_width (int) – Maximum width of stripes to drop.
stripes_num (int) – Number of stripes to drop.
deterministic (bool | None) – If True, no dropping is performed. Default is None.
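The masking idea can be sketched in plain JAX. This is an illustration under stated assumptions, not the librosax implementation: the helper name drop_stripes is hypothetical, the input is assumed to be shaped (batch, channels, time, freq), each stripe width is assumed to be drawn from [0, drop_width), and the same stripes are applied to every example in the batch.

```python
import jax
import jax.numpy as jnp


def drop_stripes(key, x, axis, drop_width, stripes_num):
    """Zero out `stripes_num` random stripes along `axis` (hypothetical helper).

    Assumes drop_width <= x.shape[axis] so that every stripe fits.
    """
    total = x.shape[axis]
    positions = jnp.arange(total)
    mask = jnp.ones((total,), dtype=x.dtype)
    for stripe_key in jax.random.split(key, stripes_num):
        width_key, start_key = jax.random.split(stripe_key)
        # Assumed sampling scheme: width in [0, drop_width), start so the stripe fits.
        width = jax.random.randint(width_key, (), 0, drop_width)
        start = jax.random.randint(start_key, (), 0, total - width)
        inside = (positions >= start) & (positions < start + width)
        mask = jnp.where(inside, 0, mask)
    # Broadcast the 1-D mask along the chosen axis only.
    shape = [1] * x.ndim
    shape[axis] = total
    return x * mask.reshape(shape)


key = jax.random.PRNGKey(0)
x = jnp.ones((2, 1, 100, 64))  # (batch, channels, time, freq)
out = drop_stripes(key, x, axis=2, drop_width=10, stripes_num=2)
```

With axis=2 the zeroed region spans whole time frames across all frequency bins; with axis=3 it spans whole frequency bands across all frames.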

class LogMelFilterBank(sr: int = 22050, n_fft: int = 2048, n_mels: int = 64, fmin: float = 0.0, fmax: float = None, is_log: bool | None = True, ref: float = 1.0, amin: float = 1e-10, top_db: float | None = 80.0, freeze_parameters: bool | None = True, parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)

A module that converts spectrograms to (log) mel spectrograms.

This module applies a mel filterbank to a spectrogram and optionally converts the result to log scale.

Variables:

sr (int) – Sample rate of the audio signal. Default is 22_050.
n_fft (int) – FFT size. Default is 2048.
n_mels (int) – Number of mel bands in the filterbank. Default is 64.
fmin (float) – Minimum frequency for the mel filterbank. Default is 0.0.
fmax (float) – Maximum frequency for the mel filterbank. Default is None, which is treated as sr // 2.
is_log (bool | None) – If True, convert to log scale. Default is True.
ref (float) – Reference value for log scaling. Default is 1.0.
amin (float) – Minimum value for log scaling. Default is 1e-10.
top_db (float | None) – Maximum dynamic range in dB. Default is 80.0.
freeze_parameters (bool | None) – If True, parameters are not updated during training. Default is True.
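How ref, amin, and top_db interact is easiest to see in code. The sketch below follows the power-to-decibel convention used by librosa; that librosax applies exactly this formula is an assumption, and the function name power_to_db is used here only for illustration.

```python
import jax.numpy as jnp


def power_to_db(mel_power, ref=1.0, amin=1e-10, top_db=80.0):
    """Convert a power (mel) spectrogram to decibels (illustrative sketch)."""
    # Clamp to amin so that log10 never sees zero.
    log_spec = 10.0 * jnp.log10(jnp.maximum(mel_power, amin))
    # Express the result relative to the reference power.
    log_spec = log_spec - 10.0 * jnp.log10(jnp.maximum(ref, amin))
    if top_db is not None:
        # Limit the dynamic range to top_db below the peak.
        log_spec = jnp.maximum(log_spec, log_spec.max() - top_db)
    return log_spec


db = power_to_db(jnp.array([1.0, 1e-3, 1e-12]))
```

In this example the smallest value is first clamped to amin (giving -100 dB) and then raised to -80 dB by the top_db limit, since the peak sits at 0 dB.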

class MFCC(sr: int = 22050, n_fft: int = 2048, n_mels: int = 64, fmin: float = 0.0, fmax: float = None, is_log: bool | None = True, ref: float = 1.0, amin: float = 1e-10, top_db: float | None = 80.0, freeze_parameters: bool | None = True, n_mfcc: int = 20, dct_type: int = 2, norm: str = 'ortho', lifter: int = 0, parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)

A module that computes Mel-Frequency Cepstral Coefficients (MFCCs).

This module extends LogMelFilterBank to compute MFCCs by applying a Discrete Cosine Transform (DCT) to the log-mel spectrogram.

Variables:

n_mfcc (int) – Number of MFCCs to return. Default is 20.
dct_type (int) – Type of DCT (1-4). Default is 2.
norm (str) – Normalization mode for DCT. Default is "ortho".
lifter (int) – Liftering coefficient. 0 means no liftering. Default is 0.
is_log (bool | None) – If True, convert to log scale (must be True for MFCCs). Default is True.
Inherits all other attributes from LogMelFilterBank.
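The DCT step can be sketched with jax.scipy.fft.dct. This is a minimal illustration for the default settings (dct_type=2, norm="ortho", lifter=0), not the module's actual code; the helper name mfcc_from_log_mel and the (..., time, n_mels) input layout are assumptions.

```python
import jax.numpy as jnp
from jax.scipy.fft import dct


def mfcc_from_log_mel(log_mel, n_mfcc=20):
    """Apply an orthonormal DCT-II over the mel axis and keep n_mfcc coefficients."""
    # log_mel is assumed to be shaped (..., time, n_mels).
    cepstrum = dct(log_mel, type=2, axis=-1, norm="ortho")
    return cepstrum[..., :n_mfcc]


log_mel = jnp.ones((1, 5, 64))  # constant log-mel frames, 64 mel bands
coeffs = mfcc_from_log_mel(log_mel)
```

For a constant input, all energy lands in the zeroth coefficient, which is why higher coefficients are often read as describing the shape of the spectral envelope rather than its overall level.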

class SpecAugmentation(time_drop_width: int, time_stripes_num: int, freq_drop_width: int, freq_stripes_num: int, deterministic: bool | None = None, parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)

A module that applies SpecAugment data augmentation to spectrograms.

SpecAugment is a data augmentation technique that applies both time and frequency masking to spectrograms for audio tasks.

Variables:

time_drop_width (int) – Maximum width of time stripes to drop.
time_stripes_num (int) – Number of time stripes to drop.
freq_drop_width (int) – Maximum width of frequency stripes to drop.
freq_stripes_num (int) – Number of frequency stripes to drop.
deterministic (bool | None) – If True, no augmentation is applied. Default is None.

class Spectrogram(n_fft: int = 2048, hop_length: int = None, win_length: int = None, window: str = 'hann', center: bool | None = True, pad_mode: str = 'reflect', power: float = 2.0, freeze_parameters: bool | None = True, parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)

A module that computes a spectrogram from a waveform using JAX.

This module transforms time-domain audio signals into a time-frequency representation.

Variables:

n_fft (int) – FFT size. Default is 2048.
hop_length (int) – Step between successive frames. Default is None, which is treated as n_fft // 4.
win_length (int) – Window size. Default is None, which is treated as n_fft.
window (str) – Window function type. Default is "hann".
center (bool | None) – If True, the waveform is padded so that frames are centered. Default is True.
pad_mode (str) – Padding mode for the waveform. Default is "reflect".
power (float) – Exponent for the magnitude (2.0 means power spectrogram). Default is 2.0.
freeze_parameters (bool | None) – If True, parameters are not updated during training. Default is True.
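The parameters above map onto a short-time Fourier transform roughly as follows. This is a sketch for a single 1-D waveform under stated assumptions, not the module's implementation: the function name power_spectrogram is hypothetical, win_length is fixed to n_fft, the window is a periodic Hann window, and the output is laid out as (time, freq).

```python
import jax.numpy as jnp


def power_spectrogram(waveform, n_fft=2048, hop_length=None, power=2.0):
    """Framed, windowed FFT magnitude raised to `power` (illustrative sketch)."""
    hop_length = n_fft // 4 if hop_length is None else hop_length
    # center=True with pad_mode="reflect": frame t is centered on sample t * hop_length.
    padded = jnp.pad(waveform, n_fft // 2, mode="reflect")
    n_frames = 1 + (padded.shape[0] - n_fft) // hop_length
    # Index matrix of shape (n_frames, n_fft) selecting each frame's samples.
    idx = jnp.arange(n_fft)[None, :] + hop_length * jnp.arange(n_frames)[:, None]
    # Periodic Hann window of length n_fft.
    window = 0.5 - 0.5 * jnp.cos(2.0 * jnp.pi * jnp.arange(n_fft) / n_fft)
    frames = padded[idx] * window
    return jnp.abs(jnp.fft.rfft(frames, axis=-1)) ** power


n_fft = 512
n = jnp.arange(4096)
tone = jnp.sin(2.0 * jnp.pi * 32 * n / n_fft)  # sinusoid centered on FFT bin 32
spec = power_spectrogram(tone, n_fft=n_fft)
```

Each row of spec holds n_fft // 2 + 1 frequency bins for one frame; for the test tone the energy concentrates around bin 32.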