librosax.feature.melspectrogram

melspectrogram(*, y: Array | None = None, sr: float = 22050, S: Array | None = None, n_fft: int = 2048, hop_length: int = 512, win_length: int | None = None, window: str = 'hann', center: bool = True, pad_mode: str = 'constant', power: float = 2.0, n_mels: int = 128, fmin: float = 0.0, fmax: float | None = None, htk: bool = False, norm: str | float | None = 'slaney', dtype: dtype = <class 'jax.numpy.float32'>) → Array

Compute a mel-scaled spectrogram.

If a time-series input y is provided, its magnitude spectrogram S is computed first and then mapped onto the mel scale by mel_f.dot(S**power), where mel_f is the mel filterbank matrix.

By default, power=2 operates on a power spectrum.
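The projection mel_f.dot(S**power) can be sketched on toy data. The matrices below are hypothetical values chosen for readability, not a real mel filterbank (real filters are overlapping triangles, see librosax.filters.mel); the sketch only shows how an (F, N) spectrogram becomes an (n_mels, N) output.

```python
import jax.numpy as jnp

# Toy magnitude spectrogram: F = 4 frequency bins, N = 3 frames.
S = jnp.array([[1.0, 2.0, 3.0],
               [1.0, 2.0, 3.0],
               [2.0, 2.0, 2.0],
               [0.0, 1.0, 0.0]])

# Toy "filterbank" with n_mels = 2 rows, each averaging two adjacent bins.
# This is an illustrative stand-in, not the output of librosax.filters.mel.
mel_f = jnp.array([[0.5, 0.5, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.5]])

power = 2.0  # default: operate on the power spectrum

# (n_mels, F) @ (F, N) -> (n_mels, N)
mel_S = mel_f.dot(S ** power)
```

Each output row is a weighted sum of the powered frequency bins covered by one filter, so the frequency axis shrinks from F to n_mels while the frame axis is unchanged.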

Note

For JAX JIT compilation, all arguments except y and S should be marked as static: sr, n_fft, hop_length, win_length, window, center, pad_mode, power, n_mels, fmin, fmax, htk, norm, dtype.
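The static-argument requirement can be illustrated with a small stand-in function. The frame_energy helper below is hypothetical and is not part of librosax; it only mimics the keyword-only calling style. Its output shape depends on hop_length, so that argument has to be static under jax.jit. The same static_argnames pattern would presumably be used when wrapping melspectrogram, listing every argument other than y and S.

```python
from functools import partial

import jax
import jax.numpy as jnp


# Hypothetical stand-in, NOT the librosax implementation.
# hop_length and power are declared static; y is traced as usual.
@partial(jax.jit, static_argnames=("hop_length", "power"))
def frame_energy(*, y, hop_length=512, power=2.0):
    # The number of frames is derived from hop_length, so hop_length
    # must be a concrete Python value at trace time.
    n_frames = y.shape[-1] // hop_length
    trimmed = y[..., : n_frames * hop_length]
    frames = trimmed.reshape(y.shape[:-1] + (n_frames, hop_length))
    return jnp.sum(jnp.abs(frames) ** power, axis=-1)


y = jnp.ones((2, 2048))  # batch of two waveforms, shape (B, T)
out = frame_energy(y=y, hop_length=512, power=2.0)  # shape (B, n_frames)
```

Static arguments are part of the compilation cache key, so calling the function again with a different hop_length triggers a new trace and compile.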

Parameters:

y – Audio time series. The last axis must be time.
- (T,) - single waveform
- (B, T) - batch of waveforms
sr – Audio sampling rate
S – (optional) Pre-computed spectrogram magnitude with shape (..., F, N)
n_fft – FFT window size
hop_length – Hop length for STFT
win_length – Window length
window – Window function
center – If True, pad the signal (using pad_mode) so that STFT frames are centered
pad_mode – Padding mode
power – Exponent applied to the magnitude spectrogram before the mel projection, e.g., 1 for energy, 2 for power. If 0, return the STFT magnitude directly.
n_mels – Number of mel bands to generate
fmin – Lowest frequency (in Hz)
fmax – Highest frequency (in Hz). If None, use fmax = sr / 2.0
htk – Use HTK formula instead of Slaney
norm – One of {None, "slaney", float > 0}. If "slaney", divide the triangular mel weights by the width of the mel band (area normalization). If numeric, use norm as a mel exponent normalization. See librosax.filters.mel for details.
dtype – Data type of the output array
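The htk flag selects between two Hz-to-mel conversions. The sketch below writes out the commonly used HTK and Slaney formulas (the ones librosa uses); this page does not confirm that librosax implements them identically, so the function names and constants here are assumptions for illustration. It also shows the fmax = sr / 2.0 default.

```python
import math


def hz_to_mel_htk(f_hz: float) -> float:
    # HTK formula: logarithmic over the whole frequency range.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_mel_slaney(f_hz: float) -> float:
    # Slaney formula: linear below 1 kHz, logarithmic above.
    f_sp = 200.0 / 3.0               # Hz per mel in the linear region
    min_log_hz = 1000.0              # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp  # mel value at 1 kHz
    logstep = math.log(6.4) / 27.0   # step size in the logarithmic region
    if f_hz < min_log_hz:
        return f_hz / f_sp
    return min_log_mel + math.log(f_hz / min_log_hz) / logstep


sr = 22050
fmax = sr / 2.0  # default upper edge when fmax is None

mel_htk = hz_to_mel_htk(fmax)
mel_slaney = hz_to_mel_slaney(fmax)
```

The two scales use different units, so their mel values are not directly comparable; what matters is that the n_mels band edges are spaced uniformly on whichever scale is selected between fmin and fmax.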

Returns:

Mel spectrogram with shape (..., n_mels, N), where N is the number of frames.

(T,) → (n_mels, N)
(B, T) → (B, n_mels, N)
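The frame count N is not defined on this page. The sketch below assumes librosax follows librosa's STFT framing convention, which is an assumption and not something stated here; expected_frames is a hypothetical helper for estimating output shapes.

```python
def expected_frames(n_samples: int, n_fft: int = 2048,
                    hop_length: int = 512, center: bool = True) -> int:
    """Estimate the number of STFT frames N (librosa-style convention, assumed)."""
    if center:
        # The signal is padded by n_fft // 2 on both sides before framing.
        return 1 + n_samples // hop_length
    # Without padding, only full windows that fit in the signal are counted.
    return 1 + (n_samples - n_fft) // hop_length


T = 22050  # one second of audio at the default sampling rate
n_mels = 128

N = expected_frames(T)
single_shape = (n_mels, N)      # (T,)   -> (n_mels, N)
batch_shape = (4, n_mels, N)    # (B, T) -> (B, n_mels, N) with B = 4
```

Comparing this estimate against the actual output shape is a quick way to check whether the assumption holds for a given version of the library.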