Data Sources
Data Sources in audiotree.datasources are Grain data sources that are specially designed for audio. Grain is a new library for dataset operations in JAX with no TensorFlow dependency. For now, there are only two types of Data Sources and one Data Set, but they fit many needs. You can take a look at DAC-JAX's input_pipeline.py to see how they're used.
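Because they are Grain data sources (random-access sources exposing `__len__` and `__getitem__`), they can be wrapped in an ordinary pygrain sampler and `DataLoader`. The sketch below is only illustrative: the `grain.python` calls are the public Grain API, but the `AudioDataSimpleSource` arguments are assumptions based on the examples further down this page, not a documented signature.

```python
import grain.python as grain

from audiotree.datasources import AudioDataSimpleSource

# A dictionary mapping group names to lists of folders, in the same
# format as the YAML examples below.
sources = {
    "speech": ["/data/daps/train"],
}

# Constructor arguments other than `sources` are assumptions here.
data_source = AudioDataSimpleSource(
    sources=sources,
    extensions=[".wav", ".flac"],
)

# Standard pygrain pipeline: an index sampler plus a DataLoader.
sampler = grain.IndexSampler(
    num_records=len(data_source),
    shard_options=grain.NoSharding(),
    shuffle=True,
    num_epochs=1,
    seed=0,
)

loader = grain.DataLoader(
    data_source=data_source,
    sampler=sampler,
    worker_count=0,
)

for item in loader:
    ...  # each item is whatever the data source yields for one record
```

DAC-JAX's input_pipeline.py shows the complete version of this pattern.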
All three of AudioDataSimpleSource, AudioDataBalancedSource, and AudioDataBalancedDataset are initialized with a dictionary of sources. For example, with ArgBind, the YAML might look like this (adapted from DAC):
```yaml
train/AudioDataBalancedSource.extensions:
  - .wav
  - .flac

train/AudioDataBalancedSource.sources:
  speech_fb:
    - /data/daps/train
  speech_hq:
    - /data/vctk
    - /data/vocalset
    - /data/read_speech
    - /data/french_speech
  speech_uq:
    - /data/emotional_speech/
    - /data/common_voice/
    - /data/german_speech/
    - /data/russian_speech/
    - /data/spanish_speech/
  music_hq:
    - /data/musdb/train
  music_uq:
    - /data/jamendo
  general:
    - /data/audioset/data/unbalanced_train_segments/
    - /data/audioset/data/balanced_train_segments/
```
The folders can also be glob expressions (because of the balancing, this configuration is not equivalent to the one above):
```yaml
train/AudioDataBalancedSource.sources:
  speech:
    - /data/*speech/**/*.wav
    - /data/daps/train
    - /data/vctk
    - /data/vocalset
    - /data/common_voice
  music:
    - /data/musdb/train
    - /data/jamendo
```
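Without ArgBind, the same configuration is simply a Python dictionary mapping group names to lists of folders (or globs). The minimal sketch below assumes only the `sources` and `extensions` keyword arguments that the YAML above binds; any other constructor parameters are omitted.

```python
from audiotree.datasources import AudioDataBalancedSource

sources = {
    "speech": [
        "/data/*speech/**/*.wav",  # glob expressions are allowed
        "/data/daps/train",
        "/data/vctk",
    ],
    "music": [
        "/data/musdb/train",
        "/data/jamendo",
    ],
}

# `sources` and `extensions` mirror the ArgBind keys above; other
# constructor parameters are intentionally left out of this sketch.
data_source = AudioDataBalancedSource(
    sources=sources,
    extensions=[".wav", ".flac"],
)
```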
The second thing to know is that AudioDataSimpleSource, AudioDataBalancedSource, and AudioDataBalancedDataset can be initialized with an instance of SaliencyParams. If SaliencyParams has enabled set to True, then random sections of an audio file are drawn until one meets a specified minimum loudness.