Data Sources
Data Sources in audiotree.datasources are Grain data sources that are specially designed for audio. Grain is a new library for dataset operations in JAX with no TensorFlow dependency.

For now, there are only two types of Data Sources, but they fit many needs. You can take a look at DAC-JAX's input_pipeline.py to see how they're used.
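Because they are Grain data sources, they slot into Grain's ordinary sampler/loader machinery. The sketch below is not copied from input_pipeline.py: the one-group sources dictionary (explained next) and the omission of any other constructor arguments are assumptions, but the IndexSampler-plus-DataLoader plumbing is standard Grain usage.

```python
import grain.python as grain

from audiotree.datasources import AudioDataSimpleSource

# Hypothetical source: a single group pointing at one folder of audio files.
# The `sources` dictionary format is described below; other constructor
# arguments may be needed in practice (see DAC-JAX's input_pipeline.py).
source = AudioDataSimpleSource(sources={"speech": ["/data/daps/train"]})

# Standard Grain plumbing: an index sampler plus a DataLoader.
sampler = grain.IndexSampler(
    num_records=len(source),
    shard_options=grain.NoSharding(),
    shuffle=True,
    num_epochs=1,
    seed=0,
)
loader = grain.DataLoader(data_source=source, sampler=sampler, worker_count=0)

for example in loader:
    pass  # each `example` is one record produced by the data source
```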
Both AudioDataSimpleSource and AudioDataBalancedSource are initialized with a dictionary of sources. For example, with ArgBind, the YAML might look like this (adapted from DAC):
```yaml
train/AudioDataSimpleSource.sources:
  speech_fb:
    - /data/daps/train
  speech_hq:
    - /data/vctk
    - /data/vocalset
    - /data/read_speech
    - /data/french_speech
  speech_uq:
    - /data/emotional_speech/
    - /data/common_voice/
    - /data/german_speech/
    - /data/russian_speech/
    - /data/spanish_speech/
  music_hq:
    - /data/musdb/train
  music_uq:
    - /data/jamendo
  general:
    - /data/audioset/data/unbalanced_train_segments/
    - /data/audioset/data/balanced_train_segments/
```
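Without ArgBind, the same configuration is just a Python dictionary that maps group names to lists of directories, passed as the sources argument (the argument name comes from the YAML key above). The sketch below shows only the dictionary; whether any other constructor arguments are required is not shown here.

```python
from audiotree.datasources import AudioDataSimpleSource

# The same groups as the YAML above, expressed directly as a Python dict.
# Keys are arbitrary group names; values are lists of directories to scan.
sources = {
    "speech_fb": ["/data/daps/train"],
    "speech_hq": [
        "/data/vctk",
        "/data/vocalset",
        "/data/read_speech",
        "/data/french_speech",
    ],
    "speech_uq": [
        "/data/emotional_speech/",
        "/data/common_voice/",
        "/data/german_speech/",
        "/data/russian_speech/",
        "/data/spanish_speech/",
    ],
    "music_hq": ["/data/musdb/train"],
    "music_uq": ["/data/jamendo"],
    "general": [
        "/data/audioset/data/unbalanced_train_segments/",
        "/data/audioset/data/balanced_train_segments/",
    ],
}

train_source = AudioDataSimpleSource(sources=sources)
```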
The second thing to know is that both AudioDataSimpleSource and AudioDataBalancedSource can be initialized with an instance of SaliencyParams. If SaliencyParams has enabled set to True, then random sections of an audio file are selected until one meets a specified minimum loudness.
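A hedged sketch of what that might look like: enabled and the sources dictionary come from the text above, while the field names num_tries and loudness_cutoff and the saliency_params keyword argument are assumptions modeled on DAC's salient-excerpt behavior.

```python
from audiotree.datasources import AudioDataBalancedSource, SaliencyParams

# Assumed field names: `num_tries` (how many random sections to attempt) and
# `loudness_cutoff` (the minimum loudness, in dB). `enabled` is documented above.
saliency_params = SaliencyParams(
    enabled=True,
    num_tries=8,
    loudness_cutoff=-40,
)

source = AudioDataBalancedSource(
    sources={"music_hq": ["/data/musdb/train"]},
    saliency_params=saliency_params,  # assumed keyword argument name
)
```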