Data Sources

The data sources in audiotree.datasources are Grain data sources designed specifically for audio. Grain is a relatively new library for dataset operations in JAX with no TensorFlow dependency.
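
Under the hood, a Grain data source is just a random-access container: an object that defines __len__ and __getitem__. A minimal sketch of that protocol (the class below is purely illustrative and is not part of audiotree):

class ToyAudioPathSource:
    """A toy Grain-style data source: random access over a list of file paths."""

    def __init__(self, file_paths):
        self._file_paths = list(file_paths)

    def __len__(self):
        # Grain uses the length to build its sampling index.
        return len(self._file_paths)

    def __getitem__(self, idx):
        # Called by Grain to fetch a single record.
        return self._file_paths[idx]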

For now, there are only two data sources (AudioDataSimpleSource and AudioDataBalancedSource) and one dataset (AudioDataBalancedDataset), but they fit many needs. You can look at DAC-JAX's input_pipeline.py to see how they're used.

All three classes (AudioDataSimpleSource, AudioDataBalancedSource, and AudioDataBalancedDataset) are initialized with a dictionary of sources. For example, with ArgBind, the YAML might look like this (adapted from DAC):

train/AudioDataBalancedSource.extensions:
    - .wav
    - .flac
train/AudioDataBalancedSource.sources:
    speech_fb:
        - /data/daps/train
    speech_hq:
        - /data/vctk
        - /data/vocalset
        - /data/read_speech
        - /data/french_speech
    speech_uq:
        - /data/emotional_speech/
        - /data/common_voice/
        - /data/german_speech/
        - /data/russian_speech/
        - /data/spanish_speech/
    music_hq:
        - /data/musdb/train
    music_uq:
        - /data/jamendo
    general:
        - /data/audioset/data/unbalanced_train_segments/
        - /data/audioset/data/balanced_train_segments/

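Without ArgBind, the same configuration is just a plain dictionary passed to the constructor. A sketch of the idea, assuming the class is importable from audiotree.datasources and accepts sources and extensions keywords matching the YAML keys above (other constructor arguments, which vary by class, are omitted):

from audiotree.datasources import AudioDataBalancedSource

sources = {
    "speech_fb": ["/data/daps/train"],
    "speech_hq": ["/data/vctk", "/data/vocalset"],
    "music_hq": ["/data/musdb/train"],
    "general": ["/data/audioset/data/unbalanced_train_segments/"],
}

# Keyword names mirror the ArgBind keys above; any additional required
# arguments (e.g. clip duration or sample rate) are left out for brevity.
train_source = AudioDataBalancedSource(
    sources=sources,
    extensions=[".wav", ".flac"],
)
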
The folders can also be glob expressions. Note that because the source balances across groups, this regrouped configuration produces a different sampling distribution than the one above:

train/AudioDataBalancedSource.sources:
    speech:
        - /data/*speech/**/*.wav
        - /data/daps/train
        - /data/vctk
        - /data/vocalset
        - /data/common_voice
    music:
        - /data/musdb/train
        - /data/jamendo
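
Each entry resolves to a list of audio files. The exact discovery logic lives inside audiotree, but conceptually a mix of plain folders and glob patterns expands roughly like this sketch (illustrative only, not the actual implementation):

import glob
from pathlib import Path

EXTENSIONS = {".wav", ".flac"}

def expand_entry(entry: str) -> list[str]:
    # Entries with glob characters are expanded directly (recursively for **);
    # plain folders are walked recursively and filtered by extension.
    if any(ch in entry for ch in "*?["):
        return [p for p in glob.glob(entry, recursive=True)
                if Path(p).suffix.lower() in EXTENSIONS]
    return [str(p) for p in sorted(Path(entry).rglob("*"))
            if p.suffix.lower() in EXTENSIONS]

# Each top-level group ("speech", "music") keeps its own file list, so the
# balanced source can weight groups rather than individual files.
speech_files = [f for entry in [
    "/data/*speech/**/*.wav",
    "/data/daps/train",
    "/data/vctk",
] for f in expand_entry(entry)]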

The second thing to know is that AudioDataSimpleSource, AudioDataBalancedSource, and AudioDataBalancedDataset can also be initialized with an instance of SaliencyParams. If SaliencyParams has enabled set to True, random sections of an audio file are drawn repeatedly until one meets a specified minimum loudness.
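
A sketch of what this might look like in Python. The import path, field names, and keyword name below (num_tries, loudness_cutoff_db, saliency_params) are assumptions for illustration; check the SaliencyParams definition and the constructor signatures for the exact names:

from audiotree.datasources import AudioDataBalancedSource, SaliencyParams

# Field names are assumptions; see SaliencyParams for the real ones.
saliency = SaliencyParams(
    enabled=True,            # turn salient-section selection on
    num_tries=8,             # hypothetical: how many random sections to try
    loudness_cutoff_db=-40,  # hypothetical: minimum loudness a section must reach
)

train_source = AudioDataBalancedSource(
    sources={"speech": ["/data/daps/train"]},
    extensions=[".wav", ".flac"],
    saliency_params=saliency,  # hypothetical keyword name
)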