Data Sources

Data Sources in audiotree.datasources are Grain data sources designed specifically for audio. Grain is a relatively new library for dataset operations in JAX with no TensorFlow dependency.
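
In Grain, a random-access data source is essentially any object that exposes __len__ and __getitem__. The toy class below is not part of audiotree; it is only a minimal sketch of that protocol, to show the kind of interface the audio-specific sources implement for you:

class ToyAudioSource:
    """A toy Grain-style data source: random access via __len__ and __getitem__."""

    def __init__(self, file_paths):
        self.file_paths = file_paths

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, index):
        # A real audio source would load and decode the audio file here.
        return self.file_paths[index]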

For now, there are only two Data Source classes, AudioDataSimpleSource and AudioDataBalancedSource, but they cover many needs. You can take a look at DAC-JAX’s input_pipeline.py to see how they are used.

Both AudioDataSimpleSource and AudioDataBalancedSource are initialized with a dictionary of sources that maps group names to lists of audio directories. For example, with ArgBind, the YAML might look like this (adapted from DAC):

train/AudioDataSimpleSource.sources:
    speech_fb:
        - /data/daps/train
    speech_hq:
        - /data/vctk
        - /data/vocalset
        - /data/read_speech
        - /data/french_speech
    speech_uq:
        - /data/emotional_speech/
        - /data/common_voice/
        - /data/german_speech/
        - /data/russian_speech/
        - /data/spanish_speech/
    music_hq:
        - /data/musdb/train
    music_uq:
        - /data/jamendo
    general:
        - /data/audioset/data/unbalanced_train_segments/
        - /data/audioset/data/balanced_train_segments/
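
ArgBind turns that YAML into a dictionary and binds it to the constructor's sources argument. The following is a minimal sketch of doing the same thing directly in Python; the sources argument comes from the description above, while the commented-out keyword arguments are hypothetical placeholders and may not match the actual constructor signature.

from audiotree.datasources import AudioDataSimpleSource

# Group names map to lists of audio directories, mirroring the YAML above.
sources = {
    "speech_hq": ["/data/vctk", "/data/vocalset"],
    "music_hq": ["/data/musdb/train"],
}

train_source = AudioDataSimpleSource(
    sources=sources,
    # duration=0.5,       # hypothetical: clip length in seconds
    # sample_rate=44100,  # hypothetical: target sample rate
)

Like any Grain data source, the resulting object can then be handed to a grain.DataLoader together with a sampler, which is what DAC-JAX's input_pipeline.py does.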

The second thing to know is that both AudioDataSimpleSource and AudioDataBalancedSource can be initialized with an instance of SaliencyParams. If enabled is set to True, random sections of an audio file are drawn repeatedly until one meets the specified minimum loudness.
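
As a rough sketch, enabling that behavior might look like the following. Only the enabled field is described above; the loudness and retry fields, the saliency_params keyword, and the import path for SaliencyParams are assumptions to be checked against the actual class definitions.

from audiotree.datasources import AudioDataBalancedSource, SaliencyParams

saliency = SaliencyParams(
    enabled=True,            # keep drawing random sections until one is loud enough
    # loudness_cutoff=-40,   # hypothetical: minimum loudness in dB
    # num_tries=8,           # hypothetical: how many sections to try
)

balanced_source = AudioDataBalancedSource(
    sources={"speech_hq": ["/data/vctk"]},
    saliency_params=saliency,  # hypothetical keyword name
)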