aac_datasets.datasets.wavcaps module

Bases: AACDataset[WavCapsItem]

Unofficial WavCaps PyTorch dataset.

WavCaps paper: https://arxiv.org/pdf/2303.17395.pdf
HuggingFace source: https://huggingface.co/datasets/cvssp/WavCaps

This dataset contains 4 training subsets, extracted from different sources:
- BBC Sound Effects: “bbc”
- SoundBible: “soundbible”
- AudioSet strongly labeled, excluding the AudioCaps V1 val and test subsets: “audioset_no_audiocaps_v1”
- FreeSound, excluding the Clotho dev, val, eval and test subsets: “freesound_no_clotho_v2”

Other subsets exist, but they do not comply with the DCASE Challenge rules:
- AudioSet strongly labeled: “audioset”
- FreeSound: “freesound”
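As a rough illustration of the two groups of subsets listed above, the names can be sorted by whether they are described as complying with the DCASE Challenge rules. This is only a sketch: the constants and the `is_dcase_compliant` helper are hypothetical and are not part of aac_datasets; only the subset strings come from this page.

```python
from typing import Tuple

# Subset names copied from this page; the grouping constants are illustrative only.
DCASE_COMPLIANT_SUBSETS: Tuple[str, ...] = (
    "bbc",
    "soundbible",
    "audioset_no_audiocaps_v1",
    "freesound_no_clotho_v2",
)
NON_COMPLIANT_SUBSETS: Tuple[str, ...] = ("audioset", "freesound")


def is_dcase_compliant(subset: str) -> bool:
    """Return True if the subset is listed as complying with the DCASE rules."""
    if subset in DCASE_COMPLIANT_SUBSETS:
        return True
    if subset in NON_COMPLIANT_SUBSETS:
        return False
    # Anything else is not a subset name documented on this page.
    raise ValueError(f"Unknown WavCaps subset: {subset!r}")
```

A check like this could run before training to avoid accidentally using a subset that overlaps with the AudioCaps or Clotho evaluation data.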

Warning

WavCaps download is experimental; it requires a lot of disk space and can take a very long time to download and extract, so errors should be expected.
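Given the disk space involved, it may help to check free space before starting a download. The sketch below uses only the Python standard library; the `has_enough_space` helper and the `AUDIO_SIZE_GB` table are hypothetical names, and the sizes are the approximate figures for the extracted audio given in the folder tree on this page (zip archives need additional space).

```python
import shutil

# Approximate extracted-audio sizes in GB, read from the folder tree on this page.
# These are rough figures, not values reported by the library.
AUDIO_SIZE_GB = {
    "AudioSet_SL": 64.0,
    "BBC_Sound_Effects": 142.0,
    "FreeSound": 1400.0,   # ~1.4TB
    "SoundBible": 0.884,   # ~884MB
}


def has_enough_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem containing `path` has at least `required_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


# Example: check the current directory against the SoundBible audio size.
enough_for_soundbible = has_enough_space(".", AUDIO_SIZE_GB["SoundBible"])
```

Because the zip archives and the extracted audio can coexist on disk during extraction, the real requirement may be noticeably higher than the audio size alone.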

Dataset folder tree

{root}
└── WavCaps
    ├── Audio
    │   ├── AudioSet_SL
    │   │    └── (108317 flac files, ~64GB)
    │   ├── BBC_Sound_Effects
    │   │    └── (31201 flac files, ~142GB)
    │   ├── FreeSound
    │   │    └── (262300 flac files, ~1.4TB)
    │   └── SoundBible
    │        └── (1232 flac files, ~884MB)
    ├── Zip_files
    │   ├── AudioSet_SL
    │   │    └── (8 zip files, ~76GB)
    │   ├── BBC_Sound_Effects
    │   │    └── (26 zip files, ~562GB)
    │   ├── FreeSound
    │   │    └── (123 zip? files, ~1.4TB)
    │   └── SoundBible
    │        └── (1 zip? files, ~624GB)
    ├── json_files
    │    ├── AudioSet_SL
    │    │    └── as_final.json
    │    ├── BBC_Sound_Effects
    │    │    └── bbc_final.json
    │    ├── FreeSound
    │    │    ├── fsd_final_2s.json
    │    │    └── fsd_final.json
    │    ├── SoundBible
    │    │    └── sb_final.json
    │    └── blacklist
    │         ├── blacklist_exclude_all_ac.json
    │         ├── blacklist_exclude_test_ac.json
    │         └── blacklist_exclude_ubs8k_esc50_vggsound.json
    ├── .gitattributes
    └── README.md
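The layout above can be turned into paths with `pathlib`. This is a minimal sketch that assumes the tree is exactly as shown; `audio_dir`, `json_paths` and `JSON_FILES` are hypothetical names for illustration and are not functions of aac_datasets.

```python
from pathlib import Path
from typing import Dict, List

# Source folder -> metadata file names, copied from the tree above.
JSON_FILES: Dict[str, List[str]] = {
    "AudioSet_SL": ["as_final.json"],
    "BBC_Sound_Effects": ["bbc_final.json"],
    "FreeSound": ["fsd_final_2s.json", "fsd_final.json"],
    "SoundBible": ["sb_final.json"],
}


def audio_dir(root: str, source: str) -> Path:
    """Directory expected to hold the extracted flac files of one source."""
    return Path(root) / "WavCaps" / "Audio" / source


def json_paths(root: str, source: str) -> List[Path]:
    """Metadata files expected for one source, in the order listed above."""
    base = Path(root) / "WavCaps" / "json_files" / source
    return [base / name for name in JSON_FILES[source]]
```

For instance, `audio_dir("data", "SoundBible")` points at `data/WavCaps/Audio/SoundBible`, which can then be tested with `Path.is_dir()` to see whether extraction has already happened.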

CARD : ClassVar[WavCapsCard] = <aac_datasets.datasets.functional.wavcaps.WavCapsCard object>

property download : bool

property root : str

property sr : int

property subset : 'audioset' | 'bbc' | 'freesound' | 'soundbible' | 'audioset_no_audiocaps_v1' | 'freesound_no_clotho_v2'

class WavCapsItem

Bases: TypedDict

audio : Tensor

author : str | None

captions : List[str]

dataset : str

description : str | None

download_link : str | None

duration : float

fname : str

href : str | None

id : str

index : int

source : Literal['AudioSet_SL', 'BBC_Sound_Effects', 'FreeSound', 'SoundBible']

sr : int

subset : Literal['audioset', 'bbc', 'freesound', 'soundbible', 'audioset_no_audiocaps_v1', 'freesound_no_clotho_v2']

tags : List[str]
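Since WavCapsItem is a TypedDict, an item can be read like a plain dictionary. The sketch below shows one way to summarize an item using a few of the fields listed above. The `summarize_item` helper and the `fake_item` values are made up for illustration; the `audio` tensor is left out so the example runs without PyTorch or the dataset on disk.

```python
from typing import Any, Mapping


def summarize_item(item: Mapping[str, Any]) -> str:
    """Build a one-line summary from the fname, subset, duration and captions fields."""
    captions = item["captions"]
    first_caption = captions[0] if captions else "<no caption>"
    return f'{item["fname"]} [{item["subset"]}] {item["duration"]:.1f}s: {first_caption}'


# Hypothetical item with invented values; a real item would also carry
# audio, sr, tags and the other fields documented above.
fake_item = {
    "fname": "example.flac",
    "subset": "soundbible",
    "duration": 3.2,
    "captions": ["A dog barks twice."],
}
summary = summarize_item(fake_item)
```

Accepting a generic `Mapping` keeps the helper usable with both real items and lightweight test dictionaries.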