aac_datasets.datasets.wavcaps module

class WavCaps(
root: str | Path | None = None,
subset: 'audioset' | 'bbc' | 'freesound' | 'soundbible' | 'audioset_no_audiocaps_v1' | 'freesound_no_clotho_v2' = 'audioset_no_audiocaps_v1',
download: bool = False,
transform: Callable[[WavCapsItem], Any] | None = None,
verbose: int = 0,
force_download: bool = False,
verify_files: bool = False,
*,
clean_archives: bool = False,
hf_cache_dir: str | None = None,
repo_id: str | None = None,
revision: str | None = '85a0c21e26fa7696a5a74ce54fada99a9b43c6de',
zip_path: str | Path | None = None,
)

Bases: AACDataset[WavCapsItem]

Unofficial WavCaps PyTorch dataset.

WavCaps paper: https://arxiv.org/pdf/2303.17395.pdf
HuggingFace source: https://huggingface.co/datasets/cvssp/WavCaps

This dataset contains 4 training subsets, each extracted from a different source:
- BBC Sound Effects (“bbc”)
- SoundBible (“soundbible”)
- AudioSet strongly labeled, without the AudioCaps V1 val and test subsets (“audioset_no_audiocaps_v1”)
- FreeSound, without the Clotho dev, val, eval and test subsets (“freesound_no_clotho_v2”)

Other subsets exist, but they do not comply with the DCASE Challenge rules:
- AudioSet strongly labeled (“audioset”)
- FreeSound (“freesound”)
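The subset rules above can be encoded as a small lookup, for instance to fail early before starting a multi-terabyte download. This is a minimal sketch: `SUBSET_SOURCE`, `DCASE_COMPLIANT`, `dcase_compliant_subsets` and `check_subset` are hypothetical helpers, not part of aac_datasets, and the subset-to-source mapping is inferred from the descriptions above rather than read from the library.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical helper, not part of aac_datasets.
# Source folder (see WavCapsItem.source) each subset draws its audio from.
# Inferred from the subset descriptions, not read from the library.
SUBSET_SOURCE: Dict[str, str] = {
    "audioset": "AudioSet_SL",
    "audioset_no_audiocaps_v1": "AudioSet_SL",
    "bbc": "BBC_Sound_Effects",
    "freesound": "FreeSound",
    "freesound_no_clotho_v2": "FreeSound",
    "soundbible": "SoundBible",
}

# Subsets described as compatible with the DCASE Challenge rules.
DCASE_COMPLIANT: FrozenSet[str] = frozenset(
    {"bbc", "soundbible", "audioset_no_audiocaps_v1", "freesound_no_clotho_v2"}
)


def dcase_compliant_subsets() -> Tuple[str, ...]:
    """Return the DCASE-compliant subset names in sorted order."""
    return tuple(sorted(DCASE_COMPLIANT))


def check_subset(subset: str, *, dcase: bool = True) -> str:
    """Validate a subset name, optionally rejecting subsets that break DCASE rules."""
    if subset not in SUBSET_SOURCE:
        raise ValueError(f"Unknown WavCaps subset: {subset!r}")
    if dcase and subset not in DCASE_COMPLIANT:
        raise ValueError(f"Subset {subset!r} does not comply with DCASE Challenge rules")
    return subset
```

A validated name could then be passed on, e.g. `WavCaps(root="data", subset=check_subset("bbc"))`, where `"data"` is only an illustrative path.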

Warning

WavCaps download is experimental: it requires a lot of disk space (see the approximate sizes in the folder tree below) and can take a very long time to download and extract, so you should expect errors.
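Before downloading, the approximate sizes listed in the dataset folder tree can be turned into a rough free-space check. This is a minimal sketch: `AUDIO_SIZE_GB`, `estimate_audio_gb` and `has_enough_space` are hypothetical names, the figures are the approximate extracted-audio sizes from the tree (decimal units assumed), and the 1.2 safety margin is arbitrary. Sizes are per source folder, so for a filtered subset such as “audioset_no_audiocaps_v1” the AudioSet_SL figure is an upper bound. The zip archives under Zip_files, which also need space during download, are not counted.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable, Union

# Approximate extracted-audio sizes in GB, copied from the dataset folder tree.
# Decimal units assumed: 1.4TB -> 1400 GB, 884MB -> 0.884 GB.
# Zip archives (Zip_files) need additional space and are NOT included here.
AUDIO_SIZE_GB: Dict[str, float] = {
    "AudioSet_SL": 64.0,
    "BBC_Sound_Effects": 142.0,
    "FreeSound": 1400.0,
    "SoundBible": 0.884,
}


def estimate_audio_gb(sources: Iterable[str]) -> float:
    """Sum the approximate audio size of each distinct source folder."""
    return sum(AUDIO_SIZE_GB[source] for source in set(sources))


def has_enough_space(
    path: Union[str, Path], sources: Iterable[str], margin: float = 1.2
) -> bool:
    """Compare free space at `path` (in GB) with the estimate times a safety margin."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= margin * estimate_audio_gb(sources)
```

For example, `has_enough_space("data", ["SoundBible"])` only requires about 1 GB free, while all four sources together add up to roughly 1.6 TB of audio before archives.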

Dataset folder tree
{root}
└── WavCaps
    ├── Audio
    │   ├── AudioSet_SL
    │   │    └── (108317 flac files, ~64GB)
    │   ├── BBC_Sound_Effects
    │   │    └── (31201 flac files, ~142GB)
    │   ├── FreeSound
    │   │    └── (262300 flac files, ~1.4TB)
    │   └── SoundBible
    │        └── (1232 flac files, ~884MB)
    ├── Zip_files
    │   ├── AudioSet_SL
    │   │    └── (8 zip files, ~76GB)
    │   ├── BBC_Sound_Effects
    │   │    └── (26 zip files, ~562GB)
    │   ├── FreeSound
    │   │    └── (123 zip? files, ~1.4TB)
    │   └── SoundBible
    │        └── (1 zip? files, ~624GB)
    ├── json_files
    │    ├── AudioSet_SL
    │    │    └── as_final.json
    │    ├── BBC_Sound_Effects
    │    │    └── bbc_final.json
    │    ├── FreeSound
    │    │    ├── fsd_final_2s.json
    │    │    └── fsd_final.json
    │    ├── SoundBible
    │    │    └── sb_final.json
    │    └── blacklist
    │         ├── blacklist_exclude_all_ac.json
    │         ├── blacklist_exclude_test_ac.json
    │         └── blacklist_exclude_ubs8k_esc50_vggsound.json
    ├── .gitattributes
    └── README.md
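Once audio has been extracted, the file counts in the tree give a cheap sanity check that an interrupted download or extraction did not leave files missing. A minimal sketch, assuming the layout shown above; `count_flac_files` and `incomplete_sources` are hypothetical helpers, not library functions, and the expected counts are the ones listed in the tree.

```python
from pathlib import Path
from typing import Dict, List, Tuple, Union

# Expected number of .flac files per source folder, copied from the folder tree.
EXPECTED_FLAC_COUNTS: Dict[str, int] = {
    "AudioSet_SL": 108317,
    "BBC_Sound_Effects": 31201,
    "FreeSound": 262300,
    "SoundBible": 1232,
}


def count_flac_files(root: Union[str, Path]) -> Dict[str, Tuple[int, int]]:
    """Return {source: (found, expected)} for each folder in {root}/WavCaps/Audio."""
    audio_dir = Path(root) / "WavCaps" / "Audio"
    report: Dict[str, Tuple[int, int]] = {}
    for source, expected in EXPECTED_FLAC_COUNTS.items():
        source_dir = audio_dir / source
        # Search recursively, since the tree does not show how files are nested;
        # a missing folder counts as zero files found.
        found = sum(1 for _ in source_dir.rglob("*.flac")) if source_dir.is_dir() else 0
        report[source] = (found, expected)
    return report


def incomplete_sources(root: Union[str, Path]) -> List[str]:
    """List the source folders that contain fewer .flac files than expected."""
    return [
        source
        for source, (found, expected) in count_flac_files(root).items()
        if found < expected
    ]
```

Only the sources you actually downloaded are meaningful in the report; the others will simply show zero files found.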
CARD : ClassVar[WavCapsCard] = <aac_datasets.datasets.functional.wavcaps.WavCapsCard object>
property download : bool
property root : str
property sr : int
property subset : 'audioset' | 'bbc' | 'freesound' | 'soundbible' | 'audioset_no_audiocaps_v1' | 'freesound_no_clotho_v2'
class WavCapsItem

Bases: TypedDict

audio : Tensor
author : str | None
captions : List[str]
dataset : str
description : str | None
duration : float
fname : str
href : str | None
id : str
index : int
source : Literal['AudioSet_SL', 'BBC_Sound_Effects', 'FreeSound', 'SoundBible']
sr : int
subset : Literal['audioset', 'bbc', 'freesound', 'soundbible', 'audioset_no_audiocaps_v1', 'freesound_no_clotho_v2']
tags : List[str]
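Each item is a WavCapsItem dict with the keys above, and the `transform` constructor argument is a `Callable[[WavCapsItem], Any]` applied to that dict. As a minimal sketch, a transform keeping only the waveform and the first caption could look like this; `first_caption_transform` is a hypothetical name, and it is typed with a plain `Dict[str, Any]` so that it runs without aac_datasets or torch installed.

```python
from typing import Any, Dict, List, Tuple


def first_caption_transform(item: Dict[str, Any]) -> Tuple[Any, str]:
    """Reduce a WavCapsItem-like dict to (audio, first caption)."""
    captions: List[str] = item["captions"]
    if len(captions) == 0:
        # Fail loudly instead of silently returning an empty caption.
        raise ValueError(f"Item {item['fname']!r} has no caption")
    return item["audio"], captions[0]
```

With the package installed, it could be passed as `WavCaps(root="data", subset="soundbible", transform=first_caption_transform)`; indexing the dataset would then presumably return the `(audio, caption)` tuple instead of the full dict.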