aac_datasets.datasets.wavcaps module

class WavCaps(
root: str | Path | None = None,
subset: str = 'audioset_no_audiocaps',
download: bool = False,
transform: Callable[[WavCapsItem], Any] | None = None,
verbose: int = 0,
force_download: bool = False,
verify_files: bool = False,
*,
clean_archives: bool = False,
hf_cache_dir: str | None = None,
repo_id: str | None = None,
revision: str | None = '85a0c21e26fa7696a5a74ce54fada99a9b43c6de',
zip_path: str | Path | None = None,
)

Bases: AACDataset[WavCapsItem]

Unofficial WavCaps PyTorch dataset.

WavCaps Paper: https://arxiv.org/pdf/2303.17395.pdf
HuggingFace source: https://huggingface.co/datasets/cvssp/WavCaps

This dataset contains 4 training subsets, extracted from different sources, and 2 filtered variants of them:
- AudioSet strongly labeled (“audioset”)
- BBC Sound Effects (“bbc”)
- FreeSound (“freesound”)
- SoundBible (“soundbible”)
- AudioSet strongly labeled without AudioCaps (“audioset_no_audiocaps”)
- FreeSound without Clotho (“freesound_no_clotho”)
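As a minimal sketch of how the subset names above feed the constructor, the snippet below validates a subset name and builds keyword arguments matching the signature documented on this page. `SUBSETS`, `make_wavcaps_kwargs`, and `load_wavcaps` are hypothetical helper names, not part of aac_datasets; the import is deferred so that the validation helper can be used even when the package is not installed.

```python
from pathlib import Path
from typing import Any, Dict, Union

# Subset names as listed in the class description above.
SUBSETS = (
    "audioset",
    "bbc",
    "freesound",
    "soundbible",
    "audioset_no_audiocaps",
    "freesound_no_clotho",
)


def make_wavcaps_kwargs(
    root: Union[str, Path],
    subset: str = "audioset_no_audiocaps",
    download: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for WavCaps, rejecting unknown subset names early."""
    if subset not in SUBSETS:
        raise ValueError(f"Unknown subset {subset!r}; expected one of {SUBSETS}")
    return {"root": str(root), "subset": subset, "download": download}


def load_wavcaps(root: Union[str, Path], subset: str = "audioset_no_audiocaps"):
    """Instantiate the dataset from files already present under `root`."""
    # Deferred import: this line requires the aac_datasets package.
    from aac_datasets.datasets.wavcaps import WavCaps

    return WavCaps(**make_wavcaps_kwargs(root, subset))
```

Checking the subset name before constructing the dataset keeps a typo from surfacing only after a long download or file scan has started.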

Warning

WavCaps download is experimental: it requires a lot of disk space and can take a very long time to download and extract, so errors may occur.

Dataset folder tree
{root}
└── WavCaps
    ├── Audio
    │   ├── AudioSet_SL
    │   │    └── (108317 flac files, ~64GB)
    │   ├── BBC_Sound_Effects
    │   │    └── (31201 flac files, ~142GB)
    │   ├── FreeSound
    │   │    └── (262300 flac files, ~1.4TB)
    │   └── SoundBible
    │        └── (1232 flac files, ~884MB)
    ├── Zip_files
    │   ├── AudioSet_SL
    │   │    └── (8 zip files, ~76GB)
    │   ├── BBC_Sound_Effects
    │   │    └── (26 zip files, ~562GB)
    │   ├── FreeSound
    │   │    └── (123 zip? files, ~1.4TB)
    │   └── SoundBible
    │        └── (1 zip? files, ~624GB)
    ├── json_files
    │    ├── AudioSet_SL
    │    │    └── as_final.json
    │    ├── BBC_Sound_Effects
    │    │    └── bbc_final.json
    │    ├── FreeSound
    │    │    ├── fsd_final_2s.json
    │    │    └── fsd_final.json
    │    ├── SoundBible
    │    │    └── sb_final.json
    │    └── blacklist
    │         ├── blacklist_exclude_all_ac.json
    │         ├── blacklist_exclude_test_ac.json
    │         └── blacklist_exclude_ubs8k_esc50_vggsound.json
    ├── .gitattributes
    └── README.md
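The layout above can be checked before instantiating the dataset. The following is a rough sketch that assumes the flac files sit directly inside each `Audio/<source>` folder, as the tree suggests; `AUDIO_DIRS`, `find_audio_dirs`, and `count_flac_files` are hypothetical names introduced for illustration and are not part of aac_datasets.

```python
from pathlib import Path
from typing import Dict, Union

# Audio sub-folders as shown in the folder tree above.
AUDIO_DIRS = ("AudioSet_SL", "BBC_Sound_Effects", "FreeSound", "SoundBible")


def find_audio_dirs(root: Union[str, Path]) -> Dict[str, bool]:
    """Report which {root}/WavCaps/Audio/<source> folders exist."""
    audio_root = Path(root) / "WavCaps" / "Audio"
    return {name: (audio_root / name).is_dir() for name in AUDIO_DIRS}


def count_flac_files(root: Union[str, Path], source: str) -> int:
    """Count *.flac files directly inside one source folder (0 if it is missing)."""
    folder = Path(root) / "WavCaps" / "Audio" / source
    if not folder.is_dir():
        return 0
    return sum(1 for _ in folder.glob("*.flac"))
```

Comparing the counts against the figures in the tree (for example 1232 files for SoundBible) gives a quick indication of whether extraction completed.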
CARD: ClassVar[WavCapsCard] = <aac_datasets.datasets.functional.wavcaps.WavCapsCard object>
property download: bool
property root: str
property sr: int
property subset: str
class WavCapsItem

Bases: TypedDict

audio: Tensor
author: str | None
captions: List[str]
dataset: str
description: str | None
duration: float
fname: str
href: str | None
id: str
index: int
source: str
sr: int
subset: str
tags: List[str]
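Since `transform` is documented as a callable taking a `WavCapsItem`, a transform can reduce each item to the fields a training loop needs. The function below is an illustrative sketch only: `keep_audio_and_first_caption` is a hypothetical name, and it relies solely on the field names listed above.

```python
from typing import Any, Dict, Mapping


def keep_audio_and_first_caption(item: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only the audio, sample rate, file name, and first caption of an item."""
    captions = item["captions"]
    return {
        "audio": item["audio"],
        "sr": item["sr"],
        "fname": item["fname"],
        # Fall back to an empty string if an item carries no caption.
        "caption": captions[0] if captions else "",
    }
```

Such a function could be passed as `transform=keep_audio_and_first_caption` when constructing `WavCaps`, so that indexing the dataset returns the reduced dictionary instead of the full item.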