aac_datasets.datasets.wavcaps module

class WavCaps(root: str | Path | None = None, subset: str = 'audioset_no_audiocaps', download: bool = False, transform: Callable[[WavCapsItem], Any] | None = None, verbose: int = 0, force_download: bool = False, verify_files: bool = False, *, clean_archives: bool = False, hf_cache_dir: str | None = None, repo_id: str | None = None, revision: str | None = '85a0c21e26fa7696a5a74ce54fada99a9b43c6de', zip_path: str | Path | None = None)

Bases: AACDataset[WavCapsItem]

Unofficial WavCaps PyTorch dataset.

WavCaps Paper: https://arxiv.org/pdf/2303.17395.pdf
HuggingFace source: https://huggingface.co/datasets/cvssp/WavCaps

This dataset contains 4 training subsets extracted from different sources, plus 2 variants that exclude data overlapping with AudioCaps or Clotho. The subset names accepted by the subset argument are:
- AudioSet strongly labeled (“audioset”)
- BBC Sound Effects (“bbc”)
- FreeSound (“freesound”)
- SoundBible (“soundbible”)
- AudioSet strongly labeled without AudioCaps (“audioset_no_audiocaps”)
- FreeSound without Clotho (“freesound_no_clotho”)
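As a minimal sketch of how the subset names above might be used, the helper below checks the name before constructing the dataset. The helper open_wavcaps and the constant WAVCAPS_SUBSETS are hypothetical names introduced for illustration; only the WavCaps class, its module path, and the root, subset and download arguments come from this page. It assumes the files are already present under root.

```python
from typing import Any

# Subset names as listed in the class documentation.
WAVCAPS_SUBSETS = (
    "audioset",
    "bbc",
    "freesound",
    "soundbible",
    "audioset_no_audiocaps",
    "freesound_no_clotho",
)


def open_wavcaps(root: str, subset: str = "audioset_no_audiocaps") -> Any:
    """Hypothetical helper: validate the subset name, then build the dataset."""
    if subset not in WAVCAPS_SUBSETS:
        raise ValueError(
            f"Unknown subset {subset!r}; expected one of {WAVCAPS_SUBSETS}"
        )
    # Imported lazily so the name check works even without the package installed.
    from aac_datasets.datasets.wavcaps import WavCaps

    # download=False: assumes the dataset files already exist under `root`.
    return WavCaps(root=root, subset=subset, download=False)
```

Checking the name first avoids starting a long download or file scan with a mistyped subset.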

Warning

WavCaps download is experimental: it requires a lot of disk space and can take a very long time to download and extract, so errors may occur.

Dataset folder tree

{root}
└── WavCaps
    ├── Audio
    │   ├── AudioSet_SL
    │   │    └── (108317 flac files, ~64GB)
    │   ├── BBC_Sound_Effects
    │   │    └── (31201 flac files, ~142GB)
    │   ├── FreeSound
    │   │    └── (262300 flac files, ~1.4TB)
    │   └── SoundBible
    │        └── (1232 flac files, ~884MB)
    ├── Zip_files
    │   ├── AudioSet_SL
    │   │    └── (8 zip files, ~76GB)
    │   ├── BBC_Sound_Effects
    │   │    └── (26 zip files, ~562GB)
    │   ├── FreeSound
    │   │    └── (123 zip? files, ~1.4TB)
    │   └── SoundBible
    │        └── (1 zip? files, ~624GB)
    ├── json_files
    │    ├── AudioSet_SL
    │    │    └── as_final.json
    │    ├── BBC_Sound_Effects
    │    │    └── bbc_final.json
    │    ├── FreeSound
    │    │    ├── fsd_final_2s.json
    │    │    └── fsd_final.json
    │    ├── SoundBible
    │    │    └── sb_final.json
    │    └── blacklist
    │         ├── blacklist_exclude_all_ac.json
    │         ├── blacklist_exclude_test_ac.json
    │         └── blacklist_exclude_ubs8k_esc50_vggsound.json
    ├── .gitattributes
    └── README.md
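The layout above can be checked after extraction with a short script. This is a sketch only: count_flac_files and AUDIO_DIRS are hypothetical names, and the code assumes the flac files sit directly inside each source folder, as the tree suggests.

```python
from pathlib import Path
from typing import Dict

# Audio folder names taken from the folder tree.
AUDIO_DIRS = ("AudioSet_SL", "BBC_Sound_Effects", "FreeSound", "SoundBible")


def count_flac_files(root: str) -> Dict[str, int]:
    """Hypothetical helper: count .flac files per folder in {root}/WavCaps/Audio."""
    audio_root = Path(root) / "WavCaps" / "Audio"
    counts: Dict[str, int] = {}
    for name in AUDIO_DIRS:
        folder = audio_root / name
        # A missing folder is reported as 0 instead of raising an error.
        if folder.is_dir():
            counts[name] = sum(1 for _ in folder.glob("*.flac"))
        else:
            counts[name] = 0
    return counts
```

Comparing the returned counts with the file counts given in the tree is one way to spot an incomplete extraction.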

CARD: ClassVar[WavCapsCard] = <aac_datasets.datasets.functional.wavcaps.WavCapsCard object>

property download: bool

property root: str

property sr: int

property subset: str

class WavCapsItem

Bases: TypedDict

audio: Tensor

author: str | None

captions: List[str]

dataset: str

description: str | None

download_link: str | None

duration: float

fname: str

href: str | None

id: str

index: int

source: str

sr: int

subset: str

tags: List[str]
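Since the transform argument of WavCaps is typed as Callable[[WavCapsItem], Any], a transform can be written as a plain function over the item fields listed above. The function below is a hypothetical example, not part of the library: it keeps the audio, sample rate and file name, and reduces the captions list to its first entry.

```python
from typing import Any, Dict, Mapping


def keep_first_caption(item: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical transform: keep audio, sr, fname and a single caption."""
    captions = item["captions"]
    return {
        "audio": item["audio"],
        "sr": item["sr"],
        "fname": item["fname"],
        # Fall back to an empty string if an item has no caption.
        "caption": captions[0] if captions else "",
    }
```

It would presumably be passed as WavCaps(..., transform=keep_first_caption); the function accepts any mapping with the same keys, so it can be tried on a plain dict without loading audio.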