aac_datasets.datasets.functional.wavcaps module
- class WavCapsCard
Bases: DatasetCard
- CAPTIONS_PER_AUDIO: Dict[str, int] = {'audioset': 1, 'audioset_no_audiocaps': 1, 'bbc': 1, 'freesound': 1, 'freesound_no_clotho': 1, 'freesound_no_clotho_v2': 1, 'soundbible': 1}
- CITATION: str = '\n @article{mei2023WavCaps,\n title = {Wav{C}aps: A {ChatGPT}-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research},\n author = {Xinhao Mei and Chutong Meng and Haohe Liu and Qiuqiang Kong and Tom Ko and Chengqi Zhao and Mark D. Plumbley and Yuexian Zou and Wenwu Wang},\n year = 2023,\n journal = {arXiv preprint arXiv:2303.17395},\n url = {https://arxiv.org/pdf/2303.17395.pdf}\n }\n '
- DESCRIPTION: str = 'WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research.'
- EXPECTED_SIZES: Dict[str, int] = {'AudioSet_SL': 108317, 'BBC_Sound_Effects': 31201, 'FreeSound': 262300, 'SoundBible': 1320}
- HOMEPAGE = 'https://huggingface.co/datasets/cvssp/WavCaps'
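As a quick sanity check on the card constants, the sketch below sums the expected sizes listed in EXPECTED_SIZES. The dictionary is copied from the values shown above rather than imported, so the snippet runs without aac_datasets installed; the helper names `total_expected_items` and `largest_source` are hypothetical and not part of the library.

```python
from typing import Dict

# Values copied from WavCapsCard.EXPECTED_SIZES as listed above.
EXPECTED_SIZES: Dict[str, int] = {
    "AudioSet_SL": 108317,
    "BBC_Sound_Effects": 31201,
    "FreeSound": 262300,
    "SoundBible": 1320,
}


def total_expected_items(sizes: Dict[str, int]) -> int:
    """Return the number of items summed over all sources."""
    return sum(sizes.values())


def largest_source(sizes: Dict[str, int]) -> str:
    """Return the name of the source with the most items."""
    return max(sizes, key=lambda name: sizes[name])
```

With the values above, `total_expected_items(EXPECTED_SIZES)` gives 403138 and `largest_source(EXPECTED_SIZES)` gives 'FreeSound'.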
- download_wavcaps_dataset(
- root: str | Path | None = None,
- subset: str = 'audioset_no_audiocaps',
- force: bool = False,
- verbose: int = 0,
- verify_files: bool = False,
- clean_archives: bool = False,
- hf_cache_dir: str | None = None,
- repo_id: str | None = None,
- revision: str | None = None,
- zip_path: str | Path | None = None,
Prepare WavCaps data.
- Parameters:
root – Dataset root directory. defaults to “.”.
subset – The subset of WavCaps to use. Can be one of SUBSETS. defaults to “audioset_no_audiocaps”.
force – If True, download all files again. defaults to False.
verbose – Verbose level. defaults to 0.
verify_files – If True, check that all already-downloaded files are valid. defaults to False.
clean_archives – If True, remove the compressed archives from disk to save space. defaults to False.
hf_cache_dir – Optional override for HuggingFace cache directory path. defaults to None.
repo_id – Repository ID on HuggingFace. defaults to “cvssp/WavCaps”.
revision – Optional override for revision commit/name for HuggingFace repository. defaults to None.
zip_path – Path to the zip executable in the shell. defaults to “zip”.
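The parameters above can be collected into keyword arguments before calling the function. The following is a minimal sketch, not part of the library: `build_download_kwargs` and `KNOWN_SUBSETS` are hypothetical names, and the subset list is assumed to match the keys of WavCapsCard.CAPTIONS_PER_AUDIO, which may differ from the actual SUBSETS constant.

```python
from pathlib import Path
from typing import Any, Dict, Union

# Assumption: valid subset names are the keys of WavCapsCard.CAPTIONS_PER_AUDIO.
KNOWN_SUBSETS = (
    "audioset",
    "audioset_no_audiocaps",
    "bbc",
    "freesound",
    "freesound_no_clotho",
    "freesound_no_clotho_v2",
    "soundbible",
)


def build_download_kwargs(
    root: Union[str, Path],
    subset: str = "audioset_no_audiocaps",
    verify_files: bool = True,
) -> Dict[str, Any]:
    """Validate the subset name and return keyword arguments for the download call."""
    if subset not in KNOWN_SUBSETS:
        raise ValueError(f"Unknown subset {subset!r}; expected one of {KNOWN_SUBSETS}.")
    return {
        "root": Path(root),
        "subset": subset,
        "force": False,  # keep files that are already on disk
        "verify_files": verify_files,
        "verbose": 1,
    }


# Usage, assuming aac_datasets is installed and the HuggingFace repository is reachable:
# from aac_datasets.datasets.functional.wavcaps import download_wavcaps_dataset
# download_wavcaps_dataset(**build_download_kwargs("data", subset="bbc"))
```

Validating the subset name first means a typo fails immediately instead of after a download has started.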
- download_wavcaps_datasets(
- root: str | Path | None = None,
- subsets: str | Iterable[str] = 'audioset_no_audiocaps',
- force: bool = False,
- verbose: int = 0,
- clean_archives: bool = False,
- hf_cache_dir: str | None = None,
- repo_id: str | None = None,
- revision: str | None = None,
- verify_files: bool = False,
- zip_path: str | Path | None = None,
Helper function to download a list of subsets. See download_wavcaps_dataset() for details.
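Because `subsets` accepts either a single name or an iterable of names, a helper of this kind has to normalize its input before looping. The sketch below illustrates one way this could work; it is not the library's implementation. `normalize_subsets` and `download_many` are hypothetical names, and the per-subset download function is passed in as an argument so the snippet runs without network access.

```python
from typing import Any, Callable, Iterable, List, Union


def normalize_subsets(subsets: Union[str, Iterable[str]]) -> List[str]:
    """Turn a single subset name or an iterable of names into a list."""
    if isinstance(subsets, str):
        # A bare string is one subset, not an iterable of characters.
        return [subsets]
    return list(subsets)


def download_many(
    subsets: Union[str, Iterable[str]],
    download_one: Callable[..., Any],
    **kwargs: Any,
) -> List[str]:
    """Call download_one once per subset, forwarding the shared keyword arguments."""
    names = normalize_subsets(subsets)
    for name in names:
        download_one(subset=name, **kwargs)
    return names
```

In practice `download_one` would be download_wavcaps_dataset, with the shared options (root, force, verbose and so on) passed as keyword arguments.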
- load_wavcaps_dataset(
- root: str | Path | None = None,
- subset: str = 'audioset_no_audiocaps',
- verbose: int = 0,
- hf_cache_dir: str | None = None,
- revision: str | None = None,
Load WavCaps metadata.
- Parameters:
root – Dataset root directory. defaults to “.”.
subset – The subset of WavCaps to use. Can be one of SUBSETS. defaults to “audioset_no_audiocaps”.
verbose – Verbose level. defaults to 0.
hf_cache_dir – Optional override for HuggingFace cache directory path. defaults to None.
revision – Optional override for revision commit/name for HuggingFace repository. defaults to None.
- Returns:
A dictionary of lists, with one list per metadata field.
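A dictionary of lists stores the metadata column by column, so entry i of every list describes the same audio item. The sketch below converts that layout into one record per item. `columns_to_records` is a hypothetical helper, and the column names in `example` ("fname", "captions") are illustrative stand-ins, not necessarily the keys returned by load_wavcaps_dataset().

```python
from typing import Any, Dict, List


def columns_to_records(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert a dict of equal-length lists into a list of per-item dicts."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("All metadata lists must have the same length.")
    n_items = lengths.pop() if lengths else 0
    return [
        {key: values[i] for key, values in columns.items()}
        for i in range(n_items)
    ]


# Hypothetical stand-in for the returned metadata; real column names may differ.
example = {
    "fname": ["a.flac", "b.flac"],
    "captions": [["A dog barks."], ["Rain falls on a roof."]],
}
records = columns_to_records(example)
```

Here `records[1]` is the dict holding the file name and captions of the second item.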