aac_datasets.datasets.audiocaps module
- class AudioCaps(
    root: str | Path | None = None,
    subset: 'train' | 'val' | 'test' | 'train_fixed' = 'train',
    download: bool = False,
    transform: Callable[[AudioCapsItem], Any] | None = None,
    verbose: int = 0,
    force_download: bool = False,
    verify_files: bool = False,
    *,
    audio_duration: float = 10.0,
    audio_format: str = 'flac',
    audio_n_channels: int = 1,
    download_audio: bool = True,
    exclude_removed_audio: bool = True,
    ffmpeg_path: str | Path | None = None,
    flat_captions: bool = False,
    max_workers: int | None = 1,
    sr: int = 32000,
    with_tags: bool = False,
    ytdlp_path: str | Path | None = None,
    ytdlp_opts: Iterable[str] = (),
    version: 'v1' | 'v2' = 'v1',
    num_dl_attempts: int = 2,
  )
Bases: AACDataset[AudioCapsItem]
Unofficial AudioCaps PyTorch dataset.
Subsets available are ‘train’, ‘val’ and ‘test’; the signature also accepts ‘train_fixed’.
Audio is a waveform tensor of shape (1, n_times), at most 10 seconds long and sampled at 32 kHz by default. The target is a list of strings containing the captions. The ‘train’ subset has only 1 caption per sample, while ‘val’ and ‘test’ have 5 captions each. Downloading from YouTube requires the ‘yt-dlp’ and ‘ffmpeg’ commands.
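As a rough illustration of the sizes described above, the following sketch computes the upper bound on the waveform shape from the default `audio_duration`, `sr` and `audio_n_channels` values in the signature, and the documented number of captions per subset. The helper names `max_waveform_shape` and `expected_num_captions` are hypothetical and are not part of aac_datasets.

```python
from __future__ import annotations


def max_waveform_shape(
    audio_duration: float = 10.0,
    sr: int = 32000,
    audio_n_channels: int = 1,
) -> tuple[int, int]:
    """Upper bound on the (n_channels, n_times) shape of one audio item.

    Hypothetical helper: clips can be shorter than audio_duration,
    so n_times is a maximum, not a guarantee.
    """
    return (audio_n_channels, int(round(audio_duration * sr)))


def expected_num_captions(subset: str) -> int:
    """Number of captions per sample for the documented subsets.

    Only 'train', 'val' and 'test' are described in the docstring;
    other subsets (e.g. 'train_fixed') are deliberately not guessed.
    """
    documented = {"train": 1, "val": 5, "test": 5}
    if subset not in documented:
        raise ValueError(f"caption count not documented for subset {subset!r}")
    return documented[subset]


print(max_waveform_shape())          # (1, 320000)
print(expected_num_captions("val"))  # 5
```

With the defaults, one item holds at most 10.0 × 32000 = 320000 samples on a single channel.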
- /!\ YouTube can sometimes block your IP address when downloading audio, with the error:
Sign in to confirm you’re not a bot. Use --cookies-from-browser or --cookies for the authentication. See https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp for how to manually pass cookies. Also see https://github.com/yt-dlp/yt-dlp/wiki/Extractors#exporting-youtube-cookies for tips on effectively exporting YouTube cookies.
You can pass yt-dlp arguments with the ytdlp_opts argument, e.g. AudioCaps(ytdlp_opts=["--cookies-from-browser", "firefox"]).
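The cookie workaround above could be wired up roughly as follows. This is a minimal sketch: `cookie_opts`, `audiocaps_kwargs` and `build_dataset` are hypothetical helpers, the `root` value and the choice of the ‘val’ subset are placeholders, and only keyword arguments that appear in the signature above are used. The import inside `build_dataset` assumes the aac_datasets package is installed; the function is not called here because it would start a download.

```python
from __future__ import annotations

from typing import Any


def cookie_opts(browser: str = "firefox") -> list[str]:
    """yt-dlp arguments that reuse cookies from a local browser profile."""
    return ["--cookies-from-browser", browser]


def audiocaps_kwargs(root: str, browser: str = "firefox") -> dict[str, Any]:
    """Keyword arguments for AudioCaps; every key is taken from the signature."""
    return {
        "root": root,
        "subset": "val",          # placeholder choice for the example
        "download": True,
        "ytdlp_opts": cookie_opts(browser),
        "num_dl_attempts": 2,     # same as the documented default
    }


def build_dataset(root: str, browser: str = "firefox"):
    """Instantiate the dataset (requires aac_datasets, yt-dlp and ffmpeg)."""
    # Imported lazily so the helpers above work without the package installed.
    from aac_datasets.datasets.audiocaps import AudioCaps

    return AudioCaps(**audiocaps_kwargs(root, browser))


print(audiocaps_kwargs("./data")["ytdlp_opts"])
```

Keeping the options in a small helper makes it easy to switch browsers, or to swap in `["--cookies", "<path>"]` with an exported cookie file instead.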
See also: AudioCaps paper: https://www.aclweb.org/anthology/N19-1011.pdf
Dataset folder tree (for version v1)
{root}
└── AUDIOCAPS
    ├── csv_files_v1
    │   ├── train.csv
    │   ├── val.csv
    │   └── test.csv
    └── audio_32000Hz
        ├── train
        │   └── (46231/49838 flac files, ~42G for 32kHz)
        ├── val
        │   └── (465/495 flac files, ~425M for 32kHz)
        └── test
            └── (913/975 flac files, ~832M for 32kHz)
- CARD : ClassVar[AudioCapsCard] = <aac_datasets.datasets.functional.audiocaps.AudioCapsCard object>
- property subset : 'train' | 'val' | 'test' | 'train_fixed'
- property version : 'v1' | 'v2'
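The v1 folder tree documented for this class can be mirrored with `pathlib` to locate the files for one subset and count the audio already on disk. This is only a sketch: `audiocaps_v1_paths` and `count_flac` are hypothetical helpers, and the `audio_{sr}Hz` folder pattern is an assumption generalized from the single `audio_32000Hz` entry in the tree. The layout for other versions, and the CSV name for ‘train_fixed’, are not documented here and are not handled.

```python
from __future__ import annotations

from pathlib import Path


def audiocaps_v1_paths(root: str | Path, subset: str, sr: int = 32000) -> dict[str, Path]:
    """Expected CSV file and audio folder for one subset of the v1 layout."""
    if subset not in ("train", "val", "test"):
        raise ValueError(f"layout not documented for subset {subset!r}")
    base = Path(root) / "AUDIOCAPS"
    return {
        "csv": base / "csv_files_v1" / f"{subset}.csv",
        # Assumption: the folder name follows the sample rate, as in audio_32000Hz.
        "audio_dir": base / f"audio_{sr}Hz" / subset,
    }


def count_flac(audio_dir: Path) -> int:
    """Number of .flac files directly inside audio_dir (0 if it does not exist)."""
    if not audio_dir.is_dir():
        return 0
    return sum(1 for _ in audio_dir.glob("*.flac"))


paths = audiocaps_v1_paths("data", "val")
print(paths["csv"])
print(count_flac(paths["audio_dir"]))
```

Comparing `count_flac` against the per-subset file counts listed in the tree gives a quick check of how much of the audio was actually retrieved.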