aac_datasets.datasets.clotho module
class Clotho(
    root: str | Path | None = None,
    subset: str = 'dev',
    download: bool = False,
    transform: Callable[[ClothoItem], Any] | None = None,
    verbose: int = 0,
    force_download: bool = False,
    verify_files: bool = False,
    *,
    clean_archives: bool = True,
    flat_captions: bool = False,
    version: str = 'v2.1',
)
Bases: AACDataset[ClothoItem]

Unofficial Clotho PyTorch dataset.
Subsets available are 'dev', 'val', 'eval', 'dcase_aac_test', 'dcase_aac_analysis', 'dcase_t2a_audio' and 'dcase_t2a_captions'.
Audio clips are waveforms of 15 to 30 seconds, sampled at 44100 Hz. The target is a list of 5 different caption strings describing each audio sample, and each caption has at most 20 words.
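Each clip carries 5 reference captions, and at 44100 Hz the stated 15 to 30 second range corresponds to about 661,500 to 1,323,000 samples, so clips in a batch generally differ in length. The sketch below shows one way to handle both points: a callable suitable for the `transform` parameter that keeps one randomly chosen caption, and a zero-padding helper for batching. It assumes items behave like dicts with `"audio"` and `"captions"` keys (check the `ClothoItem` definition for the actual fields); `pick_random_caption`, `num_samples` and `pad_batch` are hypothetical helpers, not part of aac_datasets, and plain lists stand in for tensors.

```python
import random
from typing import Any, Dict, List, Optional, Sequence, Tuple

SAMPLE_RATE = 44_100  # Hz, the sampling rate stated for Clotho audio


def num_samples(duration_s: float, sr: int = SAMPLE_RATE) -> int:
    """Number of samples in a clip lasting `duration_s` seconds at rate `sr`."""
    return round(duration_s * sr)


def pick_random_caption(
    item: Dict[str, Any], rng: Optional[random.Random] = None
) -> Tuple[Any, str]:
    """Hypothetical transform: keep the audio and one of its reference captions.

    The "audio" and "captions" key names are an assumption about ClothoItem.
    """
    captions: List[str] = item["captions"]
    if not captions:
        raise ValueError("item has no captions to choose from")
    # Use the caller's RNG for reproducibility, else the module-level one.
    choose = rng.choice if rng is not None else random.choice
    return item["audio"], choose(captions)


def pad_batch(
    waves: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Zero-pad every waveform to the longest one in the batch.

    Also returns the original lengths so padded samples can be masked later.
    """
    lengths = [len(w) for w in waves]
    max_len = max(lengths, default=0)
    padded = [list(w) + [0.0] * (max_len - len(w)) for w in waves]
    return padded, lengths


print(num_samples(15), num_samples(30))  # 661500 1323000
```

With the dataset, the transform would be passed as `Clotho(root, subset='dev', transform=pick_random_caption)`, and a tensor version of `pad_batch` would typically sit in a DataLoader `collate_fn`. Returning the original lengths keeps padding reversible, which matters when clips can differ by up to a factor of two in length.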
Clotho V1 paper: https://arxiv.org/pdf/1910.09387.pdf
Dataset folder tree for version 'v2.1':

    {root}
    └── CLOTHO_v2.1
        ├── archives
        │   └── (5 7z files, ~8.9GB)
        ├── clotho_audio_files
        │   ├── clotho_analysis
        │   │   └── (8360 wav files, ~19GB)
        │   ├── development
        │   │   └── (3839 wav files, ~7.1GB)
        │   ├── evaluation
        │   │   └── (1045 wav files, ~2.0GB)
        │   ├── test
        │   │   └── (1043 wav files, ~2.0GB)
        │   ├── test_retrieval_audio
        │   │   └── (1000 wav files, ~2.0GB)
        │   └── validation
        │       └── (1045 wav files, ~2.0GB)
        └── clotho_csv_files
            ├── clotho_captions_development.csv
            ├── clotho_captions_evaluation.csv
            ├── clotho_captions_validation.csv
            ├── clotho_metadata_development.csv
            ├── clotho_metadata_evaluation.csv
            ├── clotho_metadata_test.csv
            ├── clotho_metadata_validation.csv
            ├── retrieval_audio_metadata.csv
            └── retrieval_captions.csv
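The folder tree also lists the expected number of wav files per audio folder, which can serve as a quick sanity check after a manual or interrupted download. A minimal sketch, not the library's `verify_files` logic: `find_count_mismatches` is a hypothetical helper, and the counts and the `CLOTHO_v2.1/clotho_audio_files` path are copied from the v2.1 tree, so other versions would need different values.

```python
from pathlib import Path
from typing import Dict, Tuple, Union

# Expected .wav counts per audio folder, taken from the v2.1 folder tree.
EXPECTED_WAV_COUNTS: Dict[str, int] = {
    "clotho_analysis": 8360,
    "development": 3839,
    "evaluation": 1045,
    "test": 1043,
    "test_retrieval_audio": 1000,
    "validation": 1045,
}


def find_count_mismatches(root: Union[str, Path]) -> Dict[str, Tuple[int, int]]:
    """Return {folder: (found, expected)} for audio folders whose count differs.

    Missing folders are reported with found == 0. Only the top level of each
    folder is scanned (no recursion).
    """
    audio_dir = Path(root) / "CLOTHO_v2.1" / "clotho_audio_files"
    mismatches: Dict[str, Tuple[int, int]] = {}
    for folder, expected in EXPECTED_WAV_COUNTS.items():
        subdir = audio_dir / folder
        found = sum(1 for _ in subdir.glob("*.wav")) if subdir.is_dir() else 0
        if found != expected:
            mismatches[folder] = (found, expected)
    return mismatches
```

An empty result means every audio folder holds the listed number of files; it says nothing about file contents, which a hash-based check would be needed for.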
CARD: ClassVar[ClothoCard] = <aac_datasets.datasets.functional.clotho.ClothoCard object>