aac_datasets.datasets.functional.macs module

class MACSCard[source]

Bases: DatasetCard

ANNOTATIONS_CREATORS: Tuple[str, ...] = ('crowdsourced',)
CITATION: str = '\n    @inproceedings{Martin2021b,\n        title        = {Diversity and Bias in Audio Captioning Datasets},\n        author       = {Martin, Irene and Mesaros, Annamaria},\n        year         = 2021,\n        month        = {November},\n        booktitle    = {Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)},\n        address      = {Barcelona, Spain},\n        pages        = {90--94},\n        isbn         = {978-84-09-36072-7},\n        url          = {https://dcase.community/documents/workshop2021/proceedings/DCASE2021Workshop_Martin_34.pdf},\n        abstract     = {Describing soundscapes in sentences allows better understanding of the acoustic scene than a single label indicating the acoustic scene class or a set of audio tags indicating the sound events active in the audio clip. In addition, the richness of natural language allows a range of possible descriptions for the same acoustic scene. In this work, we address the diversity obtained when collecting descriptions of soundscapes using crowdsourcing. We study how much the collection of audio captions can be guided by the instructions given in the annotation task, by analysing the possible bias introduced by auxiliary information provided in the annotation process. Our study shows that even when given hints on the audio content, different annotators describe the same soundscape using different vocabulary. In automatic captioning, hints provided as audio tags represent grounding textual information that facilitates guiding the captioning output towards specific concepts. We also release a new dataset of audio captions and audio tags produced by multiple annotators for a subset of the TAU Urban Acoustic Scenes 2018 dataset, suitable for studying guided captioning.},\n        doi          = {10.5281/zenodo.5770113}\n    }\n    '
DEFAULT_SUBSET: str = 'full'
DESCRIPTION: str = 'Multi-Annotator Captioned Soundscapes dataset.'
HOMEPAGE: str = 'https://zenodo.org/record/5114771'
LANGUAGE: Tuple[str, ...] = ('en',)
LANGUAGE_DETAILS: Tuple[str, ...] = ('en-US',)
MAX_CAPTIONS_PER_AUDIO: Dict[str, int] = {'full': 5}
MIN_CAPTIONS_PER_AUDIO: Dict[str, int] = {'full': 2}
NAME: str = 'macs'
N_CHANNELS: int = 2
PRETTY_NAME: str = 'MACS'
SAMPLE_RATE: int = 48000
SIZE_CATEGORIES: Tuple[str, ...] = ('1K<n<10K',)
SUBSETS: Tuple[str, ...] = ('full',)
TASK_CATEGORIES: Tuple[str, ...] = ('audio-to-text', 'text-to-audio')
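For orientation, the card's declared bounds can be used to sanity-check loaded data. A minimal sketch, assuming the constants below (copied verbatim from the attributes above); in real code they would be read from `MACSCard` directly:

```python
# Constants mirroring MACSCard.MIN_CAPTIONS_PER_AUDIO / MAX_CAPTIONS_PER_AUDIO above.
MIN_CAPTIONS_PER_AUDIO = {"full": 2}
MAX_CAPTIONS_PER_AUDIO = {"full": 5}

def captions_count_is_valid(subset: str, n_captions: int) -> bool:
    """Return True if n_captions lies within the bounds declared for the subset."""
    lo = MIN_CAPTIONS_PER_AUDIO[subset]
    hi = MAX_CAPTIONS_PER_AUDIO[subset]
    return lo <= n_captions <= hi
```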
download_macs_dataset(
root: str | Path | None = None,
subset: str = 'full',
force: bool = False,
verbose: int = 0,
verify_files: bool = True,
clean_archives: bool = True,
) → None[source]

Prepare MACS data.

Parameters:
  • root – Dataset root directory. Defaults to “.”.

  • subset – The subset of MACS to use. Can be one of SUBSETS. Defaults to “full”.

  • force – If True, re-download all files even if they already exist. Defaults to False.

  • verbose – Verbosity level. Defaults to 0.

  • verify_files – If True, check that all previously downloaded files are valid. Defaults to True.

  • clean_archives – If True, remove the compressed archives from disk to save space. Defaults to True.
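A usage sketch for this function. The keyword names below match the signature documented above; the actual call is guarded behind `__main__` so the snippet can be read (and its helper tested) without triggering a download:

```python
from __future__ import annotations

from pathlib import Path

def make_download_kwargs(root: str | Path = ".", subset: str = "full") -> dict:
    """Build keyword arguments matching the download_macs_dataset signature."""
    return {
        "root": root,
        "subset": subset,
        "force": False,          # do not re-download files that already exist
        "verbose": 1,            # print download progress
        "verify_files": True,    # validate previously downloaded files
        "clean_archives": True,  # remove archives after extraction to save space
    }

if __name__ == "__main__":
    # Assumes aac_datasets is installed; this call actually downloads MACS.
    from aac_datasets.datasets.functional.macs import download_macs_dataset

    download_macs_dataset(**make_download_kwargs(root="./data"))
```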

download_macs_datasets(
root: str | Path | None = None,
subsets: str | Iterable[str] = 'full',
force: bool = False,
verbose: int = 0,
clean_archives: bool = True,
verify_files: bool = True,
) → None[source]

Helper function to download one or several subsets. See download_macs_dataset() for details.
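A hypothetical call pattern for this plural helper, illustrating that `subsets` accepts either a single name or an iterable of names per the signature above. The normalization helper is an assumption for illustration, not part of the library:

```python
from typing import Iterable, Tuple, Union

def normalize_subsets(subsets: Union[str, Iterable[str]]) -> Tuple[str, ...]:
    """Accept a single subset name or an iterable of names, as the signature allows."""
    if isinstance(subsets, str):
        return (subsets,)
    return tuple(subsets)

if __name__ == "__main__":
    # Assumes aac_datasets is installed; this call actually downloads MACS.
    from aac_datasets.datasets.functional.macs import download_macs_datasets

    download_macs_datasets(root="./data", subsets=normalize_subsets("full"))
```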

load_macs_dataset(
root: str | Path | None = None,
subset: str = 'full',
verbose: int = 0,
) → Tuple[Dict[str, List[Any]], Dict[int, float]][source]

Load MACS metadata.

Parameters:
  • root – Dataset root directory. Defaults to “.”.

  • subset – The subset of MACS to use. Can be one of SUBSETS. Defaults to “full”.

  • verbose – Verbosity level. Defaults to 0.

Returns:

A dictionary of lists containing the metadata of each item.
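A sketch of consuming the return value, assuming the dataset has already been downloaded. The unpacking follows the annotated return type `Tuple[Dict[str, List[Any]], Dict[int, float]]`; the variable names are assumptions for illustration. The call is guarded so the summarizing helper can be read and tested without the data:

```python
from typing import Any, Dict, List

def summarize_metadata(raw_data: Dict[str, List[Any]]) -> Dict[str, int]:
    """Count the items in each metadata column (columns map names to lists)."""
    return {column: len(values) for column, values in raw_data.items()}

if __name__ == "__main__":
    # Assumes aac_datasets is installed and MACS was downloaded to ./data.
    from aac_datasets.datasets.functional.macs import load_macs_dataset

    # Second element is the Dict[int, float] from the signature above.
    raw_data, extra = load_macs_dataset(root="./data", subset="full", verbose=1)
    print(summarize_metadata(raw_data))
```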