Download - 736 740 Zip

The goal is "Automated Audio Captioning" (AAC): predicting a textual description from an audio signal.

Clotho is an audio dataset used for intermodal translation (audio-to-text) tasks. It is widely used in the DCASE (Detection and Classification of Acoustic Scenes and Events) challenges.

📂 Key Data Components

If you are writing a technical report or paper using this data, include the standard sections, beginning with a statement that the goal is "Automated Audio Captioning" (AAC).

The full development set is approximately 6.5 GB.

Thousands of sound samples ranging from 15 to 30 seconds.
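
After unpacking the archive, it can be worth confirming that the clips really fall in the stated 15 to 30 second range. The following is a minimal sketch using only Python's standard `wave` module. It assumes the audio is stored as uncompressed PCM `.wav` files in a single folder; the function names and the folder path in the usage note are hypothetical and not part of the dataset's tooling.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def clips_outside_range(audio_dir, min_s=15.0, max_s=30.0):
    """List (file name, duration) for clips outside [min_s, max_s] seconds.

    The 15 and 30 second defaults mirror the range quoted above;
    the flat folder layout is an assumption.
    """
    outliers = []
    for path in sorted(Path(audio_dir).glob("*.wav")):
        duration = wav_duration_seconds(path)
        if not (min_s <= duration <= max_s):
            outliers.append((path.name, duration))
    return outliers
```

Calling `clips_outside_range("clotho/development")` (a hypothetical path) returns a list of file names and durations to inspect; an empty list means every clip found is within range.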
