Download - 736 740 Zip
The goal is "Automated Audio Captioning" (AAC): predicting a textual description from an audio signal.
Clotho is an audio dataset used for intermodal translation (audio-to-text) tasks. It is widely used in the DCASE (Detection and Classification of Acoustic Scenes and Events) challenges.
📂 Key Data Components
If you are writing a technical report or paper using this data, make sure it includes the standard sections, starting with a task definition: explain that the goal is "Automated Audio Captioning" (AAC).
The full development set is approximately 6.5 GB.
Thousands of sound samples ranging from 15 to 30 seconds.
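After unpacking the archive, it can be useful to confirm that the clips really fall in the stated 15 to 30 second range. The following is a minimal sketch, not part of any official tooling: the function names and the directory argument are hypothetical, and it assumes the clips are uncompressed PCM WAV files that Python's standard `wave` module can read.

```python
import wave
from pathlib import Path

# Duration bounds taken from the description above (15 to 30 seconds).
MIN_SECONDS = 15.0
MAX_SECONDS = 30.0


def clip_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def find_out_of_range_clips(audio_dir):
    """Return (filename, duration) pairs for clips outside the expected range.

    `audio_dir` is an assumed folder of .wav files; adjust it to wherever
    the archive was extracted.
    """
    out_of_range = []
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        duration = clip_duration_seconds(wav_path)
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            out_of_range.append((wav_path.name, duration))
    return out_of_range
```

Calling `find_out_of_range_clips` on the extracted audio folder returns an empty list when every clip is within range; any entries it does return are worth inspecting before training a captioning model.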
