leofanzeres/s2i_data

S2I Translation Datasets

Audiovisual data for Sound-to-Image (S2I) translation experiments.

1-s Mel spectrogram segments and respective central frames extracted from a video track of the class Rail transport.

This repository provides the audiovisual data required for training and testing the models available at https://github.com/leofanzeres/s2i.git. The datasets consist of log-Mel spectrograms (or the extracted audio embeddings) computed from 1-s audio segments, together with the corresponding frames from the original video tracks. The complete VEGAS dataset was made available by Zhou et al. as part of their study on cross-modal translation (Visual to Sound: Generating Natural Sound for Videos in the Wild, CVPR, 2018). It consists of videos with a maximum duration of 10 s distributed across 10 sound classes, of which we use five: Baby crying, Dog, Rail transport, Fireworks, and Water flowing.
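As an illustration of how such pairs could be formed, the following is a minimal sketch that splits a waveform into 1-s segments, computes a log-Mel spectrogram for each, and records the index of the central video frame. It is not the preprocessing used to build this dataset: the function names are hypothetical, and the sample rate, FFT size, hop length, number of Mel bands, and frame rate are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Hz -> mel conversion
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(segment, sr, n_fft=1024, hop=256, n_mels=64):
    """Log-Mel spectrogram of one segment, shape (n_mels, n_frames).

    n_fft, hop and n_mels are assumed values, not taken from the dataset.
    """
    n_frames = 1 + (len(segment) - n_fft) // hop
    # Index matrix selecting overlapping analysis windows
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = segment[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    return np.log(mel + 1e-10)  # small offset avoids log(0)

def one_second_pairs(audio, sr, fps):
    """Yield (log-Mel spectrogram, central video frame index) per 1-s segment."""
    for k in range(len(audio) // sr):
        segment = audio[k * sr:(k + 1) * sr]
        central_frame = int((k + 0.5) * fps)  # frame at the segment midpoint
        yield log_mel(segment, sr), central_frame

if __name__ == "__main__":
    # Synthetic 3-s signal standing in for a real audio track (assumed 16 kHz, 30 fps)
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(3 * 16000)
    for spec, frame_idx in one_second_pairs(audio, sr=16000, fps=30):
        print(spec.shape, frame_idx)
```

With these assumed settings, each 1-s segment yields one spectrogram and one frame index, so a 10-s track would produce up to ten audio-image pairs.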

This dataset is made available under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
