S2I Translation Datasets

Audiovisual data for Sound-to-Image (S2I) translation experiments.

Figure: 1-s Mel spectrogram segments and the respective central frames extracted from a video track of the class Rail transport.
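The pairing shown in the figure, where each 1-s audio segment is matched with one video frame, can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's extraction code: it assumes each video is split into consecutive, non-overlapping 1-s windows, that a trailing partial second is dropped, and that the "central frame" is the frame at the midpoint of each window. The function name `segment_central_frames` and its parameters are hypothetical.

```python
import math


def segment_central_frames(duration_s, fps, segment_s=1.0):
    """Return (start_s, end_s, central_frame_index) for each full segment.

    Hypothetical helper: windows are non-overlapping, and any trailing
    partial window shorter than segment_s is discarded.
    """
    n_segments = int(duration_s // segment_s)
    total_frames = int(math.floor(duration_s * fps))
    result = []
    for i in range(n_segments):
        start = i * segment_s
        end = start + segment_s
        # Midpoint of the window, mapped to the frame that is showing at that time.
        center_t = start + segment_s / 2.0
        frame_idx = min(int(math.floor(center_t * fps)), total_frames - 1)
        result.append((start, end, frame_idx))
    return result
```

For a 10-s clip at 25 fps this yields ten windows whose central frames are 12, 37, ..., 237. Using the midpoint rather than the first frame of each second keeps the image aligned with the middle of the sound it is paired with.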

This repository provides the audiovisual data required for training and testing the models available at https://github.com/leofanzeres/s2i.git. The datasets consist of log-Mel spectrograms (or the corresponding extracted audio embeddings) computed from 1-s audio segments, together with the respective frames from the original video tracks. The complete VEGAS dataset was made available by Zhou et al. during their study on crossmodal translation (). It consists of videos with a maximum duration of 10 s, distributed across 10 sound classes, of which we use five: Baby crying, Dog, Rail transport, Fireworks, and Water flowing.
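The repository does not state its spectrogram parameters (sample rate, FFT size, hop size, number of Mel bands), so the following is only a sketch of how one log-Mel column is typically computed: a Hann-windowed DFT power spectrum, a bank of triangular filters spaced evenly on the HTK mel scale, m = 2595 log10(1 + f/700), and a logarithm. All function names and parameter values here are hypothetical, and the naive O(N²) DFT is for illustration only; a library such as librosa or torchaudio would normally be used instead.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style mel scale.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    """Naive DFT power spectrum of one Hann-windowed frame (len(frame) <= n_fft)."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]  # periodic Hann
    x = [s * w for s, w in zip(frame, window)] + [0.0] * (n_fft - n)  # zero-pad to n_fft
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n_fft) for t in range(n_fft))
        spec.append(re * re + im * im)
    return spec


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters: n_mels rows, each with n_fft // 2 + 1 weights in [0, 1]."""
    if f_max is None:
        f_max = sample_rate / 2.0
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 points evenly spaced on the mel scale give the triangle edges.
    edges = [mel_to_hz(m_min + (m_max - m_min) * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for j in range(n_mels):
        lo, center, hi = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for f in bin_freqs:
            if lo < f <= center:
                row.append((f - lo) / (center - lo))  # rising edge
            elif center < f < hi:
                row.append((hi - f) / (hi - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel(power_spec, bank, eps=1e-10):
    """Project one power-spectrum frame onto the filterbank and take the log."""
    return [math.log(sum(w * p for w, p in zip(row, power_spec)) + eps) for row in bank]
```

To build a 1-s spectrogram, one would slide `power_spectrum` over the segment with a fixed hop size, apply `log_mel` to each frame, and stack the resulting columns into a (bands × frames) image.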

These datasets are made available under a Creative Commons Attribution 4.0 International License.
