arXiv:2407.09495v1 Announce Type: new
Abstract: This short position paper provides a manually curated list of non-English image captioning datasets (as of May 2024). The list reveals a dearth of datasets across languages: only 23 different languages are represented. Adding the Crossmodal-3600 dataset (Thapliyal et al., 2022, 36 languages) increases this number somewhat, but it remains tiny compared to the thousands of spoken languages that exist. The paper closes with some open questions for the field of Vision & Language.