[Submitted on 10 Oct 2024]
In Search of Forgotten Domain Generalization
Prasanna Mayilvahanan and 6 other authors
Abstract: Out-of-Domain (OOD) generalization is the ability of a model trained on one or more domains to generalize to unseen domains. In the ImageNet era of computer vision, evaluation sets for measuring a model's OOD performance were designed to be strictly OOD with respect to style. However, the emergence of foundation models and expansive web-scale datasets has obfuscated this evaluation process, as such datasets cover a broad range of domains and risk test-domain contamination. In search of the forgotten domain generalization, we create large-scale datasets subsampled from LAION (LAION-Natural and LAION-Rendition) that are strictly OOD with respect to the corresponding ImageNet and DomainNet test sets in terms of style. Training CLIP models on these datasets reveals that a significant portion of their performance is explained by in-domain examples. This indicates that the OOD generalization challenges from the ImageNet era still prevail and that training on web-scale data merely creates the illusion of OOD generalization. Furthermore, through a systematic exploration of combining natural and rendition datasets in varying proportions, we identify optimal mixing ratios for model generalization across these domains. Our datasets and results re-enable meaningful assessment of OOD robustness at scale, a crucial prerequisite for improving model robustness.
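The abstract describes combining natural and rendition training pools in varying proportions to identify good mixing ratios. As a minimal sketch of what such a mixture could look like (not the authors' code; the pool names, the natural_fraction parameter, and the sizes below are hypothetical), one might subsample a training set from two domain pools like this:

```python
# Illustrative sketch only: build a training set that mixes a "natural" pool
# and a "rendition" pool at a chosen ratio. All names and sizes are assumed,
# not taken from the paper's released code.
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def mix_domains(natural_pool: Sequence[T],
                rendition_pool: Sequence[T],
                total_size: int,
                natural_fraction: float,
                seed: int = 0) -> List[T]:
    """Sample `total_size` items, drawing the requested fraction from the
    natural pool and the remainder from the rendition pool."""
    rng = random.Random(seed)
    n_natural = round(total_size * natural_fraction)
    n_rendition = total_size - n_natural
    mixture = (rng.sample(list(natural_pool), n_natural)
               + rng.sample(list(rendition_pool), n_rendition))
    rng.shuffle(mixture)  # interleave the two domains
    return mixture

# Example: sweep mixing ratios from all-rendition to all-natural.
natural = [f"nat_{i}" for i in range(10_000)]
rendition = [f"ren_{i}" for i in range(10_000)]
for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    subset = mix_domains(natural, rendition, total_size=1_000, natural_fraction=frac)
    print(frac, len(subset))
```

In a real experiment each ratio would define a separate training run (e.g., of a CLIP model), with generalization then evaluated on held-out natural and rendition test sets.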
Submission history
From: Prasanna Mayilvahanan
[v1] Thu, 10 Oct 2024 17:50:45 UTC (31,677 KB)