View a PDF of the paper titled Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection, by Yuanze Li and 8 other authors
Abstract:Due to the training configuration, traditional industrial anomaly detection (IAD) methods have to train a specific model for each deployment scenario, which is insufficient to meet the requirements of modern design and manufacturing. On the contrary, large multimodal models~(LMMs) have shown eminent generalization ability on various vision tasks, and their perception and comprehension capabilities imply the potential of applying LMMs on IAD tasks. However, we observe that even though the LMMs have abundant knowledge about industrial anomaly detection in the textual domain, the LMMs are unable to leverage the knowledge due to the modality gap between textual and visual domains. To stimulate the relevant knowledge in LMMs and adapt the LMMs towards anomaly detection tasks, we introduce existing IAD methods as vision experts and present a novel large multimodal model applying vision experts for industrial anomaly detection~(abbreviated to {Myriad}). Specifically, we utilize the anomaly map generated by the vision experts as guidance for LMMs, such that the vision model is guided to pay more attention to anomalous regions. Then, the visual features are modulated via an adapter to fit the anomaly detection tasks, which are fed into the language model together with the vision expert guidance and human instructions to generate the final outputs. Extensive experiments are applied on MVTec-AD, VisA, and PCB Bank benchmarks demonstrate that our proposed method not only performs favorably against state-of-the-art methods, but also inherits the flexibility and instruction-following ability of LMMs in the field of IAD. Source code and pre-trained models are publicly available at url{this https URL}.
Submission history
From: Yuanze Li [view email]
[v1]
Sun, 29 Oct 2023 16:49:45 UTC (10,199 KB)
[v2]
Wed, 1 Nov 2023 03:50:52 UTC (10,199 KB)
[v3]
Fri, 17 Jan 2025 06:13:20 UTC (39,379 KB)
Source link
lol