DriVLM: Domain Adaptation of Vision-Language Models in Autonomous Driving

arXiv:2501.05081v1 Announce Type: new
Abstract: In recent years, large language models (LLMs) have achieved impressive performance, contributing substantially to the development and application of artificial intelligence, and model parameter counts and performance are still growing rapidly. In particular, multimodal large language models (MLLMs) can combine multiple modalities, such as images, video, audio, and text, and show great potential across a wide range of tasks. However, most MLLMs require very high computational resources, which poses a major challenge for most researchers and developers. In this paper, we explore the utility of small-scale MLLMs and apply them to the field of autonomous driving. We hope this work will advance the application of MLLMs in real-world scenarios.

By stp2y