DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences

Posted by stp2y on September 16, 2024


[Submitted on 8 Mar 2024 (v1), last revised 13 Sep 2024 (this version, v2)]

Authors: Peidong Li and 3 other authors


Abstract: Camera-based Bird’s-Eye-View (BEV) perception often struggles to choose between 3D-to-2D and 2D-to-3D view transformation (VT). The 3D-to-2D VT typically employs a resource-intensive Transformer to establish robust correspondences between 3D and 2D features, while the 2D-to-3D VT utilizes the Lift-Splat-Shoot (LSS) pipeline for real-time applications, potentially missing distant information. To address these limitations, we propose DualBEV, a unified framework that utilizes a shared feature transformation incorporating three probabilistic measurements for both strategies. By considering dual-view correspondences in one stage, DualBEV effectively bridges the gap between these strategies, harnessing their individual strengths. Our method achieves state-of-the-art performance without a Transformer, delivering efficiency comparable to the LSS approach, with 55.2% mAP and 63.4% NDS on the nuScenes test set. Code is available at this https URL.
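
To make the two directions of view transformation named in the abstract concrete, here is a minimal NumPy sketch. It is not the DualBEV implementation and does not include the paper's three probabilistic measurements. It only illustrates the generic ideas, assuming a pinhole camera: the LSS-style "lift" step weights each pixel feature by a per-pixel depth distribution (2D-to-3D), while the opposite direction projects 3D points into the image and samples features there (3D-to-2D). The function names, tensor shapes, and nearest-neighbour sampling are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def lift_2d_to_3d(feat, depth_logits):
    """2D-to-3D (LSS-style lift), hypothetical helper.

    feat:         (C, H, W) image features
    depth_logits: (D, H, W) unnormalised scores over D depth bins
    returns:      (C, D, H, W) frustum features, i.e. each pixel feature
                  spread along its ray in proportion to the depth probability
    """
    depth_prob = softmax(depth_logits, axis=0)
    return feat[:, None, :, :] * depth_prob[None, :, :, :]


def sample_3d_to_2d(feat, points_cam, K):
    """3D-to-2D sampling, hypothetical helper.

    feat:       (C, H, W) image features
    points_cam: (N, 3) 3D points in the camera frame (z pointing forward)
    K:          (3, 3) pinhole intrinsics (assumed, no distortion)
    returns:    (N, C) nearest-pixel features and an (N,) validity mask
    """
    C, H, W = feat.shape
    uvw = points_cam @ K.T
    w = uvw[:, 2]
    in_front = w > 1e-6
    safe_w = np.where(in_front, w, 1.0)  # avoid dividing by zero or negatives
    u = np.rint(uvw[:, 0] / safe_w).astype(int)
    v = np.rint(uvw[:, 1] / safe_w).astype(int)
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    out = np.zeros((points_cam.shape[0], C), dtype=feat.dtype)
    out[valid] = feat[:, v[valid], u[valid]].T  # points outside the image stay zero
    return out, valid
```

In this toy setting, summing the lifted volume over the depth axis recovers the original image features, because the depth probabilities sum to one at every pixel. Points behind the camera or outside the image are flagged as invalid rather than sampled.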

Submission history

From: Peidong Li
[v1] Fri, 8 Mar 2024 15:58:00 UTC (17,015 KB)
[v2] Fri, 13 Sep 2024 07:01:43 UTC (7,576 KB)
