PCIE_LAM Solution for Ego4D Looking At Me Challenge



[Submitted on 18 Jun 2024]

By Kanokphan Lertniphonphan and 5 other authors


Abstract: This report presents our team’s ‘PCIE_LAM’ solution for the Ego4D Looking At Me Challenge at CVPR 2024. The goal of the challenge is to determine whether a person in the scene is looking at the camera wearer, given a video in which the faces of social partners have been localized. Our proposed solution, InternLSTM, consists of an InternVL image encoder and a Bi-LSTM network: InternVL extracts spatial features, while the Bi-LSTM extracts temporal features. The task is highly challenging because the distance between the person in the scene and the camera, together with camera movement, causes significant blurring in the face images. To address this, we implemented a Gaze Smoothing filter that removes noise and spikes from the output. Our approach achieved 1st place in the Looking At Me Challenge with 0.81 mAP and 0.93 accuracy. Code is available at this https URL
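The abstract does not describe how the Gaze Smoothing filter works. As a rough illustration only, the sketch below assumes the model emits a per-frame "looking at me" probability for one tracked face and applies a centered sliding-window median, one common way to suppress isolated single-frame spikes. The function names `smooth_gaze_scores` and `to_labels`, the window size, and the 0.5 threshold are hypothetical choices, not details taken from the report.

```python
from statistics import median


def smooth_gaze_scores(scores: list[float], window: int = 5) -> list[float]:
    """Apply a centered sliding-window median to per-frame scores.

    Hypothetical stand-in for the report's Gaze Smoothing filter: a median
    window suppresses isolated single-frame spikes while preserving longer
    runs of consistent predictions.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        # Truncate the window at the sequence boundaries instead of padding.
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(median(scores[lo:hi]))
    return smoothed


def to_labels(scores: list[float], threshold: float = 0.5) -> list[int]:
    """Binarize scores into looking (1) / not looking (0) per frame."""
    return [1 if s >= threshold else 0 for s in scores]


if __name__ == "__main__":
    # Made-up scores: a one-frame spike at index 2, then a sustained run.
    raw = [0.1, 0.2, 0.9, 0.1, 0.2, 0.8, 0.9, 0.85, 0.9, 0.95]
    print(to_labels(raw))
    print(to_labels(smooth_gaze_scores(raw, window=3)))
```

With `window=3`, the isolated spike at the third frame is suppressed while the sustained run of high scores at the end survives. A median is used rather than a mean because a single outlier cannot pull a median past its neighbors; the report's actual filter may differ.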

Submission history

From: Kanokphan Lertniphonphan
[v1] Tue, 18 Jun 2024 02:16:32 UTC (1,732 KB)


