MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction

stp2y, December 17, 2024


[Submitted on 19 Apr 2024 (v1), last revised 16 Dec 2024 (this version, v2)]

Paper: MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction, by Zixuan Gong and 6 other authors


Abstract: Decoding natural visual scenes from brain activity has flourished, with extensive research on single-subject tasks but less on cross-subject tasks. Reconstructing high-quality images in cross-subject tasks is challenging because of profound individual differences between subjects and the scarcity of annotated data. In this work, we propose MindTuner for cross-subject visual decoding, which achieves high-quality, semantically rich reconstructions using only 1 hour of fMRI training data by exploiting the visual fingerprint phenomenon in the human visual system and a novel fMRI-to-text alignment paradigm. First, we pre-train a multi-subject model on 7 subjects and fine-tune it with scarce data on new subjects, using LoRAs with Skip-LoRAs to learn the visual fingerprint. Then, we use the image modality as an intermediate pivot to achieve fMRI-to-text alignment, which yields impressive fMRI-to-text retrieval performance and corrects fMRI-to-image reconstruction with fine-tuned semantics. Both qualitative and quantitative analyses demonstrate that MindTuner surpasses state-of-the-art cross-subject visual decoding models on the Natural Scenes Dataset (NSD), whether using 1 hour or 40 hours of training data.
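
For readers unfamiliar with LoRA, the fine-tuning idea mentioned in the abstract can be illustrated with a minimal sketch of a generic low-rank adapter on a frozen linear layer. This is not the paper's implementation and does not model Skip-LoRAs; the class name `LoRALinear`, the helper `matmul`, and all shapes and hyperparameters are assumptions chosen for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Generic LoRA sketch (hypothetical names): y = x W + (alpha / rank) * x A B.

    W is the frozen pre-trained weight; only the small matrices A ("down")
    and B ("up") would be trained when adapting to a new subject.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_in, d_out = len(weight), len(weight[0])
        self.weight = weight          # frozen, shape (d_in, d_out)
        self.scale = alpha / rank     # standard LoRA scaling factor
        # Down-projection starts with small random values, shape (d_in, rank).
        self.down = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(d_in)]
        # Up-projection starts at zero, so the adapter is a no-op before training.
        self.up = [[0.0] * d_out for _ in range(rank)]

    def forward(self, x):
        base = matmul(x, self.weight)
        delta = matmul(matmul(x, self.down), self.up)
        return [[b + self.scale * d for b, d in zip(b_row, d_row)]
                for b_row, d_row in zip(base, delta)]


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], rank=2, alpha=2.0)
    print(layer.forward([[1.0, 2.0, 3.0]]))
```

Because the up-projection is initialized to zero, the adapted layer initially reproduces the frozen layer's output exactly, and fine-tuning only has to learn a low-rank correction with far fewer parameters than the full weight matrix.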

Submission history

From: Qi Zhang
[v1] Fri, 19 Apr 2024 05:12:04 UTC (47,896 KB)
[v2] Mon, 16 Dec 2024 13:59:51 UTC (22,552 KB)

