ProtoS-ViT: Visual foundation models for sparse self-explainable classifications

stp2y, December 17, 2024

[Submitted on 14 Jun 2024 (v1), last revised 14 Dec 2024 (this version, v2)]

By Hugues Turbé and 3 other authors

Abstract: Prototypical networks aim to build intrinsically explainable models based on the linear summation of concepts. Concepts are coherent entities that we, as humans, can recognize and associate with a certain object or entity. However, important challenges remain in the fair evaluation of the explanation quality provided by these models. This work first proposes an extensive set of quantitative and qualitative metrics that make it possible to identify drawbacks in current prototypical networks. It then introduces a novel architecture that provides compact explanations, outperforming current prototypical models in terms of explanation quality. Overall, the proposed architecture demonstrates how frozen pre-trained ViT backbones can be effectively turned into prototypical models for both general and domain-specific tasks, in our case biomedical image classifiers. Code is available at this https URL.
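
To make the phrase "linear summation of concepts" concrete, here is a minimal sketch of a generic prototype-based classification head. It is not the ProtoS-ViT architecture itself: the function names, the cosine similarity, the max over patches, and the toy vectors are all assumptions chosen for illustration. In a real setup the patch embeddings would come from a frozen pre-trained ViT backbone rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototype_scores(patch_embeddings, prototypes):
    """For each prototype, keep its best match over all image patches."""
    return [
        max(cosine(patch, proto) for patch in patch_embeddings)
        for proto in prototypes
    ]


def classify(patch_embeddings, prototypes, class_weights):
    """Return class logits and the per-prototype contribution to each logit.

    Each logit is a plain weighted sum of prototype scores, so the
    contributions are themselves the explanation of the prediction.
    """
    scores = prototype_scores(patch_embeddings, prototypes)
    contributions = [
        [w * s for w, s in zip(row, scores)] for row in class_weights
    ]
    logits = [sum(row) for row in contributions]
    return logits, contributions


# Hypothetical toy data: two patch embeddings, three prototypes, two classes.
patches = [[1.0, 0.0], [0.0, 1.0]]
prototypes = [[1.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
# Sparse weights: each class relies on a single prototype.
class_weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

logits, contributions = classify(patches, prototypes, class_weights)
predicted_class = max(range(len(logits)), key=lambda i: logits[i])
```

Because every logit is a weighted sum, reading off the nonzero entries of `contributions[predicted_class]` shows which prototypes drove the decision. Keeping the weight matrix sparse is one way to obtain the compact explanations the abstract refers to.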

Submission history

From: Hugues Turbé
[v1] Fri, 14 Jun 2024 13:36:30 UTC (9,567 KB)
[v2] Sat, 14 Dec 2024 03:38:30 UTC (14,584 KB)
