arXiv:2405.17444v1 Announce Type: new
Abstract: In this paper, we explore the feasibility of using a transformer-based, spatiotemporal attention network (STAN) for gradient-based time-series explanations. First, we trained the STAN model for video classifications using the global and local views of data and weakly supervised labels on time-series data (i.e. the type of an activity). We then leveraged a gradient-based XAI technique (e.g. saliency map) to identify salient frames of time-series data. According to the experiments using the datasets of four medically relevant activities, the STAN model demonstrated its potential to identify important frames of videos.
Source link
lol