LossVal: Efficient Data Valuation for Neural Networks

stp2y, December 18, 2024

[Submitted on 5 Dec 2024 (v1), last revised 17 Dec 2024 (this version, v2)]

Authors: Tim Wibiral and 3 other authors

Abstract: Assessing the importance of individual training samples is a key challenge in machine learning. Traditional approaches retrain models with and without specific samples, which is computationally expensive and ignores dependencies between data points. We introduce LossVal, an efficient data valuation method that computes importance scores during neural network training by embedding a self-weighting mechanism into loss functions like cross-entropy and mean squared error. LossVal reduces computational costs, making it suitable for large datasets and practical applications. Experiments on classification and regression tasks across multiple datasets show that LossVal effectively identifies noisy samples and is able to distinguish helpful from harmful samples. We examine the gradient calculation of LossVal to highlight its advantages. The source code is available at: this https URL
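To make the idea of a self-weighting loss more concrete, here is a toy sketch in plain Python. It is not the paper's formulation: it assumes softmax-normalized, learnable per-sample weights multiplied into a squared-error loss for a one-parameter linear model, with gradients written out by hand. The function name `train_self_weighted`, the data, and the learning rates are all hypothetical choices for illustration.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_self_weighted(xs, ys, steps=500, lr_theta=0.05, lr_logits=0.5):
    """Jointly fit y ~ theta * x and one learnable weight per sample.

    Assumed loss (illustrative only): sum_i w_i * (theta * x_i - y_i)^2,
    with w = softmax(logits) so the weights stay positive and sum to 1.
    """
    theta = 0.0
    logits = [0.0] * len(xs)
    for _ in range(steps):
        weights = softmax(logits)
        residuals = [theta * x - y for x, y in zip(xs, ys)]
        losses = [r * r for r in residuals]
        total = sum(w * l for w, l in zip(weights, losses))
        # Gradient of the weighted loss with respect to the model parameter.
        grad_theta = sum(2 * w * r * x for w, r, x in zip(weights, residuals, xs))
        # Gradient with respect to each logit (softmax chain rule):
        # samples whose loss exceeds the weighted average lose weight.
        grad_logits = [w * (l - total) for w, l in zip(weights, losses)]
        theta -= lr_theta * grad_theta
        logits = [a - lr_logits * g for a, g in zip(logits, grad_logits)]
    return theta, softmax(logits)


# Toy data: y = 2x, with one deliberately corrupted label at index 4.
xs = [0.1 * i for i in range(1, 11)]
ys = [2.0 * x for x in xs]
ys[4] += 3.0

theta, weights = train_self_weighted(xs, ys)
noisiest = min(range(len(weights)), key=weights.__getitem__)
print("lowest-weight sample:", noisiest, "theta:", round(theta, 3))
```

In this sketch the learned weights act as importance scores produced during a single training run, with no retraining per sample. The softmax normalization is only one way to keep the weights from collapsing to zero.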

Submission history

From: Tim Wibiral
[v1] Thu, 5 Dec 2024 13:46:55 UTC (467 KB)
[v2] Tue, 17 Dec 2024 16:40:37 UTC (480 KB)
