[Submitted on 29 May 2024]
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
Everlyn Asiko Chimoto and 5 other authors
Abstract: Neural Machine Translation models are extremely data- and compute-hungry. However, not all data points contribute equally to model training and generalization. Pruning low-value data points can drastically reduce the compute budget without a significant drop in model performance. In this paper, we propose a new data pruning technique, Checkpoints Across Time (CAT), which leverages early model training dynamics to identify the data points most relevant to model performance. We benchmark CAT against several data pruning techniques, including COMET-QE, LASER, and LaBSE. We find that CAT outperforms these benchmarks on Indo-European languages across multiple test sets. When applied to English-German, English-French, and English-Swahili translation tasks, CAT achieves performance comparable to using the full dataset while pruning up to 50% of the training data. Inspecting the data points that CAT selects, we find that it tends to favour longer sentences and sentences with unique or rare words.
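The abstract describes CAT only at a high level: per-example statistics are collected at early training checkpoints and used to rank data points for pruning. The sketch below is a minimal illustration of that idea, assuming the relevance signal is the change in per-example loss across early checkpoints; the function name cat_style_prune, this specific scoring rule, and the 50% keep ratio used in the example are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch of checkpoint-based data pruning in the spirit of CAT
# (Checkpoints Across Time). The exact scoring rule is not given in the
# abstract; loss change across early checkpoints is an assumption here.
import numpy as np

def cat_style_prune(losses_per_checkpoint: np.ndarray,
                    keep_fraction: float = 0.5) -> np.ndarray:
    """Return indices of training examples to keep.

    losses_per_checkpoint: array of shape (n_checkpoints, n_examples) with
        per-example losses recorded at several early training checkpoints.
    keep_fraction: fraction of the dataset to retain (the paper prunes up to 50%).
    """
    # Score each example by how much its loss moves across early checkpoints;
    # examples whose loss changes most are treated as most informative (assumption).
    scores = losses_per_checkpoint.max(axis=0) - losses_per_checkpoint.min(axis=0)
    n_keep = int(keep_fraction * scores.shape[0])
    # Keep the highest-scoring examples.
    return np.argsort(scores)[::-1][:n_keep]

# Example usage with synthetic losses for 10 examples over 3 early checkpoints.
rng = np.random.default_rng(0)
losses = rng.random((3, 10))
kept = cat_style_prune(losses, keep_fraction=0.5)
print("Indices of retained examples:", kept)
```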
Submission history
From: Everlyn Asiko Chimoto
[v1]
Wed, 29 May 2024 19:21:49 UTC (7,473 KB)