Tracking Changing Probabilities via Dynamic Learners

stp2y, December 25, 2024

[Submitted on 15 Feb 2024 (v1), last revised 24 Dec 2024 (this version, v3)]

Author: Omid Madani

Abstract: Consider a predictor, a learner, whose input is a stream of discrete items. The predictor’s task, at every time point, is probabilistic multiclass prediction, i.e., to predict which item may occur next by outputting zero or more candidate items, each with a probability, after which the actual item is revealed and the predictor updates. To output probabilities, the predictor keeps track of the proportions of the items it has seen. The stream is unbounded (lifelong), and the predictor has finite, limited space. The task is open-ended: the set of items is unknown to the predictor, and the number of distinct items can also grow without bound. Moreover, there is non-stationarity: the underlying frequencies of items may change, substantially, from time to time. For instance, new items may start appearing and a few recently frequent items may cease to occur. The predictor, being space-bounded, need only provide probabilities for those items which, at the time of prediction, have sufficiently high frequency, i.e., the salient items. This problem is motivated in the setting of Prediction Games, a self-supervised learning regime where concepts serve as both the predictors and the predictands, and the set of concepts grows over time, resulting in non-stationarities as new concepts are generated and used. We design and study a number of predictors, sparse moving averages (SMAs), for the task. One SMA adapts the sparse exponentiated moving average and another is based on queuing a few counts, keeping dynamic per-item histories. Evaluating the predicted probabilities, under noise and non-stationarity, presents challenges, and we discuss and develop evaluation methods, one based on bounding log-loss. We show that a combination of ideas, supporting dynamic predictand-specific learning rates, offers advantages in terms of faster adaptation to change (plasticity), while also supporting low variance (stability).
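To make the setting concrete, here is a minimal Python sketch of a sparse exponentiated moving average over an item stream. It is an illustration under stated assumptions, not the paper's algorithm: the class name `SparseEMA`, the parameters `rate`, `prune_below`, and `min_prob`, and the pruning rule are all hypothetical. It uses a single fixed learning rate, whereas the abstract describes dynamic predictand-specific rates.

```python
class SparseEMA:
    """Illustrative sparse EMA predictor (names and defaults are assumptions)."""

    def __init__(self, rate=0.05, prune_below=0.001):
        self.rate = rate                # fixed learning rate (assumed, not from the paper)
        self.prune_below = prune_below  # drop items whose estimate falls below this
        self.probs = {}                 # item -> estimated probability

    def predict(self, min_prob=0.01):
        # Report only the salient items: those with a high enough estimate.
        return {item: p for item, p in self.probs.items() if p >= min_prob}

    def update(self, observed):
        # Decay every tracked estimate, pruning the ones that become negligible.
        for item in list(self.probs):
            decayed = self.probs[item] * (1.0 - self.rate)
            if decayed < self.prune_below and item != observed:
                del self.probs[item]    # keeps the memory footprint bounded
            else:
                self.probs[item] = decayed
        # Boost the observed item; the estimates always sum to at most 1.
        self.probs[observed] = self.probs.get(observed, 0.0) + self.rate


# A non-stationary stream: "a" is frequent at first, then stops occurring.
learner = SparseEMA()
for item in ["a"] * 200 + ["b"] * 200:
    learner.update(item)
salient = learner.predict()
```

In this toy stream, the estimate for "a" decays once it stops appearing and is eventually pruned, while "b" takes over. A larger `rate` adapts faster to such changes (plasticity) at the cost of noisier estimates (stability), which is the trade-off the abstract refers to.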

Submission history

From: Omid Madani
[v1] Thu, 15 Feb 2024 17:48:58 UTC (14,784 KB)
[v2] Tue, 30 Apr 2024 04:15:24 UTC (26,313 KB)
[v3] Tue, 24 Dec 2024 04:56:07 UTC (27,128 KB)
