Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits



By Julien Zhou (Thoth) and 6 other authors


Abstract: We address the problem of stochastic combinatorial semi-bandits, where a player selects among P actions from the power set of a set containing d base items. Adaptivity to the problem’s structure is essential in order to obtain optimal regret upper bounds. As estimating the coefficients of a covariance matrix can be manageable in practice, leveraging them should improve the regret. We design “optimistic” covariance-adaptive algorithms relying on online estimations of the covariance structure, called OLS-UCB-C and COS-V (the latter uses only the variances). They both yield improved gap-free regret. Although COS-V can be slightly suboptimal, it improves on computational complexity by taking inspiration from Thompson Sampling approaches. It is the first sampling-based algorithm satisfying a T^{1/2} gap-free regret (up to poly-logs). We also show that in some cases, our approach efficiently leverages the semi-bandit feedback and outperforms bandit feedback approaches, not only in exponential regimes where P >> d but also when P <= d, which is not covered by existing analyses.
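To make the setting concrete, here is a minimal sketch of the semi-bandit interaction loop: in each round the player picks a subset of the d base items, observes the reward of every chosen item individually, and collects their sum. Everything in it is an assumption made for illustration. The Gaussian reward model, the function name `run_semi_bandit`, and the simple per-item UCB index are not taken from the paper. In particular this is not OLS-UCB-C or COS-V: the index below ignores the covariance structure that those algorithms estimate.

```python
import itertools
import numpy as np

def run_semi_bandit(mean, cov, actions, horizon, seed=0):
    """Per-item UCB baseline under semi-bandit feedback (illustrative only).

    mean, cov: assumed Gaussian reward model over the d base items.
    actions:   list of index tuples, each a subset of the d base items.
    Returns the cumulative pseudo-regret after each round.
    """
    rng = np.random.default_rng(seed)
    d = len(mean)
    counts = np.zeros(d)   # number of times each item was observed
    sums = np.zeros(d)     # sum of observed rewards per item
    best = max(mean[list(a)].sum() for a in actions)
    regret = np.zeros(horizon)
    total = 0.0
    for t in range(1, horizon + 1):
        safe = np.maximum(counts, 1)
        est = sums / safe
        # Items never observed get an infinite bonus, forcing exploration.
        bonus = np.where(counts > 0, np.sqrt(2.0 * np.log(t + 1) / safe), np.inf)
        index = est + bonus
        # Pick the action whose items have the largest summed index.
        chosen = list(max(actions, key=lambda a: index[list(a)].sum()))
        # Semi-bandit feedback: one reward per chosen item is revealed.
        x = rng.multivariate_normal(mean, cov)
        counts[chosen] += 1
        sums[chosen] += x[chosen]
        total += best - mean[chosen].sum()
        regret[t - 1] = total
    return regret

d, m = 5, 2
mean = np.array([0.9, 0.8, 0.5, 0.3, 0.1])
cov = 0.05 * np.eye(d) + 0.02          # assumed positively correlated items
actions = list(itertools.combinations(range(d), m))  # P = 10 subsets of size 2
regret = run_semi_bandit(mean, cov, actions, horizon=2000)
```

Under bandit feedback the player would see only the summed reward of the chosen subset, so the per-item updates to `counts` and `sums` would not be available. That extra per-item information is what the abstract refers to as leveraging the semi-bandit feedback.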

Submission history

From: Julien Zhou [view email] [via CCSD proxy]
[v1] Fri, 23 Feb 2024 08:07:54 UTC (133 KB)
[v2] Wed, 3 Jul 2024 14:29:43 UTC (33 KB)
[v3] Fri, 8 Nov 2024 11:10:48 UTC (618 KB)
[v4] Fri, 15 Nov 2024 09:41:26 UTC (618 KB)




By stp2y
