On Lai’s Upper Confidence Bound in Multi-Armed Bandits


Authors: Huachen Ren and Cun-Hui Zhang

Abstract: In this memorial paper, we honor Tze Leung Lai’s seminal contributions to the topic of multi-armed bandits, with a specific focus on his pioneering work on the upper confidence bound. We establish sharp non-asymptotic regret bounds for an upper confidence bound index with a constant level of exploration for Gaussian rewards. Furthermore, we establish a non-asymptotic regret bound for the upper confidence bound index of Lai (1987) which employs an exploration function that decreases with the sample size of the corresponding arm. The regret bounds have leading constants that match the Lai-Robbins lower bound. Our results highlight an aspect of Lai’s seminal works that deserves more attention in the machine learning literature.
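
To make the setting in the abstract concrete, here is a minimal Python sketch of a generic upper confidence bound policy for Gaussian rewards with known standard deviation. It is not taken from the paper: the index form `mean + sigma * sqrt(2 * level * log(horizon) / pulls)` is the common textbook choice and is assumed here for illustration, and the names `ucb_index`, `run_ucb`, `lai_robbins_constant`, and the parameter `level` are hypothetical. The index of Lai (1987), whose exploration function decreases with the arm's sample size, is not reproduced. The helper `lai_robbins_constant` computes the coefficient of log T in the Lai-Robbins lower bound for Gaussian arms with common known variance, which is the sum of 2σ²/Δ over the suboptimal arms with gap Δ.

```python
import math
import random


def ucb_index(mean, pulls, horizon, sigma=1.0, level=1.0):
    """Empirical mean plus an exploration bonus (textbook form, assumed here)."""
    return mean + sigma * math.sqrt(2.0 * level * math.log(horizon) / pulls)


def run_ucb(means, horizon, sigma=1.0, level=1.0, seed=0):
    """Simulate the policy on Gaussian arms; return pull counts and pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # initialisation: pull every arm once
        else:
            # Pull the arm with the largest index.
            arm = max(
                range(k),
                key=lambda a: ucb_index(
                    sums[a] / pulls[a], pulls[a], horizon, sigma, level
                ),
            )
        reward = rng.gauss(means[arm], sigma)
        pulls[arm] += 1
        sums[arm] += reward
    best = max(means)
    # Pseudo-regret: gap of each arm times the number of times it was pulled.
    regret = sum((best - m) * n for m, n in zip(means, pulls))
    return pulls, regret


def lai_robbins_constant(means, sigma=1.0):
    """Coefficient of log T in the Lai-Robbins lower bound for Gaussian arms."""
    best = max(means)
    return sum(2.0 * sigma ** 2 / (best - m) for m in means if m < best)


if __name__ == "__main__":
    arm_means = [0.0, 0.5, 1.0]  # hypothetical arm means
    T = 5000
    counts, regret = run_ucb(arm_means, T)
    print("pull counts:", counts)
    print("pseudo-regret:", regret)
    print("Lai-Robbins constant * log T:", lai_robbins_constant(arm_means) * math.log(T))
```

Comparing the simulated pseudo-regret with the last printed quantity gives a rough sense of how close a given exploration level comes to the lower bound on a single run. A single simulation is noisy, so it should not be read as a check of the paper's non-asymptotic bounds.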

Submission history

From: Huachen Ren
[v1] Thu, 3 Oct 2024 07:58:43 UTC (27 KB)
[v2] Fri, 4 Oct 2024 02:19:45 UTC (27 KB)


