Change search
ReferencesLink to record
Permanent link

Direct link
Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms
KTH, School of Electrical Engineering (EES), Automatic Control.ORCID iD: 0000-0003-2988-8464
Supelec, France.
KTH, School of Electrical Engineering (EES), Automatic Control. INRIA, France.
2014 (English)Conference paper (Refereed)
Abstract [en]

We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitzfunction of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitzbandits, we derive asymptotic problem specific lower bounds for the regret satisfied by anyalgorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitzstructure of the problem. In fact, we prove that OSLB is asymptotically optimal, as its asymptoticregret matches the lower bound. The regret analysis of our algorithms relies on a new concentrationinequality for weighted sums of KL divergences between the empirical distributions of rewards andtheir true distributions. For continuous Lipschitz bandits, we propose to first discretize the actionspace, and then apply OSLB or CKL-UCB, algorithms that provably exploit the structure efficiently.This approach is shown, through numerical experiments, to significantly outperform existing algorithmsthat directly deal with the continuous set of arms. Finally the results and algorithms areextended to contextual bandits with similarities.

Place, publisher, year, edition, pages
2014. Vol. 35, 975-999 p.
, Journal of Machine Learning Research, ISSN 1532-4435
Keyword [en]
Multi-armed Bandits
National Category
Probability Theory and Statistics Computer Science
Research subject
Computer Science
URN: urn:nbn:se:kth:diva-146876ScopusID: 2-s2.0-84939604123OAI: diva2:737623
COLT,Barcelona, Spain, June 13-15, 2014

QC 20140826

Available from: 2014-08-13 Created: 2014-06-17 Last updated: 2015-10-19Bibliographically approved

Open Access in DiVA

fulltext(557 kB)17 downloads
File information
File name FULLTEXT01.pdfFile size 557 kBChecksum SHA-512
Type fulltextMimetype application/pdf

Other links

ScopusConference websiteFulltext at JMLR

Search in DiVA

By author/editor
Magureanu, StefanProutiere, Alexandre
By organisation
Automatic Control
Probability Theory and StatisticsComputer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 17 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 65 hits
ReferencesLink to record
Permanent link

Direct link