05-27【晋天元】五教5101 学习理论系列001

发布者:卢珊珊发布时间:2026-05-25


报告题目:Recent Advances in Thompson Sampling: From Multi-Armed Bandits to Combinatorial Semi-Bandits


报告人:晋天元 (Thrust of Data Science and Analytics, The Hong Kong University of Science and Technology (Guangzhou)


报告时间:5月27日,周三,上午10:00-11:00


报告地点:第五教学楼5101教室


摘要:

Thompson Sampling (TS) is one of the most widely used algorithms for sequential decision-making due to its simplicity and strong empirical performance. This talk presents recent advances in Thompson Sampling for both classical multi-armed bandits (MABs) and combinatorial multi-armed bandits (CMABs). For the MAB setting, we discuss epsilon-Exploring Thompson Sampling, a simple modification of TS that performs posterior-based exploration only with probability epsilon and acts greedily otherwise. This reduced-exploration strategy achieves both minimax and asymptotic optimality for several common reward distributions, including Gaussian, Bernoulli, Poisson, and Gamma rewards. It also leads to substantial computational gains and significantly improved empirical performance compared with standard TS. For the CMAB setting, we focus on the challenge that standard Combinatorial Thompson Sampling (CTS) can suffer from regret bounds with exponential dependence on the optimal super-arm size. We introduce single-seed CTS, where all base-arm posterior samples are coupled through one shared random seed. This coupling preserves the correct posterior marginals while inducing coordinated optimism across base arms, leading to the first polynomial regret guarantee for Thompson Sampling in general CMAB settings. Together, these results show that carefully controlling exploration—either by reducing its frequency in MABs or coordinating it in CMABs—can significantly improve both the theoretical guarantees and practical performance of Thompson Sampling.