SIC-MMAB: Synchronisation involves communication
Etienne Boursier Vianney Perchet
MLMDA Seminar, November 2019
Overview
Multiplayer bandits problem
SIC-MMAB
Contradiction with lower bounds
Dynamic setting
Related works
Multiplayer bandits problem
Introduction
Motivation: Cognitive Radio (5G)
Optimize spectrum access for Primary and Secondary users
When a Primary user is on channel k → priority over Secondary users
When several Secondary users are on the same channel: interference/collision
Goal for secondary users: find and communicate on best channels
Bandit game at round $t \in \{1, \dots, T\}$, K arms
[Figure: one player facing four arms with rewards $X_1(t), \dots, X_4(t)$ and means $\mu_1, \dots, \mu_4$; in the second build of the slide the player pulls arm 2]
Rewards are i.i.d., $X_k(t) \sim \mathcal{B}(\mu_k)$ in [0, 1]
Pull arm $\pi(t)$ given the past
Observe reward $X_{\pi(t)}(t)$
$X_k(t) = 0$ if a Primary user is on channel k, and 1 otherwise
Multiplayer Bandit game at round $t \in \{1, \dots, T\}$, K arms, M players
[Figure: Players 1, 2 and 3 facing four arms with rewards $X_1(t), \dots, X_4(t)$ and means $\mu_1, \dots, \mu_4$; in the last build of the slide a collision occurs on arm 2, whose reward becomes 0]
$X_k(t) = 0$ if a Primary user is on channel k, and 1 otherwise
Model: Multiplayer Multi-Armed Bandits
K arms with Bernoulli rewards Xk(t) ∼ B(µk)
w.l.o.g. µ1 ≥ µ2 ≥ . . . ≥ µK
M ≤ K players pull arms $\pi^j(t)$ simultaneously for t = 1, …, T
Decentralized: players cannot communicate and M is unknown
Each player gets reward $r^j(t) = X_{\pi^j(t)}(t)\,\mathbb{1}\{\text{no collision on } \pi^j(t)\}$
Regret:
$$R_T = T \sum_{k=1}^{M} \mu_k - \mathbb{E}_\mu\Big[\sum_{t=1}^{T} \sum_{j=1}^{M} r^j(t)\Big]$$
Feedback/sensing settings
$r^j(t) = X_{\pi^j(t)}(t)\,\mathbb{1}\{\text{no collision on } \pi^j(t)\}$
Collision sensing: observe $r^j(t)$ and $\mathbb{1}\{\text{no collision on } \pi^j(t)\}$
No sensing: observe only $r^j(t)$
Statistic sensing: observe $r^j(t)$ and $X_{\pi^j(t)}(t)$
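The three feedback settings differ only in what a player gets to see. A minimal sketch, assuming a hypothetical helper `observe` and string labels chosen here for illustration:

```python
def observe(setting, x, no_collision):
    """Feedback seen by one player; x is X_{pi(t)}(t), no_collision is 0 or 1."""
    r = x * no_collision  # reward actually received
    if setting == "collision":
        return {"reward": r, "no_collision": no_collision}
    if setting == "none":
        return {"reward": r}
    if setting == "statistic":
        return {"reward": r, "x": x}
    raise ValueError(f"unknown sensing setting: {setting}")
```

Under no sensing, a zero caused by a collision cannot be told apart from a zero drawn by the arm, which is what makes that setting harder.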
Collision Sensing: SIC-MMAB
Centralized case
Players communicate (for free) → no collision
Combinatorial bandits, tight bound: [Anantharam et al., 1987, Komiyama et al., 2015]
Regret in $\sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k}$
Lower bounds
Centralized lower bound [Anantharam et al., 1987]
$$\sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k}$$
Decentralized lower bound [Liu and Zhao, 2010] [Besson and Kaufmann, 2018]
$$M \sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k}$$
[On the slide, the decentralized lower bound is crossed out and labelled SIC-MMAB]
Decentralized ∼ Centralized
How is this possible?
Main trick
Observation: $\mathbb{1}\{\text{no collision on } k\} \in \{0, 1\}$ can be seen as a bit sent between players
Force collisions during communication rounds
When player i talks to player j:
  collide with j to send a 1 bit
  do not collide to send a 0 bit
Players communicate empirical means to each other → centralization
Sublogarithmic number of communication rounds
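The collision trick can be sketched as follows. The function names are hypothetical, and the sketch assumes that the receiver stays on its own arm, that the sender's idle arm differs from it, and that no third player interferes.

```python
def send_bits(bits, receiver_arm, sender_idle_arm):
    """Arms the sender pulls, one per round, to transmit `bits` to the receiver."""
    # Bit 1: pull the receiver's arm to force a collision. Bit 0: stay away.
    return [receiver_arm if b == 1 else sender_idle_arm for b in bits]

def receive_bits(sender_pulls, receiver_arm):
    """Receiver stays on its arm and reads the collision indicator each round."""
    # The receiver observes 1{no collision}; a collision decodes to bit 1.
    no_collision = [0 if arm == receiver_arm else 1 for arm in sender_pulls]
    return [1 - nc for nc in no_collision]
```

Each transmitted bit costs one round, which is why the total length of communication matters in the regret.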
Algorithm structure
Algorithm 1: SIC-MMAB
Initialization phase
for p = 1, …, ∞ do
    Exploration phase p for $2^p$ rounds
    Communication phase p
    Accept/reject (sub)-optimal arms
end
Exploitation phase: pull optimal arms until T
Initialization phase
Orthogonalize players: Musical Chairs for K log(T) rounds [Rosenski et al., 2016]
  Sample arm k uniformly at random
  If collision → continue
  If no collision → stick to arm k until round K log(T)
With probability 1 − M/T, all players end on different arms
Compute M and rank j: Sequential Hopping
  player on arm k waits for 2k rounds
  player then hops for 2(K − k) rounds
  M − 1 = number of collisions, and
  j − 1 = number of collisions during the first 2k rounds
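Both steps of the initialization can be simulated with an omniscient view of all players. This is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, `musical_chairs` indexes arms from 0, and `sequential_hopping` indexes them from 1 to K to match the "waits for 2k rounds" rule.

```python
import random

def musical_chairs(n_arms, n_players, n_rounds, rng):
    """Return the arm each player is fixed on (None if still unfixed)."""
    fixed = [None] * n_players
    for _ in range(n_rounds):
        # Fixed players keep their arm, the others sample uniformly.
        pulls = [a if a is not None else rng.randrange(n_arms) for a in fixed]
        for j, arm in enumerate(pulls):
            if fixed[j] is None and pulls.count(arm) == 1:
                fixed[j] = arm  # no collision: stick to this arm
    return fixed

def sequential_hopping(n_arms, start_arms):
    """start_arms: distinct arms in 1..K. Returns (M, rank) for each player."""
    def position(k, t):
        # Wait on arm k for the first 2k rounds, then hop by one arm per round.
        if t <= 2 * k:
            return k
        return (k + (t - 2 * k) - 1) % n_arms + 1

    results = []
    for k in start_arms:
        total = early = 0
        for t in range(1, 2 * n_arms + 1):
            others = [position(o, t) for o in start_arms if o != k]
            if position(k, t) in others:
                total += 1
                if t <= 2 * k:
                    early += 1  # collision while still waiting
        results.append((total + 1, early + 1))
    return results
```

In this simulation each pair of players collides exactly once, so every player recovers M from its own collision count, and the collisions seen while waiting count the players sitting on smaller arms.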
Exploration phase p
Each player explores each arm for $2^p$ rounds
Players start at different positions given by their ranks
Sequential hopping → no collision
Player j has gathered statistics on arm k:
  $S^j_k(p)$: number of rewards equal to 1
  $T^j_k(p)$: number of pulls
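A collision-free exploration schedule can be sketched as a round-robin shifted by the rank. This is a simplified illustration that assumes all K arms are still active and uses 0-based ranks and arms; `exploration_schedule` is a hypothetical name.

```python
def exploration_schedule(n_arms, rank, phase):
    """Arms pulled by the player of given rank (0-based) during phase p."""
    n_rounds = n_arms * 2 ** phase  # each arm is pulled 2^p times
    # Players start at an offset given by their rank and hop by one arm per round.
    return [(rank + t) % n_arms for t in range(n_rounds)]
```

Because distinct ranks give distinct offsets modulo K, two players never pull the same arm in the same round as long as M ≤ K.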
Communication phase p
Player i communicates $S^i_k(p) \in [2^p]$ to player j:
  encoded in p bits, e.g. (0, 1, 0, …, 0)
  sent in p rounds: (no coll., coll., no coll., …, no coll.)
Players communicate one at a time
They know when and how to do so, thanks to their ranks j
Possible quantization for non-binary rewards
Length of communication phase p: $K M^2 p$
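The encoding step and the phase length can be sketched as below. The helper names are hypothetical, and the sketch assumes the statistic fits in the chosen number of bits.

```python
def encode(value, n_bits):
    """Big-endian binary encoding of an integer on n_bits bits."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in n_bits")
    return [(value >> i) & 1 for i in reversed(range(n_bits))]

def decode(bits):
    """Inverse of encode: rebuild the integer from its bits."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value

def communication_length(n_arms, n_players, phase):
    """K * M^2 * p rounds, the length stated on the slide."""
    return n_arms * n_players ** 2 * phase
```

The length follows from counting: each of the M players sends K statistics of p bits to each other player, one bit per round.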
Accept/Eliminate (sub)-optimal arms
All players have the same centralized empirical means µ̂k
Concentration inequality (Hoeffding)
With high probability, $|\mu_k - \hat\mu_k| \le \sqrt{2\log(T)/T_k(p)}$
→ arm k is detected as better than arm l if:
$$\hat\mu_k - \sqrt{2\log(T)/T_k(p)} \ge \hat\mu_l + \sqrt{2\log(T)/T_l(p)}$$
This happens after about $\frac{\log(T)}{(\mu_k - \mu_l)^2}$ pulls
Accept/Eliminate (sub)-optimal arms
Arm k is sub-optimal if M arms are detected as better
→ eliminated from the set to explore
Arm k is optimal if K − M arms are detected as worse
→ attributed to the player with largest rank
→ exploration ends after $N = \log\Big(\frac{\log(T)}{(\mu_M - \mu_{M+1})^2}\Big)$ phases
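The accept/reject rule can be sketched with confidence intervals of radius $\sqrt{2\log(T)/T_k}$. This is a simplified, one-shot version that treats all K arms and M players as active; `classify_arms` and its arguments are hypothetical names.

```python
import math

def classify_arms(mu_hat, pulls, n_players, horizon):
    """Split arms into accepted, rejected and still-active ones."""
    radius = [math.sqrt(2 * math.log(horizon) / n) for n in pulls]
    lower = [m - r for m, r in zip(mu_hat, radius)]
    upper = [m + r for m, r in zip(mu_hat, radius)]
    n_arms = len(mu_hat)
    accepted, rejected, active = [], [], []
    for k in range(n_arms):
        # Arms detected better than k: their lower bound clears k's upper bound.
        better = sum(lower[l] >= upper[k] for l in range(n_arms) if l != k)
        # Arms detected worse than k: k's lower bound clears their upper bound.
        worse = sum(lower[k] >= upper[l] for l in range(n_arms) if l != k)
        if better >= n_players:
            rejected.append(k)
        elif worse >= n_arms - n_players:
            accepted.append(k)
        else:
            active.append(k)
    return accepted, rejected, active
```

With few pulls the intervals overlap and every arm stays active; the decision only becomes possible once the radii shrink below the gaps between means.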
Regret bound
Initialization: M × length ≃ $MK\log(T)$
Communication: $M \times \sum_{p=1}^{N} p M^2 K \simeq M^3 K \log^2\Big(\frac{\log(T)}{(\mu_M - \mu_{M+1})^2}\Big)$
Exploration: centralized regret bound $\sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k}$
Low-probability events: $o(\log(T))$
Total regret
$$R_T \lesssim \sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k} + MK\log(T)$$
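To get a feel for the relative size of the terms, they can be evaluated numerically. The sketch below drops all constants, uses natural logarithms throughout (an assumption), and expects the means sorted in decreasing order with $\mu_M > \mu_{M+1}$; `sic_mmab_bound_terms` is a hypothetical name.

```python
import math

def sic_mmab_bound_terms(mus, n_players, horizon):
    """Leading terms of the bound, up to constants; mus sorted in decreasing order."""
    M, K, log_t = n_players, len(mus), math.log(horizon)
    mu_m = mus[M - 1]
    # Centralized exploration term: sum over sub-optimal arms k > M.
    exploration = sum(log_t / (mu_m - mu) for mu in mus[M:])
    initialization = M * K * log_t
    gap = mus[M - 1] - mus[M]
    # Communication term: grows like log^2(log T), hence sublogarithmic in T.
    communication = M ** 3 * K * math.log(log_t / gap ** 2) ** 2
    return {
        "exploration": exploration,
        "initialization": initialization,
        "communication": communication,
    }
```

Squaring the horizon doubles the exploration and initialization terms, while the communication term grows far more slowly.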
Contradiction with lower bounds
Contradict the lower bound?
Recall
Lower bound: $M \sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k}$
SIC-MMAB: $\sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k} + KM\log(T)$
Why this contradiction?
Lower bound proofs assumed that the best algorithms do not collide
Wrong: SIC-MMAB deduces a lot of information from collisions
Decentralized is as hard as centralized
Towards a better model?
SIC-MMAB uses unrealistic/undesired communication protocols
It exploits a loophole in the model that allows them
Need for a better model, without such a loophole
Which model assumption went wrong?
  collision sensing?
No sensing setting
Assumption: known lower bound $\mu_k \ge \mu_{\min} > 0$
Observation: we can send a bit with high probability in $\log(T)/\mu_{\min}$ rounds
Algo 1: SIC-MMAB with $\log(T)/\mu_{\min}$ communication rounds instead of 1
  communication regret becomes $M^3 K \frac{\log(T)}{\mu_{\min}} \log^2(\log(T))$
Algo 2: limited and different communication
  do not communicate statistics, only signal when an arm is found (sub)-optimal
  regret in $M \sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k} + \frac{MK^2}{\mu_{\min}} \log(T)$
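The observation about sending a bit without collision sensing can be sketched as follows. The helper names are hypothetical. The sender either collides during the whole block (bit 1) or stays away (bit 0), and the receiver only sees its rewards.

```python
import math

def rounds_per_bit(horizon, mu_min):
    """Number of rounds used to send one bit: log(T) / mu_min, rounded up."""
    return math.ceil(math.log(horizon) / mu_min)

def decode_bit(rewards):
    """Receiver's rule: only zero rewards are read as a collision, i.e. bit 1."""
    return 1 if all(r == 0 for r in rewards) else 0

def error_probability(n_rounds, mu):
    """Chance that a 0 bit is misread: the free arm returns n_rounds zeros."""
    return (1 - mu) ** n_rounds
```

Since $(1 - \mu)^n \le e^{-\mu n}$, a block of $\log(T)/\mu_{\min}$ rounds keeps the error probability below 1/T for any arm with mean at least $\mu_{\min}$.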
Towards a better model?
SIC-MMAB uses unrealistic/undesired communication protocols
It exploits a loophole in the model that allows such protocols
Which model assumption went wrong?
  collision sensing?
  cooperative players? (work in progress)
  synchronisation between players? → more realistic dynamic model
Dynamic setting: DYN-MMAB
Dynamic Model
Asynchronicity assumption
Player j enters the game at an unknown time $\tau_j \in [T]$ and stays until T.
Varying and unknown set of players $\mathcal{M}(t)$
No synchronisation ⟹ similar protocols are not possible
No sensing setting
A dynamic algorithm
Only 2 different states:
  Exploration: sample an arm uniformly at random
  Exploitation: occupy some optimal arm until T
Three difficulties:
  1. Detect arms occupied by other players
  2. Estimate the best available arm
  3. Start occupying the best available arm
Detect occupied arms
If k is occupied, rewards are only 0
If k is not occupied, positive reward with probability $\mu_k \big(1 - \frac{1}{K}\big)^{M_t - 1} \ge \frac{\mu_k}{e}$
For an occupied arm k:
  if $\mu_k$ is tightly estimated: after $\simeq \frac{e \log(T)}{\mu_k}$ successive 0s, k is assumed occupied
  otherwise, $\hat\mu_k$ will quickly drop to 0 and k will become sub-optimal
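The detection rule can be sketched as a run-length test on the rewards collected on one arm. The names below are hypothetical, and the current estimate `mu_hat` is assumed to be close to $\mu_k$.

```python
import math

def zeros_threshold(horizon, mu_hat):
    """Number of successive zeros after which an arm is declared occupied."""
    return math.ceil(math.e * math.log(horizon) / mu_hat)

def is_declared_occupied(rewards, horizon, mu_hat):
    """True once the latest rewards on the arm form a long enough run of zeros."""
    run = 0
    for r in reversed(rewards):
        if r != 0:
            break
        run += 1
    return run >= zeros_threshold(horizon, mu_hat)
```

On a free arm each pull is positive with probability at least $\mu_k/e$, so a run of $e\log(T)/\mu_k$ zeros has probability at most $(1 - \mu_k/e)^{e\log(T)/\mu_k} \le 1/T$.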
Estimate available arms
Players sample uniformly at random ⟹ $\mathbb{E}[r_k(t)] = \mu_k \big(1 - \frac{1}{K}\big)^{M_t - 1}$
Player estimates $\gamma_t \mu_k$ where $\gamma_t = \frac{1}{t} \sum_{s=\tau_j+1}^{\tau_j+t} \big(1 - \frac{1}{K}\big)^{M_s - 1}$
$\mu_k \ge \mu_l \iff \gamma_t \mu_k \ge \gamma_t \mu_l$
Concentration inequalities for $\gamma_t \mu_k$ (when k is still free)
$\gamma_t \ge \frac{1}{e}$ ⟹ estimating $\gamma_t \mu_k$ instead of $\mu_k$ takes roughly the same time
Player detects the best available arm k after time $O\Big(\frac{K \log(T)}{(\mu_k - \mu_{k+1})^2}\Big)$
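The scaling factor $\gamma_t$ can be computed directly when the number of players at each round is known (the player itself never observes it). This sketch uses the exponent $M_s - 1$ that appears in the expected reward $\mathbb{E}[r_k(t)]$; the function name `gamma` is hypothetical.

```python
import math

def gamma(n_arms, players_per_round):
    """Average of (1 - 1/K)^(M_s - 1) over the rounds since the player arrived."""
    factors = [(1 - 1 / n_arms) ** (m - 1) for m in players_per_round]
    return sum(factors) / len(factors)
```

Since the same factor multiplies every arm still free, it preserves their ordering, and with at most K players it never falls below 1/e.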
Occupy best available arm
Once an arm is detected as best available → try to occupy it
Continue sampling uniformly at random
Positive reward → occupy that arm
Observe only 0 rewards?
  detect it as occupied
  continue exploration until the next available arm
At some point, the player succeeds in occupying an arm, while all better arms are occupied
Regret bound
New regret definition:
$$R_T = \sum_{t=1}^{T} \sum_{k=1}^{\mathrm{card}(\mathcal{M}(t))} \mu_k - \mathbb{E}_\mu\Big[\sum_{t=1}^{T} \sum_{j \in \mathcal{M}(t)} r^j(t)\Big]$$
Dynamic regret bound
$$R_T \lesssim \overbrace{\frac{MK\log(T)}{\bar\Delta_M^2}}^{\text{detection of optimal arms}} + \overbrace{\frac{M^2 K \log(T)}{\mu_M}}^{\text{detection of occupied arms}}$$
with $\bar\Delta_M = \min_{k \le M} (\mu_k - \mu_{k+1})$
Drawback: quadratic dependence in Δ (due to uniform sampling)
Some related works (in random order)
Adversarial case
[Bubeck et al., 2019] considered adversarial rewards $X_k(t)$
$\sqrt{T}$ regret for 2 players
Uses a communication trick to coordinate players:
  one with high-frequency switches
  the other with low-frequency switches
Improving SIC-MMAB
Heterogeneous case: [Boursier et al., 2019]
  Arm means $\mu^j_k$ differ between players
  Improvement of the communication protocol: a leader gathers the information and decides for the others
  Do not eliminate arms, but player-arm pairs (j, k)
Optimal algorithm for the homogeneous case: [Proutiere and Wang, 2019]
  initialization in constant time (in T)
  exploration only by the leader
  regret ≤ $\sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k} + o(\log(T))$
Confirms: decentralized is as hard as centralized
Other recent works
Heterogeneous case:
  similar protocols [Tibrewal et al., 2019]
  implicit communication through Markov chains [Bistritz and Leshem, 2018]
  arms have preferences over players [Liu et al., 2019]
No sensing [Lugosi and Mehrabian, 2018]
Collision only implies a drop in reward [Magesh and Veeravalli, 2019]
Recap & Open questions
Recap:
  Synchronisation allows communication protocols
  This contradicts previous lower bounds: decentralized ∼ centralized
  Synchronisation is a loophole in the model and has to be removed
  More realistic dynamic model: first logarithmic regret algorithm
Open questions:
  Is the dynamic setting a perfect choice?
  Room for improvement in hard settings (statistic sensing, adversarial rewards, heterogeneous, dynamic, etc.)
Thank you!
References
Anantharam, V., Varaiya, P., and Walrand, J. (1987). Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-part I: I.I.D. rewards. IEEE Transactions on Automatic Control, 32(11):968–976.
Besson, L. and Kaufmann, E. (2018). Multi-Player Bandits Revisited. In Algorithmic Learning Theory, Lanzarote, Spain.
Bistritz, I. and Leshem, A. (2018). Distributed multi-player bandits: a game of thrones approach. In Advances in Neural Information Processing Systems, pages 7222–7232.
Boursier, E., Kaufmann, E., Mehrabian, A., and Perchet, V. (2019). A practical algorithm for multiplayer bandits when arm means vary among players. arXiv preprint arXiv:1902.01239.
Bubeck, S., Li, Y., Peres, Y., and Sellke, M. (2019). Non-stochastic multi-player multi-armed bandits: Optimal rate with collision information, sublinear without. arXiv preprint arXiv:1904.12233.
Komiyama, J., Honda, J., and Nakagawa, H. (2015). Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays. In International Conference on Machine Learning, pages 1152–1161.
Liu, K. and Zhao, Q. (2010). Distributed learning in multi-armed bandit with multiple players. IEEE Transactions on Signal Processing, 58(11):5667–5681.
Liu, L., Mania, H., and Jordan, M. (2019). Competing bandits in matching markets. arXiv preprint arXiv:1906.05363.
Lugosi, G. and Mehrabian, A. (2018). Multiplayer bandits without observing collision information. arXiv preprint arXiv:1808.08416.
Magesh, A. and Veeravalli, V. (2019). Multi-player multi-armed bandits with non-zero rewards on collisions for uncoordinated spectrum access. arXiv preprint arXiv:1910.09089.
Proutiere, A. and Wang, P. (2019). An optimal algorithm in multiplayer multi-armed bandits.
Rosenski, J., Shamir, O., and Szlak, L. (2016). Multi-player bandits: a musical chairs approach. In International Conference on Machine Learning, pages 155–163.
Tibrewal, H., Patchala, S., Hanawal, M., and Darak, S. (2019). Distributed learning and optimal assignment in multiplayer heterogeneous networks. In IEEE INFOCOM 2019-IEEE Conference on Computer Communications, pages 1693–1701. IEEE.