“multi-players bandit algorithms for internet of things networks” · 2019. 11. 28. ·...

120
“Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD defense at CentraleSupélec (Rennes) Wednesday 20th of November, 2019 Supervisors : Prof. Christophe Moy at SCEE team, IETR & CentraleSupélec Dr. Émilie Kaufmann at SequeL team, CNRS & Inria, in Lille

Upload: others

Post on 14-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

“Multi-players Bandit Algorithmsfor Internet of Things Networks”

By Lilian Besson

PhD defense at CentraleSupélec (Rennes)

Wednesday 20th of November, 2019

Supervisors: Prof. Christophe Moy at SCEE team, IETR & CentraleSupélec Dr. Émilie Kaufmann at SequeL team, CNRS & Inria, in Lille

Page 2: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Introduction:

Spectrum issuesin wireless networks

Ref: Chapter 1 of my thesis.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 2/ 52

Page 3: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

All spectrum is allocated to different applications

But all zones are not always used everywhere

What if we could dynamically use the (most) empty channels?

Free

United States of North America, Department of Commerce, © 16

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 3/ 52

.Wireless networks

Page 4: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Wireless networks. . .

We focus on Internet of Things networks (IoT) in unlicensed bands.→ networks with decentralized access. . .→ many wireless devices access a wireless network

served from one access pointthe base station is not affecting devices to radio resources. . .

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 4/ 52

.Target of this study

Page 5: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Wireless networks. . .

We focus on Internet of Things networks (IoT) in unlicensed bands.→ networks with decentralized access. . .→ many wireless devices access a wireless network

served from one access pointthe base station is not affecting devices to radio resources. . .

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 4/ 52

.Target of this study

Page 6: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Main constraints

decentralized: devices initiate transmission

can be in unlicensed radio bands

massive number of devices

long range

ultra-low power devices

low duty cycle

low data rate

Images from http://IBM.com/blogs/internet-of-things/what-is-the-iot and http://www.globalsign.com/en/blog/

connected-cows-and-crop-control-to-drones-the-internet-of-things-is-rapidly-improving-agriculture/

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 5/ 52

.The “Internet of Things”

Page 7: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Can the IoT devices optimize their access to the radio resourcesin a simple, efficient, automatic and decentralized way?In a given location, and a given time, for a given radio standard. . .

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 6/ 52

.Main questions

Page 8: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Can the IoT devices optimize their access to the radio resourcesin a simple, efficient, automatic and decentralized way?In a given location, and a given time, for a given radio standard. . .

Goal: increase the battery life of IoT devices

Fight the spectrum scarcity issue by using the spectrum moreefficiently than a static or uniformly random allocation

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 6/ 52

.Main questions

Page 9: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Can the IoT devices optimize their access to the radio resourcesin a simple, efficient, automatic and decentralized way?In a given location, and a given time, for a given radio standard. . .

Goal: increase the battery life of IoT devices

Fight the spectrum scarcity issue by using the spectrum moreefficiently than a static or uniformly random allocation

Main solutions !

Yes we can!

By letting the radio devicesbecome “intelligent”

With MAB algorithms !

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 6/ 52

.Main questions

Page 10: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Outline of thispresentation

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 7/ 52

Page 11: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Chapter 1Introduction

Chapter 2The Stochastic

Multi-Armed Bandit models

Chapter 3SMPyBandits: simulation

library for MAB

Chapter 4Online selection

of the best algorithm

Chapter 5Two MAB modelsfor IoT networks

Chapter 6Multi-players

Multi-Armed Bandits

Chapter 7Piece-Wise StationaryMulti-Armed Bandits

Chapter 8General Conclusion

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 8/ 52

.Contributions of my thesis highlighted today

Page 12: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Introduction. Spectrum issues in wireless networks

Part I. Selfish MAB learning in a new model of IoT network

Part II. Two tractable problems extending the classical bandit multi-player bandits in stationary settings single-player bandits in piece-wise stationary settings

Conclusion and perspectives

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 9/ 52

.Outline of this presentation

Page 13: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Part I.

Selfish MAB Learning

in IoT Networks

Ref: Chapter 5 of my thesis, and [Bonnefoi, Besson et al, 17].

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 10/ 52

Page 14: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

We control a lot of IoT devices

We want to insert them in an already crowded wireless network

Within a protocol slotted in time and frequency

Each device / has a low duty cycle ex: few messages per day

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 11/ 52

.We want

Page 15: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

We control a lot of IoT devices

We want to insert them in an already crowded wireless network

Within a protocol slotted in time and frequency

Each device / has a low duty cycle ex: few messages per day

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 11/ 52

.We want

Page 16: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Discrete time t ∈ N∗ and K radio channels (e.g., 10) (known)

Chosen protocol: uplink messages ր followed by acknowledgements ց[Bonnefoi, Besson et al, 17], Sec.5.2

D dynamic devices trying to access the network independently

S = S1 + · · · + SK static devices occupying the network:S1, . . . , SK in each channel 1, . . . , K (unknown)

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 12/ 52

.A new model for IoT networks

Page 17: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

1st case: Successful transmission if no collision on uplink messages ր !

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 13/ 52

.Protocol: decentralized access with Ack. mode

Page 18: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

2nd case: Failed transmission if collision on uplink messages ր. . .

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 13/ 52

.Protocol: decentralized access with Ack. mode

Page 19: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Emission model for IoT devices with low duty cycle

Each device / has the same low emission probability:each step, each device sends a packet with probability p

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 14/ 52

.Our model of IoT devices [Bonnefoi, Besson et al, 17]

Page 20: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Emission model for IoT devices with low duty cycle

Each device / has the same low emission probability:each step, each device sends a packet with probability p

Background stationary ambiant traffic

Each static device uses only one channel (Sk devices in channel k)

Their repartition is fixed in time

=⇒ This surrounding traffic is disturbing the dynamic devices

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 14/ 52

.Our model of IoT devices [Bonnefoi, Besson et al, 17]

Page 21: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Emission model for IoT devices with low duty cycle

Each device / has the same low emission probability:each step, each device sends a packet with probability p

Background stationary ambiant traffic

Each static device uses only one channel (Sk devices in channel k)

Their repartition is fixed in time

=⇒ This surrounding traffic is disturbing the dynamic devices

Dynamic radio reconfiguration

Dynamic device decide the channel to use to send their packets

They all have memory and computational capacity to implementsmall decision algorithms

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 14/ 52

.Our model of IoT devices [Bonnefoi, Besson et al, 17]

Page 22: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Goal minimize packet loss ratio (max = number of received Ack)

in a finite-space discrete-time Decision Making Problem

Baseline (naive solution)

Purely random (uniform) spectrum access for the D dynamic devices .

A possible solution

Embed a decentralized Multi-Armed Bandit algorithm, runningindependently on each dynamic device .

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 15/ 52

.Problem

Page 23: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

If an oracle can affect Dk dynamic devices to channel k , thesuccessful transmission probability of the entire network is

P(success|sent) =K∑

k=1

(1 − p)Dk −1

︸ ︷︷ ︸Dk −1 others

× (1 − p)Sk

︸ ︷︷ ︸No static device

× Dk/D︸ ︷︷ ︸Sent in channel k

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 16/ 52

.1) Oracle centralized strategy [Bonnefoi, Besson et al, 17]

Page 24: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

If an oracle can affect Dk dynamic devices to channel k , thesuccessful transmission probability of the entire network is

P(success|sent) =K∑

k=1

(1 − p)Dk −1

︸ ︷︷ ︸Dk −1 others

× (1 − p)Sk

︸ ︷︷ ︸No static device

× Dk/D︸ ︷︷ ︸Sent in channel k

The oracle has to solve this optimization problem:

arg maxD1,...,DK

K∑k=1

Dk(1 − p)Sk +Dk −1

such thatK∑

k=1

Dk = D and Dk ≥ 0, ∀1 ≤ k ≤ K .

Contribution: a (numerical) solver for this quasi-convex optimizationproblem, with Lagrange multipliers.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 16/ 52

.1) Oracle centralized strategy [Bonnefoi, Besson et al, 17]

Page 25: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

=⇒ This oracle strategy has very good performance, as it maximizes thetransmission rate of all the D dynamic devices

But unrealistic

But not achievable in practice!

because there is no centralized supervision!

and (S1, . . . , SK ) are unknown!

We propose a realistic decentralized approach, with bandits!

MachineLearning

ReinforcementLearning

Multi-ArmedBandits

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 17/ 52

.1) Oracle centralized strategy

Page 26: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

It’s an old name for a casino machine !

© Dargaud 1981, Lucky Luke tome 18,.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 18/ 52

.Hum, what is a (one-armed) bandit?

Page 27: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

A player tries to collect rewards when playing a K -armed bandit game.

At each round t ∈ 1, . . . , T player chooses an arm A(t) ∈ 1, . . . , K the arm generates an i.i.d. reward rA(t)(t) ∼ νA(t)

Ex: from a Bernoulli distribution νk = B(µk)

player observes the reward rA(t)(t)

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 19/ 52

.Stochastic Multi-Armed Bandit formulation

Page 28: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

A player tries to collect rewards when playing a K -armed bandit game.

At each round t ∈ 1, . . . , T player chooses an arm A(t) ∈ 1, . . . , K the arm generates an i.i.d. reward rA(t)(t) ∼ νA(t)

Ex: from a Bernoulli distribution νk = B(µk)

player observes the reward rA(t)(t)

Goal (Reinforcement Learning)

Maximize the sum reward or its expectation

maxA

T∑

t=1

rA(t) or maxA E

[T∑

t=1

rA(t)

].

[Bubeck, 12], [Lattimore & Szepesvári, 19], [Slivkins, 19]

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 19/ 52

.Stochastic Multi-Armed Bandit formulation

Page 29: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

A dynamic device tries to collect rewards when transmitting:

it transmits following a random Bernoulli process(ie. probability p of transmitting at each round t)

it chooses a channel A(τ) ∈ 1, . . . , K (= arm ) if Ack (no collision) =⇒ reward rA(τ) = 1 (successful transm.!) if collision (no Ack) =⇒ reward rA(τ) = 0 (failed transmission!)

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 20/ 52

.2) Pseudo MAB formulation of our IoT problem

Page 30: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

A dynamic device tries to collect rewards when transmitting:

it transmits following a random Bernoulli process(ie. probability p of transmitting at each round t)

it chooses a channel A(τ) ∈ 1, . . . , K (= arm ) if Ack (no collision) =⇒ reward rA(τ) = 1 (successful transm.!) if collision (no Ack) =⇒ reward rA(τ) = 0 (failed transmission!)

Goal: Maximize transmission rate ≡ maximize cumulated rewards

It is not a stochastic Multi-Armed Bandit problem

It looks like a MAB but the environment is not stochastic or stationary

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 20/ 52

.2) Pseudo MAB formulation of our IoT problem

Page 31: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

A dynamic device keeps τ number of sent packets1 For the first K activations (τ = 1, . . . , K ), try each channel once.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 21/ 52

.2) Upper Confidence Bound algorithm [Auer et al, 02]

Page 32: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

A dynamic device keeps τ number of sent packets1 For the first K activations (τ = 1, . . . , K ), try each channel once.2 Then for the next steps t:

With probability p, the device is active (τ := τ + 1)

Compute the index UCBk(τ) :=

Mean µk (τ)︷ ︸︸ ︷Xk(τ)Nk(τ)

+

Confidence Bonus︷ ︸︸ ︷√log(τ)2Nk(τ)

,

Choose channel A(τ) = arg maxk

UCBk(τ),

Observe reward rA(τ)(τ) from arm A(τ) Update Nk(τ + 1) nb selections of channel k Update Xk(τ) nb of successful transmissions

Wait for next message. . . (mean waiting time ≃ 1/p)

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 21/ 52

.2) Upper Confidence Bound algorithm [Auer et al, 02]

Page 33: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

1 For any dynamic device , for any round t: With probability p, the device is active (τ := τ + 1) Play UCB algorithm. . . [Auer et al, 02] Wait for next message. . . (mean waiting time ≃ 1/p)

Problem 1: multiple dynamic devices

The collisions between dynamic devices are not stochastic!

Problem 2: random activation times τ? Devices transmits only with probability p at each time t

(following its Bernoulli activation pattern)

The times τ are not the global time indexes t (synchronized clock) !

=⇒ These two problems make the model hard to analyze !

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 21/ 52

.2) Upper Confidence Bound algorithm [Auer et al, 02]

Page 34: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

K = 10 channels ,

S + D = 10000 devices in total,

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 22/ 52

.Experimental setting: simulation parameters

Page 35: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

K = 10 channels ,

S + D = 10000 devices in total,

p = 10−3 probability of emission,

Horizon T = 105 total time slots (avg. ≃ 100 messages / device),

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 22/ 52

.Experimental setting: simulation parameters

Page 36: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

K = 10 channels ,

S + D = 10000 devices in total,

p = 10−3 probability of emission,

Horizon T = 105 total time slots (avg. ≃ 100 messages / device),

We change the proportion of dynamic devices D / (S + D ),

For one example of repartition of (S1, . . . , SK ) static devices .

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 22/ 52

.Experimental setting: simulation parameters

Page 37: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Number of slots ×1052 4 6 8 10

Successfu

l tr

ansm

issio

n r

ate

0.82

0.83

0.84

0.85

0.86

0.87

0.88

0.89

0.9

0.91

UCBThompson-samplingOptimalGood sub-optimalRandom

10% of dynamic devices . Gives 7% of gain.[Bonnefoi, Besson et al, 17], Sec.5.2

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 23/ 52

.One result for 10% of dynamic devices

Page 38: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Proportion of dynamic devices (%)0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Gain

com

pare

d to r

andom

channel sele

ction

-0.02

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

Optimal strategyUCB

1, α=0.5

Thomson-sampling

The MAB selfish learning is almost optimal, for any proportion ofdynamic devices , after a short learning time.

In this example, it gives up-to 16% gain over the naive approach!

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 24/ 52

.Growing proportion of dynamic devices D/(S + D)

Page 39: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

We developed a realistic demonstration using USRP boards and GNURadio, as a proof-of-concept in a “toy” IoT network.

[Bonnefoi et al, ICT 18], [Besson et al, WCNC 19], Ch.5.3 and video published on YouTu.be/HospLNQhcMk

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 25/ 52

.We implemented this with real hardware (1/3)

Page 40: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 26/ 52

.Using USRP board to simulate IoT devices (2/3)

Page 41: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 27/ 52

.GNU Radio for the UI of the demo (3/3)

Page 42: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

It works very well empirically! But random activation times andcollisions due to multiple devices make the model hard to analyze. . .

Hyp 1: in avg. p × D dynamic devices are using K channels=⇒ so p ≤ K

Dor D ≤ K

pgives best performance

Hyp 2: we assumed a stationary background traffic . . .

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 28/ 52

.From practice to theory

Page 43: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

It works very well empirically! But random activation times andcollisions due to multiple devices make the model hard to analyze. . .

Hyp 1: in avg. p × D dynamic devices are using K channels=⇒ so p ≤ K

Dor D ≤ K

pgives best performance

Hyp 2: we assumed a stationary background traffic . . .

Goal: obtain theoretical result for our proposed model of IoT networks,and guarantees about the observed behavior of Selfish MAB learning.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 28/ 52

.From practice to theory

Page 44: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

It works very well empirically! But random activation times andcollisions due to multiple devices make the model hard to analyze. . .

Hyp 1: in avg. p × D dynamic devices are using K channels=⇒ so p ≤ K

Dor D ≤ K

pgives best performance

Hyp 2: we assumed a stationary background traffic . . .

Goal: obtain theoretical result for our proposed model of IoT networks,and guarantees about the observed behavior of Selfish MAB learning.

We can study theoretically two more specific models

Model 1: multi-player bandits: devices are always activatedie. p = 1 in their random activation process =⇒ D = M ≤ K

p= K

Model 2: non-stationary bandits (for one device )

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 28/ 52

.From practice to theory

Page 45: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Part II.

Theoretical analysis oftwo relaxed models

Ref: Chapters 6 and 7 of my thesisand [Besson & Kaufmann, 18] and [Besson et al, 19].

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 29/ 52

Page 46: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Theoretical analysis oftwo relaxed models

Multi-player banditsRef: Chapter 6 of my thesis, and [Besson & Kaufmann, 18].

Piece-wise stationary bandits

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 30/ 52

Page 47: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

M ≥ 2 players playing the same K -armed bandit (2 ≤ M ≤ K )they are all activated at each time step, ie. p = 1

At round t ∈ 1, . . . , T:

player m selects arm Amt ; then this arm generates sAm

t ,t ∈ 0, 1 and the reward is computed as

rm,t =

sAm

t ,t if no other player chose the same arm0 else (= COLLISION)

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 31/ 52

.Multi-players bandits: setup

Page 48: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

M ≥ 2 players playing the same K -armed bandit (2 ≤ M ≤ K )they are all activated at each time step, ie. p = 1

At round t ∈ 1, . . . , T:

player m selects arm Amt ; then this arm generates sAm

t ,t ∈ 0, 1 and the reward is computed as

rm,t =

sAm

t ,t if no other player chose the same arm0 else (= COLLISION)

Goal

maximize centralized (sum) rewardsM∑

m=1

T∑t=1

rm,t

. . . without (explicit) communication between players

trade-off: exploration / exploitation / and collisions !

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 31/ 52

.Multi-players bandits: setup

Page 49: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Different observation models: players observe sAmt ,t and/or rm,t

# 1: “Listen before talk” [Liu & Zhao, 10], [Jouini et al. 10], [Anandkumar et al. 11]

Good model for Opportunistic Spectrum Access (OSA)

First do sensing, attempt of transmission if no Primary User (PU),possible collisions with other Secondary Users (SU).

Feedback model: observe first sAm

t ,t , if sAm

t ,t = 1, transmit and then observe the joint rm,t , else don’t transmit and don’t observe a reward.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 32/ 52

.Multi-Players bandits for Cognitive Radios

Page 50: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

# 2: “Talk and maybe collide” [Besson & Kaufmann, 18]

Good model for Internet of Things (IoT)

Do not do any sensing, just transmit, and wait for anacknowledgment before any next message.

Feedback model: observe only the joint information rm,t , no collision if rm,t 6= 0, but cannot distinguish between collision or zero reward if rm,t = 0.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 33/ 52

.M-P bandits for Cognitive Radios: proposed models

Page 51: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

# 2: “Talk and maybe collide” [Besson & Kaufmann, 18]

Good model for Internet of Things (IoT)

Do not do any sensing, just transmit, and wait for anacknowledgment before any next message.

Feedback model: observe only the joint information rm,t , no collision if rm,t 6= 0, but cannot distinguish between collision or zero reward if rm,t = 0.

# 3: “Observe collision then talk?” [Besson & Kaufmann, 18], [Boursier et al, 19]

A third “hybrid” model studied by several recent papers, following ourwork

Feedback model: first check if collision, then if not collision, receive joint reward rm,t .

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 33/ 52

.M-P bandits for Cognitive Radios: proposed models

Page 52: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Hypothesis: arms sorted by decreasing mean: µ1 ≥ µ2 ≥ · · · ≥ µK

Rµ(A, T ) :=

(M∑

k=1

µk

)T

︸ ︷︷ ︸oracle total reward

−EAµ

[T∑

t=1

M∑

m=1

rm,t

]

Regret decomposition [Besson & Kaufmann, 18]

Rµ(A, T ) =K∑

k=M+1

(µM − µk)E[Nk(T )]

+M∑

k=1

(µk − µM) (T − E[Nk(T )]) +K∑

k=1

µkE[Ck(T )].

Nk(T ) total number of selections of arm k

Ck(T ) total number of collisions experienced on arm k

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 34/ 52

.Regret for multi-player bandits (M players on K arms)

Page 53: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Regret decomposition [Besson & Kaufmann, 18]

Rµ(A, T ) ≤ cstK∑

k=M+1

E [Nk(T )] + cst’M∑

k=1

E [Ck(T )] .

A good algorithm has to control both

the number of selections of sub-optimal arms→ with a good classical bandit policy: like kl-UCB

the number of collisions on optimal arms→ with a good orthogonalization procedure

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 34/ 52

.Regret for multi-player bandits (M players on K arms)

Page 54: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

At round t, player m uses his past sensing information to:

compute an Upper Confidence Bound for each mean µk , UCBmk (t)

use the UCBs to estimate the M best arms

Mm(t) := arms with M largest UCBmk (t)

Two simple ideas: inspired by Musical Chair [Rosenski et al. 16]

always pick an arm estimated as “good” Am(t) ∈ Mm(t − 1)

try not to switch arm too often

σm(t) := player m is “fixed” at the end of round t

Other UCB-based algorithms: TDFS [Lui and Zhao, 10],Rho-Rand [Anandkumar et al., 11], Selfish [Bonnefoi, Besson et al., 17]

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 35/ 52

.The MC-Top-M algorithm (for the OSA case)

Page 55: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

(0) Start t = 0

Not fixed, σm(t) (2) Cm(t), Am(t) ∈ Mm(t)

(3) Am(t) /∈ Mm(t)

Fixed, σm(t)

(1) Cm(t), Am(t) ∈ Mm(t)

(4) Am(t) ∈ Mm(t)

(5) Am(t) /∈ Mm(t)

Sketch of the proof to bound number of collisions

any sequence of transitions (2) has constant length

O(log T ) number of transitions (3) and (5), by kl-UCB

=⇒ player m is fixed, for almost all rounds (O(T − log T ) times)

nb of collisions ≤ M× nb of collisions of non fixed players

=⇒ nb of collisions = O(log T ) & O(log(T )) sub-optimal selections (4)

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 36/ 52

.The MC-Top-M algorithm (for the OSA case)

Page 56: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

MC-Top-M with kl-based confidence intervals [Cappé et al. 13]UCBm

k (t) = max q : Nmk (t)kl (µm

k (t), q) ≤ ln(t) ,

where kl(x , y) = KL (B(x), B(y)) = x ln(

xy

)+ (1 − x) ln

(1−x1−y

).

Control of the sub-optimal selections (state-of-the-art)

For all sub-optimal arms k ∈ M + 1, . . . , K,

E[Nmk (T )] ≤ ln(T )

kl(µk , µM)+ Cµ

√ln(T ).

Control of the collisions (new result)

E

[K∑

k=1

Ck(T )

]≤ M2

a,b:µa<µb

2M + 1kl(µa, µb)

ln(T ) + O(ln T ).

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 37/ 52

.Theoretical results for MC-Top-M

Page 57: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

MC-Top-M with kl-based confidence intervals [Cappé et al. 13]UCBm

k (t) = max q : Nmk (t)kl (µm

k (t), q) ≤ ln(t) ,

where kl(x , y) = KL (B(x), B(y)) = x ln(

xy

)+ (1 − x) ln

(1−x1−y

).

Control of the sub-optimal selections (state-of-the-art)

For all sub-optimal arms k ∈ M + 1, . . . , K,

E[Nmk (T )] ≤ ln(T )

kl(µk , µM)+ Cµ

√ln(T ).

logarithmic regret =⇒ Rµ(A, T ) = O((MCM,µ + M2C6) log(T ))

Control of the collisions (new result)

E

[K∑

k=1

Ck(T )

]≤ M2

a,b:µa<µb

2M + 1kl(µa, µb)

ln(T ) + O(ln T ).

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 37/ 52

.Theoretical results for MC-Top-M

Page 58: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

0 2000 4000 6000 8000 10000Time steps t=1. . . T, horizon T=10000

0

200

400

600

800

1000

1200

1400

1600

Cum

ulat

ed n

umbe

r of c

ollis

ions

on

all a

rms

Multi-players M=9 : Cumulated number of collisions, averaged 200 times9 arms: [B(0.1) ∗ , B(0.2) ∗ , B(0.3) ∗ , B(0.4) ∗ , B(0.5) ∗ , B(0.6) ∗ , B(0.7) ∗ , B(0.8) ∗ , B(0.9) ∗ ]

CentralizedMultiplePlay(kl-UCB)Selfish-kl-UCBRhoRand-kl-UCBMCTopM-kl-UCB

For M = K , our strategy MC-Top-M (D) achieves constant nb of collisions!=⇒ Our new orthogonalization procedure is very efficient!

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 38/ 52

.Results on a multi-player MAB problem (1/2)

Page 59: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

0 2000 4000 6000 8000 10000Time steps t=1. . . T, horizon T=10000,

0

50

100

150

200

250

300Cu

mul

ativ

e ce

ntra

lized

regr

et t

6 ∑ k=

1µ∗ k−

t ∑ s=

1

9 ∑ k=

1µk(s)

200[A

j(t)=k,C

j(t)]

Multi-players M=6 : Cumulated centralized regret, averaged 200 times9 arms: [B(0.1), B(0.2), B(0.3), B(0.4) ∗ , B(0.5) ∗ , B(0.6) ∗ , B(0.7) ∗ , B(0.8) ∗ , B(0.9) ∗ ]

CentralizedMultiplePlay(kl-UCB)Selfish-kl-UCBRhoRand-kl-UCBMCTopM-kl-UCB

For M = 6 devices, our strategy MC-Top-M (D) largely outperforms ρrand andother previous state-of-the-art policies (not included).

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 39/ 52

.Results on a multi-player MAB problem (2/2)

Page 60: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Algorithm Ref. Regret boundPerformance

⋆ is worst

Speed

is worstParameters

Centralized multi-play kl-UCB

[1] CM,µ log(T ) ⋆⋆⋆⋆⋆ just M butin another model!

ρrand UCB [2] M3C2 log(T ) ⋆⋆ just M

MEGA [3] C3T 3/4 ⋆ 4 params,

impossible to tune

Musical Chair [4](2M

M

)C4 log(T ) ⋆⋆ 1 parameter T0

hard to tune

Selfish UCB [5] T in some case ⋆ / ⋆⋆⋆⋆ none!

MCTopM klUCB [6] (MCM,µ + M2C6) log(T ) ⋆⋆⋆⋆ just M

Sic-MMAB [7] (CM,µ+MK ) log(T ) ⋆⋆⋆⋆ none! butin another model!

DPE [8] CM,µ log(T ) ??none! butin another model!

Optimal regret bound is multiple-play bound R(A, T ) ≤ CM,µ log(T ) + o(log(T )), with

CM,µ =∑

k:µk<µ∗

M

M∑j=1

µ∗

M

kl(µk ,µ∗

j) , and Ci ≫ CM,µ are much larger constants.

Papers: [1] Anantharam et al, 87 [2] Anandkumar et al, 11 [3] Avner et al, 15 [4] Rosenski et al, 15[5] Bonnefoi et al 17 [6] Besson & Kaufmann, 18 [7] Boursier et al, 19 [8] Proutière et al, 19

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 40/ 52

.State-of-the-art multi-player algorithms

Page 61: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Theoretical analysis oftwo relaxed models

Multi-player bandits

Piece-wise stationary banditsRef: Chapter 7 of my thesis, and [Besson et al, 19].

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 41/ 52

Page 62: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Stationary MAB problems

Arm k samples rewards from the same distribution for any round

∀t, rk(t) iid∼ νk = B(µk).

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 42/ 52

.Piece-wise stationary bandits

Page 63: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Stationary MAB problems

Arm k samples rewards from the same distribution for any round

∀t, rk(t) iid∼ νk = B(µk).

Non stationary MAB problems?

(possibly) different distributions for any round !

∀t, rk(t) iid∼ νk(t) = B(µk(t)).

=⇒ harder problem! And impossible with no extra hypothesis

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 42/ 52

.Piece-wise stationary bandits

Page 64: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Stationary MAB problems

Arm k samples rewards from the same distribution for any round

∀t, rk(t) iid∼ νk = B(µk).

Non stationary MAB problems?

(possibly) different distributions for any round !

∀t, rk(t) iid∼ νk(t) = B(µk(t)).

=⇒ harder problem! And impossible with no extra hypothesis

Piece-wise stationary problems!

The literature usually focuses on the easier case, when there are at mostΥT = o(

√T ) intervals, on which the means are all stationary.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 42/ 52

.Piece-wise stationary bandits

Page 65: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

We plots the means µ1(t), µ2(t), µ3(t) of K = 3 arms .There are ΥT = 4 break-points and 5 sequences in 1, . . . , T = 5000

0 1000 2000 3000 4000 5000Time steps t=1. . . T, horizon T=5000

0.2

0.4

0.6

0.8

Succ

essiv

e m

eans

of t

he K

=3 a

rms

History of means for Non-Stationary MAB, Bernoulli with 4 break-pointsArm #0Arm #1Arm #2

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 43/ 52

.Example of a piece-wise stationary MAB problem

Page 66: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

The “oracle” plays the (unknown) best arm k∗(t) = argmax µk(t)(which changes between the ΥT ≥ 1 stationary sequences)

R(A, T ) = E

[T∑

t=1

rk∗(t)(t)

]−

T∑

t=1

E [r(t)]

=

(T∑

t=1

maxk

µk(t)

)

︸ ︷︷ ︸oracle total reward

−T∑

t=1

E [r(t)] .

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 44/ 52

.Regret for piece-wise stationary bandits

Page 67: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

The “oracle” plays the (unknown) best arm k∗(t) = argmax µk(t)(which changes between the ΥT ≥ 1 stationary sequences)

R(A, T ) = E

[T∑

t=1

rk∗(t)(t)

]−

T∑

t=1

E [r(t)]

=

(T∑

t=1

maxk

µk(t)

)

︸ ︷︷ ︸oracle total reward

−T∑

t=1

E [r(t)] .

Typical regimes for piece-wise stationary bandits

The (minimax) worst-case lower-bound is R(A, T ) ≥ Ω(√

KTΥT )

State-of-the-art algorithms A obtain R(A, T ) ≤ O(K√

TΥT log(T ))

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 44/ 52

.Regret for piece-wise stationary bandits

Page 68: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Three components of our algorithm [Besson et al, 19]

Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB

[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]

A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52

.Our new algorithm: kl-UCB index + BGLR detector

Page 69: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Three components of our algorithm [Besson et al, 19]

Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB

[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]

A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected

A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52

.Our new algorithm: kl-UCB index + BGLR detector

Page 70: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Three components of our algorithm [Besson et al, 19]

Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB

[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]

A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected

A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound its false alarm probability (if enough samples between two restarts)

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52

.Our new algorithm: kl-UCB index + BGLR detector

Page 71: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Three components of our algorithm [Besson et al, 19]

Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB

[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]

A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected

A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound its false alarm probability (if enough samples between two restarts) its detection delay (for “easy enough” problems)

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52

.Our new algorithm: kl-UCB index + BGLR detector

Page 72: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Three components of our algorithm [Besson et al, 19]

Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB

[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]

A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected

A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound its false alarm probability (if enough samples between two restarts) its detection delay (for “easy enough” problems)

Forced exploration of parameter α ∈ (0, 1) (tuned with ΥT )

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52

.Our new algorithm: kl-UCB index + BGLR detector

Page 73: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Three components of our algorithm [Besson et al, 19]

Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB

[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]

A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected

A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound its false alarm probability (if enough samples between two restarts) its detection delay (for “easy enough” problems)

Forced exploration of parameter α ∈ (0, 1) (tuned with ΥT )

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52

.Our new algorithm: kl-UCB index + BGLR detector

Page 74: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Three components of our algorithm [Besson et al, 19]

Our algorithm is inspired by CUSUM-UCB [Liu et al, 18] and M-UCB

[Cao et al, 19], and new analysis of the GLR test [Maillard, 19]

A classical bandit index policy: kl-UCBwhich gets restarted after a change-point is detected

A change-point detection algorithm: the Generalized LikelihoodRatio Test for sub-Bernoulli observations (BGLR), we can bound its false alarm probability (if enough samples between two restarts) its detection delay (for “easy enough” problems)

Forced exploration of parameter α ∈ (0, 1) (tuned with ΥT )

Regret bound (if T and ΥT are both known)

Our algorithm obtains R(A, T ) ≤ O(

K∆2

change

√TΥT log(T )

)

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 45/ 52

.Our new algorithm: kl-UCB index + BGLR detector

Page 75: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

→ kl-UCB + BGLR (⋆) achieves the best performance (among non-oracle)!

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 46/ 52

.Results on a piece-wise stationary MAB problem

Page 76: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Algorithm Ref. Regret boundPerformance⋆ is worst

Speedis worst

Parameters

Naive UCB [1] T in worst case ⋆ none!

Oracle-Restart UCB [1] CΥT log(T ) ⋆⋆⋆⋆⋆ the break-points

(unrealistic oracle!)

Discounted UCB [2] C2

√TΥT log(T ) ⋆ T and ΥT

Sliding-Window UCB [2] C′2

√TΥT log(T ) ⋆ T and ΥT

Exp3.S [3] C√

TΥT log(T ) ⋆ ΥT

Discounted TS [4] not yet proven ⋆⋆ how to tune γ ?

CUSUM-UCB [5] C5

√TΥT log( T

ΥT) ⋆⋆⋆ T , ΥT and δmin

M-UCB [6] C6

√TΥT log(T ) ⋆⋆ T , ΥT and δmin

BGLR + kl-UCB [7] C√

TΥT log(T ) ⋆⋆⋆⋆ T and ΥT

AdSwitch [8] C8

√TΥT log(T ) ⋆⋆ just T

Ada-ILTCB+ [9] C9

√TΥT log(T ) ?? just T

Optimal minimax regret bound is R(A, T ) = O(√

KTΥT ), and C = CΥT ,µ = O( K∆2

change

).

Ci ≫ CΥT ,µ are much larger constants, and δmin < ∆change lower-bounds the problem difficulty.

Papers: [1] Auer et al. 02 [2] Garivier et al. 09 [3] Auer et al. 02 [5] Raj et al. 17[5] Liu et al. 18 [6] Cao et al. 19 [7] Besson et al. 19 [8] Auer et al. 19 [9] Chen et al. 19

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 47/ 52

.State-of-the-art piece-wise stationary algorithms

Page 77: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Summary

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 48/ 52

Page 78: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Part I:

Part II:

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 49/ 52

.Contributions (1/3)

Page 79: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Part I:

A simple model of IoT network, where autonomous IoT devicescan embed decentralized learning (“selfish MAB learning”),

numerical simulations proving the quality of our solution,

a realistic implementation on radio hardware.

Part II:

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 49/ 52

.Contributions (1/3)

Page 80: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Part I:

A simple model of IoT network, where autonomous IoT devicescan embed decentralized learning (“selfish MAB learning”),

numerical simulations proving the quality of our solution,

a realistic implementation on radio hardware.

Part II: New algorithms and regret bounds, in two simplified models:

for multi-player bandits, with M ≤ K players, for piece-wise stationary bandits, with ΥT = o(T ) break-points,

our proposed algorithms achieve state-of-the-art performance on both numerical, and theoretical results.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 49/ 52

.Contributions (1/3)

Page 81: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52

.Perspectives (2/3)

Page 82: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!

More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52

.Perspectives (2/3)

Page 83: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!

More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy

Study the “Graal” goal:

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52

.Perspectives (2/3)

Page 84: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!

More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy

Study the “Graal” goal: propose a more realistic model for IoT networks

(exogenous activation, non stationary traffic, etc)

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52

.Perspectives (2/3)

Page 85: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!

More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy

Study the “Graal” goal: propose a more realistic model for IoT networks

(exogenous activation, non stationary traffic, etc) propose an efficient decentralized low-cost algorithm

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52

.Perspectives (2/3)

Page 86: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!

More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy

Study the “Graal” goal: propose a more realistic model for IoT networks

(exogenous activation, non stationary traffic, etc) propose an efficient decentralized low-cost algorithm that works empirically and has strong theoretical guarantees!

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52

.Perspectives (2/3)

Page 87: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Unify the multi-player and non-stationary bandit models→ in progress: already one paper from last year (arXiv:1812.05165), wecan probably do a better job with our tools!

More validation of our contributions in real-world IoT environments→ started in summer 2019 with an intern working with Christophe Moy

Study the “Graal” goal: propose a more realistic model for IoT networks

(exogenous activation, non stationary traffic, etc) propose an efficient decentralized low-cost algorithm that works empirically and has strong theoretical guarantees!

Extend my Python library SMPyBandits to cover many other banditmodels (cascading, delay feedback, combinatorial, contextual etc)→ it is already online, free and open-source on GitHub.com/SMPyBandits

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 50/ 52

.Perspectives (2/3)

Page 88: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

8 International conferences with proceedings:

“MAB Learning in IoT Networks”, Bonnefoi, Besson et al, CROWNCOM, 2017 “Aggregation of MAB for OSA”, Besson, Kaufmman, Moy, IEEE WCNC, 2018 “Multi-Player Bandits Revisited”, Besson & Kaufmann, ALT, 2018 “MALIN with GRC . . . ”, Bonnefoi, Besson, Moy, demo at ICT, 2018 “GNU Radio Implementation of MALIN . . . ”, Besson et al, IEEE WCNC, 2019 “UCB . . . LPWAN w/ Retransmissions”, Bonnefoi, Besson et al, IEEE WCNC, 2019 “Decentralized Spectrum Learning . . . ”, Moy & Besson, ISIoT, 2019 “Analyse non asymptotique . . . ”, Besson & Kaufmann, GRETSI, 2019

1 Preprints:

“Doubling-Trick . . . ”, Besson & Kaufmann, arXiv:1803.06971, 2018

3 Submitted works:

“Decentralized Spectrum Learning . . . ”, Moy, Besson et al, for Annals ofTelecommunications, July 2019

“GLRT meets klUCB . . . ”, Besson & Kaufmann & Maillard, for AISTATS, Oct.2019 “SMPyBandits . . . ”, Besson, for JMLR MLOSS, October 2019

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 51/ 52

.List of publications (3/3)

Page 89: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Thanks for your attention!

Questions & Discussion

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 52/ 52

.Conclusion

Page 90: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

an extension of our model of IoT network to account forretransmissions (Section 5.4),

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 1/ 27

.Didn’t have time to talk about. . . (2/4)

Page 91: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

an extension of our model of IoT network to account forretransmissions (Section 5.4),

my Python library SMPyBandits (Chapter 3),

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 1/ 27

.Didn’t have time to talk about. . . (2/4)

Page 92: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

an extension of our model of IoT network to account forretransmissions (Section 5.4),

my Python library SMPyBandits (Chapter 3),

our proposed algorithm for aggregating bandit algorithms(Chapter 4),

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 1/ 27

.Didn’t have time to talk about. . . (2/4)

Page 93: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

an extension of our model of IoT network to account forretransmissions (Section 5.4),

my Python library SMPyBandits (Chapter 3),

our proposed algorithm for aggregating bandit algorithms(Chapter 4),

details about our algorithms, their precise theoretical results andproofs (Chapters 6 & 7),

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 1/ 27

.Didn’t have time to talk about. . . (2/4)

Page 94: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

an extension of our model of IoT network to account forretransmissions (Section 5.4),

my Python library SMPyBandits (Chapter 3),

our proposed algorithm for aggregating bandit algorithms(Chapter 4),

details about our algorithms, their precise theoretical results andproofs (Chapters 6 & 7),

our work on the “doubling trick” (to make an algorithm Aanytime and keep its regret bounds).

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 1/ 27

.Didn’t have time to talk about. . . (2/4)

Page 95: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

References and

publications

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 2/ 27

Page 96: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Check out the

“The Bandit Book”by Tor Lattimore and Csaba Szepesvári

Cambridge University Press, 2019.

→ tor-lattimore.com/downloads/book/book.pdf

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 3/ 27

.Where to know more: about bandits (1/3)

Page 97: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Reach me (or Christophe or Émilie) out by email, if you have questions

Lilian.Besson @ CentraleSupelec.fr

→ perso.crans.org/besson/

Christophe.Moy @ Univ-Rennes1.fr

→ moychristophe.wordpress.com

Emilie.Kaufmann @ Univ-Lille.fr

→ chercheurs.lille.inria.fr/ekaufman

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 4/ 27

.Where to know more: about our work? (2/3)

Page 98: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Experiment with bandits by yourself!

Interactive demo on this web-page→ perso.crans.org/besson/phd/MAB_interactive_demo/

Use my Python library for simulations of MAB problems SMPyBandits→ SMPyBandits.GitHub.io & GitHub.com/SMPyBandits

Install with $ pip install SMPyBandits

Free and open-source (MIT license)

Easy to set up your own bandit experiments, add new algorithms etc.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 5/ 27

.Where to know more: in practice (3/3)

Page 99: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 6/ 27

.→ SMPyBandits.GitHub.io

Page 100: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

My PhD thesis (Lilian Besson)“Multi-players Bandit Algorithms for Internet of Things Networks”→ Online at perso.crans.org/besson/phd/

→ Open-source at GitHub.com/Naereen/phd-thesis/

My Python library for simulations of MAB problems, SMPyBandits→ SMPyBandits.GitHub.io

“The Bandit Book”, by Tor Lattimore and Csaba Szepesvári→ tor-lattimore.com/downloads/book/book.pdf

“Introduction to Multi-Armed Bandits”, by Alex Slivkins→ arXiv.org/abs/1904.07272

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 7/ 27

.Main references

Page 101: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

List of publications

Cf.: CV.archives-ouvertes.fr/lilian-besson

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 8/ 27

Page 102: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Decentralized Spectrum Learning for IoT Wireless Networks Collision Mitigation,by Christophe Moy & Lilian Besson.1st International ISIoT workshop, at Conference on Distributed Computing inSensor Systems, Santorini, Greece, May 2019.See Chapter 5.

Upper-Confidence Bound for Channel Selection in LPWA Networks withRetransmissions,by Rémi Bonnefoi, Lilian Besson, Julio Manco-Vasquez & Christophe Moy.1st International MOTIoN workshop, at WCNC, Marrakech, Morocco, April 2019.See Section 5.4.

GNU Radio Implementation of MALIN: “Multi-Armed bandits Learning forInternet-of-things Networks”,by Lilian Besson, Rémi Bonnefoi & Christophe Moy.Wireless Communication and Networks Conference, Marrakech, April 2019.See Section 5.3.

For more details, see: CV.Archives-Ouvertes.fr/lilian-besson.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 9/ 27

.International conferences with proceedings (1/2)

Page 103: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Multi-Player Bandits Revisited,by Lilian Besson & Émilie Kaufmann.Algorithmic Learning Theory, Lanzarote, Spain, April 2018.See Chapter 6.

Aggregation of Multi-Armed Bandits learning algorithms for OpportunisticSpectrum Access,by Lilian Besson, Émilie Kaufmann & Christophe Moy.Wireless Communication and Networks Conference, Barcelona, Spain, April 2018.See Chapter 4.

Multi-Armed Bandit Learning in IoT Networks and non-stationary settings,by Rémi Bonnefoi, L.Besson, C.Moy, É.Kaufmann & Jacques Palicot.Conference on Cognitive Radio Oriented Wireless Networks, Lisboa, Portugal,September 2017. Best Paper Award.See Section 5.2.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 10/ 27

.International conferences with proceedings (2/2)

Page 104: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

MALIN: “Multi-Arm bandit Learning for Iot Networks” with GRC: A TestBedImplementation and Demonstration that Learning Helps,by Lilian Besson, Rémi Bonnefoi, Christophe Moy.Demonstration presented in International Conference on Communication,Saint-Malo, France, June 2018.See YouTu.be/HospLNQhcMk for a 6-minutes presentation video.See Section 5.3.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 11/ 27

.Demonstrations in international conferences

Page 105: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Analyse non asymptotique d’un test séquentiel de détection de ruptures etapplication aux bandits non stationnaires (in French),by Lilian Besson & Émilie Kaufmann,GRETSI, August 2019.See Chapter 7.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 12/ 27

.French language conferences with proceedings

Page 106: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

Decentralized Spectrum Learning for Radio Collision Mitigation in Ultra-Dense IoTNetworks: LoRaWAN Case Study and Measurements,by Christophe Moy, Lilian Besson, G. Delbarre & L. Toutain, July 2019.Submitted for a special volume of the Annals of Telecommunications journal, on“Machine Learning for Intelligent Wireless Communications and Networking”.See Chapter 5.

The Generalized Likelihood Ratio Test meets klUCB: an Improved Algorithm forPiece-Wise Non-Stationary Bandits,by Lilian Besson & Émilie Kaufmann & Odalric-Ambrym Maillard, October 2019.Submitted for AISTATS 2020. Preprint at HAL.Inria.fr/hal-02006471.See Chapter 7.

SMPyBandits: an Open-Source Research Framework for Single and Multi-PlayersMulti-Arms Bandits (MAB) Algorithms in Python,by Lilian BessonActive development since October 2016, HAL.Inria.fr/hal-01840022.It currently consists in about 45000 lines of code, hosted onGitHub.com/SMPyBandits, and a complete documentation accessible onSMPyBandits.rtfd.io or SMPyBandits.GitHub.io.Submitted for JMLR MLOSS, in October 2019.See Chapter 3.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 13/ 27

.Submitted works. . .

Page 107: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

What Doubling-Trick Can and Can’t Do for Multi-Armed Bandits,by Lilian Besson & Émilie Kaufmann, September 2018.Preprint at HAL.Inria.fr/hal-01736357.

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 14/ 27

.In progress works waiting for a new submission. . .

Page 108: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

I included here some extra slides. . .

pseudo code of Rand-Top-M + kl-UCB

pseudo code of MC-Top-M + kl-UCB

exact regret bound of MC-Top-M + kl-UCB

pseudo code of GLRT + kl-UCB

exact regret bound of GLRT + kl-UCB

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 15/ 27

.Backup slides

Page 109: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 16/ 27

.Our algorithm Rand-Top-M

Page 110: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 17/ 27

.Our algorithm MC-Top-M

Page 111: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 18/ 27

.Lemma: bad selections for MC-Top-M with kl-UCB

Page 112: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 19/ 27

.Lemma: collisions for MC-Top-M with kl-UCB

Page 113: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 20/ 27

.Theoreom: regret for MC-Top-M with kl-UCB

Page 114: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 21/ 27

.Our algorigthm GLRT and kl-UCB

Page 115: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 22/ 27

.Theorem: regret bound for GLRT + kl-UCB (global)

Page 116: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 23/ 27

.Corollary: regret bounds for GLRT + kl-UCB (global)

Page 117: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 24/ 27

.Theorem: regret bound for GLRT + kl-UCB (local)

Page 118: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 25/ 27

.Corollary: regret bounds for GLRT + kl-UCB (local)

Page 119: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

End of backup slides

Thanks for your attention!

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 26/ 27

.End of backup slides

Page 120: “Multi-players Bandit Algorithms for Internet of Things Networks” · 2019. 11. 28. · “Multi-players Bandit Algorithms for Internet of Things Networks” By Lilian Besson PhD

© Jeph Jacques, 2015, QuestionableContent.net/view.php?comic=3074

PhD defense – Lilian Besson – “MAB Algorithms for IoT Networks” 20 November, 2019 – 27/ 27

.What about the climatic crisis?