cellular network traffic scheduling using deep reinforcement … · 2019-03-11 · cellular network...

Cellular Network Traffic Scheduling using Deep Reinforcement Learning

Sandeep Chinchali, et al., Marco Pavone, Sachin Katti
Stanford University

AAAI 2018

Can we learn to optimally manage cellular networks?


Internet

Delay Sensitive

Real-time Mobile Traffic

Delay Tolerant (DT) Traffic

IoT: Map/SW updates, pre-fetched content

Why is IoT/DT traffic scheduling hard?


IoT

Utilization

Acceptable Limit

Contending goals
• Max IoT/DT data
• Loss to mobile traffic
• Network limits

Optimal Control

Why is IoT/DT traffic scheduling hard?


[Figure: congestion C vs. local time (09:00 to 21:00), Melbourne Central Business District, rolling average = 1 min. Cells shown: shopping center, office building, Southern Cross station, Melbourne Central station.]

Diverse city-wide cell patterns

Our contributions

1. Identify inefficiencies in real cellular networks
   4 weeks, 10 diverse cells in Downtown Melbourne, Australia

2. Data-Driven, Deep Learning Network Model
   Our live network experiments match MDP dynamics

3. Adaptive RL scheduler
   Flexibly responds to operator reward functions


IoT Scheduler

Network State

IoT rate

Why Deep Learning?

1. Learn time-variant network dynamics

2. Adapt to high-level network operation goals

3. Generalize to diverse cells

4. Abundance of network data



Related Work

1. Dynamic Resource Allocation
• Electricity grid (Reddy 2011), call admission (Marbach 1998), traffic control (Chu 2016)

2. Data-driven Optimal Control + Forecasting
• Deep RL (Mnih 2013, Silver 2014, Lillicrap 2015)
• LSTM networks (Hochreiter 1997, Laptev 2017, Shi 2015)

3. Machine Learning for Computer Networks
• Cluster Resource Management (Mao 2016)
• Mobile Video Streaming (Mao 2017, Yin 2015)


Data-driven problem formulation

1. Network State Space
2. IoT Scheduler Actions
3. Time-variant dynamics
4. Network operator policies


Num Users

IoT Scheduler

Network state + forecasts

Congestion

Cell efficiency

IoT rate

Primer on Cell Networks


Goal: Max safe IoT traffic V_t over day

(Link Quality)

Current Network State

Full State with Temporal Features

RL setup (1): State Space
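
The state-space slide distinguishes a current network state from a full state with temporal features. A minimal sketch of how such a full state could be assembled is given here. The feature order, the two-step history, the forecast contents, and the helper name `build_full_state` are illustrative assumptions, not the authors' definitions.

```python
from typing import List, Sequence


def build_full_state(
    current: Sequence[float],
    history: Sequence[Sequence[float]],
    forecast: Sequence[float],
) -> List[float]:
    """Concatenate the current state, past states, and a forecast into one vector."""
    full_state: List[float] = list(current)
    for past_state in history:  # ordered oldest to newest
        full_state.extend(past_state)
    full_state.extend(forecast)
    return full_state


# Hypothetical per-minute features: [congestion, number of users, cell efficiency].
current = [15.0, 50.0, 0.9]
history = [[10.0, 40.0, 1.1], [12.0, 45.0, 1.0]]
forecast = [16.0, 17.0]  # assumed: predicted congestion for the next two minutes

full_state = build_full_state(current, history, forecast)
print(len(full_state))  # 3 current + 6 history + 2 forecast = 11 features
```

Flattening everything into one vector keeps the agent's input fixed-size, which is the usual requirement for a feed-forward policy network.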


Agent, Environment, Action

Network state

Reward

Stochastic Forecast (LSTM)

Horizon: Day of T mins

IoT Traffic Rate:

IoT Volume per minute:

Utilization gain:

RL setup (2): Action Space
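
The action-space slide names an IoT traffic rate, an IoT volume per minute, and a utilization gain, and the evaluation plots label the gain as V_IoT/V_0 (%). The slide's formulas did not survive extraction, so the following is only a sketch under stated assumptions: one-minute control steps, rates in MBps, and V_0 taken to be the original (uncontrolled) traffic volume over the same period. The function name is hypothetical.

```python
from typing import Sequence


def utilization_gain_percent(
    iot_rates_mbps: Sequence[float],
    original_volumes_mb: Sequence[float],
    seconds_per_step: float = 60.0,
) -> float:
    """Return 100 * V_IoT / V_0 for one day of per-minute control steps."""
    # Assumed: volume added in a step = scheduled rate * step duration.
    iot_volume = sum(rate * seconds_per_step for rate in iot_rates_mbps)
    original_volume = sum(original_volumes_mb)
    if original_volume <= 0:
        raise ValueError("original traffic volume must be positive")
    return 100.0 * iot_volume / original_volume


# Three one-minute steps with hypothetical numbers.
gain = utilization_gain_percent(
    iot_rates_mbps=[0.5, 1.0, 0.0],
    original_volumes_mb=[200.0, 250.0, 150.0],
)
print(gain)  # 90 MB of IoT traffic over 600 MB of original traffic
```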



Network state

Reward

RL setup (3): Transition Dynamics

[Figure: congestion C vs. local time (20:10 to 20:20). Labels: Controlled traffic, Background dynamics.]


Network state

Reward

RL setup (4): Operator Rewards

Overall weighted reward

1. IoT traffic volume

2. Loss to regular users

3. Traffic below network limit
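
The three terms above can be combined into a single weighted reward. The sketch below is one plausible form, not the authors' exact function: the linear combination, the hinge penalty on throughput above the limit, and the weight names `beta` and `kappa` are assumptions. Only the symbol α appears in the slides, and treating it as the weight on IoT volume is also an assumption.

```python
def operator_reward(
    iot_volume: float,
    loss_to_users: float,
    throughput: float,
    throughput_limit: float,
    alpha: float = 1.0,
    beta: float = 1.0,
    kappa: float = 1.0,
) -> float:
    """Weighted reward: reward IoT volume, penalize user loss and limit violations."""
    # Hinge penalty: zero while traffic stays below the network limit.
    over_limit = max(0.0, throughput - throughput_limit)
    return alpha * iot_volume - beta * loss_to_users - kappa * over_limit


# Hypothetical step: the operator weights IoT volume twice as heavily as user loss.
reward = operator_reward(
    iot_volume=10.0,
    loss_to_users=2.0,
    throughput=1.5,
    throughput_limit=1.0,
    alpha=2.0,
    beta=1.0,
    kappa=5.0,
)
print(reward)
```

Raising `alpha` relative to the other weights expresses an operator who prioritizes delay-tolerant volume; raising `beta` or `kappa` expresses a more conservative one.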



Network state

Reward

Goal: Find Optimal Operator Policy

What-if model

Evaluation


Evaluation Criteria

1. Robust performance on diverse cell-day pairs
2. Ability to exploit better forecasts
3. Interpretability


Num Users

IoT Scheduler

Network state + forecasts

Congestion

Cell efficiency

IoT rate

1. RL generalizes to several cell-day pairs

[Figure: utilization gain V_IoT/V_0 (%) on train and test cell-day pairs, for α = 1 and α = 2.]

Respond to operator priorities

Significant gains:
• FCC Spectrum Auction (Reardon 2016): $4.5B for 10 MHz of spectrum
• 14.7% median gain for α = 2
• Significant cost savings [simulated]


2. RL effectively leverages forecasts

[Figure: RL vs. Benchmark; axis annotation: Richer LSTM forecasts.]

3a. RL exploits transient dips in utilization

[Figure, panel "Controlled Congestion": congestion C vs. local time (9:00 to 16:00) for Original, Heuristic control, and DDPG control, with a transient dip marked.]

[Figure, panel "Utilization gain": utilization gain V_IoT/V_0 (%) vs. local time (9:00 to 16:00) for Heuristic control and DDPG control.]

3b. RL smooths network throughput

[Figure, panel "Controlled Congestion": congestion C vs. local time (9:00 to 16:00) for Original, Heuristic control, and DDPG control.]

[Figure, panel "Resulting Throughput": throughput B (MBps) vs. local time (9:00 to 16:00) for Original, Heuristic control, and DDPG control, with the throughput limit marked.]

Conclusion

Modern networks are evolving
• Delay tolerant traffic (IoT updates, pre-fetched content)

Data-driven optimal control
• LSTM forecasts + RL controller
• 14.7% simulated gain -> significant savings

Future work:
• Operational network tests
• Decouple prediction and control

Questions: [email protected]


Extra slides


2. RL effectively leverages forecasts

Better forecasts enhance performance

[Figure: performance with richer LSTM forecasts.]

Discretized MDP for offline optimal

[Figure: reward R vs. number of discretized states |S| (0 to 250) for |A| = 5, 20, 40, 60; annotation: approach continuous (Cts) MDP.]
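
The "Discretized MDP for offline optimal" panel suggests solving a finite-state, finite-action version of the problem exactly as an offline reference. A minimal sketch of that idea using finite-horizon backward induction follows. The deterministic transitions, the toy congestion dynamics, the reward, and the name `solve_finite_horizon` are illustrative assumptions; the slides do not specify how the authors computed the offline optimum.

```python
from typing import Callable, List, Tuple


def solve_finite_horizon(
    num_states: int,
    num_actions: int,
    horizon: int,
    transition: Callable[[int, int], int],
    reward: Callable[[int, int], float],
) -> Tuple[List[float], List[List[int]]]:
    """Backward induction on a deterministic, discretized MDP.

    Returns the optimal value at time 0 for each state and the
    time-indexed policy (policy[t][s] is the best action index).
    """
    value = [0.0] * num_states  # value after the final step
    policy: List[List[int]] = []
    for _ in range(horizon):
        new_value, best_actions = [], []
        for s in range(num_states):
            best_a, best_q = 0, float("-inf")
            for a in range(num_actions):
                q = reward(s, a) + value[transition(s, a)]
                if q > best_q:  # ties keep the lowest action index
                    best_a, best_q = a, q
            new_value.append(best_q)
            best_actions.append(best_a)
        value = new_value
        policy.insert(0, best_actions)  # steps are solved last-to-first
    return value, policy


TOP = 2  # toy model: congestion levels 0..2, IoT rate levels 0..1


def toy_transition(s: int, a: int) -> int:
    return min(TOP, s + a)  # assumed: IoT traffic raises congestion


def toy_reward(s: int, a: int) -> float:
    # Assumed: reward the IoT volume, penalize reaching the top congestion level.
    return a - (2.0 if toy_transition(s, a) == TOP else 0.0)


value, policy = solve_finite_horizon(3, 2, 2, toy_transition, toy_reward)
print(value, policy)
```

Finer discretization (larger |S| and |A|) makes the table a closer approximation of the continuous problem at the cost of more computation per step.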
