Smart Traffic Lights that Learn !

Multi-Agent Reinforcement Learning Integrated Network of Adaptive Traffic Signal Controllers

M A R L I N

Samah El-Tantawy, Ph.D. Post Doctoral Fellow, Dept of Civil Engineering

Baher Abdulhai, Ph.D., P.Eng. Director, ITS Centre and Testbed, Dept of Civil Engineering

Hossam Abdelgawad, Ph.D., P.Eng. Manager of ITS Centre and Testbed

ACGM 2013 - Intelligent Transport for Smart Cities

Outline

1. In a Nutshell

2. Theory in Brief: Reinforcement Learning and Game Theory

3. Applications: City of Toronto Testbed

4. Hardware-in-the-Loop Testing Approach: Integration with PEEK ATC-1000

Next Steps

Q&A

In a Nutshell

Grand objective

Intersections "talk to each other"

Each is affected by what is happening upstream

Each affects what is happening downstream

Whole-network control in one shot from a grand brain is the dream

Issue

Intractable theoretically,

Too complex practically,

Requires massive and very expensive communication

Solution

Decentralized,

Self learning: agents learn to control their local intersection, and

Game theory based: agents learn to collaborate

What is MARLIN?

Artificial-intelligence-based control software

Enables traffic lights to self-learn and self-collaborate with neighbouring traffic lights

Cuts down motorists’ delay, fuel consumption and the negative environmental effects of congestion

Easier to operate (self learning)

Needs less communication, if any is needed at all (less costly)

MARLIN-ATSC: Level 4

Evolution of “Adaptive” Signal Control

Level 0 • Fixed-Time and Actuated Control • TRANSYT • 1969, UK

Level 1 • Centralized Control, Off-line Optimization • SCATS • 1979, Australia • >50 installations worldwide

Level 2 • Centralized Control, On-line Optimization • SCOOT • 1981, UK • >170 installations worldwide

Level 3 • Distributed Control, Model-Based • OPAC, RHODES • 1992, USA • 5 installations in USA

Level 4 • Distributed Self-Learning Control • MARLIN-ATSC • 2011, Canada

Issues with Leading ATSC Technologies?

Centralized • Expensive • Not scalable • Not robust

Model-Based • Relying on an accurate traffic modelling framework, the accuracy of which is questionable

Curse of Dimensionality • Increasing the complexity of the system exponentially with the increase in the number of intersections/controllers

Human Intervention Requirements • Requiring highly skilled labour to operate due to their complexity
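The curse of dimensionality can be made concrete with a small count. The sketch below assumes, purely for illustration, that every intersection chooses between two actions (such as the extend/switch pair used later in this deck) and that a centralized controller must pick one joint action for the whole network. The function name `joint_action_count` is hypothetical.

```python
def joint_action_count(n_intersections, actions_per_intersection=2):
    """Number of joint actions a single centralized controller must consider.

    Assumption: each intersection independently picks one of
    `actions_per_intersection` actions, so the joint space is a product.
    """
    return actions_per_intersection ** n_intersections


# A decentralized agent only ever chooses among its own actions (2 here),
# while the centralized joint space grows exponentially with network size.
for n in (1, 4, 9, 16):
    print(f"{n} intersections -> {joint_action_count(n)} joint actions")
```

Under these assumptions a 3x3 network already has 512 joint actions, and the count doubles with every added intersection.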

Why is MARLIN Different?

MARLIN:

Self-Learning

Decentralized

Model-Free

Coordinated

Scalable

Pattern Sensitive

Generic

Issues with other approaches:

Human Intervention Requirements

Centralized

Inefficient Coordination

Model-Based

Curse of Dimensionality

Prediction Requirement

Specific Design

Learning the Control Law: Reinforcement Learning Architecture

[Figure: RL architecture. The agent receives the state and reward from the environment and sends an action back to it.]

Goal: Optimal Control law = mapping between states and actions

$$Q(s_k, a_k) \leftarrow Q(s_k, a_k) + \alpha \left[ r_k + \gamma \max_{a} Q(s_{k+1}, a) - Q(s_k, a_k) \right]$$

$$a_{k+1} = \arg\max_{a} Q(s_{k+1}, a)$$

Balancing exploration and exploitation

Q a1 a2

s1 -10 -5

s2 -3 -15

Q Table
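A minimal sketch of how a Q table like the one above is read and updated, assuming the standard tabular Q-learning rule. The learning rate `alpha`, discount factor `gamma`, exploration rate `epsilon`, and the sample reward are illustrative assumptions; the function names are hypothetical and not taken from MARLIN.

```python
import random

# Q table from the slide: Q[state][action]
Q = {
    "s1": {"a1": -10.0, "a2": -5.0},
    "s2": {"a1": -3.0, "a2": -15.0},
}


def greedy_action(Q, state):
    """Exploitation: pick the action with the highest Q-value in this state."""
    return max(Q[state], key=Q[state].get)


def epsilon_greedy(Q, state, epsilon, rng):
    """With probability epsilon explore at random, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(sorted(Q[state]))
    return greedy_action(Q, state)


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max Q(s_next, .)."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


rng = random.Random(0)
print(greedy_action(Q, "s1"), greedy_action(Q, "s2"))
print(epsilon_greedy(Q, "s1", epsilon=0.1, rng=rng))
# Assumed experience: in s1 the agent took a2, received reward -2, landed in s2.
print(q_update(Q, "s1", "a2", r=-2.0, s_next="s2"))
```

With the table values shown, the greedy choice is a2 in s1 and a1 in s2; the epsilon-greedy variant is one common way to balance exploration and exploitation.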

RL-based ATSC Architecture

[Figure: an RL software agent interacting with a traffic simulation environment. State: queue lengths. Reward: delay savings. Action: extend/switch.]
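The agent/environment loop in this architecture can be sketched with a toy stand-in for the traffic simulator. Everything below is a hypothetical illustration: `ToyIntersection` is not Paramics or MARLIN, the arrival probability and discharge rate are made-up numbers, and the reward is a simple proxy (reduction in total queue) standing in for delay savings.

```python
import random
from collections import defaultdict

ACTIONS = ("extend", "switch")


class ToyIntersection:
    """Hypothetical two-phase intersection used only to exercise the loop."""

    def __init__(self, seed=0, max_queue=10):
        self.rng = random.Random(seed)
        self.max_queue = max_queue
        self.queues = [0, 0]  # vehicles waiting on phase 0 and phase 1
        self.green = 0        # index of the phase that currently has green

    def state(self):
        # State: the two queue lengths plus the current green phase.
        return (self.queues[0], self.queues[1], self.green)

    def step(self, action):
        before = sum(self.queues)
        if action == "switch":
            self.green = 1 - self.green
        # The green phase discharges up to 2 vehicles (assumed rate).
        self.queues[self.green] = max(0, self.queues[self.green] - 2)
        # Random arrivals on each approach (assumed probability 0.4).
        for i in range(2):
            if self.rng.random() < 0.4:
                self.queues[i] = min(self.max_queue, self.queues[i] + 1)
        # Reward proxy: how much the total queue shrank during this step.
        reward = before - sum(self.queues)
        return self.state(), reward


def train(steps=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning against the toy environment."""
    env = ToyIntersection(seed=seed)
    rng = random.Random(seed + 1)
    Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
    s = env.state()
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)                   # explore
        else:
            a = max(ACTIONS, key=lambda x: Q[s][x])   # exploit
        s_next, r = env.step(a)
        Q[s][a] += alpha * (r + gamma * max(Q[s_next].values()) - Q[s][a])
        s = s_next
    return Q


q_table = train()
print(f"visited {len(q_table)} states")
```

The point of the sketch is the data flow (state in, action out, reward back), not the traffic model, which a real deployment would replace with a microsimulator or field detectors.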

MARLIN-ATSC: Coordination Principle

Each agent plays a game with each adjacent intersection in its neighbourhood

[Figure: a 3x3 grid of intersections I1 to I9, shown three times.]

Example for Intermediate Intersection (4 Games)

Example for Edge Intersection (3 Games)

Example for Corner Intersection (2 Games)
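The game counts above follow directly from grid adjacency. A minimal sketch, assuming a rectangular grid in which each intersection is adjacent only to the intersections directly north, south, east and west of it; the (row, column) indexing and function names are hypothetical.

```python
def neighbours(row, col, rows=3, cols=3):
    """Adjacent intersections (north, south, west, east) that lie inside the grid."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def games_per_intersection(rows=3, cols=3):
    """One game per adjacent intersection, for every intersection in the grid."""
    return {
        (r, c): len(neighbours(r, c, rows, cols))
        for r in range(rows)
        for c in range(cols)
    }


games = games_per_intersection()
print("intermediate:", games[(1, 1)])
print("edge:", games[(0, 1)])
print("corner:", games[(0, 0)])
```

Because each agent only plays against its immediate neighbours, the number of games per agent stays at four or fewer however large the grid becomes.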

MARLIN-ATSC Available Modes

MARLIN-ATSC: (a) Independent Mode, (b) Integrated Mode

[Figure: panels (a) and (b), each showing two agents with the labels Queue Length 1, Delay 1, Extend 1 and Queue Length 2, Delay 2, Extend 2.]

Large-Scale Application: Network-Wide MOE in the Normal Scenario

| System MOE | BC | % Improvement, MARL-TI vs. BC | % Improvement, MARLIN-IC vs. BC | % Improvement, MARLIN-IC vs. MARL-TI |
|---|---|---|---|---|
| Average Intersection Delay (sec/veh) | 35.27 | 27% | 38% | 14% |
| Throughput (veh) | 23084 | 3% | 6% | 3% |
| Avg. Queue Length (veh) | 8.66 | 24% | 32% | 11% |
| Std. Avg. Queue Length (veh) | 2.12 | 23% | 31% | 10% |
| Avg. Link Delay (sec) | 9.45 | 10% | 47% | 41% |
| Avg. Link Stop Time (sec) | 2.74 | 6% | 26% | 21% |
| Avg. Link Travel Time (sec) | 16.81 | 6% | 27% | 22% |
| CO2 Emission Factor (gm/km) | 587.28 | 28% | 30% | 2% |
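To read the percentages as absolute values, the short calculation below converts the average intersection delay row. It assumes that "% improvement" means a percentage reduction relative to the BC value; the helper name `improved_value` is hypothetical, and the results are only as precise as the rounded percentages in the table.

```python
def improved_value(base, pct_improvement):
    """Value after reducing `base` by `pct_improvement` percent (assumed meaning)."""
    return base * (1 - pct_improvement / 100)


bc_delay = 35.27                                 # BC, sec/veh, from the table
marl_ti_delay = improved_value(bc_delay, 27)     # MARL-TI vs. BC
marlin_ic_delay = improved_value(bc_delay, 38)   # MARLIN-IC vs. BC

print(f"MARL-TI:   {marl_ti_delay:.2f} sec/veh")
print(f"MARLIN-IC: {marlin_ic_delay:.2f} sec/veh")
```

For throughput, where higher is better, the same percentages would presumably be read as increases rather than reductions.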

Large-Scale Application: % Improvement in Average Delay

[Figure: % improvement in average delay, MARLIN-IC vs BC, with Area 1, Area 2 and Area 3 labelled.]

Large-Scale Application: Average Route Travel Time for Selected Routes

[Figure: Gardiner EB (freeway). Average travel time (min) versus time interval (5 min, intervals 1 to 12) for BC, MARL-TI and MARLIN-IC.]

Large-Scale Application: Average Route Travel Time for Selected Routes

[Figure: LakeShore EB to Spadina NB (major arterial). Average travel time (min) versus time interval (5 min, intervals 1 to 12) for BC, MARL-TI and MARLIN-IC.]

MARLIN-HILS Architecture

[Figure: hardware-in-the-loop setup. Components: Paramics Modeller, Industrial Computer, Controller Interface Device (CID, RS485 to USB), Traffic Signal Controller. Links: RS485 (SDLC protocol), USB (SDLC protocol), Ethernet (NTCIP protocol).]

HILS Setup: Demo

Conclusion

MARLIN: state of the art, generation 4+

Thoroughly developed and tested

Patent pending

Ongoing:

HILS & PEEK ATC-1000 integration

Potential field operational test

Productization

From TSP to People Priority (PSP)

Samah El-Tantawy [email protected]

Baher Abdulhai [email protected]

Hossam Abdelgawad [email protected]

