Deep parking
Posted on 07-Apr-2017
TRANSCRIPT
1
Deep parking: an implementation of automatic parking with deep reinforcement learning
Shintaro Shiba, Feb. 2016 - Dec. 2016, Engineer Internship at Preferred Networks
Mentor: Abe-san, Fujita-san
2
About me
Shintaro Shiba
• Graduate student at the University of Tokyo
  – Major in neuroscience and animal behavior
• Part-time engineer (internship) at Preferred Networks, Inc.
  – Blog post URL: https://research.preferred.jp/2017/03/deep-parking/
3
Contents
• Original Idea
• Background: DQN and Double DQN
• Task definition
  – Environment: car simulator
  – Agents
    1. Coordinate
    2. Bird’s-eye view
    3. Subjective view
• Discussion
• Summary
4
Achievement
[Figure: trajectory of the car agent and the subjective view (input for DQN), from cameras at 0 deg, -120 deg, and +120 deg]
5
Original Idea: DQN for parking
https://research.preferred.jp/2016/01/ces2016/
https://research.preferred.jp/2015/06/distributed-deep-reinforcement-learning/
A previous project succeeded in driving smoothly with DQN.
Input: 32 virtual sensors, 3 previous actions + current speed and steering
Output: 9 actions
Is it possible for a car agent to learn to park itself, with camera images as input?
6
Reinforcement learning
[Diagram: the agent sends an action to the environment and receives the resulting state and reward, which feed the learning algorithm]
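To make the loop concrete, here is a minimal runnable sketch of the agent-environment interaction in Python; `StubEnv` and `StubAgent` are hypothetical placeholders for the car simulator and the DQN agent, not the project's actual classes.

```python
import random

class StubEnv:
    """Hypothetical stand-in for the car simulator environment."""
    def reset(self):
        self.t = 0
        return 0.0                        # state: placeholder sensor reading

    def step(self, action):
        self.t += 1
        reward = random.random() - 0.5    # placeholder reward signal
        done = self.t >= 500              # "time up" after 500 actions
        return 0.0, reward, done

class StubAgent:
    """Hypothetical stand-in for the learning agent."""
    def act(self, state):
        return random.randrange(9)        # pick one of the 9 actions

    def observe(self, *transition):
        pass                              # a real agent would store and learn here

env, agent = StubEnv(), StubAgent()
for episode in range(3):
    state, done = env.reset(), False
    while not done:
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.observe(state, action, reward, next_state, done)
        state = next_state
```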
7
DQN: Deep Q-Network
Volodymyr Mnih et al. 2015
[Pseudocode: for each episode >> for each action >> update the Q function]
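The update at the heart of that loop is the TD target from Mnih et al. 2015: y = r + γ max_a Q_target(s', a). A NumPy sketch with this deck's γ = 0.97 and 9 actions; `q_target` stands in for the target network, and its interface is an assumption rather than the project's code.

```python
import numpy as np

def dqn_targets(q_target, rewards, next_states, dones, gamma=0.97):
    """TD targets y = r + gamma * max_a Q_target(s', a).

    q_target maps a batch of states to an (N, 9) array of action values;
    terminal transitions (done = 1) use the reward alone.
    """
    next_q = q_target(next_states)              # (N, 9)
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)

# Toy usage with a random stand-in "target network":
rng = np.random.default_rng(0)
q_target = lambda s: rng.standard_normal((len(s), 9))
print(dqn_targets(q_target, np.array([0.01, -1.0]), np.zeros((2, 4)), np.array([0.0, 1.0])))
```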
8
Double DQN
Preventing overestimation of Q values
Hado van Hasselt et al. 2015
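Double DQN (van Hasselt et al. 2015) changes only the target: the online network selects the next action and the target network evaluates it, which curbs the upward bias of the plain max. A sketch under the same assumed interfaces as above:

```python
import numpy as np

def double_dqn_targets(q_online, q_target, rewards, next_states, dones, gamma=0.97):
    """Double DQN targets: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    The online network *selects* the next action and the target network
    *evaluates* it, reducing the overestimation of plain DQN.
    """
    best = q_online(next_states).argmax(axis=1)                    # selection
    evaluated = q_target(next_states)[np.arange(len(best)), best]  # evaluation
    return rewards + gamma * (1.0 - dones) * evaluated
```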
9
Reinforcement learning in this project
[Diagram: the same loop, with Environment = car simulator and Agent = different sensors + different neural networks; the agent outputs an action and receives state (= sensor input) and reward]
10
Environment: Car simulator
Forces modeled:
• Traction
• Air resistance
• Rolling resistance
• Centrifugal force
• Brake
• Cornering force
F = F_traction + F_aero + F_rr + F_c + F_brake + F_cf
(An illustrative code sketch follows below.)
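As an illustration of how such a force sum might look in code, here is a sketch using generic textbook coefficients (the drag and rolling-resistance constants are common tutorial values, not the simulator's actual ones); centrifugal and cornering forces are omitted for brevity.

```python
import numpy as np

def total_force(v, f_traction, brake_mag, c_drag=0.4257, c_rr=12.8):
    """Net longitudinal force on the car (illustrative coefficients).

    v          : 2D velocity vector (m/s)
    f_traction : traction force vector from the engine (N)
    brake_mag  : brake force magnitude (N), applied against motion
    Centrifugal and cornering forces are omitted for brevity.
    """
    speed = np.linalg.norm(v)
    f_aero = -c_drag * speed * v                 # air resistance ~ |v| * v
    f_rr = -c_rr * v                             # rolling resistance ~ v
    f_brake = -brake_mag * v / speed if speed > 1e-6 else np.zeros_like(v)
    return f_traction + f_aero + f_rr + f_brake

print(total_force(np.array([10.0, 0.0]), np.array([2000.0, 0.0]), 0.0))
```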
11
Common specifications: state, action, reward
Input (states)
– Features specific to each agent + car speed, car steering
Output (actions)
– 9: accelerate, decelerate, steer right, steer left, throw (do nothing), accelerate + steer right, accelerate + steer left, decelerate + steer right, decelerate + steer left
Reward
– +1 when the car is in the goal
– -1 when the car is out of the field
– 0.01 - 0.01 * distance_to_goal otherwise (changed afterward; see the sketch after this list)
Goal
– Car inside the goal region; no other conditions such as car direction
Terminate
– Time up: 500 actions (changed to 450 afterward)
– Field out: out of the field
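A minimal sketch of that reward and termination logic, with the geometry simplified to a circular goal region inside a square field (an assumption; the actual field and goal shapes are not given in the slides):

```python
import math

def reward_and_done(pos, goal_pos, goal_radius, field_size, step, max_steps=500):
    """Reward per the original spec (geometry simplified to a circular
    goal region inside a square [0, field_size] x [0, field_size] field)."""
    dist = math.hypot(pos[0] - goal_pos[0], pos[1] - goal_pos[1])
    if dist <= goal_radius:                        # car inside the goal region
        return 1.0, True
    if not (0 <= pos[0] <= field_size and 0 <= pos[1] <= field_size):
        return -1.0, True                          # car out of the field
    return 0.01 - 0.01 * dist, step >= max_steps   # shaping term; time up at 500
```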
12
Common specifications: hyperparameters
Maximum episodes: 50,000
Gamma: 0.97
Optimizer: RMSpropGraves
– lr=0.00015, alpha=0.95, momentum=0.95, eps=0.01
– changed afterward: lr=0.00015, alpha=0.95, momentum=0, eps=0.01
Batch size: 50 or 64
Epsilon: linearly decreased from 1.0 at first to 0.1 at last (a schedule sketch follows below)
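For illustration, the linear epsilon schedule might look like this; the decay horizon is an assumption, since the slides give only the endpoints.

```python
def epsilon(step, start=1.0, end=0.1, decay_steps=1_000_000):
    """Linear anneal from `start` to `end`, then flat (horizon assumed)."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

print(epsilon(0), epsilon(500_000), epsilon(2_000_000))  # 1.0, 0.55, 0.1
```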
13
Agents
1. Coordinate
2. Bird’s-eye view
3. Subjective view
– Three cameras
– Four cameras
14
Coordinate agent: Input features
– Relative coordinate value from the car to the goal, e.g. (80, 300)
– Input shape: (2,), normalized
[Diagram: car and goal positions on the field]
15
Coordinate agent: Neural network
– Only fully-connected layers (3)
– Coordinates (2) + car parameters (2) → 64 → 64 → n of actions (9)
(A sketch follows below.)
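A PyTorch sketch of a three-layer fully-connected network matching these sizes (the original implementation at Preferred Networks was presumably in Chainer; this is an equivalent reconstruction, not the original code):

```python
import torch
import torch.nn as nn

class CoordinateQNet(nn.Module):
    """Q network for the coordinate agent: (2 coords + 2 car params) -> 9 Q values."""
    def __init__(self, n_actions=9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 + 2, 64), nn.ReLU(),   # coordinates + (speed, steering)
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),          # one Q value per action
        )

    def forward(self, x):
        return self.net(x)

q = CoordinateQNet()
print(q(torch.randn(1, 4)))   # toy input: a batch of one state
```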
16
Coordinate agent: Result
17
Bird’s-eye view agent: Input features
– Bird’s-eye image of the whole field
– Input size: 80 x 80, normalized
18
Bird’s-eye view agent: Neural network
[Network diagram: 80 x 80 image → Conv layers → fully-connected (400) → n of actions, with the car parameters (2) joined before the output; feature sizes 64, 128, and 192 appear in the diagram. A sketch follows below.]
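The diagram does not pin down kernel sizes or strides, so the following PyTorch sketch only mirrors the overall shape: an 80x80 image through a conv stack with 64/128/192 channels, a 400-unit dense layer, and the two car parameters concatenated before the output head. All unstated hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class BirdsEyeQNet(nn.Module):
    """Sketch of the bird's-eye Q network; kernel/stride choices are assumed."""
    def __init__(self, n_actions=9, n_car_params=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(128, 192, kernel_size=3, stride=1), nn.ReLU(),
        )
        with torch.no_grad():   # infer the flattened conv output size
            n_flat = self.conv(torch.zeros(1, 1, 80, 80)).numel()
        self.fc = nn.Sequential(nn.Linear(n_flat, 400), nn.ReLU())
        self.head = nn.Linear(400 + n_car_params, n_actions)

    def forward(self, image, car_params):
        h = self.fc(self.conv(image).flatten(1))
        return self.head(torch.cat([h, car_params], dim=1))

q = BirdsEyeQNet()
print(q(torch.zeros(1, 1, 80, 80), torch.zeros(1, 2)).shape)  # -> (1, 9)
```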
20
Bird’s-eye view agent: Result at 18k episodes
21
Bird’s-eye view agent: Result after 18k episodes?
But we had already spent about six months on this agent, so we moved on to the next…
22
Subjective view agent: Input features
– N_of_camera images of subjective view from the car
– Number of cameras: three or four
– FoV = 120 deg
[Example: input images for the four-camera agent: front +0 deg, right +90 deg, back +180 deg, left +270 deg]
23
Subjective view agent: Neural network
[Network diagram: 80 x 80 camera images → Conv layers → features (200 x 3) → fully-connected (400 → 256) → n of actions, with the car parameters (2) joined before the output; a width of 64 also appears in the diagram]
25
Subjective view agent: Problems
– Calculation time (GeForce GTX TITAN X)
  • At first: 3 [min/ep] x 50k [ep] = 100 days
  • After review by Abe-san: 1.6 [min/ep] x 50k [ep] = 55 days
  – The cost came from copies and synchronization between GPU and CPU
  – Learning was interrupted as soon as the DNN output diverged
  – (Fortunately) the agent “learned” the goal by ~10k episodes in some trials
– Memory usage
  • In DQN we need to store 1M previous inputs: 1M x (80 x 80 x 3 ch x 4 cameras)
  • So we save the images to disk and read them back each time (a sketch follows below)
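A minimal sketch of such a disk-backed replay memory; the file layout and API are assumptions, not the project's actual code. At uint8 precision, 1M observations of 80 x 80 x 3 x 4 cameras come to roughly 77 GB, which is why they cannot stay in RAM.

```python
import numpy as np
from pathlib import Path

class DiskReplayBuffer:
    """Replay memory that keeps observations on disk instead of RAM (a sketch)."""
    def __init__(self, root, capacity=1_000_000):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.capacity, self.size, self.next = capacity, 0, 0
        self.meta = {}                    # index -> (action, reward, done)

    def add(self, obs, action, reward, done):
        # Each observation is written as one .npy file, indexed circularly.
        np.save(self.root / f"{self.next}.npy", obs.astype(np.uint8))
        self.meta[self.next] = (action, reward, done)
        self.next = (self.next + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        # Re-read the sampled frames from disk every time.
        idx = np.random.randint(0, self.size, batch_size)
        obs = np.stack([np.load(self.root / f"{i}.npy") for i in idx])
        return obs, [self.meta[i] for i in idx]
```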
26
Subjective view agent: Result with three cameras, 6k episodes
[Figure: trajectory of the car agent and the subjective view (input for DQN), from cameras at 0 deg, -120 deg, and +120 deg]
27
Subjective view agent: Result with three cameras, 50k episodes
The policy seems to be “move anyway” >> revisit the reward setting
The agent does not seem able to reach the goal every time; only “easy” goals are achieved >> make the task difficulty variable (curriculum)
[Figure annotation: frequent goals in one region of the field]
28
Subjective view agent: Four cameras at 30k episodes
29
Modify reward
Previous
– +1 when the car is in the goal
– -1 when the car is out of the field
– 0.01 - 0.01 * distance_to_goal otherwise
New
– +1 - speed when the car is in the goal
  • in order to stop the car
– -1 when the car is out of the field
– -0.005 otherwise
(An updated sketch follows below.)
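Relative to the earlier reward sketch, only two branches change; an updated version under the same simplified geometry assumptions (note the episode limit also changed to 450 actions):

```python
import math

def reward_and_done_v2(pos, speed, goal_pos, goal_radius, field_size,
                       step, max_steps=450):
    """Modified reward: the goal bonus shrinks with speed (to make the car
    stop), and the per-step shaping term becomes a small constant penalty."""
    dist = math.hypot(pos[0] - goal_pos[0], pos[1] - goal_pos[1])
    if dist <= goal_radius:
        return 1.0 - speed, True           # reward stopping inside the goal
    if not (0 <= pos[0] <= field_size and 0 <= pos[1] <= field_size):
        return -1.0, True                  # car out of the field
    return -0.005, step >= max_steps       # constant step penalty; 450-step limit
```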
30
Modify difficulty
Difficulty: initial car direction & position
– Constraint
  • The car always starts near the middle of the field
  • The car always starts facing the center
– Curriculum
  • Car direction: sampled from a range that widens with n, where n = curriculum level
  • Criterion for advancing: mean reward of 0.6 over 100 episodes
[Diagram: goal and initial directions for n = 1 and n = 2]
(A curriculum sketch follows below.)
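A sketch of such a curriculum loop. The advancement criterion (mean reward of 0.6 over 100 episodes) is from the slides; the widening-angle formula is an assumption, since the slide's exact expression did not survive extraction.

```python
import random
from collections import deque

def run_curriculum(run_episode, max_level=8, window=100, threshold=0.6):
    """Advance the curriculum when the mean reward over the last `window`
    episodes reaches `threshold` (the 0.6-over-100 criterion from the
    slides); the widening angle range per level is an assumed formula."""
    level, recent = 1, deque(maxlen=window)
    while level <= max_level:
        max_offset = 15.0 * level                 # assumed widening (degrees)
        start_dir = random.uniform(-max_offset, max_offset)
        recent.append(run_episode(start_dir))     # returns the episode reward
        if len(recent) == window and sum(recent) / window >= threshold:
            level += 1
            recent.clear()

run_curriculum(lambda start_dir: 1.0)   # toy runner that always "succeeds"
```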
31
Subjective view agent: modifications

N cameras | Reward   | Difficulty | Learning result
3         | Default  | Default    | about 6k: o, 50k: x
3         | Modified | Default    | about 16k: o
3         | Modified | Constraint | ? (still learning)
3         | Modified | Curriculum | o (though only curriculum 1 yet)
4         | Default  | Default    | x
4         | Modified | Curriculum | △ (not bad, but not successful yet at 6k)
32
Subjective view agent: modifications
Curriculum + three cameras, at curriculum level 1. The advancement criterion needs to be modified.
[Plots: mean reward (0.0 to 1.0) and reward sum (0 to 500) vs. episode number (0 to 20k)]
33
Discussion
1. The initial settings included situations where the car cannot reach the goal
– e.g. starting toward the edge of the field
– This made learning unstable
2. Why was the coordinate agent nevertheless successful, even though such situations could also occur for it?
34
Discussion
3. Comparison between three and four cameras
– Considering success rate and execution time, three cameras are better
– Why was the four-camera agent not successful? Would it need several more trials?
4. DQN often diverged
– in roughly one run out of three, in my experience
  • slightly more often with four cameras
– This underlines the importance of the dataset for learning
  • memory size, batch size
35
Discussion
5. Curriculum
– Ideally it would be better to quantify the “difficulty of the task”
  • In this case it may be roughly represented by the “bias of the distribution” of the selected actions over the nine choices (accelerate, decelerate, throw (do nothing), steer right, steer left, and the accelerate/decelerate + steer combinations)
  • Each action selected equally often >> go straight; a biased distribution of selected actions >> go right/left
(A sketch of such a bias measure follows below.)
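One simple way to quantify that bias (an assumption; the slides do not define a formula) is the entropy deficit of the empirical action distribution relative to uniform, i.e. KL(p || uniform):

```python
import numpy as np

def action_bias(action_counts):
    """Bias of the action distribution, as entropy deficit from uniform.

    0 when all 9 actions are picked equally often (go straight);
    larger when a few actions dominate (e.g. always steering one way).
    """
    p = np.asarray(action_counts, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]
    entropy = -(nz * np.log(nz)).sum()
    return np.log(len(p)) - entropy      # = KL(p || uniform)

print(action_bias([10] * 9))                       # balanced  -> 0.0
print(action_bias([50, 1, 1, 1, 1, 1, 1, 1, 1]))   # biased    -> ~1.5
```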
36
Summary
• The car agent can park itself using subjective camera views, though learning is not always stable
• There is a trade-off between reward design and learning difficulty
– Simple reward: difficult to learn
  • Try other algorithms such as A3C
– Complex reward: difficult to set
  • Try other settings for distance_to_goal