how machines learn: from robot soccer to autonomous … · how machines learn: from robot soccer to...

124
How Machines Learn: From Robot Soccer to Autonomous Traffic Peter Stone Department or Computer Sciences The University of Texas at Austin

Upload: vuduong

Post on 05-Aug-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

How Machines Learn: From Robot Soccerto Autonomous Traffic

Peter Stone

Department or Computer SciencesThe University of Texas at Austin

Page 2: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Research Question

To what degree can autonomousintelligent agents learn in the presence of

teammates and/or adversaries inreal-time, dynamic domains?

Peter Stone

Page 3: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Research Question

To what degree can autonomousintelligent agents learn in the presence of

teammates and/or adversaries inreal-time, dynamic domains?

• Autonomous agents• Multiagent systems• Machine learning• Robotics

Peter Stone

Page 4: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Autonomous Intelligent Agents

• They must sense their environment.• They must decide what action to take (“think”).• They must act in their environment.

Peter Stone

Page 5: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Autonomous Intelligent Agents

• They must sense their environment.• They must decide what action to take (“think”).• They must act in their environment.

Complete Intelligent Agents

Peter Stone

Page 6: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Autonomous Intelligent Agents

• They must sense their environment.• They must decide what action to take (“think”).• They must act in their environment.

Complete Intelligent Agents

• Interact with other agents (Multiagent systems)

Peter Stone

Page 7: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Autonomous Intelligent Agents

• They must sense their environment.• They must decide what action to take (“think”).• They must act in their environment.

Complete Intelligent Agents

• Interact with other agents (Multiagent systems)• Improve performance from experience (Learning agents)

Peter Stone

Page 8: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Autonomous Intelligent Agents

• They must sense their environment.• They must decide what action to take (“think”).• They must act in their environment.

Complete Intelligent Agents

• Interact with other agents (Multiagent systems)• Improve performance from experience (Learning agents)

Autonomous Bidding, Cognitive Systems,Robot Soccer, Traffic management

Peter Stone

Page 9: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

BE a learning agent

Peter Stone

Page 10: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

BE a learning agent

• You, as a group, act as a learning agent

Peter Stone

Page 11: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

BE a learning agent

• You, as a group, act as a learning agent

• Actions: Wave, Stand, Clap

Peter Stone

Page 12: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

BE a learning agent

• You, as a group, act as a learning agent

• Actions: Wave, Stand, Clap

• Observations: colors, reward

Peter Stone

Page 13: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

BE a learning agent

• You, as a group, act as a learning agent

• Actions: Wave, Stand, Clap

• Observations: colors, reward

• Goal: Find an optimal policy

Peter Stone

Page 14: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

BE a learning agent

• You, as a group, act as a learning agent

• Actions: Wave, Stand, Clap

• Observations: colors, reward

• Goal: Find an optimal policy

− Way of selecting actions that gets you the most reward

Peter Stone

Page 15: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

How did you do it?

Peter Stone

Page 16: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

How did you do it?

• What is your policy?

• What does the world look like?

Peter Stone

Page 17: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Formalizing What Just HappenedKnowns:

Peter Stone

Page 18: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Formalizing What Just HappenedKnowns:• O = {Blue, Red, Green, Black, . . .}• Rewards in IR• A = {Wave, Clap, Stand}

o0, a0, r0, o1, a1, r1, o2, . . .

Peter Stone

Page 19: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Formalizing What Just HappenedKnowns:• O = {Blue, Red, Green, Black, . . .}• Rewards in IR• A = {Wave, Clap, Stand}

o0, a0, r0, o1, a1, r1, o2, . . .

Unknowns:

Peter Stone

Page 20: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Formalizing What Just HappenedKnowns:• O = {Blue, Red, Green, Black, . . .}• Rewards in IR• A = {Wave, Clap, Stand}

o0, a0, r0, o1, a1, r1, o2, . . .

Unknowns:• S = 4x3 grid• R : S ×A 7→ IR• P = S 7→ O• T : S ×A 7→ S

Peter Stone

Page 21: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Formalizing What Just HappenedKnowns:• O = {Blue, Red, Green, Black, . . .}• Rewards in IR• A = {Wave, Clap, Stand}

o0, a0, r0, o1, a1, r1, o2, . . .

Unknowns:• S = 4x3 grid• R : S ×A 7→ IR• P = S 7→ O• T : S ×A 7→ S

oi = P(si)

Peter Stone

Page 22: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Formalizing What Just HappenedKnowns:• O = {Blue, Red, Green, Black, . . .}• Rewards in IR• A = {Wave, Clap, Stand}

o0, a0, r0, o1, a1, r1, o2, . . .

Unknowns:• S = 4x3 grid• R : S ×A 7→ IR• P = S 7→ O• T : S ×A 7→ S

oi = P(si) si = T (si−1, ai−1)

Peter Stone

Page 23: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Formalizing What Just HappenedKnowns:• O = {Blue, Red, Green, Black, . . .}• Rewards in IR• A = {Wave, Clap, Stand}

o0, a0, r0, o1, a1, r1, o2, . . .

Unknowns:• S = 4x3 grid• R : S ×A 7→ IR• P = S 7→ O• T : S ×A 7→ S

oi = P(si) si = T (si−1, ai−1) ri = R(si, ai)

Peter Stone

Page 24: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Reinforcement Learning

• Algorithms to select actions in such problems

Peter Stone

Page 25: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Reinforcement Learning

• Algorithms to select actions in such problems

• Q-learning: provably converges to the optimal policy

Peter Stone

Page 26: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Reinforcement Learning

• Algorithms to select actions in such problems

• Q-learning: provably converges to the optimal policy

− Proof: contraction mappings and fixed point theorem

Peter Stone

Page 27: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

A harder problem

• You had 3 actions and saw one of 10 colors

Peter Stone

Page 28: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

A harder problem

• You had 3 actions and saw one of 10 colors

• What if you had to control 12 joints . . .

Peter Stone

Page 29: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

A harder problem

• You had 3 actions and saw one of 10 colors

• What if you had to control 12 joints . . .

• . . . and saw something like this 30 times per second?

Peter Stone

Page 30: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup

Peter Stone

Page 31: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup

Goal: By the year 2050, a team of humanoid robotsthat can beat the human World Cup champion team.

Peter Stone

Page 32: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup

Goal: By the year 2050, a team of humanoid robotsthat can beat the human World Cup champion team.

• An international research initiative

Peter Stone

Page 33: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup

Goal: By the year 2050, a team of humanoid robotsthat can beat the human World Cup champion team.

• An international research initiative

• Drives research in many areas:

− Control algorithms; machine vision, sensing; localization;− Distributed computing; real-time systems;− Ad hoc networking; mechanical design;

Peter Stone

Page 34: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup

Goal: By the year 2050, a team of humanoid robotsthat can beat the human World Cup champion team.

• An international research initiative

• Drives research in many areas:

− Control algorithms; machine vision, sensing; localization;− Distributed computing; real-time systems;− Ad hoc networking; mechanical design;− Multiagent systems; machine learning; robotics

Peter Stone

Page 35: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup

Goal: By the year 2050, a team of humanoid robotsthat can beat the human World Cup champion team.

• An international research initiative

• Drives research in many areas:

− Control algorithms; machine vision, sensing; localization;− Distributed computing; real-time systems;− Ad hoc networking; mechanical design;− Multiagent systems; machine learning; robotics

Several Different Leagues

Peter Stone

Page 36: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup Soccer

Peter Stone

Page 37: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

The Early Years

Peter Stone

Page 38: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

A Decade Later

Peter Stone

Page 39: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Sony Aibo (ERS-210A, ERS-7)

Peter Stone

Page 40: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Sony Aibo (ERS-210A, ERS-7)

Peter Stone

Page 41: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Sony Aibo (ERS-210A, ERS-7)

Peter Stone

Page 42: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Creating a team — Subtasks

Peter Stone

Page 43: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Creating a team — Subtasks

• Vision• Localization• Walking• Ball manipulation (kicking)• Individual decision making• Communication/coordination

Peter Stone

Page 44: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Creating a team — Subtasks

• Vision• Localization• Walking• Ball manipulation (kicking)• Individual decision making• Communication/coordination

Peter Stone

Page 45: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Competitions

• Barely “closed the loop” by American Open (May, ’03)

Peter Stone

Page 46: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Competitions

• Barely “closed the loop” by American Open (May, ’03)

• Improved significantly by Int’l RoboCup (July, ’03)

Peter Stone

Page 47: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Competitions

• Barely “closed the loop” by American Open (May, ’03)

• Improved significantly by Int’l RoboCup (July, ’03)

• Won 3rd place at US Open (2004, 2005)

• Quarterfinalist at RoboCup (2004, 2005)

Peter Stone

Page 48: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Competitions

• Barely “closed the loop” by American Open (May, ’03)

• Improved significantly by Int’l RoboCup (July, ’03)

• Won 3rd place at US Open (2004, 2005)

• Quarterfinalist at RoboCup (2004, 2005)

• Highlights:− Many saves: 1; 2; 3; 4;− Lots of goals: CMU; Penn; Penn; Germany;

− A nice clear− A counterattack goal

Peter Stone

Page 49: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Post-competition: the CS research

Peter Stone

Page 50: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Post-competition: the CS research

• Model-based joint control [Stronger, S, ’04]

• Learning sensor and action models [Stronger, S, ’06]

• Machine learning for fast walking [Kohl, S, ’04]

• Learning to acquire the ball [Fidelman, S, ’06]

• Color constancy on mobile robots [Sridharan, S, ’04]

• Robust particle filter localization [Sridharan, Kuhlmann, S, ’05]

• Autonomous Color Learning [Sridharan, S, ’05]

Peter Stone

Page 51: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Policy Gradient RL to learn fast walk

Goal: Enable an Aibo to walk as fast as possible

Peter Stone

Page 52: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Policy Gradient RL to learn fast walk

Goal: Enable an Aibo to walk as fast as possible

• Start with a parameterized walk

• Learn fastest possible parameters

Peter Stone

Page 53: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Policy Gradient RL to learn fast walk

Goal: Enable an Aibo to walk as fast as possible

• Start with a parameterized walk

• Learn fastest possible parameters

• No simulator available:

− Learn entirely on robots− Minimal human intervention

Peter Stone

Page 54: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Walking Aibos

• Walks that “come with” Aibo are slow

• RoboCup soccer: 25+ Aibo teams internationally

− Motivates faster walks

Peter Stone

Page 55: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Walking Aibos

• Walks that “come with” Aibo are slow

• RoboCup soccer: 25+ Aibo teams internationally

− Motivates faster walks

Hand-tuned gaits [2003] Learned gaitsGerman UT Austin Hornby et al. Kim & UtherTeam Villa UNSW [1999] [2003]

230 mm/s 245 254 170 270 (±5)

Peter Stone

Page 56: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

A Parameterized Walk• Developed from scratch as part of UT Austin Villa 2003

• Trot gait with elliptical locus on each leg

Peter Stone

Page 57: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Locus Parametersz

x

y

• Ellipse length• Ellipse height• Position on x axis• Position on y axis• Body height• Timing values

12 continuous parameters

Peter Stone

Page 58: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Locus Parametersz

x

y

• Ellipse length• Ellipse height• Position on x axis• Position on y axis• Body height• Timing values

12 continuous parameters

• Hand tuning by April, ’03: 140 mm/s• Hand tuning by July, ’03: 245 mm/s

Peter Stone

Page 59: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Experimental Setup• Policy π = {θ1, . . . , θ12}, V (π) = walk speed when using π

Peter Stone

Page 60: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Experimental Setup• Policy π = {θ1, . . . , θ12}, V (π) = walk speed when using π

• Training Scenario

− Robots time themselves traversing fixed distance− Multiple traversals (3) per policy to account for noise

Peter Stone

Page 61: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Experimental Setup• Policy π = {θ1, . . . , θ12}, V (π) = walk speed when using π

• Training Scenario

− Robots time themselves traversing fixed distance− Multiple traversals (3) per policy to account for noise− Multiple robots evaluate policies simultaneously− Off-board computer collects results, assigns policies

Peter Stone

Page 62: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Experimental Setup• Policy π = {θ1, . . . , θ12}, V (π) = walk speed when using π

• Training Scenario

− Robots time themselves traversing fixed distance− Multiple traversals (3) per policy to account for noise− Multiple robots evaluate policies simultaneously− Off-board computer collects results, assigns policies

No human intervention except battery changes

Peter Stone

Page 63: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Policy Gradient RL• From π want to move in direction of gradient of V (π)

Peter Stone

Page 64: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Policy Gradient RL• From π want to move in direction of gradient of V (π)

− Can’t compute ∂V (π)∂θi

directly: estimate empirically

Peter Stone

Page 65: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Policy Gradient RL• From π want to move in direction of gradient of V (π)

− Can’t compute ∂V (π)∂θi

directly: estimate empirically

• ∂V (π)∂θi

≈ V ({θ1 + ε, . . . , θ12})− V ({θ1 − ε, . . . , θ12})

Peter Stone

Page 66: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Policy Gradient RL• From π want to move in direction of gradient of V (π)

− Can’t compute ∂V (π)∂θi

directly: estimate empirically

• ∂V (π)∂θi

≈ V ({θ1 + ε, . . . , θ12})− V ({θ1 − ε, . . . , θ12})

− Requires evaluation of 24 policies

Peter Stone

Page 67: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Policy Gradient RL• From π want to move in direction of gradient of V (π)

− Can’t compute ∂V (π)∂θi

directly: estimate empirically

• ∂V (π)∂θi

≈ V ({θ1 + ε, . . . , θ12})− V ({θ1 − ε, . . . , θ12})

− Requires evaluation of 24 policies

• Instead, evaluate t (15) policies in the neighborhood of π

s.t. ith parameter is randomly θi ± ε or 0.

Peter Stone

Page 68: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Policy Gradient RL• From π want to move in direction of gradient of V (π)

− Can’t compute ∂V (π)∂θi

directly: estimate empirically

• ∂V (π)∂θi

≈ V ({θ1 + ε, . . . , θ12})− V ({θ1 − ε, . . . , θ12})

− Requires evaluation of 24 policies

• Instead, evaluate t (15) policies in the neighborhood of π

s.t. ith parameter is randomly θi ± ε or 0.

• V ({θ1 + ε, . . . , θ12}) ≈ Avg+ε,1 ≡ policies with θ1 + ε

Peter Stone

Page 69: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Policy Gradient RL• From π want to move in direction of gradient of V (π)

− Can’t compute ∂V (π)∂θi

directly: estimate empirically

• ∂V (π)∂θi

≈ V ({θ1 + ε, . . . , θ12})− V ({θ1 − ε, . . . , θ12})

− Requires evaluation of 24 policies

• Instead, evaluate t (15) policies in the neighborhood of π

s.t. ith parameter is randomly θi ± ε or 0.

• V ({θ1 + ε, . . . , θ12}) ≈ Avg+ε,1 ≡ policies with θ1 + ε

− Expect t/3 estimates for each of θi ± ε, 0− Each evaluation contributes to all 12 estimates

Peter Stone

Page 70: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Gradient Estimation

Peter Stone

Page 71: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Taking a step

Peter Stone

Page 72: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Taking a step

Ai =

0 if Avg+0,i > Avg+ε,i and

Avg+0,i > Avg−ε,i

Avg+ε,i −Avg−ε,i otherwise(1)

Peter Stone

Page 73: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Taking a step

Ai =

0 if Avg+0,i > Avg+ε,i and

Avg+0,i > Avg−ε,i

Avg+ε,i −Avg−ε,i otherwise(1)

• Normalize A, multiply by scalar step-size η

• π = π + ηA

Peter Stone

Page 74: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Experiments• Started from stable, but fairly slow gait

• Used 3 robots simultaneously

• Each iteration takes 45 traversals, 712 minutes

Peter Stone

Page 75: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Experiments• Started from stable, but fairly slow gait

• Used 3 robots simultaneously

• Each iteration takes 45 traversals, 712 minutes

Before learning After learning

• 24 iterations = 1080 field traversals, ≈ 3 hours

Peter Stone

Page 76: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Results

180

200

220

240

260

280

300

0 5 10 15 20 25

Vel

ocity

(m

m/s

)

Number of Iterations

Velocity of Learned Gait during Training

(UT Austin Villa)

Learned Gait

Hand−tuned Gait

Hand−tuned Gait

Hand−tuned Gait

(UNSW)

(UNSW)

(German Team)

(UT Austin Villa)Learned Gait

Peter Stone

Page 77: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Results

180

200

220

240

260

280

300

0 5 10 15 20 25

Vel

ocity

(m

m/s

)

Number of Iterations

Velocity of Learned Gait during Training

(UT Austin Villa)

Learned Gait

Hand−tuned Gait

Hand−tuned Gait

Hand−tuned Gait

(UNSW)

(UNSW)

(German Team)

(UT Austin Villa)Learned Gait

• Additional iterations didn’t help• Spikes: evaluation noise? large step size?

Peter Stone

Page 78: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Learned ParametersParameter Initial ε Best

Value ValueFront ellipse:

(height) 4.2 0.35 4.081(x offset) 2.8 0.35 0.574(y offset) 4.9 0.35 5.152

Rear ellipse:(height) 5.6 0.35 6.02

(x offset) 0.0 0.35 0.217(y offset) -2.8 0.35 -2.982

Ellipse length 4.893 0.35 5.285Ellipse skew multiplier 0.035 0.175 0.049Front height 7.7 0.35 7.483Rear height 11.2 0.35 10.843Time to move

through locus 0.704 0.016 0.679Time on ground 0.5 0.05 0.430

Peter Stone

Page 79: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Algorithmic Comparison, Robot Port

Before learning After learning

Peter Stone

Page 80: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Summary

• Used policy gradient RL to learn fastest Aibo walk

• All learning done on real robots

• No human itervention (except battery changes)

Peter Stone

Page 81: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Outline

• Machine learning for fast walking [Kohl, S, ’04]

• Learning to acquire the ball [Fidelman, S, ’06]

• Color constancy on mobile robots [Sridharan, S, ’05]

• Autonomous Color Learning [Sridharan, S, ’06]

Peter Stone

Page 82: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Grasping the Ball

• Three stages: walk to ball; slow down; lower chin

• Head proprioception, IR chest sensor 7→ ball distance

• Movement specified by 4 parameters

Peter Stone

Page 83: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Grasping the Ball

• Three stages: walk to ball; slow down; lower chin

• Head proprioception, IR chest sensor 7→ ball distance

• Movement specified by 4 parameters

Brittle!

Peter Stone

Page 84: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Parameterization• slowdown dist: when to slow down

• slowdown factor: how much to slow down

• capture angle: when to stop turning

• capture dist: when to put down head

Peter Stone

Page 85: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Learning the Chin Pinch

• Binary, noisy reinforcement signal: multiple trials

• Robot evaluates self: no human intervention

Peter Stone

Page 86: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Results

• Evaluation of policy gradient, hill climbing, amoeba

0 2 4 6 8 10 120

10

20

30

40

50

60

70

80

90

100

succ

essf

ul c

aptu

res

out o

f 100

tria

ls

iterations

policy gradientamoebahill climbing

Peter Stone

Page 87: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

What it learned

Policy slowdown slowdown capture capture Successdist factor angle dist rate

Initial 200mm 0.7 15.0o 110mm 36%Policy gradient 125mm 1 17.4o 152mm 64%

Amoeba 208mm 1 33.4o 162mm 69%Hill climbing 240mm 1 35.0o 170mm 66%

Peter Stone

Page 88: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Outline

• Machine learning for fast walking [Kohl, S, ’04]

• Learning to acquire the ball [Fidelman, S, ’06]

• Color constancy on mobile robots [Sridharan, S, ’05]

• Autonomous Color Learning [Sridharan, S, ’06]

Peter Stone

Page 89: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Color Constancy

• Visual system’s ability to recognize true color acrossvariations in environment

Peter Stone

Page 90: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Color Constancy

• Visual system’s ability to recognize true color acrossvariations in environment

• Challenge: Nonlinear variations in sensor response withchange in illumination

Peter Stone

Page 91: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Color Constancy

• Visual system’s ability to recognize true color acrossvariations in environment

• Challenge: Nonlinear variations in sensor response withchange in illumination

• Mobile robots:

− Computational limitations− Changing camera positions

Peter Stone

Page 92: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Sample Images

Peter Stone

Page 93: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Sample Images

Peter Stone

Page 94: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Sample Images

Peter Stone

Page 95: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Sample Images

Peter Stone

Page 96: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Our Goal

• Match current performance in changing lighting

• Experiments on ERS-210A robots

Peter Stone

Page 97: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Autonomous Color Learning• Color Constancy: more tediously created maps

− Hand-labeling many images −→ hours of manual effort

Peter Stone

Page 98: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Autonomous Color Learning• Color Constancy: more tediously created maps

− Hand-labeling many images −→ hours of manual effort

• Use the structured environment

− Robot learns color distributions

Peter Stone

Page 99: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Autonomous Color Learning• Color Constancy: more tediously created maps

− Hand-labeling many images −→ hours of manual effort

• Use the structured environment

− Robot learns color distributions

• Comparable accuracy, 5 minutes of robot effort

Peter Stone

Page 100: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Outline

• Learning on physical robots

− No simulation, minimal human intervention

Peter Stone

Page 101: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Outline

• Learning on physical robots

− No simulation, minimal human intervention

• Motion: learning for fast walking

• Behavior: acquiring the ball

• Vision: color constancy, autonomous color learning

Peter Stone

Page 102: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Outline

• Learning on physical robots

− No simulation, minimal human intervention

• Motion: learning for fast walking

• Behavior: acquiring the ball

• Vision: color constancy, autonomous color learning

• Multiagent Strategy: RL in simulation

Peter Stone

Page 103: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup Simulator• Distributed: each player a separate client• Server models dynamics and kinematics

Peter Stone

Page 104: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup Simulator• Distributed: each player a separate client• Server models dynamics and kinematics• Clients receive sensations, send actions

Client 1

Server

Client 2

Cycle t-1 t t+1 t+2

Peter Stone

Page 105: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup Simulator• Distributed: each player a separate client• Server models dynamics and kinematics• Clients receive sensations, send actions

Client 1

Server

Client 2

Cycle t-1 t t+1 t+2

• Parametric actions: dash, turn, kick, say

Peter Stone

Page 106: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup Simulator• Distributed: each player a separate client• Server models dynamics and kinematics• Clients receive sensations, send actions

Client 1

Server

Client 2

Cycle t-1 t t+1 t+2

• Parametric actions: dash, turn, kick, say• Abstract, noisy sensors, hidden state− Hear sounds from limited distance− See relative distance, angle to objects ahead

Peter Stone

Page 107: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

RoboCup Simulator• Distributed: each player a separate client• Server models dynamics and kinematics• Clients receive sensations, send actions

Client 1

Server

Client 2

Cycle t-1 t t+1 t+2

• Parametric actions: dash, turn, kick, say• Abstract, noisy sensors, hidden state− Hear sounds from limited distance− See relative distance, angle to objects ahead

• > 10923states

• Limited resources : stamina• Play occurs in real time (≈ human parameters)

Peter Stone

Page 108: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

3 vs. 2 Keepaway

Peter Stone

Page 109: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

3 vs. 2 Keepaway• Play in a small area (20m × 20m)

• Keepers try to keep the ball

• Takers try to get the ball

Peter Stone

Page 110: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

3 vs. 2 Keepaway• Play in a small area (20m × 20m)

• Keepers try to keep the ball

• Takers try to get the ball

• Episode:− Players and ball reset randomly− Ball starts near a keeper− Ends when taker gets the ball or ball goes out

Peter Stone

Page 111: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

3 vs. 2 Keepaway• Play in a small area (20m × 20m)

• Keepers try to keep the ball

• Takers try to get the ball

• Episode:− Players and ball reset randomly− Ball starts near a keeper− Ends when taker gets the ball or ball goes out

• Performance measure: average possession duration

• Use CMUnited-99 skills:− HoldBall, PassBall(k), GoToBall, GetOpen

Peter Stone

Page 112: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

The Keepers’ Policy Space

notBall

�����

������

���

JJ

JJ

JJJ���

������

��

JJ

JJ

JJ

GetOpen

GoToBall {HoldBall,PassBall(k)}(k is another keeper)

Teammate with ballor can get therefaster

kickable Ballkickable

Peter Stone

Page 113: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

The Keepers’ Policy Space

notBall

�����

������

���

JJ

JJ

JJJ���

������

��

JJ

JJ

JJ

GetOpen

GoToBall {HoldBall,PassBall(k)}(k is another keeper)

Teammate with ballor can get therefaster

kickable Ballkickable

Example PoliciesRandom: HoldBall or PassBall(k) randomly

Peter Stone

Page 114: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

The Keepers’ Policy Space

notBall

�����

������

���

JJ

JJ

JJJ���

������

��

JJ

JJ

JJ

GetOpen

GoToBall {HoldBall,PassBall(k)}(k is another keeper)

Teammate with ballor can get therefaster

kickable Ballkickable

Example PoliciesRandom: HoldBall or PassBall(k) randomlyHold: Always HoldBall

Peter Stone

Page 115: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

The Keepers’ Policy Space

notBall

�����

������

���

JJ

JJ

JJJ���

������

��

JJ

JJ

JJ

GetOpen

GoToBall {HoldBall,PassBall(k)}(k is another keeper)

Teammate with ballor can get therefaster

kickable Ballkickable

Example PoliciesRandom: HoldBall or PassBall(k) randomlyHold: Always HoldBallHand-coded:

If no taker within 10m: HoldBallElse If there’s a good pass: PassBall(k)Else HoldBall

Peter Stone

Page 116: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Keeper’s State Variables

• 11 distances among players, ball, and center

• 2 angles to takers along passing lanes

Peter Stone

Page 117: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Function Approximation: Tile Coding

• Form of sparse, coarse coding based on CMACS [Albus,

1981]

Peter Stone

Page 118: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Function Approximation: Tile Coding

• Form of sparse, coarse coding based on CMACS [Albus,

1981]

Actionvalues

Fullsoccerstate

Fewstate

variables(continuous)

Sparse, coarse,tile coding

Linearmap

Huge binary feature vector(about 400 1’s and 40,000 0’s)

Peter Stone

Page 119: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Main Result

0 1 0 2 0 2 54

6

8

1 0

1 2

1 4

EpisodeDuration(seconds)

Hours of Training Time(bins of 1000 episodes)

handcoded randomalwayshold

1 hour = 720 5-second episodes

Peter Stone

Page 120: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Difficulty of Multiagent Learning

4

6

8

10

12

14

16

18

0 5 10 15 20

Epi

sode

Dur

atio

n (s

econ

ds)

Training Time (hours)

1 Learning2 Learning3 Learning

4

6

8

10

12

14

16

18

0 5 10 15 20

Epi

sode

Dur

atio

n (s

econ

ds)

Training Time (hours)

1 Learning2 Learning3 Learning

4

6

8

10

12

14

16

18

0 5 10 15 20

Epi

sode

Dur

atio

n (s

econ

ds)

Training Time (hours)

1 Learning2 Learning3 Learning

• Multiagent learning is harder!

Peter Stone

Page 121: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Outline

• Robot soccer on real robots

• Robot soccer in simulation

Peter Stone

Page 122: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Outline

• Robot soccer on real robots

• Robot soccer in simulation

• Autonomous driving

Peter Stone

Page 123: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Acknowledgements

Thanks to all the Students Involved!

• Kurt Dresner, Nate Kohl, Peggy Fidelman, MohanSridharan, Richard Sutton

• Other members of the UT Austin Villa Legged Robot Team

• http://www.cs.utexas.edu/~AustinVilla

Peter Stone

Page 124: How Machines Learn: From Robot Soccer to Autonomous … · How Machines Learn: From Robot Soccer to Autonomous Traffic ... Robot Soccer, Traffic management ... • Robust particle

Acknowledgements

Thanks to all the Students Involved!

• Kurt Dresner, Nate Kohl, Peggy Fidelman, MohanSridharan, Richard Sutton

• Other members of the UT Austin Villa Legged Robot Team

• http://www.cs.utexas.edu/~AustinVilla

• Fox Sports World for inspiration!

Peter Stone