Advanced Applications: Robotics

CS 343: Artificial Intelligence Advanced Applications: Robotics Prof. Scott Niekum — The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]



Page 2: Advanced Applications: Robotics

Robotic Helicopters

Page 3: Advanced Applications: Robotics

Motivating Example

▪ How do we execute a task like this?

Page 4: Advanced Applications: Robotics

Autonomous Helicopter Flight

▪ Key challenges:

  ▪ Track helicopter position and orientation during flight

  ▪ Decide on control inputs to send to helicopter

Page 5: Advanced Applications: Robotics

Autonomous Helicopter Setup

[Diagram: on-board inertial measurement unit (IMU); position tracking; controls sent out to the helicopter.]

Page 6: Advanced Applications: Robotics

HMM for Tracking the Helicopter

▪ State:

▪ Measurements: [observation update]
  ▪ 3-D coordinates from vision, 3-axis magnetometer, 3-axis gyro, 3-axis accelerometer

▪ Transitions (dynamics): [time elapse update]
  ▪ $s_{t+1} = f(s_t, a_t) + w_t$, where f encodes helicopter dynamics and w is noise (a filtering sketch of these two updates follows below)
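The two HMM updates above can be approximated with a simple particle filter. A minimal sketch, assuming a generic dynamics function `f`, an observation function `obs_fn`, and Gaussian noise models; none of these are the actual models used for the helicopter:

```python
import numpy as np

def particle_filter_step(particles, action, observation, f, obs_fn,
                         process_noise_std, obs_noise_std):
    """One HMM update: time elapse (dynamics) then observation reweighting."""
    n, dim = particles.shape

    # Time elapse update: s_{t+1} = f(s_t, a_t) + w_t
    noise = np.random.normal(0.0, process_noise_std, size=(n, dim))
    particles = np.array([f(s, action) for s in particles]) + noise

    # Observation update: weight particles by likelihood of the measurement
    predicted = np.array([obs_fn(s) for s in particles])
    sq_err = np.sum((predicted - observation) ** 2, axis=1)
    weights = np.exp(-0.5 * sq_err / obs_noise_std ** 2) + 1e-300
    weights /= weights.sum()

    # Resample proportionally to the weights
    idx = np.random.choice(n, size=n, p=weights)
    return particles[idx]
```

In practice a Kalman-style filter over the pose and velocity state is the more usual choice for this problem; the particle filter is just the most direct illustration of the time-elapse and observation updates.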

Page 7: Advanced Applications: Robotics

Helicopter MDP

▪ State:

▪ Actions (control inputs):
  ▪ $a_{lon}$: Main rotor longitudinal cyclic pitch control (affects pitch rate)
  ▪ $a_{lat}$: Main rotor lateral cyclic pitch control (affects roll rate)
  ▪ $a_{coll}$: Main rotor collective pitch (affects main rotor thrust)
  ▪ $a_{rud}$: Tail rotor collective pitch (affects tail rotor thrust)

▪ Transitions (dynamics):
  ▪ $s_{t+1} = f(s_t, a_t) + w_t$  [f encodes helicopter dynamics] [w is a probabilistic noise model]

▪ Can we solve the MDP yet?

Page 8: Advanced Applications: Robotics

Problem: What's the Reward?

▪ Reward for hovering:

Page 9: Advanced Applications: Robotics

Hover

[Ng et al., 2004]

Page 10: Advanced Applications: Robotics

Problem: What's the Reward?

▪ Rewards for "Flip"?

▪ Problem: what's the target trajectory?

▪ Just write it down by hand?

▪ Penalize for deviation from trajectory (a reward sketch follows below)
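One simple way to "penalize for deviation from trajectory" is a quadratic penalty on tracking error plus a small control cost. A minimal sketch; the weighting matrices `Q` and `R_ctrl` and the target trajectory are illustrative assumptions, not the actual reward used for the flips:

```python
import numpy as np

def trajectory_reward(state, action, t, target_traj, Q, R_ctrl):
    """Reward = negative quadratic penalty for deviating from the target
    trajectory at time t, plus a small penalty on large control inputs."""
    err = state - target_traj[t]
    return -(err @ Q @ err) - (action @ R_ctrl @ action)
```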

Page 11: Advanced Applications: Robotics

Flips(?)

Page 12: Advanced Applications: Robotics

Helicopter Apprenticeship?

Page 13: Advanced Applications: Robotics

Unaligned Demonstrations

Page 14: Advanced Applications: Robotics

Probabilistic Alignment using a Bayes' Net

▪ Intended trajectory satisfies dynamics.

▪ Expert trajectory is a noisy observation of one of the hidden states.
  ▪ But we don't know exactly which one (a simplified alignment sketch follows below).

[Diagram: Bayes' net relating the intended trajectory, the expert demonstrations, and the hidden time indices.]

[Coates, Abbeel & Ng, 2008]
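The actual alignment in [Coates, Abbeel & Ng, 2008] is done by probabilistic inference over the hidden time indices in the Bayes' net above. As a much simpler stand-in, the sketch below aligns a demonstration to a reference trajectory with classic dynamic time warping; the distance function and inputs are assumptions:

```python
import numpy as np

def dtw_align(reference, demo):
    """Dynamic time warping: returns the minimum-cost monotone alignment
    between two trajectories (arrays of shape [T, state_dim])."""
    n, m = len(reference), len(demo)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(reference[i - 1] - demo[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # skip a reference step
                                 cost[i, j - 1],      # skip a demo step
                                 cost[i - 1, j - 1])  # match the two steps
    # Backtrack to recover the matched index pairs
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[n, m], path[::-1]
```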

Page 15: Advanced Applications: Robotics

Aligned Demonstrations

Page 16: Advanced Applications: Robotics

Final Behavior

[Abbeel, Coates, Quigley & Ng, 2010]

Page 17: Advanced Applications: Robotics

Legged Locomotion

Page 18: Advanced Applications: Robotics

Quadruped

▪ Low-level control problem: moving a foot into a new location → search with successor function ~ moving the motors

▪ High-level control problem: where should we place the feet?

▪ Reward function $R(s) = w \cdot f(s)$ [25 features] (a scoring sketch follows below)

[Kolter, Abbeel & Ng, 2008]
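With a linear reward $R(s) = w \cdot f(s)$, scoring candidate foot placements is just a dot product of the learned weights with each placement's features. A minimal greedy-selection sketch; the feature function and candidate set are hypothetical, and the real system uses 25 features inside a full footstep planner:

```python
import numpy as np

def best_foothold(candidates, w, features):
    """Pick the candidate foot placement with the highest linear reward
    w . f(s), where features(s) returns the feature vector of a placement."""
    scores = [w @ features(s) for s in candidates]
    return candidates[int(np.argmax(scores))]
```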

Page 19: Advanced Applications: Robotics

Experimental setup

▪ Demonstrate a path across the "training terrain"

▪ Run apprenticeship learning to learn the reward function

▪ Receive "testing terrain" (a height map)

▪ Find the optimal policy with respect to the learned reward function for crossing the testing terrain

[Kolter, Abbeel & Ng, 2008]

Page 20: Advanced Applications: Robotics

Learning task objectives: Inverse reinforcement learning

Reinforcement learning basics:

MDP: $(S, A, T, \gamma, D, R)$ (states, actions, transition dynamics, discount rate, start state distribution, reward function)

Policy: $\pi(s, a) \rightarrow [0, 1]$

Value function: $V^{\pi}(s_0) = \sum_{t=0}^{\infty} \gamma^t R(s_t)$

What if we have an MDP/R (an MDP without a reward function)?
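The value function above can be estimated by averaging discounted returns over sampled rollouts. A minimal Monte Carlo sketch; the `policy`, `R`, and `sample_next_state` callables and the truncation horizon are assumed placeholders, not part of the slides:

```python
import numpy as np

def estimate_value(s0, policy, R, sample_next_state, gamma,
                   horizon=200, n_rollouts=100):
    """Monte Carlo estimate of V^pi(s0) = E[ sum_t gamma^t R(s_t) ]."""
    returns = []
    for _ in range(n_rollouts):
        s, total, discount = s0, 0.0, 1.0
        for _ in range(horizon):          # truncate the infinite sum
            total += discount * R(s)
            a = policy(s)
            s = sample_next_state(s, a)   # draw s' ~ T(s' | s, a)
            discount *= gamma
        returns.append(total)
    return float(np.mean(returns))
```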

Page 21: Advanced Applications: Robotics

Learning task objectives: Inverse reinforcement learning

1. Collect a user demonstration $(s_0, a_0), (s_1, a_1), \dots, (s_n, a_n)$ and assume it is sampled from the expert's policy $\pi_E$.

2. Explain the expert demos by finding $R^*$ such that:

$E[\sum_{t=0}^{\infty} \gamma^t R^*(s_t) \mid \pi_E] \geq E[\sum_{t=0}^{\infty} \gamma^t R^*(s_t) \mid \pi] \quad \forall \pi$

equivalently, $E_{s_0 \sim D}[V^{\pi_E}(s_0)] \geq E_{s_0 \sim D}[V^{\pi}(s_0)] \quad \forall \pi$

How can search be made tractable?

[Abbeel and Ng 2004]

Page 22: Advanced Applications: Robotics

Learning task objectives: Inverse reinforcement learning

Define $R^*$ as a linear combination of features:

$R^*(s) = w^T \phi(s)$, where $\phi : S \rightarrow \mathbb{R}^n$

Then,

$E[\sum_{t=0}^{\infty} \gamma^t R^*(s_t) \mid \pi] = E[\sum_{t=0}^{\infty} \gamma^t w^T \phi(s_t) \mid \pi] = w^T E[\sum_{t=0}^{\infty} \gamma^t \phi(s_t) \mid \pi] = w^T \mu(\pi)$

Thus, the expected value of a policy can be expressed as a weighted sum of the expected features $\mu(\pi)$.

[Abbeel and Ng 2004]
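The expected features $\mu(\pi)$ can be estimated the same way as the value function, by accumulating discounted feature vectors along rollouts; the value of a policy under $R^*(s) = w^T \phi(s)$ is then just $w^T \mu(\pi)$. A minimal sketch with the same assumed placeholder interfaces as above:

```python
import numpy as np

def feature_expectations(s0, policy, phi, sample_next_state, gamma,
                         horizon=200, n_rollouts=100):
    """Estimate mu(pi) = E[ sum_t gamma^t phi(s_t) ] from sampled rollouts."""
    mu = np.zeros_like(phi(s0), dtype=float)
    for _ in range(n_rollouts):
        s, discount = s0, 1.0
        for _ in range(horizon):
            mu += discount * phi(s)
            s = sample_next_state(s, policy(s))
            discount *= gamma
    return mu / n_rollouts

# The policy value under R*(s) = w . phi(s) is then simply:  value = w @ mu
```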

Page 23: Advanced Applications: Robotics

Learning task objectives: Inverse reinforcement learning

Originally: explain the expert demos by finding $R^*$ such that

$E[\sum_{t=0}^{\infty} \gamma^t R^*(s_t) \mid \pi_E] \geq E[\sum_{t=0}^{\infty} \gamma^t R^*(s_t) \mid \pi] \quad \forall \pi$

Use expected features: $E[\sum_{t=0}^{\infty} \gamma^t R^*(s_t) \mid \pi] = w^T \mu(\pi)$

Restated: find $w^*$ such that

$w^{*T} \mu(\pi_E) \geq w^{*T} \mu(\pi) \quad \forall \pi$

[Abbeel and Ng 2004]

Page 24: Advanced Applications: Robotics

Learning task objectives: Inverse reinforcement learning

Goal: find $w^*$ such that $w^{*T} \mu(\pi_E) \geq w^{*T} \mu(\pi) \quad \forall \pi$

1. Initialize $\pi_0$ to any policy.

Iterate for $i = 1, 2, \dots$:

2. Find $w^*$ s.t. the expert maximally outperforms all previously examined policies $\pi_0, \dots, \pi_{i-1}$:

$\max_{\epsilon,\, w^* : \|w^*\|_2 \leq 1} \epsilon \quad \text{s.t.} \quad w^{*T} \mu(\pi_E) \geq w^{*T} \mu(\pi_j) + \epsilon \quad \forall j$

3. Use RL to compute the optimal policy $\pi_i$ associated with $w^*$.

4. Stop if $\epsilon \leq$ threshold.

[Abbeel and Ng 2004]

Page 25: Advanced Applications: Robotics

Learning task objectives: Inverse reinforcement learning

(Same algorithm as the previous slide.) The max-margin optimization in step 2 is a standard problem that can be handled with an SVM solver.

[Abbeel and Ng 2004]
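Putting the slides together, a minimal sketch of the apprenticeship learning loop. The max-margin step is solved here with a naive projected-subgradient routine rather than an actual SVM/QP solver, and `rl_solver(w)` and `feature_expectations(pi)` are assumed helpers (the latter returning $\mu(\pi)$, e.g. via the rollout estimator above with its environment arguments bound):

```python
import numpy as np

def max_margin_w(mu_E, mu_list, iters=500, lr=0.05):
    """Find w (||w||_2 <= 1) maximizing  min_j  w . (mu_E - mu_j),
    i.e. the margin by which the expert beats every policy found so far.
    Simple projected subgradient ascent; an SVM/QP solver does this exactly."""
    diffs = [mu_E - mu for mu in mu_list]
    w = np.random.randn(len(mu_E))
    w /= np.linalg.norm(w)
    for _ in range(iters):
        margins = [w @ d for d in diffs]
        g = diffs[int(np.argmin(margins))]    # subgradient of the min
        w = w + lr * g
        norm = np.linalg.norm(w)
        if norm > 1.0:                        # project back onto the unit ball
            w /= norm
    return w, min(w @ d for d in diffs)


def apprenticeship_learning(mu_E, initial_policy, rl_solver,
                            feature_expectations, eps_threshold=1e-3,
                            max_iters=50):
    """Abbeel & Ng (2004)-style loop: alternate the max-margin weight search
    with RL on the induced reward R(s) = w . phi(s)."""
    policies = [initial_policy]
    mu_list = [feature_expectations(initial_policy)]
    for _ in range(max_iters):
        w, eps = max_margin_w(mu_E, mu_list)      # step 2: max-margin search
        if eps <= eps_threshold:                  # step 4: stop when margin is small
            break
        pi = rl_solver(w)                         # step 3: optimal policy for w
        policies.append(pi)
        mu_list.append(feature_expectations(pi))
    return w, policies
```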

Page 26: Advanced Applications: Robotics

Without learning

Page 27: Advanced Applications: Robotics

With learned reward function

Page 28: Advanced Applications: Robotics

Robotic manipulation

Page 29: Advanced Applications: Robotics

Demonstration

[RSS 2013, IJRR 2015]

Page 30: Advanced Applications: Robotics

High-level task modeling

[Diagram: unsegmented demonstrations of multi-step tasks → ? → finite-state task representation]

Questions:
• How many skills?
• Parameters of skills / controllers?
• How to sequence intelligently?

Why?
• Superior generalization of skills
• Handle contingencies
• Adaptively sequence skills

Page 31: Advanced Applications: Robotics

System overview

[IROS 2012]

Page 32: Advanced Applications: Robotics

System overview

[Diagram: raw inputs from the task demonstrations: joint angles, gripper pose, and stereo data.]

[IROS 2012]

Page 33: Advanced Applications: Robotics

System overview

[Diagram: task demos provide joint angles, gripper pose, and stereo data; joint angles feed forward kinematics, and stereo data feeds object recognition.]

[IROS 2012]

Page 34: Advanced Applications: Robotics

System overview

[Diagram: as before, plus preprocessing / BP-AR-HMM segmentation, which turns the demonstrations into segmented motions.]

[IROS 2012]

Page 35: Advanced Applications: Robotics

Segmenting demonstrations

Standard Hidden Markov Model

[Figure: hidden motion categories $x_1, \dots, x_8$ generating observations $y_1, \dots, y_8$.]

[IROS 2012]

Page 36: Advanced Applications: Robotics

Segmenting demonstrations

Autoregressive Hidden Markov Model

[Figure: hidden motion categories $x_1, \dots, x_8$ over observations $y_1, \dots, y_8$; each observation depends on the previous observations as well as the current motion category.]

$y_t^{(i)} = \sum_{j=1}^{r} A_{j, z_t^{(i)}}\, y_{t-j}^{(i)} + e_t^{(i)}(z_t^{(i)})$

[IROS 2012]
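For concreteness, a minimal sketch of sampling from a fixed-order AR-HMM observation model of this form. It assumes a known, fixed number of motion categories and a shared noise scale; the Beta-Process prior (Fox et al. 2011) that infers the number of categories is not modeled here:

```python
import numpy as np

def sample_ar_hmm(T, A_modes, trans_probs, noise_std, r=1, dim=2):
    """Generate observations from an autoregressive HMM:
       y_t = sum_{j=1..r} A[z_t][j-1] @ y_{t-j} + e_t,
    where z_t is the hidden motion category at time t.
    A_modes[k] is a list of r (dim x dim) matrices for mode k;
    trans_probs[k] is the transition distribution out of mode k."""
    n_modes = len(A_modes)
    z = np.zeros(T, dtype=int)
    y = np.zeros((T, dim))
    y[:r] = np.random.randn(r, dim)           # arbitrary initial observations
    for t in range(r, T):
        z[t] = np.random.choice(n_modes, p=trans_probs[z[t - 1]])
        y[t] = sum(A_modes[z[t]][j] @ y[t - 1 - j] for j in range(r))
        y[t] += np.random.normal(0.0, noise_std, size=dim)
    return z, y
```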

Page 37: Advanced Applications: Robotics

Segmenting demonstrations

Autoregressive Hidden Markov Model

[Figure: inferred motion categories (6, 6, 3, 1, 1, 3, 11, 10) labeling observations $y_1, \dots, y_8$.]

$y_t^{(i)} = \sum_{j=1}^{r} A_{j, z_t^{(i)}}\, y_{t-j}^{(i)} + e_t^{(i)}(z_t^{(i)})$

[IROS 2012]

Page 38: Advanced Applications: Robotics

(Same content as the previous slide.)

Page 39: Advanced Applications: Robotics

Segmenting demonstrations

Beta-Process Autoregressive Hidden Markov Model (Fox et al. 2011)

[Figure: inferred motion categories (6, 6, 3, 1, 1, 3, 11, 10) labeling observations $y_1, \dots, y_8$; the number of motion categories is unknown, so a Beta Process prior is placed over them.]

$y_t^{(i)} = \sum_{j=1}^{r} A_{j, z_t^{(i)}}\, y_{t-j}^{(i)} + e_t^{(i)}(z_t^{(i)})$

[IROS 2012]

Page 40: Advanced Applications: Robotics

System overview

[Diagram: as before, plus coordinate frame detection (e.g. the red object's coordinate frame), which turns segmented motions into frame-labeled segments.]

[IROS 2012]

Page 41: Advanced Applications: Robotics

System overview

[Diagram: as before, plus learning from demonstration, which converts frame-labeled segments into DMPs with frame-relative goals.]

[IROS 2012]
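Each segmented motion is represented as a Dynamic Movement Primitive (DMP) whose goal can be placed relative to a detected object frame. A minimal 1-D discrete DMP rollout sketch, assuming already-learned forcing-term weights; the gains, basis functions, and formulation details are illustrative and not the exact ones used by the system:

```python
import numpy as np

def dmp_rollout(x0, g, weights, centers, widths, tau=1.0, dt=0.01,
                K=100.0, D=20.0, alpha_s=4.0, T=1.0):
    """Roll out a 1-D discrete DMP (a Pastor et al.-style formulation):
       tau*v' = K*(g - x) - D*v + (g - x0)*f(s),   tau*x' = v
    with forcing term f(s) = sum_i psi_i(s)*w_i*s / sum_i psi_i(s).
    Changing g moves the goal, e.g. to a pose in an object's frame."""
    x, v, s = x0, 0.0, 1.0
    traj = [x]
    for _ in range(int(T / dt)):
        psi = np.exp(-widths * (s - centers) ** 2)       # Gaussian basis functions
        f = (psi @ weights) * s / (psi.sum() + 1e-10)    # learned forcing term
        v += dt * (K * (g - x) - D * v + (g - x0) * f) / tau
        x += dt * v / tau
        s += dt * (-alpha_s * s) / tau                   # canonical system decay
        traj.append(x)
    return np.array(traj)
```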

Page 42: Advanced Applications: Robotics

System overview

[Diagram: as before, plus realtime data from a novel task (joint angles, gripper pose, stereo data), processed through forward kinematics and object recognition.]

[IROS 2012]

Page 43: Advanced Applications: Robotics

System overview

[Diagram: the full pipeline: the learned DMPs plus realtime data feed DMP planning / inverse kinematics, producing a joint trajectory for a spline controller.]

[IROS 2012]

Page 44: Advanced Applications: Robotics

Learning a task plan: Finite state automata

[RSS 2013, IJRR 2015]

Page 45: Advanced Applications: Robotics

Learning a task plan: Finite state automata

[Diagram: finite-state automaton in which each node is a controller built from motion category examples, and transitions are chosen by a classifier built from robot percepts.]

[RSS 2013, IJRR 2015]
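A minimal sketch of executing such a finite-state task representation, where each node runs a skill controller and a classifier over current robot percepts selects the next node; all names and interfaces here are hypothetical:

```python
def run_task_fsa(nodes, transition_classifier, get_percepts, start, max_steps=20):
    """Execute a finite-state task automaton.
    nodes: dict mapping node name -> controller (callable that runs one skill)
    transition_classifier: maps (current node, percepts) -> next node, or None when done
    """
    node = start
    for _ in range(max_steps):
        nodes[node]()                              # run this node's skill controller
        percepts = get_percepts()                  # robot percepts after the skill
        node = transition_classifier(node, percepts)
        if node is None:                           # classifier signals task complete
            break
```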

Page 46: Advanced Applications: Robotics

Interactive corrections

[RSS 2013, IJRR 2015]

Page 47: Advanced Applications: Robotics

Replay with corrections: missed grasp

[RSS 2013, IJRR 2015]

Page 48: Advanced Applications: Robotics

Replay with corrections: too far away

[RSS 2013, IJRR 2015]

Page 49: Advanced Applications: Robotics

Replay with corrections: full run

[RSS 2013, IJRR 2015]