
Laboratory for Perceptual Robotics – Department of Computer Science

Hierarchical Mechanisms for Robot Programming

Shiraj Sen, Stephen Hart, Rod Grupen

Laboratory for Perceptual Robotics

University of Massachusetts Amherst

May 30, 2008

NEMS '08


Outline: Hierarchical mechanisms for robot programming

• Representation
  • Action: potential functions
  • State representation
• Programming
  • Value functions, learned by reinforcement learning
  • Reward: intrinsic or extrinsic (user defined)


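Hierarchical Actions

[Figure: three nested feedback loops, each with a summing junction $\Sigma$, a forward element $G$, and a feedback element $H$.]

• Innermost loop: force and velocity references, driven by feedback signals.
• Closed-loop primitive actions: descend potential fields $\phi$.
• Programs: greedy traversal of value functions $\Phi$, which avoids local minima.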
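The claim that greedy traversal of a value function $\Phi$ avoids the local minima that can trap descent on a potential field $\phi$ can be illustrated on a toy problem. The sketch below is a hypothetical 1-D example, not the authors' implementation: the grid size, the goal cell, the potential `phi`, and all helper names are assumptions, and $\Phi$ is computed here as a unit-cost cost-to-go by value iteration.

```python
from typing import Callable, List

# Hypothetical 1-D world: cells 0..N-1 with the goal at cell GOAL.
N = 12
GOAL = 10


def neighbors(x: int) -> List[int]:
    """Adjacent cells that lie inside the grid."""
    return [n for n in (x - 1, x + 1) if 0 <= n < N]


def phi(x: int) -> float:
    """Hypothetical potential: distance to GOAL plus a dip that creates a local minimum at cell 3."""
    dips = {2: -3.0, 3: -5.0, 4: -3.0}
    return abs(x - GOAL) + dips.get(x, 0.0)


def descend(start: int, f: Callable[[int], float], max_steps: int = 100) -> int:
    """Greedy descent: move to the lowest-valued neighbor while it strictly improves f."""
    x = start
    for _ in range(max_steps):
        best = min(neighbors(x), key=f)
        if f(best) >= f(x):
            return x  # no neighbor improves: a (possibly local) minimum
        x = best
    return x


def value_function() -> List[float]:
    """Cost-to-go to GOAL with unit step cost, computed by value iteration."""
    V = [float("inf")] * N
    V[GOAL] = 0.0  # absorbing goal state
    changed = True
    while changed:
        changed = False
        for x in range(N):
            if x == GOAL:
                continue
            candidate = 1.0 + min(V[n] for n in neighbors(x))
            if candidate < V[x]:
                V[x] = candidate
                changed = True
    return V


Phi = value_function()
print(descend(0, phi))               # prints 3: stuck in the local minimum of phi
print(descend(0, lambda x: Phi[x]))  # prints 10: reaches GOAL
```

In this toy, every non-goal cell has a neighbor one step closer to the goal, so $\Phi$ has no minimum other than the absorbing goal state; that property, not the grid itself, is what the greedy traversal relies on.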


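Primitive Action Programming Interface

Sensory error:
• Visual ($u_{ref}$)
• Tactile ($f_{ref}$)
• Configuration variables ($\theta_{ref}$)
• Operational space ($x_{ref}$)

Potential functions:
• Spring potential fields ($\phi_h$)
• Collision-free motion fields ($\phi_c$)
• Kinematic conditioning fields ($\phi_{cond}$)

Motor variables, subsets of:
• Configuration variables
• Operational space variables

Primitive actions: each primitive action $a$ combines a sensory error, a potential function, and a set of motor variables.

Nullspace projection: two primitive actions, $a_1$ and $a_2$, can be composed so that one executes in the nullspace of the other.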
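One way to read the nullspace projection is the standard redundancy-resolution rule: the step of a subordinate action $a_2$ is projected through $N = I - J^{+}J$ so that it cannot disturb the task controlled by $a_1$. The sketch below assumes a hypothetical 3-joint system with a fixed linear task $x = J\theta$ (a $1 \times 3$ Jacobian), takes $a_1$ as descent on a task-space spring potential $\tfrac{1}{2}(x - x_{ref})^2$, and takes $a_2$ as descent on a configuration-space spring potential $\tfrac{1}{2}\lVert\theta - \theta_{ref}\rVert^2$. All names, gains, and numbers are illustrative, not taken from the slides.

```python
from typing import List

Vec = List[float]

# Hypothetical linear task x = J . theta with a fixed 1x3 Jacobian row.
J: Vec = [1.0, 0.5, 0.25]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def pinv_row(j: Vec) -> Vec:
    """Moore-Penrose pseudoinverse of a nonzero 1xn row: J^T / (J J^T)."""
    s = dot(j, j)
    return [v / s for v in j]


def nullspace_project(j: Vec, v: Vec) -> Vec:
    """Apply N = I - J^+ J to v, removing any component that would move the task."""
    c = dot(j, v)
    return [vi - pi * c for vi, pi in zip(v, pinv_row(j))]


def a1_step(theta: Vec, x_ref: float, gain: float) -> Vec:
    """Superior action: descend 0.5*(x - x_ref)^2 (step along -J^+ times the task error)."""
    err = dot(J, theta) - x_ref
    return [-gain * err * p for p in pinv_row(J)]


def a2_step(theta: Vec, theta_ref: Vec, gain: float) -> Vec:
    """Subordinate action: descend 0.5*|theta - theta_ref|^2 in configuration space."""
    return [-gain * (t - r) for t, r in zip(theta, theta_ref)]


def run(theta: Vec, x_ref: float, theta_ref: Vec, steps: int = 200, gain: float = 0.2) -> Vec:
    """Execute a2 in the nullspace of a1 for a fixed number of steps."""
    for _ in range(steps):
        d1 = a1_step(theta, x_ref, gain)
        d2 = nullspace_project(J, a2_step(theta, theta_ref, gain))
        theta = [t + u + w for t, u, w in zip(theta, d1, d2)]
    return theta


theta = run([0.0, 0.0, 0.0], x_ref=2.0, theta_ref=[0.5, -0.5, 1.0])
print(round(dot(J, theta), 6))  # prints 2.0: the task reaches x_ref
```

Because $J N = 0$, the subordinate step never changes $x$; the configuration settles at $J^{+}x_{ref} + N\theta_{ref}$, the task solution closest to the preferred posture.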


State Representation

Discrete abstraction of action dynamics: a 4-level logic in each control predicate $p_i$:
• X: convergence unknown
• -: no reference
• 0: descending gradient
• 1: converged
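A minimal sketch of how a controller's four-level predicate might be computed, assuming that convergence is declared when the potential stops changing between samples, $|\Delta\phi| < \epsilon$. The function names, their arguments, and the threshold `eps` are hypothetical; the slides do not specify how the predicate is evaluated. The helper `state_vector` simply collects the predicates of several controllers into a tuple.

```python
from typing import Sequence, Tuple


def predicate(reference_available: bool, phi_history: Sequence[float], eps: float = 1e-3) -> str:
    """Map one controller's status onto the four-level logic {X, -, 0, 1}."""
    if not reference_available:
        return "-"  # no reference: the action cannot be engaged
    if len(phi_history) < 2:
        return "X"  # not enough samples yet: convergence unknown
    if abs(phi_history[-1] - phi_history[-2]) < eps:
        return "1"  # potential has stopped changing: converged
    return "0"      # potential still changing: descending the gradient


def state_vector(controllers: Sequence[Tuple[bool, Sequence[float]]]) -> Tuple[str, ...]:
    """Hypothetical helper: one predicate per (reference_available, phi_history) pair."""
    return tuple(predicate(ref, hist) for ref, hist in controllers)


print(state_vector([(True, [0.9, 0.4]), (True, [0.05, 0.05]), (False, [])]))
# prints ('0', '1', '-')
```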


Hierarchical Programming

A program is defined as an MDP over a vector of controller predicates:

$S = (p_1, \ldots, p_N)$

Absorbing states in the value function capture “convergence” of programs.

[Figure: like a primitive action, a program is summarized by a predicate taking the values X, -, 1, 0.]

Value functions are learned using reinforcement learning.
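The slides do not name the reinforcement-learning algorithm; the sketch below uses tabular Q-learning, one common choice, on a hypothetical two-predicate program $S = (p_{saccade}, p_{track})$. The transition table is invented for illustration (it is not measured robot data), the reward of 1 on reaching the goal is an assumption, and the goal state, where the track predicate reaches 1, is treated as absorbing.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

State = Tuple[str, str]  # hypothetical (p_saccade, p_track)
ACTIONS = ("saccade", "track")
START: State = ("X", "-")  # saccade not yet run; track has no reference

# Hypothetical deterministic dynamics over predicate vectors.
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (("X", "-"), "saccade"): ("1", "X"),  # saccade converges; track now has a reference
    (("X", "-"), "track"): ("X", "-"),    # track has no reference: nothing changes
    (("1", "X"), "saccade"): ("1", "X"),  # already converged: nothing changes
    (("1", "X"), "track"): ("1", "1"),    # track converges
}


def is_absorbing(s: State) -> bool:
    return s[1] == "1"


def q_learning(episodes: int = 200, alpha: float = 0.5, gamma: float = 0.9,
               max_steps: int = 20, seed: int = 0) -> DefaultDict[Tuple[State, str], float]:
    rng = random.Random(seed)
    Q: DefaultDict[Tuple[State, str], float] = defaultdict(float)
    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            a = rng.choice(ACTIONS)  # uniform exploration; Q-learning is off-policy
            s2 = TRANSITIONS[(s, a)]
            r = 1.0 if is_absorbing(s2) else 0.0
            future = 0.0 if is_absorbing(s2) else gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + future - Q[(s, a)])
            s = s2
            if is_absorbing(s):
                break
    return Q


def greedy(Q: DefaultDict[Tuple[State, str], float], s: State) -> str:
    return max(ACTIONS, key=lambda a: Q[(s, a)])


Q = q_learning()
print(greedy(Q, START), greedy(Q, ("1", "X")))  # prints: saccade track
```

Exploring uniformly at random keeps the example deterministic under a fixed seed; because Q-learning is off-policy, the greedy policy read from `Q` still converges to the shortest path into the absorbing state.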


Catalog

[Figure: catalog of programs: Stack, Insert, Grasp, Touch.]

Intrinsic Reward

Goal: build deep control knowledge.

Reward controllable interaction with the world:
• controllers with direct feedback from the external world.

[Figure: the Track controller and its predicate values (X, -, 1, 0); a convergence event, where the predicate reaches 1, is rewarded.]
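A minimal sketch of an intrinsic reward in this spirit, assuming that the reward is 1 whenever a controller marked as receiving direct feedback from the external world has a convergence event, meaning its predicate changes to 1. The function name and the `external` index set are hypothetical.

```python
from typing import AbstractSet, Sequence


def intrinsic_reward(prev: Sequence[str], curr: Sequence[str], external: AbstractSet[int]) -> float:
    """Return 1.0 if any externally grounded controller just converged, else 0.0."""
    for i in external:
        if curr[i] == "1" and prev[i] != "1":
            return 1.0  # convergence event on a controller with external feedback
    return 0.0


# Hypothetical state (p_saccade, p_track), with only track treated as external.
print(intrinsic_reward(("1", "0"), ("1", "1"), external={1}))  # prints 1.0
print(intrinsic_reward(("1", "1"), ("1", "1"), external={1}))  # prints 0.0
```

Rewarding the transition rather than the state means a controller that simply stays converged earns nothing further, which matches the slide's emphasis on the convergence event.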


Experimental Demonstration

Motor units
• Two 7-DOF Barrett WAMs
• Two 4-DOF Barrett Hands
• 2-DOF pan/tilt stereo head

Sensory feedback
• Visual: hue, saturation, intensity, texture
• Tactile: 6-axis fingertip F/T sensors
• Proprioceptive

[Figure: the robot Dexter]


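STAGE 1: SaccadeTrack – 25 Learning Episodes

$S_{st} = (p_{saccade}, p_{track})$

[Figure: learned transition graph over states $(p_{saccade}, p_{track})$, including X 1, X 0, 1 X, 0 X, X -, and X X, with transitions labeled $a_{saccade}$ and $a_{track}$.]

Rewarding action: Track-saturation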


STAGE 2: ReachGrab – 25 Learning Episodes

$S_{rg} = (p_{st}, p_{reach}, p_{grab})$

Rewarding action: Touch, Track-saturation
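The Stage 2 state includes $p_{st}$, the predicate of the SaccadeTrack program learned in Stage 1, so a learned program is reused as one element of the next program's state. A minimal sketch of that reuse, assuming a program's predicate is 1 when its last state is absorbing, 0 while it is running, X before it has run, and - when its reference is absent; this mapping, the function names, and the absorbing set `ST_ABSORBING` are hypothetical.

```python
from typing import AbstractSet, Optional, Tuple

PState = Tuple[str, ...]

# Hypothetical absorbing set for the Stage 1 program over (p_saccade, p_track).
ST_ABSORBING: AbstractSet[PState] = frozenset({("1", "1")})


def program_predicate(last_state: Optional[PState], absorbing: AbstractSet[PState],
                      reference_available: bool = True) -> str:
    """Summarize a program with the same four-level logic used for primitive actions."""
    if not reference_available:
        return "-"  # no reference for the program
    if last_state is None:
        return "X"  # program not yet executed: convergence unknown
    if last_state in absorbing:
        return "1"  # program reached an absorbing state: converged
    return "0"      # program still running


def s_rg(st_state: Optional[PState], p_reach: str, p_grab: str) -> PState:
    """Stage 2 state S_rg = (p_st, p_reach, p_grab)."""
    return (program_predicate(st_state, ST_ABSORBING), p_reach, p_grab)


print(s_rg(("1", "1"), "X", "-"))  # prints ('1', 'X', '-')
```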




STAGE 3: VisualInspect – 25 Learning Episodes

$S_{vi} = (p_{rg}, p_{cond}, p_{track(blue)})$

Rewarding action: Track-blue




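STAGE 4: Grasp – User Defined Reward

$S_{grasp} = (p_{rg}, p_{moment}, p_{force})$

[Figure: learned transition graph over states $(p_{rg}, p_{moment}, p_{force})$, including X - -, 1 X X, X X X, X 0 0, X 1 1, X 1 0, and X 0 1, with transitions labeled $a_{moment}$ and $a_{force}$; the ReachGrab program appears as a node with predicate values X, -, 1, 0. Also labeled: Grasp, Track-blue, rewarding action.]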


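STAGE 5: PickAndPlace – User Defined Reward

$S_{pnp} = (p_g, p_{transport}, p_{moment})$

[Figure: learned transition graph over states $(p_g, p_{transport}, p_{moment})$, including X X X, X 0 -, X 0 0, X - -, 1 X X, X 1 1, and X 1 0, with transitions labeled $a_{transport}$ and $a_{moment}$; the Grasp program appears as a node with predicate values X, -, 1, 0. Also labeled: rewarding action.]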


Conclusions

• Mechanisms for creating hierarchical programs:
  • a recursive formulation of potential functions and value functions;
  • a control-theoretic representation for action, state, and intrinsic reward.
• Experimental demonstration of programming manipulation skills using staged learning episodes.
• Intrinsic reward drives the acquisition of new behavior and models the affordances of objects.


Thank You
