Subramanian Ramamoorthy School of Informatics The University of Edinburgh 3 December 2008


Page 1:

Subramanian Ramamoorthy

School of Informatics

The University of Edinburgh

3 December 2008

Page 2:

Physically Virtually

Page 3:

Autonomy is not useful unless it is also “robust” – a many-hued concept

My focus is on strategy: systems issues vs. task

Consider this origami robot [Balkcom & Mason at CMU]:

Can we do such things autonomously using Nao/PR2 (robotic equivalents of the MITS Altair from 1975 or the Apple I from 1976) – in a semi-structured home environment?

Autonomous robots must act in an adversarial world

Page 4:

Environment

Perception

Action

Adversarial actions & other agents

High-level goals

Problem: How to generate actions, to achieve high-level goals, using limited perception and incomplete knowledge of environment & adversarial actions?

Page 5:

Robust control: Play a differential game against nature or other self-interested/cooperative agents

w are adversarial actions (e.g., large deviations)

Constrained high-dim partially-observed problem is hard!
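The equation on this slide did not survive in the transcript. One common way to write such a game is sketched below; the symbols f (dynamics), h (observation map), ℓ (running cost) and T (horizon) are assumptions introduced here for illustration, not notation confirmed by the talk. Only X, U and W appear in the slides.

```latex
% Assumed notation: state x in X, control u in U, adversarial input w in W.
\begin{align*}
\dot{x}(t) &= f\bigl(x(t), u(t), w(t)\bigr), \qquad y(t) = h\bigl(x(t)\bigr), \\
J(u, w) &= \int_0^T \ell\bigl(x(t), u(t), w(t)\bigr)\, dt, \\
u^\ast &\in \arg\min_{u(\cdot)} \; \max_{w(\cdot)} \; J(u, w)
\quad \text{subject to } x(t) \in X,\ u(t) \in U,\ w(t) \in W .
\end{align*}
```

In this reading the controller only sees y rather than x (partial observation), and the state constraint x(t) ∈ X must hold for every admissible w, which is what makes the high-dimensional case hard.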

Page 6:

(X,U,W) Robust control: game, adversary, strategy

(X,U) Feedback control & optimality

(X,W) Verification

(X) Motion synthesis & planning

Approach: Use this structure to devise abstractions & shape learning

Model incompleteness: Many constraints (e.g., c-space limits) play out at a slower time scale

Adversary: Constraints impact immediate moves, e.g., state space subset rendered infeasible, and longer term (sequential decision making)

Can we combine such problem factorization and machine learning methods to learn solutions?

Page 7:
Page 8:

Phase space of the pendulum

• System consists of two subsystems – pendulum and cart on finite track

• Only one actuator – cart

• We want global asymptotic stability of 4-dim system

• The Game: Experimenter hits the pole with arbitrary velocity at any time, system picks controls

• What are the weak sufficient conditions defining this task?

Page 9:

We want to reach and stay here

The uncontrolled system converges to this point

Adversary could push system anywhere, e.g., here

Larger disturbances could truly change quantitative details, e.g., any number of rotations around origin

Can describe global strategy as a qualitative transition graph

Page 10:

Lemma (Spring – Mass – Positive Damping): Let a system be described by [equation not recovered], where [conditions not recovered]. Then it is asymptotically stable at (0,0).

Lemma (Spring – Mass – Negative Damping): Let a system be described by [equation not recovered], where [conditions not recovered]. Then it has an unstable fixed-point at (0,0), and no limit cycle.
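The equations and side conditions of both lemmas are missing from this transcript. As a sketch only, the block below assumes one standard form of a nonlinear spring-mass system and shows the energy argument that would yield both conclusions. The functions f (damping) and g (spring), and the sign conditions on them, are assumptions and may differ from the conditions on the original slide.

```latex
% Assumed (not recovered from the slide): f, g continuous with
%   f(0) = 0,  v f(v) > 0 for v \neq 0;   g(0) = 0,  x g(x) > 0 for x \neq 0.
\begin{align*}
\text{Positive damping:}\quad & \ddot{x} + f(\dot{x}) + g(x) = 0 \\
V(x,\dot{x}) &= \tfrac{1}{2}\dot{x}^2 + \int_0^x g(s)\,ds \;\ge\; 0,
  \qquad V = 0 \iff (x,\dot{x}) = (0,0) \\
\dot{V} &= \dot{x}\,\ddot{x} + g(x)\,\dot{x} = -\,\dot{x}\, f(\dot{x}) \;\le\; 0 \\[4pt]
\text{Negative damping:}\quad & \ddot{x} - f(\dot{x}) + g(x) = 0 \\
\dot{V} &= +\,\dot{x}\, f(\dot{x}) \;\ge\; 0
\end{align*}
```

Under these assumptions, in the positive-damping case the energy V stops decreasing only where the velocity is zero, and the only trajectory that stays there is the origin, so LaSalle's invariance principle gives asymptotic stability at (0,0). In the negative-damping case V strictly increases along every non-trivial trajectory, so the origin is unstable and no closed orbit can exist, because V would have to return to its starting value after one period.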

Page 11:

The control law:

if [condition not recovered] Balance

else if [condition not recovered] Pump

else Spin

Constraints: [not recovered]

Page 12:

The switching strategy: If [condition not recovered] then Balance, else if [condition not recovered] then Pump, else Spin

[Ramamoorthy & Kuipers, HSCC 02 & 03]
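The switching conditions themselves are missing from the transcript. The sketch below shows the general shape of such a three-mode selector, assuming the conditions are a neighbourhood of the upright position (Balance) and the sign of the pendulum energy relative to upright rest (Pump when too low, Spin when too high). The thresholds `phi_tol` and `energy_tol`, the constant `G_OVER_L`, and all function names are hypothetical and not taken from the cited papers.

```python
import math
from enum import Enum


class Mode(Enum):
    BALANCE = "balance"
    PUMP = "pump"
    SPIN = "spin"


G_OVER_L = 9.81  # assumed g/l for a unit-length pendulum (hypothetical value)


def wrap(phi):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def energy(phi, omega):
    """Normalized energy; zero when at rest in the upright position phi = 0."""
    return 0.5 * omega ** 2 + G_OVER_L * (math.cos(phi) - 1.0)


def select_mode(phi, omega, phi_tol=0.3, energy_tol=0.5):
    """Pick a control mode from the pendulum state (assumed conditions)."""
    phi = wrap(phi)
    e = energy(phi, omega)
    # Near upright with near-zero relative energy: hand over to the balancer.
    if abs(phi) < phi_tol and abs(e) < energy_tol:
        return Mode.BALANCE
    # Too little energy to reach upright: inject energy.
    if e < 0.0:
        return Mode.PUMP
    # Too much energy: dissipate it.
    return Mode.SPIN


if __name__ == "__main__":
    print(select_mode(math.pi, 0.0))   # hanging at rest
    print(select_mode(0.05, 0.0))      # almost upright, at rest
    print(select_mode(math.pi, 10.0))  # whirling fast
```

The selector depends only on qualitative features of the state (which region it lies in), which is in the spirit of describing the global strategy as a transition graph between modes.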

Page 13:

Result: Best Response computation for this game

* A few more technical steps to ‘lift’ pendulum strategy to 4-dim

Page 14:
Page 15:

Many constraints – dynamic stability, intermittent footholds

Incomplete models: No high-dim models, only data from randomized exploration

The Game: Nature picks foothold (on-line), robot picks trajectory

Page 16:

(X,U,W)

(X,U) (X,W)

(X)

Define qualitative strategy in low dimensions (finite-horizon optimal control)

Lift resulting strategy to the more complex c-space (presently unknown!)

Page 17:
Page 18:

[Ramamoorthy & Kuipers, RSS 06, ICRA 08]

Page 19:

Task encoding: Knot energy shape descriptor for an n-edge polygonal knot

Manipulation planning:

(Offline) Learn multi-scale structure in energy functional

(Online) Navigate a hierarchical graph

The Game: Nature/adversary picks ways to deform/disturb object, robot picks manipulation actions

Page 20:

Action synthesis: Shaped reinforcement learning (SARSA)

Optimality of MDP solution is not compromised – knot energy is a valid potential energy

10x faster than uninformed RL

For large problems, RL simply doesn’t converge within acceptable time – ours does
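The claim that shaping does not compromise optimality matches the standard result for potential-based reward shaping, where the bonus is the discounted change in a potential function. The sketch below shows one SARSA update with such a bonus. The toy chain potential stands in for the knot energy, which is not reproduced in this transcript; `toy_potential`, the goal state 5, and the step sizes are all hypothetical.

```python
from collections import defaultdict


def shaping_bonus(potential, s, s_next, gamma):
    """Potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * potential(s_next) - potential(s)


def sarsa_update(q, s, a, r, s_next, a_next, potential, alpha=0.1, gamma=0.9):
    """One on-policy SARSA step using the shaped reward; returns the new Q(s, a)."""
    shaped_r = r + shaping_bonus(potential, s, s_next, gamma)
    td_error = shaped_r + gamma * q[(s_next, a_next)] - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return q[(s, a)]


def toy_potential(s):
    """Hypothetical potential: negative distance to goal state 5 on a chain."""
    return -abs(5 - s)


def discounted_shaping_sum(states, potential, gamma):
    """Total discounted shaping bonus collected along a state sequence."""
    pairs = zip(states, states[1:])
    return sum(
        gamma ** t * shaping_bonus(potential, s, s_next, gamma)
        for t, (s, s_next) in enumerate(pairs)
    )


if __name__ == "__main__":
    q = defaultdict(float)
    print(sarsa_update(q, 0, "right", 0.0, 1, "right", toy_potential))
    print(discounted_shaping_sum([0, 1, 2, 3], toy_potential, 0.9))
```

Along any trajectory the discounted bonuses telescope to gamma^T * Phi(s_T) - Phi(s_0), which depends only on the endpoints. That is why the ranking of policies is unchanged while the learner still gets denser feedback.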

[Also see poster by Sandhya Prabhakaran]

Page 21:
Page 22:

Not hard to acquire low-dimensional models from data

• Simple tools like PCA/SVD have been around for a long time

• Recent explosion of non/semi-parametric methods

Hard to summarize this information for use in the larger planning and control framework

• In order to reason about adversaries and actions

My approach:

• Define notions of system equivalence – many geometric ideas

• Sampling-based algorithms to induce abstractions, with dim(A) << dim(Q)
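To illustrate the "simple tools" end of this spectrum, here is a minimal PCA sketch that finds one dominant direction in sampled data and projects onto it, giving a 1-dimensional coordinate for 3-dimensional samples. It uses power iteration on the sample covariance in plain Python rather than a library SVD. It is not the sampling-based abstraction algorithm referred to in the talk, and the function names and test data are hypothetical.

```python
import math


def mean_center(points):
    """Subtract the per-coordinate mean from every sample."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    return [[p[i] - mean[i] for i in range(d)] for p in points]


def covariance(centered):
    """Sample covariance matrix of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_direction(cov, iters=100):
    """Leading eigenvector of a symmetric PSD matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # start vector orthogonal to the data, or zero variance
            break
        v = [x / norm for x in w]
    return v


def project(points, direction):
    """Low-dimensional coordinate of each sample along the given direction."""
    return [sum(a * b for a, b in zip(row, direction)) for row in mean_center(points)]


if __name__ == "__main__":
    # Samples from a 3-dim space that actually lie on a 1-dim line.
    samples = [[t, 2.0 * t, -t] for t in range(5)]
    axis = leading_direction(covariance(mean_center(samples)))
    print(axis)
    print(project(samples, axis))
```

The harder question raised on this slide is what to do with such coordinates afterwards: a projection alone says nothing about which actions or adversarial moves remain distinguishable in the reduced space.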

Page 23:

Shaping (PO)MDP and related models

• How to combine this with abstraction concepts and algorithms in previous slide?

• Multi-scale formulations of learning algorithms

Risk-sensitive control – beyond simple best response

• Control learning is often driven by metrics related to predictive accuracy

• For robust control, we may be interested in quite different issues, e.g., large reachable sets from all c-space points

• Particularly relevant in electronic markets and competitive scenarios, i.e., agents with conflicting interests

Page 24:

Pendulum and bipedal walking problems are from my PhD thesis – work with Benjamin Kuipers (U. Texas – Austin)

Knots work was done by Sandhya Prabhakaran - MSc thesis

Collaborators in my current & future work:

• Ioannis Havoutis, Thomas Larkworthy (PhD students)

• Sethu Vijayakumar, Taku Komura, Michael Herrmann (IPAB)

• Rahul Savani (Warwick) – algorithms for automated trading

• Ram Rajagopal (Berkeley) – sampling & non-parametric learning

* The title of this talk is taken from a wonderful book by Peter Bernstein