TRANSCRIPT
Subramanian Ramamoorthy
School of Informatics
The University of Edinburgh
3 December 2008
Physically Virtually
Autonomy is not useful unless it is also “robust” – a many-hued concept
My focus is on strategy: systems issues vs. task
Consider this origami robot [Balkcom & Mason at CMU]:
Can we do such things autonomously using Nao/PR2 (robotic equivalents of the MITS Altair or Apple I from 1975) – in a semi-structured home environment?
Autonomous robots must act in an adversarial world
[Block diagram: an agent with Perception and Action interacting with the Environment; Adversarial actions & other agents affect both; High-level goals drive the agent]
Problem: How to generate actions that achieve high-level goals, using limited perception and incomplete knowledge of the environment and of adversarial actions?
Robust control: play a differential game against nature or other self-interested/cooperative agents.
Here w denotes adversarial actions (e.g., large deviations). The constrained, high-dimensional, partially-observed problem is hard!
• (X,U,W): Robust control (game, adversary, strategy)
• (X,U): Feedback control & optimality
• (X,W): Verification
• (X): Motion synthesis & planning
Approach: Use this structure to devise abstractions & shape learning
Model incompleteness: Many constraints (e.g., c-space limits) play out at a slower time scale
Adversary: Constraints impact immediate moves, e.g., state space subset rendered infeasible, and longer term (sequential decision making)
Can we combine such problem factorization and machine learning methods to learn solutions?
Phase space of the pendulum
• System consists of two subsystems – pendulum and cart on a finite track
• Only one actuator – the cart
• We want global asymptotic stability of the 4-dim system
• The Game: the experimenter hits the pole with arbitrary velocity at any time; the system picks controls
• What are the weak sufficient conditions defining this task?
[Phase portrait of the pendulum, annotated:]
• We want to reach and stay at the target (upright) equilibrium
• The uncontrolled system converges to the hanging equilibrium
• The adversary could push the system anywhere in phase space
• Larger disturbances could truly change quantitative details, e.g., any number of rotations around the origin
• The global strategy can be described as a qualitative transition graph
Lemma (Spring – Mass – Positive Damping): Let a system be described by x'' + b(x, x')x' + k(x)x = 0, where b(x, x') > 0 and k(x) > 0. Then it is asymptotically stable at (0,0).
Lemma (Spring – Mass – Negative Damping): Let a system be described by x'' + b(x, x')x' + k(x)x = 0, where b(x, x') < 0 and k(x) > 0. Then it has an unstable fixed point at (0,0), and no limit cycle.
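The two lemmas can be checked numerically on the prototypical linear case x'' + b x' + k x = 0. A minimal sketch (the gains, step size, and horizon below are illustrative, not from the talk):

```python
import math

def simulate(b, k, x0=1.0, v0=0.0, dt=1e-3, steps=20000):
    """Forward-Euler integration of x'' + b*x' + k*x = 0."""
    x, v = x0, v0
    for _ in range(steps):
        x, v = x + dt * v, v + dt * (-b * v - k * x)
    return x, v

# Positive damping (b > 0): the state decays toward the fixed point (0, 0).
x_pos, v_pos = simulate(b=0.5, k=1.0)

# Negative damping (b < 0): (0, 0) is unstable and the state grows without
# bound; energy increases monotonically, so no limit cycle can exist.
x_neg, v_neg = simulate(b=-0.5, k=1.0)
```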
The control law:
• if … : Balance
• else if … : Pump
• else: Spin
Constraints: …
The switching strategy: if … then Balance, else if … then Pump, else Spin
[Ramamoorthy & Kuipers, HSCC 02 & 03]
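As a rough illustration only (the cited work derives its switching conditions from qualitative constraints, not from these numbers), a three-mode Balance/Pump/Spin controller for a torque-driven pendulum with dynamics theta'' = sin(theta) + u (theta measured from upright) might be sketched as:

```python
import math

# Hypothetical thresholds and gains, chosen purely for illustration.
BALANCE_ANGLE = 0.3   # rad from upright
E_UPRIGHT = 1.0       # normalized energy at the upright equilibrium
E_SPIN = 2.0          # above this, dissipate energy ("Spin")

def energy(theta, omega):
    """Normalized pendulum energy; theta measured from upright."""
    return 0.5 * omega ** 2 + math.cos(theta)

def control(theta, omega, k_p=8.0, k_d=2.0, k_e=1.0):
    """Return (torque, mode) for the three-mode switching strategy."""
    wrapped = math.atan2(math.sin(theta), math.cos(theta))
    if abs(wrapped) < BALANCE_ANGLE:
        # Balance: local linear stabilization near the upright fixed point.
        return -k_p * wrapped - k_d * omega, "Balance"
    e = energy(theta, omega)
    if e < E_SPIN:
        # Pump: for these dynamics dE/dt = u * omega, so this torque
        # drives the energy toward that of the upright equilibrium.
        return k_e * (E_UPRIGHT - e) * omega, "Pump"
    # Spin: too much energy; dissipate it.
    return -k_e * omega, "Spin"
```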
Result: Best Response computation for this game
* A few more technical steps to ‘lift’ pendulum strategy to 4-dim
Many constraints – dynamic stability, intermittent footholds
Incomplete models: No high-dim models, only data from randomized exploration
The Game: Nature picks foothold (on-line), robot picks trajectory
[The (X,U,W) / (X,U) / (X,W) / (X) decomposition from before]
• Define qualitative strategy in low dimensions (finite-horizon optimal control)
• Lift resulting strategy to the more complex c-space (presently unknown!)
[Ramamoorthy & Kuipers, RSS 06 & ICRA 08]
Task encoding: a knot-energy shape descriptor for an n-edge polygonal knot
Manipulation planning:
• (Offline) Learn multi-scale structure in the energy functional
• (Online) Navigate a hierarchical graph
The Game: Nature/adversary picks ways to deform/disturb object, robot picks manipulation actions
Action synthesis: shaped reinforcement learning (SARSA)
• Optimality of the MDP solution is not compromised – the knot energy is a valid shaping potential
• 10x faster than uninformed RL
• For large problems, uninformed RL simply doesn't converge within acceptable time – ours does
[Also see poster by Sandhya Prabhakaran]
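Potential-based shaping of SARSA can be sketched on a toy problem. In this hypothetical chain-world (not the knot domain), the negated distance to the goal stands in for the knot-energy potential; the shaping term is added to the reward without changing the optimal policy:

```python
import random

random.seed(0)

N = 7                          # chain states 0..N-1; goal at N-1
GAMMA, ALPHA, EPS = 0.95, 0.2, 0.1
ACTIONS = (-1, +1)

def phi(s):
    # Shaping potential: here simply the negated distance to the goal,
    # playing the role the knot energy plays in the talk. A potential-based
    # term never changes the optimal policy of the underlying MDP.
    return -(N - 1 - s)

def pick(Q, s):
    """Epsilon-greedy action selection."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def sarsa(episodes=600):
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        a = pick(Q, s)
        while s != N - 1:
            s2 = min(max(s + a, 0), N - 1)
            # Step cost of -1, plus the potential-based shaping term.
            r = -1.0 + GAMMA * phi(s2) - phi(s)
            a2 = pick(Q, s2)
            Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, a2)] - Q[(s, a)])
            s, a = s2, a2
    return Q

Q = sarsa()
# The learned greedy policy moves toward the goal from every interior state.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)]
```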
Not hard to acquire low-dimensional models from data:
• Simple tools like PCA/SVD have been around for a long time
• Recent explosion of non/semi-parametric methods
Hard to summarize this information for use in the larger planning and control framework, in order to reason about adversaries and actions.
My approach:
• Define notions of system equivalence – many geometric ideas
• Sampling-based algorithms to induce abstractions A, with dim(A) << dim(Q)
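The PCA/SVD route to a low-dimensional model can be sketched in a few lines; the synthetic data below (latent dimension, noise level) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "high-dimensional" data that actually lives near a 2-D subspace:
# 500 samples in R^10 generated from 2 latent coordinates plus small noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

Xc = X - X.mean(axis=0)                 # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var = s ** 2 / np.sum(s ** 2)           # variance fraction per component

# Project onto the top two right singular vectors: an abstraction A
# with dim(A) = 2 << dim(Q) = 10.
A = Xc @ Vt[:2].T
```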
Shaping (PO)MDPs and related models:
• How to combine this with the abstraction concepts and algorithms in the previous slide?
• Multi-scale formulations of learning algorithms
Risk-sensitive control – beyond simple best response:
• Control learning is often driven by metrics related to predictive accuracy
• For robust control, we may be interested in quite different issues, e.g., large reachable sets from all c-space points
Particularly relevant in electronic markets and competitive scenarios, i.e., agents with conflicting interests
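The reachable-set notion can be illustrated with a toy Monte-Carlo estimate; the dynamics (a double integrator), disturbance bounds, and grid resolution below are all invented for illustration:

```python
import random

random.seed(0)

def step(x, v, u, w, dt=0.05):
    """Double-integrator step with control u and adversarial disturbance w."""
    return x + dt * v, v + dt * (u + w)

def reachable_cells(x0, v0, horizon=20, samples=2000, grid=0.1):
    """Monte-Carlo under-approximation of the reachable set: grid cells of
    the (x, v) plane hit by some sampled control sequence while the
    adversary plays random bounded disturbances."""
    cells = set()
    for _ in range(samples):
        x, v = x0, v0
        for _ in range(horizon):
            u = random.uniform(-1.0, 1.0)      # sampled control input
            w = random.uniform(-0.2, 0.2)      # bounded disturbance
            x, v = step(x, v, u, w)
        cells.add((round(x / grid), round(v / grid)))
    return cells

# One candidate robustness score for a state: the size of its reachable set
# despite the disturbance; larger is more robust.
score = len(reachable_cells(0.0, 0.0))
```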
Pendulum and bipedal walking problems are from my PhD thesis – work with Benjamin Kuipers (U. Texas – Austin)
The knots work was done by Sandhya Prabhakaran (MSc thesis)
Collaborators in my current & future work:
• Ioannis Havoutis, Thomas Larkworthy (PhD students)
• Sethu Vijayakumar, Taku Komura, Michael Herrmann (IPAB)
• Rahul Savani (Warwick) – algorithms for automated trading
• Ram Rajagopal (Berkeley) – sampling & non-parametric learning
* The title of this talk is taken from a wonderful book by Peter Bernstein