Learning in Worlds with Objects

NTT-MIT Collaboration Meeting, 2001. Leslie Pack Kaelbling, MIT Artificial Intelligence Laboratory. With Tim Oates, Natalia Hernandez, Sarah Finney.

Page 1: Learning in Worlds with Objects

NTT-MIT Collaboration Meeting, 2001, Leslie Pack Kaelbling

Learning in Worlds with Objects

Leslie Pack Kaelbling

MIT Artificial Intelligence Laboratory

With Tim Oates, Natalia Hernandez, Sarah Finney

Page 2: Learning in Worlds with Objects

What is an Agent?

A system that has an ongoing interaction with an external environment

• household robot
• factory controller
• web agent
• Mars explorer
• pizza delivery robot

[Diagram: Environment, with arrows labeled Action and Observation]

Page 3: Learning in Worlds with Objects

Agents Must Learn

Learning is a crucial aspect of intelligent behavior
• human programmers lack required knowledge
• agents should work in a variety of environments
• agents should work in changing environments

What to learn?
• World dynamics: What happens when I take a particular action?
• Reward: What world states are good?

Page 4: Learning in Worlds with Objects

Crisis

Current state-of-the-art learning methods will not work in domains with multiple objects.

These are crucial domains for robots of the future.

Page 5: Learning in Worlds with Objects

Representation

Learning requires some sort of representation of states of the world.

The choice of representation affects
• what information can be represented
• what kinds of generalizations the agent can make

Page 6: Learning in Worlds with Objects

Attribute Vector

State-of-the-art representation for learning

temperature = 48.2
pressure = 57.9 mB
valve1 = open
valve2 = closed
time = 10:48AM
backlog = 78
volume = 32.2
production = 45.5
…

Page 7: Learning in Worlds with Objects

Generalization over Attribute Vectors

[Figure: plot with axes labeled temp, time, and x, alongside a diagram with the tests temp > 22, time < 10AM, and pressure < 3 and the actions close valve, increase temp, add reagent, and open valve]

Page 8: Learning in Worlds with Objects

Complex Everyday Domains

book1-on-book2: true
book2-on-book1: false
pen-is-yellow: true
pen-is-blue: false
lamp-on: true
lamp-off: false
ink-bottle-level: 50%
lamp-in-bottle: false
bottle-on-lamp: false
paper1-color: gray
paper2-color: white
fabric-behind-lamp: true
book2-is-clear: false
book4-is-clear: false
book1-is-clear: true
block1-on-block2: false
block3-unstable: true
block2-on-table: false
block1-in-front-of-lamp: true

Attribute vector is impossibly big
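
A rough count suggests why the vector grows so quickly. The sketch below assumes one boolean attribute per object for each unary property (such as is-clear) and one per ordered pair of objects for each binary relation (such as on). The function name and the example counts are illustrative assumptions, not figures from the talk.

```python
def propositional_attribute_count(num_objects, num_unary, num_binary):
    """Count the boolean attributes needed to describe every object and ordered pair.

    Assumption: each unary property needs one attribute per object, and each
    binary relation needs one attribute per ordered pair of distinct objects.
    """
    unary = num_unary * num_objects
    binary = num_binary * num_objects * (num_objects - 1)
    return unary + binary


# Hypothetical domain sizes, chosen only to show the growth.
for n in (5, 20, 100):
    print(n, propositional_attribute_count(n, num_unary=10, num_binary=5))
```

Under these assumptions the count grows quadratically with the number of objects, and every new attribute is one the learner must generalize over separately.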

Page 9: Learning in Worlds with Objects

Generalization over Objects

• If book1 is on book2 and I move book2, then book1 will move

• If the cup is on the table and I move the table, then the cup will move

• If the pen is on the paper and I move the paper, then the pen will move

• If the coat is on the chair and I move the chair, then the coat will move

For all objects A and B:

If A is on B and I move B, then A will move
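
The quantified rule above can be sketched as a single function that works for any pair of objects. This is a minimal illustration; the set-of-pairs encoding of on(A, B) and the function name are assumptions made for the example.

```python
# Each pair (a, b) stands for the fact on(a, b): a is on b.
on_facts = {
    ("book1", "book2"),
    ("cup", "table"),
    ("pen", "paper"),
    ("coat", "chair"),
}


def objects_that_move(on_facts, moved):
    """Apply 'if A is on B and I move B, then A will move' for every A."""
    return {a for (a, b) in on_facts if b == moved}


print(objects_that_move(on_facts, "table"))  # {'cup'}
```

One rule covers all four cases on the slide, and it applies unchanged to objects the agent has never seen before.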

Page 10: Learning in Worlds with Objects

Referring to Objects

Traditional symbolic AI has the problem of “symbol grounding”:

How do I know what object is named by book1?

on(book1,book2)

Page 11: Learning in Worlds with Objects

Deictic Expressions

“Deixis” is Greek for “pointing”

here (koko), now (ima)

the box I am holding (watashi-ga motteiru hako)
the box I am looking at (watashi-ga miteiru hako)

Page 12: Learning in Worlds with Objects

Automatic Generalization

If I have an object in my hand and I open my hand, then the object that was in my hand is now on the table

This is true no matter what object is in my hand.
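
As a minimal sketch of this idea, the action model below is written only in terms of a deictic slot, so it never mentions a particular object. The state layout, the key names, and the function name are assumptions made for illustration.

```python
def open_hand(state):
    """Return the state after opening the hand.

    Assumption: the state is a dict with a deictic slot
    'the-object-in-my-hand' and a set 'on-table'.
    """
    new_state = dict(state)
    held = new_state.get("the-object-in-my-hand")
    if held is not None:
        new_state["the-object-in-my-hand"] = None
        new_state["on-table"] = set(new_state.get("on-table", set())) | {held}
    return new_state


after = open_hand({"the-object-in-my-hand": "cup", "on-table": {"pen"}})
print(after["on-table"] == {"pen", "cup"})  # True
```

Because the rule refers to whatever fills the slot, nothing has to be relearned when a different object is in the hand.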

Page 13: Learning in Worlds with Objects

Communicating with Humans

Natural language communication
• speaks of the world in terms of objects and their relationships
• uses deictic expressions

Our robots of the future will have to be able to understand and generate human descriptions of the world

Page 14: Learning in Worlds with Objects

Long-Term Research Goal

A robotic system with hand and cameras that can
• learn to achieve tasks efficiently through trial and error
• acquire natural language descriptions of the objects and their properties through “conversation” with humans

Page 15: Learning in Worlds with Objects

Short-Term Research Plan

Explore deictic, object-based representation for learning algorithms

• build simulated hand-eye robot system that manipulates blocks (with real physics)

• have simulated robot learn to carry out tasks from trial and error

Demonstrate empirically and theoretically that deictic representation is crucial for efficient learning

Page 16: Learning in Worlds with Objects

First Example Domain

Unreliable block stacking:
• robot is rewarded for making tall piles of blocks
• the taller a pile is, the more likely it is to fall over when another block is added
• a pile can be made more stable by building piles to its sides

Once the robot learns to do this task, keep the physics of the domain the same, but reward a more complex behavior.
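
One way to picture this domain is a toy simulator over a row of pile heights. Everything numeric here is an assumption for illustration: the fall-probability formula, the `base` constant, the choice to reset a collapsed pile to zero, and the reward (height of the tallest pile). The talk describes a simulator with real physics, which this sketch does not attempt.

```python
import random


def fall_probability(height, left, right, base=0.1):
    """Hypothetical model: taller piles fall more often, side piles add support."""
    support = min(left, height) + min(right, height)
    return min(1.0, base * height / (1 + support))


def add_block(piles, i, rng):
    """Add a block to pile i, which may collapse. Returns the assumed reward."""
    left = piles[i - 1] if i > 0 else 0
    right = piles[i + 1] if i + 1 < len(piles) else 0
    if rng.random() < fall_probability(piles[i], left, right):
        piles[i] = 0  # assumption: a collapsed pile is cleared entirely
    else:
        piles[i] += 1
    return max(piles)  # assumption: reward is the height of the tallest pile


piles = [0, 0, 0]
rng = random.Random(0)
for step in range(10):
    add_block(piles, step % 3, rng)
print(piles)
```

Keeping the dynamics in one function and the reward in another mirrors the plan on the slide: the physics can stay fixed while the rewarded behavior changes.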

Page 17: Learning in Worlds with Objects

Learning by Doing

Having an initial task to perform focuses the robot’s attention on aspects of the environment

• Use an extension of the U-Tree learning algorithm to select important aspects of the environment

• Generate new deictic expressions dynamically: the-block-on-top-of(the-block-I-am-looking-at)

• Extend reinforcement learning methods to apply to object-based representations
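
The nested expression in the second bullet can be sketched by treating each deictic expression as a function of the world state, so that new expressions are built by composition. The world encoding and all names below are assumptions for illustration, not the representation used in the project.

```python
# Hypothetical world: which block is being looked at, and what is on what.
world = {
    "looking_at": "b1",
    "on": {"b2": "b1", "b3": "b2"},  # key is directly on top of value
}


def the_block_i_am_looking_at(world):
    return world["looking_at"]


def the_block_on_top_of(expr):
    """Build a new deictic expression from an existing one."""
    def resolve(world):
        below = expr(world)
        for upper, lower in world["on"].items():
            if lower == below:
                return upper
        return None  # nothing is on top, or the inner expression has no referent
    return resolve


expr = the_block_on_top_of(the_block_i_am_looking_at)
print(expr(world))  # b2
print(the_block_on_top_of(expr)(world))  # b3
```

Because each expression is resolved against the current state, the same expression picks out a different block as soon as the robot looks elsewhere.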

Page 18: Learning in Worlds with Objects

Extracting General Rules

There are too many facts that are true in any interesting environment.

Solving tasks focuses attention on
• particular objects (named with deictic expressions)
• particular properties of those objects

These objects and properties are likely of general importance: use them as input to an association-rule learning algorithm to learn facts like:

The thing that is on the thing that I am holding will probably fall off if I move
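
A rule such as the one above is typically scored by its confidence: among the observations where the antecedent held, the fraction where the consequent also held. The sketch below uses the standard definition of confidence; the fact strings and the observation data are made up for illustration and are not results from the project.

```python
def rule_confidence(observations, antecedent, consequent):
    """Estimate P(consequent | antecedent) from a list of sets of facts."""
    matching = [obs for obs in observations if antecedent <= obs]
    if not matching:
        return None  # the antecedent was never observed
    hits = sum(1 for obs in matching if consequent <= obs)
    return hits / len(matching)


# Hypothetical observations, each a set of facts named with deictic expressions.
observations = [
    {"on(thing, held)", "i-move", "falls(thing)"},
    {"on(thing, held)", "i-move", "falls(thing)"},
    {"on(thing, held)", "i-move"},
    {"on(thing, held)"},
]

confidence = rule_confidence(
    observations,
    antecedent={"on(thing, held)", "i-move"},
    consequent={"falls(thing)"},
)
print(round(confidence, 2))  # 0.67
```

A confidence below 1 matches the hedged wording of the learned fact: the thing will probably fall off, not certainly.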

Page 19: Learning in Worlds with Objects

Enabling Planning

Given general rules, the agent can “think” about the consequences of its actions and decide what to do, rather than learn through trial and error.

Page 20: Learning in Worlds with Objects

In the Future

An ambitious research project
• vision algorithms for learning segmentation and object recognition
• learning good properties and relations for characterizing the domain (“concept learning”)
• connect with natural language learning for word meanings

Page 21: Learning in Worlds with Objects

Don’t miss any dirt!