learning from stories: making ai programming accessible - mark riedl

Learning from Stories: Making AI Programming Accessible

Mark Riedl
@mark_riedl

The golden age of AI?


Who controls the AI?

• Google?

• Facebook?

• OpenAI?

• People who can code – Fundamental understanding of algorithms, data structures, theory & systems

Democratization of Artificial Intelligence

Making AI accessible

• Can non-coders program artificial intelligences?

Training simulations

• Subject-matter experts rapidly create a new sociocultural simulation without programming

* Tactical Language Training System

Computer games

• Create large numbers of virtual non-player characters (farmers, shopkeepers, etc.)

* Skyrim

Interactive machine learning

• Reinforcement learning: AI automatically devises a “program” for operating in a stochastic environment through trial and error (+ reward signal)

• Humans train AI by interacting – Demonstration – Critique

• Must have the ability, presence, & patience to train the AI

What programming language?

Storytelling

• Narrative is the fundamental means by which we organize, understand, and explain the world

• Narrative intelligence: the ability to craft, tell, understand, and affectively respond to stories

• Storytelling is an effective means through which we convey complex, tacit knowledge to each other


Learning from stories

• Hypothesis: if computers could comprehend stories, then humans could transfer complex procedural knowledge to computers by telling stories

• Don’t need to teach humans how to tell stories


Stories are hard for computers

• Natural language

• Stories tend to skip “obvious” steps

• Humans are noisy

• Humans don’t know the agent capabilities or execution environment

Quixote

• Reinforcement learning: AI automatically devises a “program” for operating in a stochastic environment through trial and error

• Intuition: reward the agent for performing actions that mimic those of the protagonist in stories

➡Learn a reward function


Harrison & Riedl. AIIDE Conference, 2016.
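The intuition above (reward the agent when it mimics the protagonist) can be sketched in a few lines. This is a minimal illustration, not the Quixote implementation: the event names, the `StoryReward` class, and the bonus and penalty values are all assumptions made for the example.

```python
from typing import Dict, List


def build_trie(stories: List[List[str]]) -> Dict:
    """Merge event sequences into a nested-dict prefix tree."""
    root: Dict = {}
    for story in stories:
        node = root
        for event in story:
            node = node.setdefault(event, {})
    return root


class StoryReward:
    """Rewards actions that continue some exemplar story; penalizes the rest."""

    def __init__(self, trie: Dict, bonus: float = 1.0, penalty: float = -1.0):
        self.node = trie
        self.bonus = bonus      # hypothetical value
        self.penalty = penalty  # hypothetical value

    def step(self, action: str) -> float:
        if action in self.node:
            self.node = self.node[action]  # advance along the matching branch
            return self.bonus
        return self.penalty                # action is not in any story here


# Hypothetical exemplar stories, already reduced to event names.
stories = [
    ["go_to_pharmacy", "pay", "leave_with_prescription"],
    ["go_to_pharmacy", "wait_in_line", "pay", "leave_with_prescription"],
]
reward = StoryReward(build_trie(stories))
total = sum(reward.step(a) for a in
            ["go_to_pharmacy", "pay", "leave_with_prescription"])
```

An agent that follows either story collects the bonus at every step, while an action such as stealing, which no exemplar story contains, is penalized.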

Reinforcement learning

• Markov Decision Process (MDP)
– S: set of states
– A: set of actions
– P_a(s, s’): transition function
– R_a(s, s’): reward function

• Policy (π) gives rules of behavior

(Slide annotations: “Missing”, “Solve this”)
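To make the MDP components concrete, here is a small value-iteration sketch. The states, actions, transition probabilities, and reward below are invented for illustration and do not come from the talk; value iteration is one standard way to solve for a policy once a reward function is available.

```python
STATES = ["home", "pharmacy", "done"]
ACTIONS = ["go", "buy"]
TERMINAL = {"done"}

# P[(s, a)] -> list of (probability, next_state); hypothetical numbers.
P = {
    ("home", "go"): [(0.9, "pharmacy"), (0.1, "home")],
    ("home", "buy"): [(1.0, "home")],
    ("pharmacy", "go"): [(1.0, "home")],
    ("pharmacy", "buy"): [(1.0, "done")],
}
# R[(s, a, s')] -> reward; transitions not listed give 0.
R = {("pharmacy", "buy", "done"): 10.0}


def q_value(s, a, V, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
               for p, s2 in P[(s, a)])


def value_iteration(gamma=0.9, tol=1e-8):
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINAL:
                continue
            best = max(q_value(s, a, V, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # The policy picks the highest-valued action in each non-terminal state.
    policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, V, gamma))
              for s in STATES if s not in TERMINAL}
    return V, policy


V, policy = value_iteration()
```

In this toy world the resulting policy is to go to the pharmacy from home and to buy once there.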

Quixote

Pipeline: exemplar stories -> plot graph learning -> a plot graph -> trajectory tree creation -> a trajectory tree -> reward assignment -> a trajectory tree with events assigned reward values -> environment -> a policy mapping states to actions

Crowdsourcing stories

• Ask many people (via Amazon’s Mechanical Turk) to write a story about what happens when someone does X

• Hard for individuals to think of many alternatives

• Need multiple examples to learn a pattern


Semantic lifting

• If event transition is in a training story, agent can do it

• Gappy stories

• Malicious stories

• Learn a model that abstracts away from language to events – Fill gaps – Filter outliers

Malicious stories: John went to the pharmacy. John took the drugs and ran.

Gappy stories: John went to the pharmacy. John left with the prescription.

Plot graphs

• Primitive events learned from natural language

• Temporal relations between events


Li, Lee-Urban, Johnston & Riedl. AAAI 2013 Conference.
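One way to picture how a plot graph can fill gaps and filter outliers is a simple counting scheme. The sketch below is a deliberate simplification and not the method of the cited paper: it assumes the stories are already reduced to event names, that each event occurs at most once per story, and it uses two made-up thresholds (`min_support`, `min_confidence`).

```python
from collections import Counter
from itertools import combinations


def learn_plot_graph(stories, min_support=0.5, min_confidence=0.8):
    """Return (events, edges), where an edge (a, b) means 'a happens before b'."""
    n = len(stories)
    # Keep only events that appear in enough stories (filters outliers).
    event_count = Counter(e for s in stories for e in set(s))
    events = {e for e, c in event_count.items() if c / n >= min_support}

    # Count how often each kept event precedes another within a story.
    before = Counter()
    for s in stories:
        kept = [e for e in s if e in events]
        for i, j in combinations(range(len(kept)), 2):
            before[(kept[i], kept[j])] += 1

    # Keep an ordering only if the stories agree on it often enough.
    edges = set()
    for (a, b), c in before.items():
        total = c + before.get((b, a), 0)
        if c / total >= min_confidence:
            edges.add((a, b))
    return events, edges


# Hypothetical crowdsourced stories; the last one is "malicious".
stories = [
    ["enter", "wait", "pay", "leave"],
    ["enter", "pay", "leave"],            # gappy: skips "wait"
    ["enter", "wait", "pay", "leave"],
    ["enter", "steal", "leave"],
]
events, edges = learn_plot_graph(stories)
```

Here the rare "steal" event falls below the support threshold and is dropped, while the ordering of "wait" before "pay" is still learned even though one story skipped it.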

Plot graph: Fast food restaurant

Events: choose restaurant; drive to restaurant; walk/go into restaurant; read menu; choose menu item; wait in line; drive to drive-thru; take out wallet; place order; pay for food; wait for food; drive to window; get food; find table; sit down; eat food; clear trash; leave restaurant; drive home

Plot graph: Going on a date to the movies

Events: arrive at theatre; wait for ticket; go to ticket booth; buy tickets; choose movie; go to concession stand; order popcorn / soda; show tickets; buy popcorn; enter theatre; find seats; turn off cellphone; sit down; eat popcorn; watch movie; hold hands; use bathroom; discard trash; talk about movie; leave movie; drive home; kiss

Plot graph: Bank robbery

Events: John covers face; John enters bank; John sees Sally; John waits in line; John approaches Sally; John gives Sally bag; Sally is scared; Sally greets John; John hands Sally a note; John pulls out gun; Sally screams; John points gun at Sally; John shows gun; Sally reads note; John demands money; Sally calls police; John drives away; John gets in car; John leaves bank; John opens bank door; John takes bag; Sally gives John bag; Sally presses alarm; Sally puts money in bag; The note demands money; Sally collects money; Sally opens cash drawer; Sally gives John money; John collects money; Police arrives; Police arrests John


Trajectory tree generation

• Generate all possible plot sequences from a plot graph

• Including stories hypothesized to be possible based on the plot graph but not part of the exemplars
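Generating every sequence a plot graph allows amounts to enumerating the orderings that respect its temporal relations. The sketch below assumes a plot graph with only "before" edges; the events and edges are hypothetical, and features such as optional or mutually exclusive events are left out.

```python
def all_sequences(events, before):
    """Yield every ordering of `events` that respects the (a, b) 'a before b' pairs."""
    preds = {e: {a for a, b in before if b == e} for e in events}

    def extend(prefix, remaining):
        if not remaining:
            yield list(prefix)
            return
        done = set(prefix)
        for e in sorted(remaining):
            # An event may come next only if all its predecessors already occurred.
            if preds[e] <= done:
                yield from extend(prefix + [e], remaining - {e})

    yield from extend([], frozenset(events))


# Hypothetical fragment of a restaurant plot graph.
events = {"enter", "read_menu", "take_out_wallet", "order", "pay"}
before = {
    ("enter", "read_menu"),
    ("enter", "take_out_wallet"),
    ("read_menu", "order"),
    ("order", "pay"),
    ("take_out_wallet", "pay"),
}
sequences = list(all_sequences(events, before))
```

Some of the generated sequences need not match any single exemplar story. Merging sequences that share a prefix gives a tree of trajectories.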


Semantic lowering

• Map plot events to agent operators

• Summed word2vec embeddings and cosine similarity

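Figure: plot events matched to agent actions by similarity score

Agent actions: give_money; enter_pharmacy; exit_pharmacy; pick_up_prescription; drop_money

Similarity scores shown: 0.83; 0.67; 0.14

Operators: op_21; op_22; op_23; op_24; op_25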
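The matching step (summed word vectors compared by cosine similarity) can be sketched as follows. The three-dimensional vectors in `EMB` are toy values made up for this example; a real system would load trained word2vec vectors instead, and the helper names are hypothetical.

```python
import math

# Toy stand-ins for word2vec vectors (assumed values, not trained embeddings).
EMB = {
    "get": [1.0, 0.0, 0.0],
    "pick": [0.9, 0.0, 0.1],
    "up": [0.1, 0.0, 0.0],
    "prescription": [0.0, 1.0, 0.0],
    "pay": [0.0, 0.0, 1.0],
    "give": [0.2, 0.0, 0.6],
    "money": [0.0, 0.0, 1.0],
    "enter": [0.3, 0.0, 0.0],
    "pharmacy": [0.0, 0.8, 0.1],
    "pharmacist": [0.0, 0.7, 0.2],
}
DIM = 3


def embed(phrase):
    """Sum the vectors of the words in a phrase; unknown words add nothing."""
    vec = [0.0] * DIM
    for word in phrase.replace("_", " ").lower().split():
        for i, x in enumerate(EMB.get(word, [0.0] * DIM)):
            vec[i] += x
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_action(event, actions):
    """Return the agent action whose summed embedding is closest to the event's."""
    return max(actions, key=lambda a: cosine(embed(event), embed(a)))


actions = ["give_money", "enter_pharmacy", "pick_up_prescription"]
match_get = best_action("get prescription", actions)
match_pay = best_action("pay pharmacist", actions)
```

With these toy vectors, "get prescription" is matched to `pick_up_prescription` and "pay pharmacist" to `give_money`.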

Quixote

• Fill gaps between events in trajectory tree

World state space

Pharmacy world

Plot graph: Pharmacy world

Events: Leave house; Go to bank; Go to hospital; Go to doctor; Don't get prescription (hospital); Don't get prescription (doctor); Get prescription (hospital); Get prescription (doctor); Withdraw money; Go to pharmacy; Buy strong drugs; Buy weak drugs; Go home

Harrison & Riedl. IJCAI Workshop on Interactive Machine Learning, 2016.

Simulation study

• Events in trajectory tree manually mapped to agent actions

• Policy is consistent with the plot graph

• Never steals drugs or money

• (Unless the agent has no other course of action)


Harrison & Riedl. IJCAI Workshop on Interactive Machine Learning, 2016.

Event correspondence

• Automatic mapping of plot graph events to agent actions

• Missing mappings

• Plot graph events map to multiple agent actions

Figure: Pharmacy world plot graph events mapped (word2vec) to agent actions

Agent actions: give_money; enter_pharmacy; exit_pharmacy; pick_up_prescription

AI morals

• When humans tell stories they implicitly demonstrate their values, social norms, and social conventions

• Crowdsourced stories —> children’s stories —> adult

• Different agents for different cultures

Harrison & Riedl. AAAI Workshop on AI, Ethics & Society, 2016.

Concluding thoughts

• The future is not distributed evenly (William Gibson)

• Accessibility and democratization —> creativity

• Narrative intelligence is central to many of the things humans do on a day-to-day basis

• AI able to understand and reason more like humans will unlock its potential


Thanks!

• Brent Harrison

• Boyang (Albert) Li

• Stephen Lee-Urban

• Siddhartha Banerjee

• D. Scott Appling

• George Johnston


http://www.cc.gatech.edu/~riedl/
