Learning from Stories: Making AI Programming Accessible
Mark Riedl
@mark_riedl
The golden age of AI?
2
Who controls the AI?
• Google?
• Facebook?
• OpenAI?
• People who can code
  – Fundamental understanding of algorithms, data structures, theory & systems
3
Democratization of Artificial Intelligence
Making AI accessible
• Can non-coders program artificial intelligences?
4
Training simulations
• Subject matter experts rapidly create a new sociocultural simulation without programming
* Tactical Language Training System
Computer games
• Create large numbers of virtual non-player characters (farmers, shopkeepers, etc.)
* Skyrim
Interactive machine learning
• Reinforcement learning: AI automatically devises a “program” for operating in a stochastic environment through trial and error (+ reward signal)
• Humans train AI by interacting
  – Demonstration
  – Critique
• Must have the ability, presence, & patience to train the AI
6
What programming language?
Storytelling
• Narrative is the fundamental means by which we organize, understand, and explain the world
• Narrative intelligence: the ability to craft, tell, understand, and affectively respond to stories
• Storytelling is an effective means through which we convey complex, tacit knowledge to each other
7
Learning from stories
• Hypothesis: if computers could comprehend stories, then humans could transfer complex procedural knowledge to computers by telling stories
• Don’t need to teach humans how to tell stories
8
Stories are hard for computers
• Natural language
• Stories tend to skip “obvious” steps
• Humans are noisy
• Humans don’t know the agent capabilities or execution environment
9
Quixote
• Reinforcement learning: AI automatically devises a “program” for operating in a stochastic environment through trial and error
• Intuition: reward the agent for performing actions that mimic those of the protagonist in stories
➡ Learn a reward function
10
Harrison & Riedl. AIIDE Conference, 2016.
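The intuition on the Quixote slide can be sketched in a few lines of Python. This is a minimal illustration, not the method from the cited paper: the nested dictionary `STORY_TREE`, the function `story_reward`, and the bonus and penalty values are all hypothetical. The agent earns a bonus when its next event continues a sequence seen in a story, and a small penalty otherwise.

```python
# Hypothetical story events, stored as a nested dict: each key is an event,
# each value holds the events that may follow it in some story.
STORY_TREE = {
    "enter pharmacy": {
        "pay for prescription": {
            "leave pharmacy": {},
        },
    },
}


def story_reward(story_tree, matched, event, bonus=10.0, penalty=-1.0):
    """Score `event` given the tuple of story events matched so far.

    Returns (reward, updated_matched). The bonus and penalty values are
    illustrative assumptions, not values taken from the talk.
    """
    node = story_tree
    for past_event in matched:
        node = node[past_event]  # walk down to the agent's current position
    if event in node:
        # The event continues a known story: reward it and advance.
        return bonus, matched + (event,)
    # The event is not what the protagonist would do next.
    return penalty, matched


reward, matched = story_reward(STORY_TREE, (), "enter pharmacy")
print(reward, matched)
```

Returning the updated tuple of matched events keeps the function stateless, so a learner can call it from any point in an episode.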
Reinforcement learning
• Markov Decision Process (MDP)
  – S: set of states
  – A: set of actions
  – P_a(s, s′): transition function
  – R_a(s, s′): reward function
• Policy (π) gives rules of behavior
[Slide annotations: “Missing”, “Solve this”]
11
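To make the MDP components concrete, here is a small sketch that solves a toy MDP with value iteration. Value iteration is a standard dynamic-programming method that needs the transition and reward functions up front, so it is a stand-in for, not an instance of, the trial-and-error learning the talk describes. The states, actions, probabilities, and rewards below are invented for illustration.

```python
# Toy MDP. Every state, action, probability, and reward here is hypothetical.
STATES = ["home", "pharmacy", "done"]
ACTIONS = ["go", "buy", "steal"]

# P[(s, a)] -> list of (probability, next_state): the transition function.
P = {
    ("home", "go"): [(1.0, "pharmacy")],
    ("pharmacy", "buy"): [(0.9, "done"), (0.1, "pharmacy")],
    ("pharmacy", "steal"): [(1.0, "done")],
}

# R[(s, a, s_next)] -> reward: the reward function.
R = {
    ("home", "go", "pharmacy"): 0.0,
    ("pharmacy", "buy", "done"): 10.0,
    ("pharmacy", "buy", "pharmacy"): -1.0,
    ("pharmacy", "steal", "done"): -10.0,
}


def q_value(V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (R[(s, a, s2)] + gamma * V[s2]) for p, s2 in P[(s, a)])


def value_iteration(gamma=0.9, tol=1e-9):
    """Return (state values, greedy policy) for the toy MDP."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            available = [a for a in ACTIONS if (s, a) in P]
            if not available:  # terminal state: value stays 0
                continue
            best = max(q_value(V, s, a, gamma) for a in available)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {}
    for s in STATES:
        available = [a for a in ACTIONS if (s, a) in P]
        if available:
            policy[s] = max(available, key=lambda a: q_value(V, s, a, gamma))
    return V, policy


values, policy = value_iteration()
print(policy)
```

The resulting policy is a plain dictionary from states to actions, which is the "rules of behavior" role that π plays on the slide.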
Quixote
[Pipeline diagram. Stages: Plot graph learning; Trajectory tree creation; Reward assignment; Reinforcement learning. Labels: Exemplar stories; A plot graph; A trajectory tree; A trajectory tree with events assigned reward values; A policy mapping states to actions; Environment]
12
Crowdsourcing stories
• Ask many people (via Amazon’s Mechanical Turk) to write a story about what happens when someone does X
• Hard for individuals to think of many alternatives
• Need multiple examples to learn a pattern
13
Semantic lifting
• If event transition is in a training story, agent can do it
• Gappy stories
  – Example: John went to the pharmacy. John left with the prescription.
• Malicious stories
  – Example: John went to the pharmacy. John took the drugs and ran.
• Learn a model that abstracts away from language to events
  – Fill gaps
  – Filter outliers
14
Plot graphs
• Primitive events learned from natural language
• Temporal relations between events
15
Li, Lee-Urban, Johnston & Riedl. AAAI 2013 Conference.
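One way to picture a plot graph is as a set of events plus "must happen before" pairs. The sketch below is a simplified assumption, not the representation from the cited paper. The class `PlotGraph` and its methods are hypothetical. The event names come from the restaurant slides, but the ordering pairs between them are guesses, since the transcript does not preserve the graph's edges.

```python
from dataclasses import dataclass, field


@dataclass
class PlotGraph:
    """Events plus temporal (earlier, later) relations between them."""
    events: set = field(default_factory=set)
    before: set = field(default_factory=set)

    def add_precedence(self, earlier, later):
        """Record that `earlier` must occur before `later`."""
        self.events.update((earlier, later))
        self.before.add((earlier, later))

    def is_consistent(self, story):
        """True if no pair of events in `story` violates a temporal relation."""
        position = {event: i for i, event in enumerate(story)}
        return all(
            position[a] < position[b]
            for a, b in self.before
            if a in position and b in position
        )


# Assumed orderings, for illustration only.
graph = PlotGraph()
graph.add_precedence("read menu", "place order")
graph.add_precedence("place order", "pay for food")
graph.add_precedence("pay for food", "eat food")

print(graph.is_consistent(["read menu", "place order", "pay for food", "eat food"]))
```

Because `is_consistent` ignores pairs whose events are absent from the story, a story that skips steps is still accepted as long as the steps it does mention are in a valid order.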
[Plot graph excerpt. Events: walk/go into restaurant; read menu; choose menu item; wait in line; take out wallet; place order; pay for food; ...]
16
[Plot graph: Fast food restaurant. Events: choose restaurant; drive to restaurant; walk/go into restaurant; read menu; choose menu item; wait in line; drive to drive-thru; take out wallet; place order; pay for food; wait for food; drive to window; get food; find table; sit down; eat food; clear trash; leave restaurant; drive home]
17
[Plot graph: Going on a date to the movies. Events: arrive at theatre; wait for ticket; go to ticket booth; buy tickets; choose movie; go to concession stand; order popcorn / soda; show tickets; buy popcorn; enter theatre; find seats; turn off cellphone; sit down; eat popcorn; watch movie; hold hands; use bathroom; discard trash; talk about movie; leave movie; drive home; kiss]
18
[Plot graph: Bank robbery. Events: John covers face; John enters bank; John sees Sally; John waits in line; John approaches Sally; John gives Sally bag; Sally is scared; Sally greets John; John hands Sally a note; John pulls out gun; Sally screams; John points gun at Sally; John shows gun; Sally reads note; John demands money; Sally calls police; John drives away; John gets in car; John leaves bank; John opens bank door; John takes bag; Sally gives John bag; Sally presses alarm; Sally puts money in bag; The note demands money; Sally collects money; Sally opens cash drawer; Sally give John money; John collects money; Police arrives; Police arrests John]
Trajectory tree generation
• Generate all possible plot sequences from a plot graph
• Including stories hypothesized to be possible based on the plot graph but not part of the exemplars
20
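Generating every sequence a plot graph allows can be sketched as enumerating all orderings that respect the "before" relations, then merging them into a tree by shared prefix. This is a simplified assumption: it treats every event as mandatory and ignores any optional or alternative events a real plot graph may contain. The function names and the example ordering pairs are hypothetical.

```python
def all_sequences(events, before):
    """Yield every ordering of `events` that respects the (earlier, later) pairs."""
    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        for event in sorted(remaining):
            # An event is available once none of its predecessors is still waiting.
            if all(a not in remaining for a, b in before if b == event):
                yield from extend(prefix + [event], remaining - {event})

    yield from extend([], frozenset(events))


def build_trajectory_tree(sequences):
    """Merge sequences into a nested dict keyed by event (a prefix tree)."""
    root = {}
    for sequence in sequences:
        node = root
        for event in sequence:
            node = node.setdefault(event, {})
    return root


# Assumed orderings, for illustration only.
EVENTS = {"read menu", "wait in line", "place order", "pay for food"}
BEFORE = {
    ("read menu", "place order"),
    ("wait in line", "place order"),
    ("place order", "pay for food"),
}

sequences = list(all_sequences(EVENTS, BEFORE))
tree = build_trajectory_tree(sequences)
print(len(sequences), sorted(tree))
```

Because "read menu" and "wait in line" are unordered relative to each other, the enumeration yields both orderings, including one that may never have appeared in an exemplar story.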
Semantic lowering
• Map plot events to agent operators
• Summed word2vec embeddings and cosine similarity
21
[Figure: agent actions (give_money, enter_pharmacy, exit_pharmacy, pick_up_prescription, drop_money) shown alongside operators op_21 to op_25, with similarity scores 0.83, 0.67, 0.14]
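The matching step named on the Semantic lowering slide (summed word embeddings compared by cosine similarity) can be sketched as follows. The three-dimensional vectors in `TOY_VECTORS` are made-up stand-ins, not real word2vec embeddings, and the helper names are hypothetical. A real system would load trained vectors instead.

```python
import math

# Hand-written toy vectors (hypothetical values, NOT trained word2vec embeddings).
TOY_VECTORS = {
    "go": [0.1, 0.8, 0.2],
    "enter": [0.0, 0.9, 0.1],
    "exit": [0.0, -0.9, 0.1],
    "pharmacy": [0.1, 0.3, 0.9],
    "give": [0.9, 0.1, 0.0],
    "pay": [0.8, 0.2, 0.1],
    "money": [0.7, 0.0, 0.3],
}


def embed(phrase):
    """Sum the vectors of the known words in a phrase or action name."""
    total = [0.0, 0.0, 0.0]
    for word in phrase.replace("_", " ").lower().split():
        vector = TOY_VECTORS.get(word)
        if vector is not None:  # unknown words (e.g. "to") are skipped
            total = [t + v for t, v in zip(total, vector)]
    return total


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_action(event, actions):
    """Return (action, score) for the agent action most similar to a plot event."""
    scored = [(cosine(embed(event), embed(action)), action) for action in actions]
    score, action = max(scored)
    return action, score


AGENT_ACTIONS = ["give_money", "enter_pharmacy", "exit_pharmacy"]
print(best_action("go to pharmacy", AGENT_ACTIONS))
```

Taking only the single best match hides two cases the talk raises later: events with no good match, and events that map to several actions. A score threshold or a top-k list would be natural extensions.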
Reinforcement learning
• Fill gaps between events in trajectory tree
[Figure: world state space]
23
Pharmacy world
24
Pharmacy world
25
[Plot graph events: Leave House; Go to bank; Go to hospital; Go to doctor; Don't get prescription hospital; Don't get prescription doctor; Get prescription hospital; Get prescription doctor; Withdraw money; Go to pharmacy; Buy strong drugs; Buy weak drugs; Go home]
Harrison & Riedl. IJCAI Workshop on Interactive Machine Learning, 2016.
Simulation study
• Events in trajectory tree manually mapped to agent actions
• Policy is consistent with the plot graph
• Never steals drugs or money
• (Unless the agent has no other course of action)
26
Harrison & Riedl. IJCAI Workshop on Interactive Machine Learning, 2016.
Event correspondence
• Automatic mapping of plot graph events to agent actions
• Missing mappings
• Plot graph events map to multiple agent actions
27
[Figure: pharmacy world plot graph events (Leave House; Go to bank; Go to hospital; Go to doctor; Don't get prescription hospital; Don't get prescription doctor; Get prescription hospital; Get prescription doctor; Withdraw money; Go to pharmacy; Buy strong drugs; Buy weak drugs; Go home) matched via word2vec to agent actions: give_money, enter_pharmacy, exit_pharmacy, pick_up_prescription]
28
AI morals
• When humans tell stories they implicitly demonstrate their values, social norms, and social conventions
• Crowdsourced stories → children’s stories → adult
• Different agents for different cultures
Harrison & Riedl. AAAI Workshop on AI, Ethics & Society, 2016.
Concluding thoughts
• The future is not distributed evenly (William Gibson)
• Accessibility and democratization → creativity
• Narrative intelligence is central to many of the things humans do on a day to day basis
• AI able to understand and reason more like humans will unlock its potential
29
Thanks!
• Brent Harrison
• Boyang (Albert) Li
• Stephen Lee-Urban
• Siddhartha Banerjee
• D. Scott Appling
• George Johnston
30
http://www.cc.gatech.edu/~riedl/