Conversational Informatics, April 13, 2016
1. Introduction—Conversational Informatics—
Toyoaki Nishida, Kyoto University
Copyright © 2016, Toyoaki Nishida, Atsushi Nakazawa, Yoshimasa Ohmoto, Yasser Mohammad, At ,Inc. All Rights Reserved.
History of AI research in contrast with ICT
Year: AI / ICT
1940~
  1936: Turing Machine; 1947: von Neumann computer; 1948: Information Theory by C. Shannon and W. Weaver; 1948: Cybernetics by N. Wiener
1950~
  AI: 1952-62: Checkers program by A. Samuel; 1956: Dartmouth Conference
  ICT: 1957: FORTRAN by J. Backus
1960~
  AI: 1961: Symbolic integration program SAINT by J. Slagle; 1962: Perceptron by F. Rosenblatt; 1966: The ALPAC report against machine translation by R. Pierce; 1967: Formula manipulation system Macsyma by J. Moses; 1967: Dendral for mass spectrum analysis by E. Feigenbaum
  ICT: 1961: Mathematical theory of packet networks by L. Kleinrock; 1963: Interactive computer graphics by I. Sutherland; 1968: Mouse and bitmap display for the oN-Line System (NLS) by D. C. Engelbart; 1969: ARPANET
1970~
  AI: 1971: Natural language dialogue system SHRDLU by T. Winograd; 1973: Combinatorial explosion problem pointed out in the Lighthill report; 1974: MYCIN by E. Shortliffe; mid 1970s: Primal Sketch and visual perception by D. Marr; 1976: Automated Mathematician (AM) by D. Lenat; 1979: Autonomous vehicle Stanford Cart by H. Moravec
  ICT: 1970: ALOHAnet; 1970: Relational database theory by E. F. Codd; 1972: Theory of NP-completeness by S. Cook and R. Karp; mid 1970s: Alto machine by A. Kay and A. Goldberg; 1976: Ethernet; 1979: Spreadsheet program VisiCalc by D. Bricklin
1980~
  AI: 1982: Fifth Generation Computer Project; 1984: The CYC project by D. Lenat; mid 1980s: The back-propagation algorithm was widely used; 1985: The cybernetic artist AARON by H. Cohen; 1986: Subsumption architecture by R. Brooks; 1989: Autonomous vehicle ALVINN by D. Pomerleau
  ICT: 1982: TCP/IP protocol by B. Kahn and V. Cerf; mid 1980s: First wireless tag products; 1987: UUNET started the commercial UUCP network connection service; 1988: Internet worm (Morris worm); 1989: World Wide Web by T. Berners-Lee; 1989: The number of hosts on the Internet exceeded 100,000
1990~
  AI: 1990: Genetic programming by J. R. Koza; early 1990s: TD-Gammon by G. Tesauro; mid 1990s: Data mining technology; 1997: Deep Blue defeated world chess champion G. Kasparov; 1997: The first RoboCup by H. Kitano; 1999: Robot pets became commercially available
  ICT: 1992: The number of hosts on the Internet exceeded 1,000,000; 1994: Shopping malls on the Internet; 1994: W3C was founded by T. Berners-Lee; 1997: Google Search; 1998: XML 1.0 (eXtensible Markup Language) by W3C; 1998: PayPal
2000~
  AI: 2000: Honda ASIMO; 2004: The Mars Exploration Rovers (Spirit and Opportunity)
  ICT: 2001: Wikipedia; 2003: Skype / iTunes Store; 2004: Facebook; 2005: YouTube / Google Earth; 2006: Twitter; 2007: Google Street View
2010~
  2010: Google driverless car / Kinect; 2011: IBM Watson defeated two of the greatest Jeopardy! champions; 2012: Siri
Exponential growth: Moore's Law; expanding Internet world
Universal Turing machine: from hardware to software
Generic computing engines: from software to data
Service copying: machine learning / data mining
Symbolic computing: knowledge-based systems
Heuristics: intelligence as clever computing
Embodied & network intelligence: like humans and society
Very near future—rise of cognitive computing
Siri (Apple), Now (Google), Cortana (Microsoft), M (Facebook), …, Conversational commerce
The world is waiting for conversational informatics
The key issue is building common ground
[Nishida-Nakazawa-Ohmoto-Mohammad 2014]
Common Ground
Artificial Intelligence
People
Difficulty: common ground is like an iceberg! It is often tacit and not observable.
Types of common ground [Clark 1996]
1. Communal
   a. Human nature
   b. Communal lexicons
   c. Cultural facts, norms, procedures
2. Personal
   a. Perceptual bases: gestural indications, partner's activities, salient perceptual events
   b. Actional bases
   c. Personal diaries
Presuppositions for conversation that each participant is supposed to share about surroundings, activities, perceptions, emotions, plans, interests, etc.
Behavior in public places
[Clark 1996, p. 14]
Structure of participation
The people around an action divide first into those who are truly participating in it (participants) and those who are not (nonparticipants).
- The speaker
- The addressee
- Side participant: taking part in the conversation but not currently being addressed
- Overhearer: has no rights or responsibilities in the conversation
- Bystander: openly present but not part of the conversation
- Eavesdropper: listens in without the speaker's awareness
Figure: the speaker, addressee, and side participant make up all participants; the bystander and eavesdropper lie outside them but inside all listeners.
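The participation structure above can be read as a small decision procedure. The sketch below is only an illustration, assuming every person present can be described by four hypothetical yes/no attributes (speaking, ratified participant, currently addressed, known to the speaker); neither these attributes nor the `classify` function come from [Clark 1996]. Following the list, overhearers are split into bystanders (openly present) and eavesdroppers (unknown to the speaker).

```python
from enum import Enum


class Role(Enum):
    """Participation roles after Clark (1996)."""
    SPEAKER = "speaker"
    ADDRESSEE = "addressee"
    SIDE_PARTICIPANT = "side participant"
    BYSTANDER = "bystander"        # overhearer who is openly present
    EAVESDROPPER = "eavesdropper"  # overhearer the speaker is not aware of


def classify(is_speaking: bool, is_participant: bool,
             is_addressed: bool, speaker_aware: bool) -> Role:
    """Hypothetical decision procedure mapping four yes/no attributes to a role."""
    if is_speaking:
        return Role.SPEAKER
    if is_participant:
        # Participants are either being addressed or taking part on the side.
        return Role.ADDRESSEE if is_addressed else Role.SIDE_PARTICIPANT
    # Nonparticipants are overhearers; the speaker may or may not know they are there.
    return Role.BYSTANDER if speaker_aware else Role.EAVESDROPPER
```

The order of the tests mirrors the first split in the text: participants versus nonparticipants, then finer roles inside each group.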
History of conversational systems development
Figure: timeline (1970 to 2010) of conversational systems development. Labels: natural language dialogue systems; story understanding systems; natural language question answering systems; speech dialogue systems; multi-modal dialogue systems; The Knowledge Navigator; embodied conversational agents / intelligent virtual humans; affective computing; cognitive systems; conversational systems, divided into transactional systems and interactional systems.
Early Natural Language Dialogue Systems
[Green 1961]
Spec[ification] list:
“Where did the Red Sox play on July 7?” -> Place = ?, Team = Red Sox, Month = July, Day = 7
“What teams won 10 games in July?” -> Team(winning) = ?, Game(number of) = 10, Month = July
Dictionary:
“team” -> meaning: Team = (blank)
“Red Sox” -> meaning: Team = Red Sox
“who” -> meaning: Team = ?
“winning” -> meaning: subroutine
Data:
Month = July
  Place = Boston
  Day = 7
  Game Serial No. = 96
  (Team = Red Sox, Score = 5)
  (Team = Yankees, Score = 3)
Baseball
• One of the earliest natural language question answering systems.
• Answers questions such as these about baseball games.
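To make the role of the specification list concrete, here is a minimal sketch of the processing step, assuming game records shaped like the Data example above (the second game is invented for illustration). Fixed slots must match a stored game; every slot marked "?" is filled from the matching games. The record layout, the `attributes` helper, and the winner/loser derivation are assumptions, not Green et al.'s actual data structures.

```python
from typing import Dict, List

# Hypothetical stored data; the first record follows the "Data" example, the second is invented.
GAMES = [
    {"Month": "July", "Day": 7, "Place": "Boston", "Serial": 96,
     "Scores": {"Red Sox": 5, "Yankees": 3}},
    {"Month": "July", "Day": 8, "Place": "New York", "Serial": 97,
     "Scores": {"Red Sox": 2, "Yankees": 4}},
]


def attributes(game: dict) -> dict:
    """Flatten one game into attribute/value pairs, deriving winner and loser."""
    (t1, s1), (t2, s2) = game["Scores"].items()
    winner, loser = (t1, t2) if s1 > s2 else (t2, t1)  # ties are not modeled
    return {"Month": game["Month"], "Day": game["Day"], "Place": game["Place"],
            "Team": {t1, t2}, "Team(winning)": winner, "Team(losing)": loser}


def answer(spec: Dict[str, object], games: List[dict] = GAMES) -> List[dict]:
    """Fill the '?' slots from every game that matches all fixed slots of the spec list."""
    results = []
    for game in games:
        attrs = attributes(game)
        matches = all(
            value in attrs[attr] if attr == "Team" else attrs[attr] == value
            for attr, value in spec.items() if value != "?"
        )
        if matches:
            results.append({attr: attrs[attr] for attr, value in spec.items() if value == "?"})
    return results


# "Where did the Red Sox play on July 7?"
print(answer({"Place": "?", "Team": "Red Sox", "Month": "July", "Day": 7}))
# [{'Place': 'Boston'}]
```

Aggregate questions such as "What teams won 10 games in July?" would need a counting step on top of this matching, which the sketch leaves out.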
Early Natural Language Dialogue Systems
[Green 1961]
Modules of Baseball
1. Question read-in
2. Dictionary look-up
3. Syntax
(1) Scans for ambiguities in part of speech: in some cases they are resolved by looking at adjoining words; in other cases, by inspecting the entire question.
(2) Locates and brackets the noun phrases, [ ], and the prepositional and adverbial phrases, ( ). The verb is left unbracketed. E.g.,
“How many games did the Yankees play in July?” -> [How many games] did [the Yankees] play (in [July])?
Any unbracketed preposition is attached to the first noun phrase in the sentence, and prepositional brackets are added. E.g.,
“Who did the Red Sox lose to on July 5?” -> (To [who]) did [the Red Sox] lose (on [July 5])?
The syntax routine also determines whether the verb is active or passive and locates its subject and object.
Finally, the syntax analysis checks whether any of the words is marked as a question word. If not, a signal is set to indicate that the question requires a yes/no answer.
4. Content analysis
The content analysis uses the dictionary meanings and the results of the syntactic analysis to set up a specification list for the processing program. E.g.,
“each team”: Team = (blank) -> Team = each
“what team”: Team = (blank) -> Team = ?
“Who beat the Yankees on July 4”: Team = (blank) -> Team = ?, Team(winning) = ?, Team(losing) = Yankees
“six games”: Game = (blank) -> Game(number of) = 6
“how many games”: Game = (blank) -> Game(number of) = ?
“Who was the winning team…”: Team = ? and Team(winning) = (blank) -> Team(winning) = ?
5. Processing
The specification list indicates to the processor what part of the stored data is relevant for answering the input question. The processor extracts the matching information from the data and produces, for the responder, the answer to the question in the form of a list structure.
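The content-analysis rules can be sketched as a function that turns each bracketed noun phrase into specification-list entries. This is a hedged illustration: the tiny dictionaries (`HEADS`, `NAMES`, `NUMBERS`) and the function names are hypothetical, and only the noun-phrase rules listed for step 4 are covered. Cross-phrase rules such as the “winning team” case are omitted.

```python
# Hypothetical fragment of a Baseball-style dictionary (not Green et al.'s actual entries).
HEADS = {"team": "Team", "teams": "Team", "game": "Game", "games": "Game"}
NAMES = {"red sox": ("Team", "Red Sox"), "yankees": ("Team", "Yankees"),
         "who": ("Team", "?"), "july": ("Month", "July")}
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
           "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def analyze_np(phrase: str) -> dict:
    """Turn one bracketed noun phrase into specification-list entries."""
    words = [w for w in phrase.lower().split() if w != "the"]
    name = " ".join(words)
    if name in NAMES:                      # proper names and question pronouns
        attr, value = NAMES[name]
        return {attr: value}
    attr = HEADS[words[-1]]                # dictionary meaning: attr = (blank); KeyError if unknown
    modifiers = words[:-1]
    if modifiers == ["each"]:
        return {attr: "each"}
    if modifiers in (["what"], ["which"]):
        return {attr: "?"}
    if attr == "Game" and modifiers == ["how", "many"]:
        return {"Game(number of)": "?"}
    if attr == "Game" and len(modifiers) == 1 and modifiers[0] in NUMBERS:
        return {"Game(number of)": NUMBERS[modifiers[0]]}
    return {attr: "(blank)"}


def spec_list(noun_phrases: list) -> dict:
    """Merge the entries of all noun phrases of a question into one specification list."""
    spec = {}
    for phrase in noun_phrases:
        spec.update(analyze_np(phrase))
    return spec


# [How many games] did [the Yankees] play (in [July])?
print(spec_list(["How many games", "the Yankees", "July"]))
```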
Early Natural Language Dialogue Systems
[Woods 1973]
LUNAR
Prototype natural language question answering system that helps lunar geologists access chemical analysis data on lunar rock and soil composition.
(i) Syntactic analysis using heuristic/semantic information to choose the most likely parsing (Augmented Transition Network Grammar was used)
(ii) Semantic interpretation: produce a formal representation for queries
(iii) Execution of this formal expression by the retrieval component
Questions LUNAR “understands”
1. (List samples with Silicon) (Give me all lunar samples with Magnetite) (In which samples has Apatite been identified) (How many samples contain Titanium) (Which rocks contain Chromite and Ulvospinel) (Which rocks do not contain Chromite and Ulvospinel)
2. (What analyses of Olivine are there) (Analyses of Strontium in Plagioclase) (What are the Plag analyses for breccias) (Rare earth analyses for S10005) (I need all chemical analyses of lunar soil) (What is the composition of Ilmenite in rock 10017) (List the analyses of Aluminum in vugs) (Nickel content of opaques)
3. (Which samples are breccias) (What are the igneous rocks) (What types of sample are there) (What is the number of phases in each sample)
4. (Give me the K / Rb ratios for all lunar samples) (What is the specific activity of A126 in soil) (Give me all references on fayalitic Olivine) (Which rock is the oldest) (Which is the oldest rock) ...
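The split between semantic interpretation (ii) and execution (iii) can be sketched for two of the question patterns above. This is a minimal illustration, assuming a hypothetical in-memory sample database and a simplified query form (a quantifier plus a restricting predicate); LUNAR's real meaning representation language and ATN-based parser are much richer, and the regular expressions here merely stand in for parsing.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Union

# Hypothetical toy database: sample id -> constituents identified in it.
SAMPLES = {
    "S10005": {"Silicon", "Titanium"},
    "S10017": {"Ilmenite", "Titanium", "Chromite"},
    "S10046": {"Silicon", "Chromite", "Ulvospinel"},
}


@dataclass
class Query:
    """Formal query: a quantifier over samples, restricted by a predicate."""
    quantifier: str                      # "EVERY" lists the members, "NUMBER" counts them
    restriction: Callable[[str], bool]


def contain(*constituents: str) -> Callable[[str], bool]:
    return lambda sample: all(c in SAMPLES[sample] for c in constituents)


def interpret(question: str) -> Query:
    """Semantic interpretation for two question patterns only (illustrative)."""
    m = re.fullmatch(r"(?:list|give me all) (?:lunar )?samples with (\w+)", question, re.I)
    if m:
        return Query("EVERY", contain(m.group(1).capitalize()))
    m = re.fullmatch(r"how many samples contain (\w+)", question, re.I)
    if m:
        return Query("NUMBER", contain(m.group(1).capitalize()))
    raise ValueError(f"no interpretation for: {question!r}")


def execute(query: Query) -> Union[List[str], int]:
    """Retrieval: evaluate the formal query against the database."""
    members = [s for s in sorted(SAMPLES) if query.restriction(s)]
    return members if query.quantifier == "EVERY" else len(members)


print(execute(interpret("How many samples contain Titanium")))
```

Keeping the formal query separate from the database is what lets the same interpretation be executed, inspected, or optimized independently of the English question.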
Early Natural Language Dialogue Systems
[Woods 1973]
Augmented Transition Network Grammar
Figure: transition networks with arcs S: NP, AUX, V, NP, PP; NP: DET, ADJ, NOUN, PRON; PP: PREP, NP.
An extension of the recursive transition network model, which is weakly equivalent to the context-free grammar model.
“The augmented transition network builds up a partial structural description of the sentence as it proceeds from state to state through the network. The pieces of this partial description are held in registers which can contain any rooted tree or list of rooted trees and which are automatically pushed down when a recursive application of the transition network is called for and restored when the lower level (recursive) computation is completed. The structure-building actions on the arcs specify changes in the contents of these registers in terms of their previous contents, the contents of other registers, the current input symbol, and/or the result of lower level computations. In addition to holding pieces of substructure that will eventually be incorporated into a larger structure, the registers may also be used to hold flags or other indicators to be interrogated by conditions on the arcs.”
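The mechanism described in the quotation (registers filled by arc actions, tests on arcs, and registers saved and restored around recursive PUSH calls) can be sketched with a tiny interpreter. The networks and lexicon below are hypothetical toys, not the LUNAR grammar. The agreement test on the verb arc shows what the "augmentation" adds to a plain recursive transition network.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical toy lexicon: word -> (category, number feature)
LEXICON = {
    "the": ("DET", None), "a": ("DET", None),
    "big": ("ADJ", None), "red": ("ADJ", None),
    "robot": ("NOUN", "sg"), "robots": ("NOUN", "pl"),
    "block": ("NOUN", "sg"), "blocks": ("NOUN", "pl"),
    "lifts": ("V", "sg"), "lift": ("V", "pl"),
}


def setr(name):
    """Arc action: store the current input in register `name`."""
    return lambda regs, value: {**regs, name: value}


def addr(name):
    """Arc action: append the current input to the tuple held in register `name`."""
    return lambda regs, value: {**regs, name: regs.get(name, ()) + (value,)}


def set_head(regs, word):
    """Arc action: store the head noun and copy its number feature into a register."""
    return {**regs, "head": word, "num": LEXICON[word][1]}


def agrees(regs, word):
    """Arc test: the verb's number must match the number register of the subject NP."""
    return regs["subj"]["num"] == LEXICON[word][1]


# Arc = (kind, label, next_state, action, test); PUSH calls a sub-network.
NETWORKS = {
    "S": {
        "S/0": [("PUSH", "NP", "S/1", setr("subj"), None)],
        "S/1": [("CAT", "V", "S/2", setr("verb"), agrees)],
        "S/2": [("PUSH", "NP", "S/3", setr("obj"), None)],
        "S/3": [("POP", None, None, None, None)],
    },
    "NP": {
        "NP/0": [("CAT", "DET", "NP/1", setr("det"), None)],
        "NP/1": [("CAT", "ADJ", "NP/1", addr("adjs"), None),
                 ("CAT", "NOUN", "NP/2", set_head, None)],
        "NP/2": [("POP", None, None, None, None)],
    },
}


def traverse(net, state, words, i, regs) -> Iterator[Tuple[Dict, int]]:
    """Yield (registers, next input position) for every path that reaches a POP arc."""
    for kind, label, nxt, action, test in NETWORKS[net][state]:
        if kind == "POP":
            yield regs, i
        elif kind == "CAT":
            if (i < len(words) and LEXICON.get(words[i], (None,))[0] == label
                    and (test is None or test(regs, words[i]))):
                yield from traverse(net, nxt, words, i + 1, action(regs, words[i]))
        elif kind == "PUSH":
            # The caller's registers stay on the Python call stack; the sub-network
            # starts with fresh registers and its result is handed to `action`.
            for sub_regs, j in traverse(label, label + "/0", words, i, {}):
                yield from traverse(net, nxt, words, j, action(regs, sub_regs))


def parse(sentence):
    """Return the S registers of the first complete parse, or None."""
    words = sentence.lower().split()
    for regs, i in traverse("S", "S/0", words, 0, {}):
        if i == len(words):
            return regs
    return None


print(parse("the big red robot lifts a block"))
```

Because `traverse` is a generator, failed arcs simply fall through to the next alternative, which gives the backtracking behavior of a nondeterministic network.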
Early Natural Language Dialogue Systems
Winograd, Terry, Understanding Natural Language, New York: Academic Press, 1972. http://hci.stanford.edu/~winograd/shrdlu/
SHRDLU
• Natural language understanding system working on the “Blocks” world.
• Based on the belief that “a computer cannot deal with language unless it can understand the subject it is discussing”
• Answers questions, executes commands, and accepts information in normal English dialog.
• Uses semantic information and context to understand discourse and disambiguate sentences.
• Procedural knowledge representation (as opposed to declarative KR)
Early Natural Language Dialogue Systems
Winograd, Terry, Understanding Natural Language, New York: Academic Press, 1972. http://hci.stanford.edu/~winograd/shrdlu/
USER: “Pick up a big red block”
SHRDLU: “OK”
Figure: processing modules: syntactic analysis, semantic analysis, planning, question answering, and graphics (simulated robot arm).
SHRDLU
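SHRDLU's idea that the meaning of a command is a procedure can be illustrated with a toy blocks world. In this sketch, "pick up" is executed by a procedure that first clears the block's top, recursively, and then grasps and raises it. The scene, the block names, the regular expression, and the non-"OK" replies are all hypothetical; SHRDLU itself used a systemic grammar and PLANNER-style procedures rather than anything like this.

```python
import re
from typing import Dict, List, Tuple


class BlocksWorld:
    """Toy blocks world in which the meaning of "pick up" is a procedure."""

    def __init__(self) -> None:
        # Hypothetical scene: block -> (size, color), and what each block rests on.
        self.blocks: Dict[str, Tuple[str, str]] = {
            "b1": ("big", "red"), "b2": ("small", "green"), "b3": ("big", "blue")}
        self.on: Dict[str, str] = {"b1": "table", "b2": "b1", "b3": "table"}

    def find(self, size: str, color: str) -> List[str]:
        """Semantic analysis: the blocks that fit the description."""
        return [name for name, (s, c) in self.blocks.items() if (s, c) == (size, color)]

    def clear_top(self, x: str, plan: List[str]) -> None:
        """Planning: move everything stacked on x (recursively) to the table."""
        for y in [b for b, base in self.on.items() if base == x]:
            self.clear_top(y, plan)
            self.on[y] = "table"
            plan.append(f"put {y} on table")

    def pick_up(self, x: str) -> List[str]:
        plan: List[str] = []
        self.clear_top(x, plan)
        self.on[x] = "hand"
        plan += [f"grasp {x}", f"raise {x}"]
        return plan


def command(world: BlocksWorld, text: str) -> Tuple[str, List[str]]:
    """Handle one imperative of the form 'pick up a/the <size> <color> block'."""
    m = re.fullmatch(r"pick up (?:a|the) (big|small) (\w+) block", text.strip().lower())
    if m is None:
        return "I DON'T UNDERSTAND.", []
    candidates = world.find(m.group(1), m.group(2))
    if not candidates:
        return "I CAN'T FIND SUCH A BLOCK.", []
    return "OK", world.pick_up(candidates[0])   # "a ..." accepts any matching block


print(command(BlocksWorld(), "Pick up a big red block"))
```

The plan contains a step the user never asked for ("put b2 on table"). That is the point of procedural semantics: what "pick up" entails depends on the current state of the world.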
Early Natural Language Dialogue Systems
[Weizenbaum 1966]
“ELIZA is a program operating within the MAC time-sharing system at MIT which makes certain kinds of natural language conversation between man and computer possible. Input sentences are analyzed on the basis of decomposition rules which are triggered by key words appearing in the input text. Responses are generated by reassembly rules associated with selected decomposition rules. The fundamental technical problems with which ELIZA is concerned are: (1) the identification of key words, (2) the discovery of minimal context, (3) the choice of appropriate transformations, (4) generation of responses in the absence of key words, and (5) the provision of an editing capability for ELIZA "scripts". A discussion of some psychological issues relevant to the ELIZA approach as well as of future developments concludes the paper.”
Sample dialogue between human and ELIZA:
Men are all alike.
IN WHAT WAY
They're always bugging us about something or other.
CAN YOU THINK OF A SPECIFIC EXAMPLE
Well, my boyfriend made me come here.
YOUR BOYFRIEND MADE YOU COME HERE
He says I'm depressed much of the time.
I AM SORRY TO HEAR YOU ARE DEPRESSED
It's true. I am unhappy.
DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
I need some help, that much seems certain.
WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
Perhaps I could learn to get along with my mother.
TELL ME MORE ABOUT YOUR FAMILY
My mother takes care of me.
WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU
My father.
YOUR FATHER
You are like my father in some ways.
WHAT RESEMBLANCE DO YOU SEE
You are not very aggressive but I think you don't want me to notice that.
WHAT MAKES YOU THINK I AM NOT VERY AGGRESSIVE
You don't argue with me.
WHY DO YOU THINK I DON'T ARGUE WITH YOU
You are afraid of me.
DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU
My father is afraid of everybody.
WHAT ELSE COMES TO MIND WHEN YOU THINK OF YOUR FATHER
Early Natural Language Dialogue Systems
[Weizenbaum 1966]
Example:
“You are very helpful”
-> “I are very helpful” (by a simple substitution rule)
-> “What makes you think I am very helpful” (by the decomposition template (0 I are 0) and the reassembly rule (What makes you think I am 4))
After R_{i,j} has been applied, an index is inserted in the transformation list to prevent the same reassembly rule from being applied twice in a row.
(How = (What)) allows the transformation rule for “how” to be equally applied to “what”
(MOTHER DLIST (/NOUN FAMILY))allows the keyword “MOTHER” to be identified as a noun and as a member of the class “family”. As a result, “MOTHER” will match “(/FAMILY)” in a decomposition rule.
Transformation rules: a collection of key lists of the form
$$\begin{aligned}(K\ &((D_1)\,(R_{1,1})\,(R_{1,2})\cdots(R_{1,m_1}))\\ &((D_2)\,(R_{2,1})\,(R_{2,2})\cdots(R_{2,m_2}))\\ &\quad\vdots\\ &((D_n)\,(R_{n,1})\,(R_{n,2})\cdots(R_{n,m_n})))\end{aligned}$$
where K is the keyword, D_i a decomposition template, and R_{i,j} a reassembly rule.
Example of a decomposition template and a reassembly rule:
D: (0 YOU 0 ME)  R: (WHAT MAKES YOU THINK I 3 YOU)
“It seems that you hate me” -> “What makes you think I hate you”
Substitution rule, e.g.,
(YOURSELF = MYSELF)
(MY YOUR 5 (transformation rules))
ELIZA: Natural language dialogue system without explicit knowledge about the discourse domain
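The decomposition and reassembly steps above can be sketched directly. In a template, 0 matches any number of words, a positive number n matches exactly n words, and a word matches itself. In a reassembly rule, a number n is replaced by the n-th decomposed part. The substitution table is a hypothetical example in the spirit of (YOURSELF = MYSELF), and keyword ranking and the no-repeat index described above are omitted.

```python
from typing import List, Optional, Sequence, Union

Element = Union[int, str]

# Hypothetical substitution table (pronoun swapping), applied before decomposition.
SUBSTITUTIONS = {"you": "I", "me": "you", "my": "your", "your": "my",
                 "yourself": "myself", "myself": "yourself"}


def decompose(template: Sequence[Element], words: List[str]) -> Optional[List[List[str]]]:
    """Match a decomposition template; 0 = any number of words, n > 0 = exactly n words."""
    if not template:
        return [] if not words else None
    head, rest = template[0], template[1:]
    if isinstance(head, int):
        sizes = range(len(words) + 1) if head == 0 else [head]
        for k in sizes:
            if k > len(words):
                break
            tail = decompose(rest, words[k:])
            if tail is not None:
                return [words[:k]] + tail
        return None
    if words and words[0].upper() == head.upper():
        tail = decompose(rest, words[1:])
        return None if tail is None else [words[:1]] + tail
    return None


def reassemble(rule: Sequence[Element], parts: List[List[str]]) -> str:
    """Build the reply; a number n in the rule stands for the n-th decomposed part."""
    out: List[str] = []
    for item in rule:
        out.extend(parts[item - 1] if isinstance(item, int) else [item])
    return " ".join(out).upper()


def respond(text: str, template: Sequence[Element], rule: Sequence[Element]) -> Optional[str]:
    """Apply substitutions, then one decomposition/reassembly pair."""
    words = [SUBSTITUTIONS.get(w.lower(), w) for w in text.strip(".!?").split()]
    parts = decompose(template, words)
    return None if parts is None else reassemble(rule, parts)


print(respond("You are very helpful", [0, "I", "ARE", 0],
              ["WHAT", "MAKES", "YOU", "THINK", "I", "AM", 4]))
# WHAT MAKES YOU THINK I AM VERY HELPFUL
```

The backtracking in `decompose` matters for templates with several 0 elements. For (0 YOU 0 ME), the first 0 must absorb exactly the words before "you" for the rest of the template to fit.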
"Mary socked John."
"Mary punched John."
"Mary hit John with her fist."
"John was socked by Mary."
"Marie a donné un coup de poing à Jean." (French: "Marie punched Jean.")
"María pegó a Juan." (Spanish: "María hit Juan.")
Story understanding and generation
Primitive Meaning
ATRANS Transfer of an abstract relationship (e.g., give)
PTRANS Transfer of the physical location of an object (e.g., go)
PROPEL Application of physical force to an object (e.g., push)
MOVE Movement of a body part by its owner (e.g., kick)
GRASP Grasping of an object by an actor (e.g., throw)
INGEST Ingesting of an object by an animal (e.g., eat)
EXPEL Expulsion of something from the body of an animal (e.g., cry)
MTRANS Transfer of mental information (e.g., tell)
MBUILD Building new information out of old (e.g., decide)
SPEAK Producing of sounds (e.g., say)
ATTEND Focusing of a sense organ toward a stimulus (e.g., listen)
[Schank 1975]
Action: PROPEL, Actor: Mary, Object: Fist, From: Mary, To: John
Margie
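The point of the paraphrase examples is that different surface forms map to one conceptual-dependency structure. Here is a minimal sketch, assuming hypothetical regular-expression patterns and a two-verb lexicon in place of a real conceptual analyzer. The slot names mirror the PROPEL example above; only the English paraphrases are handled.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CD:
    """A conceptual-dependency structure with the slots of the PROPEL example."""
    action: str
    actor: str
    obj: str
    source: str   # "From" slot
    to: str


# Hypothetical lexicon: verbs whose meaning is "PROPEL a fist toward someone".
FIST_VERBS = {"socked", "punched"}


def understand(sentence: str) -> Optional[CD]:
    s = sentence.strip().rstrip(".")
    m = re.fullmatch(r"(\w+) was (\w+) by (\w+)", s)               # passive voice
    if m and m.group(2) in FIST_VERBS:
        patient, agent = m.group(1), m.group(3)
        return CD("PROPEL", agent, "Fist", agent, patient)
    m = re.fullmatch(r"(\w+) hit (\w+) with (?:his|her) fist", s)  # explicit instrument
    if m:
        return CD("PROPEL", m.group(1), "Fist", m.group(1), m.group(2))
    m = re.fullmatch(r"(\w+) (\w+) (\w+)", s)                       # active voice
    if m and m.group(2) in FIST_VERBS:
        return CD("PROPEL", m.group(1), "Fist", m.group(1), m.group(3))
    return None


for s in ["Mary socked John.", "Mary punched John.",
          "Mary hit John with her fist.", "John was socked by Mary."]:
    print(understand(s))
```

All four sentences print the same structure, which is what lets a CD-based system answer questions or paraphrase without caring which wording the story used.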
Story understanding and generation
SAM (Script Applier Mechanism) [Cullingford 1981] Cullingford, Richard E.: SAM and Micro SAM. In Roger C. Schank & Christopher K. Riesbeck (Eds.), Inside computer understanding. Hillsdale, NJ: Erlbaum, 1981.
FRUMP [DeJong 1979] DeJong, Gerald F.: Skimming stories in real time: An experiment in integrated understanding (Technical Report YALE/DCS/tr158). New Haven, CT: Computer Science Department, Yale University, 1979.
Script-based understanding
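Script-based understanding fills in events a story leaves unstated. The sketch below shows the idea with a hypothetical restaurant script: once the story mentions some events in script order, every script event between the first and last mentioned one is assumed to have happened. For example, it answers "Did John eat?" with an inferred yes. The script contents and the `apply_script` function are illustrative assumptions, not SAM's actual script format.

```python
from typing import List, Tuple

# Hypothetical restaurant script: a stereotyped event sequence.
RESTAURANT = ["enter", "be seated", "order", "eat", "pay", "leave"]


def apply_script(script: List[str], mentioned: List[str]) -> List[Tuple[str, bool]]:
    """Instantiate the script between the first and last mentioned events.

    Each event is paired with a flag: True if the story stated it,
    False if it was filled in by default from the script."""
    positions = [script.index(e) for e in mentioned]   # ValueError if not in the script
    if positions != sorted(positions):
        raise ValueError("story events are out of script order")
    first, last = positions[0], positions[-1]
    return [(e, e in mentioned) for e in script[first:last + 1]]


# "John went to a restaurant. He ordered lobster. He left."
print(apply_script(RESTAURANT, ["enter", "order", "leave"]))
```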
Story understanding and generation
Plan-based understanding: PAM [Wilensky 1978] Wilensky, Robert: Understanding goal-based stories (Technical Report YALE/DCS/tr140). New Haven, CT: Computer Science Department, Yale University, 1978.
POLITICS [Carbonell 1978] Carbonell, Jaime: Subjective understanding: Computer models of belief systems (Technical Report YALE/DCS/tr150). New Haven, CT: Computer Science Department, Yale University, 1978.
Dynamic memory: IPP [Lebowitz 1980] Lebowitz, Michael: Generalization and memory in an integrated understanding system (Technical Report YALE/DCS/tr186). New Haven, CT: Computer Science Department, Yale University, 1980.
BORIS [Lehnert 1983] Lehnert, Wendy G., Dyer, Michael G., Johnson, Peter N., Yang, C. J., Harley, Steve: BORIS: An experiment in in-depth understanding of narratives. Artificial Intelligence, 20(1), 15-62, 1983.
CYRUS [Kolodner 1984] Kolodner, Janet L.: Retrieval and organizational strategies in conceptual memory: A computer model. Hillsdale, NJ: Erlbaum, 1984.
Story telling: TALE-SPIN [Meehan 1981] Meehan, James: TALE-SPIN and Micro TALE-SPIN. In Roger C. Schank & Christopher K. Riesbeck (Eds.), Inside computer understanding. Hillsdale, NJ: Erlbaum, 1981.
Early Speech Dialogue Systems
[Erman 1980]
The Hearsay-II Speech-Understanding System
Integrates multiple levels of information processing by knowledge sources coordinated by the blackboard model:
• Parameter
• Segment
• Syllable
• Word
• Word-sequence
• Phrase
• Data base interface
Combining top-down (hypothesis driven) and bottom-up (data-driven) processing.
Selective attention to allocate limited computing resources.
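A blackboard architecture with selective attention can be sketched in a few dozen lines. In this hedged illustration, knowledge sources are triggered by new hypotheses at a given level and post hypotheses at higher levels. A priority queue runs the best-rated pending work first, and a budget caps the number of knowledge-source executions, modeling limited computing resources. Only bottom-up (data-driven) processing is shown; top-down prediction and verification (PREDICT, VERIFY) are omitted. The syllable input, ratings, lexicon, and all names are hypothetical, and the utterance is borrowed from the Hearsay-II example "ARE ANY BY FEIGENBAUM AND FELDMAN".

```python
import heapq
import itertools
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hyp:
    level: str      # "syllable", "word" or "word-seq"
    text: str
    start: int      # span [start, end) in syllable positions
    end: int
    rating: float


# Hypothetical lexicon: syllable sequence -> word
LEXICON = {("by",): "BY", ("fei", "gen", "baum"): "FEIGENBAUM",
           ("and",): "AND", ("feld", "man"): "FELDMAN"}


class Blackboard:
    def __init__(self) -> None:
        self.levels = defaultdict(set)

    def at(self, level: str) -> List[Hyp]:
        return list(self.levels[level])

    def post(self, h: Hyp) -> bool:
        """Add a hypothesis; return False if it was already there."""
        if h in self.levels[h.level]:
            return False
        self.levels[h.level].add(h)
        return True


def word_ks(bb: Blackboard, h: Hyp) -> List[Hyp]:
    """Bottom-up: a syllable that starts a lexicon entry may complete a word."""
    found = []
    for sylls, word in LEXICON.items():
        if sylls[0] != h.text:
            continue
        chain = [h]
        for k, s in enumerate(sylls[1:], start=1):
            nxt = [x for x in bb.at("syllable") if x.text == s and x.start == h.start + k]
            if not nxt:
                break
            chain.append(nxt[0])
        else:  # every syllable of the word was found in place
            found.append(Hyp("word", word, h.start, h.start + len(sylls),
                             min(x.rating for x in chain)))
    return found


def seq_ks(bb: Blackboard, h: Hyp) -> List[Hyp]:
    """Bottom-up: turn words into sequences and join adjacent sequences."""
    if h.level == "word":
        return [Hyp("word-seq", h.text, h.start, h.end, h.rating)]
    joined = []
    for o in bb.at("word-seq"):
        rating = min(o.rating, h.rating)
        if o.end == h.start:
            joined.append(Hyp("word-seq", f"{o.text} {h.text}", o.start, h.end, rating))
        elif o.start == h.end:
            joined.append(Hyp("word-seq", f"{h.text} {o.text}", h.start, o.end, rating))
    return joined


KNOWLEDGE_SOURCES = [("syllable", word_ks), ("word", seq_ks), ("word-seq", seq_ks)]


def understand(syllables: List[Tuple[str, float]], budget: int = 1000) -> Optional[Hyp]:
    """Run knowledge sources, best-rated first, until a word sequence spans the input."""
    bb, agenda, tie = Blackboard(), [], itertools.count()

    def post(h: Hyp) -> None:
        if bb.post(h):
            for level, ks in KNOWLEDGE_SOURCES:
                if level == h.level:
                    # Focus of attention: higher-rated hypotheses are processed first.
                    heapq.heappush(agenda, (-h.rating, next(tie), ks, h))

    for i, (text, rating) in enumerate(syllables):
        post(Hyp("syllable", text, i, i + 1, rating))
    while agenda and budget > 0:
        _, _, ks, h = heapq.heappop(agenda)
        budget -= 1
        for new in ks(bb, h):
            post(new)
            if new.level == "word-seq" and new.start == 0 and new.end == len(syllables):
                return new
    return None


SYLLABLES = [("by", 0.9), ("fei", 0.8), ("gen", 0.7), ("baum", 0.9),
             ("and", 0.95), ("feld", 0.6), ("man", 0.8)]
print(understand(SYLLABLES))
```

Rating a joined hypothesis by its weakest component is one simple choice. Hearsay-II used more elaborate credibility ratings.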
[Erman 1980]
The blackboard architecture
Figure: hypotheses for the utterance “ARE ANY BY FEIGENBAUM AND FELDMAN” at the levels (a) parameters, (b) segments, (c) syllable classes, (d) words 2, (e) words 1, (f) word sequences, and (g) phrases, with the knowledge sources SEG, POM, MOW, WORD-CTL, WORD-SEQ, WORD-SEQ-CTL, PARSE, PREDICT, VERIFY, CONCAT, STOP, RPOL, and SEMANT.
Early Speech Dialogue Systems
PUT-THAT-THERE
The conjoint use of voice-input and gesture recognition to command events on a large format raster-scan graphic display in a “Media Room”
Commands
“Create a blue square there.”
“Move the blue triangle to the right of the green square.”
“Move that to the right of the green square.” (with pointing)
“Put that there.” (indicated by gesture)
“Make that smaller.” (with pointing gesture)
“Make that (indicating some item) like that (indicating some other item).”
“Delete that.” (pointing to some item)
“Call that … the calendar.” (with pointing)
[Bolt 1980]
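The core of "Put that there" is aligning deictic words with the pointing gesture in time. Here is a minimal sketch, assuming hypothetical inputs: word timestamps from a speech recognizer, a time-sorted track of pointing positions on the display, and a table of displayed objects. Each deictic word is resolved with the pointing sample closest in time to when it was spoken. Bolt's system used its own sensing hardware and logic, which are not reproduced here.

```python
import math
from bisect import bisect_left
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical display contents: object name -> screen position
OBJECTS: Dict[str, Point] = {"blue square": (2.0, 3.0), "green triangle": (8.0, 1.0)}


def pointing_at(track: List[Tuple[float, Point]], t: float) -> Point:
    """Return the pointing position sampled closest in time to t (track sorted by time)."""
    times = [stamp for stamp, _ in track]
    i = bisect_left(times, t)
    nearby = track[max(i - 1, 0):i + 1]
    return min(nearby, key=lambda sample: abs(sample[0] - t))[1]


def nearest_object(p: Point) -> str:
    return min(OBJECTS, key=lambda name: math.dist(OBJECTS[name], p))


def put_that_there(words: List[Tuple[str, float]],
                   track: List[Tuple[float, Point]]) -> Tuple[str, Point]:
    """Resolve 'that' to an object and 'there' to a location using time-aligned pointing."""
    t_that = next(t for w, t in words if w == "that")
    t_there = next(t for w, t in words if w == "there")
    return nearest_object(pointing_at(track, t_that)), pointing_at(track, t_there)


words = [("put", 0.0), ("that", 0.45), ("there", 1.25)]
track = [(0.3, (2.1, 2.9)), (0.5, (2.0, 3.1)), (1.1, (6.0, 6.0)), (1.3, (6.2, 6.1))]
print(put_that_there(words, track))   # ('blue square', (6.2, 6.1))
```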
“Phil”: a concrete image of a personal assistant.
The Knowledge Navigator
[Apple Computer 1987]
Phil, the agent
CASA: Computers Are Social Actors
Social responses to computers
- not from: conscious beliefs that computers are human or human-like
- not from: the user's ignorance or psychological or social dysfunction
- not from: a belief that subjects are interacting with programmers
- rather: the human-computer relationship is fundamentally social.
[Nass 1994]
- Support interactive give and take: assistants will need not only to respond to questions but also to ask questions.
- Recognize the costs of interaction and delay: it is inappropriate to require the user's confirmation of every decision made while carrying out a task.
- Manage interruptions effectively: when it is necessary to initiate an interaction with the user, the assistant needs to do so carefully, recognizing the likelihood that the user is already occupied to some degree.
- Acknowledge the social and emotional aspects of interaction: to become a comfortable working partner, a computer assistant will need to vary its behavior depending on such factors as the task, the time of day, and the boss's mood.
Requirements
=> The PERSONA project @ Microsoft Research [Ball 1997]
From Tool-based Computing to Assistive Interface
[Ball 1997]
The architecture of Peedy
Figure: components of Peedy: microphone; Whisper speech recognition; proper name substitution (names database); NLP language analysis; semantic template matching and object descriptions (action templates database, object database of CDs); dialogue management with dialogue context and conversation state (dialogue rules database); application (CD player, CD changes); speech controller; Player/ReActor animation engine (character, sound; speech and animation database). These are grouped into spoken language interpretation, dialogue management, character animation, sound generation, and background and object animation.
[Blumberg 1994]
Believable animals
How are we to build behaviorally-based animated animals whose level of behavioral complexity is on the order of dogs and cats?
For the user to say “Hey, it acts just like my dog.”
i. The artificial animal must have a believable set of internal needs and motivations and an appropriate set of activities through which those internal needs and motivations may be satisfied and expressed.
ii. It must respond with an appropriate and believable activity on every time step, given its internal state, past history, and a perceived environment with its attendant opportunities, challenges, and changes. Moreover, the pattern and rhythm of the chosen activities must be believable and consistent with those of the real animal.
iii. The interaction must be believable. The creature should respond to the actions of the user in a believable way and the user should be able to interact with the creature using natural gestures and a minimum amount of hardware in-between.
For the user not to say “Wait a minute, this is a dumb robot, my dog would never do that!”
i. It should not mechanically respond to stimulus. An animal’s response to a given stimulus is highly dependent on its internal state.
ii. It should not get stuck in “loops”. Animals are very good at avoiding two types of loops. The first type of loop is one in which the animal is pathologically engaged in a single activity. The second type of loop is one in which the animal dithers incessantly among two or more competing activities.
iii. It should avoid patently stupid behavior. Animals often display a certain common sense or “horse sense”.
Believability
Key ideas from Ethology
Inhibition and fatigue play an important role in ethological models of action-selection and temporal patterns of behavior.
i. Animals engage in one behavior at a time. Yet, animals typically do not mindlessly pursue an activity indefinitely to the detriment of other needs.
ii. Animals sometimes appear to engage in a form of time-sharing in which low priority activities are given a chance to execute, despite the presence of a higher priority activity.
iii. While animals typically do not dither between multiple activities they will nonetheless interrupt a behavior when another behavior becomes significantly more appropriate.
[Blumberg 1994]
Believability
Ludlow’s Model
i. An activity such as feeding or drinking has a value that is based on the sum of its relevant internal and external factors, less the inhibition it receives from competing activities.
ii. Competing activities are mutually inhibiting: a given activity i inhibits activity j by an amount equal to activity i's value times an inhibitory gain k_ji.
iii. If (a) activities are mutually inhibiting, (b) inhibitory gains are restricted to be greater than 1, and (c) the values of activities are restricted to being zero or greater, then this model results in a winner-take-all system, in which only one activity has a non-zero value once the system stabilizes.
iv. A level of fatigue is proposed to be associated with every activity.
v. The level of fatigue is influenced by a number of factors. When an activity is active, its level of fatigue increases in proportion to the activity's value, which reduces the value of the active activity over time.
vi. When an activity is no longer active, its fatigue decays toward zero, and the value of the activity rises.
[Blumberg 1994]
Believability
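Ludlow's model above can be simulated in a few lines to see how mutual inhibition and fatigue together avoid both pathological persistence and dithering. This is a sketch under simplifying assumptions that are not part of the model as stated. The external and internal factors are constant, all inhibitory gains share one value k = 2, fatigue grows by gain × value while active and is multiplied by decay = 0.8 while inactive, and the inhibition equations are relaxed one activity at a time (Gauss-Seidel) starting from the previous step's values.

```python
from typing import List, Tuple


def settle(drive: List[float], start: List[float], k: float, iters: int = 100) -> List[float]:
    """Relax the mutual-inhibition equations v_i = max(0, d_i - k * sum_{j != i} v_j).

    Values are updated one at a time, starting from the previous step's values,
    so the currently active activity keeps its advantage (hysteresis)."""
    v = list(start)
    for _ in range(iters):
        for i in range(len(v)):
            others = sum(v[j] for j in range(len(v)) if j != i)
            v[i] = max(0.0, drive[i] - k * others)
    return v


def simulate(factors: List[float], steps: int, k: float = 2.0,
             gain: float = 0.1, decay: float = 0.8) -> List[Tuple[int, List[float]]]:
    """Return (active activity, values) per time step under inhibition and fatigue."""
    n = len(factors)
    fatigue = [0.0] * n
    values = [0.0] * n
    history = []
    for _ in range(steps):
        drive = [factors[i] - fatigue[i] for i in range(n)]   # factors less fatigue
        values = settle(drive, values, k)
        active = max(range(n), key=lambda i: values[i])
        for i in range(n):
            if i == active and values[i] > 0:
                fatigue[i] += gain * values[i]   # fatigue grows with the active value
            else:
                fatigue[i] *= decay              # and decays toward zero otherwise
        history.append((active, values))
    return history


# Two activities, e.g. feeding (factor 1.0) and drinking (factor 0.8).
actives = [a for a, _ in simulate([1.0, 0.8], 40)]
print(actives)
```

With k > 1, the active activity keeps control until its value falls below 1/k of a rival's drive. The result is alternating runs of several steps each rather than switching on every step.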
[Mateas 1997]
Drama = Character + Story + Presentation
Interactive drama: Interactive drama concerns itself with building dramatically interesting virtual worlds inhabited by computer-controlled characters, within which the user (the player) experiences a story from a first person perspective. … (Bates 1992)
Oz project
[Mateas 1997]
Oz project
Personality: Rich personality should infuse everything that a character does. What makes characters interesting are their unique ways of doing things.
Emotion: Characters exhibit their own emotions and respond to the emotions of others in personality-specific ways.
Self-motivation: Characters have their own internal drives and desires, which they pursue whether or not others are interacting with them.
Change: Characters grow and change with time, in a manner consistent with their personality.
Social relationships: Characters engage in interactions with others in a manner consistent with their relationship. In turn, these relationships change as a result of the interaction.
Illusion of life: Pursuing multiple, simultaneous goals and actions, having broad capabilities, and reacting quickly to stimuli in the environment.
Antonio Damasio’s “Descartes’ Error – Emotion, Reason and the Human Brain”
If you come to know that animal or object or situation X causes fear, you will have two ways of behaving toward X. The first way is innate; you do not control it. Moreover, it is not specific to X; a large number of creatures, objects, and circumstances can cause the response. The second way is based on your own experience and is specific to X. Knowing about X allows you to think ahead and predict the probability of its being present in a given environment, so that you can avoid X, preemptively, rather than just have to react to its presence in an emergency. …
Primary emotions depend on limbic system circuitry, the amygdala and anterior cingulate being the primary players. After an appropriate stimulus activates the amygdala, a number of responses ensue: internal responses, muscular responses, visceral responses, and responses to neurotransmitter nuclei and hypothalamus.
Secondary emotions utilize the machinery of primary emotions. The stimulus may still be processed directly via the amygdala but is now also analyzed in the thought process, and may activate frontal cortices (VM). VM acts via the amygdala. (pp. 133-137)
H: hypothalamus; VMF: ventromedial prefrontal cortex [Damasio 1994]
Figure: sensory input from the real world reaches the brain (cognitive process, amygdala, VMF, H), which produces reactions in the body.
Affective Computing
[Picard 1997]
Figure: emotional states and cognitive processing, spanning high-level representations/signals (inference and decision making) and low-level representations/signals (pattern recognition and synthesis).
[Hayes-Roth 1998]
Jennifer James
Figure labels: text, multi-modal dialogue engine, graphics, sound, cloud.
Rea
[Cassell 1999]
Implements the social, linguistic, and psychological conventions of conversation.
- Has a human-like body. Uses eye gaze, body posture, hand gestures, and facial displays to organize and regulate the conversation.
- The conversational model relies on the function of non-verbal behaviors as well as speech.
- A full symmetry between input and output modalities: Rea not only responds to visual, audio, and speech cues (such as speech, shifts in gaze, gesture, and non-speech audio) but also generates these cues.
Rea
[Cassell 1999]
Figure: text input -> language tagging -> behavior generation (behavior suggestion with a generator set, behavior selection with a filter set) -> behavior scheduling (word timing, translator) -> animation, supported by a discourse model and a knowledge base.
BEAT (Behavior Expression Animation Toolkit)
[Cassell 2004]
Challenge: A robot that can participate in conversation
Long-term goal of conversational informatics
Figure: conversational informatics combines understanding conversation and building conversational systems around conversational interactions. Labels: measurement, analysis, model building, theory, platform, content production, application, evaluation.
[Nishida-Nakazawa-Ohmoto-Mohammad 2014]
Approach 1: Making environment playful
Smart conversation space that encompasses participants and referents of conversation
-> Engaged conversational interactions
-> More insights about the common ground
Augmentation by MR (VR to AR)
Daily living space
Immersive interaction with ICIE
[Lala 2013] https://www.youtube.com/watch?v=V-9SKpcMrzk
[Nishida-Nakazawa-Ohmoto-Mohammad 2014]
Projecting the real world into the virtual world
[Nishida-Nakazawa-Ohmoto-Mohammad 2014]
http://www.youtube.com/watch?v=68UrJv65HvY
Telepresence by connecting ICIE with a networked robot
Figure labels: user motion sensing (head recognition, gesture recognition); motion mapping (face model, human body model); feedback generation; WOZ operating environment with the WOZ operator; tele-operated robot in the conversation place.
[Nishida-Nakazawa-Ohmoto-Mohammad 2014]
Avatars and NPCs in virtual basketball
[Lala 2014] https://www.youtube.com/watch?v=ZtjSRjHBgUs
Inducing an intentional stance toward agent players
Q: How can we induce an intentional stance toward NPCs?
H: Demonstrating strategy change.
[Suyama-Ohmoto-Nishida 2015]
Player #1 in ICIE #1 (red hat)
Player #2 in ICIE #2 (green hat)
Player #3, an agent (blue hat)
The Interactive Dome
[Nishida-Nakazawa-Ohmoto-Mohammad 2014] https://www.youtube.com/watch?v=wxkZ9armrI8
Panels: appearance, architecture, projection, inside view
Analyze the behaviors of participants by integrating audio-visual and physiological measurements.
Approach 2: Understanding by measurement
IMADE: Interaction Measurement, Analysis, and Design Environment
[Nishida-Nakazawa-Ohmoto-Mohammad 2014]
iCorpusStudio
Collaborative annotation system
[Nishida-Nakazawa-Ohmoto-Mohammad 2014]
3D conversation capture—over the shoulder view
https://www.youtube.com/watch?v=J08vG8wnrnw
[Yano 2012]
Corneal imaging camera: scene camera and eye camera
• Lightweight and versatile system
• Applications: Google Glass-like HMDs, unconstrained setups
Corneal image feature matching
Problem: local feature correspondence + RANSAC does not work due to the large noise in eye images.
Approach:
1. Formulate the problem as registration of 3D spherical light maps of the eye and scene images.
2. Use a single-point algorithm for robust alignment.
Non-intrusive Eye Gaze Tracking (EGT) by corneal imaging
Eye images with gaze reflection points (GRP)
Scene images with points of gaze (PoG)
↑ Aligned results (from eye images)
Peripheral vision map in eye image
Peripheral vision map overlaid onto scene image
Gaze Reflection Point Mapping
Application 1: Non-intrusive and uncalibrated PoG estimation
Application 2: Peripheral vision estimation
Gaze trajectory in static scene image
Inducing intentional stance toward agent players
[Takeda-Matsuda-Ohmoto-Nishida 2015]
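Figure: player states mapped by SCR (skin conductance response, +/-) and LF/HF (ratio of low- to high-frequency heart rate variability power, +/-). Quadrants: deliberating but not reacting; deliberating and reacting; doing nothing; reacting but not deliberating. Example behaviors: hiding from the chaser; not concentrated; hiding in the place the chaser checked previously; simply moving around.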
Multi-dimensional model for estimating a human's internal state: concentration, level of proficiency
Learning by imitation—Generic framework
Measurement -> Corpus -> Generalization -> Dialogue patterns
[Nishida et al 2014]
Endow robots with the ability to autonomously imitate human behaviors.
Interactions from observation—General framework
(Figure: three time series, 1a, 2a and 3a, plotted over time t and connected by "causes" links)
[Nishida et al 2014]
[Mohammad 2009]
Learning by demonstration
The problem formulation
[Mohammad 2010]
Gesture stream
Action stream
Problem formulation: Find approximately repeated subsequences in a longer time series.
(1) Motif Discovery—Finding Patterns of Interaction
[Nishida et al 2014]
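The motif-discovery formulation above can be illustrated with the simplest possible search: a brute-force scan for the closest pair of non-overlapping, z-normalized subsequences of a fixed length m. This is a minimal sketch under that assumption, not the authors' own motif-discovery algorithm; the function names `znorm` and `find_motif_pair` are hypothetical.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence so that matches are invariant to offset and scale."""
    mu = sum(seq) / len(seq)
    sd = math.sqrt(sum((v - mu) ** 2 for v in seq) / len(seq))
    if sd < 1e-12:                      # flat segment: treat as all zeros
        return [0.0] * len(seq)
    return [(v - mu) / sd for v in seq]


def find_motif_pair(x, m):
    """Return (distance, i, j) for the closest pair of non-overlapping length-m subsequences."""
    subs = [znorm(x[i:i + m]) for i in range(len(x) - m + 1)]
    best = (math.inf, None, None)
    for i in range(len(subs)):
        for j in range(i + m, len(subs)):   # j >= i + m excludes overlapping (trivial) matches
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < best[0]:
                best = (d, i, j)
    return best
```

The brute-force scan costs O(N²m) for a series of length N, so it only suits short series; practical motif-discovery methods prune or approximate this search and also have to choose m, which is fixed here.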
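Figure: past (H) and future (G) windows on either side of time t, compared through their "change angle"
$H(t) = [\mathrm{seq}(t-n); \ldots; \mathrm{seq}(t-1)]$
$G(t) = [\mathrm{seq}(t); \ldots; \mathrm{seq}(t+n-1)]$
1. Decompose the past matrix, $H(t) = U(t)\,S(t)\,V(t)^{T}$, and find the optimal number $l_P$ of past singular vectors.
2. Eigen-decompose $G(t)\,G(t)^{T}$ and find the optimal number $l_F$ of future eigenvectors.
3. Compute the raw change score $\hat{x}(t)$ from the angles between the $l_F$ future eigenvectors and the subspace spanned by the first $l_P$ columns of $U(t)$.
4. Obtain the final score $\tilde{x}(t)$ by combining $\hat{x}(t)$ with terms computed separately for the past (P) and future (F) windows.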
Learning by imitation
Robust Singular Spectrum Transform
[Mohammad 2009]
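The pipeline above builds on the singular spectrum transform (SST). The sketch below implements only basic SST under simplifying assumptions: a fixed number l of past singular vectors, a single future direction beta, and the change score $1 - \|U_l^{T}\beta\|^2$. It does not include RSST's automatic choice of $l_P$ and $l_F$ or its final reweighting step. The window length w, the number of windows n and all function names are hypothetical choices for illustration, and the linear algebra uses plain-Python subspace iteration to stay dependency-free.

```python
import math
import random


def window_matrix(x, first_start, w, n):
    """Return a w x n matrix (list of rows) whose j-th column is x[first_start + j : first_start + j + w]."""
    return [[x[first_start + j + i] for j in range(n)] for i in range(w)]


def mult_mmt(M, v):
    """Compute (M M^T) v without forming M M^T explicitly."""
    n = len(M[0])
    y = [sum(row[j] * vi for row, vi in zip(M, v)) for j in range(n)]  # y = M^T v
    return [sum(r * yj for r, yj in zip(row, y)) for row in M]          # M y


def orthonormalize(vectors, tol=1e-12):
    """Classical Gram-Schmidt; vectors whose residual norm falls below tol are dropped."""
    basis = []
    for v in vectors:
        for b in basis:
            d = sum(p * q for p, q in zip(v, b))
            v = [p - d * q for p, q in zip(v, b)]
        norm = math.sqrt(sum(p * p for p in v))
        if norm > tol:
            basis.append([p / norm for p in v])
    return basis


def top_left_subspace(M, l, iters=50, seed=0):
    """Orthonormal basis approximating the top-l left singular vectors of M (subspace iteration on M M^T)."""
    rng = random.Random(seed)
    Q = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(len(M))] for _ in range(l)])
    for _ in range(iters):
        Q = orthonormalize([mult_mmt(M, q) for q in Q])
    return Q


def sst_score(x, t, w, n, l=2):
    """Basic SST change score at time t: 1 - ||U_l^T beta||^2 (nominally in [0, 1])."""
    first_past = t - w - n + 1           # past windows end at or before t - 1
    if first_past < 0 or t + n + w - 1 > len(x):
        raise ValueError("not enough samples around t")
    past = window_matrix(x, first_past, w, n)
    future = window_matrix(x, t, w, n)   # future windows start at t .. t + n - 1
    U = top_left_subspace(past, l)
    beta = top_left_subspace(future, 1, seed=1)[0]
    proj = sum(sum(ui * bi for ui, bi in zip(u, beta)) ** 2 for u in U)
    return 1.0 - proj
```

For example, on a sine wave whose period switches from 20 to 5 samples at k = 200, with w = 20 and n = 10, the two regimes span orthogonal window subspaces, so the score is close to 1 at t = 200 and close to 0 at t = 100 or t = 300. l = 2 is used because a single sinusoid spans a two-dimensional window subspace; with l = 1 the score would fluctuate even without a change.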
Fluid Learning—work in progress
Anytime learner (automatic segmentation, perspective taking, significance estimation, … )
[Mohammad 2015]
[Mohammad 2016]
Evolving stories underlying conversation
Physical world
Imaginary world
Goffman’s Frame Analysis: keying, fabricating, …
Evolving stories underlying conversation
Our approach: Shared Virtual Space
Synthetic evidential study
Synthetic evidential study (SES) combines dramatic role play and group discussion to help people spin stories by bringing together partial thoughts and evidence.
Componentize
Reuse
SES session Interpretation archive
Structured collection of {story, background, critique}
Agent play
Dramatic role play
Group discussions
[Nishida et al 2015]
At the beginning of the 18th century, a feudal lord named Asano Takumi-no-kami
Naganori was in charge of a reception for envoys from the Imperial Court in Kyoto.
Another feudal lord, Kira Kozuke-no-suke Yoshinaka, was appointed to instruct
Asano in the ceremonies. On the day of the reception, while Kira was talking with
Yoriteru Kajikawa, a lesser official, at “Matsu no Roka” (“Hallway of Pine Trees”) in
Edo Castle, Asano came up to them screaming “This is for revenge!!” and slashed
Kira twice with a short sword. Soon after the incident, Kajikawa restrained Asano,
who was then imprisoned. The reason for the attack was not known, though it was
widely believed that Kira had somehow humiliated Asano. Ultimately Asano was
sentenced to commit seppuku, a ritual suicide, but Kira went without punishment.
Hallway of Pine Trees (from Chushingura)
Kira Kozuke-no-suke Yoshinaka
Asano Takumi-no-kami Naganori
Yoriteru Kajikawa
Why was it possible?
How did it happen?
What did each think?
Dramatic Role Play
Group play capture
Agent play
Discussion phase
T. Ookaki, M. Abe, M. Yoshino, Y. Ohmoto and T. Nishida. Synthetic Evidential Study for Deepening Inside Their Heart. IEA/AIE 2015.
Asano
Kira
Kajikawa
Third person view First person view
Discussions
Observed communicative behaviors
The observed behavior of participants
• Acting behavior -- what the participants do when they are actually acting.
• Commenting behavior -- a critique of the incidents and the acting, including reasoning, discussion and thinking aloud.
• Oral editing behavior -- suggested revisions to the acting
• Idling behavior -- all actions that are not classified above.
The role play phase
• Twelve detailed behaviors were observed in the role play phase: (1) acting, (2) commenting, (3) oral editing, (4) idling, (5) speaking his/her role, (6) acting + thinking aloud, (7) acting + commenting, (8) acting + oral editing, (9) acting + speaking his/her role, (10) idling + commenting, (11) idling + oral editing, and (12) idling + speaking his/her role.
• The role play phase can be roughly divided into rehearsal acting scenes and production acting scenes. Transitions between them can be clearly identified by eye; for example, explicit signaling behavior such as giving a cue was observed just before a production acting scene. [Ookaki et al 2015]
Contrasting objective and subjective views
After contrasting the objective and subjective views on the action of Kira falling prone, one participant remarked that “Falling prone seems strange in the objective viewpoint. However, when I experience the subjective viewpoint of Kira, it looks like a natural movement”, and everyone agreed.
Subjective view transfer
When experienced with Kajikawa’s viewpoint, the Kira player said
“(Kajikawa was too slow to) restrain Asano after having been slashed,” which is considered to reflect the Kira player’s view that Kajikawa should have helped Kira earlier.
After a while, however, the Kira player said, “When Asano swung his sword for the first time, Kajikawa might have been farther away from Asano,” suggesting that he considered that it prevented Kajikawa from restraining Asano earlier.
Multiple lines of story
Tajomaru met Takehiro and Masago
Tajomaru tied Takehiro to a cedar tree
Tajomaru and Takehiro went into the woods
Tajomaru took Masago into the woods
Tajomaru took his way with Masago
Tajomaru killed Takehiro
Masago killed Takehiro
Takehiro killed himself
Masago asked Tajomaru to kill Takehiro
Masago said that she would go with either one of them and the other one must die
Takehiro despised Masago
Tajomaru asked Masago to come along
Tajomaru kicked Masago to the ground
Duel between Tajomaru and Takehiro
Masago escaped
Masago left
Tajomaru was caught
Masago cared for Takehiro
Tajomaru found Masago with Takehiro
“In the wood” by Ryunosuke Akutagawa
Multi-layered Multi-view Interpretation
Representing story and its interpretations
Participant’s interpretation
A’s interpretation
B’s interpretation
Actors’ interpretation
Tajomaru’s
Wife’s
Husband’s
Dramatic Scene
Third Person Tajomaru Wife Husband
{Spatial, Temporal} × {Locality, Influence}
“In the Woods” by Ryunosuke Akutagawa
Tajomaru’s Story (First person view)
The Bounty Hunter's Story (Third person view)
Major Characters:
- Tajomaru, the robber
- Takehiro Kanazawa, a samurai working in Wakasa
- Masago, Takehiro’s wife
Plot
- “The same incident in the woods” is told by four witnesses and three actors.
- The stories told by the three actors contradict each other.
Scene Abstract
[WC] The Woodcutter's Story
The woodcutter found the body of Takehiro.
[TM] The Traveling Monk's Story
The traveling monk saw Takehiro and Masago yesterday on the Yamashina road.
[BT] The Bounty Hunter's Story
The bounty hunter caught Tajomaru, who had been thrown from his horse and was moaning in pain at the bridge of Awataguchi.
[TJ] Tajomaru’s Story
Takehiro was killed as a result of a duel between Tajomaru and Takehiro.
[Ma1] Masago’s Story - 1
Masago was kicked to the ground by Tajomaru. Masago was despised by Takehiro. Masago fainted.
[Ma2] Masago’s Story - 2
Masago stabbed Takehiro hard in the chest.
[TK] Takehiro’s Story through a Medium
Masago asked Tajomaru to kill Takehiro. Tajomaru kicked Masago to the ground. Masago ran toward the deep part of the woods, while Tajomaru was asking Takehiro if he wanted Tajomaru to kill Masago or let her go.
Experiment
Purpose
- Verify that SES helps participants deepen their interpretation
- Study multiple applications of SES cycles
Task in three stages
- We asked participants to annotate each scene in three stages.
Stages
Tajomaru’s Story (first person view)
The Bounty Hunter's Story (third person view)
Stage 1 (Condition 1), 5 participants. Phase 1 (comments presented to each participant): none. Phase 2 (networked comments presented to each participant): none.
Stage 2 (Condition 2), 4 participants. Phase 1: 5 comments obtained at Stage 1. Phase 2: 5 networked comments obtained at Stage 1.
Stage 3 (Condition 3), 5 participants. Phase 1: 9 comments obtained at Stages 1 and 2. Phase 2: 9 networked comments obtained at Stages 1 and 2.
Annotation subsystem
• Displays previous comments
• Allows the participant to add new comments
• Each comment has one of the following types: [Clarification], [Empathy], [Confirmation], [Conjecture], [Doubt], [Question], [Surprise]
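The annotation subsystem's behavior (show earlier comments, accept new ones, require one of seven types) can be sketched as a small in-memory data model. This is a hypothetical illustration, not the actual system: the class names, the scene keys such as "TJ" (borrowed from the scene abstract labels) and the storage scheme are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class CommentType(Enum):
    """The seven comment types offered by the annotation subsystem."""
    CLARIFICATION = "Clarification"
    EMPATHY = "Empathy"
    CONFIRMATION = "Confirmation"
    CONJECTURE = "Conjecture"
    DOUBT = "Doubt"
    QUESTION = "Question"
    SURPRISE = "Surprise"


@dataclass
class Comment:
    scene: str            # hypothetical scene key, e.g. "TJ" or "BT"
    author: str
    ctype: CommentType
    text: str


@dataclass
class AnnotationStore:
    """Hypothetical in-memory store: shows earlier comments and accepts typed new ones."""
    comments: list = field(default_factory=list)

    def previous_comments(self, scene):
        """Comments already made on a scene, in the order they were added."""
        return [c for c in self.comments if c.scene == scene]

    def add(self, scene, author, ctype, text):
        """Add a comment; the type must be one of the seven CommentType values."""
        if not isinstance(ctype, CommentType):
            raise TypeError("ctype must be a CommentType")
        comment = Comment(scene, author, ctype, text)
        self.comments.append(comment)
        return comment
```

Restricting the type to an enum keeps per-type counts (as in the results below) well defined, since free-text labels cannot drift.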
Some results
We asked participants to write their own interpretation of “In the Wood”.
Before experiment: “Not clear” or no answer (6 out of 14 participants)
After experiment: Almost all participants were able to write their own short interpretations.
・The number of comments decreased at Stage 2 (141 to 67) and then rose sharply at Stage 3 (247).
・The share of conjecture-type comments increases ⇒ more new interpretations at later stages.
・The share of confirmation-type comments decreases ⇒ less confirmation is needed at later stages.
Comment type    Stage 1   Stage 2   Stage 3   Subtotal
Clarification        55        19        93        167
Empathy              17         6        41         64
Confirmation         23         9        12         44
Doubt                15         5        20         40
Conjecture           17        15        58         90
Question             12        11        18         41
Surprise              2         2         5          9
Subtotal            141        67       247        455
Figure: ratio of comments of each type (Clarification, Empathy, Confirmation, Doubt, Conjecture, Question, Surprise) at Stages 1, 2 and 3
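The ratio figure can be recomputed from the comment-type table. The short sketch below transcribes those counts (the dictionary name `COUNTS` and the helper functions are hypothetical) and derives each type's share per stage, which is what the increasing and decreasing trends above refer to.

```python
# Comment counts per type for Stages 1-3, transcribed from the comment-type table
COUNTS = {
    "Clarification": (55, 19, 93),
    "Empathy": (17, 6, 41),
    "Confirmation": (23, 9, 12),
    "Doubt": (15, 5, 20),
    "Conjecture": (17, 15, 58),
    "Question": (12, 11, 18),
    "Surprise": (2, 2, 5),
}


def stage_totals(counts):
    """Total number of comments at each stage."""
    n_stages = len(next(iter(counts.values())))
    return [sum(row[s] for row in counts.values()) for s in range(n_stages)]


def stage_ratios(counts):
    """Share of each comment type within each stage."""
    totals = stage_totals(counts)
    return {t: [row[s] / totals[s] for s in range(len(totals))] for t, row in counts.items()}


if __name__ == "__main__":
    for comment_type, shares in stage_ratios(COUNTS).items():
        print(f"{comment_type:<14}" + "  ".join(f"{v:.3f}" for v in shares))
```

The stage totals are 141, 67 and 247. Conjecture's share rises from about 0.12 to 0.22 and 0.23, while Confirmation's falls from about 0.16 to 0.13 and 0.05, even though the raw Conjecture count dips at Stage 2. This is why the trends are stated as shares rather than counts.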
Potential applications of SES
• Academic research
– Social sciences, History and archaeology, Literature study
• Evidence-based methodologies
– Criminal investigation, profiling
– Onsite investigation
• Planning
– Strategy formation
– Disaster planning
• Training
– Social skills training, language training
– Dramatic problem solving
Future work
1. Full interpretation of SES
2. Using SES for frame analysis
3. Evaluation of SES
4. SES for common ground building
5. Conversational agents based on SES
Conclusion
1. Conversation as a powerful medium for bridging natural and artificial agents.
2. An integrated approach is necessary to induce a synergetic effect of individual insights in the right context.
3. Communities will benefit from the reciprocal relationship between common ground and witty conversations.
4. Synthetic Evidential Study as a sustained participatory activity to review and cultivate common ground.
Agenda
Credits: Will be awarded based on a report on subjects given in class. Due date: July 31, 2016.
Agenda (planned)
1. Introduction (April 13) Nishida
2. Methodologies for Conversational System Development (April 20) Nishida
3. Smart Conversation Space (April 27) Ohmoto
4. Measurement, Analysis and Modeling (May 11) Ohmoto
5. Learning by Imitation (May 17) Nishida
6. Time Series Mining (May 25) Nishida
7. Stories and Conversation (June 1) Nishida
8. Affective Computing (June 8) Nishida
9. Cognitive Design (June 15) Ohmoto
10. Aspects of Conversation 1 (June 22) Nishida
11. Aspects of Conversation 2 (June 29) Nishida
12. Aspects of Conversation 3 (July 6) Nishida
13. Speaking Turn Taking System (July 13) Nishida
14. Synergy and Wrap up (July 20) Nishida
Course materials available from: http://www.ii.ist.i.kyoto-u.ac.jp/?page_id=5646&lang=ja
References
[Apple Computer 1987] http://homepage.mac.com/ericestrada/Movies/iMovieTheater53.html
[Ball 1997] Gene Ball, Dan Ling, David Kurlander, John Miller, David Pugh, Tim Skelly, Andy Stankosky, David Thiel, Maarten Van Dantzich, and Trace Wax. Lifelike Computer Characters: The Persona Project at Microsoft Research. In: Software Agents, Jeffrey M. Bradshaw (ed.). AAAI/MIT Press, 1997.
[Bolt 1980] Richard A. Bolt. “Put-that-there”: Voice and gesture at the graphics interface. In: Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques, Vol. 14, No. 3 (July 1980), pp. 262-270.
[Cassell 1999] Cassell, J., Bickmore, T., Billinghurst, M., Campbell, L., Chang, K., Vilhjálmsson, H. and Yan, H. Embodiment in Conversational Interfaces: Rea. In: Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 520-527. Pittsburgh, PA, 1999.
[Clark 1996] Herbert H. Clark. Using Language. Cambridge: Cambridge University Press, 1996.
[Goffman 1974] Erving Goffman. Frame Analysis: An Essay on the Organization of Experience. New York: Harper and Row, 1974.
[Higuchi 2016] Higuchi, O. Interactive 3D Virtual Space Design Tool by Using Point Cloud Data Capture and Immersive Environment. Unpublished undergraduate thesis, Faculty of Engineering, Kyoto University, 2016.
[Lala 2014] Lala D, Mohammad Y, Nishida T. A joint activity theory analysis of body interactions in multiplayer virtual basketball. 28th British Human Computer Interaction Conference, 2014.
[Mohammad 2009] Mohammad Y, Nishida T, Okada S. Unsupervised simultaneous learning of gestures, actions and their associations for human-robot interaction. In: Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); p. 2537–2544.
[Mohammad 2010] Mohammad Y, Nishida T. Learning interaction protocols using Augmented Bayesian Networks applied to guided navigation. In: Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); p. 4119–4126.
[Mohammad 2015] Mohammad, Y. and Nishida, T. Data Mining for Social Robotics. Springer, 2015. http://www.springer.com/us/book/9783319252308
[Nishida-Nakazawa-Ohmoto-Mohammad 2014] Nishida, T., Nakazawa, A., Ohmoto, Y., Mohammad, Y. Conversational Informatics: A Data-Intensive Approach with Emphasis on Nonverbal Communication. Springer, 2014. http://www.springer.com/us/book/9784431550396
[Nishida et al 2015] Nishida T et al. Synthetic evidential study as primordial soup of conversation. In: Chu W et al., editors. DNIS 2015, LNCS 8999, 2015; p. 74–83.
[Ookaki et al 2015] Ookaki T et al. Synthetic evidential study for deepening inside their heart. In: IEA/AIE 2015; p. 161–170.
[Ookaki 2016] Ookaki, T. Building a Support System for Story Interpretation from Multiple Perspectives. Unpublished master's thesis, Graduate School of Informatics, Kyoto University, 2016.
[Suyama-Ohmoto-Nishida 2015] Suyama T, Ohmoto Y, Nishida T. Improving engagement of users by changing agent's strategy action dynamically based on the observed user's state. JSAI Annual Convention 2015 (in Japanese).
[Takeda-Matsuda-Ohmoto-Nishida 2015] Takeda S, Nishida T, Ohmoto Y. Method of Estimating Concentration in Exercise Game by Combining Multiple Physiological Indices. JSAI Annual Convention 2015 (in Japanese).
[Winograd 1972] Terry Winograd. Understanding Natural Language. Academic Press, 1972.
[Yano 2012] Yano M. Construction of 3-Dimensional Recording Environments for Multi-party Conversation with RGB-Depth Sensors. Unpublished master's thesis, Graduate School of Informatics, Kyoto University, 2012 (in Japanese).