
Conversational Informatics, April 13, 2016

1. Introduction—Conversational Informatics—

Toyoaki Nishida, Kyoto University

Copyright © 2016, Toyoaki Nishida, Atsushi Nakazawa, Yoshimasa Ohmoto, Yasser Mohammad, At ,Inc. All Rights Reserved.

Year: AI / ICT

1940～
- 1936: Turing Machine
- 1947: von Neumann Computer
- 1948: Information Theory by C. Shannon and W. Weaver
- 1948: Cybernetics by Wiener

1950～
AI:
- 1952-62: Checker program by A. Samuel
- 1956: Dartmouth Conference
ICT:
- 1957: FORTRAN by J. Backus

1960～
AI:
- 1961: Symbolic Integration program SAINT by J. Slagle
- 1962: Perceptron by F. Rosenblatt
- 1966: The ALPAC report against Machine Translation by R. Pierce
- 1967: Formula Manipulation System Macsyma by J. Moses
- 1967: Dendral for Mass Spectrum Analysis by E. Feigenbaum
ICT:
- 1961: Mathematical theory of Packet Networks by L. Kleinrock
- 1963: Interactive Computer Graphics by I. Sutherland
- 1968: Mouse and Bitmap display for oN Line System (NLS) by D. C. Engelbart
- 1969: ARPA-net

1970～
AI:
- 1971: Natural Language Dialogue System SHRDLU by T. Winograd
- 1973: Combinatorial Explosion problem pointed out in the Lighthill report
- 1974: MYCIN by T. Shortliffe
- Mid 1970's: Primal Sketch and Visual Perception by D. Marr
- 1976: Automated Mathematician (AM) by D. Lenat
- 1979: Autonomous Vehicle Stanford Cart by H. Moravec
ICT:
- 1970: ALOHAnet
- 1970: Relational Database Theory by E. F. Codd
- 1972: Theory of NP-completeness by S. Cook and R. Karp
- Mid 1970's: Alto Machine by A. Kay and A. Goldberg
- 1976: Ethernet
- 1979: Spreadsheet Program Visicalc by D. Bricklin

1980～
AI:
- 1982: Fifth Generation Computer Project
- 1984: The CYC Project by D. Lenat
- Mid 1980's: Back-propagation algorithm was widely used
- 1985: The Cybernetic Artist Aaron by H. Cohen
- 1986: Subsumption Architecture by R. Brooks
- 1989: Autonomous Vehicle ALVINN by D. Pomerleau
ICT:
- 1982: TCP/IP Protocol by B. Kahn and V. Cerf
- Mid 1980's: First Wireless Tag Products
- 1987: UUNET started the Commercial UUCP Network Connection Service
- 1988: Internet worm (Morris Worm)
- 1989: World Wide Web by T. Berners-Lee
- 1989: The number of hosts on the Internet exceeded 100,000

1990～
AI:
- 1990: Genetic Programming by J. R. Koza
- Early 1990's: TD-Gammon by G. Tesauro
- Mid 1990's: Data Mining Technology
- 1997: Deep Blue defeated the World Chess Champion G. Kasparov
- 1997: The First RoboCup by H. Kitano
- 1999: Robot pets became commercially available
ICT:
- 1992: The number of hosts on the Internet exceeded 1,000,000
- 1994: Shopping malls on the Internet
- 1994: W3C was founded by T. Berners-Lee
- 1997: Google Search
- 1998: XML 1.0 (eXtensible Markup Language) by W3C
- 1998: PayPal

2000～
AI:
- 2000: Honda Asimo
- 2004: The Mars Exploration Rovers (Spirit & Opportunity)
ICT:
- 2001: Wikipedia
- 2003: Skype / iTunes store
- 2004: Facebook
- 2005: YouTube / Google Earth
- 2006: Twitter
- 2007: Google Street View

2010～
- 2010: Google Driverless Car / Kinect
- 2011: IBM Watson defeated two of the greatest Jeopardy! champions
- 2012: Siri

History of AI research in contrast with ICT


Exponential Growth
- Moore's Law
- Expanding Internet World

Universal Turing Machine
- from Hardware to Software

Generic Computing Engines
- from Software to Data

Service Copying
- Machine Learning / Data Mining

Symbolic Computing
- Knowledge-based Systems

Heuristics
- Intelligence as Clever Computing

Embodied & Network Intelligence
- Like human and society

Very near future—rise of cognitive computing

Siri (Apple), Now (Google), Cortana (Microsoft), M (Facebook), …, Conversational commerce

The world is waiting for conversational informatics

The key issue is building a common ground

[Nishida-Nakazawa-Ohmoto-Mohammad 2014]

Common Ground

Artificial Intelligence

People

Difficulty: common ground is like an iceberg! It is often tacit and not observable.

Types of common ground [Clark 1996]

1. Communal
a. Human nature
b. Communal lexicons
c. Cultural facts, norms, procedures

2. Personal
a. Perceptual bases: gestural indications, partner's activities, salient perceptual events
b. Actional bases
c. Personal diaries

Presuppositions for conversation that each participant is supposed to share about surroundings, activities, perceptions, emotions, plans, interests, etc.

Behavior in public places

[Clark 1996, p. 14]

Structure of participation

The people around an action divide first into those (participants) who are truly participating in it and those (nonparticipants) who are not.
- The speaker
- The addressee
- Side participant: taking part in the conversation but not currently being addressed.
- Overhearer: has no rights or responsibilities in it.
- Bystander: openly present but not part of the conversation.
- Eavesdropper: listens in without the speaker's awareness.
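The role taxonomy above can be read as a small decision procedure. The following is a minimal sketch, assuming three yes/no observations about a listener; the function name and parameter names are hypothetical, and treating bystanders and eavesdroppers as the two kinds of overhearer is an assumption about how the list is meant to be read.

```python
def listener_role(is_participant, is_addressed, openly_present):
    """Classify a listener (not the speaker) into one of the roles listed above.

    All three flags are illustrative assumptions, not part of the slide.
    """
    if is_participant:
        # Participants are either being addressed or taking part on the side.
        return "addressee" if is_addressed else "side participant"
    # Nonparticipants are overhearers: openly present or listening in secretly.
    return "bystander" if openly_present else "eavesdropper"


print(listener_role(True, False, True))    # side participant
print(listener_role(False, False, False))  # eavesdropper
```

A conversational agent would need to estimate these flags from gaze, position, and dialogue history, which is where the difficulty lies.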

[Figure: nested regions of participation. All participants: speaker, addressee, side participant. All listeners additionally include the bystander and the eavesdropper.]

History of conversational systems development

[Figure: timeline from 1970 to 2010 showing the following lines of development]
- Natural language dialogue systems
- Speech dialogue systems
- Multi-modal dialogue systems
- Embodied Conversational Agents / Intelligent Virtual Human
- Story Understanding systems
- Conversational Systems
- Transactional systems
- Interactional systems
- Affective Computing
- Cognitive systems
- Natural language question answering systems
- The Knowledge Navigator

Early Natural Language Dialogue Systems

[Green 1961]

Spec[ification] list:

“Where did the Red Sox play on July 7?”
-> Place = ?
   Team = Red Sox
   Month = July
   Day = 7

“What teams won 10 games in July?”
-> Team(winning) = ?
   Game(number of) = 10
   Month = July

Dictionary:
“team” -> meaning: Team = (blank)
“Red Sox” -> meaning: Team = Red Sox
“who” -> meaning: Team = ?
“winning” -> meaning: subroutine

Data:
Month = July
Place = Boston
Day = 7
Game Serial No. = 96
(Team = Red Sox, Score = 5)
(Team = Yankees, Score = 3)

Baseball

• One of the earliest natural language question answering systems.

• Answers questions such as those above about baseball games.


[Green 1961]

Modules of Baseball

1. Question Read-in

2. Dictionary Look-up

3. Syntax
(1) Scans for ambiguities in part of speech: in some cases, resolved by looking at adjoining words; in other cases, resolved by inspecting the entire question.

(2) Locates and brackets the noun phrases, [], and the prepositional and adverbial phrases, (). The verb is left unbracketed. E.g.,

“How many games did the Yankees play in July?”
-> [How many games] did [the Yankees] play (in [July])?

Any unbracketed preposition is attached to the first noun phrase in the sentence, and prepositional brackets are added. E.g.,

“Who did the Red Sox lose to on July 5?”
-> (To [who]) did [the Red Sox] lose (on [July 5])?

The syntax routine determines whether the verb is active or passive and locates its subject and object.

The syntax analysis checks to see if any of the words is marked as a question word. If not, a signal is set to indicate that the question requires a yes/no answer.

4. Content Analysis
The content analysis uses the dictionary meanings and the results of the syntactic analysis to set up a specification list for the processing program. E.g.,

“each team”: Team=(blank) -> Team=each
“what team”: Team=(blank) -> Team=?
“Who beat the Yankees on July 4”: Team=(blank) -> Team=?, Team(winning)=?, Team(losing)=Yankees
“six games”: Game=(blank) -> Game(number of)=6
“how many games”: Game=(blank) -> Game(number of)=?
“Who was the winning team…”: Team=? and Team(winning)=(blank) -> Team(winning)=?

5. Processing
The specification list indicates to the processor what part of the stored data is relevant for answering the input question. The processor extracts the matching information from the data and produces, for the responder, the answer to the question in the form of a list structure.
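The processing step can be illustrated with a small sketch: a specification list is matched against stored game records, and the attributes marked `?` are filled in from the records that satisfy every other attribute. This is not the original Baseball program; the dictionary layout, the function names, and the second game record are assumptions made for illustration. Only the first record comes from the slide.

```python
# Game records. The first is the data shown on the slide; the second is a
# made-up record (hypothetical) so that filtering has something to reject.
GAMES = [
    {"Month": "July", "Day": 7, "Place": "Boston", "Game Serial No.": 96,
     "Scores": {"Red Sox": 5, "Yankees": 3}},
    {"Month": "July", "Day": 8, "Place": "New York", "Game Serial No.": 97,
     "Scores": {"Yankees": 4, "Orioles": 2}},  # hypothetical
]


def lookup(game, attribute):
    """Return the stored value of an attribute; Team means the teams that played."""
    if attribute == "Team":
        return sorted(game["Scores"])
    return game[attribute]


def satisfies(game, attribute, wanted):
    """Check one constraint of the specification list against a game record."""
    if attribute == "Team":
        return wanted in game["Scores"]
    return game.get(attribute) == wanted


def answer(spec, games):
    """Fill in the '?' attributes of a spec list from every matching game."""
    unknowns = [a for a, v in spec.items() if v == "?"]
    constraints = {a: v for a, v in spec.items() if v != "?"}
    return [{a: lookup(g, a) for a in unknowns}
            for g in games
            if all(satisfies(g, a, v) for a, v in constraints.items())]


# "Where did the Red Sox play on July 7?"
spec = {"Place": "?", "Team": "Red Sox", "Month": "July", "Day": 7}
print(answer(spec, GAMES))  # [{'Place': 'Boston'}]
```

Questions that need counting or comparison, such as "What teams won 10 games in July?", would require the subroutine-valued dictionary entries mentioned above and are outside this sketch.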


[Woods 1973]

LUNAR

Prototype natural language question answering system that helps lunar geologists access chemical analysis data on lunar rock and soil composition.

(i) Syntactic analysis using heuristic/semantic information to choose the most likely parsing (Augmented Transition Network Grammar was used)

(ii) Semantic interpretation: produce a formal representation for queries

(iii) Execution of this formal expression in the retrieval

Questions LUNAR “understands”

1.
(List samples with Silicon)
(Give me all lunar samples with Magnetite)
(In which samples has Apatite been identified)
(How many samples contain Titanium)
(Which rocks contain Chromite and Ulvospinel)
(Which rocks do not contain Chromite and Ulvospinel)

2.
(What analyses of Olivine are there)
(Analyses of Strontium in Plagioclase)
(What are the Plag analyses for breccias)
(Rare earth analyses for S10005)
(I need all chemical analyses of lunar soil)
(What is the composition of Ilmenite in rock 10017)
(List the analyses of Aluminum in vugs)
(Nickel content of opaques)

3.
(Which samples are breccias)
(What are the igneous rocks)
(What types of sample are there)
(What is the number of phases in each sample)

4.
(Give me the K / Rb ratios for all lunar samples)
(What is the specific activity of Al26 in soil)
(Give me all references on fayalitic Olivine)
(Which rock is the oldest)
(Which is the oldest rock)
...


[Woods 1973]

Augmented Transition Network Grammar

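[Figure: transition networks for three constituents. S: arcs labeled NP, AUX, V, NP, PP. NP: arcs labeled DET, ADJ, NOUN, PRON. PP: arcs labeled PREP, NP.]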

An extension of the recursive transition network model, which is itself weakly equivalent to the context-free grammar model.

“The augmented transition network builds up a partial structural description of the sentence as it proceeds from state to state through the network. The pieces of this partial description are held in registers which can contain any rooted tree or list of rooted trees and which are automatically pushed down when a recursive application of the transition network is called for and restored when the lower level (recursive) computation is completed. The structure-building actions on the arcs specify changes in the contents of these registers in terms of their previous contents, the contents of other registers, the current input symbol, and/or the result of lower level computations. In addition to holding pieces of substructure that will eventually be incorporated into a larger structure, the registers may also be used to hold flags or other indicators to be interrogated by conditions on the arcs.”
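The register mechanism described in the quotation can be sketched as a small interpreter. This is a toy illustration, not the LUNAR grammar: the networks, state names, register names, and lexicon below are all assumptions, and the sketch omits conditions on arcs, so it is closer to a recursive transition network with structure-building registers than to a full ATN.

```python
# Toy lexicon and networks (all hypothetical, loosely following S / NP / PP).
LEXICON = {"the": "DET", "a": "DET", "big": "ADJ", "red": "ADJ",
           "robot": "NOUN", "block": "NOUN", "table": "NOUN", "it": "PRON",
           "will": "AUX", "moves": "V", "move": "V", "on": "PREP"}

# Arc forms: ("CAT", category, register, next), ("PUSH", net, register, next), ("POP",)
NETWORKS = {
    "S": {"start": [("PUSH", "NP", "subject", "s1")],
          "s1": [("CAT", "AUX", "aux", "s2"), ("CAT", "V", "verb", "s3")],
          "s2": [("CAT", "V", "verb", "s3")],
          "s3": [("PUSH", "NP", "object", "s4"),
                 ("PUSH", "PP", "modifier", "s4"), ("POP",)],
          "s4": [("PUSH", "PP", "modifier", "s4"), ("POP",)]},
    "NP": {"start": [("CAT", "DET", "det", "n1"), ("CAT", "ADJ", "adj", "n1"),
                     ("CAT", "NOUN", "head", "n2"), ("CAT", "PRON", "head", "n2")],
           "n1": [("CAT", "ADJ", "adj", "n1"), ("CAT", "NOUN", "head", "n2")],
           "n2": [("POP",)]},
    "PP": {"start": [("CAT", "PREP", "prep", "p1")],
           "p1": [("PUSH", "NP", "object", "p2")],
           "p2": [("POP",)]},
}


def add(registers, name, value):
    """Return a copy of the registers with value appended to register name."""
    new = dict(registers)
    new[name] = registers.get(name, []) + [value]
    return new


def traverse(net, state, words, pos, registers):
    """Yield (position, registers) for every way of reaching a POP arc."""
    for arc in NETWORKS[net][state]:
        if arc[0] == "POP":
            yield pos, registers
        elif arc[0] == "CAT":
            _, category, register, nxt = arc
            if pos < len(words) and LEXICON.get(words[pos]) == category:
                yield from traverse(net, nxt, words, pos + 1,
                                    add(registers, register, words[pos]))
        elif arc[0] == "PUSH":
            _, subnet, register, nxt = arc
            # The sub-network starts with fresh registers; the caller's
            # registers are kept aside and resumed when the sub-network pops.
            for sub_pos, sub_regs in traverse(subnet, "start", words, pos, {}):
                yield from traverse(net, nxt, words, sub_pos,
                                    add(registers, register, (subnet, sub_regs)))


def parse(sentence):
    """Return the registers of the first parse that consumes every word."""
    words = sentence.lower().split()
    for pos, registers in traverse("S", "start", words, 0, {}):
        if pos == len(words):
            return registers
    return None


print(parse("the robot moves the big red block"))
```

Backtracking comes for free from the generators: if a later arc fails, the next alternative is tried with the registers as they were at that state.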


Winograd, Terry, Understanding Natural Language, New York: Academic Press, 1972. http://hci.stanford.edu/~winograd/shrdlu/

SHRDLU

• Natural language understanding system working on the “Blocks” world.

• Based on the belief that “a computer cannot deal with language unless it can understand the subject it is discussing”

• Answers questions, executes commands, and accepts information in normal English dialog.

• Uses semantic information and context to understand discourse and disambiguate sentences.

• Procedural knowledge representation <-> declarative KR



SHRDLU

USER: “Pick up a big red block”

SHRDLU: “OK”

[Figure: SHRDLU components: syntactic analysis, semantic analysis, planning, question answering, and graphics that simulate the robot arm.]


[Weizenbaum 1966]

“ELIZA is a program operating within the MAC time-sharing system at MIT which makes certain kinds of natural language conversation between man and computer possible. Input sentences are analyzed on the basis of decomposition rules which are triggered by key words appearing in the input text. Responses are generated by reassembly rules associated with selected decomposition rules. The fundamental technical problems with which ELIZA is concerned are: (1) the identification of key words, (2) the discovery of minimal context, (3) the choice of appropriate transformations, (4) generation of responses in the absence of key words, and (5) the provision of an editing capability for ELIZA "scripts". A discussion of some psychological issues relevant to the ELIZA approach as well as of future developments concludes the paper.”

Sample dialogue between human and ELIZA

“Men are all alike.
IN WHAT WAY
They're always bugging us about something or other.
CAN YOU THINK OF A SPECIFIC EXAMPLE
Well, my boyfriend made me come here.
YOUR BOYFRIEND MADE YOU COME HERE
He says I'm depressed much of the time.
I AM SORRY TO HEAR YOU ARE DEPRESSED
It's true. I am unhappy.
DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
I need some help, that much seems certain.
WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
Perhaps I could learn to get along with my mother.
TELL ME MORE ABOUT YOUR FAMILY
My mother takes care of me.
WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU
My father.
YOUR FATHER
You are like my father in some ways.
WHAT RESEMBLANCE DO YOU SEE
You are not very aggressive but I think you don't want me to notice that.
WHAT MAKES YOU THINK I AM NOT VERY AGGRESSIVE
You don't argue with me.
WHY DO YOU THINK I DON'T ARGUE WITH YOU
You are afraid of me.
DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU
My father is afraid of everybody.
WHAT ELSE COMES TO MIND WHEN YOU THINK OF YOUR FATHER”


[Weizenbaum 1966]

Transformation rules: a collection of key lists of the form

$$
\begin{aligned}
(K\ &((D_1)\ (R_{1,1})\ (R_{1,2})\ \cdots\ (R_{1,m_1})) \\
    &((D_2)\ (R_{2,1})\ (R_{2,2})\ \cdots\ (R_{2,m_2})) \\
    &\qquad\vdots \\
    &((D_n)\ (R_{n,1})\ (R_{n,2})\ \cdots\ (R_{n,m_n})))
\end{aligned}
$$

where $K$: keyword, $D_i$: decomposition template, $R_{i,j}$: reassembly rule.

Example of a decomposition template and a reassembly rule:
D: (0 YOU 0 ME)
R: (WHAT MAKES YOU THINK I 3 YOU)

“It seems that you hate me”
-> “What makes you think I hate you”

Substitution rule. E.g.,

(YOURSELF = MYSELF)
(MY YOUR 5 (transformation rules))

Example:

“You are very helpful”
-> “I are very helpful” (by a simple substitution rule)
-> “What makes you think I am very helpful” (by the decomposition template (0 I are 0) and the reassembly rule (What makes you think I am 4))

After $R_{i,j}$ is applied, an index is stored in the transformation list so that the same reassembly rule is not applied twice in a row.

(How = (What)) lets the keywords “how” and “what” share the same transformation rules.

(MOTHER DLIST (/NOUN FAMILY)) allows the keyword “MOTHER” to be identified as a noun and as a member of the class “family”. As a result, “MOTHER” will match “(/FAMILY)” in a decomposition rule.

ELIZA: Natural language dialogue system without explicit knowledge about the discourse domain
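The decomposition and reassembly mechanism can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Weizenbaum's implementation: the function names are hypothetical, templates and rules are passed as space-separated strings, `0` matches any run of words (shortest first), and keyword ranking, rule cycling, and the memory mechanism are omitted.

```python
def decompose(template, words):
    """Match a decomposition template against a word list.

    Returns a list of parts (one word list per template element) or None.
    The element "0" matches any number of words, including none.
    """
    if not template:
        return [] if not words else None
    head, rest = template[0], template[1:]
    if head == "0":
        for k in range(len(words) + 1):
            tail = decompose(rest, words[k:])
            if tail is not None:
                return [words[:k]] + tail
        return None
    if words and words[0].lower() == head.lower():
        tail = decompose(rest, words[1:])
        if tail is not None:
            return [[words[0]]] + tail
    return None


def reassemble(rule, parts):
    """Build a response; a number n in the rule is replaced by the n-th part."""
    out = []
    for token in rule:
        if token.isdigit():
            out.extend(parts[int(token) - 1])
        else:
            out.append(token)
    return " ".join(out)


def transform(sentence, template, rule, substitutions=None):
    """Apply optional word substitutions, then one decomposition/reassembly pair."""
    substitutions = substitutions or {}
    words = [substitutions.get(w.lower(), w) for w in sentence.split()]
    parts = decompose(template.split(), words)
    return None if parts is None else reassemble(rule.split(), parts)


print(transform("It seems that you hate me",
                "0 YOU 0 ME", "WHAT MAKES YOU THINK I 3 YOU"))
# WHAT MAKES YOU THINK I hate YOU
print(transform("You are very helpful",
                "0 I are 0", "What makes you think I am 4", {"you": "I"}))
# What makes you think I am very helpful
```

Nothing in the sketch knows what the words mean, which is the point of the slide title: the responses come entirely from pattern matching on the surface form.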

"Mary socked John."

"Mary punched John."

"Mary hit John with her fist."

"John was socked by Mary."

"Marie a donne un coup de poing a Jean."

"Maria pego a Juan."

Story understanding and generation

Primitive Meaning

ATRANS: Transfer of an abstract relationship (e.g., give)
PTRANS: Transfer of the physical location of an object (e.g., go)
PROPEL: Application of physical force to an object (e.g., push)
MOVE: Movement of a body part by its owner (e.g., kick)
GRASP: Grasping of an object by an actor (e.g., throw)
INGEST: Ingesting of an object by an animal (e.g., eat)
EXPEL: Expulsion of something from the body of an animal (e.g., cry)
MTRANS: Transfer of mental information (e.g., tell)
MBUILD: Building new information out of old (e.g., decide)
SPEAK: Producing of sounds (e.g., say)
ATTEND: Focusing of a sense organ toward a stimulus (e.g., listen)

[Schank 1975]

Action: PROPEL
Actor: Mary
Object: Fist
From: Mary
To: John

MARGIE
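The idea that different surface sentences share one conceptual representation can be shown with a toy normalizer. This is only a sketch: the regular expressions cover just the four English example sentences, the function name is hypothetical, and MARGIE's actual conceptual analyzer worked very differently.

```python
import re

# Verbs that all reduce to "apply physical force with the fist" (toy list).
VERBS = r"(?:socked|punched|hit)"
PASSIVE = re.compile(rf"(\w+) was {VERBS} by (\w+)\.?")
ACTIVE = re.compile(rf"(\w+) {VERBS} (\w+)(?: with (?:her|his) fist)?\.?")


def to_cd(sentence):
    """Map a sentence to a PROPEL frame, or None if no pattern applies."""
    match = PASSIVE.fullmatch(sentence)
    if match:
        target, actor = match.groups()
    else:
        match = ACTIVE.fullmatch(sentence)
        if match is None:
            return None
        actor, target = match.groups()
    return {"Action": "PROPEL", "Actor": actor, "Object": "Fist",
            "From": actor, "To": target}


sentences = ["Mary socked John.", "Mary punched John.",
             "Mary hit John with her fist.", "John was socked by Mary."]
for s in sentences:
    print(s, "->", to_cd(s))
```

All four sentences produce the same frame, which is what allows paraphrase and translation to be treated as generation from a shared representation.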


Script-based understanding

- SAM (Script Applier Mechanism) [Cullingford 1981] Cullingford, Richard E.: SAM and Micro SAM. In Roger C. Schank & Christopher K. Riesbeck (Eds.), Inside computer understanding. Hillsdale, NJ: Erlbaum, 1981.

- FRUMP [DeJong 1979] DeJong, Gerald F.: Skimming stories in real time: An experiment in integrated understanding (Technical Report YALE/DCS/tr158). New Haven, CT: Computer Science Department, Yale University, 1979.

Plan-based understanding

- PAM [Wilensky 1978] Wilensky, Robert: Understanding goal-based stories (Technical Report YALE/DCS/tr140). New Haven, CT: Computer Science Department, Yale University, 1978.

- POLITICS [Carbonell 1978] Carbonell, Jaime: Subjective understanding: Computer models of belief systems (Technical Report YALE/DCS/tr150). New Haven, CT: Computer Science Department, Yale University, 1978.

Dynamic Memory

- IPP [Lebowitz 1980] Lebowitz, Michael: Generalization and memory in an integrated understanding system (Technical Report YALE/DCS/tr186). New Haven, CT: Computer Science Department, Yale University, 1980.

- BORIS [Lehnert 1983] Lehnert, Wendy G., Dyer, Michael G., Johnson, Peter N., Yang, C. J., Harley, Steve: BORIS: An experiment in in-depth understanding of narratives. Artificial Intelligence, 20(1), 15-62, 1983.

- CYRUS [Kolodner 1984] Kolodner, Janet L.: Retrieval and organizational strategies in conceptual memory: A computer model. Hillsdale, NJ: Erlbaum, 1984.

Story telling

- TALE-SPIN [Meehan 1981] Meehan, James: TALE-SPIN and Micro TALE-SPIN. In Roger C. Schank & Christopher K. Riesbeck (Eds.), Inside computer understanding. Hillsdale, NJ: Erlbaum, 1981.

Early Speech Dialogue Systems

[Erman 1980]

The Hearsay-II Speech-Understanding System

Integrates multiple levels of information processing by knowledge sources coordinated by the blackboard model:
• Parameter
• Segment
• Syllable
• Word
• Word-sequence
• Phrase
• Data base interface

Combining top-down (hypothesis driven) and bottom-up (data-driven) processing.

Selective attention to allocate limited computing resources.

[Erman 1980]

The blackboard architecture

[Figure: the blackboard for the example utterance | ARE | ANY | BY | FEIGENBAUM | AND | FELDMAN |. Levels, bottom to top: (a) Parameters, (b) Segments, (c) Syllable classes, (d) Words 2, (e) Words 1, (f) Word sequences, (g) Phrases. Knowledge sources: SEG, POM, MOW, WORD-CTL, VERIFY, WORD-SEQ, WORD-SEQ-CTL, PARSE, PREDICT, STOP, CONCAT, RPOL, SEMANT.]

Early Speech Dialogue Systems

PUT-THAT-THERE

The conjoint use of voice-input and gesture recognition to command events on a large format raster-scan graphic display in a “Media Room”

Commands

“Create a blue square there.”
“Move the blue triangle to the right of the green square.”
“Move that to the right of the green square.” (with pointing)
“Put that there.” (indicated by gesture)
“Make that smaller.” (with pointing gesture)
“Make that (indicating some item) like that (indicating some other item).”
“Delete that.” (pointing to some item)
“Call that … the calendar.” (with pointing)

[Bolt 1980]

“Phil”: concrete image of a personal assistant.

The Knowledge Navigator

[Apple Computer 1987]

Phil, the agent

CASA: Computers Are Social Actors

Social responses to computers

- not: conscious beliefs that computers are human or human-like.

- not: the user's ignorance, or psychological or social dysfunctions.

- not: a belief that subjects are interacting with programmers.

- rather: the human-computer relationship is fundamentally social.

[Nass 1994]

- Support interactive give and take
Assistants will need not only to respond to questions but also to ask questions.

- Recognize the costs of interaction and delay
It is inappropriate to require the user's confirmation of every decision made while carrying out a task.

- Manage interruptions effectively
When it is necessary to initiate an interaction with the user, the assistant needs to do so carefully, recognizing the likelihood that the user is already occupied to some degree.

- Acknowledge the social and emotional aspects of interaction
To become a comfortable working partner, a computer assistant will need to vary its behavior depending on such things as the task, the time of day, and the boss's mood.

Requirements

=> The PERSONA project @ Microsoft Research[Ball 1997]

From Tool-based Computing to Assistive Interface

[Ball 1997]

The architecture of Peedy

[Figure: the architecture of Peedy. Component labels, as far as they can be recovered:
- Spoken Language Interpretation: Microphone; Whisper (Speech Recognition); Names (Proper Name Substitution); NLP (Language Analysis); Semantic (Template Matching & Object Descriptions)
- Dialogue Management: Dialogue (Context & Conversation State); Application (CD Changes)
- Output: Speech Controller; Player/ReActor Animation Engine; Character Animation; Sound Generation; Background and Object Animation; Character; Sound; CD Player
- Databases: Names Database; Action Templates Database; Object Database (CDs); Dialogue Rules Database; Speech & Animation Database]

[Blumberg 1994]

Believable animals

How are we to build behaviorally-based animated animals whose level of behavioral complexity is on the order of dogs and cats?

For the user to say “Hey, it acts just like my dog.”

i. The artificial animal must have a believable set of internal needs and motivations and an appropriate set of activities through which those internal needs and motivations may be satisfied and expressed.

ii. It must respond with an appropriate and believable activity on every time step given its internal state, past history and a perceived environment with its attendant opportunities, challenges and changes. Moreover, the pattern and rhythm of the chosen activities must be believable and consistent with those of the real animal.

iii. The interaction must be believable. The creature should respond to the actions of the user in a believable way and the user should be able to interact with the creature using natural gestures and a minimum amount of hardware in-between.

For the user not to say “Wait a minute, this is a dumb robot, my dog would never do that!”

i. It should not mechanically respond to stimulus. An animal’s response to a given stimulus is highly dependent on its internal state.

ii. It should not get stuck in “loops”. Animals are very good at avoiding two types of loops. The first type of loop is one in which the animal is pathologically engaged in a single activity. The second type of loop is one in which the animal dithers incessantly among two or more competing activities.

iii. It should avoid patently stupid behavior. Animals often display a certain common sense or “horse sense”.

Believability

Key ideas from Ethology

Inhibition and fatigue play an important role in ethological models of action-selection and temporal patterns of behavior.

i. Animals engage in one behavior at a time. Yet, animals typically do not mindlessly pursue an activity indefinitely to the detriment of other needs.

ii. Animals sometimes appear to engage in a form of time-sharing in which low priority activities are given a chance to execute, despite the presence of a higher priority activity.

iii. While animals typically do not dither between multiple activities they will nonetheless interrupt a behavior when another behavior becomes significantly more appropriate.

[Blumberg 1994]

Believability

Ludlow’s Model

i. An activity such as feeding or drinking has a value which is based on the sum of its relevant internal and external factors, less the inhibition it receives from competing activities.

ii. Competing activities are mutually inhibiting, where a given activity i inhibits activity j by an amount equal to activity i's value times an inhibitory gain k_ji.

iii. If (a) activities are mutually inhibiting, (b) inhibitory gains are restricted to be greater than 1, and (c) values of activities are restricted to being zero or greater, then this model results in a winner-take-all system, in which only one activity has a non-zero value once the system has stabilized.

iv. A level of fatigue is proposed to be associated with every activity.

v. The level of fatigue is influenced by a number of factors. When an activity is active, the level of fatigue increases in proportion to the activity's value, which reduces the value of an active activity over time.

vi. When an activity is no longer active, the fatigue decays toward zero, and the value of the activity rises.
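The winner-take-all behavior in point iii can be checked with a small simulation. This is a simplified sketch, not the formulation in [Blumberg 1994]: it assumes a single uniform inhibitory gain instead of pairwise gains k_ji, subtracts fatigue directly from the drive, and relaxes each value gradually toward its target. The function name and all parameter names and values are assumptions chosen for illustration.

```python
def simulate(drives, gain, steps, rate=0.1, fatigue_gain=0.0, fatigue_decay=0.05):
    """Simulate mutually inhibiting activities; returns the value history.

    drives        -- summed internal and external factors per activity
    gain          -- uniform inhibitory gain (assumed identical for all pairs)
    rate          -- how fast each value relaxes toward its target
    fatigue_gain  -- fatigue added per step, proportional to the value
    fatigue_decay -- fraction of fatigue that decays per step
    """
    n = len(drives)
    values = [0.0] * n
    fatigue = [0.0] * n
    history = []
    for _ in range(steps):
        targets = []
        for i in range(n):
            inhibition = sum(gain * values[j] for j in range(n) if j != i)
            # Values are restricted to being zero or greater.
            targets.append(max(0.0, drives[i] - fatigue[i] - inhibition))
        values = [v + rate * (t - v) for v, t in zip(values, targets)]
        # Fatigue grows with the activity's value and otherwise decays.
        fatigue = [max(0.0, f + fatigue_gain * v - fatigue_decay * f)
                   for f, v in zip(fatigue, values)]
        history.append(list(values))
    return history


# Two competing activities, gain greater than 1, no fatigue:
final = simulate([1.0, 0.8], gain=1.5, steps=500)[-1]
print(final)  # the first value approaches 1.0, the second approaches 0.0
```

With `fatigue_gain` set above zero the winning activity's value erodes over time, which gives the losing activity a chance to take over; that is the mechanism points iv to vi describe for avoiding an animal stuck in one activity.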

[Blumberg 1994]

Believability

[Mateas 1997]

Drama = Character + Story + Presentation

Interactive Drama：Interactive drama concerns itself with building dramatically interesting virtual worlds inhabited by computer-controlled characters, within which the user (the player) experiences a story from a first person perspective. … (Bates 1992)

Oz project

[Mateas 1997]

Oz project

Personality
Rich personality should infuse everything that a character does. What makes characters interesting are their unique ways of doing things.

Emotion
Characters exhibit their own emotions and respond to the emotions of others in personality-specific ways.

Self-motivation
Characters have their own internal drives and desires which they pursue whether or not others are interacting with them.

Change
Characters grow and change with time, in a manner consistent with their personality.

Social relationships
Characters engage in interactions with others in a manner consistent with their relationship. In turn, these relationships change as a result of the interaction.

Illusion of life
Pursuing multiple, simultaneous goals and actions, having broad capabilities, and reacting quickly to stimuli in the environment.

Antonio Damasio’s “Descartes’ Error – Emotion, Reason and the Human Brain”

If you come to know that animal or object or situation X causes fear, you will have two ways of behaving toward X. The first way is innate; you do not control it. Moreover, it is not specific to X; a large number of creatures, objects, and circumstances can cause the response. The second way is based on your own experience and is specific to X. Knowing about X allows you to think ahead and predict the probability of its being present in a given environment, so that you can avoid X, preemptively, rather than just have to react to its presence in an emergency. …

Primary emotions depend on limbic system circuitry, the amygdala and anterior cingulate being the primary players. After an appropriate stimulus activates the amygdala, a number of responses ensue: internal responses, muscular responses, visceral responses, and responses to neurotransmitter nuclei and hypothalamus.

Secondary emotions utilize the machinery of primary emotions. The stimulus may still be processed directly via the amygdala but is now also analyzed in the thought process, and may activate frontal cortices (VM). VM acts via the amygdala. (pp. 133-137) [Damasio 1994]

H: hypothalamus, VMF: ventromedial prefrontal cortex

[Figure: labels: real world, sensory input, brain (cognitive process, amygdala, VMF, H), body, reactions.]

Affective Computing

[Picard 1997]

[Figure: labels: high level and low level; representations / signals at each level; inference and decision making; pattern recognition and synthesis; emotional states; cognitive processing.]

[Hayes-Roth 1998]

Jennifer James

[Figure: labels: text, multi-modal dialogue engine, graphics, sound, cloud.]

Rea

[Cassell 1999]

Implements the social, linguistic, and psychological conventions of conversation.

- Has a human-like body. Uses eye gaze, body posture, hand gestures, and facial displays to organize and regulate the conversation.

- The conversational model relies on the function of non-verbal behaviors as well as speech.

- A full symmetry between input and output modalities: Rea not only responds to visual, audio and speech cues (such as speech, shifts in gaze, gesture, and non-speech audio) but also generates these cues.

Rea

[Cassell 1999]

[Figure: BEAT pipeline. Labels: Text Input; Language Tagging; Behavior Generation (Behavior Suggestion with Generator Set, Behavior Selection with Filter Set); Behavior Scheduling; Translator; Animation; supporting components: Discourse Model, Knowledge Base, Word timing.]

BEAT (Behavior Expression Animation Toolkit)

[Cassell 2004]

Challenge: A robot that can participate in conversation

Long-term goal of conversational informatics

Conversational Informatics

[Figure: conversational interactions at the center of two activities. Building conversational systems: application, platform, evaluation, content production. Understanding conversation: measurement, analysis, model building, theory.]

Approach 1: Making environment playful

Smart conversation space that encompasses participants and referents of conversation.
-> Engaged conversational interactions
-> More insights about the common ground

Augmentation by MR (VR-AR)

Daily living space

Immersive interaction with ICIE

[Lala 2013]https://www.youtube.com/watch?v=V-9SKpcMrzk


Projecting the real world into the virtual world


http://www.youtube.com/watch?v=68UrJv65HvY


Telepresence by connecting ICIE with a networked robot

[Figure: labels: the conversation place with a tele-operated robot; WOZ operating environment with the WOZ operator; user motion sensing (head recognition, gesture recognition); face model and human body model; motion mapping; feedback generation.]


Avatars and NPCs in virtual basketball

[Lala 2014] https://www.youtube.com/watch?v=ZtjSRjHBgUs


Inducing intentional stance toward agent players

Q: How can we induce an intentional stance toward NPCs?
H: Demonstrating strategy change.

[Suyama-Ohmoto-Nishida 2015]

Player #1 in ICIE #1: red hat
Player #2 in ICIE #2: green hat
Player #3 (agent): blue hat

The Interactive Dome

[Nishida-Nakazawa-Ohmoto-Mohammad 2014]https://www.youtube.com/watch?v=wxkZ9armrI8

Appearance Architecture Projection

Inside view


Analyze the behaviors of participants by integrating audio-visual and physiological measurement.

Approach 2: Understanding by measurement

IMADE: Interaction Measurement, Analysis, and Design Environment


iCorpusStudio

Collaborative annotation system


3D conversation capture—over the shoulder view

https://www.youtube.com/watch?v=J08vG8wnrnw

[Yano 2012]


Corneal Imaging CameraScene cameraEye camera

• Lightweight and versatile system • Appl.: Google Glass like HMD, unconstrained setups

Corneal Image Feature Matching

Problem: Local feature correspondence + RANSAC does not work due to large noise in eye images.

Approach:
1. Formulate the problem as registration of 3D spherical light maps of the eye and scene images
2. Single point algorithm for robust alignment

Non-intrusive Eye Gaze Tracking (EGT) by corneal imaging

Eye images with GRP

Scene images with PoG

↑ Aligned results (from eye images)

Peripheral vision map overlaid on scene image

Peripheral vision map in eye image

Gaze Reflection Point Mapping

Application 1: Non-intrusive and uncalibrated PoG estimation

Application 2: Peripheral vision estimation

Gaze trajectory in static scene image

Inducing intentional stance toward agent players

[Takeda-Matsuda-Ohmoto-Nishida 2015]

[Figure: 2x2 chart with axes SCR (+/-) and LF/HF (+/-). Quadrant labels: Deliberating but not reacting; Deliberating and reacting; Doing nothing; Reacting but not deliberating. Example behaviors: Hide from the chaser; Not concentrated; Hide in the place the chaser checked previously; Simply moving around.]

Multi-dimensional model for estimating the internal state of a human

Concentration

Level of proficiency

Learning by imitation—Generic framework

Measurement, Corpus, Generalization, Dialogue patterns

[Nishida et al 2014]

Endow robots with the ability to autonomously imitate human behaviors.

Interactions from observation—General framework

[Figure: events 1a, 2a and 3a on three time axes (t), linked by "Causes" relations]


[Mohammad 2009]

Learning by demonstration

The problem formulation

[Mohammad 2010]

Gesture stream

Action stream

Problem formulation: Find approximately repeated subsequences in a longer time series.

(1) Motif Discovery—Finding Patterns of Interaction
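
The problem formulation above can be illustrated with a minimal brute-force sketch. It assumes a one-dimensional series, a fixed motif length, and plain Euclidean distance; the function names `euclidean` and `find_motif` are hypothetical and chosen only for illustration. It is not the algorithm used in the cited work, which handles much longer streams.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_motif(series, length):
    """Return (distance, i, j) for the closest pair of non-overlapping
    windows of the given length, or None if no such pair exists."""
    best = None
    last_start = len(series) - length
    for i in range(last_start + 1):
        # Start j at i + length so the two windows never overlap.
        for j in range(i + length, last_start + 1):
            d = euclidean(series[i:i + length], series[j:j + length])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best


# Toy series: the pattern [1, 3, 1] recurs (approximately) at index 8.
series = [0, 9, 1, 3, 1, 7, 2, 8, 1, 3.1, 1, 5]
distance, first, second = find_motif(series, 3)
print(first, second, round(distance, 3))
```

The search is quadratic in the series length, so it only serves to make the definition of "approximately repeated subsequence" concrete.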


[Figure: the time series around time t is split into a past part and a future part; the past subsequences seq(.) are collected in the matrix H(t) and the future subsequences in the matrix G(t), and the change angle between the two is measured.]

$H(t) = U(t)\,S(t)\,V(t)^{T}$

Find optimal $l_P$

Eigenvectors $u_g$ of $G(t)\,G(t)^{T}$

Find optimal $l_F$

Change scores: $\hat{x}(t)$, $\tilde{x}(t)$

Learning by imitation

Robust Singular Spectrum Transform

[Mohammad 2009]
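
As a rough illustration of the idea behind singular spectrum transform, the sketch below scores a time point by comparing the dominant pattern of the windows just before it with the dominant pattern of the windows just after it. This is a simplified, conventional SST that keeps only one direction per side and uses power iteration in place of a full SVD; it is not the robust variant of [Mohammad 2009]. The function names and the `width` and `count` parameters are assumptions made for this example.

```python
import math


def dominant_direction(columns, iterations=50):
    """Approximate the dominant left singular vector of the matrix whose
    columns are given, by power iteration on H * H^T."""
    # Start from the largest-norm column (a reasonable nonzero start).
    v = list(max(columns, key=lambda c: sum(x * x for x in c)))
    for _ in range(iterations):
        nxt = [0.0] * len(v)
        for col in columns:
            weight = sum(a * b for a, b in zip(col, v))
            for i, a in enumerate(col):
                nxt[i] += weight * a
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # all-zero matrix: no direction to extract
        v = [x / norm for x in nxt]
    return v


def hankel_columns(series, start, width, count):
    """Return `count` overlapping windows of length `width` from `start`."""
    return [series[start + k:start + k + width] for k in range(count)]


def change_score(series, t, width=4, count=4):
    """Score in [0, 1]; near 0 when past and future share one pattern.
    Requires t >= width + count - 1 and t + width + count - 1 <= len(series)."""
    past = hankel_columns(series, t - width - count + 1, width, count)
    future = hankel_columns(series, t, width, count)
    u = dominant_direction(past)
    b = dominant_direction(future)
    cosine = sum(x * y for x, y in zip(u, b))
    return 1.0 - cosine * cosine


# Toy signal: an alternating pattern that turns into a constant at index 20.
signal = [1.0, -1.0] * 10 + [1.0] * 20
print(change_score(signal, 10), change_score(signal, 20))
```

The score is one minus the squared cosine of the angle between the two directions, which is one simple way to turn a change angle into a number between 0 and 1.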

Fluid Learning—work in progress

Anytime learner (automatic segmentation, perspective taking, significance estimation, … )

[Mohammad 2015]

[Mohammad 2016]

Evolving stories underlying conversation

Physical world

Imaginary world

Goffman’s Frame Analysis: keying, fabricating, …

Evolving stories underlying conversation

Our approach: Shared Virtual Space

Synthetic evidential study

Synthetic evidential study (SES) combines dramatic role play and group discussion to help people spin stories by bringing together partial thoughts and evidence.

Componentize

Reuse

SES session

Interpretation archive

Structured collection of {story, background, critique}

Agent Play

Dramatic role play

Group discussions


At the beginning of the 18th century, a feudal lord named Asano Takumi-no-kami Naganori was in charge of a reception for envoys from the Imperial Court in Kyoto. Another feudal lord, Kira Kozuke-no-suke Yoshinaka, was appointed to instruct Asano in the ceremonies. On the day of the reception, while Kira was talking with Yoriteru Kajikawa, a lesser official, at “Matsu no Roka” (“Hallway of Pine Trees”) in Edo Castle, Asano came up to them screaming “This is for revenge!!” and slashed Kira twice with a short sword. Soon after the incident, Kajikawa restrained Asano, who was then imprisoned. The reason for the attack was not known, though it was widely believed that Kira had somehow humiliated Asano. Ultimately Asano was sentenced to commit seppuku, a ritual suicide, but Kira went without punishment.

Hallway of Pine Trees (from Chushingura)

Kira Kozuke-no-suke Yoshinaka

Asano Takumi-no-kami Naganori

Yoriteru Kajikawa

Why was it possible?

How did it happen?

What did each think?

Dramatic Role Play

Group play capture

Agent play

Discussion phase

T. Ookaki, M. Abe, M. Yoshino, Y. Ohmoto and T. Nishida. Synthetic Evidential Study for Deepening Inside Their Heart. IEA/AIE 2015.

Asano

Kira

Kajikawa

Third person view

First person view

Discussions

Observed communicative behaviors

The observed behavior of participants

• Acting behavior -- what the participants do when they are actually acting.

• Commenting behavior -- a critique of the incidents and the acting, including reasoning, discussion and thinking aloud.

• Oral editing behavior -- suggested revisions to the acting

• Idling behavior -- all actions that are not classified above.

The role play phase

• Twelve detailed behaviors were observed in the role play phase: (1) acting, (2) commenting, (3) oral editing, (4) idling, (5) speaking his/her role, (6) acting + thinking aloud, (7) acting + commenting, (8) acting + oral editing, (9) acting + speaking his/her role, (10) idling + commenting, (11) idling + oral editing, and (12) idling + speaking his/her role.

• The behaviors can be roughly classified into rehearsal acting scenes and production acting scenes. Transitions between the two can be clearly identified by eye. For example, just before a production acting scene, explicit signaling behavior such as giving a cue was observed. [Ookaki et al 2015]

Contrasting objective and subjective views

After contrasting the objective and subjective views on the action of Kira falling prone, one participant remarked that “Falling prone seems strange in the objective viewpoint. However, when I experience the subjective viewpoint of Kira, it looks like a natural movement”, and everyone agreed.

Subjective view transfer

After experiencing Kajikawa’s viewpoint, the Kira player said, “(Kajikawa was too slow to) restrain Asano after having been slashed,” which is considered to reflect the Kira player’s view that Kajikawa should have helped Kira earlier.

After a while, however, the Kira player said, “When Asano swung his sword for the first time, Kajikawa might have been farther away from Asano,” suggesting that he thought the distance had prevented Kajikawa from restraining Asano earlier.

Multiple lines of story

Tajomaru met Takehiro and Masago

Tajomaru tied Takehiro to a cedar tree

Tajomaru and Takehiro went into the woods

Tajomaru took Masago into the woods

Tajomaru had his way with Masago

Tajomaru killed Takehiro

Masago killed Takehiro

Takehiro killed himself

Masago asked Tajomaru to kill Takehiro

Masago said that she would go with either one of them and the other one must die

Takehiro despised Masago

Tajomaru asked Masago to come along

Tajomaru kicked Masago to the ground

Duel between Tajomaru and Takehiro

Masago escaped

Masago left

Tajomaru was caught

Masago cared for Takehiro

Tajomaru found Masago with Takehiro

“In the Woods” by Ryunosuke Akutagawa

Multi-layered Multi-view Interpretation

Representing story and its interpretations

Participant’s interpretation

A’s interpretation

B’s interpretation

Actors’ interpretation

Tajomaru’s

Wife’s

Husband’s

Dramatic Scene

Third Person Tajomaru Wife Husband

{Spatial, Temporal} X {Locality, Influence}

“In the Woods” by Ryunosuke Akutagawa

Tajomaru’s Story (First person view)

The Bounty Hunter's Story (Third person view)

Major Characters:

- Tajomaru, the robber

- Takehiro Kanazawa, a samurai working in Wakasa

- Masago, Takehiro’s wife

Plot

- “The same incident in the woods” is told by four witnesses and three actors.

- The stories told by the three actors contradict each other.

Scene: Abstract

[WC] The Woodcutter's Story: The woodcutter found the body of Takehiro.

[TM] The Traveling Monk's Story: The traveling monk saw Takehiro and Masago yesterday on the Yamashina road.

[BT] The Bounty Hunter's Story: The bounty hunter caught Tajomaru, who had been thrown from his horse and was moaning in pain at the bridge of Awataguchi.

[TJ] Tajomaru’s Story: Takehiro was killed as a result of a duel between Tajomaru and Takehiro.

[Ma1] Masago’s Story - 1: Masago was kicked to the ground by Tajomaru. Masago was despised by Takehiro. Masago fainted.

[Ma2] Masago’s Story - 2: Masago stabbed Takehiro hard in the chest.

[TK] Takehiro’s Story through a Medium: Masago asked Tajomaru to kill Takehiro. Tajomaru kicked Masago to the ground. Masago ran toward the deep part of the woods, while Tajomaru was asking Takehiro if he wanted Tajomaru to kill Masago or let her go.

Experiment

Purpose
- Verify that SES helps participants deepen their interpretation
- Study the repeated application of SES cycles

Task in three stages
- We asked participants to annotate each scene in three stages.

Stages

Tajomaru’s Story (First person view)

The Bounty Hunter's Story (Third person view)

Stage 1 (Condition 1)5 participants



Phase 1: comments presented to each participant
- Stage 1: None
- Stage 2: 5 comments obtained at Stage 1
- Stage 3: 9 comments obtained at Stages 1 and 2

Phase 2: networked comments presented to each participant
- Stage 1: None
- Stage 2: 5 networked comments obtained at Stage 1
- Stage 3: 9 networked comments obtained at Stages 1 and 2

Annotation subsystem

• Displays previous comments

• Allows the participant to add new comments

• Each comment has one of the following types: [Clarification], [Empathy], [Confirmation], [Conjecture], [Doubt], [Question], [Surprise]
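
The three functions of the annotation subsystem can be sketched as a small data structure. This is only an illustrative model, not the actual system: the class and field names (`CommentType`, `Comment`, `AnnotationLog`, `author`, `scene`) are hypothetical, and the seven comment types are taken from the rows of the results table in this section.

```python
from dataclasses import dataclass, field
from enum import Enum


class CommentType(Enum):
    """Comment types as they appear in the results table."""
    CLARIFICATION = "Clarification"
    EMPATHY = "Empathy"
    CONFIRMATION = "Confirmation"
    DOUBT = "Doubt"
    CONJECTURE = "Conjecture"
    QUESTION = "Question"
    SURPRISE = "Surprise"


@dataclass(frozen=True)
class Comment:
    author: str          # hypothetical participant identifier
    scene: str           # scene label such as "TJ" or "BT"
    kind: CommentType
    text: str


@dataclass
class AnnotationLog:
    comments: list = field(default_factory=list)

    def previous(self, scene):
        """Return the comments already attached to a scene, oldest first."""
        return [c for c in self.comments if c.scene == scene]

    def add(self, author, scene, kind, text):
        """Attach a new typed comment; an unknown type raises ValueError."""
        comment = Comment(author, scene, CommentType(kind), text)
        self.comments.append(comment)
        return comment


log = AnnotationLog()
log.add("P1", "TJ", "Conjecture", "Tajomaru may be exaggerating the duel.")
log.add("P2", "BT", "Doubt", "Was he really thrown from the horse?")
print([c.kind.value for c in log.previous("TJ")])
```

Rejecting unknown types at insertion time keeps later per-type counts consistent.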

Some results

We asked participants to write their own interpretation of “In the Woods”.

Before experiment: “Not clear” or no answer (6 out of 14 participants)

After experiment: Almost all participants were able to write their own short interpretations.

• The number of comments decreased at Stage 2, while it significantly increased at Stage 3.

• The ratio of Conjecture-type comments increases ⇒ more new interpretations at later stages.

• The ratio of Confirmation-type comments decreases ⇒ less confirmation is necessary at later stages.

               Stage 1  Stage 2  Stage 3  Subtotal
Clarification       55       19       93       167
Empathy             17        6       41        64
Confirmation        23        9       12        44
Doubt               15        5       20        40
Conjecture          17       15       58        90
Question            12       11       18        41
Surprise             2        2        5         9
Subtotal           141       67      247       455

[Chart: ratio of comments of each type (Clarification, Empathy, Confirmation, Doubt, Conjecture, Question, Surprise) at Stage 1, Stage 2 and Stage 3; vertical axis from 0 to 0.5]
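
The per-stage ratios shown in the chart can be recomputed from the counts in the table. The sketch below is a plain recalculation; the variable and function names are chosen only for this example.

```python
# Comment counts per type at Stage 1, Stage 2 and Stage 3 (from the table).
counts = {
    "Clarification": (55, 19, 93),
    "Empathy": (17, 6, 41),
    "Confirmation": (23, 9, 12),
    "Doubt": (15, 5, 20),
    "Conjecture": (17, 15, 58),
    "Question": (12, 11, 18),
    "Surprise": (2, 2, 5),
}


def stage_totals(table):
    """Total number of comments at each of the three stages."""
    return [sum(row[stage] for row in table.values()) for stage in range(3)]


def stage_ratios(table):
    """Share of each comment type within its stage."""
    totals = stage_totals(table)
    return {
        name: tuple(row[stage] / totals[stage] for stage in range(3))
        for name, row in table.items()
    }


totals = stage_totals(counts)
ratios = stage_ratios(counts)
for name, shares in ratios.items():
    print(name, " ".join(f"{share:.3f}" for share in shares))
```

Working with ratios rather than raw counts matters here because the total number of comments differs considerably between stages.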

Potential applications of SES

• Academic research

– Social sciences, History and archaeology, Literature study

• Evidence-based methodologies

– Criminal investigation, profiling

– Onsite investigation

• Planning

– Strategy formation

– Disaster planning

• Training

– Social skills training, language training

– Dramatic problem solving

Future work

1. Full interpretation of SES

2. Using SES for frame analysis

3. Evaluation of SES

4. SES for common ground building

5. Conversational agents based on SES

Conclusion

1. Conversation is a powerful medium for bridging natural and artificial agents.

2. An integrated approach is necessary to induce a synergetic effect of individual insights in the right context.

3. The community will benefit from the reciprocal relationship between common ground and witty conversations.

4. Synthetic Evidential Study serves as a sustained participatory activity to review and cultivate common ground.

Agenda

Credits: Will be awarded based on a report on subjects given in class. Due date: July 31st, 2016.

Agenda (planned)

1. Introduction (April 13) Nishida
2. Methodologies for Conversational System Development (April 20) Nishida
3. Smart Conversation Space (April 27) Ohmoto
4. Measurement, Analysis and Modeling (May 11) Ohmoto
5. Learning by Imitation (May 17) Nishida
6. Time Series Mining (May 25) Nishida
7. Stories and Conversation (June 1) Nishida
8. Affective Computing (June 8) Nishida
9. Cognitive Design (June 15) Ohmoto
10. Aspects of Conversation - 1 (June 22) Nishida
11. Aspects of Conversation - 2 (June 29) Nishida
12. Aspects of Conversation - 3 (July 6) Nishida
13. Speaking Turn Taking System (July 13) Nishida
14. Synergy and Wrap up (July 20) Nishida

Course materials available from: http://www.ii.ist.i.kyoto-u.ac.jp/?page_id=5646&lang=ja


References

[Apple Computer 1987] http://homepage.mac.com/ericestrada/Movies/iMovieTheater53.html

[Ball 1997] Gene Ball, Dan Ling, David Kurlander, John Miller, David Pugh, Tim Skelly, Andy Stankosky, David Thiel, Maarten Van Dantzich, and Trace Wax. Lifelike Computer Characters: The Persona Project at Microsoft Research. Software Agents. Jeffrey M. Bradshaw (ed.). AAAI/MIT Press, 1997.

[Bolt 1980] Richard A. Bolt. "Put-that-there": Voice and gesture at the graphics interface. In Proceedings of the 7th annual conference on Computer graphics and interactive techniques, Vol. 14, No. 3 (July 1980), pp. 262-270.

[Cassell 1999] Cassell, J., Bickmore, T., Billinghurst, M., Campbell, L., Chang, K., Vilhjálmsson, H. and Yan, H. (1999). "Embodiment in Conversational Interfaces: Rea." Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 520-527. Pittsburgh, PA.

[Clark 1996] Herbert H. Clark. Using Language. Cambridge, Cambridge University Press (1996).

[Goffman 1974] Erving Goffman. Frame Analysis: An Essay on the Organization of Experience. New York: Harper and Row, 1974.

[Higuchi 2016] Higuchi, O. Interactive 3D Virtual Space Design Tool by Using Point Cloud Data Capture and Immersive Environment, Unpublished Undergraduate Thesis, Faculty of Engineering, Kyoto University, 2016.

[Lala 2014] Lala D, Mohammad Y, Nishida T. A joint activity theory analysis of body interactions in multiplayer virtual basketball, 28th British Human Computer Interaction Conference 2014.

[Mohammad 2009] Mohammad Y, Nishida T, Okada S. Unsupervised simultaneous learning of gestures, actions and their associations for human-robot interaction, In: Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems (IROS); p. 2537–2544.

[Mohammad 2010] Mohammad Y, Nishida T. Learning interaction protocols using Augmented Bayesian Networks applied to guided navigation. In: Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); p. 4119–4126.

[Mohammad 2015] Mohammad, Y. and Nishida, T. Data Mining for Social Robotics, Springer 2015. http://www.springer.com/us/book/9783319252308

[Nishida-Nakazawa-Ohmoto-Mohammad 2014] Nishida, T., Nakazawa, A., Ohmoto, Y., Mohammad, Y. Conversational Informatics: A Data-Intensive Approach with Emphasis on Nonverbal Communication, Springer 2014. http://www.springer.com/us/book/9784431550396

[Nishida et al 2015] Nishida T et al. Synthetic evidential study as primordial soup of conversation. In: Chu W et al. editors. DNIS 2015, LNCS 8999 2015; p. 74–83.

[Ookaki et al 2015] Ookaki T et al. Synthetic evidential study for deepening inside their heart. In: IEA/AIE 2015; p. 161–170.

[Ookaki 2016] Ookaki, T. Building a Support System for Story Interpretation from Multiple Perspectives, Unpublished Master Thesis, Graduate School of Informatics, Kyoto University, 2016.

[Suyama-Ohmoto-Nishida 2015] Suyama T, Ohmoto Y, Nishida T. Improving engagement of users by changing agent's strategy action dynamically based on the observed user's state. JSAI Annual Convention 2015 (in Japanese).

[Takeda-Matsuda-Ohmoto-Nishida 2015] Takeda S, Nishida T, Ohmoto Y. Method of Estimating Concentration in Exercise Game by Combining Multiple Physiological Indices. JSAI Annual Convention 2015 (in Japanese).

[Winograd 1972] Terry Winograd. Understanding Natural Language, Academic Press, 1972.

[Yano 2012] Yano M. Construction of 3-Dimensional Recording Environments for Multi-party Conversation with RGB-Depth Sensors. Unpublished master thesis, Graduate School of Informatics, Kyoto University, 2012 (in Japanese).
