eNTERFACE ’08 Project 2
“Multimodal High Level Data Integration”
Final Report
August 29th, 2008
Application challenges
• 2 users in their home/office environment
• unrestricted natural language
• free human behavior
Components integrated
[Figure: system architecture]
• Speech Recognizer: audio stream (sound waves) → recognized string
• Syntactic Analyzer: recognized string → syntactic triple
• Semantic Analyzer: syntactic triple → linguistic meanings
• Knowledge Base
• Video Analyzer: video stream (sequence of images) → movement coordinates
• Human Behavior Analyzer: movement coordinates → movement meanings
• Fusion Mechanism: linguistic meanings + movement meanings → advise people
[Figure: the same architecture, annotated with the tools used]
• Speech Recognizer: Sphinx-4
• Video Analyzer: OpenCV
• Syntactic Analyzer (syntax analysis): C & C Parser
• Semantic Analyzer: C & C Boxer
• Knowledge Base (semantic validation): Protégé, Jena
Example Scenario
[Ronald] I want to call Nick. Nick mentioned that he attended a wine tasting course.
[Beto] It sounds interesting, I like wine.
[Ronald] Actually I plan to join the next class. He also mentioned a book about French wines, but I cannot recall the name of the author.
[Beto] Why don't you send a mail to Nick?
[Ronald] Maybe I can find a book about it in the library.
[Beto] Yes, you are right.
[Beto] Did you find it?
[Ronald] Yes, I did.
Hints for plan recognition by speech
Alerts:
want, need, wish, require, going to, plan, look for, wonder, can, may, must, do you know, do we have, etc.
Stop-alerts:
- negation (I am not going to…)
- past tense (Yesterday I was going to…)
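The alert and stop-alert idea can be sketched as plain keyword spotting over the recognized string. This is a minimal Python sketch that rests on several assumptions. The alert list comes from the slide, minus its open-ended “etc.”. The negation and past-tense word lists and the name `detect_plan_alert` are hypothetical stand-ins. A real system would more likely take negation and tense from the parser output.

```python
import re

# Alert expressions from the slide. The slide's list ends with "etc.",
# so this tuple is deliberately incomplete.
ALERTS = ("want", "need", "wish", "require", "going to", "plan", "look for",
          "wonder", "can", "may", "must", "do you know", "do we have")

# Hypothetical stop-alert cues (not from the slides): crude word lists
# standing in for real negation and tense detection.
NEGATIONS = ("not", "never", "cannot")
PAST_MARKERS = ("yesterday", "was", "were", "did", "had")


def _has_phrase(text: str, phrase: str) -> bool:
    """Whole-word match on lowercased text, so "can" does not match "cannot"."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def detect_plan_alert(utterance: str) -> bool:
    """Return True if the utterance may announce a plan (alert, no stop-alert)."""
    text = utterance.lower().replace("\u2019", "'")  # normalise curly apostrophes
    if not any(_has_phrase(text, a) for a in ALERTS):
        return False
    negated = "n't" in text or any(_has_phrase(text, n) for n in NEGATIONS)
    past = any(_has_phrase(text, p) for p in PAST_MARKERS)
    return not (negated or past)
```

With these lists, “I want to call Nick.” and “Maybe I can find a book about it in the library.” raise an alert. “I am not going to call him.” and “Yesterday I was going to call him.” are suppressed by a stop-alert. Whole-word matching matters here because it keeps “can” from firing on Ronald's “I cannot recall”.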
“Maybe I can find a book about it in the library.”
Ronald is moving towards the bookshelves.
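In the video half of this example, Ronald moves towards the bookshelves. This is a spatial relation between a tracked person and a fixed anchor object. The minimal sketch below assumes the video analyzer yields 2D floor coordinates. The anchor positions, the thresholds `CLOSE_RADIUS` and `MAX_ANGLE_DEG`, and the function `spatial_relations` are all hypothetical. Comparing the last displacement with the direction to each anchor is only one simple way to decide “is moving to”.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical floor-plan positions (metres) of fixed "anchor" objects.
ANCHORS: Dict[str, Point] = {
    "the computer": (1.0, 4.0),
    "the bookshelves": (6.0, 1.0),
}

CLOSE_RADIUS = 1.0    # assumed: within this distance -> "is close to"
MAX_ANGLE_DEG = 30.0  # assumed: heading tolerance for "is moving to"


def spatial_relations(track: List[Point]) -> List[Tuple[str, str]]:
    """Relate a person's latest position and heading to every anchor.

    `track` holds the person's recent floor positions, oldest first.
    """
    x1, y1 = track[-1]
    x0, y0 = track[-2] if len(track) >= 2 else track[-1]
    dx, dy = x1 - x0, y1 - y0          # last displacement (heading)
    speed = math.hypot(dx, dy)
    min_cos = math.cos(math.radians(MAX_ANGLE_DEG))

    relations = []
    for name, (ax, ay) in ANCHORS.items():
        tx, ty = ax - x1, ay - y1      # direction from person to anchor
        distance = math.hypot(tx, ty)
        if distance <= CLOSE_RADIUS:
            relations.append(("is close to", name))
        elif speed > 0:
            # Cosine of the angle between heading and direction to the anchor.
            cos_angle = (dx * tx + dy * ty) / (speed * distance)
            if cos_angle >= min_cos:
                relations.append(("is moving to", name))
    return relations
```

A track from (2, 1) to (3, 1) returns `[("is moving to", "the bookshelves")]`. A single position at (1.2, 3.5) returns `[("is close to", "the computer")]`.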
Decision making
If (Ronald) [wants to send] {email to Nick} &
(Ronald [is moving to] {the computer} | He [is close to] {the computer}) then
Open the mail client with the “to” field filled in with Nick's e-mail address.
If (Ronald) [can] find {book} [about] {it} [in] {the library} &
(Ronald [is moving to] {the library}) then
There is a book about French wines on the first shelf.
If (Ronald) [can] find {book} [about] {it} [in] {the library} &
(Ronald [is moving to] {the computer}) then
Open a web search engine and enter the keyword in the search field.
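Each rule above joins a linguistic event with a movement event of the same person. This minimal Python sketch assumes that both analyzers emit timestamped (subject, relation, object) events, with pronouns such as “He” already resolved. The `Event` and `Rule` types, the simplified relation strings and the 10-second `FUSION_WINDOW` are assumptions for illustration. The window is one way to fuse evidence that does not coincide in time. The slides do not say how the project bounded that gap.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Event:
    """A timestamped (subject, relation, object) event from either modality."""
    time: float    # seconds since the session started (assumed unit)
    subject: str   # pronouns such as "He" assumed already resolved
    relation: str
    obj: str


@dataclass
class Rule:
    speech: Callable[[Event], bool]    # condition on the linguistic event
    movement: Callable[[Event], bool]  # condition on the movement event
    action: str


FUSION_WINDOW = 10.0  # assumed maximum gap (s) between the two events

RULES = [
    Rule(lambda e: e.relation == "wants to send" and e.obj == "email to Nick",
         lambda e: e.relation in ("is moving to", "is close to")
         and e.obj == "the computer",
         "open mail client addressed to Nick"),
    Rule(lambda e: e.relation == "can find" and e.obj == "book",
         lambda e: e.relation == "is moving to" and e.obj == "the library",
         "say: there is a book about French wines on the first shelf"),
    Rule(lambda e: e.relation == "can find" and e.obj == "book",
         lambda e: e.relation == "is moving to" and e.obj == "the computer",
         "open web search with the keyword filled in"),
]


def decide(speech: List[Event], movement: List[Event]) -> Optional[str]:
    """Return the action of the first rule satisfied by a pair of events that
    involve the same person and lie within FUSION_WINDOW of each other."""
    for rule in RULES:
        for s in speech:
            if not rule.speech(s):
                continue
            for m in movement:
                if (m.subject == s.subject
                        and abs(m.time - s.time) <= FUSION_WINDOW
                        and rule.movement(m)):
                    return rule.action
    return None
```

Suppose “can find book” is spoken at t = 12 s and “is moving to the library” follows at t = 15.5 s. This selects the bookshelf hint. The same movement observed at t = 40 s falls outside the window and produces no action.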
Achievements
• spatial relationships (based on fixed “anchor” objects in the room)
• semantic fusion of events that do not coincide in time
• good results in speaker identification, by synchronising image-based and speech-based identification
• an open framework for managing fusion between two (as in our case) or more modalities, created during the project and to be enhanced further
• each component can run on a separate machine, thanks to a distribution mechanism that exchanges data over a TCP/IP network
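Running components on separate machines means agreeing on how messages are delimited on a TCP/IP byte stream. This minimal sketch assumes newline-delimited JSON, with one message per line. `send_message`, `receive_messages` and the message fields are hypothetical and are not the project's actual wire format.

```python
import json
import socket
from typing import Iterator


def send_message(sock: socket.socket, message: dict) -> None:
    """Send one message as a single line of UTF-8 encoded JSON."""
    # json.dumps escapes newlines inside strings, so "\n" only ends a message.
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


def receive_messages(sock: socket.socket) -> Iterator[dict]:
    """Yield decoded messages until the peer closes the connection."""
    with sock.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            if line.strip():
                yield json.loads(line)
```

The functions only need a connected stream socket. An analyzer could therefore open one with `socket.create_connection((host, port))`, and the fusion mechanism could accept connections on a socket from `socket.create_server((host, port))` (Python 3.8+).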
Future work
• implement effective learning
• efficient decision making even from information fragments
• spatial relationships relative to moving people
• 3D video analysis
• detection of the orientation of people in the scene
• eye gaze tracking
• recognition of various types of gestures
• dealing with natural language redundancy (repeating the same idea in different words)
Further development of results
• integration on the OpenInterface platform (openinterface.org)
• create an open-source community around the project to
- gain ideas and contributions from outside
- have new modalities to fuse
• create a website, a forum, a mailing list