Computer-Supported Interaction (Hamed Ketabdar, Shiva Sundaram)
Computer-supported interaction
• Technologies which support interaction between human, machine and environment
• Capturing, processing, and retrieving multimedia
Computer-supported interaction
Schedule: Thursday 14-16h, FR 0512C, starting 04.11.2010
Hamed Ketabdar: PhD in Electrical Engineering from the Swiss Federal Institute of Technology in Lausanne (EPFL), [email protected]
Shiva Sundaram: PhD in Electrical Engineering from the University of Southern California (USC), [email protected]
Outline
• Multi-modal Interfaces
– Input methods: keyboard, pen, voice, gesture, tactile (touch) …
– Output modalities: audio, video, tactile
– Fusion of modalities
• Speech Processing
– Speech recognition: statistical methods, acoustic modelling, and decoding
– Meta-data extraction (age, gender, language, emotion)
– Audio-visual speech recognition
– Multi-lingual speech recognition
• Information Retrieval
– Representation
– Clustering/segmentation/classification
– Integration with other processes
• System and Architecture
– Natural Language Processing
• Translation
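The "decoding" step listed under speech processing can be illustrated with a toy Viterbi search over a discrete HMM. This is a generic textbook sketch, not course material: the two states ("silence"/"speech") and the probabilities are illustrative assumptions.

```python
# Minimal Viterbi decoder for a discrete HMM: finds the most likely
# hidden state sequence for an observation sequence (toy example).

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations."""
    # V[t][s] = (best probability of reaching s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ["silence", "speech"]
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {"silence": {"silence": 0.7, "speech": 0.3},
           "speech": {"silence": 0.2, "speech": 0.8}}
emit_p = {"silence": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.2, "high": 0.8}}
print(viterbi(["low", "low", "high", "high"], states, start_p, trans_p, emit_p))
```

Real recognisers work in the log domain over phone-level models, but the dynamic-programming recursion is the same.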
Practical sessions
Possibility for small class projects:
Quickly develop multi-modal interfaces based on our context aware SDK for iPhone …
User Activity and Context Detection with Mobile Phones
Detect whether you are walking, sitting, in a meeting, concert or party, or in an emergency situation …
Mobile phones are equipped with a microphone and tilt sensors
Audio context is detected using the microphone output
The physical activity signature is captured using the tilt sensor output
Tilt and audio information are combined to detect context and/or user activity
Time, duration, and other prior knowledge can also be integrated
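The combination of the two modalities can be sketched as a late fusion of per-class scores. The contexts ("walking"/"sitting"), features, and thresholds below are illustrative assumptions, not the actual detector:

```python
# Sketch of late fusion of tilt (physical activity) and audio cues
# for context detection: each modality scores every context class,
# and the scores are multiplied before picking the winner.

def tilt_score(tilt_variance):
    """Per-context likelihoods from the physical-activity signature."""
    return {
        "walking": 1.0 if tilt_variance > 0.5 else 0.1,
        "sitting": 1.0 if tilt_variance <= 0.5 else 0.1,
    }

def audio_score(audio_energy):
    """Per-context likelihoods from the audio environment."""
    return {
        "walking": 0.6 if audio_energy > 0.3 else 0.4,
        "sitting": 0.4 if audio_energy > 0.3 else 0.6,
    }

def detect_context(tilt_variance, audio_energy):
    """Combine both modalities by multiplying per-class scores."""
    t, a = tilt_score(tilt_variance), audio_score(audio_energy)
    return max(t, key=lambda c: t[c] * a[c])

print(detect_context(0.9, 0.8))  # high motion, loud environment -> "walking"
```

Time-of-day or other prior knowledge could be integrated as one more score factor in the same product.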
Applications:
Smart mobile phones
• Control ringing and other functionality according to context
Surveillance and organization (employees, elderly, children)
• Information about user activity can be used for better organization of employees and for taking care of the elderly and children
Smart home environment
General Purpose Audio Switches
A switch that can be triggered by speech commands or non-speech events
The commands or events can be learned automatically
The switch can be easily reconfigured for a new command or application
Involves:
• Automatic language/event acquisition
• Robustness to different sources of variability
Applications: Smart environments, security and surveillance
Dreams: You buy it in a store, just as you would buy a normal mechanical switch. It can be installed anywhere, in the same way as a normal switch.
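One way such a learnable, reconfigurable switch could work is template matching: each learned command stores a feature vector, and the switch fires when a new sound is close enough to a stored template. The `AudioSwitch` class, feature vectors, and threshold below are illustrative; feature extraction from audio is assumed to happen elsewhere:

```python
# Sketch of a reconfigurable audio switch based on nearest-template
# matching over audio feature vectors (toy values, not real features).

import math

class AudioSwitch:
    def __init__(self, threshold=1.0):
        self.templates = {}          # command name -> feature vector
        self.threshold = threshold

    def learn(self, name, features):
        """(Re)configure the switch for a new command or event."""
        self.templates[name] = features

    def trigger(self, features):
        """Return the matched command name, or None if nothing is close."""
        best, best_d = None, self.threshold
        for name, tpl in self.templates.items():
            d = math.dist(tpl, features)     # Euclidean distance
            if d < best_d:
                best, best_d = name, d
        return best

switch = AudioSwitch(threshold=1.0)
switch.learn("lights_on", [0.9, 0.1, 0.2])
switch.learn("clap", [0.1, 0.8, 0.7])
print(switch.trigger([0.85, 0.15, 0.25]))  # close to "lights_on"
```

Reconfiguration is just calling `learn` again; robustness to variability would come from better features and multiple templates per command.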
Call Classification: Anger, Gender, Age, Language, …
Hierarchical design and discriminative training:
• Discriminative representation of emotional states
• Efficient fusion of different acoustic features with higher-level information (e.g. duration, message content)
• Efficient feature selection mechanism, less computational load for feature extraction
[Diagram: pitch and intensity features pass through a discriminative transformation, are combined with textual data, duration, …, and feed into the call classification stage]
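The fusion stage can be sketched as a weighted combination of per-class scores from the acoustic features and from higher-level cues such as call duration. The weights, features, and the anger/neutral classes below are illustrative assumptions, not trained values:

```python
# Sketch of score-level fusion for call classification: acoustic
# scores (from pitch and intensity) are combined with a duration-based
# score by a weighted sum (toy heuristics, not a trained model).

def acoustic_scores(mean_pitch, mean_intensity):
    """Toy discriminative scores: raised pitch and intensity hint at anger."""
    anger = 0.6 * mean_pitch + 0.4 * mean_intensity
    return {"anger": anger, "neutral": 1.0 - anger}

def duration_scores(duration_s):
    """Long, escalating calls are weakly associated with anger here."""
    long_call = min(duration_s / 600.0, 1.0)
    return {"anger": long_call, "neutral": 1.0 - long_call}

def classify_call(mean_pitch, mean_intensity, duration_s, w_acoustic=0.8):
    """Fuse both score sources and return the winning class."""
    a = acoustic_scores(mean_pitch, mean_intensity)
    d = duration_scores(duration_s)
    fused = {c: w_acoustic * a[c] + (1 - w_acoustic) * d[c] for c in a}
    return max(fused, key=fused.get)

print(classify_call(mean_pitch=0.9, mean_intensity=0.8, duration_s=540))
```

In the real system, the discriminative transformation would be learned and message content would enter as a further score source, but the fusion pattern is the same.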
Digital Logging of Physical Activities and Context
Enhancing Emergency and Security/Privacy Functionalities in Mobile Phones
• Unexpected physical events experienced by a mobile phone can be signs of critical security or emergency scenarios:
• The phone being at risk of loss or theft: confidential information on the phone can be exposed
• The phone user experiencing an accident
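Detecting such unexpected physical events can be sketched as anomaly detection on the accelerometer magnitude: flag samples that deviate strongly from the recent running average. The window size and threshold are illustrative assumptions:

```python
# Sketch of unexpected-physical-event detection: flag accelerometer
# magnitudes far from the recent running mean (toy thresholds).

from collections import deque

def detect_events(samples, window=5, threshold=2.0):
    """Return indices where |a| deviates strongly from the recent average."""
    recent = deque(maxlen=window)
    events = []
    for i, a in enumerate(samples):
        if len(recent) == recent.maxlen and \
           abs(a - sum(recent) / len(recent)) > threshold:
            events.append(i)       # e.g. phone dropped, snatched, or impact
        recent.append(a)
    return events

# Steady ~1 g readings, then a sudden spike (impact) at index 7:
print(detect_events([1.0, 1.1, 0.9, 1.0, 1.0, 1.1, 0.9, 8.0, 1.0]))
```

A detected event could then trigger locking of confidential data or an emergency call, as motivated above.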
MobileHCI 2009, Ubicomp 2009
Digital Logging of Physical Activities and Context: Entertainment
What Type of Music Would You Like to Hear?
Automatic selection of music based on context:
• Actual activity of the user
• Audio activity in the environment
• Habits and music taste can also be integrated
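A minimal sketch of such context-driven selection, assuming an activity label and an ambient-audio level are already available; the contexts, playlists, and thresholds are illustrative:

```python
# Sketch of context-driven music selection: map detected activity and
# ambient audio to a playlist, with stored taste able to override.

def pick_playlist(activity, ambient_noise, taste=None):
    """Choose a playlist from context; `taste` maps defaults to favourites."""
    if activity == "running":
        choice = "high-tempo"
    elif ambient_noise > 0.7:          # loud environment, e.g. a party
        choice = "dance"
    else:
        choice = "ambient"
    # Habits / music taste can override the generic rule.
    return taste.get(choice, choice) if taste else choice

print(pick_playlist("running", 0.2))                       # "high-tempo"
print(pick_playlist("sitting", 0.1, {"ambient": "jazz"}))  # "jazz"
```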
11th International ACM Conference on Computers and Accessibility (ASSETS 2009)
Interaction with Mobile User Interface
Sending commands
Turning pages
Zooming
Click and double click
Calling an application or service
MagiSign: “3D Magnetic Signatures” for User Identification/Authentication
•The user creates his own arbitrary 3D signature using a properly shaped magnet in the 3D space around the device.
• Wider choice for authentication as it can be flexibly drawn in 3D space around the device.
• No hardcopy of 3D magnetic signature can be easily generated.
• Unlike regular signatures, it cannot be affected by the quality of paper, pen, ink, etc.
•3D Magnetic Signature:
• A simple 3D motion
• The regular signature of the user, drawn in the air!
• Any other combination of even higher complexity, actively using all the 3D space around the device
•A magnet as a physical key? A personalized magnet in terms of shape and polarity can enhance the authentication process …
•Can be used for accessing a service or data, entrance doors, or simply instead of regular signature during a purchase …
Even simple gestures may be used for authentication
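Verifying such a 3D signature could be done by matching the recorded magnetometer trace against an enrolled template with dynamic time warping (DTW), so the same gesture drawn at a different speed still matches. The sequences and threshold below are toy assumptions, not the actual MagiSign matcher:

```python
# Sketch of 3-D gesture verification with dynamic time warping over
# sequences of magnetometer readings (x, y, z); toy data and threshold.

import math

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW over sequences of 3-D points."""
    INF = float("inf")
    cost = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[len(a)][len(b)]

def verify(template, attempt, threshold=1.0):
    """Accept the attempt if its warped distance to the template is small."""
    return dtw_distance(template, attempt) <= threshold

enrolled = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
slower   = [(0, 0, 0), (0.5, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
print(verify(enrolled, slower))   # same shape, different speed -> True
```

A personalized magnet (shape and polarity) would change the field pattern itself, adding a second factor on top of the gesture.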
MagiWrite: Write It in the Air!
•Text entry based on magnetic field interaction
•Character shaped gestures are written in the space around the device
•Suitable for dialling a number, entering a pin code, selecting a text entry, etc.
•Especially useful for very small mobile devices, where it is hard to operate or design small keypads or touch screens
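Recognizing the characters written in the air could work by reducing each gesture to a fixed number of points and matching it to the nearest stored template. The 2-D toy strokes below stand in for the 3-D magnetic trace; templates and the resampling size are illustrative assumptions:

```python
# Sketch of in-air character recognition: resample a stroke to a fixed
# number of points, then pick the nearest template (toy 2-D strokes).

import math

def resample(points, n=8):
    """Pick n evenly spaced points so fast and slow writing compare fairly."""
    step = (len(points) - 1) / (n - 1)
    return [points[round(i * step)] for i in range(n)]

def recognize(stroke, templates, n=8):
    """Return the name of the template with the smallest total distance."""
    s = resample(stroke, n)
    def score(tpl):
        t = resample(tpl, n)
        return sum(math.dist(p, q) for p, q in zip(s, t))
    return min(templates, key=lambda name: score(templates[name]))

templates = {
    "1": [(0, 0), (0, 1), (0, 2)],                 # straight vertical stroke
    "7": [(0, 2), (1, 2), (0.5, 1), (0, 0)],       # top bar, then diagonal
}
drawn = [(0, 0), (0, 0.5), (0, 1), (0, 1.5), (0, 2)]
print(recognize(drawn, templates))  # matches "1"
```

This is enough for dialling digits or PIN entry; a full text-entry system would add segmentation between characters.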
MagiEntertain: Using Magnetic Interaction in Mobile Entertainment Applications (Gaming and Audio Synthesis)
•Conventionally, touch pads and touch screens are used for gaming
• Screen occlusion
•MagiGame: Actions of a game avatar, such as shooting, jumping, and changing the aim, can be controlled by moving the magnet
•No screen occlusion, natural gesture based interaction, more actions per minute, possibility of multi-player gaming on a device
•Adjusting different audio and DJ effects based on position, orientation, and movements of the magnet
•Changing sound volume and audio tracks in a portable music player
•New music instruments …, two players can play on the same instrument
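Mapping the magnet's measured field to audio controls can be sketched directly: field strength (i.e. how close the magnet is) sets the volume, and the orientation of the field vector steps through tracks. The mapping ranges below are illustrative assumptions:

```python
# Sketch of magnet-driven audio control: derive volume from field
# strength and a track index from the field direction (toy ranges).

def field_to_controls(bx, by, bz, n_tracks=4):
    """Map a magnetometer reading (microtesla) to (volume, track index)."""
    strength = (bx * bx + by * by + bz * bz) ** 0.5
    volume = max(0.0, min(1.0, strength / 100.0))   # clamp to [0, 1]
    # Use the x-axis share of the field direction to step through tracks.
    track = int(abs(bx) / (strength + 1e-9) * (n_tracks - 1))
    return volume, track

print(field_to_controls(30.0, 0.0, 40.0))
```

Smoothing and hysteresis would be needed in practice so the track does not flicker at boundaries, but the mapping idea is this simple.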
Literature
Basics:
• Lawrence Rabiner and Biing-Hwang Juang: "Fundamentals of Speech Recognition" (Prentice Hall, 1993)
• Bernd Pompino-Marschall: "Einführung in die Phonetik" [Introduction to Phonetics] (de Gruyter, 1995)
• Richard O. Duda, Peter E. Hart, David G. Stork: "Pattern Classification" (Wiley, 2000)
• Keinosuke Fukunaga: "Introduction to Statistical Pattern Recognition" (Academic Press, 1990)
• Thomas H. Cormen et al.: "Introduction to Algorithms" (MIT Press, 1990)
Automatic Speech Recognition:
• Ernst Günter Schukat-Talamazzini: "Automatische Spracherkennung: Grundlagen, statistische Modelle und effiziente Algorithmen" [Automatic Speech Recognition: Foundations, Statistical Models, and Efficient Algorithms] (Vieweg, 1995)
• Andreas Wendemuth: "Grundlagen der stochastischen Sprachverarbeitung" [Foundations of Stochastic Language Processing] (Oldenbourg, 2004)
• Tanja Schultz and Katrin Kirchhoff: "Multilingual Speech Processing" (Academic Press, 2006)
• Frederick Jelinek: "Statistical Methods for Speech Recognition" (MIT Press, 1997)