Computer-Supported Interaction (Hamed Ketabdar, Shiva Sundaram)
Computer-supported interaction
• Technologies which support interaction between human, machine and environment
• Capturing, processing, and retrieving multimedia
Computer-supported interaction
Schedule: Thursday 14-16h, FR 0512C, starting 04.11.2010
Hamed Ketabdar: PhD in Electrical Engineering from the Swiss Federal Institute of Technology in Lausanne (EPFL), [email protected]
Shiva Sundaram: PhD in Electrical Engineering from the University of Southern California (USC), [email protected]
Outline
• Multi-modal Interfaces
– Input methods: keyboard, pen, voice, gesture, tactile (touch) …
– Output modalities: audio, video, tactile
– Fusion of modalities
• Speech Processing
– Speech recognition: statistical methods, acoustic modelling, and decoding
– Meta-data extraction (age, gender, language, emotion)
– Audio-visual speech recognition
– Multi-lingual speech recognition
• Information Retrieval
– Representation
– Clustering/segmentation/classification
– Integration with other processes
• System and Architecture
– Natural Language Processing
• Translation
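The "decoding" step listed under speech processing can be illustrated with a toy Viterbi search over a discrete HMM. This is a generic textbook sketch, not course material: the two states ("silence"/"speech") and the probabilities are illustrative assumptions.

```python
# Minimal Viterbi decoder for a discrete HMM: finds the most likely
# hidden state sequence for an observation sequence (toy example).

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations."""
    # V[t][s] = (best probability of reaching s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

states = ["silence", "speech"]
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {"silence": {"silence": 0.7, "speech": 0.3},
           "speech": {"silence": 0.2, "speech": 0.8}}
emit_p = {"silence": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.2, "high": 0.8}}
print(viterbi(["low", "low", "high", "high"], states, start_p, trans_p, emit_p))
```

Real recognisers work in the log domain over phone-level models, but the dynamic-programming recursion is the same.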
Practical sessions
Possibility for small class projects:
Quickly develop multi-modal interfaces based on our context aware SDK for iPhone …
User Activity and Context Detection with Mobile Phones
Detect whether you are walking, sitting, in a meeting, concert or party, or in an emergency situation …
Mobile phones are equipped with a microphone and tilt sensors
Audio context is detected using the microphone output
The physical activity signature is captured using the tilt sensor output
Tilt and audio information are combined to detect context and/or user activity
Time, duration, and other prior knowledge can also be integrated
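The combination of the two modalities can be sketched as a late fusion of per-class scores. The contexts ("walking"/"sitting"), features, and thresholds below are illustrative assumptions, not the actual detector:

```python
# Sketch of late fusion of tilt (physical activity) and audio cues
# for context detection: each modality scores every context class,
# and the scores are multiplied before picking the winner.

def tilt_score(tilt_variance):
    """Per-context likelihoods from the physical-activity signature."""
    return {
        "walking": 1.0 if tilt_variance > 0.5 else 0.1,
        "sitting": 1.0 if tilt_variance <= 0.5 else 0.1,
    }

def audio_score(audio_energy):
    """Per-context likelihoods from the audio environment."""
    return {
        "walking": 0.6 if audio_energy > 0.3 else 0.4,
        "sitting": 0.4 if audio_energy > 0.3 else 0.6,
    }

def detect_context(tilt_variance, audio_energy):
    """Combine both modalities by multiplying per-class scores."""
    t, a = tilt_score(tilt_variance), audio_score(audio_energy)
    return max(t, key=lambda c: t[c] * a[c])

print(detect_context(0.9, 0.8))  # high motion, loud environment -> "walking"
```

Time-of-day or other prior knowledge could be integrated as one more score factor in the same product.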
Applications:
Smart mobile phones
• Control ringing and other functionality according to context
Surveillance and organization (employees, elderly, children)
• Information about user activity can be used for better organization of employees and for taking care of the elderly and children
Smart home environment
General Purpose Audio Switches
A switch that can be triggered by speech commands or non-speech events
The commands or events can be learned automatically
The switch can be easily reconfigured for a new command or application
Involves:
• Automatic language/event acquisition
• Robustness to different sources of variability
Applications: Smart environments, security and surveillance
Dreams: You buy it in a store, just as you would buy a normal mechanical switch. It can be installed anywhere, in the same way as a normal switch.
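One way such a learnable, reconfigurable switch could work is template matching: each learned command stores a feature vector, and the switch fires when a new sound is close enough to a stored template. The `AudioSwitch` class, feature vectors, and threshold below are illustrative; feature extraction from audio is assumed to happen elsewhere:

```python
# Sketch of a reconfigurable audio switch based on nearest-template
# matching over audio feature vectors (toy values, not real features).

import math

class AudioSwitch:
    def __init__(self, threshold=1.0):
        self.templates = {}          # command name -> feature vector
        self.threshold = threshold

    def learn(self, name, features):
        """(Re)configure the switch for a new command or event."""
        self.templates[name] = features

    def trigger(self, features):
        """Return the matched command name, or None if nothing is close."""
        best, best_d = None, self.threshold
        for name, tpl in self.templates.items():
            d = math.dist(tpl, features)     # Euclidean distance
            if d < best_d:
                best, best_d = name, d
        return best

switch = AudioSwitch(threshold=1.0)
switch.learn("lights_on", [0.9, 0.1, 0.2])
switch.learn("clap", [0.1, 0.8, 0.7])
print(switch.trigger([0.85, 0.15, 0.25]))  # close to "lights_on"
```

Reconfiguration is just calling `learn` again; robustness to variability would come from better features and multiple templates per command.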
Call Classification: Anger, Gender, Age, Language, …
Hierarchical design and discriminative training:
• Discriminative representation of emotional states
• Efficient fusion of different acoustic features with higher-level information (e.g. duration, message content)
• Efficient feature selection mechanism, less computational load for feature extraction
[Diagram: pitch and intensity features pass through a discriminative transformation, are combined with textual data, duration, …, and feed into the call classification stage]
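The fusion stage can be sketched as a weighted combination of per-class scores from the acoustic features and from higher-level cues such as call duration. The weights, features, and the anger/neutral classes below are illustrative assumptions, not trained values:

```python
# Sketch of score-level fusion for call classification: acoustic
# scores (from pitch and intensity) are combined with a duration-based
# score by a weighted sum (toy heuristics, not a trained model).

def acoustic_scores(mean_pitch, mean_intensity):
    """Toy discriminative scores: raised pitch and intensity hint at anger."""
    anger = 0.6 * mean_pitch + 0.4 * mean_intensity
    return {"anger": anger, "neutral": 1.0 - anger}

def duration_scores(duration_s):
    """Long, escalating calls are weakly associated with anger here."""
    long_call = min(duration_s / 600.0, 1.0)
    return {"anger": long_call, "neutral": 1.0 - long_call}

def classify_call(mean_pitch, mean_intensity, duration_s, w_acoustic=0.8):
    """Fuse both score sources and return the winning class."""
    a = acoustic_scores(mean_pitch, mean_intensity)
    d = duration_scores(duration_s)
    fused = {c: w_acoustic * a[c] + (1 - w_acoustic) * d[c] for c in a}
    return max(fused, key=fused.get)

print(classify_call(mean_pitch=0.9, mean_intensity=0.8, duration_s=540))
```

In the real system, the discriminative transformation would be learned and message content would enter as a further score source, but the fusion pattern is the same.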
Digital Logging of Physical Activities and Context
Enhancing Emergency and Security/Privacy Functionalities in Mobile Phones
• Unexpected physical events experienced by a mobile phone can be signs of critical security or emergency scenarios:
• The phone being at risk of loss or theft: confidential information on the phone can be exposed
• The phone user experiencing an accident
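Detecting such unexpected physical events can be sketched as anomaly detection on the accelerometer magnitude: flag samples that deviate strongly from the recent running average. The window size and threshold are illustrative assumptions:

```python
# Sketch of unexpected-physical-event detection: flag accelerometer
# magnitudes far from the recent running mean (toy thresholds).

from collections import deque

def detect_events(samples, window=5, threshold=2.0):
    """Return indices where |a| deviates strongly from the recent average."""
    recent = deque(maxlen=window)
    events = []
    for i, a in enumerate(samples):
        if len(recent) == recent.maxlen and \
           abs(a - sum(recent) / len(recent)) > threshold:
            events.append(i)       # e.g. phone dropped, snatched, or impact
        recent.append(a)
    return events

# Steady ~1 g readings, then a sudden spike (impact) at index 7:
print(detect_events([1.0, 1.1, 0.9, 1.0, 1.0, 1.1, 0.9, 8.0, 1.0]))
```

A detected event could then trigger locking of confidential data or an emergency call, as motivated above.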
MobileHCI 2009, Ubicomp 2009
Digital Logging of Physical Activities and Context: Entertainment
What Type of Music Would You Like to Hear?
Automatic selection of music based on context:
• Actual activity of the user
• Audio activity in the environment
• Habits and music taste can also be integrated
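A minimal sketch of such context-driven selection, assuming an activity label and an ambient-audio level are already available; the contexts, playlists, and thresholds are illustrative:

```python
# Sketch of context-driven music selection: map detected activity and
# ambient audio to a playlist, with stored taste able to override.

def pick_playlist(activity, ambient_noise, taste=None):
    """Choose a playlist from context; `taste` maps defaults to favourites."""
    if activity == "running":
        choice = "high-tempo"
    elif ambient_noise > 0.7:          # loud environment, e.g. a party
        choice = "dance"
    else:
        choice = "ambient"
    # Habits / music taste can override the generic rule.
    return taste.get(choice, choice) if taste else choice

print(pick_playlist("running", 0.2))                       # "high-tempo"
print(pick_playlist("sitting", 0.1, {"ambient": "jazz"}))  # "jazz"
```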
11th International ACM Conference on Computers and Accessibility (ASSETS 2009)
Interaction with Mobile User Interface
Sending commands
Turning pages
Zooming
Click and double click
Calling an application or service
MagiSign: “3D Magnetic Signatures” for User Identification/Authentication
•The user creates his own arbitrary 3D signature using a properly shaped magnet in the 3D space around the device.
• Wider choice for authentication as it can be flexibly drawn in 3D space around the device.
• No hardcopy of 3D magnetic signature can be easily generated.
• Unlike regular signatures, it cannot be affected by the quality of paper, pen, ink, etc.
•3D Magnetic Signature:
• A simple 3D motion
• The regular signature of the user, drawn in the air!
• Any other combination of even higher complexity, actively using all the 3D space around the device
•A magnet as a physical key? A personalized magnet in terms of shape and polarity can enhance the authentication process …
•Can be used for accessing a service or data, entrance doors, or simply instead of regular signature during a purchase …
Even simple gestures may be used for authentication
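Verifying such a 3D signature could be done by matching the recorded magnetometer trace against an enrolled template with dynamic time warping (DTW), so the same gesture drawn at a different speed still matches. The sequences and threshold below are toy assumptions, not the actual MagiSign matcher:

```python
# Sketch of 3-D gesture verification with dynamic time warping over
# sequences of magnetometer readings (x, y, z); toy data and threshold.

import math

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW over sequences of 3-D points."""
    INF = float("inf")
    cost = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[len(a)][len(b)]

def verify(template, attempt, threshold=1.0):
    """Accept the attempt if its warped distance to the template is small."""
    return dtw_distance(template, attempt) <= threshold

enrolled = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
slower   = [(0, 0, 0), (0.5, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
print(verify(enrolled, slower))   # same shape, different speed -> True
```

A personalized magnet (shape and polarity) would change the field pattern itself, adding a second factor on top of the gesture.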
MagiWrite: Write It in the Air!
•Text entry based on magnetic field interaction
•Character shaped gestures are written in the space around the device
•Suitable for dialling a number, entering a pin code, selecting a text entry, etc.
•Especially useful for very small mobile devices, where it is hard to operate or design small keypads or touch screens
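Recognizing the characters written in the air could work by reducing each gesture to a fixed number of points and matching it to the nearest stored template. The 2-D toy strokes below stand in for the 3-D magnetic trace; templates and the resampling size are illustrative assumptions:

```python
# Sketch of in-air character recognition: resample a stroke to a fixed
# number of points, then pick the nearest template (toy 2-D strokes).

import math

def resample(points, n=8):
    """Pick n evenly spaced points so fast and slow writing compare fairly."""
    step = (len(points) - 1) / (n - 1)
    return [points[round(i * step)] for i in range(n)]

def recognize(stroke, templates, n=8):
    """Return the name of the template with the smallest total distance."""
    s = resample(stroke, n)
    def score(tpl):
        t = resample(tpl, n)
        return sum(math.dist(p, q) for p, q in zip(s, t))
    return min(templates, key=lambda name: score(templates[name]))

templates = {
    "1": [(0, 0), (0, 1), (0, 2)],                 # straight vertical stroke
    "7": [(0, 2), (1, 2), (0.5, 1), (0, 0)],       # top bar, then diagonal
}
drawn = [(0, 0), (0, 0.5), (0, 1), (0, 1.5), (0, 2)]
print(recognize(drawn, templates))  # matches "1"
```

This is enough for dialling digits or PIN entry; a full text-entry system would add segmentation between characters.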
MagiEntertain: Using Magnetic Interaction in Mobile Entertainment Applications (Gaming and Audio Synthesis)
•Conventionally, touch pads and touch screens are used for gaming
• Screen occlusion
•MagiGame: Actions of a game avatar, such as shooting, jumping, and changing the aim, can be controlled by moving the magnet
•No screen occlusion, natural gesture based interaction, more actions per minute, possibility of multi-player gaming on a device
•Adjusting different audio and DJ effects based on position, orientation, and movements of the magnet
•Changing sound volume and audio tracks in a portable music player
•New music instruments …, two players can play on the same instrument
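Mapping the magnet's measured field to audio controls can be sketched directly: field strength (i.e. how close the magnet is) sets the volume, and the orientation of the field vector steps through tracks. The mapping ranges below are illustrative assumptions:

```python
# Sketch of magnet-driven audio control: derive volume from field
# strength and a track index from the field direction (toy ranges).

def field_to_controls(bx, by, bz, n_tracks=4):
    """Map a magnetometer reading (microtesla) to (volume, track index)."""
    strength = (bx * bx + by * by + bz * bz) ** 0.5
    volume = max(0.0, min(1.0, strength / 100.0))   # clamp to [0, 1]
    # Use the x-axis share of the field direction to step through tracks.
    track = int(abs(bx) / (strength + 1e-9) * (n_tracks - 1))
    return volume, track

print(field_to_controls(30.0, 0.0, 40.0))
```

Smoothing and hysteresis would be needed in practice so the track does not flicker at boundaries, but the mapping idea is this simple.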
Literature
Basics:
• Lawrence Rabiner and Biing-Hwang Juang: "Fundamentals of Speech Recognition" (Prentice Hall, 1993)
• Bernd Pompino-Marschall: "Einführung in die Phonetik" [Introduction to Phonetics] (de Gruyter, 1995)
• Richard O. Duda, Peter E. Hart, David G. Stork: "Pattern Classification" (Wiley, 2000)
• Keinosuke Fukunaga: "Introduction to Statistical Pattern Recognition" (Academic Press, 1990)
• Thomas H. Cormen et al.: "Introduction to Algorithms" (MIT Press, 1990)
Automatic Speech Recognition:
• Ernst Günter Schukat-Talamazzini: "Automatische Spracherkennung: Grundlagen, statistische Modelle und effiziente Algorithmen" [Automatic Speech Recognition: Foundations, Statistical Models, and Efficient Algorithms] (Vieweg, 1995)
• Andreas Wendemuth: "Grundlagen der stochastischen Sprachverarbeitung" [Foundations of Stochastic Language Processing] (Oldenbourg, 2004)
• Tanja Schultz and Katrin Kirchhoff: "Multilingual Speech Processing" (Academic Press, 2006)
• Frederick Jelinek: "Statistical Methods for Speech Recognition" (MIT Press, 1997)