
The Design of Multidimensional Sound Interfaces

Michael Cohen & Elizabeth M. Wenzel

Presented by:

Andrew Snyder & Thor Castillo

February 3, 2000 HFE760 - Dr. Gallimore


Table of Contents

• Introduction – How we localize sound

• Chapter 8

• Research

• Conclusion

Introduction

• Ear Structure

• Binaural Beats - Demo

• Why are they important?

• Localization Cues

Introduction

• Ear Structure

Introduction

• Binaural Beats – Demo

• Why are they Important?

Introduction

• Localization Cues

– Humans use auditory localization cues to help locate the position of a sound source in space. There are eight sources of localization cues:

• interaural time difference

• head shadow

• pinna response

• shoulder echo

• head motion

• early echo response

• reverberation

• vision

Introduction

• Localization Cues

– Interaural time difference describes the time delay between sounds arriving at the left and right ears.

– This is a primary localization cue for interpreting the lateral position of a sound source.
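To make this cue concrete, here is a minimal sketch (not from the original slides) that estimates ITD using Woodworth's spherical-head approximation; the head radius and azimuth values are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C
HEAD_RADIUS = 0.0875    # m; a commonly assumed average head radius

def itd_woodworth(azimuth_deg: float) -> float:
    """Estimate the interaural time difference (in seconds) for a source
    at the given azimuth, via Woodworth's spherical-head approximation:
    ITD = (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A source straight ahead gives zero delay; a source 90 degrees to the
# side gives the maximum ITD, roughly 0.66 ms.
for az in (0, 30, 60, 90):
    print(f"azimuth {az:3d} deg -> ITD {itd_woodworth(az) * 1e3:.3f} ms")
```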

Introduction

• Localization Cues

– Head shadow describes a sound having to go through or around the head in order to reach an ear.

– The filtering effects of head shadow complicate the perception of the linear distance and direction of a sound source.

Introduction

• Localization Cues

– Pinna response describes the effect that the external ear, or pinna, has on sound.

– Higher frequencies are filtered by the pinna in such a way as to affect the perceived lateral position, or azimuth, and elevation of a sound source.

Introduction

• Localization Cues

– Shoulder echo - Frequencies in the range of 1-3 kHz are reflected from the upper torso of the human body.

Introduction

• Localization Cues

– Head motion - Moving the head to help determine the location of a sound source is a natural and key factor in human hearing.

Introduction

• Localization Cues

– Early echo response and reverberation - Sounds in the real world are the combination of the original sound source plus their reflections from surfaces in the world (floors, walls, tables, etc.).

– Early echo response occurs in the first 50-100 ms of a sound's life.
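As a rough illustration (an assumption-laden sketch, not the chapter's model), a room response can be treated as a direct-path impulse plus a few delayed, attenuated early reflections; convolving a dry signal with it adds the echoes:

```python
import numpy as np

FS = 44_100  # sample rate, Hz

def toy_room_response(early_echoes=((0.015, 0.5), (0.032, 0.35), (0.047, 0.25))):
    """Build a toy impulse response: a unit direct path at t=0 plus early
    reflections given as (delay_seconds, gain) pairs, all falling well
    within the early part of a 100 ms window."""
    h = np.zeros(int(0.1 * FS))  # 100 ms of response
    h[0] = 1.0                   # the direct sound
    for delay_s, gain in early_echoes:
        h[int(delay_s * FS)] = gain
    return h

# Convolve a short noise burst (standing in for a dry source) with the
# toy response; the output is the source plus its early reflections.
dry = np.random.default_rng(0).standard_normal(FS // 10)
wet = np.convolve(dry, toy_room_response())
print(dry.shape, wet.shape)  # (4410,) (8819,)
```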

Introduction

• Localization Cues

– Vision helps us quickly locate a sound source's physical position and confirm the direction that we perceive.

Chapter 8 Contents

• Introduction

• Characterization and Control of Acoustic Objects

• Research Applications

• Interface Control via Audio Windows

• Interface Issues: Case Studies

Introduction

• I/O generations and dimensions

• Exploring the audio design space

Introduction

• I/O generations and dimensions

– First Generation - Early computer terminals allowed only textual I/O – Character-based user interface (CUI)

– Second Generation - As terminal technology improved, users could manipulate graphical objects – Graphical User Interface (GUI)

– Third Generation - 3D graphical devices.

– 3D audio: The sound has a spatial attribute, originating, virtually or exactly, from an arbitrary point with respect to the listener. This chapter focuses on the third generation of the aural sector.

Introduction

• Exploring the audio design space

– Most people think that it would be easier to be hearing- than sight-impaired, even though the incidence of disability-related cultural isolation is higher among the deaf than the blind.

– The development of user interfaces has historically been focused more on visual modes than aural.

– Sound is frequently included and utilized to the limits of its availability and affordability in PCs. However, computer-aided exploitation of audio bandwidth is only now beginning to rival that of graphics.

– Because of the cognitive overload that results from overburdening other systems (perhaps especially the visual), there are strong motivations for exploiting sound to its full potential.

Introduction

• Exploring the audio design space

– This chapter reviews the evolving state of the art of non-speech audio interfaces, driving both spatial and non-spatial attributes.

– This chapter will focus primarily on the integration of these new technologies – crafting effective matches between projected user desires and emerging technological capabilities.

Characterization and Control of Acoustic Objects

Part of listening to a mixture of conversations or music is being able to hear the individual voices or musical instruments. This synthesis/decomposition duality is the opposite of masking: instead of sounds hiding each other, they are complementary and individually perceivable.

Audio imaging – the creation of sonic illusions by manipulation of stereo channels.

Stereo system – sound comes from only left and right transducers, whether headphones or loudspeakers.

Spatial sound involves technology that allows sound to emanate from any direction. (left-right, up-down, back-forth, and everything in between)

Characterization and Control of Acoustic Objects

The cocktail party effect… we can filter sound according to:

• position
• speaker voice
• subject matter
• tone/timbre
• melodic line and rhythm

Characterization and Control of Acoustic Objects

• Spatial dimensions of sound

• Implementing spatial sound

• Non-spatial dimensions and auditory symbology

Characterization and Control of Acoustic Objects

• Spatial dimensions of sound

– The goal of spatial sound synthesis is to project audio media into space by manipulating sound sources so that they assume virtual positions, mapping the source channel into three-dimensional space. These virtual positions enable auditory localization.

– Duplex Theory (Lord Rayleigh, 1907) – human sound localization is based on two primary cues to location, interaural differences in time of arrival and interaural differences in intensity.  

Characterization and Control of Acoustic Objects

• Spatial dimensions of sound

– There are several problems with the duplex theory:

• Cannot account for the ability of subjects to localize many types of sounds coming from many different regions (e.g., sounds along the median plane)

• When duplex cues are used to generate sound over headphones, the sound is perceived inside the head

• Most of the deficiencies with the duplex theory are linked to the interaction of sound waves in the pinnae (outer ears)

Characterization and Control of Acoustic Objects

• Spatial dimensions of sound

– Peaks and valleys in the auditory spectrum can be used as localization cues for the elevation of a sound source, but other cues are also necessary to locate its vertical position. This remains very important to researchers, since elevation perception has never been fully understood.

Characterization and Control of Acoustic Objects

• Spatial dimensions of sound

– Localization errors are very common in current sound-generating technologies. Some of the problems that persist are:

• Locating sound on the vertical plane

• Some systems can cause a front ↔ back reversal

• Some systems can cause an up ↔ down reversal

• Judging distance from the sound source – we are generally poor at this anyway

– Sound localization can be dramatically improved with a dynamic stimulus (which can reduce the number of reversals)

• Allowing head motion

• Moving the location of the sound

• Researchers suggest that this can also help externalize sound

Characterization and Control of Acoustic Objects

• Implementing spatial sound

– Physically locating loudspeakers where each source is located, relative to the listener (the most direct approach)

• Not portable – cumbersome

– Other approaches use analytic mathematical models of the pinnae and other body structures in order to directly calculate acoustic responses.

– A third approach to accurate real-time spatialization concentrates on digital signal processing (DSP) techniques for synthesizing cues from direct measurements of head-related transfer functions. (The author focuses on this type of approach.)

Characterization and Control of Acoustic Objects

• Implementing spatial sound

– DSP – The goal is to build sound spatializers that give the impression that the sound is coming from different sources and different locations.

– Why? - A display that focuses on this technology can exploit the human ability to quickly and subconsciously locate sound sources.

– Convolution – Hardware- and/or software-based engines perform the convolution that filters the sound in some DSP systems (see the sketch below).
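A minimal sketch of such a convolution-based spatializer, assuming a measured head-related impulse response (HRIR) pair is available; the HRIR arrays below are dummy placeholders, not real HRTF data:

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(mono: np.ndarray, hrir_left: np.ndarray,
               hrir_right: np.ndarray) -> np.ndarray:
    """Filter a mono signal with a left/right head-related impulse
    response pair; over headphones, the resulting stereo signal is
    heard at the position where the HRIR pair was measured."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

# Dummy HRIRs: in practice these come from measured HRTF datasets.
mono = np.random.default_rng(1).standard_normal(44_100)
hrir_l = np.zeros(256); hrir_l[10] = 1.0   # near ear: short delay, full level
hrir_r = np.zeros(256); hrir_r[40] = 0.8   # far ear: longer delay, attenuated
stereo = spatialize(mono, hrir_l, hrir_r)
print(stereo.shape)  # (44355, 2)
```

Per-source convolution of this kind is the operation that engines such as the Convolvotron (next slide) accelerate in hardware.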

Characterization and Control of Acoustic Objects

• Implementing spatial sound

– Crystal River Engineering Convolvotron

– Gehring Research Focal Point

– AKG CAP (Creative Audio Processor)

– Head Acoustics

– Roland Sound Space (RSS) Processor

– Mixels

Characterization and Control of Acoustic Objects

• Implementing spatial sound

– Crystal River Engineering Convolvotron

– What is it? – It is a convolution engine that spatializes sound by filtering audio channels with transfer functions that simulate positional effects.

• Alphatron & Acoustetron II

– The technology is good except for computational time delays of 30-40 ms (which can be picked up by the ear when paired with visual inputs)

Characterization and Control of Acoustic Objects

• Implementing spatial sound

– Gehring Research Focal Point

– What is it? – Focal Point™ comprises two binaural localization technologies, Focal Point Type 1 and 2.

• Focal Point 1 – the original Focal Point technology, utilizing time-domain convolution with head-related transfer function (HRTF) based impulse responses for anechoic simulation.

• Focal Point 2 – a Focal Point implementation in which sounds are preprocessed offline, creating interleaved sound files which can then be positioned in 3D in real-time upon playback.

Characterization and Control of Acoustic Objects

• Implementing spatial sound

– AKG CAP (Creative Audio Processor)

– What is it? – A kind of binaural mixing console. The system is used to create audio recordings with integrated head-related transfer functions and other 3D audio filters.

Characterization and Control of Acoustic Objects

• Implementing spatial sound

– Head Acoustics

– What is it? – A research company in Germany that has developed a spatial audio system with an eight-channel binaural mixing console using anechoic simulations, as well as a new version of an artificial head.

Characterization and Control of Acoustic Objects

• Implementing spatial sound

– Roland Sound Space (RSS) Processor

– What is it? – Roland has developed a system which attempts to provide real-time spatialization capabilities for both headphone and stereo loudspeaker presentation. The basic RSS system allows independent placement of up to four sources using convolution.

– What makes this system special is that it incorporates a technique known as transaural processing, or crosstalk cancellation between the stereo speakers. This technique seems to allow an adequate spatial impression to be achieved.
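The idea behind crosstalk cancellation can be sketched as inverting, for each frequency bin, the 2x2 matrix of speaker-to-ear transfer functions, so each binaural channel arrives only at its intended ear. This toy version uses an assumed transfer matrix and Tikhonov regularization; it illustrates the principle, not Roland's actual algorithm:

```python
import numpy as np

def crosstalk_cancel(binaural_spec, H, eps=1e-3):
    """Given per-bin binaural spectra b (shape [bins, 2]) and the
    speaker-to-ear transfer matrix H (shape [bins, 2, 2], indexed
    H[bin][ear][speaker]), solve H x = b for the loudspeaker feeds x
    with Tikhonov regularization, cancelling the crosstalk from each
    speaker to the opposite ear."""
    Hh = np.conj(np.transpose(H, (0, 2, 1)))   # Hermitian transpose per bin
    inv = np.linalg.inv(Hh @ H + eps * np.eye(2)) @ Hh
    return np.einsum('kij,kj->ki', inv, binaural_spec)

# Toy case, one frequency bin: unit direct paths, 0.3-gain cross paths.
H = np.array([[[1.0, 0.3],
               [0.3, 1.0]]])
b = np.array([[1.0, 0.0]])          # desired: signal at the left ear only
x = crosstalk_cancel(b, H)
print(np.abs(H @ x[..., None]).ravel())  # ~[1, 0]: crosstalk suppressed
```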

Characterization and Control of Acoustic Objects

• Implementing spatial sound

– Mixels

– The number of channels in a system corresponds to its degree of spatial polyphony – the number of simultaneous spatialized sound sources the system can generate. On the assumption that systems will enormously increase their capabilities via the number of channels, we give that number a name.

– By way of analogy to pixels and voxels, the atomic level of sound mixing is sometimes called mixels, acronymic for sound mixing elements.

Characterization and Control of Acoustic Objects

• Implementing spatial sound

– Mixels

– Rather than diving more deeply into spatial audio systems, the rest of the chapter concentrates on the nature of the control interfaces that will need to be developed to take full advantage of these new capabilities.

Characterization and Control of Acoustic Objects

• Non-spatial dimensions and auditory symbology

– Auditory icons - acoustic representations of naturally occurring events that caricature the action being represented

– Earcons – elaborated auditory symbols which compose motifs into artificial non-speech language, phrases distinguished by rhythmic and tonal patterns
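As a toy illustration of the earcon idea (the motif-to-event mappings below are invented for the example, not taken from the chapter), short tonal motifs with distinct rhythmic and tonal patterns can be synthesized and assigned to interface events:

```python
import numpy as np

FS = 22_050  # sample rate, Hz

def tone(freq, dur, amp=0.5):
    """Render one note as a sine burst with 10 ms linear fades
    at each end to avoid clicks."""
    t = np.arange(int(dur * FS)) / FS
    env = np.minimum(1.0, np.minimum(t, t[::-1]) / 0.01)
    return amp * env * np.sin(2 * np.pi * freq * t)

def earcon(notes):
    """Compose a motif from (frequency_hz, duration_s) pairs; the
    rhythmic and tonal pattern is what distinguishes one earcon
    from another."""
    return np.concatenate([tone(f, d) for f, d in notes])

# Two hypothetical motifs: a rising pair for a completed action,
# a falling triplet for an error condition.
done = earcon([(440, 0.12), (660, 0.20)])
error = earcon([(660, 0.08), (550, 0.08), (440, 0.25)])
print(done.shape, error.shape)
```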

Characterization and Control of Acoustic Objects

• Non-spatial dimensions and auditory symbology

– Filtears – a class of cues that are independent of distance and direction, used in an attempt to expand the spectrum of how we use sound, by creating sounds with attributes attached to them. Think of it as sonic typography: placing sound in space can be likened to putting written information on a page. Filtears are dependent on source and sink.

– Example: Imagine you're telenegotiating with many people. You can select attributes of each person's voice (distance from you, direction, indoors/outdoors, whispers behind your ear, etc.).

Research Applications

• Virtual acoustic displays featuring spatial sound can be thought of as enabling two performance advantages:

– Situation Awareness – Omnidirectional monitoring via direct representation of spatial information reinforces or replaces information in other modalities, enhancing one’s sense of presence or realism.

– Multiple Channel Segregation – can improve intelligibility, discrimination, selective attention among audio sources.

Research Applications

• Sonification

• Teleconferencing

• Music

• Virtual Reality and Architectural Acoustics

• Telerobotics and Augmented Audio Reality

Research Applications

• Sonification

• Sonification can be thought of as auditory visualization and can be used as a tool for analysis, for example, presenting multivariate data as auditory patterns. Because visual and auditory channels can be independent from each other, data can be mapped differently to each mode of perception, and auditory mappings can be used to discover relationships that are hidden in the visual display.
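A minimal sonification sketch along these lines (the pitch mapping is an assumption made for illustration): map each value of a data series to the pitch of a short tone, so that trends in the data become audible melodic contours.

```python
import numpy as np

FS = 22_050  # sample rate, Hz

def sonify(values, f_lo=220.0, f_hi=880.0, dur=0.15):
    """Map each data value to a short sine tone whose pitch scales
    linearly from f_lo to f_hi across the data range, then
    concatenate the tones into one audio signal."""
    v = np.asarray(values, dtype=float)
    span = np.ptp(v)
    norm = (v - v.min()) / (span if span else 1.0)
    t = np.arange(int(dur * FS)) / FS
    return np.concatenate(
        [np.sin(2 * np.pi * (f_lo + n * (f_hi - f_lo)) * t) for n in norm])

# A rising-then-falling series becomes a rising-then-falling melody;
# write `audio` to a WAV file or a sound device to listen.
audio = sonify([1, 3, 5, 8, 6, 4, 2])
print(audio.shape)  # (23149,)
```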

Interface Control via Audio Windows

• Audio Windows is an auditory-object manager.

• The general idea is to permit multiple simultaneous audio sources, such as a teleconference, to coexist in a modifiable display without clutter or user stress.

Interface Design Issues: Case Studies

• Veos and Mercury (written with Brian Karr)

• Handy Sound

• Maw

Interface Design Issues: Case Studies

• Veos and Mercury (written with Brian Karr)

– Veos - Virtual Environment Operating System

– Sound Render Implementation - A software package that interfaces with a VR system (like Veos).

– The Audio Browser - A hierarchical sound file navigation and audition tool.

Interface Design Issues: Case Studies

• Handy Sound

– Handy Sound explores gestural control of an audio window system.

– Manipulating source position in Handy Sound

– Manipulating source quality in Handy Sound

– Manipulating sound volume in Handy Sound

– Summary - Handy Sound demonstrates the general possibilities of gesture recognition and spatial sound in a multichannel conferencing environment.

Interface Design Issues: Case Studies

• Maw

– Developed as an interactive frontend for teleconferencing, Maw allows the user to arrange sources and sinks in a horizontal plane.

– Manipulating source and sink positions in Maw

– Organizing acoustic objects in Maw

– Manipulating sound volume in Maw

– Summary

Conclusion

Real-world examples

Sound authoring tools for future multimedia systems

Bezzi, Marco; De Poli, Giovanni; Rocchesso, Davide

Univ di Padova, Padova, Italy

Summary

• A framework for authoring non-speech sound objects in the context of multimedia systems is proposed. The goal is to design specific sounds and their dynamic behavior in such a way that they convey dynamic and multidimensional information. Sounds are designed using a three-layer abstraction model: a physically-based description of sound identity, a signal-based description of sound quality, and a perception- and geometry-based description of sound projection in space. The model is validated with the aid of an experimental tool where manipulation of sound objects can be performed in three ways: handling a set of parameter control sliders, editing the evolution in time of compound parameter settings, or via client applications sending their requests to the sounding engine. [Author abstract; 26 Refs; In English]

Conference Information: Proceedings of the 1999 6th International Conference on Multimedia Computing and Systems - IEEE ICMCS'99; Jun 7-Jun 11 1999; Florence, Italy; Sponsored by IEEE CS; IEEE Circuit and Systems Society

Interactive 3D sound hyperstories for blind children

Lumbreras, Mauricio; Sanchez, Jaime Univ of Chile, Santiago, Chile

Summary

• Interactive software is currently used for learning and entertainment purposes.

This type of software is not very common among blind children because most computer games and electronic toys do not have appropriate interfaces to be accessible without visual cues. This study introduces the idea of interactive hyperstories carried out in a 3D acoustic virtual world for blind children. We have conceptualized a model to design hyperstories. Through AudioDoom we have an application that enables testing cognitive tasks with blind children. The main research question underlying this work explores how audio-based entertainment and spatial sound navigable experiences can create cognitive spatial structures in the minds of blind children. AudioDoom presents first person experiences through exploration of interactive virtual worlds by using only 3D aural representations of the space. [Author abstract; 21 Refs; In English]

Conference Information: Proceedings of the CHI 99 Conference: CHI is the Limit - Human Factors in Computing Systems; May 15-May 20 1999; Pittsburgh, PA, USA; Sponsored by ACM SIGCHI

Any questions?

References

• Modeling Realistic 3-D Sound Turbulence
– http://www-engr.sjsu.edu/~duda/Duda.Reports.html#R1

• 3D Sound Aids for Fighter Pilots
– http://www.dsto.defence.gov.au/corporate/history/jubilee/sixtyyears18.html

• 3D Sound Synthesis
– http://www.ee.ualberta.ca/~khalili/3Dnew.html

• Binaural Beat Demo
– http://www.monroeinstitute.org/programs/bbapplet.html

• Begault, Durand R. "Challenges to the Successful Implementation of 3-D Sound", NASA-Ames Research Center, Moffett Field, CA, 1990.

• Begault, Durand R. "An Introduction to 3-D Sound for Virtual Reality", NASA-Ames Research Center, Moffett Field, CA, 1992.

References

• Burgess, David A. "Techniques for Low Cost Spatial Audio", UIST 1992.

• Foster, Wenzel, and Taylor. "Real-Time Synthesis of Complex Acoustic Environments", Crystal River Engineering, Groveland, CA.

• Smith, Stuart. "Auditory Representation of Scientific Data", Focus on Scientific Visualization, H. Hagen, H. Muller, G.M. Nielson, eds., Springer-Verlag, 1993.

• Stuart, Rory. "Virtual Auditory Worlds: An Overview", VR Becomes a Business, Proceedings of Virtual Reality 92, San Jose, CA, 1992.

• Takala, Tapio and James Hahn. "Sound Rendering", Computer Graphics, 26(2), July 1992.

One last thing

• For those who want to have a little fun, try this:

http://www.cs.indiana.edu/picons/javoice/index.html


http://ourworld.compuserve.com/homepages/Peter_Meijer/javoice.htm
