

Technology Trends in Audio Engineering
A report by the AES Technical Council

INTRODUCTION
Technical Committees are centers of technical expertise within the AES. Coordinated by the AES Technical Council, these committees track trends in audio in order to recommend to the Society papers, workshops, tutorials, master classes, standards, projects, publications, conferences, and awards in their fields. The Technical Council serves the role of the CTO for the Society. Currently there are 22 such groups of specialists within the council. Each consists of members from diverse backgrounds, countries, companies, and interests. The committees strive to foster wide-ranging points of view and approaches to technology. Please go to http://www.aes.org/technical/ to learn more about the activities of each committee and to inquire about membership. Membership is open to all AES members as well as those with a professional interest in each field.

Technical Committee meetings and informal discussions held during regular conventions serve to identify the most current and upcoming issues in the specific technical domains concerning our Society. The TC meetings are open to all convention registrants. With the addition of an internet-based Virtual Office, committee members can conduct business at any time and from any place in the world.

One of the functions of the Technical Council and its committees is to track new, important research and technology trends in audio and report them to the Board of Governors and the Society's membership. This information helps the governing bodies of the AES to focus on items of high priority. Supplying this information puts our technical expertise to greater use for the Society. In the following pages you will find an edited compilation of the reports recently provided by many of the Technical Committees.

Francis Rumsey
Chair, AES Technical Council

Bob Schulein, Jürgen Herre, Michael Kelly
Vice Chairs

AUDIO FOR TELECOMMUNICATIONS
Bob Zurek, Chair
Antti Kelloniemi, Vice Chair

The trend in mobile telecommunications across the globe has been toward devices that are the mobile window into a person's digital universe. In 2013 smartphones outsold feature phones globally for the first time on a unit basis. Several companies have focused efforts on creating inexpensive smartphones for the emerging markets, where, in many cases, these devices may be the only link those consumers have to the internet and the rest of the world. This expansion has resulted in devices that have many more communications bands and codecs than in the past. It is not uncommon to find single devices that will communicate via UMTS, GSM, CDMA, LTE (VoLTE), and Wi-Fi (VoIP). Each of these communication modes may have to support multiple voice codecs, depending on the network or service provider. In all cases the switching between modes and the performance of the system must be seamless, as the performance of past devices is no longer acceptable to the consumer.

From a voice-quality standpoint, the AMR Wideband codec is in widespread use globally, and the 3GPP EVS (Enhanced Voice Services) codec has been finalized and is being developed for commercialization.

The audio recording capabilities of the devices have improved to offer high bit rates and, in some cases, 24-bit capture. There are also many products on the market that allow multichannel audio recording via internal microphone arrays to complement the video recording on the devices.

Voice-controlled user interfaces in mobile devices have become commonplace, with new advances being introduced each product generation, such as natural-language control, always-on voice recognition that does not require the user to touch the device, and user-defined triggers to wake up the recognition engine.

Advancements in usability in all environments have come in the form of noise-adaptive technology in voice calls, content playback, and voice control. Voice communication and voice recognition have benefited from algorithms that adapt uplink noise reduction or algorithm parameters based on noise type and level. Likewise, downlink audio and content playback have benefited from noise-adaptive algorithms. Another new feature that assists usability in various environments has been the use of smart amplifier technology.


This technology constantly monitors the content and the loudspeaker to produce optimal loudness or bandwidth while maintaining safe drive levels for the speaker. Paired with new high-voltage amplifiers, it produces impressive audio quality in many environments.

On the content side of the business, the trend has been from localized content to cloud-based content. This allows the consumer to access the same content across all devices, which has posed challenges for small, power-conscious mobile devices.

The move from portable phones with a few extra features to the mobile hub of the consumer's digital world is reflected in the standards world, where the scope of standards, as well as their names, has had to change to keep up.

AUDIO FORENSICS
Jeff M. Smith, Chair
Daniel Rappaport, Vice Chair
Eddy Bøgh Brixen, Vice Chair

Forensic audio has important applications in law and investigations, and these continue to grow as the ability to record and share digital media proliferates. It is therefore important for the practitioner working with forensic audio to be trained in, and to apply, processes related to proper evidence handling and laboratory procedures. Analog audio evidence, once commonly presented on compact cassette and microcassette tapes, makes up an increasingly small percentage of cases. Digital audio evidence, which in practice is closely related to and often generated by computer forensics investigations, requires handling practices such as imaging physical digital media, hashing file duplicates, authenticating recordings, and recovering and/or repairing corrupt or carved files.
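As a concrete illustration of the duplicate-hashing step, the following minimal Python sketch computes a cryptographic digest of an evidence image and verifies a working copy against it; the file names are hypothetical, and laboratory workflows add documentation and chain-of-custody records around this.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hash the original evidence image once, at acquisition time...
original_digest = sha256_of("evidence_image.dd")

# ...then verify that a working copy is a bit-exact duplicate.
assert sha256_of("working_copy.dd") == original_digest, "copy does not match original"
```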

Modern challenges for the forensic audio examiner have expanded to include dealing with evidence from social networks and sharing websites such as YouTube and Facebook. These pose unique challenges in attributing provenance due to processing applied automatically by the hosting site.

Additionally, growing concern for privacy rights has led citizens and providers of Internet services to question the authority that law enforcement entities may (or may not) have in requesting and accessing digital media hosted on these sites. Similarly, mobile phone manufacturers have adopted policies and operating system designs resistant to the collection of probative material and the use of "backdoor" access to password-protected devices. As social media awareness and use continue to evolve, so will public and commercial policies related to privacy rights.

The use of probabilistic methods in the analysis and presentation of scientific evidence is of increasing interest to the forensic sciences as a whole. In audio forensics it is of particular interest in voice comparison, where an unknown speaker can be compared to a known suspect or set of suspects within a reference population in order to derive a Bayesian likelihood of similarity. We anticipate further developments in this field and the possible integration of methods based upon such techniques in other areas of forensic audio.
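The core of the likelihood-ratio framework can be shown with a deliberately simplified numerical sketch, assuming a single Gaussian-distributed comparison score; operational systems use multivariate models trained on reference populations, but the ratio of the two likelihoods is the same idea.

```python
import math

def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical scalar feature extracted from the questioned recording.
feature = 1.8

# "Same speaker" model fitted to the known suspect's recordings, and a
# "different speaker" model fitted to a reference population (toy values).
p_same = gaussian_pdf(feature, mean=2.0, var=0.25)
p_diff = gaussian_pdf(feature, mean=0.0, var=1.0)

# A likelihood ratio above 1 supports the same-speaker hypothesis;
# the trier of fact supplies the prior odds, not the examiner.
lr = p_same / p_diff
print(f"likelihood ratio: {lr:.2f}")
```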

Difficulties in obtaining successful audio enhancement can be exacerbated by the lossy data compression algorithms commonly used within small digital recorders, data compression and bandwidth-limited signals in telecommunications, and the non-ideal recording environments common to surveillance and security. In the presence of heavy data compression, classical algorithm designs may be challenged, and this may lead to the development of new processing technologies.

Numerous papers on audio forensics appear in the Journal of the AES and are presented at AES conventions each year. There have been five AES conferences on audio forensics (the 26th in 2005, 33rd in 2008, 39th in 2010, 46th in 2012, and 54th in 2014), with the next slated for 2016. Additionally, regular workshops and tutorials organized by the TC-AF exploring these and many other emerging trends appear at AES conventions in the US and Europe each year.


CODING OF AUDIO SIGNALS
Jürgen Herre and Schuyler Quackenbush, Chairs

Introduction
The AES Technical Committee on Coding of Audio Signals is composed of experts in perceptual and lossless audio coding. The topics considered by the committee include signal processing for audio data reduction and associated rendering. An understanding of auditory perception, models of the human auditory system, and subjective sound quality are key to achieving effective signal compression.

Audio coding has undergone a tremendous evolution since its commercial beginnings in the early nineties. Today, an audio codec provides many more functionalities beyond just bitrate reduction while preserving sound quality. A number of examples are given below.

Evolution of the coder structure
In the nineties, a classic audio coder consisted of four building blocks: analysis filterbank, perceptual model, quantization and entropy coding of spectral coefficients, and bitstream multiplexer. Later, more and more tools were added to enhance the codec behavior for critical types of input signals, such as transient and tonal solo instruments, and to improve coding of stereo material. More recently, this structure was augmented by pre/post processors that provide significantly enhanced performance at low bit rates.

First, spatial preprocessors perform a downmix of the codec input channels (two-channel stereo, 5.1, or more) into a reduced number of waveforms plus parametric side information describing the properties of the original spatial sound image. Thus, the load on the encoder core is reduced significantly. Conversely, on the decoder side, the transmitted waveforms are upmixed again using the transmitted spatial side information to reproduce a perceptual replica of the original sound image. This allows transmitting spatial sound even at low bit rates. Examples of such spatial pre/postprocessors are MPEG Parametric Stereo (PS), MPEG Surround, MPEG Spatial Audio Object Coding (SAOC), and, as a more complex system, MPEG-H 3D Audio.
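The parametric idea can be illustrated with a much-simplified sketch: downmix stereo to mono, keep one inter-channel level difference per block as side information, and upmix by re-panning. Real systems such as MPEG PS operate on filterbank subbands and also transmit phase and coherence cues; the block size and gain rule below are illustrative.

```python
import numpy as np

def ps_encode(left, right, block=1024):
    """Downmix stereo to mono plus one level-difference parameter per block."""
    n = min(len(left), len(right)) // block * block
    L = left[:n].reshape(-1, block)
    R = right[:n].reshape(-1, block)
    mono = (L + R) / 2.0
    eps = 1e-12
    # Inter-channel level difference in dB -- the parametric side information.
    ild_db = 10.0 * np.log10((np.sum(L**2, axis=1) + eps) /
                             (np.sum(R**2, axis=1) + eps))
    return mono, ild_db

def ps_decode(mono, ild_db):
    """Upmix: re-create a stereo image whose L/R balance matches the ILD."""
    ratio = 10.0 ** (ild_db / 20.0)        # amplitude ratio |L|/|R| per block
    g_left = 2.0 * ratio / (1.0 + ratio)   # chosen so that L + R == 2 * mono
    g_right = 2.0 / (1.0 + ratio)
    return (mono * g_left[:, None]).ravel(), (mono * g_right[:, None]).ravel()
```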

Second, to alleviate the burden of the core codec, various pre/post processors were devised that exclude spectral regions (predominantly the high frequency range) from the regular coding process and send parametric information about them instead. On the decoder side, the transmitted waveform components are used to re-create the missing spectral regions using the parametric information. This allows full audio bandwidth even at low bit rates. Well-known examples of such technologies are Spectral Band Replication (SBR), Harmonic Bandwidth Extension (HBE), and Intelligent Gap Filling (IGF).
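A toy frequency-domain sketch of the underlying trick follows: copy the transmitted low band upward and shape it with a single coarse envelope value. Actual SBR/IGF systems work on time-frequency tiles with far more adaptive control; the single envelope value and the crossover at half of Nyquist are assumptions made for brevity.

```python
import numpy as np

def bwe_encode(frame):
    """Transmit only the low band plus a coarse high-band envelope."""
    spec = np.fft.rfft(frame)
    cut = len(spec) // 2 + 1              # keep bins up to half of Nyquist
    env = np.mean(np.abs(spec[cut:]))     # one envelope value as side info
    return spec[:cut], env, len(spec)

def bwe_decode(low, env, n_bins):
    """Re-create the missing high band by copying the low band upward."""
    need = n_bins - len(low)
    patch = low[1:need + 1].copy()        # low-band bins reused as the patch
    patch *= env / (np.mean(np.abs(patch)) + 1e-12)
    return np.fft.irfft(np.concatenate([low, patch]), 2 * (n_bins - 1))
```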

Convergence between speech and audio coding
The latest generation of codecs often comes as a combined audio and speech codec, i.e., a synthesis of advanced audio coding technology and state-of-the-art speech coding technology into a truly universal codec with optimum performance for both music and speech signals. Examples include MPEG-D Unified Speech and Audio Coding (USAC) and 3GPP's codec for Enhanced Voice Services (EVS).

Coding of 3D or immersive audio
After stereo and surround sound, the next generation of formats that further increase spatial realism and listener immersion involves 3D or immersive audio, i.e., reproduction including height (higher and possibly lower) loudspeakers. Examples are the 22.2 or 5.1+4 (i.e., four height loudspeakers added on top of the regular 5.1) configurations. While the time is ripe for embracing such formats, two key challenges need to be solved.

First, the loudspeaker setup / "format" compatibility challenge: immersive audio content is produced today in many different formats, i.e., for different loudspeaker setups, and the incompatibility between them impedes the introduction of immersive audio. Furthermore, reproduction in consumer homes will not happen with a large number of (e.g., 22) loudspeakers. Thus, a codec for immersive audio needs to be able to reproduce immersive content on any available loudspeaker setup, adapting the content and providing the best possible listening experience for that setup.

Second, the input format compatibility challenge: immersive audio content can be produced in different paradigms, i.e., either as channel signals intended for a specific loudspeaker setup, as object signals intended for specific spatial playback coordinates and properties, or as Higher Order Ambisonics (HOA). While the latter two are independent of particular loudspeaker setups, they are different approaches. An audio codec for immersive audio needs to reconcile these paradigms, preferably by being equipped to support all of them.

As a recent example, the MPEG-H 3D Audio codec addresses these challenges by integrating many core technologies, such as MPEG-D Unified Speech and Audio Coding (USAC) and Spatial Audio Object Coding (SAOC), and additionally supports channels, objects, and HOA as input signal formats.

Binaural rendering on headphones
Some codecs provide binaural rendering to headphones as an additional feature that is especially important for mobile devices. Headphone listening is another presentation format beyond that of any of a variety of possible loudspeaker layouts, and binaural rendering to headphones provides an immersive experience for what might be the most pervasive mode of audio consumption. The rendering comes at very low computational cost and supports the use of personalized HRTFs/BRIRs. Examples include MPEG-D MPEG Surround and Spatial Audio Object Coding (SAOC) as well as MPEG-H 3D Audio.
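At its core, such rendering is a convolution of each source signal with a pair of head-related impulse responses. A minimal sketch, assuming hrir_left and hrir_right hold a measured pair for the desired direction:

```python
import numpy as np

def binauralize(source, hrir_left, hrir_right):
    """Render a mono source at the direction the HRIR pair was measured for."""
    return np.stack([np.convolve(source, hrir_left),
                     np.convolve(source, hrir_right)])

# Several rendered sources are then simply summed per ear; personalized
# HRTFs/BRIRs are substituted per listener where available.
```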

Interactive sound modification capability
The ability to modify a particular sound within a complex sound mixture at the time of decoding is receiving more and more attention. Most prominently, dialog enhancement of coded sound provides the possibility of boosting (or attenuating) dialog for hearing-impaired listeners. Interactive sound modification can also be used to create personalized sound mixtures (e.g., commentator vs. stadium sound for sports events, personal music remixes). This can be achieved by using sound objects and, for low-bit-rate applications, MPEG-D Spatial Audio Object Coding (SAOC).

Rate-adaptive audio representation
With ever-greater consumption of broadcast or streaming audio on mobile devices, the reality of variable transmission channel capacity needs to be addressed. Recent systems support point-to-point transmission in which a mobile device can signal the instantaneous channel capacity to the transmitting device (e.g., IP-based systems). Compressed audio formats that support seamless switching to lower (or back to higher) bit rates permit the system to deliver uninterrupted audio under changing channel conditions. An example of such a system is the MPEG-H 3D Audio coding format in combination with the MPEG-DASH delivery format, operating over IP channels.

HEARING AND HEARING LOSS PREVENTION
Robert Schulein, Chair
Michael Santucci and Jan Voetmann, Vice Chairs

Introduction
The AES TC on Hearing and Hearing Loss Prevention was established in 2005 with five initial goals focused on informing the membership as to important aspects of the hearing process and issues related to hearing loss, so as to promote engineering-based solutions to improve hearing and reduce hearing loss. Its aims include the following: raising AES member awareness of the normal and abnormal functions of the hearing process; raising AES member awareness of the risk and consequences of hearing loss resulting from excessive sound exposure; coordinating and providing technical guidance for the AES-supported hearing testing and consultation programs at US and European conventions; facilitating the maintenance and refinement of a database of audiometric test results and exposure information on AES members; and forging a cooperative union between AES members, audio equipment manufacturers, hearing instrument manufacturers, and the hearing conservation community for purposes of developing strategies, technologies, and tools to reduce and prevent hearing loss.

Measurement and diagnosis
Current technology in the field of audiology allows for the primary measurement of hearing loss by means of minimum sound pressure level audibility vs. frequency, producing an audiogram record. Such a record is used to define hearing loss in dB vs. frequency.



The industry also uses measurement of speech intelligibility masked by varying levels of speech noise. Such measurements allow individuals to compare their speech-intelligibility signal-to-noise-ratio performance to the normal population. Other tests are commonly used as well for diagnosing the cause of a given hearing loss and as a basis for treatment. Within the past ten years, new tests have evolved for diagnosing the behavior of the cochlea by means of acoustical stimulation of hair cells and sensing their resulting motion. Minute sounds produced by such motions are referred to as otoacoustic emissions. Measurement systems developed to detect and record such emissions work by detecting distortion products resulting from two-tone stimulation, as well as hair-cell transients produced by pulse-like stimulation. Test equipment designs for such measurements are now in common use for screening newborn children. Additional research is being conducted directed at using such test methods to detect early stages of hearing loss not yet detectable by hearing-threshold measurements. The committee is currently working to establish a cooperative relationship between researchers in this field and AES members, who will serve as evaluation subjects.
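A sketch of the two-tone distortion-product measurement: stimulate with f1 and f2 and inspect the recorded ear-canal signal for energy at 2·f1 − f2. Clinical instruments average many frames and compare the result against the local noise floor, which is omitted here; the stimulus frequencies are illustrative.

```python
import numpy as np

fs = 48000
f1, f2 = 2000.0, 2400.0          # stimulus pair (f2/f1 near 1.2 is typical)
f_dp = 2 * f1 - f2               # expected distortion product: 1600 Hz

def dp_level_db(mic: np.ndarray) -> float:
    """Level at the distortion-product frequency, via a windowed FFT."""
    win = np.hanning(len(mic))
    spec = np.abs(np.fft.rfft(mic * win))
    k = int(round(f_dp * len(mic) / fs))   # FFT bin of 2*f1 - f2
    return 20 * np.log10(spec[k] + 1e-12)
```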

Emerging treatments and technology
Currently there is no known cure for what is referred to as sensorineural hearing loss, in which irreparable damage has been done to the hearing mechanism. Such loss is commonly associated with aging and prolonged exposure to loud sounds, although it is well established that not all individuals are affected to the same degree. Considerable research is ongoing with the purpose of devising therapies leading to the activation of cochlear stem cells in the inner ear to regenerate new hair cells. There are, however, drug therapies being introduced in oral form to prevent or reduce damage to the cilia portion of hair cells in cases where standard protection is not enough, such as in military situations. We are beginning to see the emergence of otoprotectant drug therapies, now in clinical trials, that show signs of reducing temporary threshold shift and tinnitus from short-term high sound pressure levels. New stem cell therapies are also being developed with the goal of regenerating damaged hair cells.

Hearing instruments are the only proven method by which sensorineural hearing loss is treated. In general, the task of a hearing instrument is to use signal processing and electro-acoustical means to compress the dynamic range of sounds in the real world into the now-limited audible dynamic range of an impaired person. This requires the implementation of level-dependent compression circuits to selectively amplify low-level sounds, plus power amplification and high-performance microphone and receiver transducers fitted into miniature packages. Such circuitry is commonly implemented using digital signal processing techniques powered by miniature 1-volt zinc-air batteries. In addition to dynamic-range improvements, hearing aids serve to improve the signal-to-noise ratio of desired sounds in the real world, primarily for better speech intelligibility in noise.
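The level-dependent compression can be sketched in a few lines: follow the signal envelope and give soft sounds more gain than loud ones. The parameters below are illustrative; real instruments run such curves independently in many frequency bands, with values fitted to the individual audiogram.

```python
import numpy as np

def wdrc(x, fs, threshold_db=-40.0, ratio=3.0, gain_db=20.0):
    """Single-band wide-dynamic-range compression: envelope follower
    plus a static gain curve (illustrative parameter values)."""
    x = np.asarray(x, dtype=float)
    alpha = np.exp(-1.0 / (0.005 * fs))       # ~5 ms envelope smoothing
    env = 1e-6
    y = np.empty_like(x)
    for i, s in enumerate(x):
        env = max(abs(s), alpha * env)        # instant attack, smooth release
        over = max(0.0, 20 * np.log10(env) - threshold_db)
        # Above threshold the output rises only 1/ratio dB per input dB.
        g_db = gain_db - over * (1.0 - 1.0 / ratio)
        y[i] = s * 10 ** (g_db / 20.0)
    return y
```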

Currently, miniature directional microphone systems with port spacings in the 5 mm range are being used to provide improvements in speech intelligibility in noise of 4 to 6 dB. Such microphones have become rather sophisticated, in that many designs have directional adaptation circuits designed to modify polar patterns to optimize the intelligibility of desired sounds. In addition, some designs are capable of providing different directional patterns in different frequency bands. Furthermore, some hearing aid manufacturers have introduced products using second-order directional microphones operating above 1 kHz with some success. In many situations traditional hearing aid technology is not able to provide adequate improvements in speech intelligibility. Under such circumstances, wireless transmission and reception technology is being employed to essentially place microphones closer to talkers' mouths and speakers closer to listeners' ears. This trend appears to offer promise, enabled by the evolution of smaller transmitter and receiver devices and available operating-frequency allocations. Practical devices using such technology are now being offered for use with cellular telephones. This is expected to be an area of considerable technology and product growth.
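The processing behind such port-spaced designs can be sketched as delay-and-subtract on two closely spaced omni capsules; delaying the rear signal by the acoustic transit time across the ports places a null behind the array. At 5 mm spacing a real design needs fractional-sample delays; integer rounding is used here for clarity, and the spacing value is illustrative.

```python
import numpy as np

def cardioid(front, rear, fs, spacing_m=0.005):
    """First-order delay-and-subtract directional microphone."""
    c = 343.0                               # speed of sound, m/s
    delay = int(round(spacing_m / c * fs))  # internal delay = port transit time
    rear_delayed = np.concatenate([np.zeros(delay), rear[:len(rear) - delay]])
    # Subtracting the delayed rear signal cancels sound arriving from behind.
    return front - rear_delayed
```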

An additional technology that has been refined over the years is the use of induction loop systems in public spaces. The technique is very simple yet very attractive to users, since a hearing aid user needs no additional equipment to receive the signals (as long as the hearing aid has a telecoil), which is socially much more acceptable than having to wear a conspicuous additional device. In its simplest form, a single conductor is installed in the form of a loop at floor or ceiling level, surrounding the area to be covered by the transmission. An audio-frequency current, generated by an amplifier from one or more microphones or a recording, flows in the loop conductor and produces a magnetic field within and outside the loop. It is practicable to generate a sufficiently strong and uniform field pattern with amplifiers of reasonable cost and complexity. In addition to coupling sounds in public spaces, induction transmission coils are commonly available in cellular and cordless phone systems, so as to enhance the signal-to-noise ratio associated with telephone communications.

Tinnitus
Another hearing disorder, tinnitus, is commonly experienced by individuals, often as a result of ear infections, foreign objects or wax in the ear, and injury from loud noises. Tinnitus can be perceived in one or both ears or in the head. It is usually described as a ringing or buzzing noise or a pure-tone perception. Certain treatments for tinnitus have been developed for severe conditions in the form of audio masking; however, most research is directed toward pharmaceutical solutions and prevention. We are also seeing the emergence of electroacoustic techniques for treating what is commonly referred to as idiopathic tinnitus, or tinnitus with no known medical cause. About 95% of all tinnitus is considered idiopathic. These treatments involve prescriptive sound-stimulus protocols based on the spectral content and intensity of the tinnitus. In Europe, psychological assistance to help individuals live with their tinnitus is a well-established procedure.

Hearing loss prevention
Hearing-loss prevention has become a major focus of this committee because a majority of AES members come in contact with high-level sounds as a part of the production, creation, and reproduction of sound. In addition, this subject has become a major issue of consumer concern due to the increased availability of fixed and portable audio equipment capable of producing damaging sound levels, as well as live sound performance attendance. One approach to dealing with this issue is education, in the form of communicating acceptable exposure levels and time guidelines. Such measures are, however, of limited value, as users have little practical means of gauging exposure levels and exposure times. This situation represents a major need, and a consequent opportunity for this committee, audio equipment manufacturers, and the hearing and hearing-conservation communities. In recognition of the importance of hearing health to audio professionals engaged in the production and reproduction of music, this committee is planning its second conference devoted to technological solutions to hearing loss. Pending confirmation, this conference will most likely occur in late 2015 or early 2016.


HIGH RESOLUTION AUDIO
Vicki Melchior and Josh Reiss, Chairs

Growth in high resolution audio (HRA) over the last several years has been robust. HRA is an established mainstay in the professional and audiophile markets. However, the arrival of new formats and refinement of the associated processing, along with the growth of internet delivery, and now a significant industry effort to make HRA a mainstream format, all herald an interesting and promising next few years. The HRA Technical Committee sponsors workshops, tutorials, and discussions highlighting significant aspects of these developments for the AES community.

New HRA formats
Especially notable in the last two years has been the rapid emergence and uptake of DSD as an independent encoding and distribution format. DSD is the term used by Sony and Philips for the single-bit stream output of a sigma-delta converter that, together with related processing, is used in the storage and transmission associated with the production of SACDs. The original DSD oversampling rate of 64 fs (64 × 44.1 kHz, or 2.8224 MHz) has now been expanded to include both 128 fs and 256 fs. The main advantage of the higher rates is that the rise in shaped noise that occurs as a consequence of dynamic-range processing in sigma-delta modulators can be pushed considerably further out beyond the audio band (> 60 kHz), with less quantization noise remaining in the audio band than is possible at 64 fs. The DSD signal is said to sound cleaner and more transparent at the higher data rates.
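A first-order sigma-delta loop shows where the shaped noise comes from; production DSD converters use higher-order loops, but the one-bit output and the error feedback are the same idea. This is a minimal sketch, assuming an input already oversampled (e.g., at 64 fs) and scaled to [-1, 1].

```python
import numpy as np

def sigma_delta_1st(x: np.ndarray) -> np.ndarray:
    """First-order sigma-delta: 1-bit output, quantization error fed back."""
    integrator = 0.0
    bits = np.empty(len(x))
    for i, s in enumerate(x):
        # Accumulate the difference between input and previous 1-bit output;
        # this loop pushes quantization noise toward high frequencies.
        integrator += s - (bits[i - 1] if i else 0.0)
        bits[i] = 1.0 if integrator >= 0 else -1.0
    return bits
```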

Also related to DSD is DXD, a designation for PCM at 352.8 kHz / 24 bit championed by Merging Technologies as an intermediate stage in DSD processing. Single-bit streams cannot easily be filtered or processed and so are typically converted to PCM at high sample rates to facilitate production. More than just an intermediate stage, DXD's uses are evolving, with some recording engineers employing it as a direct recording format for release in DSD, as an intermediate between DSD record and release, and possibly in the future as a 352.8 kHz PCM release format.

This trend to higher sampling rates in both PCM and DSD is supported by consumer and professional hardware. Many current DACs and ADCs support both PCM and DSD. New converters, software, and even portables increasingly support PCM from Red Book (44.1 kHz / 16 bit) up to 384 kHz / 32 bit, and DSD to 256 fs, as the industry continues to explore both the merits of these formats and the degree of consumer interest in them. An open standard for packing DSD into PCM frames, known as DoP, has been adopted by major manufacturers to facilitate transfer of DSD across USB interfaces as well as AES and SPDIF.
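As commonly described, DoP carries sixteen DSD bits in the lower bytes of each 24-bit PCM word, with a marker byte alternating between 0x05 and 0xFA so a receiver can detect the stream. The framing below is a sketch based on that description and should be checked against the DoP specification before use.

```python
def dop_pack(dsd_bytes: bytes) -> list[int]:
    """Pack a mono DSD bitstream into 24-bit PCM words per the DoP scheme."""
    markers = (0x05, 0xFA)
    words = []
    for i in range(0, len(dsd_bytes) - 1, 2):
        marker = markers[(i // 2) % 2]       # alternating marker byte
        # [marker byte | older DSD byte | newer DSD byte]
        words.append((marker << 16) | (dsd_bytes[i] << 8) | dsd_bytes[i + 1])
    return words
```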

Improved converters, filters, and signal processing
While high-quality audio has always sought to define the sources of sonic deterioration associated with processing and filtering music data, high resolution, and now higher resolutions, are both the outcome of, and drivers of, this search. There is at present an effort by manufacturers of high-quality converters to address shortcomings attributed to the upsampling chips and multi-bit sigma-delta modulator chips used nearly universally in PCM DAC processing. Techniques include substitution of FPGA- or computer-based upsampling for that found on chips, custom filter design including minimum-phase designs, increase of processing bit depth to double-precision floating point (64 bit) or above, and custom sigma-delta modulation and decimation. Several chip makers, however, have developed improved chips incorporating similar processing upgrades, plus improved noise shaping, jitter control, clocking, and isolation. Such chips are increasingly appearing in new HRA-capable hardware.

The theoretical and practical influence of filters on sound has long been debated, and a new test program initiated by Meridian Audio seeks to explore some of the audibility questions. In an important first paper given at the AES 137th Convention, Meridian authors H. M. Jackson et al. measured, in double-blind tests, the audibility of downsampling filters typical of those used in CD preparation when such filters were applied to a higher-resolution stream without decimation and played through a high-quality audio system. Their result disputes that of an earlier paper by E. B. Meyer and D. R. Moran (JAES, 55: 775–779, 2007) and provides evidence, and a likely mechanism, for an audible distinction between CD and higher resolutions.

Distribution, storage, and replay
Distribution of HRA files is now primarily internet-based. Download websites ranging from large aggregations down to individual labels and orchestras now exist and offer both new work and remastered back catalog. PCM resolutions ranging from 192 kHz / 24 bit down to 44.1 kHz / 16 bit are available, and DSD at 64 fs and 128 fs is increasingly available. PCM above 192 kHz and DSD at 256 fs are not yet significant factors, but DAC manufacturers are including support for them anyway due to the rapid upward trend in bandwidths. FLAC, WAV, and AIFF are the dominant PCM transmission formats. Streaming is likely to supplement or replace downloading in the future, as it is doing now with compressed music and lower-resolution video. Although streaming bandwidths currently limit music resolution to losslessly compressed CD, a new codec designated MQA was recently introduced by Meridian Audio that is said to losslessly encode higher resolutions at bit rates slightly below those of CD. If successful, MQA may greatly influence the streaming of high-resolution audio.

The emphasis on downloads correlates with the continuing strong trend toward adoption of computers, file servers, and portables in all areas of music, including the traditional two-channel audiophile music marketplace. Blu-ray movies incorporating HD audio also continue to sell well despite the continued decline of physical media, and there is a small dedicated market in high-quality audio-only Blu-ray discs.

New markets
Notable are two new initiatives to bring high resolution audio into the mainstream mass market. Behind these efforts are several factors: first, the considerable business success of the larger HRA download websites in the audiophile market, and then a broader quest for higher quality as a result of the many complaints about the ubiquity of compressed low-bit-rate audio. Pono, an HRA music download service coupled with a well-engineered portable music player, is the product of multiple years of effort by the artist Neil Young. Pono is set to begin sales in late 2014. The second is a significant combined initiative from the Digital Entertainment Group (DEG), the Consumer Electronics Association (CEA), the Recording Academy, and the major labels. They have developed an HRA definition and a set of provenance designators for future releases, and are currently sponsoring talks and demonstrations of HRA at trade events, including the AES 2014 Los Angeles convention. Provenance has been a major source of consumer complaint in the past because many DVD-A, SACD, and downloaded files labeled high resolution were merely upsampled Red Book. The optional use of designators is thus an attempt to redress the issue.

Also of note are several initiatives to make multitrack audio available to the research and education communities. These include the Open Multitrack Testbed, MedleyDB, the Free Multitrack Download Library, and the Structural Segmentation Multitrack Dataset. Many of these include better-than-Red-Book-quality tracks, stems, and mixes, and typically contain content available under Creative Commons licensing, allowing some degree of reuse or redistribution.


MICROPHONES AND APPLICATIONS
Eddy B. Brixen, Chair
David Josephson, Vice Chair

The microphone is an amazing device. What other piece of audio equipment 20–50 years of age would be considered a hot pick for modern recording? The microphone it is.

Oldies but goodies(?)
In the marketplace of today, we find many old designs still manufactured. We find many new brands offering products that in reality are copies of aging technologies: ribbon microphones, tube microphones, and the like. The large number of these designs introduced to the market is partly explained by the opportunity to do good business on the general assumption that exotic-looking microphones provide exotic audio.

Microphones as an industrial product
Microphones are designed and manufactured for many purposes other than music production. Communication is the area that implements the highest number of units. Meanwhile, microphones for industrial use, for example providing feedback signals for control systems, are becoming ever more important.

Transducer technology
Even though traditional transducer technologies dominate the market, there has been some interest in the development of new ideas.

MEMS
Micro-electro-mechanical systems (MEMS) microphones are improving. Regarding specifications, they still do not provide any advantages that could not, on a commercial basis, be achieved more efficiently by traditional transducer technologies. However, in array systems, which consist of a larger number of units, MEMS technology is finding its way due to low cost and the fact that some of the inferior performance is to a reasonable degree overcome by technical workarounds, primarily provided by DSP.

Laser
Laser technology has been in use for years. The state of the art was presented by NASA in 2007 in Tech Briefs: "A New Kind of Laser Microphone Using High Sensitivity" (Chen-Chia Wang, Sudhir Trivedi, and Feng Jin). They demonstrated experimentally a new kind of laser microphone using a highly sensitive pulsed laser vibrometer. Using photo-electromotive-force (photo-EMF) sensors, they presented data indicating real-time detection of surface displacements as small as 4 pm.

Ultrasonics
In 2012 Merkel, Lühmann, and Ritter presented their experiment on creating a virtual microphone with the aid of ultrasonics. A highly focused ultrasound beam was sent through the room. At a distance of several meters, the ultrasonic wave was received with an ultrasonic microphone. The wave field of a common audio source was overlaid with the ultrasonic beam. It was found that the phase shift of the received ultrasonic signal carries the audio information of the overlaid field. The ultrasonic beam itself thus acts as the sound receiver; no technical device such as a membrane is necessary in the direct vicinity of sound reception. (See AES convention paper 8587.)

Ionic microphones
In 2012 Akino, Shimokawa, Kikutani, and Green presented a study of ionic microphones. Diaphragm-less ionic loudspeakers using both low-temperature and high-temperature plasma methods have already been studied and developed for practical use. This study examined using similar methods to create a diaphragm-less ionic microphone. Although the low-temperature method was not practical due to high noise levels in the discharges, the high-temperature method exhibited a useful shifting of the oscillation frequency. By performing FM detection on this oscillation-frequency shift, audio signals were obtained. Study results showed that the stability of the discharge corresponded to the non-uniform electric field, which was dependent on the formation shape of the high-temperature plasma, the shape of the discharge electrode, and the use of an inert gas that protected the needle electrode. (See AES convention paper 8745.)

Optical wave microphone
In 2013 Toshiyuki Nakamiya and Yoshito Sonoda presented a solution for an optical wave microphone. An optical wave microphone with no diaphragm, which uses wave optics and a laser beam to detect sounds, can measure sounds without disturbing the sound field. The theoretical equation for this measurement can be derived from the optical diffraction integration equation coupled to optical phase modulation theory, but the physical interpretation of this phenomenon is not clear from the mathematical calculation process alone. In this paper the physical meaning in relation to wave-optical processes is considered. This property can be used to detect complex tones composed of different frequencies with a single photo-detector. (See AES convention paper 8924.)

Membrane material
Graphene: in 2014 Todorovic et al. presented recent trends in graphene research and applications in acoustics and audio technology. The possibilities of applying single- or multi-layer graphene as membranes in transducers were the scope of the research of the graphene group. FEM and experimental analysis of single- and multi-layer graphene, as well as realization of the first samples of acoustic transducers, were reported as work in progress. (See AES convention paper 9063.)

Microphones and directivity
There is still increasing interest in microphones with high, and frequency-independent, directivity. Array designs have been around for years. Today array technologies are applied in laptops and handheld devices. Newer designs try to obtain constant directivity by applying sophisticated algorithms for beamforming, as sketched below.


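The simplest such beamformer is delay-and-sum: time-align the capsule signals for the look direction and average them. Constant-directivity designs replace the plain average with frequency-dependent weights, but the alignment step in this sketch is common to both; integer-sample alignment is used for clarity.

```python
import numpy as np

def delay_and_sum(mics, positions, direction, fs):
    """mics: (n_mics, n_samples) capsule signals; positions: (n_mics, 3) in
    metres; direction: unit vector pointing toward the source."""
    c = 343.0
    # Capsules nearer the source hear the wavefront first, so they are
    # delayed more to time-align all channels for the look direction.
    delays = -(positions @ direction) / c
    delays -= delays.min()
    n = mics.shape[1]
    out = np.zeros(n)
    for sig, d in zip(mics, delays):
        shift = int(round(d * fs))
        out[shift:] += sig[:n - shift]
    return out / len(mics)
```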

Multichannel microphones
Single-unit multichannel microphone technology builds to a large extent on Higher Order Ambisonics. In connection with the DSP applied for the control of beamforming, it is natural to split signals into several channels. mh acoustics' Eigenmike is an example.

Microphone for 22.2 multichannel audio
In 2013 Ono, Nishiguchi, Matsui, and Hamasaki presented a portable spherical microphone for Super Hi-Vision 22.2 multichannel audio. NHK has been developing a portable microphone for the simultaneous recording of 22.2-channel audio. The microphone is 45 cm in diameter and has acoustic baffles that partition the sphere into angular segments, in each of which an omnidirectional microphone element is mounted. Owing to the effect of the baffles, each segment exhibits narrow-angle directivity with a constant beam width at frequencies above 6 kHz. The directivity becomes wider as frequency decreases, becoming almost omnidirectional below 500 Hz. The authors also developed a signal-processing method that improves the directivity below 800 Hz. (See AES convention paper 8922.)

Digital adaptation
Innovation in the field of modern microphone technology is to some degree concentrated on adaptation to the digital age. In particular, the interfacing problems are being addressed. The continued updating of the AES42 standard is essential in this respect. Dedicated input/control stages for microphones with integrated interfaces are now available. However, widely implemented "device-to-computer" standards like USB and FireWire, which are not specifically reserved for audio, have also been applied in this field. These have become de facto standards for the interfacing of various types of digital equipment in the area of home recording, including microphones.

Microphones and IP
Audio networking has long been implemented in all corners of audio systems design using, for example, Ethernet-based networks like Dante, Ravenna, Q-LAN, etc. However, not until recently has this technology been implemented in microphones.

In 2014 Audio-Technica introduced a network microphone taking advantage of the Dante network protocol's control features and low latency.

Standards
The AES Standards Committee is working on standardizing control functions from AES42 for implementation in IP microphone solutions.

Materials
It should be noted that difficulties in obtaining some of the rare-earth materials used for magnets may affect the microphone selection available on the market. In the future, the effect of this might be fewer dynamic microphones or rising prices.

NETWORK AUDIO SYSTEMS
Kevin Gross, Chair
Richard Foss and Thomas Sporer, Vice Chairs

AES67
AES67 is an interoperability standard for high-performance audio over IP. It can be implemented as an interoperability mode, or it can be used as a system's native low-level audio networking protocol. The standard is tightly focused on getting audio across a network using standard protocols. Since publication in September 2013 it has been downloaded nearly 1000 times, and several manufacturers have announced AES67 implementations and plans for implementation. An interoperability "plugfest" test organized by the AES occurred in October 2014 in Munich; another is being planned for North America in 2015.
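AES67 media transport is ordinary RTP over UDP. The sketch below packs a minimal RFC 3550 header in front of a block of linear audio; the payload type and SSRC are arbitrary illustration values, and session description, PTP clocking, and QoS, all part of a real AES67 deployment, are omitted.

```python
import struct

def rtp_packet(seq: int, timestamp: int, payload: bytes,
               payload_type: int = 96, ssrc: int = 0x12345678) -> bytes:
    """Minimal RTP header (RFC 3550): V=2, no padding/extension/CSRC."""
    header = struct.pack("!BBHII",
                         2 << 6,               # version 2
                         payload_type & 0x7F,  # marker bit clear
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc)
    return header + payload

# At 48 kHz with 1 ms packets, one packet carries 48 samples;
# for mono 24-bit linear audio (L24) that is 48 * 3 = 144 payload bytes.
```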

MNA
The Media Networking Alliance is a new trade association whose purpose is to provide industry support for the proliferation of AES67. This is done by providing information and resources to the industry as well as promoting the benefits of AES67. Its focus is on three areas of activity: technical, marketing, and education. The MNA held its first meetings and presentations at the recent AES convention in Los Angeles. The presentations were well attended and included a lot of dialog and discussion with audience members. Since then, the MNA has created a website (http://www.medianetworkingalliance.com/) and has received a lot of interest from manufacturers and others in becoming members.

AVB
The AVnu Alliance announced the first certified switch at the end of 2013 and the first certified pro audio endpoint in mid 2014. New certification programs to be announced in 2015 include automotive and pro video. The IEEE has expanded the scope of AVB, now denoted Time Sensitive Networking (TSN), with new features such as ultra-low latency and seamless redundancy. The AVnu Alliance has announced that it will certify TSN features in addition to AVB product features.

ACIP2
ACIP2 is an EBU working group addressing the contribution of audio material over IP. Participants are members of the EBU and manufacturers from all over the world. ACIP2 was set up as a follow-up group to ACIP. Building on the previously produced Tech. Doc. 3326, a standard for Audio Contribution over IP (known as N/ACIP), ACIP2 answers newly arising questions and harmonizes the new world of audio over IP for broadcasters. During the last IBC a new version of Tech. Doc. 3326 was agreed on, and a Tech. Doc. on profiles (a set of parameters describing how to transmit and receive audio streams and enabling the decoder to successfully decode the audio, based on the sent parameters) will be published by the end of this year. Ongoing work covers SIP infrastructures, control, and management. Recently, the North American Broadcasters Association (NABA) has shown great interest in this work.

JT-NM
A Joint Task Force on Professional Networked Streamed Media (JT-NM) was formed in 2013 by the European Broadcasting Union (EBU), the Society of Motion Picture and Television Engineers (SMPTE), and the Video Services Forum (VSF). The task force collected and categorized use cases for broadcast applications of media networking and then invited manufacturers and technology providers to indicate which use cases are addressed by existing and emerging technology. The task force completed phase 1 work with the publication of a Gap Analysis Report that compared use cases with technology capabilities. The report is available for download at https://tech.ebu.ch/docs/groups/jtnm/GapAnalysisReport_231213.pdf. The task force has since embarked on phase 2 projects, including definition of a reference model for media networking.


Internet performance
Audio networking relies on real-time performance from networks. Due to a variety of factors, notably buffer bloat, delays across the internet can be measured in seconds. Active queue management (AQM) and fair queuing are being developed and deployed to reduce these delays. The IETF AQM and RMCAT working groups are preparing proposed standards in these areas (https://datatracker.ietf.org/doc/draft-nichols-tsvwg-codel/, https://tools.ietf.org/html/draft-pan-aqm-pie-02, and https://tools.ietf.org/html/draft-hoeiland-joergensen-aqm-fq-codel-00).

Lip sync standard
Synchronizing separate audio and video streams in a broadcast plant and other facilities has been a detailed and error-prone engineering problem. With the inclusion of networked AV in these applications, the situation appeared only to be getting more difficult. However, in June 2014 the IETF published RFC 7272, which continues work started by ETSI and establishes an architecture and communication protocols for time-synchronizing multiple media streams with sub-nanosecond accuracy. Any media networking based on RTP may use the techniques described in this standard. Other use cases addressed include social TV, video walls, and phase-coherent multichannel audio (e.g., stereophonic sound, line arrays, mic arrays, and wave field synthesis).

SIGNAL PROCESSING FOR AUDIO
Christoph Musialik, Chair
James Johnston, Vice Chair

Signal processing applications in audio engineering have grown enormously in recent years. This trend is particularly evident in digital signal processing (DSP) systems due to performance improvements in microprocessor devices and programmable gate arrays, solid-state memory, and disk drives. The growth in audio signal processing applications leads to several observations.

Observations
First, DSP has emerged as a technical mainstay in the audio engineering field. Paper submissions on DSP are now among the most popular topics at AES conventions. Also, the number of DSP-related papers in other specific application fields increases yearly. DSP is also a key field for other professional conferences, including those sponsored by the IEEE, ASA, and CMA.

Second, the consumer and professional marketplaces continue to show growth in signal processing applications, such as an increasing number of discrete audio channels, increasing audio quality per channel (both in word length and sampling frequency), sophistication of algorithms, as well as quality and availability of consumer-ready hardware building blocks, such as specialized signal processors (DSPs), sampling rate converters (SRCs), digital audio network chips, ADCs, and DACs.

Third, there are emerging algorithmic methods designed to deliver an optimal listening experience for the particular audio reproduction system chosen by the listener. These methods include transcoding and up-converting of audio material to take advantage of the available playback channels, numerical precision, frequency range, and spatial distribution of the playback system. Other user benefits may include level matching for programs with differing loudness, frequency filtering to match loudspeaker capabilities, room correction, and delay methods to synchronize wavefront arrival times at a particular listening position. In the home AV environment there is a trend toward using closely placed smaller loudspeakers and soundbars with algorithms that provide adequate spatial effects mimicking multi-speaker systems. The increasing number of people listening to music over headphones and mobile devices requires better algorithms for translation of music content to binaural format. In connection with head trackers, similar algorithms are used in virtual-reality systems.
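The level-matching step mentioned above can be sketched with a plain RMS normalization; broadcast practice uses the gated loudness measure of ITU-R BS.1770 rather than raw RMS, and the target value here is illustrative.

```python
import numpy as np

def match_level(x: np.ndarray, target_dbfs: float = -23.0) -> np.ndarray:
    """Scale a program so its RMS level hits a common target."""
    rms = np.sqrt(np.mean(x ** 2)) + 1e-12
    gain = 10 ** (target_dbfs / 20) / rms
    return x * gain
```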

In professional sound reinforcement, loudspeakers with steerable radiation patterns can provide a practical solution for difficult rooms. Professional live audio applications often demand low-latency systems, which remain challenging for DSP because many algorithms and processors are optimized for block processing instead of sample-by-sample operation, and because ADCs, DACs, and SRCs with built-in linear-phase FIR filters introduce additional latency.

Fourth, algorithmic developments will continue to occur in many other areas of audio engineering, including music synthesis, processing and effect algorithms, intelligent noise reduction in cars, as well as enhancement and restoration for archiving and audio forensic purposes. Also, improved algorithms for intelligent ambient noise reduction, de-reverberation, echo cancellation, acoustical feedback suppression, intelligent automixing, and steerable microphone arrays are expected in the audio teleconferencing field. Many of them are also suitable for modern hearing aids. In music production and mastering, so-called plug-ins enjoy great popularity and have almost completely replaced hardware studio processors like PEQs, dynamics, reverb, and other effects. However, there is still room for psychoacoustically motivated improvements. A new trend here is digital modeling of legendary analog gear.

There is still a need for the development of basic algorithms and transformations better suited to representing features of audio signals. There are completely analytically derived filters (not using an analog reference), constant-Q frequency-domain transformations, and mixed time-frequency-domain transformations. In latency-minimized convolution algorithms, too, the last word may not yet have been spoken. Statistical and genetic algorithms, as well as fuzzy logic and neural-network algorithms, are more and more involved in helping to create intelligent audio algorithms.

Fifth, audio coding algorithms, as well as speech recognition, synthesis, and speaker identification, are also based on audio signal processing technology such as described above, but they are not currently the main focus of this committee, as they are now handled by specialized committees or organizations of their own.


Sixth, there is growing interest in intelligent signal processing for music information retrieval (MIR), like tune query by humming, automatically generating playlists to mimic user preferences, or searching large databases with semantic queries such as style, genre, and aesthetic similarity.

Seventh, switching amplifiers (like Class D) continue to replace traditional analog amplifiers in both low-power and high-power applications. Yet even with Class D systems, the design trends for load independence and lowest distortion often include significant analog signal processing elements and negative feedback features.

Due to advances in AD/DA converter technology, future quality improvements will require the increasingly scarce pool of skilled analog engineers to design input stages like microphone preamps, analog clock-recovery circuits, and output amplifiers that match the specifications of the digital components.

Implications for technology
All of the trends show a demand for ever greater computational power, memory capacity, word length, and more sophisticated signal processing algorithms. Nevertheless, the demand for greater processing capability will be constrained by the need to minimize power consumption, since a continuously growing part of audio signal processing will be done in small, portable, wireless, battery-powered devices. On the other hand, due to the increasing capabilities of standard microprocessors, contemporary personal computers are now fast enough to handle a large portion of standard studio processing algorithms.

Very advanced algorithms still exceed the capabilities of traditional processors, so we see a trend in the design of future processors toward highly parallel architectures and the compiler tools necessary to exploit these capabilities using high-level programming schemes. Due to the relentless price pressure in the consumer industry, processors with limited resolution will continue to challenge algorithm developers to look for innovative solutions in order to achieve the best price-performance ratio.

The Committee is also considering forging contacts with digital signal processor manufacturers to convey to them the needs, experiences, and recommendations of the audio community.

SIGNAL PROCESSING FOR AUDIO
Christoph Musialik, Chair
James Johnston, Vice Chair

SPATIAL AUDIO
Sascha Spors and Nils Peters, Chairs

Overview
The AES Technical Committee on Spatial Audio addresses fundamental issues in the production, transmission, and reproduction of spatial audio content, employing loudspeaker- and headphone-based techniques for creating immersive auditory illusions.

An increasing number of conferences, standards activities, and product developments in the last few years indicates growing interest in spatial audio from industry and academia.

Object-based and scene-based audio
Audio content has conventionally been delivered as a channel-based mix. The audio mix is intended to be reproduced on a predefined, standardized loudspeaker layout, such as 2.0 stereo or 5.1 surround. Consequently, a consumer with a non-standard loudspeaker layout may be unable to reproduce the audio content as intended.

Object-based audio is an approach to delivering content without being constrained to a standardized loudspeaker layout. The individual audio objects are transmitted separately, together with metadata describing their spatial properties. On the consumer side the audio objects are panned according to the consumer's loudspeaker layout. Furthermore, the consumer can adjust the audio mix in real time; for instance, she could change the loudness or language of the commentator independently from other audio elements. In the last few years object-based audio has been successfully deployed in the cinema context. This trend will likely continue and trickle down to home cinema, television, and mobile entertainment applications.
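For illustration, a minimal sketch of the consumer-side rendering step, with a hypothetical "azimuth" metadata field and a constant-power pan law rendering to a 2.0 layout (real object renderers are far more sophisticated):

```python
import numpy as np

def render_objects_stereo(objects, n_samples):
    """Pan each audio object to a 2.0 layout using its metadata.

    `objects` is a list of dicts with hypothetical fields:
      'audio'   : mono signal (np.ndarray of length n_samples)
      'azimuth' : intended position in degrees, -90 (left) .. +90 (right)
      'gain'    : linear gain, adjustable by the listener at playback time
    """
    out = np.zeros((n_samples, 2))
    for obj in objects:
        # Map azimuth to a pan angle and apply a constant-power pan law.
        theta = (obj["azimuth"] + 90.0) / 180.0 * (np.pi / 2.0)
        g_left, g_right = np.cos(theta), np.sin(theta)
        out[:, 0] += obj["gain"] * g_left * obj["audio"]
        out[:, 1] += obj["gain"] * g_right * obj["audio"]
    return out

# Because objects stay separate until rendering, the listener can, e.g.,
# attenuate the commentator without touching the rest of the mix:
fs = 48000
t = np.arange(fs) / fs
mix = render_objects_stereo(
    [{"audio": np.sin(2 * np.pi * 440 * t), "azimuth": -30, "gain": 1.0},
     {"audio": np.sin(2 * np.pi * 220 * t), "azimuth": 30, "gain": 0.3}],
    len(t))
```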

Higher-Order Ambisonics (HOA), a scene-based audio technique, is another way to circumvent the aforementioned constraints of channel-based content. Independent of the reproduction layout, HOA describes the sound field in terms of spherical harmonics. For reproduction, the HOA data are rendered according to the desired loudspeaker layout or into binaural headphone feeds. HOA can be created either from single-channel audio tracks within a digital audio workstation or from microphone-array recordings, a compelling use case for live-recording scenarios. Because HOA has received significant attention in recent audio standardization and commercialization efforts and is supported by an active research community, it will presumably become a popular alternative to channel-based audio.
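For a flavor of the spherical-harmonic representation, a minimal first-order (B-format) sketch; the normalization convention and the naive projection decoder below are simplifying assumptions, and practical HOA systems use higher orders and carefully designed decoders:

```python
import numpy as np

def foa_encode(signal, azimuth, elevation):
    """Encode a mono signal into first-order B-format (W, X, Y, Z),
    using the traditional convention W = s / sqrt(2); angles in radians."""
    w = signal / np.sqrt(2.0)
    x = signal * np.cos(azimuth) * np.cos(elevation)
    y = signal * np.sin(azimuth) * np.cos(elevation)
    z = signal * np.sin(elevation)
    return np.stack([w, x, y, z])

def foa_decode_horizontal(bformat, speaker_azimuths):
    """Naive projection ('sampling') decode to a horizontal speaker ring."""
    w, x, y, _ = bformat
    feeds = [0.5 * (np.sqrt(2.0) * w + np.cos(az) * x + np.sin(az) * y)
             for az in speaker_azimuths]
    return np.stack(feeds)

# A source at 45 degrees left, decoded to a quad layout:
fs = 48000
sig = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
b = foa_encode(sig, np.deg2rad(45), 0.0)
quad = foa_decode_horizontal(b, np.deg2rad([45, 135, -135, -45]))
```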

In general, hybrid content delivery is likely: object-based audio elements may be transmitted in combination with conventional channel-based content or with scene-based HOA recordings. Hybrid content delivery will account for the individual preferences of content providers and for technical limitations, such as bandwidth capacities.

Headphone-based reproduction
In many situations spatial audio reproduction via headphones is desired. This so-called binauralization can be achieved by filtering the audio content with head-related impulse responses (HRIRs) or binaural room impulse responses (BRIRs).
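In its simplest static form, binauralization is one convolution per ear, as sketched below (the HRIRs here are synthetic placeholders standing in for measured data):

```python
import numpy as np
from scipy.signal import fftconvolve

def binauralize(mono, hrir_left, hrir_right):
    """Render a mono source for headphones by convolving it with the
    left- and right-ear impulse responses for the desired direction."""
    return np.stack([fftconvolve(mono, hrir_left),
                     fftconvolve(mono, hrir_right)], axis=-1)

# Placeholder HRIRs; in practice these come from a measured set
# (e.g., a SOFA/AES69 file) for a specific azimuth and elevation.
fs = 48000
hrir_l = np.random.randn(256) * np.exp(-np.arange(256) / 32.0)
hrir_r = np.roll(hrir_l, 8)  # crude interaural delay, for illustration only
source = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
stereo = binauralize(source, hrir_l, hrir_r)
```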

HRIRs are to some extent unique to an individual. The process of capturing individual HRIRs is time-consuming and requires specific equipment and precise measurement conditions. Yet the optimal listening experience relies on binauralizing the audio content with the listener's own HRIRs. Current compromises involve approximations of individual HRIRs, for example through database selection of well-matching HRIRs or through novel image-analysis techniques.

With image-analysis techniques, HRIRs can be estimated with the help of photos of the listener's pinnae. High-resolution 3D head scans can also drive simulation of the sound field inside the ear canals and at the pinnae.

In the database-selection approach, many different methods exist for rapidly selecting well-matching HRIRs from large HRIR databases, for example selection based on geometric head and ear features. An increasing number of HRIR databases have become available, necessitating a standardized way to store and exchange HRIRs or their parameterized versions. AES experts developed the AES-X212 HRIR file format, which provides a flexible and scalable solution for storing and exchanging HRIR-related data. If the AES-X212 file format is widely deployed, we can expect an increase in the use of future binaural audio technologies.

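The AES-X212 project was published as AES69, which standardizes the SOFA (Spatially Oriented Format for Acoustics) conventions on top of a netCDF-4 container. A sketch of reading such a file, assuming the SimpleFreeFieldHRIR convention's standard variable names and a hypothetical file hrirs.sofa:

```python
import numpy as np
from netCDF4 import Dataset  # SOFA/AES69 files are netCDF-4 containers

def load_hrirs(path):
    """Read an HRIR set from a SOFA file.

    Assumes the SimpleFreeFieldHRIR convention: 'Data.IR' has shape
    (measurements, receivers, samples) with receivers = 2 ears, and
    'SourcePosition' holds (azimuth, elevation, distance) per measurement.
    """
    with Dataset(path, "r") as sofa:
        irs = np.array(sofa.variables["Data.IR"])               # (M, 2, N)
        fs = float(np.array(sofa.variables["Data.SamplingRate"]).flat[0])
        positions = np.array(sofa.variables["SourcePosition"])  # (M, 3)
    return irs, fs, positions

def nearest_hrir(irs, positions, azimuth_deg, elevation_deg):
    """Pick the measured HRIR pair closest to the requested direction
    (a crude angular distance; real selection schemes are smarter)."""
    d = np.hypot(positions[:, 0] - azimuth_deg,
                 positions[:, 1] - elevation_deg)
    m = int(np.argmin(d))
    return irs[m, 0], irs[m, 1]  # left ear, right ear

irs, fs, pos = load_hrirs("hrirs.sofa")  # hypothetical file
left, right = nearest_hrir(irs, pos, azimuth_deg=30.0, elevation_deg=0.0)
```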

Compared to static binauralization, research demonstrates that binauralization that responds to head movement enhances the immersive audio experience. To take advantage of these dynamic cues, top-tier consumer products already feature head-tracking. Dynamic binauralization may become available in more consumer products as head-tracking technology decreases in cost.

Spatial audio coding
Currently there is a lot of activity in the area of spatial audio coding and delivery of immersive audio content. Compared to previous spatial audio coding standards, the new generation of audio codecs has increased coding efficiency, supports different content formats (e.g., channel-based, object-based, and scene-based Higher-Order Ambisonics), and is equipped with features such as dynamic range control, interactivity for audio objects, and adaptive rendering to the consumer's loudspeaker layout as well as to headphones. For the best possible adaptation of the spatial audio content to different reproduction scenarios, the embedding of content-related metadata into the bitstream becomes increasingly important.

Other ongoing activities strive to formalize ways to describe (uncompressed) spatial audio content for exchange and archival purposes. Those formats may be fed into spatial audio codecs for content delivery.

Loudspeaker layouts
The most common consumer loudspeaker layouts for spatial audio are 5.1 and 7.1 surround. While these setups are horizontal-only, the next generation of loudspeaker setups incorporates elevated loudspeakers to create immersive audio experiences. Several consumer loudspeaker layouts range from 7.1 plus 2 or 4 elevated loudspeakers up to NHK's 22.2.

The increase in the number of loudspeakers comes with additional purchasing cost and installation effort for the consumer. To account for physical restrictions in the placement of loudspeakers, the trend is to accommodate irregularly placed (non-standardized) layouts. This is often accompanied by automatic calibration techniques based on acoustic measurement of the loudspeaker positions. To further simplify installation and reduce cabling costs, wireless loudspeaker setups are promising.
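A basic building block of such automatic calibration is estimating each loudspeaker's distance from the time of flight of a measured impulse response; the resulting distances can then feed the delay-alignment computation sketched earlier:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, room temperature

def speaker_distance(impulse_response, fs):
    """Estimate loudspeaker distance from the arrival time of the
    direct sound, taken here as the strongest peak of an impulse
    response measured at the listening position."""
    arrival_sample = int(np.argmax(np.abs(impulse_response)))
    return arrival_sample / fs * SPEED_OF_SOUND

# Synthetic example: a unit impulse arriving after 10 ms (~3.4 m).
fs = 48000
ir = np.zeros(fs)
ir[480] = 1.0
print(speaker_distance(ir, fs))  # ~3.43
```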

An emerging alternative solution to the problem of loudspeaker placement is loudspeaker frames around a TV, or soundbars below it.

Spatial audio on mobile devices
The success of smartphones, phablets, and tablets as media devices creates demand for spatial audio solutions optimized for mobile devices. Due to the power constraints of mobile devices, DSP algorithms that either consume little power or can dynamically trade off processing quality against power consumption are desired.

Spatial audio on mobile devices is often reproduced binaurally via headphones, or with transaural (crosstalk-cancelling) methods via the built-in near-field stereo loudspeakers. For the latter, the quality, variability, and placement of the built-in loudspeakers affect the reproduction quality and should be considered. To guarantee dropout-free delivery of audio content over the network, several techniques exist that can seamlessly accommodate changes in bandwidth and latency.

For the creation of spatial audio content, top-tier mobile devices are now capable of recording 5.1 surround sound.

Research on sound field synthesis
The deployment of Higher-Order Ambisonics (HOA) and Wave Field Synthesis (WFS) arrays in research labs continues. For example, current research in sound field synthesis investigates the perception of synthetic sound fields. Besides localization, a special interest lies in coloration, since this has a major impact on quality. The practical realization of arrays and synthesis techniques that include height is a further research focus these days.

Besides sound field synthesis, a related and emerging field of interest, which was also the focus of the 52nd AES Conference, is the active control of sound fields. Sound field control deals with aspects such as the active compensation of reflections using multiple loudspeakers, or the creation of different acoustical zones in close proximity in which different audio content can be reproduced.

Content production
New content production tools are necessary for creating professional audio content beyond 7.1, such as content including height, object-based content, and content in Higher-Order Ambisonics.

A number of digital audio workstations, the preferred working environments for professional content creation, already feature a flexible bus structure to support novel content formats, and novel spatial audio effects have started to take advantage of this.

Market trends point to growing interest in audio immersiveness, and immersiveness relies on those in the field of spatial audio having access to more and better content that will test, exploit, and showcase emerging technologies. Consequently, there is a lot of room for new content production and authoring tools that simplify the content production workflow and enable new immersive audio experiences.

TRANSMISSION AND BROADCASTING
Kimio Hamasaki and Stephen Lyman, Chairs
Lars Jonsson, Vice Chair

The growth of digital broadcasting is the trend in this field. Digital terrestrial TV and radio broadcasting have been launched in several countries using the technology standards listed below. Loudness and True Peak measurements are replacing the conventional VU/PPM methods of controlling program levels, which has largely eliminated significant differences in the loudness of different programs at home. "Object-based audio" and "immersive audio" are currently the subject of great interest in the EBU, ITU, AES, and SMPTE.

Digital terrestrial TV broadcasting
In Europe DVB-T2 has been deployed in several countries for HD services. ATSC is used in the USA, Canada, Mexico, and Korea, while ISDB-T is employed in Japan, Uruguay, and Brazil.


Digital terrestrial radio broadcasting
In Europe and Asia DAB+ is the state of the art in the DAB Eureka 147 family. HD Radio, or IBOC, fulfils this role in the USA and Canada. Large broadcasting organizations in Europe and Asia, and major countries with large potential audiences such as India and Russia, are committed to the introduction of DRM (Digital Radio Mondiale) services, and it is to be expected that this will open the market for low-cost receivers.

Digital terrestrial TV broadcasting for mobile receivers
DVB-T2 Lite has been standardized and chip-sets are available, while ISDB-T is used in Japan. DMB is employed in Korea, and there have been a few trials in Europe. In the USA, the Advanced Television Systems Committee (ATSC) has begun development of a non-backwards-compatible system with next-generation video compression, transmission, and Internet Protocol technologies via local broadcast TV stations. This effort is termed "ATSC 3.0" and is covered in some detail in the professional journals and the industry trade press. It is too early to fully characterize the eventual results.

Benefits of digital broadcasting
The introduction of digital broadcasting has brought such benefits as High-Definition TV (1080i, 720p). Due to the current availability of 5.1 surround sound in digital broadcasting, surround sound is an important trend in TV broadcasting. 5.1 surround sound is evolving, with future extensions involving additional channels. Along with Ultra-High-Definition images (4K video), several broadcasters are experimenting with immersive audio (for instance, 3D multichannel audio, Ambisonics, wave field synthesis, and directional audio coding).

Internet streaming
The use of new methods for distributing signals to the home via Internet streaming services is an increasing trend. Web radio and IPTV are now attracting audience figures that, within a number of years, will be closing in on those of the traditional systems. Distribution technologies with rapid growth in many countries are ADSL/VDSL over copper or fiber combined with WiFi in homes; WiMAX and 3G/UMTS; and 4G and WiFi hotspots for distribution to handheld devices.

Loudness
Loudness and True Peak measurements are replacing the conventional VU/PPM methods of controlling program levels. This has largely eliminated significant differences in the loudness of different programs (and advertisements) and the need for listeners to keep adjusting their volume controls. Supporting international standards and operating practices have been published by several organizations, such as the ITU-R, EBU, and ATSC documents listed below. More and more broadcasters now apply these standards in their program production and transmission chains.

The fundamental documents upon which all other work is based are ITU-R BS.1770, "Algorithms to measure audio programme loudness and true-peak audio level," and BS.1771, "Requirements for loudness and true-peak indicating meters." Importantly, BS.1770 documents not only loudness but also a standardized method to measure true peak in digital audio streams. This is a vital, and often overlooked, aspect: the loudness measurement is a time-averaged value that should not change rapidly, so the operator needs a reliable method of ensuring signals are not clipping. The true-peak measurement provides this.
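Both quantities are straightforward to compute programmatically; a sketch using the open-source pyloudnorm package for BS.1770 integrated loudness, together with the usual oversampled true-peak estimate (4x oversampling; the input file name is a placeholder):

```python
import numpy as np
import soundfile as sf           # reads WAV/FLAC audio
import pyloudnorm as pyln        # BS.1770 loudness meter
from scipy.signal import resample_poly

data, fs = sf.read("program.wav")        # hypothetical program file

# Integrated (gated) loudness per ITU-R BS.1770, in LUFS.
meter = pyln.Meter(fs)
loudness_lufs = meter.integrated_loudness(data)

# True peak: oversample 4x and take the maximum absolute value,
# since inter-sample peaks can exceed the sampled peaks.
oversampled = resample_poly(data, up=4, down=1, axis=0)
true_peak_dbtp = 20 * np.log10(np.max(np.abs(oversampled)))

print(f"{loudness_lufs:.1f} LUFS, {true_peak_dbtp:.1f} dBTP")
```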

The next document released (in historical order) that was based upon BS.1770 was ATSC A/85, "Techniques for Establishing and Maintaining Audio Loudness for Digital Television." A/85 covers the related topics exhaustively: making loudness measurements; target loudness and true-peak levels for content delivery or exchange; metadata management considerations impacting audio loudness; methods to effectively control program-to-interstitial loudness; dynamic range management; and last (and perhaps least obvious), audio monitoring setup.

Starting roughly a year after the ATSC drafting group, the EBU drafting group has produced a similar set of documents:

EBU R128 Loudness Recommendation
EBU Tech 3341 Metering Specification
EBU Tech 3342 Loudness Range Descriptor
EBU Tech 3343 Production Guidelines
EBU Tech 3344 Distribution Guidelines

Beyond minor revisions of R128 and its accompanying Tech Docs, the EBU has a major deliverable planned for 2015 concerning the revision of the loudness range algorithm (Tech 3342).

The seeming "disconnect" between A/85 and the R128 family is best understood from the very different scopes of the two documents: A/85 is focused on DTV audio only, while R128 attempts to cover almost any audio delivery mechanism (seemingly including analog). Beyond the differences between A/85 and R128, the US Congress adopted the CALM Act ("Commercial Advertisement Loudness Mitigation Act"), a seemingly simple law that has had unintended consequences. Free TV Australia produced Operational Practice OP-48 (which the Australian Government also made mandatory), and ARIB in Japan has issued TR-B32 1.2, "Operational Guidelines for Loudness of Digital Television Programs."

Object-based audio
"Object-based audio" is currently the subject of great interest in the EBU, ITU, AES, and SMPTE. At its heart is the metadata describing the audio content, which will enable its correct handling along the broadcast chain and optimal rendering for the individual listener. As part of this, the EBU has produced the "Audio Definition Model," published in EBU Tech Doc 3364 (currently the subject of standardization activities in the ITU). In the ITU-R there are two relatively recent publications outlining the future: Report ITU-R BS.2266, "Framework of future audio broadcasting systems," and Recommendation ITU-R BS.2051, "Advanced sound system for programme production."

Immersive audio
The EBU FAR strategic program group has founded a work group for 3D audio, renamed Immersive Audio. Object-based production embraces 3D audio production, 3D audio codecs, 3D audio broadcasting, 3D audio file formats, and 3D audio creativity.

The MPEG Committee has begun work on "immersive sound" coding.

Audio/video ("lip") sync
The audio/video sync issue remains unsolved but is being discussed in digital broadcasting groups. SMPTE is close to issuing new standards for measuring the time differences between audio and video.


Technology Trends in Audio Engineering
A report by the AES Technical Council

INTRODUCTION

Last month we published a comprehensive report from many of the Technical Committees. This month we add a report from the TC on Semantic Audio Analysis.

Francis Rumsey
Chair, AES Technical Council

Bob Schulein, Jürgen Herre, Michael Kelly
Vice Chairs

SEMANTIC AUDIO ANALYSIS
Mark Sandler, Chair
Gautham Mysore and Christian Uhle, Vice Chairs

INTRODUCTION
Semantic audio analysis is concerned with the extraction of meaning from audio signals and with the development of applications that use this information to support the consumer in identifying, organizing, and exploring audio signals and interacting with them. This includes music information retrieval and semantic web technologies, but also audio production, education, and gaming.

Driven by the increasing availability of digital audio signals, semantic audio analysis is, compared to other audio technologies, a relatively new research area. "Semantic" refers to the study of the meaning of linguistic expressions and to the interplay between descriptors and their meaning. Semantic technology involves some kind of understanding of the meaning of the information it deals with, for example a search engine that can match a query for "bird" with a document mentioning "eagle." In audio, it incorporates methods and techniques from machine learning, digital signal processing, speech processing, source separation, perceptual models of hearing, and musicological knowledge. It also uses several principles from computer science, including metadata, ontologies, and first-order logic, and brings these together with more traditional signal-based technologies. In this way, musical features extracted from audio can be represented symbolically, which opens up the full potential of artificial intelligence and natural language processing techniques to be smoothly integrated into future music informatics systems.

FROM MUSIC COLLECTIONS TO MUSIC STREAMING SERVICES

Semantic analysis supports managing and browsing large music collections. It enables the identification of the artist and title of a recording, the classification of musical genre, finding alternative recordings of the same piece, and identifying and removing duplicate songs. It also helps users take a journey through a musical world, for example by finding similar songs, but also by finding similarities between artists, such as birth place or musical influences. New ways of browsing large collections have been developed, e.g., using audio thumbnails (short, representative segments of a song) and searching for songs by humming.

One major trend is to use mobile devices and services for on-demand streaming instead of accessing a physical music collection. Music streaming services offer their users access to millions of tracks on demand and functionalities for discovering new music. The number of users and the share of revenues have increased steadily in recent years.

With the advent of music streaming services, the volume of accessible music has increased even more, and with it the demand for semantic audio analysis and tools for exploring new music. A recent trend is to improve the performance of these tools and the user experience by adapting their functioning to the musical preferences and listening behavior of the user.

DECOMPOSITION AND INTERACTIVITY

Techniques for both guided and unguided source separation are applied in various audio editors, in dialog enhancement applications, and for upmixing. These techniques include the extraction of the center-channel signal from two-channel stereo, the decomposition into direct and ambient signal components, and the manipulation of transient signal components.

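As a toy illustration of such decompositions, a crude mid/side split (real center extraction and direct/ambient decomposition use far more sophisticated, typically frequency-domain, methods):

```python
import numpy as np

def mid_side(left, right):
    """Crude two-channel decomposition: the mid (sum) signal is a rough
    proxy for center-panned content such as dialog, while the side
    (difference) signal captures decorrelated, ambience-like content."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side

def dialog_boost(left, right, center_gain_db=6.0):
    """Upmix-style trick: re-mix the channels with the 'center' emphasized."""
    mid, side = mid_side(left, right)
    g = 10.0 ** (center_gain_db / 20.0)
    return g * mid + side, g * mid - side  # new left, new right
```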

As a recent trend in source separation, informed separation techniques that incorporate additional information about the input signal are further developed, with score-informed source separation being a prominent example. New developments also include the application of non-negative tensor factorization (e.g., for multichannel input signals) and sparse representations in general.
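A minimal example of the two-dimensional (matrix) special case: non-negative matrix factorization of a magnitude spectrogram with soft-mask reconstruction, using librosa and scikit-learn (the input file and the choice of four components are arbitrary):

```python
import numpy as np
import librosa
from sklearn.decomposition import NMF

y, fs = librosa.load("mixture.wav", sr=None, mono=True)  # hypothetical file
S = librosa.stft(y, n_fft=2048, hop_length=512)
mag, phase = np.abs(S), np.angle(S)

# Factor the magnitude spectrogram into spectral templates W and
# activations H: mag ~ W @ H, with everything non-negative.
model = NMF(n_components=4, init="nndsvd", max_iter=400)
W = model.fit_transform(mag)          # (freq_bins, components)
H = model.components_                 # (components, frames)

# Reconstruct each component with a soft (Wiener-like) mask so the
# separated parts sum back to the original mixture.
approx = W @ H + 1e-12
sources = []
for k in range(W.shape[1]):
    mask = np.outer(W[:, k], H[k]) / approx
    sources.append(librosa.istft(mask * mag * np.exp(1j * phase),
                                 hop_length=512))
```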

SEMANTIC AUDIO PRODUCTION

Recently developed applications of semantic analysis use automated transcription of rhythmic, harmonic, and melodic information in digital audio workstations.

Newly developed audio effects apply tempo estimation for tempo-synchronous effects (e.g., delays and modulation effects like tremolo), tonal and harmonic analysis (e.g., harmonizers), and perceptual models for estimating the perceived intensity of artificial reverberation.
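For instance, a delay time can be derived from an estimated tempo so that echoes land on the beat; a sketch using librosa's beat tracker, assuming a quarter-note delay and a placeholder input file:

```python
import numpy as np
import librosa

y, fs = librosa.load("track.wav", sr=None, mono=True)  # hypothetical file

# Estimate tempo in beats per minute, then set the delay to one beat.
tempo, _ = librosa.beat.beat_track(y=y, sr=fs)
delay_s = 60.0 / float(np.atleast_1d(tempo)[0])
delay_n = int(round(delay_s * fs))

# Simple feedback delay line synchronized to the estimated tempo.
out = np.copy(y)
feedback, taps = 0.4, 4
for k in range(1, taps + 1):
    delayed = np.zeros_like(y)
    delayed[k * delay_n:] = y[:len(y) - k * delay_n]
    out += (feedback ** k) * delayed
```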

Semantics have been used in the past for organizing, searching, and retrieving presets in digital synthesizers, and current research also investigates the automated programming of synthesizers using example recordings.

SEMANTICS IN AUDIO CODING

State-of-the-art audio codecs are capable of delivering high sound quality for both music and speech signals by using dedicated coding schemes and tools for these signal types. Semantic analysis of the input signal is applied to control the signal processing in the codec.

MUSIC EDUCATION AND ENTERTAINMENT

The use of semantic audio analysis in music education and learning software includes music transcription and source separation for analyzing a musical performance and creating content, e.g., play-alongs and automated accompaniment. A clear trend is the use of such software on mobile devices.

Semantic audio analysis is also used in DJ software for the automated synchronization of recordings and the recommendation of recordings based on rhythmic and harmonic similarity.

TRENDS IN RESEARCH

Semantic analysis can be improved by combining audio analysis with textual and image analysis into multimodal analysis, e.g., music recommendation that uses lyrics when searching for Christmas carols. Furthermore, the performance of machine learning methods is improved by fusing multiple concurrent methods. Alternative classification methods are being investigated, e.g., deep neural networks and random forests. An ongoing activity over the past ten years is improving the evaluation of new methods and technologies by using formal evaluation frameworks with common data sets and evaluation metrics.
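For a flavor of such classification experiments, a compact random-forest baseline over MFCC summary features (the corpus of files and labels is a placeholder; serious evaluations use the common data sets and metrics mentioned above):

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def mfcc_features(path):
    """Summarize a recording as the means and standard deviations of
    its MFCCs, a common compact feature set for audio classification."""
    y, fs = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=fs, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Placeholder corpus: (file, genre label) pairs.
corpus = [("a.wav", "jazz"), ("b.wav", "jazz"),
          ("c.wav", "rock"), ("d.wav", "rock")]
X = np.stack([mfcc_features(f) for f, _ in corpus])
labels = [label for _, label in corpus]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=2))  # cross-validated accuracy
```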

Other applications of semantic audio analysis besides those mentioned above are broadcast monitoring (e.g., determining the amount of speech versus music and the content broadcast and its usage), loudness control (using speech detection to estimate the loudness of the dialog alone), and archiving.