xaudio2 high performance considerations us

XAudio2 Performance Tips

Tom Mathews
Lead Developer, Advanced Technology Group, Microsoft
4/20/2010 1:32 PM

- XAudio2 overview
- Voice & graph optimization
- xAPO optimization
- Voice reuse
- Compression
- Streaming
- Debugging / performance analysis

Overview

What Is XAudio2
- Low-level, cross-platform game audio API
- Play hundreds of sounds at once
- Loop, start, stop, and adjust sounds at any time
- Volume, pitch, filter, reverb, DSP
- Identical code on both platforms
- Building block for higher-level sound design tools such as the XACT3 engine
- Replaced XAudio1
- Replaced DirectSound for gaming purposes

Features
- Flexible channel routing
  - Any channel can be sent to any other channel with attenuation/amplification
- Multistage submixing
  - For example, each car can have a submix (exhaust, transmission, engine, etc.), and each car's mix can then be fed into another submix for environmental effects

Advanced Features
- Deferred commands
  - Most operations (Start, SetParameter, SetOutputVoice, SetEffectChain) can be grouped and applied as atomic, sample-accurate operations
- xAPOs (DSPs)
  - In-box APOs (reverb, notch, etc.)
  - Create custom equalizers, compressors, limiters, monitors, phase shifters, attenuators, delays, ...
  - Custom xAPOs can be cross-platform, like the in-box APOs

XAudio2: Minimum CPU
Vectorized signal processing
- XAudio2 requires at least SSE
  - Available since 1999 for PCs
  - XAudio2 makes extensive use of it in processing code
  - Your processing code may do the same
- XAudio2 also makes use of SSE2/FTZ/DAZ
  - Available since 2001 for PCs
- On the 360, XAudio2 makes use of XMA hardware-accelerated decode and VMX instructions

Audio Flow
[Figure: audio flow diagram. Labels: Source Voices (XMA2 and xWMA inputs at 32k mono and 24k mono; Pitch/SRC + filter; Effect1..EffectN), Submix Voices (Sample Rate Conversion; Filter; Effect1..EffectN; 32k and 44k 5.1), Mastering Voice (48k 5.1).]

Graph Optimization
SUBMIX!
- Apply FX to many voices at once for the price of one
- Make use of lower-rate sub-graphs
  - Lower rate == fewer samples == less CPU
  - Run expensive global send FX at a lower rate/channel count than the final mix
- Provides for more detailed control of performance characteristics
- Allows for smooth crossfades between disparate FX
  - e.g. environmental reverb crossfade
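
The "lower rate == fewer samples == less CPU" point can be put into rough numbers. The sketch below is only a back-of-the-envelope model: it assumes effect cost scales linearly with the number of samples touched, which real effects only approximate. All function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Samples an effect touches per second of audio: frames per second times channels.
constexpr std::uint64_t SamplesPerSecond(std::uint32_t sampleRateHz, std::uint32_t channels) {
    return static_cast<std::uint64_t>(sampleRateHz) * channels;
}

// Per-voice effects: every voice runs its own effect instance at the given format.
constexpr std::uint64_t PerVoiceFxSamples(std::uint32_t voiceCount,
                                          std::uint32_t sampleRateHz,
                                          std::uint32_t channels) {
    return voiceCount * SamplesPerSecond(sampleRateHz, channels);
}

// Shared send effect: one instance on a submix, however many voices feed it.
constexpr std::uint64_t SubmixFxSamples(std::uint32_t sampleRateHz, std::uint32_t channels) {
    return SamplesPerSecond(sampleRateHz, channels);
}

// Fraction of per-sample effect work removed when going from `before` to `after`.
// Assumes (hypothetically) that cost is proportional to sample count.
inline double WorkSaved(std::uint64_t before, std::uint64_t after) {
    assert(before > 0 && after <= before);
    return 1.0 - static_cast<double>(after) / static_cast<double>(before);
}
```

Under this model, an effect running on a 32k mono submix touches 32,000 samples per second, against 288,000 for the same effect at 48k 5.1, and the per-voice total grows with every voice added while the submix total does not.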

Source Voices
Setting up for best performance
- Use XAUDIO2_VOICE_NOPITCH and XAUDIO2_VOICE_NOSRC when possible
- Minimize MaxFrequencyRatio when pitch is used
- Stopped voices are not touched by the real-time processing thread
- Voice pooling
  - Much faster than repeated allocation/free
  - SetFrequencyRatio may be applied to reuse voices for data of a different sampling rate

Voice Pooling
- Create pools of voices
  - Each pool is unique on source content (xWMA, XMA, ADPCM) and channel count
- When you need a new voice
  - Identify a lower-priority voice in the pool
  - Call Stop(), then FlushSourceBuffers()
  - With the February XDK, you no longer have to wait for the next Process() before reusing
  - If needed: call SetSourceSampleRate()
- Remember: stopped voices are CPU-free
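
The pooling steps above can be sketched roughly as follows. To keep the example compilable without the XAudio2 headers, `FakeVoice` is a hypothetical stand-in for a source voice that models only the calls named on the slide; `VoicePool`, `Codec`, and the priority scheme are likewise assumptions, not part of XAudio2.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <utility>

// Hypothetical stand-in for a source voice (not the real interface).
struct FakeVoice {
    bool playing = false;
    int queuedBuffers = 0;
    int priority = 0;  // title-defined; higher means more important
    void Start() { playing = true; }
    void Stop() { playing = false; }
    void FlushSourceBuffers() { queuedBuffers = 0; }
};

enum class Codec { PCM, ADPCM, XMA, xWMA };
using PoolKey = std::pair<Codec, std::uint32_t>;  // (source content, channel count)

class VoicePool {
public:
    // Creates `count` idle voices for one (codec, channels) pool up front.
    void Reserve(Codec codec, std::uint32_t channels, std::uint32_t count) {
        assert(count > 0);
        auto& pool = pools_[PoolKey(codec, channels)];
        for (std::uint32_t i = 0; i < count; ++i) pool.emplace_back();
    }

    // Returns a stopped, flushed voice, or nullptr if the pool is missing or
    // every voice in it is at least as important as the new request.
    FakeVoice* Acquire(Codec codec, std::uint32_t channels, int newPriority) {
        auto it = pools_.find(PoolKey(codec, channels));
        if (it == pools_.end() || it->second.empty()) return nullptr;
        FakeVoice* best = nullptr;
        for (FakeVoice& v : it->second) {
            if (!v.playing) { best = &v; break; }             // idle voice wins
            if (!best || v.priority < best->priority) best = &v;
        }
        if (best->playing && best->priority >= newPriority) return nullptr;
        best->Stop();                // Stop(), then FlushSourceBuffers()
        best->FlushSourceBuffers();
        best->priority = newPriority;
        return best;
    }

private:
    // std::deque keeps element addresses stable when more voices are appended.
    std::map<PoolKey, std::deque<FakeVoice>> pools_;
};
```

A title would typically call `Reserve` once at load time and `Acquire` whenever a sound is triggered, then submit new buffers and start the returned voice.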

FX Optimization
- XAPO_BUFFER_SILENT
  - Indicates silent data should be assumed
  - Actual memory may be uninitialized
- Buffers are 16-byte aligned and interleaved per channel
  - Use VMX128 instructions
- Use in-place processing
  - In-place: input buffer == output buffer
- Use EnableEffect/DisableEffect
  - More convenient than destroying and recreating the voice/FX
  - e.g. environmental reverb crossfade
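
As an illustration of the silent-buffer and in-place points, here is a minimal gain effect. The types are simplified stand-ins loosely modeled on the xAPO process-buffer parameters so the sketch builds without the XAPO headers; `BufferFlag`, `ProcessBuffer`, and `ProcessGain` are hypothetical names. Propagating silence is only appropriate for effects without a tail; a reverb, for example, would still have to render its decay.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the buffer flag (silent vs. valid data).
enum class BufferFlag { Silent, Valid };

// Hypothetical stand-in for one process buffer.
struct ProcessBuffer {
    float* data;                 // interleaved samples; may be uninitialized when Silent
    BufferFlag flag;
    std::uint32_t validFrames;
};

// Applies a constant gain. Works in place: in.data and out.data may be the same pointer.
void ProcessGain(const ProcessBuffer& in, ProcessBuffer& out,
                 std::uint32_t channels, float gain) {
    assert(out.data != nullptr);
    out.validFrames = in.validFrames;
    if (in.flag == BufferFlag::Silent) {
        // Do not read in.data: the memory may hold garbage. Silence times any
        // gain is still silence, so pass the flag along and skip the loop.
        out.flag = BufferFlag::Silent;
        return;
    }
    const std::size_t sampleCount = static_cast<std::size_t>(in.validFrames) * channels;
    for (std::size_t i = 0; i < sampleCount; ++i) {
        out.data[i] = in.data[i] * gain;   // safe when in.data == out.data
    }
    out.flag = BufferFlag::Valid;
}
```

A production xAPO on the 360 would replace the scalar loop with VMX128 operations, relying on the 16-byte alignment noted above.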

XAudio2 Memory Pool
- All internal XAudio2 allocations are pooled
  - Allows for efficient parameter passing without imposing cumbersome parameter scope requirements
  - XAudio2 allocates sooner, rather than later
- Pool is reset when the last IXAudio2 instance is released
  - Gives applications control of memory pool lifespan
  - Possible uses include reclaiming memory between levels
- Remember this? [Figure: source voice diagram. Labels: XMA2, 32k (Mono), Pitch + filter, Effect1..EffectN, Sample Rate Conv.]
  - Memory is pooled for many things, including SRCs and pitch shifting

Compression
- Always use compression to minimize disk/memory/cache footprint
- Reduce XMA/xWMA quality per sound for optimal quality/size tradeoff
- Seek tables: allow the caller to skip past unwanted packets without having to load the data itself

Compression - Tradeoffs
- PCM
  - Not compressed, so highest fidelity
- ADPCM (Windows only)
  - Slight compression (~4:1, lossy)
- XMA (360 only)
  - Hardware-accelerated decode (316 concurrent streams)
  - Good compression (~6+:1)
- xWMA
  - Software decode (mono/stereo ~= 0.6-1.2% of a 360 core)
  - Excellent compression (~20+:1)
  - Good for voices/music, no seamless looping

Streaming
- Cycle a circular queue of buffers to submit new data to XAudio2
- Submit new data within the voice's OnBufferEnd callback
- Increasing read-ahead before starting the voice decreases the chance of glitching, but can increase perceptible latency depending on implementation
- Consider streaming several buffers into the engine before throttling
- XMA2 block size should be in increments of 32K to mirror DVD I/O patterns

xWMA Streaming
- Each xWMA file contains a list of offsets (DPDS chunk)
- Each submit needs a modified form of this list

DPDS chunk:
  Index   Value
  0       1000
  1       2000
  2       3000
  3       5000
  4       7000
  5       12000
  6       14000
  7       19000
  8       20000

1st submit:
  Index   Value
  0       1000
  1       2000
  2       3000

2nd submit:
  Index   Value
  0       2000   (5000-3000)
  1       4000   (7000-3000)
  2       9000   (12000-3000)
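
The "modified form" of the DPDS list is the same cumulative table rebased so that each submit starts counting from zero. A minimal sketch, assuming the whole chunk is already in memory as a vector; `RebaseDpds` is a hypothetical helper name. In XAudio2 the resulting array would be the kind of data passed alongside the buffer as the decoded-packet cumulative byte list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// dpds: cumulative values from the file's DPDS chunk, one entry per packet.
// Returns the entries for packets [firstPacket, firstPacket + packetCount),
// rebased by subtracting the cumulative value that precedes the first packet.
std::vector<std::uint32_t> RebaseDpds(const std::vector<std::uint32_t>& dpds,
                                      std::size_t firstPacket,
                                      std::size_t packetCount) {
    assert(firstPacket + packetCount <= dpds.size());
    // Everything before this submit has already been accounted for.
    const std::uint32_t base = (firstPacket == 0) ? 0u : dpds[firstPacket - 1];
    std::vector<std::uint32_t> rebased;
    rebased.reserve(packetCount);
    for (std::size_t i = 0; i < packetCount; ++i) {
        rebased.push_back(dpds[firstPacket + i] - base);
    }
    return rebased;
}
```

With the slide's table, packets 0-2 yield 1000, 2000, 3000 and packets 3-5 yield 2000, 4000, 9000, matching the 1st and 2nd submit shown above. The rebased array has to stay alive until the corresponding buffer has finished playing.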

Blocking Calls
XAudio2 thread
- The XAudio2 realtime thread can be blocked by:
  - StopEngine and IXAudio2::Release()
  - DestroyVoice()
    - Thus, the need for voice reuse
  - XAudio2 callbacks
  - Check time spent in the effect chain
- Your code can be blocked by any XAudio2 API call that is waiting on internal realtime thread locks

Debugging
- Use the debug versions of XAudio2, X3DAudio, XAPOBase, etc.
- SetDebugConfiguration may be used to control debug behavior for XAudio2
- The VolumeMeter xAPO is useful for detecting clipping
- PIX counters are available to track CPU, memory, and voice statistics
  - Similar data is available via IXAudio2::GetPerformanceData
- Watch for other threads on the core that may be slowing down XAudio2

Audio performance analysis with PIX

A Case Study
[Figure: voice graph for the case study. Labels: Mono, Stereo, Quad, and 5.1 source voices, each with Pitch/SRC + filter and Effect1..EffectN; Filter; Reverb; Sample Rate Conversion.]

PIX

Timing Capture

OnProcessingPassEnd Callback
- Use callbacks to notify Hardware Thread 5 that it can resume execution

xbPerfView w/ Sampling Capture

A Case Study: Adding submixes
[Figure: voice graph with submixes added. Labels: Mono, Stereo, Quad, and 5.1 source voices, each with Pitch/SRC + filter and Effect1..EffectN; Filter; Reverb; Sample Rate Conversion.]

xbPerfView w/ Submixing

A Case Study: SRC & Reverb
[Figure: voice graph with changed sample rates and reverb. Labels: Mono, Stereo, Quad, and 5.1 source voices, each with Pitch/SRC + filter and Effect1..EffectN; Filter; Reverb; Sample Rate Conversion; rates 32k, 32k, 48k, 48k.]
- Change to Mono->5.1 reverb

xbPerfView: Final Numbers

Component    Start CPU%   Final CPU%   % Freed
MatrixMix    17.48%        4.25%       13.23%
Reverb        6.37%        4.94%        1.43%
Resampling   14.74%       11.41%        3.33%
Total        38.59%       20.60%       17.99%
Idle         27.95%       48.47%       20.52%

With Processing to Spare

Summary
- SUBMIX!
- Use OnBufferEnd callbacks to stream data
- Intentionally choose your compression methods
- Carefully manage your voice interactions
- Watch for blocking calls
- Pool voices where possible
- Use EnableEffect/DisableEffect
- Profile your title to focus your efforts

www.microsoftgamefest.com
© 2009-2010 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
