![Page 1: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/1.jpg)
Improving Meetings with Microphone Array Algorithms
Ivan Tashev, Microsoft Research
![Page 2: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/2.jpg)
Why microphone arrays?
- They ensure better sound quality: less noise and reverberation
- They provide the speaker position using sound source localization algorithms
- These technologies are used in the upper levels of meeting recording and broadcasting systems:
  - Speaker position awareness for a better UI
  - Assisting speaker clustering and segmentation
  - Better speech recognition for meeting annotation and transcription
  - Providing input data for machine-learning-enabled applications
![Page 3: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/3.jpg)
Better audio quality and user experience with MicArrays
- Meeting attendees look awkward wearing microphones; nobody likes to be tethered
- Capturing sound from a single point is difficult
  - A single microphone captures ambient noise and reverberation
  - Interference with reflected sound waves can enhance some frequencies and completely suppress others
- A microphone array is a set of closely positioned microphones
  - The signals are captured synchronously and processed together
  - Beamforming is the ability to make the microphone array listen to a given location while suppressing the signals coming from other locations; the beam is electronically steerable
  - Another name for this type of processing is spatial filtering
![Page 4: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/4.jpg)
Delay and sum beamformer
- The most straightforward approach
  - Sound from the desired direction reaches the microphones with different delays, so delay the microphone signals to align them and then sum them
  - The mismatched shifts (phases) of signals coming from other directions are expected to reduce their amplitude
  - Fast and easy to implement
- Major problems
  - The shape of the beam is different for different frequencies
  - Almost no directivity in the lower part of the frequency band
  - Side lobes (one or more) appear in the upper part of the frequency band
- Used as a baseline for comparison
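The behavior listed above can be illustrated with a small Python sketch. It computes the far-field delay-and-sum gain of a hypothetical linear array; the geometry (8 microphones, 4 cm apart), the 343 m/s speed of sound, and the function name `delay_and_sum_gain` are assumptions made for illustration and are not taken from the slides.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def delay_and_sum_gain(freq_hz, arrival_deg, steer_deg, mic_positions_m):
    """Gain of a far-field delay-and-sum beamformer for a linear array.

    Each channel is delayed to align a plane wave from steer_deg, then all
    channels are averaged. A wave from arrival_deg keeps a residual phase
    mismatch per microphone, which reduces the summed amplitude.
    """
    total = 0j
    for x in mic_positions_m:
        arrival_delay = x * math.sin(math.radians(arrival_deg)) / SPEED_OF_SOUND
        steer_delay = x * math.sin(math.radians(steer_deg)) / SPEED_OF_SOUND
        mismatch = arrival_delay - steer_delay
        total += cmath.exp(-2j * math.pi * freq_hz * mismatch)
    return abs(total) / len(mic_positions_m)


# Hypothetical geometry: 8 microphones on a line, 4 cm apart.
mic_positions = [i * 0.04 for i in range(8)]

gains = {}
for freq in (200.0, 1000.0, 4000.0):
    on_axis = delay_and_sum_gain(freq, 0.0, 0.0, mic_positions)
    off_axis = delay_and_sum_gain(freq, 60.0, 0.0, mic_positions)
    gains[freq] = (on_axis, off_axis)
    print(f"{freq:6.0f} Hz: on-axis {on_axis:.2f}, 60 deg off-axis {off_axis:.2f}")
```

With this assumed geometry the gain for a source 60 degrees off the beam stays above 0.9 at 200 Hz (almost no directivity) and drops well below 0.2 at 4000 Hz, which shows how strongly the beam shape depends on frequency.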
![Page 5: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/5.jpg)
Delay and sum beamformer
Delay and sum beamformer gain vs. frequency and angle
![Page 6: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/6.jpg)
Time vs. Frequency domain
- Time domain processing
  - More “natural”, used in most of the common beamforming algorithms (GSC etc.)
  - No time spent on conversion
  - Requires long filters (200 – 2000 taps), very slow!
- Frequency domain processing
  - Costs CPU time for conversion
  - Long filters become vector multiplications, much faster!
  - Many other types of audio signal processing are faster as well
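The equivalence behind this trade-off can be checked with a short Python sketch: a FIR filter applied by direct convolution gives the same output as a per-bin multiplication of spectra. To stay dependency-free the sketch uses a naive O(N²) DFT, so it demonstrates only the equivalence, not the speed; in practice the conversion would use an FFT. All function names and the sample values are illustrative assumptions.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(N^2)); stands in for an FFT here."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def fir_time_domain(signal, taps):
    """Direct convolution: len(taps) multiplications per input sample."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, sample in enumerate(signal):
        for k, tap in enumerate(taps):
            out[n + k] += sample * tap
    return out


def fir_frequency_domain(signal, taps):
    """Same filter as one multiplication per frequency bin."""
    size = len(signal) + len(taps) - 1  # zero-pad to avoid circular wrap-around
    spectrum = dft(list(signal) + [0.0] * (size - len(signal)))
    response = dft(list(taps) + [0.0] * (size - len(taps)))
    product = [x * h for x, h in zip(spectrum, response)]
    return [value.real for value in dft(product, inverse=True)]


signal = [1.0, 2.0, 3.0, 4.0]
taps = [0.5, 0.25, 0.125]
direct = fir_time_domain(signal, taps)
via_spectrum = fir_frequency_domain(signal, taps)
```

Both `direct` and `via_spectrum` hold the same six output samples up to floating-point rounding.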
![Page 7: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/7.jpg)
Generalized beamformer
- All time domain algorithms for beamforming can be converted to processing in the frequency domain
- Canonical form of the beamformer:

$$Y(f) = \sum_{i=0}^{M-1} W(f,i)\,X_i(f)$$

  - $M$: number of microphones
  - $X_i(f)$: spectrum of the i-th channel
  - $W(f,i)$: weight coefficients matrix
  - $Y(f)$: output signal
- Fast processing: M multiplications and M-1 additions per frequency bin
- Each weight matrix has a corresponding beam shape, the array gain as a function of direction: $B(\varphi,\theta,f)$
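A minimal Python sketch of the canonical form, assuming the weights are stored as `weights[f][i]` and the channel spectra as `spectra[i][f]` (this layout and the function name `beamform` are illustrative choices, not from the slides):

```python
def beamform(weights, spectra):
    """Apply Y(f) = sum over i of W(f, i) * X_i(f) for every frequency bin.

    weights[f][i]: complex weight for bin f and microphone i
    spectra[i][f]: complex spectrum of microphone i at bin f
    """
    num_bins = len(weights)
    num_mics = len(spectra)
    return [
        sum(weights[f][i] * spectra[i][f] for i in range(num_mics))
        for f in range(num_bins)
    ]


# Two microphones, three frequency bins, equal weights of 1/M.
spectra = [
    [1 + 0j, 2j, -1 + 0j],
    [1 + 0j, -2j, -1 + 0j],
]
weights = [[0.5, 0.5] for _ in range(3)]
output = beamform(weights, spectra)
```

In this toy input the middle bin has opposite phase in the two channels, so it cancels in the output, while the two in-phase bins pass through unchanged.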
![Page 8: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/8.jpg)
Calculation of the weights matrix
- The goal of the calculation is to find the optimal weights matrix for a given geometry and beam direction
- For each frequency bin, find the weights that minimize the total noise in the output
- Constraints: equalized gain and zero phase shift for signals coming from the beam direction
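One way to write such a problem down, given only as a sketch with assumed notation (the slides do not define these symbols): let $N_i(f)$ be the noise component in channel $i$ and $D_i(f)$ the response of microphone $i$ to a source in the beam direction. For each frequency bin $f$:

$$\min_{W(f,\cdot)} \; E\left\{ \left| \sum_{i=0}^{M-1} W(f,i)\,N_i(f) \right|^2 \right\} \quad \text{subject to} \quad \sum_{i=0}^{M-1} W(f,i)\,D_i(f) = 1$$

Setting the constrained sum to the real value 1 expresses both requirements at once: unit (equalized) gain and zero phase shift for signals from the beam direction. The exact noise criterion used by the author may differ from this generic form.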
![Page 9: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/9.jpg)
Known approaches
- Using multidimensional optimization
  - The multidimensional surface is multimodal, i.e. it has multiple extrema
  - Unpredictable number of iterations, i.e. slow
  - Multiple computations lead to loss of precision
- Using the approach above with a different optimization criterion:
  - Minimax, i.e. minimization of the maximum difference
  - Minimal beamwidth, etc.
- In all cases the starting point of the multidimensional optimization is critical
![Page 10: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/10.jpg)
Array noise suppression
Noise = ambient + non-correlated + correlated (jammers and reverberation)

Ambient noise suppression:

$$20\log \int_{0}^{f_S/2} \int_{0}^{2\pi} \int_{-\pi/2}^{+\pi/2} N(f)\,B(\varphi,\theta,f)\,d\theta\,d\varphi\,df$$

Non-correlated noise:

$$20\log \int_{0}^{f_S/2} \sum_{i=0}^{M-1} W(f,i)^2\,df$$

Correlated (from given direction):

$$20\log \frac{\int_{0}^{f_S/2} J(f)\,B(\varphi_J,\theta_J,f)\,df}{\int_{0}^{f_S/2} S(f)\,B(\varphi_S,\theta_S,f)\,df}$$
![Page 11: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/11.jpg)
Microphone Array for meetings
- Number of microphones: 8
- Noise suppression, ambient: 12-16 dB
- Sound source suppression (up to 4000 Hz):
  - At 90°: better than 12 dB
  - At 180°: better than 15 dB
- Beam width at -3 dB: 40°
- Work band: 80 – 7500 Hz
- Principle of work: points a capturing beam to the speaker location
![Page 12: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/12.jpg)
Microphone Array for meetings
MicArray gain vs. frequency and angle
![Page 13: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/13.jpg)
Additional goodies
- Linear processing
  - Beamforming doesn’t introduce non-linear distortions, making the output signal suitable not only for recording/broadcasting but for speech recognition as well
- Integration with acoustic echo cancellation
  - A requirement for real-time communication
- Better noise suppression
  - The initial noise reduction from the beamformer allows using better noise suppression algorithms after it without introducing significant non-linear distortions and musical noise
- Partial de-reverberation
  - The narrow beam suppresses sound waves reflected from the walls, making the sound more “dry” and better accepted by live listeners and speech recognition engines; it also makes the job of a potential de-reverberation processor easier
![Page 14: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/14.jpg)
Beamshapes
Panels: 525 Hz, 1025 Hz, 2025 Hz, 4025 Hz
The 3D beam shapes demonstrate frequency-independent beamforming
![Page 15: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/15.jpg)
Sound source localization
- Provides the direction to the sound source
- In most cases works in real time
- Goes through three phases:
  - Pre-processing
  - Actual sound source localization: provides a single SSL measurement (time, position, weight)
  - Post-processing of the results: final result is position and confidence level
![Page 16: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/16.jpg)
SSL pre-processing
Pre-processing
- Packaging the audio signals in frames
- Conversion to frequency domain
- Noise suppression
- Classification signal/pause
- Rejection of non-signal frames
![Page 17: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/17.jpg)
SSL pre-processing (example)
SSL measurements vs. time
[Plot: one-channel signal, amplitude vs. time]
![Page 18: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/18.jpg)
Actual SSL - known algorithms
- Two-step, time delay estimates (TDOA) based
  - Calculate the delay for each microphone pair
  - Convert it to direction
  - Combine the delays from all pairs for the final estimation
- One-step time delay estimates (Yong Rui and Dinei Florencio, MS Research)
  - Calculates the correlation function for each pair
  - For each hypothetical angle of arrival, accumulate the corresponding correlation strength from all pairs, and search for the best angle
- Steered beam based algorithms
  - Calculate the energy of beams pointing to various directions
  - Find the maximum
  - Interpolate with neighbors for increased resolution
- Others: ICA based, blind source separation, etc.
  - Most of them are not real-time
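The first two steps of the two-step TDOA approach can be sketched for a single microphone pair as follows. This is a simplified illustration, not the algorithm used in the talk: it uses a plain time-domain cross-correlation with integer-sample lags, a far-field model, and hypothetical values for the sampling rate (16 kHz), microphone spacing (0.2 m), and speed of sound (343 m/s). Combining the estimates from all pairs is left out.

```python
import math
import random


def estimate_delay(x, y, max_lag):
    """Lag (in samples) by which y trails x, from the cross-correlation peak."""
    best_lag, best_score = 0, float("-inf")
    n = len(x)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                score += x[t] * y[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag_samples, sample_rate_hz, spacing_m, speed_of_sound=343.0):
    """Convert a pair delay to a far-field arrival angle in degrees."""
    ratio = lag_samples / sample_rate_hz * speed_of_sound / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))


# Synthetic test signal: white noise that reaches microphone B 4 samples late.
random.seed(0)
sample_rate = 16000
spacing = 0.2
source = [random.gauss(0.0, 1.0) for _ in range(1024)]
true_lag = 4
mic_a = source
mic_b = [0.0] * true_lag + source[:-true_lag]

lag = estimate_delay(mic_a, mic_b, max_lag=10)
angle = delay_to_angle(lag, sample_rate, spacing)
```

With integer lags the angular resolution is coarse, which is one reason the slide's steered-beam variant interpolates with neighbors for increased resolution.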
![Page 19: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/19.jpg)
Beamsteering SSL (example)
Energy vs. angle and time, single sound source
![Page 20: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/20.jpg)
Major factors harming the precision
- Ambient noise
  - Smooths the maxima
  - Hides low-level sound sources
- Reverberation
  - Creates additional peaks
  - Lifts the noise floor
  - Suppresses/enhances some frequencies
- Reflections
  - Create distinct fake peaks with constant location
- All of the above justify the post-processing phase
![Page 21: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/21.jpg)
SSL with reflections and reverberation – raw data
Speakers in conference room (SSL results histogram)
[Plot: two histogram panels, horizontal axis from -200 to 200]
- Speaker 1 at -8°: louder voice, fewer reflections
- Speaker 2 at 52°: quieter voice, strong reflections from the whiteboards
![Page 22: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/22.jpg)
SSL post-processing
- The goals are:
  - To remove results from reflections and reverberation
  - To improve the SSL precision (reduce the standard deviation)
  - To track the sound source movement/change dynamics
  - Eventually to provide tracking of multiple sound sources
- Approaches for post-processing of the SSL results
  - Statistical processing
  - Real-time clustering
  - Kalman filtering
  - Particle filtering
- Provides the final result: time, position, confidence level
![Page 23: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/23.jpg)
Real-time clustering of SSL data
- Put each new SSL measurement (time, direction, weight) into a queue
- Remove all measurements older than a given lifetime (~4 sec)
- Place all measurements into spatially spread buckets with 50% overlap
- Find the bucket with the largest sum of weights
- Compute the weighted average of the measurements in this bucket
- Calculate the confidence level based on the time of the last measurement, the number of measurements, and the standard deviation
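The steps above can be sketched in Python as follows. The 4-second lifetime comes from the slide; the 20-degree bucket width, the class name `SslClusterer`, and the returned tuple are assumptions made for illustration. The slide does not give the confidence formula, so the sketch only returns its ingredients (standard deviation, number of measurements, time of the last measurement). Angle wrap-around at ±180 degrees is ignored and weights are assumed positive.

```python
import math
from collections import deque

LIFETIME_S = 4.0          # from the slide (~4 sec)
BUCKET_WIDTH_DEG = 20.0   # hypothetical bucket width
STEP_DEG = BUCKET_WIDTH_DEG / 2  # 50% overlap between neighboring buckets


class SslClusterer:
    def __init__(self):
        self.queue = deque()  # (time_s, direction_deg, weight), oldest first

    def add(self, time_s, direction_deg, weight):
        """Queue a new measurement and drop those older than the lifetime."""
        self.queue.append((time_s, direction_deg, weight))
        while self.queue and time_s - self.queue[0][0] > LIFETIME_S:
            self.queue.popleft()

    def estimate(self):
        """Return (direction, std_dev, count, last_time) of the best bucket."""
        if not self.queue:
            return None
        buckets = {}
        for measurement in self.queue:
            k = math.floor(measurement[1] / STEP_DEG)
            # Bucket s covers [s*STEP, s*STEP + WIDTH), so each measurement
            # falls into exactly two overlapping buckets: k - 1 and k.
            for start in (k - 1, k):
                buckets.setdefault(start, []).append(measurement)
        best = max(buckets.values(), key=lambda ms: sum(w for _, _, w in ms))
        total = sum(w for _, _, w in best)
        mean = sum(d * w for _, d, w in best) / total
        variance = sum(w * (d - mean) ** 2 for _, d, w in best) / total
        return mean, math.sqrt(variance), len(best), best[-1][0]


clusterer = SslClusterer()
for t, d, w in [(0.0, 50.0, 1.0), (0.1, 52.0, 1.0), (0.2, 54.0, 1.0),
                (0.3, -120.0, 0.5)]:  # last one mimics a weak reflection
    clusterer.add(t, d, w)
direction, spread, count, last_time = clusterer.estimate()
```

In this toy input the three measurements near 52 degrees win over the isolated low-weight measurement at -120 degrees, so the spurious peak does not affect the reported direction.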
![Page 24: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/24.jpg)
Post-processing results
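| Conditions | Speaker, deg | Bias, deg | StDev, deg | # results |
|---|---|---|---|---|
| Sound Room | 36 | -1.6054 | 0.3857 | 334 |
| Sound Room | 0 | 1.8722 | 2.087 | 319 |
| Sound Room | -21 | 5.6932 | 2.4788 | 292 |
| Office | 38 | -4.7539 | 1.3155 | 407 |
| Office | 0 | 1.6181 | 0.9687 | 391 |
| Office | -29 | 4.7209 | 0.7511 | 405 |
| Conf. Room | 35 | 3.4657 | 0.9699 | 226 |
| Conf. Room | 0 | -4.207 | 2.4308 | 271 |
| Conf. Room | -43 | -5.1692 | 0.8766 | 383 |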
Single speaker in various positions
Recording conditions:
- Sound room (no noise and reverberation)
- Office (high noise, shorter reverberation, reflections)
- Conference room (less noise, longer reverberation, reflections)

All recordings were made with an 8-element circular microphone array for meeting recording
![Page 25: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/25.jpg)
Post-processing results (2)
- Two speakers in fixed positions
- Recording conditions: conference room, speakers at -8 and 52 deg

[Plot: Two persons SSL data; angle (deg) vs. time (s); series: Raw SSL, Post SSL]
![Page 26: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/26.jpg)
Post-processing results (3)
- Two speakers in fixed positions
- Recording conditions: conference room, speakers at -8 and 52 deg

[Plot: Two persons SSL (detail); angle (deg) vs. time (s); series: Raw SSL, Post SSL]
- Speaker switching at second 59
- Post-processing delay: ~400 ms
![Page 27: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/27.jpg)
Applications for MicArrays and Sound Source Localization
- Sound capturing during meetings
  - Provides the direction in which to point the capturing beam
  - Assists the virtual director for speaker view (real-time)
- Meeting post-processing
  - Assists speaker clustering
  - Meeting annotation using rough ASR (requires good sound quality)
  - Meeting transcription with precise ASR
- Recorded meetings viewing/browsing
  - Audio timeline: suppress some audio tracks, navigation by speaker (based on the speaker clustering)
  - Good sound quality: better user experience
  - Good sound quality: search by phrases or keywords with ASR
  - SSL-data-assisted virtual director for speaker view (play-time)
![Page 28: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/28.jpg)
Meetings browser (example)
![Page 29: Improving Meetings with Microphone Array Algorithms · Improving Meetings with Microphone Array Algorithms ... Capturing sound from single point is difficult ... Beamforming is ability](https://reader030.vdocuments.mx/reader030/viewer/2022040904/5e78d5e6112e1b3614054e18/html5/thumbnails/29.jpg)
Meetings browser (detail)
Audio timeline