Robust Sound Field Reproduction against Listener’s Movement Utilizing Image Sensor
Toshihide Aketo, Hiroshi Saruwatari, Satoshi Nakamura
(Nara Institute of Science and Technology, Japan)
Outline
• Research background
• Conventional method
  - Spectral Division Method
  - Local sound field synthesis
• Proposed method
  - Equiangular filter
  - Sound field reproduction system utilizing image sensor
• Simulation experiment
• Subjective assessment
  - on directional perception
  - on sound quality
Research background (1/3)
Objective of sound field reproduction (SFR) system
• To reproduce the primary sound field in another space over a wide range and with high accuracy.
• However, such a system is difficult to realize because the system becomes large and its configuration becomes complex.
• Therefore, recent research focuses on reproducing the sound field over a wide range and with high accuracy using a small and simple system.
[Figure: SFR approaches ordered from complex to simple. Methods shown: boundary surface control (BoSC), Ambisonics, stereo or surround system, wave field synthesis (WFS). Array layouts shown: surrounded (large and complex), circular or spherical (a little complex), linear or planar (simple). The label "Focused" appears at the linear or planar (simple) end.]
Research background (2/3)
• Spectral Division Method (SDM) [J. Ahrens, S. Spors, 2008]
  - One of the SFR methods; it reproduces the sound field by synthesizing a number of wavefronts.
  - This method can be realized with a simple system such as a linear loudspeaker array.
  - Reproduction accuracy: Low
  - Reproduction region: Wide
• However, SDM has two problems.
  - Problem 1: A sound pressure error occurs when the listener's position does not match the reference listening line.
  - Problem 2: The wavefront is disturbed by spatial aliasing.
We aim to reproduce the sound field with high accuracy by solving these problems in SDM.
Research background (3/3)
• To cope with these problems, we propose a novel SFR system with a linear loudspeaker array that combines Kinect-based estimation of the listener’s position with SDM and local sound field synthesis.
[Figure: SDM alone (reproduction accuracy: Low; reproduction region: Wide), combined with an image sensor (Kinect) and local sound field synthesis (reproduction accuracy: High; reproduction region: localized around listener).]
Spectral Division Method (SDM) [J. Ahrens, S. Spors, 2008]
• The driving function in the wavenumber domain
• The driving function in the spatial domain
[Figure: geometry with the primary source, the nth secondary source, and the reference listening line, shown for the spatial domain and the wavenumber domain; the two domains are linked by the Fourier transform and the IDFT.]
Quantities used in the driving functions: reference listening distance, angular frequency, speed of sound, imaginary unit, wavenumber in the x-direction, zeroth-order modified Bessel function of the second kind, zeroth-order Hankel function of the second kind.
Spectral Division Method (SDM) [J. Ahrens, S. Spors, 2008]
• A sound pressure error occurs when the listener's position does not match the reference listening line, which is set by the reference listening distance.
• The wavefront is disturbed by spatial aliasing.
Problems in SDM
Problem 1: sound pressure error
• Under the 2.5-dimensional synthesis condition, the sound pressure is reproduced correctly only on the reference listening line.
• Therefore, to reproduce the sound field correctly at the listener's position, we must set the reference listening distance equal to the listener's distance.
[Figure: primary sound field and reproduced sound field. Sound pressure is correctly reproduced on the reference listening line; a sound pressure error occurs outside the reference listening line.]
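The size of this error can be illustrated with a minimal sketch. It assumes a virtual plane wave, for which a 2.5-dimensional reproduction with a linear array decays roughly as 1/sqrt(y) and is correct only at the reference distance; the slide's primary source is a monopole, so the numbers are indicative only. The function name `level_error_db` and the distances are hypothetical.

```python
import math


def level_error_db(y_ref: float, y: float) -> float:
    """Level error in dB at distance y from the array.

    Assumption: virtual plane wave in 2.5D synthesis, whose reproduced
    amplitude scales as sqrt(y_ref / y) relative to the desired field.
    """
    return 10.0 * math.log10(y_ref / y)


# Reference listening distance of 1.0 m (hypothetical value).
for y in (0.5, 1.0, 2.0):
    print(f"listener at y = {y:.1f} m: {level_error_db(1.0, y):+.1f} dB")
```

The error vanishes only when the listener's distance equals the reference listening distance, which is why the reference distance should follow the listener.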
Problem 2: spatial aliasing (1/2)
• In SDM, discretization of the secondary sources causes spectral overlap in the driving function, and the filter power at high frequencies becomes larger, as in the right figure.
[Figure: magnitude [dB] of the driving-function spectrum before and after discretization of the secondary source. Spectral overlap occurs after discretization.]
Problem 2: spatial aliasing (2/2)
• The spectral overlap in the wavenumber domain appears as spatial aliasing in the spatial domain.
[Figure: synthesized wavefront (amplitude) for a continuous array and for a discrete array. After discretization of the secondary source, the wavefront is disturbed.]
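As a rough worked example, a common rule of thumb places the spatial aliasing frequency of a uniformly spaced linear array at c / (2·Δx). The loudspeaker spacing is not stated in these slides, so the sketch below only back-computes the spacing implied by the quoted aliasing frequency of about 2019 Hz, assuming c = 343 m/s. The function names are hypothetical.

```python
def aliasing_frequency(spacing_m: float, c: float = 343.0) -> float:
    """Rule-of-thumb spatial aliasing frequency in Hz for spacing in metres."""
    return c / (2.0 * spacing_m)


def spacing_for(f_alias_hz: float, c: float = 343.0) -> float:
    """Spacing in metres implied by a given aliasing frequency (inverse of the above)."""
    return c / (2.0 * f_alias_hz)


# Spacing implied by the quoted aliasing frequency (about 0.085 m for c = 343 m/s).
print(f"{spacing_for(2019.0):.4f} m")
```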
Local sound field synthesis (1/2) [J. Ahrens, S. Spors, 2011]
• Local sound field synthesis suppresses spatial aliasing by limiting the spatial bandwidth in the wavenumber domain.
• Applying a rectangular window to the spectrum in the left figure suppresses the spectral overlap, as in the right figure.
[Figure: magnitude [dB] of the driving-function spectrum. Left: spectral overlap occurs. Right: a rectangular window is applied to the spectrum of the driving function, and the spectral overlap is suppressed.]
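The band limitation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the driving-function spectrum is available as samples over the wavenumber along the array, and that the window is centred on zero with a cutoff of π/Δx, so that spectral repetitions spaced 2π/Δx apart do not overlap. All names and the spacing value are hypothetical.

```python
import math


def rectangular_window(kx_values, kx_cutoff):
    """Return 1.0 inside |kx| <= kx_cutoff and 0.0 outside."""
    return [1.0 if abs(kx) <= kx_cutoff else 0.0 for kx in kx_values]


def band_limit(spectrum, kx_values, kx_cutoff):
    """Multiply a sampled spectrum by the rectangular window."""
    window = rectangular_window(kx_values, kx_cutoff)
    return [d * w for d, w in zip(spectrum, window)]


spacing = 0.085                  # assumed loudspeaker spacing in metres
kx_cutoff = math.pi / spacing    # half the spatial sampling wavenumber
kx = [-50.0, 0.0, 50.0]          # sample wavenumbers in rad/m
spectrum = [1 + 1j, 2.0, 3.0]    # placeholder spectrum samples
print(band_limit(spectrum, kx, kx_cutoff))
```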
Local sound field synthesis (2/2) [J. Ahrens, S. Spors, 2011]
• Applying a rectangular window suppresses the disturbance of the wavefront and raises the maximum frequency at which the sound field can be reproduced correctly.
• However, the reproduction area becomes localized. Therefore, to take advantage of this method, it is necessary to design a filter that precisely controls the reproduced direction.
[Figure: synthesized wavefront (amplitude). Unfiltered: spatial aliasing occurs. Filtered: the disturbance of the wavefront is suppressed and the reproduction area is localized.]
Equiangular filter
• To design a filter that accurately controls the reproduced direction, we derive the relation between the reproduced direction, the wavenumber in the x-direction, and the frequency.
• If the reproduced direction is constant, the wavenumber in the x-direction is proportional to the frequency, so we design a new filter based on this relation.
Quantities used: angular frequency, frequency, speed of sound, wavenumber, wavenumber in the x-direction, reproduced direction, angular width, equiangular filter.
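One way to picture such a filter is the sketch below. It is an illustration under stated assumptions, not the authors' definition: the direction θ is measured from the array normal, the wavenumber along the array is taken as k_x = (2πf/c)·sin θ, and the angular width is treated as a full width centred on θ. Function names and c = 343 m/s are hypothetical choices.

```python
import math


def kx_for_direction(freq_hz: float, theta_deg: float, c: float = 343.0) -> float:
    """Wavenumber along the array for a wave radiated at angle theta.

    For a fixed theta, the result is proportional to the frequency.
    """
    return 2.0 * math.pi * freq_hz / c * math.sin(math.radians(theta_deg))


def equiangular_filter(kx, freq_hz, theta_deg, width_deg, c=343.0):
    """Pass (1.0) wavenumbers whose direction lies within theta +/- width/2.

    Assumes theta +/- width/2 stays within [-90, 90] degrees, where the
    sine is monotonic.
    """
    lo = kx_for_direction(freq_hz, theta_deg - width_deg / 2.0, c)
    hi = kx_for_direction(freq_hz, theta_deg + width_deg / 2.0, c)
    return 1.0 if lo <= kx <= hi else 0.0


# The passband scales with frequency, so the passed angle stays the same.
for f in (1000.0, 2000.0, 3000.0):
    centre = kx_for_direction(f, 30.0)
    print(f, round(centre, 3), equiangular_filter(centre, f, 30.0, 10.0))
```

Because the band edges scale with frequency, the passband traces a wedge of constant angle in the frequency-wavenumber plane, in contrast to the fixed-width rectangular window.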
Result of applying the equiangular filter (1/2)
• An example of applying the designed filter to a spectrum, for a given reproduced direction and angular width.
• The equiangular filter used in this presentation is cut off by a low-pass filter at frequencies exceeding the maximum frequency, so the sound field is not reproduced above that frequency.
[Figure: magnitude [dB] of the driving-function spectrum. Left: spectral overlap occurs. Right: the equiangular filter is applied to the spectrum of the driving function, and the spectral overlap is suppressed.]
Result of applying the equiangular filter (2/2)
• Applying the equiangular filter suppresses the disturbance of the wavefront and reproduces the sound field in a specific direction.
• However, the sweet spot cannot be matched to the listener’s position if the listener’s direction is not known in advance.
[Figure: synthesized wavefront (amplitude). Unfiltered: spatial aliasing occurs. Filtered: the disturbance of the wavefront is suppressed.]
Summary of problems
• Problems in SDM
  - A sound pressure error occurs when the reference listening distance does not match the listener's distance.
  - Spatial aliasing is caused by discretization of the secondary sources. (This second problem can be solved by applying an equiangular filter.)
• Problem in the equiangular filter
  - The sweet spot cannot be matched to the listener’s position if the listener’s direction is not known in advance.
• These problems can be solved if we know the listener’s position; therefore, introducing the image sensor solves them.
Condition of simulation experiment
[Figure: 34 ch linear secondary source array (monopole sources), primary source (monopole source), and reference listening line.]

Parameter name          Parameter value
measurement plane       W4.0 D4.0
aliasing frequency      approximately 2019 Hz
angular width
reproduced direction
synthesis frequency     3, 5 kHz

Evaluation score, defined in terms of:
  - radiation characteristic of the primary sound field
  - radiation characteristic of the secondary sound field
• Assuming that the listener’s position is obtained by the image sensor, we calculate the reproduced direction from the sound source position and the listener's position.
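The direction calculation can be sketched as below. The coordinate convention is an assumption: the array lies along the x-axis, the y-axis is the array normal pointing toward the listening area, and the angle is measured from that normal, positive toward +x. The function name and example positions are hypothetical.

```python
import math


def reproduced_direction_deg(source_xy, listener_xy):
    """Angle in degrees of the line from the primary source to the listener.

    Measured from the array normal (+y), positive toward +x.
    """
    dx = listener_xy[0] - source_xy[0]
    dy = listener_xy[1] - source_xy[1]
    return math.degrees(math.atan2(dx, dy))


# Primary source 1 m behind the array, listener 1 m to the right on the array line.
print(reproduced_direction_deg((0.0, -1.0), (1.0, 0.0)))
```

In a running system the listener position would be refreshed from the image sensor and the filter direction recomputed each time it changes.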
Results of simulation experiment
[Figures, two cases: synthesized wavefront (amplitude) and evaluated value at 3 kHz and at 5 kHz, with the listener and the primary source marked.]
• The sound field is correctly reproduced in the listener’s direction regardless of the frequency.
Condition of subjective assessment on directional perception
[Figure: 34 ch linear loudspeaker array, reference listening line, acoustically transparent curtain, listener positions Pos 1 to Pos 3, loudspeaker distance, primary source, and answer number cards.]

parameter name          parameter value
sampling frequency      48 kHz
quantization bit rate   16 bit
test sound              white Gaussian noise, 3 seconds
aliasing frequency      approximately 2019 Hz
angular width
sound source direction
number of evaluators    7
type of sound source    ・sound source without bandwidth limitation (Conventional1)
                        ・sound source band-limited to frequencies under 2 kHz (Conventional2)
                        ・sound source to which the equiangular filter is applied (Proposed)

Evaluation score, defined in terms of:
  - number of evaluators
  - true source direction
  - answered direction
• As the evaluation procedure, we asked evaluators to answer at which card position they perceived the sound source.
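A score built from these three quantities could look like the sketch below. It assumes the score is the mean absolute difference between the true direction and each evaluator's answered direction; this form is an assumption, and other forms such as a root-mean-square error are equally possible. The function name and the example answers are hypothetical.

```python
def mean_abs_direction_error(true_deg, answered_deg):
    """Mean absolute error in degrees over all evaluators' answers."""
    n = len(answered_deg)
    return sum(abs(a - true_deg) for a in answered_deg) / n


# Three hypothetical answers for a source at 10 degrees.
print(mean_abs_direction_error(10.0, [10.0, 20.0, 0.0]))
```

A smaller value means the perceived direction is closer to the true source direction.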
Results of subjective assessment on directional perception
[Figure: directional perception results in (a) Pos1, (b) Pos2, (c) Pos3 for Conventional1 (without bandwidth limitation), Conventional2 (band-limited to frequencies under 2 kHz), and Proposed (equiangular filter applied); the score axis runs from Good to Bad.]
• Proposed is superior to Conventional1 and Conventional2 in Pos1 and Pos2. However, Proposed is almost the same as Conventional2 in Pos3.
  - This is because, in the equiangular filter, the maximum frequency becomes lower as the angle of the reproduced direction becomes larger.
• As the listener moves to the right (from Pos1 to Pos3), the directional perception error of Conventional1 becomes larger owing to spatial aliasing.
The superiority of the proposed method in directional perception is shown.
Condition of subjective assessment on sound quality
[Figure: 34 ch linear loudspeaker array, reference listening line, acoustically transparent curtain, listener positions Pos 1 to Pos 3, loudspeaker distance, primary source, and reference loudspeaker.]

parameter name          parameter value
sampling frequency      48 kHz
quantization bit rate   16 bit
test sound              white Gaussian noise, 3 seconds
aliasing frequency      approximately 2019 Hz
angular width
sound source direction
number of evaluators    7
type of sound source    ・sound source without bandwidth limitation (Conventional1)
                        ・sound source band-limited to frequencies under 2 kHz (Conventional2)
                        ・sound source to which the equiangular filter is applied (Proposed)

• As the evaluation procedure, we played two synthesized sounds after the reference sound radiated by the reference loudspeaker, and asked evaluators to answer which synthesized sound they perceived as closer to the reference sound.
Results of subjective assessment on sound quality
[Figure: sound quality results in (a) Pos1, (b) Pos2, (c) Pos3 for Conventional1 (without bandwidth limitation), Conventional2 (band-limited to frequencies under 2 kHz), and Proposed (equiangular filter applied); the score axis runs from Good to Bad.]
• In all results, evaluators chose Conventional1 or Proposed and did not choose Conventional2.
• At all listener positions, more evaluators chose Conventional1 than Proposed.
• This suggests that cutting the high-frequency region of the sound affects sound quality more than spatial aliasing does.
Conclusion
• The objective of an SFR system is to reproduce the primary sound field in another space over as wide a range and with as high accuracy as possible.
• Since a complex system is difficult to realize, an SFR method utilizing a simple system has been desired.
• SDM can be realized with a simple system such as a linear loudspeaker array. However, it cannot reproduce the sound field with high accuracy.
• We proposed an SFR system that reproduces the sound field with high accuracy at the listener's position by estimating the listener's direction.
  - The subjective assessment showed the superiority of the proposed method in directional perception.
  - However, since superiority was not shown in sound quality, it is necessary to improve the equiangular filter so that the low-pass filter need not be applied.
Thank you for your attention!