robust sound field reproduction against listener’s movement utilizing image sensor

Robust Sound Field Reproduction against Listener’s Movement Utilizing Image Sensor
Toshihide Aketo, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan)


Post on 20-May-2015


Page 1: Robust Sound Field Reproduction against  Listener’s Movement Utilizing Image Sensor

Robust Sound Field Reproduction against

Listener’s Movement Utilizing Image Sensor

Toshihide Aketo, Hiroshi Saruwatari, Satoshi Nakamura

(Nara Institute of Science and Technology, Japan)

Page 2

Outline

• Research background
• Conventional method
  • Spectral Division Method
  • Local sound field synthesis
• Proposed method
  • Equiangular filter
  • Sound field reproduction system utilizing image sensor
• Simulation experiment
• Subjective assessment
  • on directional perception
  • on sound quality

Page 3

Research background (1/3)

Objective of sound field reproduction (SFR) system

• To reproduce the primary sound field in another space over a wide range and with high accuracy.
• However, such a system is difficult to realize because it becomes large and its configuration becomes complex.
• Therefore, recent research focuses on reproducing the sound field over a wide range and with high accuracy using a small and simple system.

Figure: SFR systems arranged from complex to simple.
• Boundary surface control (BoSC): surrounded (large and complex)
• Ambisonics: circular or spherical (a little complex)
• Wave field synthesis (WFS): linear or planar (simple); the focus of recent research
• Stereo or surround system

Page 4

Research background (2/3)

• Spectral Division Method (SDM) [J. Ahrens, S. Spors, 2008]
  • One of the SFR methods; it reproduces the sound field by synthesizing a number of wavefronts.
  • It can be realized with a simple system such as a linear loudspeaker array.
• However, SDM has two problems.
  • Problem 1: A sound pressure error occurs when the listener is off the reference listening line.
  • Problem 2: Spatial aliasing disturbs the wavefront.

SDM:
• Reproduction accuracy: Low (aim: High)
• Reproduction region: Wide

We aim to reproduce the sound field with high accuracy by solving these problems in SDM.

Page 5

Research background (3/3)

• To cope with these problems, we propose a novel SFR system with a linear loudspeaker array that combines listener position estimation by Kinect (image sensor) with SDM and local sound field synthesis.

SDM alone:
• Reproduction accuracy: Low
• Reproduction region: Wide

SDM with image sensor (Kinect) and local sound field synthesis:
• Reproduction accuracy: High
• Reproduction region: localized around listener

Page 6

Spectral Division Method (SDM) [J. Ahrens, S. Spors, 2008]

• The driving function in the wavenumber domain
• The driving function in the spatial domain

Figure: primary source, nth secondary source, and reference listening line, shown in the spatial domain and in the wavenumber domain (related by Fourier transform and IDFT).

Symbols used on this slide: reference listening distance; angular frequency; speed of sound; imaginary unit; wavenumber component along the array; zeroth-order modified Bessel function of the second kind; zeroth-order Hankel function of the second kind.
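For orientation, the SDM driving function is commonly written as the ratio between the spectrum of the desired field and the spectrum of the secondary-source field, both taken on the reference listening line. The block below is a sketch of that commonly cited form, not a transcription of the slide. The symbol names $y_{\mathrm{ref}}$ (reference listening distance), $k_x$ (wavenumber along the array), $\tilde{S}$ (desired field), $\tilde{G}$ (monopole secondary source), and $x_n$ (position of the nth secondary source) are assumptions, as are the $e^{j\omega t}$ time convention and the sign of the exponent in the inverse transform.

```latex
\tilde{D}(k_x,\omega)
  = \frac{\tilde{S}(k_x, y_{\mathrm{ref}}, \omega)}
         {\tilde{G}(k_x, y_{\mathrm{ref}}, \omega)},
\qquad
\tilde{G}(k_x, y, \omega) =
\begin{cases}
  -\dfrac{j}{4}\, H_0^{(2)}\!\left(\sqrt{(\omega/c)^2 - k_x^2}\; y\right),
    & |k_x| < \omega/c, \\[2ex]
  \dfrac{1}{2\pi}\, K_0\!\left(\sqrt{k_x^2 - (\omega/c)^2}\; y\right),
    & |k_x| > \omega/c,
\end{cases}

D(x_n,\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}
  \tilde{D}(k_x,\omega)\, e^{-j k_x x_n}\, \mathrm{d}k_x .
```

The Hankel branch covers propagating components and the modified Bessel branch covers evanescent ones, which matches the two special functions listed in the symbol legend.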

Page 7

Spectral Division Method (SDM) [J. Ahrens, S. Spors, 2008]

• The driving function in the wavenumber domain
• The driving function in the spatial domain

Figure: primary source, nth secondary source, and reference listening line (at the reference listening distance), shown in the spatial domain and in the wavenumber domain (related by Fourier transform and IDFT).

Problems in SDM

• A sound pressure error occurs when the listener is off the reference listening line.
• Spatial aliasing disturbs the wavefront.

Page 8

Problem 1: sound pressure error

• Under the 2.5-dimensional synthesis condition, the sound pressure is correctly reproduced only on the reference listening line.
• Therefore, to correctly reproduce the sound field at the listener's position, we must set the reference listening distance equal to the listener's distance.

Figure: primary sound field and reproduced sound field. The sound pressure is correctly reproduced on the reference listening line; a sound pressure error occurs off the reference listening line.

Page 9

Problem 2: spatial aliasing (1/2)

• In SDM, discretizing the secondary source causes spectral overlap in the driving function, and the filter power at high frequencies becomes larger, as in the right figure.

Figure: magnitude [dB] of the driving-function spectrum before (left) and after (right) discretization of the secondary source. Spectral overlap occurs after discretization.
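The frequency above which the spectral repetitions start to overlap is often estimated with the half-wavelength rule f_al = c / (2 Δx) for a uniformly spaced linear array. The sketch below applies that rule. The function name and the example values are hypothetical, and the slides do not state the loudspeaker spacing or the speed of sound that the authors used.

```python
def aliasing_frequency(speed_of_sound: float, spacing: float) -> float:
    """Estimate the spatial aliasing frequency of a uniform linear array.

    Uses the half-wavelength rule f_al = c / (2 * dx). Both arguments are
    in SI units (m/s and m); the result is in Hz.
    """
    if speed_of_sound <= 0.0 or spacing <= 0.0:
        raise ValueError("speed_of_sound and spacing must be positive")
    return speed_of_sound / (2.0 * spacing)
```

For example, `aliasing_frequency(343.0, 0.085)` gives about 2018 Hz; the 0.085 m spacing is an illustrative value only.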

Page 10

Problem 2: spatial aliasing (2/2)

• The effect of spectral overlap in the wavenumber domain appears as spatial aliasing in the spatial domain.

Figure: synthesized wavefront (amplitude) for a continuous array and for a discrete array. After discretization of the secondary source, the wavefront is disturbed.

Page 11

Local sound field synthesis (1/2) [J. Ahrens, S. Spors, 2011]

Local sound field synthesis: a method that suppresses spatial aliasing by limiting the spatial bandwidth in the wavenumber domain.

• By applying a rectangular window to the spectrum in the left figure, we can suppress the spectral overlap, as in the right figure.

Figure: magnitude [dB] of the driving-function spectrum. Left: spectral overlap occurs. Right: with a rectangular window applied to the spectrum of the driving function, spectral overlap is suppressed.
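The band-limiting step can be sketched as a 0/1 window over sampled wavenumbers that is multiplied onto the driving-function spectrum. This is a minimal illustration only: the function names, the centre/bandwidth parameterization, and the sample grid are assumptions, not details taken from the slides or from the cited paper.

```python
from typing import List, Sequence


def rectangular_window(kx: Sequence[float], k_center: float,
                       bandwidth: float) -> List[float]:
    """Return 1.0 for wavenumbers within bandwidth/2 of k_center, else 0.0."""
    half = bandwidth / 2.0
    return [1.0 if abs(k - k_center) <= half else 0.0 for k in kx]


def apply_window(spectrum: Sequence[complex],
                 window: Sequence[float]) -> List[complex]:
    """Multiply a sampled wavenumber spectrum by a window, sample by sample."""
    if len(spectrum) != len(window):
        raise ValueError("spectrum and window must have the same length")
    return [s * w for s, w in zip(spectrum, window)]
```

Narrowing `bandwidth` removes more of the overlapping spectral repetitions, at the price of a smaller region in which the field is reproduced.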

Page 12

Local sound field synthesis (2/2) [J. Ahrens, S. Spors, 2011]

• By applying a rectangular window, we can suppress the disturbance of the wavefront and raise the maximum frequency at which the sound field is correctly reproduced.
• However, the reproduction area becomes localized. Therefore, to take advantage of this method, it is necessary to design a filter that precisely controls the reproduced direction.

Figure: synthesized wavefront (amplitude). Unfiltered: spatial aliasing occurs. Filtered: the disturbance of the wavefront is suppressed and the reproduction area is localized.

Page 13

Equiangular filter

• To design a filter that accurately controls the reproduced direction, we derive the relation between the reproduced direction, the wavenumber component along the array, and the frequency.
• When the reproduced direction is constant, the wavenumber component along the array is proportional to the frequency, so we design a new filter based on this relation.

Symbols used on this slide: angular frequency; frequency; speed of sound; wavenumber; wavenumber component along the array; reproduced direction; angular width; equiangular filter.
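The proportionality can be illustrated with the plane-wave relation k_x = (ω / c) sin θ, which makes a pass band of constant angular width widen linearly with frequency. The sketch below is one plausible reading of an equiangular pass band, not the authors' filter definition. It assumes that θ is measured from the array normal and that the band spans θ ± (angular width) / 2; the function names are hypothetical.

```python
import math


def kx_for_direction(freq_hz: float, theta_rad: float,
                     c: float = 343.0) -> float:
    """Wavenumber along the array for direction theta (from the array normal)."""
    return 2.0 * math.pi * freq_hz / c * math.sin(theta_rad)


def equiangular_mask(kx: float, freq_hz: float, theta_rad: float,
                     width_rad: float, c: float = 343.0) -> float:
    """Return 1.0 if kx lies inside the angular band at this frequency, else 0.0.

    The band edges are clamped to [-pi/2, pi/2] so that sin() stays
    monotonic and the lower edge never exceeds the upper edge.
    """
    half_pi = math.pi / 2.0
    lo_angle = max(-half_pi, theta_rad - width_rad / 2.0)
    hi_angle = min(half_pi, theta_rad + width_rad / 2.0)
    lo = kx_for_direction(freq_hz, lo_angle, c)
    hi = kx_for_direction(freq_hz, hi_angle, c)
    return 1.0 if lo <= kx <= hi else 0.0
```

Because both band edges scale with frequency, a component that passes at one frequency also passes at twice that frequency when its wavenumber is doubled, which is the constant-direction behaviour described above.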

Page 14

Result of applying the equiangular filter (1/2)

• An example of applying the designed filter to a spectrum. The reproduced direction and the angular width are fixed in this example.
• The equiangular filter used in this presentation is additionally cut off by a low-pass filter above the maximum frequency; the sound field is not reproduced above that frequency.

Figure: magnitude [dB] of the driving-function spectrum. Before filtering: spectral overlap occurs. With the equiangular filter applied to the spectrum of the driving function: spectral overlap is suppressed.
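The slides do not give the formula for the maximum frequency. One plausible criterion, assumed here purely for illustration, is the plane-wave sampling condition: the pass-band edge with the largest |k_x| = (ω / c) |sin θ| must stay below the Nyquist wavenumber π / Δx of the array. The function name, the θ ± (angular width) / 2 band definition, and the example values are all hypothetical.

```python
import math


def max_frequency(theta_rad: float, width_rad: float, spacing: float,
                  c: float = 343.0) -> float:
    """Highest frequency whose angular band stays below pi / spacing.

    theta_rad is measured from the array normal and the band spans
    theta_rad +/- width_rad / 2, clamped to [-pi/2, pi/2].
    Returns math.inf when the whole band lies exactly on the normal.
    """
    if spacing <= 0.0 or c <= 0.0:
        raise ValueError("spacing and c must be positive")
    half_pi = math.pi / 2.0
    edges = [max(-half_pi, min(half_pi, theta_rad + sign * width_rad / 2.0))
             for sign in (-1.0, 1.0)]
    worst = max(abs(math.sin(edge)) for edge in edges)
    if worst == 0.0:
        return math.inf
    return c / (2.0 * spacing * worst)
```

Under this assumption the limit falls as the reproduced direction moves away from the array normal, and it approaches c / (2 Δx) for directions along the array.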

Page 15

Result of applying the equiangular filter (2/2)

• By applying the equiangular filter, we can suppress the disturbance of the wavefront and reproduce the sound field in a specific direction.
• However, the sweet spot cannot be matched to the listener’s position unless the listener’s direction is known in advance.

Figure: synthesized wavefront (amplitude). Unfiltered: spatial aliasing occurs. Filtered: the disturbance of the wavefront is suppressed.

Page 16

Summary of problems

• Problems in SDM
  • A sound pressure error occurs when the reference listening distance does not match the listener's distance.
  • Spatial aliasing is caused by discretization of the secondary sources. (This second problem can be solved by applying an equiangular filter.)
• Problem in the equiangular filter
  • The sweet spot cannot be matched to the listener’s position unless the listener’s direction is known in advance.
• These problems can be solved if we know the listener’s position; therefore, introducing the image sensor solves them.

Page 17

Condition of simulation experiment

Figure: 34 ch linear secondary source array (monopole sources), primary source (monopole source), and reference listening line.

Parameter name         Parameter value
measurement plane      W 4.0, D 4.0
aliasing frequency     approximately 2019 Hz
angular width
reproduced direction
synthesis frequency    3 kHz, 5 kHz

Evaluation score: defined from the radiation characteristic of the primary sound field and the radiation characteristic of the secondary sound field.

• Assuming that the listener’s position is obtained by the image sensor, we calculate the reproduced direction from the sound source position and the listener's position.
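The geometric step can be sketched as follows: given a primary source position and a tracked listener position, take the direction of the line from source to listener, and use the listener's distance from the array as the reference listening distance. The coordinate convention is an assumption (array along the x-axis at y = 0, listener at y > 0, angle measured from the array normal), and the function names are hypothetical; no real Kinect API call is shown.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in metres, array along x at y = 0


def reproduced_direction(source_xy: Point, listener_xy: Point) -> float:
    """Angle in radians, from the array normal, of the source-to-listener line."""
    dx = listener_xy[0] - source_xy[0]
    dy = listener_xy[1] - source_xy[1]
    if dx == 0.0 and dy == 0.0:
        raise ValueError("source and listener positions coincide")
    # atan2(dx, dy) is zero on the normal (+y) and positive toward +x.
    return math.atan2(dx, dy)


def reference_distance(listener_xy: Point) -> float:
    """Reference listening distance set equal to the listener's distance."""
    return listener_xy[1]
```

In a tracking loop, both values would be recomputed whenever the sensor reports a new listener position, and the filter direction and reference listening line would be updated to match.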

Page 18

Results of simulation experiment

Figure: synthesized wavefront (amplitude) and evaluated value at 3 kHz and at 5 kHz. Markers show the listener and the primary source.

Page 19

Results of simulation experiment

Figure: synthesized wavefront (amplitude) and evaluated value at 3 kHz and at 5 kHz. Markers show the listener and the primary source.

The sound field is correctly reproduced in the listener’s direction regardless of the frequency.

Page 20

Condition of subjective assessment on directional perception

Figure: 34 ch linear loudspeaker array, acoustically transparent curtain, reference listening line, listener positions Pos 1 to Pos 3, loudspeaker distance, primary source, and answer number cards.

Parameter name           Parameter value
sampling frequency       48 kHz
quantization bit depth   16 bit
test sound               white Gaussian noise, 3 seconds
aliasing frequency       approximately 2019 Hz
angular width
sound source direction
number of evaluators     7
type of sound source     • sound source without bandwidth limitation (Conventional1)
                         • sound source band-limited to frequencies under 2 kHz (Conventional2)
                         • sound source to which the equiangular filter is applied (Proposed)

Evaluation score: defined from the number of evaluators, the true source direction, and the answered direction.

• As the evaluation procedure, we asked evaluators to answer at which card position they perceived the sound source.
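The exact score formula is not recoverable from the transcript. One simple score that uses the same three quantities (number of evaluators, true source direction, answered direction) is the mean absolute angular error, sketched below. This choice is an assumption for illustration; the authors may have used a different measure, such as a root-mean-square error.

```python
from typing import Sequence


def mean_abs_direction_error(true_deg: float,
                             answered_deg: Sequence[float]) -> float:
    """Mean absolute difference, in degrees, between answers and the true direction.

    answered_deg holds one answered direction per evaluator.
    """
    if not answered_deg:
        raise ValueError("at least one answer is required")
    total = sum(abs(answer - true_deg) for answer in answered_deg)
    return total / len(answered_deg)
```

With this score, lower values mean better directional perception.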

Page 21

Results of subjective assessment on directional perception

Figure: directional perception score (axis from Good to Bad) at (a) Pos1, (b) Pos2, (c) Pos3 for Conventional1 (without bandwidth limitation), Conventional2 (band-limited to frequencies under 2 kHz), and Proposed (equiangular filter applied).

• Proposed is superior to Conventional1 and Conventional2 at Pos1 and Pos2. However, Proposed is almost the same as Conventional2 at Pos3.
  • This is because, in the equiangular filter, the maximum frequency becomes lower as the angle of the reproduced direction becomes larger.
• As the listener moves to the right (from Pos1 to Pos3), the directional perception error of Conventional1 becomes larger owing to spatial aliasing.

The superiority of the proposed method is shown on directional perception.

Page 22

Condition of subjective assessment on sound quality

Figure: 34 ch linear loudspeaker array, acoustically transparent curtain, reference listening line, listener positions Pos 1 to Pos 3, loudspeaker distance, primary source, and reference loudspeaker.

Parameter name           Parameter value
sampling frequency       48 kHz
quantization bit depth   16 bit
test sound               white Gaussian noise, 3 seconds
aliasing frequency       approximately 2019 Hz
angular width
sound source direction
number of evaluators     7
type of sound source     • sound source without bandwidth limitation (Conventional1)
                         • sound source band-limited to frequencies under 2 kHz (Conventional2)
                         • sound source to which the equiangular filter is applied (Proposed)

• As the evaluation procedure, we played two synthesized sounds after a reference sound radiated by the reference loudspeaker, and asked evaluators which synthesized sound they perceived as closer to the reference sound.

Page 23

Results of subjective assessment on sound quality

Figure: sound quality score (axis from Good to Bad) at (a) Pos1, (b) Pos2, (c) Pos3 for Conventional1 (without bandwidth limitation), Conventional2 (band-limited to frequencies under 2 kHz), and Proposed (equiangular filter applied).

• In all results, evaluators chose Conventional1 or Proposed and never chose Conventional2.
• At every listener position, more evaluators chose Conventional1 than Proposed.
• This suggests that cutting the high-frequency region of the sound degrades sound quality more than spatial aliasing does.

Page 24

Conclusion

• The objective of an SFR system is to reproduce the primary sound field in another space over as wide a range and with as high an accuracy as possible.
• Since a large and complex system is difficult to realize, SFR methods that use a simple system have been desired.
• SDM can be realized with a simple system such as a linear loudspeaker array. However, SDM alone cannot reproduce the sound field with high accuracy.
• We proposed an SFR system that reproduces the sound field with high accuracy at the listener's position by estimating the listener's direction.
  • The subjective assessment showed the superiority of the proposed method on directional perception.
  • However, since superiority was not shown on sound quality, the equiangular filter must be improved so that the low-pass filter is no longer needed.

Thank you for your attention!