Voice source characterisation
Gerrit Bloothooft
UiL-OTS Utrecht University
Emasters School Leuven 2002 Voice Source Characterization 2
Voice research
To describe and model the properties of the vocal sound source from the viewpoints of:
– Physiology
– Acoustics
– Perception
Importance of the voice
• Speech synthesis
– Towards natural sounding synthesis
• Speech recognition
– Using source properties in recognition
• Speaker recognition/identification
– Voice source characteristics are essential
• Diagnosis
– Pathologies, voice classifications
Voice possibilities
Limited use of voice in speech
• Range of the fundamental frequency
• Vocal intensity range
• Spectral variation
Focus in this presentation
How do acoustic voice source characteristics vary as a function of F0 and vocal intensity?
Voice profile measurement
Thirties: Intensity range as a function of various pitches
– manual measurement
Eighties: Automatic computation of F0 and intensity
– computer measurement
– visual feedback
– additional parameters
Measurement unit
• One decibel
• One semitone
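The one-semitone by one-decibel unit can be read as a grid over the F0-intensity plane. A minimal sketch of that binning follows; the 55 Hz reference for semitone counting and the floor rounding are illustrative assumptions, not taken from the slides.

```python
import math

REF_HZ = 55.0  # assumed reference frequency for semitone counting (hypothetical)

def f0_to_semitones(f0_hz, ref_hz=REF_HZ):
    """Distance of f0_hz above ref_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def cell_index(f0_hz, spl_db, ref_hz=REF_HZ):
    """Return the (semitone, decibel) cell that one measurement falls into."""
    return (math.floor(f0_to_semitones(f0_hz, ref_hz)), math.floor(spl_db))

# 220 Hz is two octaves above 55 Hz, so it lands in semitone bin 24.
print(cell_index(220.0, 72.4))  # (24, 72)
```

Any other reference frequency only shifts the semitone axis; the cell size stays one semitone by one decibel.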
Measurement procedure
• Subject in front of computer screen
• Microphone on head set (30 cm)
• Just phonate, sing, and see the result immediately
• Best results with recording protocol
• Feedback stimulates extreme phonations
Voice profile / density
[Figure: voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); shading: sample density]
Voice profile / speech area
[Figure: voice profile with speech area. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); shading: sample density]
Acoustic voice quality parameters
• Jitter
– Stability of periodicity
– Asymmetry in vocal folds
• Crest factor
– Max amplitude divided by average energy
– Relates to spectral slope
• Many more …
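Both parameters can be computed from a short stretch of signal. The sketch below is one possible reading, not the presentation's own procedure: jitter is taken as the common "local jitter" (mean absolute difference between consecutive periods, relative to the mean period), and the crest factor as peak amplitude over RMS amplitude expressed in dB. The function names are hypothetical.

```python
import math

def jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive
    periods, relative to the mean period, in percent."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period

def crest_factor_db(samples):
    """Peak amplitude over RMS amplitude, in dB (assumed definition)."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

# One full cycle of a pure sine: crest factor is 20*log10(sqrt(2)), about 3 dB.
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
print(round(crest_factor_db(sine), 2))

# Four slightly unequal periods (in seconds) give a jitter of a few percent.
print(jitter_percent([0.0100, 0.0102, 0.0099, 0.0101]))
```

Under this definition a sine-like signal sits near 3 dB, and a signal with sharper pulses (more energy in higher harmonics) gives a larger value.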
Crest factor
[Figure: voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); shading: crest factor]
Jitter
[Figure: voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); shading: jitter, from regular to irregular]
Real time presentation
Screen presentation
• One data point per F0-I cell
Advanced data storage [new]
• Full audio signal
• Full distribution of data per F0-I cell
• Data for screen presentation
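The difference between the two storage schemes can be sketched as a small container that keeps every value per F0-I cell and derives the single screen value from it. The class name, the parameter names, and the use of the median as the display value are all assumptions for illustration; storage of the audio signal is left out.

```python
from collections import defaultdict
from statistics import median

class VoiceProfileStore:
    """Hypothetical container: keeps every measurement per F0-I cell."""

    def __init__(self):
        # cell -> parameter name -> list of all observed values
        self._cells = defaultdict(lambda: defaultdict(list))

    def add(self, cell, **params):
        """Append one measurement (several named parameters) to a cell."""
        for name, value in params.items():
            self._cells[cell][name].append(value)

    def distribution(self, cell, name):
        """Full list of stored values for one parameter in one cell."""
        return list(self._cells[cell][name])

    def screen_value(self, cell, name):
        """Single value per cell for display (median is an assumed choice)."""
        values = self._cells[cell][name]
        return median(values) if values else None

store = VoiceProfileStore()
store.add((24, 72), crest_factor_db=5.1, jitter_pct=0.8)
store.add((24, 72), crest_factor_db=5.9, jitter_pct=1.1)
store.add((24, 72), crest_factor_db=5.4, jitter_pct=0.9)
print(store.screen_value((24, 72), "crest_factor_db"))  # 5.4
```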
Advantages
• Reusability of recordings
• Statistical analysis per F0-I cell
• Study of time-varying behavior
Crest factor
[Figure: voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); shading: crest factor]
Median smoothing of crest factor
[Figure: voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); shading: crest factor, median smoothed]
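Median smoothing over the F0-I grid can be sketched as follows. The slides do not state the window used, so the 3 by 3 neighbourhood (radius 1) and the handling of empty cells are assumptions.

```python
from statistics import median

def median_smooth(grid, radius=1):
    """Median-smooth a dict {(semitone, db): value} over neighbouring cells.

    Only cells that already hold a value are smoothed; empty neighbours
    are ignored. The (2*radius+1)^2 window is an assumed choice.
    """
    smoothed = {}
    for (st, db) in grid:
        window = [
            grid[(st + i, db + j)]
            for i in range(-radius, radius + 1)
            for j in range(-radius, radius + 1)
            if (st + i, db + j) in grid
        ]
        smoothed[(st, db)] = median(window)
    return smoothed

# A flat 3 x 3 patch with one isolated outlier in the middle.
grid = {(s, d): 5.0 for s in range(3) for d in range(3)}
grid[(1, 1)] = 12.0
print(median_smooth(grid)[(1, 1)])  # 5.0
```

The median removes isolated outlier cells while leaving broad trends across the profile in place, which is the usual reason for preferring it over a mean filter.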
Vocal registers
Different movement patterns of the vocal folds
• Pulse register (creaky voice)
• Modal register
• Falsetto register
Pulse register
• Less than 50 Hz
• Irregular
• Long closed period
Pulse register
[Figure: pulse register area in the voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL)]
Modal register
• “Normal” use of voice
• Active role of M. Vocalis
• Vocal folds thick and completely vibrating
• Wide range in F0 and intensity
• Flat spectrum
Modal register
[Figure: modal register area in the voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL)]
Falsetto register
• Higher pitches
• M. Vocalis passive, tense vocal ligaments through M. Cricothyroideus
• Edge vibration of vocal folds
• Sound poor in higher harmonics (in untrained subjects)
Falsetto register
[Figure: falsetto register area in the voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL)]
Register overlap
[Figure: overlapping register areas in the voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL)]
Chest and head voice
Refer to secondary vibratory sensations in the body
• Chest voice: loud modal register
• Head voice:
– males: higher, softer modal register in overlap area with falsetto register
– women: falsetto register
Chest voice and head voice
[Figure: chest and head voice areas in the voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); labelled areas: chest, head]
Registers and voice profiles
With a description using
• Iso-crest factor lines
• Iso-jitter lines
Iso-crest factor lines
[Figure: voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); shading: crest factor; iso-lines at 4 dB and 6 dB]
Iso-jitter lines
[Figure: voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); shading: jitter (%); iso-line at 3 %]
New representation
• Areas defined by iso-parameter lines
– crest factor < 4 dB
– crest factor > 4 dB, < 6 dB
– crest factor > 6 dB
– jitter < 3 %
– [relative rise time < 6 %]
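The thresholds above can be applied per F0-I cell to label areas in the phonetogram. The sketch below is hypothetical: the label names follow those used for the areas in the phonetogram figure (unstable, pressed-like, sine-like), the treatment of values exactly at 4 dB or 6 dB is an assumption, and all applicable labels are returned because the slides give no precedence between them.

```python
def classify_cell(crest_factor_db, jitter_pct, rrt_pct=None):
    """Label one F0-I cell using the iso-parameter thresholds.

    rrt_pct is the relative rise time in percent (optional).
    Boundary handling at exactly 4 dB and 6 dB is an assumed choice.
    """
    labels = []
    if jitter_pct > 3.0:
        labels.append("unstable")
    if rrt_pct is not None and rrt_pct < 6.0:
        labels.append("pressed-like")
    if crest_factor_db < 4.0:
        labels.append("sine-like")
    elif crest_factor_db <= 6.0:
        labels.append("crest 4-6 dB")
    else:
        labels.append("crest > 6 dB")
    return labels

print(classify_cell(3.2, 0.5))       # ['sine-like']
print(classify_cell(5.0, 4.0, 5.0))  # ['unstable', 'pressed-like', 'crest 4-6 dB']
```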
Areas in the phonetogram
[Figure: phonetogram. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); marked areas: jitter > 3 % (unstable), RRT < 6 % (pressed-like), crest factor < 4 dB (sine-like)]
Vocal registers in the phonetogram
[Figure: phonetogram. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); marked lines: falsetto upper boundary, modal lower boundary, chest voice boundary]
Comparison of voice profiles
Characterisation of
• Voice pathologies
• Voice classifications
Reuse stored voice profiles of subjects with known voice history
Important features
• Contour has limited value
– but most research goes into that direction (norm profiles)
• Distribution of acoustical parameters across the voice profile tells much more
We need
• Unit for comparison
– Voice profile unit defined by small range of F0 and vocal intensity
• Distributions of acoustic voice parameters per unit
– Probability density function per parameter
• Model
– Hidden Markov Model
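A probability density function per parameter and per unit can be estimated in many ways. The sketch below uses a plain histogram and a log-likelihood score to compare test data against two trained distributions. It covers only the per-unit density, not the Hidden Markov Model; the bin range, bin width, probability floor, and example values are all invented for illustration.

```python
import math
from collections import Counter

def _bin(value, lo, width, n_bins):
    """Index of the bin holding value, clamped to the valid range."""
    return min(n_bins - 1, max(0, int((value - lo) // width)))

def histogram_pdf(values, lo=4.0, hi=16.0, width=1.0, floor=1e-3):
    """Estimate a discrete probability distribution over [lo, hi).

    Empty bins get a small floor probability so that unseen values
    do not produce log(0). All defaults are assumed, not sourced.
    """
    if not values:
        raise ValueError("need at least one value")
    n_bins = int(round((hi - lo) / width))
    counts = Counter(_bin(v, lo, width, n_bins) for v in values)
    probs = [max(counts.get(b, 0) / len(values), floor) for b in range(n_bins)]
    norm = sum(probs)
    return [p / norm for p in probs]

def log_likelihood(pdf, values, lo=4.0, width=1.0):
    """Sum of log-probabilities of the values under a histogram PDF."""
    return sum(math.log(pdf[_bin(v, lo, width, len(pdf))]) for v in values)

# Invented crest factor values (dB) for one voice profile unit.
train_1 = [5.2, 5.8, 6.1, 5.5, 6.4]
train_2 = [9.1, 9.7, 10.3, 8.8, 9.9]
test = [5.6, 6.0, 5.9]
pdf_1, pdf_2 = histogram_pdf(train_1), histogram_pdf(train_2)
print(log_likelihood(pdf_1, test) > log_likelihood(pdf_2, test))  # True
```

In a full model, scores of this kind would act as the emission probabilities of the states attached to each voice profile unit.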
Unit model
[Figure: unit model with IN and OUT connections]
Two unconnected states per phonetogram unit
• vocal registers
• start and end of phonation
Correspondences
Speech | Voice profile
• phoneme model | F0/I unit model
• not labeled | labeled by F0 and I
• spectral envelope | acoustic voice parameters
• language model | unrestricted transitions
“forced alignment recognition”
Crest factor distributions
[Figure: four histograms (training subject 1, test subject 1, training subject 2, test subject 2). x-axis: 4 to 15; y-axis: 0 to 500]
Most distinctive states
[Figure: voice profile. x-axis: fundamental frequency (Hz); y-axis: vocal intensity (dB SPL); shading: distinctiveness]
Conclusions
• Voice profiles can enhance our understanding of vocal behaviour in a visually attractive way
• Current data storage opens a series of important research topics
• Market opportunities for “light” versions