obstruent acoustics

Obstruent Acoustics

Bonus Learning Fun!

Motor Theory, in a nutshell

• The big idea:

• We perceive speech as abstract “gestures”, not sounds.

• Evidence:

1. The perceptual interpretation of speech differs radically from the acoustic organization of speech sounds

2. Speech perception is multi-modal

3. Direct (visual, tactile) information about gestures can influence/override indirect (acoustic) speech cues

4. Listeners have limited top-down access to the primary, acoustic elements of speech

Moving On…

• One important lesson to take from the motor theory perspective is:

• The dynamics of speech are generally more important to perception than static acoustic cues.

• Note: visual chimerism and March Madness.

Auditory Chimeras

• Speech waveform + music spectrum (demos with 1, 2, 4, 8, 16, and 32 frequency bands)

• Music waveform + speech spectrum (demos with 1, 2, 4, 8, 16, and 32 frequency bands)

Source: http://research.meei.harvard.edu/chimera/chimera_demos.html

Originals:

Auditory Chimeras

• Speech1 waveform + speech2 spectrum (demos with 1, 2, 4, 6, 8, and 16 frequency bands)

• Speech2 waveform + speech1 spectrum (demos with 1, 2, 4, 6, 8, and 16 frequency bands)

Originals:

Closure Voicing

• The low frequency information that passes through the stop “filter” appears as a “voicing bar” in a spectrogram.

• This acoustic information provides hardly any cues for place of articulation.

Armenian:

[bag]

Stop Transition Cues (again)

• With the transition between stop closure and vowel, the perceptual task becomes much easier:

• Try the same with Peter’s productions:

• stop closures:

• with transitions:

• The moral of the story (again):

• Dynamic changes provide stronger perceptual cues to place than static acoustic information.

Release Bursts

• Note: along with transitions, stops have another cue for place at their disposal: release bursts.

• (nasals do not have these)

• Here’s a waveform of a [p] release burst:

duration 5 msec

• What do you think the [p] burst spectrum will look like?

Burst Spectrum

• [p] bursts tend to have very diffuse spectra, with energy spread across a wide range of frequencies.

• Also: [p] bursts are very weak in intensity.

• The extremely short duration of a burst implies heavy damping in the waveform.

• Heavier damping, in turn, means a broader frequency range.
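The link between damping and bandwidth can be checked numerically. The sketch below is an illustration only, not something taken from the slides: it models a damped pulse as a simple decaying exponential (an assumption; real bursts are messier), and the function name `half_power_bandwidth`, the time constant `tau`, and the 100 kHz sampling rate are all invented for the example.

```python
import numpy as np

def half_power_bandwidth(tau, fs=100_000, dur=1.0):
    """Frequency (Hz) at which the spectrum of exp(-t/tau) first falls
    3 dB below its value at 0 Hz. tau is the decay time constant in seconds."""
    t = np.arange(int(fs * dur)) / fs           # time axis in seconds
    pulse = np.exp(-t / tau)                    # damped pulse (hypothetical model)
    magnitude = np.abs(np.fft.rfft(pulse))      # magnitude spectrum
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fs)
    below = np.nonzero(magnitude < magnitude[0] / np.sqrt(2))[0]
    return freqs[below[0]]

bw_light_damping = half_power_bandwidth(tau=0.001)    # dies away in about 1 msec
bw_heavy_damping = half_power_bandwidth(tau=0.0001)   # dies away ten times faster

print(f"light damping: {bw_light_damping:.0f} Hz")
print(f"heavy damping: {bw_heavy_damping:.0f} Hz")
```

Under these assumptions, the pulse that dies away ten times faster has a spectrum roughly ten times wider, which is the pattern the slide describes.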

Release Bursts

• In a spectrogram:

• Bilabial release bursts have a very diffuse spectrum, weakly spread across all frequencies.

[p] burst

• [p] bursts are relatively close to pure transient sounds.

Transients

• A transient is:

• “a sudden pressure fluctuation that is not sustained or repeated over time.”

• An ideal transient waveform:

A Transient Spectrum

• An ideal transient spectrum is perfectly flat:
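A quick way to see the flat spectrum is to take the Fourier transform of a single-sample impulse, which is the digital stand-in for an ideal transient. This is a minimal sketch; the signal length and the position of the impulse are arbitrary choices made for the example.

```python
import numpy as np

n_samples = 512
impulse = np.zeros(n_samples)
impulse[100] = 1.0                       # one sudden pressure spike, never repeated

magnitude = np.abs(np.fft.rfft(impulse)) # magnitude at each frequency bin
is_flat = np.allclose(magnitude, 1.0)    # every component has the same magnitude

print("flat spectrum:", is_flat)
```

Moving the spike to a different sample changes only the phase of each component, not its magnitude, so the spectrum stays flat.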

Burst Filtering

• The spectra of more posterior release bursts may be filtered by the cavity in front of the burst.

• Ex: [t] bursts tend to lack energy at the lowest end of the frequency scale.

• And higher frequency components are somewhat more intense.

[t] burst

Release Bursts: [k]

• Velar release bursts are relatively intense.

• They also often have a strong concentration of energy in the 1500-2000 Hz range (F2/F3).

• There can often be multiple [k] release bursts.

[k] burst

Another Look

• [k] bursts tend to be intense right where F2 and F3 meet in the velar pinch:

Armenian:

[bag]

Finally, Fricatives

• The last type of sound we need to consider in speech is an aperiodic, continuous noise.

• (Transients are aperiodic but not continuous.)

• Ideally:

• Q: What would the spectrum of this waveform look like?

White Noise Spectrum

• Technical term: White noise

• has an unlimited range of frequency components

• Analogy: white light is what you get when you combine all visible frequencies of the electromagnetic spectrum
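To make the question about the spectrum concrete, the sketch below generates Gaussian white noise and compares the average power in the lower and upper halves of its spectrum. It is an illustration under assumptions: the random seed, the signal length, and the variable names are arbitrary choices, not anything from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the run is repeatable
noise = rng.standard_normal(2 ** 16)      # aperiodic, continuous noise

power = np.abs(np.fft.rfft(noise)) ** 2   # power at each frequency bin
half = len(power) // 2
low_band = power[1:half].mean()           # lower half of the spectrum (skipping 0 Hz)
high_band = power[half:].mean()           # upper half of the spectrum

low_vs_high_db = 10 * np.log10(low_band / high_band)
print(f"low band vs. high band: {low_vs_high_db:+.2f} dB")
```

Any single stretch of noise has a ragged spectrum, but the averages over the two halves should come out within a fraction of a decibel of each other, which is what "flat on average" means here.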

Turbulence

• We can create aperiodic noise in speech by taking advantage of the phenomenon of turbulence.

• Some handy technical terms:

• laminar flow: a fluid flowing in parallel layers, with no disruption between the layers.

• turbulent flow: a fluid flowing with chaotic property changes, including rapid variation in pressure and velocity in both space and time

• Whether or not airflow is turbulent depends on:

• the volume velocity of the fluid

• the area of the channel through which it flows

Turbulence

• Turbulence is more likely with:

• a higher volume velocity

• less channel area

• All fricatives therefore require:

• a narrow constriction

• high airflow

Fricative Specs

• Fricatives require great articulatory precision.

• Some data for [s] (Subtelny et al., 1972):

• alveolar constriction 1 mm

• incisor constriction 2-3 mm

• Larger constrictions result in [ʃ]-like sounds.

• Generally, fricatives have a cross-sectional area between 6 and 12 mm².

• Cross-sectional areas greater than 20 mm² result in laminar flow.

• Airflow = 330 cm³/sec for voiceless fricatives

• …and 240 cm³/sec for voiced fricatives
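The figures above can be combined into a rough estimate of how fast the air moves through the constriction, using the standard relation that particle velocity equals volume velocity divided by cross-sectional area. This is a back-of-the-envelope sketch that assumes the flow is uniform across the channel; the function name is made up for the example.

```python
def particle_velocity_cm_per_s(volume_velocity_cm3_per_s, area_mm2):
    """Average air speed through a constriction, assuming uniform flow.
    The factor of 100 converts the area from mm^2 to cm^2 (1 cm^2 = 100 mm^2)."""
    return volume_velocity_cm3_per_s * 100.0 / area_mm2

# Airflow and area values are the ones quoted on the slide.
for flow in (330, 240):          # voiceless, voiced (cm^3/sec)
    for area in (6, 12, 20):     # fricative range, plus the laminar-flow threshold (mm^2)
        v = particle_velocity_cm_per_s(flow, area)
        print(f"{flow} cm^3/sec through {area} mm^2 -> {v:.0f} cm/sec ({v / 100:.1f} m/sec)")
```

For example, 330 cm³/sec through a 6 mm² channel works out to 5500 cm/sec, while the same airflow through 20 mm² drops to 1650 cm/sec. That is consistent with the point that a narrow constriction and high airflow go together in producing turbulence.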

Turbulence Sources

• For fricatives, turbulence is generated by forcing a stream of air at high velocity through either a narrow channel in the vocal tract or against an obstacle in the vocal tract.

• Channel turbulence

• produced when airflow escapes from a narrow channel and hits inert outside air

• Obstacle turbulence

• produced when airflow hits an obstacle in its path

Channel vs. Obstacle

• Almost all fricatives involve an obstacle of some sort.

• General rule of thumb: obstacle turbulence is much noisier than channel turbulence

• [f] vs. [ɸ]

• Also: the more perpendicular the obstacle is to the airflow, the louder the obstacle turbulence

• [s] vs. [x]

• [x] is a “wall fricative”

Sibilants

• Alveolar, dental and post-alveolar fricatives form a special class (the sibilants) because their obstacle is the back of the upper teeth.

• This yields high intensity turbulence at high frequencies.

[ʃ] vs. [θ]

“shy” vs. “thigh”

Fricative Noise

• Fricative noise has some inherent spectral shaping

• …like “spectral tilt”

• Note: this is a source characteristic

• This resembles what is known as pink noise:

• Compare with white noise:
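The difference in spectral tilt between pink and white noise can be measured directly. The sketch below builds pink noise by scaling the spectrum of white noise by 1/√f, so that power falls off as 1/f, and then fits a straight line to each spectrum on a decibel-per-octave scale. Everything here is an assumption made for illustration (the shaping method, the helper `tilt_db_per_octave`, the 100 Hz lower bound for the fit, the seed); it is not a model of how the vocal tract generates frication.

```python
import numpy as np

def tilt_db_per_octave(signal, fs, f_min=100.0):
    """Slope of a straight-line fit to the power spectrum, in dB per octave."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    keep = freqs >= f_min                         # skip 0 Hz and the lowest bins
    slope, _intercept = np.polyfit(np.log2(freqs[keep]),
                                   10 * np.log10(power[keep]), 1)
    return slope

fs = 44100
rng = np.random.default_rng(0)
white = rng.standard_normal(2 ** 16)

# Shape the white spectrum so that power is proportional to 1/f.
spectrum = np.fft.rfft(white)
freqs = np.fft.rfftfreq(len(white), d=1.0 / fs)
shaping = np.zeros_like(freqs)
shaping[1:] = 1.0 / np.sqrt(freqs[1:])            # amplitude ~ 1/sqrt(f); 0 Hz removed
pink = np.fft.irfft(spectrum * shaping, n=len(white))

white_tilt = tilt_db_per_octave(white, fs)
pink_tilt = tilt_db_per_octave(pink, fs)
print(f"white: {white_tilt:+.2f} dB/octave, pink: {pink_tilt:+.2f} dB/octave")
```

The white noise should come out close to 0 dB per octave and the pink noise close to -3 dB per octave, since halving the power corresponds to about 3 dB.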

Fricative Shaping

• The turbulence spectrum may be filtered by the resonating tube in front of the fricative.

• (Due to narrowness of constriction, back cavity resonances don’t really show up.)

• As usual, resonance is determined by length of the tube in front of the constriction.

• The longer the tube, the lower the “cut-off” frequency.

• A basic example:

• [s] vs. [ʃ]

“sigh” vs. “shy”

[s]
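The effect of front-cavity length can be illustrated with the quarter-wavelength formula for a tube that is closed at one end (the constriction) and open at the other (the lips). This is a simplified sketch: the formula is a stand-in for the slide's "cut-off" frequency, and the speed of sound and the two cavity lengths are hypothetical round numbers chosen for the example, not measurements.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0   # assumed value for warm, moist air

def quarter_wave_resonance_hz(front_cavity_cm):
    """Lowest resonance of a tube closed at one end and open at the other."""
    return SPEED_OF_SOUND_CM_PER_S / (4.0 * front_cavity_cm)

short_cavity_hz = quarter_wave_resonance_hz(2.0)   # hypothetical short front cavity, as in "sigh"
long_cavity_hz = quarter_wave_resonance_hz(4.0)    # hypothetical longer front cavity, as in "shy"

print(f"2 cm front cavity: {short_cavity_hz:.0f} Hz")
print(f"4 cm front cavity: {long_cavity_hz:.0f} Hz")
```

Doubling the length of the tube halves the resonance, so the fricative made further back has its noise concentrated at lower frequencies.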

Sampling Rates Revisited

• Remember: digital representations of speech can only capture frequency components up to half the sampling rate (the Nyquist frequency).

• Speech should be sampled at a rate of at least 44100 Hz (although there is little frequency information in speech above 10,000 Hz).

• [s] has higher acoustic energy from about 3500 to 10000 Hz

• Note: telephones sample at 8000 Hz

• 44100 Hz • 8000 Hz
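The arithmetic behind the comparison is short enough to write out. The sketch below computes the Nyquist frequency for both sampling rates and asks how much of the 3500 to 10000 Hz range quoted for [s] falls below it. The function names are invented for the example, and the result is a share of the frequency range, not of the acoustic energy.

```python
def nyquist_hz(sampling_rate_hz):
    """Highest frequency a digital recording can represent."""
    return sampling_rate_hz / 2.0

def share_of_band_captured(band_low_hz, band_high_hz, sampling_rate_hz):
    """Fraction of a frequency band that lies below the Nyquist frequency."""
    top = min(band_high_hz, nyquist_hz(sampling_rate_hz))
    return max(0.0, top - band_low_hz) / (band_high_hz - band_low_hz)

S_BAND_HZ = (3500.0, 10000.0)   # range quoted for [s] on the slide

for rate in (44100, 8000):
    share = share_of_band_captured(*S_BAND_HZ, rate)
    print(f"{rate} Hz sampling: Nyquist = {nyquist_hz(rate):.0f} Hz, "
          f"{share:.0%} of the [s] band captured")
```

At 8000 Hz the Nyquist frequency is 4000 Hz, so only the bottom 500 Hz of the [s] range survives, which goes a long way toward explaining why [s] is hard to identify over the telephone.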

Further Back

[xoma]

palatal vs. velar

• In more posterior fricatives, turbulence noise is generally shaped like a vowel made at the same place of articulation.

Even Further Back

• Examples from Hebrew:

At the Tail End

• [h] exhibits a lot of coarticulation

• [h] is not really a “fricative”;

• it’s more like a whispered or breathy voiced vowel.

“heed” “had”

Aspirated Fricatives

• Like stops, fricatives can be aspirated.

• [h] follows the supraglottal frication in the vocal tract.

• Examples from Chinese:

[tsa] [tsʰa]

Back at the Ranch

• There is not much of a resonating filter in front of labial fricatives…

• so their spectrum is flat and diffuse

• (like bilabial stop release bursts)

• Note: labio-dentals are more intense than bilabial fricatives

• (channel vs. obstacle turbulence)

Fricative Internal Cues

• The articulatory precision required by fricatives means that they are less affected by context than stops.

• It’s easy for listeners to distinguish between the various fricative places on the basis of the frication noise alone.

• Result of both filter and source differences.

• Examples:

• There is, however, one exception to the rule…

Huh?

• The two most confusable consonants in the English language are [f] and [θ].

• (Interdentals also lack a resonating filter)
