Diploma Thesis

Modeling transient behaviour in vocal fold vibration using bifurcating nonlinear ordinary differential equation systems

Fritz Menzer
Communication Systems
Swiss Federal Institute of Technology, Lausanne
fritz.menzer@epfl.ch

Supervisor: Jonas Buchli
Responsible: Prof. Auke Jan Ijspeert
Bio-Inspired Robotics Group, Logic Systems Laboratory, Swiss Federal Institute of Technology, Lausanne

External Advisor: Prof. David M. Howard
Media Engineering Group, Department of Electronics, The University of York

September 14th 2004




Abstract

Transient behaviour in the human voice is discussed, with particular reference to pitch breaks. Several nonlinear models for vocal fold vibration are proposed and discussed.

One particular pitch break, where a triple period emerges out of a double period and vice versa, is modeled with a simple third-order system similar to the Rössler system. For a given parameter value, the proposed model system has two intertwined attractors, one with a double and the other with a triple period.

A model system more closely related to the physics of the vocal folds and the laryngograph Lx signal is also proposed. The possibilities of producing pitch breaks with this system are discussed, with particular attention to the hypothesis that pitch breaks are due to a constriction of the airflow above the vocal folds.


Contents

1 Introduction
  1.1 A brief summary of speech production
  1.2 Laryngographic analysis of vocal fold vibration
  1.3 Acronyms related to voice measurement

2 Qualitative behaviour of Lx signals
  2.1 Lx waveforms for speech
    2.1.1 Waveform 1 (modal voice)
    2.1.2 Waveform 2
    2.1.3 Waveform 3
    2.1.4 Waveform 4 (sinusoidal)
    2.1.5 Waveform 5
    2.1.6 Waveform 6
    2.1.7 Waveform 7
  2.2 Onset Transients in Lx Signals
    2.2.1 Onset 1
    2.2.2 Onset 2
    2.2.3 Onset 3
    2.2.4 Onset 4
    2.2.5 Onset 5
    2.2.6 Onset 6
  2.3 Offset Transients in Lx Signals
    2.3.1 Offset 1
    2.3.2 Offset 2
    2.3.3 Offset 3
    2.3.4 Offset 4
  2.4 Bifurcations in Lx Signals
    2.4.1 Period doubling
    2.4.2 From double to triple period
    2.4.3 Increasing period multiples

3 Dynamic systems as models for Lx signals
  3.1 A model for Lx signals with two state variables
    3.1.1 Capabilities of the model
    3.1.2 Mechanical analogy
    3.1.3 Proof of the existence of a limit cycle in a two-dimensional system switching between two linear systems
  3.2 A simple model for pitch breaks to double and triple periods
    3.2.1 Construction of the model
    3.2.2 Analysis
    3.2.3 Controlling the model
    3.2.4 Discussion of the nonlinearity
  3.3 Airflow-driven model
    3.3.1 Results
  3.4 Airflow-driven model for Lx signals
    3.4.1 Normalisation
    3.4.2 Results
    3.4.3 Improvements
    3.4.4 And what about pitch breaks?

4 Conclusion
  4.1 Acknowledgements

A Lx samples reference

B Model parameter values
  B.1 2D model

C Colouring of the basins of attraction in section 3.2.2

D LF model fitting method

E Tools
  E.1 Lx denoising
    E.1.1 LxNoisereduction
  E.2 Batch evaluation of ODE systems
  E.3 Parameter interpolation for ODE systems
    E.3.1 linearparam.m
    E.3.2 splineparam.m

F Matlab code for airflow based Lx model
  F.1 flowlx1.m


Chapter 1

Introduction

Nonlinear phenomena and chaos have received increasing interest in the field of voice research. While some researchers have found evidence for chaotic behaviour in the human voice [17], others have shown that changes in the nonlinearity of existing models can produce a variety of changes in behaviour that also occur in the real voice [3].

While the question of whether the human voice is chaotic is complicated for a variety of reasons¹, nonlinear elements have been identified in the human voice [15], [3].

On the other hand, the human voice itself shows behaviours that strongly hint at nonlinear phenomena: aperiodicity; sudden pitch changes, so-called pitch breaks, that are reminiscent of the period doubling known from classic nonlinear systems; and more “exotic” effects such as pitch breaks that go from the normal period to a shorter period, as well as pitch breaks where the ratio of the pitches is not an integer.

It is not only the voice itself that suggests that nonlinear effects may be important for some of its features: all but the simplest models for the human voice also have nonlinearities. That these nonlinearities are important has been shown in [2], where a bifurcation portrait for a model was made and Hopf bifurcations, period doubling and the emergence of a quasi-periodic torus were observed.

In this thesis, much interest was directed to a phenomenon that occurs relatively often in speech: period doubling and pitch breaks to higher-order periods, in particular to a triple period, which is a non-integer pitch break when starting from an already doubled period. This case is particularly interesting from a dynamic systems point of view because it clearly differs from the classic period-doubling case.

Particular attention was also devoted to the development of a new model of the vocal folds that takes into account their zipper-like movement and the fact that vocal fold tissue is not rigid. This model has only three state variables, including contact area and glottal flow.

¹ Often the periods during which supposedly chaotic behaviour occurs are short, and one has to be very careful to define the voice-producing system well (e.g. maybe it behaves chaotically because it is controlled by the brain, which itself is very likely to be chaotic).


1.1 A brief summary of speech production

The system that produces the human voice consists of three components: a power supply (the lungs), an oscillator (the vocal folds) and a resonator (the vocal tract, consisting of the larynx, pharynx and the mouth). This is illustrated in figure 1.1 below.

The lungs provide a pressure difference with respect to the pressure at the lips, which induces an airflow through the vocal folds.

This airflow causes the vocal folds to oscillate, with the result that the airflow is “chopped up” at the frequency at which the vocal folds vibrate. This abrupt but recurring variation of the airflow produces a harmonic sound pressure wave called the voice source.

The vocal tract acts as a linear filter which gives the harmonic spectrum an overall envelope in the frequency domain. The peaks in this envelope are called “formants” and play an important role in the distinction of vowel sounds.

Figure 1.1: Schematic representation of voice production. Labels in the image: lungs (power supply), airstream, vibrating vocal folds (oscillator), voice source, vocal tract (resonator), output sound; three spectrum panels with axes frequency and amplitude. Image adapted from [21].

The system that produces the human voice is sometimes compared to an organ [21], as the three components have equivalents in an organ: the bellows of an organ have the same function as the lungs, the tongue or mouth of a pipe creates a harmonic sound just as the vocal folds do, and the body of the pipe shapes the spectrum of the sound in a way similar to the vocal tract.

However, in the human voice these components are much more flexible and versatile than in an organ: in speech, for example, the vocal tract changes continuously to produce different vowels and diphthongs.

The vocal folds can produce a wide range of different behaviours, varying in spectral content as well as regularity. An example of differences in spectral content can be seen by comparing normal speech to shouting (or harsh voice). Irregular behaviour occurs when the vibration of the vocal folds shows a period doubling or even a period tripling, which is a topic of this thesis.

1.2 Laryngographic analysis of vocal fold vibration

Studying the behaviour of the vocal folds is not a straightforward task. Any recording of the human voice (i.e. a recording of the air pressure waveform made with a microphone) contains effects due to the vocal folds as well as the vocal tract. Attempts have been made to recover the signal coming from the vocal folds from a voice recording (notably using linear predictive coding and inverse filtering), but no guarantee can be given that this method really undoes the effect of the vocal tract.

Of course, placing a microphone inside the throat would be one way to get a signal from very near the vocal folds, but besides not being very practical, the microphone would still measure the sound waves reflected inside the vocal tract, so that even if the vocal folds vibrated the same way all the time, changing the shape of the vocal tract would change the measured signal. Furthermore, the presence of a microphone inside the vocal tract would probably change its acoustic properties.

However, there are ways to obtain data from the vocal folds that are not affected by the vocal tract, such as high-speed photography (using a camera inside the vocal tract or even using x-rays) and electrolaryngography.


Figure 1.2: A laryngograph with the two electrodes (connected by a cable to “Lx. in”) and controls for high- and lowpass filtering as well as gain. The output signal comes from the connector at the bottom left (fitted with a phono jack adaptor in this picture).

The laryngograph (see figure 1.2) is a device that measures the electrical conductivity between two electrodes placed on either side of the neck. It produces an electrical signal closely related to the contact area between the vocal folds. This signal is called Lx, for Larynx excitation waveform. The principle behind the Lx signal is very straightforward: as the contact area between the vocal folds increases, the conductivity between the electrodes also increases (i.e. the resistance decreases). This is much the same as the way the conductivity of a wire increases when the cross-section of the wire is increased.

In figure 1.3 the Lx signal is compared to the voice recorded simultaneously using a microphone. It is apparent that the Lx signal is much more regular and simple, as it only reflects the movement of the vocal folds and is independent of the vocal tract (it is, for example, impossible to identify a vowel by looking only at an Lx signal).

Figure 1.3: The Lx signal has the same fundamental frequency as the speech signal, but is more regular and simple.

Figure 1.4: The electrodes of the laryngograph are fitted with a grounded outer ring to prevent a current flowing superficially on the skin from one electrode to the other.

Besides being more regular than the speech signal, the Lx signal has a very clear meaning, vocal fold contact, which makes it possible to obtain useful information on the movement of the vocal folds from an Lx recording. Most importantly, one can determine when the folds are closed and when they are open. The fraction of the time during which they are closed is believed to be closely related to the efficiency of the voice [14].

The folds close very rapidly, which is due to the Bernoulli effect of the airflow in between them. This results in a very steep rising edge in each period of the Lx signal (phase I in figure 1.5). The opening is much slower, making it difficult to find the passage from the opening (III) to the open (IV) phase, but often there is a relatively sharp bend in the waveform at about the same amplitude value as where the closing phase (I) started.

Figure 1.5: The main phases in one cycle of the Lx signal (adapted from [1]). Axes: time (horizontal), vocal fold contact (vertical); the plot marks phases I to IV, with IV = OP, and the spans CP and OP.

I    closing phase
II   maximum contact     closed phase (CP)
III  opening phase
IV   open phase          open phase (OP)

If the vocal folds were electrically isolated from the rest of the body and in contact with the electrodes, perfect measurements could be achieved. Unfortunately, this is not the case: even though the electrodes are fitted with a grounded outer ring to prevent a current on the skin (see figure 1.4), the fact that the vocal folds are inside the larynx, which can move relative to the skin, can cause huge changes in conductivity, for example if the person swallows. But even in normal speech these baseline movements, referred to as Gx (Gross larynx), occur, often greater in amplitude than the useful and desired signal (Lx).

Fortunately, the Gx and Lx signals are quite different in frequency content, so with a highpass filter with a cutoff frequency below the audible frequency range, the Gx signal may be removed quite effectively, as can be seen in figure 1.6.

Figure 1.6: Top: the Lx signal as it comes from the laryngograph, plotted along with the Gx signal (estimated using a 22 Hz lowpass filter). Bottom: the Lx signal after removing the Gx signal (Lx−Gx). Horizontal axes: time [ms], 0 to 400.
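The Gx removal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text gives the 22 Hz cutoff used to estimate Gx, but not the filter type or order, so a first-order (one-pole) lowpass filter is assumed here, and the function name `remove_gx` and its parameters are hypothetical.

```python
import math


def remove_gx(lx, sample_rate_hz, cutoff_hz=22.0):
    """Estimate Gx with a one-pole lowpass filter and subtract it from Lx.

    Assumption: a first-order filter is used purely for illustration;
    the text only states the 22 Hz cutoff, not the filter design.
    Returns (lx_minus_gx, gx_estimate) as two lists.
    """
    # Smoothing factor of a one-pole lowpass (exponential moving average).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    gx = []
    state = lx[0] if lx else 0.0  # start at the first sample to avoid a jump
    for sample in lx:
        state += alpha * (sample - state)
        gx.append(state)
    cleaned = [s - g for s, g in zip(lx, gx)]
    return cleaned, gx


# Synthetic test signal (hypothetical values): a slow 1 Hz baseline drift
# standing in for Gx plus a small 150 Hz oscillation standing in for Lx.
FS = 8000
raw = [
    1.0 * math.sin(2 * math.pi * 1.0 * n / FS)
    + 0.1 * math.sin(2 * math.pi * 150.0 * n / FS)
    for n in range(FS)
]
cleaned, gx_estimate = remove_gx(raw, FS)
```

Subtracting a lowpass estimate is equivalent to highpass filtering with the same cutoff, which is why the slow baseline movement largely disappears while the oscillation at voice frequencies is kept.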

1.3 Acronyms related to voice measurement

Lx   Larynx excitation laryngograph output waveform
Sp   Speech pressure waveform
Tx   Fundamental periodic Time of excitation
Fx   Fundamental frequency of excitation
Gx   Gross low frequency larynx movement
OP   Vocal fold Open Phase time
OQ   Vocal fold Open Quotient: (OP / Tx) · 100%
CP   Vocal fold Closed Phase time
CQ   Vocal fold Closed Quotient: (CP / Tx) · 100%

Adapted from [1].
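As a small worked example of the two quotients defined above, the following Python sketch evaluates OQ and CQ for one cycle. The function names and the numerical values (a period of 8 ms split into 3 ms closed and 5 ms open) are hypothetical and only serve to illustrate the formulas.

```python
def open_quotient(op, tx):
    """Open quotient OQ = OP / Tx * 100 %, with OP and Tx in the same unit."""
    if tx <= 0:
        raise ValueError("Tx must be positive")
    return op / tx * 100.0


def closed_quotient(cp, tx):
    """Closed quotient CQ = CP / Tx * 100 %, with CP and Tx in the same unit."""
    if tx <= 0:
        raise ValueError("Tx must be positive")
    return cp / tx * 100.0


# Hypothetical cycle, all times in milliseconds.
TX_MS = 8.0
CP_MS = 3.0
OP_MS = 5.0

cq = closed_quotient(CP_MS, TX_MS)  # 37.5
oq = open_quotient(OP_MS, TX_MS)    # 62.5
```

In this example CP and OP are assumed to add up to Tx, so the two quotients sum to 100%.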


Chapter 2

Qualitative behaviour of Lx signals

The goal of this chapter is to list the qualitative behaviours of Lx signals and to discuss the possibility of modeling them by means of a simple dynamic system.

In particular, a notation (specific to this report) for speech Lx waveforms is introduced. The same is done for onset and offset transients occurring in the Lx signal for speech.

Besides onset and offset transients, transients within phonation periods can also be observed in Lx signals. In many cases, this is a sudden change of fundamental frequency, a so-called pitch break. This behaviour resembles period-doubling bifurcations, which are a common feature of nonlinear systems.


2.1 Lx waveforms for speech

In this section the periodic Lx behaviour is discussed. For each type of behaviour, two periods of a typical example are shown.

2.1.1 Waveform 1 (modal voice)

This is probably the most common Lx waveform in speech. Analysis of 20 seconds of speech Lx data has shown that this waveform occurs about 60% of the time during which the vocal folds are vibrating. Counted together with similar waveforms (1a, 1b and 5), it makes up 88% of the vocal folds’ oscillating behaviour in the analysed data. A typical example is seen in figure 2.1.

Figure 2.1: wave1.wav Typical example of modal voice Lx waveform. The signal includes a very steep rise marking the beginning of the closed phase (at 0.002 s), followed by a rounded peak and two roughly linear descending parts (roughly at 0.003 s to 0.0045 s and 0.0045 s to 0.005 s), and a wide negative peak, similar to the negative half-period of a sinusoid, corresponding to the open phase (0.005 s to 0.009 s). Axes: time [sec], amplitude.

In some cases, especially for high fundamental frequencies, the waveform becomes smoother and the negative peak flatter (see figure 2.2). In particular, the “kink” between the two descending parts becomes less evident. This waveform may be close to the one corresponding to falsetto voice.

Figure 2.2: wave1a.wav A smoother example of an Lx waveform at higher fundamental frequency. Axes: time [sec], amplitude.

In other cases the open phase ends at a higher level than it starts (see figure 2.3). The shape still looks like half a sinusoid, but rather than taking $\sin(\tau)$ on the interval $[\pi, 2\pi]$, it looks as if the interval was shifted to the right by roughly $\pi/4$. This seems to happen particularly often at low frequencies.
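The effect of shifting the sine interval can be checked numerically. The following Python sketch samples $\sin(\tau)$ on $[\pi, 2\pi]$ and on the interval shifted by $\pi/4$; the helper name `sine_segment` and the number of samples are illustrative assumptions, not part of the original analysis.

```python
import math


def sine_segment(start, stop, n):
    """Return n samples of sin(tau) for tau evenly spaced on [start, stop]."""
    if n < 2:
        raise ValueError("need at least two samples")
    step = (stop - start) / (n - 1)
    return [math.sin(start + i * step) for i in range(n)]


SHIFT = math.pi / 4  # shift suggested in the text ("roughly pi/4")

# Open phase as the plain negative half-period of a sinusoid.
unshifted = sine_segment(math.pi, 2 * math.pi, 101)

# Same length of interval, shifted to the right by pi/4.
shifted = sine_segment(math.pi + SHIFT, 2 * math.pi + SHIFT, 101)

# The unshifted segment starts and ends at the same level (zero), whereas
# the shifted segment starts below zero and ends above it.
start_level, end_level = shifted[0], shifted[-1]
```

With the shift, the segment starts at $\sin(5\pi/4) = -\sqrt{2}/2$ and ends at $\sin(9\pi/4) = +\sqrt{2}/2$, which matches the observation that the open phase ends at a higher level than it starts.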

Figure 2.3: wave1b.wav Example of Lx waveform at lower fundamental frequency, introducing a different shape of the open phase. Axes: time [sec], amplitude.

The waveforms may vary considerably from one speaker to another. In figure 2.4 no sinusoidal behaviour is apparent and all parts look very linear.

Figure 2.4: wave1b3.wav Lx waveform similar to the one in figure 2.3 but from a different male speaker. Axes: time [sec], amplitude.

2.1.2 Waveform 2

This waveform often occurs when the vocal tract is not completely open, such as during the closing at the ‘d’ in “words” or the opening at the ‘b’ in “base”. Therefore the amplitude and the high-frequency content of the speech signal are diminished when this waveform occurs. This waveform is shown in figure 2.5.

The first peak could be modeled by a sine wave on the interval $[\pi/4, 5\pi/4]$, while for the open phase the interval $[\pi/2, 2\pi]$ would be suitable.
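To make the two sine intervals concrete, the following Python sketch maps time within a segment linearly onto the corresponding phase interval. It is only a sketch: the segment durations are hypothetical values loosely read off the times quoted for figure 2.5, the function names are invented for illustration, and the amplitude scaling and offsets needed to join the two segments into one continuous waveform are not given in the text and are therefore left out.

```python
import math


def sine_on_interval(t, duration, tau_start, tau_stop):
    """Value at time t (0 <= t <= duration) of sin(tau), where tau is
    swept linearly from tau_start to tau_stop over the segment."""
    if not 0.0 <= t <= duration:
        raise ValueError("t must lie within the segment")
    tau = tau_start + (tau_stop - tau_start) * t / duration
    return math.sin(tau)


# Hypothetical segment durations in seconds (assumptions for illustration).
PEAK_DURATION = 0.004
OPEN_DURATION = 0.010


def first_peak(t):
    """First (positive) peak: sine wave on [pi/4, 5*pi/4]."""
    return sine_on_interval(t, PEAK_DURATION, math.pi / 4, 5 * math.pi / 4)


def open_phase(t):
    """Open phase: sine wave on [pi/2, 2*pi]."""
    return sine_on_interval(t, OPEN_DURATION, math.pi / 2, 2 * math.pi)


# Sample both segments on a coarse grid of 41 points each.
peak_samples = [first_peak(PEAK_DURATION * i / 40) for i in range(41)]
open_samples = [open_phase(OPEN_DURATION * i / 40) for i in range(41)]
```

On these intervals the first segment reaches its maximum a quarter of the way through, while the open-phase segment descends from its maximum to a negative peak two thirds of the way through and then returns to zero.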

Figure 2.5: wave2.wav Lx waveform with a more symmetric positive peak and a descending open phase. The waveform is characterised by a steep rising phase (at 0.003 s), a sinusoidal peak, a relatively steep but short descending phase (at 0.007 s) and a sinusoidal phase including a negative peak. Axes: time [sec], amplitude.

2.1.3 Waveform 3

A few cycles of this waveform often appear close to offset transients, between waveform 1 and a sinusoidal “tail”. However, this waveform may also appear during the oscillation.

The waveform is characterised by a narrow peak emerging out of what appears to be almost a complete cycle of a sinusoid (approximate interval of the corresponding sine wave: $[\pi/2, 9\pi/4]$). See figure 2.6.

Figure 2.6: wave3.wav Lx waveform with a closed phase emerging out of a sinusoid. Axes: time [sec], amplitude.

2.1.4 Waveform 4 (sinusoidal)

It sometimes happens that the Lx signal is almost completely sinusoidal. I suppose that in this case no complete closure of the vocal folds happens. See figure 2.7.

2.1.5 Waveform 5

A behaviour similar to waveforms 1 and 1a is characterised by smooth transitions, a not very steep rising phase and a negative peak similar to the negative half-period of a sinusoid. See figure 2.8. It is in many ways similar to waveform 1, and transitions from one to the other are very common.

2.1.6 Waveform 6

This waveform is interesting as it raises the question of how the open and closed phases are to be defined. See figure 2.9.

Figure 2.7: wave4.wav Sinusoidal Lx waveform. Axes: time [sec], amplitude.

Figure 2.8: wave5.wav Lx waveform with very smooth transitions and sinusoidal negative peak. Axes: time [sec], amplitude.

2.1.7 Waveform 7

Due to the high amount of noise it is not clear whether the waveform is really sinusoidal. However, the speech waveform is very close to sinusoidal. The differences between this waveform and waveform 4 are the high pitch and the fact that it persists for a relatively long time. See figure 2.10.

Figure 2.9: wave6.wav Lx waveform with a very steep rising phase and a sinusoidal descending phase. Axes: time [sec], amplitude.

Figure 2.10: wave7.wav Lx waveform from a female speaker at high frequency (around 430 Hz). Axes: time [sec], amplitude.

2.2 Onset Transients in Lx Signals

2.2.1 Onset 1

Often the transition from the non-oscillating state to the steady oscillation is very fast, happening in much less time than one period of the oscillation (see figure 2.11). This also results in an almost immediate onset of the speech signal (a very short attack phase, to speak in sound synthesis terms).

Figure 2.11: rest-osc 1.wav Very fast transition from rest to oscillation. Axes: time [sec], amplitude.

2.2.2 Onset 2

In some cases, the first positive peak of the oscillation is smaller than the peaks in the steady state and seems to emerge out of a sinusoidal oscillation as in waveform 3 (see figure 2.12 and section 2.1.3).

The corresponding speech signal has a longer transient than in the previous case. This is quite logical since there must be a relationship between the Lx signal and the airflow, which probably means that a smaller peak in Lx (with a less steep rising edge) corresponds to a less pronounced glottal pulse.

This transient behaviour appears naturally in the piecewise linear model.

Figure 2.12: rest-osc 2.wav Transition with one small peak similar to waveform 3. Axes: time [sec], amplitude.

2.2.3 Onset 3

Another phenomenon occurring at the beginning of the vocal fold oscillation is that the steady state is preceded by a few sinusoidal oscillations (see figure 2.13). A possible interpretation of this behaviour is that in the beginning the vocal folds are open, and when the air starts to flow through them they start to oscillate sinusoidally at first, with increasing amplitude as they draw energy from the airstream, until they start flapping together, which is the steady state.

The corresponding speech signal shows the sinusoidal oscillation as well, leading to a very soft-sounding initial transient.

This behaviour also occurs with the simple piecewise linear model.

Figure 2.13: rest-osc 3.wav Transition with some cycles of sinusoidal oscillation (waveform 4). Axes: time [sec], amplitude.

2.2.4 Onset 4

This is one of the more bizarre transitions (see figure 2.14). It may simply be a movement of the larynx between 0.01 s and 0.015 s, causing the DC offset to change. However, the short duration of this offset change somewhat contradicts this assumption.

On the other hand, the corresponding speech signal does not show any trace of the broad peak, but looks very much like the speech signal corresponding to one of the first two types of transition. The peak could be due to some problem with the Lx recording.

Figure 2.14: rest-osc 4.wav Transition with a superimposed broad peak. Axes: time [sec], amplitude.

2.2.5 Onset 5

In some cases the transition between the equilibrium and the oscillation exhibits far more irregular behaviour than the oscillation itself (see figure 2.15). While a well-defined (though slowly drifting) period exists during the oscillation, here we have a seemingly random assembly of small and big, sinusoidal and nonsinusoidal peaks.

The irregularity of the Lx signal can also be found in the speech signal, which sounds rather rough compared to normal speech. This contradicts the assumption that it is solely due to an Lx recording problem.

2.2.6 Onset 6

Even though double period behaviour (an alternation of small and big peaks) seems to appear mainly in the middle of an oscillating phase, it may happen that the double period behaviour is entered directly from the resting phase, as may be seen in figure 2.16. However, no transition from rest to triple or higher periods has been observed (which of course may be because triple periods are far rarer than double periods).

Figure 2.15: rest-osc 5.wav Irregular behaviour with small and big peaks as well as sinusoidal oscillation at the beginning of the oscillation. Axes: time [sec], amplitude.

Figure 2.16: rest-osc 6.wav Transition directly to doubled period. Axes: time [sec], amplitude.

2.3 Offset Transients in Lx Signals

2.3.1 Offset 1

While for the onset transients a very short transition is the most common, for the oscillating-to-rest transition the most common behaviour is a “tail” of sinusoidal oscillation with decreasing amplitude. A typical example can be seen in figure 2.17.

The sinusoidal behaviour also appears in the speech signal, but often fades away more quickly than in the Lx signal. This may be due to the vocal tract closing at the end of a word.

Figure 2.17: osc-rest 1.wav Transition with a sinusoidal fade-out. Axes: time [sec], amplitude.

2.3.2 Offset 2

A far less common behaviour is an almost immediate transition from oscillation to rest, as seen in figure 2.18. It is interesting to note that during the last four periods of oscillation there seems to be a transition to double period behaviour.

The speech signal for this example also ends very abruptly. Listening to the speech, one may suppose that here the sound was ended by stopping the airflow at the height of the pharynx.

Figure 2.18: osc-rest 2.wav Almost immediate transition. Axes: time [sec], amplitude.

2.3.3 Offset 3

Similar to the irregular behaviour at the beginning of an oscillating phase (see section 2.2.5), there may also be irregular behaviour at the end (see figure 2.19).

There is some irregularity in the corresponding speech signal, but as the irregular phase is very short, it is not possible to identify it by listening. A recording problem cannot be excluded.

Figure 2.19: osc-rest 3.wav Irregular behaviour at the end of the oscillation. Axes: time [sec], amplitude.

2.3.4 Offset 4

In some cases, the positive peaks have a tendency to change their shape into that of a decreasing exponential curve ($e^{-\alpha t}$); see figure 2.20.

The speech signal seems to react especially to the sharp rising edge of the Lx peaks, triggering a decaying oscillation for each edge. The sound of both the Lx signal and the speech is very rough, sounding more like a grunt than a human voice (if the sound is taken out of its context).

2.4 Bifurcations in Lx Signals

2.4.1 Period doubling

Depending on the speaker, period doubling was found very often. It is characterised by every second peak decreasing or increasing in amplitude. This behaviour is commonly found in well-studied nonlinear systems such as the Colpitts oscillator. A typical example of period doubling (and of going back to the single period) is found in figure 2.21.

Figure 2.20: osc-rest 4.wav Series of long, exponential-like peaks at the end of the oscillation. Axes: time [sec], amplitude.

The period doubling is mainly perceived as a change in fundamental frequency, both in the Lx and the speech signal.

Figure 2.21: doubleper.wav. Period doubling. [Plot: amplitude against time [sec].]

2.4.2 From double to triple period

Observing period doubling in a system may suggest a period doubling cascade. However, this is not the way subharmonics are created in the vocal folds. Instead of creating lower subharmonics in a series $f_0, \frac{f_0}{2}, \frac{f_0}{4}, \frac{f_0}{8}, \ldots$, where $f_0$ is the fundamental frequency of the oscillation, the vocal folds are also able to create subharmonics at ratios that are not a power of 2: $\frac{f_0}{2}$, $\frac{f_0}{3}$, $\frac{f_0}{4}$ and $\frac{f_0}{5}$ have been observed.

It seems as if the triple period emerges only out of double period behaviour (i.e. before the triple period, there is a double period). Three different types of triple periods have been observed: a triple length period consisting of three peaks with increasing amplitude (figure 2.22), one small peak followed by two big peaks (figure 2.23) and a series of three peaks with decreasing amplitude (figure 2.24). The increasing amplitude is the most common type.

In all of the above-mentioned examples the perceived sound (of both the speech and Lx) simply undergoes a change in pitch. The tripled period can clearly be seen in the speech signal.

Figure 2.22: tripleper.wav. Triple period, increasing peak height. [Plot: amplitude against time [sec].]

Figure 2.23: tripleper2.wav. Triple period, one small peak followed by two big peaks. [Plot: amplitude against time [sec].]


Figure 2.24: tripleper3.wav. Two triple periods with decreasing peak height, separated by a double period. [Plot: amplitude against time [sec].]

2.4.3 Increasing period multiples

Double and triple periods are not the only subharmonics that appear in speech Lx signals: quadruple and fivefold periods have also been observed. Figure 2.25 illustrates a transition from double to triple to fivefold period.

In this example the pitch change is not very evident in the speech signal.
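The period multiples described in this section can be read off a sequence of Lx peak amplitudes. The following is a minimal sketch and not the procedure used for the figures: the function name `period_multiple`, the tolerance and the peak values are hypothetical, and the peaks are assumed to have been extracted from the Lx signal beforehand.

```python
def period_multiple(peaks, tol=0.05, max_n=6):
    """Return the smallest n such that peaks[i] is close to peaks[i + n].

    n = 1 is a regular oscillation, n = 2 a double period, n = 3 a triple
    period, and so on. Returns None if no n up to max_n fits.
    """
    for n in range(1, max_n + 1):
        if len(peaks) < 2 * n:
            break  # require at least two full cycles before claiming a period
        if all(abs(peaks[i] - peaks[i + n]) <= tol
               for i in range(len(peaks) - n)):
            return n
    return None


# Hypothetical peak amplitudes (arbitrary units), for illustration only.
regular = [0.30, 0.31, 0.30, 0.30, 0.31, 0.30]
double = [0.30, 0.15, 0.31, 0.14, 0.30, 0.15]
triple_increasing = [0.10, 0.20, 0.30, 0.11, 0.19, 0.30]

for name, peaks in [("regular", regular), ("double", double),
                    ("triple", triple_increasing)]:
    print(name, period_multiple(peaks))
```

Such a test only describes a stationary stretch of the signal; a transition such as the one in figure 2.25 would have to be analysed window by window.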

Figure 2.25: period2 3 5.wav. Double, triple and fivefold period. [Plot: amplitude against time [sec].]


Chapter 3

Dynamic systems as models for Lx signals

In this project, different models for vocal fold vibration have been studied. These models were constructed with different goals in mind.

They range from a very simple two-dimensional model that switches between two linear systems, whose primary goal was to be the simplest dynamic system capable of producing Lx-like signals, to an airflow-driven, Lx-based model of the vocal folds that models the zipper-like opening and closing of the vocal folds, the “squeezing” of the vocal fold tissue and a constriction in the vocal tract. In between these two extremes (in terms of complexity) lie a rather abstract model for pitch breaks (sudden changes in fundamental frequency) and a simple airflow-driven mass-and-spring model of the vocal fold movement.

In the design of these models, much emphasis was put on keeping the number of state variables as low as possible.


3.1 A model for Lx signals with two state variables

The idea behind this model was to produce signals having some of the qualities of Lx signals with a system that has only two state variables: $x_1$ representing Lx and $x_2$ corresponding to the velocity of the vocal folds (positive $x_2$ meaning the vocal folds are moving apart).

The state-space is divided into two regions: $x_1 > 0$ corresponds to closed vocal folds while $x_1 \le 0$ means that the vocal folds are open:

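For $x_1 \le 0$:

$$\dot{x}_1 = -b\,x_2 \tag{3.1}$$
$$\dot{x}_2 = d + e\,x_1 + \alpha\,x_2 \tag{3.2}$$

For $x_1 > 0$:

$$\dot{x}_1 = -a\,x_2 \tag{3.3}$$
$$\dot{x}_2 = c\,x_1 - \beta\,x_2 \tag{3.4}$$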

A linear system is associated with each half of the state-space, making the complete system a piecewise linear Filippov system, i.e. a piecewise linear system with a discontinuous right-hand side [7].

The right half of the state-space, corresponding to the closed phase, follows the rules of a damped linear oscillator, which would have an equilibrium at the origin ($x_1 = 0$, $x_2 = 0$). However, this point is not part of the right half of the state-space, since the origin belongs to the region $x_1 \le 0$.

The left half of the state-space, which models the open phase, is a linear oscillator with an equilibrium point at ($x_1 = -\frac{d}{e}$, $x_2 = 0$) and a damping factor $-\alpha$. For positive values of $\alpha$ this negative damping is the energy source of the oscillator. By changing the value of $\alpha$ it is possible to change the stability of the equilibrium point at $(-\frac{d}{e}, 0)$ from stable ($\alpha$ negative) to unstable ($\alpha$ positive), meaning that the overall system undergoes a bifurcation creating a limit cycle (for a proof of the existence of a limit cycle in a similar system, see section 3.1.3).
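The behaviour on either side of this bifurcation can be reproduced with a few lines of numerical integration of (3.1)-(3.4). The following is a minimal sketch, not the simulation code used for the figures: the parameter values (a = b = c = d = e = 1, β = 0.5), the initial condition, the step size and the fixed-step Runge-Kutta integrator are all assumptions chosen for illustration.

```python
def rhs(x1, x2, p):
    """Right-hand side of (3.1)-(3.4); p holds a, b, c, d, e, alpha, beta."""
    if x1 <= 0.0:  # open phase, equations (3.1)-(3.2)
        return -p["b"] * x2, p["d"] + p["e"] * x1 + p["alpha"] * x2
    # closed phase, equations (3.3)-(3.4)
    return -p["a"] * x2, p["c"] * x1 - p["beta"] * x2


def rk4_step(x1, x2, dt, p):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(x1, x2, p)
    k2 = rhs(x1 + 0.5 * dt * k1[0], x2 + 0.5 * dt * k1[1], p)
    k3 = rhs(x1 + 0.5 * dt * k2[0], x2 + 0.5 * dt * k2[1], p)
    k4 = rhs(x1 + dt * k3[0], x2 + dt * k3[1], p)
    x1 += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    x2 += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x1, x2


def simulate(alpha, t_end=200.0, dt=0.01, x0=(-1.2, 0.0)):
    """Integrate the model from x0 and return the list of (x1, x2) states."""
    # Assumed illustrative parameters; the equilibrium is at (-d/e, 0) = (-1, 0).
    p = dict(a=1.0, b=1.0, c=1.0, d=1.0, e=1.0, alpha=alpha, beta=0.5)
    x1, x2 = x0
    traj = [(x1, x2)]
    for _ in range(int(round(t_end / dt))):
        x1, x2 = rk4_step(x1, x2, dt, p)
        traj.append((x1, x2))
    return traj


rest = simulate(alpha=-0.3)      # stable equilibrium: the output settles
phonation = simulate(alpha=0.1)  # unstable equilibrium: sustained oscillation
print("final state for alpha = -0.3:", rest[-1])
print("max x1 late in the run for alpha = 0.1:",
      max(x1 for x1, _ in phonation[-2000:]))
```

With these assumed values the trajectory for α = −0.3 settles at the equilibrium (−1, 0), while for α = 0.1 it keeps re-entering the closed phase $x_1 > 0$.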

The proposed separation of the state-space implies the hypothesis that the open and closed phases can be separated by thresholding Lx, which is wrong in general. Nevertheless, for some modes of vibration, notably waveforms 2 and 3, this hypothesis applies quite well (see also sections 2.1.2 and 2.1.3).

3.1.1 Capabilities of the model

In summary, this model is capable of producing waveforms that consist of two sinusoidal parts as well as exponentially damped (or increasing) sinusoidal oscillations.

Despite its simplicity, the proposed model produces qualitatively correct results in many respects:

• The model produces a stable oscillation for certain parameter values and a constant output for others. This corresponds to (normal) phonation and adduction prior to phonation, respectively.

• The point of equilibrium is in the open phase. This corresponds to the vocal folds being separated when not producing sound.

• Waveforms 1a, 2 and 3 (as seen in figures 2.2, 2.5 and 2.6) as well as sinusoidal waveforms (as in figures 2.7 and 2.10) can be modeled quite accurately. With a minor modification of the system, the waveform in figure 2.8 can also be approximated reasonably well.

• Offset transient 1 (showing a “tail” of sinusoidal oscillation) is inherent to the model.

• Transients similar to onset transients showing a small peak and a few sinusoidal cycles (see figure 2.12 and figure 2.13) are natural behaviours of the model.

• Onset transient 1 (see figure 2.11) can be approximated as a limit case where the transition is so fast that the sinusoidal behaviour does not appear.

However, due to its simplicity, this model has some drawbacks:

• Some interesting behaviour found in Lx signals, such as period doubling or irregular oscillations, cannot be produced. In particular, it is not possible to produce chaotic behaviour with a two-dimensional model [20, page 203].


• Due to the separation between closed and open phase by means of the Lx value, it is not possible to model waveforms for which the opening and closing occur at considerably different values of the Lx signal¹ (see figure 2.3 and figure 2.9).

• It is impossible to produce a waveform with an opening phase having two roughly linear parts (such as waveform 1 and 1b3 in particular).

• The negative damping as the driving force of the oscillator is not physically meaningful and does not represent the assumption that the Bernoulli effect of the airflow going through the closing folds provides the energy to the vocal folds (this would mean that the energy gain occurs mainly at the end of the open phase).

Waveforms

Figure 3.1: Waveform 1a (see 2.1.1) and model output. The closing phase is matched very well while the minimum in the open phase occurs too late and the opening phase is significantly below the recorded waveform. [Plot: original and model signals against time [sec].]

Remark: the simulated waveforms have been scaled in amplitude and time to fit the recorded waveforms.

When simulating waveform 1a using the system in (3.1)-(3.4), the result matches the original in the sense that it has a similar closed quotient and that the open phase is very flat.

¹ Localising the closing and opening by choosing the steepest positive or negative gradient of the Lx signal.

Figure 3.2: Limit cycle corresponding to the simulated waveform 1a. [Plot: state-space trajectory, x2 against x1.]

However, the shapes of the open and closed phases do not match the original. In figure 3.1, the negative peak of the open phase is around 6.4 ms for the simulated waveform, while it is somewhere between 5.0 ms and 5.5 ms for the original waveform. In addition, the descending part of the closed phase peak produced by the model is significantly below the original signal between 3 ms and 4 ms.

In the state-space trajectory in figure 3.2, one can clearly see the very distinct open and closed phases.

Figure 3.3: Waveform 2 (see 2.1.2) and model output. [Plot: original and model signals against time [sec].]

The two-dimensional system is capable of simulating waveform 2 (see also 2.1.2) quite well in a qualitative way. The shape of the open phase in particular seems to match very well. However, the peak corresponding to the closed phase seems to lack some steepness towards the end of the closed phase (around 4.5 ms in figure 3.3).

Figure 3.4: Limit cycle corresponding to the simulated waveform 2. [Plot: state-space trajectory, x2 against x1.]

Figure 3.5: Waveform 3 (see 2.1.3) and model output. The closed phase peak is matched very well, but the open phase deviates a little, despite the overall shape being reasonably well matched. [Plot: original and model signals against time [sec].]

Waveform 3 seems to be the one with the most “sinusoidal” closed phase, meaning that the proposed system provides a good model for it. The shape of the open phase also seems to match reasonably well, but as can be seen in figure 3.5, the shape of the waveform can change slightly from one cycle to the next. In the first cycle, the closed phase peak starts at a slightly lower Lx value than it ends: it starts around (3 ms, −0.02) and ends around (5 ms, 0.0). But in the second cycle the closed phase peak starts above zero (at about 11 ms) and ends below zero (around 13 ms). Such behaviour cannot be modeled by a system which distinguishes between open and closed phase simply by a threshold on the Lx value.

Figure 3.6: Limit cycle corresponding to the simulated waveform 3. [Plot: state-space trajectory, x2 against x1.]

Figure 3.7: Waveform 5 (see 2.1.5) and model output. The opening as well as the open phase show a slight mismatch. [Plot: original and model signals against time [sec].]

In order to model waveform 5 it was necessary to slightly modify the model. As the rising part of the closed phase peak is much steeper than the descending part, it is necessary to use a high damping factor $\beta$. This brings the state-space trajectory close to the origin, meaning that the open phase will start out like a cosine function at zero, being very flat at the beginning and then decreasing to a minimum, much like in waveform 2 (see figure 3.3).

Figure 3.8: Limit cycle corresponding to the simulated waveform 5. [Plot: state-space trajectory, x2 against x1.]

However, waveform 5 is not at all like that, as the open phase starts with a rather steep descending slope. The idea was therefore to shift the equilibrium point of the linear system modeling the closed phase (the system with the parameters $a$, $c$ and $\beta$), such that the trajectory would not have to move almost vertically when entering the open phase (as in figure 3.4). A new parameter $f$ was introduced which denotes the $x_2$-coordinate of the equilibrium point:

$$\dot{x}_1 = -a\,(x_2 - f) \tag{3.5}$$
$$\dot{x}_2 = c\,x_1 - \beta\,(x_2 - f) \tag{3.6}$$

The effect of this modification can be seen in the state-space trajectory in figure 3.8: when entering the open phase around (0, 1) the trajectory does not change its direction as abruptly as it does in the other cases (in figure 3.2, figure 3.4 and figure 3.6). This translates into a much smoother transition from closed phase to open phase in the simulated Lx signal in figure 3.7.


Offset transients

Figure 3.9: Typical simulated final transient where the closed phase peaks suddenly disappear and a sinusoidal trail follows. [Plot: signal against time [samples].]

Figure 3.10: State-space trajectory corresponding to the signal in figure 3.9. [Plot: x2 against x1.]

When changing the stability of the equilibrium at $(-\frac{d}{e}, 0)$, the system undergoes a bifurcation that creates a limit cycle. So by starting from a set of parameters where the system has a stable limit cycle and decreasing the bifurcation parameter $\alpha$ until the system has only one stable equilibrium, it is possible to produce offset transients, as can be seen in figure 3.9.

At a certain point the trajectory will stay in the open phase (see figure 3.10), leading to an exponentially damped sinusoidal oscillation (after sample 1000 in figure 3.9).

This behaviour is very common and is described as oscillation-to-rest transition 1 in section 2.3.1.

Onset transients

Figure 3.11: Typical simulated onset transient. [Plot: signal against time [samples].]

Figure 3.12: State-space trajectory corresponding to the signal in figure 3.11. [Plot: x2 against x1.]

Analogously to the final transient described above, it is possible to produce initial transients by changing the stability of the equilibrium from stable to unstable. The resulting initial transient can be seen in figure 3.11 and is similar to the rest-oscillation transitions 2 and 3 in the sense that it has some sinusoidal oscillation as well as a small peak (see also sections 2.2.2 and 2.2.3).

However, this behaviour does not match the most common initial transient (see section 2.2.1), which involves a very fast transition from a non-oscillating signal to a stable oscillation. By choosing very high values for $\alpha$ and $\beta$ it is possible to approximate such a fast transition (see figure 3.13).

Figure 3.13: Typical simulated initial transient. [Plot: signal against time [samples].]

Figure 3.14: State-space trajectory corresponding to the signal in figure 3.13. [Plot: x2 against x1.]

3.1.2 Mechanical analogy

As the vocal folds are elastic and have a mass, the idea of relating them to a mass-spring system is not too far-fetched [5]. However, this “mechanical” model is unusual in that the position of the mass does not represent the position of the vocal folds, but the Lx amplitude value.

Figure 3.15: Mechanical equivalent of the two-dimensional vocal fold model. [Diagram labels: w, m, k1, k2, η1, η2, y1, “1”, “2”.]

Figure 3.15 shows a mechanical system that is equivalent to the system described by equations (3.1)-(3.4). “1” is a kind of “bumper” that holds the spring $k_1$ in place when that spring is uncoupled from the mass, which happens as soon as the massless plate “2” touches the right-hand wall. The equations of this system are given in (3.7)-(3.10).

For $y_1 \le 0$:

$$\dot{y}_1 = -y_2 \tag{3.7}$$
$$\dot{y}_2 = \frac{k_1}{m}(y_1 + w) - \frac{\eta_1}{m}\,y_2 \tag{3.8}$$

For $y_1 > 0$:

$$\dot{y}_1 = -y_2 \tag{3.9}$$
$$\dot{y}_2 = \frac{k_2}{m}\,y_1 - \frac{\eta_1 + \eta_2}{m}\,y_2 \tag{3.10}$$

The relationship between the state variables of the original system and those of the mechanical equivalent is given in (3.11) and (3.12):

$$x_1 = \begin{cases} a\,y_1 & y_1 > 0 \\ b\,y_1 & y_1 \le 0 \end{cases} \tag{3.11}$$

$$x_2 = y_2 \tag{3.12}$$

Comparing the equations of the model system (3.1)-(3.4) to (3.7)-(3.10) while considering (3.11) and (3.12) yields the following relations between the parameters $a$, $b$, $c$, $d$, $e$, $\alpha$ and $\beta$ of the original model and the parameters $k_1$, $\eta_1$, $k_2$, $\eta_2$, $w$ and $m$ of the mechanical equivalent:

$$ac = \frac{k_2}{m}$$
$$eb = \frac{k_1}{m}$$
$$d = \frac{k_1 w}{m}$$
$$\alpha = -\frac{\eta_1}{m} \tag{3.13}$$
$$\beta = \frac{\eta_1 + \eta_2}{m}$$

One thing that is “unphysical” is that the damping factor $\eta_1$ is negative for positive $\alpha$, as can be seen from equation (3.13). This means that the dashpot $\eta_1$ actually produces energy. This was one of the main motivations for moving to a more physically correct model.
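The parameter relations above are easy to mistype, so a small helper may be useful. This is an illustrative sketch only: the function name `model_parameters` and the numerical values are hypothetical, and since the relations fix only the products $ac$ and $eb$, the scale factors $a$ and $b$ are assumed to be given.

```python
def model_parameters(k1, eta1, k2, eta2, w, m, a=1.0, b=1.0):
    """Map the mechanical parameters to (c, d, e, alpha, beta) for given a, b."""
    c = k2 / (m * a)           # from a*c = k2/m
    e = k1 / (m * b)           # from e*b = k1/m
    d = k1 * w / m
    alpha = -eta1 / m          # equation (3.13)
    beta = (eta1 + eta2) / m
    return c, d, e, alpha, beta


# Hypothetical values: eta1 has to be negative for alpha to be positive,
# which is exactly the "unphysical" energy-producing dashpot.
c, d, e, alpha, beta = model_parameters(
    k1=2.0, eta1=-0.1, k2=4.0, eta2=0.6, w=0.5, m=1.0)
print(c, d, e, alpha, beta)
```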

3.1.3 Proof of the existence of a limit cycle in a two-dimensional system switching between two linear systems

Intuitively it is easy to understand why there exists a limit cycle in the piecewise linear system (3.1)-(3.4) when $\alpha > 0$ and $\beta \gg \alpha$. If a trajectory starts close to the equilibrium point in the left half of the state-space, it will spiral away from the equilibrium point due to $\alpha > 0$. But if it starts very far away from the equilibrium (outside the limit cycle), it will at some point reach the right half of the state-space, which, due to $\beta \gg \alpha$, will bring it closer to the origin faster than the left half is able to carry it away. This can be seen in figure 3.16.

Figure 3.16: Two state-space trajectories converging on the limit cycle. The blue trajectory begins close to the equilibrium point while the green trajectory begins outside the limit cycle. [Plot: x2 against x1.]

To formally prove the existence of a limit cycle, a slightly different system with only the two essential parameters, $\alpha$ and $\beta$, is used. Despite the simplification, this system has a limit cycle that arises in the same way as in (3.1)-(3.4). The reason for using this system is that the exact solutions of its two subsystems are much simpler.

For $x_1 \le 0$:

$$\dot{x}_1 = \alpha\,(x_1 + 1) - x_2 \tag{3.14}$$
$$\dot{x}_2 = 1 + x_1 + \alpha\,x_2 \tag{3.15}$$

For $x_1 > 0$:

$$\dot{x}_1 = -\beta\,x_1 - x_2 \tag{3.16}$$
$$\dot{x}_2 = x_1 - \beta\,x_2 \tag{3.17}$$

The following hypothesis on $\alpha$ and $\beta$ is assumed:

$$\alpha\beta < 1 \tag{3.18}$$

One particular solution of the left subsystem defined by equations (3.14) and (3.15) is

$$x_1(t) = e^{\alpha t}\sin t - 1 \tag{3.19}$$
$$x_2(t) = -e^{\alpha t}\cos t \tag{3.20}$$

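as can easily be verified by calculating the derivatives of (3.19) and (3.20):

$$\dot{x}_1(t) = \alpha\underbrace{e^{\alpha t}\sin t}_{x_1(t)+1} + \underbrace{e^{\alpha t}\cos t}_{-x_2(t)} \tag{3.21}$$

$$\dot{x}_2(t) = \underbrace{-\alpha e^{\alpha t}\cos t}_{\alpha x_2(t)} + \underbrace{e^{\alpha t}\sin t}_{x_1(t)+1} \tag{3.22}$$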

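But as the system is linear in the deviation from the equilibrium point $(-1, 0)$, this deviation may be multiplied by a constant $C_1$ and the result is still a solution:

$$x_1(t) = C_1 e^{\alpha t}\sin t - 1 \tag{3.23}$$
$$x_2(t) = -C_1 e^{\alpha t}\cos t \tag{3.24}$$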
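As a sanity check, the closed-form solution (3.23)-(3.24) can be compared numerically with the right-hand side of (3.14)-(3.15). This is a minimal sketch: the function names, the central-difference step `h` and the values of α and $C_1$ are arbitrary assumptions.

```python
import math


def left_solution(t, alpha, C1=1.0):
    """Closed-form solution (3.23)-(3.24) of the left subsystem."""
    x1 = C1 * math.exp(alpha * t) * math.sin(t) - 1.0
    x2 = -C1 * math.exp(alpha * t) * math.cos(t)
    return x1, x2


def left_rhs(x1, x2, alpha):
    """Right-hand side of (3.14)-(3.15)."""
    return alpha * (x1 + 1.0) - x2, 1.0 + x1 + alpha * x2


def residual(t, alpha, C1, h=1e-5):
    """Largest gap between a central-difference derivative and the RHS."""
    xp = left_solution(t + h, alpha, C1)
    xm = left_solution(t - h, alpha, C1)
    dx1 = (xp[0] - xm[0]) / (2.0 * h)
    dx2 = (xp[1] - xm[1]) / (2.0 * h)
    f1, f2 = left_rhs(*left_solution(t, alpha, C1), alpha)
    return max(abs(dx1 - f1), abs(dx2 - f2))


# Check the solution on a grid of times for one arbitrary choice of constants.
worst = max(residual(0.1 * k, alpha=0.1, C1=2.5) for k in range(100))
print("largest residual:", worst)
```

The residual stays at the level of the finite-difference error, which is what one expects if (3.23)-(3.24) solve the left subsystem for any $C_1$.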

For the right subsystem, it can be shown in the same way that

$$x_1(t) = C_2 e^{-\beta t}\sin t \tag{3.25}$$
$$x_2(t) = -C_2 e^{-\beta t}\cos t \tag{3.26}$$

is a solution of (3.16) and (3.17).

The idea behind this proof is to find a state-space region which is trapping, i.e. from which no trajectory escapes (but which may be entered by trajectories) and which does not contain an equilibrium point. According to the Bendixson-Poincaré theorem this proves that there must be a limit cycle inside this region.

The boundary of such a region can be constructed from two parts of trajectories and two segments on the $x_2$ axis. This is illustrated in figure 3.17, where the green and blue curves are parts of two different trajectories and the red lines lie on the $x_2$ axis, on the intervals $[-s_2, -s_1]$ and $[s_3, s_0]$. As trajectories do not cross, no trajectory enters or leaves the region through the blue or green curves. So, to fulfill the constraint that no trajectory leaves the region, it is sufficient to ensure that trajectories do not leave it through or along the red lines.

Figure 3.17: Quiver plot representation of the system defined in equations (3.14)-(3.17) with α = 0.1, β = 0.5. The state-space region inside the red, green and blue curves contains an attractor because trajectories may only enter but not leave this region. [Plot: x2 against x1, with s0, −s1, −s2 and s3 marked on the x2 axis.]

So, let us look at the sign of $\dot{x}_1$ just to the left and to the right of the $x_2$ axis. From (3.14) it follows that

$$\lim_{x_1 \to 0^-} \dot{x}_1 > 0 \quad \text{for } x_2 < \alpha$$
$$\lim_{x_1 \to 0^-} \dot{x}_1 < 0 \quad \text{for } x_2 > \alpha$$

while (3.16) leads to

$$\lim_{x_1 \to 0^+} \dot{x}_1 > 0 \quad \text{for } x_2 < 0$$
$$\lim_{x_1 \to 0^+} \dot{x}_1 < 0 \quad \text{for } x_2 > 0$$

The signs of the two one-sided limits of $\dot{x}_1$ are the same except for $x_2 \in [0, \alpha]$.

This means that there is a sliding region on the $x_2$ axis between 0 and $\alpha$. Trajectories may pass the $x_2$ axis from left to right if $x_2 < 0$ and from right to left if $x_2 > \alpha$, but when a trajectory hits the $x_2$ axis between 0 and $\alpha$, it will slide up along it until $x_2 = \alpha$ and then leave into the left half of the state space.

Following Utkin's equivalent control method as described in [7], the trajectory on the sliding region is governed by

$$\dot{x}_2 = \frac{(\alpha + \beta)\,x_2^2 + (1 - \alpha\beta)\,x_2}{\alpha} \tag{3.27}$$

so under the hypothesis in (3.18), $\dot{x}_2 > 0$ for $x_2 > 0$. This means that if a trajectory hits the sliding region above the origin, it will not get stuck in an equilibrium.
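Equation (3.27) can be cross-checked by forming the convex combination of the two vector fields on the switching line whose $x_1$-component vanishes, which is a common way of constructing the sliding vector field of a Filippov system. The sketch below assumes that construction; the function names and the weight `lam` are hypothetical.

```python
def sliding_velocity(x2, alpha, beta):
    """x2-velocity on the sliding segment x1 = 0, 0 <= x2 <= alpha."""
    f_left = (alpha - x2, 1.0 + alpha * x2)   # (3.14)-(3.15) evaluated at x1 = 0
    f_right = (-x2, -beta * x2)               # (3.16)-(3.17) evaluated at x1 = 0
    # Choose lam so that lam*f_left + (1 - lam)*f_right has no x1-component.
    lam = -f_right[0] / (f_left[0] - f_right[0])
    return lam * f_left[1] + (1.0 - lam) * f_right[1]


def formula_3_27(x2, alpha, beta):
    """Right-hand side of equation (3.27)."""
    return ((alpha + beta) * x2 ** 2 + (1.0 - alpha * beta) * x2) / alpha


alpha, beta = 0.1, 0.5
for x2 in (0.0, 0.02, 0.05, 0.1):
    print(x2, sliding_velocity(x2, alpha, beta), formula_3_27(x2, alpha, beta))
```

Both expressions agree on the sliding segment, and both are positive for $x_2 > 0$ as long as $\alpha\beta < 1$.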

It is now possible to formulate the requirements on $s_0$ to $s_3$ such that no trajectories escape from the trapping region through the two intervals on the $x_2$ axis. In order to have the constraint fulfilled for $[-s_2, -s_1]$, it is necessary that

$$s_2 \ge s_1 > 0 \tag{3.28}$$

while for the interval $[s_3, s_0]$ the constraints are

$$s_0 \ge s_3 > 0 \tag{3.29}$$

and

$$s_0 > \alpha \tag{3.30}$$

where (3.30) is necessary in order to prevent trajectories from sliding out of the region along the $x_2$ axis.

Now the boundaries of the trapping region can be constructed. The green curve follows a trajectory in the left part of the state space from $(0, s_0)$ to $(0, -s_1)$. By inserting the coordinates of the starting point into the solutions in (3.23) and (3.24), one can determine $t_0$ and $C_1$ such that the trajectory starts at $(0, s_0)$ at time $t_0$:

$$C_1 e^{\alpha t_0}\sin t_0 - 1 = 0$$
$$-C_1 e^{\alpha t_0}\cos t_0 = s_0$$
$$\Rightarrow \tan t_0 = -\frac{1}{s_0}$$
$$\Rightarrow t_0 = \arctan\left(-\frac{1}{s_0}\right) + k\pi$$

where $k$ is an integer. As $(x_1(t_0), x_2(t_0))$ should be in the first quadrant relative to the equilibrium point $(-1, 0)$, $k$ should be odd. So $k = 1$ is chosen and $t_0$ becomes

$$t_0 = \arctan\left(-\frac{1}{s_0}\right) + \pi \tag{3.31}$$


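In the same way $C_1$ may be determined:

$$C_1 = \frac{1}{e^{\alpha t_0}\sin t_0} = \frac{1}{e^{\alpha\left(\arctan\left(-\frac{1}{s_0}\right)+\pi\right)}\sin\left(\arctan\left(-\frac{1}{s_0}\right)+\pi\right)}$$

$$C_1 = \frac{s_0\sqrt{\frac{1}{s_0^2}+1}}{e^{\alpha\left(\arctan\left(-\frac{1}{s_0}\right)+\pi\right)}} \tag{3.32}$$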
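A quick numerical check confirms that (3.31) and (3.32) place the trajectory (3.23)-(3.24) at $(0, s_0)$ at time $t_0$. This is only a sketch; the function name `start_constants` and the values of $s_0$ and α are arbitrary assumptions.

```python
import math


def start_constants(s0, alpha):
    """t0 from (3.31) and C1 from (3.32) for a trajectory through (0, s0)."""
    t0 = math.atan(-1.0 / s0) + math.pi
    C1 = s0 * math.sqrt(1.0 / s0 ** 2 + 1.0) / math.exp(alpha * t0)
    return t0, C1


s0, alpha = 2.0, 0.1   # arbitrary illustrative values
t0, C1 = start_constants(s0, alpha)
x1 = C1 * math.exp(alpha * t0) * math.sin(t0) - 1.0   # (3.23) at t = t0
x2 = -C1 * math.exp(alpha * t0) * math.cos(t0)        # (3.24) at t = t0
print("start point:", (x1, x2))
```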

Unfortunately it is not possible to determine $s_1$ analytically, as this involves solving an equation of the type $e^t \sin t = c$ for $t$. However, the distance of the trajectory from $(-1, 0)$ constantly increases, which can be shown by expressing (3.23) and (3.24) in polar coordinates around $(-1, 0)$:

$$x_1 = -1 + r\sin\varphi$$
$$x_2 = -r\cos\varphi$$
$$\Rightarrow r = C_1 e^{\alpha t} \tag{3.33}$$
$$\Rightarrow \varphi = t \tag{3.34}$$

So, $s_1$ is guaranteed to be bigger than $s_0$, which fulfills the right part of (3.28). In order to find an $s_2$ that fulfills the left part of that constraint, it is possible to define $s_2$ as an upper bound of $s_1$ by taking the maximum of the absolute value of the $x_2$ coordinate of the trajectory. By setting $\dot{x}_2 = 0$ in (3.22) the time $t_{max}$ at which $x_2$ has an extremum can be determined:

$$0 = e^{\alpha t_{max}}(-\alpha\cos t_{max} + \sin t_{max})$$
$$\Rightarrow \tan t_{max} = \alpha$$
$$\Rightarrow t_{max} = \arctan\alpha + k\pi$$

where $k$ is again an integer which should be chosen carefully in order to pick the right extremum. As it happens the correct choice is $k = 2$: $k = 0$ would be in the right quadrant, but then $t_{max}$ would be less than $t_0$, which means that an extremum that occurs before the trajectory hits $(0, s_0)$ would be chosen.

$$t_{max} = \arctan\alpha + 2\pi \tag{3.35}$$

From (3.35) $s_2$ can be determined:

$$\begin{aligned}
s_2 &= -x_2(t_{max}) \\
&= C_1 e^{\alpha t_{max}}\cos t_{max} \\
&= C_1 e^{\alpha(\arctan\alpha + 2\pi)}\cos(\arctan\alpha + 2\pi) \\
&= \frac{C_1 e^{\alpha(\arctan\alpha + 2\pi)}}{\sqrt{\alpha^2 + 1}}
\end{aligned} \tag{3.36}$$

Given $s_2$, determining the trajectory in the right half of the state-space is relatively easy: considering (3.25) and (3.26) one can see that the trajectory starts at $t = 0$ and that $C_2 = s_2$. The end of the trajectory is at $t = \pi$, which leads to

$$s_3 = s_2 e^{-\beta\pi} \tag{3.37}$$

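By inserting (3.32) and (3.36) into (3.37), $s_3$ can be expressed as a function of $s_0$, which makes it possible to verify the left part of constraint (3.29):

$$\begin{aligned}
s_3 &= \frac{C_1 e^{-\beta\pi} e^{\alpha(\arctan\alpha + 2\pi)}}{\sqrt{\alpha^2 + 1}} \\
&= \frac{s_0\sqrt{\frac{1}{s_0^2}+1}\; e^{-\beta\pi} e^{\alpha(\arctan\alpha + 2\pi)}}{e^{\alpha\left(\arctan\left(-\frac{1}{s_0}\right)+\pi\right)}\sqrt{\alpha^2 + 1}} \\
&= \frac{s_0\sqrt{\frac{1}{s_0^2}+1}}{\sqrt{\alpha^2 + 1}} \cdot \frac{e^{\alpha(\arctan\alpha + 2\pi) - \beta\pi}}{e^{\alpha\left(\arctan\left(-\frac{1}{s_0}\right)+\pi\right)}}
\end{aligned}$$

$$s_3 = \frac{s_0\sqrt{\frac{1}{s_0^2}+1}}{\sqrt{\alpha^2 + 1}}\; e^{\alpha\left(\arctan\alpha - \arctan\left(-\frac{1}{s_0}\right)\right) + (\alpha - \beta)\pi} \tag{3.38}$$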

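So, to fulfill the constraint $s_0 \ge s_3$, $\beta$ must be chosen depending on $\alpha$ according to the following inequality:

$$e^{\beta\pi} \ge \frac{\sqrt{\frac{1}{s_0^2}+1}}{\sqrt{\alpha^2 + 1}}\; e^{\alpha\left(\arctan\alpha - \arctan\left(-\frac{1}{s_0}\right)\right) + \alpha\pi} \tag{3.39}$$

Taking the logarithm and dividing by $\pi$, this can also be expressed as follows:

$$\beta \ge \frac{1}{\pi}\ln\left(\frac{\sqrt{\frac{1}{s_0^2}+1}}{\sqrt{\alpha^2 + 1}}\right) + \alpha\,\frac{\arctan\alpha - \arctan\left(-\frac{1}{s_0}\right)}{\pi} + \alpha \tag{3.40}$$

which can be further simplified if $s_0$ tends to $\infty$:

$$\beta \ge \frac{1}{\pi}\ln\left(\frac{1}{\sqrt{\alpha^2 + 1}}\right) + \alpha\,\frac{\arctan\alpha}{\pi} + \alpha \tag{3.41}$$

and with the upper bound for the logarithm ($\ln(x) \le x - 1$) and ($\arctan(x) < \pi$) the following sufficient condition can be obtained:

$$\beta \ge \frac{1}{\pi}\left(\frac{1}{\sqrt{\alpha^2 + 1}} - 1\right) + 2\alpha \tag{3.42}$$

which, finally, due to the fact that $1/\sqrt{\alpha^2 + 1} < 1$, can be simplified to

$$\beta \ge 2\alpha \tag{3.43}$$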
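The chain of bounds can be checked numerically by evaluating the closed-form expression (3.38) for a few values of $s_0$. This is a sketch under assumed values (α = 0.1, β = 0.5, which satisfy β ≥ 2α and αβ < 1); the function name `s3_from_s0` is hypothetical.

```python
import math


def s3_from_s0(s0, alpha, beta):
    """Return point s3 on the x2 axis as a function of s0, following (3.38)."""
    prefactor = (s0 * math.sqrt(1.0 / s0 ** 2 + 1.0)
                 / math.sqrt(alpha ** 2 + 1.0))
    exponent = (alpha * (math.atan(alpha) - math.atan(-1.0 / s0))
                + (alpha - beta) * math.pi)
    return prefactor * math.exp(exponent)


alpha, beta = 0.1, 0.5   # assumed values with beta >= 2*alpha, alpha*beta < 1
for s0 in (1.0, 10.0, 100.0):
    s3 = s3_from_s0(s0, alpha, beta)
    print(f"s0 = {s0:6.1f}   s3 = {s3:8.4f}   0 < s3 <= s0: {0.0 < s3 <= s0}")
```

For these values the constraint (3.29) holds for all three choices of $s_0$, so the outer boundary closes as required.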


So, it is proven that for the system in (3.14) to (3.17) with $\alpha > 0$, $\beta \ge 2\alpha$ and $\alpha\beta < 1$ it is possible to find a trapping region (by taking $s_0$ sufficiently large).

The remaining condition for proving the existence of a limit cycle is that the trapping region must not contain an equilibrium. Two points currently violate this condition. The first one, $(-1, 0)$, is rather obvious as it follows directly from the equations governing the left half of the state-space, (3.14) and (3.15). Its stability can be determined very easily by calculating the eigenvalues of the system on the left:

$$\det\begin{bmatrix} \alpha - \lambda & -1 \\ 1 & \alpha - \lambda \end{bmatrix} = 0 \;\Rightarrow\; (\alpha - \lambda)^2 = -1 \;\Rightarrow\; \lambda = \alpha \pm i$$

As it is assumed that $\alpha > 0$, this equilibrium point will always be unstable.

The other point is less evident as it lies on the sliding region. Equation (3.27) shows that, under the hypothesis (3.18), the point $(0, 0)$ is an equilibrium and suggests that this equilibrium is unstable, as

$$\left.\frac{d\dot{x}_2}{dx_2}\right|_{x_2 = 0} > 0$$

However, there is a trajectory in the left half of the state-space that arrives at exactly that point $(0, 0)$ in finite time. This point is degenerate.

This does not affect the proof of the existence of a limit cycle because it is possible to remove both equilibria, and even the whole sliding region, from the trapping region. The principle is very simple: in much the same way as the outer boundary of the trapping region was constructed, one can construct an inner boundary that encloses the two equilibria and through which trajectories may only pass from the inside to the outside (see figure 3.18). This means that the region between the outer and the inner boundaries is still a trapping region.

Figure 3.18: The state-space region outside the cyan, magenta and yellow curves is never left by a trajectory. α = 0.1, β = 0.5. [Plot: x2 against x1, with s4, −s5, −s6 and s7 marked on the x2 axis.]

As with the outer boundary of the trapping region, the inner boundary consists of two parts of trajectories, one for each half-plane, which are joined by two straight lines on the $x_2$ axis. The first trajectory goes from a point $(0, s_4)$ to the point $(0, -s_5)$, while the second trajectory goes from $(0, -s_6)$ to $(0, s_7)$. In order to avoid the sliding region and to ensure that trajectories may only cross from the inside to the outside, the following constraints must be met:

$$s_4 \ge \alpha \tag{3.44}$$
$$s_5 > s_6 \tag{3.45}$$
$$s_7 > s_4 \tag{3.46}$$

where (3.44) is there to avoid the sliding region and (3.45), (3.46) ensure that trajectories pass the straight line segments in the right direction.

To start with, one can define $s_4 = \alpha$, which meets constraint (3.44). From this it follows, using the same calculations as the ones that led to (3.31) and (3.32), that

$$x_1(t) = C_1 e^{\alpha t}\sin t - 1$$
$$x_2(t) = -C_1 e^{\alpha t}\cos t$$

with

$$t_0 = \arctan\left(-\frac{1}{\alpha}\right) + \pi \tag{3.47}$$
$$C_1 = \frac{1}{e^{\alpha t_0}\sin t_0} \tag{3.48}$$

is a trajectory that starts at $(0, \alpha)$ at time $t_0$.


Furthermore, at time $t = 2\pi$, this trajectory passes through the point $(-1, -s)$, where

$$s = C_1 e^{2\pi\alpha}\cos 2\pi = C_1 e^{2\pi\alpha}$$

So without knowing the exact value of $s_5$, it is possible to find an $s_6$ that complies with constraint (3.45): because the distance of the trajectory from the point $(-1, 0)$ constantly increases, by choosing $s_6$ such that the distance between $(0, -s_6)$ and $(-1, 0)$ is $s$, one can ensure that $s_6 < s_5$, as shown in figure 3.19.

$$s_6 = \sqrt{s^2 - 1} \tag{3.49}$$

Figure 3.19: By taking s6 such that the distance between (−1, 0) and (0, −s6) is s, one can guarantee that s6 < s5. [Sketch labels: 1, s, s6 = (s² − 1)^{1/2}, (−1, 0), s4 = α, 0, −s5, −s6.]

As the time to complete the trajectory from $(0, -s_6)$ to $(0, s_7)$ is exactly $\pi$, the relation between $s_6$ and $s_7$ is simple:

$$s_7 = s_6 e^{-\beta\pi}$$

Now that the constraints (3.44) and (3.45) have been fulfilled by construction, what remains is to check whether the last condition, $s_4 < s_7$, holds. The complete expression for $s_7$ is rather complicated:

$$s_7 = e^{-\beta\pi}\sqrt{\frac{e^{4\alpha\pi}(1 + \alpha^2)}{e^{2\alpha\left(\arctan\left(-\frac{1}{\alpha}\right) + \pi\right)}} - 1}$$

However, it is easy to verify numerically that $s_7 > s_4 = \alpha$. For this, actual values for $\alpha$ and $\beta$ are chosen in compliance with the previously determined constraints $\alpha > 0$, $\beta \ge 2\alpha$ and $\alpha\beta < 1$: $\alpha = 0.1$ and $\beta = 0.5$, which gives

$$s_7 \approx 0.2580 > s_4 = \alpha = 0.1$$
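The numerical value can be reproduced by following the construction of the inner boundary step by step. This is a minimal sketch; the function name `inner_boundary` is hypothetical, while the formulas are those of (3.47)-(3.49) and the relation $s_7 = s_6 e^{-\beta\pi}$.

```python
import math


def inner_boundary(alpha, beta):
    """Return (s4, s6, s7) for the inner boundary of the trapping region."""
    s4 = alpha                                          # (3.44) with equality
    t0 = math.atan(-1.0 / alpha) + math.pi              # (3.47)
    C1 = 1.0 / (math.exp(alpha * t0) * math.sin(t0))    # (3.48)
    s = C1 * math.exp(2.0 * math.pi * alpha)            # distance from (-1, 0) at t = 2*pi
    s6 = math.sqrt(s ** 2 - 1.0)                        # (3.49)
    s7 = s6 * math.exp(-beta * math.pi)                 # half a turn in the right half
    return s4, s6, s7


s4, s6, s7 = inner_boundary(alpha=0.1, beta=0.5)
print(f"s4 = {s4:.4f}, s6 = {s6:.4f}, s7 = {s7:.4f}")
```

For α = 0.1 and β = 0.5 the computed $s_7$ agrees with the value quoted above and exceeds $s_4$, so constraint (3.46) is met.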

This means that there is a trapping region without equilibria or sliding regions and therefore the Bendixson-Poincaré theorem may be applied (see also [19]), meaning that there must be a limit cycle in this region. □

So much for the theory. That the trapping region actually works in practice can be seen in figure 3.20.

Figure 3.20: A trajectory (black) gets trapped in the trapping region and converges to the limit cycle. α = 0.1, β = 0.5. [Plot: x2 against x1.]


3.2 A simple model for pitch breaks to double and triple periods

Subharmonic pitch breaks are interesting in this context for several reasons. On the one hand, depending on the speaker they can occur relatively often in natural speech. On the other hand, period doubling is one of the most studied and best-known phenomena related to nonlinear systems. Examples of subharmonic pitch breaks can be found in sections 2.4.1 and 2.4.2.

The “classic” period doubling scenario, observed in systems like the logistic map or the Rössler system, is that changing a parameter of the system has the following effect on the output signal: at certain parameter values the period of the output signal doubles, meaning that the sequence of the periods is

$$\{T, 2T, 4T, 8T, \cdots, 2^i T, \cdots\}$$

where $T$ is the initial period of the output signal.

This phenomenon has been extensively studied, leading to some famous results. One of them is that the ratio of successive parameter value intervals between period doublings has a limit when the number of period doublings goes to infinity. This limit is called the Feigenbaum constant and is universal, i.e. it does not depend on the system in question [20]. Its value is about 4.669, which has also been confirmed with “real life” systems such as nonlinear electronic circuits and even fluid convection.

From this result follows the so-called “subharmonic route to chaos”: as the period doublings become more and more frequent, an infinite periodicity is reached at a finite parameter value. From this point onwards, chaos is present. Usually, within the parameter region where chaotic behaviour occurs, periodic “windows” are found, i.e. parameter intervals where the output signal is periodic.
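The logistic map mentioned above makes both the period doubling cascade and a periodic window easy to reproduce. The following is a minimal sketch: the function name `logistic_period`, the burn-in length, the tolerance and the chosen values of $r$ are illustrative assumptions, with $r = 3.83$ taken from inside the well-known period-three window.

```python
def logistic_period(r, burn_in=20000, max_period=16, tol=1e-6, x0=0.5):
    """Period of the attracting cycle of x -> r*x*(1-x), or None if not found."""
    x = x0
    for _ in range(burn_in):           # discard the transient
        x = r * x * (1.0 - x)
    orbit = [x]
    for _ in range(max_period):        # record max_period further iterates
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    for p in range(1, max_period + 1):
        if abs(orbit[p] - orbit[0]) < tol:
            return p                   # smallest p with x_{n+p} = x_n
    return None


periods = {r: logistic_period(r) for r in (2.8, 3.2, 3.5, 3.83)}
for r, p in periods.items():
    print(f"r = {r}: period {p}")
```

The first three parameter values follow the doubling sequence 1, 2, 4, whereas $r = 3.83$ gives a period of three, which in the logistic map only appears beyond the cascade.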

Another interesting result is that for a certain class of systems² the occurrence of periodic windows follows a fixed sequence. If only periods up to 6 are considered, then the sequence of periodic windows after the first period-doubling cascade is as follows [20, 370-372]:

$$6,\ 5,\ 3,\ 2 \cdot 3,\ 5,\ 6,\ 4,\ 6,\ 5,\ 6$$

² Systems that may be related to a unimodal map of the form $x_{n+1} = r f(x_n)$, where the term unimodal means that $f$ must be a smooth and concave function with a single maximum [20, 370-372].

Unfortunately the observations of vocal fold vibration and the described results from dynamic systems theory do not match very well. The observed Lx signals show double and triple periods relatively often, but no period-doubling cascade.

The model that is presented here gives an explanation of how the observed behaviour may relate to nonlinear dynamics. It also shows that a triple period may be observed even though neither a period doubling cascade nor aperiodic behaviour is observed. Furthermore, this model is (very loosely) based on the assumption that a constriction of the airflow is the cause of the pitch breaks.

3.2.1 Construction of the model

Just like the Rössler system, the proposed system is based on a harmonic oscillator. The equation system in (3.50) is a linear oscillator whose stability is controlled by the parameter $\alpha$.

$$\begin{aligned}
\dot{x}_1 &= -x_2 + \alpha x_1 \\
\dot{x}_2 &= x_1 + \alpha x_2
\end{aligned} \tag{3.50}$$

The parameter $\alpha$ is then replaced by a third variable $x_3$ that depends on itself and the other variables.

The idea was to model the vibration of the vocal folds with the linear oscillator in (3.50) and the transglottal air pressure difference that drives the vocal folds with the variable $x_3$: if $x_3$ is greater than zero, the oscillator's amplitude increases, and if $x_3$ is smaller than zero, the amplitude decreases. A positive value of $x_3$ would therefore correspond to a large pressure difference and a negative value of $x_3$ to a small (but still positive) pressure difference.
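The role of $\alpha$ as an amplitude control in (3.50) can be made explicit with a short calculation. This is only a sketch: the radius $r$ is a symbol introduced here for illustration and does not appear in the original derivation.

```latex
\begin{align*}
r^2 &= x_1^2 + x_2^2 \\
\frac{d}{dt}\, r^2 &= 2 x_1 \dot{x}_1 + 2 x_2 \dot{x}_2
  = 2 x_1 (-x_2 + \alpha x_1) + 2 x_2 (x_1 + \alpha x_2)
  = 2 \alpha r^2 \\
\dot{r} &= \alpha r
  \quad\Rightarrow\quad r(t) = r(0)\, e^{\alpha t}
  \quad \text{(for constant } \alpha \text{)}
\end{align*}
```

If $\alpha$ is replaced by a variable $x_3$, the same steps give $\dot{r} = x_3 r$, so the sign of $x_3$ decides at every instant whether the amplitude grows or decays.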

Of course this analogy is very crude and should not be taken seriously as a vocal fold model. However, one of the stunning aspects of dynamic systems theory is that very simple systems can give qualitative explanations for phenomena occurring in much more complicated systems. An example is the similarity between the logistic map and the Lorenz map of the Rössler system [20, 376-379], where the logistic map (a very simple, first order discrete-time system) can explain the period doubling of the Rössler system (a third order continuous-time system).

The appearance of a double period in vocal fold vibration due to a constriction of the airflow was supposed to work as follows: if the pressure difference is high, the vocal folds oscillate at a large amplitude, letting more air pass. This reduces the pressure difference, resulting in a smaller amplitude in the next cycle, which means that less air passes and the pressure difference is allowed to build up again.

This idea led to two “design decisions” for the expression that controls $x_3$. On the one hand, it should include a term that makes $x_3$ tend to a fixed, positive value, independently of $x_1$ and $x_2$. On the other hand, the term that depends on $x_1$ and $x_2$ should be emphasized during half of each cycle of the oscillation of $x_1$ and $x_2$. This is to reflect the fact that in the vocal folds, the air passes only when the folds are open, which is roughly one half of the time and only once per oscillation period.

Several different systems were designed and their capabilities to produce double and triple periods were examined. The system that produced the most interesting results and that will be further discussed here is the following:

$$\begin{aligned}
\dot{x}_1 &= -x_2 + x_3 x_1 \\
\dot{x}_2 &= x_1 + x_3 x_2 \\
\dot{x}_3 &= b(a - x_3) + c\,(\arctan(x_1) - \pi)(x_1^2 - x_2^2)
\end{aligned} \tag{3.51}$$

The parameters $a$ and $b$ are fixed to 0.2 while the parameter $c$ is used to control the system. The evolution of the variable $x_1$ is also referred to as the “output signal” of the system.

The system has a structure similar to the Rössler system, which can be seen in (3.52), due to the construction based on a harmonic oscillator.

$$\begin{aligned}
\dot{x}_1 &= -x_2 - x_3 \\
\dot{x}_2 &= x_1 + a x_2 \\
\dot{x}_3 &= b + x_3 (x_1 - c)
\end{aligned} \tag{3.52}$$

3.2.2 Analysis

The first analysis carried out on this system was to draw an orbit diagram. This was done by simulating the system for different values of the parameter $c$ and plotting the maxima of $x_1$ as a function of $c$. The initial conditions were chosen at random in order to see the greatest variety of behaviours.

Figure 3.21: Orbit diagram of the system (3.51) for the parameter range $c \in [0.1, 1]$ (maxima of $x_1$ plotted over $c$). The “standard” period doubling cascade (from right to left) seems to be interrupted around $c = 0.5$.

The orbit diagram shows where a period doubling or a transition to chaotic behaviour happens: before the period doubling all maxima are of the same height and the orbit diagram shows just one line. After the period doubling big and small maxima alternate and therefore two lines appear in the orbit diagram. If the signal is chaotic, the maxima are randomly distributed and the orbit diagram shows an area filled with dots.
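The procedure behind an orbit diagram can be sketched in a few lines. The code below is an illustration under stated assumptions, not the code used for figure 3.21: the fixed-step Runge-Kutta integrator, the step size, the transient length, the fixed (rather than random) initial condition and all function names are choices made here.

```python
import math

A = 0.2  # parameter a, fixed to 0.2 in the text
B = 0.2  # parameter b, fixed to 0.2 in the text


def rhs(state, c):
    """Right-hand side of system (3.51)."""
    x1, x2, x3 = state
    return (
        -x2 + x3 * x1,
        x1 + x3 * x2,
        B * (A - x3) + c * (math.atan(x1) - math.pi) * (x1 * x1 - x2 * x2),
    )


def rk4_step(state, c, dt):
    """One classical fourth-order Runge-Kutta step (assumed integrator)."""
    def nudge(k, factor):
        return tuple(s + factor * ki for s, ki in zip(state, k))

    k1 = rhs(state, c)
    k2 = rhs(nudge(k1, dt / 2), c)
    k3 = rhs(nudge(k2, dt / 2), c)
    k4 = rhs(nudge(k3, dt), c)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * g + d)
        for s, a, b, g, d in zip(state, k1, k2, k3, k4)
    )


def local_maxima(samples):
    """Values that are larger than their left and not smaller than their right neighbour."""
    return [
        samples[i]
        for i in range(1, len(samples) - 1)
        if samples[i - 1] < samples[i] >= samples[i + 1]
    ]


def orbit_points(c, start, t_end=300.0, t_skip=150.0, dt=0.01):
    """Maxima of x1 after discarding the first t_skip time units."""
    state = start
    kept = []
    for n in range(int(round(t_end / dt))):
        state = rk4_step(state, c, dt)
        if (n + 1) * dt > t_skip:
            kept.append(state[0])
    return local_maxima(kept)


def orbit_diagram(c_values, start=(0.1, 0.1, 0.0)):
    """List of (c, maximum) pairs, ready to be drawn as a scatter plot."""
    return [(c, m) for c in c_values for m in orbit_points(c, start)]
```

Calling `orbit_diagram([0.1 + 0.01 * i for i in range(91)])` would cover the parameter range of the figure; with a single fixed initial condition only one attractor per parameter value is found, which is why random initial conditions are the better choice for showing coexisting behaviours.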

The result (as seen in figure 3.21) strongly resembles a period doubling cascade in its overall shape. However, around $c = 0.5$ it is interrupted by something like a “chaotic window”.

Looking at the time signals produced for $c = 0.5$, one can see that the signal corresponding to the “interruption” of the period doubling cascade has a triple period with slightly irregular maxima. Besides, the double period behaviour can still be found for $c = 0.5$. Depending on the initial conditions, the system shows one or the other behaviour (see figure 3.22).

For $c = 0.5$, the system has two attractors, one for the double and one for the triple period. The double period appears to be a limit cycle, as the maxima are precisely at two different values. On the other hand, the “triple period” is only approximately a triple period because its maxima are not just at three different values, but rather randomly distributed within certain limits. Therefore the attractor does not seem to be a limit cycle but a strange attractor.

Figure 3.22: Evolution of the state variable $x_1$ over $t$ for different initial conditions (two panels, $x_1(t_0) = -0.1277$ and $x_1(t_0) = -0.128$). For $x_1 = -0.1277$, $x_2 = 0.1$, $x_3 = 0$, the system follows the triple period attractor for a while and then moves to the double period attractor. If the $x_1$ coordinate of the initial condition is changed to $x_1 = -0.128$, the system stays on the triple period attractor.

Observations have shown that the system has only two attractors: a limit cycle corresponding to a double period and a strange attractor corresponding to approximately a triple period.

The dependence on the initial conditions can be shown by drawing the basins of attraction for the two attractors. Of course, in a three-dimensional system these are subsets of the three-dimensional space, but it is possible to draw the intersection of the basins of attraction with a plane in the state space. In figure 3.23 the intersection of the basins of attraction with the plane $x_3 = 0$ is shown.

The picture in figure 3.23 shows more than just the basins of attraction. Because the colour of each point is determined from the spectrogram of the whole output signal (see appendix C), the speed at which the signal converges to the attractor plays a role.
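A much cruder way of labelling a trajectory than the spectrogram-based colouring is to count how many distinct heights the late maxima of $x_1$ take. The sketch below shows only this substitute technique; it is not the method of appendix C, and the function names, the tolerance and the label strings are assumptions made here.

```python
def distinct_levels(maxima, tol=1e-3):
    """Number of clearly different values among a list of maxima."""
    levels = []
    for m in maxima:
        # A maximum opens a new level only if it is not close to a known one.
        if not any(abs(m - level) < tol for level in levels):
            levels.append(m)
    return len(levels)


def classify(maxima, tol=1e-3):
    """Label a settled output signal by the number of maxima levels."""
    n = distinct_levels(maxima, tol)
    if n == 1:
        return "single period"
    if n == 2:
        return "double period"
    return "irregular or higher period"


# Synthetic examples: alternating big and small maxima, then scattered ones.
print(classify([0.9, 0.5] * 10))
print(classify([0.9, 0.5, 0.3, 0.91, 0.52, 0.28]))
```

Applied to the maxima of a trajectory started at each point of a grid in the plane $x_3 = 0$, such a label would give a two-colour picture of the basins, at the price of losing the information about convergence speed that the spectrogram keeps.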

Figure 3.23: Basins of attraction for the double and triple period attractors, restricted to the plane $x_3 = 0$. Trajectories starting in blue points are attracted to the strange attractor, those starting in green points will eventually follow the double period limit cycle, and the dark green shade indicates following an unstable single period orbit before being attracted to the double period limit cycle.

Figure 3.24: Zooming in on the center of figure 3.23 shows a complex spiraling pattern which may be a fractal.

Keeping in mind that the system can seem to stay on the strange attractor for a while and move to the limit cycle after a while, one would expect to see all different shades between green (limit cycle) and blue (strange attractor) between the blue and green patches in figure 3.23. Interestingly, this is not the case, at least not in the areas where the attractors intersect the plot (see figure 3.25 and figure 3.26).

Figure 3.25: The limit cycle corresponding to the double period behaviour intersecting its basin of attraction (green) in the plane $x_3 = 0$.

Figure 3.26: The strange attractor corresponding to triple period behaviour and its basin of attraction (blue) in the plane $x_3 = 0$.

The explanation is that it takes only a minute change in the initial conditions to change the trajectory from staying on the strange attractor to converging very early to the limit cycle. This difference in the initial condition is much smaller than the resolution of the plot, where the distance between two pixels is 0.008, compared to a difference in the initial condition of 0.0003 in figure 3.22. At this resolution the effect of trajectories seemingly following the strange attractor before converging to the limit cycle cannot be seen. If the resolution of the plot were increased, slightly blurred boundaries between the blue and green areas could be seen. This blur could be removed by letting the system evolve longer.

The phenomenon that for higher resolutions the system must be allowed to evolve longer is also common with fractals such as the Mandelbrot set. For the plot in figure 3.23 the time the system was given to evolve was 1000. The duration of one period is $2\pi$, so roughly 160 periods were evaluated for each pixel.

However, there is an effect due to initial transient behaviour. There are some slightly darker patches and dark lines in the green basin of attraction of the limit cycle (see figure 3.24). The lines are due to the trajectory closely following a single period closed orbit (see figure 3.27) before being attracted to the limit cycle. The patches are due to irregular behaviour at the beginning of the trajectory, which may be of triple or higher period.

Figure 3.27: The single period orbit (red curve) is surrounded by the basin of attraction of the limit cycle.

The reason why this single period orbit exists and why so many trajectories follow it closely may have something to do with the first period doubling (at $c = 0.8$ in figure 3.21). Before the period doubling there is a single period limit cycle; after the period doubling, there is a double period limit cycle. What may have happened is that the single period limit cycle has become unstable and a double period limit cycle has appeared next to it. This is just an assumption. However, this assumption is backed by the relative position of the single period orbit and the limit cycle. A trajectory starting near the single period orbit and converging to the limit cycle lies on a Möbius band (see figure 3.29).

Figure 3.28: The limit cycle (green) and the strange attractor (blue) are intertwined (three-dimensional plot with axes $x_1$, $x_2$, $x_3$).

Figure 3.29: A trajectory (red) starting near the single period orbit and converging to the double period limit cycle (black) lies on a Möbius band (a “ribbon” with only one edge and only one surface). Three-dimensional plot with axes $x_1$, $x_2$, $x_3$.

A way the period doubling could happen is best illustrated if the period doubling is considered “the wrong way round”, from a double period to a single period: the width of the Möbius band between the unstable orbit and the double period limit cycle could simply become smaller and smaller as $c$ increases until only a single period limit cycle is left. This can be related to a supercritical pitchfork bifurcation of the second iterate map of the Poincaré section across the Möbius band³.

³ Let's, as a “Gedankenexperiment”, consider a cross-section through this Möbius band and the Poincaré map that gives the position of the point where a trajectory hits the cross-section as a function of where it hit it the last time. If the trajectory is on the limit cycle, it will alternate between one side and the other of the cross-section. If it is exactly on the periodic orbit, it will stay on it and will always hit the cross-section at the same point (the unstable equilibrium of the map). Any other trajectory will converge to the outer edges of the section, flipping from one side of the unstable equilibrium to the other.
Now, let's consider the second iterate of the described map (i.e. just consider one cross-section “hit” out of two). The map will still have an unstable equilibrium where the periodic orbit passes, but the “flipping” edge will be split into two separate stable equilibria. When the width of the Möbius band becomes smaller and smaller, the two stable equilibria at the edges of the cross-section will join the unstable equilibrium and become a single stable equilibrium, which is a supercritical pitchfork bifurcation of the second iterate map.

3.2.3 Controlling the model

As it is established that the proposed model is capable of producing single, double and triple periods, the question that remains is: how can the model be controlled in order to produce the desired behaviour?

As can be seen in section 2.4 and in figure 3.32, Lx signals switch rapidly from one behaviour to another. Common transitions are from a single to a double period (and vice versa) as well as from a double to a triple period (and vice versa).

The model can produce the transition between the single and double period behaviour in a very straightforward way using the period doubling bifurcation around $c = 0.8$. If the parameter value is above the bifurcation point, the trajectory will eventually converge to the single period cycle. On the other side of the bifurcation, it will converge to the double period cycle. Therefore, by changing the bifurcation parameter, it is possible to force the system into one or the other behaviour.

Reproducing the transition from a triple to a double period is also simple: when the system is on the strange attractor that produces the triple period output, one just needs to change the parameter of the system until the strange attractor ceases to exist. Another way would be to perturb the system so that it lands on the limit cycle.

However, there is one case where it is not possible to force the system into the desired state: as there is no parameter value where only the strange attractor exists, it is not possible to force the system onto it. So, if brute force does not help, maybe it is still possible to persuade the system to do what is desired...

Experiments have shown that the system reacts strongly to short-time parameter variations. In figure 3.30, the evolution of $x_1$ is shown along with the value of the parameter $c$. A short dip of $c$ from 0.5 to 0.3, followed by a slow increase back to 0.5, can bring the system from the limit cycle to the strange attractor.

Figure 3.30: A short dip of the parameter $c$ from 0.5 to 0.3 can bring the system onto the strange attractor (bottom). The top and middle graphs show the same parameter curve, but shifted to the left by 4 and 2, respectively, which does not yield the desired result. (Three panels showing $x_1$ and $c$ over $t$, $0 \le t \le 300$.)

What is interesting to note is that the timing of the parameter change is important. If the “dip” is not timed shortly before a maximum of the output signal, the perturbation does not succeed in bringing the system onto the strange attractor.
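A parameter curve made of straight line segments, such as the dip described above, can be written as a small interpolation function. This is a sketch only: the helper name `piecewise_linear` and the breakpoint times (100, 102 and 140) are hypothetical, since the text gives the values of $c$ (0.5 and 0.3) but not the exact timing.

```python
def piecewise_linear(breakpoints):
    """Return c(t) that interpolates linearly between (t, c) breakpoints.

    breakpoints must be sorted by t; outside the covered interval the
    first or last value is held constant.
    """
    def c_of_t(t):
        if t <= breakpoints[0][0]:
            return breakpoints[0][1]
        for (t0, c0), (t1, c1) in zip(breakpoints, breakpoints[1:]):
            if t <= t1:
                return c0 + (c1 - c0) * (t - t0) / (t1 - t0)
        return breakpoints[-1][1]

    return c_of_t


# Hypothetical timing: fast drop from 0.5 to 0.3, slow return to 0.5.
dip = piecewise_linear([(100.0, 0.5), (102.0, 0.3), (140.0, 0.5)])


def shifted(c_of_t, delay):
    """Same parameter curve, moved later in time by `delay`."""
    return lambda t: c_of_t(t - delay)


dip_earlier = shifted(dip, -2.0)  # shifted to the left by 2
```

Passing `dip(t)` instead of a constant `c` to the right-hand side of (3.51) while integrating, and repeating the run with shifted copies, is one way to probe the sensitivity to timing.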

One of the main goals of using nonlinear systems to model vocal fold behaviour is that simple causes can have complicated effects similar to the ones found in the real vocal folds. Figure 3.33 shows an example of how a simple change in the parameter $c$ can produce a sequence of double, triple and single period behaviour similar to one found in a real-life Lx signal (figure 3.32).

Figure 3.31: Zoom on the parameter change in figure 3.30 (three panels showing $x_1$ and $c$ over $t$, $100 \le t \le 150$). The parameter “dip” must be timed shortly before a maximum of the output signal.

Figure 3.32: An Lx signal from the word “score” (plotted over $t$ in seconds). The signal shows a double period up to about $t = 0.1$ sec, followed by two cycles of triple period behaviour and single period behaviour from $t = 0.2$ sec.

3.2.4 Discussion of the nonlinearity

During the analysis of the model in (3.51) it became apparent that the values of $x_1$ usually lie between -1 and 1. In this range the function $\arctan(x_1)$ in the expression for $\dot{x}_3$ is relatively close to $x_1$ (see figure 3.34). So it was only natural to explore how the system behaves when this nonlinearity is linearised.
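How close $\arctan(x_1)$ stays to $x_1$ on this range is easy to quantify. The short sketch below is an illustration added here; the function name and the sampling density are arbitrary choices.

```python
import math


def max_linearisation_error(lo=-1.0, hi=1.0, samples=2001):
    """Largest |arctan(x) - x| on an evenly sampled interval [lo, hi]."""
    worst = 0.0
    for i in range(samples):
        x = lo + (hi - lo) * i / (samples - 1)
        worst = max(worst, abs(math.atan(x) - x))
    return worst


err = max_linearisation_error()
# Smallest magnitude of the full factor arctan(x1) - pi on the same range.
smallest_factor = abs(math.atan(1.0) - math.pi)
print(err, err / smallest_factor)
```

The largest deviation occurs at the ends of the interval and equals $1 - \pi/4 \approx 0.21$, which is less than a tenth of the magnitude of the factor $\arctan(x_1) - \pi$ in which the function appears.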

Figure 3.33: A signal with the same sequence of double, triple and single periods as in figure 3.32 can be produced with a very simple parameter change, consisting of only four straight lines. (Plot of the scaled $x_2$ and of $c$ over $t$, $0 \le t \le 300$.)

Figure 3.34: The term that emphasizes the negative values of $x_1$ in the expression for $\dot{x}_3$, $\arctan(x_1) - \pi$, and its linearisation $x_1 - \pi$ on the usual range of $x_1$ values.

The partly linearised system is described by the following equations:

$$\begin{aligned}
\dot{x}_1 &= -x_2 + x_3 x_1 \\
\dot{x}_2 &= x_1 + x_3 x_2 \\
\dot{x}_3 &= b(a - x_3) + c\,(x_1 - \pi)(x_1^2 - x_2^2)
\end{aligned} \tag{3.53}$$

Simulations have shown that for many initial conditions the new system behaves very similarly to the original model. There is still a double period limit cycle and trajectories often show an initial transient with a triple period (see figure 3.35).

Figure 3.35: For many initial conditions, the behaviour of the systems remains similar when $\arctan(x_1)$ is linearised. (Two panels showing $x_1$ over $t$: original system and arctan linearised.)

However, it seems as if the triple period cannot be sustained, i.e. that the strange attractor does not exist anymore in the new system (see figure 3.36). This is a very empirical result, based on simulations of the new system with initial conditions from points known to belong to the basin of attraction of the strange attractor of the old system.

Figure 3.36: In one regard, the linearisation of $\arctan(x_1)$ qualitatively changes the system: the strange attractor seems to have disappeared. Some trajectories seem to follow it for a while, but none has been observed to stay on it. (Two panels showing $x_1$ over $t$: original system and arctan linearised.)

In any case, this does not impair the ability of the new system to be a model for transient behaviour of vocal fold vibrations: the system is still able to produce triple periods during a short time, but not to sustain them over a long period of time. This does not contradict observations of vocal fold vibration, where triple periods have been observed only during short times.

I made the attempt to record Lx signals with sustained double and triple periods using my own voice and succeeded only for the double period.

There is another very interesting point about this linearisation: when looking at the system in (3.53) from the point of view that $x_3$ controls the amplitude of the oscillator composed of $x_1$ and $x_2$ (i.e. that the nonlinearities in the expressions for $\dot{x}_1$ and $\dot{x}_2$ are just there to enable this control), the single nonlinear term in the last equation becomes the essential part of the system: $x_1^2 - x_2^2$.

What is fascinating about this is that this term essentially converts the oscillation of $x_1$ and $x_2$ into an oscillation of the double frequency (half the period).

Figure 3.37: Evolution of the state variables $x_1$, $x_2$ and $x_3$ in the time domain. $x_3$ seems to oscillate twice as fast as $x_1$ and $x_2$.

This can be seen experimentally (see figure 3.37), but also analytically: suppose the oscillator $(x_1, x_2)$ to be “decoupled” from $x_3$ (i.e. suppose $x_3 = 0$). A particular solution of the linear oscillator is $x_1 = \cos t$, $x_2 = \sin t$. When $x_1^2 - x_2^2$ is evaluated for this solution, the result is an oscillation at the double frequency:

$$\begin{aligned}
x_1^2 - x_2^2 &= \cos^2 t - \sin^2 t \\
&= \cos^2 t - 1 + \cos^2 t \\
&= 2\cos^2 t - 1 \\
&= 2 \cdot \tfrac{1}{2}\left(\cos(2t) + 1\right) - 1 \\
&= \cos(2t)
\end{aligned}$$

Figure 3.38: Spectrum of the signals in figure 3.37 (three panels: spectrum of $x_1$, of $x_2$ and of $x_3$ over frequency). $x_3$ has a very strong harmonic at $F = 2$, which supports the hypothesis that it doubles the frequency of the oscillation of $x_1$ and $x_2$.

Because of the interaction between the oscillator and $x_3$ as well as the other factors and terms in the expression for $\dot{x}_3$, the solution for $x_3$ is not simply a sinusoid with twice the frequency of $x_1$, but as can be seen in figure 3.38, the double frequency component is very strong.

What is fascinating about this is that a nonlinearity that doubles the frequency of a signal is actually responsible for pitch breaks to lower frequencies. This may even suggest that there could be a single mechanism that is capable of producing pitch breaks to higher as well as lower frequencies. Of course, both pitch breaks can occur in the human voice (as an example, pitch breaks to a higher frequency can occur when screaming).


3.3 Airflow-driven model

The first step towards a more physical model of the vocal folds was to implement a one-mass model, driven by an airflow. Having the glottal flow as a state variable also makes it possible to derive the voice source by taking the first derivative of the glottal flow.

One-mass models are the simplest physical models of the vocal folds [5]. They consist of a mass-spring-damper system which is meant to model one vocal fold. This system experiences a force due to an airflow which provides energy to the system. Only one fold is modeled, as it is assumed that the vocal folds vibrate synchronously.

Figure 3.39: Physical model of the vocal folds (labels in the drawing: $m$, $k_1$, $k_2$, $\eta_1$, $\eta_2$, $l$, $h$, $w$, $w_i$, $p$, $p + \Delta p$). The mass-spring system $m$, $k_1$, $\eta_1$ represents the mass and tension of the vocal folds while a massless plate connected to the mass by a spring $k_2$ and a damper $\eta_2$ models the fact that vocal fold tissue can be “squeezed” when the folds touch. The line between the spring $k_2$ and the damper $\eta_2$ represents a rope of the length that corresponds to the equilibrium position of $k_2$. It keeps the massless plate close enough to the mass such that the spring $k_2$ can only be compressed but not extended. The displacement of the mass from its equilibrium position is the state variable $x_1$.

The model described here extends the model described in [5] by a second spring ($k_2$) and a second damping element ($\eta_2$) which is meant to model the “squeezing” of the vocal fold tissue that occurs when the vocal folds touch (see figure 3.39).

The force due to the glottal flow is calculated using the Bernoulli effect (which relies on the hypothesis, incorrect in reality, that air is not compressible). Basically it states that the pressure in a fluid decreases when the speed of the flow increases.

So, when the air flows fast between the vocal folds, it pulls them together. Therefore, in order to have energy provided to the vocal fold vibration by the glottal flow, it is necessary that the glottal flow is faster in the closing phase than in the opening phase.

This is naturally the case as the air has a certain inertia. In the model presented here, this inertia is simply modeled by a certain “lag” between the steady-state flow rate and the actual flow rate.

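The pressure between the vocal folds $p_f$ is ($p$ being the external pressure and $v_f = \frac{\varphi}{l(w - x_1)}$ the speed of the glottal flow):

$$\begin{aligned}
p_f &= p + \Delta p + \frac{\rho}{2}\left(\frac{l(w - x_1)}{l w_i} - 1\right) v_f^2 \\
p_f &= p + \Delta p + \frac{\rho \varphi^2}{2}\left(\frac{1}{l^2 w_i (w - x_1)} - \frac{1}{l^2 (w - x_1)^2}\right)
\end{aligned}$$

This expression goes to infinity when the vocal folds close (i.e. $x_1 = w$) and the flow is non-zero.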

In order to avoid this problem (which causes infinite forces), a correction term $c_\varphi$ depending on $x_1$ is introduced, which is multiplied with the flow rate and which tends to zero sufficiently fast in order to avoid the infinite force (see next paragraph).

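During the open phase (i.e. when $x_1 < w$), the system is governed by the following equations, which take into account the spring $k_1$, the damping element $\eta_1$ and the Bernoulli effect due to the glottal flow $x_3$:

$$\begin{aligned}
\dot{x}_1 &= x_2 \\
\dot{x}_2 &= \frac{-k_1 x_1 - \eta_1 x_2}{m} + \frac{l h c_\varphi \rho \frac{x_3^2}{2}\left(\frac{1}{l^2 w_i (w - x_1)} - \frac{1}{l^2 (w - x_1)^2}\right)}{m} \\
\dot{x}_3 &= \begin{cases} r_1(\varphi_{steady} - x_3) & \varphi_{steady} > x_3 \\ r_2(\varphi_{steady} - x_3) & \text{otherwise} \end{cases}
\end{aligned}$$

where

$$c_\varphi = \left(\frac{\frac{w - x_1}{w}}{\frac{w - x_1}{w} + c/10}\right)^3$$

and

$$\varphi_{steady} = \frac{\Delta p}{h}\,\frac{l (w - x_1)^3}{12\mu}$$

and where $\rho$ is the density of air and $\mu$ the viscosity coefficient of air.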
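The flow-related parts of the open-phase equations translate directly into code. The sketch below covers only the steady-state flow, the correction term and the “lag” equation for $x_3$; the function and argument names are chosen here for illustration, and no parameter values from the thesis are implied.

```python
def steady_flow(dp, h, l, w, x1, mu):
    """Steady-state flow rate phi_steady for a gap of width (w - x1)."""
    return dp / h * l * (w - x1) ** 3 / (12.0 * mu)


def correction_factor(w, x1, c_const):
    """Correction term c_phi; tends to zero as the folds close (x1 -> w)."""
    rel_gap = (w - x1) / w
    return (rel_gap / (rel_gap + c_const / 10.0)) ** 3


def flow_rate_derivative(x3, phi_steady, r1, r2):
    """Time derivative of the glottal flow x3 during the open phase.

    The flow relaxes towards phi_steady with rate r1 while it is below the
    steady-state value and with rate r2 otherwise.
    """
    rate = r1 if phi_steady > x3 else r2
    return rate * (phi_steady - x3)


# Made-up round numbers, only to show the calls.
phi = steady_flow(dp=12.0, h=1.0, l=1.0, w=3.0, x1=1.0, mu=1.0)
print(phi, flow_rate_derivative(3.0, phi, r1=2.0, r2=5.0))
```

Using two different rates is what allows the flow to build up and to decay at different speeds, which produces the asymmetry between opening and closing phase that the energy transfer relies on.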

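During the closed phase, the spring $k_2$ and the damping element $\eta_2$ must be considered, too, and the glottal flow is supposed to be zero:

$$\begin{aligned}
\dot{x}_1 &= x_2 \\
\dot{x}_2 &= \frac{-k_1 x_1 - k_2 (x_1 - w) - (\eta_1 + \eta_2) x_2}{m} \\
\dot{x}_3 &= -r_3 x_3
\end{aligned}$$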

3.3.1 Results

Figure 3.40: State variables of the airflow-based model during stable oscillation (three panels over $t$ in seconds: $x_1$ [m], $x_2 = dx_1/dt$ [m/s] and $x_3 = \varphi$ [m³/s]).

Simulations of this model produce very much the expected behaviour: a stable oscillation of all state variables.

The most interesting state variable in this model is the glottal flow because it is related to what the voice sounds like. It is assumed that the first derivative of the glottal flow corresponds to the pressure wave produced by it, i.e. the voice source.

There exists a widely accepted model of the glottal flow derivative, named the LF model after the people who developed it, Fant and Liljencrants. It is a purely mathematical model which expresses each cycle of the glottal flow derivative $E(t)$ as being composed of two parts: a sinusoid multiplied by an exponential as the first part

$$E(t) = E_0 e^{\alpha t} \sin \omega_g t$$

for $0 < t \le t_e$ and an exponential return phase for the rest of the cycle ($t_e < t \le t_c$):

$$E(t) = \frac{-E_0}{\varepsilon t_a}\left(e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)}\right)$$

Here, a slightly simplified version of the LF model is used, where for $t_e < t \le t_c$ the term that guarantees that $E(t_c) = 0$ is removed:

$$E(t) = \frac{-E_0}{\varepsilon t_a}\, e^{-\varepsilon (t - t_e)}$$

This corresponds better to the model where in the closed phase the glottal flow $x_3$ is governed by the equation

$$\dot{x}_3 = -r_3 x_3$$

which simply produces a decreasing exponential, meaning that the glottal flow derivative is an exponential as well and will therefore never be zero.
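The simplified waveform can be evaluated directly from the two expressions above. The following sketch transcribes them as written; the function name, the treatment of $t \le 0$ and the numerical parameter values in the example are assumptions made here, not values from the fitting procedure in appendix D.

```python
import math


def lf_simplified(t, e0, alpha, omega_g, t_e, epsilon, t_a):
    """Simplified LF glottal flow derivative for one cycle starting at t = 0."""
    if t <= 0.0:
        return 0.0  # before the cycle starts (assumption of this sketch)
    if t <= t_e:
        # First part: exponentially weighted sinusoid.
        return e0 * math.exp(alpha * t) * math.sin(omega_g * t)
    # Return phase without the term that would force E(t_c) = 0.
    return -e0 / (epsilon * t_a) * math.exp(-epsilon * (t - t_e))


# Hypothetical parameter values, chosen only to make the numbers easy to follow.
params = dict(e0=2.0, alpha=0.0, omega_g=math.pi / 2, t_e=1.0, epsilon=4.0, t_a=0.5)
cycle = [lf_simplified(0.05 * n, **params) for n in range(61)]
```

Because the return phase is a pure exponential, every sample after $t_e$ is negative and none is exactly zero, which mirrors the behaviour of $\dot{x}_3 = -r_3 x_3$ in the closed phase.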

Figure 3.41: LF model fitted to the derivative of one cycle of simulated glottal flow ($d\varphi/dt$ in m³/s² over $t$ in seconds; curves: simulated and LF model). The first rising edge is straight in the LF model, but curved in the simulated waveform. The negative peak is sharp in the LF model but rounded in the simulation.

The LF model was also used to validate the simulated glottal flow, which can be seen in figure 3.41. The method used for the fitting is described in appendix D. Globally the result fits reasonably well. In some points, however, there are discrepancies: the first rising edge in the simulated waveform is curved and not straight as in the LF model, and the negative peak of the simulated signal is by far not as sharp as in the LF model.

Both differences between simulation and LF model mean that the LF model waveform has more energy in the high frequencies than the simulated waveform. So when used as a voice source for a vocal tract model, the simulated waveform will make the voice sound duller.


3.4 Airflow-driven model for Lx signals

The goal of creating this model was to combine the Lx signal and the glottal flow into one model.

This approach is very similar to the one described in [6] in many regards: both models take into account the zipper-like opening and closing of the vocal folds, have a non-rectangular shape of the glottal area and a continuously varying vocal fold contact area (as opposed to models where the vocal fold is divided into parts that can either be in contact or not, leading to a piecewise constant contact area).

There is one major difference though: the model presented in [6] uses a two-mass model⁴ as a base, while the model presented in this report is a one-mass model with only three state variables.

The mechanical model (figure 3.42) supposes symmetric oscillation and sees a vocal fold as a rigid, massless bar of length $l$ pivoting around a point $P$ like a hinge (corresponding to the point where the folds are attached to the thyroid cartilage⁵). A mass $m$ is attached to the bar at a distance $l_m$ from $P$. The pivoting of the bar is controlled by rotational spring and damper elements $k_1$ and $\eta_1$, respectively (not drawn in figure 3.42).

The “squeezable” part of the vocal fold tissue (grey area in figure 3.42) is supposed to have a width $w_t$. Depending on the angle $\alpha$ between the bar and the symmetry axis, the length on which the vocal folds are closed can be calculated:

$$l_c = \frac{w_t}{\sin \alpha}$$

Therefore the length on which the folds are open is $l_o = l - l_c$ and the glottal area is $l_o^2 \sin \alpha$.

The distinction between open and closed phase is made on $l_c$. If $l_c < l$ the glottal area is nonzero and the vocal folds are open.

During the open phase the contact area calculation is based on the assumption that the vertical extension of the contact area is constant: $h c_{vo}$, where $h$ is the height of the vocal folds (or thickness) and $c_{vo}$ is a constant between 0 and 1 (see figure 3.43). The expression for the contact area is therefore

$$C = l_c h c_{vo}$$

⁴ Flanagan-Ishizaka
⁵ Also called “Adam's apple”

Figure 3.42: “Hinge” mechanism of the airflow-driven Lx model (labels in the drawing: $l$, $w$, $w_t$, $m$, $\alpha$, $l_o$, $l_c$, $l_m$, $P$). The vocal fold is modeled by a rigid bar with a mass and “squeezable tissue” (drawn in grey) attached to it. The glottal area (blue) is triangular.

Figure 3.43: The vocal fold contact area (grey) can be calculated from $l_c$ and $h c_{vo}$.
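The geometric quantities of the hinge model follow from the angle $\alpha$ in a few steps. The sketch below collects the expressions given so far; the function name, the returned dictionary and the numbers in the example are illustrative assumptions, and the contact area is only evaluated for the open phase, where the expression above applies.

```python
import math


def hinge_geometry(alpha, l, w_t, h, c_vo):
    """Closed length, open length, glottal area and open-phase contact area."""
    l_c = w_t / math.sin(alpha)      # length on which the folds are closed
    is_open = l_c < l                # open phase if part of the bar is free
    if is_open:
        l_o = l - l_c                # length on which the folds are open
        glottal_area = l_o ** 2 * math.sin(alpha)
        contact_area = l_c * h * c_vo
    else:
        l_o = 0.0
        glottal_area = 0.0
        contact_area = None          # closed-phase expression not covered here
    return {
        "l_c": l_c,
        "l_o": l_o,
        "open": is_open,
        "glottal_area": glottal_area,
        "contact_area": contact_area,
    }


# Made-up numbers: sin(alpha) = 0.5, so half of a bar of length 4 is closed.
geometry = hinge_geometry(alpha=math.pi / 6, l=4.0, w_t=1.0, h=3.0, c_vo=0.5)
print(geometry)
```

A smaller angle increases $l_c$; once $l_c$ reaches $l$ the folds are closed along their whole length and the open-phase formulas no longer apply.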

However, there is a different way of looking at the contact area leading to the same mathematical expression: the tissue is probably more squeezed at the point $P$ than at a distance $l_c$ from $P$ where the folds barely touch. Therefore it seems natural that the contact area is triangular. If the height of the triangle at $P$ is $2 h c_{vo}$, then the area remains the same (see figure 3.44). It also implies that the cross-section of the vocal folds' “squeezable” tissue is triangular, rather than quadrilateral.

Figure 3.44: The assumption that the vocal fold contact area is triangular also leads to the same area as the assumption that it is rectangular (figure 3.43) if the height of the triangle is chosen as $2 h c_{vo}$.

When the folds are closed, the change of contact area is due to the “squeezing” of the tissue, based on the assumption that beyond a distance $h c_{vo}$ from the upper side of the folds, they have a linear shape (as in figure 3.43 and figure 3.44) with a steepness $s$.

The system that implements this model has three state variables: $x_1$ is the contact area, $x_2 = \dot{\alpha}$ the angular velocity and $x_3$ the airflow. This means that there is no state variable that represents the position of the vocal fold. This information must be determined from the contact area.

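From $x_1$ one can easily calculate $l_c = \frac{x_1}{h c_{vo}}$, which makes it possible to determine whether the vocal folds are open or not.

If they are open (i.e. $l_c < l$), $\sin \alpha$ can be calculated as

$$\sin \alpha = \frac{h c_{vo} w_t}{x_1} \tag{3.54}$$

Replacing $x_1$ by $l_c h c_{vo}$ in (3.54) leads to

$$l_c \sin \alpha = w_t \tag{3.55}$$

Therefore, using at one point the assumption that $\alpha$ is small:

$$\begin{aligned}
l_c &= \frac{w_t}{\sin \alpha} \\
\dot{l}_c &= -\frac{w_t}{\sin^2 \alpha}\,\underbrace{\cos \alpha}_{\approx 1}\,\dot{\alpha} \\
&= -\frac{x_1^2}{h^2 c_{vo}^2 w_t}\, x_2
\end{aligned}$$

Expressing $x_1$ as a function of $l_c$, this gives the result

$$\dot{x}_1 = \dot{l}_c h c_{vo} = -\frac{x_1^2 x_2}{h c_{vo} w_t} \tag{3.56}$$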

This is one of the nonlinearities of the system. Another one is of course due to the airflow.

In the case where the vocal folds are closed, the expression for $\dot x_1$ is linear and much simpler:

$$\dot x_1 = -l^2 s\, x_2$$

The other parts of the equation system are very similar to the system described in section 3.3. Because this system is important, its Matlab code is given in appendix F.
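The two expressions for the rate of change of the contact area can be sketched as a single function. This is a minimal Python sketch rather than the thesis's Matlab code; the function name `contact_area_rate` and the numerical parameter values are illustrative assumptions, while the formulas are (3.56) for the open case and the linear expression for the closed case.

```python
def contact_area_rate(x1, x2, l, h, cvo, wt, s):
    """Time derivative of the contact area x1, given the angular velocity x2."""
    lc = x1 / (h * cvo)            # closed length recovered from the contact area
    if lc < l:                     # folds are open: equation (3.56)
        return -x1 * x1 * x2 / (h * cvo * wt)
    return -l * l * s * x2         # folds are closed: linear expression


# Made-up parameter values, chosen only to exercise both branches.
params = dict(l=1.0, h=0.5, cvo=0.4, wt=0.1, s=2.0)
open_rate = contact_area_rate(0.1, -1.0, **params)    # lc = 0.5 < l
closed_rate = contact_area_rate(0.3, -1.0, **params)  # lc = 1.5 >= l
```

The switch between the two branches is decided from $x_1$ alone, which reflects the fact that the model has no separate state variable for the fold position.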

3.4.1 Normalisation

One of the problems encountered when integrating the system as described above with Matlab was that the ODE solver would often stop with an error message about tolerances that could not be met. The problem was that the ranges of the state variables differed by several orders of magnitude, leading to numerical problems.

The solution was to normalise the system. For this, normalisation constants were needed. The approach taken here was to look for expressions depending only on the parameters of the system and giving a reasonable estimate of the expected value of the state variable, bringing the range as close to [0, 1] or [−1, 1] as possible.

For the contact area this was very simple: one could simply take $lh$ to normalise $x_1$, resulting in a normalised state variable with a range of [0, 1], in theory at least. In reality, $x_1$ can go beyond $lh$ when the vocal folds are squeezed very much. But still, a range of about [0, 3] is acceptable.

Estimating the angular velocity $x_2$ was a bit more difficult, but a reasonable normalisation constant was found with $\sqrt{k_1/m}$, which is the angular frequency of a mass-spring system with mass $m$ and spring constant $k_1$.
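As a small illustration of the first two normalisation constants, the following Python sketch scales $x_1$ by $lh$ and $x_2$ by $\sqrt{k_1/m}$. The function names and the example values are assumptions made for this sketch; only the two scaling expressions come from the text.

```python
import math


def normalisation_constants(l, h, k1, m):
    """Scales for the contact area x1 and the angular velocity x2."""
    x1_scale = l * h               # full contact area of a closed fold
    x2_scale = math.sqrt(k1 / m)   # angular frequency of a mass-spring system
    return x1_scale, x2_scale


def normalise(x1, x2, l, h, k1, m):
    """Return the state variables divided by their normalisation constants."""
    x1_scale, x2_scale = normalisation_constants(l, h, k1, m)
    return x1 / x1_scale, x2 / x2_scale


# Made-up values: a state that sits exactly at both scales maps to (1, 1).
example = normalise(0.5, 10.0, l=1.0, h=0.5, k1=400.0, m=4.0)
```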

For $x_3$, the normalisation constant was chosen as a function of the steady-state flow for the case where the vocal folds are in their equilibrium position. The constant is

$$\frac{\Delta p}{h}\,\frac{\left(\frac{w}{l_m}\,l\right)^3}{12\mu}\,l$$

3.4.2 Results

All presented results are from the normalised system with changing $c_{vo}$ as described in section 3.4.3. They cover mainly the waveforms that posed some problems with the simple 2D model (section 3.1).

Figure 3.45 (panels $x_1$, $x_2$ and $x_3$ against time [sec]): $x_1$ is similar to the waveform 1b (see section 2.1.1).

Figure 3.46 (panels $x_1$, $x_2$ and $x_3$ against time [sec]): An extreme case with a very long closed phase. That is the closest the model got to the Lx waveform 6.

It is interesting to notice that the new model automatically produces very sudden onset transients. Those in figure 3.45 and figure 3.46 come very close to the onset transient described in 2.2.1, while the Lx signal in figure 3.47 is very similar to the onset transient in 2.2.2.

Figure 3.47 (panels $x_1$, $x_2$ and $x_3$ against time [sec]): A reasonable waveform similar to waveforms 1a and 5.

Figure 3.48 (panels $x_1$, $x_2$ and $x_3$ against time [sec]): A bizarre waveform with a steep falling edge. This waveform does not seem to occur in real speech.

3.4.3 Improvements

Changing cvo

One of the improvements made to the model was to allow a change of $c_{vo}$ during the open phase. This enables the production of waveforms where the open phase starts at a different amplitude than it ends. In waveform 1b this is very prominent (see figure 2.3).


Parabolic vocal fold cross-section

Another improvement made to the model described above was to change the shape of the cross-section of the vocal fold, such that the part below $h c_{vo}$ is a parabola (see figure 3.49). This means that towards the beginning and the end of the closed phase the contact area changes more rapidly. This can help the model fit the shape of the closed phase peak in waveforms 1a, 2, and 5 (see figures 3.1, 3.3 and 3.7, where the 2D model has some problems).

Figure 3.49: Cross-sections of vocal folds, with the heights $h$ and $h c_{vo}$ marked. Left: conventional shape. Right: new parabolic shape.

3.4.4 And what about pitch breaks?

Having a new vocal fold model that produces reasonable results, and remembering the success in producing double and triple period pitch breaks with a system that is loosely based on the concept of a constriction of the airflow above the vocal folds, it is a logical step to add something to the new vocal fold model that simulates a constriction of the airflow, in the hope that the extended model is capable of producing pitch breaks.

The physical model for Lx signals can easily be adjusted to simulate a constriction in the airflow somewhere in the vocal tract. A diagram of how the model is extended above the vocal folds is shown in figure 3.50.

In terms of the equation system this means that an additional variable $x_4$ is introduced which is related to the pressure $p_c$. In fact

$$x_4 = \frac{p_c - p_{ext}}{\Delta p}$$

and

$$(1 - x_4)\,\Delta p = p_{int} - p_c \qquad (3.57)$$

Figure 3.50: Physical model of a constriction of the airflow (labels: vocal folds, constriction, $p_{int}$, $p_c$, $p_{ext}$, $(1 - x_4)\Delta p$). The airflow through the constriction depends on the pressure inside the volume between vocal folds and constriction. The pressure difference that drives the vocal fold vibration, $(1 - x_4)\Delta p$, is between the subglottal pressure $p_{int}$ and $p_c$ and is therefore variable.

As $p_{int} - p_c$ is now the pressure difference that drives the vocal folds (instead of $\Delta p$), it follows from (3.57) that all the occurrences of $\Delta p$ in the old model must be multiplied by the coefficient $(1 - x_4)$.

The dynamics of $x_4$ itself are governed by the following equation:

$$\dot x_4 = \frac{x_4 + p_{ext}/\Delta p}{V_c}\,(x_3 - \varphi_{om} x_4)$$

The coefficient $\frac{x_4 + p_{ext}/\Delta p}{V_c}$ is derived directly from the ideal gas law $pV = NkT$. It depends inversely on the volume $V_c$ between the vocal folds and the constriction. This means that for a larger volume, the effect on $x_4$ will be smaller.

The second factor $(x_3 - \varphi_{om} x_4)$ is simply the balance of what comes into that volume and what goes out, $x_3$ being the airflow through the vocal folds (therefore into the volume) and the constant $\varphi_{om}$ multiplied by $x_4$ being what goes out. The latter simply states that the outflow is proportional to $p_c - p_{ext}$.
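The constriction equations can be sketched in a few lines. This is an illustrative Python sketch, not the thesis's Matlab implementation: the function and argument names are assumptions, and the numbers in the example calls are made up. It shows the rate of change of $x_4$ and the reduced driving pressure from (3.57).

```python
def x4_rate(x3, x4, p_ext, dp, Vc, phi_om):
    """Rate of change of the normalised pressure x4 above the vocal folds."""
    gas_factor = (x4 + p_ext / dp) / Vc   # from the ideal gas law, inverse in Vc
    net_inflow = x3 - phi_om * x4         # inflow through the folds minus outflow
    return gas_factor * net_inflow


def driving_pressure(x4, dp):
    """Pressure difference p_int - p_c that drives the folds, equation (3.57)."""
    return (1.0 - x4) * dp


# Made-up values: inflow and outflow balance, so x4 does not change.
balanced = x4_rate(x3=0.2, x4=0.1, p_ext=1.0, dp=2.0, Vc=3.0, phi_om=2.0)
# Made-up values: no pressure built up yet, so all inflow raises x4.
filling = x4_rate(x3=1.0, x4=0.0, p_ext=1.0, dp=1.0, Vc=2.0, phi_om=5.0)
reduced = driving_pressure(x4=0.25, dp=8.0)
```

Doubling `Vc` in either call halves the rate, which is the inverse volume dependence described above.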

Every time air flows through the vocal folds, this adds to the pressure $p_c$ in the volume between the folds and the constriction, therefore reducing the force driving the vocal fold oscillation. So the hope was that a period doubling could happen as follows: in one cycle a large oscillation happens and much air goes into the volume $V_c$, leading to a reduced pressure difference to drive the next cycle, which comes out smaller, letting less air into $V_c$ and therefore allowing the pressure difference to build up again, after which the whole sequence starts over.

In theory this would produce a double period limit cycle. In practice it does not work.

Several attempts were made to modify the model by introducing new nonlinearities, such as making the in- or outbound airflow dependent on a power of the respective pressure difference, without success.

The way $x_4$ (and therefore the pressure in $V_c$) varied from cycle to cycle was studied in more detail using a Poincaré map. For the map a sequence $x_4[n]$ was constructed by evaluating $x_4$ at the times where $x_1$ crossed a threshold set at $c_{vo}$ with a negative derivative, i.e. at the end of each closed phase. The resulting sequence was plotted as $x_4[n+1]$ against $x_4[n]$ in order to reveal a function $f$ such that $x_4[n+1] = f(x_4[n])$, if it exists (see footnote 6). In fact, this function does exist and it is very disappointing, as can be seen in figure 3.51. It is simply a straight line, crossing the line $x_4[n+1] = x_4[n]$ at about 0.15 with a slope between 0 and 1, which means that inevitably $x_4[n]$ will tend to 0.15.

In order to have a period doubling bifurcation, the slope of $f$ at the fixed point $x = f(x)$ would have to reach −1.
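The construction of the sequence $x_4[n]$ and the role of the slope can be illustrated with a short Python sketch. The function names are assumptions, and the slopes 0.5 and −1 are illustrative values only: the thesis states just that the measured slope lies between 0 and 1 and that the fixed point is at about 0.15.

```python
def poincare_samples(x1, x4, threshold):
    """Values of x4 at the samples where x1 falls through the threshold."""
    return [x4[i] for i in range(1, len(x1))
            if x1[i - 1] > threshold >= x1[i]]


def linear_map(slope, fixed_point):
    """Straight-line return map through (fixed_point, fixed_point)."""
    return lambda x: fixed_point + slope * (x - fixed_point)


def iterate_map(f, x0, n):
    """Return [x0, f(x0), f(f(x0)), ...] with n applications of f."""
    xs = [x0]
    for _ in range(n):
        xs.append(f(xs[-1]))
    return xs


# Toy signals: x1 falls through the threshold at samples 2 and 4.
samples = poincare_samples([0, 2, 0, 2, 0], [10, 11, 12, 13, 14], threshold=1)

# Slope between 0 and 1 (assumed 0.5): the sequence settles on the fixed point.
stable = iterate_map(linear_map(0.5, 0.15), 0.6, 40)
# Slope of exactly -1: every starting value lies on a period-2 orbit.
flip = iterate_map(linear_map(-1.0, 0.15), 0.2, 4)
```

With the slope of −1 the sequence alternates between two values, which is the alternation of large and small cycles that the constriction was hoped to produce.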

Figure 3.51 (axes: $x_4(n)$ horizontal, $x_4(n+1)$ vertical, both from 0 to 0.6): Poincaré map for $x_4$ evaluated at the end of the closed phase of each cycle.

Footnote 6: If this function exists, it means that the Poincaré map is one-dimensional, which is not necessarily the case.


Chapter 4

Conclusion

This project deals with a wide range of topics. Stationary and transient vocal fold movement was analysed and modeled, which raised new hypotheses on pitch breaks, and a new physical model of the vocal folds that simulates the contact area and the glottal flow was developed.

A simple two-dimensional model simulated the Lx signal and was qualitatively correct in many ways. Furthermore, it was possible to gain a greater understanding of the system by proving the existence of a limit cycle.

Finding a system that produces pitch breaks with a non-integer frequency ratio was very interesting because this case is different from the classic period-doubling scenario. It turned out that these pitch breaks are not directly due to a bifurcation. Instead, the system is perturbed to go from one attractor to another, both of which coexist for a given parameter range.

A system where different attractors coexist may offer a greater range of possible behaviours than one with only one attractor. If the attractors can bifurcate independently of each other, many different combinations are possible. Studying such systems is a domain where further work could take place.

The model also raised the question of whether there may be a single mechanism responsible for pitch breaks to frequencies that are higher or lower than the normal vibration frequency. This came up because the system that produces the double and triple period pitch breaks contains a term producing a signal whose period is half of the single period. This question would be interesting to explore in future work.

The physical models of vocal fold vibration were designed with the aim of keeping the number of state variables as low as possible, therefore making it easier to analyse the system. The result is a third order system having the contact area and the glottal airflow as state variables. In terms of the number of state variables this system is in the same class as a one-mass model driven by the glottal flow. However, it has more features than are usually found in a one-mass model. It also simulates the zipper-like opening and closing of the folds and takes into account a deformation of the vocal fold tissue.

The physical model producing Lx could be used to simulate the voice source based on a recorded Lx signal, by matching the model parameters to the Lx signal, running the model and deriving the voice source from the simulated glottal flow.

An application of bifurcating nonlinear models could be to use them to drive real-time voice synthesis. This may contribute to a more natural sound. However, it must be considered that much of the naturalness of a sound has little to do with the vibration model itself, but with the way it is controlled. Adding some vibrato to a static waveform can already produce convincing results. Note that vibrato in the human voice can be voluntarily controlled. Therefore it has more to do with how the vocal folds are handled than with the underlying mechanism of vibration.

4.1 Acknowledgements

I would like to thank the people involved in supervising this project: Jonas Buchli for the good discussions and for encouraging me to develop the pitch break model, Prof. David Howard for sharing some of his knowledge of the human voice with me, and Prof. Auke Ijspeert for accepting me for this project.


Appendix A

Lx samples reference

sample       source  track  time (min : sec)
wave1        AS      12     0 : 2.48
wave1a       AS      12     0 : 2.55
wave1b       AS      12     0 : 2.78
wave1b2      AS      12     0 : 4.45
wave1b3      AS      14     0 : 32.25
wave2        AS      12     0 : 2.88
wave3        AS      12     0 : 2.68
wave4        AS      12     0 : 3.0
wave5        AS      12     0 : 3.76
wave6        AS      12     0 : 41.02
wave7        AS      8      1 : 44.11
rest-osc 1   AS      12     0 : 2.24
rest-osc 2   AS      12     0 : 3.42
rest-osc 3   AS      12     0 : 3.93
rest-osc 4   AS      12     0 : 6.40
rest-osc 5   AS      12     0 : 6.79
rest-osc 6   AS      12     0 : 21.75
osc-rest 1   AS      12     0 : 11.0
osc-rest 2   AS      12     0 : 36.8
osc-rest 3   AS      12     1 : 33.56
osc-rest 4   AS      10     0 : 17.8
doubleper    AS      12     0 : 15.7
tripleper    AS      12     0 : 28.8
tripleper2   AS      12     0 : 49.82
tripleper3   AS      10     0 : 39.33
period2 3 5  AS      12     0 : 45.2
sinusoidal   AS      12     0 : 14.85
littlesine   AS      12     0 : 59.45
nine         AS      12     0 : 49.8

Source abbreviations
AS    Anechoic Speech CD


Appendix B

Model parameter values

B.1 2D model

behaviour     ODE function a     b    c    d    e     f    α    α1   α2   β     t1    t2
waveform 1a   pwlinvf5n    100   3    2    10   100        5              20
waveform 2    pwlinvf5n    100   10   5    10   100        5              20
waveform 3    pwlinvf5n    100   6    50   10   160        5              40
waveform 5    pwlinvf6n    100   30   8    -5   7     -2   2              25
rest-osc 1    pwlinvf5nb   100   6    50   10   160             -1   10   80    0     0.5
rest-osc 2/3  pwlinvf5nb   100   6    50   10   160             -1   40   120   0.5   0.5
osc-rest 1    pwlinvf5nb   100   6    50   10   160             10   -5   80    0.5   1


Appendix C

Colouring of the basins of attraction in section 3.2.2

The aim of this colouring method is to capture not only the attractor to which a trajectory converges, but also the speed at which it converges.

The attractors that should be distinguished produce output signals with a double or a triple period, i.e. whose fundamental frequency is either one half or one third of a given frequency $f_0$. Therefore the frequency spectrum below $f_0$ has one peak at $f_0/2$ for the double period and two peaks at $f_0/3$ and $2f_0/3$ for the triple period (see figure C.1).

Figure C.1 (horizontal axis: FFT bins, 0 to 500): Spectra of double and triple period output signals. The green curve is from a double period and the violet curve from a triple period. The colours are chosen based on the values of the spectrum at the highlighted frequencies: FFT bin 54 for the red channel, FFT bin 81 for green and FFT bin 107 for blue.

The colour for a given spectrum is chosen according to the amplitudes at the frequencies of the possible peaks (see figure C.1). A double period trajectory, whose spectrum has only one peak below $f_0$ (at $f_0/2$), will be coloured in green. A triple period trajectory with peaks at $f_0/3$ and $2f_0/3$ will get mainly red and blue, hence the violet colour.

This scheme takes into account the whole signal and not just the state at the end of the evolution. For example, a trajectory that follows a single period orbit at first and converges only slowly to the double period limit cycle will have a smaller peak at $f_0/2$ and will therefore be coloured in a darker shade of green.
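The colouring rule can be sketched as follows in Python. The bin numbers 54, 81 and 107 are taken from figure C.1; everything else is an assumption made for this sketch, in particular the function names, the signal length of 1024 samples and the scaling of the magnitudes to the range [0, 1], none of which are specified in the thesis.

```python
import cmath
import math


def bin_magnitude(signal, k):
    """Magnitude of DFT bin k, computed directly (no FFT library needed)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
              for i, x in enumerate(signal))
    return abs(acc)


def basin_colour(signal, bins=(54, 81, 107), full_scale=None):
    """Map the spectrum at three bins to an (r, g, b) triple in [0, 1]."""
    if full_scale is None:
        full_scale = len(signal) / 2   # bin magnitude of a unit-amplitude sinusoid
    return tuple(min(1.0, bin_magnitude(signal, k) / full_scale) for k in bins)


n = 1024  # assumed analysis length
# A pure component at bin 81 (f0/2) stands in for a double period signal.
double_period = [math.sin(2 * math.pi * 81 * i / n) for i in range(n)]
# Half the amplitude mimics a trajectory that converged only slowly.
slow_double = [0.5 * x for x in double_period]

green = basin_colour(double_period)
dark_green = basin_colour(slow_double)
```

Because the magnitudes are taken over the whole signal, a weaker peak yields a darker shade, which is how the scheme encodes the speed of convergence.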


Appendix D

LF model fitting method

The LF model describes the glottal flow derivative $E(t)$ as being composed of two parts: a sinusoid multiplied by an exponential as the first part

$$E(t) = E_0 e^{\alpha t} \sin\omega_g t$$

for $0 < t \le t_e$ and an exponential return phase for the rest of the cycle ($t_e < t \le t_c$):

$$E(t) = \frac{-E_0}{\varepsilon t_a}\left(e^{-\varepsilon(t - t_e)} - e^{-\varepsilon(t_c - t_e)}\right)$$

As mentioned in section 3.3.1, a simplified version of the LF model is used, where the above equation is replaced by

$$E(t) = \frac{-E_0}{\varepsilon t_a}\,e^{-\varepsilon(t - t_e)}$$

The fitting algorithm produces all parameters, given an input signal $x(t)$ and $t_c$ and assuming $t_0 = 0$.

It starts by computing t1 as the first maximum, $t_p$ as the first negative zero-crossing and $t_e$ as the absolute minimum between $t_0$ and $t_c$.

From $t_p$ the frequency of the sinusoid, $\omega_g$, can be calculated as $\omega_g = \pi/t_p$. The sinusoid $\sin\omega_g t$ is computed and the input signal (up to $t_e$) is divided by it. The parameter $\alpha$ is computed by a first order polynomial fitting of $x(t)/\sin\omega_g t$.

In the same way, the parameter $\varepsilon$ of the return phase is calculated using polynomial fitting to approximate the logarithm of $x(t)$ for $t > t_e$ as $-\varepsilon t + c$. Finally $t_a$ is computed as

$$t_a = \frac{E_0}{\varepsilon}\,e^{\varepsilon t_e - c}$$
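The return-phase part of the fit can be sketched in Python as follows. This is not the thesis's Matlab routine: the function names are hypothetical, the logarithm is taken of $-x(t)$ on the assumption that the signal is negative during the return phase (as it is in the simplified LF expression), and the parameter values used to build the test signal are made up.

```python
import math


def fit_line(ts, ys):
    """Least-squares first order polynomial: returns (slope, intercept)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
             / sum((t - mean_t) ** 2 for t in ts))
    return slope, mean_y - slope * mean_t


def fit_return_phase(ts, xs, te, e0):
    """Estimate epsilon and t_a from the samples after te."""
    points = [(t, math.log(-x)) for t, x in zip(ts, xs) if t > te and x < 0]
    slope, c = fit_line([p[0] for p in points], [p[1] for p in points])
    eps = -slope                              # log(-x) is fitted as -eps*t + c
    ta = e0 / eps * math.exp(eps * te - c)    # closed form given in the text
    return eps, ta


# Synthetic return phase built from the simplified LF expression (made-up values).
E0, EPS, TA, TE = 2.0, 300.0, 0.001, 0.006
times = [TE + 0.0001 * i for i in range(1, 31)]
signal = [-E0 / (EPS * TA) * math.exp(-EPS * (t - TE)) for t in times]
eps_fit, ta_fit = fit_return_phase(times, signal, TE, E0)
```

On noise-free synthetic data the fit recovers the generating values; on measured signals the result depends on how much of the return phase is included.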

Figure D.1: LF model parameters (simplified). The plot marks $t_0$, $t_i$, $t_p$, $t_e$ and $t_c$ on the time axis and the amplitudes $E_i$ and $E_e$. Adapted from [8].


Appendix E

Tools

During this project several tools for handling Lx signals and ODE systems were developed. As they may be of some use to other people, I describe them here briefly.

E.1 Lx denoising

Lx signals are often very noisy. As the Lx signal itself is supposed to be relatively smooth, at least piecewise, it is possible to guess what the waveform should ideally look like without the noise.

It should be remarked that this is not usually the case with audio signals. Most often, looking at the waveform does not tell very much about the sound, and noise can be confused with higher-order harmonics, for example.

There are algorithms based on anisotropic diffusion that work very well for denoising images. As the Lx signal makes “sense” visually, applying a similar algorithm was worth a try.

A problem that arises when simply filtering an Lx signal with a lowpass filter is that the very sharp rising edge during the closing phase is “smeared out”, attenuating the higher-order harmonics of the signal.

In one dimension, anisotropic diffusion can also be seen as filtering a signal with a Gaussian filter that changes its temporal extension (and inversely its bandwidth) depending on the gradient of the signal. If the gradient is strong at some point, the signal is filtered using a narrow Gaussian, not altering the signal too much. On the other hand, if the signal is relatively flat, it is filtered with a wider filter, equalising all small perturbations in the flat region of the signal.

This is exactly what makes it possible to recover the shape of an Lx signal: where it is flat, it will become smoother, but in places with strong gradients, i.e. during the opening and closing phases, these strong gradients are preserved.

For this project a very simple implementation of this idea was realised: instead of adapting the width of the filter, the filter is simply interpolated between a (relatively wide) Gaussian curve and an impulse. It can also be interpreted as taking the original signal and a lowpass-filtered version of it and cross-fading between the two signals depending on the estimated gradient of the original signal.

Figure E.1 (horizontal axis: t [sec], 0 to 6 × 10⁻³; curves: original, lowpass filtered, denoised): Example of the denoising technique compared to lowpass filtering. The denoised signal follows the original signal very well at the beginning of the closing phase, getting the steep rising edge right from the start. But on the flat parts before and after the peak it closely follows the lowpass-filtered signal, getting rid of the noise.

This is implemented in the following Matlab function, which takes a noisy signal and three parameters as input. Parameter lpglen is the length of the filter used to create the lowpass-filtered version, while dgdist and dglen control the estimation of the gradient. Decreasing them makes the estimation more localised (better for getting sharp edges), but also more noisy (less efficient noise reduction).


E.1.1 LxNoisereduction

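function Lxnr=LxNoisereduction(Lx,lpglen,dgdist,dglen)

% default parameters (semicolons suppress echoing of the assigned values)
if nargin<4, dglen=100; end
if nargin<3, dgdist=10; end
if nargin<2, lpglen=50; end

% gradient estimate of the signal
df1=dogfilter1(dgdist,dglen);
dLx=sqrt(abs(filtcenter(df1,1,Lx)));
% lowpass-filtered version of the signal
Lx0=filtcenter(dogfilter0(lpglen),1,Lx);
% cross-fade between lowpass-filtered and original signal
Lxnr=(Lx0.*(max(abs(dLx))-abs(dLx))+Lx.*abs(dLx))/max(abs(dLx));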
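Since the helper functions dogfilter0, dogfilter1 and filtcenter are not listed, the cross-fading idea is sketched here in self-contained Python. This is an approximation under stated assumptions, not a port: the Gaussian kernel, the edge handling and the gradient estimate (a plain central difference over `grad_dist` samples instead of a derivative filter) are choices made for this sketch, and all function names are hypothetical.

```python
import math


def gaussian_kernel(length):
    """Normalised Gaussian window with an odd number of taps (assumed shape)."""
    half = length // 2
    sigma = max(length / 6.0, 1e-9)
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def filter_centered(kernel, signal):
    """Zero-delay FIR filtering; edge samples are repeated at the borders."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            acc += k * signal[min(max(i + j - half, 0), n - 1)]
        out.append(acc)
    return out


def denoise(signal, lp_len=21, grad_dist=2):
    """Cross-fade between the original and a lowpass version by gradient."""
    n = len(signal)
    smooth = filter_centered(gaussian_kernel(lp_len), signal)
    grad = [math.sqrt(abs(signal[min(i + grad_dist, n - 1)]
                          - signal[max(i - grad_dist, 0)])) for i in range(n)]
    peak = max(grad)
    if peak == 0.0:            # perfectly flat input: nothing to preserve
        return smooth
    return [(s * (peak - g) + x * g) / peak
            for s, x, g in zip(smooth, signal, grad)]


# Toy signal: a step with small alternating "noise" on top.
noisy = [(0.0 if i < 50 else 1.0) + 0.01 * (-1) ** i for i in range(100)]
cleaned = denoise(noisy)
```

In this toy case the flat part is smoothed while the step itself keeps its full height, which mirrors the behaviour described for figure E.1.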

E.2 Batch evaluation of ODE systems

function [result,times]=evalsys(solver,opt,odefun,tint,fs,ic,varargin)
% function [result,times]=evalsys(solver,opt,odefun,tint,fs,ic,varargin)
%
% Solves an ODE system with all combinations of the specified
% sets of initial conditions and parameters.
%
% INPUTS:
% solver is a function handle to one of Matlab's ODE solvers
%        (e.g. ode45)
% opt    options given to the ODE solver (see odeset.m)
% odefun specifies the ODE system (passed directly to solver)
% tint   is the time interval for which the system is solved
% fs     is the sampling frequency at which the solution is evaluated
% ic     is a set of initial conditions organised by columns
% varargin: any number of sets of parameters to be passed to odefun
%
% OUTPUTS:
% result is a structure with following fields:
%   result.odefun : handle to the function defining the ode system
%   result.solver : handle to the ode solver used
%   result.param  : cell array containing the initial conditions
%                   and the parameters to the ode function
%   result.data   : cell array containing the calculated
%                   state-space trajectories
% times  is a vector containing all points in time where the
%        result is evaluated

nrarg=length(varargin);
% initialise cell dimensions and counters (one for each dimension)
celldim=[size(ic,2),ones(1,nrarg)];
counters=cell(1,1+nrarg);
counters{1}=1;
for i=1:nrarg
    celldim(i+1)=size(varargin{i},2);
    counters{i+1}=1;
end

result=struct('solver',solver,'odefun',odefun,'param',[],'data',[]);
result.param=cell(celldim);
result.data =cell(celldim);
p  =cell(1,nrarg);    % parameters to the ode function
icp=cell(1,nrarg+1);  % initial condition plus parameters
maxeval=prod(celldim);
times=[tint(1):1/fs:tint(2)];
for i=1:maxeval
    for j=1:nrarg
        p{j}=varargin{j}(:,counters{j+1});
        icp{j+1}=p{j};
    end
    disp(['doing evaluation ',num2str(i),' out of ',num2str(maxeval)]);
    disp(['init. cond.: ',num2str(ic(:,counters{1})')]);
    disp(['parameters :']);
    disp(p);
    % solve system
    sol=feval(solver,odefun,tint,ic(:,counters{1}),opt,p{:});
    disp(['*']);
    % evaluate solution at all given times
    x=deval(sol,times);
    % assign parameters to result using subsasgn
    % (subsasgn returns the modified structure, so it must be assigned back)
    S(1).type='.';
    S(1).subs='param';
    S(2).type='{}';
    S(2).subs=counters;
    icp{1}=ic(:,counters{1});
    result=subsasgn(result,S,icp);
    % assign state-space trajectory to result using subsasgn
    S(1).subs='data';
    result=subsasgn(result,S,x);
    % calculate new counters vector
    counters{end}=counters{end}+1;
    for j=length(celldim):-1:2
        if counters{j}>celldim(j)
            counters{j}=1;
            counters{j-1}=counters{j-1}+1;
        end
    end
end
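The core idea of the batch evaluation, running the solver once for every combination of initial condition and parameter value, can be sketched compactly in Python. The names `eval_all` and `euler_decay` are hypothetical, and the toy explicit Euler integrator merely stands in for a real ODE solver; the explicit counter vector of the Matlab code is replaced here by `itertools.product`.

```python
from itertools import product


def eval_all(solve, initial_conditions, *param_sets):
    """Run solve(ic, *params) for every combination; results keyed by combo."""
    results = {}
    for combo in product(initial_conditions, *param_sets):
        ic, params = combo[0], combo[1:]
        results[combo] = solve(ic, *params)
    return results


def euler_decay(ic, rate, dt=0.01, steps=100):
    """Toy stand-in for a solver: explicit Euler on dx/dt = -rate * x."""
    x = ic[0]
    for _ in range(steps):
        x += dt * (-rate * x)
    return x


# Two initial conditions times two decay rates gives four evaluations.
batch = eval_all(euler_decay, [(1.0,), (2.0,)], [0.0, 1.0])
```

Keying the results by the full combination plays the role of the `param` cell array, which records which inputs produced which trajectory.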


E.3 Parameter interpolation for ODE systems

E.3.1 linearparam.m

function dx=linearparam(t,x,tt,odefun,varargin)
% function dx=linearparam(t,x,tt,odefun,varargin)
%
% allows to modify parameters during the
% evaluation of an ODE system.
%
% instead of the "real" ODE function, linearparam
% is evaluated with the function name as its
% second additional parameter (after tt)

nrarg=length(varargin);
p=cell(1,nrarg);
for i=1:nrarg
    if length(varargin{i})==length(tt)
        % parameter given at the times tt: interpolate linearly
        p{i}=interp1(tt,varargin{i},t);
    else
        % constant parameter: pass through unchanged
        p{i}=varargin{i};
    end
end

dx=feval(odefun,t,x,p{:});

E.3.2 splineparam.m

function dx=splineparam(t,x,tt,odefun,varargin)
% function dx=splineparam(t,x,tt,odefun,varargin)
%
% allows to modify parameters during the
% evaluation of an ODE system.
%
% instead of the "real" ODE function, splineparam
% is evaluated with the function name as its
% second additional parameter (after tt)

nrarg=length(varargin);
p=cell(1,nrarg);
for i=1:nrarg
    if length(varargin{i})==length(tt)
        % parameter given at the times tt: interpolate with a cubic spline
        p{i}=spline(tt,varargin{i},t);
    else
        % constant parameter: pass through unchanged
        p{i}=varargin{i};
    end
end

dx=feval(odefun,t,x,p{:});
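The wrapping pattern used by linearparam and splineparam can be sketched in Python as a function that returns a new right-hand side. The names below are hypothetical, and one behaviour differs by design and should be noted as an assumption: outside the range of `tt` this sketch holds the first or last value, whereas Matlab's `interp1` returns NaN there by default.

```python
def interp_linear(tt, values, t):
    """Piecewise-linear interpolation; end values are held outside tt."""
    if t <= tt[0]:
        return values[0]
    if t >= tt[-1]:
        return values[-1]
    for i in range(1, len(tt)):
        if t <= tt[i]:
            frac = (t - tt[i - 1]) / (tt[i] - tt[i - 1])
            return values[i - 1] + frac * (values[i] - values[i - 1])


def with_linear_params(odefun, tt):
    """Wrap odefun so that parameters sampled at tt become time-varying."""
    def wrapped(t, x, *params):
        current = []
        for p in params:
            # A sequence with one value per time in tt is interpolated;
            # anything else is treated as a constant parameter.
            if isinstance(p, (list, tuple)) and len(p) == len(tt):
                current.append(interp_linear(tt, p, t))
            else:
                current.append(p)
        return odefun(t, x, *current)
    return wrapped


# Toy right-hand side dx/dt = a*x + b, with a ramping from 0 to 4 over [0, 1].
rhs = with_linear_params(lambda t, x, a, b: a * x + b, [0.0, 1.0])
mid_value = rhs(0.5, 2.0, [0.0, 4.0], 1.0)   # a = 2 at t = 0.5
```

As in the Matlab functions, a parameter is treated as time-varying only when its length matches that of `tt`, so a constant vector parameter that happens to have the same length would be interpolated as well.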


Appendix F

Matlab code for airflow based Lx model

F.1 flowlx1.m

function dx=flowlx1(t,x,m,l,h,w,wt,wi,cm,s,k1,d1,kc,dc,dp,cvo,r1,r3)

% x(1) = contact area (~Lx signal)
% x(2) = angular velocity
% x(3) = airflow

% constants
rho = 1.229;    % density of air
mu = 1.73e-5;   % viscosity of air

lm=l*cm;

lc=x(1)/(h*cvo);   % length of horizontally closed part
lo=l-lc;           % length of horizontally open part
lair=lc+lo/2;      % position of force due to air pressure

dx=[0;0;0];
if lc<l % folds are open
    sina=h*cvo*wt/x(1); % sin(a) where a is the opening angle
    phimax=dp/h*(sina*l)^3/(12*mu)*lo;
    phicorr=((lo/l)/(lo/l+0.05))^2;
    dx(1)=-x(1)*x(1)/(h*cvo*wt)*x(2);
    dx(2)=-1/(m*lm)*(k1 * (lm*sina - w/2) + d1*lm*sina*x(2)) + 1/(m*lc)*(kc*x(1));
    dx(2)=dx(2) - l*h/(m*lm) * (phicorr*rho*x(3)*x(3)/2 * (1/(sina*lo*lo)-1/(wi*l)));
    dx(3)=r1 * (phimax-x(3));
else % folds are closed
    sina=(wt-h/s*(x(1)/(l*h)-cvo))/l;
    dx(1)=-l*l*s * x(2);
    dx(2)=-1/(m*lm)*(k1 * (lm*sina - w/2) + d1*lm*sina*x(2));
    dx(2)=dx(2) + 1/(m*l)*(kc*x(1) - dc*x(1)*x(2));
    dx(3)=-r3 * x(3);
end


Bibliography

[1] Evelyn R. M. Abberton, David M. Howard, and Adrian J. Fourcin. Laryngographic assessment of normal voice: a tutorial. Clinical Linguistics & Phonetics, 3(3):281–296, 1989.

[2] J. Awrejcewicz. Bifurcation portrait of the human vocal cord oscillations. Journal of Sound and Vibration, 136(1):151–156, 1990.

[3] R. J. Baken. The aged voice: a new hypothesis. Voice, 3(2):57–73, 1994.

[4] Edmond A. Brun and André Martinot-Lagarde. Mécanique des Fluides, volume 1. Dunod, Paris, 1959.

[5] D. G. Childers. Speech Processing and Synthesis Toolboxes, appendix A10.4 Review of vocal fold vibratory models, pages 379–383. Wiley Text Books, 1999.

[6] D. G. Childers, D. M. Hicks, G. P. Moore, and Y. A. Alsaka. A model for vocal fold vibratory motion, contact area, and the electroglottogram. J. Acoust. Soc. Am., pages 1309–1320, 1986.

[7] M. di Bernardo, P. Kowalczyk, and A. Nordmark. Classification of sliding bifurcations in dry-friction oscillators.

[8] Gunnar Fant. Glottal flow: models and interaction. Journal of Phonetics, pages 393–399, 1986.

[9] Gunnar Fant. The voice source in connected speech. Speech Communication, pages 125–139, 1997.

[10] J. Horacek and J. G. Svec. Instability boundaries of a vocal fold modelled as a flexibly supported rigid body vibrating in a channel conveying fluid. In Proceedings of the International Mechanical Engineering Congress & Exposition (IMECE), 2002.

[11] David M. Howard. Variation of electrolaryngographically derived closed quotient for trained and untrained adult female singers. Journal of Voice, pages 163–172, 1995.

[12] David M. Howard. The Voice Clinic Handbook, chapter 13 (Instrumental voice measurement: uses and limitations), pages 323–382. Taylor & Francis, 1998.

[13] David M. Howard. Killers in the Brain - Essays on science and technology from the Royal Institution, chapter The human singing voice, pages 113–134. Oxford University Press, 1999.

[14] David M. Howard, Geoffrey A. Lindsey, and Bridget Allen. Toward the quantification of vocal efficiency. Journal of Voice, pages 205–212, 1990.

[15] Gina Jiang. Chaos in vocal cord vibration - a look at the evidences and promises it provides, 2003.

[16] N. J. C. Lous, G. C. J. Hofmans, R. N. J. Veldhuis, and A. Hirschberg. A symmetrical two-mass vocal-fold model coupled to vocal tract and trachea, with application to prosthesis design. Acta Acustica, pages 1135–1150, 1998.

[17] W. Mende, H. Herzel, and K. Wermke. Bifurcations and chaos in newborn infant cries. Physics Letters A, 145:418–424, 1990.

[18] C. Pozrikidis. Introduction to Theoretical and Computational Fluid Dynamics, chapter 5.1 Steady Unidirectional Flows, pages 180–191. Oxford University Press, 1997.


[19] Slobodan N. Simic, Karl Henrik Johansson, John Lygeros, and Shankar Sastry. Hybrid limit cycles and hybrid Poincaré-Bendixson. In Proceedings of the 15th IFAC World Congress on Automatic Control, Barcelona, Spain, 2002.

[20] Steven H. Strogatz. Nonlinear dynamics and chaos: with applications to physics, biology, chemistry, and engineering. Westview Press, Cambridge, MA, 1994.

[21] Johan Sundberg. The acoustics of the singing voice. Scientific American, pages 82–91, March 1977.
