A non-linear signal processing model of the auditory system

Master thesis February 2005

Ole Hau (s020269)

Centre for applied hearing research (CAHR) Ørsted Institute

Danish Technical University

Abstract

The aim of this thesis was to compare the two different concepts underlying forward masking, as described by the temporal-window model and the adaptation-loop model. In the first part of the thesis the models were tested in their original form, as described by Oxenham (2001) and Dau et al (1996). The models were then modified such that they both use the same realistic preprocessing models of the cochlea and the same decision mechanism. This was done in order to directly compare the two modelling approaches in conditions of forward masking. In the final part the modified models were tested to verify that they still predicted forward masking, and the mechanisms underlying the forward-masking predictions were compared.

The tests conducted on the models examined their ability to predict forward masking at 1 kHz and at 4 kHz using a short signal following a 200-ms masker. Conditions of intensity discrimination for a 1-kHz pure tone and for broadband white noise were also tested. The results showed that both models were able to predict forward masking at 4 kHz, but not at 1 kHz, both before and after modification. Both modified models were able to predict the constant intensity-discrimination threshold in noise, described by Weber’s law, and the decrease in intensity-discrimination threshold with increasing level in pure tones, described by the near miss to Weber’s law.

In the final analysis, a comparison of the mechanisms underlying forward masking shows that the two models use a similar mechanism to account for the forward-masking thresholds found in humans. The conclusion is that the mechanism assumed in the temporal-window model to account for forward masking can effectively be viewed as an adaptation mechanism.

Preface

This project was submitted in candidacy for the academic degree of “Master of Science and Engineering”. The work was carried out from August 2004 to February 2005 at the Centre for Applied Hearing Research (CAHR), situated at the Ørsted Institute at the Danish Technical University (DTU), with Torsten Dau and Stephan Ewert as supervisors.

First of all I would like to thank Stephan for the long discussions, for his guidance on all the questions that I stumbled on during the project, and for the big effort he put into finding discrepancies between the current implementation and the original implementation of the adaptation-loop model. I would also like to thank Torsten for his always enthusiastic mood when discussing the models each week during the project, which definitely helped in keeping my spirits up. Thanks also go to Thomas Ulrich Christiansen and Oliver Fobel for answering questions when Torsten and Stephan were not available. An extra thanks goes to Torsten, Stephan, Thomas, my sister and her husband for commenting on and proofreading the report in the final phase, which was extremely helpful. I should also not forget to thank my girlfriend Signe Kjær Brandt for her patience during the entire project, and especially during the last two months, when I was undoubtedly a little distant.

_______________ Ole Hau

(s020269)

Contents

1 Introduction
1.1 Physiology of the auditory system
1.2 Psychophysics of the auditory system
1.2.1 Masking
1.2.2 Intensity discrimination

2 Experiments
2.1 Apparatus
2.2 Forward masking using 10-ms 1000-Hz signal
2.2.1 Procedure and Stimulus
2.2.2 Experimental results
2.3 Forward masking using 12-ms 4000-Hz signal
2.3.1 Procedure and Stimulus
2.3.2 Experimental results
2.4 Intensity discrimination with 1000-Hz pure tone
2.4.1 Procedure and Stimulus
2.4.2 Experimental results
2.5 Intensity discrimination with white noise
2.5.1 Procedure and Stimulus
2.5.2 Experimental results

3 Models
3.1 The adaptation-loop model
3.1.1 Verification of the model implementation
3.1.2 Tests with the adaptation-loop model
3.1.3 Discussion of the adaptation-loop model
3.2 The temporal-window model
3.2.1 Verification of the model implementation
3.2.2 Tests with the temporal-window model
3.2.3 Discussion of the temporal-window model

4 Model modification
4.1 Auditory filter
4.2 Hair-cell envelope extraction
4.3 Decision mechanism

5 Modified models
5.1 The adaptation-loop model with DRNL
5.1.1 Tests with the adaptation-loop model with DRNL
5.2 The adaptation-loop model with DRNL and squaring
5.2.1 Tests with the adaptation-loop model with DRNL and squaring
5.3 The modified temporal-window model
5.3.1 Tests with the modified temporal-window model
5.4 Discussion of the modified models
5.5 Model comparison

6 Summary and conclusion
6.1 Summary
6.2 Conclusion
6.3 Future work ideas

7 Literature

Appendix A
1 Frozen-noise masker experiment

Appendix B
1 Understanding the models
1.1 Understanding the adaptation-loop model
1.2 Understanding the temporal-window model

Appendix C
1 The DRNL filter
1.1 DRNL simplification
1.2 DRNL and two-tone suppression

Appendix D
1 Temporal model modifications
1.1 The temporal-window model with DRNL
1.2 The temporal-window model with optimal detector
1.3 The temporal-window model with half-wave rectifier

Reader Notes

The figures in this thesis use short abbreviations in the legend text. Example: “Model: GT HWLP AD LP OptDet” means that the data are for a model composed of the listed modules. Each module has a short abbreviation:

GT        Gammatone filter
GTFB      Gammatone filter-bank (500 to 4000 Hz)
DRNL      Dual Resonance Non-Linear filter
DRNLFB    DRNL filter-bank (500 to 4000 Hz)
TW        Temporal window
AD        Adaptation loops
HWLP      Half-wave rectification and low-pass filter
FW        Full-wave rectification
HW        Half-wave rectification
NL        Non-linearity
SQ        Squaring (x²)
TWdet     Temporal window detector
OptDet    Optimal detector
TWOptDet  Temporal window optimal detector (a revised version of the optimal detector)

It should also be noted that in all model predictions and measurements in this thesis, the term dB SPL does not refer to the actual dB SPL level of the signal but to the peak-equivalent dB SPL level.

1 Introduction

The auditory system is one of our most important sensory systems. We use it for many different tasks, such as communication, locating objects and as a warning device. For this reason people have investigated the auditory system for many years, trying to describe it and explain how it works. In order to study and analyse the auditory system, it is useful to have a realistic model of the system. A model makes it easier to understand how the system works and what happens if some part of the system is damaged (hearing impairment).

Some parts of the auditory system can be studied directly through physical measurements in either human or animal subjects. Physical studies have contributed greatly to the understanding of the outer, middle and inner ear, and very detailed models have been developed on the basis of the physical data. How sounds are perceived, however, cannot currently be studied directly through physical measurements. Because of the complexity of the brain we cannot measure directly how a sound is perceived. Data from individual neurons have been measured in animal studies, but the complete picture will not be understood until all the connections between the neurons are uncovered.

Psychophysics is able to shed some light on how the brain works. Psychophysics is the scientific study of the relation between a physical stimulus (the sound) and the psychological perception (the sensation in the brain). Psychophysical experiments are built around presenting a listener with a sound and then asking the listener whether or how he or she perceived it. Psychophysical experiments treat the auditory system as a black box: stimuli are applied as input and a response is obtained as output. When attempting to make a realistic model of the human auditory system, assumptions about the signal processing of the brain are typically included to form a model of the auditory system.
The model can then be tested against psychophysical experiments to see whether it predicts the same data as obtained from human subjects. If the model predicts poorly, the assumptions made may be wrong and should be revised.

This thesis is an evaluation of two different assumptions about how the human brain processes signals. The first is “the adaptation-loop model”, also known as “the perception model” (PEMO) (Dau et al, 1996). The second is “the temporal-window model” (Oxenham 2001). The two models differ in the assumptions made about the mechanisms underlying forward masking. Forward masking is an effect observed in psychophysical experiments where the threshold¹ of a signal is measured when the signal follows a masker.

The following sections briefly describe some known aspects of the human auditory system and some basic psychophysical signal-detection theory, in particular the alternative forced choice (AFC) test. Then the two key types of psychophysical experiments used in the comparison of the models are discussed: masking (forward, simultaneous and backward masking) and intensity discrimination. Some sections may be skipped by readers who are already familiar with the concepts. They form a basic vocabulary of facts and terms that is necessary when discussing the adaptation-loop model and the temporal-window model, which are described later.

¹ The lowest detectable level of a sound.

1.1 Physiology of the auditory system

When sound reaches the ear it first passes through the ear canal (external auditory meatus) and then passes the eardrum (tympanic membrane) into the tympanic cavity. In the tympanic cavity the sound is transferred via the middle-ear bones (malleus, incus and stapes) to the cochlea. In the cochlea a conversion to nerve impulses takes place, and the nerve impulses are transmitted to the brain. After the transfer to the nervous system, further processing takes place that strongly influences the perception of sounds.

Figure 1-1 Cross section of the human head, showing the auditory path from the outer ear to the cochlear nerve. (Taken from Dau et al, 2004.)

The cochlea is of special interest since this is the structure that splits the sound into frequency channels, which allows the brain to distinguish between different frequencies in the sound. The cochlea contains a membrane (the basilar membrane) that is set into vibration by pressure waves.

Figure 1-2 Cross section of the cochlea, showing its structure with the basilar membrane and the inner and outer hair cells (labelled in the figure). (Taken from Dau et al, 2004.)

The basilar membrane is constructed in such a way that different places along it are tuned to specific frequencies. This means that a sound of a specific frequency will vibrate the basilar membrane more at one specific place than at other places. Hair cells placed along the basilar membrane pick up the vibration and convert it into nerve impulses that are transmitted to the brain through the cochlear nerve. There are two types of hair cells in the cochlea, the outer and the inner hair cells. The inner hair cells are said to be those that primarily convert vibration to nerve impulses sent to the brain. The outer hair cells are believed to be primarily linked to the amplification of low sound levels, also referred to as the cochlear amplifier. Figure 1-3 shows the relationship between the basilar-membrane displacement and the sound pressure level presented to the ear. Since the cochlear amplifier only amplifies low-level signals, the response of the basilar membrane becomes compressive at mid levels (about 30 to 80 dB SPL). This is referred to as cochlear compression.

Figure 1-3 Schematic plot of the relationship between the sound pressure presented to the ear and the basilar-membrane displacement. The stippled line shows the basilar-membrane displacement without outer hair-cell amplification. The continuous line shows how low sound levels are amplified by the outer hair cells.

Cochlear amplification means that absolute thresholds (the lowest detectable sound levels) are lowered, and because the basilar membrane responds compressively, the dynamic range (from absolute threshold to the threshold of pain) is also increased.

Figure 1-4 Schematic drawing of the relationship between the frequency of a low-level tone and the basilar-membrane displacement at a single point (frequency response). The stippled line shows the basilar-membrane displacement without outer hair-cell amplification. The continuous line shows how low sound levels are amplified and the frequency response is sharpened by the outer hair cells.

The effect of the outer hair cells is not limited to signal amplification. They also improve the frequency selectivity of each point on the basilar membrane for low sound levels. Figure 1-4 shows a schematic drawing of the basilar-membrane displacement at a single point of the membrane as a function of frequency, for a low-level sound. Hearing impairment is often related to outer hair-cell loss. The loss of compression and of frequency selectivity at low sound levels is of great importance when designing models of hearing impairment.

The hair cells pick up the pressure waves on the basilar membrane and convert them to nerve signals. The intracellular voltage¹ of the inner hair cells has been measured as a function of the pressure. Figure 1-5 shows how the intracellular voltage of the inner hair cells changes as a function of the instantaneous pressure. For low frequencies the frequency of the signal is represented in the intracellular voltage; this is called fine structure. It can be seen in Figure 1-6, where the intracellular voltage is shown for pure-tone bursts of different frequencies. The fine structure is seen to disappear for frequencies above 1000 Hz.

¹ The voltage between the inside and the outside of the cell.

Figure 1-5 Pressure to inner hair-cell receptor potential (taken from Pickles, 2003). The instantaneous pressure is converted to the instantaneous intracellular voltage of the inner hair cell. (Measured with a stimulus frequency of 600 Hz in a guinea pig.)

Figure 1-6 Inner hair-cell receptor potential plotted for signals of different frequencies. Signals from 300 to 1000 Hz show fine structure in the form of fluctuations corresponding to the signal frequency. As the frequency of the signal increases beyond 1000 Hz, the fine structure disappears and the potential begins to resemble the envelope of the signal.

The intracellular voltage triggers nerve impulses (spikes) that are transferred to the brain via the auditory nerve. Physical measurements have also been made at this point of the auditory system in animal subjects. Figure 1-7 shows the spike rate of a single auditory-nerve fibre in response to a 50-ms tone burst. The figure shows that the spike rate has a strong onset at the start of the tone burst and then gradually decreases until it reaches a steady level. After the end of the tone burst the spike rate falls below the spontaneous rate and then gradually increases to the steady-state spontaneous rate again. This phenomenon is referred to as adaptation.

Figure 1-7 Single auditory-nerve fibre spike rate in response to a tone burst of 50 ms. Taken from Pickles (1988).

Beyond the auditory nerve further processes affect the signal, but the added complexity of the connections in the nervous system makes it harder to trace the signal from here. The theory described above can serve as the foundation for models of the auditory system.

1.2 Psychophysics of the auditory system

The psychophysical experiments and simulations in this thesis are based on the M-alternative forced choice (M-AFC) method, so this method is explained here. The AFC procedure is a good way of measuring the threshold of a signal. The signal may be measured in combination with other sounds; these sounds are referred to as the masker. In the M-AFC procedure a listener is presented with a number of trials, and each trial is composed of M intervals containing sound stimuli (e.g. 2-AFC if two intervals are presented per trial). All intervals contain the masker and one interval also contains the signal. In each trial the listener is forced to identify which interval contains the signal; the response is either correct or incorrect.

The AFC procedure usually starts with a trial where the signal is well above threshold. After each trial the result is evaluated and the procedure decides whether the signal variable should be increased or decreased, using fixed steps. The decision can be based on previous trial results. There are a number of such tracking rules, the most common perhaps being the transformed up-down procedure (U-up D-down). In the transformed up-down procedure, the listener must respond correctly D consecutive times for the signal variable to decrease, and incorrectly U times for the signal variable to increase (e.g. 1-up 2-down: if the listener answers correctly in two consecutive trials the signal in the next trial is smaller; if the listener answers incorrectly once the signal is made larger).
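The tracking rule described above can be illustrated with a short simulation. The following is a minimal sketch, not the AFC framework used in the thesis: the function name `run_staircase`, its parameters and the simulated listener (who always answers correctly at or above a fixed threshold and guesses among the M intervals below it) are assumptions introduced for illustration, and "U times" is interpreted here as U consecutive incorrect responses.

```python
import random


def run_staircase(true_threshold_db, start_db=60.0, step_db=2.0,
                  n_up=1, n_down=2, m_intervals=2, n_reversals=12, seed=0):
    """Simulate a transformed up-down track with a hypothetical listener."""
    rng = random.Random(seed)
    level = start_db
    correct_run = 0      # consecutive correct responses since the last step
    incorrect_run = 0    # consecutive incorrect responses since the last step
    last_direction = 0   # -1 = level was lowered, +1 = level was raised
    reversals = []

    while len(reversals) < n_reversals:
        # Hypothetical listener: always correct at or above threshold,
        # otherwise a pure guess among the M intervals.
        if level >= true_threshold_db:
            correct = True
        else:
            correct = rng.random() < 1.0 / m_intervals

        if correct:
            correct_run += 1
            incorrect_run = 0
            if correct_run < n_down:
                continue          # not enough correct responses yet
            direction = -1        # D correct in a row: make the signal smaller
        else:
            incorrect_run += 1
            correct_run = 0
            if incorrect_run < n_up:
                continue
            direction = +1        # U incorrect: make the signal larger

        correct_run = 0
        incorrect_run = 0
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)   # turnaround point of the track
        last_direction = direction
        level += direction * step_db

    # Threshold estimate: mean of the levels at the turnaround points.
    return sum(reversals) / len(reversals)


estimate = run_staircase(true_threshold_db=30.0)
print(f"estimated threshold: {estimate:.1f} dB")
```

With this idealised listener the track descends from the starting level and then oscillates just around the assumed threshold, which is the behaviour the procedure relies on.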

Figure 1-8 AFC-trial iteration using a 1-up 2-down procedure (the procedure displayed uses a 1-up 1-down rule until the first turnaround). C means a correct response, I means an incorrect response. Circled responses indicate the turnaround points of the procedure (where the signal level changes between increasing and decreasing). (Taken from Hartmann, 2000.)

Figure 1-8 shows how the AFC procedure gradually settles around a signal level at which the signal alternates between being just above and just below the threshold. The threshold is calculated as the mean of a certain number of turnarounds. Specifying the number of intervals in each trial and the tracking procedure is important when performing AFC tests, since they influence the threshold. Using different combinations of M-AFC trials and transformed up-down procedures produces different thresholds in the AFC test. Human listeners show an s-shaped curve when the probability of being correct in an AFC test is plotted; this function is called the psychometric function. Schematic drawings of the psychometric function in a 2-AFC and a 3-AFC test are shown in Figure 1-9.

Figure 1-9 Psychometric functions in a 2-AFC and a 3-AFC test.

The smallest probability of being correct that a subject can have in a 2-AFC test is 50%, which corresponds to guessing; in a 3-AFC test the smallest is 33.3%, also equivalent to guessing. The maximum is 100% correct responses for both. When a tracking procedure like 1-up 2-down is applied in the AFC test, the reversals will settle around the point on the psychometric function where the probability of being correct is 70.7% (50% for 1-up 1-down). Figure 1-9 shows how the use of 1-up 2-down tracking results in two different thresholds when used in a 2-AFC and in a 3-AFC procedure. The AFC procedure is used in this thesis both in experiments with human subjects and in the simulations done with the models. The use of the same experimental AFC procedures allows a direct comparison between experimental results and simulation data.

The psychophysical experiments that are of special interest in this thesis are masking and intensity discrimination. The adaptation-loop model bases some of its assumptions about the processing of the brain on intensity discrimination, and both the temporal-window model and the adaptation-loop model claim to account for the mechanisms behind forward masking. For this reason the models are evaluated against experimental results in these experiments. Basic concepts of masking and intensity discrimination are explained here, in order to be able to compare new experiments to old data.
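The 70.7% point follows from requiring that a downward step (two consecutive correct responses, probability p²) is as likely as an upward step, i.e. p² = 0.5. The sketch below computes this point and shows why the same tracking rule gives different thresholds in 2-AFC and 3-AFC. The logistic psychometric function with a guess rate of 1/M, and the names `tracked_probability` and `level_at`, are assumptions made for illustration only; they are not taken from the thesis.

```python
import math


def tracked_probability(n_down):
    """Probability correct at which a 1-up n_down-down track is balanced."""
    # A downward step needs n_down consecutive correct responses, so the
    # track is in equilibrium where p ** n_down == 0.5.
    return 0.5 ** (1.0 / n_down)


def level_at(p_target, m_intervals, midpoint_db=0.0, slope_db=1.0):
    """Invert a hypothetical logistic psychometric function.

    Assumed form: P(correct) = g + (1 - g) / (1 + exp(-(L - midpoint) / slope)),
    with guess rate g = 1 / M.
    """
    guess = 1.0 / m_intervals
    detect = (p_target - guess) / (1.0 - guess)   # guessing-corrected proportion
    return midpoint_db + slope_db * math.log(detect / (1.0 - detect))


p = tracked_probability(2)
print(f"1-up 2-down tracks P(correct) = {p:.3f}")
print(f"2-AFC level: {level_at(p, 2):+.2f} dB re midpoint")
print(f"3-AFC level: {level_at(p, 3):+.2f} dB re midpoint")
```

Under these assumptions the 3-AFC track settles at a higher level than the 2-AFC track, because the lower guess rate means the listener has to detect the signal more often to reach the same proportion correct.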

1.2.1 Masking

Masking experiments are in general concerned with determining the threshold of a signal in the presence of a masker. The signal could be any type of signal, but only sinusoidal signal bursts are discussed here. The masker could also be any type of signal, but for the purposes of this thesis white broadband noise is used.

Masking experiments can roughly be separated into three categories: forward masking, simultaneous masking and backward masking. In forward masking the signal comes after the masker and does not overlap with it. When the signal and masker overlap, the experiment is called simultaneous masking. In backward masking the signal is presented prior to the masker, without overlap. The different types of masking experiments are often discussed separately from each other. This is done even though there is a clear transition area between the experiment types, where the signal is neither completely inside nor completely outside the masker. In the following there is no clear distinction between non-simultaneous-masking conditions and simultaneous masking. The terms backward and forward masking are used for all conditions where the signal is at least partly outside the masker.

Forward masking

The forward-masking phenomenon can be observed in an AFC experiment where two stimuli, one with a signal in the presence of a masker and one containing only a masker, are presented to a listener (see Figure 1-10).

Figure 1-10 Schematic drawing of the two stimuli presented to a listener in a 2-AFC forward-masking test.

Figure 1-11 Schematic drawing of the way the threshold of the signal decreases as a function of time after the end of the masker. The stippled line shows the threshold.

By using AFC tests the threshold of a signal can be found at different positions relative to the masker. Figure 1-11 shows a schematic sketch of how the threshold decays as a function of the time gap between the masker and the signal. The smaller the temporal gap, the higher the threshold; equally, the longer the temporal gap, the closer the threshold gets to the absolute threshold of the signal (the lowest detectable level of the signal alone). Forward-masking thresholds have been described in a large number of studies. Forward masking is usually said to be in effect up to 200 ms after the masker. The effect varies as a function of masker duration, masker intensity and signal duration. Measurement data from Oxenham (2001) and Dau et al (1996) are shown in Figure 1-12 and Figure 1-13.

Figure 1-12 Mean signal thresholds in forward masking as a function of masker-signal offset. Masker: 200-ms, 78-dB SPL, 0-7000-Hz random noise. Signal: 4000-Hz sinusoidal tone, ramped at the beginning and end with a 2-ms ramp. The parameter is signal duration. Onset-offset durations were, from top to bottom on the graph, 4, 6, 9, 12, 22, 52 and 102 ms. (From Oxenham 2001.)

Figure 1-13 Mean signal thresholds in forward masking (Masker: 200-ms, 20-5000-Hz frozen noise. Signal: 1000-Hz, 10-ms, Hanning-windowed sinusoid). Masker level was 77 dB SPL (○), 57 dB SPL (◊) and 37 dB SPL (∆). (From Dau et al 1996.)

Simultaneous masking

The masked thresholds in simultaneous masking are usually considered to be a function of the signal-to-noise ratio at the output of the auditory filter placed at the signal frequency. The signal threshold can thus be described by:

$$P_S = k \cdot \int_0^{\infty} W(f)\, N(f)\, df$$

where W(f) is the auditory-filter frequency response, N(f) is the noise power spectrum, k is a constant and P_S is the power of the signal. This model of the signal thresholds is referred to as the power-spectrum model. The assumption of this model is that the threshold remains constant, independent of where the signal is placed temporally in the masker.

Some discrepancies that cannot be explained by the power-spectrum model have been seen. Zwicker (1965) established that signal thresholds were higher if the signal was placed near the onset of the masker but quickly decreased again when the signal was moved away from the masker onset. This temporal phenomenon is referred to as “the overshoot”. The effect also seemed larger for high-frequency signals than for low-frequency signals. Bacon (1990) established that the overshoot seemed to be largest for mid-level maskers, and that the effect disappeared for low-level and high-level maskers.

Backward masking

Backward-masking experiments, where the signal is placed before the masker, have shown that the threshold of the signal increases before the signal actually enters the masker. Duifhuis (1973) showed that backward-masking thresholds were typically raised when the signal was placed up to 20 ms in front of the masker. Dau et al (1996) showed a similar relationship for two subjects, while the third subject showed a much larger effect (Figure 1-14). Oxenham and Moore (1994) also showed the backward-masking effect to asymptote toward absolute threshold after about 20 ms.
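Returning briefly to the power-spectrum model of simultaneous masking described earlier in this section, the integral can be evaluated numerically. The sketch below is only an illustration: the rectangular 130-Hz-wide filter centred at 1000 Hz, the flat noise spectrum, the value k = 1 and the function name `signal_power_at_threshold` are all assumptions and do not come from the thesis.

```python
import math


def signal_power_at_threshold(freqs_hz, filter_weight, noise_psd, k=1.0):
    """Evaluate P_S = k * integral of W(f) * N(f) df with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(freqs_hz)):
        left = filter_weight[i - 1] * noise_psd[i - 1]
        right = filter_weight[i] * noise_psd[i]
        total += 0.5 * (left + right) * (freqs_hz[i] - freqs_hz[i - 1])
    return k * total


# Frequency axis from 0 to 8000 Hz in 1-Hz steps.
freqs = [float(f) for f in range(0, 8001)]

# Hypothetical auditory filter: rectangular, 130 Hz wide, centred at 1000 Hz.
rect_filter = [1.0 if abs(f - 1000.0) <= 65.0 else 0.0 for f in freqs]

# Hypothetical flat (white) noise power spectrum, arbitrary units per Hz.
flat_noise = [1e-6] * len(freqs)
louder_noise = [2e-6] * len(freqs)

p_s = signal_power_at_threshold(freqs, rect_filter, flat_noise)
p_s_louder = signal_power_at_threshold(freqs, rect_filter, louder_noise)
shift_db = 10.0 * math.log10(p_s_louder / p_s)

print(f"signal power at threshold: {p_s:.3e}")
print(f"threshold shift for doubled noise power: {shift_db:.2f} dB")
```

Because the model is linear in N(f), doubling the noise power density raises the predicted threshold by about 3 dB regardless of the filter shape chosen.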

Figure 1-14 Backward masking using a 10-ms Hanning-windowed 1000-Hz sinusoidal signal. The masker was a 200-ms frozen noise, band limited to 20-5000 Hz. Subject AE (□), subject TD (○), subject AS (◊), adaptation-loop model predictions (●).

1.2.2 Intensity discrimination

Intensity discrimination is an experiment that shows how small a difference in signal level a listener is able to detect. A listener is presented with two sounds of different level. The level difference is adjusted so that the listener is just able to distinguish which stimulus is the louder one. This difference has been termed the “just noticeable difference” (JND). Figure 1-15 shows how the intervals in a 2-AFC test could look.

Figure 1-15 Schematic drawing of the two stimuli presented to a listener in a 2-AFC intensity-discrimination test.

Figure 1-16 Intensity discrimination from Houtsma et al (1980) (taken from “Signals Sound and Sensation”, W. M. Hartmann).

The intensity-discrimination threshold is often plotted in a particular way that can show that there is a linear behaviour as a function of level (also called Weber’s law). Weber’s law says that the JND in intensity is a constant fraction of the intensity of the signal (∆I/I = k). Intensity discrimination is therefore often described as

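$$L = 10 \cdot \log_{10}\left(\frac{I + \Delta I}{I}\right) \quad \text{or} \quad L = 10 \cdot \log_{10}\left(\frac{\Delta I}{I}\right)$$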

Figure 1-16 shows how Weber’s law holds for white-noise signals. For sinusoids the fraction seems to decrease with increasing level; this is called the “near miss to Weber’s law”. The near miss to Weber’s law is said to be linked to the excitation pattern on the basilar membrane. As the tone level is increased, the excitation pattern on the basilar membrane broadens, so that more hair cells are excited. It is believed that this makes it easier for the listener to detect the increase in sound level. In broadband noise, the broadening of the excitation pattern cannot be used for intensity discrimination (Moore, 2003).

Some discrepancies from the near miss to Weber’s law have been found for tones. Data by Carlyon and Moore (1994, 1996) and Oxenham and Moore (1994) show a deterioration of intensity discrimination for mid-level signals (30-80 dB SPL) under some special conditions. The phenomenon is observed when performing intensity-discrimination tasks with a notched noise that masks frequency bands outside the signal frequency. It has also only been observed using short signals of about 30 ms or less. The effect is referred to as “the severe departure from Weber’s law”.
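As a small worked example of the two level measures given for Weber’s law above, the sketch below evaluates them for a constant Weber fraction. The value k = 0.26 and the function names are hypothetical and chosen only so that the level difference comes out close to 1 dB; they are not measured values from the thesis.

```python
import math


def level_difference_db(intensity, delta_intensity):
    """10 * log10((I + dI) / I): level difference between the two intervals."""
    return 10.0 * math.log10((intensity + delta_intensity) / intensity)


def weber_fraction_db(intensity, delta_intensity):
    """10 * log10(dI / I): the Weber fraction expressed in dB."""
    return 10.0 * math.log10(delta_intensity / intensity)


k = 0.26  # hypothetical Weber fraction dI / I

# Under Weber's law dI = k * I, so both measures are independent of I.
for intensity in (1e-8, 1e-6, 1e-4):
    delta = k * intensity
    print(f"I = {intensity:.0e}: "
          f"level difference = {level_difference_db(intensity, delta):.2f} dB, "
          f"Weber fraction = {weber_fraction_db(intensity, delta):.2f} dB")
```

A flat line across levels in either measure corresponds to Weber’s law; the near miss shows up as a value that decreases as the level grows.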

2 Experiments

The models discussed in this thesis are tested on their predictions in four psychophysical experiments. For verification and comparison purposes these four experiments have also been conducted with human listeners. The experiments are listed below:

• Forward masking using 10-ms, 1000-Hz signal.

• Forward masking using 12-ms, 4000-Hz signal.

• Intensity discrimination with 1000-Hz, pure tone.

• Intensity discrimination with white noise band limited from 20 to 7000 Hz.

The experiments are based on the original experiments from the articles describing the models. The experiments serve as a reference to which the models can be compared. They contain too few subjects and too little data for each subject to match the standards of articles like Dau et al (1996) or Oxenham (2001). The data are therefore verified against data obtained by Dau et al (1996) and Oxenham (2001) to show that they are representative.

2.1 Apparatus

The listening tests were performed in the sound-insulated booths at DTU. The stimuli were generated digitally (sampling frequency of 44100 Hz) using the AFC framework, also available at DTU. Sound was presented through an AKG K-501 headset using an RME DIGI98/8 PAD soundcard and a TDT HB7 headphone driver with 0 dB gain. The transfer function of the setup was digitally equalized to give a flat amplitude response between 250 and 8000 Hz¹. Listeners made their responses using the numeric keypad of the computer. “Lights” were presented in the graphic interface on the computer monitor to delineate the intervals in each AFC trial.

2.2 Forward masking using 10-ms 1000-Hz signal.

In order to test the adaptation-loop model, a forward-masking experiment like the one in Dau et al (1996) was carried out. The old experiments from Dau et al (1996) differ on some points from the experiments done here. Firstly, the old experiments were based on a frozen-noise² masker. At the time of the experiments conducted here the original frozen noise was not available; for this reason the results shown here are for a new frozen-noise masker. Secondly, the signal in the experiments of Dau et al was a frozen signal³. The use of a frozen masker and a frozen signal leads to problems related to phase interactions between the masker and the signal (see Appendix A). In simultaneous-masking conditions the threshold shifted by as much as 12 to 14 dB when the phase of the signal was shifted by half a period. Phase effects were also seen in non-simultaneous masking as an effect of auditory-filter ringing. Using a random-phase signal in each AFC trial proved a good solution for eliminating phase interactions in the experiments.

¹ Frequencies below 500 Hz deviated by 3 dB below a flat spectrum.
² Frozen noise means that the masking noise was identical in all intervals.
³ Frozen signal means that the signal phase was constant relative to the onset of the signal.


2.2.1 Procedure and Stimulus

The following section describes the forward-masking experiment in detail. The experiment used a 3-AFC procedure with a 2-down 1-up tracking rule. The step size was 8 dB at the beginning of the experiment and was halved after every 2 reversals until it reached a minimum of 1 dB. Eight reversals were obtained using the 1-dB step size, and the threshold was calculated as the mean of these 8 values. The starting level of the signal was 90 dB SPL. The experiment was run only once for each subject. The masker was a 200-ms frozen noise, gated on and off abruptly (with a step function) at the beginning and end. The masker was band limited to contain frequencies between 20 and 5000 Hz. The noise was random Gaussian noise presented at 77 dB SPL. The signal was a 10-ms 1000-Hz sinusoid, windowed over its entire duration by a Hanning window.

Figure 2-1 Schematic drawing of the interval containing signal and masker.
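The tracking rule described in this section can be illustrated with a minimal sketch. It is not the AFC framework used in the thesis: the function name `run_staircase` and the callback `respond` are hypothetical, and the listener is replaced by a deterministic stand-in that is correct whenever the signal level is at or above a fixed value.

```python
def run_staircase(respond, start_level=90.0, start_step=8.0, min_step=1.0,
                  n_final_reversals=8):
    """2-down 1-up track; returns the mean level at the final reversals.

    `respond(level)` is a hypothetical callback that returns True when the
    (simulated) listener picks the correct interval at that signal level.
    """
    level, step = start_level, start_step
    correct_in_row = 0
    direction = 0            # -1 while moving down, +1 while moving up
    reversals_at_step = 0    # reversals counted at the current step size
    final_reversals = []     # reversal levels collected at the minimum step

    while len(final_reversals) < n_final_reversals:
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue                 # two correct answers needed to go down
            correct_in_row = 0
            new_direction = -1
        else:
            correct_in_row = 0
            new_direction = +1

        if direction != 0 and new_direction != direction:
            # The track changed direction: the current level is a reversal.
            if step <= min_step:
                final_reversals.append(level)
            else:
                reversals_at_step += 1
                if reversals_at_step == 2:   # halve the step every 2 reversals
                    step = max(step / 2.0, min_step)
                    reversals_at_step = 0
        direction = new_direction
        level += new_direction * step

    return sum(final_reversals) / len(final_reversals)


# Deterministic stand-in for a listener: correct at or above 40 dB SPL.
threshold_estimate = run_staircase(lambda level: level >= 40.0)
```

With a real listener or a probabilistic model as `respond`, a 2-down 1-up rule converges on the level giving 70.7% correct responses.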

2.2.2 Experimental results

The experiment was done by 3 subjects (TD, KS and OH) using the left ear; subject OH repeated the experiment for the right ear. Experimental results are shown in Figure 2-2.


[Plot: Forward masking 1000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Legend: Dau et al (1996) - Frozen signal; KS Left ear; OH Left ear; OH Right ear; TD Left ear.]

Figure 2-2 Masked threshold results for a 10-ms 1000-Hz signal, masked by a 200-ms broadband masker, for subjects KS, OH and TD. Masked thresholds are plotted as peak-equivalent dB SPL as a function of signal offset relative to masker offset. Dau et al (1996) data for subject TD are plotted as squares.

All results lie reasonably close to the data from Dau et al. For this reason, all results from this experiment are shown when comparing model predictions. It should be noted that these experimental results were obtained from a single run for each subject, and that the data from 1996 are not directly comparable to the data from this experiment, since they were based on a different frozen masker and a frozen signal. The use of a frozen signal in combination with a frozen masker can result in phase effects, especially in simultaneous masking. Because of filter ringing, signal thresholds are also affected by phase effects after the signal is completely outside the masker.

2.3 Forward masking using 12-ms 4000-Hz signal

In order to verify the temporal-window model, an experiment based on Oxenham (2001) was designed. The experiments done here differ in some respects from those of Oxenham (2001). Firstly, the Oxenham experiments were conducted with a running-noise masker. A frozen noise in combination with a random-phase signal was used here instead, to allow a better comparison with the predictions of the two different models and with the data (see Appendix A). Secondly, the Oxenham experiments were conducted for a wide variety of signal durations (4 to 102-ms, offset-offset). Because of the limited time schedule of this project, the experiment was only run using a 12-ms (onset-offset) signal.

2.3.1 Procedure and Stimulus

The AFC procedure matches that of the 1000-Hz forward-masking experiment.


The masker was a frozen white Gaussian noise, band limited to 0-7000 Hz, with a spectrum level of 40 dB (78 dB SPL overall). The masker window was a 2-ms Hanning ramp. The masker duration was 200 ms. The signal was a 4000-Hz sinusoid. The signal window was a 2-ms ramp function. The signal length was 12 ms from onset to offset. The signal phase was randomised to eliminate phase interactions between the signal and the masker.

Figure 2-3 Schematic drawing of the interval containing signal and masker.
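The relation between the spectrum level and the overall level quoted for the masker can be checked with a short calculation. The sketch below assumes an ideally flat spectrum across the stated bandwidth; the function name `overall_level_db` is introduced here for illustration only.

```python
import math


def overall_level_db(spectrum_level_db, bandwidth_hz):
    """Overall level of flat-spectrum noise from its level per 1-Hz band."""
    return spectrum_level_db + 10.0 * math.log10(bandwidth_hz)


# Masker of section 2.3.1: spectrum level 40 dB, bandwidth 7000 Hz.
masker_level = overall_level_db(40.0, 7000.0)
```

The result is about 78.5 dB, consistent with the rounded value of 78 dB SPL given for the masker.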

2.3.2 Experimental results

The experiment was done by 4 subjects (SB, TD, KS and OH); subject OH did the experiment for both ears. Experimental results are shown in Figure 2-4. As for forward masking at 1000 Hz, the data were obtained using only a single run for each subject.


[Plot: Forward masking 4000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Legend: Oxenham (2001) - Running noise; SB Left ear; KS Left ear; OH Left ear; OH Right ear; TD Left ear.]

Figure 2-4 Masked threshold results for a 12-ms 4000-Hz signal, masked by a 200-ms broadband masker, for subjects SB, KS, OH and TD. Masked thresholds are plotted as dB SPL as a function of signal offset relative to masker offset.

The results seem to fall into two groups. The data of subjects TD, SB and OH (left ear) show an increased threshold at the 20-150-ms temporal positions, which may indicate that these subjects had a slight hearing impairment in the 4000-Hz region. Subject TD reported having a slight impairment in the upper frequency region, and subject OH also showed a slight impairment (5-10 dB) in the 4000-Hz region in a hearing-threshold test of the left ear. Subject SB was not tested. For this reason the measurements of SB, TD and OH (left ear) are discarded when comparing to model predictions. The measurements for subject KS and for OH’s right ear fit within the trends of the data collected in Oxenham (2001); these two measurement sets are therefore used in the following to represent normal-hearing thresholds for forward masking at 4000 Hz.

2.4 Intensity discrimination with 1000-Hz pure tone

This experiment is based on work on the adaptation-loop model done in Derleth (1999). The experiment was originally intended to be a modulation-detection experiment. With 0 Hz as the modulation frequency, the experiment becomes an intensity-discrimination task.

2.4.1 Procedure and Stimulus

The AFC procedure matches that of the 1000-Hz forward-masking experiment, with the exception that the initial step size was 4 dB and the starting level of the increment was 10 dB. The reference stimulus was a 1-s 1000-Hz sinusoidal signal ramped at the beginning and end with 125-ms Hanning ramps. The signal was a 0-Hz modulated version of the reference signal. The reference signal was modulated between the ramps at the two ends of the signal with a Hanning windowed amplitude modulation described by:

$$s(t) = A \cdot c(t) \cdot \left(1 + m \cdot \cos(2\pi \cdot f_{\mathrm{mod}} \cdot t)\right)$$

where

$$c(t) = \sin(2\pi \cdot f_{\mathrm{carrier}} \cdot t)$$

Here m is the adjustable parameter. When fmod is 0 Hz, the intensity increment ∆A of the signal becomes m·A. A schematic illustration of the interval containing the increment is shown in Figure 2-5.

Figure 2-5 Schematic drawing of the interval containing the intensity increment.
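The effect of setting the modulation frequency to 0 Hz can be illustrated with a short sketch of the modulated tone. It deliberately leaves out the 125-ms ramps and the Hanning windowing of the increment, and the function name `modulated_tone` and the amplitude values are assumptions chosen for the example.

```python
import math


def modulated_tone(A, m, f_carrier, f_mod, duration, fs=44100):
    """Samples of s(t) = A*c(t)*(1 + m*cos(2*pi*f_mod*t)),
    with c(t) = sin(2*pi*f_carrier*t). Ramps and windowing are omitted."""
    n = int(round(duration * fs))
    samples = []
    for i in range(n):
        t = i / fs
        carrier = math.sin(2.0 * math.pi * f_carrier * t)
        envelope = 1.0 + m * math.cos(2.0 * math.pi * f_mod * t)
        samples.append(A * carrier * envelope)
    return samples


# With f_mod = 0 Hz the cosine is 1 everywhere, so the amplitude is A*(1 + m).
reference = modulated_tone(A=0.1, m=0.0, f_carrier=1000.0, f_mod=0.0, duration=0.01)
increment = modulated_tone(A=0.1, m=0.2, f_carrier=1000.0, f_mod=0.0, duration=0.01)
peak_ref = max(abs(x) for x in reference)
peak_inc = max(abs(x) for x in increment)
```

The ratio `peak_inc / peak_ref` equals 1 + m, which is the amplitude increment ∆A = m·A relative to the reference.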

When comparing this experiment with other intensity-discrimination experiments like that in Figure 1-16, the transformation from m to ∆L can be calculated as

$$\Delta L = 20 \cdot \log_{10}\left(\frac{m \cdot A + A}{A}\right)$$
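The transformation from m to ∆L can be written as a small helper. Since A cancels, ∆L depends only on m. The conversion from a modulation depth expressed in dB is an assumption here (it takes the dB value to mean 20·log10(m)), and both function names are illustrative.

```python
import math


def delta_l_db(m):
    """Level increment in dB for a linear modulation depth m (Delta A = m*A)."""
    return 20.0 * math.log10(1.0 + m)


def m_from_db(m_db):
    """Assumed conversion: a depth given as 20*log10(m) in dB to linear m."""
    return 10.0 ** (m_db / 20.0)


# Example: a modulation depth of -20 dB (m = 0.1) expressed as Delta L.
example_delta_l = delta_l_db(m_from_db(-20.0))
```

For m = 0.1 this gives a ∆L of roughly 0.83 dB, and for m = 1 it gives about 6 dB.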

2.4.2 Experimental results

The intensity-discrimination experiment was done using only a single subject, OH. The experiment was repeated 3 times for this subject. Experimental results are plotted in Figure 2-6 with m as a function of the stimulus level. For comparison with the intensity discrimination in Figure 1-16, the data are replotted in Figure 2-7, transformed to the form of ∆L.


[Plot: 0 Hz Modulation discrimination threshold in 1000 Hz tone. x-axis: A [dB]; y-axis: m [dB]. Legend: OH Left ear Exp. 1; OH Left ear Exp. 2; OH Left ear Exp. 3.]

Figure 2-6 Results for the 0-Hz modulation detection task (intensity discrimination). The stimulus was a 1-s 1000-Hz sinusoid with 125-ms Hanning ramps. The increment was a Hanning-window increment with a maximum amplitude of m during the steady state of the stimulus.

[Plot: Intensity discrimination threshold in 1000 Hz tone. x-axis: A [dB]; y-axis: ∆L = 20·log((∆A+A)/A) where ∆A = m·A. Legend: OH Left ear Exp. 1; OH Left ear Exp. 2; OH Left ear Exp. 3; OH mean.]

Figure 2-7 Replot of the measurement data from Figure 2-6 in ∆L form. The mean of the data is plotted as squares.

The intensity-discrimination thresholds in these measurements do not show a clear improvement with level like that in the plot in Figure 1-16. Some improvement occurs from 40 to 80 dB SPL. At 20 dB SPL the subject shows better intensity discrimination than at higher levels. The exact reason for this is unclear.

2.5 Intensity discrimination with white noise

Intensity discrimination for white noise was also examined in order to investigate if the near miss to Weber’s law disappears for broadband white noise.

2.5.1 Procedure and Stimulus

The procedure and stimulus match those of the intensity-discrimination experiment using a 1000-Hz tone, except that the tone was replaced by a 20-8000-Hz white Gaussian noise. c(t) corresponds to a noise signal with a level of 0 dB SPL. In each AFC trial a new random noise was generated.

2.5.2 Experimental results

This experiment was conducted 3 times by subject OH. The experimental results are shown in Figure 2-8.


[Plot: Intensity discrimination threshold in Noise. x-axis: A [dB]; y-axis: ∆L = 20·log((∆A+A)/A) where ∆A = m·A. Legend: OH Left ear Exp. 1; OH Left ear Exp. 2; OH Left ear Exp. 3; OH mean.]

Figure 2-8 Measurement results for intensity discrimination using a broad-band white noise.

Results show no decay of intensity-discrimination thresholds as the level of the noise increases. This is in good agreement with previous results like those in Figure 1-16.


3 Models

The following section describes the two models that are used to predict forward masking and intensity discrimination:

• The adaptation-loop model

• The temporal window model

Both models were implemented in Matlab and verified against their previously published predictions for forward masking. The model predictions are further compared to the new experimental data.

3.1 The adaptation-loop model

The model presented by Dau et al in 1996 is intended to allow computer model predictions to be compared with real psychoacoustical data. The model consists of an auditory pre-processing stage and a decision device. The auditory pre-processing stage models the signal path through the auditory system, from the cochlea to the place in the brain where a decision is made whether or not a signal is detected. The decision device is a simulation of how the brain might detect signals in an AFC test. Like a human, it produces the best guess of which interval in an AFC trial contains the signal. This means that the model can be fed with the same acoustical stimuli as a human listener. The model then acts as the decision device in the AFC task. This makes it straightforward to compare human listening data with the computer model data, because identical experiments can be conducted on both human subjects and the model.

Figure 3-1 shows a block diagram of the model. The model is built into the AFC framework (supplied by CAHR), which is also used to conduct psychophysical listening tests. The original implementation was done in the SI framework (Signal Interactive, software developed at the University of Göttingen) on Silicon Graphics machines at Oldenburg University. The most significant difference between the current implementation and the earlier implementation used by Dau et al (1996) is the use of a gammatone filter in the pre-processing stage instead of a filter implementation by Strube (1985). In the following section, the adaptation-loop model is described in further detail. The current implementation is verified against the model predictions from the Dau et al (1996) articles, to see whether it is still able to explain the results from 1996. The current implementation is further compared to new experimental data.


Figure 3-1 A block diagram of the adaptation-loop model. The pre-processing of the signal consists of a 4th-order gammatone filter, a half-wave rectifier, a 1000-Hz low-pass filter and an adaptation-loop stage followed by an 8-Hz low-pass filter. Noise is subsequently added to the signals. The difference between the internal representations of the masker alone and of masker + signal is correlated with the normalized template difference. The correlation is compared with a decision criterion. The input to the model is scaled so that a sinusoidal signal with a peak of 1 corresponds to a sound pressure level of 100 dB SPL.

The model needs three inputs: a masker plus a supra-threshold signal (a signal that is easily detectable), stimuli containing only the masker, and stimuli containing the masker plus the current signal¹. When used in an AFC test, the model first calculates a mean internal representation of the masker plus a supra-threshold signal [R(MS)] and of the masker alone [R(M)]. This is done across a number of pre-processed realisations of the masker-plus-supra-threshold-signal and masker-alone stimuli; in this thesis 10 realisations have been used. The difference between R(MS) and R(M) is then normalized to an RMS value of 1 and stored as the template. This can be seen as what a listener could do in the first trials of an AFC test, where the signal is usually large and easy to detect. The listener could form a representation of where the signal is placed temporally relative to the masker, and can thus concentrate on the segments where he knows the signal would be positioned. When the model is used in an M-AFC test, the stimuli from each interval of a trial are applied to the model in the masker-plus-current-signal path. The difference between the internal representation of the current signal plus masker, R(Ms), and the internal representation of the masker, R(M), forms the current difference. The current difference is cross-correlated with the template. This procedure is applied to all the intervals in the AFC trial (masker only or masker plus signal), giving a correlation value for each interval. The model then selects the interval with the highest correlation value.

¹ The current signal is the signal for which the model determines whether it is above or below threshold.
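The template procedure described above can be sketched on toy data. The sketch works directly on already pre-processed internal representations (plain lists of numbers), uses a zero-lag inner product as the correlation measure, and does not include the internal noise; these simplifications and all function names are assumptions made for the example rather than details of the thesis implementation.

```python
import math


def mean_representation(realisations):
    """Average several equally long internal representations sample by sample."""
    n = len(realisations)
    return [sum(column) / n for column in zip(*realisations)]


def make_template(r_ms, r_m):
    """Difference R(MS) - R(M), normalised to an RMS value of 1."""
    diff = [a - b for a, b in zip(r_ms, r_m)]
    rms = math.sqrt(sum(d * d for d in diff) / len(diff))
    return [d / rms for d in diff]


def correlation(interval, r_m, template):
    """Correlate the current difference (interval - R(M)) with the template."""
    return sum((x - b) * t for x, b, t in zip(interval, r_m, template))


def choose_interval(intervals, r_m, template):
    """Index of the interval whose difference correlates best with the template."""
    scores = [correlation(iv, r_m, template) for iv in intervals]
    return scores.index(max(scores))


# Toy internal representations: the signal only affects the third sample.
r_m = mean_representation([[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]])
r_ms = mean_representation([[1.0, 1.0, 5.0, 1.0], [1.0, 1.0, 7.0, 1.0]])
template = make_template(r_ms, r_m)

trial = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.5, 1.0], [1.0, 1.0, 1.0, 1.0]]
choice = choose_interval(trial, r_m, template)
```

In this toy 3-AFC trial the second interval (index 1) is selected, because it is the only one whose difference from R(M) overlaps with the template.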


Pre-processing

The auditory pre-processing stage consists of five modules: a band-pass filter (gammatone filter, GT), a half-wave rectification (HW), a 1000-Hz low-pass filter (LP), 5 adaptation loops (AD) and an 8-Hz low-pass filter (LP). The band-pass filter is a simplification of the frequency-to-place transformation of the basilar membrane. In this model a 4th-order gammatone filter is used, with the equivalent rectangular bandwidth (ERB, Moore and Glasberg, 1982) described by:

$$\mathrm{ERB} = 24.7 + \mathrm{CF}/9.265$$

where CF is the centre frequency of the filter. The nonlinear behaviour of the cochlea is neglected by using a linear gammatone filter. The model can be used with a single frequency channel or in a gammatone filter-bank (GTFB) setup that allows multiple frequency channels to be used in the detection process. The subsequent half-wave rectification and 1000-Hz low-pass filtering (HWLP) effectively simulate the inner-hair-cell transformation of basilar-membrane oscillations into hair-cell receptor potentials. The fine structure seen in hair-cell potentials at low frequencies is removed at higher frequencies by the 1000-Hz low-pass filter. Effects of adaptation seen in the auditory nerve are simulated by the five adaptation loops following the low-pass filter. The adaptation-loop module is drawn in Figure 3-2. The input to the adaptation stage is limited to a minimum value of $10^{-5}$; this constant determines the absolute threshold of the model (0 dB SPL). The adaptation loops were originally conceptualised by Püschel (1988). They are assumed to simulate the neural adaptation taking place from the auditory nerve to the place in the brain where the sound is perceived. The adaptation loops allow the model to account for the temporal masking effects seen in forward masking as well as for intensity discrimination.
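The first, memoryless parts of the pre-processing can be illustrated with a few helper functions: the ERB formula, the input scaling (a peak of 1 corresponding to 100 dB SPL, and $10^{-5}$ to 0 dB SPL) and the half-wave rectification. This is a sketch under those stated conventions; the function names are illustrative, and the gammatone and low-pass filters themselves are not implemented here.

```python
def erb_hz(cf_hz):
    """Equivalent rectangular bandwidth for centre frequency cf_hz (both in Hz)."""
    return 24.7 + cf_hz / 9.265


def spl_to_amplitude(level_db_spl):
    """Peak amplitude in model input units: 100 dB SPL -> 1, 0 dB SPL -> 1e-5."""
    return 10.0 ** ((level_db_spl - 100.0) / 20.0)


def half_wave_rectify(samples):
    """Keep positive values and set negative values to zero."""
    return [max(x, 0.0) for x in samples]


erb_1k = erb_hz(1000.0)   # bandwidth of the 1000-Hz channel
erb_4k = erb_hz(4000.0)   # bandwidth of the 4000-Hz channel
```

The formula gives a bandwidth of roughly 133 Hz at 1000 Hz and roughly 456 Hz at 4000 Hz, the two signal frequencies used in the forward-masking experiments.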

Figure 3-2 Electrical equivalent model of the five adaptation loops. Each adaptation loop consists of a division stage, a resistor and a capacitor. The resistor and the capacitor form a low-pass filter with a time constant τ. The input to the adaptation loops is limited to a value of $10^{-5}$; this establishes an absolute threshold of the model.

The term adaptation refers to the effect that a stationary input signal I is gradually reduced at the output of a single adaptation loop to $I^{0.5}$. This occurs because the capacitor is gradually charged or discharged through the resistor.


The steady state relation between input and output for all five adaptation loops can be derived to

$$O = I^{0.5^{5}} = I^{1/32}$$

This approaches a logarithmic transform. Figure 3-3 shows the near-logarithmic transform of the adaptation loops. A 0 dB SPL input to the model corresponds to a level of $10^{-5}$, and 100 dB SPL corresponds to an input level of 1. The output of the adaptation loops is scaled linearly such that the input range of 0 dB to 100 dB maps to 0 to 100 model units (MU); a 1-MU change at the output then roughly corresponds to a 1-dB change at the input.
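The steady-state transform and its scaling to model units can be sketched as follows. The linear scaling is implemented here as a simple two-point mapping between the outputs for 0 and 100 dB SPL, which is an assumption about how the scaling is done; the function and constant names are illustrative.

```python
MIN_INPUT = 1e-5   # input floor, corresponding to 0 dB SPL in the model
N_LOOPS = 5        # number of adaptation loops


def steady_state_output(amplitude):
    """Steady-state output of five cascaded loops: O = I ** (0.5 ** 5)."""
    return max(amplitude, MIN_INPUT) ** (0.5 ** N_LOOPS)


def to_model_units(output):
    """Linear map so that 0 dB SPL -> 0 MU and 100 dB SPL -> 100 MU."""
    low = steady_state_output(MIN_INPUT)
    high = steady_state_output(1.0)
    return 100.0 * (output - low) / (high - low)


def steady_state_mu(level_db_spl):
    """Steady-state output in model units for a stationary input level."""
    amplitude = 10.0 ** ((level_db_spl - 100.0) / 20.0)
    return to_model_units(steady_state_output(amplitude))


mu_values = [steady_state_mu(level) for level in range(0, 101, 10)]
```

The end points map exactly to 0 and 100 MU, and intermediate levels stay within a few MU of a purely logarithmic transform.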

[Plot: 1000 Hz Sinusoid. x-axis: Amp [dB SPL]; y-axis: Output [MU]. Legend: Log transform; Adaptation loops.]

Figure 3-3 The steady-state transformation of the five adaptation loops. The output unit is termed Model Units (MU).

The logarithmic behaviour of the adaptation loops makes it possible for the model to predict JNDs like those described by Weber’s law. The individual adaptation loops have different time constants, varying from 5 ms to 500 ms (0.005 s, 0.050 s, 0.129 s, 0.253 s and 0.500 s). These time constants are based on experimental data from Dau et al (1996). With these time constants, the adaptation-loop model is able to simulate forward masking.

The adaptation loops are followed by a low-pass filter. In the original work of Püschel, this low-pass filter had a time constant of 200 ms and was related to the temporal integration of tones. In the work of Dau et al (1996) the time constant was reduced to 20 ms (this was done during experimental work with the adaptation-loop model). Later, in 1997, Dau et al extended the low-pass filter with a filter bank intended to model modulation filters in the brain. From this perspective, the low-pass filter can be seen as a simplified modulation filter bank containing only the lowest filter, one that estimates the steady-state level of the input signal rather than its fluctuations. The output of the low-pass filter could be representative of spike-rate signals in the brain. To show how signals at the output can resemble auditory-nerve spike-rate data (Figure 1-7), the output of the 8-Hz low-pass filter for a 200-ms 77 dB SPL noise signal at the input of the pre-processing stage is shown in Figure 3-4. The output has a high onset, after which it gradually settles to a resting level. At the offset of the noise, the output falls below the zero-input steady state of the system and then gradually increases to the zero-input steady-state value. This is similar to what is seen in the auditory nerve, even though adaptation in the auditory nerve acts very much faster than in the adaptation loops. Each adaptation loop should not be seen as a model of each nerve that the signal passes through on its way through the brain, but rather as a model of the cumulative adaptation that could take place.
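The dynamic behaviour of the loops can be illustrated with a simple discrete-time sketch. Each loop divides its input by a first-order low-pass filtered copy of its own output, and the loops start in the steady state for the minimum input. The discretisation, the sampling rate of the example and the function names are assumptions made for illustration; the thesis implementation may differ in these details. The helper `lowpass_cutoff_hz` only relates a first-order time constant to its cutoff frequency.

```python
import math

TIME_CONSTANTS = (0.005, 0.050, 0.129, 0.253, 0.500)  # seconds, from the text
MIN_INPUT = 1e-5                                      # input floor from the text


def adaptation_loops(samples, fs):
    """Pass a non-negative input through five divide-and-low-pass loops."""
    coeffs = [math.exp(-1.0 / (fs * tau)) for tau in TIME_CONSTANTS]
    # Start each loop in its steady state for the minimum input.
    states = [MIN_INPUT ** (0.5 ** (k + 1)) for k in range(len(TIME_CONSTANTS))]
    output = []
    for x in samples:
        y = max(x, MIN_INPUT)
        for k, a in enumerate(coeffs):
            y = y / states[k]                          # divide by the loop state
            states[k] = a * states[k] + (1.0 - a) * y  # RC low-pass of the output
        output.append(y)
    return output


def lowpass_cutoff_hz(tau):
    """Cutoff frequency of a first-order low-pass with time constant tau."""
    return 1.0 / (2.0 * math.pi * tau)


fs = 1000                                         # Hz, kept low for the example
out = adaptation_loops([0.01] * (10 * fs), fs)    # 10 s of a constant 60-dB input
```

For a constant input the output starts with a large onset response and then settles towards the steady-state value $I^{1/32}$. A time constant of 20 ms corresponds to a cutoff of about 8 Hz, which matches the 8-Hz low-pass filter of the model.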

[Plot, two panels. Top: x-axis: Time [s]; y-axis: x. Bottom: x-axis: Time [s]; y-axis: Model units [MU].]

Figure 3-4 Top: A 200-ms 77 dB SPL random noise signal. Bottom: The output of the 8-Hz low-pass filter as a result of using the noise as input to the pre-processing stage. 0 corresponds to the zero-input steady state of the system.

The decision device

The decision device is a simulation of how the human listener might detect which interval in an AFC test fits the stored template of how the signal should look. The detector is, in a sense, a realisation of an optimal detector. The decision process is assumed to be noisy; this is what is indicated by the addition of Gaussian noise to the signals before the decision process. The actual implementation of the detector stage does not add real noise to the signals, but uses signal detection theory to calculate the probability of a given interval containing the signal. The use of signal detection theory increases the speed of the model in AFC tests.

Signal detection theory can describe what happens statistically when a listener is forced to choose an interval in an AFC trial. When the listener hears and memorises the intervals in the AFC trial, he is assumed to grade each interval on a scale ‘r’ that represents how likely it is that this interval contains the signal. The listener is assumed to grade an interval in a noisy fashion: each time he hears the same interval he may grade it differently. The probability density function of the grade that an interval receives is assumed to be Gaussian. The grading of an interval containing noise (fN) and of an interval containing noise and signal (fSN) is illustrated in Figure 3-5 as probability density functions. On the ‘r’ scale the mean value of the fN distribution is defined as zero. The mean value of the fSN distribution is called µ.


Figure 3-5 Probability density function fSN of an interval containing signal and noise, and probability density function fN of an interval containing noise (Hartmann, 2000).

The detector uses the cross-correlation between the stored normalised difference template, calculated from a supra-threshold signal plus masker, and the current difference signal as an indication of where the current interval should be placed on the ‘r’ scale. By assuming a Gaussian noise distribution, described by the internal noise variance of the model, and using the correlation value as µ, it is possible to calculate the probability of being correct, PC, in an M-AFC trial (Wickens, 2002):

$$P_C = \int_{-\infty}^{\infty} f_{SN}(r) \cdot \left(F_N(r)\right)^{M-1} \, dr \quad \text{where} \quad F_N(r) = \int_{-\infty}^{r} f_N(y) \, dy$$
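The integral for PC can be evaluated numerically. The sketch below assumes that fN and fSN are Gaussian with the same standard deviation (the square root of the internal noise variance) and means of 0 and µ, and it uses a plain trapezoidal rule over a range of ±8 standard deviations. The function names and the integration settings are choices made for this example.

```python
import math


def normal_pdf(x, mean, sigma):
    """Gaussian probability density."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mean, sigma):
    """Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def probability_correct(mu, sigma, n_intervals, n_steps=4000):
    """P_C = integral of f_SN(r) * F_N(r)**(M-1) dr, by the trapezoidal rule."""
    lo = min(0.0, mu) - 8.0 * sigma
    hi = max(0.0, mu) + 8.0 * sigma
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        r = lo + i * h
        value = normal_pdf(r, mu, sigma) * normal_cdf(r, 0.0, sigma) ** (n_intervals - 1)
        total += value if 0 < i < n_steps else 0.5 * value
    return total * h


# Chance performance in a 3-AFC trial: a correlation value of zero.
p_chance = probability_correct(mu=0.0, sigma=1.0, n_intervals=3)
```

With µ = 0 the result is the chance level 1/M, and PC approaches 1 as µ grows relative to the internal noise.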

The decision whether or not the model chooses the interval containing the signal is based on how high PC is. In a 3-AFC procedure PC should be higher than 70.7%. When the detector is used in a multi-channel setup, the stimuli are split into multiple frequency channels by the gammatone filter bank in the pre-processing stage. The pre-processed output of each channel builds a time-spectral representation matrix in the decision stage, which is cross-correlated with the corresponding normalized time-spectral representation in the template. The correlation value can be used in the same way as the corresponding single-channel correlation value. The internal noise variance is the only adjustable parameter of the decision device. Dau et al (1996) prescribe that the variance be adjusted so that the model is able to predict Weber’s law in intensity discrimination. This has been done, and the internal noise variance of the model has been fixed to 6. This value gives the best-fitting model predictions in the experiments described later. (Appendix B shows the effects on forward masking and intensity discrimination of changing the internal noise variance.)

3.1.1 Verification of the model implementation

The Dau et al (1996) study covers a wide variety of experiments. Forward masking has been chosen as the experiment to verify that the current implementation of the model gives the same predictions as shown in 1996. A replot of the forward-masking experiment from Dau et al (1996) is shown in Figure 3-6. Because the tests performed for the 1996 article were done with a frozen-noise masker and a frozen signal (zero phase relative to signal onset), it was necessary to use the original noise in the verification against the old predictions (see Appendix A for further details on masker-signal interactions). To accommodate this requirement, what is believed to be the original noise from Dau et al was extracted from backups of the SI framework used at Oldenburg. A Strube-filter impulse response for a 1-kHz filter was also extracted. By using the same noise and a Strube filter it should in theory be possible to reproduce the old predictions exactly.

[Plot: Forward masking Dau (1996) Subject TD. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Legend: Dau (1996) measurement data; Dau (1996) model data.]

Figure 3-6 Replot of data and model predictions from Dau et al (1996) for subject TD (taken from fig. 7). Forward masking is done with a 10-ms, 1000-Hz Hanning-windowed sinusoidal signal, with 0 phase relative to the onset of the signal. The masker was a 200-ms frozen white noise signal, band-limited to 20-5000 Hz.

[Plot: Forward masking 1000 Hz Orig. Frozen noise. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Legend: Model: ST HWLP AD LP OptDet; Model: GT HWLP AD LP OptDet; Dau (1996) model data.]

Figure 3-7 Forward-masking predictions using the original frozen noise and frozen signal from Dau et al 1996. The noise level is 77 dB SPL. (□) shows the AD model predictions using a gammatone filter (GT). (◊) shows the AD model using a Strube filter (ST). (♦) shows a replot of the predictions from 1996.

Figure 3-7 shows the predictions of the adaptation-loop model with a gammatone filter (□) in a test using the original frozen noise from Dau et al (1996) and a frozen signal (the experimental conditions are described in section 2.2). Also plotted is the prediction of the adaptation-loop model with a Strube filter (◊). The predictions show that the model with a gammatone filter cannot account for the previously predicted data (♦). To investigate the difference between the gammatone filter and the Strube filter, the impulse responses and frequency responses of the two filters are shown in Figure 3-8 and Figure 3-9.

The increase in masking during simultaneous masking (-10 ms to 10 ms) can be explained by three things. Some of the difference can be accounted for by a difference in energy: the energy of white noise passed by the Strube filter is 4 dB greater than that passed by the gammatone filter. Phase interactions between masker and signal are also shown in Appendix A to influence the threshold greatly in this time frame. The spectral properties of the noise may also contribute if the noise contains a large amount of low-frequency energy near the offset of the masker.

The difference in the impulse responses of the two filters can also explain the differences in the non-simultaneous predictions. The impulse response of the Strube filter settles at a level of about -100 to -90 dB, whereas the impulse response of the gammatone filter keeps decaying. Energy from the masker is thus preserved for a longer time after the end of the masker by the Strube filter. This effect can account for the increased thresholds for predictions 150 ms after the end of the masker.


[Plot, two panels. Top: x-axis: time [s]; y-axis: x. Bottom: x-axis: time [s]; y-axis: 20·log|x| [dB]. Legend: Gammatone 1000 Hz; Strube 1000 Hz.]

Figure 3-8 Top: Plot of the impulse response of the Strube filter and the gammatone filter, and the frequency response of the Strube filter. Bottom: The logarithmic plot of the absolute value of the impulse response.

[Plot: x-axis: Freq [Hz]; y-axis: Amp [dB]. Legend: Gammatone 1000 Hz; Strube 1000 Hz.]

Figure 3-9 Frequency response of the 1000-Hz Strube filter, plotted with the frequency response of the 4th-order gammatone filter used in the current implementation.

Even though the “believed” original noise and Strube filter were used for the predictions in Figure 3-7, the predictions do not match entirely. Assuming that the noise and the Strube filter are the same as those originally used, the discrepancies may be the effect of a scaling problem with the Strube filter or the noise. To test this, the predictions have been shifted to fit the old simultaneous predictions by increasing the noise level to 80 dB SPL. Figure 3-10 shows a plot of the predictions using the Strube filter, with noise levels of 77 and 80 dB SPL. The predictions using the 80-dB noise place the predicted thresholds within 2 dB of the old predictions in the -5 to 20-ms range. For predictions above 20 ms the thresholds settle at a level above the old predictions, which seem to be decaying towards a lower threshold. If the old predictions are assumed to continue falling to an absolute threshold of around 20 dB SPL, this leads to the conclusion that the Strube filter used for the current predictions might differ from that used in the old predictions, or that some other part of the model differs from the original implementation.


[Plot: Forward masking 1000 Hz Orig. Frozen noise. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Legend: Model data from Dau 1996; Orig. noise 77dB SPL Model: ST HWLP AD LP OptDet; Orig. noise 80dB SPL Model: ST HWLP AD LP OptDet.]

Figure 3-10 Predictions of the adaptation-loop model using the Strube filter. Forward masking using the original 1996 frozen noise at noise levels of 77 and 80 dB SPL.

It has not been possible to verify the current implementation of the model against the old results from Dau et al. The conclusion is that this is not possible unless the original code used to generate the old predictions is found. The problem is most probably related to the Strube filter. It is believed that a different Strube filter could predict the old data. In the current implementation, the adaptation-loop model is used with a gammatone filter.

3.1.2 Tests with the adaptation-loop model

Adaptation-loop model predictions are shown for two types of experiments, forward masking and intensity discrimination, as described in section 2. In forward masking, the predictions were based on a single frequency channel centred on the signal frequency. The use of more frequency channels does not influence the prediction in forward masking, because the signal-to-noise ratio is greatest for the channel centred on the signal frequency. In frequency channels where the signal is attenuated by the gammatone filter, the signal would be masked by the unattenuated power from the masker. Intensity discrimination was simulated using both a single-channel model and a multi-channel model.


[Plot: Forward masking 1000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Legend: Model: GT HWLP AD LP OptDet; KS Left ear; OH Left ear; OH Right ear; TD Left ear.]

Figure 3-11 Forward masking at 1000 Hz using a 10-ms signal. The masker is a 77 dB SPL white noise band limited to 20-5000 Hz. Stimuli are described in detail in section 2.2.

-0.02 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.1610

20

30

40

50

60

70

80

Offset-Offset time [s]

Maske

d thre

sh

old

[dB

SP

L]

Forward masking 4000 Hz

Model: GT HWLP AD LP OptDet

KS Left ear

OH Right ear

Figure 3-12 Forward masking at 4000-Hz using a

12-ms signal. Masker is a 78 dB SPL white noise

band limited to 20-5000-Hz. Stimuli are described

in detail in 2.3

Figure 3-11 shows the forward-masking predictions (○) and the measured data for the 1000-Hz signal. The predictions of the model deviate substantially from the data (by 10 dB or more) for offset-offset times from 0 to 30 ms. Figure 3-12 shows the forward-masking predictions (○) and the measured data for the 4000-Hz signal. The predictions at this frequency match the data much better than those at 1000 Hz (within 5-7 dB, although generally too low). At 150 ms after the end of the masker, both the prediction and the measurement should lie just above the absolute threshold. The predictions at this point suggest that the absolute threshold of the model might be too low.

[Figure placeholder. Plot title: Intensity discrimination threshold in 1000 Hz tone; abscissa: A [dB]; ordinate: ∆L = 20 ⋅ log((∆A+A)/A), where ∆A = m⋅A; legend: Model: GT HWLP AD LP OptDet; Model: GTFB HWLP AD LP OptDet; OH mean.]

Figure 3-13 Intensity discrimination for a 1000-Hz tone. Stimuli are described in detail in section 2.4. GT stands for a single gammatone filter (single channel). GTFB stands for a gammatone filter-bank (multi-channel). Filters in the GTFB are placed between 500 and 4000 Hz. The mean measurements of subject OH are plotted as squares.

[Figure placeholder. Plot title: Intensity discrimination threshold in noise; abscissa: A [dB]; ordinate: ∆L = 20 ⋅ log((∆A+A)/A), where ∆A = m⋅A; legend: Model: GT HWLP AD LP OptDet; Model: GTFB HWLP AD LP OptDet; OH mean.]

Figure 3-14 Intensity discrimination for 20-7000 Hz white noise. Stimuli are described in detail in section 2.5. For a description of the legend, see Figure 3-13.


Figure 3-13 and Figure 3-14 show intensity-discrimination predictions for the model. For a 1000-Hz sinusoid, the model using a single filter centred at 1000 Hz (○) predicts the measured data well. Model predictions have also been carried out using a multi-channel approach (×), which was achieved using a gammatone filter-bank (GTFB) as front end. The output of each filter in the filter-bank was processed by the model, and the decision device used the outputs, arranged in a matrix, in the decision process. The filter-bank contained filters with centre frequencies of 569, 660, 761, 874, 1000, 1140, 1296, 1469, 1663, 1879, 2119, 2387, 2685, 3017, 3387 and 3799 Hz. The bandwidth of each filter was determined by the ERB (Moore and Glasberg, 1982). The model predictions using a filter-bank produce too small values in the intensity-discrimination task. Intensity discrimination for noise shows similar results to the tone predictions. The multi-channel approach makes it possible to predict the constant fraction described by Weber’s law in noise, and the downward-sloping thresholds of “the near miss to Weber’s Law” in tones (see section 1.2.2).
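The centre frequencies listed above are spaced roughly one ERB apart around 1000 Hz. The following is a minimal sketch of such a spacing, assuming the ERB-rate formula of Glasberg and Moore (1990), 21.4·log10(4.37·f/1000 + 1); this formula is an assumption made for illustration and is not necessarily the one used in the thesis implementation. The function names are hypothetical. Under this assumption the computed values lie within about 1% of the listed centre frequencies, but they are not identical to them.

```python
import math

def hz_to_erb_rate(f_hz):
    """ERB-rate (number of ERBs) for a frequency in Hz (assumed formula)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37

def centre_frequencies(f_ref_hz=1000.0, n_below=4, n_above=11, step_erb=1.0):
    """Centre frequencies spaced step_erb apart so that f_ref_hz is included."""
    e_ref = hz_to_erb_rate(f_ref_hz)
    return [erb_rate_to_hz(e_ref + i * step_erb)
            for i in range(-n_below, n_above + 1)]

# Centre frequencies as listed in the text, in Hz.
listed = [569, 660, 761, 874, 1000, 1140, 1296, 1469,
          1663, 1879, 2119, 2387, 2685, 3017, 3387, 3799]
computed = centre_frequencies()

# Print listed and computed values side by side for comparison.
for f_listed, f_calc in zip(listed, computed):
    print(f_listed, round(f_calc, 1))
```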

3.1.3 Discussion of the adaptation-loop model

The outcome of the forward-masking experiment is that the model predicts the data for the 4000-Hz signal well. Using the same parameters, the model does not predict the data at 1000 Hz well. Two possible solutions to this problem are seen: either the parameters of the adaptation-loop model must be changed as a function of frequency, or an auditory filter other than the gammatone filter must be used.

If it is assumed that the auditory filter is the main problem in the model, then the 4000-Hz experiment may serve as a better reference for the model than the 1000-Hz experiment. Auditory-filter ringing is known to become smaller at higher frequencies. Assuming that the filter ringing at 4000 Hz is negligible, such that only adaptation influences the forward-masking curve, the adaptation loops do a good job of modelling the adaptation that takes place. A differently shaped impulse response of the peripheral filter might be able to change the slope of the masking curve at 1000 Hz, somewhat like that seen with the Strube filter in section 3.1.1.

If it is instead assumed that the auditory filter is not the main cause of the discrepancies in the 1000-Hz experiment, the only solution would be to change the adaptation loops as a function of frequency, such that the model can predict the forward masking at both 1000 and 4000 Hz. The exact method for doing this has not been investigated, but preliminary experiments have shown that it is difficult to make the model predict the data at 1000 Hz. Attempts such as increasing all time constants by a linear factor or removing adaptation loops do not show a clear improvement of the predictions.

Intensity discrimination showed good fits to the data of subject OH using a single-channel approach. The question is whether the single-channel approach is realistic. The human listener does use off-frequency listening¹ in an intensity-discrimination task. This is typically the suggested explanation for the near miss to Weber’s law. Using the model within a multi-channel approach does indeed show the behaviour suggested by the near miss to Weber’s law; however, the predictions are too low.

Another problem currently related to the adaptation-loop model is that it does not include any particular form of non-linearity that can be directly related to the basilar-membrane non-linearity. This makes it difficult to make the model predict what happens in hearing-impaired persons with outer-hair-cell loss. In order to generate a more realistic model, an auditory filter that simulates the non-linear effects of the basilar membrane should be implemented.

¹ Off-frequency listening is a phenomenon where the listener uses frequency channels not positioned at the actual signal frequency. The listener does this when it is advantageous for the task at hand (it improves the signal-to-noise ratio). The auditory system is assumed to do this automatically.

3.2 The temporal-window model

The temporal-window model was introduced by Oxenham and Moore (1994) and developed further by Oxenham and Plack (1997) and Oxenham (2001). The model was originally intended to explain the phenomena of forward and backward masking in terms of temporal integration. The temporal-window model consists of several preprocessing modules and a decision stage, and it can, like the adaptation-loop model, be used to predict thresholds from AFC tests with human listeners. A schematic drawing of the temporal-window model is shown in Figure 3-15.

Figure 3-15 A block diagram of the temporal-window model. The preprocessing is done by a 4th-order gammatone filter, a full-wave rectifier, a broken-stick non-linearity, a squaring device and a convolution with a window function (the temporal window). Subsequently, a constant, N, is added to the signal to establish an absolute threshold, and the masker-plus-signal to masker ratio is calculated. The decision whether the signal is above threshold is based on the largest value of the ratio. If the ratio is larger than a decision criterion, k, the signal is above threshold.

The general assumption in the temporal-window model is that the decision process in the brain is based on a signal-to-noise ratio in the time domain, somewhat analogous to that described by the power-spectrum model in the frequency domain. In the time domain such a temporal-power model could be written as:

$$k = \frac{\int_{-\infty}^{\infty} s^{2}(t)\, TW(t)\, dt}{\int_{-\infty}^{\infty} N^{2}(t)\, TW(t)\, dt}$$


where t is time, s(t) is the signal at threshold, N(t) is the masker, TW(t) is a temporal window positioned at the maximum amplitude of the signal and k is a constant. The temporal window is assumed to be a weighting function, related to the way signals are integrated in the brain as a function of time.

In the study of Oxenham and Moore (1994) this assumption about the decision criterion was integrated into a model using auditory pre-processing stages of basilar-membrane filtering and hair-cell envelope extraction, a square-law non-linearity and a “sliding” temporal-window integrator. This model was shown to be able to predict both forward and backward masking. Later, in Oxenham and Plack (1997) and Oxenham (2001), the square-law non-linearity was replaced by a “broken-stick” non-linearity related to the basilar-membrane non-linearity.

For this investigation the temporal-window model has been implemented in the AFC framework, such that it can be tested with the same experimental setup as the adaptation-loop model. In the implementation, the temporal-window model only considers the AFC-trial interval containing the signal. The ratio R(Ms)/R(M) is calculated from the pre-processed masker plus signal, R(Ms), and the pre-processed masker alone, R(M). The ratio is a representation of the signal-to-noise ratio at any given time of the stimulus. In the decision process it is assumed that the listener is able to perceive the signal if the ratio at any time exceeds the threshold defined by the constant k. The maximum value of the ratio thus decides whether the signal is above or below threshold. To make the AFC procedure iterate towards the threshold, the model selects an interval not containing the signal if the signal is below threshold. If the signal is above threshold, the model selects the interval containing the signal.

The pre-processing

The model as described in Oxenham (2001) consists of several pre-processing blocks and a decision stage (TWDet). The pre-processing stages consist of a gammatone filter (GT) or filter-bank (GTFB), a full-wave rectifier (FW), a broken-stick non-linearity (NL) and a squaring device (SQ). These stages model the actions of the basilar membrane and hair cells. A window (TW) models the temporal integration assumed to take place in the brain.

Like the adaptation-loop model, the temporal-window model uses a gammatone filter as a simulation of the basilar-membrane frequency-to-place transformation. The same 4th-order gammatone filter was used for the temporal-window model.

A full-wave rectifier rectifies the signal. A full-wave rectifier was used even though half-wave rectification has been shown to be a more realistic model of the hair-cell envelope extraction. Since the model was originally used in the 6-kHz region, where the hair-cell fine structure has been shown to have broken down, full-wave rectification was used for simplicity. If the output of the rectifier is smoothed by a low-pass filter, the outputs of a half-wave rectifier and a full-wave rectifier differ only by a constant factor of 2.

A non-linearity is applied in order to simulate the non-linear compressive behaviour of the basilar membrane.

$$O = \min\left(S_{low} \cdot I,\; S_{high} \cdot I + K_{high}\right)$$


where I and O are the input and output levels in dB (0 dB SPL corresponds to a sinusoid with a peak of 1). The temporal-window model has been used with several different non-linear functions, all of which can be described by this equation. The constants for the different non-linearities are listed in Table 3-1 and the input-output relations are plotted in Figure 3-16. (Appendix B shows some experiments in which the slope of the non-linearity is changed, and the effect on forward masking.)

       S_low   S_high   K_high   Reference
NL1    0.5     0.5      0        Oxenham and Moore (1994)
NL2    0.78    0.16     21.7     Oxenham and Plack (1997)
NL3    1       0.25     26.25    Oxenham (2001)

Table 3-1 The constants for the non-linearity as used throughout the temporal-window articles.
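The broken-stick function, O = min(S_low·I, S_high·I + K_high), and the constants of Table 3-1 can be sketched in a few lines of Python. This is only an illustration operating on levels in dB, not the AFC-framework implementation used in the thesis; the names `NONLINEARITIES`, `broken_stick` and `breakpoint_db` are hypothetical.

```python
# Constants (S_low, S_high, K_high) as listed in Table 3-1.
NONLINEARITIES = {
    "NL1": (0.5, 0.5, 0.0),
    "NL2": (0.78, 0.16, 21.7),
    "NL3": (1.0, 0.25, 26.25),
}

def broken_stick(level_in_db, s_low, s_high, k_high):
    """Output level in dB: O = min(S_low * I, S_high * I + K_high)."""
    return min(s_low * level_in_db, s_high * level_in_db + k_high)

def breakpoint_db(s_low, s_high, k_high):
    """Input level where the two line segments intersect (None if parallel)."""
    if s_low == s_high:
        return None
    return k_high / (s_low - s_high)

# Print the breakpoint and a few input-output pairs for each non-linearity.
for name, constants in NONLINEARITIES.items():
    outputs = [broken_stick(level, *constants) for level in (20, 35, 75)]
    print(name, breakpoint_db(*constants), outputs)
```

With the NL2 and NL3 constants the two segments intersect at an input level of approximately 35 dB, above which the function is compressive. For NL1 both segments coincide, so the function is a single straight line with slope 0.5.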

[Figure placeholder. Abscissa: Input [dB]; ordinate: Output [dB]; legend: Linear; NL1; NL2; NL3; SQ.]

Figure 3-16 Input-output relation of the non-linearities NL1, NL2 and NL3. Also plotted is the response of the squaring SQ.

The output of the non-linearity is squared to transform the signal to units of intensity. The squaring is related to the physiological findings of Yates (1990), who showed a squaring transformation from sound pressure level to neural spike-rate functions in the auditory nerve. After the non-linearities the signal is convolved with the temporal-window impulse response. The temporal window is composed of a single exponential function describing one half of the window and a sum of two exponential functions describing the other half. The temporal-window impulse response is calculated by:

$$W(t) = \begin{cases} (1-w)\, e^{-t/T_{b1}} + w\, e^{-t/T_{b2}}, & t > 0 \\ e^{t/T_{a}}, & t \le 0 \end{cases}$$

where t is the time and Ta, Tb1, Tb2 and w are constants. The exponential function that defines the filter for t ≤ 0 (the acausal part) plays a large role in the prediction of backward masking. The causal part of the filter plays the dominant role in predicting forward masking. Different sets of temporal-window parameters have been derived for the different non-linearities. The parameters for the temporal window are listed in Table 3-2. The impulse response of the temporal window is plotted in Figure 3-17. The parameters for the temporal window were derived to obtain the best fit to forward-masking data. TW1 by Oxenham and Moore (1994) was derived from forward masking at 6000 Hz, TW2 from forward masking at 2000 Hz and 6000 Hz, and TW3 from forward masking at 4000 Hz.
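The window shape can be sketched as follows, assuming W(t) = (1 − w)·exp(−t/Tb1) + w·exp(−t/Tb2) for t > 0 and W(t) = exp(t/Ta) for t ≤ 0, with the TW3 parameters of Table 3-2. The function name, the 1-ms sampling grid and the normalisation by the sum of the samples are assumptions made for illustration; the normalisation is only one possible reading of the remark that the window was normalised to a filter gain of 0 dB.

```python
import math

def temporal_window(t, t_a, t_b1, t_b2, w):
    """Temporal-window weight at time t (in seconds)."""
    if t > 0:
        # Causal part: weighted sum of two decaying exponentials.
        return (1.0 - w) * math.exp(-t / t_b1) + w * math.exp(-t / t_b2)
    # Acausal part: single exponential rising towards t = 0.
    return math.exp(t / t_a)

# TW3 parameters as listed in Table 3-2.
TW3 = dict(t_a=0.0035, t_b1=0.0046, t_b2=0.0166, w=0.170)

# Sample the window from -40 ms to 100 ms in 1-ms steps (assumed grid).
times = [i * 0.001 for i in range(-40, 101)]
window = [temporal_window(t, **TW3) for t in times]

# Scale the samples so that they sum to 1 (DC gain of 0 dB).
total = sum(window)
normalised = [value / total for value in window]

print(max(window), sum(normalised))
```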


[Figure placeholder. Abscissa: time [s]; ordinate: Amp; legend: TW1; TW2; TW3.]

Figure 3-17 Plot of a temporal window with TW1, TW2 and TW3 parameters. Parameters are listed in Table 3-2.

To establish an absolute threshold, the model adds a constant N to the output of the temporal window. N could simulate internal noise in the auditory system.

The decision device

The decision device decides whether a signal is detected or not, based on the R(Ms)/R(M) ratio. This ratio is monotonically related to the signal-to-noise ratio and can, for linear pre-processing stages, be rewritten as R(s)/R(M)+1, where R(s)/R(M) is the signal-to-noise ratio. The threshold is determined by the decision variable k: if the largest value of the ratio is above k, the signal is above threshold.

The model is calibrated by changing k and N so that it shows good forward-masking predictions. N can be derived by calibrating the absolute threshold of the model. k is a free parameter that can be fitted to obtain the best forward-masking prediction. (Appendix B shows how forward masking changes when k is varied.) For each set of non-linearity and temporal-window parameters a different ratio k has been derived. The k ratios that produced the best predictions for the individual temporal windows and their non-linearities are shown in Table 3-2.

The absolute threshold of the model was calibrated so that a 4000-Hz, 6-ms sinusoidal signal gated with 2-ms ramps produced an absolute threshold of 20.9 dB SPL, as in Oxenham (2001). This can be done by passing the 6-ms signal, scaled to 20.9 dB SPL, through the pre-processing stages and taking the maximum value of the output. This maximum value is termed R (R was found to be 15.42 for this implementation using NL3 and TW3). N can be calculated from:

$$N = \frac{R}{k-1}$$

Using this noise N in the model should also predict the absolute threshold for signals of different duration.
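The calibration and the decision rule can be illustrated with a short sketch. It assumes that the constant N is added to both the masker-plus-signal output and the masker-alone output before the ratio is formed, so that at absolute threshold (no masker) the ratio (R + N)/N equals k, which gives N = R/(k − 1). The function names are hypothetical, and the input lists stand for temporal-window outputs that have already been pre-processed.

```python
def internal_noise(r_abs, k):
    """Constant N for which (r_abs + N) / N equals the decision ratio k."""
    if k <= 1.0:
        raise ValueError("the decision ratio k must be larger than 1")
    return r_abs / (k - 1.0)

def signal_detected(r_masker_plus_signal, r_masker, n, k):
    """True if the largest ratio over time exceeds the decision ratio k."""
    ratios = [(ms + n) / (m + n)
              for ms, m in zip(r_masker_plus_signal, r_masker)]
    return max(ratios) > k

# Values quoted in the text for NL3 and TW3.
R_ABS = 15.42
K = 1.67
N = internal_noise(R_ABS, K)
print(round(N, 2))

# Without a masker, an output slightly above R is detected and one
# slightly below R is not.
print(signal_detected([0.0, 16.0], [0.0, 0.0], N, K))
print(signal_detected([0.0, 15.0], [0.0, 0.0], N, K))
```

With R = 15.42 and k = 1.67 this gives N of approximately 23.0.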


Non-linearity   S_low   S_high   K_high   Temporal window¹   Ta [s]    Tb1 [s]   Tb2 [s]   w       Ratio k   Reference
NL1             0.5     0.5      0        TW1                0.0035    0.004     0.029     0.025   2.75      Oxenham and Moore (1994)
NL2             0.78    0.16     21.7     TW2                0.0035    0.0031    0.021     0.206   1.62      Oxenham and Plack (1997)
NL3             1       0.25     26.25    TW3                0.0035    0.0046    0.0166    0.170   1.67      Oxenham (2001)

Table 3-2 The different non-linearities, temporal-window parameters and decision ratios. Parameters were derived for the best fit to forward-masking predictions in the different studies.

3.2.1 Verification of the model implementation

The current implementation of the model was verified in conditions of forward masking with the same parameters as in Oxenham (2001) (NL3, TW3 and k = 1.67). The Oxenham (2001) study shows forward-masking data for signal durations from 4 to 102 ms; the signal durations of 6 ms and 12 ms were chosen for the verification of the new implementation of the temporal-window model. The experiment is described in section 2.3. One difference should be noted in the comparison with the Oxenham predictions: the predictions in Oxenham (2001) were based on signals with a flat noise envelope, and not on fluctuating noise envelopes as in this study. The effect of this should, however, be negligible for the predictions at the temporal positions considered here.

[Figure placeholder. Plot title: Forward masking 4000 Hz, 12ms, signal; abscissa: Offset-Offset time [s]; ordinate: Masked threshold [dB SPL]; legend: TW3 Oxenham2001 Measured data; TW3 Oxenham2001 Model data; Model: GT FW NL3 SQ TW3 TWDet k=1.67.]

Figure 3-18 Oxenham model prediction and measured data for a 12-ms signal compared to the prediction from the current implementation.

[Figure placeholder. Plot title: Forward masking 4000 Hz, 6ms, signal; abscissa: Offset-Offset time [s]; ordinate: Masked threshold [dB SPL]; legend: TW3 Oxenham2001 Measured data; TW3 Oxenham2001 Model data; Model: GT FW NL3 SQ TW3 TWDet k=1.67.]

Figure 3-19 Oxenham model predictions and measured data for a 6-ms signal compared to the prediction from the current implementation.

Figure 3-18 and Figure 3-19 show the predictions of the current model (○) compared to the old predictions (□) from Oxenham (2001). The predictions of the current implementation are within 2 to 3 dB of the old predictions. These relatively small deviations lead to the assumption that the model was implemented correctly.

¹ The temporal window in this implementation was normalised so the gain of the filter was 0 dB.


3.2.2 Tests with the temporal-window model

The temporal window model was explicitly developed to explain non-simultaneous masking. In this thesis the model has been extended to show simultaneous masking and intensity discrimination, to offer a direct comparison to the adaptation-loop model.

[Figure placeholder. Plot title: Forward masking 1000 Hz; abscissa: Offset-Offset time [s]; ordinate: Masked threshold [dB SPL]; legend: Model: GT FW NL3 SQ TW3 TWDet; KS Left ear; OH Left ear; OH Right ear; TD Left ear.]

Figure 3-20 Forward masking with a 10-ms, 1000-Hz sinusoid signal. The experiment is described in section 2.2.

[Figure placeholder. Plot title: Forward masking 4000 Hz; abscissa: Offset-Offset time [s]; ordinate: Masked threshold [dB SPL]; legend: Model: GT FW NL3 SQ TW3 TWDet; KS Left ear; OH Right ear.]

Figure 3-21 Forward masking with a 12-ms, 4000-Hz sinusoid signal. The experiment is described in section 2.3.

In Figure 3-20 and Figure 3-21 forward-masking predictions at 1000 Hz and 4000 Hz are shown (○). The predictions were obtained using NL3, TW3 and k = 1.67. The predictions at 4000 Hz fit the data for subject KS and for the right ear of subject OH well in non-simultaneous masking (within 5 dB). In simultaneous-masking conditions the model predicts too high thresholds in comparison to the measured data. The forward-masking predictions at 1000 Hz are also too high in simultaneous masking, as was the case for the 4000-Hz predictions. The predictions in the 10-ms to 30-ms region show too little masking. Other preliminary experiments have shown that, using different parameters for the temporal window, the predictions at 1000 Hz can be fitted to the data. Using these parameters, however, the predictions in the 4000-Hz condition did not fit. If this approach is used, a different window shape for different frequencies seems to be necessary to make good predictions for both experiments.

Intensity-discrimination predictions have been performed using a single channel (GT) centred on the signal frequency and a multi-channel model using a filter-bank (GTFB) containing filters from 500 to 4000 Hz (matching the filter-bank used in the adaptation-loop model tests). Intensity discrimination was tested for a 1000-Hz pure tone and for white noise (20-7000 Hz).


[Figure placeholder. Plot title: Intensity discrimination threshold in 1000 Hz tone; abscissa: A [dB]; ordinate: ∆L = 20 ⋅ log((∆A+A)/A), where ∆A = m⋅A; legend: Model: GT HW NL3 SQ TW3 TWDet; Model: GTFB HW NL3 SQ TW3 TWDet; OH mean.]

Figure 3-22 Intensity discrimination for a 1000-Hz tone. GT refers to a single gammatone filter with a centre frequency (CF) of 1000 Hz. GTFB refers to a filter bank with filter CFs from 500 to 4000 Hz. The experiment is described in section 2.4.

[Figure placeholder. Plot title: Intensity discrimination threshold in noise; abscissa: A [dB]; ordinate: ∆L = 20 ⋅ log((∆A+A)/A), where ∆A = m⋅A; legend: Model: GT HW NL3 SQ TW3 TWDet; Model: GTFB HW NL3 SQ TW3 TWDet; OH mean.]

Figure 3-23 Intensity discrimination for a white noise, band-limited to frequencies between 20 and 7000 Hz. See the text of Figure 3-22. The experiment is described in section 2.5.

Intensity-discrimination predictions for a pure tone (Figure 3-22) using a single frequency channel centred on the 1000-Hz tone (○) show a dramatic increase in the discrimination threshold when the signal level enters the compressive region of the non-linearity (30 dB). The increase is removed in the predictions of the multi-channel model (×). The difference between the single-channel and the multi-channel predictions can be explained by a mechanism that resembles off-frequency listening in human listeners. When the channel centred on the frequency of the tone reaches the level where the non-linearity begins to compress the signal, as seen in the single-channel predictions, the neighbouring frequency channels are still in the linear region because the signal is attenuated by the gammatone filters. These channels show larger R(Ms)/R(M) ratios than the channel where the signal is in the compressive region. The model predicts the signal to be above threshold if the ratio exceeds the decision ratio at only one instant in time in one of the channels; the intensity-discrimination threshold thus settles at the minimum level of the single-filter predictions. The mechanism works until all frequency channels of the model have signal levels in the compressive region. This happens at 100 dB, where the thresholds of the multi-channel model increase.

Intensity discrimination for noise (Figure 3-23) does not show the same flat intensity-discrimination threshold for the multi-channel model (×) as for the pure tone. In white noise all channels enter the compressive region at the same level, because the spectrum of the signal is theoretically flat. The multi-channel model does show lower predictions than the single-channel model (○); this is due to spectral and envelope fluctuations of the noise.

A common problem for both 1000-Hz and 4000-Hz forward masking is that the model does not produce good predictions in conditions of simultaneous masking. The parameters in Oxenham (2001) were derived only from non-simultaneous data. To investigate whether the temporal-window model could predict simultaneous as well as non-simultaneous data, the parameters of the model were changed. The parameters for the improved fit (TW4) are shown in Table 3-3.


Non-linearity   S_low   S_high   K_high   Temporal window¹   Ta [s]    Tb1 [s]   Tb2 [s]   w       Ratio k
NL3             1       0.25     26.25    TW3                0.0035    0.0046    0.0166    0.170   1.67
NL3             1       0.25     26.25    TW4                0.0035    0.0161    0.0166    0.370   1.20

Table 3-3 The non-linearity, temporal-window and decision-ratio parameters derived for the best fit to forward-masking predictions.

Forward-masking predictions with TW4 are shown in Figure 3-24 and Figure 3-25. Intensity-discrimination predictions for the TW4 parameters are shown in Figure 3-26 and Figure 3-27.

[Figure placeholder. Plot title: Forward masking 1000 Hz; abscissa: Offset-Offset time [s]; ordinate: Masked threshold [dB SPL]; legend: Model: GT FW NL3 SQ TW4 TWDet; Model: GT FW NL3 SQ TW3 TWDet; KS Left ear; OH Left ear; OH Right ear; TD Left ear.]

Figure 3-24 Forward masking using a 1000-Hz signal. Predictions of TW4 using k=1.2 are plotted with predictions of TW3 with k=1.67.

[Figure placeholder. Plot title: Forward masking 4000 Hz; abscissa: Offset-Offset time [s]; ordinate: Masked threshold [dB SPL]; legend: Model: GT FW NL3 SQ TW4 TWDet; Model: GT FW NL3 SQ TW3 TWDet; KS Left ear; OH Right ear.]

Figure 3-25 Forward masking using a 4000-Hz signal. Predictions of TW4 using k=1.2 are plotted with predictions of TW3 with k=1.67.

[Figure placeholder. Plot title: Intensity discrimination threshold in 1000 Hz tone; abscissa: A [dB]; ordinate: ∆L = 20 ⋅ log((∆A+A)/A), where ∆A = m⋅A; legend: Model: GT FW NL3 SQ TW4 TWDet; Model: GTFB FW NL3 SQ TW4 TWDet; OH mean.]

Figure 3-26 Intensity discrimination for a 1000-Hz tone. GT refers to a single gammatone filter with a CF of 1000 Hz. GTFB refers to a filter bank with filter CFs from 500 to 4000 Hz. The experiment is described in section 2.4.

[Figure placeholder. Plot title: Intensity discrimination threshold in noise; abscissa: A [dB]; ordinate: ∆L = 20 ⋅ log((∆A+A)/A), where ∆A = m⋅A; legend: Model: GT FW NL3 SQ TW4 TWDet; Model: GTFB FW NL3 SQ TW4 TWDet; OH mean.]

Figure 3-27 Intensity discrimination for white noise, band-limited to frequencies between 20 and 7000 Hz. See the text of Figure 3-26. The experiment is described in section 2.5.

¹ The temporal window in this implementation was normalised so the gain of the filter was 0 dB.


The predictions obtained with the new parameter set give better results during simultaneous masking in the forward-masking experiment. More experiments should probably be performed: because the temporal-window shape influences the temporal integration of the signal, forward-masking experiments using signals of different durations could be affected. The temporal-window parameters from Oxenham (2001) were derived from a larger number of experiments using signals of different durations; as such they are perhaps better parameters for the model when only non-simultaneous data are considered. The intensity-discrimination predictions show that the changed decision ratio k results in a decrease of the intensity-discrimination threshold to about 1 dB. This is more in line with literature results such as those shown in section 1.2.2. The problem of increased intensity-discrimination thresholds for noise when signal levels reach the compressive region of the non-linearity remains.
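The ordinate of the intensity-discrimination figures is ∆L = 20·log((∆A + A)/A) with ∆A = m·A, which reduces to ∆L = 20·log(1 + m). As a small worked example, assuming that the logarithm is to base 10 and using hypothetical function names, the conversion between the amplitude fraction m and ∆L in dB can be written as:

```python
import math

def delta_l_db(m):
    """Level difference in dB for an amplitude increment of m * A."""
    return 20.0 * math.log10(1.0 + m)

def amplitude_fraction(delta_l):
    """Amplitude fraction m that corresponds to a level difference in dB."""
    return 10.0 ** (delta_l / 20.0) - 1.0

# Amplitude fraction for a threshold of 1 dB.
print(round(amplitude_fraction(1.0), 3))
```

A threshold of about 1 dB thus corresponds to an amplitude increment of roughly 12%.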

3.2.3 Discussion of the temporal window model

The overall results for the temporal-window model show that it can predict forward masking at 4000 Hz, like the adaptation-loop model. Forward masking at 1000 Hz shows a similar trend to that of the adaptation-loop model. The temporal-window model seems unable to use the same parameters for 1000 Hz and 4000 Hz. Perhaps the problem could be solved in both models by using a peripheral filter other than the gammatone filter. The literature (Oxenham and Moore, 1994) also suggests that the ringing of the auditory filters below 1000 Hz does influence forward masking; for this reason, experiments with the temporal-window model have always been conducted at 2000 Hz or higher frequencies. The preliminary tests have shown that the temporal-window model can be fitted to the experimental data if the parameters are fitted individually at the different frequencies.

The parameters derived previously in Oxenham (2001) showed too much masking in simultaneous conditions. The Oxenham parameters were derived to predict non-simultaneous data. Using a different set of parameters for the model shows that the simultaneous predictions can be improved without disturbing the non-simultaneous predictions in these experiments. It has yet to be verified whether the alternative set of parameters works with signal durations other than 12 ms, such as those used in Oxenham (2001).


4 Model modification

Further investigations of the two models can perhaps show how the models account for forward masking, and why they are both able to predict forward masking. Both models contain comparable elements: auditory filters, envelope extraction and decision devices. The implementations of these elements do, however, differ. To eliminate any effects that the differing implementations of the shared elements have on forward masking, the elements have been standardized.

4.1 Auditory filter

The adaptation-loop model is built with a linear gammatone filter. This filter does not display any form of amplification of low-level signals or compression of mid-level signals. The temporal-window model does implement a non-linearity that compresses mid-level and high-level signals, much like that found in the cochlea. To standardize the mechanism related to the auditory filtering, a realistic filter or filter-bank that simulates both the frequency response and the compressive nature of the cochlea must be chosen.

Many models of the basilar membrane exist. In general they fall into two categories: point models and transmission-line models. To be used with the adaptation-loop model and the temporal-window model, the basilar-membrane model should be relatively simple, both computationally and conceptually. Transmission-line models tend to be computationally heavy in experiments where only a single frequency channel needs to be considered. In these applications a point model like the gammatone filter proves much more efficient.

As a model of the auditory filter with non-linear behaviour, the dual resonance non-linear (DRNL) filter was chosen. The DRNL was presented by Meddis, O’Mard and Lopez-Poveda (2001). It is a point model, so it can directly replace the gammatone filter used in the original models. The DRNL is built on animal data from measurements of the dynamic behaviour of the basilar membrane; as such it displays many of the phenomena known to take place in the cochlea. The DRNL comprises two signal paths, one describing the linear characteristics of the cochlea and one describing the active amplification of low-level signals (Figure 4-1).


Gammatone filter Gammatone filter Non-linearity Lowpass filter

Linear gain Lowpass filter Gammatone filter

Figure 4-1 Block diagram of the dual resonance non-linear

(DRNL) filter. The DRNL is compromised of a linear path

(top), and a non-linear path (bottom). Linear path has a linear

gain, a gammatone filter and a low-pass filter. Nonlinear path

has a gammatone filter, a broken stick non-linearity, a

gammatone filter and a low-pass filter. Paths are added

together at the output of the filter.

[Plot: 1000 Hz sinusoid. x-axis: Amp [dB SPL]; y-axis: Output amp [dB]. Curves: DRNL output; DRNL linear path; DRNL non-linear path.]

Figure 4-2 The steady-state input-output relation of a 1000-Hz sinusoid passed through the 1000-Hz centre-frequency (CF) DRNL filter. The response is the sum of two paths, a linear and a non-linear path.

[Plots: Input 30 dB and Input 75 dB. x-axis: Freq [Hz]; y-axis: Amp [dB]. Curves: DRNL; Gammatone.]

Figure 4-3 Frequency responses of the DRNL filter at 30 dB SPL and 75 dB SPL. The frequency response is normalized so the gain at 1000 Hz is 0 dB. The frequency response of the gammatone filter is plotted as reference (dashed line).

The most important feature of the filter is that it simulates the compressive properties of the cochlea. Figure 4-2 displays the steady-state input-output relation of the DRNL filter for a 1000-Hz sinusoidal signal. The use of the DRNL filter results in a compressive function in the signal path for pure-tone signals from 40 dB SPL to 85 dB SPL. The DRNL is also able to simulate the reduction in frequency selectivity at high signal levels: the frequency response of the filter is broadened (Figure 4-3), and the peak of the filter is shifted toward lower frequencies. The DRNL filter is described in detail in Appendix C.
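The steady-state behaviour described above can be illustrated with a minimal static sketch. It assumes a broken-stick non-linearity of the form min(a·x, b·x^c) for the non-linear path and a plain gain g for the linear path, and it ignores all filtering and phase. The parameter values are hypothetical, chosen only so that compression starts near 40 dB and the linear path takes over near 85 dB; they are not the parameters of Appendix C.

```python
import math

# Hypothetical parameters (assumptions for illustration, not taken from Appendix C).
C_EXP = 0.25                                         # compression exponent
A_GAIN = 1.0                                         # low-level gain of the non-linear path
B_GAIN = A_GAIN * 100.0 ** (1.0 - C_EXP)             # puts the broken-stick knee at 40 dB
G_LIN = B_GAIN * (10.0 ** (85.0 / 20.0)) ** (C_EXP - 1.0)  # the two paths cross at 85 dB


def db_to_amp(level_db):
    """Amplitude relative to the 0-dB reference."""
    return 10.0 ** (level_db / 20.0)


def drnl_static_output_db(level_db):
    """Steady-state output level [dB] of the two summed paths (filters and phase ignored)."""
    x = db_to_amp(level_db)
    nonlinear = min(A_GAIN * x, B_GAIN * x ** C_EXP)  # broken-stick non-linearity
    linear = G_LIN * x
    return 20.0 * math.log10(linear + nonlinear)


def growth_db_per_db(level_db, step=1.0):
    """Local slope of the input-output function in dB per dB."""
    upper = drnl_static_output_db(level_db + step)
    lower = drnl_static_output_db(level_db)
    return (upper - lower) / step


for level in (20, 60, 95):
    print(level, round(growth_db_per_db(level), 2))
```

Under these assumptions the slope is close to 1 dB/dB below the knee, falls toward the compression exponent in the mid-level region, and rises again at high levels where the linear path dominates.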

4.2 Hair-cell envelope extraction

The temporal-window model uses full-wave rectification as envelope extraction. Full-wave rectification does not display the fine-structure properties seen in hair-cell receptor potentials (Pickles, 1988), and can only be seen as a good approximation of the envelope extraction at high frequencies. The adaptation-loop model uses a more realistic model of the hair-cell envelope extraction: a half-wave rectification followed by a 1000-Hz low-pass filter. This mechanism models the fine structure seen in hair cells at different positions of the basilar membrane. For frequencies below 1000 Hz the hair-cell fine structure is seen at the output of the low-pass filter, and at higher frequencies the fine structure is smeared by the low-pass filter. The temporal-window model was therefore modified so it uses the more realistic half-wave rectification and low-pass filter from the adaptation-loop model. The temporal-window model also uses a squaring expansion that could be seen to simulate the intensity relation between the basilar-membrane response and the action potentials measured in the auditory nerve (Yates et al, 1990). The adaptation-loop model has also been tested with the squaring expansion.
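The envelope extraction stage can be sketched as follows. This is a minimal illustration, assuming a first-order (one-pole) low-pass filter; the text only specifies the 1000-Hz cutoff, so the filter order, the sampling rate and the helper `relative_ripple` are assumptions made here.

```python
import math


def half_wave_lowpass(samples, fs, cutoff_hz=1000.0):
    """Half-wave rectification followed by a first-order low-pass filter.

    The filter order is an assumption; only the cutoff frequency is given in the text.
    """
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)  # one-pole coefficient, unit gain at DC
    out, state = [], 0.0
    for s in samples:
        rectified = max(s, 0.0)                    # half-wave rectification
        state = (1.0 - a) * rectified + a * state  # low-pass filtering
        out.append(state)
    return out


def relative_ripple(tone_hz, fs=48000, duration=0.05):
    """Peak-to-peak ripple divided by the mean, over the last 10 ms of the output."""
    n = int(fs * duration)
    tone = [math.sin(2.0 * math.pi * tone_hz * i / fs) for i in range(n)]
    tail = half_wave_lowpass(tone, fs)[-int(0.01 * fs):]
    return (max(tail) - min(tail)) / (sum(tail) / len(tail))


print(relative_ripple(500.0), relative_ripple(4000.0))
```

The remaining ripple is larger for the 500-Hz tone than for the 4000-Hz tone, which mirrors the statement that fine structure is preserved at low frequencies and smeared at high frequencies.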

4.3 Decision mechanism

The temporal-window model has a decision mechanism that only takes into account one instant in time. If the R(MS)/R(M) ratio at any instant reaches the decision ratio k, the signal is detected as above threshold. Oxenham and Moore (1994) suggest that this decision mechanism is not a good approximation for some experiments where forward and backward masking are combined in the same stimulus (combined masking). As a solution to this problem they suggest a “multiple look” decision mechanism that uses a window function to weight several instants in time in an optimal way. The adaptation-loop model from Dau et al (1996) incorporates a decision device that matches the one described by Oxenham and Moore (1994). The template of the adaptation-loop model is a window function that is optimised to weight different instants in time to achieve the best signal-to-noise ratio. The decision process is thus based on optimal detector theory. As a model of the decision mechanism, the optimal detector from Dau et al (1996) was chosen.
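The difference between the two decision rules can be sketched in a few lines. This is a simplified illustration and not the implementation used in the thesis: internal noise and the threshold criterion are left out, and the function names and the toy numbers are hypothetical.

```python
import math


def single_look_detects(r_ms, r_m, k):
    """Single-look rule: detect if R(MS)/R(M) reaches k at any one instant."""
    return any(ms / m >= k for ms, m in zip(r_ms, r_m))


def make_template(supra_ms, masker_alone):
    """Template: unit-energy difference between a clearly audible
    masker-plus-signal representation and the masker-alone representation."""
    diff = [a - b for a, b in zip(supra_ms, masker_alone)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def template_decision_variable(current, masker_alone, template):
    """Multiple-look rule: weight every instant of the difference by the template."""
    return sum((c - m) * t for c, m, t in zip(current, masker_alone, template))


# Toy internal representations (hypothetical numbers).
masker = [1.0, 1.0, 1.0, 1.0]
shape = [0.0, 1.0, 2.0, 1.0]
template = make_template([m + 10.0 * s for m, s in zip(masker, shape)], masker)
weak = [m + 0.1 * s for m, s in zip(masker, shape)]
decision = template_decision_variable(weak, masker, template)
```

The single-look rule depends on one sample only, whereas the template rule accumulates evidence over all instants where the template is non-zero.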


5 Modified models

In this section, the predictions of two modified versions of the adaptation-loop model and one modified version of the temporal-window model are discussed and compared. In the modified adaptation-loop model the gammatone filter is replaced by the DRNL filter (section 5.1). The modified adaptation-loop model is also tested with a squaring device so that it more closely resembles the temporal-window model (section 5.2). The temporal-window model is modified so it uses the optimal detector as realised by Dau et al (1996). Further, the gammatone filter and the broken-stick non-linearity are replaced by the DRNL, and the full-wave rectification is replaced by a half-wave rectification followed by a low-pass filter (section 5.3).

5.1 The adaptation-loop model with DRNL

The adaptation-loop model was modified so it used a DRNL filter instead of a gammatone filter. A schematic drawing of the model is shown in Figure 5-1.

Figure 5-1 Block diagram of the adaptation-loop model using a DRNL filter or filter-bank as peripheral auditory filter.

The replacement of the gammatone filter with a DRNL filter influences the forward-masking and intensity-discrimination predictions in various ways. The following effects should be kept in mind when analysing the results. The non-linear behaviour changes the overall steady-state transformation of the model, which in the original model was approximately a log transformation (see section 3.1 about adaptation loops). The new steady-state relation can be seen in Figure 5-2. It can be seen that the approximately linear transformation from dB to MU no longer holds for the modified adaptation-loop model. Another important effect is the reduced frequency selectivity at higher sound pressure levels.


[Plot: 1000 Hz sinusoid. x-axis: Amp [dB SPL]; y-axis: Output [MU]. Curves: Log transform; GT HWLP AD; DRNL HWLP AD.]

Figure 5-2 The overall steady-state transformation of the adaptation-loop model with a non-linear DRNL filter. The graph displays the output of the adaptation loops in model units [MU] as a function of a pure-tone input to the model.

[Plot: Input 30 dB. x-axis: Freq [Hz]; y-axis: Amp [dB]. Curves: DRNL; Gammatone.]

Figure 5-3 Auditory filters of the DRNL filter-bank from 500 to 4000-Hz CF with ERB spacing of 1. Used in the intensity detection experiment with the DRNL-Dau model. Centre frequencies: 569, 660, 761, 874, 1000, 1140, 1296, 1469, 1663, 1879, 2119, 2387, 2685, 3017, 3387 and 3799 Hz.
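The centre frequencies listed in the caption of Figure 5-3 can be approximated with a short sketch of 1-ERB spacing anchored at 1000 Hz. The ERB-rate formula and its constants are assumptions taken from the commonly used Glasberg and Moore (1990) scale; the text does not state which formula was used, and the function names are hypothetical.

```python
import math

# Glasberg and Moore (1990) ERB-rate constants; assumed here, not stated in the text.
EAR_Q = 9.26449
MIN_BW = 24.7


def hz_to_erb_number(f_hz):
    """Frequency in Hz to ERB number."""
    return EAR_Q * math.log(1.0 + f_hz / (EAR_Q * MIN_BW))


def erb_number_to_hz(erb_number):
    """ERB number back to frequency in Hz."""
    return EAR_Q * MIN_BW * (math.exp(erb_number / EAR_Q) - 1.0)


def centre_frequencies(f_ref=1000.0, f_lo=500.0, f_hi=4000.0, spacing=1.0):
    """Centre frequencies spaced `spacing` ERB apart, anchored at f_ref, within [f_lo, f_hi]."""
    e_ref = hz_to_erb_number(f_ref)
    k_lo = math.floor((e_ref - hz_to_erb_number(f_lo)) / spacing)
    k_hi = math.floor((hz_to_erb_number(f_hi) - e_ref) / spacing)
    return [erb_number_to_hz(e_ref + k * spacing) for k in range(-k_lo, k_hi + 1)]


cfs = centre_frequencies()
print([round(f, 1) for f in cfs])
```

With these assumed constants the sketch yields 16 channels that agree with the listed values to within about 1 Hz.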

5.1.1 Tests with the adaptation-loop model with DRNL

The forward-masking and intensity-discrimination predictions are made in order to compare them with the predictions from the original model.

[Plot: Forward masking 1000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: DRNL HWLP AD LP OptDet; Model: GT HWLP AD LP OptDet; KS Left ear; OH Left ear; OH Right ear; TD Left ear.]

Figure 5-4 Forward masking at 1000 Hz. Predictions from the adaptation-loop model using a DRNL filter as peripheral auditory filter are plotted with the predictions from the unmodified model.

[Plot: Forward masking 4000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: DRNL HWLP AD LP OptDet; Model: GT HWLP AD LP OptDet; KS Left ear; OH Right ear.]

Figure 5-5 Forward masking at 4000 Hz. Predictions from the adaptation-loop model using a DRNL filter as peripheral auditory filter are plotted with the predictions from the unmodified model.

Figure 5-4 and Figure 5-5 show forward-masking predictions (○) compared to previous predictions (···) and experimental data. It can be seen how the DRNL influences the adaptation-loop model. For simultaneous masking the threshold is increased. For the transition from simultaneous to non-simultaneous masking, the threshold curve gets steeper. For the non-simultaneous-masking conditions, the threshold curve is lower in the beginning but converges towards the same threshold. The raised masked threshold in simultaneous-masking conditions can be explained by the signal being in a level region where the DRNL filter is broader than the original gammatone filter, passing more energy from the masker through. The power-spectrum model can explain the raised thresholds, because the signal-to-noise ratio at the output is decreased. In the non-simultaneous conditions, the signal threshold lies in a region where the DRNL compression no longer influences the signal (the signal level is lower than the 40 dB where the compression starts). The masker, with a level of 77 dB SPL, is still influenced by the compression. Because of the added compression, the total power of the masker is reduced relative to the gammatone-filter implementation. The charge on the capacitors of the adaptation loops is thus smaller, which in turn leads to less suppression of the signal. The absolute threshold remains the same as before the introduction of the DRNL. The explanation for this can be found in the lower linear region of the steady-state transformation (Figure 5-2): for a signal near absolute threshold, the steady-state relation is equivalent to the one found in the unmodified model.

[Plot: Intensity discrimination threshold in 1000 Hz tone. x-axis: A [dB]; y-axis: ∆L = 20·log((∆A+A)/A), where ∆A = m·A. Curves: Model: DRNL HWLP AD LP OptDet; Model: DRNLFB HWLP AD LP OptDet; OH mean.]

Figure 5-6 Intensity discrimination for a 1000-Hz sinusoid. Predictions are made using a single-channel model (DRNL) and a multi-channel model (DRNLFB). The frequency channels in the multi-channel model were placed between 500 and 4000 Hz.

[Plot: Intensity discrimination threshold in noise. x-axis: A [dB]; y-axis: ∆L = 20·log((∆A+A)/A), where ∆A = m·A. Curves: Model: DRNL HWLP AD LP OptDet; Model: DRNLFB HWLP AD LP OptDet; OH mean.]

Figure 5-7 Intensity-discrimination predictions from the adaptation-loop model using a DRNL filter as peripheral auditory filter. See caption in Figure 5-6 for legend explanation.

Figure 5-6 and Figure 5-7 show intensity discrimination for a pure tone and for noise. In the intensity-discrimination tasks the effect of introducing the DRNL is seen clearly. For model predictions with a 1000-Hz pure tone, a well-defined bump is observed (○). When the signal level enters the compressive region of the DRNL, the intensity-discrimination threshold increases. When the signal is within the compressive region, a 1-dB increment in level will not result in an increment of 1 MU at the output of the preprocessing stage (Figure 5-2), as it does in the original model with the linear gammatone filter. Instead, the compressive region of the DRNL compresses the increment down to 0.25 MU. The decision stage will then need an increment that is 4 times larger in this region to be able to detect it. Above or below the compressive region the effect disappears, because the DRNL is linear in these regions. Predictions here are equal to those of the gammatone filter.
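The factor of four follows directly from the slope of the steady-state relation. As a worked sketch, assume the detector needs a fixed output change of 1 MU, and let s denote the local slope of the steady-state relation in MU per dB (s and the subscripts are notation introduced here, not taken from the original). The first line restates the ordinate of the intensity-discrimination plots.

```latex
\Delta L = 20 \log_{10}\!\left(\frac{\Delta A + A}{A}\right) = 20 \log_{10}(1 + m),
\qquad \Delta A = m \cdot A

\Delta L_{\mathrm{thr}} = \frac{1~\mathrm{MU}}{s},
\qquad
s = 1~\mathrm{MU/dB} \;\Rightarrow\; \Delta L_{\mathrm{thr}} = 1~\mathrm{dB},
\qquad
s = 0.25~\mathrm{MU/dB} \;\Rightarrow\; \Delta L_{\mathrm{thr}} = 4~\mathrm{dB}
```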


When using a model implementation with multiple filters (Figure 5-3), the bump is completely removed (×). This is an indication that the model uses off-frequency listening to detect the increment. In intensity discrimination for pure tones, the frequency channels next to the filter positioned at the signal frequency can be used in the detection process when the signal level enters the compressive region of the on-frequency filter. In predictions of intensity discrimination for white noise, the bump is also observed, both when the model uses a single filter (○) and when it uses a filter-bank (×), though it is not quite as well defined as in the pure-tone condition. The reduction of the bump is quite probably due to the stochastic nature of the white noise. Envelope fluctuations of the noise may result in the level sometimes being below the compressive threshold, enabling the model to detect the increment. The use of more frequency bands increases the probability of the model having a signal below the compressive region.

5.2 The adaptation-loop model with DRNL and squaring

The adaptation-loop model with DRNL and squaring device has been implemented to provide a more direct comparison to the temporal-window model. The squaring expansion was also expected to counteract the added compression from the DRNL, and perhaps improve the poor predictions obtained after introducing the DRNL.

Figure 5-8 Block diagram of the adaptation-loop model with DRNL and expansion. The decision mechanism is the same as in the unmodified implementation.

The squaring device was implemented so that a signal of amplitude $10^{-5}$ (corresponding to a 0 dB SPL signal in the model) at the input of the squaring device yields a signal of amplitude $10^{-5}$ at the output. This is important for the adaptation loops, which limit the signal to a minimum of $10^{-5}$. The squaring expansion is:

$$ y = x^{2} \cdot 10^{5} $$

where x is the input and y is the output. The squaring device influences the overall steady-state transformation, since the squaring expands the signal at the output of the preprocessing stage. The steady-state relations from input to output of the model are plotted in Figure 5-9. It is shown how an increment of 1 dB with the squaring device yields roughly 2 MU at the output for the previously linear parts of the steady-state relation.
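The two properties of the squaring device, leaving the 0-dB reference amplitude unchanged and doubling levels expressed in dB, can be checked with a minimal sketch. The function names are hypothetical; only the expansion itself and the reference amplitude come from the text.

```python
import math

REF = 1e-5  # amplitude corresponding to 0 dB SPL in the model


def squaring_expansion(x):
    """Squaring device: y = x^2 * 10^5, which maps the 0-dB amplitude onto itself."""
    return x * x * 1e5


def level_db(amplitude):
    """Level in dB relative to the 10^-5 reference amplitude."""
    return 20.0 * math.log10(amplitude / REF)


for level in (0.0, 30.0, 60.0):
    x = REF * 10.0 ** (level / 20.0)
    print(level, level_db(squaring_expansion(x)))
```

Every input level L in dB maps to 2L at the output of the squaring device, which is why a 1-dB increment produces roughly twice the change after the adaptation loops.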

[Plot: 1000 Hz sinusoid. x-axis: Amp [dB SPL]; y-axis: Output [MU]. Curves: Log transform; DRNL HWLP AD; DRNL HWLP SQ AD.]

Figure 5-9 The overall steady-state transformation of the adaptation-loop model with DRNL filter and squaring device. The graph displays the output of the adaptation loops in model units [MU] as a function of a pure-tone input to the model.

5.2.1 Tests with the adaptation-loop model with DRNL and squaring

As before, predictions are made for forward masking and intensity discrimination, to show how the changes influence the predictions.

[Plot: Forward masking 1000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: DRNL HWLP SQ AD LP OptDet; Model: GT HWLP AD LP OptDet; KS Left ear; OH Left ear; OH Right ear; TD Left ear.]

Figure 5-10 Forward-masking predictions at 1000 Hz from the adaptation-loop model using a DRNL filter as peripheral auditory filter and a squaring device before the adaptation loops.

[Plot: Forward masking 4000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: DRNL HWLP SQ AD LP OptDet; Model: GT HWLP AD LP OptDet; KS Left ear; OH Right ear.]

Figure 5-11 Forward-masking predictions at 4000 Hz. See Figure 5-10 text.

Figure 5-10 and Figure 5-11 show the forward-masking predictions for the modified adaptation-loop model using the additional expansion (○). Since the introduction of the compressive DRNL led to an increased steepness of the forward-masking prediction, it is to be expected that a subsequent expansion should counteract the compression slightly. The masking curve has indeed become less steep after the introduction of the expansion. In fact, the predictions fall almost on top of the previous predictions (···) for the section between 5 ms and 100 ms. The masked thresholds in the simultaneous-masking condition are still raised. An improvement of the absolute threshold has also been tested, by raising the minimum limit of the adaptation loops from $10^{-5}$ (0 dB) to $2.5 \cdot 10^{-5}$ (8 dB). Using this limit in the adaptation loops (AD8), an improvement of the thresholds around 150 ms was obtained. The forward-masking results at 4000 Hz are shown in Figure 5-12.
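As a quick check of the quoted levels, the raised minimum limit can be converted to dB relative to the 0-dB reference amplitude of the model. The symbol L_min is notation introduced here for illustration.

```latex
L_{\min} = 20 \log_{10}\!\left(\frac{2.5 \cdot 10^{-5}}{10^{-5}}\right)
         = 20 \log_{10}(2.5) \approx 7.96~\mathrm{dB} \approx 8~\mathrm{dB}
```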

[Plot: Forward masking 4000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: DRNL HWLP SQ AD8 LP OptDet; Model: GT HWLP AD LP OptDet; KS Left ear; OH Right ear.]

Figure 5-12 Forward-masking predictions at 4000 Hz, with the minimum limit before adaptation changed from $10^{-5}$ (0 dB) to $2.5 \cdot 10^{-5}$ (8 dB). The changed adaptation loops are abbreviated AD8.

Figure 5-13 and Figure 5-14 show intensity discrimination for a 1000-Hz sinusoid and for white noise. The intensity-discrimination thresholds are seen to have dropped to about half of those seen in the experiment with the model without squaring (Figure 5-6 and Figure 5-7). The drop can be explained by the steady-state relation of the model (Figure 5-9), whose slope is increased by a factor of two for the level range 0 to 100 dB SPL. An increase of 1 dB, in the linear region of the DRNL, results in a change of 2 MU at the output. Since the detector is still adjusted to detect increments of 1 MU, the threshold decreases to half that seen previously.


[Plot: Intensity discrimination threshold in 1000 Hz tone. x-axis: A [dB]; y-axis: ∆L = 20·log((∆A+A)/A), where ∆A = m·A. Curves: Model: DRNL HWLP SQ AD LP OptDet; Model: DRNLFB HWLP SQ AD LP OptDet; OH mean.]

Figure 5-13 Intensity-discrimination predictions for a pure tone from the adaptation-loop model using a DRNL filter as peripheral auditory filter and an expansion before the adaptation loops.

[Plot: Intensity discrimination threshold in noise. x-axis: A [dB]; y-axis: ∆L = 20·log((∆A+A)/A), where ∆A = m·A. Curves: Model: DRNL HWLP SQ AD LP OptDet; Model: DRNLFB HWLP SQ AD LP OptDet; OH mean.]

Figure 5-14 Intensity-discrimination predictions for white noise. See Figure 5-13 text.

5.3 The modified temporal-window model

The temporal-window model has been modified so it uses the DRNL filter as peripheral auditory filter. The envelope extraction has been changed from a full-wave rectification to a half-wave rectification followed by a low-pass filter. The broken-stick non-linearity is removed and replaced by a squaring expansion (like the expansion from section 5.2). The model is placed in an optimal-detector setup like the adaptation-loop model. A block diagram of the modified model is seen in Figure 5-15.

Figure 5-15 A block diagram of the temporal-window model after modification. The gammatone filter and broken-stick non-linearity are replaced by the DRNL filter. Full-wave rectification is replaced by a half-wave rectification followed by a low-pass filter. The preprocessing and the division stage function as input to the optimal detector stage. The mid-section stage, where both inputs are the masker alone, can be removed since its output will always be 1.


Appendix D shows the effect of the individual changes on the predictions of the model. Through the tests shown in Appendix D, individual parameters of the model were changed such that the model still predicted the same results. With all modifications introduced, it was necessary to change the temporal-window shape and to adjust the noise N added to the signal. The internal noise of the detector also had to be adjusted so that the model can predict the forward-masking data. This was done on a trial-and-error basis, leading to the parameters shown in Table 5-1. These parameters give good forward-masking predictions in the 4000-Hz experiment.

        Temporal window¹                           Noise     Int. var.
        Ta [s]     Tb1 [s]    Tb2 [s]    w         N         σinternal
TW6     0.0035     0.011      0.047      0.001     10e-5     0.001

Table 5-1 Temporal window parameters for the modified temporal-window model derived to obtain good forward-masking predictions.
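To make the role of the window parameters concrete, the sketch below builds a sampled temporal window from the values in Table 5-1. The functional form is an assumption: it follows the two-sided exponential window usually attributed to Oxenham and Moore (1994), with Tb1, Tb2 and w describing the side before the window centre and Ta the side after it. The sampling rate, the window span and the function name are arbitrary choices for illustration. The window is normalised to unit sum, matching the footnote on 0-dB gain at DC.

```python
import math


def temporal_window(fs, ta, tb1, tb2, w, span=0.5):
    """Sampled temporal window, normalised to unit sum (0-dB gain at DC).

    Assumed shape: (1 - w) * exp(t / tb1) + w * exp(t / tb2) for t < 0
    and exp(-t / ta) for t >= 0, with t measured from the window centre.
    """
    n = int(span * fs)
    times = [i / fs for i in range(-n, n + 1)]
    weights = []
    for t in times:
        if t < 0:
            weights.append((1.0 - w) * math.exp(t / tb1) + w * math.exp(t / tb2))
        else:
            weights.append(math.exp(-t / ta))
    total = sum(weights)
    return times, [v / total for v in weights]


# Parameters from Table 5-1; fs = 1000 Hz is an arbitrary choice for this sketch.
times, window = temporal_window(fs=1000, ta=0.0035, tb1=0.011, tb2=0.047, w=0.001)
before_centre = sum(v for t, v in zip(times, window) if t < 0)
after_centre = sum(v for t, v in zip(times, window) if t > 0)
```

Under this assumed shape most of the window weight lies before the centre, so energy that precedes the window centre contributes more than energy that follows it.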

5.3.1 Tests with the modified temporal window model

The following plots show predictions from the modified temporal-window model with the newly fitted parameters in Table 5-1.

[Plot: Forward masking 1000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: DRNL HWLP SQ TW5 TWOptDet; Model: GT FW NL3 TW4 TWDet; KS Left ear; OH Left ear; OH Right ear; TD Left ear.]

Figure 5-16 Forward masking at 1000 Hz. Predictions for the unmodified model are plotted together with those of the modified temporal-window model.

[Plot: Forward masking 4000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: DRNL HWLP SQ TW5 TWOptDet; Model: GT FW NL3 TW4 TWDet; KS Left ear; OH Right ear.]

Figure 5-17 Forward masking at 4000 Hz. (See Figure 5-16 text)

Forward-masking predictions for the modified temporal-window model are shown in Figure 5-16 and Figure 5-17. The parameters were derived to obtain a good fit with the previous predictions in the 4000-Hz experiment. The model parameters could also be fitted to predict the measured data much better. The previous predictions of the temporal-window model and the new predictions match well at both 1000 Hz and 4000 Hz. The 1000-Hz predictions show the same problem between 10 and 30 ms as before the modification.

1 The temporal window in this implementation was normalised so the gain of the filter was 0 dB for DC.


[Plot: Intensity discrimination threshold in 1000 Hz tone. x-axis: A [dB]; y-axis: ∆L = 20·log((∆A+A)/A), where ∆A = m·A. Curves: Model: DRNL HWLP SQ TW5 TWOptDet; Model: DRNLFB HWLP SQ TW5 TWOptDet; OH mean.]

Figure 5-18 Intensity-discrimination predictions for a 1000-Hz pure tone for the modified temporal-window model.

[Plot: Intensity discrimination threshold in noise. x-axis: A [dB]; y-axis: ∆L = 20·log((∆A+A)/A), where ∆A = m·A. Curves: Model: DRNL HWLP SQ TW5 TWOptDet; Model: DRNLFB HWLP SQ TW5 TWOptDet.]

Figure 5-19 Intensity-discrimination predictions for white noise for the modified temporal-window model.

The intensity-discrimination predictions using the modified model are shown in Figure 5-18 and Figure 5-19. The predicted overall intensity-discrimination thresholds are too low for both intensity-discrimination tasks. As in the adaptation-loop model and the unmodified temporal-window model, the effect of compression is seen as a bump in the single-channel predictions (○). When using a multi-channel approach (×), the temporal-window model also seems to predict decreasing intensity-discrimination thresholds with increasing level for pure tones, as described by “the near miss to Weber’s law”, and constant intensity-discrimination thresholds for noise.

5.4 Discussion of the modified models

In general both models seem able to predict forward-masking data in the 4000-Hz experiment once the expansion has been added to the adaptation-loop model. A more thorough investigation is needed to see if the adaptation-loop model is able to account for the forward-masking data without the expansion. If the parameters of the adaptation loops can be adjusted such that they compensate for the increased steepness of the masking curve, the model could work without the squaring. However, preliminary experiments have shown that this cannot be done simply by increasing the time constants of the loops. More work would be needed to achieve a better fit with the adaptation loops. Both models have the same problem in predicting experimental data in the 1000-Hz forward-masking experiment. These findings are very similar to those observed before the modifications to the models were introduced. Applying the DRNL as pre-processing filter has not solved this problem.

Intensity-discrimination thresholds with pure tones for the modified adaptation-loop model without squaring show the same results as before the modification, except for the increased thresholds in the single-channel predictions (30-80 dB). The added compression of the DRNL does not, however, cause a great problem for the model if the model is allowed to use off-frequency information to detect intensity increments. Using more frequency channels, the model will always have a channel where the increment is not compressed. In noise, the multi-channel approach also proves to be an advantage for the model, reducing the increase in threshold between 30 and 80 dB caused by the added compression. Because of the envelope fluctuations of the noise, the model has more excitation in the linear region of the DRNL filter with which to detect increments.

Intensity discrimination for the modified adaptation-loop model with squaring device shows decreased intensity-discrimination thresholds relative to the model without squaring. This can be clearly linked to the squaring, which amplifies the differences in the signals. Intensity-discrimination thresholds for the temporal-window model have decreased considerably, to a level well below the data. This can be explained by the change in the detector stage (Appendix D shows that the change of the detector causes decreased intensity-discrimination thresholds). Even though the levels of the predictions are too low, the use of the optimal detector does show an advantage if Weber’s law and “the near miss” are to be taken into account. The temporal-window model does display the differences in slope between pure-tone experiments and noise experiments that were not seen using the original decision device.

It is perhaps surprising that the increase in intensity-discrimination threshold due to the mid-level compression of the DRNL disappears completely in multi-channel predictions. The effect is due to the off-frequency “listening” that the model simulates by placing more emphasis on frequency channels where signal differences are not compressed by the DRNL. This behaviour could be linked to the fact that no bump is observed in human listeners, even though the cochlea is known to show compression like that simulated by the DRNL. The fact that the bump is still observed in simulations of intensity discrimination for noise can perhaps be linked to the simplifications made to the DRNL. The simplified DRNL shows compression in the same region for all frequencies, whereas the original DRNL has a wider variety of compressive regions (see Appendix C).
If the original DRNL parameters were used, the bump in intensity discrimination for noise would probably be smeared or disappear completely. In some special cases of intensity-discrimination experiments, in which a notched noise masks frequency bands outside the signal frequency (side-bands), a strong mid-level increase in intensity-discrimination threshold has been shown (the severe departure from Weber’s law, see section 1.2.2). The single-channel predictions could be seen as a simulation of this, since the model cannot use off-frequency information, as in a notched-noise intensity-discrimination experiment. The effect has, however, only been observed in intensity discrimination of short signals, and not for long signals like the ones used in these predictions. The simulations could nevertheless give an indication that the departure from Weber’s law could be linked to the compression of the cochlea.

5.5 Model Comparison

When both the temporal-window model and the adaptation-loop model are implemented within the same framework of peripheral filtering and decision making, they both seem able to predict forward masking, even though the predictions do not match completely. The predictions of the two models for forward masking at 4000 Hz are shown in Figure 5-20.


[Plot: Forward masking 4000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: DRNL HWLP SQ TW5 TWOptDet; Model: DRNL HWLP SQ AD8 OptDet; KS Left ear; OH Right ear.]

Figure 5-20 Comparison of the forward-masking predictions of the adaptation-loop model and the temporal-window model at 4000 Hz.

The unified modelling framework in which the models are implemented after the modification offers the option to directly compare the mechanisms responsible for forward masking in the models. In the following, the apparently equally good ability to predict the data is analysed, to see if the underlying mechanisms are related. The elements of the adaptation-loop model and the temporal-window model that could account for the mechanism of forward masking are seen in Figure 5-21 and Figure 5-22. The adaptation-loop model first limits the stimulus to levels above a minimum limit; the stimulus is then adapted by the five adaptation loops and smeared by an 8-Hz low-pass filter. The temporal-window model first smears the stimulus by filtering with the temporal window; the stimulus is then limited to levels above the minimum level established by the addition of N, and is finally suppressed in a division by the representation of the masker alone, R(M).


Figure 5-21 A schematic drawing of the adaptation-loop model, showing how the adaptation loops account for forward masking by suppressing the signal in the division by the divisors.

Figure 5-22 A schematic drawing of the temporal-window model, showing how the temporal window accounts for forward masking by suppressing the signal in the division by the divisor R(M).

Because the low-pass filter of the adaptation-loop model and the temporal window in the masker-plus-signal path are both linear transformations of the signal and masker, they cannot alone account for the forward masking in this modelling framework. They do, however, play a role in the integration of the signal. The minimum limit of the adaptation-loop model and the addition of N in the temporal-window model effectively establish an absolute threshold for both models. They do not account for the decreasing thresholds after the masker. The only place where the signal is influenced by the masker in non-simultaneous forward-masking conditions is in the division stages of both models. The adaptation-loop model divides the stimulus by the low-pass-filtered output of each adaptation loop. The output of each low-pass filter will display a certain amount of persistence from the masker, and because of this the signal is suppressed by a larger amount at temporal positions close to the masker than far away from the masker. The temporal-window model divides the stimulus by the temporal-window-filtered masker-alone stimulus. The temporal window is in essence also a low-pass filter and as such also displays an amount of persistence from the masker; by dividing the signal by the filtered masker, the signal is suppressed by persistent energy from the masker. To establish a mathematically equivalent division in both models, the adaptation loops were rewritten such that the output is expressed as a single division of an input x.


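$$
\begin{aligned}
y_1 &= \frac{x}{f_1(y_1)}, \quad
y_2 = \frac{y_1}{f_2(y_2)}, \quad
y_3 = \frac{y_2}{f_3(y_3)}, \quad
y_4 = \frac{y_3}{f_4(y_4)}, \quad
y = \frac{y_4}{f_5(y)} \\
y &= \frac{y_3}{f_4(y_4) \cdot f_5(y)}
   = \frac{y_2}{f_3(y_3) \cdot f_4(y_4) \cdot f_5(y)}
   = \frac{y_1}{f_2(y_2) \cdot f_3(y_3) \cdot f_4(y_4) \cdot f_5(y)} \\
y &= \frac{x}{f_1(y_1) \cdot f_2(y_2) \cdot f_3(y_3) \cdot f_4(y_4) \cdot f_5(y)}
\end{aligned}
$$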

$y_1$, $y_2$, $y_3$ and $y_4$ are intermediate signals and $y$ is the output of the adaptation loops shown in Figure 5-21. $f_q$ is the low-pass filter function of each adaptation loop (q = 1, 2, 3, 4, 5). It is seen that the output of the adaptation loops can be written as the input x divided by the product of the outputs of the low-pass filters (the divisors). Using the rewritten adaptation loops it is possible to separate the divisor calculation from the signal path, and to see the loops as a feed-forward mechanism as shown in Figure 5-23. The adaptation-divisor function is the product of all the divisors in the adaptation loops, as shown previously.
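The rewriting of the loops as a single division can be checked numerically with a minimal sketch. Several details are assumptions made here rather than taken from the text: the time constants are the values commonly quoted for Dau et al (1996), each low-pass filter is a one-pole filter, and the implicit relation between a loop output and its own divisor is realised with a one-sample delay.

```python
import math

# Commonly quoted time constants for Dau et al (1996); treated as assumptions here.
TIME_CONSTANTS = (0.005, 0.050, 0.129, 0.253, 0.500)
MIN_LIMIT = 1e-5


def adaptation_loops(x, fs, taus=TIME_CONSTANTS, min_limit=MIN_LIMIT):
    """Return (outputs, divisors); each divisor is the product of the five loop divisors."""
    coeffs = [math.exp(-1.0 / (fs * tau)) for tau in taus]
    # Start every low-pass state at its steady-state value for an input at the minimum limit.
    states = [min_limit ** (0.5 ** (q + 1)) for q in range(len(taus))]
    outputs, divisors = [], []
    for sample in x:
        y = max(sample, min_limit)            # minimum limit before adaptation
        product = 1.0
        for q, a in enumerate(coeffs):
            product *= states[q]              # accumulate the adaptation divisor
            y = y / states[q]                 # divide by the low-pass output of loop q
            states[q] = a * states[q] + (1.0 - a) * y  # low-pass filter the loop output
        outputs.append(y)
        divisors.append(product)
    return outputs, divisors


fs = 1000
stimulus = [1e-5] * 100 + [1e-2] * 200 + [1e-5] * 300   # silence, masker, silence
y, d = adaptation_loops(stimulus, fs)
```

In this sketch the output multiplied by the divisor recovers the limited input at every sample, and the divisor stays elevated after the masker offset before it decays, which is the persistence that suppresses a following signal.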

Figure 5-23 Schematic drawing of the adaptation loops. The adaptation loops have been rewritten to a single division with a divisor that can be calculated from the input signal to the loops. The adaptation loops are depicted as a feed-forward mechanism in this form. The adaptation-divisor function refers to the product of all the divisors.
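The rewriting of the loops as a single division can be illustrated with a minimal Python sketch. It is not the thesis implementation: the time constants are the values usually quoted for Dau et al (1996) and are an assumption here, the function name `adaptation_loops` is hypothetical, each low-pass filter is a one-pole filter read with a one-sample delay, and the loops start from their resting state for an input held at the minimum limit. The sketch records the product of the five divisors and shows that the loop output equals the (limited) input divided by that product.

```python
import math

def adaptation_loops(x, fs, taus=(0.005, 0.050, 0.129, 0.253, 0.500), min_limit=2.5e-5):
    """Run five divisive feedback loops; return (output, product of the divisors)."""
    # One-pole low-pass coefficient per loop (assumed time constants, in seconds).
    coeffs = [math.exp(-1.0 / (fs * tau)) for tau in taus]
    # Resting state for an input held at min_limit: a loop at rest obeys y = x / y,
    # so each stage outputs the square root of its input.
    states = []
    level = min_limit
    for _ in taus:
        level = math.sqrt(level)
        states.append(level)
    out, divisor = [], []
    for sample in x:
        y = max(sample, min_limit)      # minimum limit at the input
        d = 1.0
        for q, a in enumerate(coeffs):
            d *= states[q]              # divisor f_q taken from the low-pass state
            y /= states[q]              # division stage of loop q
            states[q] = a * states[q] + (1.0 - a) * y   # low-pass the loop output
        out.append(y)
        divisor.append(d)
    return out, divisor

fs = 16000
masker_end = round(0.2 * fs)                       # 200-ms masker at unit amplitude
x = [1.0] * masker_end + [0.0] * round(0.3 * fs)   # followed by 300 ms of silence
y, d = adaptation_loops(x, fs)
```

In this sketch `d` plays the role of the adaptation-divisor function: it can be inspected separately from the signal path, and it keeps decaying after the masker offset, which is the persistence discussed in the text.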

[Figure 5-24 plot. Axes: divisor [dB] versus time [s]. Curves: TW5 divisor; AD8 divisor (masker); AD8 divisor (masker+signal).]

Figure 5-24 Divisor plot for the adaptation-loop model and the temporal-window model (TW5). The divisor is plotted for the masker alone and for masker plus signal in a number of offset conditions. The divisor was derived from forward masking at 4000 Hz; the temporal positions of the signal offset relative to the offset of the 200-ms masker are -10, 5, 22, 60, 102 and 150 ms.

To clarify how the signal thresholds in both models are influenced by the divisors, the divisors of the temporal-window model (dash-dot line, TW5 divisor) and the adaptation-loop model (dotted line, AD8 divisor) are shown in Figure 5-24 for signals at threshold at different temporal positions relative to the masker. Also shown is the adaptation-loop divisor for stimuli containing only the masker (solid line). The temporal-window divisor and the adaptation-loop divisor follow each other closely until the end of the masker, where they begin to drop toward the minimum limit, established by N in the temporal-window model and by the minimum limit in the adaptation-loop model¹. The temporal-window divisor is independent of signal level and position, whereas the adaptation-loop divisor depends on both (seen as peaks in the divisor plot). The peaks arise because the signal energy charges the capacitors of the adaptation loops, and this charge is in turn used to divide the signal; this effect is termed self-suppression. The size of the peaks shows that the amount of self-suppression is not constant, but apparently also depends on the temporal position relative to the masker. Even though the peaks of the adaptation-loop divisor come close to the temporal-window divisor for some temporal positions of the signal, the temporal-window model and the adaptation-loop model cannot be directly compared using these modelling schemes.

Instead of comparing the temporal-window model directly to the adaptation-loop model, it was compared to a simplified adaptation-loop model. In the simplified adaptation-loop model, effects of self-suppression were eliminated by calculating the divisor from the masker stimulus alone, as in the temporal-window model. A schematic drawing of the simplified adaptation-loop model is shown in Figure 5-25. It can be seen that the simplified adaptation-loop model resembles the temporal-window model in that the division is based on a masker-alone stimulus. To further show the mathematical relation between the models, the temporal-window model was rewritten (Figure 5-26). Because the temporal-window filter is a linear transformation, normalised to a gain of 0 dB in this implementation, the addition of N can be moved in front of the temporal-window filter. With this change, the addition of N and the minimum-limit module have the same function, establishing the absolute threshold, at the same stage in the models. The temporal-window function in the divisor path can also be seen as the temporal-window equivalent of the adaptation-divisor function. The temporal-window filter in the signal path and the low-pass filter of the adaptation-loop model can likewise be seen as related functions, even though they are on opposite sides of the division, because they both low-pass filter the stimuli.
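The step of moving the addition of N in front of the filter relies only on linearity and on the filter having unit gain at DC. A minimal Python sketch of this argument follows. The kernel is an illustrative one-sided exponential, not the temporal window of the thesis; its time constant, the sampling rate and the helper names `normalised_kernel` and `filter_valid` are assumptions, and the value of N is borrowed from Table 5-2 purely for illustration.

```python
import math

def normalised_kernel(fs, tau=0.008, span=0.1):
    """One-sided exponential smoothing kernel scaled to unit sum (0 dB gain at DC)."""
    taps = [math.exp(-n / (fs * tau)) for n in range(int(span * fs))]
    total = sum(taps)
    return [t / total for t in taps]

def filter_valid(x, kernel):
    """Causal FIR filtering, keeping only samples where the kernel fully overlaps x."""
    k = len(kernel)
    return [sum(kernel[j] * x[n - j] for j in range(k)) for n in range(k - 1, len(x))]

fs = 2000                    # low rate keeps the pure-Python convolution fast
N = 3.5e-5                   # additive constant (illustrative value)
kernel = normalised_kernel(fs)
masker = [1.0 if n < 0.2 * fs else 0.0 for n in range(int(0.5 * fs))]

added_after = [v + N for v in filter_valid(masker, kernel)]    # filter, then add N
added_before = filter_valid([v + N for v in masker], kernel)   # add N, then filter
largest_gap = max(abs(a - b) for a, b in zip(added_after, added_before))
```

Under these assumptions `largest_gap` is at the level of floating-point rounding, so the two orderings are interchangeable. With a kernel that is not normalised to unit sum, the constant would instead be scaled by the DC gain of the filter.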

Figure 5-25 Schematic drawing of the simplified adaptation-loop model, where self-suppression is eliminated. The divisor no longer depends on the signal but only on the masker. The divisor can be seen in Figure 5-24 as the solid line (masker).

Figure 5-26 Rewritten temporal-window model. The addition of the constant N is roughly equivalent to the minimum limit of the adaptation loops. The temporal-window filter in the divisor path (bottom) is equivalent to the adaptation-divisor function.

The effect of neglecting self-suppression is shown in Figure 5-27, where forward-masking predictions of the simplified adaptation-loop model (◊) are shown together with predictions of the modified adaptation-loop model (□). Self-suppression influences the forward-masking predictions strongly, by 6 to 7 dB at the maximum, and cannot be neglected if the model is to predict the measured thresholds. The finding that the temporal-window model can predict forward masking without self-suppression leads to the assumption that the effect of self-suppression can be compensated for in the calculation of the divisor of the adaptation loops. To show that the simplified adaptation-loop model and the temporal-window model are closely related, the divisor of the temporal-window model was adjusted such that it follows the same divisor curve as the simplified adaptation-loop model (shown in Figure 5-28 as the dotted line). This was done by adjusting the temporal-window parameters to the values shown in Table 5-2 (TW6).

¹ The minimum limit in the adaptation loops displayed here is 2.5·10^-5; refer to AD8 in section 5.2.1.

[Figure 5-27 plot: "Forward masking 4000 Hz". Axes: masked threshold [dB SPL] versus offset-offset time [s]. Legend: Model: DRNL HWLP SQ AD8 OptDet; Model: DRNL HWLP SQ AD8Fixed OptDet; KS left ear; OH right ear.]

Figure 5-27 Predictions of the adaptation-loop model (○) and the simplified adaptation-loop model (◊), where the effect of self-suppression has been eliminated by using a fixed divisor calculated from the masker alone (AD8Fixed).

[Figure 5-28 plot. Axes: divisor [dB] versus time [s]. Curves: TW5 divisor; TW6 divisor; AD divisor.]

Figure 5-28 The divisor of the temporal-window model with TW5 parameters (dashed line) and with TW6 parameters (dotted line). TW6 has been fitted to the masker-alone adaptation-loop divisor (solid line).

[Figure 5-29 plot: "Forward masking 4000 Hz". Axes: masked threshold [dB SPL] versus offset-offset time [s]. Legend: Model: DRNL HWLP SQ TW6 TWOptDet; Model: DRNL HWLP SQ AD8Fixed OptDet; KS left ear; OH right ear.]

Figure 5-29 Predictions for forward masking at 4000 Hz. Predictions are shown for the temporal-window model (○) using TW6 parameters. Also plotted are the predictions of the simplified adaptation-loop model (◊) with fixed divisor (the fixed divisor was the divisor from the masker alone).

        Temporal window¹                          Noise     Int. var.
        Ta [s]    Tb1 [s]   Tb2 [s]   w           N         σinternal
TW6     0.0035    0.004     0.035     0.002       3.5e-5    0.001

Table 5-2 Temporal-window parameters for the modified temporal-window model, derived to obtain the same divisor as the adaptation-loop divisor (masker alone).

In Figure 5-29 the predictions of the temporal-window model with TW6 parameters (○) are compared to the predictions of the simplified adaptation-loop model (◊). The two models predict thresholds that are within 3 dB of each other. The remaining discrepancies can be attributed to the fact that the models still differ in the filtering around the division: the temporal window, which corresponds to a low-pass filter with a 3-dB cut-off at 26 Hz, and the 8-Hz low-pass filter are applied before and after the division, respectively. The forward-masking predictions shown no longer match the actual measurements. It should, however, be possible to obtain better predictions again, either by reintroducing self-suppression in some way or by changing the divisor function so that it compensates for the missing suppression. At this point the models are still not mathematically equivalent, but they are closely related. Both models account for forward masking by suppressing the signal by something that can be seen as persistent energy from the masker. In the adaptation-loop model this is viewed as an effect of adaptation, modelling the adaptation seen in auditory-nerve fibres and perhaps at higher stages of the brain. The fact that the two models are so closely related suggests that the temporal-window model could be regarded as a model of adaptation.

1 The temporal window in this implementation was normalised so the gain of the filter was 0 dB.

6 Summary and conclusion

The objective of this thesis was to compare the two different concepts underlying forward masking, described by the temporal-window model (persistence) and the adaptation-loop model (adaptation). To make the comparison, both models were implemented in MATLAB and modified so that all peripheral elements of the models were the same. In their original form, both models used very simplified models of the auditory filter and of the non-linearity observed in the cochlea, and their decision devices differed. To narrow down the differences between the models, both were modified to use a more realistic model of the basilar membrane (the DRNL filter) and a realistic model of the hair-cell transformation of vibration into nerve signals (half-wave rectification, a 1000-Hz low-pass filter and squaring). In addition, the optimal detector presented in Dau et al (1996) was adapted for use with the temporal-window model. Before concluding on the initial question, the findings for the models in their original and modified forms are summarised.
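The hair-cell stage named above (half-wave rectification, 1000-Hz low-pass filtering and squaring) can be sketched in a few lines of Python. This is only an illustration of the order of operations: the one-pole filter, the sampling rate and the function name `hair_cell_stage` are assumptions, and the filter order used in the thesis is not restated here.

```python
import math

def hair_cell_stage(x, fs, cutoff_hz=1000.0):
    """Half-wave rectify, low-pass filter and square a basilar-membrane signal."""
    # One-pole low-pass coefficient; the filter order is an assumption of this sketch.
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    state = 0.0
    out = []
    for sample in x:
        rectified = max(sample, 0.0)                # half-wave rectification
        state = a * state + (1.0 - a) * rectified   # low-pass filtering
        out.append(state * state)                   # squaring device
    return out

fs = 16000
tone = [math.sin(2.0 * math.pi * 4000.0 * n / fs) for n in range(1600)]
envelope = hair_cell_stage(tone, fs)
louder = hair_cell_stage([2.0 * v for v in tone], fs)
```

Because rectification and low-pass filtering scale linearly with a positive gain while the final stage squares, doubling the input amplitude quadruples the output in this sketch.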

6.1 Summary

The adaptation-loop model was obtained from CAHR (Centre for Applied Hearing Research at DTU). The model was tested in two forward-masking conditions, one using a 10-ms, 1000-Hz sinusoidal signal and one using a 12-ms, 4000-Hz sinusoidal signal, each following a 200-ms broad-band noise masker. A comparison with the author's own preliminary¹ experimental data showed that the adaptation-loop model could predict the forward-masking data in the 4000-Hz condition, deviating from the data by less than 5 to 7 dB (the data of Dau et al (1996) showed similar differences across subjects). The temporal-window model was implemented as described in Oxenham (2001) and initially tested with the parameters from that study. Its predictions were within 2 to 3 dB of the original predictions of Oxenham (2001). The model was also tested in the two forward-masking conditions. With the parameters from Oxenham (2001), the model produced simultaneous-masking thresholds that were too high relative to the preliminary measurements. A different set of parameters was therefore derived that improved the predictions in simultaneous-masking conditions without affecting the non-simultaneous-masking predictions. Forward-masking predictions at 1000 Hz did not agree well with the data for either the temporal-window model or the adaptation-loop model. Both models had problems predicting the data in the 10-30-ms region. To the knowledge of the author, the temporal-window model has not previously been tested in forward masking at 1000 Hz, so there is no published evidence that it should account for these data. It can be argued that the ringing of the auditory filter may influence the predictions of the model below 1000 Hz; for this reason the model has previously always been compared with measurements well above 1000 Hz. The earlier predictions of Dau et al (1996), in contrast, agreed well with data at 1000 Hz.
The only difference found between the current and the original implementation was the Strube filter, whose impulse response showed considerably longer activity than that of the current gammatone filter. The effect of peripheral filtering might explain the difference in the predictions at 1000 Hz. A different model of the auditory filter, such as the Strube filter, may solve the problem in the predictions of both the adaptation-loop model and the temporal-window model. Another solution could be to use frequency-dependent parameters for the temporal window and the adaptation loops.

Both models were also tested in their original form in intensity discrimination. Two types of experiment were conducted, one measuring intensity discrimination for a pure tone and one for broad-band noise. The adaptation-loop model was able to account for the level of the intensity-discrimination thresholds using a single auditory-filter channel centred on the signal frequency. With multiple auditory-filter channels, the adaptation-loop model was able to predict the downward-sloping intensity-discrimination thresholds in the pure-tone experiment (the near miss to Weber's law) and the constant thresholds in the noise condition (Weber's law). The temporal-window model was also tested in the intensity-discrimination task, even though it was not originally intended to predict intensity discrimination. It did not account for the measured intensity discrimination as well as the adaptation-loop model: it showed neither the downward-sloping thresholds described by the near miss to Weber's law for pure tones nor constant thresholds for white noise, with either a multi-channel or a single-channel approach.

The initial modified adaptation-loop model without a squaring device did not agree well with the forward-masking data. The predictions were, however, improved by the addition of the squaring device, and they then showed a good fit with previous predictions.

¹ Preliminary in the sense that the data were obtained from a limited number of experiments and a limited number of subjects. The experiments were conducted in a fixed experimental setup, using sound-insulated booths and digitally generated stimuli, so they can be verified at a later stage.
Both the modified adaptation-loop model with a squaring device and the modified temporal-window model matched the forward-masking data at 4000 Hz well, but had the same problem in predicting thresholds at 1000 Hz as before the modifications. Intensity discrimination, however, was affected by the modifications in predictions using a single auditory-filter channel. The non-linear behaviour of the DRNL filter introduced an increment for mid-level amplitudes in both noise and pure tones. The squaring device lowered the intensity-discrimination thresholds for all signal levels when introduced in the adaptation-loop model. Intensity-discrimination thresholds for the temporal-window model were lowered by the introduction of the optimal detector to a level well below the measured data. With multi-channel versions of the temporal-window model and the adaptation-loop model, the mid-level increment was reduced in the noise experiment and completely removed in the pure-tone experiment. Both modified models also showed the downward-sloping intensity discrimination for pure tones and the constant intensity discrimination for noise.

A more thorough investigation of the mechanisms underlying the forward-masking predictions showed that the two models suppressed¹ the signal in forward-masking conditions by a comparable amount, linked to the duration and power of the masker and the signal. By "simplifying" the adaptation-loop model, so that the suppression is calculated from the masker alone, it was found that the adaptation-loop model and the temporal-window model become very similar. When the temporal-window parameters were fitted to produce the same amount of suppression as the simplified adaptation-loop model, the models predicted similar results.

¹ Suppressed in the sense that the signal is divided by an amount that is related to the duration and power of the masker.

6.2 Conclusion

In conclusion, the temporal-window model and the adaptation-loop model may not differ much when effects of forward masking are considered. The starting point was that the two models differed considerably in their assumptions about the mechanisms explaining forward masking: the temporal-window model describes it as an effect of temporal integration of signals, and the adaptation-loop model describes it as an effect of adaptation. However, as the final model comparison has shown (section 5.5), the mechanisms responsible for masking in both models are connected to the division and the divisor. In both models the divisor shows persistence from the masker, and dividing the signal by the divisor means that the signal is suppressed by this persistence. In the adaptation-loop model the division is seen as an effect of adaptation in the auditory nerve and at higher stages of the auditory pathway, and as such it is directly related to physiological data. In the temporal-window model the division was seen as part of the detection mechanism, taking place at a more central decision stage of the brain (a signal-to-noise criterion). After modifying the temporal-window model into an optimal-detector scheme and comparing the model divisors, the division can be viewed as something that effectively models adaptation. Seen from this perspective, the temporal-window model is a simplified adaptation model that neglects self-suppression of the signal but is able to compensate for this by changing the shape of the temporal window.

6.3 Future work ideas

The previous section discussed the main findings of this thesis. Because of the limited duration of the project, it was necessary to stop at this point. During the investigations, other topics were uncovered that were beyond the scope or schedule of this project. These topics could help in understanding the mechanisms of the models and of the auditory system.

• The biggest difference between the adaptation-loop model and the temporal-window model is the way the divisor is calculated. The adaptation-loop model assumes that the signal itself influences suppression in forward-masking conditions (self-suppression), whereas the temporal-window model assumes that the suppression describing forward masking results from the masker alone. To really separate the two models, a thorough investigation of this self-suppression could be carried out, to see whether some relationship exists between the temporal-window divisor and the adaptation-loop divisor (Figure 5-24, section 5.5).

• A more detailed investigation of how the self-suppression of the adaptation-loop model influences masking in different conditions should be carried out, to verify that the model is also able to predict forward masking and simultaneous masking for signals of longer duration. The study of Oxenham (2001) tested forward masking and simultaneous masking with signals of different durations for the temporal-window model, and showed good predictions in all conditions. A similar study could be done with the adaptation-loop model, to verify that it can make similarly good masking predictions for signals of different durations in both simultaneous-masking and forward-masking conditions.

• The problem of the 1000-Hz forward-masking predictions also remains unsolved at the end of this project. A more detailed analysis of how the ringing of the auditory filters influences forward masking could be an interesting study.

• In Dau et al (1997) the adaptation-loop model was extended with a modulation filter-bank, and at the same time the Strube filter was replaced by a gammatone filter. Some preliminary simulations with the model in this setup showed that the introduction of the modulation filter-bank improved the forward-masking predictions at 1000 Hz when using the gammatone filter. The question arises why this happens, and whether the model is then still able to make good forward-masking predictions at 4000 Hz.

• The adaptation-loop model in its modified form has shown some interesting properties in the description of intensity discrimination. So far it has only been shown that the model predicts downward-sloping intensity discrimination for pure tones and constant intensity discrimination for noise. The data provided in this experiment do not really show the magnitude of the slope for tones, so a quantitative comparison could not be made. It would be interesting to see whether the multi-channel model can really predict the level and the slopes related to Weber's law and the "near miss". It would also be interesting to see whether the model could predict some of the phenomena related to intensity discrimination, such as the "severe departure from Weber's law". These phenomena could be explained by the model as effects of basilar-membrane compression.

7 Literature

Bacon, Sid P. (1990) "Effect of masker level on overshoot", J. Acoust. Soc. Am. 88, 698
Carlyon, R. P. and Moore, B. C. J. (1984) "Intensity discrimination: A severe departure from Weber's law", J. Acoust. Soc. Am. 76, 1369
Carlyon, R. P. and Moore, B. C. J. (1986) "Continuous versus gated pedestals and the 'severe departure' from Weber's law", J. Acoust. Soc. Am. 79, 453
Dau, T., Püschel, D. and Kohlrausch, A. (1996) "A quantitative model of the 'effective' signal processing in the auditory system. I. Model structure", J. Acoust. Soc. Am. 99, 3615
Dau, T., Püschel, D. and Kohlrausch, A. (1996) "A quantitative model of the 'effective' signal processing in the auditory system. II. Simulations and measurements", J. Acoust. Soc. Am. 99, 3623
Dau, T., Kollmeier, B. and Kohlrausch, A. (1997) "Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers", J. Acoust. Soc. Am. 102, 2892
Dau, T., Ewert, S. and Fobel, O. (2004) "Cochlea transformation", Lecture notes.
Duifhuis, H. (1973) "Consequences of Peripheral Frequency Selectivity for Nonsimultaneous Masking", J. Acoust. Soc. Am. 54, 1471-1488
Enrique A. Lopez-Poveda and Ray Meddis (2001) "A human nonlinear cochlear filterbank", J. Acoust. Soc. Am. 110, 3107
Houtsma, A. J. M., Durlach, N. I. and Braida, L. D. (1980) "Intensity perception XI. Experimental results on the relation of intensity resolution to loudness matching", J. Acoust. Soc. Am. 68, 807
Moore, Brian C. J. and Glasberg, Brian R. (1983) "Suggested formulae for calculating auditory-filter bandwidths and excitation patterns", J. Acoust. Soc. Am. 74, 750
Moore, Brian C. J. (2003) "An introduction to the Psychology of Hearing", Academic Press (Amsterdam).
Oxenham, Andrew J. and Moore, B. C. J. (1995) "Overshoot and the 'severe departure' from Weber's law", J. Acoust. Soc. Am. 103, 1598
Oxenham, Andrew J. (2001) "Forward masking: Adaptation or integration?", J. Acoust. Soc. Am. 109, 732
Pfeiffer, Russell R. (1970) "A model for two-tone inhibition of single cochlear nerve fibers", J. Acoust. Soc. Am. 48, 11373
Pickles, J. O. (1988) "An introduction to the Physiology of Hearing", Academic Press (Amsterdam).
Plack and Oxenham (1997) "Basilar-membrane nonlinearity and the growth of forward masking", J. Acoust. Soc. Am. 97, 2442
Pralong, D. and Carlile, S. (1996) "The role of individualized headphone calibration for the generation of high fidelity virtual auditory space", J. Acoust. Soc. Am. 100, 3785
Püschel, Dirk (1988) "Prinzipien der zeitlichen Analyse beim Hören", Ph.D. thesis, Universität Göttingen.
Ralph Peter Derleth (1999) "Temporal and compressive Properties of the Normal and Impaired Auditory system", Ph.D. thesis, Bibliotheks- und Informationssystem der Carl von Ossietzky Universität Oldenburg (BIS).
Ray Meddis, Lowel P. O'Mard and Enrique A. Lopez-Poveda (2001) "A computational algorithm for computing nonlinear auditory frequency selectivity", J. Acoust. Soc. Am. 109, 2852
Strube, H. W. (1995) "A computationally efficient basilar-membrane model", Acustica 58, 207-214.
Wickens, T. D. (2002) "Elementary Signal Detection Theory", Oxford University Press, Oxford.
William M. Hartmann (2000) "Signals, Sound, and Sensation", Springer, ISBN 1-56396-383-7
Yates, Graeme K. (1990) "Basilar membrane nonlinearity and its influence on auditory nerve rate-intensity functions", Hearing Research 50, 145-162
Zwicker, Eberhard (1984) "Dependence of post-masking on masker duration and its relation to temporal effects in loudness", J. Acoust. Soc. Am. 75, 219
Zwicker, Eberhard (1965) "Temporal Effects in Simultaneous Masking by White-Noise Bursts", J. Acoust. Soc. Am. 37, 653-663

Appendix A

1 Frozen-noise masker experiment

A frozen-noise masker experiment is an experiment in which the noise is exactly the same in all presented maskers. The masker therefore always has the same envelope and spectral properties. Since thresholds are highly dependent on the spectral and envelope properties of the masker, the measured and simulated thresholds will depend on the masker and be specifically linked to the properties of this masker. With a random-noise masker instead of a frozen one, the spectral and envelope properties of the noise are averaged out if the data are acquired across a large number of experiments. The measured and simulated thresholds obtained with a random-noise masker will thus not depend on the exact properties of a specific noise realisation.

Frozen-noise experiments are, however, still a good way of comparing measurement data to simulated model data. If both the listeners and the model are presented with the same frozen noise, the measured and simulated thresholds are based on the same noise properties. If the model is a good model of the auditory system, it should show the same dependence on the masker as the human listener.

The inherent fluctuations of a random noise, in signal level and frequency content, influence the AFC test when it iterates towards the mean threshold. When the noise is changed in each trial, the threshold can shift by large amounts from trial to trial. This can result in the AFC procedure iterating away from the mean threshold. Using a random noise thus means that the variance of the predictions is higher than if the noise were frozen. With a frozen masker, the threshold of the signal (also frozen in phase) is the same in all AFC trials, so the uncertainty stemming from fluctuating noise is not present. This should reduce the variance of the measured and predicted thresholds, meaning that the number of times a subject or the models need to do the experiment can be reduced. Using a frozen-noise masker rather than a random noise thus improves the speed of the AFC test and the accuracy of the predictions. Some inherent problems are, however, observed when using frozen noise in combination with a frozen signal in simultaneous and forward-masking conditions.

[Figure 1-1 plots, two panels: "Forward masking 1000 Hz" and "Forward masking 4000 Hz". Axes: masked threshold [dB SPL] versus offset-offset time [s]. Legend in both panels: Adaptation-loop model - frozen noise and frozen signal; Adaptation-loop model - frozen noise and random phase signal.]

Figure 1-1 1000-Hz and 4000-Hz forward-masking predictions of the adaptation-loop model, with frozen noise and frozen-phase signal, and with frozen noise and random-phase signal. The stimuli for these predictions were the same as for forward masking at 1000 Hz and 4000 Hz. The sampling frequency of threshold predictions was 16000 Hz for frozen noise and frozen-phase signal predictions.

Figure 1-1 shows how the adaptation-loop model predicts thresholds in a frozen-noise 3-AFC experiment, using a frozen-phase signal and a random-phase signal. From the figure it can be seen that, with a frozen-phase signal, the thresholds increase and decrease with the same frequency as the signal. The threshold in simultaneous-masking conditions can fluctuate by as much as 12-14 dB when the signal is shifted by half a period of the sinusoid. This phenomenon is related to the phase interactions between the frozen noise and the frozen-phase signal. A similar effect is observed in human listeners subjected to the same test. In the 1000-Hz predictions, the fluctuations continue for about 7 ms even after the signal is completely outside the masker (10 ms); this is related to the ringing of the gammatone filter used in these simulations. The filter ringing does not influence the predictions at 4000 Hz after the 10-ms mark, because the ringing of the gammatone filter is much shorter at 4000 Hz than at 1000 Hz. The phase-interaction effect can be removed by using a random-phase signal (○), as shown in Figure 1-1. Using a random-phase signal in each AFC trial averages out the phase interactions, such that the predicted thresholds are independent of the phase relation between the masker and the signal. The predicted thresholds settle above the mean of the maximum and minimum of the phase fluctuation; this is related to the 1-up 2-down tracking rule, which converges on the 70.7%-correct point.
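The 70.7% figure quoted for the 1-up 2-down rule follows from a short equilibrium argument, sketched here in Python. The function name `updown_target` is hypothetical, and the sketch assumes independent trials with a fixed probability of a correct response at a given level.

```python
def updown_target(n_down):
    """Proportion correct at which a 1-up n-down staircase is in equilibrium."""
    # A step down needs n_down consecutive correct responses (probability p**n_down).
    # The track is stationary when steps down and up are equally likely:
    # p**n_down = 0.5.
    return 0.5 ** (1.0 / n_down)

two_down = updown_target(2)     # about 0.707, the rule referred to in the text
three_down = updown_target(3)   # about 0.794, shown only for comparison
```

Under the same assumptions, a plain 1-up 1-down rule would converge on the 50%-correct point.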

[Figure 1-2 plot: "Forward masking 1000 Hz". Axes: masked threshold [dB SPL] versus offset-offset time [s]. Legend: Adaptation-loop model - Gammatone filter; Adaptation-loop model - Strube filter.]

Figure 1-2 The influence of the peripheral filter. The phase interactions between the frozen-phase signal and the frozen-noise masker, using a gammatone filter and a Strube filter as front-ends to the adaptation-loop model. The sampling frequency of threshold predictions was 16000 Hz.

The peripheral filters in the models (and in the auditory system) also have a dramatic influence on the fluctuations of the thresholds when using frozen noise and frozen-phase signals. Figure 1-2 shows the difference between predictions obtained with a Strube filter and those obtained with the gammatone filter. The predictions differ considerably in the amplitude of the phase interactions. This indicates that measurement data and model predictions can be completely incomparable in simultaneous and "near" simultaneous masking if the measurement points are positioned at unfortunate places and the auditory filters do not match. To minimize these effects on measured and simulated thresholds, a random-phase signal was used for the experiments in this thesis.

Appendix B

1 Understanding the models

When comparing one model to another, or changing parameters to obtain better fits from a model, it is useful to understand the workings of the different modules and parameters in the models. A series of simulations was therefore conducted to clarify the outcome of changing parameters or elements in the models.

1.1 Understanding the adaptation-loop model

The key parameter of the adaptation-loop model is the internal noise. A series of tests was performed to see the effect of changing the variance of the internal noise.

[Figure 1-1 plot: "Forward masking 1000 Hz". Axes: masked threshold [dB SPL] versus offset-offset time [s]. Legend: Adaptation loop Model IN=6; Adaptation loop Model IN=12; Adaptation loop Model IN=24.]

Figure 1-1 Adaptation-loop model forward-masking predictions using different internal noise variances. The forward-masking experiment is described in 2.2.

[Figure: Intensity discrimination threshold in noise. x-axis: A [dB]; y-axis: ∆L = 20 ⋅ log((∆A + A)/A), where ∆A = m ⋅ A. Curves: Adaptation loop Model IN=6, IN=12, IN=24.]

Figure 1-2 Intensity-discrimination predictions using different internal noise variances. The experiment is described in 2.4.

The model used in these predictions was the adaptation-loop model described in section 3.1. Figure 1-1 and Figure 1-2 show forward-masking and intensity-discrimination predictions using three different internal noise variances. Increasing the internal noise variance increases the predicted thresholds. As a calibration of the model, the intensity-discrimination predictions were fitted to the measured data. The internal noise variance required for this is 6.

1.2 Understanding the Temporal window model

The nonlinearity of the temporal-window model has been changed several times over the years. In the original paper by Oxenham and Moore (1994), the nonlinearity was a simple power function $y = |x|^n$ with n equal to 0.5 or 0.7. Later, in Plack and Oxenham (1997), the nonlinearity was changed to a broken-stick nonlinearity with a logarithmic slope of 0.78 up to 35 dB SPL and 0.16 above 35 dB SPL. In Oxenham (2001) this was changed to a broken stick with a logarithmic slope of 1.00 up to 35 dB SPL and 0.25 above 35 dB SPL. Each time the nonlinearity was changed, the temporal-window parameters were also changed. To investigate the effect of changing the compression characteristics of the nonlinearity, the temporal-window model with TW3 parameters was used to make forward-masking predictions with different power functions.
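The nonlinearities listed above can be compared on a dB scale with a short sketch. It is only an illustration: the function names are hypothetical, and the broken stick is assumed to be continuous at the knee with its lower segment passing through 0 dB, which the text does not state.

```python
def broken_stick_db(level_db, knee_db=35.0, slope_low=1.00, slope_high=0.25):
    """Output level in dB of a broken-stick compressor.

    Defaults are the Oxenham (2001) slopes quoted in the text.
    Assumption (not from the source): the segments meet at the knee
    and the lower segment passes through the origin.
    """
    if level_db <= knee_db:
        return slope_low * level_db
    return slope_low * knee_db + slope_high * (level_db - knee_db)


def power_law_db(level_db, n):
    """Output level in dB of y = |x|^n; a power law is a line of slope n in dB."""
    return n * level_db


levels = (15.0, 35.0, 55.0, 75.0)
oxenham_2001 = [broken_stick_db(L) for L in levels]
plack_oxenham_1997 = [broken_stick_db(L, slope_low=0.78, slope_high=0.16) for L in levels]
power_half = [power_law_db(L, 0.5) for L in levels]
```

Under these assumptions a 40-dB increase above the knee raises the output by only 10 dB with the Oxenham (2001) slopes, whereas the power law with n = 0.5 compresses all levels equally.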

[Figure: Forward masking 1000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Temporal window model with slope 1.0, 0.5 and 0.25.]

Figure 1-3 Forward-masking predictions of the temporal-window model (TW3) with a 1000-Hz sinusoidal signal. The nonlinearity was a power function $y = |x|^n$ with n equal to 0.25, 0.50 and 1.00. Parameters are described in Table 1-1.

[Figure: Forward masking 1000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Temporal window model with k=1.67 and k=1.2.]

Figure 1-4 Forward masking with a 4000-Hz sinusoidal signal with different decision ratios. Simulations were done using TW3 parameters from Table 3-1, with k equal to 1.67 and 1.2.

Figure 1-3 shows the forward-masking results with different power functions. The noise added to the signal was adjusted for each nonlinearity so that the models predict the same absolute threshold of 19 dB SPL. It can be seen that removing compression decreases the steepness of the masking curve: thresholds for simultaneous masking decrease, and thresholds for non-simultaneous masking increase.

Nonlinearity   Decision ratio   Temporal window   Noise
n              k                TW                R
0.25           1.67             TW3               28.72
0.50           1.67             TW3               3.23
1.00           1.67             TW3               1.24

Table 1-1 The different nonlinearities with the temporal-window and decision-ratio parameters derived for the best fit to forward-masking predictions.

Forward-masking thresholds have also been investigated for different values of the decision ratio k. Figure 1-4 shows that changing the decision ratio from 1.67 to 1.2 lowers the thresholds for simultaneous masking and also lowers the non-simultaneous forward-masking thresholds.


Appendix C

1 The DRNL filter

The original Dual Resonance Non-Linear (DRNL) filter was first presented in Meddis, O’Mard and Lopez-Poveda (2001) and is based on actual measurements of the basilar membrane in chinchillas and guinea pigs. Later, in Lopez-Poveda and Meddis (2001), the model was fitted to human psychophysical data. The human non-linear cochlear filter bank they developed consists of three overall modules. Figure 1-1 shows the model described by Lopez-Poveda and Meddis (2001). The model consists of an outer ear filter, a middle ear filter and the DRNL filter, which simulates the action of the basilar membrane (BM).

[Block diagram: Pressure [Pa] → Outer ear filter → Pressure [Pa] → Middle ear filter → Stapes velocity [m/s] → DRNL filter → BM velocity [m/s]]

Figure 1-1 The overall outer and inner ear model using a DRNL filter.

The DRNL model maps stapes velocity to basilar membrane velocity. The model is briefly described in the following sections.

Outer ear and middle ear filters

The outer ear model in Lopez-Poveda and Meddis (2001) was based on measurements by Pralong and Carlile (1996). The study involved measuring headphone-to-eardrum transfer functions. The transfer functions are therefore headphone and subject specific. The frequency response of the transfer function is plotted in Figure 1-2.


[Figure: Outer Ear Frequency response. x-axis: Freq [Hz] (logarithmic, 10^3 to 10^4); y-axis: Gain [dB].]

Figure 1-2 Outer ear frequency response from Lopez-Poveda and Meddis 2001.

[Figure: x-axis: Freq [Hz] (logarithmic, 10^3 to 10^4); y-axis: Gain (logarithmic, 10^-5 to 10^-3).]

Figure 1-3 Middle ear frequency response from Lopez-Poveda and Meddis 2001.

The middle ear frequency response was derived from Goode et al. (1994), who measured the stapes displacement in cadavers. Any influence from the muscles connected to the ossicles is therefore neglected. The transfer function is plotted in Figure 1-3.

The DRNL filter

The DRNL filter is the interesting part of Lopez-Poveda and Meddis (2001). It consists of two signal paths, a linear and a nonlinear path. The linear path has a response that could be said to model the cochlea in hearing-impaired subjects (without the cochlear amplifier). The nonlinear path models the active amplification of low signal levels.

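[Block diagram of the DRNL filter. Linear path: Linear gain → Gammatone filter → Lowpass filter. Nonlinear path: Gammatone filter → Non-linearity → Gammatone filter → Lowpass filter. The outputs of the two paths are summed.]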

The linear path

The linear path of the DRNL consists of a linear gain

$$y(n) = g \cdot x(n)$$

followed by a gammatone (GT) filter of order NGT lin, with centre frequency CFlin and bandwidth BWlin. (The gammatone filters were implemented as a cascade of N first-order GT filters.) Finally, a low-pass filter of order NLP lin, with 3 dB cut-off at LPlin, further filters the signal. (The low-pass filters were also implemented as a cascade of N first-order Butterworth filters.)


The nonlinear path

The nonlinear path begins with a GT filter of order NGT nlin, with centre frequency CFnlin and bandwidth BWnlin. It is followed by a broken-stick nonlinearity described by

$$y(n) = \operatorname{sign}(x(n)) \cdot \min\left(a \cdot |x(n)|,\; b \cdot |x(n)|^{c}\right)$$

where “sign” returns the sign of the expression in the brackets and “min” returns the minimum value of the expressions in the brackets. The nonlinearity is followed by a second GT filter of order NGT nlin, with centre frequency CFnlin and bandwidth BWnlin. Finally, a low-pass filter of order NLP nlin with 3 dB cut-off at LPnlin filters the signal.

The parameters for the human DRNL filter bank in Lopez-Poveda and Meddis (2001) are listed in the table below.

Signal frequency [Hz]   250      500      1000     2000     4000     8000

DRNL linear path
G                       1400     800      520      400      270      250
NGT lin                 2        2        2        2        2        2
CFlin                   235      460      945      1895     3900     7450
BWlin                   115      150      240      390      620      1550
NLP lin                 4        4        4        4        4        4
LPlin                   CFlin    CFlin    CFlin    CFlin    CFlin    CFlin

DRNL nonlinear path
NGT nlin                3        3        3        3        3        3
CFnlin                  250      500      1000     2000     4000     8000
BWnlin                  84       103      175      300      560      1100
A                       2124     4609     4598     9244     30274    76354
B                       0.45     0.28     0.13     0.078    0.060    0.035
C                       0.25     0.25     0.25     0.25     0.25     0.25
NLP nlin                3        3        3        3        3        3
LPnlin                  CFnlin   CFnlin   CFnlin   CFnlin   CFnlin   CFnlin

Table 1-1 Parameters for the Lopez-Poveda and Meddis (2001) human DRNL filter bank.

Using the parameters from Table 1-1, the input-output response can be plotted for a 1000-Hz sinusoidal signal with varying amplitude. This shows how the DRNL becomes compressive (non-linear) over a range of levels.


[Figure: Freq 1000 Hz. x-axis: Amp [dB SPL]; y-axis: Output amp [dB].]

Figure 1-4 The input-output relation for a 1000-Hz sinusoidal signal with varying amplitude, from 0 dB SPL to 100 dB SPL (0 dB SPL corresponds to a peak amplitude of the input sinusoid of 10^-5). The solid line indicates the sum of the linear path and the nonlinear path. The dashed line indicates the output from the linear path. The dotted line indicates the output from the nonlinear path.

It can be seen from Figure 1-4 that the DRNL filter centred at 1000 Hz becomes non-linear (compressive) from 40 dB SPL to about 80 dB SPL. Another property of the DRNL filter is that it simulates the broadening of the BM filters as a function of level. In Figure 1-5 the response to a frequency sweep of a sinusoidal tone is plotted at 30 dB SPL and 85 dB SPL.

[Figure: two panels, Input 30 dB (left) and Input 85 dB (right). x-axis: Freq [Hz]; y-axis: Amp [dB].]

Figure 1-5 Plot of the input-output relation of a single 1000-Hz DRNL filter. The dotted line is the output of the nonlinear path of the filter. The dashed line is the output of the linear path of the filter. The solid line is the summed output.

The outputs from both the linear and nonlinear paths have been plotted together with the resulting output from the DRNL. The plots show that the filter has a sharper filter function at low sound levels and a broader one at higher levels. In Figure 1-6 the DRNL output is compared with the output of a gammatone filter.


[Figure: four panels, Input 40 dB, Input 60 dB, Input 80 dB and Input 100 dB. x-axis: Freq [Hz]; y-axis: Amp [dB]. Curves: DRNL, Gammatone.]

Figure 1-6 Plot of the DRNL frequency response in relation to the corresponding gammatone filter used in the adaptation-loop model and the temporal-window model. The DRNL has been fitted so that a 1000-Hz tone at 0 dB SPL yields 0 dB gain, like the gammatone filter.

This plot shows the differences from using a linear gammatone filter: the frequency response of the DRNL filter changes dynamically as a function of level.

1.1 DRNL simplification

In order to use the DRNL filter in the adaptation-loop model and the temporal-window model, the output range needs to be predictable. The input-output relations vary considerably with the parameters supplied by Lopez-Poveda and Meddis (2001). As an example, the input-output relations of the DRNL are plotted in Figure 1-7 for the six specified filters at their respective centre frequencies, 250 to 8000 Hz, including outer and middle ear filtering.


[Figure: x-axis: Amp [dB SPL]; y-axis: Output amp [dB]. Curves: 250 Hz, 500 Hz, 1000 Hz, 2000 Hz, 4000 Hz, 8000 Hz.]

Figure 1-7 The input-output relations of the filters specified in Table 1-1. The input to each filter is a sinusoidal signal with a frequency placed at the CF of the filter. The amplitude was varied from 0 dB SPL to 100 dB SPL.

To avoid questions that may arise from different compression curves and filters at different levels and frequencies, the DRNL is simplified. The DRNL is “normalised” so that gains and compressions are the same for all frequencies. The 1000-Hz filter is chosen as the reference for the DRNL parameters. The outer and middle ear filters are also omitted and replaced by a simple linear gain. From Figure 1-2 it is derived that the gain of the outer ear is 0 dB at 1000 Hz, so the outer ear transformation can be written as

$$y = 1 \cdot x$$

The middle ear, whose output is stapes velocity, is also simplified. In the adaptation-loop model and the temporal-window model an amplitude of 1 corresponds to 100 dB SPL, so 0 dB SPL corresponds to an amplitude of

$$0.00001 = 10^{-100/20}$$

From Figure 1-3 the output for a 0 dB SPL 1000-Hz tone can be read as $8 \cdot 10^{-9}$. The gain function for converting to stapes velocity can thus be deduced as

$$y = \frac{8 \cdot 10^{-9}}{0.00001} \cdot x = 8 \cdot 10^{-4} \cdot x$$
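The level conventions above can be checked with a few lines of code. This is a minimal sketch using only the two numbers given in the text (amplitude 1 at 100 dB SPL and a stapes gain of 8·10^-4); the function names are hypothetical.

```python
import math

FULL_SCALE_DB_SPL = 100.0   # an amplitude of 1.0 corresponds to 100 dB SPL (from the text)
STAPES_GAIN = 8e-9 / 1e-5   # simplified middle-ear gain at 1000 Hz, equal to 8e-4


def spl_to_amplitude(level_db_spl):
    """Model input amplitude for a given level in dB SPL."""
    return 10.0 ** ((level_db_spl - FULL_SCALE_DB_SPL) / 20.0)


def spl_to_stapes_db(level_db_spl):
    """Level in dB (re amplitude 1) after the simplified middle-ear gain."""
    return 20.0 * math.log10(STAPES_GAIN * spl_to_amplitude(level_db_spl))


amp_0_db_spl = spl_to_amplitude(0.0)      # about 1e-5
stapes_40_db_spl = spl_to_stapes_db(40.0)  # about -122 dB
```

With these conventions a 40 dB SPL tone arrives at the DRNL at about -122 dB, which agrees with the correspondence quoted in the caption of Figure 1-11.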

The DRNL parameters have been adjusted so that all gains are the same. The centre frequency of the linear path follows a relation to that of the nonlinear path, as do the bandwidths of the filters.


Signal frequency [Hz]   250      500      1000     2000     4000     8000

DRNL linear path
G                       520      520      520      520      520      520
NGT lin                 2        2        2        2        2        2
CFlin                   236.25   473      945      1890     3780     7560
BWlin                   139      173      240      375      644      1183
NLP lin                 4        4        4        4        4        4
LPlin                   CFlin    CFlin    CFlin    CFlin    CFlin    CFlin

DRNL nonlinear path
NGT nlin                3        3        3        3        3        3
CFnlin                  250      500      1000     2000     4000     8000
BWnlin                  79.8     111.53   175      301.93   555.8    1063.5
A                       4598     4598     4598     4598     4598     4598
B                       0.13     0.13     0.13     0.13     0.13     0.13
C                       0.25     0.25     0.25     0.25     0.25     0.25
NLP nlin                3        3        3        3        3        3
LPnlin                  CFnlin   CFnlin   CFnlin   CFnlin   CFnlin   CFnlin

Table 1-2 Simplified parameters for the DRNL filter bank.
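The text does not state the relation between the centre frequencies of the two paths. The values in Table 1-2 are consistent with CFlin being 0.945 times CFnlin, the ratio of the 1000-Hz reference filter. The sketch below checks this reading; the ratio is inferred from the table and is an assumption, not a formula given in the source.

```python
# Centre frequencies copied from Table 1-2.
cf_nlin = [250, 500, 1000, 2000, 4000, 8000]
cf_lin_table = [236.25, 473, 945, 1890, 3780, 7560]

# Assumed ratio, taken from the 1000-Hz reference filter (945 / 1000).
RATIO = 0.945

cf_lin_predicted = [RATIO * cf for cf in cf_nlin]

# Largest deviation between the assumed relation and the tabulated values, in Hz.
max_deviation_hz = max(abs(p - t) for p, t in zip(cf_lin_predicted, cf_lin_table))
```

The only deviation is at 500 Hz, where the table lists 473 Hz against a predicted 472.5 Hz, which looks like rounding.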

The input-output relations can be replotted for the new filter parameters; this is done in Figure 1-8.

[Figure: x-axis: Amp [dB SPL]; y-axis: Output amp [dB]. Curves: 250 Hz, 500 Hz, 1000 Hz, 2000 Hz, 4000 Hz, 8000 Hz.]

Figure 1-8 Input-output relations as in Figure 1-7, but with the simplified parameters.

The output of the DRNL filters will now have a constant dynamic range that can be fitted to the temporal-window model and the Adaptation-loop model.

1.2 DRNL and two-tone suppression

The nonlinear part of the DRNL model is based on Pfeiffer (1970). The DRNL model, like the Pfeiffer model, places a nonlinear function between two band-pass filters (Figure 1-10). The original idea of this setup is to be able to simulate two-tone suppression. The basic concept is that a pure tone presented alone gives a certain excitation of the basilar membrane. Presenting a second pure tone off-frequency to the first tone influences the excitation produced by the first tone, even when its frequency is much higher or lower than that of the first tone. Two-tone suppression can be measured on the basilar membrane and is seen in some psychophysical experiments. It influences the thresholds of a signal in simultaneous-masking conditions. Results for two-tone suppression measured in pulsation-threshold experiments are shown in Figure 1-9.

Figure 1-9 Results from an experiment measuring pulsation thresholds for a 1-kHz signal alternating with a two-tone masker (one tone was the signal, the second was the suppressor tone). The signal was fixed at 40 dB and the second tone was varied in frequency and level. Shaded areas show where the second tone reduced pulsation thresholds by 3 dB or more. Taken from Moore, Brian C. J. (2003).

[Block diagram: First GT filter → Non-linearity → Second GT filter]

Figure 1-10 The setup of band-pass filters and the nonlinearity.

Two-tone suppression can be shown to be an effect of this compressive setup of band-pass filters and nonlinearity. The output of the second band-pass filter has been plotted for a tone with a level of 40 dB in combination with a second tone of varying level and frequency. Figure 1-11 shows the output level in dB.


[Figure: two panels, surface plot (left) and contour plot (right). Axes: Freq [Hz], Amp [dB].]

Figure 1-11 Left: surface plot of the mean output energy for a fixed input tone of 1000 Hz at -122 dB (corresponding to 40 dB SPL), while sweeping a second tone, added to the first tone, from 0 to 3000 Hz and from -132 to -62 dB (30 to 100 dB SPL). Right: the same data as a contour plot.

In the following section the mechanism behind the two-tone suppression effect in the nonlinear part of the DRNL is explained. First the nonlinearity is examined. The nonlinearity is implemented in the digital domain as follows:

$$y(n) = \operatorname{sign}(x(n)) \cdot \min\left(a \cdot |x(n)|,\; b \cdot |x(n)|^{c}\right)$$

“sign” means that the sign of the expression in the brackets is returned. “min” means that the minimum value of the expressions in the brackets is returned. The input-output relation of the nonlinearity at 1000 Hz is shown in Figure 1-12.
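The broken-stick nonlinearity can be written sample by sample as in the sketch below. The function name is hypothetical, and the default parameters are assumed to be those of the 1000-Hz filter in Table 1-1 (a = 4598, b = 0.13, c = 0.25).

```python
import math


def broken_stick(x, a=4598.0, b=0.13, c=0.25):
    """Broken-stick nonlinearity: sign(x) * min(a*|x|, b*|x|^c).

    Defaults are assumed to be the 1000-Hz parameters from Table 1-1.
    """
    magnitude = abs(x)
    # Small inputs follow the linear branch a*|x|; larger inputs are
    # limited by the compressive branch b*|x|^c.
    return math.copysign(min(a * magnitude, b * magnitude ** c), x)


low_level = broken_stick(1e-8)     # linear branch: a * 1e-8
high_level = broken_stick(1e-4)    # compressive branch: b * (1e-4) ** 0.25
negative = broken_stick(-1e-4)     # the sign of the input is preserved
```

For small inputs the output grows in proportion to the input, while for larger inputs a tenfold increase in input raises the output only by a factor of 10^0.25.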


Figure 1-12 Input-output relation of the DRNL non-linearity (parameters are taken from subject YO in Lopez-Poveda & Meddis 2001). The gain of the linear part is 73 dB.

When a single tone is passed through the nonlinear path of the DRNL, the peaks of the tone are clipped by the nonlinearity for sinusoids with levels above the breakpoint at about -122 dB. A signal before and after the nonlinearity is shown in Figure 1-13, and the frequency spectra of the signals are shown in Figure 1-14.
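The breakpoint and the linear gain quoted here can be cross-checked against the broken-stick equation. The worked calculation below assumes the 1000-Hz parameters from Table 1-1 (a = 4598, b = 0.13, c = 0.25); the breakpoint $|x_k|$ is the amplitude at which the two branches of the nonlinearity are equal.

```latex
\begin{aligned}
a\,|x_k| &= b\,|x_k|^{c}
\;\Rightarrow\;
|x_k| = \left(\frac{b}{a}\right)^{1/(1-c)} \\
|x_k| &= \left(\frac{0.13}{4598}\right)^{1/0.75} \approx 8.6 \cdot 10^{-7},
\qquad 20\log_{10}|x_k| \approx -121\ \text{dB} \\
20\log_{10} a &= 20\log_{10} 4598 \approx 73\ \text{dB}
\end{aligned}
```

Under this assumption the breakpoint lies within about 1 dB of the value of about -122 dB given in the text, and the gain of the linear branch matches the 73 dB stated in the caption of Figure 1-12.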

[Figure: two panels. x-axis: Time [s]; y-axis: Amp.]

Figure 1-13 Left: 1000-Hz sinusoidal signal at -122 dB. Right: sinusoidal signal at -112 dB. (The solid line is the output with the non-linearity, the dotted line is the output with linear gain only.)


Figure 1-14 Left: normalised DFT of the sinusoidal signal at -122 dB. Right: normalised DFT of the sinusoidal signal at -112 dB.

The nonlinearity produces large frequency components away from the input frequency. The important quantity when discussing the two-tone suppression effect is the average input-output energy relation

$$R = \frac{E(x_{\text{output}}^{2})}{E(x_{\text{input}}^{2})}$$

When the input signal is above the -122 dB threshold, the average input-output energy relation changes (the higher the input energy, the lower the input-output energy relation). It can now be seen how the addition of a second tone with a different frequency can influence the input-output energy relation. If a second tone is added to the input, the signal will begin to beat (Figure 1-15). After the nonlinearity is applied, the amplitude of the beats is reduced relative to what it would have been without the nonlinearity (Figure 1-16).
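The effect of a second tone on the energy relation R can be illustrated with the nonlinearity alone. The sketch below is not the thesis implementation: it omits both gammatone filters, assumes the 1000-Hz parameters from Table 1-1 and a sampling frequency of 16000 Hz, and uses hypothetical function names, so its values are not comparable with the output energies reported for the full nonlinear path.

```python
import math


def broken_stick(x, a=4598.0, b=0.13, c=0.25):
    """sign(x) * min(a*|x|, b*|x|^c), with assumed 1000-Hz parameters."""
    magnitude = abs(x)
    return math.copysign(min(a * magnitude, b * magnitude ** c), x)


def tone(freq_hz, level_db, fs=16000, duration_s=0.1):
    """Sinusoid whose peak amplitude is level_db relative to amplitude 1."""
    amplitude = 10.0 ** (level_db / 20.0)
    n_samples = int(fs * duration_s)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / fs) for n in range(n_samples)]


def energy_ratio_db(samples):
    """R = E(x_output^2) / E(x_input^2) across the nonlinearity, in dB."""
    e_in = sum(x * x for x in samples) / len(samples)
    e_out = sum(broken_stick(x) ** 2 for x in samples) / len(samples)
    return 10.0 * math.log10(e_out / e_in)


single = tone(1000, -122)
pair = [p + q for p, q in zip(single, tone(1500, -122))]

r_single = energy_ratio_db(single)  # tone stays on the linear branch
r_pair = energy_ratio_db(pair)      # beat peaks reach the compressive branch
```

With these parameters the single tone stays just below the breakpoint, so R equals the linear gain, while the beating pair exceeds it at the envelope peaks and R drops.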

[Figure: two panels. Left: time signal. Right: x-axis: Freq [Hz]; y-axis: Normalised Amp [dB].]

Figure 1-15 Left: input to the nonlinearity, showing the beating envelope of a 1000-Hz and a 1500-Hz tone at -122 dB. Right: DFT of the input to the nonlinearity.


[Figure: two panels. Left: x-axis: Time [s]; y-axis: Amp. Right: DFT of the output.]

Figure 1-16 Left: the output from the nonlinearity; the dotted line is the output with linear gain. Right: the DFT of the output of the nonlinearity; the dotted line is the frequency response of the second gammatone band-pass filter.

Occasionally the signal is below the threshold of the nonlinearity, and occasionally it is above the threshold. The maximum signal level depends only on the levels of the two tones. When the signal is above threshold, the nonlinearity lowers the overall input-output energy relation.

In the DRNL model the nonlinearity is placed between two gammatone (GT) band-pass filters. All GT filters have a centre frequency that corresponds to that of the first tone, so they will not attenuate this tone. The second tone (which is varied in frequency) may however be attenuated by the first filter; the frequency of the second tone therefore influences its level when it arrives at the nonlinearity. If the second tone is too far away from the centre frequency (CF), it is attenuated to an extent where it will not generate much beating with the first tone. At frequencies closer to the CF, the second tone gets through the GT filter with sufficiently large amplitude to generate beats that can exceed the threshold of the nonlinearity. This explains why the suppression disappears when the frequency of the second tone becomes too high or too low (Figure 1-11).

The overall input-output energy relation of the nonlinearity is distributed evenly between the energy of the two tones. So after the nonlinearity the energy of the first tone is reduced, as is that of the second tone. The energy of the second tone, however, may lie at a different frequency from that of the first tone. When the signal is passed through the second GT filter, the energy from the second tone may be “cut off” (Figure 1-16, right) if its frequency is high enough. The energy from the first tone is unaltered by the GT filter, but because the first tone was attenuated by the nonlinearity, the total energy is reduced relative to the case where the second tone is absent.

If the frequency of the second tone gets too close to the CF, its energy is also passed through the GT filter. This explains the increase in Figure 1-11 when the second tone gets closer to the CF.

Example: For a signal containing a single 1000-Hz tone at -122 dB, passed through the nonlinearity and the second GT filter, the mean energy of the signal is -52.0 dB. For a signal containing a 1000-Hz tone at -122 dB added to a 1500-Hz tone at -122 dB, passed through the nonlinearity and the second GT filter, the mean energy of the signal is -53.2 dB. This means that adding the second tone decreases the energy at the output by 1.2 dB.


Appendix D

1 Temporal Model modifications

The temporal-window model was changed in steps to standardize the model toward the fully modified model. The sections below show the influence of each individual change.

1.1 The temporal-window model with DRNL

The temporal-window model was tested in its original form with TW3 parameters, with the nonlinearity and the gammatone filter exchanged for the DRNL filter. Predictions are shown in Figure 1-1 for forward masking at 1000 Hz and in Figure 1-2 for intensity discrimination for pure tones (single-channel model).

[Figure: Forward masking 1000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: GT FW NL3 SQ TW3 TWDet; Model: DRNL FW SQ TW3 TWDet.]

Figure 1-1 Forward masking at 1000 Hz for the temporal-window model using TW3 parameters. The gammatone filter and nonlinearity were exchanged for the DRNL filter.

[Figure: Intensity discrimination threshold in 1000 Hz tone. x-axis: A [dB]; y-axis: ∆L = 20 ⋅ log((∆A + A)/A), where ∆A = m ⋅ A. Curves: Model: GT HW NL3 SQ TW3 TWDet; Model: DRNL FW SQ TW3 TWDet.]

Figure 1-2 Intensity-discrimination thresholds in a 1000-Hz pure tone for the single-channel temporal-window model using TW3 parameters. The gammatone filter and nonlinearity were exchanged for the DRNL filter.

1.2 The temporal-window model with Optimal detector

The original temporal window was implemented in an optimal detector setup, shown in Figure 1-3.


Figure 1-3 Optimal detector setup for the temporal-window model. The preprocessing and the division stage function as input to the optimal detector stage. The middle stage, where both inputs are the masker, can be removed since its output will always be 1.

The decision ratio of the temporal-window model is discarded as a parameter and replaced with the internal noise of the optimal detector. An internal noise level of 0.007 was found to produce a good match with previous predictions in forward masking. The predictions for a single channel model are shown in Figure 1-4 and Figure 1-5.

[Figure: Forward masking 1000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: GT FW NL3 SQ TW3 TWDet; Model: GT FW NL3 SQ TW3 TWOptDet.]

Figure 1-4 Forward masking at 1000 Hz for the temporal-window model with optimal detector. Internal noise was adjusted to 0.007 for these predictions.

[Figure: Intensity discrimination threshold in 1000 Hz tone. x-axis: A [dB]; y-axis: ∆L = 20 ⋅ log((∆A + A)/A), where ∆A = m ⋅ A. Curves: Model: GT HW NL3 SQ TW3 TWDet; Model: GT FW NL3 SQ TW3 TWOptDet.]

Figure 1-5 Intensity discrimination for the temporal-window model with optimal detector. The optimal detector internal noise was set to 0.007.

It can be seen that the model with the optimal detector is able to fit previous predictions in the forward-masking conditions. In intensity discrimination a decrease is seen from previous predictions.

1.3 The temporal-window model with half wave rectifier

The full-wave rectifier (FW) has been exchanged for half-wave rectification followed by a low-pass filter (HWLP). To show the effects of this change, predictions are shown in Figure 1-6 for a model using half-wave (HW) instead of full-wave rectification. The noise constant N has been halved to produce the predictions for the model using half-wave rectification. The step to using half-wave rectification and a low-pass filter is shown in Figure 1-7.

[Figure: Forward masking 1000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: GT FW NL3 SQ TW3 TWDet; Model: GT HW NL3 SQ TW3 TWDet.]

Figure 1-6 Comparison of predicted data before and after the half-wave rectifier was introduced. The half-wave model prediction uses a noise constant of N/2 to show that the model can be modified to fit the old predictions.

[Figure: Forward masking 1000 Hz. x-axis: Offset-Offset time [s]; y-axis: Masked threshold [dB SPL]. Curves: Model: GT HW NL3 SQ TW3 TWDet; Model: GT HWLP NL3 SQ TW3 TWDet.]

Figure 1-7 The temporal-window model with half-wave rectification and 1000-Hz low-pass filtering, in comparison to predictions without low-pass filtering.

It was expected that introducing the half-wave rectifier in the model would not change the predictions, since both noise and signal are subjected to the rectifier. The energy in both signals is halved, so the signal-to-noise ratio is unchanged. The noise constant N added to the signal must be changed accordingly to predict the absolute threshold. The introduction of the 1000-Hz low-pass filter was expected to change the predictions of the model, since the low-pass filter introduces more persistence of the masker.
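The halving of the energy can be verified numerically for a sinusoid. The sketch below is illustrative only: the 1000-Hz test tone, the 16000-Hz sampling frequency and the function name are assumptions, and the same ratio holds for noise only on average, provided its amplitude distribution is symmetric about zero.

```python
import math


def mean_energy(samples):
    """Mean of the squared samples."""
    return sum(s * s for s in samples) / len(samples)


fs = 16000        # assumed sampling frequency in Hz
freq = 1000       # assumed test-tone frequency in Hz
n_samples = 1600  # 100 full periods of the tone

x = [math.sin(2.0 * math.pi * freq * n / fs) for n in range(n_samples)]

full_wave = [abs(s) for s in x]       # full-wave rectification keeps all energy
half_wave = [max(s, 0.0) for s in x]  # half-wave rectification discards negative half-cycles

energy_ratio = mean_energy(half_wave) / mean_energy(full_wave)  # about 0.5
```

Because signal and masker are scaled by the same factor, their ratio at the input to the temporal window is unchanged, which is the reason only the noise constant N needs to be adjusted.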