Investigating the Role of Crossmodal Phase Resetting in Audiovisual Binding
by
Phillip R. Johnston
A thesis submitted in conformity with the requirements for the degree of Master of Arts
Department of Psychology University of Toronto
© Copyright by Phillip R. Johnston 2019
Investigating the Role of Crossmodal Phase Resetting in
Audiovisual Binding
Phillip R. Johnston
Master of Arts
Department of Psychology
University of Toronto
2019
Abstract
The mechanisms allowing the brain to fuse paired auditory and visual stimuli into a unified
percept, despite differences in their timing, remain largely unknown. Crossmodal phase resetting
of ongoing oscillations by the leading stimulus may facilitate integration of the following
stimulus by establishing a shared temporal structure within a network of primary and
multisensory areas. EEG was recorded during a simultaneity judgment task to assess whether
phase resetting (measured as intertrial coherence) differentiates ambiguous stimuli that are
perceived as fused and those perceived as segregated. No differences in intertrial coherence were
observed between fused and segregated trials; however, differences in ERP amplitude tracked
perception in auditory-leading trials. Source modelling of this difference identified a network
previously implicated in the perception of intersensory asynchrony, comprising both unisensory
and multisensory cortical regions. Additionally, a correlation was identified
between individual sensitivity to asynchrony and intertrial coherence elicited by nearly
synchronous stimuli.
Acknowledgments
I would like to express my sincere gratitude to my supervisor, Dr. Randy McIntosh, for his
invaluable mentorship, his generosity with his time and expertise, and above all for inspiring me
with his dedication and his vision. I am also grateful to my subsidiary advisor, Dr. Claude Alain,
whose expert guidance and keen insight have greatly aided this project from its inception. I
would also like to thank Ricky Chow and Alain Fournier for their technical assistance, and the
students of the ERP Lab at Baycrest for making me feel more than welcome during my long days
of data collection there. Finally, this work would not have been possible without the patience and
unconditional support of Kate Taylor and my family, for which I am grateful always.
Table of Contents
Acknowledgments.......................................................................................................................... iii
Table of Contents ........................................................................................................................... iv
List of Tables ................................................................................................................................ vii
List of Figures .............................................................................................................................. viii
Chapter 1 ..........................................................................................................................................1
1 Introduction .................................................................................................................................1
1.1 Perceptual Binding ...............................................................................................................1
1.2 Study Overview ...................................................................................................................2
Chapter 2 ..........................................................................................................................................4
2 Background .................................................................................................................................4
2.1 Temporal Determinants of Binding .....................................................................................4
2.1.1 The Problem of Intersensory Timing .......................................................................4
2.1.2 Quantifying the Binding Window ............................................................................5
2.1.3 Characteristics of the Temporal Binding Window ..................................................5
2.2 Potential Mechanisms ..........................................................................................................7
2.2.1 Classical Models and Their Limitations ..................................................................7
2.2.2 Ongoing Oscillations and Crossmodal Phase Resetting ..........................................8
2.2.3 Crossmodal Phase Resetting and the Temporal Window of Integration ...............10
Chapter 3 ........................................................................................................................................12
3 Methods .....................................................................................................................................12
3.1 Participants .........................................................................................................................12
3.2 Stimuli and Task ................................................................................................................12
3.2.1 Overview ................................................................................................................12
3.2.2 Simultaneity Judgment Task ..................................................................................13
3.2.3 Calibration Task .....................................................................................................14
3.2.4 Main Task ..............................................................................................................14
3.3 EEG Data Collection and Processing ................................................................................15
3.3.1 EEG Data Collection..............................................................................................15
3.3.2 EEG Pre-Processing ...............................................................................................15
3.3.3 Calculation of Time-Frequency Measures .............................................................16
3.3.4 Source Estimation ..................................................................................................17
3.4 Statistical Comparisons ......................................................................................................18
3.4.1 Behavioural Analyses ............................................................................................18
3.4.2 Response-Based Analyses .....................................................................................18
3.4.3 Individual Differences Analysis ............................................................................19
Chapter 4 ........................................................................................................................................21
4 Results .......................................................................................................................................21
4.1 Behavioural Results ...........................................................................................................21
4.1.1 Calibration Task Results ........................................................................................21
4.1.2 Main Task Results..................................................................................................21
4.2 EEG Results .......................................................................................................................25
4.2.1 Response-Based Comparisons ...............................................................................25
4.2.2 Source-Modelling of the Response-Based Differences .........................................25
4.2.3 Individual Differences Analysis ............................................................................26
Chapter 5 ........................................................................................................................................33
5 Discussion .................................................................................................................................33
5.1 Behavioural Bias towards “Synchronous” Responses .......................................................33
5.2 Response-Based Analyses .................................................................................................34
5.2.1 Intertrial Coherence and the Role of Phase Resetting in Multisensory
Integration ..............................................................................................................34
5.2.2 Response-Based ERP Differences and Source Modelling .....................................36
5.3 Intertrial Coherence and Individual Differences in TBW Width .......................................39
5.4 Limitations .........................................................................................................................40
5.5 Conclusions and Future Directions ....................................................................................42
References ......................................................................................................................................44
List of Tables
Table 1: SOAs presented during the main task. ................................................................... 15
Table 2: Descriptive statistics for the main task behavioural results. ................................ 23
List of Figures
Figure 1: Schematic of the audiovisual simultaneity judgment task. ................................. 13
Figure 2: Group temporal binding window. ............................................................................ 22
Figure 3: Right temporal binding window width vs. left temporal binding window width. ...... 22
Figure 4: Percent of ambiguous trials perceived as synchronous for each trial type
during the main task. ................................................................................................................. 23
Figure 5: Percent of ambiguous trials perceived as synchronous during each block of
the main task. .............................................................................................................................. 24
Figure 6: ITC enhancement in response to ambiguous stimuli at two representative
electrodes .................................................................................................................................... 27
Figure 7: Topography of PLS salience values for “synchronous” vs “asynchronous”
responses time-locked to S1 .................................................................................................... 28
Figure 8: Grand average ERPs for the A50 condition time-locked to S1 at three
representative central electrodes. ........................................................................................... 28
Figure 9: Topography of PLS salience values for “synchronous” vs “asynchronous”
responses time-locked to S2. ................................................................................................... 29
Figure 10: Grand average ERPs for the A50 condition time-locked to S2 at three
representative central electrodes. ........................................................................................... 29
Figure 11: Current density maps depicting difference between A50 “asynchronous” and
A50 “synchronous” trials time-locked to S1. .......................................................................... 30
Figure 12: Current density maps depicting difference between A50 “asynchronous” and
A50 “synchronous” trials time-locked to S2. .......................................................................... 31
Figure 13: Correlation between mean temporal binding window and Behaviour PLS
brain score for the V10 condition. ............................................................................................ 32
Figure 14: Topography of ITC values positively correlated with mean TBW in the V10
condition. ..................................................................................................................................... 32
Chapter 1
1 Introduction
1.1 Perceptual Binding
We perceive the world through multiple sensory systems, each of which makes a unique
contribution to the richness of our perceptual experience. Because each of these systems captures
different kinds of physical energy, together they offer complementary information about our
environments and the events that unfold around us. Crucially, the healthy brain is capable of
combining these signals into a unified perceptual experience in a process known as perceptual
binding, or perceptual fusion. In conversation, for instance, the words that we hear and the
accompanying movements that we see are experienced as a unified perceptual event, rather than
separate auditory and visual streams. Producing this bound percept flexibly across contexts
means that the brain must contend with variable delays between the modalities, caused by
differences in timing both outside and within the nervous system. At present, the neural
mechanisms that mediate perceptual binding, despite these delays, have yet to be established.
Furthermore, behavioural work has demonstrated that individuals differ widely in their “temporal
binding windows” (TBW) which index their sensitivity to intersensory delays (Conrey & Pisoni,
2006; Dixon & Spitz, 1980; Miller & D’Esposito, 2005; Powers, Hillock, & Wallace, 2009;
Stevenson, Altieri, Kim, Pisoni, & James, 2010; Stevenson, Zemtsov, & Wallace, 2012), raising
questions about how these mechanisms could vary between individuals.
In addition to the perceptual consequences, the behavioural importance of the ability to
synthesize information from across the senses is well established. Specifically, multisensory
stimuli can produce a number of behavioural enhancements, including improved detection and
reaction times compared to unisensory stimuli (Diederich & Colonius, 2004; Hershenson, 1962;
Lovelace, Stein, & Wallace, 2003; Nickerson, 1973). Furthermore, they have been shown to
facilitate higher-order processes such as learning (Shams & Seitz, 2008) and comprehension of
speech in noise (Grant, Walden, & Seitz, 1998; Sumby & Pollack, 1954). These findings, in
combination with growing evidence of ubiquitous multisensory interactions from anatomy and
physiology, motivate the developing view that the brain is fundamentally organized to integrate
information across putatively unisensory neural systems, with wide-reaching consequences for
perception, cognition, and action (Calvert, Spence, & Stein, 2004; Driver & Noesselt, 2008;
Ghazanfar & Schroeder, 2006).
In light of these considerations, it is clear that the brain’s ability to selectively synthesize
information from multiple sensory systems into a coherent percept is crucial to understanding
and responding adaptively to our environments. Indeed, disturbances in multisensory integration,
including reduced sensitivity to intersensory delays in perceptual binding, have been linked to
deficits in several clinical and neurodevelopmental conditions, including schizophrenia, autism,
and dyslexia (see Wallace & Stevenson, 2014 for a review), as well as mild cognitive
impairment (Murray et al., 2018) and reduced spatial working memory and mobility in older
adults (Wu, Chen, Yeh, Wu, & Tang, 2015). A better understanding of perceptual binding may
therefore shed light on basic principles of brain operation, as well as reveal differences relevant
to diagnosis and treatment in these conditions.
1.2 Study Overview
New insight into the mechanisms of binding may come from an emerging framework proposing
that multisensory integration is mediated by large-scale neural oscillations, which coordinate the
activity of distributed neural populations in order to effect integration (see Keil & Senkowski,
2018 for a review). Notably, invasive recordings in both animals and humans have demonstrated
that a stimulus in one modality (e.g. auditory) can reset the phase of ongoing oscillations in
primary cortex associated with another modality (e.g. vision; Kayser, Petkov, & Logothetis,
2008; Lakatos, Chen, O’Connell, Mills, & Schroeder, 2007; Lakatos, O’Connell, et al., 2009;
Mercier et al., 2013). While several authors have hypothesized that this oscillatory phase
resetting may promote integration within a multisensory network (Keil & Senkowski, 2018;
Lakatos, O’Connell, et al., 2009; Mercier et al., 2013; van Atteveldt, Murray, Thut, & Schroeder,
2014), only one study has so far demonstrated a link between phase resetting and perceptual
fusion (Kambe, Kakimoto, & Araki, 2015). However, these authors employed a restrictive region
of interest approach, thereby ignoring potentially relevant activity produced elsewhere in the
brain, and did not report inverse modelling of the relevant cortical sources. The latter would have
strengthened their claim that the phase resetting observed at the scalp truly originated from
primary sensory cortices, and was thus a result of genuine crossmodal phase resetting.
Given these concerns, this study aimed to clarify whether crossmodal phase resetting is indeed
associated with audiovisual binding through a more comprehensive characterization of the brain
activity produced during this process. To accomplish this, an audiovisual simultaneity judgment
task, similar to that used by Kambe and colleagues (2015), was employed, with the perception of
synchrony used as an index of perceptual binding. By calibrating the temporal delays between
the auditory and visual stimuli for each participant, ambiguous stimuli were produced. As such,
differences in phase resetting between those stimuli perceived as “synchronous” and those
perceived as “asynchronous” could be interpreted as differences between bound and unbound
percepts, respectively. Concomitant differences in power or the ERP, which represent a potential
confound (van Diepen & Mazaheri, 2018), were also assessed. Additionally, multivariate
analysis and source modelling were employed in order to better characterize the distributed
spatial and temporal patterns of activity differentiating “synchronous” and “asynchronous”
percepts. Lastly, the temporal binding window width of each participant was correlated with
their neural activity in response to audiovisual stimulation, with the aim of connecting individual
differences in temporal binding with corresponding differences in oscillatory phase resetting.
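The phase-resetting measure named above, intertrial coherence (ITC), has a compact definition: the length of the across-trials mean of unit-length phase vectors at each time point. The sketch below illustrates this with simulated data; the array shapes, random phases, and "reset window" are purely illustrative assumptions, not the study's actual pipeline.

```python
# Minimal sketch of intertrial coherence (ITC). Assumes a
# (trials x times) array of complex time-frequency coefficients at
# one electrode and frequency (e.g. from a wavelet transform); the
# simulated data below are purely illustrative.
import numpy as np

def itc(tf_coeffs):
    """ITC across trials: length of the mean unit phase vector.

    tf_coeffs: complex array, shape (n_trials, n_times).
    Returns values in [0, 1] per time point; 1 means identical
    phase on every trial (perfect resetting), 0 means uniformly
    random phase.
    """
    unit_phasors = tf_coeffs / np.abs(tf_coeffs)
    return np.abs(unit_phasors.mean(axis=0))

rng = np.random.default_rng(0)
n_trials, n_times = 200, 100

# Random phase everywhere (no resetting)...
phase = rng.uniform(0, 2 * np.pi, (n_trials, n_times))
# ...except a window where phase is reset to a near-common value,
# mimicking a stimulus-evoked phase reset.
phase[:, 40:60] = 0.1 * rng.standard_normal((n_trials, 20))

coeffs = np.exp(1j * phase)
itc_vals = itc(coeffs)
print("baseline ITC ~", round(float(itc_vals[:40].mean()), 2))
print("reset-window ITC ~", round(float(itc_vals[40:60].mean()), 2))
```

High ITC indicates that stimulus onset imposed a consistent oscillatory phase across trials, which is why it serves as a scalp-level index of phase resetting.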
Chapter 2
2 Background
2.1 Temporal Determinants of Binding
2.1.1 The Problem of Intersensory Timing
Temporal coincidence has been established as one of the chief determinants of interaction
between the senses (Meredith, Nemitz, & Stein, 1987; Meredith & Stein, 1986), and sensory
events that occur close together in time are more likely to be perceptually bound (Conrey &
Pisoni, 2006; Dixon & Spitz, 1980; Miller & D’Esposito, 2005). However, this seemingly simple
heuristic is problematic, given that the brain does not have access to the absolute timing of
events, and must in fact contend with various lags both outside and within the nervous system.
Given that the transmission of light is several orders of magnitude faster than the transmission of
sound through air (roughly 3×10⁸ m/s and 3.4×10² m/s, respectively), the sound of an event will
effectively always reach an observer after the light does, and this discrepancy increases with
distance. Further delays are introduced during transduction, where stimulus properties such as
intensity can change transduction time (Lennie, 1981) and after transduction, where early
processing of auditory information is generally faster than that of visual information
(approximately 10 ms for auditory information and 50 ms for visual; Keetels & Vroomen, 2012;
Pöppel, Schill, & von Steinbüchel, 1990). Together, these delays mean that visual and auditory
information originating from the same event arrive at their respective primary cortices at
different times, and the direction of the offset can vary depending on distance, early sensory
processing speed, and properties of the stimuli themselves. Despite these challenges, healthy
observers can reliably experience audiovisual binding for events at various distances, not just the
small “horizon of synchrony” where the physical and neural delays are balanced (Pöppel et al.,
1990). Taken together, these factors suggest that the brain can only determine synchrony
approximately, and must therefore tolerate a range of asynchronies when binding stimuli. While
such a window of integration has been well-documented empirically (e.g. Conrey & Pisoni,
2006; Powers, Hillock, & Wallace, 2009; Stevenson, Zemtsov, & Wallace, 2012; Van Eijk,
Kohlrausch, Juola, & Van De Par, 2008; van Wassenhove, Grant, & Poeppel, 2007), the neural
mechanisms that produce it remain an open question.
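The delays described above can be combined in a back-of-the-envelope calculation of the "horizon of synchrony". The speeds and processing times are the approximate values cited in this section; the function name and exact distances are illustrative only.

```python
# Back-of-the-envelope calculation of the "horizon of synchrony":
# the viewing distance at which the acoustic travel lag exactly
# cancels the faster early processing of auditory information.
# Uses the approximations cited in the text (speed of sound
# ~340 m/s; ~10 ms auditory vs ~50 ms visual early processing);
# light travel time is treated as negligible.

SPEED_OF_SOUND = 340.0   # m/s
T_AUDITORY = 0.010       # s, early auditory processing
T_VISUAL = 0.050         # s, early visual processing

def cortical_asynchrony(distance_m):
    """Auditory-minus-visual arrival time at cortex, in seconds.

    Positive values mean the auditory signal arrives later.
    """
    auditory_arrival = distance_m / SPEED_OF_SOUND + T_AUDITORY
    visual_arrival = T_VISUAL  # light travel time ignored
    return auditory_arrival - visual_arrival

# Distance at which the two streams arrive simultaneously:
horizon = (T_VISUAL - T_AUDITORY) * SPEED_OF_SOUND  # ~13.6 m

print(f"horizon of synchrony ~ {horizon:.1f} m")
print(f"asynchrony at 1 m:  {cortical_asynchrony(1.0) * 1000:+.1f} ms")
print(f"asynchrony at 30 m: {cortical_asynchrony(30.0) * 1000:+.1f} ms")
```

Under these assumptions, audition leads at near distances and lags beyond roughly 14 m, consistent with the claim that the sign of the cortical offset varies with distance.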
2.1.2 Quantifying the Binding Window
The temporal binding window (TBW) is a construct that provides a means of quantifying an
individual’s window of integration. It can be derived from a handful of psychophysical tasks,
which assume that the perception of two stimuli as synchronous (i.e. they are not distinguishable
as distinct events) constitutes multisensory integration of those stimuli. One such method is the
simultaneity judgment (SJ) task, which presents participants with two brief stimuli (e.g. one
auditory and one visual) separated by a systematically varied stimulus onset asynchrony (SOA),
and asks them to judge whether or not the stimuli occurred simultaneously (e.g. Powers, Hillock,
& Wallace, 2009; Stevenson & Wallace, 2013; Van Eijk, Kohlrausch, Juola, & Van De Par,
2008). By fitting a psychometric function to these data, the likelihood of integration can then be
modelled as a function of SOA for each individual (see section 4, Figure 2 for an example).
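A minimal sketch of this fitting step is given below, using hypothetical response proportions and a Gaussian psychometric function (one common choice; published studies differ in both the function fitted and the width criterion). The coarse grid search stands in for a proper optimizer such as scipy.optimize.curve_fit.

```python
# Sketch of deriving a temporal binding window (TBW) from
# simultaneity judgment data by fitting a psychometric function.
# The SOAs, response proportions, Gaussian form, and grid-search
# fit below are illustrative assumptions.
import numpy as np

# Hypothetical data: SOA in ms (negative = auditory-leading) and
# proportion of "synchronous" responses at each SOA.
soas = np.array([-300, -200, -100, -50, 0, 50, 100, 200, 300], float)
p_sync = np.array([0.10, 0.25, 0.60, 0.85, 0.95, 0.90, 0.80, 0.45, 0.15])

def gaussian(soa, amp, pss, sigma):
    """Amplitude-scaled Gaussian over SOA; pss is its peak location."""
    return amp * np.exp(-((soa - pss) ** 2) / (2 * sigma ** 2))

# Coarse least-squares grid search over plausible parameter ranges.
best, best_err = None, np.inf
for amp in np.linspace(0.8, 1.0, 21):
    for pss in np.linspace(-50, 100, 31):
        for sigma in np.linspace(80, 300, 45):
            err = np.sum((gaussian(soas, amp, pss, sigma) - p_sync) ** 2)
            if err < best_err:
                best, best_err = (amp, pss, sigma), err

amp, pss, sigma = best
tbw = 2 * sigma * np.sqrt(2 * np.log(2))  # full width at half maximum
print(f"PSS ~ {pss:.0f} ms (positive = visual-leading)")
print(f"TBW (FWHM) ~ {tbw:.0f} ms")
```

Note that the fitted peak falls to the right of 0 ms for these (made-up) data, mirroring the rightward-shifted point of subjective simultaneity discussed in section 2.1.3.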
2.1.3 Characteristics of the Temporal Binding Window
Through behavioural measurements of the TBW, the characteristics of the binding window have
been explored, revealing several consistent features. The first is asymmetry, where the right
(visual-leading) side of the window is wider than the left (auditory-leading) side on average
(Conrey & Pisoni, 2006; Dixon & Spitz, 1980; Hillock, Powers, & Wallace, 2011; Stevenson et
al., 2012; van Atteveldt, Formisano, & Blomert, 2007; van Wassenhove et al., 2007; Vroomen &
Keetels, 2010); however, the widths of the left and right sides are highly correlated within an
individual (Stevenson et al., 2012). Similarly, the offset where simultaneity is most likely to be
perceived, known as the point of subjective simultaneity (PSS), is often found slightly to the
right of true synchrony (Dixon & Spitz, 1980; Kayser et al., 2008; Stone et al., 2001; van Eijk et
al., 2008; Zampini, Shore, & Spence, 2003). In other words, stimuli with a slight auditory lag are
generally more likely to be perceived as synchronous than stimuli that are objectively
synchronous, perhaps reflecting the naturalistic case where auditory stimuli always slightly lag
visual stimuli due to the differences in their respective speeds (see Van Eijk et al., 2008 for
further discussion of this issue).
Additionally, the TBW has been shown to vary substantially between healthy individuals in
terms of its overall width (Conrey & Pisoni, 2006; Dixon & Spitz, 1980; Miller & D’Esposito,
2005; Powers et al., 2009; Stevenson et al., 2010, 2012) – a finding that has yet to be explained
mechanistically. Furthermore, widening of the TBW has been identified in several
neurodevelopmental conditions, including schizophrenia (Foucher, Lacambre, Pham, Giersch, &
Elliott, 2007; Martin, Giersch, Huron, & van Wassenhove, 2013), autism (de Boer-Schellekens,
Eussen, & Vroomen, 2013; Kwakye, Foss-Feig, Cascio, Stone, & Wallace, 2011), and dyslexia
(Hairston, Burdette, Flowers, Wood, & Wallace, 2005), pointing to differences in multisensory
processing that may play a key role in higher level deficits associated with these conditions
(Wallace & Stevenson, 2014).
Multisensory performance is also known to change across the lifespan, potentially indexing
developmental health. Cross-sectional studies show that the TBW is wider in childhood, reaches
its narrowest sometime in middle age, and then broadens again in older adulthood (Hillock-Dunn
& Wallace, 2012; Hillock et al., 2011; Noel, Niear, Burg, & Wallace, 2016; Stevenson, Baum,
Krueger, Newhouse, & Wallace, 2018). In children, those who experience greater reaction time
benefit from audiovisual cues (as opposed to auditory or visual alone) are more likely to perform
well on a standard measure of intelligence (WISC-IV; Barutchu et al., 2011). In middle aged and
older adults, the width of the temporal binding window was found to be correlated with working
memory performance (Wu et al., 2015). Similarly, performance in an audiovisual detection task,
coupled with information about modality dominance, was recently shown to be an effective
screen for mild cognitive impairment (MCI) in older adults (Murray et al., 2018). Together these
findings suggest that multisensory integration performance, including the TBW, follows a
protracted course of development over the life span, and changes in the trajectory of this
development could index elements of overall brain function and health.
In summary, the canonical features of the TBW (asymmetry, correlated left and right windows,
rightward-shifted PSS), as well as its individual variability, widening in neurodevelopmental
populations, and developmental time course, are all features that a complete account of the
neural mechanisms of multisensory integration will eventually have to explain. However, there
has so far been little effort to use these behavioural findings to inform electrophysiological
research aiming to describe the neural mechanisms of multisensory integration (but see
Kaganovich & Schumaker, 2016). To take the first step in this direction, individual differences in
TBW should be treated as variables of interest, rather than just unwanted sources of variability to
be controlled for in electrophysiology studies (as in e.g. Ikumi, Torralba, Ruzzoli, & Soto-
Faraco, 2018; Kambe, Kakimoto, & Araki, 2015; Yuan, Li, Liu, Yuan, & Huang, 2016). While
necessarily correlational, determining which neural responses to multisensory stimuli correlate
with individual differences in the window of integration could highlight which neural
phenomena may play a mechanistic role in integration. In doing so, future work may bridge the
gap between behaviour, neural mechanisms, and their alteration in development and disease.
2.2 Potential Mechanisms
2.2.1 Classical Models and Their Limitations
Much of the current mechanistic understanding of multisensory integration comes from seminal
work on superior colliculus (SC) neurophysiology in cats (Meredith et al., 1987; Meredith &
Stein, 1983, 1986, 1996). The superior colliculus, a midbrain structure involved in spatial
orienting, is a point of confluence where auditory, visual, and somatosensory inputs converge on
single neurons. Critically, these multisensory neurons display multisensory enhancement (MSE)
in response to bimodal stimuli, meaning that their response to both stimuli at once exceeds the
response to the more effective of the unisensory stimuli alone (B. E. Stein & Stanford, 2008; B.
Stein & Meredith, 1993). The presence of this effect has since become a standard criterion for
deeming a response “multisensory”, and has been widely used to assess the spatial and temporal
determinants of integration (Beauchamp, 2005; Calvert, Hansen, Iversen, & Brammer, 2001; B.
E. Stein & Stanford, 2008; B. Stein, Stanford, & Rowland, 2009).
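The MSE criterion can be stated as a one-line computation: the percentage by which the bimodal response exceeds the most effective unisensory response. The firing rates below are hypothetical, and the index shown is the conventional percent-enhancement form used in the superior colliculus literature.

```python
# Illustrative computation of the multisensory enhancement (MSE)
# index applied to multisensory neurons: the percentage by which
# the bimodal response exceeds the most effective unisensory
# response. All firing rates are made up.
def mse_index(bimodal, unisensory_responses):
    """Percent enhancement relative to the best unisensory response."""
    best = max(unisensory_responses)
    return 100.0 * (bimodal - best) / best

# Hypothetical mean spike counts per trial:
auditory_alone, visual_alone, audiovisual = 8.0, 12.0, 21.0
enhancement = mse_index(audiovisual, [auditory_alone, visual_alone])
print(f"MSE = {enhancement:.0f}%")  # positive => "multisensory" by this criterion
```

A positive index marks the response as "multisensory" under the standard criterion; a negative index would indicate multisensory depression.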
A number of neural network models have since been developed to reproduce the specific
behaviours of multisensory SC neurons, including MSE (e.g. Alvarado, Rowland, Stanford, &
Stein, 2008; Ohshiro, Angelaki, &amp; Deangelis, 2011; Rowland, Stanford, &amp; Stein, 2007); however,
such models are likely not sufficient to explain multisensory temporal fusion for two main
reasons. First, while multisensory SC neurons do exhibit a temporal window of integration (i.e.
multisensory stimuli closer together in time produce greater MSE in these neurons), there is no
evidence yet to suggest that the presence of MSE in multisensory circuits is responsible for
producing perceptual binding, rather than simply regulating basic behaviours such as orienting
(Stein, 1998). In other words, the gap in understanding between the cellular and perceptual
levels of analysis is too broad to suggest that MSE and perceptual binding are one and the same,
and therefore an account of the cellular temporal window of integration is not necessarily
sufficient to explain the perceptual binding window.
Second, these models carry the implicit assumption that multisensory integration is a
feedforward, hierarchical process, in contrast to emerging views of the cortex as largely
multisensory (Calvert et al., 2004; Driver & Noesselt, 2008; Ghazanfar & Schroeder, 2006).
More specifically, the classical model of multisensory integration holds that each modality is
processed largely within anatomically segregated streams, and the computations involved in their
integration take place at the locations where these streams converge (e.g. SC, but also superior
temporal sulcus, inferior parietal cortex, and others). However, there is now overwhelming
anatomical and electrophysiological evidence that putatively unisensory areas receive input from
other modalities, and these inputs can alter activity there, including enhanced responses to
subsequent stimuli (Driver & Noesselt, 2008; Falchier, Clavagnier, Barone, & Kennedy, 2002;
Ghazanfar & Schroeder, 2006; Kayser & Logothetis, 2007; Lakatos et al., 2007; Lakatos,
O’Connell, et al., 2009; Mercier et al., 2013; Rockland & Ojima, 2003). Even at the scalp,
crossmodal effects on ERP amplitude have been observed as early as 40 ms after stimulus onset,
suggesting that an auditory stimulus can alter the amplitude of early cortical responses to a
subsequent visual stimulus (Giard & Peronnet, 1999; Molholm et al., 2002). Furthermore, fMRI
studies investigating perceptual fusion of asynchronous audiovisual stimuli in humans identify a
network of cortical regions including primary visual and auditory cortex, superior temporal
sulcus, as well as prefrontal and parietal regions (Dhamala, Assisi, Jirsa, Steinberg, & Scott
Kelso, 2007; Powers, Hevey, & Wallace, 2012; Stevenson et al., 2010), suggesting that
formation of the fused percept may be a function of a distributed network of interacting regions,
rather than the result of a local neural computation. Together, these findings strongly suggest that
multisensory interactions in fact occur both within and upstream of convergence areas, in
putatively unisensory cortex. As such, an account of multisensory integration which only models
the region of convergence, without accounting for early crossmodal interactions within the
broader network (including primary cortex), is likely to be deficient.
2.2.2 Ongoing Oscillations and Crossmodal Phase Resetting
In contrast to the localized computation posited by SC-inspired models, an emerging framework
holds that multisensory integration is coordinated by neural oscillations at distributed spatial and
temporal scales (see Keil & Senkowski, 2018 for a review). This framework, which posits that
large scale modulatory oscillations coordinate integration, could suggest a mechanism of binding
which accounts for the observed pervasive crossmodal interactions as well as the temporal
sensitivity of this binding process captured by the temporal binding window.
It is well established that ongoing oscillations modulate cortical excitability, and this behaviour
may be central to the coordination of activity among distributed neural populations (Buzsáki &
Chrobak, 1995). By imposing a shared temporal structure between neuronal populations through
common periods of high and low excitability, large-scale oscillations could promote
communication between populations, thereby facilitating processes such as perceptual binding
(Buzsáki & Chrobak, 1995). Crucially, it is now known that incoming stimuli can alter this
temporal structure by resetting the phase of ongoing oscillations, and this mechanism is thought
to contribute to the observed ERP response to a single stimulus (Makeig, Debener, Onton, &
Delorme, 2004; Sayers, Beagley, & Henshall, 1974). Moreover, recent invasive
electrophysiological work in macaques and humans has provided striking evidence that this
phase resetting occurs crossmodally in primary sensory areas. For instance, intracranial
recordings in macaques demonstrate that an attended stimulus can reset the phase of ongoing
theta and gamma oscillations in both A1 and V1, regardless of whether the stimulus is auditory
or visual (Lakatos, O’Connell, et al., 2009). Studies recording from just one primary sensory area
found similar evidence of phase resetting in macaque A1 using visual (Kayser et al., 2008) and
somatosensory stimuli (Lakatos et al., 2007), and human V1 using auditory stimuli (Mercier et
al., 2013). Given these findings, Lakatos and colleagues (2009) have proposed that a salient
stimulus, regardless of modality, resets the phase of ongoing oscillations in primary cortical
areas (visual, auditory, and somatosensory), thereby altering how subsequent stimuli will be
processed in these areas.
This crossmodal alteration of ongoing oscillations has clear implications for multisensory
integration, as this shared temporal structure may promote synchronization between unisensory
and multisensory regions at higher frequencies (e.g. gamma), thereby promoting perceptual
binding of subsequent stimuli (Fries, 2015; Mercier et al., 2015; Senkowski, Schneider, Foxe, &
Engel, 2008; Voloh & Womelsdorf, 2016). Indeed, there is preliminary evidence to suggest that
phase resetting can promote inter-regional synchronization, as demonstrated by a correlation
between phase resetting in auditory cortex following an audiovisual stimulus, and delta phase
synchronization between auditory and motor cortex (Mercier et al., 2015).
2.2.3 Crossmodal Phase Resetting and the Temporal Window of Integration
If crossmodal phase resetting is a mechanism of multisensory integration, it could account for
key features of the temporal binding window. In essence, resetting the ongoing oscillations of a
distributed network could impose a temporal frame of reference which determines whether
subsequent stimuli are integrated or segregated. For instance, two stimuli arriving at different
times could be fused as long as the second arrives within the excitatory phase of the oscillatory
cycle set by the first (Lakatos, O’Connell, et al., 2009). Phase resetting induced by the first
stimulus could also promote a limited period of inter-regional synchronization among unisensory
areas and a broader multisensory network, thereby enhancing communication within this
functional network and potentially facilitating binding of the second stimulus when it arrives. As
this synchronization decays, the likelihood of integration would decrease, accounting for the
decreasing probability of perceptual binding with time. Such a correlation between phase
resetting and inter-regional synchronization has been demonstrated invasively in humans
between auditory cortex and motor cortex (Mercier et al., 2015), but not yet within a distributed
multisensory network.
In healthy adults, individual differences in the effectiveness of this phase resetting mechanism
could also explain individual differences in the width of the temporal binding window. Similarly,
the widened temporal binding window observed in autism, for example, could arise from a
deficit in coordinating this resetting within the multisensory network, in line with views of
autism as a disorder characterized by abnormal connectivity and functional coordination across
brain regions (Belmonte et al., 2004). Furthermore, within individuals, trial-by-trial differences
in integration could be explained by variability in the phase to which ongoing oscillations are
reset, due to endogenous factors like fluctuations in attention (Lakatos, Karmos, Mehta, Ulbert,
& Schroeder, 2008; Lakatos, O’Connell, et al., 2009) or random variability in the power or phase
of pre-stimulus oscillations (Ikumi et al., 2018; Yuan et al., 2016).
To our knowledge, only a single study has demonstrated a link between phase resetting and the
perception of audiovisual synchrony (Kambe et al., 2015). These authors found that higher beta
phase resetting (measured as inter-trial coherence) in the 100 ms following the first stimulus
differentiated trials seen as fused from those seen as segregated. This result is plausible, given
that beta synchronization has been observed between neural groups in different cortical areas
(Brovelli et al., 2004; Tallon-Baudry, Bertrand, & Fischer, 2001), and computational modelling
has demonstrated that oscillators can synchronize in the beta range despite long conduction
delays (Kopell, Ermentrout, Whittington, & Traub, 2000). Furthermore, Mercier and colleagues
(2013) identified auditory-driven beta resetting invasively in human participants, but
hypothesized that this activity was related to response preparation, given the beta band’s prior
implication in sensorimotor processing (Pfurtscheller & Neuper, 1996).
From the scalp topography alone, it cannot currently be concluded that the phase resetting
observed by Kambe and colleagues was indeed crossmodal phase resetting. While neural
sources cannot be located conclusively with EEG, the connection between crossmodal phase
resetting and synchrony perception (and therefore fusion) would be greatly strengthened if these
results could be reproduced, and source modelling employed to verify that the observed phase
resetting originates in primary cortex associated with the second modality. Furthermore, the
application of multivariate approaches to the entire dataset, rather than univariate analysis on
limited regions of interest, would allow much finer characterization of the spatiotemporal
activity patterns associated with the two percepts.
Chapter 3
3 Methods
3.1 Participants
Twenty-eight healthy young adult participants were recruited from the Baycrest Centre
participant database. All participants provided written consent according to the guidelines
established by Baycrest Centre and the University of Toronto, and were provided monetary
compensation for their participation. Seven participants were excluded because of unsuccessful
model fitting to behaviour during the calibration task, previously undisclosed psychiatric
medications, or excessive artifacts in the EEG signal. The final sample included twenty-one
healthy young adults (19-33 years old, mean age 23.5, 10 female) all with normal or corrected-
to-normal vision and hearing, and no reported history of psychiatric illness. Nineteen participants
were right-handed, one left-handed and one ambidextrous.
3.2 Stimuli and Task
3.2.1 Overview
Data collection took place over two stages. The first stage was a short behavioural calibration
phase, wherein each participant’s temporal binding window (TBW) and point of maximum
ambiguity (SOA50%) were measured using a two-alternative forced-choice simultaneity judgment
task, described below. In the second stage, electroencephalography (EEG) data was collected
while participants performed the same task, this time with individually calibrated stimuli based
on the results of the first stage.
All recordings took place in a dimly-lit, sound-attenuating room at the Rotman Research Institute
at Baycrest Centre. The task was built with PsychoPy software (Peirce et al., 2019) and presented
by a Dell Precision T3600 computer. Stimulus timing was verified to be accurate within ±4 ms
using a Tektronix TDS210 2-channel oscilloscope.
3.2.2 Simultaneity Judgment Task
The simultaneity judgment task consisted of a short jittered fixation period (1000-1500 ms),
followed by a brief visual flash and auditory beep stimulus, and lastly a response prompt (see
Figure 1). The flash and beep stimuli were separated by a systematically varied stimulus onset
asynchrony (SOA) ranging from -300 to 300 ms, where a negative number denotes auditory-first
presentation and a positive number denotes visual-first presentation. Hereafter the leading
stimulus will be referred to as S1, and the following stimulus S2.
The auditory stimulus was a 3500 Hz pure sine tone, 10 ms in length, delivered by a GSI 61
audiometer through insert earphones. The audiometer was calibrated such that a 5 s tone at the
same frequency produced an intensity of 102 dB SPL. The visual stimulus was a white annulus
flash on a black background, presented for 10 ms and covering 3.8 degrees of visual angle at a
viewing distance of 60 cm. It was presented on a Dell Trinitron CRT monitor at a refresh rate of
100 Hz.
After a short interval (750 ms) a prompt was displayed and the participant reported whether they
perceived the two stimuli as synchronous (‘Yes’) or asynchronous (‘No’) using a left or right
keyboard button press (counterbalanced between participants) within a 2000 ms time limit. After
a brief intertrial interval (750 ms) the next trial began and the process repeated.
Figure 1: Schematic of the audiovisual simultaneity judgment task, depicting the visual-leading condition.
3.2.3 Calibration Task
The calibration task measured an individual participant’s probability of synchrony perception at
a range of SOAs, allowing the point of maximum perceptual ambiguity (SOA50%) to be
calculated.
Following previous work, 19 SOAs were used in total, ranging from -300 to 300 ms.
Specifically, SOAs of 0, 10, 20, 50, 80, 100, 150, 200, 250, and 300 ms were presented in both
the auditory-leading and visual-leading cases (Stevenson et al., 2012). The task was broken into
four blocks, each containing four presentations of each of the 19 SOAs in a random order,
resulting in each SOA being presented 16 times, for 304 trials in total, over a
duration of roughly 15 minutes. Participants were offered a self-timed break between each block.
After measurement, the probability of synchrony perception for each SOA was calculated as the
number of “synchronous” responses divided by the total number of presentations (16). Two
psychometric sigmoid functions (Hillock-Dunn & Wallace, 2012; Powers et al., 2009; Stevenson
& Wallace, 2013) were fit to these data using Python’s lmfit function to model the left and right
halves of the temporal binding window. The left and right functions describe the relationship
between SOA and the probability of synchrony perception in the auditory-leading and visual-
leading case, respectively. By solving these functions for a rate of 50% synchrony perception,
the SOA that is maximally ambiguous for that participant (equally likely to be perceived as
synchronous or asynchronous) was estimated for both auditory-leading and visual-leading cases.
In addition, the point at which stimuli were predicted to be experienced as asynchronous 95% of
the time (SOA95%) was also determined for use as a filler trial in the main task. The data and
resulting functions were plotted automatically so that the success of the model fitting could be
verified before proceeding with the main task. SOA50% values less than 100 ms were rounded up to
100 ms, in order to provide at least a 100 ms interval free from S2-evoked activity for analysis
purposes.
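The calibration fit and extraction of SOA50% can be sketched as follows. This is a minimal illustration assuming a logistic psychometric function, with made-up response data and SciPy's curve_fit standing in for the lmfit-based code actually used; all names and values are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(soa, amplitude, center, width):
    # Logistic psychometric function: P("synchronous") as a function of |SOA|.
    return amplitude / (1.0 + np.exp((soa - center) / width))

# Hypothetical auditory-leading calibration data: |SOA| in ms vs. the observed
# proportion of "synchronous" responses over 16 presentations per SOA.
soas = np.array([0, 10, 20, 50, 80, 100, 150, 200, 250, 300], dtype=float)
p_sync = np.array([1.0, 1.0, 0.94, 0.88, 0.75, 0.62, 0.31, 0.12, 0.06, 0.0])

params, _ = curve_fit(sigmoid, soas, p_sync, p0=[1.0, 100.0, 30.0])
amplitude, center, width = params

def solve_for_rate(rate):
    # Invert the fitted sigmoid: the SOA yielding a given synchrony rate.
    return center + width * np.log(amplitude / rate - 1.0)

soa50 = max(solve_for_rate(0.50), 100.0)  # apply the 100 ms floor
soa95 = solve_for_rate(0.05)              # 95%-asynchronous filler SOA
```

In the thesis pipeline the same procedure was applied separately to the left (auditory-leading) and right (visual-leading) halves of the window.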
3.2.4 Main Task
Participants again performed a simultaneity judgment task as described above, this time with just
six SOAs. These were the SOA50% and SOA95% values for both auditory- and visual-leading
conditions calculated during the calibration task, as well as 10 ms and -10 ms trials. The
additional values were included to create a wider variety of SOAs, and prevent adaptation to the
SOA50% condition (Fujisaki, Shimojo, Kashino, & Nishida, 2004; Kambe et al., 2015). The
SOA50% and SOA95% values were rounded to the nearest multiple of 10 ms to accommodate the
monitor’s 10 ms frame time. The auditory-leading trials will hereafter be referred to as A10,
A50, and A95, and the visual-leading as V10, V50, and V95.
The task was broken into four blocks, each with 128 trials. Each block consisted of 32 A50 trials,
32 V50 trials, and the remaining 64 trials were split evenly between the remaining four trial types
(Table 1).
Label   Leading modality   SOA duration   Trials per block   Total trials
A50     Auditory           SOA50%         32                 128
A10     Auditory           10 ms          16                 64
A95     Auditory           SOA95%        16                 64
V50     Visual             SOA50%         32                 128
V10     Visual             10 ms          16                 64
V95     Visual             SOA95%        16                 64
Table 1: SOAs presented during the main task. SOA50% and SOA95% values were calculated for each participant individually
based on the results of the calibration task.
3.3 EEG Data Collection and Processing
3.3.1 EEG Data Collection
EEG data was recorded during the main task at a sampling rate of 512 Hz using a BioSemi
Active Two acquisition system (BioSemi Instrumentation, Netherlands). 66 scalp electrodes
were employed, using BioSemi’s 64+2 electrode cap configuration based on the 10/20 system.
Ten additional electrodes were applied in pairs to the mastoids, pre-auricular points, upper
cheeks, outer canthi of the eyes, and inferior orbit of the eyes. These provided better coverage of
the scalp, as well as an accurate record of eye movements for later artifact removal.
3.3.2 EEG Pre-Processing
All EEG pre-processing was performed in Brainstorm, an open-source application for M/EEG
data processing and visualization (Tadel, Baillet, Mosher, Pantazis, & Leahy, 2011). The raw
EEG files were digitally high-pass filtered at a cutoff of 0.5 Hz (60 dB stopband attenuation) to
remove DC offset and low-frequency components (Widmann, Schröger, & Maess, 2015). Given
the limited contamination by 60 Hz line noise, and the a priori interest in oscillatory phase, notch
filters and low-pass filters were not applied in order to prevent undue distortion of the data
(Widmann et al., 2015). The continuous files were visually inspected for bad channels, which
were removed from subsequent analysis, and the remaining channels were then re-referenced to
the average of all remaining channels. Bad segments with obvious contamination from
movement or other artifacts were then manually rejected from each continuous data file. Artifact
detection and removal was achieved with independent component analysis (ICA; Makeig, Bell,
Jung, & Sejnowski, 1996). The Infomax ICA algorithm was applied to the longest available
continuous segment of data without major artifacts for each participant (minimum 7.5 minutes,
or 230,400 samples), and the resulting components were visually inspected. Components with a
topography suggesting horizontal eye movements, vertical eye movements/blinks, or cardiac
activity were subtracted from the continuous EEG. Finally, the
data was epoched according to the trial type (see Table 1), and response (in the case of the A50
and V50 trials only). Each epoch was 2000 ms in length, spanning from 1000 ms before S1 to
1000 ms after S1. Remaining bad trials were rejected manually before averaging by condition
and participant. For the purposes of ERP analysis, each participant average was baseline
corrected by subtracting the mean of the 500 ms preceding the onset of the first stimulus.
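As a concrete illustration of the epoching and baseline-correction step (the actual processing was done in Brainstorm; the array shapes and function name here are assumptions):

```python
import numpy as np

def epoch_and_baseline(continuous, s1_onsets, sfreq=512):
    """Cut 2000 ms epochs (1000 ms before S1 to 1000 ms after) from a
    (n_channels, n_samples) array and subtract each channel's mean over the
    500 ms pre-stimulus baseline. A sketch, not the Brainstorm pipeline."""
    pre = post = sfreq        # 1000 ms of samples on either side at 512 Hz
    base = sfreq // 2         # 500 ms baseline window
    epochs = []
    for onset in s1_onsets:
        ep = continuous[:, onset - pre:onset + post].astype(float)
        baseline_mean = ep[:, pre - base:pre].mean(axis=1, keepdims=True)
        epochs.append(ep - baseline_mean)
    return np.stack(epochs)   # (n_trials, n_channels, 2 * sfreq)
```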
3.3.3 Calculation of Time-Frequency Measures
To investigate non-phase-locked oscillatory power, time-frequency decompositions of the non-
baseline-corrected EEG signals from 1 to 80 Hz were computed using complex Morlet wavelets
(Bertrand & Tallon-Baudry, 2000). Brainstorm’s Time-Frequency process was applied to each
epoch individually, with a wavelet central frequency of 1 Hz, and time resolution (full width at
half maximum) of 3 seconds. The resulting time-frequency decompositions were then averaged
by trial type and participant, and then normalized to the pre-stimulus baseline (-1000 ms to 0 ms)
using the event related synchronization/desynchronization method, which returns the deviation
from the mean of the baseline at each channel and frequency bin (Pfurtscheller, 1992).
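The ERS/ERD normalization amounts to expressing power as a percent deviation from the mean baseline power, per channel and frequency bin. A minimal numpy sketch, with assumed names and shapes:

```python
import numpy as np

def ersd_normalize(power, baseline):
    """Event-related synchronization/desynchronization (Pfurtscheller, 1992):
    percent deviation from mean baseline power. `power` has shape
    (n_channels, n_freqs, n_times); `baseline` is a slice over the time axis."""
    baseline_mean = power[:, :, baseline].mean(axis=2, keepdims=True)
    return 100.0 * (power - baseline_mean) / baseline_mean
```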
To investigate phase-locked oscillatory activity, inter-trial coherence (ITC; Tallon-Baudry,
Bertrand, Delpuech, & Pernier, 1996) was calculated for each condition for each participant. ITC
represents the concentration of phases across trials relative to an event, where a value of 0
represents completely random phase distribution and a value of 1 represents identical phases
across trials. The Morlet wavelet decomposition of each trial and calculation of ITC from 1 to 80
Hz was performed with the Fieldtrip toolbox in MATLAB (Oostenveld, Fries, Maris, &
Schoffelen, 2011). Because phase concentration measures are particularly sensitive to low trial
numbers (resulting in a positive bias; Aydore, Pantazis, & Leahy, 2013; M. X. Cohen, 2014), the
numbers of trials in the ‘Yes’ and ‘No’ conditions were equated within each participant using random
sampling (Ikumi et al., 2018). For instance, if a participant had only 40 trials in the
“asynchronous” condition, ITC would be computed on 40 trials selected at random from the
“synchronous” condition. This random sampling was repeated 30 times, and the resulting ITC
maps averaged together.
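ITC and the trial-equating resampling can be sketched directly from complex wavelet coefficients. The thesis used FieldTrip for this step; the array shapes and function names below are illustrative only.

```python
import numpy as np

def intertrial_coherence(coeffs):
    """ITC from complex wavelet coefficients of shape (n_trials, n_freqs,
    n_times): the length of the mean unit phase vector across trials. 0 means
    uniformly random phases; 1 means identical phases on every trial."""
    phase_vectors = coeffs / np.abs(coeffs)
    return np.abs(phase_vectors.mean(axis=0))

def equalized_itc(coeffs, n_keep, n_resamples=30, seed=0):
    """Equate trial counts before computing ITC (phase-concentration measures
    are positively biased at low trial numbers): draw `n_keep` trials at
    random without replacement, repeat, and average the resulting ITC maps."""
    rng = np.random.default_rng(seed)
    maps = [intertrial_coherence(
                coeffs[rng.choice(len(coeffs), n_keep, replace=False)])
            for _ in range(n_resamples)]
    return np.mean(maps, axis=0)
```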
3.3.4 Source Estimation
Source modelling was performed in order to investigate the neural sources of the measured
activity. First, all participants were assigned the MNI/ICBM152 default anatomy in Brainstorm
(Fonov, Evans, Mckinstry, Almli, & Collins, 2009), and the forward model was calculated from
this anatomy using the OpenMEEG BEM method (Gramfort, Papadopoulo, Olivi, & Clerc, 2010;
Kybic et al., 2005). A noise covariance matrix was calculated for each participant from the pre-
processed EEG during pre-stimulus baseline periods (-1000 to 0 ms) of all available trials. Data
covariance matrices were similarly calculated from the A50 “synchronous”, A50
“asynchronous”, V50 “synchronous” and V50 “asynchronous” trials (0 to 1000 ms).
Cortical current density maps were then calculated for every trial using minimum norm imaging
with unconstrained sources (Baillet, Mosher, & Leahy, 2001). For each time point, this method
estimates the current at each point of the cortical surface, using three orthogonal dipoles at each
location. Due to the size of the resulting files each map was downsampled to the 68-region
Desikan-Killiany atlas (Desikan et al., 2006). To explore the average ERP response at the source
level, these maps were then averaged by condition and participant, and Z-score normalized to the
pre-stimulus baseline (-1000 to 0 ms). Time frequency measures were also calculated on the un-
normalized source maps using the same procedures described above (section 3.3.3).
3.4 Statistical Comparisons
3.4.1 Behavioural Analyses
As a manipulation check, the rates of synchrony perception in the main experiment for each SOA
type (A10, A50, and A95) were compared for both the auditory- and visual-leading
conditions with a repeated-measures ANOVA. This ensured that longer SOAs resulted in
decreased synchrony perception as expected. Overall rates of synchrony perception for the A50
and V50 trials were similarly compared between the four blocks of the main task, to ensure that
rate of synchrony perception remained stable over the course of the task, and ensure that
adaptation or fatigue effects were not present.
3.4.2 Response-Based Analyses
In order to identify the neural activity associated with synchrony perception (and therefore
integration), a number of planned comparisons were carried out between the “synchronous” and
“asynchronous” conditions (A50 “synchronous” vs A50 “asynchronous” and V50 “synchronous”
vs V50 “asynchronous”). Importantly, a substantial imbalance in the number of trials between
two conditions presents a potential confound when comparing them, as lower trial numbers can
positively bias both ERP averages (Thomas, Grice, & Najm-Briscoe, 2004) and time-frequency
measures (Aydore et al., 2013; M. X. Cohen, 2014). Therefore, in addition to the random
sampling procedure outlined above, any participants with fewer than 30 artifact-free trials in either
the “synchronous” or “asynchronous” condition were excluded from the relevant comparisons
involving those conditions (M. X. Cohen, 2014). Because of this criterion, 13 participants were
included in the auditory-leading comparison (mean number of trials in the less-populated
condition = 39.3, SD = 7.6) and nine in the visual-leading comparison (mean = 42.1, SD = 11.5).
This attrition rate (9-13 analyzed out of 21 participants) is comparable to those reported by other
EEG studies employing phase-based measures, especially those where trial count depended on
participant behaviour (e.g. 9-10 analyzed out of 27 in Ikumi et al., 2018; 8 out of 21 in van Dijk,
Schoffelen, Oostenveld, & Jensen, 2008; 11 out of 18 in Mathewson, Gratton, Fabiani, Beck, &
Ro, 2009).
At both the sensor and source levels, responses were compared using a two-tailed paired
permutation t-test with 1000 randomizations (Pantazis, Nichols, Baillet, & Leahy, 2005). Such
tests were carried out for the averaged ERP responses, and for the averaged power and ITC maps
over both the first 100 ms after S1 and a longer interval spanning 1000 ms after S1. False
Discovery Rate (FDR) correction (Benjamini & Hochberg, 1995) was used to control the false
discovery rate across all comparisons, spanning the signal and time dimensions in the case of the
ERP comparisons, and the signal, time, and frequency dimensions in the case of time-
frequency comparisons.
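A minimal sketch of the two statistical ingredients, a sign-flipping paired permutation t-test and Benjamini-Hochberg FDR, under assumed array shapes (this is not the Brainstorm implementation):

```python
import numpy as np

def paired_permutation_t(cond_a, cond_b, n_perm=1000, seed=0):
    """Two-tailed paired permutation t-test: randomly flip the sign of each
    participant's difference and compare the observed t statistic to the
    permutation distribution. Inputs: (n_participants, n_features) arrays."""
    rng = np.random.default_rng(seed)
    diff = cond_a - cond_b
    n = diff.shape[0]

    def t_stat(d):
        return d.mean(0) / (d.std(0, ddof=1) / np.sqrt(n))

    t_obs = t_stat(diff)
    exceed = np.zeros_like(t_obs)
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n, 1))  # flip per participant
        exceed += np.abs(t_stat(diff * signs)) >= np.abs(t_obs)
    return t_obs, (exceed + 1) / (n_perm + 1)         # two-tailed p-values

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject all hypotheses up to the
    largest rank i with p(i) <= (i / m) * q. Returns a boolean mask."""
    p = np.asarray(pvals).ravel()
    order = np.argsort(p)
    thresh = q * (np.arange(1, p.size + 1) / p.size)
    below = np.nonzero(p[order] <= thresh)[0]
    reject = np.zeros(p.size, dtype=bool)
    if below.size:
        reject[order[:below[-1] + 1]] = True
    return reject.reshape(np.shape(pvals))
```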
In addition to the univariate analyses described above, Partial Least Squares (PLS; McIntosh &
Lobaugh, 2004), was employed to investigate spatiotemporal patterns of activity that may index
multisensory integration. PLS is a multivariate technique for neuroimaging analysis which, in
contrast to univariate approaches, allows inferences about task or behaviour-based differences in
patterns of activity that are extended in space and time. The resulting patterns (latent variables or
LVs) can then be tested for statistical significance with permutation tests. For those LVs that
pass a significance threshold, the reliability of each sensor’s (or source’s) contribution to the
experimental effect at each time point can be assessed using bootstrap resampling. Together,
these approaches provide a picture of which elements contribute reliably to a significant,
distributed experimental effect, and at what time points.
As with the univariate tests, PLS was computed separately on the ERP, time-frequency, and ITC
averages at both the sensor and source levels to compare the “synchronous” vs “asynchronous”
conditions. For time-frequency and ITC averages this required that, for each observation, the
two-dimensional time-frequency maps be collapsed into a one-dimensional vector of length =
number of samples × number of frequency bins × number of sensors/sources. Time-frequency
measures calculated with Morlet wavelets also have “edge effects” – transients at the beginning
and end of each frequency bin where the frequency components are systematically
misrepresented. The positions of these edge effects are determined directly by the wavelet
parameters, and could therefore be calculated and removed from the PLS data structure.
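The flattening and edge-trimming step can be illustrated with a simple reshape. In practice the number of contaminated samples varies with the wavelet's temporal support at each frequency; here it is simplified to a single argument, and all names are assumptions:

```python
import numpy as np

def flatten_tf_maps(tf, edge):
    """Collapse (n_observations, n_sensors, n_freqs, n_times) time-frequency
    maps into the 2-D (observations x variables) matrix PLS expects, dropping
    `edge` samples from both ends of every frequency bin, where the Morlet
    decomposition is contaminated by edge effects. Sketch only."""
    trimmed = tf[:, :, :, edge:tf.shape[3] - edge]
    return trimmed.reshape(tf.shape[0], -1)
```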
3.4.3 Individual Differences Analysis
In addition to the response-based analyses, the task design offered an opportunity to explore
individual differences in the tendency to integrate temporally offset stimuli, as indexed by the
temporal binding window measured during the calibration task. Specifically, Behaviour PLS
(McIntosh & Lobaugh, 2004) allows identification of distributed patterns of brain activity that
correlate with a behavioural measure, in this case the mean temporal binding window (average of
left and right TBWs). Because the SOA50% and SOA95% varied between participants, and were
determined directly by the TBW values, only the A10 and V10 conditions could be included in
this analysis. Similar to above, Behavior PLS using the mean TBW as the behavioural measure
of interest was computed for all 21 participants (0-1000 ms) on the ERP, power and ITC
averages of the A10 and V10 trials at both the sensor and source levels.
Chapter 4
4 Results
4.1 Behavioural Results
4.1.1 Calibration Task Results
The mean SOA50% measured during the calibration task was 162.6 ms (SD = 62.9 ms) for the
auditory-leading condition, and 240.0 ms (SD = 106.2 ms) for the visual-leading condition.
Figure 2 depicts a group temporal binding window fit to the calibration data of all participants.
These data reproduced the established finding that, on average, the right (visual-leading) side of
the temporal binding window is wider than the left (auditory-leading; t = 4.80, p < .001). The
width of the left side of the TBW (measured at SOA50%) ranged from 57.6 to 293.9 ms, and the
right side from 93 to 505 ms. As described previously (Stevenson et al., 2012), the widths of the
left and right sides were correlated (r(19)= 0.73, p < .001) within participants (Figure 3).
4.1.2 Main Task Results
For each condition (auditory-leading and visual-leading) the mean percentage of trials perceived
as synchronous was significantly different between all three levels of stimulus type (Figure 4).
This suggests that, on the whole, the manipulation of SOA was successful in producing different
probabilities of synchrony perception in each trial type, and these levels could therefore be
treated as distinct for further analyses. Figure 4 summarizes the percentage of trials perceived as
synchronous for each trial type during the main task.
However, the mean percentage of synchrony perception differed substantially from the targeted
values. For the ambiguous trial types (A50 and V50), the rate of synchrony perception differed
from the targeted 50% for both the auditory-leading (t(20) = 5.45, p < .001) and visual-leading
(t(20) = 4.02, p < .001) conditions (Figure 4). Similarly, mean synchrony perception in the A95
(t(20) = 7.41, p < .001) and V95 (t(20) = 6.15, p < .001) conditions were different than the target
of 5% synchrony perception. This means that, on average, participants were more likely to report
perceiving synchrony during the main task compared to the calibration task given the same SOA.
Such discrepancies, albeit of smaller magnitude, have been reported by previous studies
employing a similar design (Kambe et al., 2015; Yuan et al., 2016), but the cause remains
unclear. Possible interpretations of this finding are discussed in section 5.1.

Figure 2: Group temporal binding window, produced by fitting two sigmoid functions (left and
right) to the calibration task data of the entire sample. By convention, the left side depicts the
auditory-leading case, and the right side the visual-leading case. Dots represent individual
participants, with darker dots indicating overlapping points.

Figure 3: Right temporal binding window width vs left temporal binding window width, measured
at the point of maximum ambiguity (SOA50%).

Table 2: Descriptive statistics for the main task behavioural results.

Trial type   Mean perceived synchronous (%)   Lower quartile   Upper quartile   Interquartile range
A50          70.3                             64.1             84.4             20.3
A10          91.0                             85.9             98.4             12.5
A95          28.8                             20.3             35.5             15.2
V50          69.7                             48.0             85.9             37.9
V10          91.4                             85.9             98.4             12.5
V95          38.3                             17.5             57.8             40.3

Figure 4: Percent of trials perceived as synchronous for each trial type during the main task. One-way repeated measures
ANOVA identified a main effect of SOA type in both the auditory-leading (F(2,40) = 240.24, p < .001) and visual-leading
(F(2,40) = 240.24, p < .001) conditions. All three levels of each condition (A10, A50, A95) are different from each other at
p < .001 (Tukey’s HSD).

To examine whether
this apparent bias towards “synchronous” responses was stable across the task, the effect of task
block on rate of synchrony perception was assessed. One-way repeated measures ANOVA
revealed an effect of block on synchrony perception for the auditory-leading condition, but
follow-up analysis revealed that the first block alone was responsible for this difference (Figure
5). This suggests that the overall rate of synchrony perception remained stable from the second
block onwards, arguing against a gradual change in perception over the course of the
recording.
Figure 5: Percent of ambiguous (A50 or V50) trials perceived as synchronous during each block of the main task. One-way
repeated measures ANOVA identified a main effect of block in the auditory-leading condition (F(3, 60) = 9.46, p < .001) but not
the visual-leading condition. The mean marked *** was found to be different from the other three blocks at p
< .001 (Tukey’s HSD).
4.2 EEG Results
4.2.1 Response-Based Comparisons
In both the auditory-leading and visual-leading conditions, enhanced ITC values compared to
pre-stimulus baseline were observed in a wide frequency range across the scalp (Figure 6). This
enhancement was distributed across a range of frequencies, with peaks concentrated between 3 to
15 Hz for both auditory and visual stimuli, and an additional peak in the low-gamma (roughly
30-55 Hz) range over central scalp for auditory stimuli. However, univariate comparisons
revealed no significant differences in these ITC enhancements between the “synchronous” and
“asynchronous” responses. Similarly, no response-related differences were detected in the ERPs
or oscillatory power at the sensor and source levels with univariate comparisons.
In contrast, multivariate analysis with PLS applied to the scalp ERPs identified one latent
variable which captured a difference between “synchronous” and “asynchronous” responses in
the A50 condition (Figure 7). Specifically, “asynchronous” responses were associated with
higher amplitude in a cluster of central electrodes from roughly 350 to 450 ms post-stimulus
(Figure 8). But given that the identified difference occurs after the arrival of the second stimulus
for some participants, it cannot be determined whether this difference is specifically related to
the first or second stimulus. To clarify this issue, PLS analysis was also applied to ERPs time-
locked to the second stimulus. Again, a latent variable was identified which differentiated
“synchronous” and “asynchronous” responses (p = 0.025; Figure 9), with central electrodes again
displaying higher amplitudes for the “asynchronous” compared to the “synchronous” trials, this
time between latencies of 150 to 300 ms after the second stimulus (Figure 10). No significant
latent variables were detected with PLS applied to ITC or power, at either sensor or source
levels.
4.2.2 Source-Modelling of the Response-Based Differences
In order to localize the cortical sources of these response-based ERP differences, the A50 grand
averages were subtracted (average of “asynchronous” response trials - average of “synchronous”
response trials) and the resulting difference wave projected onto the cortical surface using
minimum norm imaging with unconstrained sources (Baillet et al., 2001). The resulting current
density maps depict the cortical regions that are estimated to be more active during the windows
of interest identified by PLS time-locked to S1 (Figure 11) and S2 (Figure 12). Given the limited
spatial accuracy of inverse modeling methods, as well as the bias in current density maps
towards superficial sources, the neural generators responsible for the observed ERP differences
can only be approximately localized. However, a large peak in the posterior left temporal lobe
during the maximum of the ERP response to the first stimulus suggests that auditory cortex may
be the main generator of the observed difference there. Additional contributions may also be
attributable to bilateral anterior temporal lobes, bilateral occipital cortex, as well as left
prefrontal cortex. The validity of the activity identified in the left precentral, post-central, and
supramarginal gyri is difficult to assess, as activity in these regions may be misattributed from
the temporal surface below.
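The logic of the minimum norm estimate underlying these current density maps can be sketched in a few lines. The dimensions and regularization value below are purely illustrative assumptions, not those of the actual model, which uses each participant's anatomy and many more cortical vertices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions (illustrative only): 64 sensors, 500 sources.
n_sensors, n_sources = 64, 500
G = rng.standard_normal((n_sensors, n_sources))  # lead field (forward model)
y = rng.standard_normal(n_sensors)               # sensor-level difference wave

# Regularized L2 minimum-norm estimate: the smallest-norm source current
# distribution consistent with the sensor data (Baillet et al., 2001).
lam = 0.1 * np.trace(G @ G.T) / n_sensors
x_hat = G.T @ np.linalg.solve(G @ G.T + lam * np.eye(n_sensors), y)
```

Because the solution minimizes total source power, it favours superficial sources close to the sensors, which is the bias noted above.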
Source modelling of the difference wave associated with S2 similarly shows clear activity in the
left posterior temporal cortex (Figure 12). Again, activity identified in the left inferior parietal
lobule (supramarginal gyrus and angular gyrus) may simply be misattributed from temporal
cortex, although this region has previously been implicated in the perception of asynchrony with
audiovisual stimuli (Bushara, Grafman, & Hallett, 2001; Dhamala et al., 2007). Similarly,
activity was identified in bilateral inferior frontal gyrus, also previously associated with the
perception of audiovisual asynchrony (Bushara et al., 2001; Dhamala et al., 2007).
4.2.3 Individual Differences Analysis
In the V10 condition, Behaviour PLS using data from all 21 participants identified one LV (p =
0.049) reflecting a correlation between individual mean TBW width and a distributed pattern of
ITC at the sensor level (r = 0.97; Figure 13). In the first 100 ms after stimulus onset, a cluster of
central electrodes, as well as right posterior and left temporal electrodes, were highly salient in
frequency bins between 3.5-8 Hz (Figure 14), suggesting that early, low-frequency ITC elicited
by V10 stimuli may be predictive of the temporal window of integration at the individual level.
Such a correlation was not identified with non-baseline-corrected power or the ERP, suggesting
that this effect is not driven by differences in power or amplitude (M. X. Cohen, 2014; van
Diepen & Mazaheri, 2018).
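The logic of this analysis can be illustrated with a simplified sketch (not the implementation used here, and with hypothetical data shapes): with a single behavioural measure, the Behaviour PLS salience pattern reduces to the correlation of each brain feature with behaviour across participants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 21 participants x 1200 brain features (flattened
# electrode x time x frequency ITC values) and one behaviour (TBW width).
brain = rng.standard_normal((21, 1200))
tbw = rng.standard_normal(21)

def zscore(a):
    return (a - a.mean(axis=0)) / a.std(axis=0)

# Per-feature correlation with behaviour across participants gives the
# salience pattern; projecting each participant's data onto it gives the
# "brain scores" plotted against TBW width. Significance and reliability
# are then assessed by permutation and bootstrap resampling (not shown).
saliences = zscore(brain).T @ zscore(tbw) / len(tbw)
brain_scores = brain @ saliences
```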
Figure 6: ITC values during a) A50 and b) V50 trials at two representative electrodes (FCz and POz). ITC
values displayed are those significantly different from the pre-stimulus baseline (-500 to -100 ms; Student's t test)
with a significance threshold of p < 0.01.
Figure 7: Topography of PLS salience values for the LV (p = 0.011) identified on the contrast between A50 "synchronous" and
A50 "asynchronous" averages time-locked to S1, revealing a cluster of central electrodes with high salience (see fig. 8 for
detailed view of this cluster). Circles represent time points with bootstrap ratios more extreme than ± 2.58.
Figure 8: Grand average ERPs for the A50 condition
time-locked to S1 at three representative central
electrodes, split by response (“synchronous” vs
“asynchronous”). PLS analysis identified one latent
variable (LV; p = 0.011) capturing a difference
between “synchronous” and “asynchronous”
responses. Blue circles mark time points where the
given electrode makes a reliable contribution to this
LV according to bootstrap resampling (bootstrap
threshold = ±2.58).
Figure 9: Topography of PLS salience values for the LV (p = 0.025) identified on the contrast between A50 "synchronous" and
A50 "asynchronous" averages time-locked to S2, revealing a cluster of central electrodes with high salience (see fig. 10 for
detailed view of this cluster). Circles represent time points with bootstrap ratios more extreme than ± 2.58.
Figure 10: Grand average ERPs for the A50
condition time-locked to S2 at three
representative central electrodes, split by
response (“synchronous” vs “asynchronous”).
PLS analysis identified one latent variable (LV; p
= 0.025) capturing a difference between
“synchronous” and “asynchronous” responses.
Blue circles mark time points where the given
electrode makes a reliable contribution to this LV
according to bootstrap resampling (bootstrap
threshold = ±2.58).
Figure 11: Current density maps depicting the difference between trials perceived as asynchronous and trials perceived as
synchronous during the A50 condition (time-locked to S1, 350 to 450 ms). Values less than 40 pA.m were excluded from
the visualization, as well as clusters comprising 20 vertices or less.
Figure 12: Current density maps depicting the difference between trials perceived as asynchronous and trials perceived as
synchronous during the A50 condition (time-locked to S2, 150 to 300 ms). Values less than 40 pA.m were excluded from
the visualization, as well as clusters comprising 20 vertices or less.
Figure 13: Correlation between mean temporal binding window and Behaviour PLS brain score for the V10 condition.
Figure 14: Topography of ITC values that are positively correlated with mean TBW at a representative time and frequency
(50 ms post-stimulus, 7 Hz). Left: Electrode salience. Right: electrodes that exceed the ±2.58 bootstrap ratio threshold are
marked in white.
Chapter 5
5 Discussion
5.1 Behavioural Bias towards “Synchronous” Responses
In the main task, the majority of participants demonstrated a bias towards responding
“synchronous” when presented with an ambiguous stimulus, in both the auditory-leading and
visual-leading conditions (Figure 4). Additionally, some participants responded “synchronous”
to relatively extreme SOA values (A95 and V95) much more often than their individual
psychometric functions predicted. In short, when responding “synchronous”, participants tolerated more asynchrony in the main task than in the calibration task.
This result is broadly consistent with previous findings that simultaneity judgments are flexible,
in that they adapt to the temporal statistics of preceding stimuli in a process known as temporal
recalibration. More specifically, prior repeated exposure to audiovisual stimuli with a constant
audiovisual lag will bias subsequent simultaneity judgments in the direction of the lag (Fujisaki
et al., 2004; Roseboom & Arnold, 2011; Yuan, Bi, Yin, Li, & Huang, 2014). However, there was
no long period of constant lag during which adaptation could occur in this task, due to the
inclusion of filler SOAs (A10/V10 and A95/V95), added specifically to avoid this effect (Kambe
et al., 2015). Moreover, the lags were balanced on either side of 0 (true synchrony), rather than in
one direction at a time as in previous studies demonstrating temporal recalibration (Fujisaki et
al., 2004; Yuan et al., 2016). If temporal recalibration is the reason for the bias towards
synchrony perception observed here, it must have occurred in both the auditory-leading and
visual-leading directions simultaneously. Such a bi-directional adaptation has been demonstrated
with audiovisual stimuli (Roseboom & Arnold, 2011), but only when the auditory lag and visual
lag are each associated with clearly distinguishable sources (e.g. a male talker with a visual lag
and a female talker with an auditory lag), whereas no such relationship was established in this
task.
Rather than assuming that two opposing perceptual biases were imposed simultaneously, a more
parsimonious explanation would be that participants “anchored” their perceptual decision criteria
for asynchrony to the more obviously asynchronous trials (A95 and V95) which made up a larger
proportion of trials compared to similarly extreme values in the calibration task. Using such
extreme criteria to resolve uncertainty when confronted with ambiguous stimuli would then
result in a strong bias towards “synchronous” judgments, regardless of the leading stimulus.
However, this explanation would fail to clarify why some participants frequently responded
“synchronous” to these more extreme SOAs as well.
Another possibility is that fatigue, boredom, or otherwise decreased vigilance resulted in
impaired detection of asynchrony regardless of the leading stimulus. However, such effects
would be expected to increase in influence over the length of the 30-40 minute task, while in fact
the effect appears to stabilize after just the first block (Figure 5).
In summary, the cause of the observed bias towards synchrony responses, and its greater
magnitude compared to other studies with similar designs (Kambe et al., 2015; Yuan et al.,
2016), remains unclear. Regardless of the cause, it is an important reminder that biases may
affect the judgment of synchrony at multiple levels between sensation and response, and more
than one may be operating at once. Because synchrony judgments are simply used as a proxy for
multisensory binding here, it is important when interpreting these results to emphasize that these
biases can produce a dissociation between the perceptual experience of binding and the
behavioural response.
5.2 Response-Based Analyses
5.2.1 Intertrial Coherence and the Role of Phase Resetting in Multisensory Integration
ITC enhancement was detected broadly across the scalp after stimulation, suggesting that phase
resetting may have occurred (Figure 6). Despite these enhancements, no significant differences
between “synchronous” and “asynchronous” responses were observed.
Several factors should be considered before drawing conclusions from this result. Most
importantly, the behavioural response bias discussed previously negatively impacted the
potential for this analysis to detect an effect for two reasons. First, it caused the exclusion of
several participants who lacked the necessary number of trials in both conditions. Second, it
required that trial counts be artificially balanced between conditions within the remaining
participants, as ITC values are sensitive to low trial count (Aydore et al., 2013; van Diepen &
Mazaheri, 2018). The decision to accomplish this by sampling a smaller number of trials from
the more populated condition amounted to a trade-off, where the signal-to-noise ratio of the more
populated condition was sacrificed to prevent spurious differences arising from unbalanced trial
counts. On the other hand, simulations have illustrated that ITC differences can arise from sources other than true phase resetting, such as ERP or power differences (van Diepen &
Mazaheri, 2018). Therefore, the ERP difference identified by PLS between “synchronous” and
“asynchronous” conditions could have been the source of any ITC differences, had one been
found.
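The sensitivity of ITC to trial count can be demonstrated with a brief simulation (illustrative Python, not the analysis code used in this study; the trial counts below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def itc(phases):
    """Intertrial coherence: length of the mean unit phase vector across
    trials (1 = perfect phase locking, 0 = uniformly random phase)."""
    return np.abs(np.mean(np.exp(1j * phases)))

# Even fully random phases yield a positive expected ITC, and this bias
# grows as the trial count shrinks -- hence the need to equate trial
# counts (here, by subsampling the more populated condition) before
# comparing conditions.
few = [itc(rng.uniform(0, 2 * np.pi, 30)) for _ in range(2000)]
many = [itc(rng.uniform(0, 2 * np.pi, 120)) for _ in range(2000)]
assert np.mean(few) > np.mean(many)
```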
The lack of evidence for ITC differences here is in contrast to the conclusions of Kambe and
colleagues (2015), who associated enhanced post-stimulus beta ITC with perception of
synchrony. In addition to minor variations in calibration, stimuli, and timing, methodological
differences in data analysis should be considered when comparing these divergent results.
Foremost, Kambe and colleagues employed a region of interest approach, opting to average
groups of six electrodes together to produce an “auditory” and a “visual” region of interest
(ROI), thereby dramatically reducing the dimensionality of their data. In contrast, the full dataset
was leveraged in the current study in order to identify distributed patterns of activity with
multivariate analysis, and allow their neural sources to be estimated with inverse modelling
methods. Additionally, Kambe and colleagues do not report a direct comparison between
“synchronous” and “asynchronous” conditions, even though they conclude a difference between
the two. Rather, for each condition and ROI they compared the time-averaged ITC between 0-100 ms with that of the -100 to 0 ms interval, and viewed the resulting differences
between ROIs as sufficient evidence for differences in phase resetting. This choice of baseline is
also problematic, as the interval from -100-0 ms is likely contaminated by stimulus-related
activity, especially in the lower frequencies (M. X. Cohen, 2014). Additionally, the raw ITC
values were not reported, but rather baseline-normalized values, meaning that the size of the
effect they identify cannot be assessed (van Diepen & Mazaheri, 2018). Lastly, the authors report
that there was no ERP difference between the conditions, but do not elaborate on how this
difference was tested. As a result, it is difficult to interpret whether the observed ITC differences
may be the result of sources other than genuine phase resetting (van Diepen & Mazaheri, 2018).
In light of these considerations, and the results of the current study, there is only weak evidence
that phase resetting plays a mechanistic role in synchrony perception for temporally offset
stimuli. To the extent that synchrony perception is an index of perceptual binding, there is currently little to suggest that phase resetting mediates integration of this kind.
However, it should be noted that Kambe et al. (2015) and the current study only employed short
stimuli, presented intermittently, rather than more continuous or naturalistic stimuli. This could
be important, given converging evidence suggesting that phase resetting may only be critical in
one of two potential sensory “processing modes”, which are characterized by different rhythmic
activity, and are thought to be flexibly determined by stimulus properties (van Atteveldt et al.,
2014). More specifically, a “rhythmic” processing mode manifests when stimuli have a
predictable, rhythmic structure. Phase resetting has been implicated as a key mechanism in this
mode, as it allows incoming stimuli to reset the structure of ongoing slow oscillations in order to
better align periods of excitability to the structure of the ongoing stimulus. Audiovisual speech is
an example of such a predictable stimulus, as visual information in the form of articulatory
gestures often precedes auditory information, and therefore offers information about what will be
heard next. Indeed, phase resetting in auditory cortex in response to visual speech has recently
been demonstrated invasively in humans (Mégevand et al., 2019). In contrast, the other,
“continuous”, mode occurs when stimuli are unpredictable, as in the simultaneity judgment task
employed here. In this mode, low frequency oscillations are thought to be suppressed in favour
of gamma range oscillations, as the temporal structure of slow oscillations cannot help predict
future input in this case. As such, the current task may induce a “continuous” mode of operation,
in which incoming stimuli do not produce crossmodal phase reset. This would suggest that
crossmodal phase resetting may still mediate integration in some situations, but perhaps not in
the context of this task.
5.2.2 Response-Based ERP Differences and Source Modelling
PLS analysis of the A50 condition identified a spatiotemporal pattern distinguishing trials that
were perceived as synchronous from those that were perceived as asynchronous. The most
notable feature of this spatiotemporal pattern was a cluster of salient electrodes over central scalp
which demonstrated higher amplitudes for “asynchronous” compared to “synchronous” trials.
This pattern was observed regardless of whether the PLS was applied to data time-locked to S1
or S2. Interpretation of the former is difficult, given that S2 arrives within the analysis interval at
a different time for each participant. The S2 time-locked results are not affected by this
limitation, however, and can therefore be interpreted more straightforwardly as activity set
primarily in motion by the arrival of S2. Given that S2 is in this case a visual stimulus, it is
striking that the differences are observed over central scalp, with a topography resembling that of
an auditory response. Moreover, source modelling of the difference between “asynchronous” and
“synchronous” averages at its peak strongly identified the left posterior temporal lobe,
suggesting a source in auditory cortex. One interpretation of this finding would be that the
incoming visual stimulus (S2) has the potential to elicit activity in auditory cortex, and the
degree of this activity plays a role in determining whether the two stimuli are segregated. More
specifically, this activity could represent a reconstitution of the neural representation of the
preceding auditory stimulus, thereby promoting the perception of two distinct (auditory and
visual) stimuli, rather than one (audiovisual) stimulus. Furthermore, the presence or absence of
this activity could depend on the oscillatory state (e.g. phase and amplitude of low-frequency
modulatory oscillations) at the arrival of the second stimulus. Such a mechanism is purely
speculative however, and the necessary interaction between S2 and S1-associated cortex
currently lacks support from invasive recordings.
An alternative explanation is that the inferred activity in posterior temporal cortex did not
originate from auditory regions, but instead from nearby multisensory cortex previously
implicated in synchrony perception. More specifically, the nearby superior temporal sulcus
(STS) is the cortical region most commonly implicated in the perception of intersensory
synchrony/asynchrony in the fMRI literature on this topic (Balk et al., 2010; Calvert et al., 2001;
Macaluso, Spence, George, Driver, & Dolan, 2004; Marchant, Ruff, & Driver, 2012; Noesselt et
al., 2007), as well as the superior temporal gyrus (Dhamala et al., 2007; Marchant et al., 2012;
Noesselt et al., 2007; Stevenson et al., 2010; Stevenson, VanDerKlok, Pisoni, & James, 2011).
Given the anatomical differences between participants, as well as the use of unconstrained
sources and the general spatial limitations of minimum norm imaging, activity in these
multisensory regions could have been attributed to a broader region of the posterior temporal
lobe. If this is the case, these results would suggest that enhanced activity in multisensory
regions of the left temporal lobe at a latency of roughly 150-300 ms after S2 plays a role in the
segregation of audiovisual stimuli into two distinct percepts. However, this hypothesis cannot be
confirmed at the current spatial resolution.
In addition to posterior temporal cortex, source modelling identified two more multisensory
regions previously implicated in the perception of asynchrony, namely the inferior parietal lobule
(Bushara et al., 2001; Dhamala et al., 2007) and the inferior frontal gyrus (Dhamala et al., 2007).
Together, the results of the source modelling analysis accord well with the findings of Dhamala
and colleagues (2007), who concluded that the perception of asynchrony (with the auditory
stimulus preceding the visual) is mediated by a network of unisensory and multisensory areas
comprising the inferior frontal gyrus, inferior parietal lobule, and primary auditory and visual
cortex. The fact that the same regions were identified here, in association with the same percept
(auditory-leading asynchrony), corroborates the role of this network in the perception of
intersensory asynchrony, and highlights specifically the role that interactions between auditory
cortex, prefrontal and inferior parietal regions may play in its determination.
However, before accepting the interpretation of these source modelling results, it is necessary to
acknowledge some important caveats, in addition to the inherent spatial limitations of inverse
methods discussed previously. Most importantly, the reliability of each region’s contribution to
the spatiotemporal pattern identified by PLS cannot be assessed directly. This is because PLS
was used to identify a time window during which reliable differences were present (based on
bootstrap resampling) at certain sensors, but the difference wave from which the source model
was computed contained all activity during that window, not just the activity that contributed
reliably to the identified latent variable. A more informative approach would have been to
examine the results of PLS applied directly to the source amplitude time series, but this analysis
did not yield any significant LVs (LV1 p = 0.11, LV2 p = 0.09). However, it is plausible that the
coarseness of the Desikan-Killiany parcellation obscured the effect that was detectable at the
sensor level. In sum, source models derived from the sensor-level difference wave are a useful
means of exploring the potential sources of the observed difference at the scalp, but are by no
means conclusive.
Additionally, the difference at the sensor level could potentially be an artifact caused by the
unbalanced number of trials from which the condition averages were produced (mean
“synchronous” trials = 78, mean “asynchronous” trials = 43), but this was deemed unlikely for
the following reasons. First, unlike ITC, low trial counts do not cause a directional bias in the
ERP signal, but instead reduce the signal-to-noise ratio (M. X. Cohen, 2014). To combat this, all
participants with less than 30 trials in either of the response conditions were excluded from
relevant analyses. This number is often proposed as a pragmatic threshold, and there is evidence
that benefits to the signal-to-noise ratio diminish steeply beyond 30 trials (J. Cohen & Polich,
1997; M. X. Cohen, 2014; Polich, 1986; but see also Thomas et al., 2004). Finally, the current
analysis did not involve comparing peak amplitudes between univariate ERPs, a practice which is particularly problematic when trial counts are unbalanced (Luck, 2014), but rather assessed a sustained difference at multiple time points and locations, which would not be systematically biased on the
whole by random fluctuations in noise.
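The diminishing returns of additional trials can be shown with a short simulation (an illustrative sketch, assuming independent Gaussian noise on each trial; the trial counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def residual_noise(n_trials, reps=500, n_samples=100):
    """Mean std of the noise remaining in an ERP averaged over n trials,
    assuming independent Gaussian noise on each trial."""
    out = []
    for _ in range(reps):
        trials = rng.standard_normal((n_trials, n_samples))  # pure noise
        out.append(trials.mean(axis=0).std())
    return float(np.mean(out))

# Residual noise falls as 1 / sqrt(n_trials): quadrupling the trial
# count only halves the remaining noise, so gains diminish steeply
# beyond a few dozen trials.
assert residual_noise(120) < residual_noise(30)
```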
The last question to be addressed is why a similar difference was not observed in the visual-leading (V50) condition (LV1 p = 0.37, LV2 p = 0.35). One explanation is that the V50 analysis was simply under-powered, as only 9 participants had enough trials in each condition to be
included in this analysis. However, there is evidence to suggest that auditory-leading and visual-
leading binding may proceed by different mechanisms, perhaps producing responses less readily
detectable at the scalp. A fundamental difference between auditory- and visual-leading binding is
suggested by an EEG study employing topographical representational similarity analysis (RSA),
which demonstrated that auditory-leading and visual-leading stimuli produce different patterns of
activity at the scalp during a simultaneity judgment task (after subtracting evoked responses),
suggesting that they recruit different neural systems (Cecere, Gross, Willis, & Thut, 2017).
Furthermore, behavioural findings that the right side of the temporal binding window is wider and
considerably more responsive to training than the left also support such a difference (Cecere,
Gross, & Thut, 2016; Powers et al., 2009). As a result, it remains unclear whether the neural
correlates identified here capture a general mechanism of intersensory processing, or one unique
to the auditory-leading case.
5.3 Intertrial Coherence and Individual Differences in TBW Width
Exploratory analysis with Behaviour PLS identified a correlation between individual mean TBW
width and a distributed pattern of ITC enhancements in response to V10 stimuli, suggesting that
individual differences in phase resetting elicited by multisensory stimuli could play a role in
individual differences in temporal binding. Only the A10 and V10 conditions could be used, as
these were the only conditions where SOA did not vary with TBW width. Therefore, this
analysis represents an exploration of what individual differences in the neural responses to
synchronous stimuli might reveal about the temporal binding window. Unfortunately, no
genuinely synchronous stimulus was employed in this design, but the A10 and V10 conditions
were assumed to be sufficiently close to genuine synchrony for exploratory purposes.
A correlation between ITC and TBW width was identified only in the V10 condition, perhaps
relating to the prior finding that stimuli with a slight visual lead are most likely to be perceived
as genuinely synchronous (Dixon & Spitz, 1980; Kayser et al., 2008; Stone et al., 2001; van Eijk
et al., 2008; Zampini et al., 2003). The identified ITC pattern was distributed across time and
frequency bins, but a cluster of reliable, high-salience values was concentrated in the low
frequencies (3.5-8 Hz), between the onset of the stimulus and roughly 150 ms afterward. A clear
cluster was identifiable during this interval over frontal-central scalp, with other possible clusters
over right temporal-occipital and left temporal scalp (see Figure 14 for topography). Notably,
Behaviour PLS did not identify any similar correlations between mean TBW width and power or
ERP responses, suggesting that genuine phase resetting may underlie the observed correlation.
This would suggest that individuals with wider TBWs undergo greater low frequency phase
resetting in response to V10 stimuli. While current understanding does not suggest a mechanistic
interpretation of this result, it is notable that the phase resetting here is observed in the delta and
theta ranges, which were previously implicated in cross-sensory phase resetting by invasive
recording (Lakatos et al., 2007; Lakatos, O’Connell, et al., 2009; Mercier et al., 2013). However,
it is not possible to tell with EEG precisely where the observed oscillations originate, and thus
whether they are a result of crossmodal phase resetting. Additionally, it should be noted that
Behavioural PLS analysis applied to ITC in the source space did not yield any significant LVs (p
= 0.32), perhaps again owing to the coarseness of the parcellation.
Given these uncertainties, no strong conclusions can be drawn from this analysis. However, even
a potential link between the individual variability of the temporal binding window (one of its
central features) and candidate oscillatory mechanisms makes this finding worthy of future
follow-up.
5.4 Limitations
The behavioural bias towards “synchrony” responses caused two problems that limit the
interpretation of these findings. First, this bias caused roughly 50% of the participants to be
excluded from the main analyses, resulting in small sample sizes (13 in the auditory-leading, and
9 in the visual-leading condition), and therefore decreasing the likelihood of detecting relevant
effects. Second, this tendency for most participants to perceive ambiguous stimuli as
“synchronous” on the majority of trials raises the possibility that the effect found here is akin to
an oddball effect. In other words, the observed pattern of amplitude differences could be
associated more with the novelty, or unexpectedness of the “asynchronous” percept, rather than
the formation of the percept itself.
In addition to these primary concerns, a few practical limitations bear discussion. First,
measurement by oscilloscope showed that the auditory stimulus varied in its timing by ± 4 ms
due to limitations of the presentation computer. Because stimulus timing was registered to the
EEG signal by parallel port trigger, rather than measuring the stimulus directly, the auditory
triggers were slightly jittered from the actual stimulus presentation. This jitter, while unlikely to
affect the ERP results, can cause higher frequencies to be misrepresented in time-frequency
analysis, especially with phase-based measures such as ITC (M. X. Cohen, 2014).
Furthermore, it should be noted that time-frequency decomposition by Morlet wavelets involves
a trade-off between frequency resolution and temporal resolution, which is contingent on the
choice of wavelet parameters. Sub-optimal choice of wavelet parameters may therefore have
obscured differences characterized by rapid onsets, or those limited to a narrow frequency range.
Similarly, unsuitable parameters may account for the incongruent finding that the observed
amplitude difference at the scalp was not accompanied by a related increase in power.
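This trade-off can be made concrete with the standard Gaussian-width approximations for a Morlet wavelet (a sketch in the spirit of Cohen, 2014; the 7 Hz example frequency and cycle counts are arbitrary):

```python
import math

def morlet_resolution(freq_hz, n_cycles):
    """Approximate temporal and spectral resolution (Gaussian standard
    deviations) of a Morlet wavelet with the given number of cycles."""
    sigma_t = n_cycles / (2 * math.pi * freq_hz)  # seconds
    sigma_f = freq_hz / n_cycles                  # Hz
    return sigma_t, sigma_f

# At 7 Hz, increasing the cycle count sharpens frequency resolution
# (smaller sigma_f) at the cost of temporal precision (larger sigma_t).
t3, f3 = morlet_resolution(7.0, 3)
t7, f7 = morlet_resolution(7.0, 7)
assert t7 > t3 and f7 < f3
```

A fixed choice of cycles thus commits the analysis to one point on this trade-off at every frequency, which is why an unsuitable setting could obscure brief or narrowband effects.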
Lastly, a challenge to the conceptual foundation of the study, namely the use of ambiguous
stimuli, should be noted. The task was designed under the assumption that the perception of
synchrony versus asynchrony is a binary state. In other words, stimuli are either definitively
perceived as one event (perceptually bound) or two events (perceptually segregated). This
assumption may reflect our intuitive understanding of synchrony and facilitate task design, but
may also mischaracterize the experience. There is a possibility that there is a third, ambiguous
state, in which the stimuli are neither fully fused nor fully segregated. Dhamala and colleagues
(2007) explored this possibility by taking a dynamical systems approach, modelling multisensory
integration as the interaction of two weakly coupled oscillators. This model demonstrated that,
when periodically stimulated, this system of oscillators will adopt one of three regimes: an in-phase state (corresponding to integration), an anti-phase state (corresponding to segregation), and
a “drift” state, in which no stable percept is resolved. The authors emphasize that this is a
departure from previous conceptualizations of integration and segregation, where segregation is
typically assumed to be a failure of integration, rather than a stable state of its own. Notably, they
found behavioural evidence for the existence of these three perceptual states, which were
predicted by SOA. The existence of such a third perceptual state would be problematic for this
study and others that employ ambiguous stimuli, as using such stimuli could in fact induce a
qualitatively distinct, ambiguous state representing a failure of both integration and segregation,
thereby furthering the understanding of neither.
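The locked and drifting regimes can be illustrated with a generic pair of weakly coupled phase oscillators (a minimal sketch, not Dhamala and colleagues' specific model; the parameter values are arbitrary):

```python
import math

def phase_difference(detuning, coupling, steps=200000, dt=0.0001, phi0=0.5):
    """Euler-integrate the phase difference of two weakly coupled phase
    oscillators: d(phi)/dt = detuning - 2 * coupling * sin(phi)."""
    phi = phi0
    for _ in range(steps):
        phi += dt * (detuning - 2 * coupling * math.sin(phi))
    return phi

# When coupling dominates the detuning, the phase difference locks near
# sin(phi*) = detuning / (2 * coupling); when detuning dominates, it
# never settles and simply drifts -- no stable relative phase, loosely
# analogous to the "no stable percept" regime described above.
locked = phase_difference(detuning=0.5, coupling=2.0)
assert abs(math.sin(locked) - 0.5 / 4.0) < 1e-3
```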
5.5 Conclusions and Future Directions
It cannot be concluded from the present results that crossmodal phase resetting itself constitutes a
mechanism of perceptual binding. Nevertheless, the possibility remains that crossmodal phase
resetting, while not itself sufficient for binding, plays a facilitatory role in the binding process by
creating a common temporal structure between distributed regions. Perceptual binding might
then arise from the inter-regional synchronization that this shared temporal structure promotes
(Fries, 2015; Mercier et al., 2015; Senkowski et al., 2008). Therefore, future work investigating
perceptual binding may benefit from emphasizing oscillatory coherence between regions, rather
than primary phase resetting, in order to better understand the binding process and its
determinants. More specifically, future work could investigate correspondence between
perceptual binding and inter-regional coherence within the network of unisensory and
multisensory areas identified here and by Dhamala and colleagues (2007). Given the novel
finding that activity differs with perception between 150-300 ms after the second stimulus,
coherence during this window specifically may be relevant to formation of the percept.
Additionally, the correlation identified here between temporal binding window width and low-
frequency ITC induced by V10 stimuli provides an intriguing, albeit indirect, link between low-
frequency phase resetting and perception. By presenting the same stimuli to all participants
across a range of SOAs, future work could better establish whether there is a relationship
between early oscillatory responses and TBW width, and whether these individual differences
arise from crossmodal phase resetting or a different mechanism. With such a correlate of binding
window width identified, the next step would be to investigate whether differences in this
response result from local properties of the implicated cortical regions themselves, the connectivity
between them, or broader features of individual brain network organization.
Computational modelling with The Virtual Brain (Sanz Leon et al., 2013; Schirner, McIntosh, Jirsa,
Deco, & Ritter, 2018), a neuroinformatics platform allowing the simulation of large-scale brain
dynamics generated from personalized brain models, provides a potential avenue to explore both of
the questions above by probing the network properties associated with various metrics of oscillatory
coherence, and individual variation in these dynamics. Future work could investigate whether the
magnitude or duration of various coherence measures varies with structural connectivity between
sensory regions, or with one of several other biologically inspired network parameters that could
be varied systematically. Furthermore, characterizing the breadth of differences
in these dynamics in healthy adults would provide a stepping stone towards understanding the
development and degeneration of this process across the lifespan, and provide clues to the etiology of
various conditions in which integration is abnormal, such as autism and schizophrenia.
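The kind of parameter sweep proposed here can be illustrated with a toy model far simpler than The Virtual Brain: two Kuramoto phase oscillators whose coupling strength stands in for structural connectivity between regions. This sketch is purely illustrative (it is not TVB code; the function name and parameter values are hypothetical):

```python
import numpy as np

def pairwise_locking(k, w1=1.0, w2=1.3, dt=0.01, steps=20000):
    """Phase locking between two Kuramoto oscillators coupled with strength k.

    Euler-integrates dtheta_i/dt = w_i + k*sin(theta_j - theta_i) for two
    oscillators with natural frequencies w1, w2 (rad/s), then returns the
    phase-locking value of the pair over the second half of the run.
    Coupling strength k is a stand-in for structural connectivity.
    """
    th1, th2 = 0.0, 0.0
    diffs = np.empty(steps)
    for i in range(steps):
        d1 = w1 + k * np.sin(th2 - th1)
        d2 = w2 + k * np.sin(th1 - th2)
        th1 += dt * d1
        th2 += dt * d2
        diffs[i] = th1 - th2
    tail = diffs[steps // 2:]  # discard the transient
    return float(np.abs(np.mean(np.exp(1j * tail))))
```

Sweeping k from zero upward, locking jumps toward 1 once k exceeds half the frequency detuning (|w1 - w2| / 2 = 0.15 here), mirroring how a platform such as TVB could relate a connectivity parameter to an emergent coherence measure, but at full anatomical and dynamical realism.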
References
Alvarado, J. C., Rowland, B. A., Stanford, T. R., & Stein, B. E. (2008). A neural network model
of multisensory integration also accounts for unisensory integration in superior colliculus.
Brain Research, 1242, 13–23. https://doi.org/10.1016/j.brainres.2008.03.074
Aydore, S., Pantazis, D., & Leahy, R. M. (2013). A note on the phase locking value and its
properties. NeuroImage, 74, 231–244. https://doi.org/10.1016/j.neuroimage.2013.02.008
Baillet, S., Mosher, J. C., & Leahy, R. M. (2001). Electromagnetic Brain Mapping. IEEE Signal
Processing Magazine, 18(6), 14–30.
Balk, M. H., Ojanen, V., Pekkola, J., Autti, T., Sams, M., & Jääskeläinen, I. P. (2010).
Synchrony of audio-visual speech stimuli modulates left superior temporal sulcus.
NeuroReport, 21(12), 822–826. https://doi.org/10.1097/WNR.0b013e32833d138f
Barutchu, A., Crewther, S. G., Fifer, J., Shivdasani, M. N., Innes-Brown, H., Toohey, S., …
Paolini, A. G. (2011). The Relationship Between Multisensory Integration and IQ in
Children. Developmental Psychology, 47(3), 877–885. https://doi.org/10.1037/a0021903
Beauchamp, M. S. (2005). Statistical Criteria in fMRI Studies of Multisensory Integration.
Neuroinformatics, 3(2), 93–113. https://doi.org/10.1385/NI
Belmonte, M. K., Allen, G., Beckel-Mitchener, A., Boulanger, L. M., Carper, R. A., & Webb, S.
J. (2004). Autism and abnormal development of brain connectivity. Journal of
Neuroscience, 24(42), 9228–9231. https://doi.org/10.1523/JNEUROSCI.3340-04.2004
Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and
Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B,
57(1), 289–300.
Bertrand, O., & Tallon-Baudry, C. (2000). Oscillatory gamma activity in humans: a possible role
for object representation. International Journal of Psychophysiology, 38(3), 211–223.
Brovelli, A., Ding, M., Ledberg, A., Chen, Y., Nakamura, R., & Bressler, S. L. (2004). Beta
oscillations in a large-scale sensorimotor cortical network: Directional influences revealed
by Granger causality. PNAS, 101(26), 9849–9854.
Bushara, K. O., Grafman, J., & Hallett, M. (2001). Neural correlates of auditory-visual stimulus
onset asynchrony detection. The Journal of Neuroscience, 21(1), 300–304.
Buzsáki, G., & Chrobak, J. J. (1995). Temporal structure in spatially organized neuronal
ensembles: a role for interneuronal networks. Current Opinion in Neurobiology, 5(4), 504–
510. https://doi.org/10.1016/0959-4388(95)80012-3
Calvert, G. A., Hansen, P. C., Iversen, S. D., & Brammer, M. J. (2001). Detection of audio-visual
integration sites in humans by application of electrophysiological criteria to the BOLD
effect. NeuroImage, 14(2), 427–438. https://doi.org/10.1006/nimg.2001.0812
Calvert, G. A., Spence, C., & Stein, B. E. (2004). The Handbook of Multisensory Processes.
Cambridge, MA: The MIT Press.
Cecere, R., Gross, J., & Thut, G. (2016). Behavioural evidence for separate mechanisms of
audiovisual temporal binding as a function of leading sensory modality. European Journal
of Neuroscience, 43, 1561–1568. https://doi.org/10.1111/ejn.13242
Cecere, R., Gross, J., Willis, A., & Thut, G. (2017). Being First Matters: Topographical
Representational Similarity Analysis of ERP Signals Reveals Separate Networks for
Audiovisual Temporal Binding Depending on the Leading Sense. The Journal of
Neuroscience, 37(21), 5274–5287. https://doi.org/10.1523/JNEUROSCI.2926-16.2017
Cohen, J., & Polich, J. (1997). On the number of trials needed for P300. International Journal of
Psychophysiology, 25, 249–255.
Cohen, M. X. (2014). Analyzing Neural Time Series Data. London, England: MIT Press.
Conrey, B., & Pisoni, D. B. (2006). Auditory-visual speech perception and synchrony detection
for speech and nonspeech signals. The Journal of the Acoustical Society of America, 119(6),
4065–4073. https://doi.org/10.1121/1.2195091
de Boer-Schellekens, L., Eussen, M., & Vroomen, J. (2013). Diminished sensitivity of
audiovisual temporal order in autism spectrum disorder. Frontiers in Integrative
Neuroscience, 7(February), 1–8. https://doi.org/10.3389/fnint.2013.00008
Desikan, R. S., Ségonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., … Killiany,
R. J. (2006). An automated labeling system for subdividing the human cerebral cortex on
MRI scans into gyral based regions of interest. NeuroImage, 31(3), 968–980.
https://doi.org/10.1016/j.neuroimage.2006.01.021
Dhamala, M., Assisi, C. G., Jirsa, V. K., Steinberg, F. L., & Scott Kelso, J. A. (2007).
Multisensory integration for timing engages different brain networks. NeuroImage, 34(2),
764–773. https://doi.org/10.1016/j.neuroimage.2006.07.044
Diederich, A., & Colonius, H. (2004). Bimodal and trimodal multisensory enhancement: Effects
of stimulus onset and intensity on reaction time. Perception and Psychophysics, 66(8),
1388–1404.
Dixon, N. F., & Spitz, L. (1980). The detection of auditory visual desynchrony. Perception, 9(6),
719–721. https://doi.org/10.1068/p090719
Driver, J., & Noesselt, T. (2008). Multisensory Interplay Reveals Crossmodal Influences on
‘Sensory-Specific’ Brain Regions, Neural Responses, and Judgments. Neuron, 57(1), 11–23.
https://doi.org/10.1016/j.neuron.2007.12.013
Falchier, A., Clavagnier, S., Barone, P., & Kennedy, H. (2002). Anatomical Evidence of
Multimodal Integration in Primate Striate Cortex. The Journal of Neuroscience, 22(13),
5749–5759.
Fonov, V. S., Evans, A. C., McKinstry, R. C., Almli, C. R., & Collins, D. L. (2009). Unbiased
nonlinear average age-appropriate brain templates from birth to adulthood. NeuroImage, 47,
S102. https://doi.org/10.1016/S1053-8119(09)70884-5
Foucher, J. R., Lacambre, M., Pham, B. T., Giersch, A., & Elliott, M. A. (2007). Low time
resolution in schizophrenia. Lengthened windows of simultaneity for visual, auditory and
bimodal stimuli. Schizophrenia Research, 97(1–3), 118–127.
https://doi.org/10.1016/j.schres.2007.08.013
Fries, P. (2015). Rhythms for Cognition: Communication through Coherence. Neuron, 88(1),
220–235. https://doi.org/10.1016/j.neuron.2015.09.034
Fujisaki, W., Shimojo, S., Kashino, M., & Nishida, S. (2004). Recalibration of audiovisual
simultaneity. Nature Neuroscience, 7(7), 773–778. https://doi.org/10.1038/nn1268
Ghazanfar, A. A., & Schroeder, C. E. (2006). Is neocortex essentially multisensory? Trends in
Cognitive Sciences, 10(6), 278–285. https://doi.org/10.1016/j.tics.2006.04.008
Giard, M. H., & Peronnet, F. (1999). Auditory-Visual Integration during Multimodal Object
Recognition in Humans: A Behavioral and Electrophysiological Study. Journal of Cognitive
Neuroscience, 11(5), 473–490.
Gramfort, A., Papadopoulo, T., Olivi, E., & Clerc, M. (2010). OpenMEEG: opensource software
for quasistatic bioelectromagnetics. BioMedical Engineering OnLine, 9(45), 1–20.
Grant, K. W., Walden, B. E., & Seitz, P. F. (1998). Auditory-visual speech recognition by
hearing-impaired subjects: Consonant recognition, sentence recognition, and auditory-visual
integration. The Journal of the Acoustical Society of America, 103(5), 2677–2690.
https://doi.org/10.1121/1.422788
Hairston, W. D., Burdette, J. H., Flowers, D. L., Wood, F. B., & Wallace, M. T. (2005). Altered
temporal profile of visual-auditory multisensory interactions in dyslexia. Experimental
Brain Research, 166(3–4), 474–480. https://doi.org/10.1007/s00221-005-2387-6
Hershenson, M. (1962). Reaction time as a measure of intersensory facilitation. Journal of
Experimental Psychology, 63(3), 289–293.
Hillock-Dunn, A., & Wallace, M. T. (2012). Developmental changes in the multisensory
temporal binding window persist into adolescence. Developmental Science, 15(5), 688–696.
https://doi.org/10.1111/j.1467-7687.2012.01171.x
Hillock, A. R., Powers, A. R., & Wallace, M. T. (2011). Binding of sights and sounds: Age-
related changes in multisensory temporal processing. Neuropsychologia, 49(3), 461–467.
https://doi.org/10.1016/j.neuropsychologia.2010.11.041
Ikumi, N., Torralba, M., Ruzzoli, M., & Soto-Faraco, S. (2018). The phase of pre-stimulus brain
oscillations correlates with cross-modal synchrony perception. European Journal of
Neuroscience, (September), 1–15. https://doi.org/10.1111/ejn.14186
Kaganovich, N., & Schumaker, J. (2016). Electrophysiological correlates of individual
differences in perception of audiovisual temporal asynchrony. Neuropsychologia, 86, 119–
130. https://doi.org/10.1016/j.neuropsychologia.2016.04.015
Kambe, J., Kakimoto, Y., & Araki, O. (2015). Phase reset affects auditory-visual simultaneity
judgment. Cognitive Neurodynamics, 9(5), 487–493. https://doi.org/10.1007/s11571-015-
9342-4
Kayser, C., & Logothetis, N. K. (2007). Do early sensory cortices integrate cross-modal
information? Brain Structure and Function, 212(2), 121–132.
https://doi.org/10.1007/s00429-007-0154-0
Kayser, C., Petkov, C. I., & Logothetis, N. K. (2008). Visual Modulation of Neurons in Auditory
Cortex. Cerebral Cortex, 18(7), 1560–1574. https://doi.org/10.1093/cercor/bhm187
Keetels, M., & Vroomen, J. (2012). Perception of Synchrony between the Senses. In M. M.
Murray & M. T. Wallace (Eds.), The Neural Bases of Multisensory Processes. Boca Raton
(FL): CRC Press/Taylor & Francis. https://doi.org/10.1201/9781439812174-12
Keil, J., & Senkowski, D. (2018). Neural Oscillations Orchestrate Multisensory Processing. The
Neuroscientist, 1–18. https://doi.org/10.1177/1073858418755352
Kopell, N., Ermentrout, G. B., Whittington, M. A., & Traub, R. D. (2000). Gamma rhythms and
beta rhythms have different synchronization properties. PNAS, 97(4), 1867–1872.
Kwakye, L. D., Foss-Feig, J. H., Cascio, C. J., Stone, W. L., & Wallace, M. T. (2011). Altered
Auditory and Multisensory Temporal Processing in Autism Spectrum Disorders. Frontiers
in Integrative Neuroscience, 4(January), 1–11. https://doi.org/10.3389/fnint.2010.00129
Kybic, J., Clerc, M., Abboud, T., Faugeras, O., Keriven, R., & Papadopoulo, T. (2005). A
Common Formalism for the Integral Formulations of the Forward EEG Problem. IEEE
Transactions on Medical Imaging, 24(1), 12–28.
Lakatos, P., Chen, C. M., O’Connell, M. N., Mills, A., & Schroeder, C. E. (2007). Neuronal
Oscillations and Multisensory Interaction in Primary Auditory Cortex. Neuron, 53(2), 279–
292. https://doi.org/10.1016/j.neuron.2006.12.011
Lakatos, P., Karmos, G., Mehta, A. D., Ulbert, I., & Schroeder, C. E. (2008). Entrainment of
neuronal oscillations as a mechanism of attentional selection. Science, 320(5872), 110–113.
https://doi.org/10.1126/science.1154735
Lakatos, P., O’Connell, M. N., Barczak, A., Mills, A., Javitt, D. C., & Schroeder, C. E. (2009).
The Leading Sense: Supramodal Control of Neurophysiological Context by Attention.
Neuron, 64(3), 419–430. https://doi.org/10.1016/j.neuron.2009.10.014
Lennie, P. (1981). The physiological basis of variations in visual latency. Vision Research, 21(6),
815–824. https://doi.org/10.1016/0042-6989(81)90180-2
Lovelace, C. T., Stein, B. E., & Wallace, M. T. (2003). An irrelevant light enhances auditory
detection in humans: a psychophysical analysis of multisensory integration in stimulus
detection. Cognitive Brain Research, 17, 447–453.
Luck, S. J. (2014). An Introduction to the Event-Related Potential Technique (2nd ed.).
Cambridge, MA: MIT Press.
Macaluso, E., Spence, C., George, N., Driver, J., & Dolan, R. (2004). Spatial and temporal
factors during processing of audiovisual speech: a PET study. NeuroImage, 21(2), 725–732.
https://doi.org/10.1016/j.neuroimage.2003.09.049
Makeig, S., Debener, S., Onton, J., & Delorme, A. (2004). Mining event-related brain dynamics.
Trends in Cognitive Sciences, 8(5), 204–210. https://doi.org/10.1016/j.tics.2004.03.008
Marchant, J. L., Ruff, C. C., & Driver, J. (2012). Audiovisual synchrony enhances BOLD
responses in a brain network including multisensory STS while also enhancing target-
detection performance for both modalities. Human Brain Mapping, 33(5), 1212–1224.
https://doi.org/10.1002/hbm.21278
Martin, B., Giersch, A., Huron, C., & van Wassenhove, V. (2013). Temporal event structure and
timing in schizophrenia: Preserved binding in a longer “now.” Neuropsychologia, 51(2),
358–371. https://doi.org/10.1016/j.neuropsychologia.2012.07.002
Mathewson, K. E., Gratton, G., Fabiani, M., Beck, D. M., & Ro, T. (2009). To See or Not to See:
Prestimulus Phase Predicts Visual Awareness. Journal of Neuroscience, 29(9), 2725–2732.
https://doi.org/10.1523/JNEUROSCI.3963-08.2009
McIntosh, A. R., & Lobaugh, N. J. (2004). Partial least squares analysis of neuroimaging data:
Applications and advances. NeuroImage, 23, 250–263.
https://doi.org/10.1016/j.neuroimage.2004.07.020
Mégevand, P., Mercier, M. R., Groppe, D. M., Golumbic, E. Z., Mesgarani, N., Beauchamp, M.
S., … Mehta, A. D. (2019). Phase resetting in human auditory cortex to visual speech.
bioRxiv. https://doi.org/10.1101/405597
Mercier, M. R., Foxe, J. J., Fiebelkorn, I. C., Butler, J. S., Schwartz, T. H., & Molholm, S.
(2013). Auditory-driven phase reset in visual cortex: Human electrocorticography reveals
mechanisms of early multisensory integration. NeuroImage, 79, 19–29.
https://doi.org/10.1016/j.neuroimage.2013.04.060
Mercier, M. R., Molholm, S., Fiebelkorn, I. C., Butler, X. J. S., Schwartz, T. H., & Foxe, J. J.
(2015). Neuro-Oscillatory Phase Alignment Drives Speeded Multisensory Response Times:
An Electro-Corticographic Investigation. The Journal of Neuroscience, 35(22), 8546–8557.
https://doi.org/10.1523/JNEUROSCI.4527-14.2015
Meredith, M., Nemitz, J., & Stein, B. (1987). Determinants of multisensory integration in
superior colliculus neurons. I. Temporal factors. Journal of Neuroscience, 7(10), 3215–
3229. https://doi.org/10.1523/JNEUROSCI.07-10-03215.1987
Meredith, M., & Stein, B. (1983). Interactions among Converging Sensory Inputs in the Superior
Colliculus. Science, 221(4608), 389–391.
Meredith, M., & Stein, B. (1986). Visual, Auditory, and Somatosensory Convergence on Cells in
Superior Colliculus Results in Multisensory Integration. Journal of Neurophysiology, 56(3),
640–662.
Meredith, M., & Stein, B. (1996). Spatial determinants of multisensory integration in cat superior
colliculus neurons. Journal of Neurophysiology, 75(5), 1843–1857.
https://doi.org/10.1152/jn.1996.75.5.1843
Miller, L. M., & D’Esposito, M. (2005). Perceptual Fusion and Stimulus Coincidence in the
Cross-Modal Integration of Speech. The Journal of Neuroscience, 25(25), 5884–5893.
https://doi.org/10.1523/JNEUROSCI.0896-05.2005
Molholm, S., Ritter, W., Murray, M. M., Javitt, D. C., Schroeder, C. E., & Foxe, J. J. (2002).
Multisensory auditory–visual interactions during early sensory processing in humans: a
high-density electrical mapping study. Cognitive Brain Research, 14, 115–128.
Murray, M. M., Eardley, A. F., Edginton, T., Oyekan, R., Smyth, E., & Matusz, P. J. (2018).
Sensory dominance and multisensory integration as screening tools in aging. Scientific
Reports, 8, 1–11. https://doi.org/10.1038/s41598-018-27288-2
Nickerson, R. S. (1973). Intersensory facilitation of reaction time: Energy summation or
preparation enhancement? Psychological Review, 80(6), 489–509.
https://doi.org/10.1037/h0035437
Noel, J., De Niear, M., Van der Burg, E., & Wallace, M. T. (2016). Audiovisual Simultaneity
Judgment and Rapid Recalibration throughout the Lifespan. PLoS ONE, 11(8), e0161698.
https://doi.org/10.1371/journal.pone.0161698
Noesselt, T., Rieger, J. W., Schoenfeld, M. A., Kanowski, M., Hinrichs, H., Heinze, H.-J., &
Driver, J. (2007). Audiovisual Temporal Correspondence Modulates Human Multisensory
Superior Temporal Sulcus Plus Primary Sensory Cortices. Journal of Neuroscience, 27(42),
11431–11441.
Ohshiro, T., Angelaki, D. E., & Deangelis, G. C. (2011). A Normalization Model of
Multisensory Integration. Nature Neuroscience, 14(6), 775–782.
https://doi.org/10.1038/nn.2815
Oostenveld, R., Fries, P., Maris, E., & Schoffelen, J.-M. (2011). FieldTrip: Open Source
Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data.
Computational Intelligence and Neuroscience, 2011, 156869. https://doi.org/10.1155/2011/156869
Pantazis, D., Nichols, T. E., Baillet, S., & Leahy, R. M. (2005). A comparison of random field
theory and permutation methods for the statistical analysis of MEG data. NeuroImage, 25,
383–394. https://doi.org/10.1016/j.neuroimage.2004.09.040
Peirce, J., Gray, J. R., Simpson, S., Macaskill, M., Höchenberger, R., Sogo, H., … Lindeløv, J.
K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods,
51, 195–203.
Pfurtscheller, G. (1992). Event-related synchronization (ERS): an electrophysiological correlate
of cortical areas at rest. Electroencephalography and Clinical Neurophysiology, 83, 62–69.
Pfurtscheller, G., & Neuper, C. (1996). Post-movement beta synchronization. A correlate of an
idling motor area? Electroencephalography and Clinical Neurophysiology, 98, 281–293.
Polich, J. (1986). P300 Development from Auditory Stimuli. Psychophysiology, 23(5), 590–597.
Pöppel, E., Schill, K., & von Steinbüchel, N. (1990). Sensory integration within temporally
neutral systems states: A hypothesis. Naturwissenschaften, 77, 89–91.
Powers, A. R., Hevey, M. A., & Wallace, M. T. (2012). Neural Correlates of Multisensory
Perceptual Learning. Journal of Neuroscience, 32(18), 6263–6274.
https://doi.org/10.1523/jneurosci.6138-11.2012
Powers, A. R., Hillock, A. R., & Wallace, M. T. (2009). Perceptual Training Narrows the
Temporal Window of Multisensory Binding. Journal of Neuroscience, 29(39), 12265–
12274. https://doi.org/10.1523/JNEUROSCI.3501-09.2009
Rockland, K. S., & Ojima, H. (2003). Multisensory convergence in calcarine visual areas in
macaque monkey. International Journal of Psychophysiology, 50(1–2), 19–26.
https://doi.org/10.1016/S0167-8760(03)00121-1
Roseboom, W., & Arnold, D. H. (2011). Twice Upon a Time: Multiple Concurrent Temporal
Recalibrations of Audiovisual Speech. Psychological Science, 22(7), 872–877.
https://doi.org/10.1177/0956797611413293
Rowland, B. A., Stanford, T. R., & Stein, B. E. (2007). A model of the neural mechanisms
underlying multisensory integration in the superior colliculus. Perception, 36(10), 1431–
1443. https://doi.org/10.1068/p5842
Sanz Leon, P., Knock, S. A., Woodman, M. M., Domide, L., Mersmann, J., McIntosh, A. R., &
Jirsa, V. (2013). The Virtual Brain: a simulator of primate brain network dynamics.
Frontiers in Neuroinformatics, 7(June). https://doi.org/10.3389/fninf.2013.00010
Sayers, B., Beagley, H., & Henshall, W. (1974). The mechanism of auditory evoked EEG
responses. Nature, 247(5441), 481–483. https://doi.org/10.1038/247481a0
Schirner, M., McIntosh, A. R., Jirsa, V., Deco, G., & Ritter, P. (2018). Inferring multi-scale
neural mechanisms with brain network modelling. ELife, 7.
https://doi.org/10.7554/eLife.28927
Senkowski, D., Schneider, T. R., Foxe, J. J., & Engel, A. K. (2008). Crossmodal binding through
neural coherence: implications for multisensory processing. Trends in Neurosciences, 31(8),
401–409. https://doi.org/10.1016/j.tins.2008.05.002
Shams, L., & Seitz, A. R. (2008). Benefits of multisensory learning. Trends in Cognitive
Sciences, 12(11), 411–417. https://doi.org/10.1016/j.tics.2008.07.006
Stein, B. (1998). Neural mechanisms for synthesizing sensory information and producing
adaptive behaviors. Experimental Brain Research, 123, 124–135.
Stein, B., & Meredith, M. (1993). The Merging of the Senses. Cambridge, MA: MIT Press.
Stein, B., & Stanford, T. R. (2008). Multisensory integration: Current issues from the perspective
of the single neuron. Nature Reviews Neuroscience, 9(4), 255–266.
https://doi.org/10.1038/nrn2331
Stein, B., Stanford, T., & Rowland, B. (2009). The Neural Basis of Multisensory Integration in
the Midbrain: Its Organization and Maturation. Hearing Research, 258, 4–15.
https://doi.org/10.1016/j.heares.2009.03.012
Stevenson, R. A., Altieri, N. A., Kim, S., Pisoni, D. B., & James, T. W. (2010). Neural
processing of asynchronous audiovisual speech perception. NeuroImage, 49(4), 3308–3318.
https://doi.org/10.1016/j.neuroimage.2009.12.001
Stevenson, R. A., Baum, S. H., Krueger, J., Newhouse, P. A., & Wallace, M. T. (2018). Links
between temporal acuity and multisensory integration across the life span. Journal of
Experimental Psychology: Human Perception and Performance, 44(1), 106–116.
https://doi.org/10.1037/xhp0000424
Stevenson, R. A., VanDerKlok, R. M., Pisoni, D. B., & James, T. W. (2011). Discrete neural
substrates underlie complementary audiovisual speech integration processes. NeuroImage,
55(3), 1339–1345. https://doi.org/10.1016/j.neuroimage.2010.12.063
Stevenson, R. A., & Wallace, M. T. (2013). Multisensory temporal integration: Task and
stimulus dependencies. Experimental Brain Research, 227(2), 249–261.
https://doi.org/10.1007/s00221-013-3507-3
Stevenson, R. A., Zemtsov, R. K., & Wallace, M. T. (2012). Individual differences in the
multisensory temporal binding window predict susceptibility to audiovisual illusions.
Journal of Experimental Psychology: Human Perception and Performance, 38(6), 1517–
1529. https://doi.org/10.1037/a0027339
Stone, J. V., Hunkin, N. M., Porrill, J., Wood, R., Keeler, V., Beanland, M., … Porter, N. R.
(2001). When is now? Perception of simultaneity. Proceedings of the Royal Society B:
Biological Sciences, 268(1462), 31–38. https://doi.org/10.1098/rspb.2000.1326
Sumby, W. H., & Pollack, I. (1954). Visual Contribution to Speech Intelligibility in Noise. The
Journal of the Acoustical Society of America, 26(2), 212–215.
https://doi.org/10.1121/1.1907309
Tallon-Baudry, C., Bertrand, O., Delpuech, C., & Pernier, J. (1996). Stimulus Specificity of
Phase-Locked and Non-Phase-Locked 40 Hz Visual Responses in Human. The Journal of
Neuroscience, 16(13), 4240–4249.
Tallon-Baudry, C., Bertrand, O., & Fischer, C. (2001). Oscillatory Synchrony between Human
Extrastriate Areas during Visual Short-Term Memory Maintenance. The Journal of
Neuroscience, 21, 1–5.
Thomas, D. G., Grice, J. W., & Najm-Briscoe, R. G. (2004). The Influence of Unequal Numbers
of Trials on Comparisons of Average Event-Related Potentials. Developmental
Neuropsychology, 26(3), 753–774.
van Atteveldt, N., Formisano, E., & Blomert, L. (2007). The Effect of Temporal Asynchrony on
the Multisensory Integration of Letters and Speech Sounds. Cerebral Cortex, 17(April),
962–974. https://doi.org/10.1093/cercor/bhl007
van Atteveldt, N., Murray, M. M., Thut, G., & Schroeder, C. E. (2014). Multisensory integration:
Flexible use of general operations. Neuron, 81(6), 1240–1253.
https://doi.org/10.1016/j.neuron.2014.02.044
van Diepen, R. M., & Mazaheri, A. (2018). The Caveats of observing Inter-Trial Phase-
Coherence in Cognitive Neuroscience. Scientific Reports, 8(1), 1–9.
https://doi.org/10.1038/s41598-018-20423-z
van Dijk, H., Schoffelen, J.-M., Oostenveld, R., & Jensen, O. (2008). Prestimulus Oscillatory
Activity in the Alpha Band Predicts Visual Discrimination Ability. Journal of
Neuroscience, 28(8), 1816–1823. https://doi.org/10.1523/jneurosci.1853-07.2008
van Eijk, R. L. J., Kohlrausch, A., Juola, J. F., & van de Par, S. (2008). Audiovisual synchrony
and temporal order judgments: Effects of experimental method and stimulus type.
Perception and Psychophysics, 70(6), 955–968. https://doi.org/10.3758/PP.70.6.955
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2007). Temporal window of integration in
auditory-visual speech perception. Neuropsychologia, 45(3), 598–607.
https://doi.org/10.1016/j.neuropsychologia.2006.01.001
Voloh, B., & Womelsdorf, T. (2016). A Role of Phase-Resetting in Coordinating Large Scale
Neural Networks During Attention and Goal-Directed Behavior. Frontiers in Systems
Neuroscience, 10(March), 1–19. https://doi.org/10.3389/fnsys.2016.00018
Vroomen, J., & Keetels, M. (2010). Perception of intersensory synchrony: A tutorial review.
Attention, Perception & Psychophysics, 72(4), 871–884.
Wallace, M. T., & Stevenson, R. A. (2014). The construct of the multisensory temporal binding
window and its dysregulation in developmental disabilities. Neuropsychologia, 64, 105–
123. https://doi.org/10.1016/j.neuropsychologia.2014.08.005
Wu, P., Chen, Y., Yeh, S., Wu, M., & Tang, P. (2015). Multisensory integration ability correlates
with spatial working memory and functional mobility in cognitively normal middle-aged
and older adults. Physiotherapy, 101, e1663–e1664.
https://doi.org/10.1016/j.physio.2015.03.061
Yuan, X., Bi, C., Yin, H., Li, B., & Huang, X. (2014). The recalibration patterns of perceptual
synchrony and multisensory integration after exposure to asynchronous speech.
Neuroscience Letters, 569, 148–152. https://doi.org/10.1016/j.neulet.2014.03.057
Yuan, X., Li, H., Liu, P., Yuan, H., & Huang, X. (2016). Pre-stimulus beta and gamma
oscillatory power predicts perceived audiovisual simultaneity. International Journal of
Psychophysiology, 107, 29–36. https://doi.org/10.1016/j.ijpsycho.2016.06.017
Zampini, M., Shore, D. I., & Spence, C. (2003). Audiovisual temporal order judgments.
Experimental Brain Research, 152(2), 198–210. https://doi.org/10.1007/s00221-003-1536-z