Investigating the Role of Crossmodal Phase Resetting in Audiovisual Binding
by
Phillip R. Johnston
A thesis submitted in conformity with the requirements for the degree of Master of Arts
Department of Psychology University of Toronto
© Copyright by Phillip R. Johnston 2019
Investigating the Role of Crossmodal Phase Resetting in
Audiovisual Binding
Phillip R. Johnston
Master of Arts
Department of Psychology
University of Toronto
2019
Abstract
The mechanisms allowing the brain to fuse paired auditory and visual stimuli into a unified
percept, despite differences in their timing, remain largely unknown. Crossmodal phase resetting
of ongoing oscillations by the leading stimulus may facilitate integration of the following
stimulus by establishing a shared temporal structure within a network of primary and
multisensory areas. EEG was recorded during a simultaneity judgment task to assess whether
phase resetting (measured as intertrial coherence) differentiates ambiguous stimuli that are
perceived as fused and those perceived as segregated. No differences in intertrial coherence were
observed between fused and segregated trials; however, differences in ERP amplitude tracked
perception in auditory-leading trials. Source modelling of this difference identified a network
previously implicated in the perception of intersensory asynchrony, comprising both unisensory
and multisensory cortical regions. Additionally, a correlation was identified
between individual sensitivity to asynchrony and intertrial coherence elicited by nearly
synchronous stimuli.
Acknowledgments
I would like to express my sincere gratitude to my supervisor, Dr. Randy McIntosh, for his
invaluable mentorship, his generosity with his time and expertise, and above all for inspiring me
with his dedication and his vision. I am also grateful to my subsidiary advisor, Dr. Claude Alain,
whose expert guidance and keen insight have greatly aided this project from its inception. I
would also like to thank Ricky Chow and Alain Fournier for their technical assistance, and the
students of the ERP Lab at Baycrest for making me feel more than welcome during my long days
of data collection there. Finally, this work would not have been possible without the patience and
unconditional support of Kate Taylor and my family, for which I am grateful always.
Table of Contents
Acknowledgments.......................................................................................................................... iii
Table of Contents ........................................................................................................................... iv
List of Tables ................................................................................................................................ vii
List of Figures .............................................................................................................................. viii
Chapter 1 ..........................................................................................................................................1
1 Introduction .................................................................................................................................1
1.1 Perceptual Binding ...............................................................................................................1
1.2 Study Overview ...................................................................................................................2
Chapter 2 ..........................................................................................................................................4
2 Background .................................................................................................................................4
2.1 Temporal Determinants of Binding .....................................................................................4
2.1.1 The Problem of Intersensory Timing .......................................................................4
2.1.2 Quantifying the Binding Window ............................................................................5
2.1.3 Characteristics of the Temporal Binding Window ..................................................5
2.2 Potential Mechanisms ..........................................................................................................7
2.2.1 Classical Models and Their Limitations ..................................................................7
2.2.2 Ongoing Oscillations and Crossmodal Phase Resetting ..........................................8
2.2.3 Crossmodal Phase Resetting and the Temporal Window of Integration ...............10
Chapter 3 ........................................................................................................................................12
3 Methods .....................................................................................................................................12
3.1 Participants .........................................................................................................................12
3.2 Stimuli and Task ................................................................................................................12
3.2.1 Overview ................................................................................................................12
3.2.2 Simultaneity Judgment Task ..................................................................................13
3.2.3 Calibration Task .....................................................................................................14
3.2.4 Main Task ..............................................................................................................14
3.3 EEG Data Collection and Processing ................................................................................15
3.3.1 EEG Data Collection..............................................................................................15
3.3.2 EEG Pre-Processing ...............................................................................................15
3.3.3 Calculation of Time-Frequency Measures .............................................................16
3.3.4 Source Estimation ..................................................................................................17
3.4 Statistical Comparisons ......................................................................................................18
3.4.1 Behavioural Analyses ............................................................................................18
3.4.2 Response-Based Analyses .....................................................................................18
3.4.3 Individual Differences Analysis ............................................................................19
Chapter 4 ........................................................................................................................................21
4 Results .......................................................................................................................................21
4.1 Behavioural Results ...........................................................................................................21
4.1.1 Calibration Task Results ........................................................................................21
4.1.2 Main Task Results..................................................................................................21
4.2 EEG Results .......................................................................................................................25
4.2.1 Response-Based Comparisons ...............................................................................25
4.2.2 Source-Modelling of the Response-Based Differences .........................................25
4.2.3 Individual Differences Analysis ............................................................................26
Chapter 5 ........................................................................................................................................33
5 Discussion .................................................................................................................................33
5.1 Behavioural Bias towards “Synchronous” Responses .......................................................33
5.2 Response-Based Analyses .................................................................................................34
5.2.1 Intertrial Coherence and the Role of Phase Resetting in Multisensory
Integration ..............................................................................................................34
5.2.2 Response-Based ERP Differences and Source Modelling .....................................36
5.3 Intertrial Coherence and Individual Differences in TBW Width .......................................39
5.4 Limitations .........................................................................................................................40
5.5 Conclusions and Future Directions ....................................................................................42
References ......................................................................................................................................44
List of Tables
Table 1: SOAs presented during the main task. ................................................................... 15
Table 2: Descriptive statistics for the main task behavioural results. ................................ 23
List of Figures
Figure 1: Schematic of the audiovisual simultaneity judgment task. ................................. 13
Figure 2: Group temporal binding window. ............................................................................ 22
Figure 3: Right temporal binding window width vs. left temporal binding window width. ...... 22
Figure 4: Percent of ambiguous trials perceived as synchronous for each trial type
during the main task. ................................................................................................................. 23
Figure 5: Percent of ambiguous trials perceived as synchronous during each block of
the main task. .............................................................................................................................. 24
Figure 6: ITC enhancement in response to ambiguous stimuli at two representative
electrodes .................................................................................................................................... 27
Figure 7: Topography of PLS salience values for “synchronous” vs “asynchronous”
responses time-locked to S1 .................................................................................................... 28
Figure 8: Grand average ERPs for the A50 condition time-locked to S1 at three
representative central electrodes. ........................................................................................... 28
Figure 9: Topography of PLS salience values for “synchronous” vs “asynchronous”
responses time-locked to S2. ................................................................................................... 29
Figure 10: Grand average ERPs for the A50 condition time-locked to S2 at three
representative central electrodes. ........................................................................................... 29
Figure 11: Current density maps depicting difference between A50 “asynchronous” and
A50 “synchronous” trials time-locked to S1. .......................................................................... 30
Figure 12: Current density maps depicting difference between A50 “asynchronous” and
A50 “synchronous” trials time-locked to S2. .......................................................................... 31
Figure 13: Correlation between mean temporal binding window and Behaviour PLS
brain score for the V10 condition. ............................................................................................ 32
Figure 14: Topography of ITC values positively correlated with mean TBW in the V10
condition. ..................................................................................................................................... 32
Chapter 1
1 Introduction
1.1 Perceptual Binding
We perceive the world through multiple sensory systems, each of which makes a unique
contribution to the richness of our perceptual experience. Because each of these systems captures
different kinds of physical energy, together they offer complementary information about our
environments and the events that unfold around us. Crucially, the healthy brain is capable of
combining these signals into a unified perceptual experience in a process known as perceptual
binding, or perceptual fusion. In conversation, for instance, the words that we hear and the
accompanying movements that we see are experienced as a unified perceptual event, rather than
separate auditory and visual streams. Producing this bound percept flexibly across contexts
means that the brain must contend with variable delays between the modalities, caused by
differences in timing both outside and within the nervous system. At present, the neural
mechanisms that mediate perceptual binding, despite these delays, have yet to be established.
Furthermore, behavioural work has demonstrated that individuals differ widely in their “temporal
binding windows” (TBW) which index their sensitivity to intersensory delays (Conrey & Pisoni,
2006; Dixon & Spitz, 1980; Miller & D’Esposito, 2005; Powers, Hillock, & Wallace, 2009;
Stevenson, Altieri, Kim, Pisoni, & James, 2010; Stevenson, Zemtsov, & Wallace, 2012), raising
questions about how these mechanisms could vary between individuals.
In addition to the perceptual consequences, the behavioural importance of the ability to
synthesize information from across the senses is well established. Specifically, multisensory
stimuli can produce a number of behavioural enhancements, including improved detection and
reaction times compared to unisensory stimuli (Diederich & Colonius, 2004; Hershenson, 1962;
Lovelace, Stein, & Wallace, 2003; Nickerson, 1973). Furthermore, they have been shown to
facilitate higher-order processes such as learning (Shams & Seitz, 2008) and comprehension of
speech in noise (Grant, Walden, & Seitz, 1998; Sumby & Pollack, 1954). These findings, in
combination with growing evidence of ubiquitous multisensory interactions from anatomy and
physiology, motivate the developing view that the brain is fundamentally organized to integrate
information across putatively unisensory neural systems, with wide-reaching consequences for
perception, cognition, and action (Calvert, Spence, & Stein, 2004; Driver & Noesselt, 2008;
Ghazanfar & Schroeder, 2006).
In light of these considerations, it is clear that the brain’s ability to selectively synthesize
information from multiple sensory systems into a coherent percept is crucial to understanding
and responding adaptively to our environments. Indeed, disturbances in multisensory integration,
including reduced sensitivity to intersensory delays in perceptual binding, have been linked to
deficits in several clinical and neurodevelopmental conditions, including schizophrenia, autism,
and dyslexia (see Wallace & Stevenson, 2014 for a review), as well as mild cognitive
impairment (Murray et al., 2018) and reduced spatial working memory and mobility in older
adults (Wu, Chen, Yeh, Wu, & Tang, 2015). A better understanding of perceptual binding may
therefore shed light on basic principles of brain operation, as well as reveal differences relevant
to diagnosis and treatment in these conditions.
1.2 Study Overview
New insight into the mechanisms of binding may come from an emerging framework proposing
that multisensory integration is mediated by large-scale neural oscillations, which coordinate the
activity of distributed neural populations in order to effect integration (see Keil & Senkowski,
2018 for a review). Notably, invasive recordings in both animals and humans have demonstrated
that a stimulus in one modality (e.g. auditory) can reset the phase of ongoing oscillations in
primary cortex associated with another modality (e.g. vision; Kayser, Petkov, & Logothetis,
2008; Lakatos, Chen, O’Connell, Mills, & Schroeder, 2007; Lakatos, O’Connell, et al., 2009;
Mercier et al., 2013). While several authors have hypothesized that this oscillatory phase
resetting may promote integration within a multisensory network (Keil & Senkowski, 2018;
Lakatos, O’Connell, et al., 2009; Mercier et al., 2013; van Atteveldt, Murray, Thut, & Schroeder,
2014), only one study has so far demonstrated a link between phase resetting and perceptual
fusion (Kambe, Kakimoto, & Araki, 2015). However, these authors employed a restrictive region
of interest approach, thereby ignoring potentially relevant activity produced elsewhere in the
brain, and did not report inverse modelling of the relevant cortical sources. The latter would have
strengthened their claim that the phase resetting observed at the scalp truly originated from
primary sensory cortices, and was thus a result of genuine crossmodal phase resetting.
Given these concerns, this study aimed to clarify whether crossmodal phase resetting is indeed
associated with audiovisual binding through a more comprehensive characterization of the brain
activity produced during this process. To accomplish this, an audiovisual simultaneity judgment
task, similar to that used by Kambe and colleagues (2015), was employed, with the perception of
synchrony used as an index of perceptual binding. By calibrating the temporal delays between
the auditory and visual stimuli for each participant, ambiguous stimuli were produced. As such,
differences in phase resetting between those stimuli perceived as “synchronous” and those
perceived as “asynchronous” could be interpreted as differences between bound and unbound
percepts, respectively. Concomitant differences in power or the ERP, which represent a potential
confound (van Diepen & Mazaheri, 2018), were also assessed. Additionally, multivariate
analysis and source modelling were employed in order to better characterize the distributed
spatial and temporal patterns of activity differentiating “synchronous” and “asynchronous”
percepts. Lastly, the temporal binding window width of each participant was correlated with
their neural activity in response to audiovisual stimulation, with the aim of connecting individual
differences in temporal binding with corresponding differences in oscillatory phase resetting.
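The phase-resetting measure named above, intertrial coherence (ITC), has a compact definition: the length of the across-trials mean of unit-length phase vectors at each time point. The sketch below illustrates this with simulated data; the array shapes, random phases, and "reset window" are purely illustrative assumptions, not the study's actual pipeline.

```python
# Minimal sketch of intertrial coherence (ITC). Assumes a
# (trials x times) array of complex time-frequency coefficients at
# one electrode and frequency (e.g. from a wavelet transform); the
# simulated data below are purely illustrative.
import numpy as np

def itc(tf_coeffs):
    """ITC across trials: length of the mean unit phase vector.

    tf_coeffs: complex array, shape (n_trials, n_times).
    Returns values in [0, 1] per time point; 1 means identical
    phase on every trial (perfect resetting), 0 means uniformly
    random phase.
    """
    unit_phasors = tf_coeffs / np.abs(tf_coeffs)
    return np.abs(unit_phasors.mean(axis=0))

rng = np.random.default_rng(0)
n_trials, n_times = 200, 100

# Random phase everywhere (no resetting)...
phase = rng.uniform(0, 2 * np.pi, (n_trials, n_times))
# ...except a window where phase is reset to a near-common value,
# mimicking a stimulus-evoked phase reset.
phase[:, 40:60] = 0.1 * rng.standard_normal((n_trials, 20))

coeffs = np.exp(1j * phase)
itc_vals = itc(coeffs)
print("baseline ITC ~", round(float(itc_vals[:40].mean()), 2))
print("reset-window ITC ~", round(float(itc_vals[40:60].mean()), 2))
```

High ITC indicates that stimulus onset imposed a consistent oscillatory phase across trials, which is why it serves as a scalp-level index of phase resetting.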
Chapter 2
2 Background
2.1 Temporal Determinants of Binding
2.1.1 The Problem of Intersensory Timing
Temporal coincidence has been established as one of the chief determinants of interaction
between the senses (Meredith, Nemitz, & Stein, 1987; Meredith & Stein, 1986), and sensory
events that occur close together in time are more likely to be perceptually bound (Conrey &
Pisoni, 2006; Dixon & Spitz, 1980; Miller & D’Esposito, 2005). However, this seemingly simple
heuristic is problematic, given that the brain does not have access to the absolute timing of
events, and must in fact contend with various lags both outside and within the nervous system.
Given that the transmission of light is several orders of magnitude faster than the transmission of
sound through air (roughly 3×10⁸ m/s and 3.4×10² m/s, respectively), the sound of an event will
effectively always reach an observer after the light does, and this discrepancy increases with
distance. Further delays are introduced during transduction, where stimulus properties such as
intensity can change transduction time (Lennie, 1981) and after transduction, where early
processing of auditory information is generally faster than that of visual information
(approximately 10 ms for auditory information and 50 ms for visual; Keetels & Vroomen, 2012;
Pöppel, Schill, & von Steinbüchel, 1990). Together, these delays mean that visual and auditory
information originating from the same event arrive at their respective primary cortices at
different times, and the direction of the offset can vary depending on distance, early sensory
processing speed, and properties of the stimuli themselves. Despite these challenges, healthy
observers can reliably experience audiovisual binding for events at various distances, not just the
small “horizon of synchrony” where the physical and neural delays are balanced (Pöppel et al.,
1990). Taken together, these factors suggest that the brain can only determine synchrony
approximately, and must therefore tolerate a range of asynchronies when binding stimuli. While
such a window of integration has been well-documented empirically (e.g. Conrey & Pisoni,
2006; Powers, Hillock, & Wallace, 2009; Stevenson, Zemtsov, & Wallace, 2012; Van Eijk,
Kohlrausch, Juola, & Van De Par, 2008; van Wassenhove, Grant, & Poeppel, 2007), the neural
mechanisms that produce it remain an open question.
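The delays described above can be combined in a back-of-the-envelope calculation of the "horizon of synchrony". The speeds and processing times are the approximate values cited in this section; the function name and exact distances are illustrative only.

```python
# Back-of-the-envelope calculation of the "horizon of synchrony":
# the viewing distance at which the acoustic travel lag exactly
# cancels the faster early processing of auditory information.
# Uses the approximations cited in the text (speed of sound
# ~340 m/s; ~10 ms auditory vs ~50 ms visual early processing);
# light travel time is treated as negligible.

SPEED_OF_SOUND = 340.0   # m/s
T_AUDITORY = 0.010       # s, early auditory processing
T_VISUAL = 0.050         # s, early visual processing

def cortical_asynchrony(distance_m):
    """Auditory-minus-visual arrival time at cortex, in seconds.

    Positive values mean the auditory signal arrives later.
    """
    auditory_arrival = distance_m / SPEED_OF_SOUND + T_AUDITORY
    visual_arrival = T_VISUAL  # light travel time ignored
    return auditory_arrival - visual_arrival

# Distance at which the two streams arrive simultaneously:
horizon = (T_VISUAL - T_AUDITORY) * SPEED_OF_SOUND  # ~13.6 m

print(f"horizon of synchrony ~ {horizon:.1f} m")
print(f"asynchrony at 1 m:  {cortical_asynchrony(1.0) * 1000:+.1f} ms")
print(f"asynchrony at 30 m: {cortical_asynchrony(30.0) * 1000:+.1f} ms")
```

Under these assumptions, audition leads at near distances and lags beyond roughly 14 m, consistent with the claim that the sign of the cortical offset varies with distance.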
2.1.2 Quantifying the Binding Window
The temporal binding window (TBW) is a construct that provides a means of quantifying an
individual’s window of integration. It can be derived from a handful of psychophysical tasks,
which assume that the perception of two stimuli as synchronous (i.e. they are not distinguishable
as distinct events) constitutes multisensory integration of those stimuli. One such method is the
simultaneity judgment (SJ) task, which presents participants with two brief stimuli (e.g. one
auditory and one visual) separated by a systematically varied stimulus onset asynchrony (SOA),
and asks them to judge whether or not the stimuli occurred simultaneously (e.g. Powers, Hillock,
& Wallace, 2009; Stevenson & Wallace, 2013; Van Eijk, Kohlrausch, Juola, & Van De Par,
2008). By fitting a psychometric function to these data, the likelihood of integration can then be
modelled as a function of SOA for each individual (see section 4, Figure 2 for an example).
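A minimal sketch of this fitting step is given below, using hypothetical response proportions and a Gaussian psychometric function (one common choice; published studies differ in both the function fitted and the width criterion). The coarse grid search stands in for a proper optimizer such as scipy.optimize.curve_fit.

```python
# Sketch of deriving a temporal binding window (TBW) from
# simultaneity judgment data by fitting a psychometric function.
# The SOAs, response proportions, Gaussian form, and grid-search
# fit below are illustrative assumptions.
import numpy as np

# Hypothetical data: SOA in ms (negative = auditory-leading) and
# proportion of "synchronous" responses at each SOA.
soas = np.array([-300, -200, -100, -50, 0, 50, 100, 200, 300], float)
p_sync = np.array([0.10, 0.25, 0.60, 0.85, 0.95, 0.90, 0.80, 0.45, 0.15])

def gaussian(soa, amp, pss, sigma):
    """Amplitude-scaled Gaussian over SOA; pss is its peak location."""
    return amp * np.exp(-((soa - pss) ** 2) / (2 * sigma ** 2))

# Coarse least-squares grid search over plausible parameter ranges.
best, best_err = None, np.inf
for amp in np.linspace(0.8, 1.0, 21):
    for pss in np.linspace(-50, 100, 31):
        for sigma in np.linspace(80, 300, 45):
            err = np.sum((gaussian(soas, amp, pss, sigma) - p_sync) ** 2)
            if err < best_err:
                best, best_err = (amp, pss, sigma), err

amp, pss, sigma = best
tbw = 2 * sigma * np.sqrt(2 * np.log(2))  # full width at half maximum
print(f"PSS ~ {pss:.0f} ms (positive = visual-leading)")
print(f"TBW (FWHM) ~ {tbw:.0f} ms")
```

Note that the fitted peak falls to the right of 0 ms for these (made-up) data, mirroring the rightward-shifted point of subjective simultaneity discussed in section 2.1.3.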
2.1.3 Characteristics of the Temporal Binding Window
Through behavioural measurements of the TBW, the characteristics of the binding window have
been explored, revealing several consistent features. The first is asymmetry, where the right
(visual-leading) side of the window is wider than the left (auditory-leading) side on average
(Conrey & Pisoni, 2006; Dixon & Spitz, 1980; Hillock, Powers, & Wallace, 2011; Stevenson et
al., 2012; van Atteveldt, Formisano, & Blomert, 2007; van Wassenhove et al., 2007; Vroomen &
Keetels, 2010); however, the widths of the left and right sides are highly correlated within an
individual (Stevenson et al., 2012). Similarly, the offset where simultaneity is most likely to be
perceived, known as the point of subjective simultaneity (PSS), is often found slightly to the
right of true synchrony (Dixon & Spitz, 1980; Kayser et al., 2008; Stone et al., 2001; van Eijk et
al., 2008; Zampini, Shore, & Spence, 2003). In other words, stimuli with a slight auditory lag are
generally more likely to be perceived as synchronous than stimuli that are objectively
synchronous, perhaps reflecting the naturalistic case where auditory stimuli always slightly lag
visual stimuli due to the differences in their respective speeds (see Van Eijk et al., 2008 for
further discussion of this issue).
Additionally, the TBW has been shown to vary substantially between healthy individuals in
terms of its overall width (Conrey & Pisoni, 2006; Dixon & Spitz, 1980; Miller & D’Esposito,
2005; Powers et al., 2009; Stevenson et al., 2010, 2012) – a finding that has yet to be explained
mechanistically. Furthermore, widening of the TBW has been identified in several
neurodevelopmental conditions, including schizophrenia (Foucher, Lacambre, Pham, Giersch, &
Elliott, 2007; Martin, Giersch, Huron, & van Wassenhove, 2013), autism (de Boer-Schellekens,
Eussen, & Vroomen, 2013; Kwakye, Foss-Feig, Cascio, Stone, & Wallace, 2011), and dyslexia
(Hairston, Burdette, Flowers, Wood, & Wallace, 2005), pointing to differences in multisensory
processing that may play a key role in higher level deficits associated with these conditions
(Wallace & Stevenson, 2014).
Multisensory performance is also known to change across the lifespan, potentially indexing
developmental health. Cross-sectional studies show that the TBW is wider in childhood, reaches
its narrowest sometime in middle age, and then broadens again in older adulthood (Hillock-Dunn
& Wallace, 2012; Hillock et al., 2011; Noel, Niear, Burg, & Wallace, 2016; Stevenson, Baum,
Krueger, Newhouse, & Wallace, 2018). In children, those who experience greater reaction time
benefit from audiovisual cues (as opposed to auditory or visual alone) are more likely to perform
well on a standard measure of intelligence (WISC-IV; Barutchu et al., 2011). In middle aged and
older adults, the width of the temporal binding window was found to be correlated with working
memory performance (Wu et al., 2015). Similarly, performance in an audiovisual detection task,
coupled with information about modality dominance, was recently shown to be an effective
screen for mild cognitive impairment (MCI) in older adults (Murray et al., 2018). Together these
findings suggest that multisensory integration performance, including the TBW, follows a
protracted course of development over the life span, and changes in the trajectory of this
development could index elements of overall brain function and health.
In summary, the canonical features of the TBW (asymmetry, correlated left and right windows,
rightward-shifted PSS), as well as its individual variability, widening in neurodevelopmental
populations, and developmental time course, are all features that a complete account of the
neural mechanisms of multisensory integration will eventually have to explain. However, there
has so far been little effort to use these behavioural findings to inform electrophysiological
research aiming to describe the neural mechanisms of multisensory integration (but see
Kaganovich & Schumaker, 2016). To take the first step in this direction, individual differences in
TBW should be treated as variables of interest, rather than just unwanted sources of variability to
be controlled for in electrophysiology studies (as in e.g. Ikumi, Torralba, Ruzzoli, & Soto-
Faraco, 2018; Kambe, Kakimoto, & Araki, 2015; Yuan, Li, Liu, Yuan, & Huang, 2016). While
necessarily correlational, determining which neural responses to multisensory stimuli correlate
with individual differences in the window of integration could highlight which neural
phenomena may play a mechanistic role in integration. In doing so, future work may bridge the
gap between behaviour, neural mechanisms, and their alteration in development and disease.
2.2 Potential Mechanisms
2.2.1 Classical Models and Their Limitations
Much of the current mechanistic understanding of multisensory integration comes from seminal
work on superior colliculus (SC) neurophysiology in cats (Meredith et al., 1987; Meredith &
Stein, 1983, 1986, 1996). The superior colliculus, a midbrain structure involved in spatial
orienting, is a point of confluence where auditory, visual, and somatosensory inputs converge on
single neurons. Critically, these multisensory neurons display multisensory enhancement (MSE)
in response to bimodal stimuli, meaning that their response to both stimuli at once exceeds the
response to the more effective of the unisensory stimuli alone (B. E. Stein & Stanford, 2008; B.
Stein & Meredith, 1993). The presence of this effect has since become a standard criterion for
deeming a response “multisensory”, and has been widely used to assess the spatial and temporal
determinants of integration (Beauchamp, 2005; Calvert, Hansen, Iversen, & Brammer, 2001; B.
E. Stein & Stanford, 2008; B. Stein, Stanford, & Rowland, 2009).
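The MSE criterion can be stated as a one-line computation: the percentage by which the bimodal response exceeds the most effective unisensory response. The firing rates below are hypothetical, and the index shown is the conventional percent-enhancement form used in the superior colliculus literature.

```python
# Illustrative computation of the multisensory enhancement (MSE)
# index applied to multisensory neurons: the percentage by which
# the bimodal response exceeds the most effective unisensory
# response. All firing rates are made up.
def mse_index(bimodal, unisensory_responses):
    """Percent enhancement relative to the best unisensory response."""
    best = max(unisensory_responses)
    return 100.0 * (bimodal - best) / best

# Hypothetical mean spike counts per trial:
auditory_alone, visual_alone, audiovisual = 8.0, 12.0, 21.0
enhancement = mse_index(audiovisual, [auditory_alone, visual_alone])
print(f"MSE = {enhancement:.0f}%")  # positive => "multisensory" by this criterion
```

A positive index marks the response as "multisensory" under the standard criterion; a negative index would indicate multisensory depression.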
A number of neural network models have since been developed to reproduce the specific
behaviours of multisensory SC neurons, including MSE (e.g. Alvarado, Rowland, Stanford, &
Stein, 2008; Ohshiro, Angelaki, &amp; Deangelis, 2011; Rowland, Stanford, &amp; Stein, 2007); however,
such models are likely not sufficient to explain multisensory temporal fusion for two main
reasons. First, while multisensory SC neurons do exhibit a temporal window of integration (i.e.
multisensory stimuli closer together in time produce greater MSE in these neurons), there is no
evidence yet to suggest that the presence of MSE in multisensory circuits is responsible for
producing perceptual binding, rather than simply regulating basic behaviours such as orienting
(Stein, 1998). In other words, the gap in understanding between the cellular and perceptual
levels of analysis is too broad to suggest that MSE and perceptual binding are one and the same,
and therefore an account of the cellular temporal window of integration is not necessarily
sufficient to explain the perceptual binding window.
Second, these models carry the implicit assumption that multisensory integration is a
feedforward, hierarchical process, in contrast to emerging views of the cortex as largely
multisensory (Calvert et al., 2004; Driver & Noesselt, 2008; Ghazanfar & Schroeder, 2006).
More specifically, the classical model of multisensory integration holds that each modality is
processed largely within anatomically segregated streams, and the computations involved in their
integration take place at the locations where these streams converge (e.g. SC, but also superior
temporal sulcus, inferior parietal cortex, and others). However, there is now overwhelming
anatomical and electrophysiological evidence that putatively unisensory areas receive input from
other modalities, and these inputs can alter activity there, including enhanced responses to
subsequent stimuli (Driver & Noesselt, 2008; Falchier, Clavagnier, Barone, & Kennedy, 2002;
Ghazanfar & Schroeder, 2006; Kayser & Logothetis, 2007; Lakatos et al., 2007; Lakatos,
O’Connell, et al., 2009; Mercier et al., 2013; Rockland & Ojima, 2003). Even at the scalp,
crossmodal effects on ERP amplitude have been observed as early as 40 ms after stimulus onset,
suggesting that an auditory stimulus can alter the amplitude of early cortical responses to a
subsequent visual stimulus (Giard & Peronnet, 1999; Molholm et al., 2002). Furthermore, fMRI
studies investigating perceptual fusion of asynchronous audiovisual stimuli in humans identify a
network of cortical regions including primary visual and auditory cortex, superior temporal
sulcus, as well as prefrontal and parietal regions (Dhamala, Assisi, Jirsa, Steinberg, & Scott
Kelso, 2007; Powers, Hevey, & Wallace, 2012; Stevenson et al., 2010), suggesting that
formation of the fused percept may be a function of a distributed network of interacting regions,
rather than the result of a local neural computation. Together, these findings strongly suggest that
multisensory interactions in fact occur both within and upstream of convergence areas, in
putatively unisensory cortex. As such, an account of multisensory integration which only models
the region of convergence, without accounting for early crossmodal interactions within the
broader network (including primary cortex), is likely to be deficient.
2.2.2 Ongoing Oscillations and Crossmodal Phase Resetting
In contrast to the localized computation posited by SC-inspired models, an emerging framework
holds that multisensory integration is coordinated by neural oscillations at distributed spatial and
temporal scales (see Keil & Senkowski, 2018 for a review). This framework, which posits that
large scale modulatory oscillations coordinate integration, could suggest a mechanism of binding
which accounts for the observed pervasive crossmodal interactions as well as the temporal
sensitivity of this binding process captured by the temporal binding window.
It is well established that ongoing oscillations modulate cortical excitability, and this behaviour
may be central to the coordination of activity among distributed neural populations (Buzsáki &
Chrobak, 1995). By imposing a shared temporal structure between neuronal populations through
common periods of high and low excitability, large-scale oscillations could promote
communication between populations, thereby facilitating processes such as perceptual binding
(Buzsáki & Chrobak, 1995). Crucially, it is now known that incoming stimuli can alter this
temporal structure by resetting the phase of ongoing oscillations, and this mechanism is thought
to contribute to the observed ERP response to a single stimulus (Makeig, Debener, Onton, &
Delorme, 2004; Sayers, Beagley, & Henshall, 1974). Moreover, recent invasive
electrophysiological work in macaques and humans has provided striking evidence that this
phase resetting occurs crossmodally in primary sensory areas. For instance, intracranial
recordings in macaques demonstrate that an attended stimulus can reset the phase of ongoing
theta and gamma oscillations in both A1 and V1, regardless of whether the stimulus is auditory
or visual (Lakatos, O’Connell, et al., 2009). Studies recording from just one primary sensory area
found similar evidence of phase resetting in macaque A1 using visual (Kayser et al., 2008) and
somatosensory stimuli (Lakatos et al., 2007), and human V1 using auditory stimuli (Mercier et
al., 2013). Given these findings, Lakatos and colleagues (2009) have proposed that a salient
stimulus, regardless of modality, resets the phase of ongoing oscillations in primary cortical
areas (visual, auditory, and somatosensory), thereby altering how subsequent stimuli will be
processed in these areas.
This crossmodal alteration of ongoing oscillations has clear implications for multisensory
integration, as this shared temporal structure may promote synchronization between unisensory
and multisensory regions at higher frequencies (e.g. gamma), thereby promoting perceptual
binding of subsequent stimuli (Fries, 2015; Mercier et al., 2015; Senkowski, Schneider, Foxe, &
Engel, 2008; Voloh & Womelsdorf, 2016). Indeed, there is preliminary evidence to suggest that
phase resetting can promote inter-regional synchronization, as demonstrated by a correlation
between phase resetting in auditory cortex following an audiovisual stimulus, and delta phase
synchronization between auditory and motor cortex (Mercier et al., 2015).
2.2.3 Crossmodal Phase Resetting and the Temporal Window of Integration
If crossmodal phase resetting is a mechanism of multisensory integration, it could account for
key features of the temporal binding window. In essence, resetting the ongoing oscillations of a
distributed network could impose a temporal frame of reference which determines whether
subsequent stimuli are integrated or segregated. For instance, two stimuli arriving at different
times could be fused as long as the second arrives within the excitatory phase of the oscillatory
cycle set by the first (Lakatos, O’Connell, et al., 2009). Phase resetting induced by the first
stimulus could also promote a limited period of inter-regional synchronization among unisensory
areas and a broader multisensory network, thereby enhancing communication within this
functional network and potentially facilitating binding of the second stimulus when it arrives. As
this synchronization decays, the likelihood of integration would decrease, accounting for the
decreasing probability of perceptual binding with time. Such a correlation between phase
resetting and inter-regional synchronization has been demonstrated invasively in humans
between auditory cortex and motor cortex (Mercier et al., 2015), but not yet within a distributed
multisensory network.
In healthy adults, individual differences in the effectiveness of this phase resetting mechanism
could also explain individual differences in the width of the temporal binding window. Similarly,
the widened temporal binding window observed in autism, for example, could arise from a
deficit in coordinating this resetting within the multisensory network, in line with views of
autism as a disorder characterized by abnormal connectivity and functional coordination across
brain regions (Belmonte et al., 2004). Furthermore, within individuals, trial-by-trial differences
in integration could be explained by variability in the phase to which ongoing oscillations are
reset, due to endogenous factors like fluctuations in attention (Lakatos, Karmos, Mehta, Ulbert,
& Schroeder, 2008; Lakatos, O’Connell, et al., 2009) or random variability in the power or phase
of pre-stimulus oscillations (Ikumi et al., 2018; Yuan et al., 2016).
To our knowledge, only a single study has demonstrated a link between phase resetting and the
perception of audiovisual synchrony (Kambe et al., 2015). These authors found that higher beta
phase resetting (measured as inter-trial coherence) in the 100 ms following the first stimulus
differentiated trials seen as fused from those seen as segregated. This result is plausible, given
that beta synchronization has been observed between neural groups in different cortical areas
(Brovelli et al., 2004; Tallon-Baudry, Bertrand, & Fischer, 2001), and computational modelling
has demonstrated that oscillators can synchronize in the beta range despite long conduction
delays (Kopell, Ermentrout, Whittington, & Traub, 2000). Furthermore, Mercier and colleagues
(2013) identified auditory-driven beta resetting invasively in human participants, but
hypothesized that this activity was related to response preparation, given the beta band’s prior
implication in sensorimotor processing (Pfurtscheller & Neuper, 1996).
From the scalp topography alone, it cannot currently be concluded that the phase resetting
observed by Kambe and colleagues was indeed crossmodal phase resetting. While neural
sources cannot be located conclusively with EEG, the connection between crossmodal phase
resetting and synchrony perception (and therefore fusion) would be greatly strengthened if these
results could be reproduced, and source modelling employed to verify that the observed phase
resetting originates in primary cortex associated with the second modality. Furthermore, the
application of multivariate approaches to the entire dataset, rather than univariate analysis on
limited regions of interest, would allow much finer characterization of the spatiotemporal
activity patterns associated with the two percepts.
Chapter 3
3 Methods
3.1 Participants
Twenty-eight healthy young adult participants were recruited from the Baycrest Centre
participant database. All participants provided written consent according to the guidelines
established by Baycrest Centre and the University of Toronto, and were provided monetary
compensation for their participation. Seven participants were excluded because of unsuccessful
model fitting to behaviour during the calibration task, previously undisclosed psychiatric
medications, or excessive artifacts in the EEG signal. The final sample included twenty-one
healthy young adults (19-33 years old, mean age 23.5, 10 female) all with normal or corrected-
to-normal vision and hearing, and no reported history of psychiatric illness. Nineteen participants
were right-handed, one left-handed and one ambidextrous.
3.2 Stimuli and Task
3.2.1 Overview
Data collection took place over two stages. The first stage was a short behavioural calibration
phase, wherein each participant’s temporal binding window (TBW) and point of maximum
ambiguity (SOA50%) were measured using a two-alternative forced-choice simultaneity judgment
task, described below. In the second stage, electroencephalography (EEG) data was collected
while participants performed the same task, this time with individually calibrated stimuli based
on the results of the first stage.
All recordings took place in a dimly-lit, sound-attenuating room at the Rotman Research Institute
at Baycrest Centre. The task was built with PsychoPy software (Peirce et al., 2019) and presented
by a Dell Precision T3600 computer. Stimulus timing was verified to be accurate within ±4 ms
using a Tektronix TDS210 2-channel oscilloscope.
3.2.2 Simultaneity Judgment Task
The simultaneity judgment task consisted of a short jittered fixation period (1000-1500 ms),
followed by a brief visual flash and auditory beep stimulus, and lastly a response prompt (see
Figure 1). The flash and beep stimuli were separated by a systematically varied stimulus onset
asynchrony (SOA) ranging from -300 to 300 ms, where a negative number denotes auditory-first
presentation and a positive number denotes visual-first presentation. Hereafter the leading
stimulus will be referred to as S1, and the following stimulus S2.
The auditory stimulus was a 3500 Hz pure sine tone, 10 ms in length, delivered by a GSI 61
audiometer through insert earphones. The audiometer was calibrated such that a 5 s tone at the
same frequency produced an intensity of 102 dB SPL. The visual stimulus was a white annulus
flash on a black background, presented for 10 ms and covering 3.8 degrees of visual angle at a
viewing distance of 60 cm. It was presented on a Dell Trinitron CRT monitor at a refresh rate of
100 Hz.
After a short interval (750 ms) a prompt was displayed and the participant reported whether they
perceived the two stimuli as synchronous (‘Yes’) or asynchronous (‘No’) using a left or right
keyboard button press (counterbalanced between participants) within a 2000 ms time limit. After
a brief intertrial interval (750 ms) the next trial began and the process repeated.
Figure 1: Schematic of the audiovisual simultaneity judgment task, depicting the visual-leading condition.
3.2.3 Calibration Task
The calibration task measured an individual participant’s probability of synchrony perception at
a range of SOAs, allowing the point of maximum perceptual ambiguity (SOA50%) to be
calculated.
Following previous work, 19 SOAs were used in total, ranging from -300 to 300 ms.
Specifically, SOAs of 0, 10, 20, 50, 80, 100, 150, 200, 250, and 300 ms were presented in both
the auditory-leading and visual-leading cases (Stevenson et al., 2012). The task was broken into
four blocks, each containing four presentations of each of the 19 SOAs in a random order,
resulting in each SOA being presented 16 times, for 304 trials in total, over a
duration of roughly 15 minutes. Participants were offered a self-timed break between each block.
After measurement, the probability of synchrony perception for each SOA was calculated as the
number of “synchronous” responses divided by the total number of presentations (16). Two
psychometric sigmoid functions (Hillock-Dunn & Wallace, 2012; Powers et al., 2009; Stevenson
& Wallace, 2013) were fit to these data using Python’s lmfit function to model the left and right
halves of the temporal binding window. The left and right functions describe the relationship
between SOA and the probability of synchrony perception in the auditory-leading and visual-
leading case, respectively. By solving these functions for a rate of 50% synchrony perception,
the SOA that is maximally ambiguous for that participant (equally likely to be perceived as
synchronous or asynchronous) was estimated for both auditory-leading and visual-leading cases.
In addition, the point at which stimuli were predicted to be experienced as asynchronous 95% of
the time (SOA95%) was also determined for use as a filler trial in the main task. The data and
resulting functions were plotted automatically so that the success of the model fitting could be
verified before proceeding with the main task. SOA50% values less than 100 ms were rounded up to
100 ms, in order to provide at least a 100 ms interval free from S2-evoked activity for analysis
purposes.
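The calibration fit and extraction of SOA50% can be sketched as follows. This is a minimal illustration assuming a logistic psychometric function, with made-up response data and SciPy's curve_fit standing in for the lmfit-based code actually used; all names and values are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(soa, amplitude, center, width):
    # Logistic psychometric function: P("synchronous") as a function of |SOA|.
    return amplitude / (1.0 + np.exp((soa - center) / width))

# Hypothetical auditory-leading calibration data: |SOA| in ms vs. the observed
# proportion of "synchronous" responses over 16 presentations per SOA.
soas = np.array([0, 10, 20, 50, 80, 100, 150, 200, 250, 300], dtype=float)
p_sync = np.array([1.0, 1.0, 0.94, 0.88, 0.75, 0.62, 0.31, 0.12, 0.06, 0.0])

params, _ = curve_fit(sigmoid, soas, p_sync, p0=[1.0, 100.0, 30.0])
amplitude, center, width = params

def solve_for_rate(rate):
    # Invert the fitted sigmoid: the SOA yielding a given synchrony rate.
    return center + width * np.log(amplitude / rate - 1.0)

soa50 = max(solve_for_rate(0.50), 100.0)  # apply the 100 ms floor
soa95 = solve_for_rate(0.05)              # 95%-asynchronous filler SOA
```

In the thesis pipeline the same procedure was applied separately to the left (auditory-leading) and right (visual-leading) halves of the window.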
3.2.4 Main Task
Participants again performed a simultaneity judgment task as described above, this time with just
six SOAs. These were the SOA50% and SOA95% values for both auditory- and visual-leading
conditions calculated during the calibration task, as well as 10 ms and -10 ms trials. The
additional values were included to create a wider variety of SOAs, and prevent adaptation to the
SOA50% condition (Fujisaki, Shimojo, Kashino, & Nishida, 2004; Kambe et al., 2015). The
SOA50% and SOA95% values were rounded to the nearest multiple of 10 ms to accommodate the
monitor’s 10 ms frame time. The auditory-leading trials will hereafter be referred to as A10,
A50, and A95, and the visual-leading as V10, V50, and V95.
The task was broken into four blocks, each with 128 trials. Each block consisted of 32 A50 trials,
32 V50 trials, and the remaining 64 trials were split evenly between the remaining four trial types
(Table 1).
Label   Leading modality   SOA duration   Trials per block   Total trials
A50     Auditory           SOA50%         32                 128
A10     Auditory           10 ms          16                 64
A95     Auditory           SOA95%        16                 64
V50     Visual             SOA50%         32                 128
V10     Visual             10 ms          16                 64
V95     Visual             SOA95%        16                 64
Table 1: SOAs presented during the main task. SOA50% and SOA95% values were calculated for each participant individually
based on the results of the calibration task.
3.3 EEG Data Collection and Processing
3.3.1 EEG Data Collection
EEG data was recorded during the main task at a sampling rate of 512 Hz using a BioSemi
Active Two acquisition system (BioSemi Instrumentation, Netherlands). 66 scalp electrodes
were employed, using BioSemi’s 64+2 electrode cap configuration based on the 10/20 system.
Ten additional electrodes were applied in pairs to the mastoids, pre-auricular points, upper
cheeks, outer canthi of the eyes, and inferior orbit of the eyes. These provided better coverage of
the scalp, as well as an accurate record of eye movements for later artifact removal.
3.3.2 EEG Pre-Processing
All EEG pre-processing was performed in Brainstorm, an open-source application for M/EEG
data processing and visualization (Tadel, Baillet, Mosher, Pantazis, & Leahy, 2011). The raw
EEG files were digitally high-pass filtered at a cutoff of 0.5 Hz (60 dB stopband attenuation) to
remove DC offset and low-frequency components (Widmann, Schröger, & Maess, 2015). Given
the limited contamination by 60 Hz line noise, and the a priori interest in oscillatory phase, notch
filters and low-pass filters were not applied in order to prevent undue distortion of the data
(Widmann et al., 2015). The continuous files were visually inspected for bad channels, which
were removed from subsequent analysis, and the remaining channels were then re-referenced to
the average of all remaining channels. Bad segments with obvious contamination from
movement or other artifacts were then manually rejected from each continuous data file. Artifact
detection and removal was achieved with independent component analysis (ICA; Makeig, Bell,
Jung, & Sejnowski, 1996). The Infomax ICA algorithm was applied to the longest available
continuous segment of data without major artifacts for each participant (minimum 7.5 minutes,
or 230,400 samples), and the resulting components were visually inspected. Components with a
topography suggesting horizontal eye movements, vertical eye movements/blinks, or cardiac
activity were subtracted from the continuous EEG. Finally, the
data was epoched according to the trial type (see Table 1), and response (in the case of the A50
and V50 trials only). Each epoch was 2000 ms in length, spanning from 1000 ms before S1 to
1000 ms after S1. Remaining bad trials were rejected manually before averaging by condition
and participant. For the purposes of ERP analysis, each participant average was baseline
corrected by subtracting the mean of the 500 ms preceding the onset of the first stimulus.
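As a concrete illustration of the epoching and baseline-correction step (the actual processing was done in Brainstorm; the array shapes and function name here are assumptions):

```python
import numpy as np

def epoch_and_baseline(continuous, s1_onsets, sfreq=512):
    """Cut 2000 ms epochs (1000 ms before S1 to 1000 ms after) from a
    (n_channels, n_samples) array and subtract each channel's mean over the
    500 ms pre-stimulus baseline. A sketch, not the Brainstorm pipeline."""
    pre = post = sfreq        # 1000 ms of samples on either side at 512 Hz
    base = sfreq // 2         # 500 ms baseline window
    epochs = []
    for onset in s1_onsets:
        ep = continuous[:, onset - pre:onset + post].astype(float)
        baseline_mean = ep[:, pre - base:pre].mean(axis=1, keepdims=True)
        epochs.append(ep - baseline_mean)
    return np.stack(epochs)   # (n_trials, n_channels, 2 * sfreq)
```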
3.3.3 Calculation of Time-Frequency Measures
To investigate non-phase-locked oscillatory power, time-frequency decompositions of the non-
baseline-corrected EEG signals from 1 to 80 Hz were computed using complex Morlet wavelets
(Bertrand & Tallon-Baudry, 2000). Brainstorm’s Time-Frequency process was applied to each
epoch individually, with a wavelet central frequency of 1 Hz, and time resolution (full width at
half maximum) of 3 seconds. The resulting time-frequency decompositions were then averaged
by trial type and participant, and then normalized to the pre-stimulus baseline (-1000 ms to 0 ms)
using the event related synchronization/desynchronization method, which returns the deviation
from the mean of the baseline at each channel and frequency bin (Pfurtscheller, 1992).
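The ERS/ERD normalization amounts to expressing power as a percent deviation from the mean baseline power, per channel and frequency bin. A minimal numpy sketch, with assumed names and shapes:

```python
import numpy as np

def ersd_normalize(power, baseline):
    """Event-related synchronization/desynchronization (Pfurtscheller, 1992):
    percent deviation from mean baseline power. `power` has shape
    (n_channels, n_freqs, n_times); `baseline` is a slice over the time axis."""
    baseline_mean = power[:, :, baseline].mean(axis=2, keepdims=True)
    return 100.0 * (power - baseline_mean) / baseline_mean
```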
To investigate phase-locked oscillatory activity, inter-trial coherence (ITC; Tallon-Baudry,
Bertrand, Delpuech, & Pernier, 1996) was calculated for each condition for each participant. ITC
represents the concentration of phases across trials relative to an event, where a value of 0
represents completely random phase distribution and a value of 1 represents identical phases
across trials. The Morlet wavelet decomposition of each trial and calculation of ITC from 1 to 80
Hz was performed with the Fieldtrip toolbox in MATLAB (Oostenveld, Fries, Maris, &
Schoffelen, 2011). Because phase concentration measures are particularly sensitive to low trial
numbers (resulting in a positive bias; Aydore, Pantazis, & Leahy, 2013; M. X. Cohen, 2014), the
numbers of trials in the ‘Yes’ and ‘No’ conditions were equated within each participant using random
sampling (Ikumi et al., 2018). For instance, if a participant had only 40 trials in the
“asynchronous” condition, ITC would be computed on 40 trials selected at random from the
“synchronous” condition. This random sampling was repeated 30 times, and the resulting ITC
maps averaged together.
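ITC and the trial-equating resampling can be sketched directly from complex wavelet coefficients. The thesis used FieldTrip for this step; the array shapes and function names below are illustrative only.

```python
import numpy as np

def intertrial_coherence(coeffs):
    """ITC from complex wavelet coefficients of shape (n_trials, n_freqs,
    n_times): the length of the mean unit phase vector across trials. 0 means
    uniformly random phases; 1 means identical phases on every trial."""
    phase_vectors = coeffs / np.abs(coeffs)
    return np.abs(phase_vectors.mean(axis=0))

def equalized_itc(coeffs, n_keep, n_resamples=30, seed=0):
    """Equate trial counts before computing ITC (phase-concentration measures
    are positively biased at low trial numbers): draw `n_keep` trials at
    random without replacement, repeat, and average the resulting ITC maps."""
    rng = np.random.default_rng(seed)
    maps = [intertrial_coherence(
                coeffs[rng.choice(len(coeffs), n_keep, replace=False)])
            for _ in range(n_resamples)]
    return np.mean(maps, axis=0)
```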
3.3.4 Source Estimation
Source modelling was performed in order to investigate the neural sources of the measured
activity. First, all participants were assigned the MNI/ICBM152 default anatomy in Brainstorm
(Fonov, Evans, Mckinstry, Almli, & Collins, 2009), and the forward model was calculated from
this anatomy using the OpenMEEG BEM method (Gramfort, Papadopoulo, Olivi, & Clerc, 2010;
Kybic et al., 2005). A noise covariance matrix was calculated for each participant from the pre-
processed EEG during pre-stimulus baseline periods (-1000 to 0 ms) of all available trials. Data
covariance matrices were similarly calculated from the A50 “synchronous”, A50
“asynchronous”, V50 “synchronous” and V50 “asynchronous” trials (0 to 1000 ms).
Cortical current density maps were then calculated for every trial using minimum norm imaging
with unconstrained sources (Baillet, Mosher, & Leahy, 2001). For each time point, this method
estimates the current at each point of the cortical surface, using three orthogonal dipoles at each
location. Due to the size of the resulting files each map was downsampled to the 68-region
Desikan-Killiany atlas (Desikan et al., 2006). To explore the average ERP response at the source
level, these maps were then averaged by condition and participant, and Z-score normalized to the
pre-stimulus baseline (-1000 to 0 ms). Time frequency measures were also calculated on the un-
normalized source maps using the same procedures described above (section 3.3.3).
3.4 Statistical Comparisons
3.4.1 Behavioural Analyses
As a manipulation check, the rates of synchrony perception in the main experiment for each SOA
type (A10, A50, and A95) were compared for both the auditory- and visual-leading
conditions with a repeated-measures ANOVA. This ensured that longer SOAs resulted in
decreased synchrony perception as expected. Overall rates of synchrony perception for the A50
and V50 trials were similarly compared between the four blocks of the main task, to ensure that
rate of synchrony perception remained stable over the course of the task, and ensure that
adaptation or fatigue effects were not present.
3.4.2 Response-Based Analyses
In order to identify the neural activity associated with synchrony perception (and therefore
integration), a number of planned comparisons were carried out between the “synchronous” and
“asynchronous” conditions (A50 “synchronous” vs A50 “asynchronous” and V50 “synchronous”
vs V50 “asynchronous”). Importantly, a substantial imbalance in the number of trials between
two conditions presents a potential confound when comparing them, as lower trial numbers can
positively bias both ERP averages (Thomas, Grice, & Najm-Briscoe, 2004) and time-frequency
measures (Aydore et al., 2013; M. X. Cohen, 2014). Therefore, in addition to the random
sampling procedure outlined above, any participants with fewer than 30 artifact-free trials in either
the “synchronous” or “asynchronous” condition were excluded from the relevant comparisons
involving those conditions (M. X. Cohen, 2014). Because of this criterion, 13 participants were
included in the auditory-leading comparison (mean number of trials in the less-populated
condition = 39.3, SD = 7.6) and nine in the visual-leading comparison (mean = 42.1, SD = 11.5).
This attrition rate (9-13 analyzed out of 21 participants) is comparable to those reported by other
EEG studies employing phase-based measures, especially those where trial count depended on
participant behaviour (e.g. 9-10 analyzed out of 27 in Ikumi et al., 2018; 8 out of 21 in van Dijk,
Schoffelen, Oostenveld, & Jensen, 2008; 11 out of 18 in Mathewson, Gratton, Fabiani, Beck, &
Ro, 2009).
At both the sensor and source levels, responses were compared using a two-tailed paired
permutation t-test with 1000 randomizations (Pantazis, Nichols, Baillet, & Leahy, 2005). Such
tests were carried out for the averaged ERP responses, and for the averaged power and ITC maps
over both the first 100 ms after S1 and a longer interval spanning 1000 ms after S1. False
Discovery Rate (FDR) correction (Benjamini & Hochberg, 1995) was used to control the false
discovery rate across all comparisons, spanning the signal and time dimensions in the case of the
ERP comparisons, and the signal, time, and frequency dimensions in the case of time-
frequency comparisons.
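A minimal sketch of the two statistical ingredients, a sign-flipping paired permutation t-test and Benjamini-Hochberg FDR, under assumed array shapes (this is not the Brainstorm implementation):

```python
import numpy as np

def paired_permutation_t(cond_a, cond_b, n_perm=1000, seed=0):
    """Two-tailed paired permutation t-test: randomly flip the sign of each
    participant's difference and compare the observed t statistic to the
    permutation distribution. Inputs: (n_participants, n_features) arrays."""
    rng = np.random.default_rng(seed)
    diff = cond_a - cond_b
    n = diff.shape[0]

    def t_stat(d):
        return d.mean(0) / (d.std(0, ddof=1) / np.sqrt(n))

    t_obs = t_stat(diff)
    exceed = np.zeros_like(t_obs)
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n, 1))  # flip per participant
        exceed += np.abs(t_stat(diff * signs)) >= np.abs(t_obs)
    return t_obs, (exceed + 1) / (n_perm + 1)         # two-tailed p-values

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject all hypotheses up to the
    largest rank i with p(i) <= (i / m) * q. Returns a boolean mask."""
    p = np.asarray(pvals).ravel()
    order = np.argsort(p)
    thresh = q * (np.arange(1, p.size + 1) / p.size)
    below = np.nonzero(p[order] <= thresh)[0]
    reject = np.zeros(p.size, dtype=bool)
    if below.size:
        reject[order[:below[-1] + 1]] = True
    return reject.reshape(np.shape(pvals))
```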
In addition to the univariate analyses described above, Partial Least Squares (PLS; McIntosh &
Lobaugh, 2004), was employed to investigate spatiotemporal patterns of activity that may index
multisensory integration. PLS is a multivariate technique for neuroimaging analysis which, in
contrast to univariate approaches, allows inferences about task or behaviour-based differences in
patterns of activity that are extended in space and time. The resulting patterns (latent variables or
LVs) can then be tested for statistical significance with permutation tests. For those LVs that
pass a significance threshold, the reliability of each sensor’s (or source’s) contribution to the
experimental effect at each time point can be assessed using bootstrap resampling. Together,
these approaches provide a picture of which elements contribute reliably to a significant,
distributed experimental effect, and at what time points.
As with the univariate tests, PLS was computed separately on the ERP, time-frequency, and ITC
averages at both the sensor and source levels to compare the “synchronous” vs “asynchronous”
conditions. For time-frequency and ITC averages this required that, for each observation, the
two-dimensional time-frequency maps be collapsed into a one-dimensional vector of length =
number of samples × number of frequency bins × number of sensors/sources. Time-frequency
measures calculated with Morlet wavelets also have “edge effects” – transients at the beginning
and end of each frequency bin where the frequency components are systematically
misrepresented. The positions of these edge effects are determined directly by the wavelet
parameters, and could therefore be calculated and removed from the PLS data structure.
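The flattening and edge-trimming step can be illustrated with a simple reshape. In practice the number of contaminated samples varies with the wavelet's temporal support at each frequency; here it is simplified to a single argument, and all names are assumptions:

```python
import numpy as np

def flatten_tf_maps(tf, edge):
    """Collapse (n_observations, n_sensors, n_freqs, n_times) time-frequency
    maps into the 2-D (observations x variables) matrix PLS expects, dropping
    `edge` samples from both ends of every frequency bin, where the Morlet
    decomposition is contaminated by edge effects. Sketch only."""
    trimmed = tf[:, :, :, edge:tf.shape[3] - edge]
    return trimmed.reshape(tf.shape[0], -1)
```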
3.4.3 Individual Differences Analysis
In addition to the response-based analyses, the task design offered an opportunity to explore
individual differences in the tendency to integrate temporally offset stimuli, as indexed by the
temporal binding window measured during the calibration task. Specifically, Behaviour PLS
(McIntosh & Lobaugh, 2004) allows identification of distributed patterns of brain activity that
correlate with a behavioural measure, in this case the mean temporal binding window (average of
left and right TBWs). Because the SOA50% and SOA95% varied between participants, and were
determined directly by the TBW values, only the A10 and V10 conditions could be included in
this analysis. Similar to above, Behavior PLS using the mean TBW as the behavioural measure
of interest was computed for all 21 participants (0-1000 ms) on the ERP, power and ITC
averages of the A10 and V10 trials at both the sensor and source levels.
Chapter 4
4 Results
4.1 Behavioural Results
4.1.1 Calibration Task Results
The mean SOA50% measured during the calibration task was 162.6 ms (SD = 62.9 ms) for the
auditory-leading condition, and 240.0 ms (SD = 106.2 ms) for the visual-leading condition.
Figure 2 depicts a group temporal binding window fit to the calibration data of all participants.
These data reproduced the established finding that, on average, the right (visual-leading) side of
the temporal binding window is wider than the left (auditory-leading; t = 4.80, p < .001). The
width of the left side of the TBW (measured at SOA50%) ranged from 57.6 to 293.9 ms, and the
right side from 93 to 505 ms. As described previously (Stevenson et al., 2012), the widths of the
left and right sides were correlated (r(19)= 0.73, p < .001) within participants (Figure 3).
4.1.2 Main Task Results
For each condition (auditory-leading and visual-leading) the mean percentage of trials perceived
as synchronous was significantly different between all three levels of stimulus type (Figure 4).
This suggests that, on the whole, the manipulation of SOA was successful in producing different
probabilities of synchrony perception in each trial type, and these levels could therefore be
treated as distinct for further analyses. Figure 4 summarizes the percentage of trials perceived as
synchronous for each trial type during the main task.
However, the mean percentage of synchrony perception differed substantially from the targeted
values. For the ambiguous trial types (A50 and V50), the rate of synchrony perception differed
from the targeted 50% for both the auditory-leading (t(20) = 5.45, p < .001) and visual-leading
(t(20) = 4.02, p < .001) conditions (Figure 4). Similarly, mean synchrony perception in the A95
(t(20) = 7.41, p < .001) and V95 (t(20) = 6.15, p < .001) conditions were different than the target
of 5% synchrony perception. This means that, on average, participants were more likely to report
perceiving synchrony during the main task compared to the calibration task given the same SOA.
Such discrepancies, albeit of smaller magnitude, have been reported by previous studies
employing a similar design (Kambe et al., 2015; Yuan et al., 2016), but the cause remains
unclear. Possible interpretations of this finding are discussed in section 5.1.

Figure 2: Group temporal binding window, produced by fitting two sigmoid functions (left and
right) to the calibration task data of the entire sample. By convention, the left side depicts the
auditory-leading case, and the right side the visual-leading case. Dots represent individual
participants, with darker dots indicating overlapping points.

Figure 3: Right temporal binding window width vs left temporal binding window width, measured
at the point of maximum ambiguity (SOA50%).

Table 2: Descriptive statistics for the main task behavioural results.

Trial type   Mean perceived synchronous (%)   Lower quartile   Upper quartile   Interquartile range
A50          70.3                             64.1             84.4             20.3
A10          91.0                             85.9             98.4             12.5
A95          28.8                             20.3             35.5             15.2
V50          69.7                             48.0             85.9             37.9
V10          91.4                             85.9             98.4             12.5
V95          38.3                             17.5             57.8             40.3

Figure 4: Percent of trials perceived as synchronous for each trial type during the main task. One-way repeated measures
ANOVA identified a main effect of SOA type in both the auditory-leading (F(2,40) = 240.24, p < .001) and visual-leading
(F(2,40) = 240.24, p < .001) conditions. All three levels of each condition (A10, A50, A95) are different from each other at
p < .001 (Tukey’s HSD).

To examine whether
this apparent bias towards “synchronous” responses was stable across the task, the effect of task
block on rate of synchrony perception was assessed. One-way repeated measures ANOVA
revealed an effect of block on synchrony perception for the auditory-leading condition, but
follow-up analysis revealed that the first block alone was responsible for this difference (Figure
5). This suggests that the overall rate of synchrony perception remained stable from the second
block onwards, arguing against a gradual change in perception over the course of the
recording.
Figure 5: Percent of ambiguous (A50 or V50) trials perceived as synchronous during each block of the main task. One-way
repeated measures ANOVA identified a main effect of block in the auditory-leading condition (F(3, 60) = 9.46, p < .001) but not
the visual-leading condition. The mean marked *** was found to be different from the other three blocks at p
< .001 (Tukey’s HSD).
4.2 EEG Results
4.2.1 Response-Based Comparisons
In both the auditory-leading and visual-leading conditions, enhanced ITC values compared to
pre-stimulus baseline were observed in a wide frequency range across the scalp (Figure 6). This
enhancement was distributed across a range of frequencies, with peaks concentrated between 3 to
15 Hz for both auditory and visual stimuli, and an additional peak in the low-gamma (roughly
30-55 Hz) range over central scalp for auditory stimuli. However, univariate comparisons
revealed no significant differences in these ITC enhancements between the “synchronous” and
“asynchronous” responses. Similarly, no response-related differences were detected in the ERPs
or oscillatory power at the sensor and source levels with univariate comparisons.
In contrast, multivariate analysis with PLS applied to the scalp ERPs identified one latent
variable which captured a difference between “synchronous” and “asynchronous” responses in
the A50 condition (Figure 7). Specifically, “asynchronous” responses were associated with
higher amplitude in a cluster of central electrodes from roughly 350 to 450 ms post-stimulus
(Figure 8). But given that the identified difference occurs after the arrival of the second stimulus
for some participants, it cannot be determined whether this difference is specifically related to
the first or second stimulus. To clarify this issue, PLS analysis was also applied to ERPs time-
locked to the second stimulus. Again, a latent variable was identified which differentiated
“synchronous” and “asynchronous” responses (p = 0.025; Figure 9), with central electrodes again
displaying higher amplitudes for the “asynchronous” compared to the “synchronous” trials, this
time between latencies of 150 to 300 ms after the second stimulus (Figure 10). No significant
latent variables were detected with PLS applied to ITC or power, at either sensor or source
levels.
4.2.2 Source-Modelling of the Response-Based Differences
In order to localize the cortical sources of these response-based ERP differences, the A50 grand
averages were subtracted (average of “asynchronous” response trials - average of “synchronous”
response trials) and the resulting difference wave projected onto the cortical surface using
minimum norm imaging with unconstrained sources (Baillet et al., 2001). The resulting current
density maps depict the cortical regions that are estimated to be more active during the windows
of interest identified by PLS time-locked to S1 (Figure 11) and S2 (Figure 12). Given the limited
spatial accuracy of inverse modeling methods, as well as the bias in current density maps
towards superficial sources, the neural generators responsible for the observed ERP differences
can only be approximately localized. However, a large peak in the posterior left temporal lobe
during the maximum of the ERP response to the first stimulus suggests that auditory cortex may
be the main generator of the observed difference there. Additional contributions may also be
attributable to bilateral anterior temporal lobes, bilateral occipital cortex, as well as left
prefrontal cortex. The validity of the activity identified in the left precentral, post-central, and
supramarginal gyri is difficult to assess, as activity in these regions may be misattributed from
the temporal surface below.
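The logic of the minimum norm estimate underlying these current density maps can be sketched in a few lines. The dimensions and regularization value below are purely illustrative assumptions, not those of the actual model, which uses each participant's anatomy and many more cortical vertices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions (illustrative only): 64 sensors, 500 sources.
n_sensors, n_sources = 64, 500
G = rng.standard_normal((n_sensors, n_sources))  # lead field (forward model)
y = rng.standard_normal(n_sensors)               # sensor-level difference wave

# Regularized L2 minimum-norm estimate: the smallest-norm source current
# distribution consistent with the sensor data (Baillet et al., 2001).
lam = 0.1 * np.trace(G @ G.T) / n_sensors
x_hat = G.T @ np.linalg.solve(G @ G.T + lam * np.eye(n_sensors), y)
```

Because the solution minimizes total source power, it favours superficial sources close to the sensors, which is the bias noted above.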
Source modelling of the difference wave associated with S2 similarly shows clear activity in the
left posterior temporal cortex (Figure 12). Again, activity identified in the left inferior parietal
lobule (supramarginal gyrus and angular gyrus) may simply be misattributed from temporal
cortex, although this region has previously been implicated in the perception of asynchrony with
audiovisual stimuli (Bushara, Grafman, & Hallett, 2001; Dhamala et al., 2007). Similarly,
activity was identified in bilateral inferior frontal gyrus, also previously associated with the
perception of audiovisual asynchrony (Bushara et al., 2001; Dhamala et al., 2007).
4.2.3 Individual Differences Analysis
In the V10 condition, Behaviour PLS using data from all 21 participants identified one LV (p =
0.049) reflecting a correlation between individual mean TBW width and a distributed pattern of
ITC at the sensor level (r = 0.97; Figure 13). In the first 100 ms after stimulus onset, a cluster of
central electrodes, as well as right posterior and left temporal electrodes, were highly salient in
frequency bins between 3.5-8 Hz (Figure 14), suggesting that early, low-frequency ITC elicited
by V10 stimuli may be predictive of the temporal window of integration at the individual level.
Such a correlation was not identified with non-baseline-corrected power or the ERP, suggesting
that this effect is not driven by differences in power or amplitude (M. X. Cohen, 2014; van
Diepen & Mazaheri, 2018).
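The logic of this analysis can be illustrated with a simplified sketch (not the implementation used here, and with hypothetical data shapes): with a single behavioural measure, the Behaviour PLS salience pattern reduces to the correlation of each brain feature with behaviour across participants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 21 participants x 1200 brain features (flattened
# electrode x time x frequency ITC values) and one behaviour (TBW width).
brain = rng.standard_normal((21, 1200))
tbw = rng.standard_normal(21)

def zscore(a):
    return (a - a.mean(axis=0)) / a.std(axis=0)

# Per-feature correlation with behaviour across participants gives the
# salience pattern; projecting each participant's data onto it gives the
# "brain scores" plotted against TBW width. Significance and reliability
# are then assessed by permutation and bootstrap resampling (not shown).
saliences = zscore(brain).T @ zscore(tbw) / len(tbw)
brain_scores = brain @ saliences
```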
Figure 6: ITC values during a) A50 and b) V50 trials at two representative electrodes (FCz and POz). ITC
values displayed are those significantly different from the pre-stimulus baseline (-500 to -100 ms; Student's t test)
with a significance threshold of p < 0.01.
Figure 7: Topography of PLS salience values for the LV (p = 0.011) identified on the contrast between A50 "synchronous" and
A50 "asynchronous" averages time-locked to S1, revealing a cluster of central electrodes with high salience (see fig. 8 for
detailed view of this cluster). Circles represent time points with bootstrap ratios more extreme than ± 2.58.
Figure 8: Grand average ERPs for the A50 condition
time-locked to S1 at three representative central
electrodes, split by response (“synchronous” vs
“asynchronous”). PLS analysis identified one latent
variable (LV; p = 0.011) capturing a difference
between “synchronous” and “asynchronous”
responses. Blue circles mark time points where the
given electrode makes a reliable contribution to this
LV according to bootstrap resampling (bootstrap
threshold = ±2.58).
Figure 9: Topography of PLS salience values for the LV (p = 0.025) identified on the contrast between A50 "synchronous" and
A50 "asynchronous" averages time-locked to S2, revealing a cluster of central electrodes with high salience (see fig. 10 for
detailed view of this cluster). Circles represent time points with bootstrap ratios more extreme than ± 2.58.
Figure 10: Grand average ERPs for the A50
condition time-locked to S2 at three
representative central electrodes, split by
response (“synchronous” vs “asynchronous”).
PLS analysis identified one latent variable (LV; p
= 0.025) capturing a difference between
“synchronous” and “asynchronous” responses.
Blue circles mark time points where the given
electrode makes a reliable contribution to this LV
according to bootstrap resampling (bootstrap
threshold = ±2.58).
Figure 11: Current density maps depicting the difference between trials perceived as asynchronous and trials perceived as
synchronous during the A50 condition (time-locked to S1, 350 to 450 ms). Values less than 40 pA.m were excluded from
the visualization, as well as clusters comprising 20 vertices or less.
Figure 12: Current density maps depicting the difference between trials perceived as asynchronous and trials perceived as
synchronous during the A50 condition (time-locked to S2, 150 to 300 ms). Values less than 40 pA.m were excluded from
the visualization, as well as clusters comprising 20 vertices or less.
Figure 13: Correlation between mean temporal binding window and Behaviour PLS brain score for the V10 condition.
Figure 14: Topography of ITC values that are positively correlated with mean TBW at a representative time and frequency
(50 ms post-stimulus, 7 Hz). Left: Electrode salience. Right: electrodes that exceed the ±2.58 bootstrap ratio threshold are
marked in white.
Chapter 5
5 Discussion
5.1 Behavioural Bias towards “Synchronous” Responses
In the main task, the majority of participants demonstrated a bias towards responding
“synchronous” when presented with an ambiguous stimulus, in both the auditory-leading and
visual-leading conditions (Figure 4). Additionally, some participants responded “synchronous”
to relatively extreme SOA values (A95 and V95) much more often than their individual
psychometric functions predicted. In short, when responding “synchronous”, participants tolerated more asynchrony in the main task than in the calibration task.
This result is broadly consistent with previous findings that simultaneity judgments are flexible,
in that they adapt to the temporal statistics of preceding stimuli in a process known as temporal
recalibration. More specifically, prior repeated exposure to audiovisual stimuli with a constant
audiovisual lag will bias subsequent simultaneity judgments in the direction of the lag (Fujisaki
et al., 2004; Roseboom & Arnold, 2011; Yuan, Bi, Yin, Li, & Huang, 2014). However, there was
no long period of constant lag during which adaptation could occur in this task, due to the
inclusion of filler SOAs (A10/V10 and A95/V95), added specifically to avoid this effect (Kambe
et al., 2015). Moreover, the lags were balanced on either side of 0 (true synchrony), rather than in
one direction at a time as in previous studies demonstrating temporal recalibration (Fujisaki et
al., 2004; Yuan et al., 2016). If temporal recalibration is the reason for the bias towards
synchrony perception observed here, it must have occurred in both the auditory-leading and
visual-leading directions simultaneously. Such a bi-directional adaptation has been demonstrated
with audiovisual stimuli (Roseboom & Arnold, 2011), but only when the auditory lag and visual
lag are each associated with clearly distinguishable sources (e.g. a male talker with a visual lag
and a female talker with an auditory lag), whereas no such relationship was established in this
task.
Rather than assuming that two opposing perceptual biases were imposed simultaneously, a more
parsimonious explanation would be that participants “anchored” their perceptual decision criteria
for asynchrony to the more obviously asynchronous trials (A95 and V95) which made up a larger
proportion of trials compared to similarly extreme values in the calibration task. Using such
extreme criteria to resolve uncertainty when confronted with ambiguous stimuli would then
result in a strong bias towards “synchronous” judgments, regardless of the leading stimulus.
However, this explanation would fail to clarify why some participants frequently responded
“synchronous” to these more extreme SOAs as well.
Another possibility is that fatigue, boredom, or otherwise decreased vigilance resulted in
impaired detection of asynchrony regardless of the leading stimulus. However, such effects
would be expected to increase in influence over the length of the 30-40 minute task, while in fact
the effect appears to stabilize after just the first block (Figure 5).
In summary, the cause of the observed bias towards synchrony responses, and its greater
magnitude compared to other studies with similar designs (Kambe et al., 2015; Yuan et al.,
2016), remains unclear. Regardless of the cause, it is an important reminder that biases may
affect the judgment of synchrony at multiple levels between sensation and response, and more
than one may be operating at once. Because synchrony judgments are simply used as a proxy for
multisensory binding here, it is important when interpreting these results to emphasize that these
biases can produce a dissociation between the perceptual experience of binding and the
behavioural response.
5.2 Response-Based Analyses
5.2.1 Intertrial Coherence and the Role of Phase Resetting in Multisensory Integration
ITC enhancement was detected broadly across the scalp after stimulation, suggesting that phase
resetting may have occurred (Figure 6). Despite these enhancements, no significant differences
between “synchronous” and “asynchronous” responses were observed.
Several factors should be considered before drawing conclusions from this result. Most
importantly, the behavioural response bias discussed previously negatively impacted the
potential for this analysis to detect an effect for two reasons. First, it caused the exclusion of
several participants who lacked the necessary number of trials in both conditions. Second, it
required that trial counts be artificially balanced between conditions within the remaining
participants, as ITC values are sensitive to low trial count (Aydore et al., 2013; van Diepen &
Mazaheri, 2018). The decision to accomplish this by sampling a smaller number of trials from
the more populated condition amounted to a trade-off, where the signal-to-noise ratio of the more
populated condition was sacrificed to prevent spurious differences arising from unbalanced trial
counts. On the other hand, simulations have illustrated that ITC differences can arise from sources other than true phase resetting, such as ERP or power differences (van Diepen &
Mazaheri, 2018). Therefore, the ERP difference identified by PLS between “synchronous” and
“asynchronous” conditions could have been the source of any ITC differences, had one been
found.
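The sensitivity of ITC to trial count can be demonstrated with a brief simulation (illustrative Python, not the analysis code used in this study; the trial counts below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def itc(phases):
    """Intertrial coherence: length of the mean unit phase vector across
    trials (1 = perfect phase locking, 0 = uniformly random phase)."""
    return np.abs(np.mean(np.exp(1j * phases)))

# Even fully random phases yield a positive expected ITC, and this bias
# grows as the trial count shrinks -- hence the need to equate trial
# counts (here, by subsampling the more populated condition) before
# comparing conditions.
few = [itc(rng.uniform(0, 2 * np.pi, 30)) for _ in range(2000)]
many = [itc(rng.uniform(0, 2 * np.pi, 120)) for _ in range(2000)]
assert np.mean(few) > np.mean(many)
```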
The lack of evidence for ITC differences here is in contrast to the conclusions of Kambe and
colleagues (2015), who associated enhanced post-stimulus beta ITC with perception of
synchrony. In addition to minor variations in calibration, stimuli, and timing, methodological
differences in data analysis should be considered when comparing these divergent results.
Foremost, Kambe and colleagues employed a region of interest approach, opting to average
groups of six electrodes together to produce an “auditory” and a “visual” region of interest
(ROI), thereby dramatically reducing the dimensionality of their data. In contrast, the full dataset
was leveraged in the current study in order to identify distributed patterns of activity with
multivariate analysis, and allow their neural sources to be estimated with inverse modelling
methods. Additionally, Kambe and colleagues do not report a direct comparison between
“synchronous” and “asynchronous” conditions, even though they conclude a difference between
the two. Rather, for each condition and ROI they compared the time-averaged ITC between 0-100 ms with that of the -100 to 0 ms interval, and viewed the resulting differences
between ROIs as sufficient evidence for differences in phase resetting. This choice of baseline is
also problematic, as the interval from -100-0 ms is likely contaminated by stimulus-related
activity, especially in the lower frequencies (M. X. Cohen, 2014). Additionally, the raw ITC
values were not reported, but rather baseline-normalized values, meaning that the size of the
effect they identify cannot be assessed (van Diepen & Mazaheri, 2018). Lastly, the authors report
that there was no ERP difference between the conditions, but do not elaborate on how this
difference was tested. As a result, it is difficult to interpret whether the observed ITC differences
may be the result of sources other than genuine phase resetting (van Diepen & Mazaheri, 2018).
In light of these considerations, and the results of the current study, there is only weak evidence
that phase resetting plays a mechanistic role in synchrony perception for temporally offset
stimuli. To the extent that synchrony perception is an index of perceptual binding, there is currently little to suggest that phase resetting mediates integration of this kind.
However, it should be noted that Kambe et al. (2015) and the current study only employed short
stimuli, presented intermittently, rather than more continuous or naturalistic stimuli. This could
be important, given converging evidence suggesting that phase resetting may only be critical in
one of two potential sensory “processing modes”, which are characterized by different rhythmic
activity, and are thought to be flexibly determined by stimulus properties (van Atteveldt et al.,
2014). More specifically, a “rhythmic” processing mode manifests when stimuli have a
predictable, rhythmic structure. Phase resetting has been implicated as a key mechanism in this
mode, as it allows incoming stimuli to reset the structure of ongoing slow oscillations in order to
better align periods of excitability to the structure of the ongoing stimulus. Audiovisual speech is
an example of such a predictable stimulus, as visual information in the form of articulatory
gestures often precedes auditory information, and therefore offers information about what will be
heard next. Indeed, phase resetting in auditory cortex in response to visual speech has recently
been demonstrated invasively in humans (Mégevand et al., 2019). In contrast, the other,
“continuous”, mode occurs when stimuli are unpredictable, as in the simultaneity judgment task
employed here. In this mode, low frequency oscillations are thought to be suppressed in favour
of gamma range oscillations, as the temporal structure of slow oscillations cannot help predict
future input in this case. As such, the current task may induce a “continuous” mode of operation,
in which incoming stimuli do not produce crossmodal phase reset. This would suggest that
crossmodal phase resetting may still mediate integration in some situations, but perhaps not in
the context of this task.
5.2.2 Response-Based ERP Differences and Source Modelling
PLS analysis of the A50 condition identified a spatiotemporal pattern distinguishing trials that
were perceived as synchronous from those that were perceived as asynchronous. The most
notable feature of this spatiotemporal pattern was a cluster of salient electrodes over central scalp
which demonstrated higher amplitudes for “asynchronous” compared to “synchronous” trials.
This pattern was observed regardless of whether the PLS was applied to data time-locked to S1
or S2. Interpretation of the former is difficult, given that S2 arrives within the analysis interval at
a different time for each participant. The S2 time-locked results are not affected by this
limitation, however, and can therefore be interpreted more straightforwardly as activity set
primarily in motion by the arrival of S2. Given that S2 is in this case a visual stimulus, it is
striking that the differences are observed over central scalp, with a topography resembling that of
an auditory response. Moreover, source modelling of the difference between “asynchronous” and
“synchronous” averages at its peak strongly identified the left posterior temporal lobe,
suggesting a source in auditory cortex. One interpretation of this finding would be that the
incoming visual stimulus (S2) has the potential to elicit activity in auditory cortex, and the
degree of this activity plays a role in determining whether the two stimuli are segregated. More
specifically, this activity could represent a reconstitution of the neural representation of the
preceding auditory stimulus, thereby promoting the perception of two distinct (auditory and
visual) stimuli, rather than one (audiovisual) stimulus. Furthermore, the presence or absence of
this activity could depend on the oscillatory state (e.g. phase and amplitude of low-frequency
modulatory oscillations) at the arrival of the second stimulus. Such a mechanism is purely
speculative however, and the necessary interaction between S2 and S1-associated cortex
currently lacks support from invasive recordings.
An alternative explanation is that the inferred activity in posterior temporal cortex did not
originate from auditory regions, but instead from nearby multisensory cortex previously
implicated in synchrony perception. More specifically, the nearby superior temporal sulcus
(STS) is the cortical region most commonly implicated in the perception of intersensory
synchrony/asynchrony in the fMRI literature on this topic (Balk et al., 2010; Calvert et al., 2001;
Macaluso, Spence, George, Driver, & Dolan, 2004; Marchant, Ruff, & Driver, 2012; Noesselt et
al., 2007), as well as the superior temporal gyrus (Dhamala et al., 2007; Marchant et al., 2012;
Noesselt et al., 2007; Stevenson et al., 2010; Stevenson, VanDerKlok, Pisoni, & James, 2011).
Given the anatomical differences between participants, as well as the use of unconstrained
sources and the general spatial limitations of minimum norm imaging, activity in these
multisensory regions could have been attributed to a broader region of the posterior temporal
lobe. If this is the case, these results would suggest that enhanced activity in multisensory
regions of the left temporal lobe at a latency of roughly 150-300 ms after S2 plays a role in the
segregation of audiovisual stimuli into two distinct percepts. However, this hypothesis cannot be
confirmed at the current spatial resolution.
In addition to posterior temporal cortex, source modelling identified two more multisensory
regions previously implicated in the perception of asynchrony, namely the inferior parietal lobule
(Bushara et al., 2001; Dhamala et al., 2007) and the inferior frontal gyrus (Dhamala et al., 2007).
Together, the results of the source modelling analysis accord well with the findings of Dhamala
and colleagues (2007), who concluded that the perception of asynchrony (with the auditory
stimulus preceding the visual) is mediated by a network of unisensory and multisensory areas
comprising the inferior frontal gyrus, inferior parietal lobule, and primary auditory and visual
cortex. The fact that the same regions were identified here, in association with the same percept
(auditory-leading asynchrony), corroborates the role of this network in the perception of
intersensory asynchrony, and highlights specifically the role that interactions between auditory
cortex, prefrontal and inferior parietal regions may play in its determination.
However, before accepting the interpretation of these source modelling results, it is necessary to
acknowledge some important caveats, in addition to the inherent spatial limitations of inverse
methods discussed previously. Most importantly, the reliability of each region’s contribution to
the spatiotemporal pattern identified by PLS cannot be assessed directly. This is because PLS
was used to identify a time window during which reliable differences were present (based on
bootstrap resampling) at certain sensors, but the difference wave from which the source model
was computed contained all activity during that window, not just the activity that contributed
reliably to the identified latent variable. A more informative approach would have been to
examine the results of PLS applied directly to the source amplitude time series, but this analysis
did not yield any significant LVs (LV1 p = 0.11, LV2 p = 0.09). However, it is plausible that the
coarseness of the Desikan-Killiany parcellation obscured the effect that was detectable at the
sensor level. In sum, source models derived from the sensor-level difference wave are a useful
means of exploring the potential sources of the observed difference at the scalp, but are by no
means conclusive.
Additionally, the difference at the sensor level could potentially be an artifact caused by the
unbalanced number of trials from which the condition averages were produced (mean
“synchronous” trials = 78, mean “asynchronous” trials = 43), but this was deemed unlikely for
the following reasons. First, unlike ITC, low trial counts do not cause a directional bias in the
ERP signal, but instead reduce the signal-to-noise ratio (M. X. Cohen, 2014). To combat this, all
participants with less than 30 trials in either of the response conditions were excluded from
relevant analyses. This number is often proposed as a pragmatic threshold, and there is evidence
that benefits to the signal-to-noise ratio diminish steeply beyond 30 trials (J. Cohen & Polich,
1997; M. X. Cohen, 2014; Polich, 1986; but see also Thomas et al., 2004). Finally, the current
analysis did not involve comparing peak amplitudes between univariate ERPs, a practice which is particularly problematic when trial counts are unbalanced (Luck, 2014), but rather assessed a sustained difference at multiple time points and locations, which would not be systematically biased on the
whole by random fluctuations in noise.
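The diminishing returns of additional trials can be shown with a short simulation (an illustrative sketch, assuming independent Gaussian noise on each trial; the trial counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def residual_noise(n_trials, reps=500, n_samples=100):
    """Mean std of the noise remaining in an ERP averaged over n trials,
    assuming independent Gaussian noise on each trial."""
    out = []
    for _ in range(reps):
        trials = rng.standard_normal((n_trials, n_samples))  # pure noise
        out.append(trials.mean(axis=0).std())
    return float(np.mean(out))

# Residual noise falls as 1 / sqrt(n_trials): quadrupling the trial
# count only halves the remaining noise, so gains diminish steeply
# beyond a few dozen trials.
assert residual_noise(120) < residual_noise(30)
```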
The last question to be addressed is why a similar difference was not observed in the visual-leading (V50) condition (LV1 p = 0.37, LV2 p = 0.35). One explanation is that the V50 analysis was simply under-powered, as only 9 participants had enough trials in each condition to be
included in this analysis. However, there is evidence to suggest that auditory-leading and visual-
leading binding may proceed by different mechanisms, perhaps producing responses less readily
detectable at the scalp. A fundamental difference between auditory- and visual-leading binding is
suggested by an EEG study employing topographical representational similarity analysis (RSA),
which demonstrated that auditory-leading and visual-leading stimuli produce different patterns of
activity at the scalp during a simultaneity judgment task (after subtracting evoked responses),
suggesting that they recruit different neural systems (Cecere, Gross, Willis, & Thut, 2017).
Furthermore, behavioural findings that the right side of the temporal binding window is wider and
considerably more responsive to training than the left also support such a difference (Cecere,
Gross, & Thut, 2016; Powers et al., 2009). As a result, it remains unclear whether the neural
correlates identified here capture a general mechanism of intersensory processing, or one unique
to the auditory-leading case.
5.3 Intertrial Coherence and Individual Differences in TBW Width
Exploratory analysis with Behaviour PLS identified a correlation between individual mean TBW
width and a distributed pattern of ITC enhancements in response to V10 stimuli, suggesting that
individual differences in phase resetting elicited by multisensory stimuli could play a role in
individual differences in temporal binding. Only the A10 and V10 conditions could be used, as
these were the only conditions where SOA did not vary with TBW width. Therefore, this
analysis represents an exploration of what individual differences in the neural responses to
synchronous stimuli might reveal about the temporal binding window. Unfortunately, no
genuinely synchronous stimulus was employed in this design, but the A10 and V10 conditions
were assumed to be sufficiently close to genuine synchrony for exploratory purposes.
A correlation between ITC and TBW width was identified only in the V10 condition, perhaps
relating to the prior finding that stimuli with a slight visual lead are most likely to be perceived
as genuinely synchronous (Dixon & Spitz, 1980; Kayser et al., 2008; Stone et al., 2001; van Eijk
et al., 2008; Zampini et al., 2003). The identified ITC pattern was distributed across time and
frequency bins, but a cluster of reliable, high-salience values was concentrated in the low
frequencies (3.5-8 Hz), between the onset of the stimulus and roughly 150 ms afterward. A clear
cluster was identifiable during this interval over frontal-central scalp, with other possible clusters
over right temporal-occipital and left temporal scalp (see Figure 14 for topography). Notably,
Behaviour PLS did not identify any similar correlations between mean TBW width and power or
ERP responses, suggesting that genuine phase resetting may underlie the observed correlation.
This would suggest that individuals with wider TBWs undergo greater low frequency phase
resetting in response to V10 stimuli. While current understanding does not suggest a mechanistic
interpretation of this result, it is notable that the phase resetting here is observed in the delta and
theta ranges, which were previously implicated in cross-sensory phase resetting by invasive
recording (Lakatos et al., 2007; Lakatos, O’Connell, et al., 2009; Mercier et al., 2013). However,
it is not possible to tell with EEG precisely where the observed oscillations originate, and thus
whether they are a result of crossmodal phase resetting. Additionally, it should be noted that
Behavioural PLS analysis applied to ITC in the source space did not yield any significant LVs (p
= 0.32), perhaps again owing to the coarseness of the parcellation.
Given these uncertainties, no strong conclusions can be drawn from this analysis. However, even
a potential link between the individual variability of the temporal binding window (one of its
central features) and candidate oscillatory mechanisms makes this finding worthy of future
follow-up.
5.4 Limitations
The behavioural bias towards “synchrony” responses caused two problems that limit the
interpretation of these findings. First, this bias caused roughly 50% of the participants to be
excluded from the main analyses, resulting in small sample sizes (13 in the auditory-leading, and
9 in the visual-leading condition), and therefore decreasing the likelihood of detecting relevant
effects. Second, this tendency for most participants to perceive ambiguous stimuli as
“synchronous” on the majority of trials raises the possibility that the effect found here is akin to
an oddball effect. In other words, the observed pattern of amplitude differences could be
associated more with the novelty, or unexpectedness of the “asynchronous” percept, rather than
the formation of the percept itself.
In addition to these primary concerns, a few practical limitations bear discussion. First,
measurement by oscilloscope showed that the auditory stimulus varied in its timing by ± 4 ms
due to limitations of the presentation computer. Because stimulus timing was registered to the
EEG signal by parallel port trigger, rather than measuring the stimulus directly, the auditory
triggers were slightly jittered from the actual stimulus presentation. This jitter, while unlikely to
affect the ERP results, can cause higher frequencies to be misrepresented in time-frequency
analysis, especially with phase-based measures such as ITC (M. X. Cohen, 2014).
Furthermore, it should be noted that time-frequency decomposition by Morlet wavelets involves
a trade-off between frequency resolution and temporal resolution, which is contingent on the
choice of wavelet parameters. Sub-optimal choice of wavelet parameters may therefore have
obscured differences characterized by rapid onsets, or those limited to a narrow frequency range.
Similarly, unsuitable parameters may account for the incongruent finding that the observed
amplitude difference at the scalp was not accompanied by a related increase in power.
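This trade-off can be made concrete with the standard Gaussian-width approximations for a Morlet wavelet (a sketch in the spirit of Cohen, 2014; the 7 Hz example frequency and cycle counts are arbitrary):

```python
import math

def morlet_resolution(freq_hz, n_cycles):
    """Approximate temporal and spectral resolution (Gaussian standard
    deviations) of a Morlet wavelet with the given number of cycles."""
    sigma_t = n_cycles / (2 * math.pi * freq_hz)  # seconds
    sigma_f = freq_hz / n_cycles                  # Hz
    return sigma_t, sigma_f

# At 7 Hz, increasing the cycle count sharpens frequency resolution
# (smaller sigma_f) at the cost of temporal precision (larger sigma_t).
t3, f3 = morlet_resolution(7.0, 3)
t7, f7 = morlet_resolution(7.0, 7)
assert t7 > t3 and f7 < f3
```

A fixed choice of cycles thus commits the analysis to one point on this trade-off at every frequency, which is why an unsuitable setting could obscure brief or narrowband effects.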
Lastly, a challenge to the conceptual foundation of the study, namely the use of ambiguous
stimuli, should be noted. The task was designed under the assumption that the perception of
synchrony versus asynchrony is a binary state. In other words, stimuli are either definitively
perceived as one event (perceptually bound) or two events (perceptually segregated). This
assumption may reflect our intuitive understanding of synchrony and facilitate task design, but
may also mischaracterize the experience. There is a possibility that there is a third, ambiguous
state, in which the stimuli are neither fully fused nor fully segregated. Dhamala and colleagues
(2007) explored this possibility by taking a dynamical systems approach, modelling multisensory
integration as the interaction of two weakly coupled oscillators. This model demonstrated that,
when periodically stimulated, this system of oscillators will adopt one of three regimes: an in-phase state (corresponding to integration), an anti-phase state (corresponding to segregation), and
a “drift” state, in which no stable percept is resolved. The authors emphasize that this is a
departure from previous conceptualizations of integration and segregation, where segregation is
typically assumed to be a failure of integration, rather than a stable state of its own. Notably, they
found behavioural evidence for the existence of these three perceptual states, which were
predicted by SOA. The existence of such a third perceptual state would be problematic for this
study and others that employ ambiguous stimuli, as using such stimuli could in fact induce a
qualitatively distinct, ambiguous state representing a failure of both integration and segregation,
thereby furthering the understanding of neither.
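The locked and drifting regimes can be illustrated with a generic pair of weakly coupled phase oscillators (a minimal sketch, not Dhamala and colleagues' specific model; the parameter values are arbitrary):

```python
import math

def phase_difference(detuning, coupling, steps=200000, dt=0.0001, phi0=0.5):
    """Euler-integrate the phase difference of two weakly coupled phase
    oscillators: d(phi)/dt = detuning - 2 * coupling * sin(phi)."""
    phi = phi0
    for _ in range(steps):
        phi += dt * (detuning - 2 * coupling * math.sin(phi))
    return phi

# When coupling dominates the detuning, the phase difference locks near
# sin(phi*) = detuning / (2 * coupling); when detuning dominates, it
# never settles and simply drifts -- no stable relative phase, loosely
# analogous to the "no stable percept" regime described above.
locked = phase_difference(detuning=0.5, coupling=2.0)
assert abs(math.sin(locked) - 0.5 / 4.0) < 1e-3
```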
5.5 Conclusions and Future Directions
It cannot be concluded from the present results that crossmodal phase resetting itself constitutes a
mechanism of perceptual binding. Nevertheless, the possibility remains that crossmodal phase
resetting, while not itself sufficient for binding, plays a facilitatory role in the binding process by
creating a common temporal structure between distributed regions. Perceptual binding might
then arise from the inter-regional synchronization that this shared temporal structure promotes
(Fries, 2015; Mercier et al., 2015; Senkowski et al., 2008). Therefore, future work investigating
perceptual binding may benefit from emphasizing oscillatory coherence between regions, rather
than primary phase resetting, in order to better understand the binding process and its
determinants. More specifically, future work could investigate correspondence between
perceptual binding and inter-regional coherence within the network of unisensory and
multisensory areas identified here and by Dhamala and colleagues (2007). Given the novel
finding that activity differs with perception between 150-300 ms after the second stimulus,
coherence during this window specifically may be relevant to formation of the percept.
Additionally, the correlation identified here between temporal binding window width and low-
frequency ITC induced by V10 stimuli provides an intriguing, albeit indirect, link between low-
frequency phase resetting and perception. By presenting the same stimuli to all participants
across a range of SOAs, future work could better establish whether there is a relationship
between early oscillatory responses and TBW width, and whether these individual differences
arise from crossmodal phase resetting or a different mechanism. With such a correlate of binding
window width identified, the next step would be to investigate whether differences in this
response result from local properties of the implicated cortical regions themselves, the connectivity
between them, or broader features of individual brain network organization.
Computational modelling with The Virtual Brain (Sanz Leon et al., 2013; Schirner, McIntosh, Jirsa,
Deco, & Ritter, 2018), a neuroinformatics platform allowing the simulation of large-scale brain
dynamics generated from personalized brain models, provides a potential avenue to explore both of
the questions above by probing the network properties associated with various metrics of oscillatory
coherence, and individual variation in these dynamics. Future work could investigate whether the
magnitude or duration of various coherence measures varies with structural connectivity between
sensory regions, or with one of several other biologically inspired network parameters that could
be varied systematically. Furthermore, characterizing the breadth of differences
in these dynamics in healthy adults would provide a stepping stone towards understanding the
development and degeneration of this process across the lifespan, and provide clues to the etiology of
various conditions in which integration is abnormal, such as autism and schizophrenia.
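The kind of parameter sweep proposed here can be illustrated with a toy model far simpler than The Virtual Brain: two Kuramoto phase oscillators whose coupling strength stands in for structural connectivity between regions. This sketch is purely illustrative (it is not TVB code; the function name and parameter values are hypothetical):

```python
import numpy as np

def pairwise_locking(k, w1=1.0, w2=1.3, dt=0.01, steps=20000):
    """Phase locking between two Kuramoto oscillators coupled with strength k.

    Euler-integrates dtheta_i/dt = w_i + k*sin(theta_j - theta_i) for two
    oscillators with natural frequencies w1, w2 (rad/s), then returns the
    phase-locking value of the pair over the second half of the run.
    Coupling strength k is a stand-in for structural connectivity.
    """
    th1, th2 = 0.0, 0.0
    diffs = np.empty(steps)
    for i in range(steps):
        d1 = w1 + k * np.sin(th2 - th1)
        d2 = w2 + k * np.sin(th1 - th2)
        th1 += dt * d1
        th2 += dt * d2
        diffs[i] = th1 - th2
    tail = diffs[steps // 2:]  # discard the transient
    return float(np.abs(np.mean(np.exp(1j * tail))))
```

Sweeping k from zero upward, locking jumps toward 1 once k exceeds half the frequency detuning (|w1 - w2| / 2 = 0.15 here), mirroring how a platform such as TVB could relate a connectivity parameter to an emergent coherence measure, but at full anatomical and dynamical realism.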
References
Alvarado, J. C., Rowland, B. A., Stanford, T. R., & Stein, B. E. (2008). A neural network model
of multisensory integration also accounts for unisensory integration in superior colliculus.
Brain Research, 1242, 13–23. https://doi.org/10.1016/j.brainres.2008.03.074
Aydore, S., Pantazis, D., & Leahy, R. M. (2013). A note on the phase locking value and its
properties. NeuroImage, 74, 231–244. https://doi.org/10.1016/j.neuroimage.2013.02.008
Baillet, S., Mosher, J. C., & Leahy, R. M. (2001). Electromagnetic Brain Mapping. IEEE Signal
Processing Magazine, 18(6), 14–30.
Balk, M. H., Ojanen, V., Pekkola, J., Autti, T., Sams, M., & Jääskeläinen, I. P. (2010).
Synchrony of audio-visual speech stimuli modulates left superior temporal sulcus.
NeuroReport, 21(12), 822–826. https://doi.org/10.1097/WNR.0b013e32833d138f
Barutchu, A., Crewther, S. G., Fifer, J., Shivdasani, M. N., Innes-Brown, H., Toohey, S., …
Paolini, A. G. (2011). The Relationship Between Multisensory Integration and IQ in
Children. Developmental Psychology, 47(3), 877–885. https://doi.org/10.1037/a0021903
Beauchamp, M. S. (2005). Statistical Criteria in fMRI Studies of Multisensory Integration.
Neuroinformatics, 3(2), 93–113. https://doi.org/10.1385/NI
Belmonte, M. K., Allen, G., Beckel-Mitchener, A., Boulanger, L. M., Carper, R. A., & Webb, S.
J. (2004). Autism and abnormal development of brain connectivity. Journal of
Neuroscience, 24(42), 9228–9231. https://doi.org/10.1523/JNEUROSCI.3340-04.2004
Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and
Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B,
57(1), 289–300.
Bertrand, O., & Tallon-Baudry, C. (2000). Oscillatory gamma activity in humans: a possible role
for object representation. International Journal of Psychophysiology, 38(3), 211–223.
Brovelli, A., Ding, M., Ledberg, A., Chen, Y., Nakamura, R., & Bressler, S. L. (2004). Beta
oscillations in a large-scale sensorimotor cortical network: Directional influences revealed
by Granger causality. PNAS, 101(26), 9849–9854.
Bushara, K. O., Grafman, J., & Hallett, M. (2001). Neural correlates of auditory-visual stimulus
onset asynchrony detection. The Journal of Neuroscience, 21(1), 300–304.
Buzsáki, G., & Chrobak, J. J. (1995). Temporal structure in spatially organized neuronal
ensembles: a role for interneuronal networks. Current Opinion in Neurobiology, 5(4), 504–
510. https://doi.org/10.1016/0959-4388(95)80012-3
Calvert, G. A., Hansen, P. C., Iversen, S. D., & Brammer, M. J. (2001). Detection of audio-visual
integration sites in humans by application of electrophysiological criteria to the BOLD
effect. NeuroImage, 14(2), 427–438. https://doi.org/10.1006/nimg.2001.0812
Calvert, G. A., Spence, C., & Stein, B. E. (2004). The Handbook of Multisensory Processes.
Cambridge, MA: The MIT Press.
Cecere, R., Gross, J., & Thut, G. (2016). Behavioural evidence for separate mechanisms of
audiovisual temporal binding as a function of leading sensory modality. European Journal
of Neuroscience, 43, 1561–1568. https://doi.org/10.1111/ejn.13242
Cecere, R., Gross, J., Willis, A., & Thut, G. (2017). Being First Matters: Topographical
Representational Similarity Analysis of ERP Signals Reveals Separate Networks for
Audiovisual Temporal Binding Depending on the Leading Sense. The Journal of
Neuroscience, 37(21), 5274–5287. https://doi.org/10.1523/JNEUROSCI.2926-16.2017
Cohen, J., & Polich, J. (1997). On the number of trials needed for P300. International Journal of
Psychophysiology, 25, 249–255.
Cohen, M. X. (2014). Analyzing Neural Time Series Data. London, England: MIT Press.
Conrey, B., & Pisoni, D. B. (2006). Auditory-visual speech perception and synchrony detection
for speech and nonspeech signals. The Journal of the Acoustical Society of America, 119(6),
4065–4073. https://doi.org/10.1121/1.2195091
de Boer-Schellekens, L., Eussen, M., & Vroomen, J. (2013). Diminished sensitivity of
audiovisual temporal order in autism spectrum disorder. Frontiers in Integrative
Neuroscience, 7(February), 1–8. https://doi.org/10.3389/fnint.2013.00008
Desikan, R. S., Ségonne, F., Fischl, B., Quinn, B. T., Dickerson, B. C., Blacker, D., … Killiany,
R. J. (2006). An automated labeling system for subdividing the human cerebral cortex on
MRI scans into gyral based regions of interest. NeuroImage, 31(3), 968–980.
https://doi.org/10.1016/j.neuroimage.2006.01.021
Dhamala, M., Assisi, C. G., Jirsa, V. K., Steinberg, F. L., & Scott Kelso, J. A. (2007).
Multisensory integration for timing engages different brain networks. NeuroImage, 34(2),
764–773. https://doi.org/10.1016/j.neuroimage.2006.07.044
Diederich, A., & Colonius, H. (2004). Bimodal and trimodal multisensory enhancement: Effects
of stimulus onset and intensity on reaction time. Perception and Psychophysics, 66(8),
1388–1404.
Dixon, N. F., & Spitz, L. (1980). The detection of auditory visual desynchrony. Perception, 9(6),
719–721. https://doi.org/10.1068/p090719
Driver, J., & Noesselt, T. (2008). Multisensory Interplay Reveals Crossmodal Influences on
‘Sensory-Specific’ Brain Regions, Neural Responses, and Judgments. Neuron, 57(1), 11–23.
https://doi.org/10.1016/j.neuron.2007.12.013
Falchier, A., Clavagnier, S., Barone, P., & Kennedy, H. (2002). Anatomical Evidence of
Multimodal Integration in Primate Striate Cortex. The Journal of Neuroscience, 22(13),
5749–5759.
Fonov, V. S., Evans, A. C., McKinstry, R. C., Almli, C. R., & Collins, D. L. (2009). Unbiased
nonlinear average age-appropriate brain templates from birth to adulthood. NeuroImage, 47,
S102. https://doi.org/10.1016/S1053-8119(09)70884-5
Foucher, J. R., Lacambre, M., Pham, B. T., Giersch, A., & Elliott, M. A. (2007). Low time
resolution in schizophrenia. Lengthened windows of simultaneity for visual, auditory and
bimodal stimuli. Schizophrenia Research, 97(1–3), 118–127.
https://doi.org/10.1016/j.schres.2007.08.013
Fries, P. (2015). Rhythms for Cognition: Communication through Coherence. Neuron, 88(1),
220–235. https://doi.org/10.1016/j.neuron.2015.09.034
Fujisaki, W., Shimojo, S., Kashino, M., & Nishida, S. (2004). Recalibration of audiovisual
simultaneity. Nature Neuroscience, 7(7), 773–778. https://doi.org/10.1038/nn1268
Ghazanfar, A. A., & Schroeder, C. E. (2006). Is neocortex essentially multisensory? Trends in
Cognitive Sciences, 10(6), 278–285. https://doi.org/10.1016/j.tics.2006.04.008
Giard, M. H., & Peronnet, F. (1999). Auditory-Visual Integration during Multimodal Object
Recognition in Humans: A Behavioral and Electrophysiological Study. Journal of Cognitive
Neuroscience, 11(5), 473–490.
Gramfort, A., Papadopoulo, T., Olivi, E., & Clerc, M. (2010). OpenMEEG: opensource software
for quasistatic bioelectromagnetics. BioMedical Engineering OnLine, 9(45), 1–20.
Grant, K. W., Walden, B. E., & Seitz, P. F. (1998). Auditory-visual speech recognition by
hearing-impaired subjects: Consonant recognition, sentence recognition, and auditory-visual
integration. The Journal of the Acoustical Society of America, 103(5), 2677–2690.
https://doi.org/10.1121/1.422788
Hairston, W. D., Burdette, J. H., Flowers, D. L., Wood, F. B., & Wallace, M. T. (2005). Altered
temporal profile of visual-auditory multisensory interactions in dyslexia. Experimental
Brain Research, 166(3–4), 474–480. https://doi.org/10.1007/s00221-005-2387-6
Hershenson, M. (1962). Reaction time as a measure of intersensory facilitation. Journal of
Experimental Psychology, 63(3), 289–293.
Hillock-Dunn, A., & Wallace, M. T. (2012). Developmental changes in the multisensory
temporal binding window persist into adolescence. Developmental Science, 15(5), 688–696.
https://doi.org/10.1111/j.1467-7687.2012.01171.x
Hillock, A. R., Powers, A. R., & Wallace, M. T. (2011). Binding of sights and sounds: Age-
related changes in multisensory temporal processing. Neuropsychologia, 49(3), 461–467.
https://doi.org/10.1016/j.neuropsychologia.2010.11.041
Ikumi, N., Torralba, M., Ruzzoli, M., & Soto-Faraco, S. (2018). The phase of pre-stimulus brain
oscillations correlates with cross-modal synchrony perception. European Journal of
Neuroscience, (September), 1–15. https://doi.org/10.1111/ejn.14186
Kaganovich, N., & Schumaker, J. (2016). Electrophysiological correlates of individual
differences in perception of audiovisual temporal asynchrony. Neuropsychologia, 86, 119–
130. https://doi.org/10.1016/j.neuropsychologia.2016.04.015
Kambe, J., Kakimoto, Y., & Araki, O. (2015). Phase reset affects auditory-visual simultaneity
judgment. Cognitive Neurodynamics, 9(5), 487–493. https://doi.org/10.1007/s11571-015-
9342-4
Kayser, C., & Logothetis, N. K. (2007). Do early sensory cortices integrate cross-modal
information? Brain Structure and Function, 212(2), 121–132.
https://doi.org/10.1007/s00429-007-0154-0
Kayser, C., Petkov, C. I., & Logothetis, N. K. (2008). Visual Modulation of Neurons in Auditory
Cortex. Cerebral Cortex, 18(7), 1560–1574. https://doi.org/10.1093/cercor/bhm187
Keetels, M., & Vroomen, J. (2012). Perception of Synchrony between the Senses. In M. M.
Murray & M. T. Wallace (Eds.), The Neural Bases of Multisensory Processes. Boca Raton
(FL): CRC Press/Taylor & Francis. https://doi.org/10.1201/9781439812174-12
Keil, J., & Senkowski, D. (2018). Neural Oscillations Orchestrate Multisensory Processing. The
Neuroscientist, 1–18. https://doi.org/10.1177/1073858418755352
Kopell, N., Ermentrout, G. B., Whittington, M. A., & Traub, R. D. (2000). Gamma rhythms and
beta rhythms have different synchronization properties. PNAS, 97(4), 1867–1872.
Kwakye, L. D., Foss-Feig, J. H., Cascio, C. J., Stone, W. L., & Wallace, M. T. (2011). Altered
Auditory and Multisensory Temporal Processing in Autism Spectrum Disorders. Frontiers
in Integrative Neuroscience, 4(January), 1–11. https://doi.org/10.3389/fnint.2010.00129
Kybic, J., Clerc, M., Abboud, T., Faugeras, O., Keriven, R., & Papadopoulo, T. (2005). A
Common Formalism for the Integral Formulations of the Forward EEG Problem. IEEE
Transactions on Medical Imaging, 24(1), 12–28.
Lakatos, P., Chen, C. M., O’Connell, M. N., Mills, A., & Schroeder, C. E. (2007). Neuronal
Oscillations and Multisensory Interaction in Primary Auditory Cortex. Neuron, 53(2), 279–
292. https://doi.org/10.1016/j.neuron.2006.12.011
Lakatos, P., Karmos, G., Mehta, A. D., Ulbert, I., & Schroeder, C. E. (2008). Entrainment of
neuronal oscillations as a mechanism of attentional selection. Science, 320(5872), 110–113.
https://doi.org/10.1126/science.1154735
Lakatos, P., O’Connell, M. N., Barczak, A., Mills, A., Javitt, D. C., & Schroeder, C. E. (2009).
The Leading Sense: Supramodal Control of Neurophysiological Context by Attention.
Neuron, 64(3), 419–430. https://doi.org/10.1016/j.neuron.2009.10.014
Lennie, P. (1981). The physiological basis of variations in visual latency. Vision Research, 21(6),
815–824. https://doi.org/10.1016/0042-6989(81)90180-2
Lovelace, C. T., Stein, B. E., & Wallace, M. T. (2003). An irrelevant light enhances auditory
detection in humans: a psychophysical analysis of multisensory integration in stimulus
detection. Cognitive Brain Research, 17, 447–453.
Luck, S. J. (2014). An Introduction to the Event-Related Potential Technique (2nd ed.).
Cambridge, MA: MIT Press.
Macaluso, E., Spence, C., George, N., Driver, J., & Dolan, R. (2004). Spatial and temporal
factors during processing of audiovisual speech: a PET study. NeuroImage, 21(2), 725–732.
https://doi.org/10.1016/j.neuroimage.2003.09.049
Makeig, S., Debener, S., Onton, J., & Delorme, A. (2004). Mining event-related brain dynamics.
Trends in Cognitive Sciences, 8(5), 204–210. https://doi.org/10.1016/j.tics.2004.03.008
Marchant, J. L., Ruff, C. C., & Driver, J. (2012). Audiovisual synchrony enhances BOLD
responses in a brain network including multisensory STS while also enhancing target-
detection performance for both modalities. Human Brain Mapping, 33(5), 1212–1224.
https://doi.org/10.1002/hbm.21278
Martin, B., Giersch, A., Huron, C., & van Wassenhove, V. (2013). Temporal event structure and
timing in schizophrenia: Preserved binding in a longer “now.” Neuropsychologia, 51(2),
358–371. https://doi.org/10.1016/j.neuropsychologia.2012.07.002
Mathewson, K. E., Gratton, G., Fabiani, M., Beck, D. M., & Ro, T. (2009). To See or Not to See:
Prestimulus Phase Predicts Visual Awareness. Journal of Neuroscience, 29(9), 2725–2732.
https://doi.org/10.1523/JNEUROSCI.3963-08.2009
McIntosh, A. R., & Lobaugh, N. J. (2004). Partial least squares analysis of neuroimaging data:
Applications and advances. NeuroImage, 23, 250–263.
https://doi.org/10.1016/j.neuroimage.2004.07.020
Mégevand, P., Mercier, M. R., Groppe, D. M., Golumbic, E. Z., Mesgarani, N., Beauchamp, M.
S., … Mehta, A. D. (2019). Phase resetting in human auditory cortex to visual speech.
bioRxiv. https://doi.org/10.1101/405597
Mercier, M. R., Foxe, J. J., Fiebelkorn, I. C., Butler, J. S., Schwartz, T. H., & Molholm, S.
(2013). Auditory-driven phase reset in visual cortex: Human electrocorticography reveals
mechanisms of early multisensory integration. NeuroImage, 79, 19–29.
https://doi.org/10.1016/j.neuroimage.2013.04.060
Mercier, M. R., Molholm, S., Fiebelkorn, I. C., Butler, X. J. S., Schwartz, T. H., & Foxe, J. J.
(2015). Neuro-Oscillatory Phase Alignment Drives Speeded Multisensory Response Times:
An Electro-Corticographic Investigation. The Journal of Neuroscience, 35(22), 8546–8557.
https://doi.org/10.1523/JNEUROSCI.4527-14.2015
Meredith, M., Nemitz, J., & Stein, B. (1987). Determinants of multisensory integration in
superior colliculus neurons. I. Temporal factors. Journal of Neuroscience, 7(10), 3215–
3229. https://doi.org/10.1523/JNEUROSCI.07-10-03215.1987
Meredith, M., & Stein, B. (1983). Interactions among Converging Sensory Inputs in the Superior
Colliculus. Science, 221(4608), 389–391.
Meredith, M., & Stein, B. (1986). Visual, Auditory, and Somatosensory Convergence on Cells in
Superior Colliculus Results in Multisensory Integration. Journal of Neurophysiology, 56(3),
640–662.
Meredith, M., & Stein, B. (1996). Spatial determinants of multisensory integration in cat superior
colliculus neurons. Journal of Neurophysiology, 75(5), 1843–1857.
https://doi.org/10.1152/jn.1996.75.5.1843
Miller, L. M., & D’Esposito, M. (2005). Perceptual Fusion and Stimulus Coincidence in the
Cross-Modal Integration of Speech. The Journal of Neuroscience, 25(25), 5884–5893.
https://doi.org/10.1523/JNEUROSCI.0896-05.2005
Molholm, S., Ritter, W., Murray, M. M., Javitt, D. C., Schroeder, C. E., & Foxe, J. J. (2002).
Multisensory auditory–visual interactions during early sensory processing in humans: a
high-density electrical mapping study. Cognitive Brain Research, 14, 115–128.
Murray, M. M., Eardley, A. F., Edginton, T., Oyekan, R., Smyth, E., & Matusz, P. J. (2018).
Sensory dominance and multisensory integration as screening tools in aging. Scientific
Reports, 8, 1–11. https://doi.org/10.1038/s41598-018-27288-2
Nickerson, R. S. (1973). Intersensory facilitation of reaction time: Energy summation or
preparation enhancement? Psychological Review, 80(6), 489–509.
https://doi.org/10.1037/h0035437
Noel, J., De Niear, M., Van der Burg, E., & Wallace, M. T. (2016). Audiovisual Simultaneity
Judgment and Rapid Recalibration throughout the Lifespan. PLoS ONE, 11(8), e0161698.
https://doi.org/10.1371/journal.pone.0161698
Noesselt, T., Rieger, J. W., Schoenfeld, M. A., Kanowski, M., Hinrichs, H., Heinze, H.-J., &
Driver, J. (2007). Audiovisual Temporal Correspondence Modulates Human Multisensory
Superior Temporal Sulcus Plus Primary Sensory Cortices. Journal of Neuroscience, 27(42),
11431–11441.
Ohshiro, T., Angelaki, D. E., & Deangelis, G. C. (2011). A Normalization Model of
Multisensory Integration. Nature Neuroscience, 14(6), 775–782.
https://doi.org/10.1038/nn.2815
Oostenveld, R., Fries, P., Maris, E., & Schoffelen, J.-M. (2011). FieldTrip: Open Source
Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data.
Computational Intelligence and Neuroscience, 2011, 156869. https://doi.org/10.1155/2011/156869
Pantazis, D., Nichols, T. E., Baillet, S., & Leahy, R. M. (2005). A comparison of random field
theory and permutation methods for the statistical analysis of MEG data. NeuroImage, 25,
383–394. https://doi.org/10.1016/j.neuroimage.2004.09.040
Peirce, J., Gray, J. R., Simpson, S., Macaskill, M., Höchenberger, R., Sogo, H., … Lindeløv, J.
K. (2019). PsychoPy2: Experiments in behavior made easy. Behavior Research Methods,
51, 195–203.
Pfurtscheller, G. (1992). Event-related synchronization (ERS): an electrophysiological correlate
of cortical areas at rest. Electroencephalography and Clinical Neurophysiology, 83, 62–69.
Pfurtscheller, G., & Neuper, C. (1996). Post-movement beta synchronization. A correlate of an
idling motor area? Electroencephalography and Clinical Neurophysiology, 98, 281–293.
Polich, J. (1986). P300 Development from Auditory Stimuli. Psychophysiology, 23(5), 590–597.
Pöppel, E., Schill, K., & von Steinbüchel, N. (1990). Sensory integration within temporally
neutral systems states: A hypothesis. Naturwissenschaften, 77, 89–91.
Powers, A. R., Hevey, M. A., & Wallace, M. T. (2012). Neural Correlates of Multisensory
Perceptual Learning. Journal of Neuroscience, 32(18), 6263–6274.
https://doi.org/10.1523/jneurosci.6138-11.2012
Powers, A. R., Hillock, A. R., & Wallace, M. T. (2009). Perceptual Training Narrows the
Temporal Window of Multisensory Binding. Journal of Neuroscience, 29(39), 12265–
12274. https://doi.org/10.1523/JNEUROSCI.3501-09.2009
Rockland, K. S., & Ojima, H. (2003). Multisensory convergence in calcarine visual areas in
macaque monkey. International Journal of Psychophysiology, 50(1–2), 19–26.
https://doi.org/10.1016/S0167-8760(03)00121-1
Roseboom, W., & Arnold, D. H. (2011). Twice Upon a Time: Multiple Concurrent Temporal
Recalibrations of Audiovisual Speech. Psychological Science, 22(7), 872–877.
https://doi.org/10.1177/0956797611413293
Rowland, B. A., Stanford, T. R., & Stein, B. E. (2007). A model of the neural mechanisms
underlying multisensory integration in the superior colliculus. Perception, 36(10), 1431–
1443. https://doi.org/10.1068/p5842
Sanz Leon, P., Knock, S. A., Woodman, M. M., Domide, L., Mersmann, J., McIntosh, A. R., &
Jirsa, V. (2013). The Virtual Brain: a simulator of primate brain network dynamics.
Frontiers in Neuroinformatics, 7(June). https://doi.org/10.3389/fninf.2013.00010
Sayers, B., Beagley, H., & Henshall, W. (1974). The mechanism of auditory evoked EEG
responses. Nature, 247(5441), 481–483. https://doi.org/10.1038/247481a0
Schirner, M., McIntosh, A. R., Jirsa, V., Deco, G., & Ritter, P. (2018). Inferring multi-scale
neural mechanisms with brain network modelling. ELife, 7.
https://doi.org/10.7554/eLife.28927
Senkowski, D., Schneider, T. R., Foxe, J. J., & Engel, A. K. (2008). Crossmodal binding through
neural coherence: implications for multisensory processing. Trends in Neurosciences, 31(8),
401–409. https://doi.org/10.1016/j.tins.2008.05.002
Shams, L., & Seitz, A. R. (2008). Benefits of multisensory learning. Trends in Cognitive
Sciences, 12(11), 411–417. https://doi.org/10.1016/j.tics.2008.07.006
Stein, B. (1998). Neural mechanisms for synthesizing sensory information and producing
adaptive behaviors. Experimental Brain Research, 123, 124–135.
Stein, B., & Meredith, M. (1993). The Merging of the Senses. Cambridge, MA: MIT Press.
Stein, B., & Stanford, T. R. (2008). Multisensory integration: Current issues from the perspective
of the single neuron. Nature Reviews Neuroscience, 9(4), 255–266.
https://doi.org/10.1038/nrn2331
Stein, B., Stanford, T., & Rowland, B. (2009). The Neural Basis of Multisensory Integration in
the Midbrain: Its Organization and Maturation. Hearing Research, 258, 4–15.
https://doi.org/10.1016/j.heares.2009.03.012
Stevenson, R. A., Altieri, N. A., Kim, S., Pisoni, D. B., & James, T. W. (2010). Neural
processing of asynchronous audiovisual speech perception. NeuroImage, 49(4), 3308–3318.
https://doi.org/10.1016/j.neuroimage.2009.12.001
Stevenson, R. A., Baum, S. H., Krueger, J., Newhouse, P. A., & Wallace, M. T. (2018). Links
between temporal acuity and multisensory integration across the life span. Journal of
Experimental Psychology: Human Perception and Performance, 44(1), 106–116.
https://doi.org/10.1037/xhp0000424
Stevenson, R. A., VanDerKlok, R. M., Pisoni, D. B., & James, T. W. (2011). Discrete neural
substrates underlie complementary audiovisual speech integration processes. NeuroImage,
55(3), 1339–1345. https://doi.org/10.1016/j.neuroimage.2010.12.063
Stevenson, R. A., & Wallace, M. T. (2013). Multisensory temporal integration: Task and
stimulus dependencies. Experimental Brain Research, 227(2), 249–261.
https://doi.org/10.1007/s00221-013-3507-3
Stevenson, R. A., Zemtsov, R. K., & Wallace, M. T. (2012). Individual differences in the
multisensory temporal binding window predict susceptibility to audiovisual illusions.
Journal of Experimental Psychology: Human Perception and Performance, 38(6), 1517–
1529. https://doi.org/10.1037/a0027339
Stone, J. V., Hunkin, N. M., Porrill, J., Wood, R., Keeler, V., Beanland, M., … Porter, N. R.
(2001). When is now? Perception of simultaneity. Proceedings of the Royal Society B:
Biological Sciences, 268(1462), 31–38. https://doi.org/10.1098/rspb.2000.1326
Sumby, W. H., & Pollack, I. (1954). Visual Contribution to Speech Intelligibility in Noise. The
Journal of the Acoustical Society of America, 26(2), 212–215.
https://doi.org/10.1121/1.1907309
Tallon-Baudry, C., Bertrand, O., Delpuech, C., & Pernier, J. (1996). Stimulus Specificity of
Phase-Locked and Non-Phase-Locked 40 Hz Visual Responses in Human. The Journal of
Neuroscience, 16(13), 4240–4249.
Tallon-Baudry, C., Bertrand, O., & Fischer, C. (2001). Oscillatory Synchrony between Human
Extrastriate Areas during Visual Short-Term Memory Maintenance. The Journal of
Neuroscience, 21, 1–5.
Thomas, D. G., Grice, J. W., & Najm-Briscoe, R. G. (2004). The Influence of Unequal Numbers
of Trials on Comparisons of Average Event-Related Potentials. Developmental
Neuropsychology, 26(3), 753–774.
van Atteveldt, N., Formisano, E., & Blomert, L. (2007). The Effect of Temporal Asynchrony on
the Multisensory Integration of Letters and Speech Sounds. Cerebral Cortex, 17(April),
962–974. https://doi.org/10.1093/cercor/bhl007
van Atteveldt, N., Murray, M. M., Thut, G., & Schroeder, C. E. (2014). Multisensory integration:
Flexible use of general operations. Neuron, 81(6), 1240–1253.
https://doi.org/10.1016/j.neuron.2014.02.044
van Diepen, R. M., & Mazaheri, A. (2018). The Caveats of observing Inter-Trial Phase-
Coherence in Cognitive Neuroscience. Scientific Reports, 8(1), 1–9.
https://doi.org/10.1038/s41598-018-20423-z
van Dijk, H., Schoffelen, J.-M., Oostenveld, R., & Jensen, O. (2008). Prestimulus Oscillatory
Activity in the Alpha Band Predicts Visual Discrimination Ability. Journal of
Neuroscience, 28(8), 1816–1823. https://doi.org/10.1523/jneurosci.1853-07.2008
van Eijk, R. L. J., Kohlrausch, A., Juola, J. F., & van de Par, S. (2008). Audiovisual synchrony
and temporal order judgments: Effects of experimental method and stimulus type.
Perception and Psychophysics, 70(6), 955–968. https://doi.org/10.3758/PP.70.6.955
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2007). Temporal window of integration in
auditory-visual speech perception. Neuropsychologia, 45(3), 598–607.
https://doi.org/10.1016/j.neuropsychologia.2006.01.001
Voloh, B., & Womelsdorf, T. (2016). A Role of Phase-Resetting in Coordinating Large Scale
Neural Networks During Attention and Goal-Directed Behavior. Frontiers in Systems
Neuroscience, 10(March), 1–19. https://doi.org/10.3389/fnsys.2016.00018
Vroomen, J., & Keetels, M. (2010). Perception of intersensory synchrony: A tutorial review.
Attention, Perception & Psychophysics, 72(4), 871–884.
Wallace, M. T., & Stevenson, R. A. (2014). The construct of the multisensory temporal binding
window and its dysregulation in developmental disabilities. Neuropsychologia, 64, 105–
123. https://doi.org/10.1016/j.neuropsychologia.2014.08.005
Wu, P., Chen, Y., Yeh, S., Wu, M., & Tang, P. (2015). Multisensory integration ability correlates
with spatial working memory and functional mobility in cognitively normal middle-aged
and older adults. Physiotherapy, 101, e1663–e1664.
https://doi.org/10.1016/j.physio.2015.03.061
Yuan, X., Bi, C., Yin, H., Li, B., & Huang, X. (2014). The recalibration patterns of perceptual
synchrony and multisensory integration after exposure to asynchronous speech.
Neuroscience Letters, 569, 148–152. https://doi.org/10.1016/j.neulet.2014.03.057
Yuan, X., Li, H., Liu, P., Yuan, H., & Huang, X. (2016). Pre-stimulus beta and gamma
oscillatory power predicts perceived audiovisual simultaneity. International Journal of
Psychophysiology, 107, 29–36. https://doi.org/10.1016/j.ijpsycho.2016.06.017
Zampini, M., Shore, D. I., & Spence, C. (2003). Audiovisual temporal order judgments.
Experimental Brain Research, 152(2), 198–210. https://doi.org/10.1007/s00221-003-1536-z