predictive neural computations support spoken word
TRANSCRIPT
Copyright © 2021 Wang et al.This is an open-access article distributed under the terms of the Creative Commons Attribution4.0 International license, which permits unrestricted use, distribution and reproduction in anymedium provided that the original work is properly attributed.
Research Articles: Behavioral/Cognitive
Predictive Neural Computations SupportSpoken Word Recognition: Evidence from MEGand Competitor Priming
https://doi.org/10.1523/JNEUROSCI.1685-20.2021
Cite as: J. Neurosci 2021; 10.1523/JNEUROSCI.1685-20.2021
Received: 24 June 2020Revised: 22 May 2021Accepted: 25 May 2021
This Early Release article has been peer-reviewed and accepted, but has not been throughthe composition and copyediting processes. The final version may differ slightly in style orformatting and will contain links to any extended data.
Alerts: Sign up at www.jneurosci.org/alerts to receive customized email alerts when the fullyformatted version of this article is published.
1
Title (13 words) 1
Predictive Neural Computations Support Spoken Word Recognition: Evidence from 2
MEG and Competitor Priming 3
4
Abbreviated title (48 characters) 5
Predictive Neural Computations for Spoken Words 6
7
Authors and affiliation 8
Yingcan Carol Wang1, Ediz Sohoglu2, Rebecca A. Gilbert1, Richard N. Henson1, 9
Matthew H. Davis1 10
1 MRC Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer 11
Road, Cambridge, CB2 7EF, UK 12
2 School of Psychology, University of Sussex, Brighton, BN1 9RH, UK 13
14
Number of figures: 8 15
16
Word count 17
Abstract: 247 18
Introduction: 644 19
Discussion: 1542 20
21
Corresponding authors and emails 22
Matthew H. Davis: [email protected] 23
Yingcan Carol Wang: [email protected] 24
25
2
Conflict of interest statement 26
The authors declare no competing financial interests. 27
28
Acknowledgments 29
The research was supported by the UK Medical Research Council (SUAG/044 & 30
SUAG/046 G101400) and by a China Scholarship Council award to Yingcan Carol 31
Wang. We are grateful to Clare Cook, Ece Kocagoncu and Tina Emery for their 32
assistance with data collection, and also to Olaf Hauk for his advice on MEG data 33
analysis. 34
35
36
3
Abstract 37
Human listeners achieve quick and effortless speech comprehension through 38
computations of conditional probability using Bayes rule. However, the neural 39
implementation of Bayesian perceptual inference remains unclear. Competitive-40
selection accounts (e.g. TRACE) propose that word recognition is achieved through 41
direct inhibitory connections between units representing candidate words that share 42
segments (e.g. hygiene and hijack share /haidʒ/). Manipulations that increase lexical 43
uncertainty should increase neural responses associated with word recognition when 44
words cannot be uniquely identified. In contrast, predictive-selection accounts (e.g. 45
Predictive-Coding) proposes that spoken word recognition involves comparing heard 46
and predicted speech sounds and using prediction error to update lexical 47
representations. Increased lexical uncertainty in words like hygiene and hijack will 48
increase prediction error and hence neural activity only at later time points when 49
different segments are predicted. We collected MEG data from male and female 50
listeners to test these two Bayesian mechanisms and used a competitor priming 51
manipulation to change the prior probability of specific words. Lexical decision 52
responses showed delayed recognition of target words (hygiene) following 53
presentation of a neighbouring prime word (hijack) several minutes earlier. However, 54
this effect was not observed with pseudoword primes (higent) or targets (hijure). 55
Crucially, MEG responses in the STG showed greater neural responses for word-56
primed words after the point at which they were uniquely identified (after /haidʒ/ in 57
hygiene) but not before while similar changes were again absent for pseudowords. 58
These findings are consistent with accounts of spoken word recognition in which 59
neural computations of prediction error play a central role. 60
61
4
Significance Statement 62
Effective speech perception is critical to daily life and involves computations that 63
combine speech signals with prior knowledge of spoken words; that is, Bayesian 64
perceptual inference. This study specifies the neural mechanisms that support 65
spoken word recognition by testing two distinct implementations of Bayes perceptual 66
inference. Most established theories propose direct competition between lexical units 67
such that inhibition of irrelevant candidates leads to selection of critical words. Our 68
results instead support predictive-selection theories (e.g. Predictive-Coding): by 69
comparing heard and predicted speech sounds, neural computations of prediction 70
error can help listeners continuously update lexical probabilities, allowing for more 71
rapid word identification. 72
73
Introduction 74
In daily conversation, listeners identify ~200 words/minute (Tauroza & Allison, 1990) 75
from a vocabulary of ~40,000 words (Brysbaert et al., 2016). This means that they 76
must recognise 3-4 words/second and constantly select from sets of transiently 77
ambiguous words (e.g. hijack and hygiene both begin with /haidʒ/). Although it is 78
recognised that humans achieve word recognition by combining current speech input 79
with its prior probability using Bayes theorem (Norris & McQueen, 2008; Davis & 80
Scharenborg, 2016), the underlying neural implementation of Bayesian perceptual 81
inference remains unclear (Aitchison & Lengeyl, 2017). 82
Here, we test two computational accounts of spoken word recognition that 83
both implement Bayes rules. In competitive-selection accounts (e.g. TRACE, 84
McClelland & Elman, 1986, Figure 1A), word recognition is achieved through within-85
5
layer lateral inhibition between neural units representing similar words. By this view, 86
hijack and hygiene compete for identification such that an increase in probability for 87
one word inhibits units representing other similar-sounding words. Conversely, 88
predictive-selection accounts (e.g. Predictive-Coding, Davis & Sohoglu, 2020) 89
suggest that word recognition is achieved through computations of prediction error 90
(Figure 1D). On hearing transiently ambiguous speech like /haidʒ/, higher-level units 91
representing matching words make contrasting predictions (/æk/ for hijack, /i:n/ for 92
hygiene). Prediction error (the difference between sounds predicted and actually 93
heard) provides a signal to update word probabilities such that the correct word can 94
be selected. 95
In this study, we used the competitor priming effect (Monsell & Hirsh, 1998; 96
Marsolek, 2008), which is directly explicable in Bayesian terms, i.e. the recognition of 97
a word (hygiene) is delayed if the prior probability of a competitor word (hijack) has 98
been increased due to an earlier exposure. This delay could be due to increased 99
lateral inhibition (competitive-selection) or greater prediction error (predictive-100
selection). Thus, similar behavioural effects of competitor priming are predicted by 101
two distinct neural computations (Spratling, 2008). To distinguish them, it is critical to 102
investigate neural data that reveals the direction, timing and level of processing at 103
which competitor priming modulates neural responses. Existing neural data remains 104
equivocal with some evidence consistent with competitive-selection (Bozic et al., 105
2010; Okada & Hickok, 2006), predictive-selection (Gagnepain et al, 2012), or both 106
mechanisms (Brodbeck et al., 2018; Donhauser et al., 2019). We followed these 107
studies in correlating two computational measures with neural activity: lexical entropy 108
(competitive-selection) and segment prediction error (or phoneme surprisal, for 109
predictive-selection). 110
6
Here, we used MEG to record the location and timing of neural responses 111
during spoken words recognition in a competitor priming experiment. Pseudowords 112
(e.g. hijure) were included in our analysis to serve as a negative control for 113
competitor priming, since existing research found that pseudowords neither produce 114
nor show this effect (Monsell & Hirsh, 1998). We compared items with the same 115
initial segments (words hygiene, hijack, pseudowords hijure, higent share /haidʒ/) 116
and measured neural and behavioural effects concurrently to link these two effects 117
for single trials. 118
While lexical entropy and prediction error are correlated for natural speech, 119
this competitor priming manipulation allows us to make differential predictions as 120
illustrated in Figure 1. Specifically: (1) before the deviation point (DP, the point at 121
which similar-sounding words diverge), competitor priming increases lexical entropy 122
and hence neural responses (Figure 1B&C Pre-DP). Such responses might be 123
observed in inferior frontal regions (Zhuang et al., 2011) and posterior temporal 124
regions (Prabhakaran et al., 2006). However, prediction error will be reduced for pre-125
DP segments, since heard segments are shared and hence more strongly predicted 126
(Figure 1E&F Pre-DP). This should be reflected in the superior temporal gyrus (STG, 127
Sohoglu & Davis, 2016). (2) After the DP, predictive-selection but not competitive-128
selection accounts propose that pseudowords evoke greater signals in the left-STG, 129
since they evoke maximal prediction errors (Figure 1E&F Pseudoword, Post-DP). (3) 130
Furthermore, in predictive-selection theories, competitor priming is associated with 131
an increased STG response to post-DP segments due to enhanced prediction error 132
caused by mismatch between primed words (predictions) and heard speech (Figure 133
1E&F Word, Post-DP). 134
135
7
136
Materials and Methods 137
Participants 138
Twenty-four (17 female, 7 male) right-handed, native English speakers were tested 139
after giving informed consent under a process approved by the Cambridge 140
Psychology Research Ethics Committee. This sample size was selected based on 141
previous studies measuring similar neural effects with the same MEG system 142
(Gagnepain et al. 2012; Sohoglu & Davis, 2016; Sohoglu et al. 2012, etc.). All 143
participants were aged 18-40 years and had no history of neurological disorder or 144
hearing impairment based on self-report. Two participants’ MEG data were excluded 145
from subsequent analyses respectively due to technical problems and excessive 146
head movement, resulting in 22 participants in total. All recruited participants 147
received monetary compensation. 148
149
Experimental Design 150
To distinguish competitive- and predictive-selection accounts, we manipulated 151
participants’ word recognition process by presenting partially mismatched auditory 152
stimuli prior to targets. Behavioural responses and MEG signals were acquired 153
simultaneously. Prime and target stimuli pairs form a repeated measures design with 154
two factors (lexicality and prime type). The lexicality factor has 2 levels: word and 155
pseudoword, while the prime type factor contains 3 levels: unprimed, primed by 156
same lexical status, primed by different lexical status. Hence the study is a factorial 2 157
x 3 design with 6 conditions: unprimed word (hijack), word-primed word (hijack-158
hygiene), pseudoword-primed word (basef-basis), unprimed pseudoword (letto), 159
pseudoword-primed pseudoword (letto-lettan), word-primed pseudoword (boycott-160
8
boymid). Prime-target pairs were formed only by stimuli sharing the same initial 161
segments. Items in the two unprimed conditions served as prime items in other 162
conditions and they were compared with target items (Figure 2A). 163
The experiment used a lexical decision task (Figure 2B) implemented in 164
MATLAB through Psychtoolbox-3 (Kleiner et al. 2007), during which participants 165
heard a series of words and pseudowords while making lexicality judgments to each 166
stimulus by pressing buttons using their left index and middle fingers only, with the 167
index finger pressing one button indicating word and the middle finger pressing the 168
other button indicating pseudoword. 344 trials of unique spoken items were 169
presented every ~3 seconds in two blocks of 172 trials, each block lasting 170
approximately 9 minutes. Each prime-target pair was separated by 20 to 76 trials of 171
items that do not start with the same speech sounds, resulting in a relatively long 172
delay of ~1-4 minutes between presentations of phonologically-related items. This 173
delay was chosen based on Monsell and Hirsh (1998), who suggest that it prevents 174
strategic priming effects (Norris et al. 2002). Stimuli from each of the quadruplets 175
were Latin-square counterbalanced across participants, i.e. stimulus quadruplets that 176
appeared in one condition for one participant were allocated to another condition for 177
another participant. The stimulus sequences were pseudo-randomised using Mix 178
software (van Casteren & Davis, 2006), so that the same type of lexical status 179
(word/pseudoword) did not appear successively on more than 4 trials. 180
181
Stimuli 182
The stimuli consisted of 160 sets of four English words and pseudowords, with 183
durations ranging from 372 to 991 ms (M = 643, SD = 106). Each set contained 2 184
words (e.g. letter, lettuce) and 2 phonotactically-legal pseudowords (e.g. letto, lettan) 185
9
that share the same initial segments (e.g. /let/) but diverge immediately afterwards. 186
We used polysyllabic word pairs (Msyllable = 2.16, SDsyllable =0.36) instead of 187
monosyllabic ones in our experiments so as to identify a set of optimal lexical 188
competitors that are similar to their prime yet dissimilar from all other items. All words 189
were selected from the CELEX database (Baayen et al., 1993). Their frequencies 190
were taken from SUBTLEX UK corpus (Van Heuven et al., 2014) and restricted to 191
items under 5.5 based on log frequency per million word (Zipf scale, Van Heuven et 192
al., 2014). In order to ensure that any priming effect was caused purely by 193
phonological but not semantic similarity, we also checked that all prime and target 194
word pairs have a semantic distance of above 0.7 on a scale from 0 to 1 based on 195
the Snaut database of semantic similarity scores (Mandera et al., 2017), such that 196
morphological relatives (e.g. darkly/darkness) were excluded. 197
All spoken stimuli were recorded onto a Marantz PMD670 digital recorder by a 198
male native speaker of southern British English in a sound-isolated booth at a 199
sampling rate of 44.1 kHz. Special care was taken to ensure that shared segments 200
of stimuli were pronounced identically (any residual acoustic differences were 201
subsequently eliminated using audio morphing as described below). 202
The point when items within each quadruplet begin to acoustically differ from 203
each other is the deviation point (hereafter DP, see Figure 3A). Pre-DP length 204
ranged from 150 to 672 ms (M = 353, SD = 96), while post-DP length ranged from 42 205
to 626 ms (M = 290, SD = 111, see Figure 3B). Epochs of MEG data were time-206
locked to the DP. Using phonetic transcriptions (phonDISC) in CELEX, the location 207
of the DP was decided based on the phoneme segment at which items within each 208
quadruplet set diverge (Mseg=3.53, SDseg=0.92). To determine when in the speech 209
files corresponds to the onset of the first post-DP segment, we aligned phonetic 210
10
transcriptions to corresponding speech files using the WebMAUS forced alignment 211
service (Kisler et al., 2017; Schiel, 1999). In order to ensure that the pre-DP portion 212
of the waveform was acoustically identical, we cross-spliced the pre-DP segments of 213
the 4 stimuli within each quadruplet and conducted audio morphing to combine the 214
syllables using STRAIGHT (Kawahara, 2006) implemented in MATLAB. This method 215
decomposes speech signals into source information and spectral information, and 216
permits high quality speech re-synthesis based on modified versions of these 217
representations. This enables flexible averaging and interpolation of parameter 218
values that can generate acoustically intermediate speech tokens (see Rogers & 219
Davis, 2017, for example). In the present study, this method enabled us to present 220
speech tokens with entirely ambiguous pre-DP segments, and combine these with 221
post-DP segments without introducing audible discontinuities or other degradation in 222
the speech tokens. This way, phonological co-articulation in natural speech was 223
reduced to the lowest level possible at the DP, hence any cross-stimuli divergence 224
evoked in neural responses can only be caused by post-DP deviation. 225
226
Post-test Gating Study 227
As encouraged by a reviewer, we conducted a post-test perceptual experiment using 228
a gating task in order to confirm that the cross-splicing and morphing of our stimuli 229
worked as expected. This experiment used a gating task implemented in JavaScript 230
through JSpsych (de Leeuw, 2015). During the experiment, auditory segments of all 231
160 pairs of words used in the MEG study were played. Twenty British English 232
speakers were recruited through Prolific Academic online with monetary 233
compensation. The sample size was selected based on a similar gating study 234
conducted by Davis et al. (2002). Participants were evenly divided into two groups, 235
11
one group were presented with 160 stimuli words with different pre-DP segments 236
(e.g. hygiene), while the other group were presented with the other paired 160 stimuli 237
(e.g. hijack). Therefore, participants only ever heard one of the two items in each pair. 238
Stimuli segments of each word item consist of the pre-DP segment and, depending 239
on the stimuli length, also longer segments that are 75ms, 150ms, 225ms and 240
300ms post DP. The segments of each word were presented in a gating manner, 241
with the shortest segment played the first and the full item played at the end. After 242
hearing each segment (e.g. /haidʒ/), participants were also presented with the 243
writing of the word (e.g. hygiene) that contained the segment and the other paired 244
word that shared the same pre-DP segment (e.g. hijack) on the screen. We asked 245
the participants to choose which item the auditory segment matches and indicate 246
their confidence from a rating scale of 1 to 6, with 1 representing being very 247
confident that the item is the one on the left and 6 representing being very confident 248
that the item is the one on the right, while 3 and 4 representing guessing the possible 249
item. In order to avoid potential practice effect, we also added 40 filler stimuli that are 250
identifiable on initial presentation. 251
Given our goal of assessing whether there is any information to distinguish the 252
words prior to the divergence point, we needed to adopt an analysis approach that 253
could confirm the null hypothesis that no difference exists between perception of the 254
shared first syllable of word pairs like hijack and hygiene. We therefore analysed the 255
results using Bayesian methods which permit this inference. Participants’ response 256
accuracy was analysed using mixed-effect logistic regression and confidence rating 257
scores were analysed using mixed-effect linear regression using the brms package 258
(Bürkner, 2017) implemented in R. Response scores were computed in a way such 259
that correct and most confident responses were scored 1, while incorrect and most 260
12
confident responses were scored 6 and so on. Participants and items were included 261
as random factors of the models and there was no fixed factor since we are only 262
interested in the intercepts, whose estimates indicate the logit transformed 263
proportion of correctness in the logistic model and the mean rating in the linear 264
model respectively. We chose weakly informative priors for each model and 265
conducted Bayes Factor analyses through the Savage-Dickey density ratio method 266
(Wagenmakers et al., 2010). Model estimate, standard error, lower and upper 267
boundary of 95% credible interval (CI) are also reported. 268
When checking our data, we found that 16 pairs of word items were not 269
morphed correctly, hence the spectral information of the pre-DP segments of these 270
word pairs were not exactly the same and some of them diverged acoustically before 271
the DP due to coarticulation. Therefore, we excluded these items from analyses of 272
the gating data and confirmed that excluding these items did not modify the 273
interpretation or significance of the MEG or behavioral results reported in the paper. 274
As shown in Figure 3C, we found that when gating segments ended at the DP, 275
Bayes factor provides strong evidence in favour of the null hypothesis, chance-level 276
accuracy (i.e. proportion of correct responses is 0.5), β = 0.04, SE = 0.08, lCI = -0.11, 277
uCI = 0.20, BF01 = 23.04. This indicates that participants could not predict the full 278
stimuli based on hearing the pre-DP segments. On the other hand, the Bayes factor 279
at later alignment points is close to 0, providing extremely strong evidence for the 280
alternative hypothesis that the proportion of correct responses is higher than 0.5 281
(75ms post-DP: β = 3.41, SE = 0.22, lCI = 2.99, uCI = 3.85, BF01 < 0.01; 150ms 282
post-DP: β = 6.26, SE = 0.56, lCI = 5.24, uCI = 7.41, BF01 < 0.01; 225ms post-DP: β 283
= 7.39, SE = 1.02, lCI = 5.65, uCI = 9.72, BF01 < 0.01; 300ms post-DP: β = 8.04, SE 284
= 1.88, lCI = 4.99, uCI = 12.32, BF01 < 0.01). Figure 3D shows that, with the gating 285
13
segment becoming longer, the rating scores gradually reduce (lower scores 286
indicating more accurate and more confident identification). We examined whether 287
the mean score at the DP is equal to 3.5 (i.e. chance performance) and found strong 288
evidence supporting the null hypothesis, β = -0.02, SE = 0.04, lCI = -0.10, uCI = 0.06, 289
BF01 = 21.79, which is consistent with the accuracy results. Furthermore, in order to 290
refine the estimate of the time point at which participants recognise the stimuli with 291
enough confidence, we also investigated at what alignment point is there evidence 292
showing the mean score lower than 2 (i.e. participants indicating more confident 293
identification). We found moderate evidence supporting the null hypothesis (mean 294
score equals to 2) at 75ms post-DP (β = -0.09, SE = 0.08, lCI = -0.25, uCI = 0.07, 295
BF01 = 6.07), but extremely strong evidence in favour of the alternative hypothesis 296
at 150ms post-DP (β = -0.71, SE = 0.05, lCI = -0.79, uCI = 0.62, BF01 < 0.01). 297
These results show that critical acoustic information that supports confident word 298
recognition arrives between 75ms and 150ms post-DP. 299
Overall, the post-test gating study confirmed that the pre-DP segments of 300
correctly morphed stimuli are not distinguishable within each stimuli set. However, 301
since we found items that were not correctly morphed during this control study, we 302
did a thorough check of our stimuli and identified all the items with pre-DP acoustic 303
differences (16 words and 12 pseudowords), which resulted in 8.68% of all trials 304
presented in the MEG study. In order to double check our MEG study results, we 305
then removed all these problematic trials from the data and reanalysed the data 306
using the same methods as described in the method section. Fortunately, we did not 307
find any inconsistent pattern or significance in our behavioural or neural results 308
compared to those reported with all trials included (see Table 1-5). Therefore, we 309
kept the original MEG and behavioural results with all items included in this paper. 310
14
311
Behavioural Data Analyses 312
Response times (RTs) were measured from the onset of the stimuli and inverse-313
transformed so as to maximise the normality of the data and residuals; Figures 314
report untransformed response times for clarity. Inverse-transformed RTs and error 315
rates were analysed using linear and logistic mixed-effect models respectively using 316
the lme4 package in R (Bates et al. 2014). Lexicality (word, pseudoword) and prime 317
type (unprimed, primed by same lexical status, primed by different lexical status) 318
were fixed factors, while participant and item were random factors. Maximal models 319
accounting for all random effects were attempted wherever possible, but reduced 320
random effects structures were applied when the full model did not converge (Barr et 321
al., 2013). Likelihood-ratio tests comparing the full model to a nested reduced model 322
using the Chi-Square distribution were conducted to evaluate main effects and 323
interactions. Significance of individual model coefficients were obtained using t 324
(reported by linear mixed-effect models) or z (reported by logistic mixed-effect 325
models) statistics in the model summary. One-tailed t statistics for RTs are also 326
reported for two planned contrasts: (1) word-primed versus unprimed conditions for 327
word targets, and (2) word-primed versus pseudoword-primed conditions for word 328
targets. 329
When assessing priming effects, we excluded data from target trials in which 330
the participant made an error in the corresponding prime trial, because it is unclear 331
whether such target items will be affected by priming given that the prime word was 332
not correctly identified. In addition, three trials with RTs shorter than the average pre-333
DP length (353ms) were removed from further analysis, since responses before 334
15
words and pseudowords acoustically diverge are too quick to be valid lexical 335
decision responses. 336
337
MEG Data Acquisition, Processing and Analyses 338
Magnetic fields were recorded with a VectorView system (Elekta Neuromag) which 339
contains a magnetometer and two orthogonal planar gradiometers at each of 102 340
locations within a hemispherical array around the head. Although electric potentials 341
were recorded simultaneously using 68 Ag-AgCl electrodes according to the 342
extended 10-10% system, these EEG data were excluded from further analysis due 343
to excessive noise. All data were digitally sampled at 1 kHz. Head position were 344
monitored continuously using five head-position indicator (HPI) coils attached to the 345
scalp. Vertical and horizontal electro-oculograms were also recorded by bipolar 346
electrodes. A 3D digitizer (FASTRAK; Polhemus, Inc.) was used to record the 347
positions of three anatomical fiducial points (the nasion, left and right preauricular 348
points), HPI coils and evenly distributed head points for use in source reconstruction. 349
MEG Data were preprocessed using the temporal extension of Signal Source 350
Separation in MaxFilter software (Elekta Neuromag) to reduce noise sources, 351
normalise the head position over blocks and participants to the sensor array and 352
reconstruct data from bad MEG sensors. Subsequent processing was conducted in 353
SPM12 (https://www.fil.ion.ucl.ac.uk/spm/) and FieldTrip 354
(http://www.fieldtriptoolbox.org/) software implemented in MATLAB. The data were 355
epoched from -1100 to 2000ms time-locked to the DP and baseline corrected 356
relative to the -1100 to -700ms prior to the DP, which is a period before the onset of 357
speech for all stimuli (Figure 1C). Low-pass filtering to 40 Hz was conducted both 358
before and after robust averaging across trials (Litvak et al., 2011). A time window of 359
16
-150 to 0ms was defined for pre-DP comparisons based on the shortest pre-DP 360
stimuli length. A broad window of 0 to 1000ms was defined for post-DP comparisons, 361
which covered the possible period for lexicality and prime effects. After averaging 362
over trials, an extra step was taken to combine the gradiometer data from each 363
planar sensor pair by taking the root-mean square (RMS) of the two amplitudes. 364
Sensor data from magnetometers and gradiometers were analysed 365
separately. We converted the sensor data into 3D images (2D sensor x time) and 366
performed F tests for main effects across sensors and time (the term “sensors” 367
denotes interpolated sensor locations in 2D image space). Reported effects were 368
obtained with a cluster-defining threshold of p < .001, and significant clusters 369
identified as those whose extent (across space and time) survived p < 0.05 FWE-370
correction using Random Field Theory (Kilner & Friston, 2010). Region of interest 371
(ROI) analyses for the priming effect were then conducted over sensors and time 372
windows that encompassed the significant pseudoword>word cluster, orthogonal to 373
priming effects. When plotting waveforms and topographies, data are shown for 374
sensors nearest to the critical points in 2D image space. 375
Apart from the two planned contrasts mentioned above (see Behavioural Data 376
Analyses), which were applied to post-DP analysis, one-tailed t statistics was also 377
reported on the pre-DP planned contrast between unprimed and word-primed items. 378
379
Source Reconstruction 380
In order to determine the underlying brain sources underlying the sensor-space 381
effects, source reconstruction was conducted using SPM’s Parametric Empirical 382
Bayes framework (Henson et al., 2011). To begin with, we obtained T1-weighted 383
structural MRI (sMRI) scans from each participant on a 3T Prisma system (Siemens, 384
17
Erlangen, Germany) using an MPRAGE sequence. The scan images were 385
segmented and normalised to an MNI template brain in MNI space. The inverse of 386
this spatial transformation was then used to warp canonical meshes derived from 387
that template brain back to each subject’s MRI space (Mattout et al., 2007). Through 388
this procedure, canonical cortical meshes containing 8196 vertices were generated 389
for the scalp and skull surfaces. We coregistrated the MEG sensor data into the 390
sMRI space for each participant by using their respective fiducials, sensor positions 391
and head-shape points (with nose points removed due to the absence of the nose on 392
the T1-weighted MRI). Using the single shell model, the lead field matrix for each 393
sensor was computed for a dipole at each canonical cortical mesh vertex, oriented 394
normal to the local curvature of the mesh. 395
Source inversion was performed with all conditions pooled together using the 396
‘IID’ solution, equivalent to classical minimum norm, fusing the magnetometer and 397
gradiometer data (Henson et al, 2011). The resulting inversion was then projected 398
onto wavelets spanning frequencies from 1 to 40 Hz and from -150 to 0ms time 399
samples for pre-DP analysis and 400 to 900ms for post-DP analysis. This post-DP 400
time window was defined by overlapping temporal extent of the pseudoword > word 401
cluster between gradiometers and magnetometers. The total energy within these 402
time-frequency windows was summarised by taking the sum of squared amplitudes, 403
which was then written to 3D images in MNI space. 404
Reported effects for source analyses were obtained with a cluster-defining 405
threshold of p < 0.05 (FWE-corrected). And as in sensor space, ROI analyses were 406
conducted over significant sensors and time windows from the orthogonal 407
pseudoword>word cluster. Factorial ANOVA were carried out on main effects and 408
18
one-tailed paired t-tests on planned contrasts (see MEG Data Acquisition and 409
Processing). 410
411
412
413
Results 414
Behaviour 415
Response Times. As shown in Figure 4A and Table 1, factorial analysis of 416
lexicality (word, pseudoword) and prime type (unprimed, primed by same lexical 417
status, primed by different lexical status) indicated a significant main effect of 418
lexicality, in which RTs for pseudowords were significantly longer than for words, 419
X2(3) = 23.60, p < .001. In addition, there was a significant interaction between 420
lexicality and prime type, X2(2) = 10.73, p = .005. This interaction was followed up by 421
two separate one-way models for words and pseudowords, which showed a 422
significant effect of prime type for words, X2(2) = 10.65, p = .005, but not for 423
pseudowords, X2(2) = 1.62, p = .445. Consistent with the competitor priming results 424
from Monsell and Hirsh (1998), words that were primed by another word sharing the 425
same initial segments were recognised significantly more slowly than unprimed 426
words (for mean raw RTs see Fig 3A), β = 0.02, SE = 0.01, t(79.69) = 3.33, p < .001, 427
and more slowly than pseudoword-primed words, β = 0.02, SE = 0.01, t(729.89) = 428
2.37, p = .009. As mentioned earlier (see Introduction), both competitive- and 429
predictive-selection models predicted longer response times to word-primed target 430
words compared to unprimed words, it is hence critical to distinguish the two 431
accounts through further investigation of the MEG responses. 432
19
Accuracy. Figure 4B and Table 2 indicate that there was a trend towards more 433
lexical decision errors in response to words than to pseudowords, although this 434
lexicality effect was marginal, X2(3) = 7.31, p = .063. The error rates for words and 435
pseudowords were also affected differently by priming, as indicated by a significant 436
interaction between lexicality and prime type, X2(2) = 6.08, p = .048. Follow-up 437
analyses using two separate models for each lexicality type showed there was a 438
main effect of prime type for words, X2(2) = 13.95, p < .001, but not for pseudowords, 439
X2(2) = 1.93, p = .381. Since we had not anticipated these priming effects on 440
accuracy, post-hoc pairwise z tests were Bonferroni corrected for multiple 441
comparisons. These showed that pseudoword priming reliably increased the error 442
rate compared to the unprimed condition, β = 1.68, SE = 0.54, z = 3.14, p = .005, 443
and to the word-primed condition, β = 2.74, SE = 0.89, z = 3.07, p = .007. Although 444
no specific predictions on accuracy were made a priori by either competitive- or 445
predictive-selection model, it is worth noting that participants might have expected 446
pseudowords to be repeated given the increased error rate of responses to 447
pseudoword-primed target words. 448
449
MEG 450
In order to explore the impact of lexicality and competitor priming on neural 451
responses to critical portions of speech stimuli, both before and after they diverge 452
from each other, MEG responses were time-locked to the DP. All reported effects 453
are family-wise error (FWE)-corrected at cluster level for multiple comparisons 454
across scalp locations and time at a threshold of p < 0.05. We reported data from 455
gradiometers, magnetometers and source space wherever possible, since sensor x 456
time analyses help define the time-windows used by source localisation. Although 457
20
some minor effects were shown in only one of these analyses, our most interesting 458
effects are reliable in all three data types. 459
Pre-DP analyses. We assessed neural responses before the DP, during 460
which only the shared speech segments have been heard and hence the words and 461
pseudowords in each stimulus set are indistinguishable. Since there could not have 462
been any effect of lexical status pre-DP, only prime type effects were considered in 463
this analysis. Predictive- and competitive-selection accounts make opposite 464
predictions for pre-DP neural signals evoked by word-primed items compared to 465
unprimed items. We therefore conducted an F-test for neural differences between 466
these two conditions across the scalp and source spaces over a time period of -150 467
to 0ms before the DP. A significant cluster of 295 sensor x time points (p = .023) was 468
found in gradiometers over the mid-left scalp locations from -28 to -4ms (Figure 5A, 469
Table 3), in which unprimed items evoked significantly greater neural responses than 470
word-primed items. On the suggestion of a reviewer, and mindful of the potential for 471
these pre-DP neural responses to be modulated by post-DP information, we report 472
an additional analysis with a lengthened analysis time window of -150ms to 100ms. 473
Again, we found a significant unprimed > word-primed cluster of 313 sensor x time 474
points (p = .033) over the exact same locations in gradiometers from -28 to -3ms pre-475
DP, which confirmed that this pre-DP effect was not pushed forward by any post-DP 476
effect. We did not find any cluster showing stronger neural responses for word-477
primed items than unprimed items and no clusters survived correction for multiple 478
comparisons for magnetometer responses or for analysis in source space. 479
To further examine these results, we also conducted ROI analysis of 480
gradiometer signals evoked by unprimed and primed items averaged over the same 481
-150 to 0ms pre-DP time window but across the scalp locations that showed the 482
21
post-DP lexicality effect at which pseudowords elicited greater neural responses than 483
words (see Figure 6A). As shown in Figure 5B and Table 4, the results indicated that 484
unprimed items elicited significantly stronger neural responses than word-primed 485
items, t(21) = 2.41, p = .013, consistent with the whole-brain analysis. In particular, 486
the mid-left cluster shown in panel A partially overlaps with the post-DP 487
pseudoword>word cluster. The direction and location of these pre-DP neural 488
responses are in accordance with the predictive-selection account and inconsistent 489
with the competitive-selection account. A surprising finding is that post-hoc analysis 490
also showed greater neural responses evoked by unprimed items than pseudoword-491
primed items, t(21) = 2.69, p = .014, although we had not predicted these effects 492
from pseudoword primes. 493
Post-DP analyses. We then examined the post-DP response differences 494
between words and pseudowords (lexicality effect). The gradiometer sensors 495
showed a significant cluster of 39335 sensor x time points (p < .001) over the left 496
side of the scalp at 313-956ms post-DP (Figure 6A, Table 3). In this cluster, 497
pseudowords evoked a significantly stronger neural response than words. Similarly, 498
magnetometer sensors also detected a significant left-hemisphere cluster of 68517 499
sensor x time points (p < .001) at 359-990ms post-DP (Figure 6B, Table 3) showing 500
the same lexicality effect. We did not find any significant cluster in which words 501
evoked greater neural responses than pseudowords. These results are consistent 502
with findings from Gagnepain and colleagues (2012). To locate the likely neural 503
source of the effects found in sensor space, we conducted source reconstruction by 504
integrating gradiometers and magnetometers. As shown in Figure 6C, results from 505
source space showed that neural generators of the lexicality effect were estimated to 506
lie within the left superior temporal gyrus (STG, volume of 2315 voxels, p < .001, 507
22
peak at x = -46, y = -36, z = 0; x = -52, y = -34, z = -6; x = -56, y = -28, z = -10). This 508
location, and direction of response, is consistent with a sub-lexical (e.g. phonemic) 509
process being modulated by lexicality; in line with the predictive-selection account. 510
Next, we investigated whether the neural responses that were modulated by 511
lexicality were also influenced by prime type by conducting an ROI analysis which 512
tested the interaction between prime type and lexicality, as well as planned pairwise 513
comparisons of priming effects on words alone, using data averaged over the time 514
window and the sensor locations of the significant cluster shown in panel A and B 515
(Figure 6D & E, Table 5). Since these planned pairwise comparisons involve 516
responses to familiar words only (i.e. words that are word-primed vs unprimed, 517
words that are word-primed vs pseudoword-primed), they are orthogonal to the 518
lexicality effect that defined the pseudoword>word cluster and hence are not 519
confounded by task. The interaction was significant in both gradiometers, F(1.96, 520
41.11) = 7.30, p = .002, and magnetometers, F(1.90, 39.99) = 5.80, p = .007. 521
Specifically, there was a significant effect of prime type for words, F(1.93, 40.55) = 522
8.01, p = .001 (gradiometers), F(1.81, 37.96) = 5.61, p = .009 (magnetometers), such 523
that neural signals evoked by word-primed words were significantly stronger than 524
those evoked by unprimed words, t(21) = 2.22, p = .019 (gradiometers), t(21) = 3.33, 525
p = .002 (magnetometers), and pseudoword-primed words, t(21) = 3.70, p < .001 526
(gradiometers), t(21) = 2.64, p = .008 (magnetometers). In contrast, there was no 527
reliable main effect of prime type for pseudowords, F(1.94, 40.80) = 0.67, p = .514 528
(gradiometers), F(1.79, 37.61) = 0.80, p = .446 (magnetometers). 529
The corresponding tests performed on the source-reconstructed power within 530
the lexicality ROI of suprathreshold voxels (Figure 6F, Table 5) did not show a 531
reliable interaction effect between lexicality and competitor priming, F(1.56, 32.85) = 532
23
0.99, p = .360. Nevertheless, consistent with sensor space results, source power 533
indicated a significant effect of prime type for words, F(1.73, 36.42) = 3.77, p = .038, 534
but not pseudowords, F(1.62, 33.94) = 1.12, p = .326. Pairwise comparisons also 535
indicated that word-primed words evoked significantly greater source strength than 536
unprimed words, t(21) = 2.66, p = .007, though the effect between word-primed and 537
pseudoword-primed words was not significant, t(21) = 1.26, p = .110. Overall, in line 538
with behavioural results, neural responses evoked by words and pseudowords were 539
also influenced differently by prime type. Critically, competitor priming modulated the 540
post-DP neural responses evoked by words, but not those evoked by pseudowords, 541
and these effects were localised to the left STG regions that plausibly contribute to 542
sub-lexical processing of speech. This matches the pattern of responses proposed in 543
the predictive-selection model (see Figure 1F). As shown in Table 1 to 5, the pattern 544
and significance of the results did not change when items with pre-DP acoustic 545
differences identified through the gating post-test were excluded. 546
As encouraged by a reviewer, we also conducted whole brain analyses for the 547
competitor priming effects. We found a significant word-primed word > unprimed 548
word cluster of 1197 sensor x time points (p = .034) in magnetometers in the left 549
hemisphere within a time window of 426 - 466ms post-DP. In addition, we also found 550
a significant and a marginal word-primed word > pseudoword-primed word cluster in 551
gradiometers in the left hemisphere respectively of 527 sensor x time points (p 552
= .011) at 719-749ms and 471 sensor x time points (p = .053) at 315-336ms. These 553
topographies and time courses overlap with the pseudoword > word clusters and are 554
consistent with our ROI results. Hence, the ROI analyses have picked up the most 555
important findings from these whole-brain analyses. 556
24
To ensure that other response patterns were not overlooked, we also 557
investigated whether there was any lexicality by prime-type interaction at other 558
locations across the scalp and source spaces, and during other time periods. As 559
shown in Figure 7A, a significant cluster of Gradiometers at midline posterior scalp 560
locations were found at 397-437ms post-DP, in which the effect of priming was 561
significantly different for words and pseudowords. Figure 7B shows gradiometer 562
signals evoked by conditions of interest averaged over the spatial and temporal 563
extent of the significant cluster in panel A. To explore this profile, we computed an 564
orthogonal contrast to assess the overall lexicality effect (the difference between 565
words and pseudowords), and the result was marginal, F(1.00, 21.00) = 3.50, p 566
= .075. The effect of prime type was marginally significant for words, F(1.89, 39.78) = 567
3.08, p = .060, but significant for pseudowords, F(1.80, 37.85) = 7.14, p = .003. The 568
location and pattern of this interaction cluster were dissimilar to those predicted by 569
either competitive- or predictive-selection theories and no cluster survived correction 570
in magnetometer sensors or source space hence we did not consider this effect to 571
be as relevant or interpretable as our other findings. We report it here in the interest 572
of completeness and transparency. 573
Linking neural and behavioural effects. To further examine the relationship 574
between neural and behavioural response differences attributable to competitor 575
priming or lexicality, we conducted a single-trial regression analyses using linear 576
mixed-effect models that account for random intercepts and slopes for participants 577
and stimuli sets (grouped by their initial segments). We calculated behavioural RT 578
differences and neural MEG differences caused by: (1) lexicality. i.e. the difference 579
between pseudoword and word trials (collapsed over primed and unprimed 580
conditions) and (2) competitor priming, i.e. the difference between unprimed and 581
25
word-primed word trials, with MEG signals averaged over the spatial and temporal 582
extent of the post-DP pseudoword>word cluster seen in sensor space and the STG 583
peak voxel in source space (see Figure 6). We then assessed the relationship 584
between these behavioural and neural difference effects in linear mixed-effect 585
regression of single trials, with differences in RTs as the independent variable and 586
differences in MEG responses as the dependent variable. The analyses were 587
conducted using the lme4 package in R (Bates et al. 2014). 588
As shown in Figure 8A, we observed a significant positive relationship 589
between RTs and magnetometers on lexicality difference (β = 0.11, SE = 0.01, 590
t(23.31) = 7.77, p < .001), although associations between RTs and gradiometers or 591
source response were not significant. These observations from magnetometers 592
indicated that slower lexical decision times evoked by pseudowords were associated 593
with greater neural responses. Furthermore, the intercept parameter for the 594
magnetometers model was significantly larger than zero, β = 37.58, SE = 5.72, 595
t(23.09) = 6.57, p < .001. We can interpret this intercept as the neural difference that 596
would be predicted for trials in which there was no delayed response to 597
pseudowords compared to words. The significant intercept indicated a baseline 598
difference in neural responses to words and pseudowords, even in the absence of 599
any difference in processing effort (as indexed by lexical decision RTs). This 600
suggested the engagement of additional neural processes specific to pseudowords 601
regardless of the behavioural effect (cf. Taylor et al., 2014). 602
Figure 8B showed another significant positive relationship between RTs and 603
magnetometers on competitor priming difference (β = 0.15, SE = 0.02, t(38.85) = 604
7.89, p < .001), while relationships between RTs and gradiometers or source 605
response were again not significant. Interestingly, unlike for the lexicality effect, the 606
26
intercept in this competitor priming magnetometers model did not reach significance 607
(β = 12.88, SE = 7.27, t(21.33) = 1.77, p = .091). This non-significant intercept might 608
suggest that if word-primed words did not evoke longer RTs than unprimed words, 609
magnetometer signals would not be reliably different between the two conditions 610
either. Hence, consistent with predictive-selection accounts, the increased post-DP 611
neural responses in the STG caused by competitor priming was both positively 612
linked to and mediated by longer response times. 613
614
615
Discussion 616
In this study, we distinguished different implementations of Bayesian perceptual 617
inference by manipulating the prior probability of spoken words and examining 618
changes to neural responses. We replicated the competitor priming effect such that a 619
single prior presentation of a competitor word (e.g. hijack) delayed the recognition of 620
a similar-sounding word (e.g. hygiene), whereas this effect was not observed when 621
the prime or target was a pseudoword (e.g. hijure). Armed with this behavioural 622
evidence, we used MEG data to test the neural bases of two Bayesian theories of 623
spoken word recognition. 624
625
Competitive- vs predictive-selection 626
Competitive-selection accounts propose that word recognition is achieved through 627
direct inhibitory connections between representations of similar candidates (e.g. 628
McClelland & Elman, 1986). Priming boosts the activation of heard words and 629
increases lateral inhibition applied to neighbouring words, which delays their 630
27
subsequent identification. The effect of competitor priming is to increase lexical 631
uncertainty, and hence lexical-level neural responses, until later time points when 632
target words can be distinguished from the competitor prime (Figure 1C). In contrast, 633
predictive-selection accounts propose that word recognition is achieved by 634
subtracting predicted speech from heard speech and using computations of 635
prediction error to update lexical probabilities (Davis & Sohoglu, 2020). By this view, 636
predictions for segments that are shared between competitor primes and targets 637
(pre-DP segments) will be enhanced after presentation of prime words. Thus, 638
competitor priming will reduce the magnitude of prediction error, and hence neural 639
responses pre-DP (Figure 1F). Only when speech diverges from predictions (post-640
DP segments) will competitor-primed words evoke greater prediction error, leading to 641
increased neural response in brain areas involved in pre-lexical (e.g. phonemic) 642
processing of speech representing prediction error (Blank et al., 2018; Blank & Davis, 643
2016). Both these models involve multiple levels of representation and hence both 644
sub-lexical and lexical processes. However, our focus is primarily on lexical 645
processing within the competitive-selection framework and sub-lexical processing 646
within the predictive-selection framework. These are the critical processing levels 647
that 1) support word recognition, 2) are modulated by competitor priming effect and 3) 648
can potentially explain how slower behavioural responses will manifest in MEG 649
responses. 650
The direction and timing of changes to MEG responses associated with 651
competitor priming showed opposite effects pre- and post-DP. In the pre-DP period, 652
consistent with predictive-selection but contrary to competitive-selection mechanisms, 653
we saw decreased neural responses for word-primed items compared to unprimed 654
items. The initial, shared segments between prime (hijack) and target (hygiene) 655
28
words evoked a reduced response during early time periods in line with a reduction 656
in prediction error. However, during the post-DP period, competitor-primed words 657
evoked stronger neural responses than unprimed words in exactly the same 658
locations and time periods that showed increased responses to pseudowords (hijure) 659
compared to words. These post-DP response increases are in line with enhanced 660
processing difficulty for competitor-primed words and pseudowords due to greater 661
prediction error. Thus, the time course of the competitor priming neural effects – 662
showing reduced neural responses pre-DP and increased neural responses post-DP 663
– closely resembles the expected changes in prediction error (Figure 1F) based on 664
predictive-selection mechanisms. However, we note that post-DP effects reach 665
significance later than mismatch effects for written words (Dikker et al., 2010), 666
lexicality effects for spoken words (MacGregor et al., 2012), and phoneme surprisal 667
effects in connected speech (Brodbeck et al., 2018; Donhauser & Baillet, 2020). This 668
delay could be due to our morphing manipulation which removed coarticulation 669
before the divergence point, or due to neural effects being delayed for words in 670
isolation compared to connected speech (see Gwilliams & Davis, in press for review). 671
Further research to assess the latency of neural effects can help determine whether 672
they are sufficiently early to indicate bottom-up sensory signals as proposed by 673
predictive selection. 674
Effects of lexicality and competitor priming localised to the left STG; this brain 675
region has long been associated with lower-level sensory processing of speech (Yi 676
et al., 2019). Our observation of increased responses to pseudowords in STG 677
agrees with source-localised MEG findings (Gagnepain et al., 2012; Shtyrov et al., 678
2012) and a meta-analysis of PET and fMRI studies (Davis & Gaskell, 2009). This 679
location is also consistent with the proposal that lexical influences on segment-level 680
29
computations produce reliable neural differences between words and pseudowords 681
(Davis & Sohoglu, 2020). We take this localisation as further evidence in favour of 682
computations of segment prediction error as a critical mechanism underlying word 683
identification. 684
We further show using regression analyses that neural (MEG) and behavioural 685
(RT) effects of lexicality and competitor priming are linked on a trial-by-trial basis. 686
Trials in which pseudoword processing or competitor priming leads to larger 687
increases in RT also have greater post-DP neural responses. Links between 688
behavioural and neural effects of lexicality and competitor priming are once more in-689
line with the proposal that post-DP increases in prediction error are a key neural 690
mechanism for word and pseudoword processing and explain delayed behavioural 691
responses seen in competitor priming. Interestingly, lexicality and competitor priming 692
effects differ in terms of whether a reliable neural response difference would be seen 693
for trials with no baseline RT difference. While neural lexicality effects were 694
significant even for trials that did not show behavioural effects, the same was not 695
true for the competitor priming effect. These results indicate that, consistent with 696
predictive-selection accounts, the post-DP neural effect of competitor priming was 697
mediated by changes in behavioural RTs. In contrast, an increased neural response 698
to pseudowords was expected even in trials for which RTs did not differ between 699
pseudowords and words. We next consider the implications of these and other 700
findings for pseudoword processing. 701
How do listeners process pseudowords? 702
Participants identified pseudowords with a speed and accuracy similar to that seen 703
during recognition of familiar words. This is consistent with an optimally-efficient 704
30
language processing system (Marslen-Wilson, 1984; Zhuang et al, 2014), in which 705
pseudowords can be distinguished from real words as soon as deviating speech 706
segments are heard. Beyond this well-established behavioural finding, however, we 707
reported two seemingly contradictory observations concerning pseudoword 708
processing. 709
The first is that, while post-DP neural activity and response times for words 710
were modulated by competitor priming, processing of pseudowords was not similarly 711
affected. This might suggest that the prior probability of hearing a pseudoword and 712
the prediction error elicited by mismatching segments are not changed by our 713
experimental manipulations. This may be because pseudowords have a low or zero 714
prior probability and elicit maximal prediction errors that cannot be modified by a 715
single prime. Yet, memory studies suggest that even a single presentation of a 716
pseudoword can be sufficient for listeners to establish a lasting memory trace 717
(Mckone & Trynes, 1999; Arndt et al., 2008). However, it is possible that this memory 718
for pseudowords reflects a different type of memory (e.g. episodic memory) from that 719
produced by a word, with only the latter able to temporarily modify long-term, lexical-720
level representations and predictions for word speech segments (cf. Complementary 721
Learning Systems theories, McClelland et al., 1995; Davis & Gaskell, 2009). 722
A second observation is that, contrary to the null result for post-DP processing, 723
pseudoword priming reduced subsequent pre-DP neural responses evoked by target 724
items to a similar degree as word priming (Figure 5B). This pre-DP effect is 725
surprising given previous evidence suggesting that pseudowords must be encoded 726
into memory and subject to overnight, sleep-associated consolidation in order to 727
modulate the speed of lexical processing (Tamminen et al., 2010; James et al., 2017) 728
or neural responses (Davis & Gaskell, 2009; Landi et al. 2018). It might be that 729
31
neural effects seen for these pre-DP segments were due to changes to the 730
representation of familiar words that our pseudowords resembled, though these were 731
insufficient to modulate processing of post-DP segments. 732
733
Summary 734
Our work provides compelling evidence in favour of neural computations of 735
prediction error during spoken word recognition. Although previous work by 736
Gagnepain et al. (2012) provided evidence for the predictive-selection account, their 737
behavioural effects of consolidation on word recognition were obtained during 738
different tasks and different test sessions from neural responses. Our current study 739
goes beyond this previous work by adopting a single task (lexical decision) and using 740
a competitor priming paradigm that permits concurrent measurement of perceptual 741
outcomes and neural responses in a single session. This enables us to directly link 742
trials that evoked stronger neural signals in the STG to delayed RTs and hence 743
provide stronger evidence that both of these effects are caused by competitor 744
priming. 745
In addition, unlike previous work (Brodbeck et al. 2018; Donhauser & Baillet, 746
2020) which reported neural responses correlated with lexical entropy as well as 747
prediction error (surprisal), we did not find similarly equivocal evidence. These earlier 748
studies measured neural responses to familiar words in continuous speech 749
sequences such as stories or talks. It might be that effects of lexical entropy are 750
more apparent for connected speech than isolated words. However, since lexical 751
uncertainty (entropy) and segment-level predictability (segment prediction error or 752
surprisal) are highly correlated in natural continuous speech, these studies may be 753
32
less able to distinguish between the lexical and segmental mechanisms that we 754
assessed here. In contrast, our speech materials were carefully selected to change 755
lexical probability (through priming) and for priming to have opposite effects on 756
segment prediction error before and after DP. This manipulation provides evidence 757
in favour of predictive-selection mechanisms that operate using computations of 758
prediction error during spoken word recognition. 759
33
References 760
Aitchison, L., & Lengyel, M. (2017). With or without you: predictive coding and 761
Bayesian inference in the brain. Current opinion in neurobiology, 46, 219-227. 762
Arndt, J., Lee, K., & Flora, D. B. (2008). Recognition without identification for words, 763
pseudowords and nonwords. Journal of memory and language, 59(3), 346-360. 764
Baayen, R. H., Piepenbrock, R., & van H, R. (1993). The {CELEX} lexical data base 765
on {CD-ROM}. 766
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure 767
for confirmatory hypothesis testing: Keep it maximal. Journal of memory and 768
language, 68(3), 255-278. 769
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects 770
models using lme4. arXiv preprint arXiv:1406.5823. 771
Blank, H., Spangenberg, M., & Davis, M. H. (2018). Neural prediction errors 772
distinguish perception and misperception of speech. Journal of 773
Neuroscience, 38(27), 6076-6089. 774
Blank, H., & Davis, M. H. (2016). Prediction errors but not sharpened signals 775
simulate multivoxel fMRI patterns during speech perception. PLoS 776
biology, 14(11), e1002577. 777
Bozic, M., Tyler, L. K., Ives, D. T., Randall, B., & Marslen-Wilson, W. D. (2010). 778
Bihemispheric foundations for human speech comprehension. Proceedings of 779
the National Academy of Sciences, 107(40), 17439-17444. 780
Brodbeck, C., Hong, L. E., & Simon, J. Z. (2018). Rapid transformation from auditory 781
to linguistic representations of continuous speech. Current Biology, 28(24), 782
3976-3983. 783
34
Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). How many words do 784
we know? Practical estimates of vocabulary size dependent on word definition, 785
the degree of language input and the participant’s age. Frontiers in Psychology, 786
7(JUL), 1–11. 787
Bürkner, P. C. (2017). brms: An R package for Bayesian multilevel models using 788
Stan. Journal of statistical software, 80(1), 1-28. 789
Cousineau, D. (2005). Confidence intervals in within-participant designs: A simpler 790
solution to Loftus and Masson’s method. Tutorials in quantitative methods for 791
psychology, 1(1), 42-45. 792
Dahan, D., Magnuson, J. S., & Tanenhaus, M. K. (2001). Time course of frequency 793
effects in spoken-word recognition: Evidence from eye movements. Cognitive 794
psychology, 42(4), 317-367. 795
Davis, M.H. (2015). The Neurobiology of Lexical Access. In G. Hickok & S. L. Small 796
(Eds.), Neurobiology of language (pp. 541-555). Academic Press. 797
Davis, M. H., & Gaskell, M. G. (2009). A complementary systems account of word 798
learning: neural and behavioural evidence. Philosophical Transactions of the 799
Royal Society B: Biological Sciences, 364(1536), 3773-3800. 800
Davis, M. H., Marslen-Wilson, W. D., & Gaskell, M. G. (2002). Leading up the lexical 801
garden path: Segmentation and ambiguity in spoken word recognition. Journal 802
of Experimental Psychology: Human Perception and Performance, 28(1), 218. 803
Davis, M.H., Scharenborg, O. (2016) Speech perception by humans and machines. 804
In Gaskell, G. & Mirkovic J. (Eds) Speech Perception and Spoken Word 805
Recognition. Psychology Press. 806
35
Davis, M. H. & Sohoglu E. (2020) Three Functions of Prediction Error for Bayesian 807
Inference in Speech Perception. In: Gazzaniga, M., Mangun R., & Poeppel D. 808
(Eds), The Cognitive Neurosciences, 6th Edition. MIT Press, Camb, MA, USA. 809
de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behavioral 810
experiments in a web browser. Behavior Research Methods, 47(1), 1-12. 811
doi:10.3758/s13428-014-0458-y. 812
Dikker, S., Rabagliati, H., Farmer, T. A., & Pylkkänen, L. (2010). Early occipital 813
sensitivity to syntactic category is based on form typicality. Psychological 814
Science, 21(5), 629-634. 815
Donhauser, P. W., & Baillet, S. (2020). Two Distinct Neural Timescales for Predictive 816
Speech Processing. Neuron. 817
Eberhard, K. M. (1994). Phonological inhibition in auditory word recognition. In D. 818
Dagenbach & T. H. Carr (Eds.), Inhibitory processes in attention memory and 819
language. San Diego, CA: Academic Press. 820
Gagnepain, P., Henson, R. N., & Davis, M. H. (2012). Temporal predictive codes for 821
spoken words in auditory cortex. Current Biology, 22(7), 615-621. 822
Gwilliams, L.E., Davis, M.H. (in press). Extracting language content from speech 823
sounds: the information theoretic approach. in Holt & Peelle (eds) Auditory 824
Cognitive Neuroscience of Speech Perception: Springer Handbook of Auditory 825
Research, Vol 74. Springer. ⟨hal-03013496⟩ 826
Henson, R.N., Wakeman, D.G., Litvak, V. & Friston, K.J. (2011). A Parametric 827
Empirical Bayesian framework for the EEG/MEG inverse problem: generative 828
models for multiparticipant and multimodal integration. Frontiers in Human 829
Neuroscience, 5, 76, 1-16. 830
36
James, E., Gaskell, M. G., Weighall, A., & Henderson, L. (2017). Consolidation of 831
vocabulary during sleep: The rich get richer?. Neuroscience & Biobehavioral 832
Reviews, 77, 1-13. 833
Kawahara, H. (2006). STRAIGHT, exploitation of the other aspect of VOCODER: 834
Perceptually isomorphic decomposition of speech sounds. Acoustical science 835
and technology, 27(6), 349-353. 836
Kilner, J. M., & Friston, K. J. (2010). Topological inference for EEG and MEG. The 837
Annals of Applied Statistics, 1272-1290. 838
Kisler, T. and Reichel U. D. and Schiel, F. (2017): Multilingual processing of speech 839
via web services, Computer Speech & Language, Volume 45, September 840
2017, pages 326–347. 841
Kleiner, M., Brainard, D., & Pelli, D. (2007). What's new in Psychtoolbox-3?. 842
Landi, N., Malins, J. G., Frost, S. J., Magnuson, J. S., Molfese, P., Ryherd, K., ... & 843
Pugh, K. R. (2018). Neural representations for newly learned words are 844
modulated by overnight consolidation, reading skill, and 845
age. Neuropsychologia, 111, 133-144. 846
Litvak, V., Mattout, J., Kiebel, S., Phillips, C., Henson, R., Kilner, J., ... & Penny, W. 847
(2011). EEG and MEG data analysis in SPM8. Computational intelligence and 848
neuroscience, 2011. 849
MacGregor, L. J., Pulvermüller, F., Van Casteren, M., & Shtyrov, Y. (2012). Ultra-850
rapid access to words in the brain. Nature communications, 3(1), 1-7. 851
Mandera, P., Keuleers, E., & Brysbaert, M. (2017). Explaining human performance in 852
psycholinguistic tasks with models of semantic similarity based on prediction 853
and counting: A review and empirical validation. Journal of Memory and 854
Language, 92, 57-78. 855
37
Marslen-Wilson WD. 1984. Function and process in spoken word rec- ognition. In: 856
Bouma H, Bouwhuis DG, editors. Attention and per- formance X: control of 857
language processes. Hillsdale (NJ): Erlbaum. p. 125–150. 858
Marsolek, C. J. (2008). What antipriming reveals about priming. Trends in Cognitive 859
Sciences, 12(5), 176-181. 860
Mattout, J., Henson, R. N., and Friston, K. J. (2007). Canonical source 861
reconstruction for MEG. Comp. Int. Neurosci. 2007, 67613. 862
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech 863
perception. Cognitive psychology, 18(1), 1-86. 864
McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are 865
complementary learning systems in the hippocampus and neocortex: Insights 866
from the successes and failures of connectionist models of learning and 867
memory. Psychological Review, 102, 419–457. 868
Mckone, E., & Trynes, K. (1999). Acquisition of novel traces in short-term implicit 869
memory: Priming for nonwords and new associations. Memory & 870
cognition, 27(4), 619-632. 871
Monsell, S., & Hirsh, K. W. (1998). Competitor priming in spoken word 872
recognition. Journal of Experimental Psychology: Learning, Memory, and 873
Cognition, 24(6), 1495. 874
Norris, D., & McQueen, J. M. (2008). Shortlist B: a Bayesian model of continuous 875
speech recognition. Psychological review, 115(2), 357. 876
Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech 877
recognition: Feedback is never necessary. Behavioral and Brain 878
Sciences, 23(3), 299-325. 879
38
Norris, D., McQueen, J. M., & Cutler, A. (2002). Bias effects in facilitatory 880
phonological priming. Memory & Cognition, 30(3), 399-411. 881
Okada, K., & Hickok, G. (2006). Identification of lexical–phonological networks in the 882
superior temporal sulcus using functional magnetic resonance 883
imaging. Neuroreport, 17(12), 1293-1296. 884
Prabhakaran, R., Blumstein, S. E., Myers, E. B., Hutchison, E., & Britton, B. (2006). 885
An event-related fMRI investigation of phonological–lexical 886
competition. Neuropsychologia, 44(12), 2209-2221. 887
Schiel, F. (1999). Automatic Phonetic Transcription of Non-Prompted Speech. In 888
Proc. of the ICPhS (pp. 607-610). 889
Shtyrov, Y., Smith, M. L., Horner, A. J., Henson, R., Nathan, P. J., Bullmore, E. T., & 890
Pulvermüller, F. (2012). Attention to language: novel MEG paradigm for 891
registering involuntary language processing in the 892
brain. Neuropsychologia, 50(11), 2605-2616. 893
Sohoglu, E., & Davis, M. H. (2016). Perceptual learning of degraded speech by 894
minimizing prediction error. Proceedings of the National Academy of 895
Sciences, 113(12), E1747-E1756. 896
Sohoglu, E., Peelle, J. E., Carlyon, R. P., & Davis, M. H. (2012). Predictive top-down 897
integration of prior knowledge during speech perception. Journal of 898
Neuroscience, 32(25), 8443-8453. 899
Spratling, M. W. (2008). Reconciling predictive coding and biased competition 900
models of cortical function. Frontiers in computational neuroscience, 2, 4. 901
Tamminen, J., Payne, J. D., Stickgold, R., Wamsley, E. J., & Gaskell, M. G. (2010). 902
Sleep spindle activity is associated with the integration of new memories and 903
existing knowledge. Journal of Neuroscience, 30(43), 14356-14360. 904
39
Tauroza, S., & Allison, D. (1990). Speech rates in british english. Applied 905
linguistics, 11(1), 90-105. 906
van Casteren, M., & Davis, M. H. (2006). Mix, a program for 907
pseudorandomization. Behavior research methods, 38(4), 584-589. 908
Taylor, J. S. H., Rastle, K., & Davis, M. H. (2013). Can cognitive models explain 909
brain activation during word and pseudoword reading? A meta-analysis of 36 910
neuroimaging studies. Psychological bulletin, 139(4), 766. 911
Taylor, J. S. H., Rastle, K., & Davis, M. H. (2014). Interpreting response time effects 912
in functional imaging studies. Neuroimage, 99, 419-433. 913
van Heuven, W. J., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). SUBTLEX-914
UK: A new and improved word frequency database for British 915
English. Quarterly Journal of Experimental Psychology, 67(6), 1176-1190. 916
Wagenmakers, E. J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian 917
hypothesis testing for psychologists: A tutorial on the Savage–Dickey 918
method. Cognitive psychology, 60(3), 158-189. 919
Yi, H. G., Leonard, M. K., & Chang, E. F. (2019). The encoding of speech sounds in 920
the superior temporal gyrus. Neuron, 102(6), 1096-1110. 921
Ylinen, S., Bosseler, A., Junttila, K., & Huotilainen, M. (2017). Predictive coding 922
accelerates word recognition and learning in the early stages of language 923
development. Developmental science, 20(6), e12472. 924
Zhuang, J., Randall, B., Stamatakis, E. A., Marslen-Wilson, W. D., & Tyler, L. K. 925
(2011). The interaction of lexical semantics and cohort competition in spoken 926
word recognition: an fMRI study. Journal of Cognitive Neuroscience, 23(12), 927
3778-3790. 928
40
Zhuang, J., Tyler, L. K., Randall, B., Stamatakis, E. A., & Marslen-Wilson, W. D. 929
(2014). Optimally efficient neural systems for processing spoken 930
language. Cerebral Cortex, 24(4), 908-918. 931
932
933
41
Tables and Legends 934
Table 1. Behavioural RTs analyses on all data versus data excluding items with 935
pre-DP acoustic differences (identified in the gating post-test). Reported 936
pairwise effects (planned) are one-tailed. 937
All data Data with exclusion
Contrast X2 t p X2 t p
Lexicality 23.60 <.001 28.87 <.001
Lexicality-by-prime type 10.73 .005 8.52 .014
Word prime type 10.65 .005 8.57 .014
Word-word > word 3.33 <.001 3.00 .002
Word-word > pseudo-word 2.37 .009 2.30 .011
Pseudo prime type 1.62 .445 0.65 .720
Word-word, word-primed word; pseudo-word, pseudoword-primed word. 938
939
Table 2. Behavioural accuracy analyses on all data versus data excluding 940
items with pre-DP acoustic differences. Reported pairwise effects are 941
Bonferroni corrected. 942
All data Data with exclusion
Contrast X2 z p X2 z p
Lexicality 7.31 .063 6.40 .094
Lexicality-by-prime type 6.08 .048 6.98 .031
Word prime type 13.95 <.001 14.97 <.001
Pseudo-word > word 3.14 .005 3.03 .007
Pseudo-word > word-word 3.07 .007 3.05 .007
Pseudo prime type 1.93 .381 3.16 .206
42
Word-word, word-primed word; pseudo-word, pseudoword-primed word. 943
944
Table 3. Pre-DP MEG analyses of unprimed > word-primed items and Post-DP 945
MEG analyses of pseudoword > word on all data versus data excluding items 946
with pre-DP acoustic differences. Reported effects are FWE corrected at 947
cluster level at p < 0.05. 948
All data Data with exclusion
Time
window Modality
Cluster
PFWE-corr
Cluster
size
Latency
(ms)
Cluster
PFWE-corr
Cluster
size
Latency
(ms)
Pre-DP Grad .023 295 -28 to -4 .005 426 -25 to 0
Post-DP Grad <.001 39335 313 to 956 <.001 30811 320 to 775
Mag <.001 68517 359 to 990 <.001 69777 362 to 988
Source <.001 2315 400 to 900 <.001 2287 400 to 900
Grad, gradiometers; Mag, magnetometers. 949
950
Table 4. Pre-DP MEG ROI analyses on all data versus data excluding items 951
with pre-DP acoustic differences across gradiometer sensor locations that 952
showed post-DP pseudoword > word effect. Reported effects on unprimed > 953
word-primed items (planned) are one-tailed. 954
All data Data with exclusion
Contrast t p t p
Unprimed > word-primed 2.41 .013 2.57 .009
Unprimed > pseudo-primed 2.69 .014 3.14 .005
Unprimed, unprimed items; Word-primed, word-primed items; pseudo-primed, pseudoword-primed items. 955
956
43
Table 5. Post-DP MEG ROI analyses on all data versus data excluding items 957
with pre-DP acoustic differences. Reported pairwise effects (planned) are one-958
tailed. 959
All data Data with exclusion
Contrast Modality F t p F t p
Lexicality-by-prime type Grad 7.30 .002 6.12 .005
Mag 5.80 .007 3.77 .035
Source 0.99 .360 1.04 .354
Word prime type Grad 8.01 .001 6.18 .005
Mag 5.61 .009 4.46 .021
Source 3.77 .038 3.64 .039
Word-word > word Grad 2.22 .019 2.11 .023
Mag 3.33 .002 2.79 .006
Source 2.66 .007 2.51 .010
Word-word > pseudo-word Grad 3.70 <.001 3.60 <.001
Mag 2.64 .008 2.33 .015
Source 1.26 .110 1.39 .089
Pseudo prime type Grad 0.67 .514 0.57 .564
Mag 0.80 .446 0.37 .681
Source 1.12 .326 1.23 .300
Word-word, word-primed word; pseudo-word, pseudoword-primed word. Grad, gradiometers; Mag, magnetometers. 960
961
44
Figure Legends 962
Figure 1. Illustration of neural predictions based on competitive-selection and 963
predictive-selection models respectively for recognition of a word (hygiene) or 964
pseudoword (hijure) that is unprimed or primed by a similar-sounding word (hijack) or 965
pseudoword (higent). A. In a competitive-selection model, such as TRACE 966
(McClelland & Elman, 1986), word recognition is achieved through within-layer 967
lexical competition. B. Illustration of the competitive-selection procedure for word 968
(e.g. hygiene) and pseudoword (e.g. hijure) recognition. Phoneme input triggers the 969
activation of multiple words beginning with the same segments, which compete with 970
each other until one word is selected. No word can be selected when hearing a 971
pseudoword, though it would be expected that lexical probability (although not lexical 972
entropy) should be greater for words than for pseudowords. C. Illustration of neural 973
predictions based on lexical entropy. Lexical entropy gradually reduces to zero as 974
more speech is heard. Before the deviation point (hereafter DP) at which the prime 975
(hijack) and target (hygiene) diverge, these items are indistinguishable, and 976
competitor priming should transiently increase lexical entropy (shaded area). After 977
the DP, competitor priming should not affect entropy since prime and target words 978
can be distinguished. D. In a predictive-selection model such as the Predictive-979
Coding account (PC, Davis & Sohoglu, 2020), words are recognised by minimising 980
prediction error, which is calculated by subtracting the predicted segments from the 981
current sensory input. E. Illustration of the predictive-selection procedure during word 982
(e.g. hygiene) and pseudoword (e.g. hijure) recognition. Speech input evokes 983
predictions for the next segment (based on word knowledge as in B), which is then 984
subtracted from the speech input and used to generate prediction errors that update 985
lexical predictions (+ shows confirmed predictions that increase lexical probability, - 986
45
shows disconfirmed predictions that decrease lexical probability). F. Illustration of 987
neural predictions based on segment prediction error. Before the DP, priming of 988
initial word segments should strengthen predictions and reduce prediction error. 989
There will also be greater mismatch between predictions and heard speech for 990
competitor-primed words and hence primed words should evoke greater prediction 991
error than unprimed words (shaded area). This increased prediction error should still 992
be less than that observed for pseudowords, which should evoke maximal prediction 993
error regardless of competitor priming due to their post-DP segments being entirely 994
unpredictable. 995
996
Figure 2. Experimental design. A. Four different types of prime-target pairs. Each 997
pair was formed by two stimuli from the same quadruplet, separated by between 20 998
to 76 trials of items that do not share the same initial segments. B. Lexical decision 999
task. Participants made lexicality judgments to each item they heard via left hand 1000
button-press. The response time was recorded from the onset of the stimuli. As 1001
shown, items within each quadruplet are repeated after a delay of ~1-4 minutes 1002
following a number of other intervening stimuli. 1003
1004
Figure 3. Stimuli and post-test gating study results. A. Stimuli within the same 1005
quadruplet have identical onsets in STRAIGHT parameter space (Kawahara, 2006) 1006
and thus only diverge from each other after the deviation point (DP). MEG responses 1007
were time-locked to the DP. B. Stimuli length histogram. C. Bayes factor for chance 1008
level accuracy (BF01) at each post-DP alignment point of the stimuli in the post-test 1009
gating study. D. Mean rating score at each post-DP alignment point of the stimuli in 1010
the gating study. 1011
46
1012
Figure 4. Response time results (A) and accuracy results (B) of the lexical decision 1013
task. Bars are color-coded by lexicality and prime type on the x axis (words, blue 1014
frame; pseudowords, orange frame; unprimed, no fill; primed by same lexicality, 1015
consistent fill and frame colors; primed by different lexicality, inconsistent fill and 1016
frame colors). Bars show the subject grand averages, error bars represent ± within-1017
subject SE, adjusted to remove between-subjects variance (Cousineau, 2005). 1018
Statistical significance is shown based on generalised linear mixed-effects 1019
regression: * p<0.05, ** p<0.01, *** p<0.001. Statistical comparisons shown with 1020
solid lines indicate the lexicality by prime-type interaction and main effects of prime-1021
type for each lexicality, whereas comparisons with broken lines indicate the 1022
significance of pairwise comparisons. 1023
1024
Figure 5. Pre-DP results. A & B. Pre-DP response difference between items that are 1025
unprimed and primed by a word in MEG gradiometer sensors within -150 to 0ms (a 1026
time window at which words and pseudowords are indistinguishable). The 1027
topographic plots show F-statistics for the entire sensor array with the scalp locations 1028
that form two statistically significant clusters highlighted and marked with black dots. 1029
Waveforms represent MEG response averaged over the spatial extent of the 1030
significant cluster shown in the topography. The grey shade of waveforms represents 1031
± within-participant SE, adjusted to remove between-participants variance 1032
(Cousineau, 2005). C. ROI analysis of neural responses evoked by unprimed and 1033
primed items averaged over the same pre-DP time period of -150-0ms but across 1034
gradiometer sensor locations which showed the post-DP pseudoword>word lexicality 1035
effect (see Figure 5A). Bars are color-coded by prime type on the x axis (unprimed 1036
47
items, no fill; word-primed items, blue; pseudoword-primed items, orange; black 1037
frame indicates that words and pseudowords are indistinguishable). All error bars 1038
represent ± within-participant SE, adjusted to remove between-participant variance. 1039
Statistical significance: * p<0.05. 1040
1041
Figure 6. Post-DP results showing lexicality effects and corresponding ROI 1042
responses evoked by conditions of interest. A & B. Post-DP lexicality effects in MEG 1043
gradiometer and magnetometer sensors. The topographic plots show the statistically 1044
significant cluster with a main effect of lexicality (pseudoword > word). Waveforms 1045
represent MEG response averaged over the spatial extent of the significant cluster 1046
shown in the topography. The grey shade of waveforms represents ± within-1047
participant SE, adjusted to remove between-participants variance. C. Statistical 1048
parametric map showing the cluster (pseudoword > word) rendered onto an inflated 1049
cortical surface of the Montreal Neurological Institute (MNI) standard brain 1050
thresholded at FWE-corrected cluster-level p < 0.05, localised to the left STG. D, E & 1051
F. Post-DP ROI ANOVA on neural signals and source strength evoked by conditions 1052
of interest averaged over the time window and scalp locations of the significant 1053
cluster shown in panel A, B & C. Bars are color-coded by lexicality and prime type on 1054
the x axis (words, blue frame; pseudowords, orange frame; unprimed, no fill; primed 1055
by same lexicality, consistent fill and frame colors; primed by different lexicality, 1056
inconsistent fill and frame colors). All error bars represent ± within-participant SE, 1057
adjusted to remove between-participants variance. Statistical significance from 1058
ANOVAs: * p<0.05, ** p<0.01, *** p<0.001. Statistical comparisons shown with solid 1059
lines indicate the lexicality by prime-type interaction and main effects of prime-type 1060
48
for each lexicality, whereas comparisons with broken lines indicate the significance 1061
of planned pairwise comparisons. 1062
1063
Figure 7. Post-DP results showing lexicality-by-priming interaction effects in MEG 1064
gradiometers. A. The topographic plot shows F-statistics for the statistically 1065
significant cluster that showed an interaction between lexicality and prime type. 1066
Waveforms represent gradiometer responses averaged over the spatial extent of the 1067
significant cluster shown in the topography. The grey shade of waveforms represents 1068
± within-participant SE, adjusted to remove between-participants variance. B. 1069
Gradiometer signals evoked by conditions of interest averaged over temporal and 1070
spatial extent of the significant cluster in panel A. All error bars represent ± within-1071
participant SE, adjusted to remove between-participants variance. Statistical 1072
significance: ** p<0.01. The statistical comparison lines indicate main effects of 1073
prime type for each lexicality. The lexicality by prime-type interaction is statistically 1074
reliable as expected based on the defined cluster. 1075
1076
Figure 8. Single-trial linear mixed-effect models which accounted for random 1077
intercepts and slopes for participants and stimuli sets (grouped by initial segments) 1078
were constructed to compute the relationship between RTs and magnetometers on 1079
(A) lexicality difference (i.e. between pseudowords and words, collapsed over 1080
unprimed and primed conditions) and (B) competitor priming difference (i.e. between 1081
word-primed word and unprimed word conditions). Magnetometer responses were 1082
averaged over the time window and scalp locations of the significant post-DP 1083
pseudoword>word cluster (see Figure 6). β1 refers to the model slope, β0 refers to 1084
the model intercept. Statistical significance: *** p<0.001. 1085
49
1086
1087
Phonemes
AcousticFeatures
PredictiveSelection
(PC)
CompetitiveSelection(TRACE)
Phoneme Input: /ai/ /i:/ /n/
/æ/ -/e/ -… -/ai/ +… -/ / -… -/ / -
/d/ -… -/f/ -/k/ -… -
/d / +… -
/n/ +
DP
/n//i://d //ai//h/
/n//i:// //ai//h/Pre-DP
C
E
F
Lexi
cal E
ntro
py
Pred
ictio
n Er
ror
A
/æ/ -/i:/ +
Word
Phonemes
AcousticFeatures
Words
D
(1)
(1)
(1)
Words
LexicalCompetition
Top DownPredictions
PredictionError
Pre-DP
Post-DP
Post-DP
/ai/ /i:/ habithack…healthhelp…hijack…hobby…hygiene…
hide…high…hijackhike…hybridhydrate…hygiene…
hygienehijack
hygiene hygiene
DP
BWord(1)
Competitor Priming
Competitor Priming
/h/
/h/ /n/
DP
DP
WordUnprimed
Primed by Word hijack
Primed by Pseudoword higent
WordUnprimed
Primed by Word hijack
Primed by Pseudoword higent
High
Low
Prediction Error:
/h/ /j/ / /
? -/æ/ -/i:/ -
DP
/h/ /ai/ habithack…healthhelp…hijack…hobby…hygiene…
hide…high…hijackhike…hybridhydrate…hygiene…
hygienehijack
/j/‘hiju...’
/ /‘hijure’
DPPseudoword
Pseudoword
(2)
(2)
(2)
(2)
/ //j//d //ai//h/
Pred
ictio
n Er
ror
Pre-DP
/ // j// //ai//h/Pre-DP
Lexi
cal E
ntro
py
Post-DP
LexicalEntropy
High
Low
Competitor Priming
Post-DP
DP
/æ/ -/e/ -… -/ai/ +… -/ / -… -/ / -
/d/ -… -/f/ -/k/ -… -
/d / +… -
/ai/
/j/
DP
Pseudoword
Primed by Pseudoword higent
Primed by Word hijack
Unprimed
/
Pseudoword
Primed by Pseudoword higent
Primed by Word hijack
Unprimed
20-76 trials
A hijack æk/
hygiene in/
basef ef/
basis /
boycott /
boymid/
letto/let /
lettan/let /
Prime Target B
W = Word P = Pseudoword
hijack
1 2 3 4 50
lettoHear
Response
Time (Sec)
or oro
Trial Type Prime Word
Prime Pseudoword
123 124 1256 126
lettan
oro
Target Pseudoword
180 181 182 183
hygiene
oro
Target Word
... ... ... ...
W P P W
Strong Evidence f ev racy)
Strong Evidence for H1
1/100
1/30
1/10
1/3
1
3
10
30
100
DP DP+75ms DP+150ms DP+225ms DP+300ms
Baye
s Fa
ctor
Incorrect and confident
Correct and confident1
2
3
4
5
6
Mea
n R
atin
g Sc
ore
C D
A
0
40
Full Stimuli Length
0
40Pre-DP Length
0 100 200 300 400 500 600 700 800 900 10000
40 Post-DP Length
Time (ms)
B Deviation Point (DP)
l e tt er
l e tt uce
l e tt o
l e tt an
No.
of I
tem
s
DP DP+75ms DP+150ms DP+225ms DP+300ms
0 100 200 300 400 500 600Time (ms)
0.05
0.10
0.15
0.20
UnprimedWord
(Word Prime)Word Target
(Pseudo Prime)Word Target
Unprimed Pseudo
(Pseudo Prime)Pseudo Target
(Word Prime)Pseudo Target
1000
1050
1100
1150
1200
1250
Res
pons
e Ti
mes
(ms)
A
*** **
ns
**
0
Erro
r Rat
e
B
UnprimedWord
(Word Prime)Word Target
(Pseudo Prime)Word Target
Unprimed Pseudo
(Pseudo Prime)Pseudo Target
(Word Prime)Pseudo Target
**
*
Unprimed
Primed by Same
Lexicality
Primed by Different
Lexicality
Word
Pseudoword
**
**
ns***
0
10
20
30
40
50
~ -28 to -4ms
-400 -300 -200 -100 0 100
p = 1 p < .001
Unprimed ItemsWord-Primed Items
AA
mpl
itude
(fT/
cm)
Time (ms)
0
5
10
15
20
25
Am
plitu
de(fT
/cm
)
* *
B
Unprimeditems
Word-PrimedItems
Pseudoword-Primed Items
Pre-DP ROI Responses in Gradiometers
Unprimed - Primed by Word Pre-DP Responses in Gradiometers
DP
0
10
20
30
40
50
Am
plitu
de(fT
/cm
)
* ***
0
20
40
60
80
100
120
Am
plitu
de(fT
)
0
0.05
0.10
0.15
0.20
0.25
Sour
ce S
treng
th (a
rbitr
ary
units
)
Lexicality Effect in Magnetometers
Lexicality Effect in Source Space Localised to
Lexicality Effect in Gradiometers
Post-DP ROI Responses in Gradiometers
D
**
**
the Superior Temporal Gyrus (STG)
ns
Unprimed
Primed by Same
Lexicality
Primed by Different
Lexicality
Word
Pseudoword
UnprimedWord
(Word Prime)Word Target
(Pseudo Prime)Word Target
Unprimed Pseudo
(Pseudo Prime)Pseudo Target
(Word Prime)Pseudo Target
UnprimedWord
(Word Prime)Word Target
(Pseudo Prime)Word Target
Unprimed Pseudo
(Pseudo Prime)Pseudo Target
(Word Prime)Pseudo Target
UnprimedWord
(Word Prime)Word Target
(Pseudo Prime)Word Target
Unprimed Pseudo
(Pseudo Prime)Pseudo Target
(Word Prime)Pseudo Target
A
B
C
E Post-DP ROI Responses in Magnetometers
F Post-DP ROI Responses in Source Space
60
70
**
ns
** **
**
ns
**
ns
0
10
20
30
40
50
-1000 -500 0 500 1000
DP~313-956ms
Am
plitu
de (f
T/cm
)
Time (ms)
10.9
-10.9
fT/cm
Word
Pseudoword
Unprimed
Primed by Same
Lexicality
Primed by DifferentLexicality
0
DP~359-990ms
78.3
-78.3
fT
Am
plitu
de (f
T)
Time (ms)-1000 -500 500 1000
-40
0
40
80
120
*
-1000 0 500 10000
10
20
30
40
50DP~397-437ms
p = 1 p < .001
-500
Am
plitu
de (f
T/cm
)
0
10
20
30
40
50
Am
plitu
de (f
T/cm
)
UnprimedWord
(Word Prime)Word Target
(Pseudo Prime)Word Target
Unprimed Pseudo
(Pseudo Prime)Pseudo Target
(Word Prime)Pseudo Target
Unprimed
Primed by Same
Lexicality
Primed by DifferentLexicality
Word
Pseudoword
A B
Word
Pseudoword
ns **
Unprimed
Primed by Same
Lexicality
Primed by DifferentLexicality
Time (ms)
Lexicality-by-Priming Interaction in Gradiometers Interaction ROI Responses in Gradiometers
~
A
count
Association on Competitor Priming Difference (Word-Primed Word - Unprimed Word)
Association on Lexicality Difference (Pseudoword - Word)
Lexi
calit
y D
iffer
ence
in
Mag
neto
met
ers
Am
plitu
de (f
T)
78.3fT
count
Competitor Priming Difference in RTs (ms)
Com
petit
or P
rimin
g D
iffer
ence
in
Mag
neto
met
ers
Am
plitu
de (f
T)
Lexicality Difference in RTs (ms)
B Post-DP Pseudoword > Word Cluster in Magnetometers
β0 ***
β1 *** β1 ***
β0 ns