LIN 3098 Corpus Linguistics: Lecture 7
Albert Gatt
In this lecture
We look at some ways in which corpora can be useful in morphological research.
Main focus: morphological productivity
Part 1
Morphology, corpora and productivity
Productivity in linguistics
The term “productivity” is used in a wide variety of contexts.
Syntactic rules are “productive” in the sense that they can be used to generate new phrases.
The same can be said of some morphological rules.
A definition of productivity
A linguistic process is productive if it can be used to produce novel forms.
If a rule is productive, then:
- novel (previously unheard) forms can be understood and produced;
- there is no need to store all forms in the mental lexicon.
A couple of examples
Imagine an English adjective garmy. How would you derive a noun from this adjective?
- Many speakers might say garminess.
- This suggests that –ness suffixation is a productive derivational process.
Imagine a Maltese verb intoffa. How would you produce a noun from it?
- Speakers might say intoffar, intoffament or intoffazzjoni.
- This suggests that –ar and –ment suffixation are productive derivational processes in Maltese.
Productive vs non-productive
Some morphological processes or categories seem to have greater potential to form new words than others.
- e.g. English –able, –ness
- compare English –th (warmth, strength…), which is much less productive
Classical approaches to productivity
Jackendoff (1975): morphological rules are called redundancy rules.
- They capture the relationship between related forms:
  - e.g. warm → warmth (ADJ → N via addition of –th)
  - e.g. desire → desirable (N → ADJ via addition of –able)
- If a rule is productive, then it can be used to create novel forms, e.g. adjectives with –able can be produced “online”.
Features of classical approaches
1. They rely on a binary distinction (productive vs. unproductive).
2. Productive rules are typically regular, and sub-regularities are not given much consideration (Dressler 2003).
3. Most of these approaches do not look at corpus data.
Productive vs regular
Usually, productive morphological rules are regular.
Irregular forms are likely to be stored in the lexicon.
However, we can sometimes detect “sub-regularities”: sing-sang, ring-rang, bring-brang (?)
Speakers can sometimes generalise these sub-regular processes, perhaps by analogy. What’s the past tense of tring or spling?
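The analogical generalisation of a sub-regular pattern can be sketched as a toy rule. This is only an illustration, not a model of how speakers actually generalise: the function name `past_by_analogy` and the single -ing to -ang pattern are assumptions made for the example.

```python
def past_by_analogy(verb):
    """Toy analogy with sing-sang / ring-rang: final -ing becomes -ang.

    Verbs that do not end in -ing fall back to regular -ed suffixation.
    """
    if verb.endswith("ing"):
        return verb[:-3] + "ang"
    return verb + "ed"


# Nonce verbs from the slide.
for nonce in ("tring", "spling"):
    print(nonce, "->", past_by_analogy(nonce))
```

Under this toy rule the nonce verbs come out as trang and splang, which is one answer a speaker generalising by analogy might give.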
“Possible” vs “attested”
Our tentative definition of productivity focuses on the production of novel forms.
By definition, novel forms are:
- possible words of the language;
- previously unattested.
This would suggest that we can’t use corpora to study productivity, since corpora only contain attested forms.
The problem of frequency
Suppose we find that a corpus contains lots of words ending in some suffix –X.
This doesn’t necessarily imply that the –X suffix is productive.
It could have been productive in the past, but no longer be so. In that case, the likelihood of a new word ending in –X is low, despite the high frequency.
Getting around the problem
Frequency can’t give us all the answers. However, one interesting solution is to look at hapax legomena.
A corpus will usually contain lots of words occurring only once.
We can think of hapaxes as “one-offs”.
It seems likely that some hapaxes will be “new formations”.
NB: We can only make this assumption if the corpus is very large.
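Finding the hapax legomena in a tokenised corpus is a simple counting exercise. The sketch below assumes the corpus is already available as a list of word tokens; the function name `hapax_legomena` and the toy sentence are made up for illustration, and a real study would need a very large corpus, as noted above.

```python
from collections import Counter


def hapax_legomena(tokens):
    """Return the set of word types that occur exactly once in `tokens`."""
    counts = Counter(tokens)
    return {word for word, n in counts.items() if n == 1}


# Toy "corpus" (hypothetical data, far too small for real conclusions).
tokens = "the warmth of the sun and the greenness of the garmy hills".split()
print(sorted(hapax_legomena(tokens)))
```

Here every word except "the" and "of" is a hapax, which shows why a small sample says little: in a tiny corpus almost everything occurs once.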
Corpus-based approaches
View productivity as a gradable phenomenon:
- some forms become ingrained through frequent usage
- a category can still be productive to some extent
- productivity is estimated in terms of a category’s potential to produce new forms
- this can account for sub-regularities: the productivity of a category is due to many factors, including analogy to existing words
The continuum
Productive processes tend to:
- be compositional
- result in a lot of new words
[Diagram: a continuum running from “productive morphological process” (e.g. ADJ + –ness → Noun) to “lexicalised word” (e.g. ADJ + –th → Noun)]
Why is productivity interesting?
No finite lexicon can contain all the words of a language at a certain time.
- Productive processes can be exploited to parse new/unseen lexical items.
- This is helped by the compositionality of productive processes.
Productivity can also help to distinguish creative neologism from systematic rule-application. Compare:
- well-defined, well-intentioned, well-specified: lots of adjectives with a well- prefix
- YouTube: a one-off
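One way to picture how a productive, compositional process helps with unseen items is a small suffix-stripping sketch. Everything here is assumed for illustration: the suffix list, the mini-lexicon, the function name `parse_unseen`, and the crude handling of the y/i spelling change. A real morphological parser would be considerably more careful.

```python
PRODUCTIVE_SUFFIXES = ("ness", "able")  # assumed list, for illustration only


def parse_unseen(word, lexicon):
    """Try to analyse `word` as a known base plus a productive suffix.

    Returns a (base, suffix) pair, or None if no analysis is found.
    """
    for suffix in PRODUCTIVE_SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Also try undoing the y -> i spelling change (garmy -> garmi-ness).
            candidates = [stem]
            if stem.endswith("i"):
                candidates.append(stem[:-1] + "y")
            for base in candidates:
                if base in lexicon:
                    return base, suffix
    return None


lexicon = {"garmy", "warm", "kind"}  # hypothetical mini-lexicon
print(parse_unseen("garminess", lexicon))
print(parse_unseen("kindness", lexicon))
print(parse_unseen("youtube", lexicon))
```

The first two words receive a base-plus-suffix analysis, while a one-off such as "youtube" receives none under these assumptions.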
Theoretical implications
- Productivity raises interesting questions about the relationship between corpus-based measures and psycholinguistic data.
- The likelihood of a morphological process being applied depends on style, genre, speech community…
- Productivity can give an indication of language change over time (some processes are fossilised, others become more productive).
Statistical measures of productivity (Baayen 2006)
What we need
A measure of productivity of a process/category C should reflect:
- our intuitions about how frequently we encounter C
- how easily native speakers can form new words using C
Is it easier to produce a noun with –th (like warmth) or one with –ness (like goodness)?
An analogy
We can compare morphological processes to companies.
All try to dominate a market where the number of clients (words) is limited.
Productivity reflects the extent to which these companies:
- have managed to dominate in the past (how many words they’ve formed)
- are expanding into new areas of the market (how many new words they’re forming)
- may expand in the future (how many as yet unseen words they’re going to form)
Realised productivity (RP)
Given a morphological category C, RP gives a rough indication of the past utility of C in forming new words.
Measured as the number of distinct types in C in a corpus of size N
E.g. regular past tense –ed displays many more types than sub-regular forms such as keep-kept/sleep-slept
Realised productivity cont/d
Why types, not tokens?
- Productive processes have lots of types which are hapaxes, or are very infrequent (low token frequency).
- Words formed from irregular processes tend to be very frequent (high token frequency).
Some limitations:
- a high RP for a category does not imply that it will keep forming lots of new words
- RP is heavily dependent on corpus size
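Realised productivity, as a count of distinct types, can be sketched as follows. The membership test `ends_in_ness` is a deliberately crude stand-in for real morphological analysis (plain string matching would also accept non-derived words such as "witness"), and both function names and the toy token list are assumptions made for the example.

```python
def realised_productivity(tokens, in_category):
    """RP: the number of distinct types in `tokens` that belong to category C."""
    return len({t for t in tokens if in_category(t)})


def ends_in_ness(word):
    # Crude, assumed membership test for the -ness category.
    return word.endswith("ness") and len(word) > len("ness")


# Toy token list (hypothetical data).
tokens = "goodness goodness goodness kindness warmth warmth garminess the the".split()

ness_tokens = sum(1 for t in tokens if ends_in_ness(t))
print("types (RP):", realised_productivity(tokens, ends_in_ness))
print("tokens:", ness_tokens)
```

The type count (3) is lower than the token count (5) because the repeated occurrences of "goodness" are counted only once, which is the point of measuring RP in types.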
Expanding productivity (P*)
P* gives a rough indication of the rate of expansion of C.
It focuses on the number of hapaxes produced using C in the corpus.
Also known as hapax-conditioned productivity.
$$P^{*} = \frac{\text{no. of hapaxes formed using } C}{\text{total no. of hapaxes in corpus}}$$
NB: P* is still heavily dependent on corpus size!
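The definition of P* as a ratio of hapax counts can be sketched in a few lines. The function name `expanding_productivity`, the suffix-based membership test, and the toy token list are assumptions for illustration; returning 0.0 for a corpus with no hapaxes is also a choice made here, not something the slides specify.

```python
from collections import Counter


def expanding_productivity(tokens, in_category):
    """P*: hapaxes formed using C, divided by all hapaxes in the corpus."""
    counts = Counter(tokens)
    hapaxes = [w for w, n in counts.items() if n == 1]
    if not hapaxes:
        return 0.0  # assumed convention for a corpus without hapaxes
    return sum(1 for w in hapaxes if in_category(w)) / len(hapaxes)


# Toy token list (hypothetical data).
tokens = ["goodness", "goodness", "kindness", "garminess",
          "warmth", "warmth", "sun", "the", "the"]

p_star = expanding_productivity(tokens, lambda w: w.endswith("ness"))
print(p_star)
```

In this toy list the hapaxes are "kindness", "garminess" and "sun", and two of the three belong to the -ness category.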
Potential productivity (P)
Gives an indication of how likely a category C is to form new words in future, i.e. whether C may already be saturated.
Also known as category-conditioned productivity.
$$P = \frac{\text{no. of hapaxes formed using } C}{\text{total no. of tokens formed using } C}$$
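Potential productivity divides the hapaxes of a category by that category's own token count. A minimal sketch, again with an assumed function name (`potential_productivity`), assumed suffix-based membership tests, hypothetical data, and an assumed convention of returning 0.0 when the category does not occur at all:

```python
from collections import Counter


def potential_productivity(tokens, in_category):
    """P: hapaxes formed using C, divided by all tokens formed using C."""
    counts = Counter(tokens)
    category_tokens = sum(n for w, n in counts.items() if in_category(w))
    if category_tokens == 0:
        return 0.0  # assumed convention when C is absent from the corpus
    category_hapaxes = sum(
        1 for w, n in counts.items() if n == 1 and in_category(w)
    )
    return category_hapaxes / category_tokens


# Toy token list (hypothetical data).
tokens = ["goodness", "goodness", "kindness", "garminess",
          "warmth", "warmth", "sun", "the", "the"]

p_ness = potential_productivity(tokens, lambda w: w.endswith("ness"))
p_th = potential_productivity(tokens, lambda w: w.endswith("th"))
print(p_ness, p_th)
```

With this data the -ness category has 2 hapaxes among 4 tokens, while -th has 2 tokens and no hapaxes, so the toy values point in the same direction as the intuition that -ness is the more productive of the two.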
Some more on P
Unlike RP and P*, P is not very sensitive to corpus size as such.
However, it is very sensitive to the frequency of the category, e.g. if C is realised only once in a corpus of size N, then P = 1!
Recent empirical work has shown that RP and P* may correlate very strongly, but both exhibit a weak correlation with P (Vegnaduzzo 2009):
- the pattern non-X has high RP and P*, but low P
- the pattern X-ish has low RP and P*, but high P
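The single-occurrence edge case can be worked through directly. The symbols $n_1(C)$ (number of hapaxes formed using C) and $N(C)$ (total number of tokens formed using C) are notation assumed here, not taken from the slides. If C is realised by exactly one token, that token is necessarily a hapax, so:

$$P = \frac{n_1(C)}{N(C)} = \frac{1}{1} = 1$$

This is the maximum possible value, reached on the basis of a single observation, which is why P should be read with caution for very rare categories.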
P vs. RP and P*
A category C can have low RP and P*, but high P.
- In this case, C hasn’t been used much in the past, but is being used quite productively at the moment.
- High P corresponds to the “ease” with which new words can be formed using the category.
If a category has high RP, it may still be saturated, and so have low P.
The psycholinguistic connection
1. Rule vs. direct access:
- To produce a word (e.g. illegal), you can either retrieve it directly from storage, or apply the rule on the fly.
- Evidence suggests that the relative frequency of the base form and the derived form is related to which of the two alternatives applies.
The psycholinguistic connection
2. Complexity-based affix ordering:
- Corpus research: more productive affixes follow less productive ones in word formation.
- It seems that more highly predictable (low productivity) affixes are processed first.
- High productivity may also imply less likelihood of entering into further derivational processes.
Works cited
S. Vegnaduzzo (2009). Morphological productivity rankings of complex adjectives. Proc. NAACL-HLT Workshop on Computational Approaches to Linguistic Creativity.
K. Moilanen and S. Pulman (2008). The good, the bad and the unknown: Morphosyllabic sentiment tagging of unseen words. Proc. ACL 2008.
Baayen 2006 linked from web page