statistical nlp spring 2010 lecture 25: diachronics dan klein – uc berkeley
TRANSCRIPT
![Page 1: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/1.jpg)
Statistical NLPSpring 2010
Lecture 25: Diachronics
Dan Klein – UC Berkeley
![Page 2: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/2.jpg)
Evolution: Main Phenomena
Mutations of sequences
Time
Speciation
Time
![Page 3: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/3.jpg)
Tree of Languages
Challenge: identify the phylogeny Much work in
biology, e.g. work by Warnow, Felsenstein, Steele…
Also in linguistics, e.g. Warnow et al., Gray and Atkinson…
http://andromeda.rutgers.edu/~jlynch/language.html
![Page 4: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/4.jpg)
Statistical Inference Tasks
Modern TextAncestralWord Forms
Cognate Groups /Translations
GrammaticalInference
Inputs Outputs
Phylogeny
FR IT PT ES
fuego feu
focus
fuego
feuhuevo
oeuf
les faits sont très clairs
![Page 5: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/5.jpg)
Outline
AncestralWord Forms
Cognate Groups /Translations
GrammaticalInference
fuego feu
focus
fuego
feuhuevo
oeuf
les faits sont très clairs
![Page 6: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/6.jpg)
Language Evolution: Sound Change
camera /kamera/Latin
chambre /ʃambʁ/French
Deletion: /e/
Change: /k/ .. /tʃ/ .. /ʃ/
Insertion: /b/
Eng. camera from Latin, “camera obscura”
Eng. chamber from Old Fr. before the initial /t/ dropped
![Page 7: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/7.jpg)
Diachronic Evidence
tonitru non tonotrutonight not tonite
Yahoo! Answers [2009] Appendix Probi [ca 300]
![Page 8: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/8.jpg)
Synchronic (Comparative) Evidence
![Page 9: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/9.jpg)
Simple Model: Single Characters
CG CC CC GG
CG
C GG
![Page 10: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/10.jpg)
A Model for Words Forms
focus /fokus/
fuego /fweɣo/
/fogo/
fogo /fogo/
fuoco /fwɔko/
µI B
IT ES PT
IB
LAµE S
µI T µP T
[BGK 07]
![Page 11: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/11.jpg)
Contextual Changes
/fokus/
/fwɔko/
f# o
f# w ɔ
…
µI T
![Page 12: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/12.jpg)
Changes are Systematic
/fokus/
/fweɣo/
/fogo/
/fogo//
fwɔko/
/fokus/
/fweɣo/
/fogo/
/fogo//
fwɔko/
/kentru
m/
/sentro
/
/sentr
o/
/sentro
/
/tʃɛntro
/
![Page 13: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/13.jpg)
Experimental Setup
Data sets Small: Romance
French, Italian, Portuguese, Spanish
2344 words Complete cognate sets Target: (Vulgar) Latin
Large: Oceanic 661 languages 140K words Incomplete cognate sets Target: Proto-Oceanic [Blust,
1993]
FR IT PT ES
![Page 14: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/14.jpg)
Data: Romance
![Page 15: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/15.jpg)
Learning: Objective
maxµ
P (µjw1 : : :wL )
/fokus/
/fweɣo/
/fogo/
/fogo//fwɔko/
z
w
![Page 16: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/16.jpg)
Learning: EM
/fokus/
/fweɣo/
/fogo/
/fogo//fwɔko/
/fokus/
/fweɣo/
/fogo/
/fogo//fwɔko/
M-Step Find parameters which
fit (expected) sound change counts
Easy: gradient ascent on theta
E-Step Find (expected) change
counts given parameters
Hard: variables are string-valued
![Page 17: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/17.jpg)
Computing Expectations
‘grass’
Standard approach, e.g. [Holmes 2001]: Gibbs sampling each sequence
[Holmes 01, BGK 07]
![Page 18: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/18.jpg)
A Gibbs Sampler
‘grass’
![Page 19: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/19.jpg)
A Gibbs Sampler
‘grass’
![Page 20: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/20.jpg)
A Gibbs Sampler
‘grass’
![Page 21: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/21.jpg)
Getting Stuck
How to jump to a state where the liquids /r/ and /l/ have a common
ancestor?
?
![Page 22: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/22.jpg)
Getting Stuck
![Page 23: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/23.jpg)
Solution: Vertical Slices
Single Sequence
Resampling
Ancestry Resampling
[BGK 08]
![Page 24: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/24.jpg)
Details: Defining “Slices”
The sampling domains (kernels) are indexed by contiguous subsequences (anchors) of the observed leaf sequences
Correct construction section(G) is non-trivial but very efficient
anchor
![Page 25: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/25.jpg)
Results: Alignment Efficiency
Is ancestry resampling faster than basic Gibbs?
Hypothesis: Larger gains for deeper trees
Depth of the phylogenetic tree
Sum
of P
airs
Setup: Fixed wall time
Synthetic data, same parameters
![Page 26: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/26.jpg)
Results: Romance
![Page 27: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/27.jpg)
Learned Rules / Mutations
![Page 28: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/28.jpg)
Learned Rules / Mutations
![Page 29: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/29.jpg)
Comparison to Other Methods
Evaluation metric: edit distance from a reconstruction made by a linguist (lower is better)
UsOakes
Comparison to system from [Oakes, 2000]
Uses exact inference and deterministic rules
Reconstruction of Proto-Malayo-Javanic cf [Nothefer, 1975]
![Page 30: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/30.jpg)
Data: Oceanic
Proto-Oceanic
![Page 31: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/31.jpg)
Data: Oceanic
http://language.psy.auckland.ac.nz/austronesian/research.php
![Page 32: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/32.jpg)
Result: More Languages Help
Number of modern languages used
Mea
n ed
it di
stan
ceDistance from [Blust, 1993] Reconstructions
![Page 33: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/33.jpg)
Results: Large Scale
![Page 34: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/34.jpg)
Visualization: Learned universals
*The model did not have features encoding natural classes
![Page 35: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/35.jpg)
Regularity and Functional Load
In a language, some pairs of sounds are more contrastive than others (higher functional load)
Example: English “p”/“b” versus “t”/”th”
“p”/“b”: pot/dot, pin/din, dress/press, pew/dew, ...
“t”/”th”: thin/tin
![Page 36: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/36.jpg)
Functional Load: Timeline
Functional Load Hypothesis (FLH): sounds changes are less frequent when they merge phonemes with high functional load [Martinet, 55]
Previous research within linguistics: “FLH does not seem to be supported by the data” [King, 67]
Caveat: only four languages were used in King’s study [Hocket 67; Surandran et al., 06]
Our work: we reexamined the question with two orders of magnitude more data [BGK, under review]
![Page 37: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/37.jpg)
Regularity and Functional Load
Functional load as computed by [King, 67]
Data: only 4 languages from the Austronesian dataM
erge
r pos
terio
r pro
babi
lity
Each dot is a sound change identified by the system
![Page 38: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/38.jpg)
Regularity and Functional Load
Data: all 637 languages from the Austronesian data
Functional load as computed by [King, 67]
Mer
ger p
oste
rior p
roba
bility
![Page 39: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/39.jpg)
Outline
AncestralWord Forms
Cognate Groups /Translations
GrammaticalInference
fuego feu
focus
fuego
feuhuevo
oeuf
les faits sont très clairs
![Page 40: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/40.jpg)
Cognate Groups
/fweɣo/
/fogo/
/fwɔko/
/berβo/
/vɛrbo/
/vɛrbo//
tʃɛntro/
/sentro/
/sɛntro/
‘fire’
![Page 41: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/41.jpg)
Model: Cognate Survival
omnis
-
-
-ogn
i
IT ES PT
IB
LA
+
-
-
-+
IT ES PT
IB
LA
![Page 42: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/42.jpg)
Results: Grouping Accuracy
Fraction of Words Correctly Grouped
Method
[Hall and Klein, in submission]
![Page 43: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/43.jpg)
Semantics: Matching Meanings
day
Occurs with:“night”“sun”
“week”
tag
DE
tag ENEN
Occurs with:“nacht”“sonne”“woche”
Occurs with:“name”“label”“along”
![Page 44: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/44.jpg)
Outline
AncestralWord Forms
Cognate Groups /Translations
GrammaticalInference
fuego feu
focus
fuego
feuhuevo
oeuf
les faits sont très clairs
![Page 45: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/45.jpg)
Grammar Induction
congress narrowly passed the amended bill les faits sont très clairs
Task: Given sentences, infer grammar (and parse tree structures)
![Page 46: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/46.jpg)
Shared Prior
les faits sont très clairs
P (µ̀ jµ) = N (µ;¾2I )
P (µ) = N (0;¾2I )
congress narrowly passed the amended bill
![Page 47: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/47.jpg)
Results: Phylogenetic Prior
Dut
ch
Dan
ish
Sw
edis
h
Spa
nish
Por
tugu
ese
Slo
vene
Chi
nese
Eng
lish
WG NGRMG
IE
GLAvg rel gain: 29%
![Page 48: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/48.jpg)
Conclusion Phylogeny-structured models can:
Accurately reconstruct ancestral words Give evidence to open linguistic debates Detect translations from form and context Improve language learning algorithms
Lots of questions still open: Can we get better phylogenies using these high-
res models? What do these models have to say about the very
earliest languages? Proto-world?
![Page 49: Statistical NLP Spring 2010 Lecture 25: Diachronics Dan Klein – UC Berkeley](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649e5d5503460f94b56173/html5/thumbnails/49.jpg)
Thank you!
nlp.cs.berkeley.edu