jelena proki´c michael cysouwcysouw.de/home/presentations_files/cysouw_prokicichl.pdf · jelena...
TRANSCRIPT
String comparisonPatterns of sound changes
Conclusions
Automatic detection of patterns of sound changes
Jelena Prokic Michael Cysouw
Ludwig-Maximilians-Universitat Munchen
20th International Conference on Historical LinguisticsOsaka, July 25-30, 2011
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Overview
1 String comparison
2 Patterns of sound changes
3 Conclusions
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Levenshtein distance
One of the most successful methods to determine sequence distance
(Levenshtein, 1964)
biological molecules, software engineering, ...
Levenshtein distance: minimum number of insertions, deletions and
substitutions to transform one string into the other
Syllabicity constraint add: vowels never substitute for consonants
m O @ l k @
m E l @ k
1 1 1 1
One of the most commonly used methods in quantitative language
comparison, including automatic approaches to historical linguistics
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Positive aspects
Allows automatic alignment of the strings
Very fast and easy to implement
Gives good general picture of the relatedness between languagevarieties
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Negative aspects
Usually based on 0/1 segment differences
Yields too little insight into the linguistic basis of differences
There is no model of linguistic change
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Multisequence Alignment
We first need to align not only pairs of strings, but large sets.Gusfield “holy grail of string algorithms”
—no perfect solution
Softwares for automatic multisequence alignment in linguistics:Prokic et al. (2009)Johann-Mattis (2010)Steiner et al. (2011)
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Example of multi-aligned strings
Aldomirovtsi: v "e>tS e r d - n "o l "7 s n o
Asparuhovo-Lom: v "e>tS e r d - n "o l "e s n o
Asparuhovo-Prov: vj "e>tS @ r d "7 n u lj "e s n u
Babyak: v "e>tS e r d - n "o ? ? ? ? ?
Bachkovo: v "e >ts e r d "A n u lj "e s n u
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
23 24 25 26 27
41.5
42.0
42.5
43.0
43.5
44.0
X
Y
0
5
10
15
20
25
Frequency of @
23 24 25 26 27
41.5
42.0
42.5
43.0
43.5
44.0
X
Y0
5
10
15
20
25
30
Frequency of A
23 24 25 26 27
41.5
42.0
42.5
43.0
43.5
44.0
X
Y
0
5
10
15
20
25
Frequency of e
23 24 25 26 27
41.5
42.0
42.5
43.0
43.5
44.0
X
Y0
5
10
15
20
25
Frequency of i
23 24 25 26 27
41.5
42.0
42.5
43.0
43.5
44.0
X
0
1
2
3
4
5
6
Frequency of m_j
23 24 25 26 2741.5
42.0
42.5
43.0
43.5
44.0
X
14
15
16
17
18
19
20
Frequency of m
String comparisonPatterns of sound changes
Conclusions
Example of multi-aligned strings
Aldomirovtsi: v "e>tS e r d - n "o l "7 s n o
Asparuhovo-Lom: v "e>tS e r d - n "o l "e s n o
Asparuhovo-Prov: vj "e>tS @ r d "7 n u lj "e s n u
Babyak: v "e>tS e r d - n "o ? ? ? ? ?
Bachkovo: v "e >ts e r d "A n u lj "e s n u
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Do various word positions vary at different rate?
Yes (not surprising)
Can we measure that?
Yes: Shannon index
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Shannon index
Information entropy of the distribution
H = −i=1�
S
pi ln pi
S - number of phones, pi - relative abundance of phone i .
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Variability under the ”Levenshtein” model
0 200 400 600 800
0.6
0.8
1.0
1.2
1.4
word positions
variability
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Actual variability
word positions
variability
0.0
0.5
1.0
1.5
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Variability of vowels and consonants
c mt v vs
0.0
0.5
1.0
1.5
2.0
sw in
dex
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Variability: onset vs. nucleus vs. coda
c n o
0.0
0.5
1.0
1.5
2.0
sw in
dex
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Computing highly correlated positions
How regular are sound changes?
Completely identical positions were not found in the data set
Word positions that show high correlation can be identified using
Mutual Information (MI)
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Mutual Information
The mutual information of two random variables is a quantity thatmeasures the mutual dependence of the two variables.
I =�
y∈Y
�
x∈X
p(x , y) logp(x , y)
p(x)p(y)
p(x,y) is the joint probability distribution function of X and Y, and p(x)and p(y) are the marginal probability distribution functions of X and Yrespectively.
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
02
4
02
4
String comparisonPatterns of sound changes
Conclusions
Computing highly correlated positions
word positions
variability
0.0
0.5
1.0
1.5
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
04
8
i- b_j“e bi
“7-
-tSi ni “7r
d_Zi
ib_jb_j“AZd d_j@@w
v“i j-
u- z--@ -u “7-
@j b_j“e be
“7-
-ts\@ n“e “Ar
z\@
@b_jb_j“ez\g deex
v“i j@-f -v
v@ v“o @vej b“e b“e
-A
AtSe n_je -“r_=
Ze
ebb“e:Zd d“o:“o:-
v“i -e
u- z-
-i v“o Av
04
8
mi r“7 r“ASt
“7l “7ln`kk@@v
v----“7 r_j@
@v_jv_j“A
di uv_jr_j@
“7- di u--@m_j@ r- r“A -t
“Al_j “Aln`kk@@v
v----“A r_j@
@v_jv_j“e
d_j@ uv_jr_j@
“At de @xx@me “r_=- r“ASt
“u- “u-n`kkAA-
-uunn“e re
evv“e
de ovre
“o- J\e o--A
04
8
“7- s_j@d_j“e“etid_Zd_Z“op-
k@
r_j“A “7rr-i- je
-i umm“i“ic
cZZiil_jl_j“A
@- s_j@d“e“etez\z\“ep-
k@
r_j“e “Arr-@d -i -u umm“i
“ik
kz\z\“e“elle
em se deet“ed_Zd_Z“ep-
kA
r“e -“r_=“r_=-ed -e -e -mm“i
“ik
kZZeell“e
04
8
“uwZ“e“ell-
veimm--“e
-imi @k l_j“u -“e r“7“7- @d_jd_j“A ul_j St jkk@ si
“ufz\“e“ell-
v_j“eemml_jl_j“A
him_j@ en_j l_j“u j“e r--- @d_jd_j“e ul_j s\t jcc@ se
“if Z----“7
ve“emmn_jn_jA
vume ik l_j“u j“e ---“r_= Add“e ol St -ccA se
String comparisonPatterns of sound changes
Conclusions
Results
Each position was compared to all others
Highly correlated positions were linked into groups
Four groups of highly correlated positions were detected
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Cluster 1: back vowels
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Cluster 1: back vowels
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Cluster 2: front vowels
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Cluster 3: the ‘yat’ vowel
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Cluster 4: presence of velar fricative
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes
String comparisonPatterns of sound changes
Conclusions
Conclusions
Occurrence of geographical patterns from the data
MI is not content dependent, but highly correlated columns have
similar phonetic content
This approach enables us to automatically identify layers of sound
changes
Jelena Prokic, Michael Cysouw Automatic detection of patterns of sound changes