![Page 1: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/1.jpg)
State of the art of the Automated Similarity Judgment Program
Søren Wichmann (MPI-EVA & Leiden University)
& The ASJP Consortium
The Swadesh Centenary Conference, MPI-EVA, Jan. 17-18, 2009
![Page 2: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/2.jpg)
Structure of the presentation
• 1. History of the ASJP project
• 2. Basic methodology
• 3. An assessment of the viability of glottochronology
• 4. Identifying homelands
![Page 3: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/3.jpg)
1. History of the ASJP project
• Jan. 2007:
– Cecil Brown (US linguistic anthropologist) comes up with the idea of comparing languages automatically and communicates this to Eric Holman (US statistician) and me. Brown and Holman work on rules to identify cognates, implemented in an “automated similarity judgement program” (ASJP).
• May 2007:
– Cecil Brown is in Leipzig and explains to me what the two of them have come up with, and I begin to take a more active part, adding ideas.
• Aug. 2007:
– Viveka Velupillai (Giessen-based linguist) joins in.
– A first paper is written up (largely by Brown and Holman) showing that the classifications of a number of families based on a 245-language sample conform pretty well with expert classification.
![Page 4: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/4.jpg)
• Sept. 2007:
– Andre Müller (linguist, Leipzig) joins.
– Pamela Brown (wife of Cecil Brown) joins.
– Dik Bakker (linguist, Amsterdam & Lancaster) joins and begins to do automatic data-mining and an implementation in Pascal, and to look at ways to identify loanwords.
• Oct. 2007:
– Hagen Jung (computer scientist, MPI) makes a preliminary online implementation.
– I take over the “administration” of the project.
– A second paper is finished about stabilities of lexical items, defining a shorter Swadesh list, etc.
• Nov. 2007:
– Robert Mailhammer (linguist, Germany) joins.
• Dec. 2007:
– Anthony Grant (linguist, GB) joins.
– Dmitry Egorov (linguist, Kazan) joins.
– Levenshtein distances are implemented instead of the old “matching rules” for identifying cognates.
![Page 5: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/5.jpg)
• Jan. 2008:
– Kofi Yakpo (linguist) joins.
• Feb. 2008:
– The two papers are accepted for publication without revision (in Sprachtypologie und Universalienforschung and Folia Linguistica, respectively).
• April 2008:
– Oleg Belyaev (linguist, Moscow) joins.
• 2008:
– Papers presented at conferences in Tartu, Helsinki, Cayenne, Forli, and Amsterdam.
– Work on the structure of phylogenetic trees, glottochronology, onomatopoeic phenomena, homelands.
• Jan. 2009:
– Paper accepted for Linguistic Typology.
– The database expanded to hold around 2500 languages. Another 1000 or so in the pipeline.
![Page 6: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/6.jpg)
6000+ Languages in the world
2432 fully processed languages in the ASJP database (~1000 are in the pipeline)
![Page 7: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/7.jpg)
2. Basic Methodology
![Page 8: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/8.jpg)
The database
• Encoding: a simplifying transcription
• Contents: 40-item lists
![Page 9: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/9.jpg)
Transcriptions
• 7 vowel symbols
• Nasalization indicated but not length, tone, stress
• Some rare distinctions merged
• “Composite” sounds indicated by a modifier
• Vx sequences where x = velar-to-glottal fricative, glottal stop or palatal approximant reduced to V
![Page 10: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/10.jpg)
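| Item | ASJP code | IPA |
|---|---|---|
| 30. Blood | hw~ate | hwáte |
| 31. Bone | Ciyak | ʧija:k |
| 51. Breast | XXX | XXX |
| 66. Come | miyuwa | mijúwa |
| 61. Die | pika | pí:ka |
| 21. Dog | ahate | ʔaháte |
| 54. Drink | 8ika | θí:ka |
| 39. Ear | smark | smárk |
| 40. Eye | yu7 | júʔ |
| 82. Fire | a7o7 | ʔaʔóʔ |
| 19. Fish | iCi7 | ʔiʧí:ʔ |
| 95. Full | tim7orika | timʔórika |
| 48. Hand | sale | sále |
| 58. Hear | evka | ʔé:vka |
| 34. Horn | kw~a7a | ʔkwáʔa |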
Example of transcription: Havasupai (Yuman)
![Page 11: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/11.jpg)
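| Item | ASJP code | IPA |
|---|---|---|
| 47 knee | Sy~amqa | ʃʲamqa |
| 44 tongue | bz3 | bzɨ |
| 43 tooth | p3c | pɨʦ |
| 41 nose | p3nc"a | pɨnʦʼa |
| 40 eye | La | la |
| 39 ear | l3mha | lɨmha |
| 34 horn | Cw"~3Xw~a | ʧʼʷɨʕʷa |
| 31 bone | bXw~3 | bʕʷɨ |
| 30 blood | Sy~a | ʃʲa |
| 28 skin | Cw~azy~ | ʧʷazʲ |
| 25 leaf | bxy~3 | bɣʲɨ |
| 23 tree | c"la | ʦʼla |
| 22 louse | c"a | ʦʼa |
| 21 dog | la | la |
| 19 fish | pslaCw~a | pslaʧʷa |
| 18 person | Xw~3Cw"y$Xw~3s | ʕʷɨʧʼʲʷʕʷɨs |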
Another transcription example: Abaza (Northwest Caucasian)
![Page 12: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/12.jpg)
Towards a shorter Swadesh list
Procedure:
• Measure stabilities of items on the Swadesh list
• Find the shortest list among the most stable items that gives adequate results
![Page 13: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/13.jpg)
Measure stabilities
• count proportions of matches for pairs of words with similar meanings among languages within genera
• add corrections for chance agreement
• weighted means
![Page 14: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/14.jpg)
Check whether it actually makes sense to assume that items have inherent stabilities by
• seeing whether the rankings obtained correlate across different areas (in this case New World vs. Old World is convenient)
![Page 15: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/15.jpg)
![Page 16: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/16.jpg)
Stability and borrowability
![Page 17: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/17.jpg)
No correlation between borrowability and stability
[Figure: plot of borrowing rate (0 to 0.3) against stability rank (0 to 100).]
![Page 18: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/18.jpg)
Potential explanations
• Borrowability may be more variable for given lexical items across areas than stability and not be an inherent property of lexical items (similar to typological features).
• Borrowability is not a significant contributor to stability, at least as far as the segment constituted by the Swadesh 100-item list is concerned.
• There are still far too few data on borrowability to be conclusive (the sample for studying stability consisted of 245 languages, whereas we had only 36 languages at our disposal for the study of borrowability).
![Page 19: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/19.jpg)
Selecting a shorter list
Correlation between distances in the automated approach and other classifications as a function of list lengths
[Figure: correlation (0 to 1) plotted against number of words (0 to 100) for two series: Ethnologue (Goodman-Kruskal gamma) and WALS/Dryer (Pearson product-moment correlation).]
![Page 20: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/20.jpg)
Automating the similarity measure
Levenshtein distances: the minimum number of steps (substitutions, insertions or deletions) that it takes to get from one word to another
Germ. Zunge → Eng. tongue
tsuŋə
tuŋə (substitution)
tɔŋə (substitution)
tɔŋ (deletion)
Or tongue → Zunge
tɔŋ
tɔŋə (insertion)
tuŋə (substitution)
tsuŋə (substitution)
= 3 steps, so LD = 3
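The Zunge/tongue count can be reproduced with the standard dynamic-programming computation of Levenshtein distance. The following is a minimal sketch, not the ASJP implementation itself; the function name `levenshtein` and the representation of words as lists of segments are assumptions made for illustration. The affricate "ts" is treated as a single segment, which is what makes ts → t one substitution in the slide's example.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # previous[j] holds the distance between the first i-1 segments of a
    # and the first j segments of b.
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# The affricate "ts" is treated as one segment, as in the example above.
zunge = ["ts", "u", "ŋ", "ə"]
tongue = ["t", "ɔ", "ŋ"]
print(levenshtein(zunge, tongue))  # 3
print(levenshtein(tongue, zunge))  # 3
```

The distance is symmetric, which is why both directions of the example give LD = 3.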
![Page 21: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/21.jpg)
Weighting Levenshtein distances
Serva & Petroni (2008): divide by the lengths of the strings compared. Takes into account that LDs grow with word length.
ASJP:
1. divide LD by the length of the longer of the two strings compared to get LDN (takes into account typical word lengths of the languages compared);
2. then divide LDN by the average of LDNs among words in Swadesh lists with different meanings to get LDND (takes into account accidental similarity due to similarities in phonological inventories).
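The two normalization steps can be sketched as follows. This is an illustration under stated assumptions rather than the ASJP code: the names `ldn` and `ldnd`, the word lists as dictionaries from meaning to word, and the choice to average LDN over all shared meanings before dividing are assumptions, and the toy word lists are invented. Words are assumed to be non-empty.

```python
from itertools import product
from statistics import mean


def levenshtein(a, b):
    """Plain Levenshtein distance between two sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (x != y)))
        previous = current
    return previous[-1]


def ldn(a, b):
    """Step 1: LD divided by the length of the longer of the two strings."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd(list_a, list_b):
    """Step 2: mean LDN for same-meaning pairs divided by
    mean LDN for different-meaning pairs (assumed aggregation)."""
    shared = sorted(set(list_a) & set(list_b))
    same = mean(ldn(list_a[m], list_b[m]) for m in shared)
    different = mean(ldn(list_a[m], list_b[n])
                     for m, n in product(shared, repeat=2) if m != n)
    return same / different


# Invented toy word lists, for illustration only.
lang_a = {"one": "abc", "two": "def"}
lang_b = {"one": "abd", "two": "xyz"}
print(round(ldnd(lang_a, lang_b), 3))  # 0.667
```

Because the denominator reflects how similar unrelated words of the two languages look by chance, LDND can exceed 1 for very dissimilar languages.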
![Page 22: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/22.jpg)
Results for classification
Two methods of evaluation:
• Looking at statistical correlations with WALS or Ethnologue classification
• Comparing tree with “expert trees”/expert knowledge
![Page 23: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/23.jpg)
Performance of classification: a correlation with Ethnologue

| Family | Correlation |
|---|---|
| AUSTRONESIAN | 0.2553 |
| PANOAN | 0.2733 |
| CARIBAN | 0.3169 |
| AUSTRALIAN | 0.3866 |
| ARAWAKAN | 0.393 |
| NIGER-CONGO | 0.4404 |
| TRANS-NEW GUINEA | 0.5047 |
| KHOISAN | 0.5069 |
| ALGIC | 0.5477 |
| KADUGLI | 0.5725 |
| HOKAN | 0.6223 |
| AUSTRO-ASIATIC | 0.6475 |
| TAI-KADAI | 0.6955 |
| URALIC | 0.7021 |
| AFRO-ASIATIC | 0.7246 |
| SINO-TIBETAN | 0.7318 |
| CHIBCHAN | 0.7333 |
| UTO-AZTECAN | 0.7356 |
| NILO-SAHARAN | 0.7475 |
| TUCANOAN | 0.7565 |
| TUPIAN | 0.7867 |
| PENUTIAN | 0.8062 |
| MAYAN | 0.8276 |
| MACRO-GE | 0.8447 |
| NAKH-DAGHESTANIAN | 0.8515 |
| ALTAIC | 0.8552 |
| INDO-EUROPEAN | 0.9332 |
| OTO-MANGUEAN | 0.9793 |
| MIXE-ZOQUE | 0.9803 |
![Page 24: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/24.jpg)
• Disadvantages of automated method:
– blind to anything but lexical evidence
– not always accurate
– has a shallower limit of application than the comparative method
• Advantages:
– extremely quick
– consistent and objective
– provides information on the amount of changes, and therefore a time perspective
![Page 25: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/25.jpg)
3. Assessing the viability of glottochronology (or Levenshtein chronologies)
![Page 26: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/26.jpg)
• The assumption of a (fairly) constant rate of change can be checked by looking at branch lengths for lexicostatistical trees. Let's see some examples:
![Page 27: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/27.jpg)
Tai-Kadai
![Page 28: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/28.jpg)
Uto-Aztecan
![Page 29: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/29.jpg)
Mayan
![Page 30: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/30.jpg)
The ultrametric inequality condition
rooted tree
C (root)
A B
![Page 31: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/31.jpg)
The ultrametric inequality condition
rooted tree
Distance C-A = Distance C-B
A B
![Page 32: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/32.jpg)
Unrooted tree
Distance A-D = Distance B-D
A
B
C
D
![Page 33: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/33.jpg)
Distance A-C = Distance B-C
A
B
C
D
![Page 34: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/34.jpg)
Distance A-C = Distance A-D
A
B
C
D
![Page 35: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/35.jpg)
Distance B-C = Distance A-D
A
B
C
D
![Page 36: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/36.jpg)
Margin of error = (BC – BD)/[(BC + BD)/2]
A
B
C
D
A margin of error found by measuring the deviation from ultrametric inequality
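As a rough illustration of this margin of error, the sketch below computes the relative difference between two distances that would be equal under a perfectly constant rate of change. The function names are hypothetical, taking the absolute value of the difference is an assumption (the slide writes the plain difference), and `triplet_margin`, which picks the two largest distances in a set of three languages, is likewise an assumed convenience.

```python
def margin_of_error(bc: float, bd: float) -> float:
    """Relative difference between two distances that a constant rate would make equal."""
    # abs() is an assumption; the slide writes the plain difference BC - BD.
    return abs(bc - bd) / ((bc + bd) / 2)


def triplet_margin(d_ab: float, d_ac: float, d_bc: float) -> float:
    """Margin of error for a triplet, using its two largest pairwise distances."""
    smallest, middle, largest = sorted((d_ab, d_ac, d_bc))
    return margin_of_error(largest, middle)


print(margin_of_error(0.80, 0.80))            # 0.0: perfectly ultrametric
print(round(margin_of_error(0.90, 0.70), 3))  # 0.25
```

A value of 0 means the two branches have changed by exactly the same amount; larger values indicate stronger departures from a constant rate.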
![Page 37: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/37.jpg)
Uto-Aztecan
![Page 38: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/38.jpg)
Uto-Aztecan
![Page 39: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/39.jpg)
Uto-Aztecan
![Page 40: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/40.jpg)
Binned frequencies of margins of errors for ages of single pairs (Indo-European)
[Figure: histogram of frequency (% of total pairs), 0 to 50, against % margin of error (max of bin), 0 to 100.]
![Page 41: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/41.jpg)
Margins of error for multiple language pairs as a function of LDND
[Figure: margin of error (%), 0 to 50, plotted against average LDND (%), 0 to 100, annotated with ~1000 BP and ~6000 BP.]
x-axis: average of the greatest LDNDs within all sets of three related languages that are within the same 1% interval.
y-axis: the margin of error estimated as the average of the differences between the (logarithms of) the two largest distances for the set of triplets in the interval divided by the (logarithm) of the average of these two largest distances.
![Page 42: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/42.jpg)
How to measure the age of a language group
• Take the age of the two most divergent languages? No, this would bias the result high.
• Take the average age of all language pairs? No, this would bias the result low.
• Make the ages part of the lexicostatistical tree and measure lengths from root (midpoint) to tips? No, this is only doable for a UPGMA tree, which is far from an optimal phylogenetic algorithm.
![Page 43: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/43.jpg)
The last approach is taken by Serva and Petroni (2008)
Serva, Maurizio and Filippo Petroni. 2008. Indo-European languages by Levenshtein distances. Available at www.arXiv.org (and now published)
![Page 44: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/44.jpg)
Comparing two Salishan trees
UPGMA Neighbour-Joining
![Page 45: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/45.jpg)
Our approach
• Find the midpoint in the tree of the language group and take the average modified Levenshtein distance of all pairs whose members are on either side of the midpoint.
• Calibrate with ages of known linguistic events.
• Find the LDND at zero years (= the LDND expected for dialects), and build that into the formula.
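The averaging step in the first bullet might look like the sketch below. It assumes the midpoint split has already been found, so the two sides of the tree are simply given as lists; the function name, the dictionary keyed by unordered language pairs, and the toy values are all hypothetical.

```python
from itertools import product
from statistics import mean


def cross_midpoint_ldnd(side_a, side_b, ldnd):
    """Average LDND over all pairs with one language on each side of the midpoint."""
    return mean(ldnd[frozenset((x, y))] for x, y in product(side_a, side_b))


# Invented toy distances, for illustration only.
ldnd = {
    frozenset(("L1", "L2")): 0.40,
    frozenset(("L1", "L3")): 0.80,
    frozenset(("L2", "L3")): 0.90,
}
print(round(cross_midpoint_ldnd(["L1", "L2"], ["L3"], ldnd), 2))  # 0.85
```

Pairs on the same side of the midpoint (here L1 and L2) are left out, which avoids the downward bias of averaging over all language pairs.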
![Page 46: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/46.jpg)
The revised glottochronological formula
Standard formula: log(SIM) = [2log(R)]T
New formula taking into account inherent variability within languages: log(SIM) = [2log(R)]T + log(SIM')
SIM = observed similarity = 1-LDND
SIM' = baseline similarity at time 0
R = retention rate
T = time in millennia
R = .81 (slope of the line)
SIM' = .68 (the intercept)
So T = [log(1-LDND)-log(.68)]/[2log(.81)]
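The dating formula can be evaluated directly. The sketch below uses the slide's values R = .81 and SIM' = .68; the function and constant names are assumptions, and LDND is taken as a fraction rather than a percentage. The base of the logarithm does not matter because it cancels in the ratio.

```python
import math

RETENTION_RATE = 0.81       # R on the slide
BASELINE_SIMILARITY = 0.68  # SIM' on the slide


def age_in_millennia(ldnd: float) -> float:
    """T = [log(1 - LDND) - log(SIM')] / [2 log(R)], with LDND as a fraction."""
    if not 0 <= ldnd < 1:
        raise ValueError("LDND must be in [0, 1) for log(1 - LDND) to be defined")
    numerator = math.log(1 - ldnd) - math.log(BASELINE_SIMILARITY)
    return numerator / (2 * math.log(RETENTION_RATE))


print(round(age_in_millennia(0.50), 2))  # 0.73
```

An LDND of 0.32 corresponds to the baseline similarity of .68 and therefore to an age of zero; smaller LDND values give negative ages, which is one way to see why very shallow dates are unreliable.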
![Page 47: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/47.jpg)
Some examples of results
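| Family | Age (years BP) |
|---|---|
| Arawakan | 5403 |
| Austronesian | 5050 |
| Cariban | 3511 |
| Chibchan | 6146 |
| Chukotko-Kamchatkan | 4312 |
| Dravidian | 2959 |
| Eskimo | 1749 |
| Germanic | 1506 |
| Hmong-Mien | 5384 |
| Indo-European | 5981 |
| Indo-Iranian | 4281 |
| Kartvelian | 4893 |
| Mayan | 2669 |
| Mixe-Zoque | 3672 |
| Muskogean | 1812 |
| Nakh-Daghestanian | 5373 |
| NW Caucasian | 5313 |
| Pano-Tacanan | 5212 |
| Romance | 2255 |
| Salishan | 6097 |
| Semitic | 3274 |
| Slavic | 1187 |
| Tai-Kadai | 3604 |
| Tupian | 4887 |
| Uralic | 4873 |
| Uto-Aztecan | 4629 |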
![Page 48: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/48.jpg)
Outstanding problems
• Still not enough good calibration points, and they are hard to find.
• Ages greater than 6,000 BP cannot be trusted because randomness comes into play (and ASJP classifications also typically break down beyond 6,000 years BP).
• Ages shallower than 1,000 years show great variation from what's expected and cannot be trusted either.
![Page 49: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/49.jpg)
4. Identifying homelands
![Page 50: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/50.jpg)
The idea (going back to Vavilov 1926 in botany and Sapir's Time Perspective in Aboriginal American Culture of 1916) is that the area of highest diversity will tend to be the homeland.
Nikolai Vavilov (1887-1943) Edward Sapir (1884-1939)
![Page 51: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/51.jpg)
• A quantitative implementation:
– For each language in a family, measure the ratio of the linguistic distance L to the geographical distance G for each of the other members of the family, and take the average. This produces a diversity measure D for the location where the given language is spoken.
– The language with the highest D sits in the homeland.
– Map the results by grouping Ds into topographic color categories.
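The diversity measure can be sketched as follows. This is a hedged illustration, not the R implementation mentioned in the acknowledgment: the function names, the use of the haversine great-circle distance for G, the dictionary of linguistic distances keyed by unordered language pairs, and the toy coordinates are all assumptions. Two languages at exactly the same coordinates would give G = 0 and are not handled here.

```python
import math
from statistics import mean


def great_circle_km(p, q):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def diversity(language, coords, linguistic_distance):
    """D: mean of L / G between one language and every other family member."""
    return mean(
        linguistic_distance[frozenset((language, other))]
        / great_circle_km(coords[language], coords[other])
        for other in coords if other != language
    )


def homeland(coords, linguistic_distance):
    """Language with the highest D; by the criterion above it sits in the homeland."""
    return max(coords, key=lambda lang: diversity(lang, coords, linguistic_distance))


# Invented toy family: three languages on the equator.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 10.0)}
dist = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.5,
}
print(homeland(coords, dist))  # B
```

In the toy data A and B are linguistically far apart but geographically close, so both score high; B wins narrowly because it is also slightly closer to C.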
![Page 52: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/52.jpg)
Supplement with reconstruction of ecological vocabulary, known migration histories, archaeology, etc. when available.
“Any one criterion is never to be applied to the exclusion of or in opposition to all others. It is a comfortable procedure to attach oneself unreservedly or primarily to a single mode of historical inference and wilfully to neglect all others as of little moment, but the clean-cut constructions of the doctrinaire never coincide with the actualities of history” (Sapir 1916: 87).
(cf. also critique of Vavilov by Harlan 1971)
![Page 53: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/53.jpg)
HMONG-MIEN
![Page 54: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/54.jpg)
CURRENTLY SPOKEN INDO-EUROPEAN LANGUAGES
![Page 55: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/55.jpg)
ALTAIC
![Page 56: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/56.jpg)
NIGER-CONGO
![Page 57: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/57.jpg)
SINO-TIBETAN
Sino-Tibetan homeland according to Diamond & Bellwood (2003)
![Page 58: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/58.jpg)
TAI-KADAI
Tai-Kadai homeland according to Diamond & Bellwood (2003)
![Page 59: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/59.jpg)
AUSTRO-ASIATIC
Austro-Asiatic homeland according to Diamond & Bellwood (2003)
![Page 60: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/60.jpg)
AUSTRONESIAN
Austronesian dispersal according to Diamond & Bellwood (2003)
![Page 61: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/61.jpg)
AUSTRALIAN
Nichols (1997: 377): “Pama-Nyungan originated in the northeast of its range and spread by a combination of language shift and migration (…) (Evans & Jones 1997, McConvell 1996a,b). Northeastern Australia (southern Cape York), the likely Pama-Nyungan homeland, is a long-standing center of technological innovation (Morwood & Hobbs 1995), an area of deep divergence within Pama-Nyungan, and close to the Tangkic family, which represents a likely first sister to Pama-Nyungan (Evans 1995).”
![Page 62: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/62.jpg)
ALGIC
Ruhlen (1994): Proto-Algonkian in the southwest of the family's extent
F. Siebert: PA in the area of the eastern upper Great Lakes (cited without reference by Ruhlen)
Denny (1991): PA around Upper Columbia River in Oregon and Washington
![Page 63: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/63.jpg)
UTO-AZTECAN
Hopkins (1965): Columbia Plateau
Fowler (1983): New Mexico
Hill (2001): Mesoamerica
Fowler (1983)
![Page 64: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/64.jpg)
CHIBCHAN
![Page 65: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/65.jpg)
Approximate homeland according to Dall'Igna Rodrigues (1958), based on the presence of nearly all major subgroups of the family.
TUPIAN
![Page 66: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/66.jpg)
CACUA-NUKAK
VAUPÉS-JAPURÁ
HUITOTOAN
YANOMAM
ZAPAROAN
JIVAROAN
CAHUAPANAN
PANOAN
QUECHUAN ARAWAKAN
CARIBAN
TUPIAN MACRO-GE NAMBIKUARAN
JABUTI
ARAUAN
TACANAN, MASCOIAN,MATACOAN, GUAICURUAN
![Page 67: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/67.jpg)
Homelands by tributaries to large rivers, not in the watershed itself. Some ecological explanation?!
![Page 68: State of the art of the Automated Similarity Judgment … of the art of the Automated Similarity Judgment Program Søren Wichmann (MPI-EVA & Leiden University) & The ASJP Consortium](https://reader031.vdocuments.mx/reader031/viewer/2022021819/5ad819757f8b9af9068d0fa7/html5/thumbnails/68.jpg)
Thank you for your attention!
Acknowledgment: thanks to Hans-Jörg Bibiko (the one to the right) for implementing the homeland identification procedure in R