Optimality Theoretic Learning of Lexical Borrowing (MURI/Presentations/year4-2014-11-14/..., 2014/11/14)
TRANSCRIPT
![Page 1: Lexical Borrowing Learning of Optimality TheoreticMURI/Presentations/year4-2014-11-14/...2014/11/14 · 2. Expand the extracted training set using Arabic morph. analyzer 3. Learn](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/1.jpg)
Optimality Theoretic Learning of Lexical Borrowing
Yulia Tsvetkov, Waleed Ammar, Chris Dyer
![Page 2](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/2.jpg)
Resource-poor NLP
Project annotations from a source (src) sentence to a target (tgt) sentence, e.g. the part-of-speech tags VB DT NN … of "Book the flight …".
Annotation projection:
1. via word alignments
2. via cross-lingual similarities
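Projection via word alignments (item 1) can be sketched in a few lines. This is a minimal illustration, assuming the alignment links are already available; the two-token target sentence and its alignment are hypothetical, and `project_tags` is an illustrative name, not part of any toolkit.

```python
def project_tags(src_tags, alignment, tgt_len):
    """Copy each source-side tag onto the target tokens it is aligned to.

    alignment: (src_index, tgt_index) pairs, e.g. produced by a word aligner.
    Target tokens with no aligned source token keep the placeholder None.
    """
    tgt_tags = [None] * tgt_len
    for s, t in alignment:
        if tgt_tags[t] is None:  # on many-to-one links, keep the first tag seen
            tgt_tags[t] = src_tags[s]
    return tgt_tags


src_tags = ["VB", "DT", "NN"]   # "Book the flight"
alignment = [(0, 0), (2, 1)]    # hypothetical 2-token target; "the" is unaligned
print(project_tags(src_tags, alignment, tgt_len=2))  # ['VB', 'NN']
```

Projection via cross-lingual similarities (item 2) replaces the aligner output with links between similar-looking or similar-sounding words, which is where lexical borrowing becomes useful.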
![Page 3](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/3.jpg)
Outline
1. Motivation: lexical borrowing as a source of cross-lingual lexical similarities
2. A constraint-based model of lexical borrowing for Arabic-Swahili
3. A model of lexical borrowing improves Swahili-English MT
*unpublished work, in preparation for NAACL’15
![Page 4](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/4.jpg)
Cross-lingual lexical similarities
Words that are orthographically or phonetically similar across languages and are likely to be mutual translations
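One common way to quantify orthographic similarity is normalized Levenshtein (edit) distance. The sketch below is an illustration under that assumption, not the measure used in this work; `levenshtein` and `similarity` are hypothetical helper names, and the word pairs are taken from the examples on the following slides.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for maximally different ones."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("noche", "notte"))           # 2
print(round(similarity("sukkar", "sugar"), 2))  # 0.67
```

Plain edit distance treats every substitution alike; it cannot tell that /k/ → /g/ is a more natural change than /k/ → /a/, which is one motivation for the phonologically informed model presented later.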
![Page 5](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/5.jpg)
Whence cross-lingual lexical similarities?
● Chance (unrelated words, false friends)
○ an insignificant number of words
![Page 6](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/6.jpg)
Whence cross-lingual lexical similarities?
● Foreign words (transliterations)
Core-periphery lexicon structure (Itô & Mester ‘95): core vs. periphery
English New York, Yoruba Niu Yoki, Swahili New York, Russian Нью-Йорк, Arabic نیویورك
![Page 7](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/7.jpg)
Whence cross-lingual lexical similarities?
● Foreign words (transliterations)
○ proper names
○ specialized, peripheral vocabulary
Core vs. periphery
English New York, Yoruba Niu Yoki, Swahili New York, Russian Нью-Йорк, Arabic نیویورك
![Page 8](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/8.jpg)
Whence cross-lingual lexical similarities?
● Foreign words (transliterations)
● Genetically related words (cognates)
○ words in related languages inherited from a single word in a common ancestral language
○ content words in the core lexicon
Core vs. periphery
Latin nocte, French nuit, Spanish noche, Italian notte, Portuguese noite, Romanian noapte
![Page 9](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/9.jpg)
Whence cross-lingual lexical similarities?
● Foreign words (transliterations)
● Genetically related words (cognates)
● Borrowed words
○ frequent content words
○ of foreign origin, but not perceived as foreign
Core vs. periphery
Arabic سكر (sukkar*)
*transliterated
Latin zuccarum, French sucre, German Zucker, Italian zucchero
English sugar
![Page 10](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/10.jpg)
This work: Lexical borrowing
● Foreign words (transliterations)
● Genetically related words (cognates)
● Borrowed words (loanwords)
Arabic سكر (sukkar*)
*transliterated
Latin zuccarum, French sucre, German Zucker, Italian zucchero
English sugar
Lexical borrowing: adoption and nativization of words from another language as a result of language contact
![Page 11](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/11.jpg)
Borrowing is a fundamental research topic in linguistics
Yip ‘93 (Cantonese)
Davidson & Noyer ‘97 (Huave)
Jacobs & Gussenhoven ‘00
Kang ‘03 (Korean)
Kenstowicz & Suchato ‘06 (Thai)
Adler ‘06 (Hawaiian)
Rose & Demuth ‘06
Kenstowicz ‘07 (Fijian)
Schadeberg ‘09 (Swahili)
Mwita ‘09 (Swahili)
Hurskainen ‘04 (Swahili)
Adelaar ‘10 (Malagasy)
Kenstowicz ‘06 (Yoruba)
and many more...
![Page 12](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/12.jpg)
Prior work (in NLP)
Transliteration: Knight & Graehl ‘98; Al-Onaizan & Knight ‘02; Virga & Khudanpur ‘03; Klementiev & Roth ‘06; Tao et al. ‘06; Ravi & Knight ‘09; Ammar, Dyer & Smith ‘12
Cognates: Mann & Yarowsky ‘01; Kondrak ‘01; Kondrak, Marcu & Knight ‘03; Bouchard-Côté et al. ‘09; Hall & Klein ‘10
Borrowing: ✘
![Page 13](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/13.jpg)
Lexical borrowing graph (Haspelmath & Tadmor ‘09)
Persian پلپل pilpil
Hebrew פלפל falafel’
Arabic فالفل falāfil
Swahili pilipili
Gawwada parpaare
Sanskrit पिप्पली pippalī
![Page 14](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/14.jpg)
Borrowing is pervasive!

| Resource-poor languages | # speakers | Borrowed from resource-rich languages (% of types) |
|---|---|---|
| Swahili, Zulu, Malagasy, Hausa, Tarifit, Yoruba | 200 million | Arabic, Spanish, English, French (>40%) |
| Japanese, Vietnamese, Korean, Cantonese, Thai | 400 million | Chinese, English (30-70%) |
| Hindustani, Hindi, Urdu, Bengali, Persian, Pashto | 860 million | Arabic, English (>40%) |
| Total | 1.4 billion | |
![Page 15](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/15.jpg)
Case study: Arabic-Swahili borrowing
Persian پلپل pilpil
Hebrew פלפל falafel’
Arabic فالفل falāfil
Swahili pilipili
Gawwada parpaare
Sanskrit पिप्पली pippalī
![Page 16](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/16.jpg)
Arabic-Swahili borrowing: history
● 800 A.D.-1920: Indian Ocean trading
● Influence of Islam
● ~40% of Swahili types are borrowed from Arabic*
*from the Standard Swahili-English Dictionary (Johnson ‘39)
![Page 17](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/17.jpg)
Arabic-Swahili borrowing: examples

| English | Arabic (Semitic) | Swahili (Bantu) | Phonological & morphological integration |
|---|---|---|---|
| fever | حمى ḥummat | homa | syllable structure adaptation (CV, CVV, CVC, CVCC → V, CV); degemination (Swahili does not allow consonant clusters); vowel substitution |
| minister | الوزیر Alwzyr | kiuwaziri | Arabic morphology is (optionally) dropped; Swahili morphology is applied; vowel epenthesis keeps syllables open; vowel substitution |
| palace | القصر AlqSr | kasiri | consonant adaptation (/tˤ/→/t/, /dˤ/→/d/, /θ/→/s/, /x/→/k/, etc.); vowel epenthesis |
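Two of the adaptations above, consonant substitution and degemination, are simple rewrites over a phone sequence. A minimal sketch, assuming words are given as lists of IPA phones; the substitution table holds only the four mappings listed on the slide (the rest, "etc.", is unspecified), and the function names are hypothetical.

```python
# Only the four mappings named on the slide; the full table ("etc.") is not given.
CONSONANT_MAP = {"tˤ": "t", "dˤ": "d", "θ": "s", "x": "k"}
VOWELS = {"a", "e", "i", "o", "u"}


def substitute(phones):
    """Replace Arabic consonants with the Swahili phones listed in CONSONANT_MAP."""
    return [CONSONANT_MAP.get(p, p) for p in phones]


def degeminate(phones):
    """Shorten geminates: drop a consonant identical to the one just before it."""
    out = []
    for p in phones:
        if out and p == out[-1] and p not in VOWELS:
            continue
        out.append(p)
    return out


print(degeminate(substitute(["k", "u", "t", "t", "a", "b", "a"])))
# ['k', 'u', 't', 'a', 'b', 'a']
```

Applied deterministically, these rules yield a single output; the model instead treats such operations as optional, producing several candidates that are ranked later.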
![Page 18](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/18.jpg)
Arabic-Swahili borrowing: our research goals
1. Given a Swahili vocabulary and an Arabic vocabulary, identify plausible donor-loanword candidates
2. Produce a ranked list of candidate donor-loanword pairs
3. Augment Swahili-English MT using Arabic-Swahili borrowing model
![Page 19](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/19.jpg)
Arabic-Swahili borrowing model
Pipeline: Arabic → IPA → generate loanword candidates → rank loanword candidates → IPA → Swahili
1. Convert letters to phones (rule-based)
2. Generate loanword candidates (rule-based)
3. Rank loanword candidates (learned)
![Page 20](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/20.jpg)
Arabic-Swahili borrowing model: from orthographic to phonetic space
Pipeline: Arabic → IPA → generate loanword candidates → rank loanword candidates → IPA → Swahili
Example (book.sg.indef): كتابا (Arabic) → kuttaba, kitaba, … (IPA) … kitabu (IPA) → kitabu (Swahili)
1. Convert letters to phones
![Page 21](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/21.jpg)
Arabic-Swahili borrowing model: generating candidate loanwords
Pipeline: Arabic → IPA → [syllabification, morphological adaptation, phonological adaptation] → rank loanword candidates → IPA → Swahili
Example (book.sg.indef): كتابا (Arabic) → kuttaba, kitaba, … (IPA) … kitabu (IPA) → kitabu (Swahili)
2. Adapt Arabic words to Swahili syllable structure, morphology and phonology
Polomé ‘67; Zawawi ‘79; Schadeberg ‘09; Mwita ‘09
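Adapting to Swahili syllable structure starts from a syllabification. As a simplified stand-in (GEN in this work produces several syllabifications per word, e.g. ku.tta.ba. and ku.t.ta.ba.), the sketch below returns one onset-maximizing parse, assuming one character per phone and only the five vowels a, e, i, o, u; `syllabify` is a hypothetical name.

```python
VOWELS = set("aeiou")


def syllabify(word: str) -> str:
    """Onset-maximizing syllabification of a phone string (one character per phone).

    Every vowel is a syllable peak. Of the consonants between two peaks, the last
    becomes the onset of the next syllable and the others stay behind as codas.
    Each syllable is followed by '.', as on the slides (ki.ta.bu.).
    """
    peaks = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not peaks:
        return word + "."
    syllables, start = [], 0
    for k, peak in enumerate(peaks):
        if k + 1 < len(peaks):
            nxt = peaks[k + 1]
            # cut just before the next onset consonant, or directly before a vowel
            end = nxt - 1 if nxt - peak > 1 else nxt
        else:
            end = len(word)  # the last syllable takes any word-final consonants
        syllables.append(word[start:end])
        start = end
    return "".join(s + "." for s in syllables)


for w in ["kitabu", "kuttaba", "kitab"]:
    print(syllabify(w))
# ki.ta.bu.
# kut.ta.ba.
# ki.tab.
```

The parses kut.ta.ba. and ki.tab. keep codas that Swahili disallows; repairing them (degemination, epenthesis) is left to the adaptation steps and to ranking.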
![Page 22](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/22.jpg)
Arabic-Swahili borrowing model: generating candidate loanwords
2. Adapt Arabic words to Swahili syllable structure, morphology and phonology
Example (book.sg.indef): كتابا
IPA: kuttaba, kitaba, …
Arabic affix removal: kuttaba, kuttab, kitaba, kitab, …
Syllabification: ku.tta.ba., ku.t.ta.ba., …, ki.ta.ba., ki.ta.b.
Arabic-to-Swahili phonological adaptation:
ku.ta.ba. [degemination]
ku.tata.ba. [epenthesis]
ku.ta.bu. [final vowel subst.]
ki.ta.bu. [final vowel subst.]
ki.ta.bu. [epenthesis]
Swahili morphological adaptation: ku.tata.ba.li.ku.tata.ba.vi.ki.ta.bu. ki.ta.bu.ki.ta.bu.
Output: kitabu (IPA) → kitabu (Swahili)
(Littell, Price & Levin ‘14)
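The branching above (each operation may or may not apply) can be imitated with a toy generator that combines optional affix removal, optional degemination, and final vowel epenthesis or substitution. Everything here is an assumption for illustration: the affix lists, the choice of /u/ and /i/ as final vowels, and all names are hypothetical, and according to the model diagram GEN is built from unweighted transducers rather than Python sets.

```python
VOWELS = set("aeiou")
# Hypothetical affix inventories, for illustration only.
ARABIC_PREFIXES = ["al"]   # definite article, as in Alwzyr, AlqSr
ARABIC_SUFFIXES = ["a"]    # final -a, as in kuttaba -> kuttab
FINAL_VOWELS = ["u", "i"]  # hypothetical epenthetic/substituted final vowels


def strip_affixes(word):
    """Return the word plus every variant with optional Arabic affixes removed."""
    forms = {word}
    for p in ARABIC_PREFIXES:
        if word.startswith(p) and len(word) > len(p):
            forms.add(word[len(p):])
    for form in list(forms):
        for s in ARABIC_SUFFIXES:
            if form.endswith(s) and len(form) > len(s):
                forms.add(form[: -len(s)])
    return forms


def degeminate(word):
    """Collapse identical adjacent consonants (kutt- -> kut-)."""
    out = []
    for ch in word:
        if out and ch == out[-1] and ch not in VOWELS:
            continue
        out.append(ch)
    return "".join(out)


def final_vowel_options(word):
    """Keep a final vowel, substitute it, or epenthesize one after a final consonant."""
    stem = word[:-1] if word[-1] in VOWELS else word
    options = {stem + v for v in FINAL_VOWELS}
    if word[-1] in VOWELS:
        options.add(word)
    return options


def gen(word):
    """Toy GEN: the union over all combinations of the optional operations."""
    candidates = set()
    for form in strip_affixes(word):
        for variant in {form, degeminate(form)}:
            candidates |= final_vowel_options(variant)
    return candidates


print(sorted(gen("kitaba")))  # ['kitaba', 'kitabi', 'kitabu']
print(sorted(gen("kuttaba")))
```

Even this tiny generator over-generates (kitabi, kuttabi, …), which is why a ranking component is needed to pick out kitabu.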
![Page 23](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/23.jpg)
Arabic-Swahili borrowing model: learning candidate ranking
Pipeline: Arabic → IPA → [syllabification, morphological adaptation, phonological adaptation] → ranking with Optimality Theory constraints → IPA → Swahili
Example (book.sg.indef): كتابا (Arabic) → kuttaba, kitaba, … (IPA) … kitabu (IPA) → kitabu (Swahili)
3. Produce a ranked list of candidate loanwords
ku.tata.ba.li.ku.tata.ba.vi.ki.ta.bu. ki.ta.bu.ki.ta.bu....
![Page 24](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/24.jpg)
Optimality Theory
underlying (donor) form → pronounced forms (loanword candidates) → optimal (loanword) form
Candidates are evaluated against language-universal constraints*, which are ranked differently in the donor and recipient languages.
*competing, violable
Prince & Smolensky ‘08; McCarthy ‘09
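Classic OT evaluation under strict domination amounts to a lexicographic comparison of violation counts. The tableau below is a hypothetical illustration for an input /kitab/: the violation counts are hand-assigned, and the two rankings are invented to show how re-ranking the same constraints selects a different winner. (The model in this work learns constraint weights instead of a strict ranking.)

```python
def ot_eval(tableau, ranking):
    """Sort candidates from most to least harmonic under strict domination.

    tableau: {candidate: {constraint: violation_count}}
    ranking: constraint names, highest-ranked first.
    Violation profiles are compared lexicographically, so one violation of a
    higher-ranked constraint outweighs any number of lower-ranked violations.
    """
    def profile(candidate):
        return tuple(tableau[candidate].get(c, 0) for c in ranking)
    return sorted(tableau, key=profile)


# Hypothetical tableau for an input /kitab/.
tableau = {
    "ki.tab.":   {"NO-CODA": 1},   # faithful, but ends in a coda
    "ki.ta.bu.": {"DEP-IO-V": 1},  # final vowel epenthesized
    "ki.ta.":    {"MAX-IO-C": 1},  # final consonant deleted
}
recipient_like = ["NO-CODA", "MAX-IO-C", "DEP-IO-V"]  # hypothetical ranking
donor_like = ["DEP-IO-V", "MAX-IO-C", "NO-CODA"]      # hypothetical ranking
print(ot_eval(tableau, recipient_like)[0])  # ki.ta.bu.
print(ot_eval(tableau, donor_like)[0])      # ki.tab.
```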
![Page 25](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/25.jpg)
Optimality Theory constraints: faithfulness constraints
MAX-IO-MORPH: no (donor) affix deletion
MAX-IO-C: no consonant deletion
MAX-IO-V: no vowel deletion
DEP-IO-MORPH: no (recipient) affix epenthesis
DEP-IO-V: no vowel epenthesis
IDENT-IO-P: no pharyngeal consonant substitution
IDENT-IO-G: no glottal consonant substitution
IDENT-IO-E: no emphatic consonant substitution
IDENT-IO-C: no consonant substitution
IDENT-IO-F: no final vowel substitution
IDENT-IO-V: no vowel substitution
Faithfulness constraints impose input-output correspondence.
![Page 26](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/26.jpg)
Optimality Theory constraints: markedness constraints
NO-CODA: syllables must not have a coda
ONSET: syllables must have onsets
PEAK: there is only one syllabic peak
SSP: complex onsets rise in sonority
*COMPLEX-S: no consonant clusters on syllable margins
*COMPLEX-C: no consonant clusters within a syllable
*COMPLEX-V: no vowel clusters
Markedness constraints impose output well-formedness.
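Several of these markedness constraints can be checked directly on the dot-delimited syllabifications used on these slides. A sketch under simplifying assumptions: phones are single characters, the vowels are a, e, i, o, u, and PEAK is approximated as "a syllable with zero or more than one vowel run"; SSP and *COMPLEX-S, which need sonority and syllable-margin information, are omitted, and `markedness_violations` is a hypothetical name.

```python
import re

VOWELS = "aeiou"


def markedness_violations(syllabified: str) -> dict:
    """Count markedness violations in a dot-delimited form such as 'ku.tta.ba.'."""
    counts = {"NO-CODA": 0, "ONSET": 0, "PEAK": 0, "*COMPLEX-C": 0, "*COMPLEX-V": 0}
    for syl in filter(None, syllabified.split(".")):
        if syl[-1] not in VOWELS:
            counts["NO-CODA"] += 1  # syllable ends in a consonant
        if syl[0] in VOWELS:
            counts["ONSET"] += 1    # syllable begins with a vowel
        if len(re.findall(f"[{VOWELS}]+", syl)) != 1:
            counts["PEAK"] += 1     # zero or several vowel runs
        counts["*COMPLEX-C"] += len(re.findall(f"[^{VOWELS}]{{2,}}", syl))
        counts["*COMPLEX-V"] += len(re.findall(f"[{VOWELS}]{{2,}}", syl))
    return counts


print(markedness_violations("ku.tta.ba."))
```

On the slide examples this reproduces the annotated violations: ku.tta.ba. violates a complexity constraint once, and the syllable tata in ku.tata.ba. violates PEAK.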
![Page 27](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/27.jpg)
Pipeline: Arabic → IPA → [syllabification, morphological adaptation, phonological adaptation] → ranking with Optimality Theory constraints → IPA → Swahili
Example (book.sg.indef): كتابا (Arabic) → kuttaba, kitaba, … (IPA) … kitabu (IPA) → kitabu (Swahili)
3. Produce a ranked list of candidate loanwords
ku.tata.ba.li.ku.tata.ba.vi.ki.ta.bu. ki.ta.bu.ki.ta.bu.
Arabic-Swahili borrowing model: learning candidate ranking
![Page 28](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/28.jpg)
Pipeline: Arabic → IPA → [syllabification, morphological adaptation, phonological adaptation] → ranking with Optimality Theory constraints → IPA → Swahili
Example (book.sg.indef): كتابا (Arabic) → kuttaba, kitaba, … (IPA) … kitabu (IPA) → kitabu (Swahili)
3. Produce a ranked list of candidate loanwords
ku.tata.ba.li.ku.tata.ba.ku.tta.ba. ki.ta.bu.ki.ta.bu.
ku.ta<DEP-V>ta<PEAK>.ba.li<DEP-MORPH>.ku.ta<DEP-V>ta<PEAK>.ba.li.ku.tta<*COMPLEX>.ba.ki.ta.bu<IDENT-IO-V>.ki.ta.bu<DEP-V>.
Arabic-Swahili borrowing model: learning candidate ranking
![Page 29](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/29.jpg)
Arabic-Swahili borrowing model
Arabic words → donor words to IPA → GEN → EVAL → IPA to recipient words → Swahili words
GEN: generate plausible Swahili phonetic forms (syllabification, morphological adaptation, phonological adaptation); implemented with unweighted insertion/deletion/substitution transducers
EVAL: re-rank loanword candidates to promote input-output correspondence and output well-formedness (ranking with Optimality Theory constraints); implemented with weighted identity transducers
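The weighted EVAL step can be approximated outside a transducer framework as a weighted sum of violation counts. A minimal sketch, assuming non-negative weights where lower total cost is better; the weights are hypothetical, while the violation annotations (ku.ta<DEP-V>ta<PEAK>.ba.li<DEP-MORPH>., ku.tta<*COMPLEX>.ba., and the two derivations of ki.ta.bu.) are copied from the slides.

```python
def eval_weighted(candidates, weights):
    """Rank surface forms by weighted violation cost (lower cost = better).

    candidates: list of (form, {constraint: violation_count}) pairs; the same
    surface form may appear more than once if GEN derived it in different ways.
    weights: {constraint: non-negative weight}; unlisted constraints weigh 0.
    """
    def cost(violations):
        return sum(weights.get(c, 0.0) * n for c, n in violations.items())

    best = {}
    for form, violations in candidates:  # keep each form's cheapest derivation
        c = cost(violations)
        if form not in best or c < best[form]:
            best[form] = c
    return sorted(best.items(), key=lambda item: item[1])


candidates = [
    ("ku.tata.ba.li.", {"DEP-IO-V": 1, "PEAK": 1, "DEP-IO-MORPH": 1}),
    ("ku.tta.ba.", {"*COMPLEX": 1}),
    ("ki.ta.bu.", {"IDENT-IO-V": 1}),  # final vowel substitution
    ("ki.ta.bu.", {"DEP-IO-V": 1}),    # final vowel epenthesis
]
weights = {"DEP-IO-V": 0.5, "PEAK": 2.0, "DEP-IO-MORPH": 1.0,
           "*COMPLEX": 3.0, "IDENT-IO-V": 0.8}  # hypothetical weights
print(eval_weighted(candidates, weights))
# [('ki.ta.bu.', 0.5), ('ku.tta.ba.', 3.0), ('ku.tata.ba.li.', 3.5)]
```

Unlike strict domination, a weighted sum lets several violations of low-weight constraints add up and outweigh one violation of a higher-weight constraint.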
![Page 30](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/30.jpg)
Arabic-Swahili borrowing model: learning constraint weights
1. Extract a small training set from Arabic-English and English-Swahili parallel corpora based on phonetic and semantic similarity (cf. Kondrak ‘01, cognate identification)
2. Expand the extracted training set using an Arabic morphological analyzer
3. Learn OT constraint weights using machine learning
Training: 417 examples
Test: 73 examples (15%), manually verified by a native Arabic speaker and using a Swahili-English dictionary
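The slides do not name the learning algorithm, so the sketch below uses a ranking perceptron purely as a stand-in: whenever the cheapest candidate is not the attested loanword, weights move toward making the gold form cheaper. Weights are clipped at zero so they remain violation penalties. The training pairs and all names are hypothetical.

```python
def cost(weights, violations):
    """Weighted violation cost of one candidate (lower is better)."""
    return sum(weights.get(c, 0.0) * n for c, n in violations.items())


def train_perceptron(examples, constraints, epochs=10, lr=1.0):
    """Learn non-negative constraint weights with a ranking perceptron.

    examples: list of (candidates, gold), where candidates maps each candidate
    form to its {constraint: violation_count} and gold is the attested loanword.
    """
    weights = {c: 0.0 for c in constraints}
    for _ in range(epochs):
        mistakes = 0
        for candidates, gold in examples:
            predicted = min(candidates, key=lambda f: cost(weights, candidates[f]))
            if predicted == gold:
                continue
            mistakes += 1
            for c in constraints:
                # Raise weights of constraints the wrong winner violates more than
                # the gold form, lower the others; clip at 0 so weights stay penalties.
                delta = candidates[predicted].get(c, 0) - candidates[gold].get(c, 0)
                weights[c] = max(0.0, weights[c] + lr * delta)
        if mistakes == 0:  # every training example is ranked correctly
            break
    return weights


examples = [  # hypothetical toy training data
    ({"ki.tab.": {"NO-CODA": 1}, "ki.ta.bu.": {"DEP-IO-V": 1}}, "ki.ta.bu."),
    ({"ku.tta.ba.": {"*COMPLEX-C": 1}, "ku.ta.ba.": {"MAX-IO-C": 1}}, "ku.ta.ba."),
]
constraints = ["NO-CODA", "DEP-IO-V", "*COMPLEX-C", "MAX-IO-C"]
print(train_perceptron(examples, constraints))
# {'NO-CODA': 1.0, 'DEP-IO-V': 0.0, '*COMPLEX-C': 1.0, 'MAX-IO-C': 0.0}
```

A probabilistic (log-linear) learner would be an equally reasonable choice. The perceptron is shown only because it needs no optimization library.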
![Page 31](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/31.jpg)
Arabic-Swahili borrowing model: evaluation
1. Model design
2. Model accuracy
3. Qualitative evaluation: OT constraint ranking is consistent with linguistic accounts

Model design:

| | Dev | Test |
|---|---|---|
| Reachability (%) | 75 | 88 |
| Ambiguity (avg. candidates per input word; baseline: 787,000) | 885 | 857 |

Model accuracy:

| Representation | Model | Accuracy (%) |
|---|---|---|
| Orthographic | Levenshtein | 8.9 |
| Orthographic | CRF (transliteration, Ammar et al. ‘12) | 16.4 |
| Phonetic | Levenshtein | 19.8 |
| Phonetic | Levenshtein-H (cognates, Mann & Yarowsky ‘01) | 19.7 |
| OT | uniform constraint weights | 29.3 |
| OT | learned constraint weights | 52.0 |
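The three quantities in these tables can be computed from ranked candidate lists. The definitions below are our reading of the slide (reachability: percentage of inputs whose gold loanword appears among the generated candidates; ambiguity: average number of candidates per input word; accuracy: percentage whose top-ranked candidate is the gold loanword), and the three evaluation items are hypothetical.

```python
def evaluate(results):
    """results: list of (ranked_candidates, gold) pairs, best candidate first."""
    n = len(results)
    reachability = 100.0 * sum(gold in cands for cands, gold in results) / n
    ambiguity = sum(len(cands) for cands, _ in results) / n
    accuracy = 100.0 * sum(bool(cands) and cands[0] == gold for cands, gold in results) / n
    return reachability, ambiguity, accuracy


results = [  # hypothetical ranked outputs
    (["ki.ta.bu.", "ki.tab."], "ki.ta.bu."),  # reachable, ranked first
    (["ka.sir.", "ka.si.ri."], "ka.si.ri."),  # reachable, ranked second
    (["hu.ma."], "ho.ma."),                   # not reachable
]
print(tuple(round(x, 1) for x in evaluate(results)))  # (66.7, 1.7, 33.3)
```

Reachability caps accuracy: a ranker can only select the gold form if GEN produced it, while higher ambiguity makes the ranking problem harder.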
![Page 32](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/32.jpg)
Arabic-Swahili borrowing: research goals
1. Given a Swahili vocabulary and an Arabic vocabulary, identify plausible donor-loanword candidates ✔
2. Produce a ranked list of candidate donor-loanword pairs ✔
3. Augment Swahili-English MT using Arabic-Swahili borrowing model
![Page 33](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/33.jpg)
MT experiments
AR (Arabic)-English MT: resource-rich, 5.5M sentences
SW (Swahili)-English MT: low-resource, 14K sentences, 5K OOV types (7.5%)
Pipeline for an out-of-vocabulary (OOV) Swahili word: SW → borrowing model → AR → translation candidates → EN
safari ← یسافر ysAfr (travel)
kituruki ← تركي trky (turkish)

| System | BLEU |
|---|---|
| Baseline | 18.0 |
| + OOV loanwords | 18.5 |
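The OOV step can be pictured as a two-hop lookup: Swahili OOV word → ranked Arabic donors → English translations of those donors. A minimal sketch, assuming the borrowing model's output and an Arabic-English lexicon are available as plain dictionaries (both hypothetical data structures; only the two word pairs come from the slide). How the resulting pairs are added to the MT system is not specified here.

```python
def oov_translation_candidates(oov_words, swahili_to_arabic, arabic_to_english, k=3):
    """For each OOV Swahili word, collect English translations of its top-k Arabic donors."""
    table = {}
    for sw in oov_words:
        options = []
        for ar in swahili_to_arabic.get(sw, [])[:k]:  # donors, best first
            for en in arabic_to_english.get(ar, []):
                if en not in options:
                    options.append(en)
        if options:  # words with no plausible donor stay untranslated
            table[sw] = options
    return table


borrowing = {"safari": ["ysAfr"], "kituruki": ["trky"]}  # donor pairs from the slide
lexicon = {"ysAfr": ["travel"], "trky": ["turkish"]}
print(oov_translation_candidates(["safari", "kituruki", "xyz"], borrowing, lexicon))
# {'safari': ['travel'], 'kituruki': ['turkish']}
```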
![Page 34](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/34.jpg)
Summary of contributions
1. First study on lexical borrowing in NLP
2. First study that operationalizes Optimality Theory in a downstream task
3. Swahili-English MT improvement
![Page 35](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/35.jpg)
Future work
1. More languages
2. More MT experiments
3. Core NLP tasks: cross-lingual part-of-speech tagging
![Page 36](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/36.jpg)
Swahili shukuru, Arabic shukran (شكرا), English thank you
![Page 37](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/37.jpg)
Arabic-Swahili borrowing statistics
Loanwords (% within semantic field)*

| Semantic field | Total | Arabic | English | Other |
|---|---|---|---|---|
| MODERN WORLD | 73.6 | 15.1 | 43.7 | 14.8 |
| RELIGION | 55.7 | 47.5 | - | 9.2 |
| LAW | 54.6 | 41.1 | 9.4 | 4.1 |
| POSSESSION | 48.1 | 41.4 | 1.9 | 4.9 |
| SOCIO-POLITICAL | 47.5 | 37.9 | - | 9.6 |
| EMOTIONS | 46.8 | 39 | 1.6 | 6.2 |
| COGNITION | 46 | 40.6 | 1.5 | 3.9 |
| CLOTHING | 43.4 | 11.1 | 18.8 | 13.5 |
| THE HOUSE | 37.5 | 19.3 | 6.6 | 11.7 |

nouns 19%, adjectives 19%, verbs 15%, adverbs 14%, function words 15%
*a study of 1,460 core words (Schadeberg ‘09)
![Page 38](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/38.jpg)
http://blog.oxforddictionaries.com/2014/08/which-everyday-english-words-came-from-arabic/
![Page 39](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/39.jpg)
Pipeline: donor words → donor words to IPA → GEN (donor affix removal, syllabification, donor-to-recipient phonological adaptation, recipient morphological adaptation) → EVAL (ranking with Optimality Theory constraints) → IPA to recipient words → loanwords
Example (book.sg.indef): كتابا
IPA: kuttaba, kitaba, …
Donor affix removal: kuttaba, kuttab, kitaba, kitab, …
Syllabification: ku.tta.ba., ku.t.ta.ba., …, ki.ta.ba., ki.ta.b., …
Phonological adaptation: ku.ta.ba. [degemination], ku.tata.ba. [epenthesis], ku.ta.bu. [final vowel subst.], ki.ta.bu. [final vowel subst.], ki.ta.bu. [epenthesis], …
Recipient morphological adaptation:
ku.tata.ba.li.ku.tata.ba.vi.ki.ta.bu. ki.ta.bu.ki.ta.bu. ...
kitabuku.ta<DEP-V>ta<PEAK>.ba.li<DEP-MORPH>.ku.ta<DEP-V>ta<PEAK>.ba.li.ku.tta<*COMPLEX>.ba.ki.ta.bu<IDENT-IO-V>.ki.ta.bu<DEP-V>.vi<DEP-MORPH>.ki.ta.bu<IDENT-IO-V>.
kitabu
ARABIC SWAHILI
Arabic-Swahili borrowing model
![Page 40](https://reader033.vdocuments.mx/reader033/viewer/2022052005/6018740ab645af35806013a1/html5/thumbnails/40.jpg)
Arabic-Swahili morphophonological adaptation
● Syllable structure: CV, CVV, CVC, CVCC → V, CV
● Morphology: Arabic affix deletion (optional); Swahili affix concatenation
● Phonology:
○ Vowel deletion: shortening of Arabic long vowels and vowel clusters
○ Consonant degemination: shortening of Arabic geminate consonants
○ Substitution of similar phones: /tˤ/→/t/, /dˤ/→/d/, /θ/→/s/, /x/→/k/, etc.
○ Vowel epenthesis: eliminating Arabic codas and consonant clusters
○ Final vowel substitution: /u/, /o/, /i/, /e/