Spoken Arabic Dialect ID - Columbia University (dpwe/e6820/proposals/fadi.pdf)
Spoken Arabic Dialect ID
Speech & Audio Processing & Recognition
Fadi Biadsy
March 13, 2008
Background
Modern Standard Arabic (MSA): standard language throughout the Arab world (Literary Arabic)
Nobody's native language
Colloquial Arabic: collective term for all dialects of Arabic
Maghrebi, Egyptian, Sudanese, Levantine, Iraqi, Arabian
Dialect ID
Given a speech segment, as short as possible, identify its dialect
Why Study Dialect ID
Interesting problem:
Phonetic cues?
Prosodic cues? (e.g., intonational contours, phrase accents, durational features...)
*Lexical and syntactic features?
Why Study Dialect ID
ASR fails when an Arabic speaker code-switches to her regional dialect
Identifying the dialect prior to recognition enables the ASR system to adapt its pronunciation model, acoustic models, morphological model, and language model
Speaker annotation
Dialect ID – Our Approach
Phonotactic modeling
Hypothesis: every Arabic dialect has its own phonetic distribution
This approach has been used successfully in Language ID
Dialect ID - TRAIN
First, train an MSA “phone” recognizer
Now, given K dialects, for each dialect i:
Run the phone recognizer on the dialect's training speech to obtain phone sequences, e.g.:
dh uw z hh ih n d uw w ay ey d y aw ao uh jh y eh k oh aa k v hh aw ao n
f uw v ow z l iy g s m p l k dh n eh g f ey m p l ay ae
dh iy jh sh p eh ae ey d p sh ua r m ey f ay n z
Train an n-gram model λ_i on these phone sequences
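The training step for one dialect can be sketched as below. This is only an illustration under stated assumptions, not the author's implementation: the slides give neither the n-gram order nor the smoothing method, so the sketch assumes bigrams with add-one smoothing. The function name `train_bigram_model` and the toy phone strings are hypothetical.

```python
import math
from collections import Counter

def train_bigram_model(phone_sequences):
    """Estimate an add-one-smoothed bigram model from phone sequences.

    phone_sequences: list of lists of phone labels (recognizer output).
    Returns a function logprob(prev, cur) giving log P(cur | prev).
    """
    bigrams = Counter()
    contexts = Counter()
    vocab = set()
    for seq in phone_sequences:
        # Pad with start/end markers so sequence boundaries are modeled too.
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    v = len(vocab)

    def logprob(prev, cur):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        return math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))

    return logprob

# Toy data: two short phone strings standing in for dialect i's training speech.
train_i = ["dh uw z hh ih n".split(), "f uw v ow z".split()]
lambda_i = train_bigram_model(train_i)
```

Calling `lambda_i("f", "uw")` returns a higher log-probability than `lambda_i("f", "z")`, because only the first bigram occurs in the toy training data. One such model would be trained per dialect.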
Dialect ID - TEST
Given a speech segment S from an unknown dialect, run the phone recognizer on S to obtain its phone sequence P_S:
uw hh ih n d uw w ay ey uh jh y eh k oh v hh aw ao n hh aa m
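The slide text does not spell out the decision rule. A common choice in phonotactic language ID, assumed here, is to score the phone sequence under each dialect's n-gram model and pick the dialect with the highest likelihood. The sketch below illustrates that rule with add-one-smoothed bigram models; the function names, the labels `dialect_A` and `dialect_B`, and the toy phone strings are all hypothetical.

```python
import math
from collections import Counter

def train(seqs):
    """Collect bigram and context counts over phone sequences (assumed model)."""
    big, ctx, vocab = Counter(), Counter(), set()
    for seq in seqs:
        padded = ["<s>"] + seq + ["</s>"]
        vocab.update(padded)
        big.update(zip(padded, padded[1:]))  # count each adjacent phone pair
        ctx.update(padded[:-1])              # count each phone used as context
    return big, ctx, vocab

def log_likelihood(model, seq, vocab_size):
    """Log-likelihood of a phone sequence under an add-one-smoothed bigram model."""
    big, ctx, _ = model
    padded = ["<s>"] + seq + ["</s>"]
    return sum(
        math.log((big[(p, c)] + 1) / (ctx[p] + vocab_size))
        for p, c in zip(padded, padded[1:])
    )

def classify(models, seq):
    """Return the dialect whose model gives the sequence the highest score."""
    # Use one shared vocabulary size so all models are smoothed comparably.
    vocab_size = len(set().union(*(m[2] for m in models.values())))
    return max(models, key=lambda d: log_likelihood(models[d], seq, vocab_size))

# Toy phone strings; real training data would be recognizer output per dialect.
models = {
    "dialect_A": train(["dh uw z hh ih n d uw".split()]),
    "dialect_B": train(["f uw v ow z l iy g".split()]),
}
predicted = classify(models, "hh ih n d uw".split())
```

Working in log space avoids numerical underflow, which matters once the phone sequences run to thousands of symbols.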
Experiment
Train an MSA “phone” recognizer on ~37 hours of speech from TDT4 Broadcast News
Corpora – Levantine
Arabic CTS Levantine Fisher Training Data Set 1, 2, 3 Speech
762 dialogues, 1524 speakers
Each dialogue is 10 minutes: 127 hours of speech
Annotated: LEB=547, JOR=393, PAL=187, SYR=72
Silence-based segmentation + remove every segment < 0.5 s
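The segmentation step might look roughly like the sketch below. The slides do not say how silence was detected, so this assumes a simple per-frame energy threshold. The function name, the 10 ms frame length, and the threshold value are illustrative assumptions; only the 0.5 s minimum duration comes from the slide.

```python
def segment_by_silence(energies, frame_s=0.01, threshold=0.1, min_dur_s=0.5):
    """Split a per-frame energy track into speech segments.

    energies: one energy value per frame of frame_s seconds.
    Frames with energy below `threshold` count as silence.
    Returns (start_s, end_s) pairs, dropping segments shorter than min_dur_s.
    """
    segments = []
    start = None
    for i, e in enumerate(energies):
        if e >= threshold and start is None:
            start = i                      # speech begins
        elif e < threshold and start is not None:
            segments.append((start, i))    # speech ends at first silent frame
            start = None
    if start is not None:                  # speech runs to the end of the track
        segments.append((start, len(energies)))
    # Compare durations in whole frames to avoid floating-point edge cases.
    min_frames = round(min_dur_s / frame_s)
    return [
        (s * frame_s, e * frame_s)
        for s, e in segments
        if (e - s) >= min_frames
    ]

# Toy track: 0.3 s of speech, 0.2 s of silence, 0.8 s of speech (10 ms frames).
track = [1.0] * 30 + [0.0] * 20 + [1.0] * 80
kept = segment_by_silence(track)
```

On the toy track only the 0.8 s segment survives; the 0.3 s segment falls under the 0.5 s minimum and is dropped.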
Corpora – Egyptian
CALLHOME Egyptian Arabic Speech
120 dialogues, 240 speakers
Each dialogue is 30 minutes: 60 hours of speech
Silence-based segmentation + remove every segment < 0.5 s
Experiment
Egyptian corpus: held out 20/240 speakers
Run the Arabic phone recognizer on 220 files: ~18.3 million phones
Levantine corpus: held out 757/1524 speakers
Run the Arabic phone recognizer on 220 files: ~19.4 million phones
Results on the held out Data
Levantine: 98.3% (744/757 correctly classified as Levantine)
Egyptian: 95% (19/20 correctly classified as Egyptian)
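As a quick check of the arithmetic, the snippet below recomputes the per-dialect accuracies from the counts on the slide. It also derives a pooled accuracy and an unweighted average over the two dialects. Neither of those two figures appears in the slides; they are shown only because the held-out sets differ greatly in size (757 vs. 20 speakers).

```python
# Held-out counts from the slide: (correct, total) per dialect.
results = {"Levantine": (744, 757), "Egyptian": (19, 20)}

# Per-dialect accuracy, as reported on the slide.
per_dialect = {d: c / n for d, (c, n) in results.items()}

# Pooled accuracy over all held-out speakers (dominated by the larger set).
correct = sum(c for c, _ in results.values())
total = sum(n for _, n in results.values())
pooled = correct / total

# Unweighted average of the per-dialect accuracies.
macro = sum(per_dialect.values()) / len(per_dialect)

for d, acc in per_dialect.items():
    print(f"{d}: {acc:.1%}")
print(f"Pooled: {correct}/{total} = {pooled:.1%}")
print(f"Unweighted average: {macro:.1%}")
```

Because the Egyptian held-out set has only 20 speakers, a single error moves its accuracy by five percentage points.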
Results on a different corpus
Babylon Levantine corpus
Microphone recordings
164 speakers, ~60 hours of speech
Accuracy: 96.3% of speakers
TODO
Test on a different corpus for Egyptian
Try to identify “sub” dialects (from the same corpus)
Identify Gulf and Iraqi Arabic
Incorporate English phone recognizer
Important issue (TODO)
We use all the speech of a speaker
avg: ~5 minutes for Levantine
avg: ~15 minutes for Egyptian
Will this approach work if we use less than 30s of speech?
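One way to set up that experiment would be to cap each test speaker's speech at a fixed budget before phone recognition, then compare accuracy across budgets. The sketch below is a minimal illustration, assuming speech segments are available as (start, end) times in seconds. The function name is hypothetical, and the 30 s default is taken from the question above.

```python
def take_first_seconds(segments, budget_s=30.0):
    """Keep segments from the start of a speaker's data, up to budget_s seconds.

    segments: list of (start_s, end_s) speech segments in time order.
    The last kept segment is trimmed so the total does not exceed the budget.
    """
    kept, used = [], 0.0
    for start, end in segments:
        remaining = budget_s - used
        if remaining <= 0:
            break
        dur = min(end - start, remaining)
        kept.append((start, start + dur))
        used += dur
    return kept

# Toy speaker with three speech segments totalling 45 s.
speaker_segments = [(0.0, 12.0), (15.0, 30.0), (40.0, 58.0)]
test_input = take_first_seconds(speaker_segments)
```

In this toy case the first two segments are kept whole and the third is trimmed to its first 3 s, for 30 s in total.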
Thank you!