Pearson Knowledge Technologies, Palo Alto, California NAACL Boulder 2009 1
Automatic Assessment of Spoken Modern Standard Arabic
NAACL, Boulder, Colorado
5 June 2009
Pearson Knowledge Technologies, Palo Alto, California
Jian Cheng, Jared Bernstein, Ulrike Pado, Masa Suzuki
Outline
1. Pearson Knowledge Technologies
2. How Versant tests operate
3. Versant Arabic Test (development)
4. Validation evidence
5. Predictive accuracy
Pearson Knowledge Tech. (PKT)
(KAT + Ordinate) are now PKT
KAT ≈ {LSA, Essay Scoring, Write-to-Learn, PTE, etc.}
Ordinate ≈ {Versant, ORF for NCES, VersaReader, PTE, etc.}
PKT is part of Pearson
Pearson ≈ {FT, Economist, Penguin, Longman, PsychCorp, …}
PKT is located in Boulder, Colorado, and Palo Alto, California.
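Test delivery
[Figure: test delivery network. A database (tests, prompts, responses) and a scoring system for English, Spanish, Dutch, and Arabic are located in California; a communication network links them to a delivery interface that can be anywhere, exchanging speech and score reports.]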
How Versant tests operate
[Figure: Versant Database, Test Delivery Server, and Scoring, with a sample spoken item: "The train's been delayed by one hour."]
Versant Arabic Test
• DLI purpose: about 1,000 students at DLI need predictive speaking tests
• Requirements:
  – Accurate test of Arabic listening and speaking
  – Convenient to use at DLI and worldwide (ILR interviews are costly)
  – Suitable for repeated formative testing
  – High peak capacity for mass screening
Construct Comparison
OPI construct: Oral proficiency, as manifest in an Oral Proficiency Interview, is compatible with communicative competence as reflected in the functional level and/or complexity of content accurately produced.
Versant construct: Facility in spoken language, that is, the ability to understand spoken language and speak appropriately in response at a conversational pace on everyday topics.
Versant Arabic Test: Test Structure
Part A: Reading
Part B: Repeat 1
Part C: Short Answers
Part D: Sentence Builds
Part E: Repeat 2
Part F: Passage Retelling
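Versant Scoring
[Figure: item types Read, Repeat Sentence 1, Sentence Build, Repeat Sentence 2, SAQ (Short Answer Questions), and Passage map onto the subscores; the Passage item type is labeled Human Scoring.]
Subscore weights in the Overall score: Vocabulary 20%, Sentence Mastery 30%, Fluency 30%, Pronunciation 20%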
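The slide gives only the four weights, which sum to 100%. A minimal sketch of how such weights combine, assuming the Overall score is a plain weighted sum of subscores reported on one common scale (the names `WEIGHTS` and `overall_score` are hypothetical and not part of any Versant interface; actual scaling and rounding are not shown on the slide):

```python
# Subscore weights as read off the Versant Scoring slide (they sum to 1.0).
WEIGHTS = {
    "vocabulary": 0.20,
    "sentence_mastery": 0.30,
    "fluency": 0.30,
    "pronunciation": 0.20,
}


def overall_score(subscores: dict[str, float]) -> float:
    """Combine the four subscores into one score as a weighted sum.

    Assumption: all subscores share the same reporting scale, so a
    weighted sum stays on that scale.
    """
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing subscores: {sorted(missing)}")
    return sum(weight * subscores[name] for name, weight in WEIGHTS.items())
```

For subscores of 40 (Vocabulary), 60 (Sentence Mastery), 50 (Fluency), and 30 (Pronunciation), this gives 0.2·40 + 0.3·60 + 0.3·50 + 0.2·30 = 47, so the two 30% subscores pull the result toward their values.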
How Versants are developed (1)
[Figure: development flow for the Versant Arabic Test: Test Spec, Native Test Developers, Item Text, Recorded Items, Arabic Learners and Arabic Natives as test takers, Native Scribes (transcripts), Native Judges (criterion scale scores), Ordinate System, Scale Estimates, Versant Scores, and Validation, both internal and external (concurrent ILR interviews, ILR scores).]
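Arabic Challenges: Voweling
kutubu al-waladi: the books of the boy
kataba al-waladu: the boy wrote (literally "wrote the boy", with al-waladu as subject)
Both phrases have the same written form (كتب الولد), because normally only the consonants are written.
• Short vowels, which would disambiguate them, are not written
• Vowels carry phonetic information
• Vowels carry grammatical information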
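The voweling problem can be checked mechanically. A minimal sketch, assuming unvoweled text is obtained by deleting the Arabic harakat (U+064B fathatan through U+0652 sukun); the function and constant names are hypothetical, and the two voweled phrases are spelled with Unicode escapes to avoid rendering issues:

```python
# Arabic short-vowel and related diacritics (harakat): U+064B..U+0652.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_short_vowels(text: str) -> str:
    """Delete harakat, approximating how MSA is normally written."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# kutubu al-waladi, "the books of the boy": damma on k, t, b; kasra on final d.
BOOKS_OF_THE_BOY = (
    "\u0643\u064f\u062a\u064f\u0628\u064f "
    "\u0627\u0644\u0648\u064e\u0644\u064e\u062f\u0650"
)
# kataba al-waladu, "the boy wrote": fatha on k, t, b; damma on final d.
THE_BOY_WROTE = (
    "\u0643\u064e\u062a\u064e\u0628\u064e "
    "\u0627\u0644\u0648\u064e\u0644\u064e\u062f\u064f"
)

if __name__ == "__main__":
    # Different voweled strings collapse to one unvoweled spelling.
    print(BOOKS_OF_THE_BOY == THE_BOY_WROTE)
    print(strip_short_vowels(BOOKS_OF_THE_BOY) == strip_short_vowels(THE_BOY_WROTE))
```

The first comparison prints `False` and the second prints `True`: once the vowels are gone, a recognizer or scorer working from text alone cannot tell the noun phrase from the verb phrase, which is why diacritic markings had to be supplied.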
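Complex Morphology
liziyaaratnaa (li-ziyaarat-naa): "for visit of us", i.e. "for our visit"
• Complicates lexicon lookup, frequency estimates…
• "Short" Arabic items are harder than English items with the same number of words, since one Arabic word can combine a preposition, a noun, and a possessive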
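To make the lookup problem concrete, here is a toy light-stemming sketch that peels at most one proclitic and one enclitic off a word in the slide's Latin transliteration. The affix lists, the `MIN_STEM` guard, and the function name are hypothetical illustrations, not the segmenter used for the Versant Arabic Test:

```python
# Hypothetical, deliberately tiny affix lists (transliterated).
PROCLITICS = ("wa", "fa", "bi", "li", "ka")        # and, so, with, for, like
ENCLITICS = ("naa", "haa", "hum", "hu", "ka", "ki")  # our, her, their, his, your
MIN_STEM = 3  # do not strip an affix if it would leave a shorter stem


def segment(word: str) -> list[str]:
    """Greedily split off one proclitic and one enclitic, if present."""
    parts: list[str] = []
    for p in PROCLITICS:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            parts.append(p)
            word = word[len(p):]
            break
    suffix = ""
    for s in ENCLITICS:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            suffix = s
            word = word[: -len(s)]
            break
    parts.append(word)
    if suffix:
        parts.append(suffix)
    return parts
```

`segment("liziyaaratnaa")` yields `["li", "ziyaarat", "naa"]`, three units in one written word. The same greedy rule also over-segments: `segment("kataba")` wrongly peels off `ka`, giving `["ka", "taba"]`, which is the kind of ambiguity that complicates lexicon lookup and frequency counts.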
Development & Run-time Processes
[Figure: compilation of expectation and run-time flow]
Prompt Voices and Training Samples
Training data sources
Native Data: Egypt 484, Syria 281, Iraq 179, Palestine 187, Other 517 (Total 1648)
Learner Data: DLI 1120, Non-DLI 552 (Total 1672)
Prompt Voices (F = female, M = male): Egypt F, M; Iraq F, M; Jordan M; Morocco F; Lebanon M; Palestine F, M; Syria F, M
Validation Criteria
Reliability: Scores are consistent
Validity:
  – Native and non-native speakers should be clearly distinct
  – MSA and dialect speakers should be distinct (since we're testing MSA)
  – Machine scores should predict human scores
Reliability
Score              Split-Half (N = 134)   Test-Retest (N = 100)
Overall            0.98                   0.97
Sentence Mastery   0.97                   0.96
Vocabulary         0.89                   0.82
Fluency            0.97                   0.96
Pronunciation      0.96                   0.94
Native ~ Non-Native Scores
Natives by Country
Educated ~ Uneducated Speakers
[Figure: cumulative density vs. Arabic Overall Score]
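A cumulative density plot shows, for each Arabic Overall Score x, the fraction of a group scoring at or below x; two groups are well separated when one group's curve rises much later than the other's. A minimal sketch of that empirical cumulative distribution function, with a hypothetical function name and no VAT data reproduced:

```python
from bisect import bisect_right
from typing import Callable


def ecdf(scores: list[float]) -> Callable[[float], float]:
    """Return F(x): the share of `scores` that are <= x."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one score")

    def f(x: float) -> float:
        # bisect_right counts how many sorted scores are <= x.
        return bisect_right(ordered, x) / n

    return f
```

Evaluating `ecdf` for each group's scores over a grid of Overall scores would trace the two curves; the horizontal gap between them at a given height (for example 0.5, the medians) is one simple way to read how far apart the groups are.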
Machine – Human Comparison
Score              Correlation (N = 134)
Overall            0.97
Sentence Mastery   0.97
Vocabulary         0.96
Fluency            0.84
Pronunciation      0.83
How Versants Compare to OPIs
[Figure: ILR OPI Score (logits) vs. Versant Arabic Overall Score; N = 118, r = 0.87]
Spanish & English: Versant ~ Human
[Figure, two panels of ILR OPI Score (logits) vs. Versant score: Spanish (x-axis: Versant Spanish Score), N = 37, r = 0.92; English, N = 151, r = 0.86]
Summary
• Versant Arabic Test (VAT) is in operation
• VAT is based on a large, diverse body of transcribed spoken material
• VAT is available on demand
• VAT returns consistent, accurate scores that reflect real-time skills with MSA
• VAT can triage or screen candidates for OPI tests
النهاية (The End)
Thanks to Waheed Samy, Naima Bousofara Omar, Eli Andrews, Mohamed Al-Saffar, Nazir Kikhia, Rula Kikhia, and Linda Istanbulli for item development and data collection/transcription in Arabic, and to Andy Freeman for providing diacritic markings.