TRANSCRIPT
Automatic Grammatical Error Detection of Non-native Spoken Learner English
Kate Knill1, Mark Gales1, Potsawee Manakul1, Andrew Caines2
1Engineering Department / ALTA Institute 2Computer Science and Technology / ALTA Institute
17 May 2019
Challenge and Advantages of Spoken Language
• Spoken language consists of
Text + Pronunciation + Prosody + Delivery
• Challenge for feedback on “grammatical” errors in spoken language
Spoken Text ≠ Written Text
• We don't speak in sentences; we repeat ourselves, hesitate, mumble, etc.
• There is no defined spoken grammar standard
• Advantages of speech
• There are no spelling or punctuation mistakes
• We provide additional information within the audio signal
2
Example of Learner Speech
MANUAL TRANSCRIPTION
flor company is an engineering compa- is is is eng- engineering company %hes% %hes% in the in the poland %hes% we do business the ref- refinery business and the chemical business %hes% the job we can offer is a engineering job %hes% basically this is the job in the office

META-DATA EXTRACTION
// flor company is an [FS engineering {P compa-} [REP is is + is] {P eng-} + engineering company] {F %hes%} {F %hes%} [REP in the + in the] poland // {F %hes%} we do business the {P ref-} refinery business and the chemical business {F %hes%} // the job we can offer is a engineering job // {F %hes%} basically this is the job in the office //

GRAMMATICAL ERRORS
// flor company is an engineering company in the poland // we [RV do] [RN business] the refinery business and the chemical business // the job we can offer is [FD a] engineering job // basically this is [RD the] job in the office
3
Grammatical Error Detection
• Task: given a sentence, automatically label each word with
  • the probability its grammar is correct, P(c), and incorrect, P(i)
• Example sentence
4
       Internet  was   something  amazing  for   me    .
P(c)     0.02    0.96    0.97      0.97    0.95  0.98  0.99
P(i)     0.98    0.04    0.03      0.03    0.05  0.02  0.01
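At prediction time, per-token probabilities like these can be turned into binary error flags by thresholding P(i). A minimal sketch; the 0.5 threshold is an illustrative assumption, not the system's operating point:

```python
def flag_errors(tokens, p_incorrect, threshold=0.5):
    # Flag each token whose P(incorrect) exceeds the threshold.
    # threshold=0.5 is illustrative; a real system would tune it
    # on a dev set for the precision/recall trade-off it needs.
    return [(tok, p > threshold) for tok, p in zip(tokens, p_incorrect)]

sentence = ["Internet", "was", "something", "amazing", "for", "me", "."]
p_i = [0.98, 0.04, 0.03, 0.03, 0.05, 0.02, 0.01]
print(flag_errors(sentence, p_i))
# only "Internet" is flagged (missing article), the rest are not
```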
• Predict a probability distribution yi for each token in w1:N = {w1, ..., wN}
Sequence Labeller
5
[Figure: sequence labeller architecture (diagram: marekrei). Each word wi (e.g. wi = dog) is represented by a word embedding combined with the output of a character-level bidirectional LSTM over its characters (wi,1 = d, wi,2 = o, wi,3 = g). Forward and backward word-level LSTMs feed a hidden layer, which outputs the label distribution P(yi|w1:N), e.g. [0.92; 0.08].]
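The labeller above can be sketched in miniature. This is a deliberate simplification with assumed details: random untrained weights, plain tanh RNNs standing in for the LSTMs, and the character-level component omitted:

```python
import math
import random

# Toy bidirectional sequence labeller (simplified sketch: tanh RNNs in
# place of LSTMs, random weights in place of trained ones, no
# character-level component).
random.seed(0)
EMB, HID = 8, 6  # illustrative embedding and hidden sizes

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def rnn_pass(embs, W_x, W_h):
    # One directional pass over the word embeddings.
    h, states = [0.0] * HID, []
    for x in embs:
        h = [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]
        states.append(h)
    return states

W_xf, W_hf = rand_matrix(HID, EMB), rand_matrix(HID, HID)  # forward RNN
W_xb, W_hb = rand_matrix(HID, EMB), rand_matrix(HID, HID)  # backward RNN
W_out = rand_matrix(2, 2 * HID)  # hidden layer -> [P(correct), P(incorrect)]

def label_sentence(embs):
    fwd = rnn_pass(embs, W_xf, W_hf)
    bwd = rnn_pass(embs[::-1], W_xb, W_hb)[::-1]
    # P(yi | w1:N): softmax over concatenated forward/backward states
    return [softmax(matvec(W_out, f + b)) for f, b in zip(fwd, bwd)]

sentence = [[random.gauss(0.0, 1.0) for _ in range(EMB)] for _ in range(7)]
probs = label_sentence(sentence)  # one [P(c), P(i)] pair per word
```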
Corpora
• Non-native English learners with grammatical error annotation
• BULATS: free speech with up to 1 minute per response
• NICT-JLE: oral proficiency test interviews
• CLC: range of written exams at different grade levels
6
Corpus     Spoken/Written   # Wds    # Uniq Wds   Audio   L1s    Grades
BULATS     Spoken            61.9K    3.4K        Yes       6    A1-C2
NICT-JLE   Spoken           135.3K    5.6K        No        1    A1-B2
CLC        Written           14.1M   79.1K        No     Many    A1-C2
Data Processing for Spoken GED
• Match data processing in training and testing
  a. Train: text data - correct spelling errors and remove punctuation and casing
  b. Test: speech data - convert speech transcriptions to be “like” text
7
// flor company is an engineering company in the poland // we do business the refinery business and the chemical business // the job we can offer is a engineering job // basically this is the job in the office
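The training-side normalisation (lowercasing and punctuation removal; spelling correction is omitted) could look like this hypothetical helper:

```python
import re
import string

def normalise_written(text):
    # Training side: strip casing and punctuation so written CLC text
    # resembles spoken transcriptions. Spelling correction, which the
    # talk also mentions, is not shown here.
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

print(normalise_written("We do business: the refinery business, and the chemical business."))
# we do business the refinery business and the chemical business
```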
GED Using CLC Trained Model
8
F0.5 57.6%
F0.5 49.7%
F0.5 44.1%
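GED is scored with F0.5, which weights precision twice as heavily as recall, matching the feedback use case where false alarms are costly. A small sketch; the input values are illustrative, not the reported results:

```python
def f_beta(precision, recall, beta=0.5):
    # F_beta combines precision and recall; with beta=0.5,
    # precision is weighted twice as heavily as recall:
    # F0.5 = 1.25 * P * R / (0.25 * P + R)
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# illustrative precision/recall values, not the talk's scores
print(round(f_beta(0.7, 0.4), 3))  # → 0.609
```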
(Small Scale) System Error Analysis
• True precision higher for Spoken BULATS than scores suggest
• System error (~27%)
  .. and i have to practice more because I have ..
• Unmarked error (~40%)
  .. so I think you need taxi
• Next to error tagged word(s) (~27%)
  .. and continue to inform with customer when we have ..
• To provide feedback we need to boost recall of high precision items
• Issue: lack of labelled learner speech corpora
• Adapt/“fine-tune” CLC trained system to subset of target speech data
10
Boosting GED Performance on Spoken BULATS
11
• Fine-tune the CLC system with 80% of the data for training, 10% dev, 10% test, repeated x10
• Fine-tuning produces a significant boost in performance
  • but it has also learnt some annotator bias, e.g. “two thousand eight”
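The 80/10/10 split used for fine-tuning can be sketched as follows; the helper name and seed handling are assumptions:

```python
import random

def split_folds(examples, seed=0):
    # 80% train / 10% dev / 10% test split of the target speech data;
    # the talk repeats this x10 (a different seed per repetition would
    # give the different folds).
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    n8, n9 = int(0.8 * len(idx)), int(0.9 * len(idx))
    train = [examples[i] for i in idx[:n8]]
    dev = [examples[i] for i in idx[n8:n9]]
    test = [examples[i] for i in idx[n9:]]
    return train, dev, test

train, dev, test = split_folds(list(range(100)))
print(len(train), len(dev), len(test))  # → 80 10 10
```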
Example of Learner Speech: ASR Transcription
12

ASR TRANSCRIPTION ERRORS
flower companies in joining company is is is engineer engineering company in the in the poll one %hes% we do business there are refinery business in a chemical business %hes% the job we can offer is [del] engineering job %hes% basically this is the job in the office

ASR “GRAMMATICAL ERRORS” → feedback focused
// flower companies in engineering company in the poll one // we do business there are refinery business in a chemical business // the job we can offer is engineering job // basically this is the job in the office

MANUAL GRAMMATICAL ERRORS
flor company is an engineering compa- is is is eng- engineering company %hes% %hes% in the in the poland %hes% we do business the ref- refinery business and the chemical business %hes% the job we can offer is a engineering job %hes% basically this is the job in the office
BULATS ASR Annotation Error Rates
13
[Figure: bar chart per CEFR grade (A1, A2, B1, B2, C) showing # words (0-25,000) together with %GER and %WER (0%-45%).]
• Overall: 71751 words 16.5% GER 25.2% WER
BULATS ASR Word Error Rate
14
                     # Words   %WER
Overall                71751   25.2
“Fluent”               52698   19.3
Grammatical Error      10348   29.1
Disfluency              2524   36.4
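%WER above is the standard word error rate: word-level edit distance (substitutions + insertions + deletions) divided by the reference length. A minimal implementation:

```python
def wer(reference, hypothesis):
    # Word error rate via word-level Levenshtein distance:
    # (substitutions + insertions + deletions) / reference length.
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # delete all reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(round(wer("the cat sat", "the cat sat on"), 2))  # → 0.33
```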
GED on BULATS ASR Transcriptions
15
• Manual transcriptions used for GE marking and meta-data extraction
• Significantly lower performance than on manual transcriptions
F0.5 26.6%
F0.5 37.3%
Conclusions
• Detecting “grammatical” errors in learner speech is hard!
• As is annotating the errors
• Focus on high precision region for feedback
• Testing whether the regions where errors are detected are sufficient to provide useful help
• More research required into:
• Meta-data extraction
• Boosting training data by mimicking learner speech errors
• Detecting portions of ASR transcription the system is confident in
16
Questions?
Thanks to Cambridge English Language Assessment for supporting this research and providing access to the BULATS data.
17