Language modeling for speaker recognition
Dan Gillick
January 20, 2004
January 20, 2004 Language modeling for speaker recognition Dan Gillick (2)
Outline
• Author identification
• Trying to beat Doddington’s “idiolect” modeling strategy (speaker recognition)
• My next project
Author ID (undergrad. thesis)
Problem:
– train models for each of k authors
– given some test text written by 1 of those authors, identify the correct author
Variations:
– different kinds of models
– different size test samples
– different k
Character n-gram models
What?
– 27 tokens: a-z, <space>
– some text generated from such a trigram model:
“you orthad gool of anythilly uncand or prafecaustiont and to hing that put ably”
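A minimal sketch of how a character trigram model over those 27 tokens can generate text like the sample above. The function names, the normalization step, and the tiny training string are illustrative assumptions, not the thesis code.

```python
import random
from collections import Counter, defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # the 27 tokens: a-z and space


def normalize(text):
    """Lowercase, map anything outside a-z to space, collapse repeated spaces."""
    chars = [c if c in ALPHABET else " " for c in text.lower()]
    return " ".join("".join(chars).split())


def train_trigrams(text):
    """Count which character follows each two-character history."""
    counts = defaultdict(Counter)
    for i in range(len(text) - 2):
        counts[text[i:i + 2]][text[i + 2]] += 1
    return counts


def generate(counts, length, seed=0):
    """Sample characters one at a time from the trigram counts."""
    rng = random.Random(seed)
    out = rng.choice(sorted(counts))  # start from a random seen history
    while len(out) < length:
        followers = counts.get(out[-2:])
        if not followers:  # dead end: restart from a random seen history
            out += rng.choice(sorted(counts))
            continue
        chars, weights = zip(*sorted(followers.items()))
        out += rng.choices(chars, weights=weights)[0]
    return out[:length]


# Toy training text (an assumption; the thesis trained on whole novels).
sample_text = normalize("The quick brown fox jumps over the lazy dog. "
                        "The dog was not amused, and the fox kept running.")
model = train_trigrams(sample_text)
print(generate(model, 60))
```

With only a sentence or two of training text the output mostly replays the input; with novel-length training data it starts to look like the word-like nonsense quoted above.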
Character n-gram models
Why?
– very simple
– data sparseness less troublesome than with word n-grams
– supposed to be state-of-the-art or at least close to it (Khmelev, D., Tweedie, F.J. “Using Markov Chains for the Identification of Writers.” Literary and Linguistic Computing, 16(4): 299-307, 2001.)
Character n-grams: Setup
• task: pick correct author from 10 possible authors
• training data: 3 novels for each author
• test data: text from a held-out novel
• jack-knifing: 4 novels for each of 20 authors
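The setup above can be sketched roughly as follows: hold out one text per author, train a character trigram model on the rest, and pick the author whose model gives the held-out text the highest log-likelihood. The add-one smoothing, the function names, and the toy corpus are assumptions for illustration; the slides do not say how the character models were smoothed.

```python
import math
from collections import Counter


def trigram_counts(texts):
    """Count character trigrams and their two-character histories."""
    tri, hist = Counter(), Counter()
    for text in texts:
        for i in range(len(text) - 2):
            tri[text[i:i + 3]] += 1
            hist[text[i:i + 2]] += 1
    return tri, hist


def log_likelihood(text, model, alphabet_size=27):
    """Add-one smoothed trigram log-probability (smoothing is an assumption)."""
    tri, hist = model
    total = 0.0
    for i in range(len(text) - 2):
        num = tri[text[i:i + 3]] + 1
        den = hist[text[i:i + 2]] + alphabet_size
        total += math.log(num / den)
    return total


def jackknife_accuracy(corpus):
    """corpus maps author -> list of texts; each fold holds out one text per author."""
    n_folds = min(len(texts) for texts in corpus.values())
    correct = trials = 0
    for fold in range(n_folds):
        models = {
            author: trigram_counts(texts[:fold] + texts[fold + 1:])
            for author, texts in corpus.items()
        }
        for true_author, texts in corpus.items():
            guess = max(models, key=lambda a: log_likelihood(texts[fold], models[a]))
            correct += guess == true_author
            trials += 1
    return correct / trials


# Toy stand-in for "novels" (hypothetical data, chosen to be trivially separable).
toy_corpus = {
    "author_a": ["aaaa bbbb aaaa", "aaaa aaaa bbbb", "bbbb aaaa aaaa"],
    "author_b": ["xxxx yyyy xxxx", "xxxx xxxx yyyy", "yyyy xxxx xxxx"],
}
print(jackknife_accuracy(toy_corpus))
```

Rotating the held-out text through every position is what lets a small collection of novels produce many test trials.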
Character n-grams: Results
• task: picking 1 author from 10 possible authors
• training data size: 3 novels
Character n-gram models
Why does it work?
– captures some word choice information
– picks up word endings (-ing, -tion, -ly, etc.)
– not hurt much by data sparseness issues
Key-list models
Incentive:
– ought to be able to beat character n-grams
– develop a new modeling method more focused on that which differentiates between authors (characters and words are both useful for topic recognition, but that doesn’t mean they are best for author recognition)
Key-list models
Idea:
– convert the text stream into a stream of only authorship-relevant symbols (I called these lists of symbols key-lists)
– each symbol is a regular expression to allow for broad definitions (/*tion/ captures any nounification)
– text not accounted for by the key-list is represented by <short>, <med>, or <long> markers
– build n-gram models from these new streams
Key-list models
Sample key-list:
Regular Expression                          Description
(\w)(,)(\s)                                 comma
(\w)(\.)(\s)                                period
(\b)(of|for|to|around|after| … )(\b)        common prepositions
(\b)(was|were) \w*ed(\b)                    passive voice
(\b)(is|was|will|are|were|am)(\b)           is conjugations
(\b)(\w*ing)(\b)                            ends in -ing
(\b)(\w*ly)(\b)                             adverb
(\b)(and|but|or|not|if|then|else)(\b)       logical
(\b)(as)(\b)                                as
(\b)(would|should|could)(\b)                modal verbs
Sample trigram: <comma> <short> <period>
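A rough sketch of the key-list conversion: regular expressions turn matching text into symbols, and whatever falls between matches becomes a <short>, <med>, or <long> marker. Several things here are assumptions, not the thesis implementation: the patterns are simplified versions of the table entries, the preposition list contains only the five words actually shown (the elided ones are not filled in), and the word-count thresholds for <short>/<med>/<long> are invented for illustration.

```python
import re

# (symbol name, simplified pattern). Order matters: earlier entries win at a
# given position, so "passive" is tried before the plain "is" conjugations.
KEY_LIST = [
    ("comma", r","),
    ("period", r"\."),
    ("prep", r"\b(?:of|for|to|around|after)\b"),
    ("passive", r"\b(?:was|were) \w*ed\b"),
    ("is", r"\b(?:is|was|will|are|were|am)\b"),
    ("ing", r"\b\w*ing\b"),
    ("ly", r"\b\w*ly\b"),
    ("logical", r"\b(?:and|but|or|not|if|then|else)\b"),
    ("as", r"\bas\b"),
    ("modal", r"\b(?:would|should|could)\b"),
]
# One capturing group per entry, so match.lastindex identifies the entry.
PATTERN = re.compile("|".join(f"({rx})" for _, rx in KEY_LIST), re.IGNORECASE)


def gap_marker(gap):
    """Map unmatched text to a length marker (thresholds are assumptions)."""
    n_words = len(re.findall(r"\w+", gap))
    if n_words == 0:
        return None
    if n_words <= 2:
        return "<short>"
    if n_words <= 5:
        return "<med>"
    return "<long>"


def to_key_stream(text):
    """Convert raw text into a stream of key-list symbols and gap markers."""
    stream, pos = [], 0
    for match in PATTERN.finditer(text):
        marker = gap_marker(text[pos:match.start()])
        if marker:
            stream.append(marker)
        stream.append(f"<{KEY_LIST[match.lastindex - 1][0]}>")
        pos = match.end()
    marker = gap_marker(text[pos:])
    if marker:
        stream.append(marker)
    return stream


def ngrams(stream, n=3):
    """List the n-grams of a symbol stream."""
    return [tuple(stream[i:i + n]) for i in range(len(stream) - n + 1)]


stream = to_key_stream("After dinner, he left.")
print(stream)
print(ngrams(stream))
```

The resulting symbol streams can then be fed to an ordinary n-gram model in place of characters or words.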
Key-list models: Results
• task: picking 1 author from 10 possible authors
• training data size: 3 novels
Key-list models: Results
Some other interesting results:
– key-lists with just punctuation (as well as <short>, <med>, <long>) performed almost as well as the best key-lists
– all key-lists were outperformed by the best n-letter model when test data size < 10,000 chars, but all key-list models eventually surpassed the n-letter models as the test data size grew
Key-list models
Things I didn’t do:
– vary amount of training data
– spend a long time trying different key-lists
– combine key-list results with each other or with the character results
– a lot of other stuff
The thesis is available on the web: http://www.dgillick.com/resource/thesis.pdf
Outline
• Author identification
• Trying to beat Doddington’s “idiolect” modeling strategy (speaker recognition)
• My next project
G. Doddington’s LM strategy
• create LMs with a limited vocabulary of the most commonly occurring 2000 bigrams
• to smooth out zeroes, boost each bigram prob. by 0.001
• score by calculating:
logprob(test|target) – logprob(test|bkg)
• logprobs are joint probabilities:
logprob(AB) = logprob(A) + logprob(B|A)
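A minimal sketch of the scoring recipe listed above: a limited bigram vocabulary, joint bigram probabilities boosted by 0.001, and a target-minus-background log-probability score. Details the slides leave open are assumptions here: the vocabulary is taken from the background data, probabilities are relative frequencies over in-vocabulary bigrams, out-of-vocabulary test bigrams are skipped, and the score is not length-normalized. The toy word lists are hypothetical.

```python
import math
from collections import Counter


def bigrams(words):
    """Adjacent word pairs."""
    return list(zip(words, words[1:]))


def top_bigrams(words, size=2000):
    """Vocabulary: the most frequent bigrams in the given data."""
    return {bg for bg, _ in Counter(bigrams(words)).most_common(size)}


def joint_probs(words, vocab, boost=0.001):
    """Joint probability P(AB) of each vocabulary bigram, boosted to avoid zeros."""
    counts = Counter(bg for bg in bigrams(words) if bg in vocab)
    total = sum(counts.values()) or 1
    return {bg: counts[bg] / total + boost for bg in vocab}


def llr_score(test_words, target, bkg):
    """logprob(test|target) - logprob(test|bkg), over in-vocabulary bigrams only."""
    score = 0.0
    for bg in bigrams(test_words):
        if bg in target:
            score += math.log(target[bg]) - math.log(bkg[bg])
    return score


# Toy data (hypothetical): the target speaker says "you know" a lot.
bkg_words = "i think so yeah i think so you know yeah right i mean it yeah".split()
target_words = "you know i mean you know it was you know like".split()

vocab = top_bigrams(bkg_words)
bkg_model = joint_probs(bkg_words, vocab)
target_model = joint_probs(target_words, vocab)

print(llr_score("you know you know".split(), target_model, bkg_model))
print(llr_score("i think so".split(), target_model, bkg_model))
```

A positive score means the test side looks more like the target speaker than like the background population; a negative score means the opposite.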
G. Doddington’s LM: Setup
Switchboard 1 data:
– collected in early ’90s from all over the US
– 2,400 (~5 min.) conversations among 543 speakers
– corpus divided into 6 splits and tested using jack-knifing through the splits
– manual transcripts provided by Mississippi State
Task:
– 8 conversation sides used as training data to build models for each target speaker
– 1 conversation side used as test data
– background model built from 3 splits of held-out data
– jack-knifing allowed for almost 10,000 trials
G. Doddington’s LM: Results
Notes:
– these results are my own attempt to replicate the original experiments
– SRI reported EER = 8.65% for this same experiment
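For readers unfamiliar with the metric, here is a small sketch of how an equal error rate (EER) can be estimated from trial scores: sweep a decision threshold and report the error rate where the miss rate and the false-alarm rate are closest. The function name and the example scores are made up for illustration, and evaluation tools typically interpolate more carefully than this.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Error rate at the threshold where miss and false-alarm rates are closest."""
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # A trial is accepted when its score is at or above the threshold.
        miss = sum(s < threshold for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


# Hypothetical scores: true-speaker trials and impostor trials.
target_trials = [2.0, 1.5, 1.0, 0.2]
impostor_trials = [0.5, -0.3, -1.0, -2.0]
print(equal_error_rate(target_trials, impostor_trials))
```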
Adapted bigram models
Incentive:
– adapting target models from a much larger background model should yield better estimates of probabilities in the language models
Specifically:
– use same 2000 bigram vocabulary
– target probabilities are a mixture of training probabilities and background probabilities
– mixture weight is 2:1 target data:bkg. data
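One way to read the 2:1 mixture is as a linear interpolation of the two probability estimates, sketched below. This reading is an assumption (the weights could equally have been applied to counts), and the function name and toy probabilities are hypothetical.

```python
def adapt(target_probs, bkg_probs, target_weight=2.0, bkg_weight=1.0):
    """Interpolate target and background bigram probabilities at the given ratio."""
    total = target_weight + bkg_weight
    return {
        bg: (target_weight * target_probs.get(bg, 0.0) + bkg_weight * p_bkg) / total
        for bg, p_bkg in bkg_probs.items()
    }


# Toy probabilities over a three-bigram vocabulary (hypothetical values).
bkg = {("you", "know"): 0.5, ("i", "mean"): 0.3, ("i", "think"): 0.2}
target = {("you", "know"): 0.9, ("i", "mean"): 0.1}

adapted = adapt(target, bkg)
for bigram, prob in adapted.items():
    print(bigram, round(prob, 4))
```

Under this reading, a bigram the target speaker never produced still gets one third of its background probability, so no separate boost is needed to avoid zeros.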
Adapted bigram models: Results
Notes:
– nearly identical performance
– combination of the 2 systems yields almost no improvement
– why isn’t the adapted version better?
Can anything improve on 8.68%?
Trigrams?
– use same count threshold to make a list of the top 700 trigrams (“a lot of”, “I don’t know” were among the most common)
Character models?
– worked well for authorship…
– included all character combinations (no limited vocabulary)
– tried bigram and trigram models
Scores and combinations
• adapt. word bigrams: EER = 8.89%
• adapt. word trigrams: EER = 11.88%
• adapt. char. bigrams: EER = 13.73%
• adapt. char. trigrams: EER = 17.92%
• adapted words: EER = 8.46%
• adapted characters: EER = 13.24%
• adapted words + adapted characters: EER = 7.89%
• GD bigrams: EER = 8.68%
Final Comparison
What about less training data?
1 conversation-side training
– character models might provide more of an advantage with less data?
– not so.
• GD EER = 22.5%
• adapted character EER = 30%
• adapted word EER = 20%
– maybe these character models pick up on the topic of that 1 conversation
– haven’t tried any other size training data
Outline
• Author identification
• Trying to beat GD’s result
• My next project
Key-lists for speaker recognition
• key-list n-grams picked up on phrasing (comma and period were valuable tokens)
– automatic transcripts don’t have punctuation but they do have pause and duration information
• use reg. exps. and duration info. to capture idiosyncratic speaker phrasing
• capture other speech information in key-lists? (energy, f0, etc.)
Acknowledgements
Thanks to:
Anand and Luciana at SRI for trying to help me replicate their results
Barbara for providing advice
Barry and Kofi for helping with computers and stuff
George