Bayesian Learning for Latent Semantic Analysis
Jen-Tzung Chien, Meng-Sun Wu and Chia-Sheng Wu
Presenter: Hsuan-Sheng Chiu
Speech Lab. NTNU
Reference
Chia-Sheng Wu, "Bayesian Latent Semantic Analysis for Text Categorization and Information Retrieval", 2005
Q. Huo and C.-H. Lee, "On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate", 1997
Outline
Introduction
PLSA
ML (Maximum Likelihood)
MAP (Maximum A Posteriori)
QB (Quasi-Bayes)
Experiments
Conclusions
Introduction
LSA vs. PLSA:
Linear algebra vs. probability
Semantic space vs. latent topics
Batch learning vs. incremental learning
PLSA
PLSA is a general machine learning technique which adopts the aspect model to represent co-occurrence data.
Topics (hidden variables): $Z = \{z_1, \ldots, z_K\}$
Corpus (document-word pairs): $Y = \{(d_i, w_j)\}$ with documents $d_1, \ldots, d_N$ and words $w_1, \ldots, w_M$
PLSA
Assume that $d_i$ and $w_j$ are conditionally independent given the associated topic $z_k$:
$$P(d_i, w_j \mid z_k) = P(d_i \mid z_k)\, P(w_j \mid z_k)$$
Joint probability:
$$\begin{aligned}
P(d_i, w_j) &= P(d_i)\, P(w_j \mid d_i) = P(d_i) \sum_{k=1}^{K} P(z_k, w_j \mid d_i) \\
&= P(d_i) \sum_{k=1}^{K} \frac{P(z_k)\, P(w_j, d_i \mid z_k)}{P(d_i)}
= P(d_i) \sum_{k=1}^{K} \frac{P(z_k)\, P(w_j \mid z_k)\, P(d_i \mid z_k)}{P(d_i)} \\
&= P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k)\, P(z_k \mid d_i)
\end{aligned}$$
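As a quick numerical sanity check of the decomposition $P(d_i, w_j) = P(d_i) \sum_k P(w_j \mid z_k)\, P(z_k \mid d_i)$ — a minimal sketch, not the paper's code; the dimensions and all parameter values below are toy assumptions:

```python
import numpy as np

# Hypothetical toy dimensions: N documents, M words, K topics.
N, M, K = 4, 6, 2
rng = np.random.default_rng(0)

def normalize(a, axis):
    return a / a.sum(axis=axis, keepdims=True)

P_w_given_z = normalize(rng.random((K, M)), axis=1)  # P(w_j | z_k)
P_z_given_d = normalize(rng.random((N, K)), axis=1)  # P(z_k | d_i)
P_d = np.full(N, 1.0 / N)                            # P(d_i), uniform here

# P(d_i, w_j) = P(d_i) * sum_k P(w_j | z_k) P(z_k | d_i)
P_dw = P_d[:, None] * (P_z_given_d @ P_w_given_z)

# Each row sums to P(d_i); the whole table sums to 1.
assert np.allclose(P_dw.sum(axis=1), P_d)
assert np.isclose(P_dw.sum(), 1.0)
```

Every row of `P_dw` sums to $P(d_i)$ and the full table sums to one, confirming the decomposition yields a valid joint distribution.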
ML PLSA
Log likelihood of $Y$:
$$\log P(Y \mid \theta) = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(d_i, w_j)$$
ML estimation:
$$\theta_{\mathrm{ML}} = \arg\max_{\theta}\, \log P(Y \mid \theta), \qquad \theta = \left\{ P(w_j \mid z_k),\; P(z_k \mid d_i) \right\}$$
ML PLSA
Maximization:
$$\begin{aligned}
\max_\theta \log P(Y \mid \theta) &= \max_\theta \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(d_i, w_j) \\
&= \max_\theta \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \left[ \log P(d_i) + \log P(w_j \mid d_i) \right] \\
&= \max_\theta \left[ \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(d_i) + \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(w_j \mid d_i) \right] \\
&\equiv \max_\theta \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(w_j \mid d_i)
\end{aligned}$$
since the $\log P(d_i)$ term does not depend on $\theta$.
ML PLSA
EM (Expectation-Maximization) algorithm with E-step and M-step.
Complete data: $P(w_j, z_k \mid d_i)$
Incomplete data: $P(w_j \mid d_i)$
$$P(w_j, z_k \mid d_i) = P(w_j \mid d_i)\, P(z_k \mid d_i, w_j)$$
ML PLSA
E-step — take the expectation of the log likelihood over the hidden topics:
$$\begin{aligned}
\sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j)\, E_{z \mid d_i, w_j}\!\left[ \log P(w_j \mid d_i) \right]
&= \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j)\, E_{z \mid d_i, w_j}\!\left[ \log P(w_j, z_k \mid d_i) - \log P(z_k \mid d_i, w_j) \right] \\
&= \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j)\, P(z_k \mid d_i, w_j) \log P(w_j, z_k \mid d_i) \\
&\quad - \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j)\, P(z_k \mid d_i, w_j) \log P(z_k \mid d_i, w_j)
\end{aligned}$$
with the new parameter set $\hat\theta = \{ \hat P(w_j \mid z_k),\, \hat P(z_k \mid d_i) \}$.
ML PLSA
Auxiliary function:
$$Q(\hat\theta \mid \theta) = E\!\left[ \log P(Z, Y \mid \hat\theta) \,\middle|\, Y, \theta \right] = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j)\, P(z_k \mid d_i, w_j) \log\!\left[ \hat P(w_j \mid z_k)\, \hat P(z_k \mid d_i) \right]$$
and
$$P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k)\, P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l)\, P(z_l \mid d_i)}$$
ML PLSA
M-step — introduce Lagrange multipliers $\rho_k$ and $\tau_i$ for the sum-to-one constraints:
$$Q^{\mathrm{ML}}_{P(w_j \mid z_k)} = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j)\, P(z_k \mid d_i, w_j) \log \hat P(w_j \mid z_k) + \sum_{k=1}^{K} \rho_k \left( 1 - \sum_{j=1}^{M} \hat P(w_j \mid z_k) \right)$$
$$Q^{\mathrm{ML}}_{P(z_k \mid d_i)} = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j)\, P(z_k \mid d_i, w_j) \log \hat P(z_k \mid d_i) + \sum_{i=1}^{N} \tau_i \left( 1 - \sum_{k=1}^{K} \hat P(z_k \mid d_i) \right)$$
ML PLSA
Differentiation — for an objective of the form
$$F(\mathbf{w}) = \sum_{j=1}^{N} y_j \log w_j + \lambda \left( 1 - \sum_{j=1}^{N} w_j \right), \qquad \frac{\partial F}{\partial w_j} = 0 \;\Rightarrow\; w_j = \frac{y_j}{\sum_{j'=1}^{N} y_{j'}}$$
New parameter estimates:
$$\hat P_{\mathrm{ML}}(w_j \mid z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j)\, P(z_k \mid d_i, w_j)}{\sum_{m=1}^{M} \sum_{i=1}^{N} n(d_i, w_m)\, P(z_k \mid d_i, w_m)}$$
$$\hat P_{\mathrm{ML}}(z_k \mid d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j)\, P(z_k \mid d_i, w_j)}{\sum_{l=1}^{K} \sum_{j=1}^{M} n(d_i, w_j)\, P(z_l \mid d_i, w_j)}$$
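The E-step posterior and the M-step re-estimates can be run end-to-end. A minimal NumPy sketch of ML PLSA on toy counts (dimensions, counts, and initialization are all illustrative assumptions, not from the paper):

```python
import numpy as np

# Toy term-document counts n(d_i, w_j): N=4 docs, M=6 words, K=2 topics.
rng = np.random.default_rng(1)
n = rng.integers(0, 5, size=(4, 6)).astype(float)

def normalize(a, axis):
    return a / a.sum(axis=axis, keepdims=True)

P_w_z = normalize(rng.random((2, 6)), axis=1)  # P(w_j | z_k)
P_z_d = normalize(rng.random((4, 2)), axis=1)  # P(z_k | d_i)

def loglik():
    # Incomplete-data objective: sum_ij n(d_i,w_j) log P(w_j | d_i)
    return float((n * np.log(P_z_d @ P_w_z)).sum())

ll_history = []
for _ in range(50):
    ll_history.append(loglik())
    # E-step: P(z_k | d_i, w_j) proportional to P(w_j | z_k) P(z_k | d_i)
    post = P_z_d[:, :, None] * P_w_z[None, :, :]       # shape (N, K, M)
    post = post / post.sum(axis=1, keepdims=True)
    # M-step: re-estimate from expected counts n(d,w) P(z|d,w)
    expected = n[:, None, :] * post                    # shape (N, K, M)
    P_w_z = normalize(expected.sum(axis=0), axis=1)    # new P(w_j | z_k)
    P_z_d = normalize(expected.sum(axis=2), axis=1)    # new P(z_k | d_i)
```

EM monotonically increases the incomplete-data log likelihood, which `ll_history` makes visible.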
MAP PLSA
Estimation by maximizing the posterior probability:
$$\theta_{\mathrm{MAP}} = \arg\max_\theta P(\theta \mid X) = \arg\max_\theta \left[ \log P(X \mid \theta) + \log g(\theta) \right]$$
Definition of the prior distribution — Dirichlet density:
$$f(x_1, \ldots, x_K; \alpha_1, \ldots, \alpha_K) \propto \prod_{i=1}^{K} x_i^{\alpha_i - 1}, \qquad x_i \geq 0, \quad \sum_{i=1}^{K} x_i = 1$$
Prior density, assuming $P(w_j \mid z_k)$ and $P(z_k \mid d_i)$ are independent:
$$g(\theta) \propto \prod_{k=1}^{K} \left[ \prod_{j=1}^{M} P(w_j \mid z_k)^{\alpha_{j,k} - 1} \right] \left[ \prod_{i=1}^{N} P(z_k \mid d_i)^{\beta_{i,k} - 1} \right]$$
(Kronecker delta: $\delta_{ij} = 1$ if $i = j$, $0$ otherwise.)
MAP PLSA
Consider the prior density:
$$\log g(\theta) \propto \sum_{k=1}^{K} \sum_{j=1}^{M} (\alpha_{j,k} - 1) \log P(w_j \mid z_k) + \sum_{i=1}^{N} \sum_{k=1}^{K} (\beta_{i,k} - 1) \log P(z_k \mid d_i)$$
Maximum a posteriori:
$$\max_\theta \left[ \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log P(w_j \mid d_i) + \log g(\theta) \right]$$
MAP PLSA
E-step (expectation):
$$\tilde R(\hat\theta \mid \theta) = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j)\, E_{z \mid d_i, w_j}\!\left[ \log P(w_j \mid d_i) \right] + \log g(\hat\theta)$$
Auxiliary function:
$$\begin{aligned}
\tilde R(\hat\theta \mid \theta) &= \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j)\, P(z_k \mid d_i, w_j) \log\!\left[ \hat P(w_j \mid z_k)\, \hat P(z_k \mid d_i) \right] \\
&\quad + \sum_{j=1}^{M} \sum_{k=1}^{K} (\alpha_{j,k} - 1) \log \hat P(w_j \mid z_k) + \sum_{i=1}^{N} \sum_{k=1}^{K} (\beta_{i,k} - 1) \log \hat P(z_k \mid d_i)
\end{aligned}$$
MAP PLSA
M-step — add Lagrange multipliers for the sum-to-one constraints:
$$\begin{aligned}
\tilde R(\hat\theta \mid \theta) &= \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j)\, P(z_k \mid d_i, w_j) \log\!\left[ \hat P(w_j \mid z_k)\, \hat P(z_k \mid d_i) \right] \\
&\quad + \sum_{j=1}^{M} \sum_{k=1}^{K} (\alpha_{j,k} - 1) \log \hat P(w_j \mid z_k) + \sum_{i=1}^{N} \sum_{k=1}^{K} (\beta_{i,k} - 1) \log \hat P(z_k \mid d_i) \\
&\quad + \sum_{k=1}^{K} \rho_k \left( 1 - \sum_{j=1}^{M} \hat P(w_j \mid z_k) \right) + \sum_{i=1}^{N} \tau_i \left( 1 - \sum_{k=1}^{K} \hat P(z_k \mid d_i) \right)
\end{aligned}$$
MAP PLSA
Auxiliary functions for the two parameter sets:
$$Q^{\mathrm{MAP}}_{P(w_j \mid z_k)} = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j)\, P(z_k \mid d_i, w_j) \log \hat P(w_j \mid z_k) + \sum_{j=1}^{M} \sum_{k=1}^{K} (\alpha_{j,k} - 1) \log \hat P(w_j \mid z_k) + \sum_{k=1}^{K} \rho_k \left( 1 - \sum_{j=1}^{M} \hat P(w_j \mid z_k) \right)$$
$$Q^{\mathrm{MAP}}_{P(z_k \mid d_i)} = \sum_{i=1}^{N} \sum_{j=1}^{M} \sum_{k=1}^{K} n(d_i, w_j)\, P(z_k \mid d_i, w_j) \log \hat P(z_k \mid d_i) + \sum_{i=1}^{N} \sum_{k=1}^{K} (\beta_{i,k} - 1) \log \hat P(z_k \mid d_i) + \sum_{i=1}^{N} \tau_i \left( 1 - \sum_{k=1}^{K} \hat P(z_k \mid d_i) \right)$$
MAP PLSA
Differentiation — new parameter estimates:
$$\hat P_{\mathrm{MAP}}(w_j \mid z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j)\, P(z_k \mid d_i, w_j) + \alpha_{j,k} - 1}{\sum_{m=1}^{M} \left[ \sum_{i=1}^{N} n(d_i, w_m)\, P(z_k \mid d_i, w_m) + \alpha_{m,k} - 1 \right]}$$
$$\hat P_{\mathrm{MAP}}(z_k \mid d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j)\, P(z_k \mid d_i, w_j) + \beta_{i,k} - 1}{n(d_i) + \sum_{l=1}^{K} (\beta_{i,l} - 1)}$$
where
$$n(d_i) = \sum_{k=1}^{K} \sum_{j=1}^{M} n(d_i, w_j)\, P(z_k \mid d_i, w_j) = \sum_{j=1}^{M} n(d_i, w_j)$$
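The MAP re-estimates differ from the ML ones only by the Dirichlet pseudo-counts $\alpha_{j,k}-1$ and $\beta_{i,k}-1$. A sketch of one MAP M-step under assumed hyperparameters (all values are toy; $\alpha, \beta > 1$ is assumed so the numerators stay positive):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, K = 4, 6, 2
n = rng.integers(1, 5, size=(N, M)).astype(float)   # n(d_i, w_j)
post = rng.random((N, K, M))                        # stand-in for P(z_k | d_i, w_j)
post /= post.sum(axis=1, keepdims=True)
alpha = np.full((M, K), 2.0)                        # Dirichlet hyperparams for P(w|z)
beta = np.full((N, K), 2.0)                         # Dirichlet hyperparams for P(z|d)

expected = n[:, None, :] * post                     # n(d,w) P(z|d,w), shape (N, K, M)

# P_MAP(w_j | z_k): expected counts plus (alpha_{j,k} - 1), normalized over words
num_wz = expected.sum(axis=0).T + (alpha - 1.0)     # shape (M, K)
P_w_z_map = (num_wz / num_wz.sum(axis=0, keepdims=True)).T

# P_MAP(z_k | d_i): expected counts plus (beta_{i,k} - 1), normalized over topics
num_zd = expected.sum(axis=2) + (beta - 1.0)        # shape (N, K)
P_z_d_map = num_zd / num_zd.sum(axis=1, keepdims=True)
```

With $\alpha = \beta = 1$ (a flat prior) this collapses exactly to the ML updates above.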
QB PLSA
An online information system needs to update its model continuously. Estimation by maximizing the posterior probability:
$$\theta^{(n)}_{\mathrm{QB}} = \arg\max_\theta P\!\left(\theta \mid X^{n}\right) = \arg\max_\theta P(X_n \mid \theta)\, P\!\left(\theta \mid X^{n-1}\right) \approx \arg\max_\theta P(X_n \mid \theta)\, g\!\left(\theta \mid \varphi^{(n-1)}\right)$$
The posterior density is approximated by the closest tractable prior density with hyperparameters $\varphi^{(n-1)} = \left\{ \alpha^{(n-1)}_{j,k},\, \beta^{(n-1)}_{i,k} \right\}$.
Compared with MAP PLSA, the key difference in QB PLSA is the updating of the hyperparameters:
$$\theta^{(n)}_{\mathrm{QB}} = \left\{ \hat P^{(n)}_{\mathrm{QB}}(w_j \mid z_k),\; \hat P^{(n)}_{\mathrm{QB}}(z_k \mid d_i) \right\}$$
QB PLSA
Conjugate prior: in Bayesian probability theory, a conjugate prior is a prior distribution with the property that the posterior distribution is the same type of distribution.
A closed-form solution
A reproducible prior/posterior pair for incremental learning
QB PLSA
Hyperparameter $\alpha$ — differentiating the log prior under the sum-to-one constraint,
$$\frac{\partial}{\partial \hat P(w_j \mid z_k)} \left[ \sum_{j=1}^{M} \sum_{k=1}^{K} (\alpha_{j,k} - 1) \log \hat P(w_j \mid z_k) + \sum_{k=1}^{K} \rho_k \left( 1 - \sum_{j=1}^{M} \hat P(w_j \mid z_k) \right) \right] = 0 \;\Rightarrow\; \hat P(w_j \mid z_k) = \frac{\alpha_{j,k} - 1}{\sum_{m=1}^{M} (\alpha_{m,k} - 1)}$$
Matching this Dirichlet mode against the estimate from the new data,
$$\hat P(w_j \mid z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j)\, P(z_k \mid d_i, w_j) + \alpha_{j,k} - 1}{\sum_{m=1}^{M} \left[ \sum_{i=1}^{N} n(d_i, w_m)\, P(z_k \mid d_i, w_m) + \alpha_{m,k} - 1 \right]}$$
yields the hyperparameter update
$$\alpha^{(n)}_{j,k} = \sum_{i=1}^{N} n\!\left(d^{(n)}_i, w^{(n)}_j\right) P\!\left(z_k \mid d^{(n)}_i, w^{(n)}_j\right) + \alpha^{(n-1)}_{j,k}$$
QB PLSA
After careful rearrangement, the exponential of the posterior expectation function can be expressed as
$$\exp\!\left[ \tilde R\!\left( \hat\theta \mid \theta^{(n)} \right) \right] \propto \prod_{k=1}^{K} \left[ \prod_{j=1}^{M} \hat P(w_j \mid z_k)^{\alpha^{(n)}_{j,k} - 1} \right] \left[ \prod_{i=1}^{N} \hat P(z_k \mid d_i)^{\beta^{(n)}_{i,k} - 1} \right]$$
A reproducible prior/posterior pair is generated to build the updating mechanism of the hyperparameters:
$$\alpha^{(n)}_{j,k} = \sum_{i=1}^{N} n\!\left(d^{(n)}_i, w^{(n)}_j\right) P\!\left(z_k \mid d^{(n)}_i, w^{(n)}_j\right) + \alpha^{(n-1)}_{j,k}$$
$$\beta^{(n)}_{i,k} = \sum_{j=1}^{M} n\!\left(d^{(n)}_i, w^{(n)}_j\right) P\!\left(z_k \mid d^{(n)}_i, w^{(n)}_j\right) + \beta^{(n-1)}_{i,k}$$
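The hyperparameter updates amount to "expected counts from the new batch plus the old hyperparameters", which is what makes the prior/posterior pair reproducible. A toy sketch of one QB update (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, K = 3, 5, 2
alpha = np.full((M, K), 2.0)     # alpha^{(n-1)}_{j,k}
beta = np.full((N, K), 2.0)      # beta^{(n-1)}_{i,k}

n_batch = rng.integers(0, 4, size=(N, M)).astype(float)  # counts in batch n
post = rng.random((N, K, M))                             # stand-in for P(z_k | d_i, w_j)
post /= post.sum(axis=1, keepdims=True)

expected = n_batch[:, None, :] * post        # n(d,w) P(z|d,w), shape (N, K, M)
alpha_new = expected.sum(axis=0).T + alpha   # alpha^{(n)} = E-counts + alpha^{(n-1)}
beta_new = expected.sum(axis=2) + beta       # beta^{(n)}  = E-counts + beta^{(n-1)}
```

Feeding `alpha_new` and `beta_new` in as the prior for batch $n+1$ repeats the cycle, so the prior never leaves the Dirichlet family.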
Initial Hyperparameters
An open issue in Bayesian learning.
If the initial prior knowledge is too strong, or after a large amount of adaptation data has been processed incrementally, new adaptation data usually have only a small impact on parameter updating in incremental training.
$$\alpha^{(0)}_{j,k} = 1 + \sum_{i=1}^{N} P(z_k \mid d_i, w_j), \qquad \beta^{(0)}_{i,k} = 1 + \sum_{j=1}^{M} P(z_k \mid d_i, w_j)$$
Experiments
MED Corpus:
1033 medical abstracts with 30 queries
7014 unique terms
433 abstracts for ML training
600 abstracts for MAP or QB training
Query subset for testing
K = 8
Reuters-21578:
4270 documents for training
2925 for QB learning
2790 documents for testing
13353 unique words
10 categories
Experiments
(Results presented as figures in the original slides.)
Conclusions
This paper presented an adaptive text modeling and classification approach for PLSA-based information systems.
Future work:
Extension of PLSA to bigram or trigram models
Application to spoken document classification and retrieval
Discriminative Maximum Entropy Language Model for Speech Recognition
Chuang-Hua Chueh, To-Chang Chien and Jen-Tzung Chien
Presenter: Hsuan-Sheng Chiu
Reference
R. Rosenfeld, S. F. Chen and X. Zhu, "Whole-sentence exponential language models: a vehicle for linguistic-statistical integration", 2001
W.-H. Tsai, "An Initial Study on Language Model Estimation and Adaptation Techniques for Mandarin Large Vocabulary Continuous Speech Recognition", 2005
Outline
Introduction
Whole-sentence exponential model
Discriminative ME language model
Experiment
Conclusions
Introduction
Language models: statistical n-gram model, latent semantic language model, structured language model.
Based on the maximum entropy principle, different features can be integrated to establish the optimal probability distribution.
Whole-Sentence Exponential Model
Traditional method:
$$p(s) = p(w_1 w_2 \cdots w_n) = \prod_{i=1}^{n} p(w_i \mid w_1 \cdots w_{i-1})$$
Exponential form:
$$p(s) = \frac{1}{Z}\, p_0(s) \exp\!\left[ \sum_i \lambda_i f_i(s) \right]$$
Usage: when used for speech recognition, the model is not suitable for the first pass of the recognizer and should be used to re-score N-best lists.
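A tiny illustration of the exponential form over an enumerable sentence set — the sentences, features, weights, and baseline $p_0$ are all hypothetical toy choices:

```python
import math

# Toy whole-sentence exponential model over three enumerable sentences.
sentences = ["a b", "a b c", "b a"]
features = [lambda s: float(len(s.split())),        # f_1: sentence length
            lambda s: float(s.startswith("a"))]     # f_2: starts with "a"
lam = [0.5, 1.0]                                    # feature weights (assumed)
p0 = {s: 1.0 / len(sentences) for s in sentences}   # baseline p_0(s), uniform

def score(s):
    return p0[s] * math.exp(sum(l * f(s) for l, f in zip(lam, features)))

Z = sum(score(s) for s in sentences)                # partition function
p = {s: score(s) / Z for s in sentences}
assert abs(sum(p.values()) - 1.0) < 1e-12
```

In practice the sentence space is not enumerable, which is exactly why the model is used for N-best re-scoring rather than first-pass decoding.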
Whole-Sentence ME Language Model
Expectation of a feature function:
Empirical:
$$\tilde p\!\left(f^{L}_i\right) = \sum_s \tilde p(s)\, f^{L}_i(s) \approx \frac{1}{R} \sum_{r=1}^{R} f^{L}_i(s_r)$$
Actual:
$$p\!\left(f^{L}_i\right) = \sum_s p(s)\, f^{L}_i(s)$$
Constraint:
$$p\!\left(f^{L}_i\right) = \tilde p\!\left(f^{L}_i\right), \qquad i = 1, \ldots, F$$
Whole-Sentence ME Language Model
To solve the constrained optimization problem, maximize the entropy $H(p)$ subject to the feature constraints and normalization:
$$\Lambda(p, \lambda) = H(p) + \sum_{i=1}^{F} \lambda_i \left[ p\!\left(f^{L}_i\right) - \tilde p\!\left(f^{L}_i\right) \right] + \mu \left[ \sum_s p(s) - 1 \right]$$
Setting $\partial \Lambda / \partial p(s) = -\log p(s) - 1 + \sum_{i=1}^{F} \lambda_i f^{L}_i(s) + \mu = 0$ gives
$$p(s) = \exp\!\left[ \sum_{i=1}^{F} \lambda_i f^{L}_i(s) \right] \exp(\mu - 1)$$
and imposing normalization yields
$$p_{\mathrm{ME}}(s) = \frac{\exp\!\left[ \sum_{i=1}^{F} \lambda_i f^{L}_i(s) \right]}{\sum_{s'} \exp\!\left[ \sum_{i=1}^{F} \lambda_i f^{L}_i(s') \right]}$$
GIS Algorithm
Input: feature functions $f^{L}_1, \ldots, f^{L}_F$ and empirical distribution $\tilde p$
Output: optimal Lagrange multipliers $\hat\lambda$
1. Initialize $\lambda^{(0)}_i = 0$ for all $i = 1, \ldots, F$.
2. For each $i = 1, \ldots, F$, update
$$\lambda^{(t+1)}_i = \lambda^{(t)}_i + \frac{1}{C} \log \frac{\tilde p\!\left(f^{L}_i\right)}{p^{(t)}\!\left(f^{L}_i\right)}, \qquad p^{(t)}\!\left(f^{L}_i\right) = \sum_s p^{(t)}(s)\, f^{L}_i(s)$$
where $C$ is the GIS constant bounding the per-sentence feature sum.
3. Go to step 2 if $\lambda$ has not converged.
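A minimal GIS sketch on a four-event toy space. GIS assumes the feature values of each event sum to a constant $C$, so a slack feature is added as padding; the features, $C$, and the empirical distribution are illustrative assumptions, not from the slides:

```python
import math

events = [0, 1, 2, 3]
p_emp = [0.1, 0.2, 0.3, 0.4]                 # empirical distribution p~
feats = [lambda s: 1.0 if s >= 2 else 0.0,   # two binary toy features
         lambda s: 1.0 if s % 2 == 1 else 0.0]
C = 2.0
# Slack feature so that sum_i f_i(s) = C for every event, as GIS requires.
feats = feats + [lambda s: C - sum(f(s) for f in feats[:2])]
lam = [0.0] * len(feats)                     # step 1: initialize lambda_i = 0

def model():
    w = [math.exp(sum(l * f(s) for l, f in zip(lam, feats))) for s in events]
    Z = sum(w)
    return [x / Z for x in w]

for _ in range(200):                         # steps 2-3: iterate until converged
    p = model()
    for i, f in enumerate(feats):
        emp = sum(pe * f(s) for pe, s in zip(p_emp, events))
        mod = sum(pm * f(s) for pm, s in zip(p, events))
        lam[i] += (1.0 / C) * math.log(emp / mod)
```

After the updates, the model's feature expectations match the empirical ones, which is the fixed point the constraints define.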
Discriminative ME Language Model
In general, ME can be considered a maximum likelihood model using a log-linear distribution.
Proposal: a discriminative language model based on the whole-sentence ME model (DME).
Discriminative ME Language Model
Acoustic feature for ME estimation — sentence-level log-likelihood ratio of competing and target sentences:
$$f^{X}_{A}(s) = \begin{cases} \log \dfrac{p(X \mid s)}{p(X \mid s_X)}, & s \neq s_X \\ 0, & \text{otherwise} \end{cases}$$
where $s_X$ is the target sentence and $s$ a competing sentence of utterance $X$.
Feature weight parameter:
$$\lambda^{X}_{A} = \begin{cases} 1, & X \in A \\ 0, & \text{otherwise} \end{cases}$$
Namely, the feature parameter is activated (set to one) for those speech signals $X$ observed in the training database $A$.
Discriminative ME Language Model
New estimation:
$$p_{\mathrm{LA}}(s) = \frac{\exp\!\left[ \sum_{i=1}^{F} \lambda^{L}_i f^{L}_i(s) + \lambda^{X}_{A} f^{X}_{A}(s) \right]}{\sum_{s'} \exp\!\left[ \sum_{i=1}^{F} \lambda^{L}_i f^{L}_i(s') + \lambda^{X}_{A} f^{X}_{A}(s') \right]}$$
Upgrading to discriminative linguistic parameters:
$$p_{\mathrm{DME}}(s) = \frac{\exp\!\left[ \sum_{i=1}^{F} \lambda^{DL}_i f^{L}_i(s) \right]}{\sum_{s'} \exp\!\left[ \sum_{i=1}^{F} \lambda^{DL}_i f^{L}_i(s') \right]}$$
Discriminative ME Language Model
(Content presented as a figure in the original slide.)
Experiment
Corpus: TCC300
32 mixtures
12 Mel-frequency cepstral coefficients
1 log-energy, plus first derivatives
4200 sentences for training, 450 for testing
Corpus: Academia Sinica CKIP balanced corpus
Five million words
Vocabulary of 32909 words
Experiment
(Results presented as figures in the original slides.)
Conclusions
A new ME language model integrating linguistic and acoustic features for speech recognition.
The derived ME language model is inherently discriminative.
The DME model involves a constrained optimization procedure and is powerful for knowledge integration.
Relation between DME and MMI
MMI criterion:
$$\lambda_{\mathrm{MMI}} = \arg\max_\lambda \log P(S \mid X) = \arg\max_\lambda \log \frac{p(X \mid S)\, p(S)}{\sum_{S'} p(X \mid S')\, p(S')}$$
Modified MMI criterion (expressing the ME model as an ML model over the $R$ training utterances):
$$\tilde F_{\mathrm{MMI}} = \sum_{r=1}^{R} \log \frac{p(X_r \mid s_r)\, p(s_r)}{\sum_{s} p(X_r \mid s)\, p(s)}$$
Relation between DME and MMI
The optimal parameter:
$$\hat\lambda_{\mathrm{DME}} = \arg\max_\lambda \log \prod_{r=1}^{R} p_{\mathrm{LA}}(s_r) = \arg\max_\lambda \log \prod_{r=1}^{R} \frac{\exp\!\left[ \sum_{i=1}^{F} \lambda^{LA}_i f^{L}_i(s_r) + \lambda^{X_r}_{A} f^{X_r}_{A}(s_r) \right]}{\sum_{s'} \exp\!\left[ \sum_{i=1}^{F} \lambda^{LA}_i f^{L}_i(s') + \lambda^{X_r}_{A} f^{X_r}_{A}(s') \right]}$$
Relation between DME and MMI
$$\begin{aligned}
\log \prod_{r=1}^{R} p_{\mathrm{LA}}(s_r)
&= \sum_{r=1}^{R} \log \frac{\exp\!\left[ \sum_{i=1}^{F} \lambda^{L}_i f^{L}_i(s_r) \right] p(X_r \mid s_r)}{\sum_{s'} \exp\!\left[ \sum_{i=1}^{F} \lambda^{L}_i f^{L}_i(s') \right] p(X_r \mid s')} \\
&= \sum_{r=1}^{R} \log \frac{p(X_r \mid s_r)\, p(s_r)}{\sum_{s} p(X_r \mid s)\, p(s)} = \tilde F_{\mathrm{MMI}}
\end{aligned}$$
with $p(s) \propto \exp\!\left[ \sum_{i=1}^{F} \lambda^{L}_i f^{L}_i(s) \right]$ playing the role of the language model. Hence DME estimation is equivalent to the modified MMI criterion.