deim2012 issei sato

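Assistant Professor, Information Technology Center, The University of Tokyo

Issei Sato (@issei_sato)

Statistical Machine Learning, Topic Modeling, and Bayesian Nonparametrics

The Frontier of Probabilistic Latent Variable Models

DEIM2012 Tutorial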

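Assumptions of this talk

• The audience has at least heard the terms "topic model" and "LDA"

• The audience is interested in applications to their own fields

• Bayesian nonparametrics is omitted for lack of time (apologies)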

Outline of this talk

• Statistical machine learning

• Latent Dirichlet Allocation

• Applications of topic models

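Overview of topic models

• What is a topic model?
– A probabilistic generative model of documents
– (Basically) it models the co-occurrence of words
– (What it actually does is) cluster words
– Yet, for some reason, it shows up in many different fields

• Where can it be applied?
– Wherever there is data
– The data need not be symbolic (Bag of Words representation)
– It can become a core technique for data analysis

http://www.cs.princeton.edu/~blei/kdd-tutorial.pdf

KDD2011 Topic model tutorial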




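Learning = abstraction

Problem: what is the root x* of $2x^2 + 3x + 1 = 0$?

What is the root x* of $ax^2 + bx + c = 0$?

$$x^* = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

Now it can be solved!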

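Statistical machine learning

[Figure: data and model space]

• Abstraction through mathematical models

• Goal: solve unseen problems by abstracting data and human experience into mathematical models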

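[Figure: a document represented by its word frequency distribution over words w1, w2, w3, in a word space with axes W1 = "statistics", W2 = "grammar", W3 = "learning"]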

Topic model [1/2]

[Figure: a document in the word space with axes W1 = "statistics", W2 = "grammar", W3 = "learning"]

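Topic model [2/2]

[Figure: topic1 and topic2 in the word space with axes W1 = "statistics", W2 = "grammar", W3 = "learning"; a document is represented by its topic distribution over Topic 1 and Topic 2]

What is a topic? A basis of a sub-simplex of the word simplex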

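Latent Dirichlet Allocation (LDA) [Blei+,2001]

[Figure: data and model space, with the topic model figure and the LDA graphical model (variables $w_{j,i}$, $z_{j,i}$, $\theta_j$, $\phi_t$) placed in the model space]

• Research on learning models

• Research on learning algorithms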

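KL-divergence minimization

$p^*(x)$: true distribution, $q(x|\theta)$: model

$$\mathrm{KL}[p^*(x)\,\|\,q(x|\theta)] = \int p^*(x)\log p^*(x)\,dx - \int p^*(x)\log q(x|\theta)\,dx$$

$$q(x|\theta^*) = \operatorname*{argmin}_{q(x|\theta)} \mathrm{KL}[p^*(x)\,\|\,q(x|\theta)]$$

Approximation by the empirical distribution = maximum likelihood estimation:

$$q(x|\theta^*) = \operatorname*{argmin}_{q(x|\theta)} \left(-\frac{1}{N}\sum_{i=1}^{N}\log q(x_i|\theta)\right)$$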
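A minimal sketch of the maximum likelihood approximation on this slide, assuming (for illustration only) that the model q(x|θ) is a categorical distribution over three symbols. In that case the minimizer of the average negative log-likelihood is the vector of relative frequencies. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def neg_log_lik(data, theta):
    """Average negative log-likelihood: -(1/N) * sum_i log q(x_i | theta)."""
    return -sum(math.log(theta[x]) for x in data) / len(data)

def mle_categorical(data, num_symbols):
    """Relative frequencies, which minimize neg_log_lik for a categorical model."""
    counts = Counter(data)
    return [counts[k] / len(data) for k in range(num_symbols)]

data = [0, 0, 1, 2, 0, 1, 0, 2, 0, 1]      # toy observations x_1, ..., x_N
theta_star = mle_categorical(data, 3)       # relative frequencies of 0, 1, 2
other = [1 / 3, 1 / 3, 1 / 3]               # a competing parameter value
print(theta_star)
print(neg_log_lik(data, theta_star), neg_log_lik(data, other))
```

The second printed line compares the objective at the estimate with the objective at a uniform parameter; the estimate should never be worse.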

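Bayesian estimation

• Point estimation: represent $p^*(x)$ by $q(x|\theta^*)$ with a single parameter $\theta^*$

• Bayesian estimation: average over the posterior distribution $p(\theta|D)$

$$p(x|D) = \int p(x|\theta)\,p(\theta|D)\,d\theta$$

Posterior distribution, with the KL term approximated by the empirical distribution of the N observations:

$$p(\theta|D) = \frac{1}{Z}\,p(\theta)\,e^{-N\,\mathrm{KL}[p^*(x)\,\|\,q(x|\theta)]}$$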

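Bayesian estimation

$$p(x|D) = \int p(x|\theta)\,p(\theta|D)\,d\theta$$

MCMC:

$$p(x|D) \approx \frac{1}{S}\sum_{s=1}^{S} p(x|\theta^{(s)}), \qquad \theta^{(s)} \sim p(\theta|D)$$

Variational Bayes:

$$q^{VB}(\theta) = \operatorname*{argmin}_{q(\theta)} \mathrm{KL}[q(\theta)\,\|\,p(\theta|D)]$$

$$p(x|D) \approx \int p(x|\theta)\,q^{VB}(\theta)\,d\theta$$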
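A small sketch of the sample-average approximation of the predictive distribution, assuming a Beta-Bernoulli model chosen only because its posterior is known in closed form. Here the samples θ^(s) are drawn directly from the posterior rather than by a Markov chain, so only the averaging step is illustrated. The function names and the prior parameters `a`, `b` are hypothetical.

```python
import random

def mc_predictive(heads, tails, a=1.0, b=1.0, num_samples=20000, seed=0):
    """Approximate p(x=1 | D) by averaging p(x=1 | theta_s) = theta_s
    over samples theta_s ~ p(theta | D) = Beta(a + heads, b + tails)."""
    rng = random.Random(seed)
    samples = [rng.betavariate(a + heads, b + tails) for _ in range(num_samples)]
    return sum(samples) / num_samples

def exact_predictive(heads, tails, a=1.0, b=1.0):
    """Closed-form posterior predictive of the Beta-Bernoulli model."""
    return (a + heads) / (a + b + heads + tails)

approx = mc_predictive(heads=7, tails=3)
exact = exact_predictive(heads=7, tails=3)   # (1 + 7) / (2 + 10)
print(approx, exact)
```

With enough samples the two numbers should agree closely, which is the sense in which the sum over samples replaces the integral over θ.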


[Figure: data and model space]

Generalization performance

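Generalization error and perplexity [1/2]

$$E_D\big[\mathrm{KL}[p^*(x)\,\|\,p(x|D)]\big] = E_D\left[\int p^*(x)\log\frac{p^*(x)}{p(x|D)}\,dx\right]$$

Up to a term that does not depend on the model, this is estimated from held-out data as

$$-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i^{test}|D^{train})$$

A model with a small generalization error = a good model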

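Generalization error and perplexity [2/2]

Perplexity: an evaluation measure for topic models

$$\mathrm{ppl} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i^{test}|D^{train})\right)$$

It represents the branching factor (lower is better)

e.g., with 1000 choices in total, ppl = 100 means the number of choices has been narrowed down to 1/10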
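The branching-factor reading of perplexity can be checked with a few lines. This is a sketch that assumes the predictive probabilities of the test items are already available as a list; the function name `perplexity` is hypothetical.

```python
import math

def perplexity(test_probs):
    """exp of the negative mean log predictive probability of the test items."""
    n = len(test_probs)
    return math.exp(-sum(math.log(p) for p in test_probs) / n)

uniform = perplexity([1 / 1000] * 50)   # a model that guesses among 1000 choices
better = perplexity([1 / 100] * 50)     # a model that narrows them down to 100
print(uniform, better)
```

A model that spreads its probability evenly over k choices has perplexity k, which is why a lower value means fewer effective choices.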


[Figure: data and model space]

• Research on learning models

• Research on learning algorithms

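Latent variables and graphical models

[Figure: data points grouped into Class 1 to Class 4; for example, data point $x_1$ has latent variable $z_1 = 2$]

[Figure: graphical model with latent variable $z_i$ and observed variable $x_i$ inside a plate over the N data points]

Generative process: $x_i \sim p(x_i \mid z_i, \theta_k)$, where $\theta_k$ is the parameter of class k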

Latent Dirichlet Allocation (LDA)

Blei+, JMLR2003

Goal: a document model with multiple topics per document

[Figure: document j]

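Three important points in LDA

• Frequency vector (Bag of Words): document j is represented as a vector $w_j$ in the V-dimensional word space

• Topic distribution $\theta_j$ (over topics 1, 2, ...) and word distribution $\phi_t$ (over words 1, 2, 3, ...)

• Dirichlet distribution: a distribution over the K-dimensional simplex (below, K = T)

$$\theta \sim \mathrm{Dir}(\alpha), \qquad \mathrm{Dir}(\theta;\alpha) = \frac{\Gamma\left(\sum_{t=1}^{T}\alpha_t\right)}{\prod_{t=1}^{T}\Gamma(\alpha_t)}\prod_{t=1}^{T}\theta_t^{\alpha_t-1}$$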
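As a sketch, the Dirichlet density on this slide can be evaluated in log space with the standard library. The function name `dirichlet_log_pdf` and the example vectors are hypothetical; the two checks illustrate that α = (1, 1, 1) gives a flat density and that small α puts more density near the corners of the simplex.

```python
import math

def dirichlet_log_pdf(theta, alpha):
    """log Dir(theta; alpha) for theta on the simplex, using log-gamma for stability."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(t) for a, t in zip(alpha, theta))

# With alpha = (1, 1, 1) the density is constant: Gamma(3) / Gamma(1)^3 = 2.
flat = math.exp(dirichlet_log_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))

# With small alpha, a near-corner (sparse) vector has higher density than the center.
corner = dirichlet_log_pdf([0.98, 0.01, 0.01], [0.1, 0.1, 0.1])
center = dirichlet_log_pdf([1 / 3, 1 / 3, 1 / 3], [0.1, 0.1, 0.1])
print(flat, corner > center)
```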

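• $\theta_{j,t}$: document-topic distribution, the probability that topic t appears in document j

$$\theta_j \sim \mathrm{Dir}(\alpha), \quad j = 1,\dots,J$$

• $\phi_{t,v}$: topic-word distribution, the probability that word v appears in topic t

$$\phi_t \sim \mathrm{Dir}(\beta), \quad t = 1,\dots,T$$

• Probability that word v appears in document j:

$$\sum_{t=1}^{T}\theta_{j,t}\,\phi_{t,v}$$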
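The probability of word v in document j is a topic-weighted mixture of the topic-word distributions. A minimal sketch, with hypothetical values for the document-topic distribution `theta_j` and the topic-word distributions `phi` (T = 2 topics, V = 3 words):

```python
def doc_word_probs(theta_j, phi):
    """p(v | j) = sum_t theta_j[t] * phi[t][v] for every word v."""
    num_words = len(phi[0])
    return [sum(theta_j[t] * phi[t][v] for t in range(len(theta_j)))
            for v in range(num_words)]

theta_j = [0.75, 0.25]          # hypothetical document-topic distribution
phi = [[0.5, 0.5, 0.0],         # hypothetical word distribution of topic 0
       [0.0, 0.5, 0.5]]         # hypothetical word distribution of topic 1
p = doc_word_probs(theta_j, phi)
print(p, sum(p))
```

Because each row of `phi` and `theta_j` itself sum to one, the resulting word probabilities also sum to one.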

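Graphical model of LDA

[Figure: graphical model with nodes $\theta_j$, $z_{j,i}$, $w_{j,i}$, $\phi_t$ and plates over T topics, J documents, and the $n_j$ words of document j; $z_{j,i}$ is the latent variable]

Generative process:

$$\theta_j \sim \mathrm{Dir}(\alpha)$$

$$\phi_t \sim \mathrm{Dir}(\beta)$$

$$z_{j,i}\,(= t) \sim \theta_j$$

$$w_{j,i} \sim \phi_t$$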
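The generative process can be sketched as a sampler. This is an illustration under assumptions, not the tutorial's code: the priors are symmetric, every document has the same length `n_j`, and the function names and default values are hypothetical.

```python
import random

def sample_dirichlet(rng, alpha):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def generate_corpus(J, T, V, n_j, alpha=0.5, beta=0.1, seed=0):
    """Sample J documents of n_j words each from the LDA generative process."""
    rng = random.Random(seed)
    phi = [sample_dirichlet(rng, [beta] * V) for _ in range(T)]   # phi_t ~ Dir(beta)
    docs, topics = [], []
    for _ in range(J):
        theta_j = sample_dirichlet(rng, [alpha] * T)              # theta_j ~ Dir(alpha)
        z_j = rng.choices(range(T), weights=theta_j, k=n_j)       # z_{j,i} ~ theta_j
        w_j = [rng.choices(range(V), weights=phi[t])[0]           # w_{j,i} ~ phi_t, t = z_{j,i}
               for t in z_j]
        docs.append(w_j)
        topics.append(z_j)
    return docs, topics, phi

docs, topics, phi = generate_corpus(J=3, T=2, V=6, n_j=8)
print(docs[0], topics[0])
```

Each word id in `docs[j]` is paired with the topic id in `topics[j]` that generated it, which is exactly the per-word latent variable of the model.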

Latent variable assumption: each word has its own topic

The apple forms a tree that is small and deciduous, reaching 3 to 12 metres (9.8 to 39 ft) tall, with a broad, often densely twiggy crown.

Apple is an American multinational corporation that designs and sells consumer electronics, computer software, and personal computers.

Engineering advantage: a projection from sparse to sparse

Apple is an American multinational corporation that designs and sells consumer electronics, computer software, and personal computers.

5 1 1 4 42 1 10 1 2 2 5 5 5 1 5 5

Bag of words

Bag of Topics
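A small sketch of the projection from a Bag of Words to a Bag of Topics. The topic id attached to each token below is hypothetical and is not taken from a fitted model; it only shows that both representations are sparse count vectors.

```python
from collections import Counter

words = "apple is an american multinational corporation".split()
# Hypothetical topic id for each token (illustration only).
topic_of_token = [5, 1, 1, 4, 4, 2]

bag_of_words = Counter(words)             # sparse counts over the vocabulary
bag_of_topics = Counter(topic_of_token)   # sparse counts over the topics
print(bag_of_words)
print(bag_of_topics)
```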


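[Figure: comparison on news articles: LDA, pLSI, Mixture of Unigrams, Unigram model]

[Figure: comparison on movie viewing data: LDA, pLSI, Mixture of Unigrams (MoU)]

[Figure: SVM binary classification]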

Rethinking LDA: Why Priors Matter

Wallach +, NIPS2009

Goal: an analysis of the parameters of the Dirichlet distributions

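Dirichlet parameter settings

[Figure: LDA graphical model with nodes $\theta_j$, $z_{j,i}$, $w_{j,i}$, $\phi_t$ and plates over T topics and J documents]

$$\theta_j \sim \mathrm{Dir}(\alpha), \qquad \phi_t \sim \mathrm{Dir}(\beta)$$

• Asymmetric α: $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_T)$

• Symmetric α: $\alpha = (\alpha_0/T, \alpha_0/T, \dots, \alpha_0/T)$

• Asymmetric β: $\beta = (\beta_1, \beta_2, \dots, \beta_V)$

• Symmetric β: $\beta = (\beta_0/V, \beta_0/V, \dots, \beta_0/V)$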
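To make the two settings concrete, the sketch below builds a symmetric α and compares its prior mean with that of an asymmetric α of the same total. The asymmetric values are hypothetical, and the helper names are not from the tutorial.

```python
def symmetric_alpha(alpha0, T):
    """Every topic gets the same pseudo-count alpha0 / T."""
    return [alpha0 / T] * T

def prior_mean(alpha):
    """Mean of Dir(alpha): E[theta_t] = alpha_t / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]

sym = symmetric_alpha(alpha0=2.0, T=4)
asym = [1.25, 0.25, 0.25, 0.25]     # hypothetical asymmetric setting, same total
print(prior_mean(sym))              # every topic equally likely a priori
print(prior_mean(asym))             # topic 0 favoured a priori
```

An asymmetric α lets some topics be expected in most documents before any data is seen, which a symmetric α cannot express.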

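[Figure: comparison of the four settings A-α S-β, A-α A-β, S-α S-β, and S-α A-β (A = asymmetric, S = symmetric)]

An asymmetric α works better

With an asymmetric α, stop words are grouped together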

LDA meets Submodular

El-Arini+, KDD2009: Turning Down the Noise in the Blogosphere

Yue+, NIPS2011: Linear Submodular Bandits and their Application to Diversified Retrieval

Goal: extracting a diverse set of elements

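Submodular function

• A, B: sets; R: the set of real numbers

• Set function F: maps a set A to a real value F(A)

• Definition (diminishing returns):

F(A∪{a}) - F(A) ≧ F(B∪{a}) - F(B) for all a and sets A⊆B

• Well suited to expressing the coverage of information
– The less information you already have, the stronger the effect of one more piece of information

• Greedy algorithm for submodular function maximization (for monotone F under a size constraint):

F(A_Greedy) ≧ (1-1/e) F(A*) ≒ 0.63 F(A*)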

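LDA-based submodular function [1/2]

$$F(D;t) = 1 - \prod_{d \in D}(1 - \theta_{d,t})$$

• $\prod_{d \in D}(1 - \theta_{d,t})$: the probability that topic t never appears in the document set D

• $1 - \prod_{d \in D}(1 - \theta_{d,t})$: the probability that topic t appears at least once in the document set D

• $F(D;t)$ is the coverage of topic t by the document set D

• It is a submodular function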
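The diminishing-returns property of the topic coverage function can be checked numerically. In this sketch the document-topic probabilities `theta[d][t]` are hypothetical values, and `topic_coverage` is an illustrative name.

```python
def topic_coverage(theta, docs, t):
    """F(D; t) = 1 - prod_{d in D} (1 - theta[d][t])."""
    miss = 1.0
    for d in docs:
        miss *= 1.0 - theta[d][t]
    return 1.0 - miss

# Hypothetical document-topic distributions theta[d][t] for three documents.
theta = {"d1": [0.5, 0.5], "d2": [0.5, 0.5], "d3": [0.25, 0.75]}

small, large = {"d1"}, {"d1", "d2"}     # small is a subset of large
gain_small = topic_coverage(theta, small | {"d3"}, 0) - topic_coverage(theta, small, 0)
gain_large = topic_coverage(theta, large | {"d3"}, 0) - topic_coverage(theta, large, 0)
print(gain_small, gain_large)           # 0.125 0.0625
```

Adding the same document d3 helps the smaller set more than the larger one, which is the inequality that defines submodularity.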

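LDA-based submodular function [2/2]

Define a set function as a weighted sum over topics:

$$F(D) = \sum_{t} w_t\,F(D;t), \qquad w_t \in \mathbb{R},\ w_t \ge 0$$

A linear sum of submodular functions with non-negative weights ⇒ a submodular function

Find the set D that maximizes F(D) (s.t. |D| ≦ K)
→ find the set D that maximizes topic coverage
→ a diverse set D can be extracted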
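A minimal sketch of greedy maximization of the weighted topic coverage under the constraint |D| ≦ K. The document-topic distributions and the weights are hypothetical, and `greedy_select` is an illustrative name rather than an algorithm taken from the cited papers.

```python
def coverage(theta, docs, weights):
    """F(D) = sum_t w_t * (1 - prod_{d in D} (1 - theta[d][t]))."""
    total = 0.0
    for t, w in enumerate(weights):
        miss = 1.0
        for d in docs:
            miss *= 1.0 - theta[d][t]
        total += w * (1.0 - miss)
    return total

def greedy_select(theta, weights, K):
    """Repeatedly add the document with the largest marginal gain until |D| = K."""
    chosen = []
    for _ in range(K):
        rest = [d for d in theta if d not in chosen]
        best = max(rest, key=lambda d: coverage(theta, chosen + [d], weights))
        chosen.append(best)
    return chosen

# Hypothetical topic distributions: "a" leans to topic 0, "c" leans to topic 1.
theta = {"a": [0.75, 0.25], "b": [0.5, 0.5], "c": [0.25, 0.75]}
picked = greedy_select(theta, weights=[2.0, 1.0], K=2)
print(picked)
```

After "a" is chosen, the next pick is the document whose topics are least covered so far, so the selected set ends up diverse.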

Collaborative Topic modeling

Wang+, KDD2011 Best paper: Collaborative Topic Modeling for Recommending Scientific Articles

Goal: document recommendation that takes topic distributions into account

http://www.cs.princeton.edu/~chongw/citeulike/

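Collaborative Filtering

• Matrix factorization approach

Rating matrix (axes: Users and Products; ? = unobserved):

1 1 ? 3 ?
? 5 ? ? 1
? ? 2 5 ?
3 ? ? 2 ?
? 2 ? 4 ?

Low-rank approximation: rating matrix ≒ U V, where the rating $r_{i,j}$ is modeled with the vectors $u_i$ and $v_j$

[Figure: graphical model with nodes $r_{i,j}$, $u_i$, $v_j$ and plates I and J]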
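A minimal sketch of the matrix factorization approach, fitted to the observed entries of the matrix on this slide by stochastic gradient descent. Treating rows as users and columns as products, the rank, the learning rate, and the regularization strength are all assumptions made for illustration.

```python
import random

def factorize(ratings, num_users, num_items, rank=2, lr=0.02, reg=0.01,
              epochs=3000, seed=0):
    """Fit r_ij ~ u_i . v_j on the observed entries by stochastic gradient descent."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(num_items)]
    for _ in range(epochs):
        for (i, j), r in ratings.items():
            err = r - sum(a * b for a, b in zip(U[i], V[j]))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V

def predict(U, V, i, j):
    """Predicted rating: inner product of the row and column vectors."""
    return sum(a * b for a, b in zip(U[i], V[j]))

# Observed entries keyed by (row, column); the "?" entries are left out.
ratings = {(0, 0): 1, (0, 1): 1, (0, 3): 3, (1, 1): 5, (1, 4): 1,
           (2, 2): 2, (2, 3): 5, (3, 0): 3, (3, 3): 2, (4, 1): 2, (4, 3): 4}
U, V = factorize(ratings, num_users=5, num_items=5)
rmse = (sum((r - predict(U, V, i, j)) ** 2 for (i, j), r in ratings.items())
        / len(ratings)) ** 0.5
print(rmse, predict(U, V, 0, 2))   # fit on observed entries, and one filled-in "?"
```

The second printed value is the model's guess for an entry that was "?" in the matrix, which is what a recommender would use.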

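Collaborative Topic Modeling

• If the products are documents, we want to incorporate the topic information of the documents

[Figure: graphical model combining LDA (nodes $\theta_j$, $z_{j,n}$, $w_{j,n}$, $\phi_t$, with a plate over T topics) and matrix factorization (nodes $r_{i,j}$, $u_i$, $v_j$), with plates I and J]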

User profile example 1


Topic Model with Power-law

• Models the power-law property of documents with a Pitman-Yor process: PY(a,d,LDA)

[Figure: 500 words document]

[Sato+,KDD2010]

Human Action Recognition by Semi-latent Topic Models

Video sequence
↓
Track and stabilize each human figure

Bag of words representation

Motion words

[Wang,PAMI2009]

Learning algorithms for LDA

• Blei+, JMLR2003
– Latent Dirichlet Allocation
– Variational Bayes inference

• Griffiths+, PNAS2004
– Finding scientific topics
– Collapsed Gibbs sampler

• Teh+, NIPS2006
– Collapsed variational Bayesian Inference Algorithm for Latent Dirichlet Allocation

• Asuncion+, UAI2009
– On smoothing and inference for topic models
– Collapsed Variational Bayes Zero
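For orientation, here is a compact sketch of a collapsed Gibbs sampler for LDA. It assumes symmetric scalar priors `alpha` and `beta` and a tiny toy corpus of word ids; the function name, the defaults, and the data are hypothetical, and the code is not taken from the cited papers.

```python
import random

def collapsed_gibbs_lda(docs, V, T, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric alpha and beta.
    docs is a list of documents, each a list of word ids in range(V)."""
    rng = random.Random(seed)
    n_jt = [[0] * T for _ in docs]        # words in document j assigned to topic t
    n_tv = [[0] * V for _ in range(T)]    # times word v is assigned to topic t
    n_t = [0] * T                         # total words assigned to topic t

    def update(j, t, w, delta):
        n_jt[j][t] += delta
        n_tv[t][w] += delta
        n_t[t] += delta

    z = [[rng.randrange(T) for _ in doc] for doc in docs]   # random initial topics
    for j, doc in enumerate(docs):
        for w, t in zip(doc, z[j]):
            update(j, t, w, +1)

    for _ in range(iters):
        for j, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(j, z[j][i], w, -1)                   # remove current assignment
                # Unnormalized p(z_{j,i} = k | all other assignments, words)
                p = [(n_jt[j][k] + alpha) * (n_tv[k][w] + beta) / (n_t[k] + V * beta)
                     for k in range(T)]
                z[j][i] = rng.choices(range(T), weights=p)[0]
                update(j, z[j][i], w, +1)                   # add the new assignment

    theta = [[(c + alpha) / (len(doc) + T * alpha) for c in n_jt[j]]
             for j, doc in enumerate(docs)]
    phi = [[(c + beta) / (n_t[t] + V * beta) for c in n_tv[t]] for t in range(T)]
    return theta, phi, z

docs = [[0, 1, 0, 1, 2], [0, 0, 1, 1], [3, 4, 5, 4], [5, 3, 3, 4, 5]]
theta, phi, z = collapsed_gibbs_lda(docs, V=6, T=2)
print(theta)
```

The continuous parameters θ and φ are integrated out during sampling and only recovered at the end from the counts, which is what "collapsed" refers to.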

Online learning

• Sampler
– Yao+, KDD2009: Efficient Methods for Topic Model Inference on Streaming Document Collections
– Canini+, AISTATS2009: Online Inference of Topics with Latent Dirichlet Allocation

• Variational Bayes
– Hoffman+, NIPS2010: Online Learning for Latent Dirichlet Allocation
– Sato+, NIPS2010: Deterministic Single-pass Algorithm for LDA

Parallel learning

• Zhai+, WWW2012
– Using Variational Inference and MapReduce to Scale Topic Modeling

• Asuncion+, Statistical Methodology2011
– Asynchronous Distributed Estimation of Topic Models for Document Analysis

• Smola, VLDB2010
– An Architecture for Parallel Topic Models

• Newman+, JMLR2009
– Distributed Algorithms for Topic Models

• Ihler+, TKDE2009
– Understanding Errors in Approximate Distributed Latent Dirichlet Allocation

LDA learning recipe

• Use the Collapsed Gibbs sampler or Collapsed Variational Bayes Zero

• Dirichlet parameters
– α: use an asymmetric setting
– β: a symmetric setting is fine
– Learn them with fixed-point iteration
Minka2000, Estimating a Dirichlet distribution
See [Asuncion+,UAI2009] and [Sato+,NIPS2010]
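A sketch of a fixed-point update for an asymmetric α in the style of Minka's note, assuming the per-document topic counts are given (for example from a Gibbs sampler). Because the counts are integers, the digamma differences are computed as exact finite sums, so no special-function library is needed. The function names and the toy counts are hypothetical.

```python
def digamma_diff(n, a):
    """psi(n + a) - psi(a) for a non-negative integer n, as an exact finite sum."""
    return sum(1.0 / (a + i) for i in range(n))

def fixed_point_alpha(counts, iters=200):
    """Fixed-point updates of an asymmetric alpha, where counts[j][t] is the
    number of words of document j assigned to topic t."""
    T = len(counts[0])
    alpha = [1.0] * T
    for _ in range(iters):
        total = sum(alpha)
        denom = sum(digamma_diff(sum(row), total) for row in counts)
        alpha = [alpha[t] * sum(digamma_diff(row[t], alpha[t]) for row in counts) / denom
                 for t in range(T)]
    return alpha

# Hypothetical topic counts: topic 0 dominates most documents.
counts = [[10, 0], [9, 1], [4, 6], [10, 0], [7, 3], [9, 1]]
alpha = fixed_point_alpha(counts)
print(alpha)
```

The learned α should be larger for the topic that is common across documents, which is the asymmetry the recipe recommends.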

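Topic modeling recipe

• Think in terms of a "Bag of XXX"

• Aim for a model that contains the model being extended as a special case

• Use the Collapsed Gibbs sampler for learning

• If you can afford it, use (Collapsed) Variational Bayes

• However, avoid sampling high-dimensional real vectors
– Collapsing (integrating them out)
– Point-estimate the high-dimensional real vectors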

Q and A
