SD study: Statistical Learning of Domain-Dependent Semantic Structure

Upload: kyoshiro-sugiyama

Post on 23-Jan-2017

TRANSCRIPT

Page 1: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Kyoshiro SUGIYAMA , AHC-Lab. , NAIST

7/14 SD study, Chapter 2

Statistical Learning of Domain-Dependent Semantic Structure

D1 Kyoshiro Sugiyama

Page 2: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Chapter 2

2.1 Semantic Information Structure based on Predicate-Argument Structure

2.1.1 Predicate-Argument Structure

2.2 Extraction of Domain-dependent P-A Patterns

2.2.1 Significance Score based on TF-IDF Measure

2.2.2 Significance Score based on Naïve Bayes Model

2.2.3 Clustering of Named Entities

2.2.4 Evaluation of P-A Significance Scores

2.3 Conclusion

Page 3: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Chapter 2 intro.

This chapter introduces a method for statistically learning domain knowledge, which plays an important role in the proposed system, from the semantic structure of the domain corpus.

The domain knowledge is based on predicate-argument (P-A) structure, which is one of the most fundamental information structures in natural language text.

Which information structures are useful depends on the domain. To automatically extract useful domain-dependent P-A structures, a statistical measure is introduced, resulting in completely unsupervised learning of semantic information structure from a given domain corpus.

Page 4: SD study: Statistical Learning of Domain-Dependent Semantic Structure

In three lines

To automatically extract domain knowledge based on domain-dependent useful P-A structure

A statistical measure is introduced

Unsupervised learning of semantic information structure

Page 5: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Predicate-Argument (P-A) Structures (述語項構造)

Arguments (項)

Predicates (述語)

Page 6: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Problem setting

Not every P-A structure is useful.

Conventional approaches extract useful information using hand-crafted templates.

Hand-crafting templates is so costly that it cannot be applied to a variety of domains.

In this chapter, two scoring methods are defined to extract useful, domain-dependent information patterns.

Page 7: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Useful information of domain

Only a fraction of the patterns is useful,

and it is domain-dependent.

e.g.)

* Japanese glosses: beat = 打ち勝つ, acquire = 買収する

Baseball domain:
[A beat B]: important
[A hit B]: important
[A sell B]: not important
[A acquire B]: not important

Business domain:
[A beat B]: not important
[A hit B]: not important
[A sell B]: important
[A acquire B]: important

Page 8: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Automatic information extraction

Which P-A pairs are important in a given domain?

Two significance measures are defined.

Input: corpus/websites for each domain (Baseball, Soccer, Business, Economy)

Output (calculated automatically), e.g. for the baseball domain:

P-A pair: Score
[A hit B]: 0.9
[A beat B]: 0.9
︙
[A sell B]: 0.2
[A acquire B]: 0.1

Page 9: SD study: Statistical Learning of Domain-Dependent Semantic Structure

2.2.1 Significance Score based on TF-IDF Measure

Page 10: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Definition of TF-IDF measure

𝑤𝑖: word

𝑑: document

𝐶(∙): count function

𝛼, 𝛽: smoothing factors

Here 𝛼 = 1, 𝛽 = 1

Page 11: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Intuitive meaning of TF-IDF score

How often does 𝑤𝑖 occur in this document?

How rare are documents 𝑑 that contain 𝑤𝑖?

Domain-specific and frequent words get high scores.
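
The two questions above can be turned into a small computation. The slide's exact formula is not preserved in this transcript, so the sketch below assumes one common smoothed form: additive smoothing with 𝛼 on the term frequency and with 𝛽 on the inverse document frequency, both set to 1 as stated on the previous slide. The function name, the smoothing form, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter

def tf_idf(word, doc, docs, alpha=1.0, beta=1.0):
    """Smoothed TF-IDF of `word` in `doc`, given the collection `docs`.

    Assumed form (not taken from the slide):
      TF  = (C(word, doc) + alpha) / (len(doc) + alpha * |V|)
      IDF = log((N + beta) / (DF(word) + beta))
    """
    vocab = {w for d in docs for w in d}
    counts = Counter(doc)
    # How often does the word occur in this document?
    tf = (counts[word] + alpha) / (len(doc) + alpha * len(vocab))
    # How rare are documents that contain the word?
    df = sum(1 for d in docs if word in d)
    idf = math.log((len(docs) + beta) / (df + beta))
    return tf * idf

# Hypothetical toy documents, one per domain.
docs = [
    ["pitcher", "hit", "homer", "hit", "win"],       # baseball
    ["company", "acquire", "stock", "sell", "win"],  # business
    ["minister", "say", "budget", "win"],            # politics
]
baseball = docs[0]
score_hit = tf_idf("hit", baseball, docs)  # frequent and domain-specific
score_win = tf_idf("win", baseball, docs)  # appears in every document
```

In this toy setup "hit" scores higher than "win", because "win" occurs in every document and its IDF term becomes zero.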

Page 12: SD study: Statistical Learning of Domain-Dependent Semantic Structure

2.2.2 Significance Score based on Naïve Bayes Model

Page 13: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Definition of Naïve Bayes based score

𝛾: smoothing factor with the Dirichlet process prior

Page 14: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Intuitive meaning of Naïve Bayes based score

# of occurrences of word 𝑤𝑖 in domain D

# of occurrences of word 𝑤𝑖

Page 15: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Intuitive meaning of Naïve Bayes based score

# of occurrences of word 𝑤𝑖 in domain D

# of occurrences of word 𝑤𝑖

# of words in domain D

# of words in all documents

Page 16: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Intuitive meaning of Naïve Bayes based score

# of occurrences of word 𝑤𝑖 in domain D

# of occurrences of word 𝑤𝑖

# of words in domain D

# of words in all documents

Probability that an unknown word belongs to domain D?

What is 𝛾?
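
Read together, these annotations suggest a score of the form P(D|𝑤𝑖) = (C(𝑤𝑖, D) + 𝛾 P(D)) / (C(𝑤𝑖) + 𝛾), where P(D) is the number of words in domain D divided by the number of words in all documents. This form is an assumption pieced together from the annotations, since the slide's equation itself is not preserved. The sketch below follows that assumed form; the function name, the counts, and the value of 𝛾 are hypothetical.

```python
from collections import Counter

def nb_score(word, domain_counts, all_counts, gamma=1.0):
    """Smoothed estimate of P(D | word) under the assumed form.

    domain_counts: Counter of word occurrences inside domain D
    all_counts:    Counter of word occurrences in all documents
    gamma:         smoothing weight (hypothetical value)
    """
    # Prior P(D): share of all word tokens that fall in domain D.
    prior = sum(domain_counts.values()) / sum(all_counts.values())
    return (domain_counts[word] + gamma * prior) / (all_counts[word] + gamma)

# Hypothetical counts: 20 of 100 tokens belong to the domain.
domain_counts = Counter({"hit": 8, "win": 6, "do": 6})
all_counts = Counter({"hit": 10, "win": 20, "do": 60, "acquire": 10})

p_hit = nb_score("hit", domain_counts, all_counts)       # mostly in-domain
p_do = nb_score("do", domain_counts, all_counts)         # mostly out-of-domain
p_unseen = nb_score("homer", domain_counts, all_counts)  # never observed
```

With these counts a word that was never observed falls back to the prior P(D), which is one way to read the "unknown word" annotation; 𝛾 controls how strongly that prior pulls on words with few observations.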

Page 17: SD study: Statistical Learning of Domain-Dependent Semantic Structure

2.2.3 Clustering of Named Entities

Page 18: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Problem with named entity

Named entities: names of persons, organizations, locations…

Sparseness problem, mismatch between training and test set

NE classes are introduced for robust estimation.

Page 19: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Clustering of named entities

Argument (Semantic_role) Predicate

Page 20: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Equations

Naïve Bayes score

Probability of word occurrence

same?

Page 21: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Intuitive meaning of NE clustering

Training set (individual names: sparse)

Toritani (agent) hit: Score × P(Toritani)

Ichiro (agent) hit: Score × P(Ichiro)

Sum: Score of [Person] (agent) hit (class: dense)

Test set

Matsui (agent) hit: mismatch with the individual names in the training set, but a match with [Person] (agent) hit
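
The diagram on this slide can be read as a probability-weighted sum: each name's score is multiplied by that name's probability, and the products are summed into one score for the class pattern. The sketch below illustrates that reading only. Estimating P(name) as a relative frequency within the class is an assumption, and the names-to-class mapping, scores, and counts are hypothetical.

```python
def class_score(instance_scores, instance_counts):
    """Score of a class pattern such as "[Person] (agent) hit".

    instance_scores: score of the pattern for each concrete name
    instance_counts: how often each name occurred in the pattern
    """
    total = sum(instance_counts.values())
    return sum(
        # score of this name times P(name), estimated by relative frequency
        instance_scores[name] * instance_counts[name] / total
        for name in instance_scores
    )

# Hypothetical output of a named-entity tagger.
ne_class = {"Toritani": "[Person]", "Ichiro": "[Person]", "Matsui": "[Person]"}

# Hypothetical training statistics for "<name> (agent) hit".
train_scores = {"Toritani": 0.9, "Ichiro": 0.7}
train_counts = {"Toritani": 1, "Ichiro": 3}

class_scores = {"[Person]": class_score(train_scores, train_counts)}

# "Matsui" never appeared in training, but its class did.
matsui_score = class_scores[ne_class["Matsui"]]
```

Because the test-set name is mapped to its class before lookup, "Matsui (agent) hit" receives a score even though that name was never seen in training.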

Page 22: SD study: Statistical Learning of Domain-Dependent Semantic Structure

2.2.4 Evaluation of P-A Significance Scores

Page 23: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Evaluation of significance score

Task: Useful information extraction (for QA, info. navigation)

Methods:

baseline: all P-A pairs are useful

TF-IDF: 𝑇𝐹𝐼𝐷𝐹(𝑤𝑎) × 𝑇𝐹𝐼𝐷𝐹(𝑤𝑠, 𝑤𝑝) > threshold

NB(PS+A): 𝑃(𝐷|𝑤𝑎) × 𝑃(𝐷|𝑤𝑠, 𝑤𝑝) > threshold

NB(PSA): 𝑃(𝐷|𝑤𝑎, 𝑤𝑠, 𝑤𝑝) > threshold

With NE clustering
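
The selection rules above amount to thresholding a score per P-A pattern. The sketch below contrasts the baseline with the NB(PS+A) rule, reading 𝑤𝑎 as the argument, 𝑤𝑠 as the semantic role, and 𝑤𝑝 as the predicate (an interpretation based on the "Argument (Semantic_role) Predicate" layout on Page 19). All probabilities, pattern names, and the threshold are hypothetical.

```python
# pattern: (P(D|w_a), P(D|w_s, w_p)), hypothetical values
patterns = {
    "[Person] (agent) hit": (0.8, 0.9),
    "[Person] (agent) do": (0.8, 0.3),
    "[Company] (agent) acquire": (0.1, 0.05),
}

def select_baseline(patterns):
    """Baseline: every P-A pair is treated as useful."""
    return set(patterns)

def select_ps_a(patterns, threshold):
    """NB(PS+A): keep a pattern if P(D|w_a) * P(D|w_s, w_p) > threshold."""
    return {
        name
        for name, (p_arg, p_role_pred) in patterns.items()
        if p_arg * p_role_pred > threshold
    }

selected_baseline = select_baseline(patterns)
selected_ps_a = select_ps_a(patterns, threshold=0.5)
```

NB(PSA) would instead threshold a single joint estimate 𝑃(𝐷|𝑤𝑎, 𝑤𝑠, 𝑤𝑝), which needs counts for the full triple rather than for its two parts.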

Page 24: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Data sets

Training set:

Mainichi Newspaper corpus 2000-2008

Evaluation set (Dev 10%, Test 90%): articles from the Mainichi Newspaper website covering professional baseball games played on April 21-23, 2010

Manually annotated P-A patterns are regarded as “useful”

Page 25: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Result (precision, recall and F-measure)

Page 26: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Result (precision, recall and F-measure)

seems the best

just 100% recall, but low precision

some degradation

high precision
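
For reference, precision, recall, and F-measure can be computed from the set of selected patterns and the set of manually annotated useful patterns. The sketch below uses hypothetical pattern sets, not the experimental data, and shows why a baseline that selects every P-A pair reaches 100% recall while its precision stays low.

```python
def precision_recall_f(selected, gold):
    """Precision, recall and F-measure of `selected` against `gold`."""
    true_positives = len(selected & gold)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical patterns; `gold` stands in for the manual annotation.
all_patterns = {"A hit B", "A beat B", "A sell B", "A acquire B"}
gold = {"A hit B", "A beat B"}

# Baseline: everything is selected.
baseline_p, baseline_r, baseline_f = precision_recall_f(all_patterns, gold)

# A stricter selection keeps only one pattern.
strict_p, strict_r, strict_f = precision_recall_f({"A hit B"}, gold)
```

In this toy case the baseline has recall 1.0 with precision 0.5, while the stricter selection has precision 1.0 with recall 0.5.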

Page 27: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Precision-recall curve

Baseline

Page 28: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Consideration

PS+A is more robust to the data-sparseness problem than PSA.

Typical successes: “勝つ” (win), “登板する” (come in to pitch), etc.

Typical errors: “する” (do), “なる” (become): frequent verbs that are not domain-specific but are sometimes important.

“日本一 (ニ格) 輝く” (win the Japan championship; ニ格 marks the "ni" case): very important, but it also appears in other sports domains and occurs infrequently (once a year).

Page 29: SD study: Statistical Learning of Domain-Dependent Semantic Structure

Conclusion

The statistical learning of semantic structures is formulated by defining a significance score for domain-dependent P-A structures.

A score based on the Naïve Bayes model is introduced to automatically select useful templates in a given domain.

The experimental results show that high scores are given to important patterns in the domain.

The scoring method does not require any annotated data or thesaurus for the domain, and it can be applied to a variety of domains.
