
Conditional Random Fields

Advanced Statistical Methods in NLP, Ling 572

February 9, 2012

Roadmap

Graphical models:
- Modeling independence
- Models revisited
- Generative & discriminative models

Conditional random fields:
- Linear-chain models
- Skip-chain models


Preview

Conditional random fields:
- Undirected graphical model, due to Lafferty, McCallum, and Pereira (2001)
- Discriminative model: supports integration of rich feature sets
- Allows a range of dependency structures (linear-chain, skip-chain, general) and can encode long-distance dependencies
- Used in diverse NLP sequence labeling tasks: named entity recognition, coreference resolution, etc.


Graphical Models


Graphical Models

A graphical model is a simple, graphical notation for conditional independence: a probabilistic model in which the graph structure denotes conditional independence between random variables.
- Nodes: random variables
- Edges: dependency relations between random variables

Model types: Bayesian networks, Markov random fields


Modeling (In)dependence

Bayesian network:
- Directed acyclic graph (DAG)
- Nodes = random variables
- Arc ~ "directly influences": a conditional dependency
- Arcs = child depends on parent(s); no incoming arcs = independent (only a priori)
- Parents of X = π(X); for each X we need P(X | π(X))


Example I

(Figure from Russell & Norvig, AIMA.)



Simple Bayesian Network: MCBN1

      A
     / \
    B   C
     \ / \
      D   E

A: only a priori
B depends on A
C depends on A
D depends on B, C
E depends on C

Need:       Truth table size:
P(A)        2
P(B|A)      2*2
P(C|A)      2*2
P(D|B,C)    2*2*2
P(E|C)      2*2


Holmes Example (Pearl)

Holmes is worried that his house will be burgled. For the time period of interest, there is a 10^-4 a priori chance of this happening, and Holmes has installed a burglar alarm to try to forestall this event. The alarm is 95% reliable in sounding when a burglary happens, but also has a false positive rate of 1%. Holmes' neighbor, Watson, is 90% sure to call Holmes at his office if the alarm sounds, but he is also a bit of a practical joker and, knowing Holmes' concern, might (30%) call even if the alarm is silent. Holmes' other neighbor Mrs. Gibbons is a well-known lush and often befuddled, but Holmes believes that she is four times more likely to call him if there is an alarm than not.


Holmes Example: Model

There are four binary random variables:
B: whether Holmes' house has been burgled
A: whether his alarm sounded
W: whether Watson called
G: whether Gibbons called

Graph: B → A, with A → W and A → G.


Holmes Example: Tables

P(B):     B=#t 0.0001    B=#f 0.9999

P(A|B):   B=#t: A=#t 0.95, A=#f 0.05
          B=#f: A=#t 0.01, A=#f 0.99

P(W|A):   A=#t: W=#t 0.90, W=#f 0.10
          A=#f: W=#t 0.30, W=#f 0.70

P(G|A):   A=#t: G=#t 0.40, G=#f 0.60
          A=#f: G=#t 0.10, G=#f 0.90
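These tables are enough for exact inference by enumeration. As a sketch (the variable names are mine; the probabilities are the slide's), here is P(burglary | Watson calls), marginalizing out the alarm:

```python
# Inference by enumeration on the Holmes network, using the tables above.
# P(B=#t | W=#t) is proportional to sum_A P(B) P(A|B) P(W=#t|A).
P_B_TRUE = 0.0001
P_A_GIVEN_B = {True: 0.95, False: 0.01}   # P(A=#t | B)
P_W_GIVEN_A = {True: 0.90, False: 0.30}   # P(W=#t | A)

def p_burgled_given_watson_calls():
    num = 0.0
    den = 0.0
    for b in (True, False):
        p_b = P_B_TRUE if b else 1.0 - P_B_TRUE
        # Marginalize out the alarm A.
        p_w = sum((P_A_GIVEN_B[b] if a else 1.0 - P_A_GIVEN_B[b]) * P_W_GIVEN_A[a]
                  for a in (True, False))
        den += p_b * p_w
        if b:
            num = p_b * p_w
    return num / den
```

Watson's call alone raises the burglary probability only to about 0.0003: his 30% prank rate swamps the 10^-4 prior.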


Bayes' Nets: Markov Property

Bayes' nets satisfy the local Markov property: each variable is conditionally independent of its non-descendants given its parents.


Simple Bayesian Network MCBN1 (A: only a priori; B depends on A; C depends on A; D depends on B, C; E depends on C):

P(A,B,C,D,E) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|C)

There exist algorithms for training and inference on Bayesian networks.
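This factorization is directly computable. A minimal sketch with invented CPTs for MCBN1 (every number below is illustrative, not from the slides); the factored product sums to 1 over all 2^5 assignments, confirming it is a proper joint distribution:

```python
from itertools import product

# Hypothetical CPTs for MCBN1 -- all numbers invented for illustration.
# Each entry gives P(variable = True | parents); False is the complement.
P_A = 0.3
P_B_GIVEN_A = {True: 0.8, False: 0.1}
P_C_GIVEN_A = {True: 0.4, False: 0.6}
P_D_GIVEN_BC = {(True, True): 0.9, (True, False): 0.5,
                (False, True): 0.7, (False, False): 0.2}
P_E_GIVEN_C = {True: 0.25, False: 0.75}

def bern(p, value):
    return p if value else 1.0 - p

def joint(a, b, c, d, e):
    # P(A,B,C,D,E) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|C)
    return (bern(P_A, a) * bern(P_B_GIVEN_A[a], b) * bern(P_C_GIVEN_A[a], c)
            * bern(P_D_GIVEN_BC[(b, c)], d) * bern(P_E_GIVEN_C[c], e))

total = sum(joint(*v) for v in product([True, False], repeat=5))
```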


Naïve Bayes Model

Bayes' net with conditional independence of features given the class:

Y → f1, f2, f3, ..., fk
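In code, the Naïve Bayes decision rule is a class prior plus a sum of per-feature log-likelihoods. A minimal sketch in which the classes, features, and probabilities are all invented for illustration:

```python
from math import log

# Hypothetical classes and probability tables (invented for illustration).
PRIORS = {"spam": 0.4, "ham": 0.6}
LIKELIHOOD = {  # P(feature | class)
    ("offer", "spam"): 0.5, ("offer", "ham"): 0.05,
    ("meeting", "spam"): 0.02, ("meeting", "ham"): 0.3,
}

def classify(features):
    # argmax_y  log P(y) + sum_i log P(f_i | y)
    def score(y):
        return log(PRIORS[y]) + sum(log(LIKELIHOOD[(f, y)]) for f in features)
    return max(PRIORS, key=score)
```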


Hidden Markov Model

A Bayesian network where:
- y_t depends on y_{t-1}
- x_t depends on y_t

States y1, y2, y3, ..., yk emit observations x1, x2, x3, ..., xk.
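These two dependencies give the HMM joint P(x, y) = P(y1) P(x1|y1) Π_t P(y_t|y_{t-1}) P(x_t|y_t). A sketch with invented probabilities:

```python
# All probabilities below are invented for illustration.
INIT = {"N": 0.6, "V": 0.4}                        # P(y1)
TRANS = {("N", "N"): 0.3, ("N", "V"): 0.7,         # P(y_t | y_{t-1})
         ("V", "N"): 0.5, ("V", "V"): 0.5}
EMIT = {("N", "time"): 0.2, ("N", "flies"): 0.05,  # P(x_t | y_t)
        ("V", "time"): 0.01, ("V", "flies"): 0.1}

def hmm_joint(xs, ys):
    # Multiply transition and emission probabilities along the chain.
    p = INIT[ys[0]] * EMIT[(ys[0], xs[0])]
    for t in range(1, len(xs)):
        p *= TRANS[(ys[t - 1], ys[t])] * EMIT[(ys[t], xs[t])]
    return p
```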



Generative Models

Both Naïve Bayes and HMMs are generative models.

"We use the term generative model to refer to a directed graphical model in which the outputs topologically precede the inputs, that is, no x in X can be a parent of an output y in Y." (Sutton & McCallum, 2006)

The state y generates an observation (instance) x.

Maximum Entropy and linear-chain Conditional Random Fields (CRFs) are, respectively, their discriminative counterparts.


Markov Random Fields (aka Markov Networks)

- Graphical representation of a probabilistic model as an undirected graph
- Can represent cyclic dependencies (vs. the DAG of a Bayesian network, which can represent induced dependencies)
- Also satisfy a local Markov property: P(X | all other variables) = P(X | ne(X)), where ne(X) are the neighbors of X


Factorizing MRFs

Many MRFs can be analyzed in terms of cliques:
- Clique: in an undirected graph G(V,E), a clique is a subset of the vertices V such that for every pair of vertices v_i, v_j in the subset, the edge (v_i, v_j) is in E
- A maximal clique cannot be extended with another vertex and remain a clique
- A maximum clique is a largest clique in G

(Example due to F. Xia, on a small graph over vertices A, B, C, D, E.)
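The clique definition translates directly into a pairwise-adjacency check. A sketch on a hypothetical undirected graph over A–E (the edge set below is mine, not the slide's):

```python
from itertools import combinations

def is_clique(vertices, edges):
    # Every pair of vertices in the subset must be connected by an edge.
    return all((u, v) in edges or (v, u) in edges
               for u, v in combinations(vertices, 2))

# Hypothetical edge set over {A, B, C, D, E} (invented for illustration).
EDGES = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")}
```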


MRFs

Given an undirected graph G(V,E) over random variables X, with cliques cl(G), the joint distribution factors over the cliques:

P(X) = (1/Z) Π_{c in cl(G)} φ_c(X_c)

where each φ_c is a non-negative potential function over the variables of clique c, and Z normalizes the product.

(Example due to F. Xia.)


Conditional Random Fields

Definition due to Lafferty et al., 2001:

"Let G = (V,E) be a graph such that Y = (Y_v), v in V, so that Y is indexed by the vertices of G. Then (X,Y) is a conditional random field in case, when conditioned on X, the random variables Y_v obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w ≠ v) = p(Y_v | X, Y_w, w ∼ v), where w ∼ v means that w and v are neighbors in G."

A CRF is thus a Markov random field globally conditioned on the observation X, and has the form:

p(y|x) = (1/Z(x)) Π_{c in cl(G)} φ_c(y_c, x)

where Z(x) normalizes over all label assignments y.


Linear-Chain CRF

CRFs can have arbitrary graphical structure, but the most common form is the linear chain, which supports sequence modeling for many NLP sequence labeling problems: named entity recognition (NER), coreference, etc.

A linear-chain CRF is similar to combining HMM sequence structure with a MaxEnt model:
- Supports sequence structure like an HMM, but HMMs can't handle rich feature structure
- Supports rich, overlapping features like MaxEnt, but MaxEnt doesn't directly support sequence labeling


Discriminative & Generative

(Model-perspectives figure from Sutton & McCallum.)


Linear-Chain CRFs

Feature functions:
- In MaxEnt: f: X × Y → {0,1}, e.g. f_j(x,y) = 1 if x = "rifle" and y = talk.politics.guns, 0 otherwise
- In CRFs: f: Y × Y × X × T → R, e.g. f_k(y_t, y_{t-1}, x, t) = 1 if y_t = V and y_{t-1} = N and x_t = "flies", 0 otherwise
- Frequently indicator functions, for efficiency
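The two kinds of feature function can be written directly as indicators (here I treat the MaxEnt input x as a bag of tokens, which is my simplification of the slide's example):

```python
def f_maxent(x, y):
    # MaxEnt feature: fires on a (document, label) pair.
    return 1 if "rifle" in x and y == "talk.politics.guns" else 0

def f_crf(y_t, y_prev, x, t):
    # CRF feature: may inspect the current and previous labels,
    # the whole observation sequence x, and the position t.
    return 1 if y_t == "V" and y_prev == "N" and x[t] == "flies" else 0
```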


Linear-Chain CRFs

The linear-chain CRF takes the standard form:

p(y|x) = (1/Z(x)) exp( Σ_t Σ_k λ_k f_k(y_t, y_{t-1}, x, t) )

where Z(x) sums the exponentiated scores over all label sequences.



Linear-chain CRFs: Training & Decoding

Training: learn the weights λ_j, with approaches similar to MaxEnt, e.g. L-BFGS.

Decoding: compute the label sequence that optimizes P(y|x); can use HMM-style approaches, e.g. Viterbi.
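Viterbi decoding for a linear-chain CRF maximizes the cumulative (unnormalized) feature score; Z(x) is not needed for decoding. A toy sketch in which the tag set, features, and weights are all invented:

```python
TAGS = ["N", "V"]

def score(prev_tag, tag, words, t, weights):
    # Sum of weighted indicator features at one position (hypothetical features).
    return (weights.get(("trans", prev_tag, tag), 0.0)
            + weights.get(("emit", tag, words[t]), 0.0))

def viterbi(words, weights):
    # best[t][y]: score of the best tag path ending in tag y at position t.
    best = [{y: score("<s>", y, words, 0, weights) for y in TAGS}]
    back = [{y: None for y in TAGS}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for y in TAGS:
            best[t][y], back[t][y] = max(
                (best[t - 1][p] + score(p, y, words, t, weights), p) for p in TAGS)
    # Follow backpointers from the best final tag.
    y = max(TAGS, key=lambda tag: best[-1][tag])
    path = [y]
    for t in range(len(words) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return list(reversed(path))
```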


Skip-chain CRFs


Motivation

Long-distance dependencies: linear-chain CRFs, HMMs, beam search, etc. all make local Markov assumptions (preceding label; current data given current label). That is good for some tasks, but longer context can be useful: e.g. in NER, repeated capitalized words should get the same tag.


Skip-Chain CRFs

Basic approach: augment the linear-chain CRF model with long-distance 'skip edges', adding evidence from both endpoints.
- Which edges? Identical words, or words with the same stem?
- How many edges? Not too many, since more edges increase inference cost.
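One simple edge-selection rule — connecting identical capitalized words, as used later in these slides — can be sketched as:

```python
def skip_edges(tokens):
    # Connect each capitalized token to earlier occurrences of the same token.
    edges, seen = [], {}
    for j, tok in enumerate(tokens):
        if tok[:1].isupper():
            for i in seen.get(tok, []):
                edges.append((i, j))
            seen.setdefault(tok, []).append(j)
    return edges
```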


Skip Chain CRF Model

Two clique templates:
- Standard linear-chain template
- Skip-edge template



Skip Chain NER

Named entity recognition task: identify start time, end time, speaker, and location in a corpus of seminar announcement emails.

All approaches use orthographic, gazetteer, and POS features within a window of the preceding and following 4 words.

Skip-chain CRFs additionally place skip edges between identical capitalized words.


NER Features


Skip Chain NER Results

Skip chain improves substantially on 'speaker' recognition, with a slight reduction in accuracy for times.


Summary

Conditional random fields (CRFs):
- Undirected graphical model; compare with Bayesian networks and Markov random fields
- Linear-chain models: HMM sequence structure + MaxEnt feature models
- Skip-chain models: augment with longer-distance dependencies
- Pros: good performance. Cons: compute intensive.


HW #5


HW #5: Beam Search

Apply beam search to MaxEnt sequence decoding.

Task: POS tagging

Given files:
- test data: usual format
- boundary file: sentence lengths
- model file

Comparisons: different topN, topK, beam_width


Tag Context

Following Ratnaparkhi '96, the model uses the previous tag (prevT=tag) and the previous tag bigram (prevTwoTags=tag_{i-2}+tag_{i-1}).

These are NOT in the data file; you compute them on the fly.

Notes:
- Due to sparseness, a bigram may not appear in the model file; skip it.
- These are feature functions: if you have a different candidate tag for the same word, the weights will differ.
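A rough sketch of the decoder's shape (the exact semantics of topN, topK, and beam_width here are my assumptions, not the assignment spec): keep the topN tags per word, then at most topK paths, pruning any path whose log-probability falls more than beam_width below the current best.

```python
def beam_search(words, cand_tags, log_prob, topN=3, topK=10, beam_width=2.0):
    beam = [([], 0.0)]  # (tag path so far, cumulative log-prob)
    for w in words:
        extended = []
        for path, lp in beam:
            # Score each candidate tag given the path so far; keep the topN.
            scored = sorted(((log_prob(w, path, t), t) for t in cand_tags(w)),
                            reverse=True)[:topN]
            extended.extend((path + [t], lp + s) for s, t in scored)
        # Keep at most topK paths within beam_width of the best.
        extended.sort(key=lambda e: e[1], reverse=True)
        best = extended[0][1]
        beam = [(p, lp) for p, lp in extended[:topK] if lp >= best - beam_width]
    return beam[0][0]
```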


Uncertainty

Real-world tasks are partially observable, stochastic, and extremely complex. Probabilities capture "ignorance & laziness":
- Ignorance: we lack relevant facts and conditions
- Laziness: we fail to enumerate all conditions and exceptions


Motivation

Uncertainty in medical diagnosis: diseases produce symptoms; in diagnosis, observed symptoms => disease ID.

Uncertainties:
- Symptoms may not occur
- Symptoms may not be reported
- Diagnostic tests are not perfect (false positives, false negatives)

How do we estimate confidence?