information extraction, conditional random fields, and ...mccallum/talks/alphatech2005.pdf · what...

126
Information Extraction, Conditional Random Fields, and Social Network Analysis Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Aron Culotta, Charles Sutton, Ben Wellner, Khashayar Rohanimanesh, Wei Li, Andres Corrada, Xuerui Wang

Upload: others

Post on 25-Aug-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Information Extraction,Conditional Random Fields,and Social Network Analysis

Andrew McCallum

Computer Science DepartmentUniversity of Massachusetts Amherst

Joint work withAron Culotta, Charles Sutton, Ben Wellner, Khashayar Rohanimanesh, Wei Li,

Andres Corrada, Xuerui Wang

Page 2: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Goal:

Mine actionable knowledgefrom unstructured text.

Page 3: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Pages Containing the Phrase“high tech job openings”

Page 4: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Extracting Job Openings from the Web

foodscience.com-Job2

JobTitle: Ice Cream Guru

Employer: foodscience.com

JobCategory: Travel/Hospitality

JobFunction: Food Services

JobLocation: Upper Midwest

Contact Phone: 800-488-2611

DateExtracted: January 8, 2001 Source: www.foodscience.com/jobs_midwest.html

OtherCompanyJobs: foodscience.com-Job1

Page 5: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

A Portal for Job Openings

Page 6: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Job

Ope

ning

s:C

ateg

ory

= H

igh

Tech

Key

wor

d =

Java

Lo

catio

n =

U.S

.

Page 7: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Data Mining the Extracted Job Information

Page 8: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

IE fromChinese Documents regarding Weather

Department of Terrestrial System, Chinese Academy of Sciences

200k+ documentsseveral millennia old

- Qing Dynasty Archives- memos- newspaper articles- diaries

Page 9: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

IE from Cargo Container Ship Manifests

Cargo Tracking Div.US Navy

Page 10: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

IE from Research Papers[McCallum et al ‘99]

Page 11: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

IE from Research Papers

Page 12: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Mining Research Papers

[Giles et al]

[Rosen-Zvi, Griffiths, Steyvers, Smyth, 2004]

Page 13: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

What is “Information Extraction”

Information Extraction = segmentation + classification + clustering + association

As a familyof techniques:October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO BillGates railed against the economic philosophyof open-source software with Orwellianfervor, denouncing its communal licensing asa "cancer" that stifled technologicalinnovation.

Today, Microsoft claims to "love" the open-source concept, by which software code ismade public to encourage improvement anddevelopment by outside programmers. Gateshimself says Microsoft will gladly disclose itscrown jewels--the coveted code behind theWindows operating system--to selectcustomers.

"We can be open source. We love the conceptof shared source," said Bill Veghte, aMicrosoft VP. "That's a super-important shiftfor us in terms of code access.“

Richard Stallman, founder of the FreeSoftware Foundation, countered saying…

Microsoft CorporationCEOBill GatesMicrosoftGatesMicrosoftBill VeghteMicrosoftVPRichard StallmanfounderFree Software Foundation

Page 14: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

What is “Information Extraction”

Information Extraction = segmentation + classification + association + clustering

As a familyof techniques:October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO BillGates railed against the economic philosophyof open-source software with Orwellianfervor, denouncing its communal licensing asa "cancer" that stifled technologicalinnovation.

Today, Microsoft claims to "love" the open-source concept, by which software code ismade public to encourage improvement anddevelopment by outside programmers. Gateshimself says Microsoft will gladly disclose itscrown jewels--the coveted code behind theWindows operating system--to selectcustomers.

"We can be open source. We love the conceptof shared source," said Bill Veghte, aMicrosoft VP. "That's a super-important shiftfor us in terms of code access.“

Richard Stallman, founder of the FreeSoftware Foundation, countered saying…

Microsoft CorporationCEOBill GatesMicrosoftGatesMicrosoftBill VeghteMicrosoftVPRichard StallmanfounderFree Software Foundation

Page 15: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

What is “Information Extraction”

Information Extraction = segmentation + classification + association + clustering

As a familyof techniques:October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO BillGates railed against the economic philosophyof open-source software with Orwellianfervor, denouncing its communal licensing asa "cancer" that stifled technologicalinnovation.

Today, Microsoft claims to "love" the open-source concept, by which software code ismade public to encourage improvement anddevelopment by outside programmers. Gateshimself says Microsoft will gladly disclose itscrown jewels--the coveted code behind theWindows operating system--to selectcustomers.

"We can be open source. We love the conceptof shared source," said Bill Veghte, aMicrosoft VP. "That's a super-important shiftfor us in terms of code access.“

Richard Stallman, founder of the FreeSoftware Foundation, countered saying…

Microsoft CorporationCEOBill GatesMicrosoftGatesMicrosoftBill VeghteMicrosoftVPRichard StallmanfounderFree Software Foundation

Page 16: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

What is “Information Extraction”

Information Extraction = segmentation + classification + association + clustering

As a familyof techniques:October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO BillGates railed against the economic philosophyof open-source software with Orwellianfervor, denouncing its communal licensing asa "cancer" that stifled technologicalinnovation.

Today, Microsoft claims to "love" the open-source concept, by which software code ismade public to encourage improvement anddevelopment by outside programmers. Gateshimself says Microsoft will gladly disclose itscrown jewels--the coveted code behind theWindows operating system--to selectcustomers.

"We can be open source. We love the conceptof shared source," said Bill Veghte, aMicrosoft VP. "That's a super-important shiftfor us in terms of code access.“

Richard Stallman, founder of the FreeSoftware Foundation, countered saying…

Microsoft CorporationCEOBill GatesMicrosoftGatesMicrosoftBill VeghteMicrosoftVPRichard StallmanfounderFree Software Foundation

NAME

TITLE ORGANIZATION

Bill Gates

CEO

Microsoft

Bill Veghte

VPMicrosoft

Richard

Stallman

founder

Free Soft..

*

*

*

*

Page 17: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Larger Context

SegmentClassifyAssociateCluster

Filter

Prediction Outlier detection Decision support

IE

Documentcollection

Database

Discover patterns - entity types - links / relations - events

DataMining

Spider

Actionableknowledge

Page 18: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Outline

• Examples of IE and Data Mining.

• Conditional Random Fields and Feature Induction.• Joint inference: Motivation and examples

– Joint Labeling of Cascaded Sequences (Belief Propagation)

– Joint Labeling of Distant Entities (BP by Tree Reparameterization)

– Joint Co-reference Resolution (Graph Partitioning)

– Joint Segmentation and Co-ref (Iterated Conditional Samples.)

• Two example projects– Email, contact address book, and Social Network Analysis

– Research Paper search and analysis

Page 19: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Hidden Markov Models

S t-1 S t

O t

S t+1

O t +1Ot -1

...

...

Finite state model Graphical model

Parameters: for all states S={s1,s2,…} Start state probabilities: P(st ) Transition probabilities: P(st|st-1 ) Observation (emission) probabilities: P(ot|st )Training: Maximize probability of training observations (w/ prior)

!=

"#||

1

1 )|()|(),(o

t

ttttsoPssPosP

v

vv

HMMs are the standard sequence modeling tool in genomics, music, speech, NLP, …

...transitions

observations

o1 o2 o3 o4 o5 o6 o7 o8

Generates:

State sequenceObservation sequence

Usually a multinomial overatomic, fixed alphabet

Page 20: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

IE with Hidden Markov Models

Yesterday Rich Caruana spoke this example sentence.

Yesterday Rich Caruana spoke this example sentence.

Person name: Rich Caruana

Given a sequence of observations:

and a trained HMM:

Find the most likely state sequence: (Viterbi)

Any words said to be generated by the designated “person name”state extract as a person name:

person namelocation namebackground

Page 21: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

We want More than an Atomic View of Words

Would like richer representation of text:many arbitrary, overlapping features of the words.

S t-1 S t

O t

S t+1

O t +1Ot -1

identity of wordends in “-ski”is capitalizedis part of a noun phraseis in a list of city namesis under node X in WordNetis in bold fontis indentedis in hyperlink anchorlast person name was femalenext two words are “and Associates”

…part of

noun phrase

is “Wisniewski”

ends in “-ski”

Page 22: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Problems with Richer Representationand a Joint Model

These arbitrary features are not independent.– Multiple levels of granularity (chars, words, phrases)– Multiple dependent modalities (words, formatting, layout)– Past & future

Two choices:

Model the dependencies.Each state would have its ownBayes Net. But we are alreadystarved for training data!

Ignore the dependencies.This causes “over-counting” ofevidence (ala naïve Bayes).Big problem when combiningevidence, as in Viterbi!

S t-1 S t

O t

S t+1

O t +1Ot -1

S t-1 S t

O t

S t+1

O t +1Ot -1

Page 23: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Conditional Sequence Models

• We prefer a model that is trained to maximize aconditional probability rather than joint probability:P(s|o) instead of P(s,o):

– Can examine features, but not responsible for generatingthem.

– Don’t have to explicitly model their dependencies.– Don’t “waste modeling effort” trying to generate what we are

given at test time anyway.

Page 24: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Joint

Conditional

St-1 St

Ot

St+1

Ot+1Ot-1

St-1 St

Ot

St+1

Ot+1Ot-1

...

...

...

...

(A super-special case of Conditional Random Fields.)

[Lafferty, McCallum, Pereira 2001]

where

From HMMs to Conditional Random Fields

Set parameters by maximum likelihood, using optimization method on δL.

Page 25: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Conditional Random Fields

St St+1 St+2

O = Ot, Ot+1, Ot+2, Ot+3, Ot+4

St+3 St+4

1. FSM special-case: linear chain among unknowns, parameters tied across time steps.

[Lafferty, McCallum, Pereira 2001]

2. In general: CRFs = "Conditionally-trained Markov Network" arbitrary structure among unknowns

3. Relational Markov Networks [Taskar, Abbeel, Koller 2002]: Parameters tied across hits from SQL-like queries ("clique templates")

Page 26: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Training CRFs

!

Log - likelihood gradient :

"L

"#k

= Ck (v s

( i),v o

(i))i

$ % P{#k }(v s |

v o

(i)) Ck (v s ,

v o

( i))v s

$i

$ %#k

2

Ck (v s ,

v o ) = fk (

t

$v o ,t,st%1,st )

!

Maximize log - likelihood of parameters given training data :

L({"k} |{

v o ,

v s

( i)})

Feature count usingcorrect labels

Feature count usingpredicted labels Smoothing penalty- -

Page 27: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Linear-chain CRFs vs. HMMs

• Comparable computational efficiency for inference

• Features may be arbitrary functions of any or allobservations

– Parameters need not fully specify generation ofobservations; can require less training data

– Easy to incorporate domain knowledge

Page 28: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Main Point #1

Conditional probability sequence models

give great flexibility regarding features used,

and have efficient dynamic-programming-based algorithms for inference.

Page 29: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

IE from Research Papers[McCallum et al ‘99]

Page 30: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

IE from Research Papers

Field-level F1

Hidden Markov Models (HMMs) 75.6[Seymore, McCallum, Rosenfeld, 1999]

Support Vector Machines (SVMs) 89.7[Han, Giles, et al, 2003]

Conditional Random Fields (CRFs) 93.9[Peng, McCallum, 2004]

Δ error40%

Page 31: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Table Extraction from Government ReportsCash receipts from marketings of milk during 1995 at $19.9 billion dollars, was slightly below 1994. Producer returns averaged $12.93 per hundredweight, $0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds, 1 percent above 1994. Marketings include whole milk sold to plants and dealers as well as milk sold directly to consumers. An estimated 1.56 billion pounds of milk were used on farms where produced, 8 percent less than 1994. Calves were fed 78 percent of this milk with the remainder consumed in producer households. Milk Cows and Production of Milk and Milkfat: United States, 1993-95 -------------------------------------------------------------------------------- : : Production of Milk and Milkfat 2/ : Number :------------------------------------------------------- Year : of : Per Milk Cow : Percentage : Total :Milk Cows 1/:-------------------: of Fat in All :------------------ : : Milk : Milkfat : Milk Produced : Milk : Milkfat -------------------------------------------------------------------------------- : 1,000 Head --- Pounds --- Percent Million Pounds : 1993 : 9,589 15,704 575 3.66 150,582 5,514.4 1994 : 9,500 16,175 592 3.66 153,664 5,623.7 1995 : 9,461 16,451 602 3.66 155,644 5,694.3 --------------------------------------------------------------------------------1/ Average number during year, excluding heifers not yet fresh. 2/ Excludes milk sucked by calves.

Page 32: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Table Extraction from Government Reports

Cash receipts from marketings of milk during 1995 at $19.9 billion dollars,

was

slightly below 1994. Producer returns averaged $12.93 per hundredweight,

$0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds,

1 percent above 1994. Marketings include whole milk sold to plants and

dealers

as well as milk sold directly to consumers.

An estimated 1.56 billion pounds of milk were used on farms where produced,

8 percent less than 1994. Calves were fed 78 percent of this milk with the

remainder consumed in producer households.

Milk Cows and Production of Milk and Milkfat:

United States, 1993-95

--------------------------------------------------------------------------------

: : Production of Milk and Milkfat 2/

: Number :-------------------------------------------------------

Year : of : Per Milk Cow : Percentage : Total

:Milk Cows 1/:-------------------: of Fat in All :------------------

: : Milk : Milkfat : Milk Produced : Milk : Milkfat

--------------------------------------------------------------------------------

: 1,000 Head --- Pounds --- Percent Million Pounds

:

1993 : 9,589 15,704 575 3.66 150,582 5,514.4

1994 : 9,500 16,175 592 3.66 153,664 5,623.7

1995 : 9,461 16,451 602 3.66 155,644 5,694.3

--------------------------------------------------------------------------------

1/ Average number during year, excluding heifers not yet fresh.

2/ Excludes milk sucked by calves.

CRFLabels:• Non-Table• Table Title• Table Header• Table Data Row• Table Section Data Row• Table Footnote• ... (12 in all)

[Pinto, McCallum, Wei, Croft, 2003 SIGIR]

Features:• Percentage of digit chars• Percentage of alpha chars• Indented• Contains 5+ consecutive spaces• Whitespace in this line aligns with prev.• ...• Conjunctions of all previous features,

time offset: {0,0}, {-1,0}, {0,1}, {1,2}.

100+ documents from www.fedstats.gov

Page 33: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Table Extraction Experimental Results

Line labels,percent correct

Table segments,F1

95 % 92 %

65 % 64 %

85 % -

HMM

StatelessMaxEnt

CRF w/outconjunctions

CRF

52 % 68 %

[Pinto, McCallum, Wei, Croft, 2003 SIGIR]

Page 34: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Feature Induction for CRFs

1. Begin with knowledge of atomic features,but no features yet in the model.

2. Consider many candidate features,including atomic and conjunctions.

3. Evaluate each candidate feature.4. Add to the model some that are ranked

highest.5. Train the model.

[McCallum, 2003, UAI]

Page 35: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Candidate Feature Evaluation

Common method: Information Gain

!"

#=Ff

fCHfPCHFC )|( )()(),(InfoGain

True optimization criterion: Likelihood of training data

!+! "= LLf f##),(Gain Likelihood

Technical meat is in how to calculate this efficiently for CRFs• Mean field approximation• Emphasize error instances (related to Boosting)• Newton's method to set λ

[McCallum, 2003, UAI]

Page 36: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Named Entity Recognition

CRICKET -MILLNS SIGNS FOR BOLAND

CAPE TOWN 1996-08-22

South African provincial sideBoland said on Thursday theyhad signed Leicestershire fastbowler David Millns on a oneyear contract.Millns, who toured Australiawith England A in 1992, replacesformer England all-rounderPhillip DeFreitas as Boland'soverseas professional.

Labels: Examples:

PER Yayuk BasukiInnocent Butare

ORG 3MKDPCleveland

LOC ClevelandNirmal HridayThe Oval

MISC JavaBasque1,000 Lakes Rally

Page 37: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Automatically Induced Features

Index Feature0 inside-noun-phrase (ot-1)5 stopword (ot)20 capitalized (ot+1)75 word=the (ot)100 in-person-lexicon (ot-1)200 word=in (ot+2)500 word=Republic (ot+1)711 word=RBI (ot) & header=BASEBALL1027 header=CRICKET (ot) & in-English-county-lexicon (ot)1298 company-suffix-word (firstmentiont+2)4040 location (ot) & POS=NNP (ot) & capitalized (ot) & stopword (ot-1)4945 moderately-rare-first-name (ot-1) & very-common-last-name (ot)4474 word=the (ot-2) & word=of (ot)

[McCallum & Li, 2003, CoNLL]

Page 38: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Named Entity Extraction Results

Method F1

HMMs BBN's Identifinder 73%

CRFs w/out Feature Induction 83%

CRFs with Feature Induction 90%based on LikelihoodGain

[McCallum & Li, 2003, CoNLL]

Page 39: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Outline

• Examples of IE and Data Mining.

• Conditional Random Fields and Feature Induction.

• Joint inference: Motivation and examples– Joint Labeling of Cascaded Sequences (Belief Propagation)

– Joint Labeling of Distant Entities (BP by Tree Reparameterization)

– Joint Co-reference Resolution (Graph Partitioning)

– Joint Segmentation and Co-ref (Iterated Conditional Samples.)

• Two example projects– Email, contact address book, and Social Network Analysis

– Research Paper search and analysis

Page 40: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Problem:

Combined in serial juxtaposition,IE and KD are unaware of each others’weaknesses and opportunities.

1) KD begins from a populated DB, unaware ofwhere the data came from, or its inherentuncertainties.

2) IE is unaware of emerging patterns andregularities in the DB.

The accuracy of both suffers, and significant mining

of complex text sources is beyond reach.

Segment

Classify

Associate

Cluster

IE

Document

collection

Database

Discover patterns

- entity types

- links / relations

- events

Knowledge

Discovery

Actionable

knowledge

Page 41: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

SegmentClassifyAssociateCluster

Filter

Prediction Outlier detection Decision support

IE

Documentcollection

Database

Discover patterns - entity types - links / relations - events

DataMining

Spider

Actionableknowledge

Uncertainty Info

Emerging Patterns

Solution:

Page 42: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

SegmentClassifyAssociateCluster

Filter

Prediction Outlier detection Decision support

IE

Documentcollection

ProbabilisticModel

Discover patterns - entity types - links / relations - events

DataMining

Spider

Actionableknowledge

Solution:

Conditional Random Fields [Lafferty, McCallum, Pereira]

Conditional PRMs [Koller…], [Jensen…], [Geetor…], [Domingos…]

Discriminatively-trained undirected graphical models

Complex Inference and LearningJust what we researchers like to sink our teeth into!

Unified Model

Page 43: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

1. Jointly labeling cascaded sequencesFactorial CRFs

Part-of-speech

Noun-phrase boundaries

Named-entity tag

English words

[Sutton, Khashayar, McCallum, ICML 2004]

Page 44: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

1. Jointly labeling cascaded sequencesFactorial CRFs

Part-of-speech

Noun-phrase boundaries

Named-entity tag

English words

[Sutton, Khashayar, McCallum, ICML 2004]

Page 45: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

1. Jointly labeling cascaded sequencesFactorial CRFs

Part-of-speech

Noun-phrase boundaries

Named-entity tag

English words

[Sutton, Khashayar, McCallum, ICML 2004]

But errors cascade--must be perfect at every stage to do well.

Page 46: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

1. Jointly labeling cascaded sequencesFactorial CRFs

Part-of-speech

Noun-phrase boundaries

Named-entity tag

English words

[Sutton, Khashayar, McCallum, ICML 2004]

Joint prediction of part-of-speech and noun-phrase in newswire,matching accuracy with only 50% of the training data.

Inference:Tree reparameterization BP

[Wainwright et al, 2002]

Page 47: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

2. Jointly labeling distant mentionsSkip-chain CRFs

Senator Joe Green said today … . Green ran for …

[Sutton, McCallum, SRL 2004]

Dependency among similar, distant mentions ignored.

Page 48: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

2. Jointly labeling distant mentionsSkip-chain CRFs

Senator Joe Green said today … . Green ran for …

[Sutton, McCallum, SRL 2004]

14% reduction in error on most repeated field in email seminar announcements.

Inference:Tree reparameterization BP

[Wainwright et al, 2002]

Page 49: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

3. Joint co-reference among all pairsAffinity Matrix CRF

. . . Mr Powell . . .

. . . Powell . . .

. . . she . . .

45

−99Y/N

Y/N

Y/N

11

[McCallum, Wellner, IJCAI WS 2003, NIPS 2004]

~25% reduction in error onco-reference ofproper nouns in newswire.

Inference:Correlational clusteringgraph partitioning

[Bansal, Blum, Chawla, 2002]

“Entity resolution”“Object correspondence”

Page 50: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Coreference Resolution

Input

AKA "record linkage", "database record deduplication", "entity resolution", "object correspondence", "identity uncertainty"

OutputNews article, with named-entity "mentions" tagged

Number of entities, N = 3

#1 Secretary of State Colin Powell he Mr. Powell Powell

#2 Condoleezza Rice she Rice

#3 President Bush Bush

Today Secretary of State Colin Powellmet with . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . he . . . . . .. . . . . . . . . . . . . Condoleezza Rice . . . . .. . . . Mr Powell . . . . . . . . . .she . . . . . . . . . . . . . . . . . . . . . Powell . . . . . . . . . . . .. . . President Bush . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . Rice . . . . . . . . . . . . . . . . Bush . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . .

Page 51: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Inside the Traditional Solution

Mention (3) Mention (4)

. . . Mr Powell . . . . . . Powell . . .

N Two words in common 29Y One word in common 13Y "Normalized" mentions are string identical 39Y Capitalized word in common 17Y > 50% character tri-gram overlap 19N < 25% character tri-gram overlap -34Y In same sentence 9Y Within two sentences 8N Further than 3 sentences apart -1Y "Hobbs Distance" < 3 11N Number of entities in between two mentions = 0 12N Number of entities in between two mentions > 4 -3Y Font matches 1Y Default -19

OVERALL SCORE = 98 > threshold=0

Pair-wise Affinity Metric

Y/N?

Page 52: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

The Problem

. . . Mr Powell . . .

. . . Powell . . .

. . . she . . .

affinity = 98

affinity = 11

affinity = −104

Pair-wise mergingdecisions are beingmade independentlyfrom each otherY

Y

N

Affinity measures are noisy and imperfect.

They should be madein relational dependencewith each other.

Page 53: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

A Markov Random Field for Co-reference

. . . Mr Powell . . .

. . . Powell . . .

. . . she . . .

45

−30Y/N

Y/N

Y/N

[McCallum & Wellner, 2003, ICML](MRF)

Make pair-wise mergingdecisions in dependentrelation to each other by- calculating a joint prob.- including all edge weights- adding dependence on consistent triangles.

11

!

P(v y |

v x ) =

1

Z v x

exp "l f l (xi,x j ,yij ) + "' f '(yij ,y jk,yik )i, j,k

#l

#i, j

#$

% & &

'

( ) )

Page 54: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

A Markov Random Field for Co-reference

. . . Mr Powell . . .

. . . Powell . . .

. . . she . . .

45

−30Y/N

Y/N

Y/N

[McCallum & Wellner, 2003](MRF)

Make pair-wise mergingdecisions in dependentrelation to each other by- calculating a joint prob.- including all edge weights- adding dependence on consistent triangles.

11!"

!

P(v y |

v x ) =

1

Z v x

exp "l f l (xi,x j ,yij ) + "' f '(yij ,y jk,yik )i, j,k

#l

#i, j

#$

% & &

'

( ) )

Page 55: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

A Markov Random Field for Co-reference

. . . Mr Powell . . .

. . . Powell . . .

. . . she . . .

Y

N

N

[McCallum & Wellner, 2003]

!

P(v y |

v x ) =

1

Z v x

exp "l f l (xi,x j ,yij ) + "' f '(yij ,y jk,yik )i, j,k

#l

#i, j

#$

% & &

'

( ) )

(MRF)

−4

−(45)

−(−30)

+(11)

Page 56: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

A Markov Random Field for Co-reference

. . . Mr Powell . . .

. . . Powell . . .

. . . she . . .

Y

Y

N

[McCallum & Wellner, 2003]

!

P(v y |

v x ) =

1

Z v x

exp "l f l (xi,x j ,yij ) + "' f '(yij ,y jk,yik )i, j,k

#l

#i, j

#$

% & &

'

( ) )

(MRF)

−infinity

+(45)

−(−30)

+(11)

Page 57: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

A Markov Random Field for Co-reference

. . . Mr Powell . . .

. . . Powell . . .

. . . she . . .

N

Y

N

[McCallum & Wellner, 2003]

!

P(v y |

v x ) =

1

Z v x

exp "l f l (xi,x j ,yij ) + "' f '(yij ,y jk,yik )i, j,k

#l

#i, j

#$

% & &

'

( ) )

(MRF)

64

+(45)

−(−30)

−(11)

Page 58: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Inference in these MRFs = Graph Partitioning[Boykov, Vekler, Zabih, 1999], [Kolmogorov & Zabih, 2002], [Yu, Cross, Shi, 2002]

. . . Mr Powell . . .

. . . Powell . . .

. . . she . . .

45

11

−30

. . . Condoleezza Rice . . .

−134

10

!

log P(v y |

v x )( )" #l f l (xi,x j ,yij )

l

$i, j

$ = w ij

i, j w/inparitions

$ % w ij

i, j acrossparitions

$

−106

Page 59: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Inference in these MRFs = Graph Partitioning[Boykov, Vekler, Zabih, 1999], [Kolmogorov & Zabih, 2002], [Yu, Cross, Shi, 2002]

. . . Mr Powell . . .

. . . Powell . . .

. . . she . . . . . . Condoleezza Rice . . .

= −22

45

11

−30

−134

10

−106

!

log P(v y |

v x )( )" #l f l (xi,x j ,yij )

l

$i, j

$ = w ij

i, j w/inparitions

$ % w ij

i, j acrossparitions

$

Page 60: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Inference in these MRFs = Graph Partitioning[Boykov, Vekler, Zabih, 1999], [Kolmogorov & Zabih, 2002], [Yu, Cross, Shi, 2002]

. . . Mr Powell . . .

. . . Powell . . .

. . . she . . . . . . Condoleezza Rice . . .

!

log P(v y |

v x )( )" #l f l (xi, x j ,yij )

l

$i, j

$ = w ij

i, j w/inparitions

$ + w'iji, j acrossparitions

$ = 314

45

11

−30

−134

10

−106

Page 61: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Co-reference Experimental Results

Proper noun co-reference

DARPA ACE broadcast news transcripts, 117 stories

Partition F1 Pair F1Single-link threshold 16 % 18 %Best prev match [Morton] 83 % 89 %MRFs 88 % 92 %

Δerror=30% Δerror=28%

DARPA MUC-6 newswire article corpus, 30 stories

Partition F1 Pair F1Single-link threshold 11% 7 %Best prev match [Morton] 70 % 76 %MRFs 74 % 80 %

Δerror=13% Δerror=17%

[McCallum & Wellner, 2003]

Page 62: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Joint co-reference among all pairsAffinity Matrix CRF

. . . Mr Powell . . .

. . . Powell . . .

. . . she . . .

45

−99Y/N

Y/N

Y/N

11

[McCallum, Wellner, IJCAI WS 2003, NIPS 2004]

~25% reduction in error onco-reference ofproper nouns in newswire.

Inference:Correlational clusteringgraph partitioning

[Bansal, Blum, Chawla, 2002]

Page 63: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

pDatabase

field values

c

4. Joint segmentation and co-reference

o

s

o

s

c

c

s

o

Citation attributes

y y

y

Segmentation

[Wellner, McCallum, Peng, Hay, UAI 2004]Inference:Variant of Iterated Conditional Modes

Co-referencedecisions

Laurel, B. Interface Agents:Metaphors with Character, inThe Art of Human-Computer InterfaceDesign, B. Laurel (ed), Addison-Wesley, 1990.

Brenda Laurel. Interface Agents:Metaphors with Character, inLaurel, The Art of Human-ComputerInterface Design, 355-366, 1990.

[Besag, 1986]

WorldKnowledge

35% reduction in co-reference error by using segmentation uncertainty.

6-14% reduction in segmentation error by using co-reference.

Extraction from and matching of research paper citations.

see also [Marthi, Milch, Russell, 2003]

Page 64: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Joint IE and Coreference from Research Paper Citations

Textual citation mentions(noisy, with duplicates)

Paper database, with fields,clean, duplicates collapsed

AUTHORS TITLE VENUECowell, Dawid… Probab… SpringerMontemerlo, Thrun…FastSLAM… AAAI…Kjaerulff Approxi… Technic…

4. Joint segmentation and co-reference

Page 65: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Laurel, B. Interface Agents: Metaphors with Character , in

The Art of Human-Computer Interface Design , T. Smith (ed) ,

Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in

Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

Citation Segmentation and Coreference

Page 66: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Laurel, B. Interface Agents: Metaphors with Character , in

The Art of Human-Computer Interface Design , T. Smith (ed) ,

Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in

Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

1) Segment citation fields

Citation Segmentation and Coreference

Page 67: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Laurel, B. Interface Agents: Metaphors with Character , in

The Art of Human-Computer Interface Design , T. Smith (ed) ,

Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in

Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

1) Segment citation fields

2) Resolve coreferent citations

Citation Segmentation and Coreference

Y?N

Page 68: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Laurel, B. Interface Agents: Metaphors with Character , in

The Art of Human-Computer Interface Design , T. Smith (ed) ,

Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in

Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

Citation Segmentation and Coreference

Y?N

93%True Segmentation

91%CRF Segmentation

78%No Segmentation

Citation Co-reference (F1)Segmentation Quality

1) Segment citation fields

2) Resolve coreferent citations

Page 69: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Laurel, B. Interface Agents: Metaphors with Character , in

The Art of Human-Computer Interface Design , T. Smith (ed) ,

Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in

Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

1) Segment citation fields

2) Resolve coreferent citations

3) Form canonical database record

Citation Segmentation and Coreference

AUTHOR = Brenda LaurelTITLE = Interface Agents: Metaphors with CharacterPAGES = 355-366BOOKTITLE = The Art of Human-Computer Interface DesignEDITOR = T. SmithPUBLISHER = Addison-WesleyYEAR = 1990

Y?N

Resolving conflicts

Page 70: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Laurel, B. Interface Agents: Metaphors with Character , in

The Art of Human-Computer Interface Design , T. Smith (ed) ,

Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in

Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

1) Segment citation fields

2) Resolve coreferent citations

3) Form canonical database record

Citation Segmentation and Coreference

AUTHOR = Brenda LaurelTITLE = Interface Agents: Metaphors with CharacterPAGES = 355-366BOOKTITLE = The Art of Human-Computer Interface DesignEDITOR = T. SmithPUBLISHER = Addison-WesleyYEAR = 1990

Y?N

Perform jointly.

Page 71: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

x

s

Observed citation

CRF Segmentation

IE + Coreference Model

J Besag 1986 On the…

AUT AUT YR TITL TITL

Page 72: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

x

s

Observed citation

CRF Segmentation

IE + Coreference Model

Citation mention attributes

J Besag 1986 On the…

AUTHOR = “J Besag”YEAR = “1986”TITLE = “On the…”

c

Page 73: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

x

s

IE + Coreference Model

c

J Besag 1986 On the…Smyth . 2001 Data Mining…

Smyth , P Data mining…

Structure for eachcitation mention

Page 74: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

x

s

IE + Coreference Model

c

Binary coreference variablesfor each pair of mentions

J Besag 1986 On the…Smyth . 2001 Data Mining…

Smyth , P Data mining…

Page 75: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

x

s

IE + Coreference Model

c

y n

n

J Besag 1986 On the…Smyth . 2001 Data Mining…

Smyth , P Data mining…

Binary coreference variablesfor each pair of mentions

Page 76: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

y n

n

x

s

IE + Coreference Model

c

J Besag 1986 On the…Smyth . 2001 Data Mining…

Smyth , P Data mining…

Research paper entityattribute nodes

AUTHOR = “P Smyth”YEAR = “2001”TITLE = “Data Mining…”...

Page 77: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Such a highly connected graph makesexact inference intractable, so…

Page 78: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

• Loopy Belief Propagation

v6v5

v3v2v1

v4

m1(v2) m2(v3)

m3(v2)m2(v1) messages passedbetween nodes

Approximate Inference 1

Page 79: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

• Loopy Belief Propagation

• Generalized Belief Propagation

v6v5

v3v2v1

v4

m1(v2) m2(v3)

m3(v2)m2(v1)

v6v5

v3v2v1

v4

v9v8v7

messages passedbetween nodes

messagespassed betweenregions

Here, a message is a conditional probability table passed among nodes.But when message size grows exponentially with size of overlap between regions!

Approximate Inference 1

Page 80: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

• Iterated Conditional Modes (ICM)

[Besag 1986]

v6v5

v3v2v1

v4

v6i+1 = argmax P(v6

i | v \ v6

i)v6

i

= held constant

Approximate Inference 2

Page 81: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

• Iterated Conditional Modes (ICM)

[Besag 1986]

v6v5

v3v2v1

v4

v5j+1 = argmax P(v5

j | v \ v5

j)v5

j

= held constant

Approximate Inference 2

Page 82: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

• Iterated Conditional Modes (ICM)

[Besag 1986]

v6v5

v3v2v1

v4

v4k+1 = argmax P(v4

k | v \ v4

k)v4

k

= held constant

Approximate Inference 2

but greedy, and easily falls into local minima.Structured inference scales well here,

Page 83: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

• Iterated Conditional Modes (ICM)

[Besag 1986]

• Iterated Conditional Sampling (ICS) (our name) Instead of selecting only argmax, sample of argmaxes of P(v4

k | v \ v4

k)

e.g. an N-best list (the top N values)

v6v5

v3v2v1

v4

v4k+1 = argmax P(v4

k | v \ v4

k)v4

k

= held constant

v6v5

v3v2v1

v4

Approximate Inference 2

Can use “Generalized Version” of this;doing exact inference on a region ofseveral nodes at once.

Here, a “message” grows only linearlywith overlap region size and N!

Page 84: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

IE + Coreference Model

Exact inference onthese linear-chain regions

J Besag 1986 On the…Smyth . 2001 Data Mining…

Smyth , P Data mining…

From each chainpass an N-best List

into coreference

Page 85: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

IE + Coreference Model

J Besag 1986 On the…Smyth . 2001 Data Mining…

Smyth , P Data mining…

Approximate inferenceby graph partitioning…

…integrating outuncertaintyin samples

of extraction

Make scale to 1Mcitations with Canopies

[McCallum, Nigam, Ungar 2000]

Page 86: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

…Metaphors withCharacter

Laurel, B. Interf aceAgents

…Interface Agents:Metaphors withCharacter

Laurel, B.

…Interf ace Agents:Metaphors withCharacter The

Laurel, B

…TitleName

When calculatingsimilarity with anothercitation, have moreopportunity to findcorrect, matching fields.

1990The Art of HumanComputerInterface Design

Agents: Metaphorswith Character

Laurel, B. Interf ace

1990of HumanComputer Interf aceDesign

Interface Agents:Metaphors withCharacter The Art

Laurel, B.

1990The Art of HumanComputerInterface Design

Agents: Metaphorswith Character

Laurel, B. Interf ace

YearBook TitleTitleName

Inference:Sample = N-best List from CRF Segmentation

y ? n

Page 87: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

y n

n

IE + Coreference Model

J Besag 1986 On the…Smyth . 2001 Data Mining…

Smyth , P Data mining…

Exact (exhaustive) inferenceover entity attributes

Page 88: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

y n

n

IE + Coreference Model

J Besag 1986 On the…Smyth . 2001 Data Mining…

Smyth , P Data mining…

Revisit exact inferenceon IE linear chain,

now conditioned onentity attributes

Page 89: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

y n

n

Parameter Estimation

Coref graph edge weightsMAP on individual edges

Separately for different regions

IE Linear-chainExact MAP

Entity attribute potentialsMAP, pseudo-likelihood

In all cases:Climb MAP gradient with

quasi-Newton method

Page 90: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

pDatabase

field values

c

4. Joint segmentation and co-reference

o

s

o

s

c

c

s

o

Citation attributes

y y

y

Segmentation

[Wellner, McCallum, Peng, Hay, UAI 2004]

Inference:Variant of Iterated Conditional Modes

Co-referencedecisions

Laurel, B. Interface Agents:Metaphors with Character, inThe Art of Human-Computer InterfaceDesign, B. Laurel (ed), Addison-Wesley, 1990.

Brenda Laurel. Interface Agents:Metaphors with Character, inLaurel, The Art of Human-ComputerInterface Design, 355-366, 1990.

[Besag, 1986]

WorldKnowledge

35% reduction in co-reference error by using segmentation uncertainty.

6-14% reduction in segmentation error by using co-reference.

Extraction from and matching of research paper citations.

Page 91: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Outline

• Examples of IE and Data Mining.

• Conditional Random Fields and Feature Induction.

• Joint inference: Motivation and examples– Joint Labeling of Cascaded Sequences (Belief Propagation)

– Joint Labeling of Distant Entities (BP by Tree Reparameterization)

– Joint Co-reference Resolution (Graph Partitioning)

– Joint Segmentation and Co-ref (Iterated Conditional Samples.)

• Two example projects– Email, contact address book, and Social Network Analysis

– Research Paper search and analysis

Page 92: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Workplace effectiveness ~ Ability to leverage network of acquaintances

But filling Contacts DB by hand is tedious, and incomplete.

Email Inbox Contacts DB

WWW

Automatically

Managing and UnderstandingConnections of People in our Email World

Page 93: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

System Overview

ContactInfo andPersonName

Extraction

PersonName

Extraction

NameCoreference

HomepageRetrieval

SocialNetworkAnalysis

KeywordExtraction

CRFWWW

names

Email

Page 94: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

An ExampleTo: “Andrew McCallum” [email protected]

Subject ...

Information extraction, social network,…

KeyWords:

Fernando Pereira, SamRoweis,…

Links:

(413) 545-1323CompanyPhone:

01003Zip:

MAState:

AmherstCity:

140 Governor’s Dr.StreetAddress:

University of MassachusettsCompany:

Associate ProfessorJobTitle:

McCallumLastName:

KachitesMiddleName:

AndrewFirstName:

Search fornew people

Page 95: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Summary of Results

80.7676.3385.7394.50CRF

FieldF1

FieldRecall

FieldPrec

TokenAcc

Machine learningCognitive statesLearning apprenticeArtificial intelligence

Tom Mitchell

Semantic webDescription logicsKnowledgerepresentationOntologies

DeborahMcGuiness

Bayesian networksRelational modelsProbabilistic modelsHidden variables

Daphne Koller

Logic programmingText categorizationData integrationRule learning

William Cohen

KeywordsPerson

Contact info and name extraction performance (25 fields)

Example keywords extracted

1. Expert Finding: When solving some task, find friends-of-friends with relevant expertise. Avoid “stove-piping” in large org’s by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job. (Hiring aid!)

2. Social Network Analysis: Understand the social structure of your organization. Suggest structural changes for improved efficiency.

Page 96: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Clustering words into topics withLatent Dirichlet Allocation

[Blei, Ng, Jordan 2003]

Page 97: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

STORYSTORIES

TELLCHARACTER

CHARACTERSAUTHOR

READTOLD

SETTINGTALESPLOT

TELLINGSHORT

FICTIONACTION

TRUEEVENTSTELLSTALE

NOVEL

MINDWORLDDREAMDREAMS

THOUGHTIMAGINATION

MOMENTTHOUGHTS

OWNREALLIFE

IMAGINESENSE

CONSCIOUSNESSSTRANGEFEELINGWHOLEBEINGMIGHTHOPE

WATERFISHSEA

SWIMSWIMMING

POOLLIKE

SHELLSHARKTANK

SHELLSSHARKSDIVING

DOLPHINSSWAMLONGSEALDIVE

DOLPHINUNDERWATER

DISEASEBACTERIADISEASESGERMSFEVERCAUSE

CAUSEDSPREADVIRUSES

INFECTIONVIRUS

MICROORGANISMSPERSON

INFECTIOUSCOMMONCAUSING

SMALLPOXBODY

INFECTIONSCERTAIN

Example topicsinduced from a large collection of text

FIELDMAGNETICMAGNET

WIRENEEDLE

CURRENTCOIL

POLESIRON

COMPASSLINESCORE

ELECTRICDIRECTION

FORCEMAGNETS

BEMAGNETISM

POLEINDUCED

SCIENCESTUDY

SCIENTISTSSCIENTIFIC

KNOWLEDGEWORK

RESEARCHCHEMISTRY

TECHNOLOGYMANY

MATHEMATICSBIOLOGY

FIELDPHYSICS

LABORATORYSTUDIESWORLD

SCIENTISTSTUDYINGSCIENCES

BALLGAMETEAM

FOOTBALLBASEBALLPLAYERS

PLAYFIELD

PLAYERBASKETBALL

COACHPLAYED

PLAYINGHIT

TENNISTEAMSGAMESSPORTS

BATTERRY

JOBWORKJOBS

CAREEREXPERIENCE

EMPLOYMENTOPPORTUNITIES

WORKINGTRAINING

SKILLSCAREERS

POSITIONSFIND

POSITIONFIELD

OCCUPATIONSREQUIRE

OPPORTUNITYEARNABLE

[Tennenbaum et al]

Page 98: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

STORYSTORIES

TELLCHARACTER

CHARACTERSAUTHOR

READTOLD

SETTINGTALESPLOT

TELLINGSHORT

FICTIONACTION

TRUEEVENTSTELLSTALE

NOVEL

MINDWORLDDREAMDREAMS

THOUGHTIMAGINATION

MOMENTTHOUGHTS

OWNREALLIFE

IMAGINESENSE

CONSCIOUSNESSSTRANGEFEELINGWHOLEBEINGMIGHTHOPE

WATERFISHSEA

SWIMSWIMMING

POOLLIKE

SHELLSHARKTANK

SHELLSSHARKSDIVING

DOLPHINSSWAMLONGSEALDIVE

DOLPHINUNDERWATER

DISEASEBACTERIADISEASESGERMSFEVERCAUSE

CAUSEDSPREADVIRUSES

INFECTIONVIRUS

MICROORGANISMSPERSON

INFECTIOUSCOMMONCAUSING

SMALLPOXBODY

INFECTIONSCERTAIN

FIELDMAGNETICMAGNET

WIRENEEDLE

CURRENTCOIL

POLESIRON

COMPASSLINESCORE

ELECTRICDIRECTION

FORCEMAGNETS

BEMAGNETISM

POLEINDUCED

SCIENCESTUDY

SCIENTISTSSCIENTIFIC

KNOWLEDGEWORK

RESEARCHCHEMISTRY

TECHNOLOGYMANY

MATHEMATICSBIOLOGY

FIELDPHYSICS

LABORATORYSTUDIESWORLD

SCIENTISTSTUDYINGSCIENCES

BALLGAMETEAM

FOOTBALLBASEBALLPLAYERS

PLAYFIELD

PLAYERBASKETBALL

COACHPLAYED

PLAYINGHIT

TENNISTEAMSGAMESSPORTS

BATTERRY

JOBWORKJOBS

CAREEREXPERIENCE

EMPLOYMENTOPPORTUNITIES

WORKINGTRAINING

SKILLSCAREERS

POSITIONSFIND

POSITIONFIELD

OCCUPATIONSREQUIRE

OPPORTUNITYEARNABLE

Example topicsinduced from a large collection of text

[Tennenbaum et al]

Page 99: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

YALDAM

etnotheratentirichletllocationodel

Page 100: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

From LDA to Author-Recipient-Topic(ART)

Page 101: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Inference and Estimation

Gibbs Sampling:- Easy to implement- Reasonably fast

r

Page 102: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Enron Email Corpus

• 250k email messages• 23k people

Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)From: [email protected]: [email protected]: Enron/TransAltaContract dated Jan 1, 2001

Please see below. Katalin Kiss of TransAlta has requested anelectronic copy of our final draft? Are you OK with this? Ifso, the only version I have is the original draft withoutrevisions.

DP

Debra PerlingiereEnron North America Corp.Legal Department1400 Smith Street, EB 3885Houston, Texas [email protected]

Page 103: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Topics, and prominent sender/receiversdiscovered by ART

Page 104: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Topics, and prominent sender/receiversdiscovered by ART

Beck = “Chief Operations Officer”Dasovich = “Government Relations Executive”Shapiro = “Vice Presidence of Regulatory Affairs”Steffes = “Vice President of Government Affairs”

Page 105: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Comparing Role Discovery

connection strength (A,B) =

distribution overauthored topics

Traditional SNA

distribution overrecipients

distribution overauthored topics

Author-TopicART

Page 106: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Comparing Role Discovery Tracy Geaconne ⇔ Dan McCarty

Traditional SNA Author-TopicART

Similar roles Different rolesDifferent roles

Geaconne = “Secretary”McCarty = “Vice President”

Page 107: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Traditional SNA Author-TopicART

Different roles Very similarNot very similar

Geaconne = “Secretary”Hayslett = “Vice President & CTO”

Comparing Role Discovery Tracy Geaconne ⇔ Rod Hayslett

Page 108: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Traditional SNA Author-TopicART

Different roles Very differentVery similar

Blair = “Gas pipeline logistics”Watson = “Pipeline facilities planning”

Comparing Role Discovery Lynn Blair ⇔ Kimberly Watson

Page 109: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Traditional SNA Author-TopicART

Block structured NotNot

Comparing Group Discovery Enron TransWestern Division

Page 110: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

McCallum Email Corpus 2004

• January - October 2004• 23k email messages• 825 people

From: [email protected]: NIPS and ....Date: June 14, 2004 2:27:41 PM EDTTo: [email protected]

There is pertinent stuff on the first yellow folder that iscompleted either travel or other things, so please sign thatfirst folder anyway. Then, here is the reminder of the thingsI'm still waiting for:

NIPS registration receipt.CALO registration receipt.

Thanks,Kate

Page 111: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

McCallum Email Blockstructure

Page 112: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Four most prominent topicsin discussions with ____?

Page 113: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification
Page 114: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Two most prominent topicsin discussions with ____?

Words Prob

love 0.030514

house 0.015402

0.013659

time 0.012351

great 0.011334

hope 0.011043

dinner 0.00959

saturday 0.009154

left 0.009154

ll 0.009009

0.008282

visit 0.008137

evening 0.008137

stay 0.007847

bring 0.007701

weekend 0.007411

road 0.00712

sunday 0.006829

kids 0.006539

flight 0.006539

Words Prob

today 0.051152

tomorrow 0.045393

time 0.041289

ll 0.039145

meeting 0.033877

week 0.025484

talk 0.024626

meet 0.023279

morning 0.022789

monday 0.020767

back 0.019358

call 0.016418

free 0.015621

home 0.013967

won 0.013783

day 0.01311

hope 0.012987

leave 0.012987

office 0.012742

tuesday 0.012558

Page 115: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Topic 37Words Prob Sender Recipient Prob

jean 0.032349 jean mccallum 0.111932

james 0.02684 mccallum jean 0.072785

lab 0.017229 jean jean 0.056024

space 0.016292 mccallum allan 0.054266

ciir 0.015471 jean allan 0.035162

students 0.015354 allan mccallum 0.027778

bruce 0.014885 mccallum croft 0.025668

office 0.014768 mccallum grupen 0.021917

staff 0.013244 mccallum stowell 0.01887

funding 0.012072 mccallum [email protected]

mike 0.011017 grupen mccallum 0.018167

adam 0.010666 gwking mccallum 0.016057

move 0.010666 [email protected] 0.014299

grad 0.01008 mccallum saunders 0.014299

ll 0.009962 mccallum jensen 0.007267

rod 0.009494 mccallum culotta 0.006915

time 0.008556 gauthier mccallum 0.006564

asked 0.008556 grupen grupen 0.006446

student 0.008556 corrada mccallum 0.006212

today 0.008322 mccallum pal 0.005626

Page 116: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Topic 40

Words Prob Sender Recipient Prob

code 0.060565 hough mccallum 0.067076

mallet 0.042015 mccallum hough 0.041032

files 0.029115 mikem mccallum 0.028501

al 0.024201 culotta mccallum 0.026044

file 0.023587 saunders mccallum 0.021376

version 0.022113 mccallum saunders 0.019656

java 0.021499 pereira mccallum 0.017813

test 0.020025 casutton mccallum 0.017199

problem 0.018305 mccallum ronb 0.013514

run 0.015356 mccallum pereira 0.013145

cvs 0.013391 hough melinda.gervasio 0.013022

add 0.012776 mccallum casutton 0.013022

directory 0.012408 fuchun mccallum 0.010811

release 0.012285 mccallum culotta 0.009828

output 0.011916 ronb mccallum 0.009705

bug 0.011179 westy hough 0.009214

source 0.010197 xuerui corrada 0.0086

ps 0.009705 ghuang mccallum 0.008354

log 0.008968 khash mccallum 0.008231

created 0.0086 melinda.gervasiomccallum 0.008108

Page 117: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification
Page 118: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Pairs with highestrank difference between ART & SNA

5 other professors3 other ML researchers

Page 119: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Role-Author-Recipient-Topic Models

Page 120: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Summary

• Traditional SNA examines links,but not the language content on those links.

• Presented ART, an LDA-like model formessages sent in a social network.

• Future work:– Explicitly model & discover roles and groups– Integrate with coreference and relation extraction– Leverage others’ work on correlations and

topic shifts over time

Page 121: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Main Application Project:

Page 122: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Main Application Project:

ResearchPaper

Cites

Page 123: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Main Application Project:

ResearchPaper

Cites

Person

UniversityVenue

Grant

Groups

Expertise

Page 124: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification
Page 125: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

Summary

• Conditional Random Fields combine the benefits of– Conditional probability models (arbitrary features)– Markov models (for sequences or other relations)

• Success in– Factorial finite state models– Jointly labeling distant entities– Coreference analysis– Segmentation uncertainty aiding coreference

• Current projects– Email, contact management, expert-finding, SNA– Mining the scientific literature

Page 126: Information Extraction, Conditional Random Fields, and ...mccallum/talks/Alphatech2005.pdf · What is “Information Extraction” Information Extraction = segmentation + classification

End of Talk