Retrieving Correct Semantic Boundaries in Dependency Structure
Jinho D. Choi (University of Colorado at Boulder)
Martha Palmer (University of Colorado at Boulder)
The 4th Linguistic Annotation Workshop at ACL’10, July 15th, 2010


DESCRIPTION

This paper describes the retrieval of correct semantic boundaries for predicate-argument structures annotated in dependency structure. Unlike phrase structure, in which arguments are annotated at the phrase level, dependency structure does not have phrases, so the argument labels are associated with head words instead: the subtree of each head word is assumed to include the same set of words as the annotated phrase does in phrase structure. However, at least in English, retrieving such subtrees does not always guarantee retrieval of the correct phrase boundaries. In this paper, we present heuristics that retrieve the correct phrase boundaries for semantic arguments, called semantic boundaries, from dependency trees. By applying these heuristics, we achieved an F1-score of 99.54% for correct representation of semantic boundaries. Furthermore, error analysis showed that some of the errors could also be considered correct, depending on the interpretation of the annotation.

TRANSCRIPT

Page 1: Retrieving Correct Semantic Boundaries in Dependency Structure

Retrieving Correct Semantic Boundaries in Dependency Structure

Jinho D. Choi (University of Colorado at Boulder)
Martha Palmer (University of Colorado at Boulder)

The 4th Linguistic Annotation Workshop at ACL’10, July 15th, 2010

Page 2

Dependency Structure for SRL
• What is dependency?

- Syntactic or semantic relation between a pair of words.

• Why dependency structure for semantic role labeling?

- Dependency relations often correlate with semantic roles.

- Simpler structure

[Figure: dependency arcs over “places in this city” (LOC, NMOD, PMOD) and “events … year” (TMP)]

Simpler structure → faster annotation → more gold-standard data
Simpler structure → faster parsing → more applications

Parsing time, Dep (Choi) vs. Phrase (Charniak): 0.0025 vs. 0.5 (sec)

Page 3

Phrase vs. Dependency Structure
• Constituent vs. Dependency

[Figure: constituent tree vs. dependency tree for “The results appear in today ’s news” (SBJ, LOC, NMOD, PMOD arcs)]

10/15 (66.67%) parsing papers at ACL’10 are on Dependency Parsing


Page 4

PropBank in Phrase Structure
• A corpus annotated with verbal propositions and arguments.

• Arguments are annotated on phrases.

But there is no phrase in dependency structure.

[Figure: phrase-structure tree with ARG0 and ARGM-LOC labels on phrases]

Page 5

PropBank in Dependency Structure
• Arguments are annotated on head words instead.

[Figure: dependency tree for “The results appear in today ’s news” (ROOT, SBJ, LOC, NMOD, PMOD arcs), with ARG0 and ARGM-LOC on head words]

Phrase = subtree of the head word

Page 6

PropBank in Dependency Structure
• Phrase ≠ subtree of the head word.

[Figure: dependency tree for “The plant owned by Mark” (NMOD, NMOD, LGS, PMOD arcs), with ARG1 on plant]

The subtree of the head word includes the predicate.

Page 7

Tasks
• Tasks

- Convert phrase structure (PS) to dependency structure (DS).

- Find correct head words in DS.

- Retrieve correct semantic boundaries from DS.

• Conversion

- Pennconverter, by Richard Johansson

• Used for CoNLL 2007 - 2009.

- Penn Treebank (Wall Street Journal)

• 49,208 trees were converted.

• 292,073 Propbank arguments exist.


Page 8

System Overview

[Diagram: Penn Treebank + PropBank → Pennconverter → dependency trees; head-word heuristics map PropBank arguments to sets of head words; boundary heuristics map head words to sets of chunks (phrases); the output feeds an automatic SRL system]

Page 9

Finding Correct Head Words
• Get the word-set Sp of each argument in PS.
• For each word in Sp, find the word wmax with the maximum subtree in DS.
• Add wmax to the head-list Sd.
• Remove the subtree of wmax from Sp.
• Repeat the search until Sp becomes empty.

[Figure: dependency tree for “Yields on mutual funds continued to slide” (ROOT, SBJ, NMOD, PMOD, OPRD, IM arcs)]

Sp = {Yields, on, mutual, funds, to, slide} → { }
Sd = [Yields, to]
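
The greedy search above can be sketched as a short Python routine (a minimal sketch, not the authors' implementation; the word → children mapping for the example tree is an assumption based on the arcs shown):

```python
# Minimal sketch of the greedy head-word search (not the authors' code).
# A dependency tree is represented as a word -> list-of-children map;
# words are assumed unique here for simplicity.

def subtree(word, children):
    """Return the set of words in word's dependency subtree (inclusive)."""
    words = {word}
    for child in children.get(word, []):
        words |= subtree(child, children)
    return words

def find_head_words(s_p, children):
    """Greedily pick head words whose subtrees cover the argument set s_p."""
    s_p = set(s_p)
    s_d = []
    while s_p:
        # the word whose subtree covers the most of the remaining words
        w_max = max(s_p, key=lambda w: len(subtree(w, children) & s_p))
        s_d.append(w_max)
        s_p -= subtree(w_max, children)   # remove the covered words
    return s_d

# Assumed arcs for "Yields on mutual funds continued to slide":
# continued -> Yields -> on -> funds -> mutual, continued -> to -> slide
children = {
    "continued": ["Yields", "to"],
    "Yields": ["on"],
    "on": ["funds"],
    "funds": ["mutual"],
    "to": ["slide"],
}
print(find_head_words({"Yields", "on", "mutual", "funds", "to", "slide"}, children))
# ['Yields', 'to']
```

The loop picks Yields first (its subtree covers four of the six argument words), then to, matching the Sd = [Yields, to] result on the slide.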

Page 10

Retrieving correct semantic boundaries

• Retrieving the subtrees of head-words

- 100% recall, 92.51% precision, 96.11% F1-score.

- What does this mean?

• The state-of-the-art SRL system using DS performs at about 86%.

• If your application requires actual argument phrases instead of head-words, the performance becomes lower than 86%.

• Improve the precision by applying heuristics on:

- Modals, negations

- Verb chain, relative clauses

- Gerunds, past-participles
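
As a concrete illustration of why precision suffers before the heuristics, the token-level scores for one argument can be computed like this (a small sketch with hypothetical names, using the earlier “The plant owned by Mark” example, where the retrieved subtree wrongly includes the predicate):

```python
# Token-level boundary scoring for a single argument (hypothetical names;
# the paper's exact scorer may differ).

def boundary_scores(retrieved, gold):
    """Precision, recall, and F1 between retrieved and gold word sets."""
    tp = len(retrieved & gold)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# The subtree of the head word "plant" wrongly includes the predicate
# "owned" and its agent, so precision drops while recall stays perfect:
retrieved = {"The", "plant", "owned", "by", "Mark"}
gold = {"The", "plant"}                  # correct ARG1 boundary
p, r, f1 = boundary_scores(retrieved, gold)
print(round(p, 2), round(r, 2), round(f1, 2))
# 0.4 1.0 0.57
```

Cases like this are exactly what the heuristics below repair: recall is already perfect (subtrees never miss argument words), so only precision needs to improve.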


Page 11

Verb Predicates whose Semantic Arguments are their Syntactic Heads

• Semantic arguments of verb predicates can be the syntactic heads of the verbs.

• General solution

- For each head word, retrieve the subtree of the head word excluding the subtree of the verb predicate.

[Figure: dependency tree for “The plant owned by Mark” (NMOD, NMOD, LGS, PMOD arcs); the ARG1 head word plant is the syntactic head of the predicate owned]
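
The general solution above can be sketched by cutting the predicate’s branch out of the head word’s subtree (a minimal sketch, not the authors' code, assuming a word → children mapping for the tree shown):

```python
# Retrieve the head word's subtree while excluding the predicate's subtree.

def subtree_excluding(head, predicate, children):
    """Words in head's subtree, skipping the branch rooted at predicate."""
    if head == predicate:
        return set()
    words = {head}
    for child in children.get(head, []):
        words |= subtree_excluding(child, predicate, children)
    return words

# Assumed arcs for "The plant owned by Mark":
# plant -> The, plant -> owned -> by -> Mark
children = {"plant": ["The", "owned"], "owned": ["by"], "by": ["Mark"]}
print(sorted(subtree_excluding("plant", "owned", children)))
# ['The', 'plant']
```

Excluding the branch rooted at owned leaves exactly the ARG1 phrase “The plant”.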

Page 12

Examples
• Modals are the heads of the main verbs in DS.

• Conjunctions

• Past-participles

[Figure: three dependency trees:
“He may or may not read the book” (the modal may heads read; ROOT, SBJ, COORD, CONJ, ADV, NMOD, OBJ arcs),
“people who meet or exceed the expectation” (relative clause; NMOD, DEP, COORD, CONJ, OBJ arcs),
“correspondence mailed about incomplete 8300s” (past participle; NMOD, PMOD arcs)]

Page 13

Evaluations
• Models

- Model I : retrieving all words in the subtrees (baseline).

- Model II : using all heuristics.

- Model III : II + excluding punctuation.

• Measurements

- Accuracy : exact match

- Precision

- Recall

- F1-score


Page 14

Evaluations
• Results

- Baseline: 88.00% accuracy, 92.51% precision, 100% recall, 96.11% F1
- Final model: 98.20% accuracy, 99.14% precision, 99.95% recall, 99.54% F1

• Statistically significant (t = 149, p < .0001)

[Chart: accuracy, precision, recall, and F1 (88–100%) for Models I, II, and III]

Page 15

Error Analysis
• Overlapping arguments

[Figure: two dependency analyses of “share … burdens … the region” (OBJ, LOC, NMOD, PMOD arcs): one annotating ARG1 and ARGM-LOC as separate arguments, the other annotating a single ARG1 spanning the whole phrase]

Page 16

Error Analysis
• PP attachment

[Figure: two analyses of “the investors showed enthusiasm for stocks” (SBJ, ADV, NMOD, PMOD arcs): the PP “for stocks” attached either to “enthusiasm” or to “showed”, yielding different ARG1 boundaries]

Page 17

Conclusion

- Find correct head words (min-set with max-coverage).

- Find correct semantic boundaries (99.54% F1-score).

- Suggest ways of reconstructing dependency structure so that it can fit better with semantic roles.

- Can be used to fix some of the inconsistencies in both Treebank and Propbank annotations.

• Future work

- Apply to different corpora.

- Find ways of automatically adding empty categories.


Page 18

Acknowledgements
• Special thanks are due to Professor Joakim Nivre of Uppsala University (Sweden) for his helpful insights.
• National Science Foundation: CISE-CRI-0551615, Towards a Comprehensive Linguistic Annotation, and CISE-CRI-0709167, Collaborative: A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu.
• Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc.
