The SALSA experience: semantic role annotation Katrin Erk University of Texas at Austin


Page 1: The SALSA experience: semantic role annotation Katrin Erk University of Texas at Austin

The SALSA experience: semantic role annotation

Katrin Erk

University of Texas at Austin

Page 2: Semantic role annotation in SALSA

SALSA: the Saarbrücken Lexical Semantics Annotation and Analysis project
  Manual annotation of the German TIGER corpus with lexical semantic information
  Basis: the Berkeley FrameNet database
  Verbs annotated with their frame (~ sense), plus semantic roles

TIGER corpus: 1.5 million words / 80 K sentences of German newspaper text (Frankfurter Rundschau)
  Stuttgart/Potsdam/Saarbrücken
  Phrase types and grammatical functions

Page 3: Annotation Scheme

(They didn't want to pay the move back because the employee had quit.)

Semantics:
  Independent frames
  Trees of depth one
  One edge points to the target, the others to frame elements
  Semantic roles point to syntactic constituents

TIGER syntax:
  Node labels: constituents
  Edge labels: grammatical functions
  Crossing edges
  POS
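The "tree of depth one" idea on this slide can be sketched as a small data structure. This is only an illustration, not the SALSA tooling: the class name `Frame`, the node IDs `s1_4` and `s1_500`, and the choice of the FrameNet-style labels `Quitting` and `Employee` for the "had quit" clause are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Frame:
    """One frame instance: a tree of depth one over syntactic node IDs."""
    name: str                     # frame label (roughly, the word sense)
    target: List[str]             # IDs of the node(s) that evoke the frame
    roles: Dict[str, List[str]] = field(default_factory=dict)  # role -> node IDs

    def edges(self) -> List[Tuple[str, str]]:
        """Return (label, node_id) pairs: target edges first, then role edges."""
        out = [("target", node_id) for node_id in self.target]
        for role, node_ids in self.roles.items():
            out.extend((role, node_id) for node_id in node_ids)
        return out


# Hypothetical IDs: s1_4 stands for the verb, s1_500 for the subject NP.
quitting = Frame(name="Quitting", target=["s1_4"], roles={"Employee": ["s1_500"]})
print(quitting.edges())
```

Because every edge goes straight from the frame to a syntactic node, two frames in the same sentence never depend on each other, which matches the "independent frames" point above.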

Page 4: Experiences with the semantic role annotation in SALSA

Frame (~ sense) assignment is more difficult than role assignment

Multiple tags possible, at frame level and at role level

Limited compositionality phenomena, each with a separate annotation format in SALSA:
  Light verbs, metaphor, idioms
  Distinction often difficult: metaphor vs. idiom, bleaching
  If I did this again: one format, multiple tags possible

Annotation beyond the sentence boundary
  Message role in Communication frames

Annotation below the word boundary: German noun compounds
  Mietrechtsdiskussion: discussion of tenant law

Page 5: Encoding semantic role annotation: TIGER XML as a great basis

TIGER XML:
  Each constituent is an XML element with a globally unique ID
  Syntactic edges are explicitly encoded: an <edge> element links two nodes, referring to their IDs
  Models discontinuous constituents

Salsa/Tiger XML:
  Semantic annotation by adding a modular <sem> block to the XML structure of a sentence
  Semantics points to syntactic constituents using their IDs
  Annotation beyond the sentence boundary is possible: globally unique syntactic IDs
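The ID-based linking described on this slide can be illustrated with a short parsing sketch. The fragment below is hand-written for the example and heavily simplified: the sentence, the IDs, and the frame and role labels are invented, and the element and attribute names (`t`, `nt`, `edge`, `idref`, `sem`, `frame`, `target`, `fe`, `fenode`) are assumptions modeled on the format, so check them against the actual corpus files before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment (assumed layout, not copied from the corpus).
SENTENCE_XML = """\
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Der" pos="ART"/>
      <t id="s1_2" word="Angestellte" pos="NN"/>
      <t id="s1_3" word="kündigte" pos="VVFIN"/>
    </terminals>
    <nonterminals>
      <nt id="s1_501" cat="NP">
        <edge label="NK" idref="s1_1"/>
        <edge label="NK" idref="s1_2"/>
      </nt>
      <nt id="s1_500" cat="S">
        <edge label="SB" idref="s1_501"/>
        <edge label="HD" idref="s1_3"/>
      </nt>
    </nonterminals>
  </graph>
  <sem>
    <frames>
      <frame id="s1_f1" name="Quitting">
        <target><fenode idref="s1_3"/></target>
        <fe id="s1_f1_e1" name="Employee"><fenode idref="s1_501"/></fe>
      </frame>
    </frames>
  </sem>
</s>
"""

sentence = ET.fromstring(SENTENCE_XML)
# Index every syntactic node by its ID, and remember the surface order of words.
nodes = {n.get("id"): n for n in sentence.iter() if n.tag in ("t", "nt")}
order = {t.get("id"): i for i, t in enumerate(sentence.iter("t"))}


def terminal_ids(node_id):
    """Collect the IDs of all terminals dominated by a syntactic node."""
    node = nodes[node_id]
    if node.tag == "t":
        return [node_id]
    ids = []
    for edge in node.findall("edge"):
        ids.extend(terminal_ids(edge.get("idref")))
    return ids


def text_of(element):
    """Resolve the <fenode> pointers of a <target> or <fe> to surface words."""
    ids = []
    for fenode in element.findall("fenode"):
        ids.extend(terminal_ids(fenode.get("idref")))
    # Sort by surface position so discontinuous constituents read in order.
    return " ".join(nodes[i].get("word") for i in sorted(ids, key=order.get))


annotations = []
for frame in sentence.iter("frame"):
    roles = {fe.get("name"): text_of(fe) for fe in frame.findall("fe")}
    annotations.append((frame.get("name"), text_of(frame.find("target")), roles))

print(annotations)
```

The semantic layer never copies any words. It only stores IDs, which is why the `<sem>` block can be added or removed without touching the syntax, and why a role could in principle point at a node in another sentence.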

Page 6: Extracting a lexicon: need for a deeper, richer syntax

Extracting syntax/semantics mapping: needs to identify grammatical functions filled by semantic roles

Problems:
  Constituent structure rather than dependencies: subjects hard to retrieve
  TIGER does not mark voice
  Shallow format for PPs: determining heads is hard
  Coordination is a pain
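As a rough illustration of the mapping step, the sketch below looks up the label of the edge that points at each role filler and treats it as the filler's grammatical function. The node IDs, the `Employee` role, and the helper names are hypothetical. The edge labels follow TIGER usage (SB subject, HD head, NK noun kernel).

```python
from typing import Dict, List, Optional, Tuple

# parent node -> (edge label, child node), mirroring TIGER <edge> elements.
# Invented IDs for a sentence like "Der Angestellte kündigte".
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "s1_500": [("SB", "s1_501"), ("HD", "s1_3")],
    "s1_501": [("NK", "s1_1"), ("NK", "s1_2")],
}

# role name -> ID of the constituent filling it (hypothetical annotation).
ROLE_FILLERS: Dict[str, str] = {"Employee": "s1_501"}


def incoming_labels(edges: Dict[str, List[Tuple[str, str]]]) -> Dict[str, str]:
    """Map each child node to the label of the edge from its parent."""
    labels = {}
    for children in edges.values():
        for label, child in children:
            labels[child] = label
    return labels


def role_functions(
    role_fillers: Dict[str, str], edges: Dict[str, List[Tuple[str, str]]]
) -> Dict[str, Optional[str]]:
    """Pair each semantic role with the surface function of its filler."""
    labels = incoming_labels(edges)
    return {role: labels.get(node_id) for role, node_id in role_fillers.items()}


mapping = role_functions(ROLE_FILLERS, EDGES)
print(mapping)
```

This lookup returns surface functions only. Since TIGER does not mark voice, an SB label on a passive clause would be recorded the same way as on an active one, and fillers inside PPs or coordinations would need extra head-finding logic that this sketch does not attempt.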