Posted on 22-Dec-2015
TimeBank Status
Status of TimeML annotation for the ULA project
James Pustejovsky and Marc Verhagen
Brandeis University
TimeML
• Annotation language for times, events and the links between them
• A TimeML annotation is a graph with events and times as the nodes and temporal links as the edges
• <EVENT>
  – Any eventuality in a document
  – Some states are annotated, some aren’t
• <TIMEX3>
  – Time expressions
TimeML Links
• <SLINK>
  – Subordinating relations
  – Perception, intentional states and actions, reporting, modal contexts
    • He saw an explosion
    • The meeting was canceled
    • I expect no improvements
• <TLINK>
  – Temporal relations, using a set of relation types based on the interval algebra of James Allen
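TLINK relation types are based on Allen's interval algebra. The sketch below assumes each eventuality can be reduced to an interval with known numeric endpoints, which annotation rarely provides. It returns which of Allen's thirteen base relations holds from one interval to another. The function name `allen_relation` and the lowercase Allen-style labels are illustrative and are not TimeML's relType vocabulary.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end), assuming start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen base relation that holds from interval a to interval b."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    # From here on the intervals share more than a single point.
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining cases: proper partial overlap in one direction or the other.
    return "overlaps" if a1 < b1 else "overlapped-by"


print(allen_relation((10, 11), (0, 24)))  # during
```

Take an utterance that falls inside the day it is reported on. It comes out as `during`, which is roughly what the IS_INCLUDED TLINK in the example below records.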
TimeML Example
The Soviet Union <EVENT eid="e12" class="REPORTING">said</EVENT> <TIMEX3 tid="t1" type="DATE" value="PRESENT_REF" temporalFunction="true">today</TIMEX3> it had <EVENT eid="e13" class="OCCURRENCE">sent</EVENT> an envoy to the Middle East.
<MAKEINSTANCE eventID="e12" eiid="ei12" tense="PAST" aspect="NONE" pos="VERB"/>
<MAKEINSTANCE eventID="e13" eiid="ei13" tense="PAST" aspect="PERFECTIVE" pos="VERB"/>
<SLINK lid="l2" relType="EVIDENTIAL" eventInstanceID="ei12" subordinatedEventInstance="ei13"/>
<TLINK lid="l3" relType="IS_INCLUDED" eventInstanceID="ei12" relatedToTime="t1"/>
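A TimeML annotation is a graph, so this example can be loaded with Python's standard `xml.etree.ElementTree` and turned into nodes and edges. This is a minimal sketch with a few assumptions:

- it wraps the fragment in an assumed `<TimeML>` root element so that it parses;
- it writes the attributes with straight quotes and lowercase `type`/`value`;
- it resolves event instance IDs (`ei12`) back to event IDs (`e12`) through MAKEINSTANCE;
- it reads only the link attributes this example uses, plus their obvious counterparts.

`timeml_graph` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The slide's example wrapped in an (assumed) root element so it parses as one document.
DOC = """<TimeML>The Soviet Union <EVENT eid="e12" class="REPORTING">said</EVENT>
<TIMEX3 tid="t1" type="DATE" value="PRESENT_REF" temporalFunction="true">today</TIMEX3>
it had <EVENT eid="e13" class="OCCURRENCE">sent</EVENT> an envoy to the Middle East.
<MAKEINSTANCE eventID="e12" eiid="ei12" tense="PAST" aspect="NONE" pos="VERB"/>
<MAKEINSTANCE eventID="e13" eiid="ei13" tense="PAST" aspect="PERFECTIVE" pos="VERB"/>
<SLINK lid="l2" relType="EVIDENTIAL" eventInstanceID="ei12" subordinatedEventInstance="ei13"/>
<TLINK lid="l3" relType="IS_INCLUDED" eventInstanceID="ei12" relatedToTime="t1"/>
</TimeML>"""


def timeml_graph(xml_text):
    """Return (nodes, edges): nodes maps event/time ids to their text,
    edges are (source, relType, target) triples from TLINK and SLINK tags."""
    root = ET.fromstring(xml_text)
    nodes = {}
    for tag, key in (("EVENT", "eid"), ("TIMEX3", "tid")):
        for el in root.iter(tag):
            nodes[el.get(key)] = (el.text or "").strip()

    # Links refer to event instances; map each eiid back to its event id.
    instance_of = {mi.get("eiid"): mi.get("eventID") for mi in root.iter("MAKEINSTANCE")}

    def resolve(ref):
        return instance_of.get(ref, ref)

    edges = []
    for link in root.iter("TLINK"):
        source = link.get("eventInstanceID") or link.get("timeID")
        target = link.get("relatedToEventInstance") or link.get("relatedToTime")
        edges.append((resolve(source), link.get("relType"), resolve(target)))
    for link in root.iter("SLINK"):
        edges.append((resolve(link.get("eventInstanceID")), link.get("relType"),
                      resolve(link.get("subordinatedEventInstance"))))
    return nodes, edges


nodes, edges = timeml_graph(DOC)
print(nodes)  # {'e12': 'said', 'e13': 'sent', 't1': 'today'}
print(edges)  # [('e12', 'IS_INCLUDED', 't1'), ('e12', 'EVIDENTIAL', 'e13')]
```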
Timebank
• Annotation in conjunction with emerging specifications, guidelines and annotation tools
  – Annotated corpus as proof of concept for the TimeML language
  – Dynamic specifications
  – Experimental tools (tango, timebank browser)
• Inline XML
• Available for free through the LDC
• http://timeml.org
Timebank Issues (1)
• Small corpus (60K tokens)
  – Too small to be useful for machine learning
  – Small overlap with PropBank and NomBank
• Slow annotation process
  – No automatic pre-processing
  – No use of previous structure
• Informal quality control
  – No dual annotation, no 90% rule
  – Guidelines incomplete and not enforced
Timebank Issues (2)
• Lowish inter-annotator agreement
  – Annotators do not create the same TLINKs
  – When they do, they agree only 77% of the time
• Anecdotal evidence of inconsistencies in annotation
  – 32 documents of TimeBank version 1.1 were inconsistent
• Annotators do not create the same TLINKs (regardless of relation type)
• Inline XML inhibits interoperability
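The two agreement problems can be measured separately:

- how many links both annotators created;
- how often they chose the same relation type on those shared links.

The slides do not say how the 77% was computed, so this is only one plausible method. It assumes Jaccard overlap for the link sets. It also puts links in a canonical (source, target) order, so that `e1 BEFORE e2` and `e2 AFTER e1` count as the same link. `tlink_agreement`, the inverse table and the sample annotations are illustrative.

```python
# Inverse of each TimeML relType, used to put links in a canonical direction.
INVERSE = {
    "BEFORE": "AFTER", "AFTER": "BEFORE", "IBEFORE": "IAFTER", "IAFTER": "IBEFORE",
    "INCLUDES": "IS_INCLUDED", "IS_INCLUDED": "INCLUDES",
    "DURING": "DURING_INV", "DURING_INV": "DURING",
    "BEGINS": "BEGUN_BY", "BEGUN_BY": "BEGINS", "ENDS": "ENDED_BY", "ENDED_BY": "ENDS",
    "SIMULTANEOUS": "SIMULTANEOUS", "IDENTITY": "IDENTITY",
}


def canonical(links):
    """Rewrite {(source, target): relType} so that source < target (string order)."""
    out = {}
    for (src, tgt), rel in links.items():
        if src > tgt:
            src, tgt, rel = tgt, src, INVERSE[rel]
        out[(src, tgt)] = rel
    return out


def tlink_agreement(ann_a, ann_b):
    """Return (link overlap, relation agreement on the links both annotators made)."""
    a, b = canonical(ann_a), canonical(ann_b)
    shared = a.keys() & b.keys()
    union = a.keys() | b.keys()
    link_overlap = len(shared) / len(union) if union else 1.0
    same = sum(1 for pair in shared if a[pair] == b[pair])
    # With no shared links, relation agreement is undefined; report 0.0 here.
    relation_agreement = same / len(shared) if shared else 0.0
    return link_overlap, relation_agreement


# Hypothetical annotations of one document by two annotators.
ann_a = {("e1", "e2"): "BEFORE", ("e2", "t1"): "IS_INCLUDED", ("e3", "t1"): "INCLUDES"}
ann_b = {("e2", "e1"): "AFTER", ("e2", "t1"): "INCLUDES", ("e4", "t1"): "BEFORE"}
print(tlink_agreement(ann_a, ann_b))  # (0.5, 0.5)
```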
TempEval
• SemEval 2007 workshop
• Three subtasks
  – Task A: Event-Time in the same sentence
  – Task B: Event-DCT (document creation time)
  – Task C: Main events in consecutive sentences
• TimeML Light for TempEval corpus
  – Limited set of relations, defined as disjunctions over TimeML relations
  – before, after, overlap, before-or-overlap, overlap-or-after, vague
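Every TempEval label is a disjunction over TimeML relations, so each label can be represented as a set of relTypes. A (possibly disjunctive) TimeML relation can then be coarsened to the most specific TempEval label that covers it. The grouping below is an assumption for illustration, in particular which TimeML types count as "overlap". The actual TempEval definitions may draw the lines differently.

```python
# Hypothetical grouping of TimeML relTypes (an assumption, not the TempEval spec).
BEFORE = {"BEFORE", "IBEFORE"}
AFTER = {"AFTER", "IAFTER"}
OVERLAP = {"SIMULTANEOUS", "IDENTITY", "INCLUDES", "IS_INCLUDED", "DURING",
           "DURING_INV", "BEGINS", "BEGUN_BY", "ENDS", "ENDED_BY"}

# Each TempEval label stands for a disjunction (set) of TimeML relTypes.
TEMPEVAL = {
    "before": frozenset(BEFORE),
    "after": frozenset(AFTER),
    "overlap": frozenset(OVERLAP),
    "before-or-overlap": frozenset(BEFORE | OVERLAP),
    "overlap-or-after": frozenset(OVERLAP | AFTER),
    "vague": frozenset(BEFORE | OVERLAP | AFTER),
}


def to_tempeval(timeml_rels):
    """Return the smallest TempEval label whose disjunction covers timeml_rels."""
    wanted = set(timeml_rels)
    covering = [label for label, rels in TEMPEVAL.items() if wanted <= rels]
    if not wanted or not covering:
        raise ValueError(f"cannot map {sorted(wanted)}")
    return min(covering, key=lambda label: len(TEMPEVAL[label]))


print(to_tempeval({"IBEFORE"}))                 # before
print(to_tempeval({"BEFORE", "SIMULTANEOUS"}))  # before-or-overlap
print(to_tempeval({"BEFORE", "AFTER"}))         # vague
```

Picking the smallest covering set keeps as much information as the coarse label set allows. "vague" is only used when nothing narrower fits.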
TempEval Advantages
• Consistency in annotation
  – Data for each task prepared automatically
  – All annotators add the same TLINKs
• Discrete tasks are simple (in some sense)
• Easy pair-wise evaluation
• Much faster annotation
  – About 10 times faster than for Timebank
TempEval Issues (1)
• Still low inter-annotator agreement
  – Task A: 69%
  – Task B: 74%
  – Task C: 65%
• Choice of relations
• Need more than three tasks
• Ranking of tasks would be useful
TempEval Issues (2)
• Inconsistencies still possible
  – Task B
    • walk(e7) BEFORE DCT
    • talk(e8) AFTER DCT
  – Task C
    • walk(e7) SIMULTANEOUS talk(e8)
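Each of the three links is fine within its own task. The conflict only appears when they are combined. Here is a minimal sketch of detecting it, which makes two assumptions:

- SIMULTANEOUS is treated as identity, so the two events are merged into one node;
- BEFORE/AFTER is treated as a strict order.

Under these assumptions, any cycle in the resulting precedence graph is a contradiction. This is much weaker than full interval reasoning. `find_conflict` and the `walk_e7`-style IDs are hypothetical.

```python
def find_conflict(links):
    """links: (x, rel, y) triples with rel in BEFORE, AFTER, SIMULTANEOUS.
    Returns True if they cannot all hold at once."""
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # 1. SIMULTANEOUS events are merged into one node (treated as identical).
    for x, rel, y in links:
        if rel == "SIMULTANEOUS":
            parent[find(x)] = find(y)

    # 2. Precedence edges between merged nodes; AFTER is a reversed BEFORE.
    succ = {}
    for x, rel, y in links:
        if rel == "BEFORE":
            succ.setdefault(find(x), set()).add(find(y))
        elif rel == "AFTER":
            succ.setdefault(find(y), set()).add(find(x))

    # 3. Any cycle (including a node before itself) is a contradiction.
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def cyclic(node):
        state[node] = "active"
        for nxt in succ.get(node, ()):
            if state.get(nxt) == "active" or (nxt not in state and cyclic(nxt)):
                return True
        state[node] = "done"
        return False

    return any(node not in state and cyclic(node) for node in list(succ))


tempeval_links = [("walk_e7", "BEFORE", "DCT"),            # Task B
                  ("talk_e8", "AFTER", "DCT"),             # Task B
                  ("walk_e7", "SIMULTANEOUS", "talk_e8")]  # Task C
print(find_conflict(tempeval_links))      # True
print(find_conflict(tempeval_links[:2]))  # False
```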
Decomposition
• Annotation as an unstructured task is complex
  – Leaves a lot of freedom to annotators
  – Creation of guidelines is hard
• Split into subtasks
  – Annotation is faster on subtasks
  – Tasks can be evaluated separately, which has advantages for automatic tagging
  – Guidelines for each task
  – Structures the workflow
Annotation Tasks
1. anchoring a nominal event to a time expression in its immediate context: "the April blizzard"
2. anchoring a verbal event to a time expression that is governed by the event (a temporal adjunct): "we had lift-off at 8pm"
3. ordering consecutive events in a sentence: "he walked over thinking about the consequences"
4. determining the temporal relation between two dates
5. ordering events that occur in syntactic subordination relations
  • event subject with governing verb event: "the massive explosion shook the building"
  • verbal event with object event: "they observed the election"
  • reporting event with subordinated event: "the witness said it happened too fast"
  • perception event with subordinated event: "she heard an explosion"
  • an intentional process or state with subordinated event: "I want to sleep for a week"
6. ordering events in coordinations: "walking and talking"
7. anchoring an event to the document creation time (can be split up according to the event's class)
8. ordering two main non-reporting events in consecutive sentences: "John fell after the marathon. He got hurt."
9. ordering two arguments in a discourse relation: "I am resting because I just lifted a barrel of rum."
Counts
Task  Description                                Count
1     Nominal event to time expression               1
2     Verbal event to time expression               13
3     Consecutive events in sentence                61
4     Temporal relation between two dates            6
5     Event subject with governing verbal event      6
5     Verbal event with event object                12
5     Reporting event with subordinated event       14
5     Perception event with subordinated event       0
5     Intensional event with subordinated event     18
6     Events in coordinations                        2
7     Event with document creation time            104
8     Two main non-reporting events                 35
9     Two arguments in discourse relation            9
(Measured over two TimeBank documents, ABC19980120.1830.0957 and ABC19980108.1830.0711, with 104 events and 13 time expressions)
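Summing the table per task makes the distribution easier to see. The sketch simply re-adds the counts listed for the two documents. It assumes each row counts task instances (candidate links) and introduces no new data.

```python
from collections import Counter

# (task, description, count) rows copied from the Counts table.
ROWS = [
    (1, "Nominal event to time expression", 1),
    (2, "Verbal event to time expression", 13),
    (3, "Consecutive events in sentence", 61),
    (4, "Temporal relation between two dates", 6),
    (5, "Event subject with governing verbal event", 6),
    (5, "Verbal event with event object", 12),
    (5, "Reporting event with subordinated event", 14),
    (5, "Perception event with subordinated event", 0),
    (5, "Intensional event with subordinated event", 18),
    (6, "Events in coordinations", 2),
    (7, "Event with document creation time", 104),
    (8, "Two main non-reporting events", 35),
    (9, "Two arguments in discourse relation", 9),
]

per_task = Counter()
for task, _description, count in ROWS:
    per_task[task] += count
total = sum(per_task.values())

for task in sorted(per_task):
    print(f"task {task}: {per_task[task]:3d}  ({per_task[task] / total:.0%})")
print("total:", total)  # total: 281
```

Task 7 alone (104) is over a third of the 281 instances. It also equals the 104 events in the two documents, which is consistent with every event being anchored to the document creation time.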
The 90% Rule
• Used in OntoNotes and PropBank
• Reshuffle senses for a word if IAA < 90%, mark word if IAA remains low
• Not possible for us since we cannot discard a relation if IAA is too low
• But this can be done on a task-by-task basis
• Try to pick relation sets for each task with high IAA in mind
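One way to "pick relation sets with high IAA in mind" is to compare candidate relation sets on the same doubly annotated links. Each candidate set maps fine-grained labels to coarser ones before the two annotations are compared. Everything in this sketch is hypothetical:

- the link pairs;
- the coarse mapping;
- the helper itself.

The score is raw percent agreement, not a chance-corrected measure such as kappa.

```python
def agreement(pairs, mapping=None):
    """Fraction of items on which two annotators agree, optionally after
    mapping each label through a coarser relation set."""
    m = mapping or {}
    same = sum(1 for a, b in pairs if m.get(a, a) == m.get(b, b))
    return same / len(pairs)


# Hypothetical doubly annotated links for one task: (annotator A, annotator B).
PAIRS = [("BEFORE", "IBEFORE"), ("INCLUDES", "SIMULTANEOUS"), ("BEFORE", "BEFORE"),
         ("IS_INCLUDED", "DURING"), ("AFTER", "BEFORE")]
# Hypothetical coarser relation set; unmapped labels stay as they are.
COARSE = {"IBEFORE": "BEFORE", "IAFTER": "AFTER", "INCLUDES": "OVERLAP",
          "IS_INCLUDED": "OVERLAP", "SIMULTANEOUS": "OVERLAP", "DURING": "OVERLAP"}

THRESHOLD = 0.90
for name, mapping in (("full TimeML", None), ("coarse", COARSE)):
    score = agreement(PAIRS, mapping)
    verdict = "meets" if score >= THRESHOLD else "is below"
    print(f"{name}: {score:.0%} ({verdict} the 90% threshold)")
```

Here the coarse set lifts agreement from 20% to 80%. That is still below 90%, so this task would need an even coarser set or would have to be flagged.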
Relation Sets
• Allow different relation sets for subtasks
  – Time-Event in a noun phrase could use specific relations
  – Event-Event in conjunctions uses vaguer, TempEval-like relations
• Restriction: each relation in a relation set can be mapped to a disjunction of TimeML relations
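This restriction makes relation sets from different subtasks comparable. If every label stands for a set of TimeML relTypes, it can be checked against the full inventory. Two labels given to the same pair (say, from two tasks, or from a task and an inferred link) can then be combined by intersection. An empty intersection signals a conflict. The two relation sets here are hypothetical examples, not the project's actual sets.

```python
TIMEML = frozenset({"BEFORE", "AFTER", "IBEFORE", "IAFTER", "INCLUDES", "IS_INCLUDED",
                    "DURING", "DURING_INV", "SIMULTANEOUS", "IDENTITY",
                    "BEGINS", "BEGUN_BY", "ENDS", "ENDED_BY"})
OVERLAP = TIMEML - {"BEFORE", "AFTER", "IBEFORE", "IAFTER"}

# Hypothetical relation sets: task label -> disjunction of TimeML relTypes.
RELATION_SETS = {
    "time-event in noun phrase": {
        "is_included": frozenset({"IS_INCLUDED"}),
        "during": frozenset({"DURING"}),
        "other": TIMEML - {"IS_INCLUDED", "DURING"},
    },
    "event-event in conjunction": {
        "before": frozenset({"BEFORE", "IBEFORE"}),
        "after": frozenset({"AFTER", "IAFTER"}),
        "overlap": OVERLAP,
    },
}


def satisfies_restriction(relation_set):
    """True if every label maps to a non-empty disjunction of TimeML relTypes."""
    return all(len(rels) > 0 and rels <= TIMEML for rels in relation_set.values())


def combine(*labels):
    """Intersect the disjunctions assigned to the same pair; empty means conflict."""
    return frozenset.intersection(*labels)


print(all(satisfies_restriction(s) for s in RELATION_SETS.values()))  # True
conj = RELATION_SETS["event-event in conjunction"]
print(combine(conj["overlap"], frozenset({"IS_INCLUDED", "BEFORE"})))  # frozenset({'IS_INCLUDED'})
print(combine(conj["before"], conj["overlap"]))                        # frozenset()
```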
Composition
• Collect all TLINKs and check for inconsistencies
  – We know there are no task-internal inconsistencies
  – Semi-automatically resolve conflicts
• Some tasks have higher IAA and precision
  – Constraint propagation (aka temporal closure)
• Global annotation with a graphical tool
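Constraint propagation can be sketched as a fixpoint over composition rules. Whenever x r1 y and y r2 z are known and (r1, r2) has a definite composition, the sketch adds x r z and its inverse. Only four rules over BEFORE and IS_INCLUDED are included here. They are a small, hypothetical subset of a real transitivity table such as Allen's. A derived link of the form (x, BEFORE, x) would expose an inconsistency.

```python
INVERSE = {"BEFORE": "AFTER", "AFTER": "BEFORE",
           "INCLUDES": "IS_INCLUDED", "IS_INCLUDED": "INCLUDES"}

# (r1, r2) -> r: if x r1 y and y r2 z, then x r z.
COMPOSE = {
    ("BEFORE", "BEFORE"): "BEFORE",
    ("IS_INCLUDED", "BEFORE"): "BEFORE",           # x inside y, y before z
    ("BEFORE", "INCLUDES"): "BEFORE",              # x before y, z inside y
    ("IS_INCLUDED", "IS_INCLUDED"): "IS_INCLUDED",
}


def closure(links):
    """Return every (x, rel, y) fact derivable from links with the rules above.
    Each pass scans all pairs of facts, which is fine for a small sketch."""
    facts = set()
    for x, rel, y in links:
        facts |= {(x, rel, y), (y, INVERSE[rel], x)}
    changed = True
    while changed:
        changed = False
        for x, r1, y in list(facts):
            for y2, r2, z in list(facts):
                rel = COMPOSE.get((r1, r2)) if y2 == y else None
                if rel and (x, rel, z) not in facts:
                    facts |= {(x, rel, z), (z, INVERSE[rel], x)}
                    changed = True
    return facts


links = [("e1", "BEFORE", "e2"), ("e2", "BEFORE", "e3"), ("e4", "IS_INCLUDED", "e1")]
for fact in sorted(closure(links) - set(links)):
    print(fact)
```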
Using Syntax
• Syntactic definition of most tasks
• Use TreeBank annotation
• Allows automatic creation of tasks using scripts that traverse the tree
• PP inside VP with event verb (wsj_0032 and wsj_0135)
  – is scheduled VG[to expire] PP[at the end of November]
  – also said it VG[expects to post] NP[sales] PP[in the current fiscal year]
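This is a sketch of such a script, assuming Penn TreeBank-style bracketed parses. It has two parts:

- a tiny hand-rolled reader, so that no parsing library is assumed;
- a traversal that reports every VP whose direct children include a verb and a PP (function tags such as PP-TMP are stripped before comparing).

Any VB* tag stands in for "event verb". The bracketing is hand-built for the first phrase on this slide and is not the actual wsj_0032 parse.

```python
import re


def parse(text):
    """Read one bracketed tree into (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label, pos = tokens[pos + 1], pos + 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def words(tree):
    """All leaf tokens under a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in words(child)]


def vp_pp_candidates(tree):
    """Yield (verb, PP text) for each VP with a VB* child and a PP child."""
    if isinstance(tree, str):
        return
    label, children = tree
    if label == "VP":
        subtrees = [c for c in children if not isinstance(c, str)]
        verbs = [c for c in subtrees if c[0].startswith("VB")]
        pps = [c for c in subtrees if c[0].split("-")[0] == "PP"]
        for verb in verbs:
            for pp in pps:
                yield " ".join(words(verb)), " ".join(words(pp))
    for child in children:
        yield from vp_pp_candidates(child)


TREE = ("(VP (VBZ is) (VP (VBN scheduled) (S (VP (TO to) (VP (VB expire) "
        "(PP (IN at) (NP (NP (DT the) (NN end)) (PP (IN of) (NP (NNP November))))))))))")
print(list(vp_pp_candidates(parse(TREE))))  # [('expire', 'at the end of November')]
```

The inner PP "of November" is not reported because it hangs off an NP, not directly off the VP.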
Argument Relations
• NomBank support verbs
  – Give a demonstration
• Argument relation between two events
  – Sometimes indicates that there is an SLINK
• PropBank modifier ARGM-TMP
  – ARGM-TMP is usually a TIMEX3
  – TLINK with the head of the ARGM-TMP
• Discourse Treebank args
  – The guest ran away because dinner was served late
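The ARGM-TMP case can be automated along the same lines. For every predicate whose ARGM-TMP head token falls inside a TIMEX3, the sketch proposes a TLINK and leaves the relation type to the annotator. The frame and TIMEX3 records are hypothetical simplifications that use token offsets only. They are not PropBank's or TimeML's file formats. The usage assumes that "today" is annotated as the ARGM-TMP of "said" in the TimeML example sentence.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    """Hypothetical PropBank-style predicate record."""
    event_id: str                 # EVENT id of the predicate
    argm_tmp_head: Optional[int]  # token index of the ARGM-TMP head, if any


@dataclass
class Timex:
    """Hypothetical TIMEX3 span record."""
    tid: str
    start: int  # first token index (inclusive)
    end: int    # last token index (exclusive)


def tlink_candidates(frames: List[Frame],
                     timexes: List[Timex]) -> List[Tuple[str, Optional[str], str]]:
    """Return (event_id, relType, tid) triples; relType is None, to be annotated."""
    candidates = []
    for frame in frames:
        if frame.argm_tmp_head is None:
            continue
        for timex in timexes:
            if timex.start <= frame.argm_tmp_head < timex.end:
                candidates.append((frame.event_id, None, timex.tid))
                break
    return candidates


# Tokens: The(0) Soviet(1) Union(2) said(3) today(4) it(5) had(6) sent(7) ...
frames = [Frame("e12", argm_tmp_head=4), Frame("e13", argm_tmp_head=None)]
timexes = [Timex("t1", start=4, end=5)]
print(tlink_candidates(frames, timexes))  # [('e12', None, 't1')]
```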