Transcript
Page 1: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 1

Semi-automatic Annotation of the Semi-automatic Annotation of the Romanian TimeBank 1.2Romanian TimeBank 1.2

Corina ForăscuCorina Forăscu, Radu Ion, Dan Tufi, Radu Ion, Dan Tufişş

Faculty of Computer Science, Al.I. Cuza University of IaFaculty of Computer Science, Al.I. Cuza University of Iassi, Romaniai, Romania&&

Research Institute for Artificial Intelligence of the Romanian Academy Research Institute for Artificial Intelligence of the Romanian Academy

[email protected]@info.uaic.ro , , {radu{radu, , tufis}@racai.rotufis}@racai.ro

Page 2: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 2

Outline Outline 1.1. FundamentalsFundamentals

2.2. TimeML & TimeBankTimeML & TimeBank

3.3. Corpus processingCorpus processing

1.1. translationtranslation

2.2. pre-processingpre-processing

3.3. AlignmentAlignment

4.4. Annotation importAnnotation import

4.4. ConclusionsConclusions

Page 3: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 3

FundamentalsFundamentalsTemporal information in Natural Language: Temporal information in Natural Language:

1.1. Time-denoting expressions – references to a calendar or Time-denoting expressions – references to a calendar or clock systemclock system

• expressed by NPs, PPs, or AdvPsexpressed by NPs, PPs, or AdvPs

• the 23the 23rdrd of May, 1998; Monday; tomorrow; the second semester of May, 1998; Monday; tomorrow; the second semester

2.2. Event-denoting expressions - reference to an eventEvent-denoting expressions - reference to an event expressed by expressed by

1.1. sentences – more precisely their syntactic head, the main verb:sentences – more precisely their syntactic head, the main verb:

John John listenslistens to the music. to the music.

2.2. noun phrases:noun phrases:

Israel will ask the USA to delay a military Israel will ask the USA to delay a military strikestrike against Iraq against Iraq..

Page 4: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 4

Motivation (1)Motivation (1)NLP applications to benefit:

•lexicon induction, linguistic investigation, using very large annotated corpora;

•question answering (questions like when, how often or how long);

•information extraction or information retrieval;

•machine translation (translated and normalized temporal references; mappings between different behavior of tenses from language to language);

•discourse processing: temporal structure of discourse and summarization.

Page 5: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 5

Acum îşi dădea seama că tocmai din cauza acestui incident se hotărâse el brusc să vină acasă şi să-şi înceapă jurnalul taman astăzi.

Now he realised that exactly because of this inicident he decided suddenly to come home and to begin his jurnal exactly today.

Motivation (2)Motivation (2)

Page 6: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 6

Acum îşi dădea seama că tocmai din cauza acestui incident se hotărâse el brusc să vină acasă şi să-şi înceapă jurnalul taman astăzi.

<TIMEX3 temporalFunction="true" tid="t152" type="TIME" value="PRESENT_REF">Acum</TIMEX3> îsi <EVENT aspect="PROGRESSIVE" class="OCCURENCE" eid="e153" tense="PAST">dădea</EVENT><MAKEINSTANCE eiid="ei59" eid="e153" cardinality="1" /> seama <SIGNAL sid="s154">ca</SIGNAL> tocmai din cauza acestui <EVENT aspect="NONE" class="OCCURENCE" eid="e156" tense="NONE">incident</EVENT> <MAKEINSTANCE eiid="ei60" eid="e156" cardinality="1" /> se <EVENT aspect="PERFECTIVE" class="I_ACTION" eid="e157" tense="PAST">hotarâse</EVENT><MAKEINSTANCE eiid="ei61" eid="e157" cardinality="1" /> el brusc <SIGNAL sid="s54">sa</SIGNAL><EVENT aspect="NONE" class="OCCURENCE" eid="e159" tense="PRESENT">vină</EVENT><MAKEINSTANCE eiid="ei62" eid="e159" cardinality="1" /> acasa <SIGNAL sid="s160">si</SIGNAL><SIGNAL sid="s55">sa</SIGNAL> -si <EVENT aspect="NONE" class="ASPECTUAL" eid="e161" tense="PRESENT">înceapă</EVENT> <MAKEINSTANCE eiid="ei63" eid="e161" cardinality="1" /> jurnalul taman <TIMEX3 temporalFunction="true" tid="t162" type="DATE" value="1984-04-04">astăzi</TIMEX3> .   <TLINK eventInstanceID="ei59" relatedToTime="t152" relType="SIMULTANEOUS" />   <TLINK eventInstanceID="ei60" relatedToEvent="e157" relType="BEFORE" />

Motivation (3)Motivation (3)

Page 7: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 7

State of the ArtState of the Art

1947 Reichenbach: The tenses of verbs

1998 MUC 7

2000 TIMEX

2004 ACE – TERN: TIMEX2 v.1.1.TARSQI: TimeML v.1.2.

2005 ACE – TERN: TIMEX2 v.1.2.ACL 2005: TARSQI system

ACL-COLING WS: ARTE

Annotating and Reasoning about Time and Events2006 Time Symposium

ACL: Temporal and Spatial Information Processing2001 STAG (Setzer) TIDES 2001: TIMEX2 v.1.0.2

LREC 2002 Annotation Standards for Temporal Information in Natural Language

2002 DAML-TimeTERQAS: TimeML v.1.0.

Page 8: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 8

TERQAS 2002TERQAS 2002+TimeML v.1.0TimeML v.1.0 metadata standard for: metadata standard for:

marking events, marking events,

their temporal anchoring and their temporal anchoring and

links in news articles links in news articles

+TimeBankTimeBank corpus v.1.0. corpus v.1.0.

+guidelines for temporal annotationguidelines for temporal annotation

Page 9: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 9

Outline Outline 1.1. FundamentalsFundamentals

2.2. TimeML & TimeBankTimeML & TimeBank

3.3. Corpus processingCorpus processing

1.1. translationtranslation

2.2. pre-processingpre-processing

3.3. AlignmentAlignment

4.4. Annotation importAnnotation import

4.4. ConclusionsConclusions

Page 10: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 10

TimeML v.1.2TimeML v.1.2

A metadata standard developed especially for news A metadata standard developed especially for news articles, for marking articles, for marking Events: Events: EVENTEVENT, , MAKEINSTANCEMAKEINSTANCEtemporal anchoring of events: temporal anchoring of events: TIMEX3TIMEX3, , SIGNALSIGNALlinks between events and/or timexes: links between events and/or timexes: TLINKTLINK, , ALINK, SLINKALINK, SLINK

Page 11: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 11

Events (1) Events (1) situations that happen or occur, states or situations that happen or occur, states or

circumstances in which something obtains or circumstances in which something obtains or holds trueholds true

tensed verbs, adjectives, nominalizationstensed verbs, adjectives, nominalizations

The oat-bran The oat-bran crazecraze e190e190 has has costcost e189e189 the world's the world's largest cereal maker market share.largest cereal maker market share.

7 classes of EVENTs:7 classes of EVENTs: OCCURRENCEOCCURRENCE, , PERCEPTIONPERCEPTION, , REPORTINGREPORTING, , ASPECTUALASPECTUAL, , STATESTATE, , I_STATEI_STATE, , I_ACTIONI_ACTION

Page 12: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 12

Events (2) Events (2) The oat-bran <class="The oat-bran <class="OCCURRENCEOCCURRENCE"> "> crazecrazee190e190</EVENT> has </EVENT> has <class="<class="OCCURRENCEOCCURRENCE">">costcoste189e189 </EVENT> the world's largest cereal </EVENT> the world's largest cereal maker market share.maker market share.

Analysts <class="Analysts <class="REPORTINGREPORTING" " >>saysaye28e28</EVENT> much of Kellogg's </EVENT> much of Kellogg's <class="<class="OCCURRENCEOCCURRENCE">">erosionerosion e204e204 </EVENT> has been in such core </EVENT> has been in such core brands as Corn Flakes, ...brands as Corn Flakes, ...

Page 13: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 13

InstancesInstances Based on the event annotation: how many Based on the event annotation: how many

different instances or realizations has a given different instances or realizations has a given event – at least one event – at least one

Carries the Carries the tensetense and and aspectaspect of the verb- of the verb-denoted eventdenoted event

John John learnslearnse1e1 twicetwice on Monday. on Monday.

<MAKEINSTANCE eiid=‘ei1’ eventID=‘e1’ <MAKEINSTANCE eiid=‘ei1’ eventID=‘e1’ signalID=‘s1’ cardinality=‘2’ signalID=‘s1’ cardinality=‘2’ aspect="NONE" tense="PRESENT"aspect="NONE" tense="PRESENT">>

Page 14: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 14

Temporal expressions: Temporal expressions: TIMEX3 TIMEX3 (1)(1)

Explicit & implicit temporal expressions:Explicit & implicit temporal expressions:

• • TimesTimes: : 11 o’clock11 o’clock; ; midnightmidnight

• • Dates: Dates: Fully Specified (Fully Specified (May 23, 2006May 23, 2006; ; winter, 2005winter, 2005), ),

Underspecified (Underspecified (MondayMonday; ; next weeknext week; ; last monthlast month; ; two years agotwo years ago))

• • DurationsDurations: : two monthstwo months; ; three hoursthree hours

• • SetsSets: : every weekevery week; ; every Tuesdayevery Tuesday

Page 15: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 15

Temporal expressions: Temporal expressions: TIMEX3 TIMEX3 (2)(2)

<TIMEX3 tid="<TIMEX3 tid="t192t192" type="DATE" " type="DATE" temporalFunction="false" temporalFunction="false" functionInDocument="CREATION_TIME" value="1989-functionInDocument="CREATION_TIME" value="1989-10-30" >10-30" >10/30/8910/30/89</TIMEX3></TIMEX3>

<TIMEX3 mod="APPROX" tid="t220" type="DURATION" <TIMEX3 mod="APPROX" tid="t220" type="DURATION" temporalFunction="true" temporalFunction="true" functionInDocument="NONE" value="P2Y" functionInDocument="NONE" value="P2Y" anchorTimeID="t192" >anchorTimeID="t192" >the next two years or the next two years or soso</TIMEX3></TIMEX3>

<TIMEX3 tid="t207" type="DATE" <TIMEX3 tid="t207" type="DATE" temporalFunction="true" temporalFunction="true" functionInDocument="NONE" value="FUTURE_REF" functionInDocument="NONE" value="FUTURE_REF" anchorTimeID="t192" >anchorTimeID="t192" >soonsoon</TIMEX3> </TIMEX3>

Page 16: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 16

Temporal signals: Temporal signals: SIGNALSIGNALFunction words that indicate how temporal objects are to Function words that indicate how temporal objects are to

be related to each other:be related to each other: temporal prepositions, conjunctions and/or modifiers: temporal prepositions, conjunctions and/or modifiers:

on, in, at, from, to, before, after, during; before, after, on, in, at, from, to, before, after, during; before, after, while, whenwhile, when

negative expressions negative expressions modal verbs modal verbs prepositions signaling modality (“to”) prepositions signaling modality (“to”) special characters denoting ranges in temporal special characters denoting ranges in temporal

expressions: “-” and “/”expressions: “-” and “/”

Page 17: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 17

Dependencies: Dependencies: LINKLINKss Temporal Relations:Temporal Relations: TLINKTLINK

Anchors to TimeAnchors to Time Orders between Time and EventsOrders between Time and Events

Aspectual Relations:Aspectual Relations: ALINKALINK Phases of an eventPhases of an event

Subordinating Relations:Subordinating Relations: SLINKSLINK Events that syntactically subordinate other eventsEvents that syntactically subordinate other events

Page 18: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 18

Temporal relations: Temporal relations: TLINK TLINK (1)(1) temporal relation between two temporal elements temporal relation between two temporal elements

(event-event, event-timex); (event-event, event-timex); EVENTEVENTs – through their s – through their INSTANCEINSTANCEss 13 relTypes – as Allen’s: 13 relTypes – as Allen’s:

SimultaneousSimultaneous IdenticalIdentical One One before (/after)before (/after) the other the other One One immediately beforeimmediately before (+after) the other (+after) the other One One including / being includedincluding / being included in the other in the other One One holdingholding during the duration of the other during the duration of the other One being the One being the beginning (/ending)beginning (/ending) of the other of the other One being One being begun (/ended) bybegun (/ended) by the other the other

Page 19: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 19

Temporal relations: Temporal relations: TLINK TLINK (2)(2)

The oat-bran The oat-bran crazecrazee190/ei1994e190/ei1994 has has costcoste189/ei1995e189/ei1995 the world's the world's largest cereal maker market share.largest cereal maker market share.

The company's president The company's president quit quit e3 /ei1996e3 /ei1996 suddenly. suddenly.

craze cost 10/30/89

ei1994 ei1995 t192

quit

ei1996

Page 20: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 20

Temporal relations: Temporal relations: TLINK TLINK (3)(3)

<TLINK relatedToEventInstance="ei1995" <TLINK relatedToEventInstance="ei1995" eventInstanceID="ei1994" relType="eventInstanceID="ei1994" relType="BEFOREBEFORE" />" />

<TLINK relatedToTime="t192" <TLINK relatedToTime="t192" eventInstanceID="ei1996" relType="eventInstanceID="ei1996" relType="BEFOREBEFORE" />" />

<TLINK relatedToEventInstance="ei1995" <TLINK relatedToEventInstance="ei1995" eventInstanceID="ei1996" eventInstanceID="ei1996" relType="relType="IS_INCLUDEDIS_INCLUDED" />" />

craze cost 10/30/89

ei1994 ei1995 t192

quit

ei1996

Page 21: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 21

Aspectual relations: Aspectual relations: ALINKALINK relationship between an aspectual event and its argument relationship between an aspectual event and its argument

event:event: InitiationInitiation:: John John started started ei5ei5 toto readread ei6ei6..

<ALINK eventInstanceID="ei5" <ALINK eventInstanceID="ei5" relatedToEventInstance="ei6" relType="INITIATES"/>relatedToEventInstance="ei6" relType="INITIATES"/>

CulminationCulmination: : John John finished finished ei5ei5 assemblingassembling ei6ei6 the table the table..

<ALINK eventInstanceID="ei5“ <ALINK eventInstanceID="ei5“ relatedToEventInstance="ei6“ relType="TERMINATES"/>relatedToEventInstance="ei6“ relType="TERMINATES"/>

TerminationTermination:: John John stoppedstopped talkingtalking..

ContinuationContinuation: : John John keptkept talkingtalking..

Page 22: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 22

Subordination relations: Subordination relations: SLINKSLINK for contexts introducing relations between two for contexts introducing relations between two

events of type:events of type: Modal:Modal: John John shouldshould have have boughtbought some wine. some wine.

Factive:Factive: John John forgotforgot that he that he waswas in Boston in Boston

yesterday.yesterday. Counterfactive:Counterfactive: John John preventedprevented the the divorcedivorce.. Evidential:Evidential: John John saidsaid he he boughtbought some wine. some wine. Negative evidential:Negative evidential: John John denieddenied he he bought bought

only beer.only beer. Conditional:Conditional: If John If John leavesleaves today, Mary will today, Mary will

crycry..

Page 23: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 23

TimeBank 1.2 TimeBank 1.2

183 English news report documents TimeML annotated, distributed through LDC

4715 sentences with 10586 unique lexical units, from a total of 61042 lexical units

Non-TimeML Markup in Time Bank 1.1:• structure information: header• named entity recognition: <ENAMEX>, <NUMEX>,

<CARDINAL>• sentence boundary information: <s>

Page 24: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 24

TimeBank 1.2 TimeBank 1.2

events 7935

instances 7940

timexes 1414

signals 688

alinks 265

slinks 2932

tlinks 6418

TOTAL 27592

Page 25: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 25

Outline Outline 1.1. FundamentalsFundamentals

2.2. TimeML & TimeBankTimeML & TimeBank

3.3. Corpus processingCorpus processing

1.1. translationtranslation

2.2. pre-processingpre-processing

3.3. AlignmentAlignment

4.4. Annotation importAnnotation import

4.4. ConclusionsConclusions

Page 26: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 26

TranslationTranslation• 2 “trained translators”; one final correction• Translation desiderata:

• 1-1 sentence aligned• Preserving POS• Verb tense – mapped onto Romanian• Format of the dates, moments of day and

numbers conforms to the norms of written Romanian

• 4715 sentences (translation units), 65375 lexical tokens, including punctuation marks, representing 12640 lexical types

Page 27: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 27

Preprocessing the corpusPreprocessing the corpus

• Tokenisation – MtSeg, with idiomatic expressions, clitic splitting

• POS-tagging – TnT adapted & improved to determine the POS of unknown words

• Lemmatisation – probabilistic, based on a lexicon

• Chunking – REs over POS tags to determine non-recursive NPs, APs, AdvPs, PPs

Page 28: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 28

Alignment Alignment YAWA : 4 stages, evaluated over the data in the : 4 stages, evaluated over the data in the

Shared Task on Word Alignment, Romanian-Shared Task on Word Alignment, Romanian-English track organized at ACL2005 English track organized at ACL2005

Current: P = 88.80%, R = 74.83%, F = 81.22% Current: P = 88.80%, R = 74.83%, F = 81.22%

91714 alignments, manually checked, out of which 91714 alignments, manually checked, out of which 25346 are NULL-alignments 25346 are NULL-alignments

Page 29: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 29

Alignment Alignment 1. 1. Content words alignment: based on the translation Content words alignment: based on the translation

lexiconslexicons

P = 94.08%, R = 34.99%, F = 51.00%.P = 94.08%, R = 34.99%, F = 51.00%.

2. 2. Inside-Chunks alignment: simple empirical rules to align Inside-Chunks alignment: simple empirical rules to align the words within the corresponding chunks; the words within the corresponding chunks;

P = 89.90%, R = 53.90%, F = 67.40%P = 89.90%, R = 53.90%, F = 67.40%

3. 3. Alignment in contiguous sequences of unaligned words: Alignment in contiguous sequences of unaligned words: using the POS-affinities of the unaligned words and using the POS-affinities of the unaligned words and their relative positionstheir relative positions

4. 4. Correction phase: the wrong links introduced mainly in Correction phase: the wrong links introduced mainly in stage 3 are now removed. stage 3 are now removed.

Page 30: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 30

Alignment Alignment

Page 31: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 31

Alignment Alignment The parallel corpus = 183 files in XCES format

<tu id="1">

<seg lang="ro">

<s id="Timex.ro.1">

  <w lemma="pe_de_altă_parte" ana="14+,R" chunk="Ap#1">Pe_de_altă_parte</w>

  <c>,</c>

  <w lemma="sine" ana="12+,PXA" chunk="Vp#1">se</w>

  <w lemma="dovedi" ana="1+,V3" chunk="Vp#1">dovedeşte</w>

  <w lemma="a" ana="15+,QN" chunk="Vp#2">a</w>

  <w lemma="fi" ana="1+,VN" chunk="Vp#2">fi</w>

  <w lemma="alt" ana="22+,PI" chunk="Np#1">altă</w>

  <w lemma="săptămână" ana="1+,NSRN" chunk="Np#1">săptămână</w>

  <w lemma="financiar" ana="1+,ASN" chunk="Np#1,Ap#2">financiară</w>

  <w lemma="foarte" ana="14+,R" chunk="Np#1,Ap#2">foarte</w>

  <w lemma="prost" ana="1+,ASN" chunk="Np#1,Ap#2">proastă</w>

  …

</s> </seg></tu>

<tu id="1">

<seg lang="en">

<s id="Timex.en.1">

  <w lemma="on_the_other_hand" ana="14+,ADVE" chunk="Ap#1">On_the_other_hand</w>

<c>,</c>

<w lemma="it" ana="13+,PPER3" chunk="Vp#1">it</w>

<w lemma="be" ana="3+,AUX3" chunk="Vp#1">'s</w>

  <w lemma="turn" ana="1+,PPRE" chunk="Vp#1">turning</w>

  <w lemma="out" ana="5+,PREP">out</w>

<w lemma="to" ana="15+,TO" chunk="Vp#2">to</w>

<w lemma="be" ana="1+,VINF" chunk="Vp#2">be</w>

<w lemma="another" ana="22+,PI">another</w>

  <w lemma="very" ana="14+,ADVE" chunk="Ap#2">very</w>

  <w lemma="bad" ana="1+,ADJE" chunk="Ap#2,Np#1">bad</w>

  <w lemma="financial" ana="1+,ADJE" chunk="Ap#2,Np#1">financial</w>

  <w lemma="week" ana="1+,NN" chunk="Np#1">week</w>

  … </s>

</seg>

Page 32: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 32

Annotation importAnnotation import Based on the Romanian-English lexical alignmentBased on the Romanian-English lexical alignment

Page 33: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 33

Annotation importAnnotation import For every pair of sentences Sro and Sen from the TimeBank For every pair of sentences Sro and Sen from the TimeBank

parallel corpus with the Ten English equivalent parallel corpus with the Ten English equivalent sentence:sentence:

1. construct a list E of pairs of English text fragments with 1. construct a list E of pairs of English text fragments with sequences of English indexes from Sen and Ten. sequences of English indexes from Sen and Ten.

E = {<”In the”; 1,2>, <”Philippines”; 3>, <”, a”; 4,5>, E = {<”In the”; 1,2>, <”Philippines”; 3>, <”, a”; 4,5>, <”four”; 6>, <”year”; 7>, <”low .”; 8,9>}.<”four”; 6>, <”year”; 7>, <”low .”; 8,9>}.

Page 34: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 34

Annotation importAnnotation import 2. add to every element of E the XML context in which that 2. add to every element of E the XML context in which that

text fragment appeared in the original English text fragment appeared in the original English TimeBank. TimeBank.

E’ = {<”In the”; 1,2; s>, <”Philippines”; 3; s, E’ = {<”In the”; 1,2; s>, <”Philippines”; 3; s, ENAMEX>, …}ENAMEX>, …}

3. construct the list RW of Romanian words along with the 3. construct the list RW of Romanian words along with the transferred XML contexts using E’ and the lexical transferred XML contexts using E’ and the lexical alignment between Sro and Sen. If a word in Sro is not alignment between Sro and Sen. If a word in Sro is not aligned, the top context for it, namely s, is considered.aligned, the top context for it, namely s, is considered.

RW = {<”În”; s>, <”Filipine”; s,ENAMEX>, …}.RW = {<”În”; s>, <”Filipine”; s,ENAMEX>, …}.

Page 35: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 35

Annotation importAnnotation import 4. construct the final list R of Romanian text fragments 4. construct the final list R of Romanian text fragments

from RW by conflating adjacent elements of RW that from RW by conflating adjacent elements of RW that appear in the same XML context. Output the list in appear in the same XML context. Output the list in XML format.XML format.

Page 36: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 36

Annotation importAnnotation import Offline markup (Offline markup (MAKEINSTANCE, ALINK, TLINK MAKEINSTANCE, ALINK, TLINK

and SLINK tags) : the transfer kept only those and SLINK tags) : the transfer kept only those XML tags from the English version whose IDs XML tags from the English version whose IDs belong to XML structures that have been belong to XML structures that have been transferred to Romanian transferred to Romanian

Page 37: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 37

Annotation importAnnotation import TimeML tags % transfered

events 7703 97.07

instances 7706 97.05

timexes 1356 95.89

signals 668 97.09

alinks 249 93.96

slinks 2831 96.55

tlinks 6122 95.38

TOTAL 26635 96.53

Page 38: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 38

Conclusions & future workConclusions & future work• improve & evaluate the annotation transfer• adequacy of temporal theories to Romanian• (semi) automatically mark-up of the temporal

information in Romanian texts (news + literature)

Page 39: Semi-automatic Annotation of the Romanian TimeBank 1.2

Semi-automatic Annotation of the Romanian TimeBank 1.2

CALP07 workshop @ RANLP 39

Thank you!Thank you!

(Temporal) Questions???(Temporal) Questions???


Top Related