Dealing with Italian Temporal Expressions: the ITA-Chronos System. Matteo Negri, Fondazione Bruno Kessler - IRST, Trento - Italy, [email protected]. EVALITA 2007 - Evaluation of NLP Tools for Italian, Rome - Italy, September 10, 2007.


Page 1: Dealing with Italian Temporal Expressions: the ITA-Chronos System

Dealing with Italian Temporal Expressions: the ITA-Chronos System

Matteo Negri

Fondazione Bruno Kessler - IRST, Trento - Italy

[email protected]

EVALITA 2007 - Evaluation of NLP Tools for Italian

Rome - Italy

September 10, 2007

Page 2: Dealing with Italian Temporal Expressions: the ITA-Chronos System

EVALITA’07 - 09/10/2007

M. Negri

Dealing with Italian Temporal Expressions: the ITA-Chronos System

Outline

• Chronos: a multilingual system for TE recognition/normalization

• System description

• Some examples

• Results at EVALITA 2007

Page 3: Dealing with Italian Temporal Expressions: the ITA-Chronos System

Chronos

• Multilingual (ITA/ENG) tool for TE recognition and normalization according to the TIMEX2 standard

• Approach

– Rule-based system

• ENG-Chronos: 1500 rules

• ITA-Chronos: 981 rules

– Six phases: Preprocessing, Detection, Bracketing, Information Gathering, Anchors Selection, Normalization

• ENG-Chronos participated in TERN-04 with good results on the “Recognition+Normalization Task”

– Ranked 2nd, with 76% TERN-Value (best system: 78%)

Page 4: Dealing with Italian Temporal Expressions: the ITA-Chronos System

ITA-Chronos: System Architecture

Plain Text → Preprocessing → Detection and Bracketing → Intermediate Annotation → Normalization → Tagged Text

Preprocessing: Tokenization, POS Tagging, Multiwords Recognition

Detection and Bracketing:

– Detection: Basic Tagging Rules

– Bracketing: Composition Rules

– Information Gathering: Tagging Rules for SET, Anchor_Dir, Anchor_Val, MOD, Type, T_Cat, Heur, Op, Quant, Val_Ext

Normalization:

– Attributes Normalization

– Anchors Selection

– Dates Normalization

Page 5: Dealing with Italian Temporal Expressions: the ITA-Chronos System

STEP1: Preprocessing

• The first phase of the process performs:

– Tokenization

– POS tagging

– Multiwords recognition

• The preprocessed input text is then passed to the TE detection phase, where around 400 tagging rules are in charge of finding all the TEs it contains.

Page 6: Dealing with Italian Temporal Expressions: the ITA-Chronos System

STEP2: Detection

• Markable expressions are detected considering the presence of lexical triggers in the input text

– “anno”, “oggi”, “Venerdì”, “Natale”, “quotidianamente”, “10/09/2007”, “1982”, etc.

• Basic Tagging Rules

– Regular expressions checking for: word senses, parts of speech, symbols, or words satisfying specific predicates

Tagging rule matching “Fra tre giorni” (“in three days”):

PATTERN t1 t2 t3

t1 [pos=“E”]

t2 [pos=“N”]

t3 [pred=TimeUnit-p]

OUTPUT <TIMEX2>t1 t2 t3</TIMEX2>

– “E” = preposition

– “N” = numeral

– TimeUnit-p is satisfied by: “secondo”, “minuto”, “ora”, “giorno”, “settimana”, “mese”, etc.
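The tagging rule above can be approximated in a few lines of Python. This is only a minimal sketch, not the Chronos rule engine: the `Token` class, the `apply_rule` function, the plural forms of the time units, and the "V"/"S" tags on the non-matching tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical token type: the slides do not show Chronos' internal data structures.
@dataclass
class Token:
    text: str
    pos: str  # "E" = preposition, "N" = numeral (tags taken from the slide)

# TimeUnit-p predicate, built from the lemmas listed on the slide.
TIME_UNITS = {"secondo", "minuto", "ora", "giorno", "settimana", "mese"}
# Assumed plural forms, added so that "giorni" matches without a lemmatizer.
TIME_UNIT_PLURALS = {"secondi", "minuti", "ore", "giorni", "settimane", "mesi"}

def time_unit_p(token):
    """Return True if the token denotes a time unit."""
    return token.text.lower() in TIME_UNITS | TIME_UNIT_PLURALS

def apply_rule(tokens):
    """Wrap every t1 t2 t3 match (preposition, numeral, time unit) in TIMEX2 tags."""
    out = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        if (len(window) == 3
                and window[0].pos == "E"
                and window[1].pos == "N"
                and time_unit_p(window[2])):
            out.append("<TIMEX2>" + " ".join(t.text for t in window) + "</TIMEX2>")
            i += 3  # skip past the matched expression
        else:
            out.append(tokens[i].text)
            i += 1
    return " ".join(out)

# "Arriva fra tre giorni" ("arrives in three days"); "V" and "S" are assumed tags.
tokens = [Token("Arriva", "V"), Token("fra", "E"), Token("tre", "N"), Token("giorni", "S")]
tagged = apply_rule(tokens)
print(tagged)
```

A real rule set would also test word senses and symbols, as the slide mentions; the sketch covers only the part-of-speech and predicate checks used by this one rule.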

Page 7: Dealing with Italian Temporal Expressions: the ITA-Chronos System

STEP3: Bracketing

• Considers the context surrounding the detected triggers

– “inizio”, “fine”, “prima”, “dopo”, “fa”, “successivo”, “precedente”, “durante”, “circa”, “almeno”, “3”, “sesto”, etc.

• Composition rules:

– In charge of handling conflicts between possible multiple taggings (e.g. when a recognized TE contains, overlaps, or is adjacent to one or more detected TEs)

Composition rule for handling inclusions:

PATTERN T-EXP1 T-EXP2

T-EXP1 [start = n] [end = m]

T-EXP2 [start = o, n≤o<m] [end = p, o<p≤m]

OUTPUT T-EXP1

T-EXP1 [start = n] [end = m]

Example: “Tutta la notte di sabato” (“all Saturday night”)

– Overlapping candidate TEs: “Tutta la notte”, “la notte”, “la notte di sabato”, “sabato”

– Resulting TE: “Tutta la notte di sabato”
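The inclusion rule can be illustrated with a short Python sketch. It assumes that candidate TEs are represented as (start, end) token offsets with an exclusive end, and that the full span is itself among the candidates; the function name `resolve_inclusions` is hypothetical and the actual Chronos composition rules also cover overlaps and adjacency, which are not handled here.

```python
def resolve_inclusions(spans):
    """Drop every span that is contained in another candidate span.

    Each span is a (start, end) pair of token offsets, end exclusive.
    """
    unique = set(spans)
    kept = []
    for start, end in unique:
        contained = any(
            n <= start and end <= m and (n, m) != (start, end)
            for n, m in unique
        )
        if not contained:
            kept.append((start, end))
    return sorted(kept)

tokens = ["Tutta", "la", "notte", "di", "sabato"]
# Candidates: "Tutta la notte", "la notte", "la notte di sabato", "sabato",
# plus the full expression (assumed to have been detected as well).
candidates = [(0, 3), (1, 3), (1, 5), (4, 5), (0, 5)]
result = resolve_inclusions(candidates)
text = [" ".join(tokens[s:e]) for s, e in result]
print(text)
```

Only the widest span survives, which matches the outcome shown for the example on the slide.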

Page 8: Dealing with Italian Temporal Expressions: the ITA-Chronos System

STEP4: Information gathering

• Goal: mine relevant information for normalization

• Considers triggers+context to assign values to

– TIMEX2 attributes (e.g. SET, MOD, ANCHOR_DIR)

– TEMPORARY attributes (e.g. Type, T_Cat, Heur, Op, Quant)

• This is done by running separate sets of specialized tagging rules

• Such information is stored in the Intermediate Annotation and passed as input to the normalization component

Page 9: Dealing with Italian Temporal Expressions: the ITA-Chronos System

Information Gathering: Example

TIMEX2 attributes

MOD: “più di”, “circa”, “oltre” …

SET: “ogni”, “tutti” …

ANCHOR_DIR: “prima”, “durante”, “dopo”...

TEMPORARY attributes

type: [T-ABS | T-REL]

t-cat: [second, minute, hour, day,…]

op: [=, +, -]

quant: [n≥0]

heur: [CR-DATE | PR-DATE]

Page 17: Dealing with Italian Temporal Expressions: the ITA-Chronos System

Information Gathering: Example

Detected TE: “oltre tre anni dopo” (“more than three years later”)

TIMEX2 attributes

MOD: “più di”, “circa”, “oltre” … → MORE_THAN

SET: “ogni”, “tutti” …

ANCHOR_DIR: “prima”, “durante”, “dopo”... → ENDING

TEMPORARY attributes

type: [T-ABS | T-REL] → T-REL

t-cat: [second, minute, hour, day,…] → YEAR

op: [=, +, -] → +

quant: [n≥0] → 3

heur: [CR-DATE | PR-DATE] → PR-DATE
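The attribute assignment for “oltre tre anni dopo” can be sketched as a lookup over trigger tables. The tables below are assumptions restricted to what this one example needs: the slides do not show the actual rule sets, nor how “op”, “heur” and “type” are derived, so those mappings are illustrative guesses chosen to reproduce the values on the slide.

```python
# Trigger tables limited to the worked example; all mappings are assumptions
# except "oltre" -> MORE_THAN and "dopo" -> ENDING, which the slide shows.
RULES = [
    ("MOD", {"oltre": "MORE_THAN"}),
    ("ANCHOR_DIR", {"dopo": "ENDING"}),
    ("t-cat", {"anno": "YEAR", "anni": "YEAR"}),
    ("quant", {"uno": 1, "due": 2, "tre": 3}),
    # Assumption: "dopo" signals a forward shift relative to a previous date.
    ("op", {"dopo": "+"}),
    ("heur", {"dopo": "PR-DATE"}),
]

def gather(te):
    """Collect TIMEX2 and temporary attributes from the words of a detected TE."""
    attrs = {}
    for word in te.lower().split():
        for name, table in RULES:
            if word in table:
                attrs[name] = table[word]
    # Assumption: a TE that carries an operator is relative (T-REL).
    attrs["type"] = "T-REL" if "op" in attrs else "T-ABS"
    return attrs

attrs = gather("oltre tre anni dopo")
print(attrs)
```

The resulting dictionary corresponds to the attributes written into the Intermediate Annotation for this expression.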

Page 18: Dealing with Italian Temporal Expressions: the ITA-Chronos System

Intermediate Annotation: Example

Plain Text (adige20041007_id413938)

“…Così il 31 Luglio del 2002, quindi oltre tre anni dopo l’incidente, il giovane venne nuovamente ricoverato e sottoposto ad un intervento che si dimostrerà risolutivo…”

Detection and Bracketing

Intermediate Annotation

…quindi <TIMEX2 MOD=“MORE_THAN” ANCHOR_DIR=“ENDING” type=“T-REL” t-cat=“YEAR” op=“+” quant=“3” heur=“PR-DATE”>oltre tre anni dopo</TIMEX2> l’incidente…

Page 19: Dealing with Italian Temporal Expressions: the ITA-Chronos System

STEP5: Anchors Selection

• Goal: connect each detected T-REL to an appropriate anchor date

– While the meaning of T-ABSs (“13 Marzo 2005”) is context-independent, T-RELs (“tre anni dopo”) can only be interpreted with respect to a reference TE

• The “heur” attribute is used for this purpose

– 2 heuristics:

CR-DATE: connects a T-REL to the document’s creation date (found at the beginning of the document, or induced from the document’s name, e.g. “adige20041007_…”)

PR-DATE: connects a T-REL to the nearest detected TE with a compatible granularity (a “t-cat” with at least the same degree of specificity)

Example: for t-cat = “month”, anchors with t-cat “month”, “week”, or “day” are compatible; “century” is not
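Both heuristics can be sketched in Python. The following is an illustration under stated assumptions, not the system's implementation: the `SPECIFICITY` ordering, the YYYYMMDD pattern for document names, the (position, t-cat, value) representation of candidate TEs, and the use of absolute distance for "nearest" are all choices made here for the example.

```python
import re
from datetime import date

# Granularities ordered from coarse to fine. The ordering is an assumption;
# "year" is added because the worked example in the slides needs it.
SPECIFICITY = ["century", "year", "month", "week", "day", "hour", "minute", "second"]

def creation_date(doc_name):
    """CR-DATE: read a YYYYMMDD stamp out of a document name, if present."""
    m = re.search(r"(\d{4})(\d{2})(\d{2})", doc_name)
    if m is None:
        return None
    return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))

def compatible(rel_cat, anchor_cat):
    """An anchor is usable if it is at least as specific as the T-REL."""
    return SPECIFICITY.index(anchor_cat) >= SPECIFICITY.index(rel_cat)

def previous_date(rel_pos, rel_cat, candidates):
    """PR-DATE: pick the nearest compatible TE.

    candidates is a list of (token_position, t_cat, value) tuples.
    """
    usable = [c for c in candidates if compatible(rel_cat, c[1])]
    return min(usable, key=lambda c: abs(c[0] - rel_pos), default=None)

doc_date = creation_date("adige20041007_id413938")
# "31 Luglio del 2002" at token 3 (day granularity), T-REL at token 8 (year).
anchor = previous_date(8, "year", [(3, "day", "2002-07-31")])
print(doc_date, anchor)
```

With these assumptions, a month-level T-REL accepts month, week, and day anchors and rejects a century-level one, as in the compatibility example on the slide.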

Page 20: Dealing with Italian Temporal Expressions: the ITA-Chronos System

STEP6: Dates Normalization

• Goal: fill the VAL attribute of each detected TE

T-ABSs: regular expressions considering their surface form (“1990s” → “199”)

T-RELs: rewriting rules considering

– the anchor (e.g. “2002”)

– the operator (“OP”) to be applied (e.g. “+”)

– the quantity (“QUANT”) to be added/subtracted (e.g. “3”)

Example: “tre anni dopo” with anchor “2002”, operator “+”, quantity “3” → “2005”
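The two normalization cases can be sketched as follows. This is a minimal illustration covering only the decade pattern and year-level arithmetic from the slide; the function names are hypothetical, and the actual rewriting rules presumably handle many more surface forms and granularities.

```python
import re

def normalize_decade(te):
    """T-ABS: map a decade such as "1990s" to its truncated VAL ("199")."""
    m = re.fullmatch(r"(\d{3})0s", te)
    return m.group(1) if m else None

def normalize_relative(anchor_year, op, quant):
    """T-REL with t-cat YEAR: apply the operator to the anchor year."""
    year = int(anchor_year)
    if op == "+":
        year += quant
    elif op == "-":
        year -= quant
    # op == "=" leaves the anchor unchanged
    return f"{year:04d}"

decade_val = normalize_decade("1990s")
rel_val = normalize_relative("2002", "+", 3)
print(decade_val, rel_val)
```

Here `rel_val` reproduces the slide's example, where the anchor “2002”, the operator “+” and the quantity “3” yield “2005”.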

Page 21: Dealing with Italian Temporal Expressions: the ITA-Chronos System

ITA-Chronos at EVALITA 2007

• Results over the EVALITA-07 test set (27’15’’ computation time, ~50 words/sec)

             Value   Precision   Recall   F-Measure
Rec.         85.7    95.7        89.8     92.6
Rec.+Norm.   61.9    68.5        66.3     67.4

• Higher scores on MOD and SET attributes

– Activated by the presence of triggers that are easy to identify

• Lower scores with ANCHOR_VAL and ANCHOR_DIR

– Require the analysis of a larger context, e.g. including verb tense
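As a quick consistency check on the reported scores, the F-Measure column can be recomputed from precision and recall. The sketch assumes the balanced F-measure (harmonic mean of precision and recall), which the slides do not state explicitly.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

rec = f_measure(95.7, 89.8)       # Recognition
rec_norm = f_measure(68.5, 66.3)  # Recognition + Normalization
print(round(rec, 2), round(rec_norm, 2))
```

Both values agree with the reported 92.6 and 67.4 to within 0.1; the small difference on the recognition score is presumably due to the official figure being computed from unrounded precision and recall.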

Page 22: Dealing with Italian Temporal Expressions: the ITA-Chronos System

Web Demo

http://www.qallme.itc.it/server/chronos/italian