time for text mining and information retrievaljstroetge/talks/jannik-stroetgen... · text mining...

159
Time for Text Mining and Information Retrieval Jannik Strötgen [email protected] Joint Lecture Series – Saarbrücken – January 11, 2017

Upload: trandien

Post on 06-Mar-2018

219 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Time forText Mining and Information Retrieval

Jannik Strötgen

[email protected]

Joint Lecture Series – Saarbrücken – January 11, 2017

Page 2: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why temporal information?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 1 / 50

Page 3: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Motivating example

From: [email protected]: [email protected]: January 11, 2017

Subject: meeting

Dear Prof,

I am sorry that I couldn’tmake it on Monday. Butcould we meet this Friday?I don’t care about the time.What about 2 pm?

Thank you and kind regards,student

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 2 / 50

Page 4: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Motivating example

From: [email protected]: [email protected]: January 11, 2017

Subject: meeting

Dear Prof,

I am sorry that I couldn’tmake it on Monday. Butcould we meet this Friday?I don’t care about the time.What about 2 pm?

Thank you and kind regards,student

JANUARY09 Monday10 Tuesday

11 Wednesday

12 Thursday13 Friday14 Saturday15 Sunday16 Monday

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 2 / 50

Page 5: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Motivating example

From: [email protected]: [email protected]: January 11, 2017

Subject: meeting

Dear Prof,

I am sorry that I couldn’tmake it on Monday. Butcould we meet this Friday?I don’t care about the time.What about 2 pm?

Thank you and kind regards,student

JANUARY09 Monday10 Tuesday11 Wednesday12 Thursday

13 Friday

2 pm – 2:30 pm

student

14 Saturday15 Sunday16 Monday

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 2 / 50

Page 6: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Motivating example

From: [email protected]: [email protected]: January 11, 2017

Subject: meeting

Dear Prof,

I am sorry that I couldn’tmake it on Monday. Butcould we meet this Friday?I don’t care about the time.What about 2 pm?

Thank you and kind regards,student

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 2 / 50

Page 7: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Motivating example

From: [email protected]: [email protected]: January 11, 2017

Subject: meeting

Dear Prof,

I am sorry that I couldn’tmake it on Monday. But

could we meet this Friday ?

I don’t care about the time.What about 2 pm?

Thank you and kind regards,student

JANUARY09 Monday10 Tuesday11 Wednesday12 Thursday

13 Friday

14 Saturday15 Sunday16 Monday

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 2 / 50

Page 8: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Motivating example

From: [email protected]: [email protected]: January 11, 2017

Subject: meeting

Dear Prof,

I am sorry that I couldn’tmake it on Monday. But

could we meet this Friday ?

I don’t care about the time.What about 2 pm?

Thank you and kind regards,student

JANUARY09 Monday10 Tuesday11 Wednesday12 Thursday

13 Friday

2 pm – 2:30 pm

student

14 Saturday15 Sunday16 Monday

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 2 / 50

Page 9: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: crucial role in many types of documents

news articles Wikipedia documents

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 3 / 50

Page 10: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: crucial role in many types of documents

short messages scientific publications literary works

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 4 / 50

Page 11: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: important key characteristicstime is well-defined:

expressions can becompared with each other

time can be organizedhierarchically:

different granularities

relations as defined in Allen’sinterval algebra [Allen, 1983]

2000 before March 2001equalduring...

...

2015 2016

2016-03

... 2016-03-12

2016-04 ...

2017

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 5 / 50

Page 12: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: important key characteristicstime is well-defined:

expressions can becompared with each other

time can be organizedhierarchically:

different granularities

relations as defined in Allen’sinterval algebra [Allen, 1983]

2000 before March 2001equalduring...

...

2015 2016

2016-03

... 2016-03-12

2016-04 ...

2017

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 5 / 50

Page 13: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: important key characteristicstime is well-defined:

expressions can becompared with each other

time can be organizedhierarchically:

different granularities

relations as defined in Allen’sinterval algebra [Allen, 1983]

2000 before March 2001equalduring...

...

2015 2016

2016-03

... 2016-03-12

2016-04 ...

2017

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 5 / 50

Page 14: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: important key characteristicstime is well-defined:

expressions can becompared with each other

time can be organizedhierarchically:

different granularities

relations as defined in Allen’sinterval algebra [Allen, 1983]

2000 before March 2001equalduring...

...

2015 2016

2016-03

... 2016-03-12

2016-04 ...

2017

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 5 / 50

Page 15: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: important key characteristics

temporal information can be normalized:expressions with same semantics → same value

temporal information is term- and language-independent

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 6 / 50

Page 16: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: important key characteristics

temporal information can be normalized:expressions with same semantics → same value

t2017-01-11

tref2017-01-10

tomorrow

temporal information is term- and language-independent

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 6 / 50

Page 17: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: important key characteristics

temporal information can be normalized:expressions with same semantics → same value

t2017-01-11

tref2017-01-10

tomorrowJanuary 11, 2017

temporal information is term- and language-independent

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 6 / 50

Page 18: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: important key characteristics

temporal information can be normalized:expressions with same semantics → same value

t2017-01-11

tref2017-01-10 2017-01-16 2017-02-11

tomorrowlast Wednesday

one month ago

January 11, 2017

temporal information is term- and language-independent

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 6 / 50

Page 19: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: important key characteristics

temporal information can be normalized:expressions with same semantics → same value

t2017-01-11

tref2017-01-10 2017-01-11 2017-01-16 2017-02-11

tomorrow

today

heute

hoy

last Wednesday

one month ago

January 11, 2017

temporal information is term- and language-independent

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 6 / 50

Page 20: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: valuable for many applications

information retrieval (search engines)information extraction

text summarizationmachine translationquestion answering. . .

prerequisite: temporal tagging

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 7 / 50

Page 21: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: valuable for many applications

information retrieval (search engines)information extractiontext summarizationmachine translationquestion answering. . .

prerequisite: temporal tagging

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 7 / 50

Page 22: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time: valuable for many applications

information retrieval (search engines)information extractiontext summarizationmachine translationquestion answering. . .

prerequisite: temporal tagging

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 7 / 50

Page 23: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Questions addressed in this talk

Why temporal information?

What’s temporal tagging?

Why domain-sensitive temporal tagging?

What applications can exploit temporal information and how?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 8 / 50

Page 24: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Questions addressed in this talk

Why temporal information?

What’s temporal tagging?

Why domain-sensitive temporal tagging?

What applications can exploit temporal information and how?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 8 / 50

Page 25: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The two tasks of a temporal tagger

1. extraction of temporal expressions

main challengeambiguities, e.g.,

may, march, fall, 2000

May I march 2000 milesbefore I fall?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 9 / 50

Page 26: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The two tasks of a temporal tagger

1. extraction of temporal expressions

main challengeambiguities, e.g.,

may, march, fall, 2000

May I march 2000 milesbefore I fall?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 9 / 50

Page 27: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The two tasks of a temporal tagger

1. extraction of temporal expressions

main challengeambiguities, e.g.,

may, march, fall, 2000

May I march 2000 milesbefore I fall?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 9 / 50

Page 28: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The two tasks of a temporal tagger

1. extraction of temporal expressions2. normalization of temporal expressions

tonight → 2011-09-20TNIyesterday → 2011-09-19next week → 2011-W39Sept. 20, 2011 → 2011-09-20next month → 2011-10

main challengenormalization of relative andunderspecified expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 10 / 50

Page 29: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The two tasks of a temporal tagger

1. extraction of temporal expressions2. normalization of temporal expressions

tonight → 2011-09-20TNIyesterday → 2011-09-19next week → 2011-W39Sept. 20, 2011 → 2011-09-20next month → 2011-10

main challengenormalization of relative andunderspecified expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 10 / 50

Page 30: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Types of temporal expressions

dates– June 2001

times– yesterday morning

durations– several months

sets– every day

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 11 / 50

Page 31: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Realizations of date expressions

explicit → easy to normalize– June 24, 2013– the 20th century

implicit

– Christmas 2012– Columbus Day 2006

relative

– two weeks later– yesterday

underspecified

– Monday– June 24

often challengingnormalization of relative and underspecified expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 12 / 50

Page 32: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Realizations of date expressions

explicit → easy to normalize– June 24, 2013– the 20th century

implicit → additional knowledge required– Christmas 2012– Columbus Day 2006

relative

– two weeks later– yesterday

underspecified

– Monday– June 24

often challengingnormalization of relative and underspecified expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 12 / 50

Page 33: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Realizations of date expressions

explicit → easy to normalize– June 24, 2013– the 20th century

implicit → additional knowledge required– Christmas 2012– Columbus Day 2006

relative → reference time required– two weeks later– yesterday

underspecified

– Monday– June 24

often challengingnormalization of relative and underspecified expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 12 / 50

Page 34: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Realizations of date expressions

explicit → easy to normalize– June 24, 2013– the 20th century

implicit → additional knowledge required– Christmas 2012– Columbus Day 2006

relative → reference time required– two weeks later– yesterday

underspecified → reference time and relation to it required– Monday– June 24

often challengingnormalization of relative and underspecified expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 12 / 50

Page 35: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Realizations of date expressions

explicit → easy to normalize– June 24, 2013– the 20th century

implicit → additional knowledge required– Christmas 2012– Columbus Day 2006

relative → reference time required– two weeks later– yesterday

underspecified → reference time and relation to it required– Monday– June 24

often challengingnormalization of relative and underspecified expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 12 / 50

Page 36: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging example

Document Creation Time: 2000-12-26. . . On Thursday , the Census Bureau will publish the official pop-ulation count for the United States, including the state-by-state to-tals required under the Constitution to determine how many seatseach state is allocated in the House. The figures, eagerly awaitedby many state government officials, are the first in a wave of re-leases of demographic data based on the 2000 census. . . .Population estimates issued periodically by the Census Bureau in-dicate that as of October , 275,843,000 people were living in . . .Additional seats are then assigned to each state based on aperson-to-House-member ratio that changes every 10 years be-cause the country’s population keeps growing . . .

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 13 / 50

Page 37: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging example

Document Creation Time: 2000-12-26. . . On Thursday , the Census Bureau will publish the official pop-ulation count for the United States, including the state-by-state to-tals required under the Constitution to determine how many seatseach state is allocated in the House. The figures, eagerly awaitedby many state government officials, are the first in a wave of re-leases of demographic data based on the 2000 census. . . .Population estimates issued periodically by the Census Bureau in-dicate that as of October , 275,843,000 people were living in . . .Additional seats are then assigned to each state based on aperson-to-House-member ratio that changes every 10 years be-cause the country’s population keeps growing . . .

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 13 / 50

Page 38: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging example

Document Creation Time: 2000-12-26. . . On Thursday , the Census Bureau will publish the official pop-ulation count for the United States, including the state-by-state to-tals required under the Constitution to determine how many seatseach state is allocated in the House. The figures, eagerly awaitedby many state government officials, are the first in a wave of re-leases of demographic data based on the 2000 census. . . .Population estimates issued periodically by the Census Bureau in-dicate that as of October , 275,843,000 people were living in . . .Additional seats are then assigned to each state based on aperson-to-House-member ratio that changes every 10 years be-cause the country’s population keeps growing . . .

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 13 / 50

Page 39: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging example

Document Creation Time: 2000-12-26. . . On Thursday , the Census Bureau will publish the official pop-ulation count for the United States, including the state-by-state to-tals required under the Constitution to determine how many seatseach state is allocated in the House. The figures, eagerly awaitedby many state government officials, are the first in a wave of re-leases of demographic data based on the 2000 census. . . .Population estimates issued periodically by the Census Bureau in-dicate that as of October , 275,843,000 people were living in . . .Additional seats are then assigned to each state based on aperson-to-House-member ratio that changes every 10 years be-cause the country’s population keeps growing . . .

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 13 / 50

Page 40: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging example

Document Creation Time: 2000-12-26. . . On Thursday , the Census Bureau will publish the official pop-ulation count for the United States, including the state-by-state to-tals required under the Constitution to determine how many seatseach state is allocated in the House. The figures, eagerly awaitedby many state government officials, are the first in a wave of re-leases of demographic data based on the 2000 census. . . .Population estimates issued periodically by the Census Bureau in-dicate that as of October , 275,843,000 people were living in . . .Additional seats are then assigned to each state based on aperson-to-House-member ratio that changes every 10 years be-cause the country’s population keeps growing . . .

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 13 / 50

Page 41: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of news(-style) documents

normalization typically with document creation time

Document Creation Time: 2000-12-26. . . On Thursday , the Census Bureau will publish the official pop-ulation count for the United States, including the state-by-state to-tals required under the Constitution to determine how many seatseach state is allocated in the House. The figures, eagerly awaitedby many state government officials, are the first in a wave of re-leases of demographic data based on the 2000 census. . . .Population estimates issued periodically by the Census Bureau in-dicate that as of October , 275,843,000 people were living in . . .Additional seats are then assigned to each state based on aperson-to-House-member ratio that changes every 10 years be-cause the country’s population keeps growing . . .

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 14 / 50

Page 42: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Questions addressed in this talk

Why temporal information?

What’s temporal tagging?

Why domain-sensitive temporal tagging?

What applications can exploit temporal information and how?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 15 / 50

Page 43: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of narrative documents

Soviet-Afghan War 2017-01-101979: Soviet deploymentThe Afghan government, having secured a treaty in the spring and summerof 1979. They requested Soviet troops to provide security and to assist inthe fight against the mujahideen rebels. On April 14, 1979, the Afghan gov-ernment requested that the USSR send 15 to 20 helicopters with their crewsto Afghanistan, and on June 16 . . . The operation was fully complete by themorning of December 28, 1979. . . . According to the Soviet Politburo theywere complying with the 1978 Treaty of Friendship, Cooperation and GoodNeighborliness . . . Soviet ground forces, under the command of Marshal SergeiSokolov, entered Afghanistan from the north on December 27. In the morning,the 103rd Guards . . .

WikiWars corpus, 18 SovietsInAfghanistan.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 16 / 50

Page 44: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of narrative documents

Soviet-Afghan War 2017-01-10

1979 : Soviet deployment

The Afghan government, having secured a treaty in the spring and

summer of 1979 . They requested Soviet troops to provide security and

to assist in the fight against the mujahideen rebels. On April 14, 1979 ,

the Afghan government requested that the USSR send 15 to 20 helicopters

with their crews to Afghanistan, and on June 16 . . . The operation was fully

complete by the morning of December 28, 1979 . . . . According to the

Soviet Politburo they were complying with the 1978 Treaty of Friendship,Cooperation and Good Neighborliness . . . Soviet ground forces, under thecommand of Marshal Sergei Sokolov, entered Afghanistan from the north on

December 27 . In the morning , the 103rd Guards . . .

WikiWars corpus, 18 SovietsInAfghanistan.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 16 / 50

Page 45: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of narrative documents

reference time identification in the text required

Soviet-Afghan War 2017-01-10

1979 : Soviet deployment

The Afghan government, having secured a treaty in the spring and

summer of 1979 . They requested Soviet troops to provide security and

to assist in the fight against the mujahideen rebels. On April 14, 1979 ,

the Afghan government requested that the USSR send 15 to 20 helicopters

with their crews to Afghanistan, and on June 16 . . . The operation was fully

complete by the morning of December 28, 1979 . . . . According to the

Soviet Politburo they were complying with the 1978 Treaty of Friendship,Cooperation and Good Neighborliness . . . Soviet ground forces, under thecommand of Marshal Sergei Sokolov, entered Afghanistan from the north on

December 27 . In the morning , the 103rd Guards . . .

WikiWars corpus, 18 SovietsInAfghanistan.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 16 / 50

Page 46: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of narrative documents

reference time identification in the text required

Soviet-Afghan War 2017-01-10

1979 : Soviet deployment

The Afghan government, having secured a treaty in the spring and

summer of 1979 . They requested Soviet troops to provide security and

to assist in the fight against the mujahideen rebels. On April 14, 1979 ,

the Afghan government requested that the USSR send 15 to 20 helicopters

with their crews to Afghanistan, and on June 16 . . . The operation was fully

complete by the morning of December 28, 1979 . . . . According to the

Soviet Politburo they were complying with the 1978 Treaty of Friendship,Cooperation and Good Neighborliness . . . Soviet ground forces, under thecommand of Marshal Sergei Sokolov, entered Afghanistan from the north on

December 27 . In the morning , the 103rd Guards . . .

WikiWars corpus, 18 SovietsInAfghanistan.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 16 / 50

Page 47: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of narrative documents

reference time identification in the text required

Soviet-Afghan War 2017-01-10

1979 : Soviet deployment

The Afghan government, having secured a treaty in the spring and

summer of 1979 . They requested Soviet troops to provide security and

to assist in the fight against the mujahideen rebels. On April 14, 1979 ,

the Afghan government requested that the USSR send 15 to 20 helicopters

with their crews to Afghanistan, and on June 16 . . . The operation was fully

complete by the morning of December 28, 1979 . . . . According to the

Soviet Politburo they were complying with the 1978 Treaty of Friendship,Cooperation and Good Neighborliness . . . Soviet ground forces, under thecommand of Marshal Sergei Sokolov, entered Afghanistan from the north on

December 27 . In the morning , the 103rd Guards . . .

WikiWars corpus, 18 SovietsInAfghanistan.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 16 / 50

Page 48: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of narrative documents

reference time identification in the text required

Soviet-Afghan War 2017-01-10

1979 : Soviet deployment

The Afghan government, having secured a treaty in the spring and

summer of 1979 . They requested Soviet troops to provide security and

to assist in the fight against the mujahideen rebels. On April 14, 1979 ,

the Afghan government requested that the USSR send 15 to 20 helicopters

with their crews to Afghanistan, and on June 16 . . . The operation was fully

complete by the morning of December 28, 1979 . . . . According to the

Soviet Politburo they were complying with the 1978 Treaty of Friendship,Cooperation and Good Neighborliness . . . Soviet ground forces, under thecommand of Marshal Sergei Sokolov, entered Afghanistan from the north on

December 27 . In the morning , the 103rd Guards . . .

WikiWars corpus, 18 SovietsInAfghanistan.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 16 / 50

Page 49: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of colloquial documents

(a) Yo! Rem to come for lab tmr:-) no need bring anything rite? 2010-09-23

Time4SMS corpus, 0019314.key.xml

(b) Whats it u wanted 2 say last nite? Did anna contact u? 2010-01-10

Time4SMS corpus, 0027197.key.xml

(c) Dear people, andy ng is availableat 10 am in his office. 2011-02-16T12:42

Time4SMS corpus, 0024336.key.xml

(d) Meet sunshine place on Thurs? 2010-10-05

Time4SMS corpus, 0019442.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 17 / 50

Page 50: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of colloquial documents

(a) Yo! Rem to come for lab tmr :-) no need bring anything rite? 2010-09-23

Time4SMS corpus, 0019314.key.xml

(b) Whats it u wanted 2 say last nite ? Did anna contact u? 2010-01-10

Time4SMS corpus, 0027197.key.xml

(c) Dear people, andy ng is availableat 10 am in his office. 2011-02-16T12:42

Time4SMS corpus, 0024336.key.xml

(d) Meet sunshine place on Thurs ? 2010-10-05

Time4SMS corpus, 0019442.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 17 / 50

Page 51: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of colloquial documents

(a) Yo! Rem to come for lab tmr :-) no need bring anything rite? 2010-09-23

Time4SMS corpus, 0019314.key.xml

(b) Whats it u wanted 2 say last nite ? Did anna contact u? 2010-01-10

Time4SMS corpus, 0027197.key.xml

(c) Dear people, andy ng is availableat 10 am in his office. 2011-02-16T12:42

Time4SMS corpus, 0024336.key.xml

(d) Meet sunshine place on Thurs ? 2010-10-05

Time4SMS corpus, 0019442.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 17 / 50

Page 52: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of colloquial documents

noisy language and missing context information

(a) Yo! Rem to come for lab tmr :-) no need bring anything rite? 2010-09-23

Time4SMS corpus, 0019314.key.xml

(b) Whats it u wanted 2 say last nite ? Did anna contact u? 2010-01-10

Time4SMS corpus, 0027197.key.xml

(c) Dear people, andy ng is availableat 10 am in his office. 2011-02-16T12:42

Time4SMS corpus, 0024336.key.xml

(d) Meet sunshine place on Thurs ? 2010-10-05

Time4SMS corpus, 0019442.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 17 / 50

Page 53: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of scientific documents

Supplementation with all three macular carotenoids 2011-10-06Subjects consumed one tablet per day containing 10.6 mg MZ, 5.9 mg L and 1.2mg Z (Intervention, I group) or placebo (P group). . . . Subjects were assessedat baseline, two and four months. Clinical pathology analysis was performed atbaseline and four months.

Time4SCI corpus, 21979997.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 18 / 50

Page 54: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of scientific documents

Supplementation with all three macular carotenoids 2011-10-06

Subjects consumed one tablet per day containing 10.6 mg MZ, 5.9 mg L and

1.2 mg Z (Intervention, I group) or placebo (P group). . . . Subjects were as-

sessed at baseline , two and four months . Clinical pathology analysis was

performed at baseline and four months .

Time4SCI corpus, 21979997.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 18 / 50

Page 55: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of scientific documents

no real points in time, but local time frame

Supplementation with all three macular carotenoids 2011-10-06

Subjects consumed one tablet per day containing 10.6 mg MZ, 5.9 mg L and

1.2 mg Z (Intervention, I group) or placebo (P group). . . . Subjects were as-

sessed at baseline , two and four months . Clinical pathology analysis was

performed at baseline and four months .

Time4SCI corpus, 21979997.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 18 / 50

Page 56: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why domain-sensitive temporal tagging?

domaingroup of documents with

same characteristics relevant for temporal tagging

news- news articles- letters- formal emails

narrative- Wikipedia- descriptive

documents

colloquial- SMS- tweets- user-g. content

autonomic- scientific

documents- literary works

differences between documents of different domainsnormalization of relative and underspecified expressionstype of language (formal, scientific, historic, colloquial)different types of temporal expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 19 / 50

Page 57: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why domain-sensitive temporal tagging?

domaingroup of documents with

same characteristics relevant for temporal tagging

news- news articles- letters- formal emails

narrative- Wikipedia- descriptive

documents

colloquial- SMS- tweets- user-g. content

autonomic- scientific

documents- literary works

differences between documents of different domainsnormalization of relative and underspecified expressionstype of language (formal, scientific, historic, colloquial)different types of temporal expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 19 / 50

Page 58: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why domain-sensitive temporal tagging?

domaingroup of documents with

same characteristics relevant for temporal tagging

news- news articles- letters- formal emails

narrative- Wikipedia- descriptive

documents

colloquial- SMS- tweets- user-g. content

autonomic- scientific

documents- literary works

differences between documents of different domainsnormalization of relative and underspecified expressionstype of language (formal, scientific, historic, colloquial)different types of temporal expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 19 / 50

Page 59: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why domain-sensitive temporal tagging?

domaingroup of documents with

same characteristics relevant for temporal tagging

news- news articles- letters- formal emails

narrative- Wikipedia- descriptive

documents

colloquial- SMS- tweets- user-g. content

autonomic- scientific

documents- literary works

differences between documents of different domainsnormalization of relative and underspecified expressionstype of language (formal, scientific, historic, colloquial)different types of temporal expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 19 / 50

Page 60: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why domain-sensitive temporal tagging?

domaingroup of documents with

same characteristics relevant for temporal tagging

news- news articles- letters- formal emails

narrative- Wikipedia- descriptive

documents

colloquial- SMS- tweets- user-g. content

autonomic- scientific

documents- literary works

differences between documents of different domainsnormalization of relative and underspecified expressionstype of language (formal, scientific, historic, colloquial)different types of temporal expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 19 / 50

Page 61: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why domain-sensitive temporal tagging?

domaingroup of documents with

same characteristics relevant for temporal tagging

news- news articles- letters- formal emails

narrative- Wikipedia- descriptive

documents

colloquial- SMS- tweets- user-g. content

autonomic- scientific

documents- literary works

differences between documents of different domainsnormalization of relative and underspecified expressionstype of language (formal, scientific, historic, colloquial)

different types of temporal expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 19 / 50

Page 62: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why domain-sensitive temporal tagging?

domaingroup of documents with

same characteristics relevant for temporal tagging

news- news articles- letters- formal emails

narrative- Wikipedia- descriptive

documents

colloquial- SMS- tweets- user-g. content

autonomic- scientific

documents- literary works

differences between documents of different domainsnormalization of relative and underspecified expressionstype of language (formal, scientific, historic, colloquial)different types of temporal expressions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 19 / 50

Page 63: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why domain-sensitive temporal tagging?

corpus comparisonwith manually annotated corpora of four domains

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 20 / 50

Page 64: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why domain-sensitive temporal tagging?

corpus comparisonwith manually annotated corpora of four domains

0

20

40

60

80

100

Date Time Duration Set

HALLOHAlloHalloHallo

HALLOhallo

Occ

urre

nces

[%] news

narrative

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 20 / 50

Page 65: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why domain-sensitive temporal tagging?

corpus comparisonwith manually annotated corpora of four domains

0

20

40

60

80

100

Date Time Duration Set

HALLOHAlloHalloHallo

HALLOhallo

Occ

urre

nces

[%] news

narrativecolloquial

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 20 / 50

Page 66: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Why domain-sensitive temporal tagging?

corpus comparisonwith manually annotated corpora of four domains

0

20

40

60

80

100

Date Time Duration Set

HALLOHAlloHalloHallo

HALLOhallo

Occ

urre

nces

[%] news

narrativecolloquialscientific

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 20 / 50

Page 67: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

State-of-the-art in temporal tagging 2010

some research on temporal taggingannotation standards (e.g., TimeML [Pustejovsky et al. (2005)])

temporal taggers– only few available (e.g., GUTime [Verhagen & Pustejovsky (2008)])– if available, hardly extensible– focus on English– research only on news documents

there was a need for a domain-sensitive temporal tagger

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 21 / 50

Page 68: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

State-of-the-art in temporal tagging 2010

some research on temporal taggingannotation standards (e.g., TimeML [Pustejovsky et al. (2005)])

temporal taggers– only few available (e.g., GUTime [Verhagen & Pustejovsky (2008)])– if available, hardly extensible– focus on English– research only on news documents

there was a need for a domain-sensitive temporal tagger

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 21 / 50

Page 69: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

State-of-the-art in temporal tagging 2010

some research on temporal taggingannotation standards (e.g., TimeML [Pustejovsky et al. (2005)])

temporal taggers– only few available (e.g., GUTime [Verhagen & Pustejovsky (2008)])– if available, hardly extensible– focus on English– research only on news documents

there was a need for a domain-sensitive temporal tagger

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 21 / 50

Page 70: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 22 / 50

Page 71: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Remember: domain-sensitive normalizationWhy Time? Temporal Tagging HeidelTime Applications Temponyms Summary

Temporal tagging of news(-style) documents

normalization typically with document creation time

Document Creation Time: 2000-12-26. . . On Thursday , the Census Bureau will publish the official pop-ulation count for the United States, including the state-by-state to-tals required under the Constitution to determine how many seatseach state is allocated in the House. The figures, eagerly awaitedby many state government officials, are the first in a wave of re-leases of demographic data based on the 2000 census. . . .Population estimates issued periodically by the Census Bureau in-dicate that as of October , 275,843,000 people were living in . . .Additional seats are then assigned to each state based on aperson-to-House-member ratio that changes every 10 years be-cause the country’s population keeps growing . . .

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 14 / 50

Why Time? Temporal Tagging HeidelTime Applications Temponyms Summary

Temporal tagging of colloquial documents

noisy language and missing context information

(a) Yo! Rem to come for lab tmr :-) no need bring anything rite? 2010-09-23

Time4SMS corpus, 0019314.key.xml

(b) Whats it u wanted 2 say last nite ? Did anna contact u? 2010-01-10

Time4SMS corpus, 0027197.key.xml

(c) Dear people, andy ng is availableat 10 am in his office. 2011-02-16T12:42

Time4SMS corpus, 0024336.key.xml

(d) Meet sunshine place on Thurs ? 2010-10-05

Time4SMS corpus, 0019442.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 17 / 50

Why Time? Temporal Tagging HeidelTime Applications Temponyms Summary

Temporal tagging of narrative documents

reference time identification in the text required

Soviet-Afghan War 2017-01-10

1979 : Soviet deployment

The Afghan government, having secured a treaty in the spring and

summer of 1979 . They requested Soviet troops to provide security and

to assist in the fight against the mujahideen rebels. On April 14, 1979 ,

the Afghan government requested that the USSR send 15 to 20 helicopters

with their crews to Afghanistan, and on June 16 . . . The operation was fully

complete by the morning of December 28, 1979 . . . . According to the

Soviet Politburo they were complying with the 1978 Treaty of Friendship,Cooperation and Good Neighborliness . . . Soviet ground forces, under thecommand of Marshal Sergei Sokolov, entered Afghanistan from the north on

December 27 . In the morning , the 103rd Guards . . .

WikiWars corpus, 18 SovietsInAfghanistan.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 16 / 50

Why Time? Temporal Tagging HeidelTime Applications Temponyms Summary

Temporal tagging of scientific documents

no real points in time, but local time frame

Supplementation with all three macular carotenoids 2011-10-06

Subjects consumed one tablet per day containing 10.6 mg MZ, 5.9 mg L and

1.2 mg Z (Intervention, I group) or placebo (P group). . . . Subjects were as-

sessed at baseline , two and four months . Clinical pathology analysis was

performed at baseline and four months .

Time4SCI corpus, 21979997.key.xml

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 18 / 50

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 22 / 50

Page 72: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

reference time detection for underspecified expressions

underspecified(e.g., “December”)

news, colloquial

DCT

narrative

previouslymentionedexpression

autonomic

local time frame(previous expression

or new TPZ)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 23 / 50

Page 73: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

reference time detection for underspecified expressions

underspecified(e.g., “December”)

news, colloquial

DCT

narrative

previouslymentionedexpression

autonomic

local time frame(previous expression

or new TPZ)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 23 / 50

Page 74: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

reference time detection for underspecified expressions

underspecified(e.g., “December”)

news, colloquial

DCT

narrative

previouslymentionedexpression

autonomic

local time frame(previous expression

or new TPZ)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 23 / 50

Page 75: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

reference time detection for underspecified expressions

underspecified(e.g., “December”)

news, colloquial

DCT

narrative

previouslymentionedexpression

autonomic

local time frame(previous expression

or new TPZ)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 23 / 50

Page 76: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

reference time detection for relative expressions

relative

context-independent(e.g., “today”)

news,colloquial

DCT

narrative

previouslymentionedexpression

autonomic

local time frame(previous expres-sion or new TPZ)

context-dependent(e.g., “the following day”)

autonomic

local time frame(previous expres-sion or new TPZ)

news,colloquial,narrativepreviouslymentionedexpression

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 24 / 50

Page 77: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

relation between underspecified expression and reference time

underspecified(e.g., “December”)

news, colloquial(ref=DCT)

no tense

news

before

colloquial

after

past

before

present/ future

after

narrative(ref 6=DCT)

after

autonomic(local time frame)

after(if not new TPZ)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 25 / 50

Page 78: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

relation between underspecified expression and reference time

underspecified(e.g., “December”)

news, colloquial(ref=DCT)

no tense

news

before

colloquial

after

past

before

present/ future

after

narrative(ref 6=DCT)

after

autonomic(local time frame)

after(if not new TPZ)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 25 / 50

Page 79: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

relation between underspecified expression and reference time

underspecified(e.g., “December”)

news, colloquial(ref=DCT)

no tense

news

before

colloquial

after

past

before

present/ future

after

narrative(ref 6=DCT)

after

autonomic(local time frame)

after(if not new TPZ)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 25 / 50

Page 80: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

relation between underspecified expression and reference time

underspecified(e.g., “December”)

news, colloquial(ref=DCT)

no tense

news

before

colloquial

after

past

before

present/ future

after

narrative(ref 6=DCT)

after

autonomic(local time frame)

after(if not new TPZ)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 25 / 50

Page 81: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

relation between underspecified expression and reference time

underspecified(e.g., “December”)

news, colloquial(ref=DCT)

no tense

news

before

colloquial

after

past

before

present/ future

after

narrative(ref 6=DCT)

after

autonomic(local time frame)

after(if not new TPZ)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 25 / 50

Page 82: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Domain-sensitive normalization strategies

relation between underspecified expression and reference time

underspecified(e.g., “December”)

news, colloquial(ref=DCT)

no tense

news

before

colloquial

after

past

before

present/ future

after

narrative(ref 6=DCT)

after

autonomic(local time frame)

after(if not new TPZ)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 25 / 50

Page 83: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The temporal tagger HeidelTime

Julian Zell MichaelGertz

[SemEval’10,’13, LREC’12,’14, LRE’13][EVALITA’14, EACL’14, TALIP’14, EMNLP’15]

characteristicsrule-based, multilingual, domain-sensitive

extractionbased on regular expressionslinguistic features (POS, POS of next token, ...)knowledge resources (names of months, holidays, ...)

normalizationlinguistic clues (tense in sentence, ...)domain-specific normalization strategies

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 26 / 50

Page 84: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The temporal tagger HeidelTime

Julian Zell MichaelGertz

[SemEval’10,’13, LREC’12,’14, LRE’13][EVALITA’14, EACL’14, TALIP’14, EMNLP’15]

characteristicsrule-based, multilingual, domain-sensitive

extractionbased on regular expressionslinguistic features (POS, POS of next token, ...)knowledge resources (names of months, holidays, ...)

normalizationlinguistic clues (tense in sentence, ...)domain-specific normalization strategies

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 26 / 50

Page 85: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The temporal tagger HeidelTime

Julian Zell MichaelGertz

[SemEval’10,’13, LREC’12,’14, LRE’13][EVALITA’14, EACL’14, TALIP’14, EMNLP’15]

characteristicsrule-based, multilingual, domain-sensitive

extractionbased on regular expressionslinguistic features (POS, POS of next token, ...)knowledge resources (names of months, holidays, ...)

normalizationlinguistic clues (tense in sentence, ...)domain-specific normalization strategies

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 26 / 50

Page 86: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The temporal tagger HeidelTime

4 domains: news, narrative, colloquial, autonomic

13 languagesmanually created resourcessome added by colleagues, some by other researchers

200 languagesautomatically created resources [EMNLP’15]

publicly available (UIMA, GATE, Java standalone, online demo)quite widely usedfirst domain-sensitive temporal taggeronly tagger for some of the languages

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 27 / 50

Page 87: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The temporal tagger HeidelTime

4 domains: news, narrative, colloquial, autonomic

13 languagesmanually created resourcessome added by colleagues, some by other researchers

[Moriceau & Tannier (2014),Camp & Christiansen (2012),Skukan et al. (2014)]

easy to extend tofurther languages

200 languagesautomatically created resources [EMNLP’15]

publicly available (UIMA, GATE, Java standalone, online demo)quite widely usedfirst domain-sensitive temporal taggeronly tagger for some of the languages

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 27 / 50

Page 88: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The temporal tagger HeidelTime

4 domains: news, narrative, colloquial, autonomic

13 languagesmanually created resourcessome added by colleagues, some by other researchers

AyserArmiti

Hui Li CanhVan Tran

[Moriceau & Tannier (2014),Camp & Christiansen (2012),Skukan et al. (2014)]

easy to extend tofurther languages

200 languagesautomatically created resources [EMNLP’15]

publicly available (UIMA, GATE, Java standalone, online demo)quite widely usedfirst domain-sensitive temporal taggeronly tagger for some of the languages

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 27 / 50

Page 89: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The temporal tagger HeidelTime

4 domains: news, narrative, colloquial, autonomic

13 languagesmanually created resourcessome added by colleagues, some by other researchers

200 languagesautomatically created resources [EMNLP’15]

publicly available (UIMA, GATE, Java standalone, online demo)quite widely usedfirst domain-sensitive temporal taggeronly tagger for some of the languages

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 27 / 50

Page 90: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The temporal tagger HeidelTime

4 domains: news, narrative, colloquial, autonomic

13 languagesmanually created resourcessome added by colleagues, some by other researchers

200 languagesautomatically created resources [EMNLP’15]

publicly available (UIMA, GATE, Java standalone, online demo)quite widely usedfirst domain-sensitive temporal taggeronly tagger for some of the languages

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 27 / 50

Page 91: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

HeidelTime – extensive evaluationstate-of-the-art system for extraction & normalization

on all publicly available corporafor (almost) all languages and domains

winner of research competitions– TempEval-2 (en), TempEval-3 (en, sp), Evalita’14 (it)

[Verhagen et al. (2010), UzZaman et al. (2010), Caselli et al. (2014)]

no detailed numbers, but . . .

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 28 / 50

Page 92: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

HeidelTime – extensive evaluationstate-of-the-art system for extraction & normalization

on all publicly available corporafor (almost) all languages and domainswinner of research competitions

– TempEval-2 (en), TempEval-3 (en, sp), Evalita’14 (it)[Verhagen et al. (2010), UzZaman et al. (2010), Caselli et al. (2014)]

no detailed numbers, but . . .

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 28 / 50

Page 93: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

HeidelTime – extensive evaluationstate-of-the-art system for extraction & normalization

on all publicly available corporafor (almost) all languages and domainswinner of research competitions

– TempEval-2 (en), TempEval-3 (en, sp), Evalita’14 (it)[Verhagen et al. (2010), UzZaman et al. (2010), Caselli et al. (2014)]

no detailed numbers, but . . .

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 28 / 50

Page 94: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Value of HeidelTime’s domain-sensitive strategies

cross-domain evaluation

corpus strategy extraction normalization

news

narrative narratives

Don’t trust a tagger developed for news if you want toprocess narratives (e.g., Wikipedia)

now: some details to explain the “200 languages extension”

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 29 / 50

Page 95: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Value of HeidelTime’s domain-sensitive strategies

cross-domain evaluation

corpus strategy extraction normalization

news news 91.1 78.6

narrative narratives

Don’t trust a tagger developed for news if you want toprocess narratives (e.g., Wikipedia)

now: some details to explain the “200 languages extension”

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 29 / 50

Page 96: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Value of HeidelTime’s domain-sensitive strategies

cross-domain evaluation

corpus strategy extraction normalization

news news 91.1 78.6

narrative news 87.9 56.9narratives

Don’t trust a tagger developed for news if you want toprocess narratives (e.g., Wikipedia)

now: some details to explain the “200 languages extension”

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 29 / 50

Page 97: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Value of HeidelTime’s domain-sensitive strategies

cross-domain evaluation

corpus strategy extraction normalization

news news 91.1 78.6

narrative news 87.9 56.9narratives 87.9 78.7

Don’t trust a tagger developed for news if you want toprocess narratives (e.g., Wikipedia)

now: some details to explain the “200 languages extension”

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 29 / 50

Page 98: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Value of HeidelTime’s domain-sensitive strategies

cross-domain evaluation

corpus strategy extraction normalization

news news 91.1 78.6narratives 91.1 61.5

narrative news 87.9 56.9narratives 87.9 78.7

Don’t trust a tagger developed for news if you want toprocess narratives (e.g., Wikipedia)

now: some details to explain the “200 languages extension”

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 29 / 50

Page 99: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Value of HeidelTime’s domain-sensitive strategies

cross-domain evaluation

corpus strategy extraction normalization

news news 91.1 78.6narratives 91.1 61.5

narrative news 87.9 56.9narratives 87.9 78.7

Don’t trust a tagger developed for news if you want toprocess narratives (e.g., Wikipedia)

now: some details to explain the “200 languages extension”

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 29 / 50

Page 100: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

HeidelTime’s architecture

ResourcesSource Code

patternsnormalization knowledgerules

→ language-dependent

well-defined rule syntax (outside of source code)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 30 / 50

Page 101: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

HeidelTime’s architecture

ResourcesSource Code

patternsnormalization knowledgerules

→ language-dependent

well-defined rule syntax (outside of source code)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 30 / 50

Page 102: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

HeidelTime’s architecture

ResourcesSource Code

domain-specificnormalization strategiesresource interpreter

→ language-independent

patternsnormalization knowledgerules

→ language-dependent

well-defined rule syntax (outside of source code)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 30 / 50

Page 103: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

HeidelTime’s language resources

Pattern files:frequently used terms // Pattern resource

// for// month names// reMonthNameJanuaryFebruaryMarch...

// Pattern resource// for// month abbrev.// reMonthShortJan\.?Feb\.?Mar\.?....

// Pattern resource// for// month numbers// reMonthNumber1011120?[1–9]

Normalization files:contain normalizedvalues of such terms

// Normalization resource// for// month patterns// normMonth“January”,“01”“Jan\.?”,“01”“0?1”,“01”...

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 31 / 50

Page 104: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

HeidelTime’s language resources

Pattern files:frequently used terms // Pattern resource

// for// month names// reMonthNameJanuaryFebruaryMarch...

// Pattern resource// for// month abbrev.// reMonthShortJan\.?Feb\.?Mar\.?....

// Pattern resource// for// month numbers// reMonthNumber1011120?[1–9]

Normalization files:contain normalizedvalues of such terms

// Normalization resource// for// month patterns// normMonth“January”,“01”“Jan\.?”,“01”“0?1”,“01”...

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 31 / 50

Page 105: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

HeidelTime’s language resources

After read by HeidelTime, accessible by rules:

reMonthLong = (January|February|...)reMonthShort = (Jan.?|Feb.?|Mar.?|...)reMonthNumber = (10|11|12|0?[1-9])...

normMonth(January) = 01normMonth(Jan) = 01normMonth(1) = 01...

Rule files:every rule contains at least:

– rule name– extraction part – calls pattern– value normalization part – calls normalization functions

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 32 / 50

Page 106: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of 200 languages [EMNLP’15]

so far: for 13 languagesmanual resource development for each added language

disadvantages:labor and time intensivelanguage knowledge requiredthere are many more languages not yet addressed

developing language resources automaticallysimplified English resources as starting point for translationsresource development process for “all languages”language-independent rules

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 33 / 50

Page 107: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of 200 languages [EMNLP’15]

so far: for 13 languagesmanual resource development for each added language

disadvantages:labor and time intensivelanguage knowledge requiredthere are many more languages not yet addressed

developing language resources automaticallysimplified English resources as starting point for translationsresource development process for “all languages”language-independent rules

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 33 / 50

Page 108: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temporal tagging of 200 languages [EMNLP’15]

so far: for 13 languagesmanual resource development for each added language

disadvantages:labor and time intensivelanguage knowledge requiredthere are many more languages not yet addressed

developing language resources automaticallysimplified English resources as starting point for translationsresource development process for “all languages”language-independent rules

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 33 / 50

Page 109: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Developing language resources automatically

resource development process for “all languages”

normMonthLong''January'',''01''''February'',''02''...

Simplified English resources

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 34 / 50

Page 110: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Developing language resources automatically

resource development process for “all languages”

normMonthLong''January'',''01''''February'',''02''...

Simplified English resources

for each language

// spanish// reMonthLong

// spanish// normMonthLong

// german// reMonthLong

// german// normMonthLong

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 34 / 50

Page 111: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Developing language resources automatically

resource development process for “all languages”

normMonthLong''January'',''01''''February'',''02''...

Simplified English resources

for each language

// spanish// reMonthLong

extractpatterns

JanuaryFebruary

...

// spanish// normMonthLong

// german// reMonthLong

// german// normMonthLong

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 34 / 50

Page 112: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Developing language resources automatically

resource development process for “all languages”

normMonthLong''January'',''01''''February'',''02''...

Simplified English resources

for each language

// spanish// reMonthLong

extractpatterns

JanuaryFebruary

...

Wiktionary

February

translations = { german: Januar spanish: enero...}

January

translations = { german: Januar spanish: enero...}

// spanish// normMonthLong

// german// reMonthLong

// german// normMonthLong

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 34 / 50

Page 113: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Developing language resources automatically

resource development process for “all languages”

normMonthLong''January'',''01''''February'',''02''...

Simplified English resources

for each language

// spanish// reMonthLong// ''January'',''01''enero// ''February'',''02''febrero

extractpatterns

JanuaryFebruary

...

// spanish// normMonthLong// ''January'',''01''''enero'',''01''// '' February'',''02''''febrero'',''02''

// german// reMonthLong// ''January'',''01''Januar// ''February'',''02''Februar

// german// normMonthLong// ''January'',''01''''Januar'',''01''// ''February'',''02''''Februar'',''02''

add patterns add normalizations

Wiktionary

February

translations = { german: Januar spanish: enero...}

January

translations = { german: Januar spanish: enero...}

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 34 / 50

Page 114: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Developing language resources automatically

language-independent rules (based on English rules)no language-dependent wordsadd “creative rules”, e.g., permutations of patterns

assumptionobviously, not all rules required for all languagesbut: “unnecessary” rules are unlikely to harm results

evaluationvery promisingbaseline for most of the 200 languages

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 35 / 50

Page 115: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Developing language resources automatically

language-independent rules (based on English rules)no language-dependent wordsadd “creative rules”, e.g., permutations of patterns

assumptionobviously, not all rules required for all languagesbut: “unnecessary” rules are unlikely to harm results

evaluationvery promisingbaseline for most of the 200 languages

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 35 / 50

Page 116: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Developing language resources automatically

language-independent rules (based on English rules)no language-dependent wordsadd “creative rules”, e.g., permutations of patterns

assumptionobviously, not all rules required for all languagesbut: “unnecessary” rules are unlikely to harm results

evaluationvery promisingbaseline for most of the 200 languages

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 35 / 50

Page 117: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Questions addressed in this talk

Why temporal information?

What’s temporal tagging?

Why domain-sensitive temporal tagging?

What applications can exploit temporal information and how?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 36 / 50

Page 118: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The email-calendar application

From: [email protected]: [email protected]: January 11, 2017

Subject: meeting

Dear Prof,

I am sorry that I couldn’tmake it on Monday. But

could we meet this Friday ?

I don’t care about the time.What about 2 pm?

Thank you and kind regards,student

JANUARY09 Monday10 Tuesday11 Wednesday12 Thursday

13 Friday

14 Saturday15 Sunday16 Monday

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 37 / 50

Page 119: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for search engines

temporal information needs are frequentquery log analysis [Zhang et al., 2010]: 13.8% explicitly temporal

example information need: topic=“Olympics” time=“[1965,1974]”

web search with time constraints is difficult

temporal expressions considered as standard words1965 and 1974 make documents relevant1970 will not contribute to relevanceten years later [1970] not at all

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 38 / 50

Page 120: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for search engines

temporal information needs are frequentquery log analysis [Zhang et al., 2010]: 13.8% explicitly temporal

example information need: topic=“Olympics” time=“[1965,1974]”

web search with time constraints is difficult

temporal expressions considered as standard words1965 and 1974 make documents relevant1970 will not contribute to relevanceten years later [1970] not at all

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 38 / 50

Page 121: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for search engines

temporal information needs are frequentquery log analysis [Zhang et al., 2010]: 13.8% explicitly temporal

example information need: topic=“Olympics” time=“[1965,1974]”

web search with time constraints is difficult

temporal expressions considered as standard words1965 and 1974 make documents relevant1970 will not contribute to relevanceten years later [1970] not at all

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 38 / 50

Page 122: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for search engines

temporal information needs are frequentquery log analysis [Zhang et al., 2010]: 13.8% explicitly temporal

example information need: topic=“Olympics” time=“[1965,1974]”

web search with time constraints is difficult

temporal expressions considered as standard words1965 and 1974 make documents relevant1970 will not contribute to relevanceten years later [1970] not at all

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 38 / 50

Page 123: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Event-centric search [ICTIR’16]

DhruvGupta

KlausBerberich

ideaevents important in many textsevents as 〈 time, place, persons 〉

example search query: Max Planck Institute for Informatics

standardranked list ofdocuments

1. doc-12. doc-53. doc-8

our approachdocuments clustered along

important events

1990 (foundation): doc-1, doc-52000 (10th anniv.): doc-2, doc-12015 (25th anniv.): doc-8, doc-5

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 39 / 50

Page 124: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Event-centric search [ICTIR’16]

DhruvGupta

KlausBerberich

ideaevents important in many textsevents as 〈 time, place, persons 〉

example search query: Max Planck Institute for Informatics

standardranked list ofdocuments

1. doc-12. doc-53. doc-8

our approachdocuments clustered along

important events

1990 (foundation): doc-1, doc-52000 (10th anniv.): doc-2, doc-12015 (25th anniv.): doc-8, doc-5

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 39 / 50

Page 125: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Event-centric search [ICTIR’16]

DhruvGupta

KlausBerberich

ideaevents important in many textsevents as 〈 time, place, persons 〉

example search query: Max Planck Institute for Informatics

standardranked list ofdocuments

1. doc-12. doc-53. doc-8

our approachdocuments clustered along

important events

1990 (foundation): doc-1, doc-52000 (10th anniv.): doc-2, doc-12015 (25th anniv.): doc-8, doc-5

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 39 / 50

Page 126: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for summarization

relations between temporal expressions are ignored

documentIn 2015, something unimportant happened.One year later, something important happened.

useful summariesno underspecified/relative expressions without reference timeadd normalized values (e.g., One year later [2016])

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 40 / 50

Page 127: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for summarization

relations between temporal expressions are ignored

documentIn 2015, something unimportant happened.One year later, something important happened.

useful summariesno underspecified/relative expressions without reference timeadd normalized values (e.g., One year later [2016])

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 40 / 50

Page 128: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for summarization

relations between temporal expressions are ignored

documentIn 2015, something unimportant happened.One year later, something important happened.

useful summariesno underspecified/relative expressions without reference time

add normalized values (e.g., One year later [2016])

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 40 / 50

Page 129: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for summarization

relations between temporal expressions are ignored

documentIn 2015, something unimportant happened.One year later, something important happened.

useful summariesno underspecified/relative expressions without reference timeadd normalized values (e.g., One year later [2016])

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 40 / 50

Page 130: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time-centric corpus analysis [DHd’14, DH’15]

FrankFischer

world literatureGutenberg / Gutenberg.de corporaabout 550 authors, 2735 works (1510 – 1950)

When does (the German) literature take place?analysis of explicit month vs. day-month mentions

quiz:which month occurs most frequently?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 41 / 50

Page 131: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time-centric corpus analysis [DHd’14, DH’15]

FrankFischer

world literatureGutenberg / Gutenberg.de corporaabout 550 authors, 2735 works (1510 – 1950)

When does (the German) literature take place?analysis of explicit month vs. day-month mentions

quiz:which month occurs most frequently?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 41 / 50

Page 132: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time-centric corpus analysis [DHd’14, DH’15]

FrankFischer

world literatureGutenberg / Gutenberg.de corporaabout 550 authors, 2735 works (1510 – 1950)

When does (the German) literature take place?analysis of explicit month vs. day-month mentions

quiz:which month occurs most frequently?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 41 / 50

Page 133: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

When does (the German) literature take place?

complete corpus

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 42 / 50

Page 134: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Sub-intervals Authors

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 42 / 50

Page 135: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

The “Calendar boy” of world literature

Jules Verne

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 42 / 50

Page 136: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

tiwoli – Today in World Literature

ThomasBögel

FrankFischer

select a day e.g., today

birthday invitation“I usually celebrate

my birthday onJanuary 11 with afew friends. Wouldyou like to come

along?”Hermann Hesse

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 43 / 50

Page 137: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

tiwoli – Today in World Literature

ThomasBögel

FrankFischer

select a day e.g., today

birthday invitation“I usually celebrate

my birthday onJanuary 11 with afew friends. Wouldyou like to come

along?”Hermann Hesse

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 43 / 50

Page 138: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

What about latent temporalinformation in texts?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 44 / 50

Page 139: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temponyms [WWW’16]

ErdalKuzey

VinaySetty

GerhardWeikum

not only “obvious” temporalexpressions as defined in

TimeML

ideastandard text phrases may be associated with temporal scopestemporal scopes can be found in KBs (or Wikipedia)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 45 / 50

Page 140: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temponyms [WWW’16]

ideastandard text phrases may be associated with temporal scopestemporal scopes can be found in KBs (or Wikipedia)

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 45 / 50

Page 141: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temponyms [WWW’16]

ideastandard text phrases may be associated with temporal scopestemporal scopes can be found in KBs (or Wikipedia)

example phrases

the Cuban Revolutionary War 2008 Mexico City plane crash

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 45 / 50

Page 142: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temponyms [WWW’16]

ideastandard text phrases may be associated with temporal scopestemporal scopes can be found in KBs (or Wikipedia)

temporal tagging results

–the Cuban Revolutionary War

20082008 Mexico City plane crash

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 45 / 50

Page 143: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temponyms [WWW’16]

ideastandard text phrases may be associated with temporal scopestemporal scopes can be found in KBs (or Wikipedia)

temponym tagging results

[1953-07-26,1959-01-01]the Cuban Revolutionary War

2008-11-042008 Mexico City plane crash

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 45 / 50

Page 144: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temponyms [WWW’16]

ideastandard text phrases may be associated with temporal scopestemporal scopes can be found in KBs (or Wikipedia)

temponym tagging results

[1953-07-26,1959-01-01]the Cuban Revolutionary War

2008-11-042008 Mexico City plane crash

temponyms add new or more precise temporal information

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 45 / 50

Page 145: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temponyms

approachesunambiguous temponyms:

– HeidelTime extension [TempWeb’16]

all types of temponyms:– ILP-based approach with joint inference [WWW’16]

examples

his presidencyfederal election...

his presidencyClinton’sObama’sBush (jr)’sBush (sen)’s

federal electionCA 2008US 2008US 2012

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 46 / 50

Page 146: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temponyms

approachesunambiguous temponyms:

– HeidelTime extension [TempWeb’16]

all types of temponyms:– ILP-based approach with joint inference [WWW’16]

examples

his presidencyfederal election...

his presidencyClinton’sObama’sBush (jr)’sBush (sen)’s

federal electionCA 2008US 2008US 2012

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 46 / 50

Page 147: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Temponyms

approachesunambiguous temponyms:

– HeidelTime extension [TempWeb’16]

all types of temponyms:– ILP-based approach with joint inference [WWW’16]

examples

his presidencyfederal election...

his presidencyClinton’sObama’sBush (jr)’sBush (sen)’s

federal electionCA 2008US 2008US 2012

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 46 / 50

Page 148: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

ILP-based approach with joint inference [WWW’16]

first steps

identify temponym candidates– his presidency– federal election

find KB candidatesextract and normalize

– context entities– context temporal expressions

ILPs to find optimal temponymsbased on coherence and relatedness

measures

find KB candidatesClinton’s presidencyObama’s presidencyBush (jr)’s presidencyBush (sen)’s presidencyCA federal election ’08US federal election ’08US federal election ’12

contextBush, Washington2009, 2008,

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 47 / 50

Page 149: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

ILP-based approach with joint inference [WWW’16]

first steps

identify temponym candidates– his presidency– federal election

find KB candidatesextract and normalize

– context entities– context temporal expressions

ILPs to find optimal temponymsbased on coherence and relatedness

measures

find KB candidatesClinton’s presidencyObama’s presidencyBush (jr)’s presidencyBush (sen)’s presidencyCA federal election ’08US federal election ’08US federal election ’12

contextBush, Washington2009, 2008,

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 47 / 50

Page 150: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

ILP-based approach with joint inference [WWW’16]

first steps

identify temponym candidates– his presidency– federal election

find KB candidatesextract and normalize

– context entities– context temporal expressions

ILPs to find optimal temponymsbased on coherence and relatedness

measures

find KB candidatesClinton’s presidencyObama’s presidencyBush (jr)’s presidencyBush (sen)’s presidencyCA federal election ’08US federal election ’08US federal election ’12

contextBush, Washington2009, 2008,

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 47 / 50

Page 151: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

ILP-based approach with joint inference [WWW’16]

first steps

identify temponym candidates– his presidency– federal election

find KB candidatesextract and normalize

– context entities– context temporal expressions

ILPs to find optimal temponymsbased on coherence and relatedness

measures

find KB candidatesClinton’s presidencyObama’s presidencyBush (jr)’s presidencyBush (sen)’s presidencyCA federal election ’08US federal election ’08US federal election ’12

contextBush, Washington2009, 2008,

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 47 / 50

Page 152: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Summary

Why temporal information?

What’s temporal tagging?

Why domain-sensitive temporal tagging?

What applications can exploit temporal information and how?

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 48 / 50

Page 153: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Summary

temporal information– frequent, important

temporal tagging– extraction and normalization of temporal expressions

domain-sensitive temporal tagging– different domains, different challenges– taggers tailored towards one domain, quality drop on others

many applications can benefit– in multiple ways

latent temporal information in texts– challenging, but useful to further enrich documents

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 48 / 50

Page 154: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Summary

temporal information– frequent, important

temporal tagging– extraction and normalization of temporal expressions

domain-sensitive temporal tagging– different domains, different challenges– taggers tailored towards one domain, quality drop on others

many applications can benefit– in multiple ways

latent temporal information in texts– challenging, but useful to further enrich documents

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 48 / 50

Page 155: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Summary

temporal information– frequent, important

temporal tagging– extraction and normalization of temporal expressions

domain-sensitive temporal tagging– different domains, different challenges– taggers tailored towards one domain, quality drop on others

many applications can benefit– in multiple ways

latent temporal information in texts– challenging, but useful to further enrich documents

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 48 / 50

Page 156: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for question answering

ZhenJia

GerhardWeikum

AbdAbujabal

RishiSaha Roy

MohamedYahya

various types of temporal questionsWhen was Bill Clinton born?Who won last year’s Superbowl?What money did they use in Spain before 2002?

this is ongoing work ...

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 49 / 50

Page 157: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for question answering

ZhenJia

GerhardWeikum

AbdAbujabal

RishiSaha Roy

MohamedYahya

various types of temporal questionsWhen was Bill Clinton born?Who won last year’s Superbowl?What money did they use in Spain before 2002?

this is ongoing work ...

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 49 / 50

Page 158: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for further information

github iOS & Android app CS library

Thank you! – Time for your questions...

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 50 / 50

Page 159: Time for Text Mining and Information Retrievaljstroetge/talks/jannik-stroetgen... · Text Mining and Information Retrieval Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ... Time:

Why Time? Temporal Tagging HeidelTime Applications Summary

Time for further information

github iOS & Android app CS library

Thank you! – Time for your questions...

Joint Lecture Series – January 11, 2017 c© Jannik Strötgen 50 / 50