Temporal language models for the disclosure of historical text Franciska de Jong, Henning Rode, and Djoerd Hiemstra 2005


Temporal language models for the disclosure of historical text

Franciska de Jong, Henning Rode, and Djoerd Hiemstra

2005


Outline

 • Consider the impetus for this study

 • Review normalized log-likelihood ratio, time partitioning, classification, and confidence equations

 • Identify underlying assumptions and potential applications of the study, and discuss related work


In short...

"Users of a search system will typically know one or more contemporary forms associated with the concept they want to search for. They would be helped if the search interface was enhanced with knowledge about diachronically related forms that can be considered synonyms" (161).

"Task definition given a date tagged reference corpus, consisting of documents from a certain time span, and a document X with unknown date within the same time span, the system should classify X according to time partitions of predefined granularity" (163).


Related work

 • Statistical language models

 • Metadata, time stamps

 • Automatic classification of texts: concept hierarchies, synonyms


Figure generated by Dave Farrance, using Google's n-gram viewer


Normalized log-likelihood ratio (NLLR)
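A minimal sketch of how an NLLR score might be computed and used for the classification task, assuming the commonly cited form NLLR(D, C_i) = Σ_w P(w|D) · log(P(w|C_i) / P(w|C)), where C is the whole reference corpus. The function names, the unigram models, and the interpolation weight `lam` are illustrative assumptions and are not taken from the slides.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities for a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def nllr(doc_tokens, partition_tokens, corpus_tokens, lam=0.5):
    """Sum over document words of P(w|D) * log(P(w|C_i) / P(w|C)).

    Assumption: P(w|C_i) is smoothed by linear interpolation with the
    whole-corpus model, using the hypothetical weight `lam`.
    """
    p_doc = unigram_model(doc_tokens)
    p_part = unigram_model(partition_tokens)
    p_corpus = unigram_model(corpus_tokens)
    score = 0.0
    for word, p_w_doc in p_doc.items():
        p_background = p_corpus.get(word)
        if p_background is None:
            continue  # word never seen in the reference corpus: skipped
        p_smoothed = lam * p_part.get(word, 0.0) + (1 - lam) * p_background
        score += p_w_doc * math.log(p_smoothed / p_background)
    return score


def classify(doc_tokens, partitions, lam=0.5):
    """Return the label of the time partition with the highest NLLR score."""
    corpus_tokens = [tok for toks in partitions.values() for tok in toks]
    return max(
        partitions,
        key=lambda label: nllr(doc_tokens, partitions[label], corpus_tokens, lam),
    )
```

Here `partitions` is assumed to map a partition label (for example a decade) to the tokens of all reference documents dated within it; a document of unknown date is assigned to the partition whose language model it resembles most, relative to the corpus as a whole.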

   


Time partitioning

where t_i marks a partition within the corpus period (t_0 would indicate the beginning of the corpus period); C_i denotes a partition of corpus documents; D_j denotes any document from the corpus; and τ(D_j) indicates the document's presumed date.
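The notation on this slide (boundaries t_i, partitions C_i, documents D_j, dates τ(D_j)) can be sketched as a small grouping routine. This is only an illustration: it assumes that partition C_i collects the documents whose date falls in the half-open interval from t_(i-1) up to but not including t_i, and the function name and input layout are hypothetical.

```python
from bisect import bisect_right


def partition_corpus(docs, boundaries):
    """Group documents into time partitions C_1..C_n.

    docs:       iterable of (doc_id, date) pairs, where date plays the role
                of tau(D_j), the document's presumed date.
    boundaries: sorted list [t_0, t_1, ..., t_n]; t_0 is the start of the
                corpus period.
    Assumption: C_i holds documents with t_(i-1) <= date < t_i; documents
    dated outside [t_0, t_n) are ignored.
    """
    partitions = {i: [] for i in range(1, len(boundaries))}
    for doc_id, date in docs:
        i = bisect_right(boundaries, date)  # number of boundaries <= date
        if 1 <= i < len(boundaries):
            partitions[i].append(doc_id)
    return partitions
```

Finer granularity simply means more boundaries: yearly boundaries give yearly partitions, while boundaries a decade apart give coarser ones.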


Related article


Two methods

                          


Questions

 • In what circumstances would de Jong et al.'s proposed approach to time-stamping be useful?

 • How are the processes of information retrieval and temporal determination similar, and how do they differ?