Temporal language models for the disclosure of historical text
Franciska de Jong, Henning Rode, and Djoerd Hiemstra
2005
Outline
• Consider the impetus for this study
• Review normalized log-likelihood ratio, time partitioning, classification, and confidence equations
• Identify underlying assumptions and potential applications of the study, and discuss related work
In short...
"Users of a search system will typically know one or more contemporary forms associated with the concept they want to search for. They would be helped if the search interface was enhanced with knowledge about diachronically related forms that can be considered synonyms" (161).
"Task definition given a date tagged reference corpus, consisting of documents from a certain time span, and a document X with unknown date within the same time span, the system should classify X according to time partitions of predefined granularity" (163).
Related work
• Statistical language models
• Metadata, time stamps
• Automatic classification of texts: concept hierarchies, synonyms
Figure generated by Dave Farrance, using Google's n-gram viewer
Normalized log-likelihood ratio (NLLR)
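The formula itself is not legible in this transcript. As a hedged reminder, NLLR is commonly written in the language-modeling literature in the form below, here adapted to scoring a document D against a time partition C_i. The symbols P(w | D), P(w | C_i), and P(w | C), with C the whole reference corpus, are assumptions about notation and should be checked against the paper.

```latex
\mathrm{NLLR}(D, C_i) \;=\; \sum_{w \in D} P(w \mid D)\,
  \log \frac{P(w \mid C_i)}{P(w \mid C)}
```

Read this way, a word contributes positively when it is more likely in partition C_i than in the corpus as a whole, and the partition with the highest score is the best temporal match for D.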
Time partitioning
In the partitioning formula, t_i marks a partition within the corpus period (t_0 would indicate the beginning of the corpus period); C_i denotes a partition of corpus documents; D_j denotes any document from the corpus; and τ(D_j) indicates the document's presumed date.
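The partitioning notation (t_i, C_i, D_j, τ(D_j)) and the classification task can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the half-open rule t_{i-1} <= τ(D_j) < t_i, the unigram models, and the interpolation weight `lam` are all assumptions made for illustration.

```python
import math
from collections import Counter


def partition_corpus(docs, boundaries):
    """Group date-tagged documents into time partitions C_1..C_n.

    docs: list of (date, tokens) pairs, where date plays the role of tau(D_j).
    boundaries: sorted list [t_0, t_1, ..., t_n].
    Assumption: D_j belongs to C_i when t_{i-1} <= tau(D_j) < t_i.
    """
    partitions = [Counter() for _ in range(len(boundaries) - 1)]
    for date, tokens in docs:
        for i in range(1, len(boundaries)):
            if boundaries[i - 1] <= date < boundaries[i]:
                partitions[i - 1].update(tokens)
                break
    return partitions


def nllr(doc_tokens, part_counts, corpus_counts, lam=0.5):
    """Score a document against one partition with a smoothed unigram NLLR.

    lam is a hypothetical interpolation weight mixing the partition model
    with the whole-corpus model so that unseen words do not give log(0).
    """
    doc_counts = Counter(doc_tokens)
    doc_len = sum(doc_counts.values())
    part_len = sum(part_counts.values())
    corpus_len = sum(corpus_counts.values())
    score = 0.0
    for word, n in doc_counts.items():
        p_corpus = corpus_counts[word] / corpus_len
        if p_corpus == 0.0:
            continue  # word never seen in the reference corpus: ignore it
        p_raw = part_counts[word] / part_len if part_len else 0.0
        p_part = lam * p_raw + (1 - lam) * p_corpus
        score += (n / doc_len) * math.log(p_part / p_corpus)
    return score


def classify(doc_tokens, partitions, boundaries):
    """Return the best-scoring time span and the score of every partition."""
    corpus_counts = sum(partitions, Counter())
    scores = [nllr(doc_tokens, p, corpus_counts) for p in partitions]
    best = max(range(len(scores)), key=scores.__getitem__)
    return (boundaries[best], boundaries[best + 1]), scores


# Toy reference corpus (invented data, for illustration only).
docs = [
    (1905, ["olde", "shoppe", "olde"]),
    (1915, ["olde", "towne"]),
    (1985, ["new", "shop", "new"]),
    (1995, ["new", "town"]),
]
boundaries = [1900, 1950, 2000]
partitions = partition_corpus(docs, boundaries)
span, scores = classify(["olde", "towne"], partitions, boundaries)
print(span, scores)
```

The granularity of `boundaries` corresponds to the "predefined granularity" in the task definition: finer partitions give more precise dates but leave less text per partition to estimate each model.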
Related article
Two methods
Questions
• In what circumstances would de Jong et al.'s proposed approach to time-stamping be useful?
• How are the processes of information retrieval and temporal determination similar, and how do they differ?