anil timeline construction
Presentation in Information Retrieval class.
Clustering and Exploring Search Results using Timeline Constructions
(Omar Alonso, Michael Gertz, Ricardo Baeza-Yates)
presented by
Anil Kumar Attuluri
10/17/2011
Outline
• Motivation
• Background
• Methods and Prototype
• Evaluation
• Conclusion
• Examples
Motivation
Temporal Information
Survey results (using Amazon Mechanical Turk)
Q. Do you think current timelines for organizing and clustering search results (such as in Google's timeline) are useful for some of your daily search activities?
76% answered "yes"
Q. Do you use timelines to explore search results?
71% answered "yes"
Use Cases
• History - information about a place or a person during a past period of time. There is no decent timeline for events during World War II.
• Research - information about a topic, sorted from oldest to newest. No timeline of volcanic activity in the US is available.
• Events - details of the soccer World Cups listed on a timeline. Timeline-based lists are not available.
Background
Hit lists and Clustering
Hit lists
• A hit list is the set of all documents retrieved for a search query.
• The documents are sorted by rank.
Clustering
• Clustering is a process in which the search results (hit lists) are categorized and placed into clusters based on cluster labels.
• Useful for providing a better exploration interface to the end user.
TimeML
• TimeML is a Formal Specification Language for Events and Temporal Expressions.
• EVENT - A fresh flow of lava, gas and debris erupted there Saturday.
• TIMEX3 - June 11, 1989, or the Summer of 2002.
• SIGNAL - They will investigate the role of the US before, during and after the genocide.
• LINKS - John drove to Boston. During his drive he ate a donut.
Amazon Mechanical Turk (AMT)
• Amazon's platform for having humans perform Human Intelligence Tasks (HITs) that computers cannot yet complete.
• Requesters place HITs; Workers perform them.
• An Application Programming Interface is provided for Requesters to submit their HITs and retrieve the results.
Methods and Prototype
Time Annotated Document Model
Time and Timelines
• A chronon is an atomic time interval, here a single day. Ex: May 10 2011.
• Granules are contiguous sequences of chronons. Ex: week, month, year.
• Granule composition has a lattice structure.
• Timelines = {Td (day), Tw (week), Tm (month), Ty (year)}
• Chronons have a precedence relationship.
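To make the chronon and granule terms concrete, here is a minimal Python sketch. It assumes a chronon can be represented by a `datetime.date` and that ISO weeks stand in for the week granule; the function name `to_granule` and the label formats are illustrative choices, not taken from the paper.

```python
from datetime import date

def to_granule(chronon: date, g: str) -> str:
    """Return a label for the granule of granularity g that contains the chronon."""
    if g == "day":
        return chronon.isoformat()
    if g == "week":
        # Assumption: ISO-8601 week numbering is used for the week granule.
        iso_year, iso_week, _ = chronon.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if g == "month":
        return f"{chronon.year}-{chronon.month:02d}"
    if g == "year":
        return str(chronon.year)
    raise ValueError(f"unknown granularity: {g}")

chronon = date(2011, 5, 10)
for g in ("day", "week", "month", "year"):
    print(g, to_granule(chronon, g))

# The precedence relationship between chronons is ordinary date ordering.
print(date(2011, 5, 10) < date(2011, 5, 11))
```

Weeks can straddle month boundaries, so days roll up into weeks and into months along separate paths. This is one way to read the remark that granule composition forms a lattice rather than a single chain.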
Time Annotated Document Model
Temporal Expressions
• Document timestamp collected during crawling.
• Explicit temporal expressions. Ex: March 12 2005.
• Implicit temporal expressions. Ex: Columbus Day 2008.
• Relative temporal expressions. Ex: Two days from now.
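The three kinds of expression differ in how much context is needed to resolve them to a chronon. The sketch below is an illustration only: the helper names are hypothetical, the implicit case hard-codes the US rule that Columbus Day is the second Monday of October, and the relative case assumes the document timestamp is the reference date.

```python
from datetime import date, datetime, timedelta

def resolve_explicit(text: str) -> date:
    """Explicit: the expression itself names the day, e.g. 'March 12 2005'."""
    return datetime.strptime(text, "%B %d %Y").date()

def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th given weekday (Monday=0) of a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

def resolve_columbus_day(year: int) -> date:
    """Implicit: needs calendar knowledge (second Monday of October in the US)."""
    return nth_weekday(year, 10, 0, 2)

def resolve_relative(days_from_now: int, reference: date) -> date:
    """Relative: needs a reference date, assumed here to be the document timestamp."""
    return reference + timedelta(days=days_from_now)

print(resolve_explicit("March 12 2005"))
print(resolve_columbus_day(2008))
print(resolve_relative(2, date(2011, 10, 17)))
```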
Time Annotated Document Model
Temporal Document Profile
• The temporal document profile is defined as:
tdp: D -> [E x C x P]*
E = Ee U Ei U Er (explicit, implicit and relative temporal expressions)
C = set of all chronons
P = set of all positions of a temporal expression in a document
• Put simply, tdp consists of tuples of the form (ei, ci, pi).
• The tuples in tdp are organized as follows: (explicit set, implicit set, dts, relative set), where dts is the document timestamp.
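One possible in-memory shape for a tdp is sketched below. The class name `TemporalEntry`, the `kind` field and the use of character offsets for positions are assumptions made for illustration; the slides only state that each tuple holds an expression, a chronon and a position.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

@dataclass(frozen=True)
class TemporalEntry:
    expression: str  # e_i, the temporal expression as it appears in the text
    chronon: date    # c_i, the day the expression resolves to
    position: int    # p_i, assumed here to be a character offset
    kind: str        # "explicit", "implicit" or "relative"

Triple = Tuple[str, date, int]

def organize_tdp(entries: List[TemporalEntry], dts: date):
    """Group (e, c, p) tuples as (explicit set, implicit set, dts, relative set)."""
    by_kind = {"explicit": [], "implicit": [], "relative": []}
    for entry in entries:
        by_kind[entry.kind].append((entry.expression, entry.chronon, entry.position))
    return (by_kind["explicit"], by_kind["implicit"], dts, by_kind["relative"])

entries = [
    TemporalEntry("March 12 2005", date(2005, 3, 12), 40, "explicit"),
    TemporalEntry("Columbus Day 2008", date(2008, 10, 13), 95, "implicit"),
    TemporalEntry("two days from now", date(2011, 10, 19), 130, "relative"),
]
explicit, implicit, dts, relative = organize_tdp(entries, date(2011, 10, 17))
print(explicit, implicit, dts, relative, sep="\n")
```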
Timeline Construction and Document Exploration
Constructing a Time Outline
• Chronons are extracted from the hit list Lq.
• The minimum and maximum chronons describe the lower and upper bounds of the time outline.
• Documents are organized in a temporal range which forms the time outline.
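The bounds of the time outline can be sketched in a few lines. The dictionary layout (document id mapped to its list of chronons) and the function name `time_outline` are assumptions for illustration.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

def time_outline(doc_chronons: Dict[str, List[date]]) -> Optional[Tuple[date, date]]:
    """Return (lower bound, upper bound) over all chronons in the hit list."""
    chronons = [c for doc in doc_chronons.values() for c in doc]
    if not chronons:
        return None  # no temporal information in any retrieved document
    return min(chronons), max(chronons)

hit_list = {
    "d1": [date(2010, 6, 11), date(2010, 7, 11)],
    "d2": [date(2006, 7, 9)],
    "d3": [],
}
print(time_outline(hit_list))
```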
Timeline Construction and Document Exploration
Document Clustering
• Chronons are normalized to a granularity g, which can be day, week, month or year.
• Documents are mapped to clusters.
• The main cluster and hot spots are determined.
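A rough sketch of the normalize-then-map step follows. It assumes normalization means truncating a chronon to the first day of its granule and that a document joins every cluster one of its chronons falls into. The slides do not say how the main cluster and hot spots are determined, so the sketch only reports the largest cluster as one plausible signal.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Set

def normalize(chronon: date, g: str) -> date:
    """Truncate a chronon to the first day of its granule at granularity g."""
    if g == "day":
        return chronon
    if g == "week":
        return chronon - timedelta(days=chronon.weekday())  # Monday of that week
    if g == "month":
        return chronon.replace(day=1)
    if g == "year":
        return chronon.replace(month=1, day=1)
    raise ValueError(f"unknown granularity: {g}")

def cluster_documents(doc_chronons: Dict[str, List[date]], g: str) -> Dict[date, Set[str]]:
    """Map each document to the clusters labelled by its normalized chronons."""
    clusters = defaultdict(set)
    for doc_id, chronons in doc_chronons.items():
        for chronon in chronons:
            clusters[normalize(chronon, g)].add(doc_id)
    return dict(clusters)

doc_chronons = {
    "d1": [date(2010, 6, 11), date(2010, 7, 11)],
    "d2": [date(2010, 6, 12)],
    "d3": [date(2006, 7, 9)],
}
clusters = cluster_documents(doc_chronons, "month")
# Assumption: cluster size is used here as a stand-in for "hot spot" detection.
largest = max(clusters, key=lambda label: len(clusters[label]))
print(clusters)
print(largest)
```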
Timeline Construction and Document Exploration
Ranking Documents in a Cluster
• Ranks are determined as follows.
• Given two documents d and d', d is ranked higher than d' if either of the following two conditions holds:
1. rank(d,yj) > rank(d',yj)
2. rank(d,yj) = rank(d',yj) and d is ranked higher in Lq than d'
Lq - set of result documents of a query q
yj - a cluster
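The two conditions amount to sorting by rank(d,yj) in descending order and breaking ties by position in Lq. In the sketch below the `cluster_rank` values are made-up inputs, since the slides do not spell out how rank(d,yj) is computed; only the ordering rule is illustrated.

```python
from typing import Dict, List

def rank_cluster(docs: List[str], cluster_rank: Dict[str, int], lq: List[str]) -> List[str]:
    """Order the documents of one cluster yj.

    cluster_rank[d] plays the role of rank(d, yj); lq is the hit list in its
    original order, so a smaller index means a higher rank in Lq.
    """
    position = {doc_id: index for index, doc_id in enumerate(lq)}
    return sorted(docs, key=lambda d: (-cluster_rank[d], position[d]))

lq = ["a", "b", "c", "d"]                        # original hit list order
cluster_rank = {"a": 1, "b": 3, "c": 3, "d": 2}  # hypothetical rank(d, yj) values
print(rank_cluster(["a", "b", "c", "d"], cluster_rank, lq))
```

Here "b" and "c" tie on rank(d,yj), so "b" comes first because it is ranked higher in Lq.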
Timeline Construction and Document Exploration
Cluster Exploration
• A cluster can be refined along the timeline to explore the results within it. Ex: refine Ty into Tm or Tw
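Refinement can be pictured as a drill-down: take the documents of one year cluster and regroup them by month. This is a minimal sketch under the assumption that each document carries its list of chronons; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List

def refine_year_to_months(year: int, doc_chronons: Dict[str, List[date]]) -> Dict[int, List[str]]:
    """Split the cluster for one year (Ty) into month sub-clusters (Tm)."""
    months = defaultdict(set)
    for doc_id, chronons in doc_chronons.items():
        for chronon in chronons:
            if chronon.year == year:  # ignore chronons outside the chosen cluster
                months[chronon.month].add(doc_id)
    return {month: sorted(docs) for month, docs in sorted(months.items())}

year_cluster = {
    "d1": [date(2010, 6, 11), date(2010, 7, 11)],
    "d2": [date(2010, 6, 12), date(2006, 7, 9)],
}
print(refine_year_to_months(2010, year_cluster))
```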
Temporal Snippets
• Temporal snippets outline the main events in a document. They are created by pulling the most relevant sentences that contain temporal expressions, using the TSnippet algorithm.
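The idea of pulling relevant sentences that contain temporal expressions can be illustrated with a deliberately simplified stand-in. This is not the TSnippet algorithm: it assumes a four-digit year is the only temporal expression, splits sentences naively on punctuation, and scores relevance by query-term overlap.

```python
import re
from typing import List

# Assumption: a four-digit year between 1000 and 2099 marks a temporal expression.
YEAR = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")

def temporal_snippet(text: str, query: str, k: int = 2) -> List[str]:
    """Pick up to k sentences that contain a year, preferring query-term overlap."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    terms = set(query.lower().split())
    scored = []
    for index, sentence in enumerate(sentences):
        if YEAR.search(sentence):
            words = set(re.findall(r"\w+", sentence.lower()))
            scored.append((len(terms & words), index, sentence))
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:k]
    # Present the chosen sentences in document order.
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]

text = (
    "The volcano is in Alaska. It erupted in 1989 and sent ash over the region. "
    "Tourists visit every summer. A smaller eruption followed in 2006. "
    "Scientists monitor it."
)
print(temporal_snippet(text, "volcano eruption", k=1))
```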
PROTOTYPE
Document Annotation Pipeline
• First, extract time-related metadata, such as the document timestamp, from the Web server at crawl time.
• Second, run the POS tagger on each document; it tags parts of speech and inserts the sentence delimiters needed for temporal document annotation.
• Third, run a temporal expression tagger based on the TimeML standard. The resulting XML markup (called tdp) is added to the document.
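The third step can be illustrated with a toy tagger. It is a sketch, not the tagger used in the prototype: it assumes only explicit dates of the form "Month D, YYYY" and wraps them in TIMEX3 elements with the `tid`, `type` and `value` attributes that TimeML defines.

```python
import re
from datetime import datetime

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_RE = re.compile(rf"\b({MONTHS}) (\d{{1,2}}),? (\d{{4}})\b")

def annotate(text: str) -> str:
    """Wrap explicit dates in TIMEX3 tags carrying a normalized ISO value."""
    counter = 0

    def wrap(match: re.Match) -> str:
        nonlocal counter
        month, day, year = match.groups()
        try:
            value = datetime.strptime(f"{month} {day} {year}", "%B %d %Y").date()
        except ValueError:
            return match.group(0)  # leave impossible dates (e.g. February 31) untouched
        counter += 1
        return (f'<TIMEX3 tid="t{counter}" type="DATE" '
                f'value="{value.isoformat()}">{match.group(0)}</TIMEX3>')

    return DATE_RE.sub(wrap, text)

print(annotate("The meeting was held on June 11, 1989."))
```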
PROTOTYPE
Exploratory User Interface
Evaluation
Evaluation guidelines
• Precision - fraction of retrieved documents that are relevant. All relevant documents must be included in the timeline.
• Presentation - displaying the timeline in an intuitive graphical user interface.
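Precision as defined in the first guideline can be computed as below. The document ids are made up for illustration.

```python
from typing import Iterable

def precision(retrieved: Iterable[str], relevant: Iterable[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        return 0.0  # convention chosen here for an empty result list
    return len(retrieved_set & set(relevant)) / len(retrieved_set)

print(precision(["d1", "d2", "d3", "d4"], ["d1", "d3", "d9"]))
```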
Evaluation
DMOZ
• It is a multilingual open content directory. The World Cup category was picked for evaluation.
• Results showed that the TCluster algorithm generated more clusters and was therefore considered more precise.
TimeBank
• It contains news articles that have been annotated using TimeML.
• Using the temporal expressions in documents yielded a 50% increase in the number of clusters discovered by TCluster.
Evaluation
Relevance Evaluation using AMT
• The goal was to evaluate the quality of search results using TCluster in combination with temporal snippets.
• 10 random informational queries for Wikipedia featured articles were used. Average response was 4.04% (with an 80% agreement level)
• Top ten most active topics on Twitter were used. Average response was 4.33% (with an 80% agreement level)
Conclusion
• A framework to make search applications time-aware.
• The TCluster algorithm provides flexibility, allowing users not only to explore the results over a timeline but also to explore them at multiple time granularities.
• A user engaged in time-related investigations would benefit from this model when traditional information retrieval and search engines cannot offer much.
Examples
Google search based on time
timesearch.info
historyworld.net
LinkedIn timeline
Thank You!