anil timeline construction

Clustering and Exploring Search Results using Timeline Constructions (Omar Alonso, Michael Gertz, Ricardo Baeza-Yates), presented by Anil Kumar Attuluri, 10/17/2011

Upload: anilcs0405

Post on 23-Jan-2015


DESCRIPTION

Presentation in Information Retrieval Class.

TRANSCRIPT

Page 1: Anil timeline construction

Clustering and Exploring Search Results using Timeline Constructions (Omar Alonso, Michael Gertz, Ricardo Baeza-Yates)

presented by Anil Kumar Attuluri

10/17/2011

Page 2: Anil timeline construction

Outline

• Motivation

• Background

• Methods and Prototype

• Evaluation

• Conclusion

• Examples

Page 3: Anil timeline construction

Motivation

Page 4: Anil timeline construction

Temporal Information

Page 5: Anil timeline construction

Temporal Information

Page 6: Anil timeline construction

Survey results (using Amazon Mechanical Turk)

Q. Do you think current timelines for organizing and clustering search results (such as in Google's timeline) are useful for some of your daily search activities?

76% answered "yes"

Q. Do you use timelines to explore search results?

71% answered "yes"

Page 7: Anil timeline construction

Use Cases

• History - information about a place or a person during a period of time in the past. There is no decent timeline of what happened during World War II.

• Research - information about a topic, sorted from oldest to newest. No timeline of volcanic activity in the US is available.

• Events - details of the soccer World Cups listed on a timeline. Timeline-based lists are not available.

Page 8: Anil timeline construction

Background

Page 9: Anil timeline construction

Hit lists and Clustering

Hit lists

• A hit list is the set of all documents retrieved for a search query.

• The documents are sorted by rank.

Clustering

• Clustering is the process of categorizing the search results (hit list) into clusters based on cluster labels.

• It is useful for providing a better exploration interface to the end user.

    

Page 10: Anil timeline construction

TimeML

• TimeML is a formal specification language for events and temporal expressions.

• EVENT - A fresh flow of lava, gas and debris erupted there Saturday.

• TIMEX3 - June 11, 1989, or the Summer of 2002.

• SIGNAL - They will investigate the role of the US before, during and after the genocide.

• LINKS - John drove to Boston. During his drive he ate a donut.
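As a rough illustration of what such markup looks like, the sketch below parses a hand-written TimeML-style fragment with Python's standard XML library. The sentence is adapted from the EVENT example above with an explicit date substituted in; the identifier values (e1, t1) and the normalized value are illustrative assumptions, not the output of a real tagger.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of TimeML. The ids and the ISO-style
# "value" attribute are assumptions made for this illustration only.
fragment = (
    "<s>A fresh flow of lava, gas and debris "
    '<EVENT eid="e1" class="OCCURRENCE">erupted</EVENT> there on '
    '<TIMEX3 tid="t1" type="DATE" value="1989-06-11">June 11, 1989</TIMEX3>.'
    "</s>"
)

root = ET.fromstring(fragment)

# Collect the temporal expressions and events found in the sentence.
timexes = [(t.get("tid"), t.get("value"), t.text) for t in root.iter("TIMEX3")]
events = [(e.get("eid"), e.text) for e in root.iter("EVENT")]

print(timexes)
print(events)
```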

 

    

Page 11: Anil timeline construction

Amazon Mechanical Turk (AMT)

• Amazon's platform for Human Intelligence Tasks (HITs): tasks performed by humans because computers cannot yet complete them.

• Requesters place HITs; Workers perform HITs.

• An Application Programming Interface is provided for Requesters to submit their HITs and to retrieve the results.

Page 12: Anil timeline construction

Methods and Prototype

Page 13: Anil timeline construction

Time Annotated Document Model

Time and Timelines

• A chronon is an atomic time interval; here it is a single day. Ex: May 10 2011.

• Granules are contiguous sequences of chronons. Ex: week, month, year.

• Granule composition has a lattice structure.

• Timelines = {Td (day), Tw (week), Tm (month), Ty (year)}

• Chronons have a precedence relationship.
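A minimal sketch of the chronon and granule idea, assuming a chronon is represented as a Python date and a granule by a string label. The function name granule and the label formats (ISO week numbering for Tw) are choices made for this illustration, not taken from the paper.

```python
from datetime import date

def granule(chronon, g):
    """Return the label of the granule of type g that contains the chronon."""
    if g == "day":
        return chronon.isoformat()
    if g == "week":
        iso_year, iso_week, _ = chronon.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if g == "month":
        return f"{chronon.year}-{chronon.month:02d}"
    if g == "year":
        return str(chronon.year)
    raise ValueError(f"unknown granularity: {g}")

c = date(2011, 5, 10)
labels = {g: granule(c, g) for g in ("day", "week", "month", "year")}

# Precedence between chronons is plain date comparison.
earlier = date(2011, 5, 9) < c
```

The same day maps to exactly one granule on each of the four timelines, which is what lets a result set be viewed at a coarser or finer level.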

Page 14: Anil timeline construction

Time Annotated Document Model

Temporal Expressions

• Document timestamp: collected during crawling.

• Explicit temporal expressions. Ex: March 12 2005.

• Implicit temporal expressions. Ex: Columbus Day 2008.

• Relative temporal expressions. Ex: Two days from now.
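To show why implicit and relative expressions need extra work before they can be placed on a timeline, here is a small sketch that resolves the two examples above to chronons. It assumes that Columbus Day is the second Monday of October (the US federal holiday) and that a relative expression is anchored at the document timestamp; the helper names are hypothetical.

```python
from datetime import date, timedelta

def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday (Monday=0) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

def columbus_day(year):
    # Implicit expression: needs calendar knowledge to become a chronon.
    return nth_weekday(year, 10, 0, 2)

def days_from(anchor, n):
    # Relative expression: needs an anchor such as the document timestamp.
    return anchor + timedelta(days=n)

implicit = columbus_day(2008)
relative = days_from(date(2011, 10, 17), 2)
```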

Page 15: Anil timeline construction

Time Annotated Document Model

Temporal Document Profile

• The temporal document profile is defined as:

      tdp: D → [E × C × P]*

      E = Ee ∪ Ei ∪ Er (explicit, implicit, and relative temporal expressions)
      C = set of all chronons
      P = set of all positions of a temporal expression in a document

• Put simply, tdp consists of tuples of the form (ei, ci, pi).

• The tuples in tdp are organized as follows: (explicit set, implicit set, dts, relative set), where dts is the document timestamp.
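One possible in-memory representation of a tdp, as a sketch only. The class name TemporalEntry, the kind field, and the use of a character offset as the position are assumptions for illustration; the slide only specifies the (ei, ci, pi) tuple shape and the ordering of the four groups.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class TemporalEntry:
    expression: str  # e_i: the temporal expression as found in the text
    chronon: date    # c_i: the day it resolves to
    position: int    # p_i: position in the document (character offset assumed)
    kind: str        # "explicit", "implicit", "dts", or "relative"

# Ordering of the groups as listed on the slide.
ORDER = {"explicit": 0, "implicit": 1, "dts": 2, "relative": 3}

def build_tdp(entries):
    """Arrange entries as (explicit set, implicit set, dts, relative set)."""
    return sorted(entries, key=lambda e: (ORDER[e.kind], e.position))

entries = [
    TemporalEntry("Two days from now", date(2011, 10, 19), 120, "relative"),
    TemporalEntry("March 12 2005", date(2005, 3, 12), 40, "explicit"),
    TemporalEntry("crawl timestamp", date(2011, 10, 17), 0, "dts"),
    TemporalEntry("Columbus Day 2008", date(2008, 10, 13), 75, "implicit"),
]
tdp = build_tdp(entries)
```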

Page 16: Anil timeline construction

Timeline Construction and Document Exploration

Constructing a Time Outline

• Chronons are extracted from the hit list Lq.

• The minimum and maximum chronons give the lower and upper bounds of the time outline.

• Documents are organized within this temporal range, which forms the time outline.
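The bounds of the time outline can be sketched as follows, assuming each document in the hit list is a dictionary with a "chronons" list of dates. That document shape and the function name time_outline are hypothetical.

```python
from datetime import date

def time_outline(hit_list):
    """Return (lower bound, upper bound) over all chronons in the hit list."""
    chronons = [c for doc in hit_list for c in doc["chronons"]]
    if not chronons:
        return None  # no temporal information, so no outline
    return min(chronons), max(chronons)

hit_list = [
    {"id": "d1", "chronons": [date(2006, 6, 9), date(2006, 7, 9)]},
    {"id": "d2", "chronons": [date(1998, 7, 12)]},
    {"id": "d3", "chronons": []},
]
outline = time_outline(hit_list)
```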

Page 17: Anil timeline construction

Timeline Construction and Document Exploration

Document Clustering

• Chronons are normalized to a granularity g, where g can be day, week, month, or year.

• Documents are mapped to clusters.

• The main cluster and hot spots are determined.
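A simplified sketch of the clustering step, not the TCluster algorithm itself: each chronon is normalized to a granule label and each document is placed in every cluster its chronons fall into. Treating the largest cluster as the main cluster is an assumption made here for illustration; hot spots are not modeled.

```python
from collections import defaultdict
from datetime import date

def label(chronon, g):
    """Normalize a chronon to the label of its granule at granularity g."""
    if g == "day":
        return chronon.isoformat()
    if g == "week":
        y, w, _ = chronon.isocalendar()
        return f"{y}-W{w:02d}"
    if g == "month":
        return f"{chronon.year}-{chronon.month:02d}"
    if g == "year":
        return str(chronon.year)
    raise ValueError(f"unknown granularity: {g}")

def cluster(docs, g):
    """Map granule label -> set of ids of documents mentioning that granule."""
    clusters = defaultdict(set)
    for doc_id, chronons in docs.items():
        for c in chronons:
            clusters[label(c, g)].add(doc_id)
    return dict(clusters)

docs = {
    "d1": [date(2006, 6, 9), date(2006, 7, 9)],
    "d2": [date(2006, 7, 9)],
    "d3": [date(2010, 7, 11)],
}
by_year = cluster(docs, "year")
by_month = cluster(docs, "month")

# Assumption: the main cluster is the one holding the most documents.
main = max(by_year, key=lambda k: len(by_year[k]))
```

Calling cluster with "month" instead of "year" on the same documents is the kind of refinement from Ty to Tm that the exploration step relies on.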

Page 18: Anil timeline construction

Timeline Construction and Document Exploration

Ranking Documents in a Cluster

• Ranks are determined as follows.

• Given two documents d and d', d is ranked higher than d' if either of the following two conditions holds:

     1. rank(d, yj) > rank(d', yj)
     2. rank(d, yj) = rank(d', yj) and d is ranked higher in Lq than d'

     Lq - set of result documents of a query q
     yj - cluster yj
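The two ordering conditions above amount to a sort with a tie-breaker. In the sketch below the per-cluster scores rank(d, yj) are supplied as a plain dictionary, since the scoring formula is not reproduced in the transcript; the names cluster_rank and lq_position are hypothetical.

```python
def rank_cluster(doc_ids, cluster_rank, lq_position):
    """Order the documents of one cluster.

    cluster_rank[d] stands in for rank(d, yj); higher is better.
    lq_position[d] is the position of d in the hit list Lq; lower is better.
    """
    return sorted(doc_ids, key=lambda d: (-cluster_rank[d], lq_position[d]))

cluster_rank = {"a": 2, "b": 3, "c": 2}
lq_position = {"a": 5, "b": 9, "c": 1}
ordered = rank_cluster(["a", "b", "c"], cluster_rank, lq_position)
```

Here b comes first on its cluster rank alone, and c precedes a because the two tie on cluster rank and c is higher in Lq.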

Page 19: Anil timeline construction

Timeline Construction and Document Exploration

Cluster Exploration

• A cluster can be refined along the timeline to explore the results within it. Ex: refine Ty into Tm or Tw.

Temporal Snippets

• Temporal snippets outline the main events in a document. They are created by pulling the most relevant sentences that contain temporal expressions, using the TSnippet algorithm.
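A toy stand-in for snippet extraction, not the TSnippet algorithm: it keeps the k sentences that contain the most explicit dates of the form "Month D, YYYY" and returns them in document order. The regular expression, the scoring by date count, and the function name are all assumptions for illustration.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Matches explicit dates such as "July 9, 2006" or "March 12 2005".
DATE_RE = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}},?\s+\d{{4}}\b")

def temporal_snippet(text, k=2):
    """Return up to k sentences with the most explicit dates, in text order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = [(len(DATE_RE.findall(s)), i, s) for i, s in enumerate(sentences)]
    with_dates = [t for t in scored if t[0] > 0]
    top = sorted(with_dates, key=lambda t: (-t[0], t[1]))[:k]
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]

text = (
    "The final was played on July 9, 2006. Italy won on penalties. "
    "The next tournament opened on June 11, 2010 and ended on July 11, 2010. "
    "Fans celebrated."
)
snippet = temporal_snippet(text, k=2)
```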

Page 20: Anil timeline construction

PROTOTYPE

Document Annotation Pipeline

• First, extract time-related metadata, such as the document timestamp, from the Web server at crawl time.

• Second, run a POS tagger on each document; it tags parts of speech and inserts the sentence delimiters needed for temporal document annotation.

• Third, run a temporal expression tagger based on the TimeML standard. The resulting XML markup (called tdp) is added to the document.

         

Page 21: Anil timeline construction

PROTOTYPE

 Exploratory User Interface        

     

Page 22: Anil timeline construction

Evaluation

Page 23: Anil timeline construction

Evaluation

 Evaluation guidelines        

• Precision - fraction of retrieved documents that are relevant. All relevant documents must be included in the timeline.

• Presentation - displaying the timeline in an intuitive graphical user interface.
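For reference, precision as defined above can be computed as in this short sketch. The requirement that all relevant documents be included corresponds to the standard notion of recall, which is shown alongside it; the document ids are made up.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
p = precision(retrieved, relevant)
r = recall(retrieved, relevant)
```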

     

Page 24: Anil timeline construction

Evaluation

DMOZ

• It is a multilingual open content directory. The World Cup category was picked for evaluation.

• Results showed that the TCluster algorithm generated more clusters and therefore proved to be more precise.

TimeBank

• It contains news articles that have been annotated using TimeML.

• Using the temporal expressions in documents gave a 50% increase in the number of clusters discovered by TCluster.

     

Page 25: Anil timeline construction

Evaluation

Relevance Evaluation using AMT

• The goal was to evaluate the quality of search results using TCluster in combination with temporal snippets.

• 10 random informational queries for Wikipedia featured articles were used. Average response was 4.04% (with an 80% agreement level).

• The top ten most active topics on Twitter were used. Average response was 4.33% (with an 80% agreement level).

         

Page 26: Anil timeline construction

Conclusion

Page 27: Anil timeline construction

Conclusion

• A framework to make search applications time-aware.

• The TCluster algorithm provides flexibility, allowing users not only to explore the results over a timeline but also to explore them at multiple time granularities.

• A user engaged in time-related investigations would benefit from this model when traditional information retrieval and search engines cannot offer much.

         

Page 28: Anil timeline construction

Examples

Page 29: Anil timeline construction

Google search based on time

Page 30: Anil timeline construction

timesearch.info

Page 31: Anil timeline construction

historyworld.net

Page 32: Anil timeline construction

LinkedIn timeline

Page 33: Anil timeline construction

Thank You!