Personal Information Management

Vitor R. Carvalho

11-749: Personalized Information Retrieval

Carnegie Mellon University

February 8th 2005


Motivation

• 1 person → several tasks
• Several contexts
• Several past activities
• Several collaborators
• Several future plans

• More and more personal information stored

• Where’s that document?
• Where’s the link to that blue hotel in New York?


Document Types

• Some Commercial (Partial) Solutions

Web Links

Passwords

Calendar

Text, PDF, ZIP, PS, LaTeX, RTF, DOC, XML, XLS, PPT, etc.

Email

IM

Audio/Video

• Research: retrieval techniques, prototypes, evaluation, HCI, how users access old documents, visualization, etc.


1st System: Haystack

• 1997 to present, MIT. A comprehensive system for personalizing IR and the relationship between a particular individual and his or her corpus.

• Agnostic regarding the particular search tool used.

• Augments the power of search tools by personalizing and improving the representation of the recorded data.

• Uses a very general data structure that supports different annotations and different collections. Works with information, not programs, bringing email, IM, to-do lists, calendar, web browsing, photos, etc. together.

• Indexing is done incrementally, during “calm” periods.
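Incremental indexing during calm periods can be illustrated with a small queue-based indexer. This is a minimal sketch under stated assumptions, not Haystack's code: "calm" is taken to mean no user activity for a hypothetical `idle_threshold_s` seconds, and the hypothetical `IncrementalIndexer` indexes at most `batch_size` queued documents each time `tick()` finds the system calm.

```python
import time
from collections import defaultdict, deque


class IncrementalIndexer:
    """Queues new or changed documents and indexes them in small batches,
    but only after the user has been idle for a while (a "calm" period)."""

    def __init__(self, idle_threshold_s=60.0, batch_size=2):
        self.idle_threshold_s = idle_threshold_s  # hypothetical calm-period length
        self.batch_size = batch_size              # docs indexed per tick
        self.pending = deque()                    # (doc_id, text) awaiting indexing
        self.index = defaultdict(set)             # term -> set of doc ids
        self.last_activity = time.monotonic()

    def note_user_activity(self, now=None):
        self.last_activity = time.monotonic() if now is None else now

    def enqueue(self, doc_id, text):
        self.pending.append((doc_id, text))

    def is_calm(self, now=None):
        now = time.monotonic() if now is None else now
        return now - self.last_activity >= self.idle_threshold_s

    def tick(self, now=None):
        """Index at most one batch if the system is calm; return the number indexed."""
        if not self.is_calm(now):
            return 0
        done = 0
        while self.pending and done < self.batch_size:
            doc_id, text = self.pending.popleft()
            for term in text.lower().split():
                self.index[term].add(doc_id)
            done += 1
        return done


# Explicit timestamps (in seconds) keep the example deterministic.
idx = IncrementalIndexer(idle_threshold_s=60.0, batch_size=2)
idx.note_user_activity(now=0.0)
for doc_id, text in [("d1", "Blue hotel New York"), ("d2", "Trip itinerary"), ("d3", "hotel receipt")]:
    idx.enqueue(doc_id, text)
print(idx.tick(now=10.0))  # 0: the user was active 10 s ago
print(idx.tick(now=70.0))  # 2: calm period, one batch indexed
print(idx.tick(now=80.0))  # 1: the remaining document
```

Small batches keep each burst of indexing short, so it can stop quickly if the user becomes active again.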


Haystack Architecture


• 3 ways to harvest data:

– Data driven: documents already in Haystack (deletion, selection, etc.) or new documents added by the user
– Observers: observing the user’s moves (browsing, searching, saving queries, etc.)
– Human annotation: via a special interface

• Hard to evaluate in large studies

• You can download the first versions of Haystack from http://haystack.csail.mit.edu/downloads.html

• New Eclipse-based Semantic Web Browser (based on Haystack)
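Haystack's “very general data structure” and its three ways of harvesting data can be illustrated with a toy store of (subject, predicate, object) triples, the kind of semantic-network model raised in the questions at the end. This is a minimal sketch, not Haystack's API: `TripleStore`, `on_document_added`, `search_observer`, `annotate`, and all item identifiers are hypothetical. Each channel (data driven, observer, human annotation) simply writes triples into the same store.

```python
from collections import defaultdict


class TripleStore:
    """Toy store of (subject, predicate, object) triples: one general structure
    for emails, photos, collections, annotations, etc."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)  # subject -> its triples

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self.triples.add(triple)
        self.by_subject[subject].add(triple)

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        pool = self.by_subject.get(subject, set()) if subject is not None else self.triples
        return sorted(t for t in pool
                      if (predicate is None or t[1] == predicate)
                      and (obj is None or t[2] == obj))


def on_document_added(store, doc_id, doc_type):
    # Data driven: a new document yields triples directly.
    store.add(doc_id, "type", doc_type)


def search_observer(store, event):
    # Observer: turns a user search event into "retrievedBy" triples.
    if event.get("kind") == "search":
        for doc_id in event.get("clicked", []):
            store.add(doc_id, "retrievedBy", event["query"])


def annotate(store, doc_id, prop, value):
    # Human annotation: the user labels an item explicitly.
    store.add(doc_id, prop, value)


store = TripleStore()
on_document_added(store, "msg:42", "Email")
on_document_added(store, "photo:7", "Photo")
search_observer(store, {"kind": "search", "query": "blue hotel", "clicked": ["msg:42"]})
search_observer(store, {"kind": "scroll"})  # ignored by this observer
annotate(store, "photo:7", "label", "vacation")
annotate(store, "collection:trip", "member", "msg:42")
annotate(store, "collection:trip", "member", "photo:7")
print(store.query(subject="msg:42"))
```

Because every item and every annotation is just a triple, a new kind of information (a to-do item, an IM message) needs no new schema.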


2nd System: KFTF (Keeping Found Things Found)

• User study and survey on how individuals keep and organize information they have found on the web (and want to re-access and reuse).

• 24/214 participants: researchers, managers and information professionals.

Figure 2: Top 7 keeping methods, ranked by the proportion of participants using each method at least once a week


3rd System: Stuff I’ve Seen (SIS)

• 2003, Microsoft. Design and evaluation (user study) of a system to “Find things you have seen before”

• 58-81% of web page visits are re-visits. Unix commands, library borrowing, human memory, etc. show similar re-use patterns.

• Main ideas:

1. Unified index of information across different info sources (calendar, web, email, files, etc.)

2. Rich contextual cues to trigger memory (author, time, thumbnails, etc.)

3. Friendly interface that allows quick feedback and iterative refinement
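The 58-81% figure is a revisit rate: the share of visits that return to something already seen. A minimal sketch of computing such a rate from a log, assuming a revisit is any visit to an item that appears earlier in the same log (the studies behind the slide's figures may define it differently); the log is made up for illustration.

```python
def revisit_rate(visits):
    """Fraction of visits that go to an item already visited earlier in the log."""
    seen = set()
    revisits = 0
    for item in visits:
        if item in seen:
            revisits += 1
        else:
            seen.add(item)
    return revisits / len(visits) if visits else 0.0


# Hypothetical browsing log, for illustration only.
log = ["a.com", "b.com", "a.com", "c.com", "a.com", "b.com"]
print(revisit_rate(log))  # 0.5: 3 of the 6 visits are re-visits
```

Equivalently, the rate is 1 - (distinct items / total visits); the same function applies to logs of Unix commands or library loans.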


Stuff I’ve Seen
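The first two SIS ideas (one index across sources, plus contextual cues such as author and time) can be sketched as a single inverted index whose entries keep their metadata. This is a minimal sketch under assumptions: the hypothetical `Item` fields, AND semantics for query terms, and newest-first ordering are illustrative choices, not the SIS implementation. The source and date-range filters mirror the file-type and date-range filters the SIS evaluation reports as most used.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    item_id: str
    source: str   # e.g. "email", "web", "file", "calendar"
    author: str   # contextual cue shown with results
    when: date    # contextual cue shown with results
    text: str


class UnifiedIndex:
    """One inverted index over items from every source, keeping metadata for filters."""

    def __init__(self):
        self.items = {}     # item_id -> Item
        self.postings = {}  # term -> set of item ids

    def add(self, item):
        self.items[item.item_id] = item
        for term in set(item.text.lower().split()):
            self.postings.setdefault(term, set()).add(item.item_id)

    def search(self, query, source=None, start=None, end=None):
        """Items containing every query term, filtered by source and date range."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        hits = [
            self.items[i] for i in ids
            if (source is None or self.items[i].source == source)
            and (start is None or self.items[i].when >= start)
            and (end is None or self.items[i].when <= end)
        ]
        return sorted(hits, key=lambda h: h.when, reverse=True)  # newest first


index = UnifiedIndex()
index.add(Item("m1", "email", "Alice", date(2005, 1, 10), "Hotel booking New York"))
index.add(Item("w1", "web", "hotel site", date(2005, 1, 12), "Blue hotel New York"))
index.add(Item("f1", "file", "Bob", date(2004, 12, 1), "Trip budget"))
for hit in index.search("hotel new york"):
    print(hit.item_id, hit.source, hit.author, hit.when)
```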


SIS - Evaluation

• Supports Boolean as well as best-match retrieval (Okapi probabilistic ranking algorithm) on text and metadata properties. Allows phrase, wildcard and proximity search.

• 234 people over 6 weeks
• Only 7.5% used Boolean operators or phrases in queries
• Queries were short (1.59 words on average, vs. ~2.35 words on the web)
• Personal datasets ranged from 5K to 100K items
• Most used filters: file type and date range
• Most common query type: people’s names
• File types opened: email (76%), web (14%), files (14%)
• Standard ranking functions seem less important in this context
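SIS offers best-match ranking with the Okapi probabilistic model. Okapi BM25 is the best-known ranking function from that family; below is a minimal sketch of it, assuming whitespace-tokenized documents, the commonly used defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant ln((N - n + 0.5) / (n + 0.5) + 1). Which Okapi variant and parameters SIS actually used is not stated here.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every tokenized doc for a tokenized query (docs must be a non-empty list)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[q] * (k1 + 1) / (tf[q] + length_norm)
        scores.append(score)
    return scores


docs = [
    "blue hotel new york".split(),
    "hotel receipt".split(),
    "meeting notes".split(),
]
scores = bm25_scores("blue hotel".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The `b` parameter controls how strongly long documents are penalized; `k1` controls how quickly repeated occurrences of a term stop adding to the score.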


SIS - Evaluation

• Similar power functions are found in web page re-access and in memory re-access

• Overall, the system was very well received
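A power function here means a relation of the form y = a * x^b, for example (an assumption, since the slide does not name the variables) the probability of re-accessing an item as a function of the time since it was last accessed. A minimal sketch of fitting such a curve by least squares on log-transformed data; the points are synthetic, generated from a = 0.5 and b = -1.2, not taken from the study.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (ln x, ln y); xs, ys must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)  # back-transformed intercept = coefficient
    return a, b


# Synthetic points from y = 0.5 * x**-1.2 (illustrative, not data from the study).
xs = [1, 2, 4, 8, 16]
ys = [0.5 * x ** -1.2 for x in xs]
a, b = fit_power_law(xs, ys)
print(round(a, 6), round(b, 6))  # 0.5 -1.2
```

A power law appears as a straight line on a log-log plot, which is why "similar power functions" can be compared through their fitted exponents.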


4th System: Using Temporal Landmarks

• 2003, Microsoft. Based on the “Stuff I’ve Seen” system.

• Synthesis of 2 ideas:

– Episodic memory: use landmarks in the user’s memory as cues to retrieve information (JFK assassination, 9/11, an unforgettable Steelers game, vacations, etc.)

– Timeline visualizations: visualize the personal dataset in sequential time


Temporal Landmarks - Evaluation

• Selection of public landmarks: priority to important holidays, analysis of news headlines, etc. User-driven approach.

• Selection of personal landmarks: different priorities for calendar appointments and “out of office” times; recurrent appointments get low priority; digital photographs (the first photo of each day was selected)
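The personal-landmark rules above can be sketched as a priority-based selection. This is a minimal sketch under assumptions: the numeric weights in the hypothetical `BASE_PRIORITY` and `RECURRING_PRIORITY` are invented, and a timeline is assumed to show a fixed number of landmarks. The slide states only the qualitative rules (different priorities per event type, low priority for recurrent appointments, only the first photo of each day).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    when: datetime
    kind: str        # "holiday", "appointment", "out_of_office" or "photo"
    title: str
    recurring: bool = False


# Hypothetical weights: only the qualitative rules come from the slide.
BASE_PRIORITY = {"holiday": 3.0, "out_of_office": 3.0, "appointment": 2.0, "photo": 1.0}
RECURRING_PRIORITY = 0.5  # recurrent appointments get low priority


def select_landmarks(events, max_landmarks=3):
    """Return up to max_landmarks highest-priority events, in chronological order."""
    candidates = []
    photo_days = set()
    for e in sorted(events, key=lambda ev: ev.when):
        if e.kind == "photo":
            if e.when.date() in photo_days:
                continue  # keep only the first photo of each day
            photo_days.add(e.when.date())
        priority = RECURRING_PRIORITY if e.recurring else BASE_PRIORITY.get(e.kind, 0.0)
        candidates.append((priority, e))
    # sort() is stable (also with reverse=True), so ties keep chronological order.
    candidates.sort(key=lambda pe: pe[0], reverse=True)
    chosen = [e for _, e in candidates[:max_landmarks]]
    return sorted(chosen, key=lambda ev: ev.when)


events = [
    Event(datetime(2005, 1, 1), "holiday", "New Year's Day"),
    Event(datetime(2005, 1, 3, 9, 0), "appointment", "Weekly group meeting", recurring=True),
    Event(datetime(2005, 1, 15, 8, 0), "photo", "IMG_001"),
    Event(datetime(2005, 1, 15, 9, 30), "photo", "IMG_002"),
    Event(datetime(2005, 1, 20), "out_of_office", "Conference trip"),
    Event(datetime(2005, 1, 25, 14, 0), "appointment", "Thesis proposal"),
]
print(", ".join(e.title for e in select_landmarks(events)))
# New Year's Day, Conference trip, Thesis proposal
```

Public landmarks such as holidays could be merged into the same candidate list before selection.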


Some Questions

1. I was wondering what people think about using a complete personalized web search system with, or as, a query observer in the Haystack system. This might be interesting if the system had access to Haystack’s internal data and could write back to it.

2. The general data model of the Haystack approach is quite similar to the knowledge map approach. Although the authors applied their specific needs to this kind of semantic network, the paper lacks a semantic network retrieval model. Are there any papers on retrieving information from such semantic networks?

• http://haystack.lcs.mit.edu/publications.html