SIMS 202 Information Organization and Retrieval Prof. Marti Hearst and Prof. Ray Larson UC Berkeley SIMS Tues/Thurs 9:30-11:00am Fall 2000

Post on 19-Dec-2015

Page 1: SIMS 202 Information Organization and Retrieval Prof. Marti Hearst and Prof. Ray Larson UC Berkeley SIMS Tues/Thurs 9:30-11:00am Fall 2000

SIMS 202 Information Organization and Retrieval

Prof. Marti Hearst and Prof. Ray Larson
UC Berkeley SIMS

Tues/Thurs 9:30-11:00am
Fall 2000

Page 2:

Last Time

Starting Points for Search
– Lists
– Overviews
  » Categories

Page 3:

Today and Next Time

Starting points (cont.)
– Clusters
– Examples as starting points
– Automated Source Selection

UIs for Query Specification
UIs for Putting Results in Context
UIs to support the Search Process

Page 4:

Starting Points for Search

Faced with a prompt or an empty entry form … how to start?
– Lists of sources
– Overviews
  » Clusters
  » Category Hierarchies/Subject Codes
  » Co-citation links
– Examples, Wizards, and Guided Tours
– Automatic source selection

Page 5:

Category Combinations

HiBrowse Problem:
– Search is not integrated with browsing of categories
– Only see the subset of categories selected (and the corresponding number of documents)

Page 6:

Cat-a-Cone: Multiple Simultaneous Categories

Key Ideas:
– Separate documents from category labels
– Show both simultaneously

Link the two for iterative feedback
Distinguish between:
– Searching for Documents vs.
– Searching for Categories

Page 7:

Cat-a-Cone Interface

Page 8:

Cat-a-Cone

Catacomb: (definition 2b, online Websters) “A complex set of interrelated things”

Makes use of earlier PARC work on 3D+animation:
– Rooms (Henderson and Card 86)
– IV: Cone Tree (Robertson, Card, Mackinlay 93)
– Web Book (Card, Robertson, York 96)

Page 9:

(diagram labels: Category Hierarchy; browse)

Page 10:

(diagram labels: Category Hierarchy; search)

Page 11:

(diagram labels: Collection; Retrieved Documents; Category Hierarchy; search; query terms)

Page 12:

(diagram labels: Collection; Retrieved Documents; Category Hierarchy; search; browse; query terms)

Page 13:

(diagram labels: Collection; Retrieved Documents; Category Hierarchy; search; browse; query terms)

Page 14:

ConeTree for Category Labels

Browse/explore category hierarchy
– by search on label names
– by growing/shrinking subtrees
– by spinning subtrees

Affordances
– learn meaning via ancestors, siblings
– disambiguate meanings
– all categories simultaneously viewable

Page 15:

Virtual Book for Result Sets

– Categories on Page (Retrieved Document) linked to Categories in Tree

– Flipping through Book Pages causes some Subtrees to Expand and Contract

– Most Subtrees remain unchanged

– Book can be Stored for later Re-Use

Page 16:

Improvements over Standard Category Interfaces

Integrate category selection with viewing of categories
Show all categories + context
Show relationship of retrieved documents to the category structure

Page 17:

Text Clustering

Finds overall similarities among groups of documents

Finds overall similarities among groups of tokens

Picks out some themes, ignores others

Page 18:
Page 19:

S/G (Scatter/Gather) Example: query on “star”

Encyclopedia text
– 14 sports
– 8 symbols
– 47 film, tv
– 68 film, tv (p)
– 7 music
– 97 astrophysics
– 67 astronomy (p)
– 12 stellar phenomena
– 10 flora/fauna
– 49 galaxies, stars
– 29 constellations
– 7 miscellaneous

Clustering and re-clustering is entirely automated

Page 20:

Using Clustering in Document Ranking

Cluster entire collection
Find cluster centroid that best matches the query
This has been explored extensively
– it is expensive
– it doesn’t work well
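A minimal sketch of the centroid-matching step, assuming the collection has already been clustered. The bag-of-words vectors, cosine similarity, and all names here (`vectorize`, `centroid`, `best_cluster`, the toy clusters) are illustrative assumptions, not taken from any particular system discussed in the lecture.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(docs):
    """Average term-count vector over the documents of one cluster."""
    total = Counter()
    for d in docs:
        total.update(vectorize(d))
    return {t: c / len(docs) for t, c in total.items()}

def best_cluster(query, clusters):
    """Name of the cluster whose centroid is most similar to the query."""
    q = vectorize(query)
    return max(clusters, key=lambda name: cosine(q, centroid(clusters[name])))

# Hypothetical, hand-made clusters standing in for a clustered collection.
clusters = {
    "astronomy": ["star galaxy telescope", "star constellation sky"],
    "film": ["movie star actor", "film star award"],
}
print(best_cluster("galaxy telescope", clusters))
```

Recomputing every centroid per query, as this sketch does, hints at why the approach is costly on a large collection; a real system would precompute and store the centroids.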

Page 21:

Two Queries: Two Clusterings

AUTO, CAR, ELECTRIC
– 8 control drive accident …
– 25 battery california technology …
– 48 import j. rate honda toyota …
– 16 export international unit japan
– 3 service employee automatic …

AUTO, CAR, SAFETY
– 6 control inventory integrate …
– 10 investigation washington …
– 12 study fuel death bag air …
– 61 sale domestic truck import …
– 11 japan export defect unite …

The main differences are the clusters that are central to the query

Page 22:

Another use of clustering

Use clustering to map the entire huge multidimensional document space into a huge number of small clusters.

“Project” these onto a 2D graphical representation
– Group by doc: SPIRE/Kohonen maps
– Group by words: Galaxy of News/HotSauce/Semio

Page 23:

Clustering Multi-Dimensional Document Space

(image from Wise et al 95)

Page 24:

Kohonen Feature Maps on Text (from Chen et al., JASIS 49(7))

Page 25:

UWMS Data Mining Workshop

Study of Kohonen Feature Maps

H. Chen, A. Houston, R. Sewell, and B. Schatz, JASIS 49(7)

Comparison: Kohonen Map and Yahoo
Task:
– “Window shop” for interesting home page
– Repeat with other interface

Results:
– Starting with map could repeat in Yahoo (8/11)
– Starting with Yahoo unable to repeat in map (2/14)

Page 26:

UWMS Data Mining Workshop

Study (cont.)

Participants liked:
– Correspondence of region size to # documents
– Overview (but also wanted zoom)
– Ease of jumping from one topic to another
– Multiple routes to topics
– Use of category and subcategory labels

Page 27:

UWMS Data Mining Workshop

Study (cont.)

Participants wanted:
– hierarchical organization
– other ordering of concepts (alphabetical)
– integration of browsing and search
– correspondence of color to meaning
– more meaningful labels
– labels at same level of abstraction
– fit more labels in the given space
– combined keyword and category search
– multiple category assignment (sports+entertain)

Page 28:

Visualization of Clusters

– Huge 2D maps may be inappropriate focus for information retrieval
  » Can’t see what documents are about
  » Documents forced into one position in semantic space
  » Space is difficult to use for IR purposes
  » Hard to view titles
– Perhaps more suited for pattern discovery
  » problem: often only one view on the space

Page 29:

Summary: Clustering

Advantages:
– Get an overview of main themes
– Domain independent

Disadvantages:
– Many of the ways documents could group together are not shown
– Not always easy to understand what they mean
– Different levels of granularity

Page 30:

Automated Source Selection

Compare the query against summaries of what is contained in the collection
– GLOSS (Tomasic et al. 97)
  » Predict which of several sources is most likely
  » Based on how many instances of each query term occur in the collection
– SavvySearch (Howe & Dreilinger 97, in reader)
  » Predict which of several search engines is likely to produce a good answer to a given query
  » Based on number of pages returned and amount of time users spend on retrieved pages
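A rough sketch of source selection from term-count summaries, in the spirit of the GLOSS bullet above. This is not GLOSS's actual estimator: the scoring rule (sum of query-term counts) and the names `summarize`, `rank_sources`, and the toy sources are assumptions made for illustration.

```python
from collections import Counter

def summarize(docs):
    """Per-source summary: total occurrences of each term across its documents."""
    counts = Counter()
    for d in docs:
        counts.update(d.lower().split())
    return counts

def rank_sources(query, summaries):
    """Order source names by the summed query-term counts in their summaries."""
    terms = query.lower().split()
    scores = {name: sum(s[t] for t in terms) for name, s in summaries.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)

# Hypothetical sources; only the summaries are consulted at query time.
summaries = {
    "medline": summarize(["bone loss study", "osteoporosis prevention bone"]),
    "news": summarize(["auto sales rise", "bone china exhibit"]),
}
print(rank_sources("bone loss", summaries))
```

The point of the design is that the query is compared against compact summaries rather than against the sources themselves.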

Page 31:

Query Specification

Page 32:

Query Specification

Interaction Styles (Shneiderman 97)
– Command Language
– Form Fillin
– Menu Selection
– Direct Manipulation
– Natural Language

Example:
– How does each apply to Boolean queries?

Page 33:

Command-Based Query Specification

command attribute value connector …
– find pa shneiderman and tw user#

What are the attribute names?
What are the command names?
What are allowable values?
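To make the "command attribute value connector" pattern concrete, here is a small parsing sketch. The connector set and the functions `make_clause` and `parse_command` are hypothetical; the sketch does not know which attribute names or commands a real system accepts, which is exactly the usability question the slide raises.

```python
CONNECTORS = {"and", "or", "not"}  # assumed connector vocabulary

def make_clause(connector, tokens):
    """Build (connector, attribute, value) from the tokens of one clause."""
    if len(tokens) < 2:
        raise ValueError("each clause needs an attribute and a value")
    return (connector, tokens[0], " ".join(tokens[1:]))

def parse_command(line):
    """Split 'command attribute value connector ...' into a command and clauses."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty command")
    command, rest = tokens[0], tokens[1:]
    clauses, connector, current = [], None, []
    for tok in rest:
        if tok.lower() in CONNECTORS:
            # A connector closes the clause collected so far.
            clauses.append(make_clause(connector, current))
            connector, current = tok.lower(), []
        else:
            current.append(tok)
    clauses.append(make_clause(connector, current))
    return command, clauses

print(parse_command("find pa shneiderman and tw user#"))
```

The parser accepts any attribute code without complaint, so a user who guesses the wrong one gets no help, which illustrates the memory burden of command languages.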

Page 34:

Form-Based Query Specification (Altavista)

Page 35:

Form-Based Query Specification (Melvyl)

Page 36:

Form-based Query Specification (Infoseek)

Page 37:

Direct Manipulation Spec. VQUERY (Jones 98)

Page 38:

Menu-based Query Specification (Young & Shneiderman 93)

Page 39:

Context

Page 40:

Putting Results in Context

Visualizations of Query Term Distribution
– KWIC, TileBars, SeeSoft
Visualizing Shared Subsets of Query Terms
– InfoCrystal, VIBE, Lattice Views
Table of Contents as Context
– Superbook, Cha-Cha, DynaCat
Organizing Results with Tables
– Envision, SenseMaker
Using Hyperlinks
– WebCutter

Page 41:

Putting Results in Context

Interfaces should
– give hints about the roles terms play in the collection
– give hints about what will happen if various terms are combined
– show explicitly why documents are retrieved in response to the query
– summarize compactly the subset of interest

Page 42:

KWIC (Keyword in Context)

An old standard, ignored by internet search engines
– used in some intranet engines, e.g., Cha-Cha
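A minimal keyword-in-context sketch: each hit is shown with a few words on either side. The function name `kwic`, the `width` parameter, the bracket markup, and the sample text are illustrative assumptions rather than the behavior of any engine named above.

```python
def kwic(text, keyword, width=3):
    """Return each occurrence of keyword with up to `width` words of context per side."""
    words = text.split()
    lines = []
    for i, w in enumerate(words):
        # Ignore case and trailing punctuation when matching.
        if w.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{w}] {right}".strip())
    return lines

text = "The star cluster is bright. A film star won the award."
for line in kwic(text, "star", width=2):
    print(line)
```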

Page 43:

Display of Retrieval Results

Goal: minimize time/effort for deciding which documents to examine in detail

Idea: show the roles of the query terms in the retrieved documents, making use of document structure

Page 44:

TileBars

Graphical Representation of Term Distribution and Overlap

Simultaneously Indicate:
– relative document length
– query term frequencies
– query term distributions
– query term overlap
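A text-only sketch of the data behind such a display: one row per query term set, one cell per document segment, with darker characters for more hits. Splitting into equal-width word segments is a simplifying assumption, and `tile_rows`, `render`, the shade characters, and the sample document are hypothetical.

```python
def tile_rows(document, term_sets, num_tiles=4):
    """For each term set, count hits in each of num_tiles equal-width word segments."""
    words = [w.lower().strip(".,;:!?") for w in document.split()]
    size = max(1, -(-len(words) // num_tiles))  # ceiling division
    rows = []
    for terms in term_sets:
        row = [0] * num_tiles
        for i, w in enumerate(words):
            if w in terms:
                row[min(i // size, num_tiles - 1)] += 1
        rows.append(row)
    return rows

def render(rows, shades=" .:#"):
    """Map counts to darker characters, a stand-in for gray-scale tiles."""
    return ["".join(shades[min(c, len(shades) - 1)] for c in row) for row in rows]

doc = ("dbms reliability dbms index storage query banking loans "
       "reliability dbms audit logs")
rows = tile_rows(doc, [{"dbms"}, {"reliability"}], num_tiles=3)
for line in render(rows):
    print("|" + line + "|")
```

Columns where both rows are dark mark segments where the term sets overlap. This sketch fixes the number of tiles, so it does not convey relative document length.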

Page 45:

Example

Query terms:
– DBMS (Database Systems)
– Reliability

What roles do they play in retrieved documents?
– Mainly about both DBMS & reliability
– Mainly about DBMS, discusses reliability
– Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability
– Mainly about high-tech layoffs

Page 46:
Page 47:
Page 48:

Exploiting Visual Properties

– Variation in gray scale saturation imposes a universal, perceptual order (Bertin et al. ‘83)

– Varying shades of gray show varying quantities better than color (Tufte ‘83)

– Differences in shading should align with the values being presented (Kosslyn et al. ‘83)

Page 49:

Key Aspect: Faceted Queries

Conjunct of disjuncts
Each disjunct is a concept
– osteoporosis, bone loss
– prevention, cure
– research, Mayo clinic, study

User does not have to specify which are main topics, which are subtopics

Ranking algorithm gives higher weight to overlap of topics
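A small sketch of a faceted query as a conjunct of disjuncts, using the facets from the slide. Scoring by the number of facets covered is one simple way to favor overlap of topics; it is an assumption, not the actual ranking algorithm. Terms are matched as substrings of the lowercased text, which is crude, and the documents are invented.

```python
def facets_matched(text, facets):
    """Number of facets (disjuncts) with at least one term occurring in the text."""
    lowered = text.lower()
    return sum(1 for facet in facets if any(term in lowered for term in facet))

def rank(docs, facets):
    """Return doc ids ordered by facets covered (most first), plus the raw scores."""
    scores = {doc_id: facets_matched(text, facets) for doc_id, text in docs.items()}
    order = sorted(scores, key=lambda d: scores[d], reverse=True)
    return order, scores

facets = [
    ["osteoporosis", "bone loss"],
    ["prevention", "cure"],
    ["research", "mayo clinic", "study"],
]
# Hypothetical documents for illustration.
docs = {
    "d1": "A study of osteoporosis prevention",
    "d2": "Bone loss in astronauts",
    "d3": "Mayo Clinic research news",
}
order, scores = rank(docs, facets)
print(order, scores)
```

A document satisfies the strict Boolean conjunct only when its score equals `len(facets)`; ranking by the score keeps partial matches instead of discarding them.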

Page 50:

Main Topic Context

Potential Problem with TileBars
Given retrieved documents in which no query terms are well-distributed, the user does not know the context in which the query terms are used

Solution: Accompany with main topic display

Page 51:

TileBars Summary

Compact, graphical representation of term distribution for full text retrieval results
– simultaneously display term frequency, distribution, overlap, and doc length
– allow for simple user-determined ordering strategies

Part of a larger effort: user-centric, content-sensitive information access

Page 52:

TileBars Summary

Preliminary User Studies
– users understand them
– find them helpful in some situations
– sometimes terms need to be disambiguated

Page 53:

SeeSoft: Showing Text Content using a linear representation and brushing and linking (Eick & Wills 95)

Page 54:

Query Term Subsets

Show which subsets of query terms occur in which subsets of retrieved documents
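A sketch of the grouping such displays are built on: each retrieved document is filed under the exact subset of query terms it contains. The function `group_by_term_subset` and the sample documents are hypothetical, and matching is on whole lowercased words only.

```python
from collections import defaultdict

def group_by_term_subset(docs, query_terms):
    """Map each subset of query terms (a frozenset) to the doc ids containing exactly it."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        present = frozenset(t for t in query_terms if t in words)
        groups[present].append(doc_id)
    return dict(groups)

# Hypothetical retrieved documents.
docs = {
    "d1": "dbms reliability report",
    "d2": "dbms tuning",
    "d3": "banking reliability",
    "d4": "layoffs",
}
groups = group_by_term_subset(docs, ["dbms", "reliability"])
for subset, ids in groups.items():
    print(sorted(subset), ids)
```

With n query terms there are up to 2^n groups, so the display grows quickly as terms are added.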

Page 55:

Other Approaches

Show how often each query term occurs in retrieved documents
– VIBE (Korfhage ‘91)
– InfoCrystal (Spoerri ‘94)
– Problems:
  » can’t see overlap of terms within docs
  » quantities not represented graphically
  » more than 4 terms hard to handle
  » no help in selecting terms to begin with

Page 56:

InfoCrystal (Spoerri 94)

Page 57:

VIBE (Olson et al. 93, Korfhage 93)

Page 58:

Superbook (Remde et al. 87)

Next-generation hyper-media book
Functions:
– Word Lookup:
  » Show a list of query words, stems, and word combinations
– Table of Contents: Dynamic fisheye view of the hierarchical topics list
  » Search words can be highlighted here too
– Page of Text: show selected page and highlighted search terms

Hypertext features linking through search words rather than page links

Page 59:

Superbook (http://superbook.bellcore.com/SB)

Page 60:

DynaCat (Pratt 97)

Decide on important question types in advance
– What are the adverse effects of drug D?
– What is the prognosis for treatment T?
Make use of MeSH categories
Retain only those types of categories known to be useful for this type of query.

Page 61:

DynaCat (Pratt, Hearst, & Fagan 99)

Page 62:

DynaCat Study

Design
– Three queries
– 24 cancer patients
– Compared three interfaces
  » ranked list, clusters, categories

Results
– Participants strongly preferred categories
– Participants found more answers using categories
– Participants took same amount of time with all three interfaces

Page 63:

Cha-Cha (Chen & Hearst 98)

Shows “table-of-contents”-like view, like Superbook
Takes advantage of human-created structure within hyperlinks to create the TOC

Page 64:

Supporting the Process

Interfaces to support the process of information seeking
– Standard Model
  » Infogrid
  » Superbook
– Berry Picking Model
  » SketchTrieve
  » DLITE
– Retaining Search History

Page 65:

How to Present the Search Process?

What sequence of operations is allowed?

Which GUI layout style is used?
– One window
– Overlapping windows
– Tiled windows
– Monolithic layout
  » One big window containing specialized internal windows that always occupy the same position and function

Page 66:

Slide by Shankar Raman

InfoGrid/Protofoil (Rao et al. 92)

A general search interface architecture
– Itemstash -- retrieved docs
– Search Event -- current query
– History -- history of queries
– Result Item -- view selected docs + metadata

Page 67:

Infogrid (design mockup) (Rao et al. 92)

Page 68:

Infogrid Design Mockups (Rao et al. 92)

Page 69:

Protofoil (Rao et al. 94)

Page 70:

Monolithic Layouts

Protofoil Layout (Hypothetical) Superbook Layout

Page 71:

SuperBook (Egan et al. 89)

Experimented with many variations of the layout and interaction sequence.
– Several studies have shown that too many different options are worse than an interface that is too restrictive.

Considered different screen sizes
– Monolithic layout favored, however ...
– Sequence of interactions is what matters
– Smaller screen can force designers to consider the interaction sequence carefully

Page 72:

Supporting the Information Seeking Process

Two recent similar approaches that focus on supporting the process
– SketchTrieve (Hendry & Harper 97)
– DLITE (Cousins 97)

Page 73:

Informal Interface

Informal does not necessarily mean less useful
Show how the search is
– unfolding or evolving
– expanding or contracting

Prompt the user to
– reformulate and abandon plans
– backtrack to points of task deferral
– make side-by-side comparisons
– define and discuss problems

Page 74:

Slide by Shankar Raman

DLITE

UI to a digital library
Direct manipulation interface to a distributed info. system
– must show network, remote server status

Workcenter approach
– lots of handy tools for one task
– experts create workcenters
– contents persistent
– concurrently shareable across sites

Web browser used to display document or collection metadata

Page 75:

Slide by Shankar Raman

DLITE (Cousins 97)

Drag and Drop interface
Reify queries, sources, retrieval results
Animation to keep track of activity

Page 76:

Slide by Shankar Raman

Components/tools in DLITE

Documents (search results, or local documents)
Collections of components (e.g. result sets)
Queries -- translator used to apply same query to many sources
Services -- search services, summarization, OCR, translation …
People (for access control, payment …)

Page 77:

Slide by Shankar Raman

Interaction

Pointing at object brings up tooltip -- metadata

Activating object -- component specific action
– 5 types for result set component

Drag-and-drop data onto program
Animation used to show what happens with drag-and-drop (e.g. “waggling”)

Page 78:

Slide by Shankar Raman

Comments

Users seem to have lots of problems with flexibility (result set icon activation)

Workcenter -- customization, acts as reminder

Animation used to track progress, (partial) results

Page 79:

Keeping Track of History

Examples
– List of prior queries and results (standard)
– Graphical hierarchy for web browsing
– “Slide sorter” view, snapshots of earlier interactions

Page 80:

Slide by Shankar Raman

PadPrints (Hightower et al. 98)

Tree-based history of recently visited web pages
History map placed to left of browser window

Zoomable, can shrink sub-hierarchies

Node = title + thumbnail
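A minimal sketch of a tree-shaped browsing history in which going back and following a different link adds a sibling branch instead of overwriting history. The classes `HistoryNode` and `BrowsingHistory`, the revisit rule, and the `collapsed` flag are assumptions for illustration and are not drawn from the PadPrints implementation.

```python
class HistoryNode:
    """One visited page: title, thumbnail reference, and the pages reached from it."""
    def __init__(self, title, thumbnail=None, parent=None):
        self.title = title
        self.thumbnail = thumbnail
        self.parent = parent
        self.children = []
        self.collapsed = False  # a collapsed node hides its sub-hierarchy

class BrowsingHistory:
    def __init__(self, start_title):
        self.root = HistoryNode(start_title)
        self.current = self.root

    def visit(self, title, thumbnail=None):
        """Follow a link from the current page; reuse the child node on a revisit."""
        for child in self.current.children:
            if child.title == title:
                self.current = child
                return child
        node = HistoryNode(title, thumbnail, parent=self.current)
        self.current.children.append(node)
        self.current = node
        return node

    def back(self):
        """Move to the parent page without discarding the branch just left."""
        if self.current.parent is not None:
            self.current = self.current.parent

    def outline(self, node=None, depth=0):
        """Indented list of titles, skipping the children of collapsed nodes."""
        node = node or self.root
        lines = ["  " * depth + node.title]
        if not node.collapsed:
            for child in node.children:
                lines.extend(self.outline(child, depth + 1))
        return lines

h = BrowsingHistory("Home")
parks = h.visit("Parks")
h.visit("Yosemite")
h.back()
h.visit("Zion")
print("\n".join(h.outline()))
```

A linear back-stack would have dropped "Yosemite" once "Zion" was visited; the tree keeps both branches.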

Page 81:

PadPrints (Hightower et al. 98)

Page 82:

PadPrints (Hightower et al. 98)

Page 83:

PadPrints (Hightower et al. 98)

Page 84:

Slide by Shankar Raman

Initial User Study of PadPrints

13.4% unable to find recently visited pages

only 0.1% use History button, 42% use Back
problems with history list (according to authors)
– incomplete, lose out on every branch
– textual (not necessarily a problem!)
– pull down menu cumbersome -- cannot see history along with current document

Page 85:

Slide by Shankar Raman

Second User Study of PadPrints

Changed the task to involve revisiting web pages
– CHI database, National Park Service website

Only correctly answered questions considered
– 20-30% fewer pages accessed
– faster response time for tasks that involve revisiting pages
– slightly better user satisfaction ratings

Page 86:

Summary: UIs for Information Access

The part of the system that the user sees and interacts with

Better interfaces in future should produce better search experiences

UIs for search should
– Help users keep track of what they have done
– Suggest next choices
– Support the process of search

It is very difficult to design good UIs
It is very difficult to evaluate search UIs